The CONTAINS Predicate

The CONTAINS predicate is used to determine whether values in full-text registered columns contain certain words and phrases. Currently, these predicates can reference base tables only.

The CONTAINS predicate syntax is as follows:

Here are the definitions of the syntax terminology.

column_ref or *
The column or columns to be searched.
column_ref
A specific column that is full-text registered.
*
All columns in the table that are full-text registered.
AND, OR, and AND NOT
The Boolean operators used to join, or combine, terms.

simple_term

This term is used to match the exact word or phrase being searched for.

The simple_term syntax is as follows, where word refers to one or more characters without spaces or punctuation, and phrase refers to multiple words with spaces in between. Asian languages can have phrases made up of multiple words without spaces in between.

In keeping with the standard for full-text products, the search function is always case insensitive.

Here are some examples of simple terms used in the context of the CONTAINS predicate in a WHERE clause.

WHERE CONTAINS( context_col, 'hockey' )

WHERE CONTAINS( context_col, ' "ice hockey" ')

  

Suppose you have one row with a context_col value of “This is a dissertation on the use of ice cream sandwiches as hockey pucks,” and another row with the value “Dissertation on new ways of splitting the atom.” Since “this,” “is,” “a,” and so on are noise words, they are not stored in the full-text index. Therefore, a query with the CONTAINS predicate:

   CONTAINS (context_col, '"this is a dissertation"')

  

is the same as this query:

   CONTAINS (context_col, 'dissertation')

  

Both rows are returned as hits because in the first query, the noise words are removed before processing the query.

Combining Terms

As with other SQL search conditions, more complex conditions can be specified by linking individual operands with Boolean operators. In this case, the operands are any of the types of terms being discussed. Except for the restrictions that the OR NOT combination is not supported and that NOT cannot be specified before the first term, the rules are exactly the same as those used to combine individual predicates to form search conditions. For instance, parentheses may be used to change the default priority order in which the operators are applied.

Here are some examples of simple terms being combined within a CONTAINS predicate in a WHERE clause.

WHERE CONTAINS( context_col, 'hockey OR curling' )

WHERE CONTAINS( context_col, 'hockey AND NOT field')

WHERE CONTAINS( context_col, 

                ' ("ice hockey" OR curling) AND NOT 

                  Canada ' )

  

prefix_term

The prefix_term is used to match words or phrases that begin with the specified text.

The prefix_term syntax is as follows:

A prefix term consists of a simple term appended with an asterisk (*) to activate prefix matching on the word or phrase. All text that starts with the material before the * is matched. The wildcard symbol (*) in this case is similar to the % symbol in the LIKE predicate in that it matches zero, one or more characters of the root words in a word or phrase. In a phrase, each word within the phrase is considered to be a prefix; for example, the term “local bus *” matches “locality busy,” “local bush,” and “locale bust.”

Here are some examples of prefix terms used in the context of the CONTAINS predicate in a WHERE clause.

WHERE CONTAINS( context_col, ' "atom*" ' )

  

This matches context_col values that contain the word “atom,” “atomic,” “atomism,” “atomy,” and so on.

WHERE CONTAINS( abstract, ' "wine*" OR "vine*" ') 

  

This matches abstract values that contain the word “wine,” or “vine,” or, alternatively, words such as “winery,” “wines,” “vineyard,” or “vinegar.”

proximity_term

This is used when the words or phrases being searched for must be in proximity to one another.

The proximity_term syntax is as follows:

The proximity term is similar to the AND operator in that more than one word or phrase must exist in the column being searched. It differs from AND because the relevance of the match increases as the words appear closer together.

The syntax is designed to be extensible for possible future support for specification of units of proximity such as words, sentences, paragraphs, chapters, and so on.

NEAR(), NEAR, and ~ share the same meaning: the first word or the phrase is close to the second word or phrase. “Close” is a purposefully vague term that can mean “within 50 words,” but the algorithm is complicated. While words within the same sentence are one word distance apart, larger distances are assigned between units such as sentences, paragraphs, and chapters. Even if words or phrases are very far apart, the query is still considered to be satisfied; the row just has a low (zero) rank value. However, if the contains_condition consists only of NEAR proximity terms, then SQL Server does not return rows with a rank value of zero.

It is possible to chain-code the proximity matching. For example, “a ~ b ~ c” means a should be near b, which should be near c. Because of the fuzzy nature of full-text searches, it is often desirable to see the rank values. The CONTAINSTABLE() rowset-valued function can be used to execute queries that return a rank value for each row.

Here are some examples of proximity terms used in the context of the CONTAINS predicate in a WHERE clause.

WHERE CONTAINS( context_col, ' hockey ~ player ' )

  

This matches context_col values that contain the word “hockey” in proximity to the word “player.”

WHERE CONTAINS( context_col, ' hockey ~ "play*" ') 

  

This matches context_col values that contain the word “hockey” in proximity to a word that starts with “play.”

WHERE CONTAINS( context_col, '  "great*" 

                              ~ "Mike Nash" ') 

  

This matches context_col values that contain a word starting with “great” in proximity to the phrase “Mike Nash.”

generation_term

The generation_term is used when the words being searched for need to be expanded to include the variants of the original word.

The generation_term syntax is as follows:

The INFLECTIONAL keyword means that plural and singular forms of nouns or the various tenses of verbs are matched. A single term does not match both exclusive noun and exclusive verb forms. The syntax is designed to be extensible enough to handle other linguistically generated forms, such as derivational, soundex, and thesaurus.

Here is an example of a generation term used in the context of the CONTAINS predicate in a WHERE clause.

WHERE CONTAINS(curriculum_vite, 

               ' FORMSOF (INFLECTIONAL, skate) ' )

  

This matches curriculum_vite values that contain words such as “skate,” “skates,” “skated,” and “skating.”

weighted_term

The weighted_term is used for queries that match a list of words and phrases, each optionally given its own weighting. Matching values must match only one element in the list.

The weighted_term syntax is as follows, where n.nnn: represents a decimal constant from zero through one:

A row is returned if there is a match on any one of the ISABOUT elements.

Each component in the vector can be optionally weighted. The assigned weight forces a different measurement of the rank value that is assigned to each row that matches the query.

Here are some examples of weighted terms used in the context of the CONTAINS predicate in a WHERE clause.

WHERE CONTAINS( article, 

                ' ISABOUT(hockey, puck, goalie) ' )

This matches article values that contain any of the words “hockey,” “puck,” or “goalie.” The better matches will contain more than one of the words.

WHERE CONTAINS( article, 

         'ISABOUT("Canadian ice hockey" WEIGHT(1.0),

                  "ice hockey" WEIGHT(.5),

                  hockey WEIGHT(.2) ) 

         ' )

  

This matches article values that may have information about Canadian ice hockey, with higher rank values assigned to articles that have more words from the phrase.