Full-text Index and Querying Concepts

The principal design requirement for full-text indexing, querying, and synchronization is the presence of a full-text unique key column (or single-column primary key) on all tables that are registered for full-text search. A full-text index keeps track of which significant words are used and where they are located.

For example, consider a full-text index for a DevTools table. A full-text index may indicate that the word Microsoft is found at word number 423 and word number 982 in the Abstract column for the row associated with a ProductID of 6. This index structure supports an efficient search for all items containing indexed words and advanced search operations, such as phrase searches and proximity searches.

To prevent the full-text index from becoming bloated with words that do not help the search, extraneous (noise) words such as a, and, is, or the are ignored. For example, specifying the phrase “the products ordered during these summer months” is the same as specifying the phrase “products ordered during summer months.” Rows with either string are returned.

Noise-word lists for many languages are provided in the directory \Mssql7\Ftdata\Sqlserver\Config. This directory is created, and the noise-word files are installed when you set up Microsoft® SQL Server™ with the full-text search support. The noise-word files can be edited. For example, system administrators at high-tech companies might add the word computer to their noise-word list. (If you edit a noise-word file, you must repopulate the full-text catalogs before the changes will take effect.) The table shows the noise-word files and their respective languages.

Noise-word file	Language
Noise.chs	Simplified Chinese
Noise.cht	Traditional Chinese
Noise.dat	Language Neutral
Noise.deu	German
Noise.eng	English UK
Noise.enu	English US
Noise.esn	Spanish
Noise.fra	French
Noise.ita	Italian
Noise.jpn	Japanese
Noise.kor	Korean
Noise.nld	Dutch
Noise.sve	Swedish

When processing a full-text query, the search engine returns the key values of the rows that match the search criteria to Microsoft SQL Server. Consider a SciFi table in which the Book_No column is the primary key column:

`Book_No`	`Writer`	`Title`
`A025`	`Asimov`	`Foundation’s Edge`
`A027`	`Asimov`	`Foundation and Empire`
`C011`	`Clarke`	`Childhood’s End`
`V109`	`Verne`	`Mysterious Island`

Suppose you want to use a full-text retrieval query to find the book titles that include the word Foundation. In this case, the values of A025 and A027 are obtained from the full-text index. SQL Server then uses these keys and other field information to respond to the query.

This table shows the language in which the full-text index data is stored. The language is based on the Unicode collation locale identifier selected during SQL Server Setup.

Unicode collation locale identifier	Language for full-text data storage
Chinese Bopomofo (Taiwan)	Traditional Chinese
Chinese Punctuation	Simplified Chinese
Chinese Stroke Count	Simplified Chinese
Chinese Stroke Count (Taiwan)	Traditional Chinese
Dutch	Dutch
English UK	English UK
French	French
General Unicode	English US
German	German
German Phonebook	German
Italian	Italian
Japanese	Japanese
Japanese Unicode	Japanese
Korean	Korean
Korean Unicode	Korean
Spanish Modern	Spanish
Swedish/Finnish	Swedish

All other Unicode collation locale identifier values that are not in this list get mapped to the neutral language word breaker and stemmer, which uses white spaces to delimit words.

Note The Unicode collation locale identifier setting is used against all data types eligible for full-text indexing (such as char, nchar, and so on). If you have the sort order of a char, varchar, or text type column set to a language setting different from the Unicode collation locale identifier language, the Unicode collation locale identifier is still used during full-text indexing and querying of the char, varchar, and text type columns.