Full-text Index and Querying Concepts

The principal design requirement for full-text indexing, querying, and synchronization is the presence of a full-text unique key column (or single-column primary key) on all tables that are registered for full-text search. A full-text index keeps track of which significant words are used and where they are located.

For example, consider a full-text index for a DevTools table. The index may indicate that the word Microsoft is found at word number 423 and word number 982 in the Abstract column for the row associated with a ProductID of 6. This index structure supports efficient searches for all items containing indexed words, as well as advanced search operations such as phrase searches and proximity searches.
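
The idea of recording which words occur at which word positions can be illustrated with a small positional index. The following Python sketch is only an illustration of the concept, not SQL Server's actual storage format or API; the names tokenize, build_index, and phrase_match and the sample rows are hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    """Split on white space and lowercase; a stand-in for a real word breaker."""
    return [w.strip('.,;:!?"').lower() for w in text.split()]

def build_index(rows):
    """Map word -> {key: [word positions]} for a dict of {key: text}."""
    index = defaultdict(dict)
    for key, text in rows.items():
        for pos, word in enumerate(tokenize(text), start=1):
            index[word].setdefault(key, []).append(pos)
    return index

def phrase_match(index, phrase):
    """Return the keys whose text contains the phrase words at consecutive positions."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return set()
    hits = set()
    for key, starts in index[words[0]].items():
        for start in starts:
            if all(start + i in index[w].get(key, []) for i, w in enumerate(words)):
                hits.add(key)
                break
    return hits

# Hypothetical sample data: key value -> indexed column text.
rows = {
    6: "Microsoft tools for developers",
    7: "Tools for Microsoft developers",
}
index = build_index(rows)
```

Because positions are stored along with the key values, a phrase search only has to check that the positions are consecutive; a proximity search could compare the same positions against a maximum distance instead.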

To prevent the full-text index from becoming bloated with words that do not help the search, extraneous (noise) words such as a, and, is, or the are ignored. For example, specifying the phrase “the products ordered during these summer months” is the same as specifying the phrase “products ordered during summer months.” Rows with either string are returned.
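
The effect of ignoring noise words can be sketched in a few lines of Python. This is an illustration only: the NOISE_WORDS set below is a small hypothetical subset (it assumes that these is also on the list, as the example above implies), and significant_words is not a SQL Server function.

```python
# Illustrative subset of a noise-word list (assumption, not the shipped file).
NOISE_WORDS = {"a", "and", "is", "or", "the", "these"}

def significant_words(phrase):
    """Lowercase the phrase, split on white space, and drop noise words."""
    return [w for w in phrase.lower().split() if w not in NOISE_WORDS]

with_noise = significant_words("the products ordered during these summer months")
without_noise = significant_words("products ordered during summer months")
```

After filtering, both phrases reduce to the same list of significant words, which is why either string matches the same rows.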

Noise-word lists for many languages are provided in the directory \Mssql7\Ftdata\Sqlserver\Config. This directory is created and the noise-word files are installed when you set up Microsoft® SQL Server™ with full-text search support. The noise-word files can be edited. For example, system administrators at high-tech companies might add the word computer to their noise-word list. (If you edit a noise-word file, you must repopulate the full-text catalogs before the changes take effect.) The following table shows the noise-word files and their respective languages.

Noise-word file    Language
Noise.chs          Simplified Chinese
Noise.cht          Traditional Chinese
Noise.dat          Language Neutral
Noise.deu          German
Noise.eng          English UK
Noise.enu          English US
Noise.esn          Spanish
Noise.fra          French
Noise.ita          Italian
Noise.jpn          Japanese
Noise.kor          Korean
Noise.nld          Dutch
Noise.sve          Swedish

When processing a full-text query, the search engine returns the key values of the rows that match the search criteria to Microsoft SQL Server. Consider a SciFi table in which the Book_No column is the primary key column:

Book_No    Writer    Title
A025       Asimov    Foundation’s Edge
A027       Asimov    Foundation and Empire
C011       Clarke    Childhood’s End
V109       Verne     Mysterious Island

Suppose you want to use a full-text retrieval query to find the book titles that include the word Foundation. In this case, the values of A025 and A027 are obtained from the full-text index. SQL Server then uses these keys and other field information to respond to the query.
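
The two-step flow described here (the search engine returns key values, then the keys are used to retrieve the rows) can be sketched as follows. This Python example is a simplified illustration using the SciFi rows above; the function names build_title_index, search_keys, and fetch_rows are hypothetical and do not correspond to any SQL Server interface.

```python
import re

# Rows of the SciFi table, keyed by Book_No (the primary key column).
scifi = {
    "A025": {"Writer": "Asimov", "Title": "Foundation’s Edge"},
    "A027": {"Writer": "Asimov", "Title": "Foundation and Empire"},
    "C011": {"Writer": "Clarke", "Title": "Childhood’s End"},
    "V109": {"Writer": "Verne", "Title": "Mysterious Island"},
}

def build_title_index(table):
    """Map each lowercase word in Title to the set of Book_No keys containing it."""
    index = {}
    for key, row in table.items():
        for word in re.findall(r"[a-z]+", row["Title"].lower()):
            index.setdefault(word, set()).add(key)
    return index

def search_keys(index, word):
    """Step 1: the full-text lookup returns only key values."""
    return sorted(index.get(word.lower(), set()))

def fetch_rows(table, keys):
    """Step 2: the keys are used to retrieve the remaining column values."""
    return [(k, table[k]["Writer"], table[k]["Title"]) for k in keys]

title_index = build_title_index(scifi)
keys = search_keys(title_index, "Foundation")
results = fetch_rows(scifi, keys)
```

The index itself never stores the Writer or Title values; it only needs the key, which is why a unique key column is required on every table registered for full-text search.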

The following table shows the language in which the full-text index data is stored. The language is determined by the Unicode collation locale identifier selected during SQL Server Setup.

Unicode collation locale identifier    Language for full-text data storage
Chinese Bopomofo (Taiwan)              Traditional Chinese
Chinese Punctuation                    Simplified Chinese
Chinese Stroke Count                   Simplified Chinese
Chinese Stroke Count (Taiwan)          Traditional Chinese
Dutch                                  Dutch
English UK                             English UK
French                                 French
General Unicode                        English US
German                                 German
German Phonebook                       German
Italian                                Italian
Japanese                               Japanese
Japanese Unicode                       Japanese
Korean                                 Korean
Korean Unicode                         Korean
Spanish Modern                         Spanish
Swedish/Finnish                        Swedish

All Unicode collation locale identifier values that are not in this list are mapped to the language-neutral word breaker and stemmer, which uses white space to delimit words.
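
The mapping with its neutral fallback behaves like a lookup with a default value. The following Python sketch illustrates this with a subset of the rows from the table above; the names FULLTEXT_LANGUAGE, fulltext_language, and neutral_word_break are hypothetical, and "Czech" is used only as an example of an identifier that is not in the list.

```python
# Subset of the locale-to-language table above (illustration only).
FULLTEXT_LANGUAGE = {
    "Chinese Bopomofo (Taiwan)": "Traditional Chinese",
    "Chinese Punctuation": "Simplified Chinese",
    "General Unicode": "English US",
    "German": "German",
    "German Phonebook": "German",
    "Swedish/Finnish": "Swedish",
}

def fulltext_language(locale_identifier):
    """Return the full-text language, falling back to the neutral language."""
    return FULLTEXT_LANGUAGE.get(locale_identifier, "Neutral")

def neutral_word_break(text):
    """Neutral word breaking as described above: delimit words by white space only."""
    return text.split()

listed = fulltext_language("German Phonebook")
unlisted = fulltext_language("Czech")  # assumed example of a value not in the list
words = neutral_word_break("full-text  search\tindex")
```

Note that white-space word breaking leaves hyphenated terms and punctuation attached to the words, which is one reason a language-specific word breaker is preferred when one is available.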


Note The Unicode collation locale identifier setting applies to all data types eligible for full-text indexing (such as char, nchar, and so on). Even if the sort order of a char, varchar, or text column is set to a language different from the Unicode collation locale identifier language, the Unicode collation locale identifier is still used during full-text indexing and querying of those columns.

(c) 1988-98 Microsoft Corporation. All Rights Reserved.