The principal design requirement for full-text indexing, querying, and synchronization is the presence of a full-text unique key column (or single-column primary key) on all tables that are registered for full-text search. A full-text index keeps track of which significant words are used and where they are located.
For example, consider a full-text index for a DevTools table. A full-text index may indicate that the word Microsoft is found at word number 423 and word number 982 in the Abstract column for the row associated with a ProductID of 6. This index structure supports an efficient search for all items containing indexed words and advanced search operations, such as phrase searches and proximity searches.
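The index structure described above can be pictured as a map from each word to the rows and word positions where it occurs. The following Python sketch is only an illustration of that idea, not the storage format the search engine actually uses; the sample rows, the `tokenize` helper, and the function names are assumptions made for this example.

```python
from collections import defaultdict

def tokenize(text):
    """Split on white space and lowercase; a crude stand-in for a real word breaker."""
    return [word.strip(".,;:!?\"'()").lower() for word in text.split()]

def build_index(rows, column):
    """Map each word to {key: [1-based word positions]} for one column."""
    index = defaultdict(lambda: defaultdict(list))
    for key, row in rows.items():
        for position, word in enumerate(tokenize(row[column]), start=1):
            if word:
                index[word][key].append(position)
    return index

def find_phrase(index, phrase):
    """Return keys of rows where the words of `phrase` occupy consecutive positions."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return set()
    matches = set()
    for key, starts in index[words[0]].items():
        for start in starts:
            if all(start + offset in index[w].get(key, ())
                   for offset, w in enumerate(words[1:], start=1)):
                matches.add(key)
                break
    return matches

# Hypothetical DevTools rows keyed by ProductID (invented sample data).
dev_tools = {
    6: {"Abstract": "Microsoft Visual Studio includes tools from Microsoft partners"},
    7: {"Abstract": "A profiler for Visual Basic projects"},
}

index = build_index(dev_tools, "Abstract")
print(dict(index["microsoft"]))             # {6: [1, 7]}
print(find_phrase(index, "visual studio"))  # {6}
```

Because positions are stored alongside the key, a phrase search reduces to checking for consecutive positions, and a proximity search could compare position differences in the same way.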
To prevent the full-text index from becoming bloated with words that do not help the search, extraneous (noise) words such as "a", "and", "is", and "the" are ignored. For example, specifying the phrase “the products ordered during these summer months” is the same as specifying the phrase “products ordered during summer months.” Rows containing either string are returned.
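As a rough illustration of why the two phrases are treated as the same, the Python sketch below drops noise words before comparing them. The `NOISE_WORDS` set is an assumption chosen to cover this one example; it is not the content of any shipped noise-word file.

```python
# Illustrative noise-word set; the real lists live in the noise-word files.
NOISE_WORDS = {"a", "and", "is", "the", "these"}

def significant_words(phrase, noise_words=NOISE_WORDS):
    """Lowercase the phrase, split on white space, and drop noise words."""
    return [word for word in phrase.lower().split() if word not in noise_words]

with_noise = significant_words("the products ordered during these summer months")
without_noise = significant_words("products ordered during summer months")

print(with_noise)                   # ['products', 'ordered', 'during', 'summer', 'months']
print(with_noise == without_noise)  # True
```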
Noise-word lists for many languages are provided in the directory \Mssql7\Ftdata\Sqlserver\Config. This directory is created and the noise-word files are installed when you set up Microsoft® SQL Server™ with full-text search support. The noise-word files can be edited. For example, system administrators at high-tech companies might add the word "computer" to their noise-word list. (If you edit a noise-word file, you must repopulate the full-text catalogs before the changes take effect.) The following table shows the noise-word files and their respective languages.
Noise-word file | Language |
---|---|
Noise.chs | Simplified Chinese |
Noise.cht | Traditional Chinese |
Noise.dat | Language Neutral |
Noise.deu | German |
Noise.eng | English UK |
Noise.enu | English US |
Noise.esn | Spanish |
Noise.fra | French |
Noise.ita | Italian |
Noise.jpn | Japanese |
Noise.kor | Korean |
Noise.nld | Dutch |
Noise.sve | Swedish |
When processing a full-text query, the search engine returns the key values of the rows that match the search criteria to Microsoft SQL Server. Consider a SciFi table in which the Book_No column is the primary key column:
Book_No | Writer | Title |
---|---|---|
A025 | Asimov | Foundation’s Edge |
A027 | Asimov | Foundation and Empire |
C011 | Clarke | Childhood’s End |
V109 | Verne | Mysterious Island |
Suppose you want to use a full-text retrieval query to find the book titles that include the word Foundation. In this case, the values of A025 and A027 are obtained from the full-text index. SQL Server then uses these keys and other field information to respond to the query.
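The two-step flow in this example (the index yields key values, then the keys are used to fetch the rows) can be sketched in Python. This is a simplified model under stated assumptions: the in-memory `sci_fi` dictionary stands in for the SciFi table, and `build_title_index` and `full_text_keys` are hypothetical helpers, not SQL Server functions.

```python
import re
from collections import defaultdict

# In-memory stand-in for the SciFi table, keyed by Book_No.
sci_fi = {
    "A025": {"Writer": "Asimov", "Title": "Foundation's Edge"},
    "A027": {"Writer": "Asimov", "Title": "Foundation and Empire"},
    "C011": {"Writer": "Clarke", "Title": "Childhood's End"},
    "V109": {"Writer": "Verne", "Title": "Mysterious Island"},
}

def build_title_index(rows):
    """Map each word in the Title column to the set of key values containing it."""
    index = defaultdict(set)
    for key, row in rows.items():
        # Simplistic word breaking: runs of letters and digits only.
        for word in re.findall(r"[a-z0-9]+", row["Title"].lower()):
            index[word].add(key)
    return index

def full_text_keys(index, word):
    """Step 1: the index returns only key values, not whole rows."""
    return sorted(index.get(word.lower(), ()))

title_index = build_title_index(sci_fi)
keys = full_text_keys(title_index, "Foundation")

# Step 2: the keys are used to look up the remaining column values.
titles = [sci_fi[key]["Title"] for key in keys]

print(keys)    # ['A025', 'A027']
print(titles)
```

Keeping only key values in the index is what makes the unique key column a design requirement: without it, the matches could not be tied back to table rows.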
The following table shows the language in which the full-text index data is stored. The language is determined by the Unicode collation locale identifier selected during SQL Server Setup.
Unicode collation locale identifier | Language for full-text data storage |
---|---|
Chinese Bopomofo (Taiwan) | Traditional Chinese |
Chinese Punctuation | Simplified Chinese |
Chinese Stroke Count | Simplified Chinese |
Chinese Stroke Count (Taiwan) | Traditional Chinese |
Dutch | Dutch |
English UK | English UK |
French | French |
General Unicode | English US |
German | German |
German Phonebook | German |
Italian | Italian |
Japanese | Japanese |
Japanese Unicode | Japanese |
Korean | Korean |
Korean Unicode | Korean |
Spanish Modern | Spanish |
Swedish/Finnish | Swedish |
Unicode collation locale identifier values that are not in this list are mapped to the neutral-language word breaker and stemmer, which uses white space to delimit words.
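The mapping in the table, together with the fallback for unlisted values, amounts to a lookup with a default. The Python sketch below models it that way. The function names are assumptions, "Czech" is used only as an example of a value that does not appear in the table, and the white-space split is a simplification of what a neutral word breaker does.

```python
# Rows of the table above: Unicode collation locale identifier -> storage language.
FULL_TEXT_LANGUAGE = {
    "Chinese Bopomofo (Taiwan)": "Traditional Chinese",
    "Chinese Punctuation": "Simplified Chinese",
    "Chinese Stroke Count": "Simplified Chinese",
    "Chinese Stroke Count (Taiwan)": "Traditional Chinese",
    "Dutch": "Dutch",
    "English UK": "English UK",
    "French": "French",
    "General Unicode": "English US",
    "German": "German",
    "German Phonebook": "German",
    "Italian": "Italian",
    "Japanese": "Japanese",
    "Japanese Unicode": "Japanese",
    "Korean": "Korean",
    "Korean Unicode": "Korean",
    "Spanish Modern": "Spanish",
    "Swedish/Finnish": "Swedish",
}

NEUTRAL = "Language Neutral"

def full_text_language(locale_identifier):
    """Return the storage language, falling back to the neutral language."""
    return FULL_TEXT_LANGUAGE.get(locale_identifier, NEUTRAL)

def neutral_word_break(text):
    """Simplified neutral word breaking: delimit words by white space only."""
    return text.split()

print(full_text_language("German Phonebook"))  # German
print(full_text_language("Czech"))             # Language Neutral
print(neutral_word_break("full-text  search engine"))
```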
Note: The Unicode collation locale identifier setting applies to all data types eligible for full-text indexing (such as char, nchar, and so on). If the sort order of a char, varchar, or text column is set to a language different from the Unicode collation locale identifier language, the Unicode collation locale identifier is still used during full-text indexing and querying of that column.