Chapter 22: Textual Searches on Database Data

Microsoft SQL Server version 7.0 introduces facilities that support textual queries on data residing in SQL Server as well as textual queries on data in the file system.

This chapter describes support for textual queries against data within SQL Server tables. The full-text search concepts are introduced, followed by the form that a full-text search takes and the information that can be retrieved by means of such queries. The internal design and architecture of the full-text search system is presented, and then how this system can be administered by means of stored procedures or graphical user interfaces (GUIs) through SQL Server Enterprise Manager is described.

This chapter provides an overview of the capability of SQL Server 7.0 full-text support and describes how the various subcomponents interact to provide this support.

A large portion of digitally stored information is in the form of unstructured data, primarily text. While the bulk of this data is stored in the file system, some organizations have begun to manage it by storing it in relational databases in character-based columns such as varchar and text. This means that relational database users need a mechanism to effectively retrieve textual data from the database. Traditional relational database management systems (RDBMSs), such as Microsoft SQL Server version 6.5, were not designed for efficient full-text retrieval. For example, while SQL Server 6.5 has some capabilities for retrieving text based on pattern matching, it cannot search for words and phrases in proximity to one another.

There are two major types of textual searches:

The lack of integrated textual-search facilities in relational databases has forced customers to use third-party products. These solutions usually involve pulling data out of the database through bridges or gateways and storing the data as files in the file system so that full-text indexing can be applied. This is not a seamless way for a user to combine a full-text search with a regular structured relational query.

Some relational database products offer customers relational and full-text search conditions seamlessly integrated into the same query. A demonstration of how such a query might be specified follows.

Suppose that the contents of a set of plain-text documents reside in the DocText column of the doc_collection table, and that the table also contains the StorName, Size, and DocAuthor columns. It should be possible to issue the following query:

SELECT Q.StorName, Q.Size, Q.DocAuthor, W.Citizenship 

FROM doc_collection as Q,

     writers as W

WHERE CONTAINS(DocText, ' "SQL Server" NEAR text') 

  AND Q.DocAuthor = W.writer_name

  

This query is issued to obtain the names, authors, and sizes of all documents, where the document contains the phrase “SQL Server” in proximity to the word “text”. This information is joined with the writers table to obtain the author’s citizenship.

Microsoft has been engaged in a project that makes it possible to issue such queries. The objective is to introduce the Microsoft text search technology as part of SQL Server 7.0 so that users can issue entry-level full-text searches against plain-text data in relational database tables. The syntax is a natural extension to the structured query language (SQL).

Existing technologies have been leveraged to provide full-text searches against SQL Server data. The Microsoft information retrieval technologies have been around for some time and are included with Microsoft Indexing Services version 2.0 and Microsoft Site Server version 3.0. Various components have been combined within these technologies, and these have been integrated into SQL Server 7.0 to provide relational database customers with full-text retrieval support.

Two additional technologies outside of SQL Server also have been integrated:

This chapter has three purposes: