Sample Full-Text Queries

This section provides several examples that answer queries by combining database data and file data.

The following query returns the title and publication year of qualifying books that are represented by files in the virtual directory that has the /Pubs alias. In order to qualify, a book must cost less than $20.00, and text in the Characterization property must indicate that the book is about ice hockey. It is known that the year portion of the Create property is always the publication year of the book. The customer has defined the BookCost property (of type money), which filters out the cost of each book.

SELECT Q.DocTitle, DATEPART(year, Q.Create) 

FROM OpenQuery(FileSystem,

               'SELECT DocTitle, Create

                FROM SCOPE('' "/pubs" '')

                WHERE BookCost <= 20.00

                  AND CONTAINS( Characterization, '' "ice hockey" '' )

               ') AS Q  

  

Notice that the table alias value of Q has been assigned to the table returned by the OPENQUERY() function. This alias is then used to qualify the items in the outer select list. Previous examples specified the SELECT * statement and passed on all values returned by the inner SELECT statement. Here, the SQL Server DATEPART() function is used to pass on only the year portion of the create datetime value.

The following query returns the same information as the previous query. The difference is that the price of the book is obtained from the document_cost column in the BookCost table in the database, rather than from a property in the file system. The primary key of the BookCost table is the combination of the document_author and document_title columns.

SELECT Q.DocTitle, DATEPART(year, Q.Create) 

FROM OpenQuery(FileSystem,

               'SELECT DocTitle, Create, DocAuthor, DocTitle 

                FROM SCOPE('' "/pubs" '')

                  AND CONTAINS( Characterization, '' "ice hockey" '' )

               ') AS Q, 

     BookCost as B

WHERE Q.DocAuthor = B.document_author

  AND Q.DocTitle = B.document_title

  AND B.document_cost <= 20.00 

  

The table returned by the OPENQUERY() function is joined to the real BookCost table in the database, then rows with a suitable cost are filtered for inclusion in the outer SELECT statement.

The next query also joins data from the file system and the database, and this time, data from both appears in the outer SELECT list. Furthermore, the Rank property, which indicates how well the selected rows met the selection criteria, appears in the select list and is used to ensure that higher-ranking rows appear before lower-ranking rows in the outer SELECT statement. In this example, the wording on the plaques in the Hockey Hall of Fame is recorded on files. There is a file for each plaque, and the plaque number can be obtained through the DocSubject property. The HockeyHall table contains PlaqueNo, PlayerName, StartYear, and LastYear columns, with the primary key in the PlaqueNo column. You want to return the PlayerName and PlaqueNo columns from the table and the Rank and DocComments properties from the file. Only players who might have played for Canadian or U.S. teams in the early 1900s are to be returned.

SELECT HH.PlayerName, HH.PlaqueNo, Q.Rank, Q.DocComments

FROM OpenQuery(FileSystem,

               'SELECT DocSubject, DocComments, Rank 

                FROM SCOPE('' "/hall_of_fame" '')

                WHERE CONTAINS( Contents, '' Canada OR "United States" 
                                         '' )

               ') AS Q, 

     HockeyHall as HH

WHERE Q.DocSubject = HH.PlaqueNo

  AND HH.StartYear < 1915
AND HH.EndYear < 1899

ORDER BY Q.Rank DESC 

  

In the next example, an international construction company stores a large number of onsite progress reports in a Microsoft Word document in a central site. All the documents have been registered within the /Site_report virtual directory. Each document can be identified by its unique FileIndex property. These documents are tracked in the database. The following tables in the database are of interest.

Projects:     project_number    char(8)        primary key,

              project_name      nvarchar(40),

              project_leader    smallint,      --employee number

              budgeted          money,

              spent             money,

              ...   

Employees:    employee_number   smallint       primary key,

              employee_name     nvarchar(40),

              nationality       nvarchar(20),

              ...

Reports:      project_number    char(8),

              file_index        decimal(19,0), --link to the file

              ...

              primary key is project_number and file_index

  

A rush order has been issued for 5,000 saunas in a heavily forested area with no electricity. The salesperson vaguely recalls a similar, successful project about 10 years ago. He issues a query that returns the project number, the paths to the onsite reports, and the ranking value of the projects managed by someone from a Scandinavian country that came in under budget between 8 and 12 years ago.

The highest ranking is given if the onsite report contains the phrase “wood burning” in proximity to “sauna.” Points are also given if the document contains the phrase “Northern Ontario” or the word “island.” Here is an example of what the query could look like:

SELECT P.project_number, Q.path, Q.Rank

FROM OpenQuery(FileSystem,

               'SELECT FileIndex, Path, Rank, Write 

                FROM SCOPE('' "/site_reports" '')

                WHERE CONTAINS( Contents, 

                        '' ISABOUT( Sauna ~ "wood burning" WEIGHT (.9),

                                        "Northern Ontario" WEIGHT (.4),

                                         Ontario           WEIGHT (.2),

                                         island            WEIGHT (.2) )

                        '' )

                AND Rank > 5

               ') AS Q, 

     Projects AS P,

     Employees AS E, 

     Reports AS R

WHERE Q.FileIndex = R.file_index

  AND R.project_number = P.project_number

  AND P.project_leader = E.employee_number

  AND E.nationality IN ('FINNISH', 'DANISH', 'SWEDISH', 'NORWEGIAN')

  AND P.spent < P.budgeted

  AND YEAR(Q.Write) > 1986 AND YEAR(Q.Write) < 1992