Using File Properties for File Content Searches

Microsoft® Indexing Service filters and maintains in excess of 50 file properties. All of these can be specified in text file search queries. From the perspective of writing a SELECT statement, there are 3 types of properties:

Property name Data type Description Use in ORDER BY Use in select list
Access datetime Most recent date and time that the file was accessed. Yes Yes
Characterization nvarchar or ntext Abstract of the contents of the file. Note that, in this version of Indexing Service this is usually the first paragraph or first section of a document. In a future version, it will be an actual summary.   Yes
Contents nvarchar or ntext Main contents of the file.    
Create datetime Date and time that the file was created. Yes Yes
Directory nvarchar Physical path to the file, not including the file name. Yes Yes
DocAuthor nvarchar Document author. Yes Yes
DocComments nvarchar Comments about the document. Yes Yes
DocLastAuthor nvarchar Most recent user that edited the document. Yes Yes
DocLastPrinted datetime Date and time that the document was last printed. Yes  
DocPageCount integer Number of pages in the document. Yes  
DocPartTitles array of varchar Names of document parts:
Slide titles in Microsoft PowerPoint®
Spreadsheets in Microsoft Excel
Documents in Microsoft Word
   
DocSubject nvarchar Subject of the document. Yes Yes
DocTitle nvarchar Title of the document. Yes Yes
DocWordCount integer Number of words in the document. Yes -
FileIndex decimal(19,0) Unique identifier of the file. Yes Yes
FileName nvarchar Name of the file. Yes Yes
HitCount integer Number of words matching query. Yes Yes
Path nvarchar Full physical path to the file, including file name. Yes Yes
Rank integer Value from 0 to 1000 indicating how closely this row matches the selection criteria. Yes Yes
Size decimal(19,0) Size of file, in bytes. Yes Yes

Customers and third-party vendors can write filters to add to this set of properties.  They can also add properties, for example, by adding their own tags to an HTML document. In addition, to permit the query and retrieval of such user-defined properties, the Transact-SQL extensions to Indexing Service include support for a SET statement that allows the specification of new property names and their associated types.

Example Queries Using Virtual Tables

You can specify a query with the equivalent of a table in the file system resulting in the select list and FROM clause. For other parts of a SELECT statement, the properties can be used in place of columns in the WHERE clause and the ORDER BY clause. However, the GROUP BY and HAVING clause are not supported. The examples that follow show the use of all supported clauses. 

This query selects the full physical path and the file creation timestamp of all files in the SQL-standards virtual directory and all its subdirectories in which the document contains the phrase “overloaded function”.

SELECT *

FROM OPENQUERY(FileSystem,

               'SELECT Path, Create

                FROM SCOPE('' "/SQL-standards" '')

                WHERE CONTAINS(Contents, '' "overloaded function" '') '

              ) 

This query is similar to the previous, except that only files directly in the SQL-standards virtual directory are considered. 

SELECT *

FROM OPENQUERY(FileSystem,

               'SELECT Path, Create

                FROM SCOPE('' SHALLOW TRAVERSAL OF "/SQL-standards" '')

                WHERE CONTAINS(Contents, '' "overloaded function" '')   ' ) 

In this next query, only files directly in the SQL3 virtual subdirectory are considered.

SELECT *

FROM OPENQUERY(FileSystem,

               'SELECT Path, Create

                FROM SCOPE('' "/SQL-standards/SQL3" '')

                WHERE CONTAINS(Contents, '' "overloaded function" '')  ' ) 

This query selects author, title, subject, and file name of documents, in all files that are either in the /corpus virtual directory and its subdirectories or in the \temp directory on the C drive, where the document is at least 5,000 words and the author is either Wendy Vasse or Anas Abbar, and rows representing documents with the most pages are ordered highest:

SELECT *

FROM OPENQUERY(FileSystem,

               'SELECT DocAuthor, DocTitle, DocSubject, FileName

                FROM SCOPE('' "/corpus" '',

                           '' "C:\temp" '' )

                WHERE DocWordCount >= 5000 AND

                      ( DocAuthor = ''Wendy Vasse'' OR

                        DocAuthor = ''Anas Abar'' )

                ORDER BY DocPageCount DESC                    ' ) 

  


(c) 1988-98 Microsoft Corporation. All Rights Reserved.