Microsoft® Indexing Service filters and maintains in excess of 50 file properties. All of these can be specified in text file search queries. From the perspective of writing a SELECT statement, there are 3 types of properties:
Property name | Data type | Description | Use in ORDER BY | Use in select list |
---|---|---|---|---|
Access | datetime | Most recent date and time that the file was accessed. | Yes | Yes |
Characterization | nvarchar or ntext | Abstract of the contents of the file. Note that, in this version of Indexing Service this is usually the first paragraph or first section of a document. In a future version, it will be an actual summary. | Yes | |
Contents | nvarchar or ntext | Main contents of the file. | ||
Create | datetime | Date and time that the file was created. | Yes | Yes |
Directory | nvarchar | Physical path to the file, not including the file name. | Yes | Yes |
DocAuthor | nvarchar | Document author. | Yes | Yes |
DocComments | nvarchar | Comments about the document. | Yes | Yes |
DocLastAuthor | nvarchar | Most recent user that edited the document. | Yes | Yes |
DocLastPrinted | datetime | Date and time that the document was last printed. | Yes | |
DocPageCount | integer | Number of pages in the document. | Yes | |
DocPartTitles | array of varchar | Names of document parts: Slide titles in Microsoft PowerPoint® Spreadsheets in Microsoft Excel Documents in Microsoft Word |
||
DocSubject | nvarchar | Subject of the document. | Yes | Yes |
DocTitle | nvarchar | Title of the document. | Yes | Yes |
DocWordCount | integer | Number of words in the document. | Yes | - |
FileIndex | decimal(19,0) | Unique identifier of the file. | Yes | Yes |
FileName | nvarchar | Name of the file. | Yes | Yes |
HitCount | integer | Number of words matching query. | Yes | Yes |
Path | nvarchar | Full physical path to the file, including file name. | Yes | Yes |
Rank | integer | Value from 0 to 1000 indicating how closely this row matches the selection criteria. | Yes | Yes |
Size | decimal(19,0) | Size of file, in bytes. | Yes | Yes |
Customers and third-party vendors can write filters to add to this set of properties. They can also add properties, for example, by adding their own tags to an HTML document. In addition, to permit the query and retrieval of such user-defined properties, the Transact-SQL extensions to Indexing Service include support for a SET statement that allows the specification of new property names and their associated types.
You can specify a query with the equivalent of a table in the file system resulting in the select list and FROM clause. For other parts of a SELECT statement, the properties can be used in place of columns in the WHERE clause and the ORDER BY clause. However, the GROUP BY and HAVING clause are not supported. The examples that follow show the use of all supported clauses.
This query selects the full physical path and the file creation timestamp of all files in the SQL-standards virtual directory and all its subdirectories in which the document contains the phrase “overloaded function”.
SELECT *
FROM OPENQUERY(FileSystem,
'SELECT Path, Create
FROM SCOPE('' "/SQL-standards" '')
WHERE CONTAINS(Contents, '' "overloaded function" '') '
)
This query is similar to the previous, except that only files directly in the SQL-standards virtual directory are considered.
SELECT *
FROM OPENQUERY(FileSystem,
'SELECT Path, Create
FROM SCOPE('' SHALLOW TRAVERSAL OF "/SQL-standards" '')
WHERE CONTAINS(Contents, '' "overloaded function" '') ' )
In this next query, only files directly in the SQL3 virtual subdirectory are considered.
SELECT *
FROM OPENQUERY(FileSystem,
'SELECT Path, Create
FROM SCOPE('' "/SQL-standards/SQL3" '')
WHERE CONTAINS(Contents, '' "overloaded function" '') ' )
This query selects author, title, subject, and file name of documents, in all files that are either in the /corpus virtual directory and its subdirectories or in the \temp directory on the C drive, where the document is at least 5,000 words and the author is either Wendy Vasse or Anas Abbar, and rows representing documents with the most pages are ordered highest:
SELECT *
FROM OPENQUERY(FileSystem,
'SELECT DocAuthor, DocTitle, DocSubject, FileName
FROM SCOPE('' "/corpus" '',
'' "C:\temp" '' )
WHERE DocWordCount >= 5000 AND
( DocAuthor = ''Wendy Vasse'' OR
DocAuthor = ''Anas Abar'' )
ORDER BY DocPageCount DESC ' )