Mike Cheng
Microsoft Corporation
May 1998
Click to copy the updated wizard discussed in the Appendix of this article.
Site Server Search database indexing enables full text indexing of Microsoft® SQL Server™ and Access databases using ODBC and Active Server Pages (ASP). Microsoft Site Server 3.0 provides the easy-to-use Create New Catalog Definition Wizard for creating search indexes. This document explains how database indexing works, provides troubleshooting tips, and demonstrates how to accomplish advanced tasks.
At the heart of the Site Server search functionality is the search catalog—essentially an index and property store that can be searched. Creating the database catalog is a two-step process. First, the Create New Catalog Definition Wizard collects the following information about the database catalog from the user:
It then creates the following set of .asp and .htm files and scripts, which are used for either the indexing or searching process:
After these files are generated, Site Server Search creates the catalog using dir.asp and row.asp. Dir.asp generates links to records in the database that are fetched by row.asp; both use ActiveX® Data Objects (ADO) to access the ODBC data source. This document focuses on dir.asp, row.asp, and view.asp. For more information on search.htm and results.asp, please refer to the Site Server documentation.
Let's walk through creating a database catalog using the Products table of the AdvWorks database, a sample database included with Site Server. The schema for the Products table is shown in Table 1.
Table 1. The Products table of the AdvWorks sample database
Column name | Data type |
ProductID | COUNTER, Nullable |
ProductCode | TEXT(10), Nullable |
ProductType | TEXT(20), Nullable |
ProductIntroductionDate | DATETIME, Nullable |
ProductName | TEXT(50), Nullable |
ProductDescription | TEXT(255), Nullable |
ProductSize | TEXT(5), Nullable |
ProductImageURL | TEXT(255), Nullable |
UnitPrice | DOUBLE, Nullable |
OnSale | BIT, Required |
To create a database catalog project for the AdvWorks database:
The Create New Catalog Definition Wizard appears.
The columns (except ProductDescription and ProductName) from the Products table are listed.
The scripts generated by the wizard for dir.asp, row.asp, and view.asp follow.
<Install Directory>\SiteServer\Knowledge\Search\database\Index\AdvWorks\dir.asp
1: <HTML><HEAD>
2: <TITLE></TITLE>
3: <META NAME="ROBOTS" CONTENT="NOINDEX">
4: </HEAD><BODY>
5: <% Set r = CreateObject("ADODB.Recordset.1.5")
6: r.open "SELECT ProductID FROM Products", "UID=;PWD=;DSN=AdvWorks"
7: sf=0
8: sf=Request("sf")
9: i=0
10: if not sf = 0 then r.Move sf
11: while i < 1000 and not r.Eof %>
12: <a href="row.asp?id=<% =r ("ProductID") %>">Link </a>
13: <% i=i+1
14: r.movenext
15: Wend
16: sf = 1000+sf
17: if not r.eof then
18: Response.Write "<a href=dir.asp?sf=" & sf & "> NextPage </A>"
19: End If %>
20: </BODY></HTML>
<Install Directory>\SiteServer\Knowledge\Search\database\Index\AdvWorks\row.asp
1: <% Set r = createobject ("ADODB.Recordset.1.5") %>
2: <% r.Open "SELECT ProductID, ProductType, ProductName, ProductDescription FROM Products WHERE ProductID = " & Request.QueryString("id"), "UID=;PWD=;DSN=AdvWorks" %>
3: <% Dim x() %>
4: <% ReDim x(4) %>
5: <% x(0) = r ("ProductID") %>
6: <% x(1) = r ("ProductType") %>
7: <% x(2) = r ("ProductName") %>
8: <% x(3) = r ("ProductDescription") %>
9: <HTML><HEAD><TITLE>
10: <% y=VarType(x(2)) %>
11: <% if y <> 0 And y <> 1 Then %>
12: <% =server.htmlencode (x(2)) %>
13: <% End if %>
14: </TITLE>
15: <% y=VarType(x(0)) %>
16: <% if y <> 0 And y <> 1 Then %>
17: <META NAME="ProductID" content="<% =server.htmlencode (x(0)) %>">
18: <% End if %>
19: <% y=VarType(x(1)) %>
20: <% if y <> 0 And y <> 1 Then %>
21: <META NAME="ProductType" content="<% =server.htmlencode (x(1)) %>">
22: <% End if %>
23: </HEAD><BODY>
24: <% y=VarType(x(3)) %>
25: <% if y <> 0 And y <> 1 Then %>
26: <% =server.htmlencode (x(3)) %>
27: <% End if %>
28: </BODY></HTML>
<Install Directory>\SiteServer\Knowledge\Search\database\Search\view.asp
1: <% Response.expires=0 %>
2: <!--Copyright 1997, 1998 Microsoft Corporation. All rights reserved.-->
3: <% Set r = createobject ("ADODB.Recordset.1.5") %>
4: <% r.Open "SELECT ProductID, ProductCode, ProductType, ProductIntroductionDate, ProductName, ProductDescription, ProductSize, ProductImageURL, UnitPrice, OnSale FROM Products WHERE ProductID = " & Request.QueryString("id"), "UID=;PWD=;DSN=AdvWorks" %>
5: <% Dim x() %>
6: <% ReDim x(10) %>
7: <% x(0) = r ("ProductID") %>
8: <% x(1) = r ("ProductCode") %>
9: <% x(2) = r ("ProductType") %>
10: <% x(3) = r ("ProductIntroductionDate") %>
11: <% x(4) = r ("ProductName") %>
12: <% x(5) = r ("ProductDescription") %>
13: <% x(6) = r ("ProductSize") %>
14: <% x(7) = r ("ProductImageURL") %>
15: <% x(8) = r ("UnitPrice") %>
16: <% x(9) = r ("OnSale") %>
17: <HTML>
18: <HEAD>
19: <TITLE>
20: <% y=VarType(x(4)) %>
21: <% if y <> 0 And y <> 1 Then %>
22: <% =server.htmlencode (x(4)) %>
23: <% End if %>
24: </TITLE>
25: </HEAD>
26: <body text="#000000" link="#000000" alink="#000000" vlink="#000000" topmargin=17 leftmargin=17 background="images/RightBkgd.gif">
27: <font face="Arial,Helvetica">
28: <b>
29: <center>
30: Database Record:
31: </center>
32: </b>
33: <b>
34: <center>
35: <% y=VarType(x(4)) %>
36: <% if y <> 0 And y <> 1 Then %>
37: <% =server.htmlencode (x(4)) %>
38: <% End if %>
39: </center>
40: </b>
41: <p>
42: <table border=1 cellpadding=5 align=center>
43: <tr>
44: <td><font size=2>
45: ProductName
46: </font></td>
47: <td><font size=2>
48: <% y=VarType(x(4)) %>
49: <% if y <> 0 And y <> 1 Then %>
50: <% =server.htmlencode (x(4)) %>
51: <% Else %>
52: <% =server.htmlencode ("<n/a>")%>
53: <% End if %>
54: </font></td>
55: </tr>
56: <tr>
57: <td><font size=2>
58: ProductID
59: </font></td>
60: <td><font size=2>
61: <% y=VarType(x(0)) %>
62: <% if y <> 0 And y <> 1 Then %>
63: <% =server.htmlencode (x(0)) %>
64: <% Else %>
65: <% =server.htmlencode ("<n/a>")%>
66: <% End if %>
67: </font></td>
68: </tr>
69: <tr>
70: <td><font size=2>
71: ProductType
72: </font></td>
73: <td><font size=2>
74: <% y=VarType(x(2)) %>
75: <% if y <> 0 And y <> 1 Then %>
76: <% =server.htmlencode (x(2)) %>
77: <% Else %>
78: <% =server.htmlencode ("<n/a>")%>
79: <% End if %>
80: </font></td>
81: </tr>
82: <tr>
83: <td><font size=2>
84: ProductDescription
85: </font></td>
86: <td><font size=2>
87: <% y=VarType(x(5)) %>
88: <% if y <> 0 And y <> 1 Then %>
89: <% =server.htmlencode (x(5)) %>
90: <% Else %>
91: <% =server.htmlencode ("<n/a>")%>
92: <% End if %>
93: </font></td>
94: </tr>
95: </BODY></HTML>
Dir.asp, when executed, creates a record set with all the rows in the specified table of the ODBC data source. This is accomplished by the SQL query "SELECT <Primary key column name> FROM <Table name>" (see line 6 of dir.asp). For each row in the record set, an HTML link to row.asp (line 12 of dir.asp) is created with the primary key value as the parameter. Row.asp takes a column value as a parameter, assuming it to be a value of a primary key. Using this parameter, row.asp fetches the data from the row specified. This is accomplished by the SQL query "SELECT <Column 1>, <Column 2>, … <Column N> FROM <Table name> WHERE <Primary key column name> = <Parameter>", where column 1 through N are columns marked to be indexed or marked as retrievable (line 2 of row.asp). The content column and the title column are indexed and marked as retrievable by default. Row.asp is used to generate an HTML page dynamically. The content of the title column is used as the HTML title (lines 7 and 12 of row.asp). The content of the content column is used as the HTML body (see lines 8 & 26 of row.asp). All other columns marked to be indexed or marked as retrievable are used in HTML meta tag: for example, "<META NAME="<Column M>" content="<content of Column M>">" (lines 5, 17, 6, and 21 of row.asp). When the catalog builder accesses dir.asp, all the links will be followed. Since each link represents a row in the ODBC data source, the Catalog Builder is able to index the ODBC data source.
View.asp and row.asp are closely related. The URL mapping automatically created by the wizard allows row.asp to retrieve data and view.asp to display data. For example, http://<HostName>/SiteServer/Knowledge/Search/database/Index/AdvWorks/row.asp is mapped to http://<HostName>/SiteServer/Knowledge/Search/database/Search/AdvWorks/view.asp. Row.asp and view.asp differ in two significant ways. Row.asp selects only columns that are marked to be indexed or retrieved while view.asp selects all columns. Row.asp puts indexed or retrieved columns into HTML meta tags beside the Content and Title columns. Although view.asp selects all columns, it only displays retrievable columns in a table. With all columns available, view.asp can easily be modified to present data in custom ways.
If a database is really large, storage space can be an issue. By shortening the URL for view.asp, storage space can be saved. Assuming a virtual root named "dbsearch" exists on the machine, the mapping can be changed by following the steps listed below. For information on how to create a virtual root, please refer to the Microsoft Internet Information Server (IIS) documentation.
A database catalog is actually just a crawl catalog, therefore general Site Server search troubleshooting techniques are just as valuable as the ones mentioned here. The following points are specific to the creation of a database catalog (for general Site Server search troubleshooting, please refer to the Site Server documentation):
If the wizard does not display a specific data source, make sure the ODBC data source is a system data source. To verify a data source, a tool like MS Query can be used. Use the same user name and password supplied to the wizard when verifying the data source. If user name or password turns out to be a problem, both can be changed in the generated dir.asp, row.asp, and view.asp. User name can be found in line 6 of dir.asp, line 2 of row.asp, and line 4 of view.asp (UID=<User name>). Password can be found in line 6 of dir.asp, line 2 of row.asp, and line 4 of view.asp (PWD=<Password>).
Dir.asp and row.asp are critical to the generation of the database catalog. To verify dir.asp, use your browser to open it. If it is working properly, you should see one or more links to row.asp (line 12 of row.asp). Click on the link to verify row.asp. View the HTML source to verify that the data retrieved from the database is indeed what was expected. If the wizard did not report any error while generating the pages and the pages are not modified after generation, but you cannot verify dir.asp and row.asp using your browser, the problem is most likely caused by ASP script timeout.
There are two classes of timeouts that affect the creation of a database catalog. One is the ASP script timeout and the other is the catalog build server timeout. To set the ASP script timeout:
To set the Catalog Build Server timeout, use the following steps:
If the SQL queries used in dir.asp and row.asp have been modified, use a tool such as Microsoft Query to verify that the queries do indeed produce the desired results. Once the queries are verified, it is a good idea to use a browser to verify the new scripts.
The Create New Catalog Definition wizard has not been tested with Oracle databases. With the information given in this document, one should have little or no trouble creating catalogs for Oracle databases. If you have trouble using the wizard on Oracle databases, the following tips can be used. Create a database catalog for AdvWorks as described previously, but give the catalog a different name. This gives you a set of scripts to work with. Modify the queries used in the scripts to retrieve the desired data. Change the connection parameters to the ones needed to connect to the Oracle database. After all the modifications are finished, verify the scripts.
To search across all the properties, use the query "@all <text>". The title column is indexed as content as well as doctitle. Title specific searches can done by using the query "@doctitle <text>".
The wizard only supports cases that involve a single table and a simple query. In order to use a more advanced query, you need to create all the pages from scratch or modify the pages generated by the wizard. Creating all the pages from scratch is beyond the scope of this document; therefore we will focus on modifying the pages generated by the wizard.
The query used to generate the row set is in line 6 of dir.asp, and the query used to get data from the row set is in line 2 of row.asp. Any query that is supported by ADO 1.5 can be used in place of the generated query. It is a good idea to verify the new query independently before trying to build the catalog. After the query is verified, the browser can be used to confirm that things are working properly. An example of how to incorporate an advanced query in dir.asp is included in the example below.
This example shows how to modify the query used in dir.asp. As example data, a few rows from the Products table in the AdvWorks database are shown in Table 2.
Table 2. Example data from the Products table of the AdvWorks database
ProductID | ProductName | ProductDescription | ProductType |
1 | North Face Sunspot | windproof, water-resistant, even heat distribution, storage sack included, red (re) | SleepingBag |
2 | Polar Star | ultralight goose-down insulated alpinist's bag, contour closure with locking zipper, storage sack included, green (gn) | SleepingBag |
3 | Big Sur | generously cut sleeping bag, goose down with polyester taffeta, cotton storage sack included, blue (bl) | SleepingBag |
4 | Cascade | backpacking sleeping bag, goose down with nylon shell, locking zipper, storage sack included, choice of zipper location, blue (bl) | SleepingBag |
5 | Everglades | medium-weight, waterproof leather boots, good traction, brown (br) | Boot |
7 | Rockies | lightweight Nubuck leather/nylon, water-resistant boots, polyurethane soles, green/gray (gg) or green/brown (gbn) | Boot |
13 | Starlight | 4-person dome tent, roomy, vaulted pole sleeves, black/gray (bg) | Tent |
If we only want to index sleeping bags, we can modify the dir.asp as follows:
1: <HTML><HEAD>
2: <TITLE></TITLE>
3: <META NAME="ROBOTS" CONTENT="NOINDEX">
4: </HEAD><BODY>
5: <% Set r = CreateObject("ADODB.Recordset.1.5")
6: r.open "SELECT ProductID FROM Products where ProductType='SleepingBag'", "UID=;PWD=;DSN=AdvWorks"
7: sf=0
8: sf=Request("sf")
9: i=0
10: if not sf = 0 then r.Move sf
11: while i < 1000 and not r.Eof %>
12: <a href="row.asp?id=<% =r ("ProductID") %>">Link </a>
13: <% i=i+1
14: r.movenext
15: Wend
16: sf = 1000+sf
17: if not r.eof then
18: Response.Write "<a href=dir.asp?sf=" & sf & "> NextPage </A>"
19: End If %>
20: </BODY></HTML>
Only line 6 of dir.asp needs to be changed. When viewed with the browser, the new dir.asp generates only one link for each sleeping bag in the database, resulting in a total of four links. Any valid query supported by ADO 1.5 can be used to customize the catalog being built. Row.asp and view.asp can be modified in a similar fashion.
For large databases (half-a-million records or more), decent performance requires a powerful machine (recommended: 200-MHz Dual Pentium Pro processor with 256 MB of RAM) and modifications to the default configuration of IIS and Site Server. ASP script timeout should be set to at least 900 seconds. Both the "Wait for connection" and "Wait for request acknowledgement" timeouts should be set to at least 1000 seconds. Refer to the troubleshooting section for details on how to set the timeout values.
To handle large databases, changing the timeouts alone might not be sufficient. When there are millions of rows in a database, generating links to all the rows in dir.asp can be prohibitively expensive. In order to overcome this issue, a new wizard file (dbwiz6.asp) is available with this article. The original wizard maps the title column to DocTitle and uses the content column as the document content. The updated wizard will do the same, but it also makes the title column and content column available as HTML meta tags. This way, one can perform specific searches on title and content.
To use the new wizard file, follow the following steps:
Assuming the same catalog definition discussed in this document is created using the new wizard, a different dir.asp will be generated, as follows:
<Install Directory>\SiteServer\Knowledge\Search\database\Index\AdvWorks\dir.asp
1: <HTML><HEAD>
2: <TITLE></TITLE>
3: <META NAME="ROBOTS" CONTENT="NOINDEX">
4: </HEAD><BODY>
5: <% Set r = CreateObject("ADODB.Recordset.1.5")
6: r.open "SELECT ProductID FROM Products", "UID=;PWD=;DSN=AdvWorks"
7: sf=0
8: sf=Request("sf")
9: l2 = 0
10: l2 = Request("l2")
11: i=0
12: if not l2 = 0 then
13: r.Move sf
14: while i < 10000 and not r.Eof %>
15: <a href="row.asp?id=<% =r ("ProductID") %>">Link </a>
16: <%
17: i=i+1
18: r.movenext
19: wend
20: sf = 10000+sf
21: if not r.eof then
22: Response.Write "<a href=dir.asp?sf=" & sf & "&l2=1 " & "> NextPage </A>"
23: end if %>
24: <%
25: else
26: i=0
27: while not r.Eof
28: LinkName = "Page " & i & " "
29: Response.Write "<a href=dir.asp?sf=" & i & "&l2=1 " & "> " & LinkName & "</A>"
30: i = i+10000
31: r.Move 10000
32: wend
33: end if
34: %>
35: </BODY></HTML>
Instead of linking directly to row.asp for each row in the database, the new dir.asp uses a two-level hierarchy. The new dir.asp takes two parameters. When the second parameter does not exist (or is zero), a link (see line 29) back to dir.asp is created for every 10,000 rows in the database. These links use the first parameter to indicate which 10,000 rows the links need to be generated for (see line 13 and 15). With this new structure, links can be generated and processed simultaneously.