Microsoft Site Server 3.0 Search Database Indexing

Mike Cheng
Microsoft Corporation

May 1998

Click to copy the updated wizard discussed in the Appendix of this article.

Introduction

Site Server Search database indexing enables full text indexing of Microsoft® SQL Server™ and Access databases using ODBC and Active Server Pages (ASP). Microsoft Site Server 3.0 provides the easy-to-use Create New Catalog Definition Wizard for creating search indexes. This document explains how database indexing works, provides troubleshooting tips, and demonstrates how to accomplish advanced tasks.

Creating the Database Catalog

At the heart of the Site Server search functionality is the search catalog—essentially an index and property store that can be searched. Creating the database catalog is a two-step process. First, the Create New Catalog Definition Wizard collects the following information about the database catalog from the user:

It then creates the following set of .asp and .htm files and scripts, which are used for either the indexing or searching process:

After these files are generated, Site Server Search creates the catalog using dir.asp and row.asp. Dir.asp generates links to records in the database that are fetched by row.asp; both use ActiveX® Data Objects (ADO) to access the ODBC data source. This document focuses on dir.asp, row.asp, and view.asp. For more information on search.htm and results.asp, please refer to the Site Server documentation.

Example

Let's walk through creating a database catalog using the Products table of the AdvWorks database, a sample database included with Site Server. The schema for the Products table is shown in Table 1.

Table 1. The Products table of the AdvWorks sample database

Column name Data type
ProductID COUNTER, Nullable
ProductCode TEXT(10), Nullable
ProductType TEXT(20), Nullable
ProductIntroductionDate DATETIME, Nullable
ProductName TEXT(50), Nullable
ProductDescription TEXT(255), Nullable
ProductSize TEXT(5), Nullable
ProductImageURL TEXT(255), Nullable
UnitPrice DOUBLE, Nullable
OnSale BIT, Required

To create a database catalog project for the AdvWorks database:

  1. Start Site Server Service Admin (HTML).

  2. Click Search.

  3. On the menu frame, click Catalog Build Definitions.

  4. On the Catalog Build Definitions page, click Create.

    The Create New Catalog Definition Wizard appears.

  5. Select A database, and then click Next.

  6. Specify AdvWorks as the ODBC data source and as the catalog name. Leave the user name and password blank.

  7. Click Next.

  8. Select the Products table, and then click Next.

  9. Select ProductDescription as the Content column, ProductID as the Primary key column, and ProductName as the Hyperlink column.

  10. Click Next.

    The columns (except ProductDescription and ProductName) from the Products table are listed.

  11. Select ProductID and ProductType as searchable and as retrievable.

  12. To build the database catalog at this time, select Build the database catalog now.

  13. Click Next.

  14. Click Finish.

The scripts generated by the wizard for dir.asp, row.asp, and view.asp follow.

dir.asp

<Install Directory>\SiteServer\Knowledge\Search\database\Index\AdvWorks\dir.asp
 1: <HTML><HEAD>
 2: <TITLE></TITLE>
 3: <META NAME="ROBOTS" CONTENT="NOINDEX">
 4: </HEAD><BODY>
 5: <% Set r = CreateObject("ADODB.Recordset.1.5") 
 6:  r.open "SELECT ProductID FROM Products", "UID=;PWD=;DSN=AdvWorks" 
 7: sf=0
 8: sf=Request("sf")
 9: i=0
10: if not sf = 0 then r.Move sf
11: while i < 1000 and not r.Eof %>
12: <a href="row.asp?id=<% =r ("ProductID") %>">Link &nbsp;</a>
13: <% i=i+1
14: r.movenext
15: Wend
16: sf = 1000+sf
17: if not r.eof then
18: Response.Write "<a href=dir.asp?sf=" & sf & "> NextPage </A>"
19: End If %>
20: </BODY></HTML>

row.asp

<Install Directory>\SiteServer\Knowledge\Search\database\Index\AdvWorks\row.asp
 1: <% Set r = createobject ("ADODB.Recordset.1.5") %>
 2: <% r.Open "SELECT ProductID, ProductType, ProductName, ProductDescription FROM Products WHERE ProductID = " & Request.QueryString("id"), "UID=;PWD=;DSN=AdvWorks" %>
 3: <% Dim x() %>
 4: <% ReDim x(4) %>
 5: <% x(0) = r ("ProductID") %>
 6: <% x(1) = r ("ProductType") %>
 7: <% x(2) = r ("ProductName") %>
 8: <% x(3) = r ("ProductDescription") %>
 9: <HTML><HEAD><TITLE>
10: <% y=VarType(x(2)) %>
11: <% if y <> 0 And y <> 1 Then %>
12: <% =server.htmlencode (x(2)) %>
13: <% End if %>
14: </TITLE>
15: <% y=VarType(x(0)) %>
16: <% if y <> 0 And y <> 1 Then %>
17: <META NAME="ProductID" content="<% =server.htmlencode (x(0)) %>">
18: <% End if %>
19: <% y=VarType(x(1)) %>
20: <% if y <> 0 And y <> 1 Then %>
21: <META NAME="ProductType" content="<% =server.htmlencode (x(1)) %>">
22: <% End if %>
23: </HEAD><BODY>
24: <% y=VarType(x(3)) %>
25: <% if y <> 0 And y <> 1 Then %>
26: <% =server.htmlencode (x(3)) %>
27: <% End if %>
28: </BODY></HTML>

view.asp

<Install Directory>\SiteServer\Knowledge\Search\database\Search\view.asp
 1: <% Response.expires=0 %>
 2: <!--Copyright 1997, 1998 Microsoft Corporation. All rights reserved.-->
 3: <% Set r = createobject ("ADODB.Recordset.1.5") %>
 4: <% r.Open "SELECT ProductID, ProductCode, ProductType, ProductIntroductionDate, ProductName, ProductDescription, ProductSize, ProductImageURL, UnitPrice, OnSale FROM Products WHERE ProductID = " & Request.QueryString("id"), "UID=;PWD=;DSN=AdvWorks" %>
 5: <% Dim x() %>
 6: <% ReDim x(10) %>
 7: <% x(0) = r ("ProductID") %>
 8: <% x(1) = r ("ProductCode") %>
 9: <% x(2) = r ("ProductType") %>
10: <% x(3) = r ("ProductIntroductionDate") %>
11: <% x(4) = r ("ProductName") %>
12: <% x(5) = r ("ProductDescription") %>
13: <% x(6) = r ("ProductSize") %>
14: <% x(7) = r ("ProductImageURL") %>
15: <% x(8) = r ("UnitPrice") %>
16: <% x(9) = r ("OnSale") %>
17: <HTML>
18: <HEAD>
19: <TITLE>
20: <% y=VarType(x(4)) %>
21: <% if y <> 0 And y <> 1 Then %>
22: <% =server.htmlencode (x(4)) %>
23: <% End if %>
24: </TITLE>
25: </HEAD>
26: <body text="#000000" link="#000000" alink="#000000" vlink="#000000" topmargin=17 leftmargin=17 background="images/RightBkgd.gif">
27: <font face="Arial,Helvetica">
28: <b>
29: <center>
30: Database Record:
31: </center>
32: </b>
33: <b>
34: <center>
35: <% y=VarType(x(4)) %>
36: <% if y <> 0 And y <> 1 Then %>
37: <% =server.htmlencode (x(4)) %>
38: <% End if %>
39: </center>
40: </b>
41: <p>
42: <table border=1 cellpadding=5 align=center>
43: <tr>
44:   <td><font size=2>
45:     ProductName
46:   </font></td>
47:   <td><font size=2>
48: <% y=VarType(x(4)) %>
49: <% if y <> 0 And y <> 1 Then %>
50:     <% =server.htmlencode (x(4)) %>
51: <% Else %>
52:     <% =server.htmlencode ("<n/a>")%>
53: <% End if %>
54:   </font></td>
55: </tr>
56: <tr>
57:   <td><font size=2>
58:     ProductID
59:   </font></td>
60:   <td><font size=2>
61: <% y=VarType(x(0)) %>
62: <% if y <> 0 And y <> 1 Then %>
63:     <% =server.htmlencode (x(0)) %>
64: <% Else %>
65:     <% =server.htmlencode ("<n/a>")%>
66: <% End if %>
67:   </font></td>
68: </tr>
69: <tr>
70:  <td><font size=2>
71:     ProductType
72:   </font></td>
73:   <td><font size=2>
74: <% y=VarType(x(2)) %>
75: <% if y <> 0 And y <> 1 Then %>
76:     <% =server.htmlencode (x(2)) %>
77: <% Else %>
78:     <% =server.htmlencode ("<n/a>")%>
79: <% End if %>
80:   </font></td>
81: </tr>
82: <tr>
83:   <td><font size=2>
84:     ProductDescription
85:   </font></td>
86:   <td><font size=2>
87: <% y=VarType(x(5)) %>
88: <% if y <> 0 And y <> 1 Then %>
89:     <% =server.htmlencode (x(5)) %>
90: <% Else %>
91:     <% =server.htmlencode ("<n/a>")%>
92: <% End if %>
93:   </font></td>
94: </tr>
95: </BODY></HTML>

How It Works

Dir.asp, when executed, creates a record set with all the rows in the specified table of the ODBC data source. This is accomplished by the SQL query "SELECT <Primary key column name> FROM <Table name>" (see line 6 of dir.asp). For each row in the record set, an HTML link to row.asp (line 12 of dir.asp) is created with the primary key value as the parameter. Row.asp takes a column value as a parameter, assuming it to be a value of a primary key. Using this parameter, row.asp fetches the data from the row specified. This is accomplished by the SQL query "SELECT <Column 1>, <Column 2>, … <Column N> FROM <Table name> WHERE <Primary key column name> = <Parameter>", where column 1 through N are columns marked to be indexed or marked as retrievable (line 2 of row.asp). The content column and the title column are indexed and marked as retrievable by default. Row.asp is used to generate an HTML page dynamically. The content of the title column is used as the HTML title (lines 7 and 12 of row.asp). The content of the content column is used as the HTML body (see lines 8 & 26 of row.asp). All other columns marked to be indexed or marked as retrievable are used in HTML meta tag: for example, "<META NAME="<Column M>" content="<content of Column M>">" (lines 5, 17, 6, and 21 of row.asp). When the catalog builder accesses dir.asp, all the links will be followed. Since each link represents a row in the ODBC data source, the Catalog Builder is able to index the ODBC data source.

Mapping

View.asp and row.asp are closely related. The URL mapping automatically created by the wizard allows row.asp to retrieve data and view.asp to display data. For example, http://<HostName>/SiteServer/Knowledge/Search/database/Index/AdvWorks/row.asp is mapped to http://<HostName>/SiteServer/Knowledge/Search/database/Search/AdvWorks/view.asp. Row.asp and view.asp differ in two significant ways. Row.asp selects only columns that are marked to be indexed or retrieved while view.asp selects all columns. Row.asp puts indexed or retrieved columns into HTML meta tags beside the Content and Title columns. Although view.asp selects all columns, it only displays retrievable columns in a table. With all columns available, view.asp can easily be modified to present data in custom ways.

If a database is really large, storage space can be an issue. By shortening the URL for view.asp, storage space can be saved. Assuming a virtual root named "dbsearch" exists on the machine, the mapping can be changed by following the steps listed below. For information on how to create a virtual root, please refer to the Microsoft Internet Information Server (IIS) documentation.

  1. Copy <Install Directory>\SiteServer\Knowledge\Search\database\Search\view.asp to <virtual root dbsearch>\view.asp.

  2. In the Microsoft Management Console (MMC), open the property page of the AdvWorks catalog under Catalog Build Server.

  3. Select the URLs property tab.

  4. Select the existing mapping.

  5. Click Remove.

  6. Click Yes.

  7. Click Add.

  8. Click Next.

  9. Enter http://<HostName>/SiteServer/Knowledge/Search/database/Index/AdvWorks/row.asp as the Access location.

  10. Enter http://<HostName>/dbsearch/view.asp as the Display location.

  11. Click Finish.

Troubleshooting

A database catalog is actually just a crawl catalog, therefore general Site Server search troubleshooting techniques are just as valuable as the ones mentioned here. The following points are specific to the creation of a database catalog (for general Site Server search troubleshooting, please refer to the Site Server documentation):

Advanced Database Query

The wizard only supports cases that involve a single table and a simple query. In order to use a more advanced query, you need to create all the pages from scratch or modify the pages generated by the wizard. Creating all the pages from scratch is beyond the scope of this document; therefore we will focus on modifying the pages generated by the wizard.

The query used to generate the row set is in line 6 of dir.asp, and the query used to get data from the row set is in line 2 of row.asp. Any query that is supported by ADO 1.5 can be used in place of the generated query. It is a good idea to verify the new query independently before trying to build the catalog. After the query is verified, the browser can be used to confirm that things are working properly. An example of how to incorporate an advanced query in dir.asp is included in the example below.

Limitations

Advanced Database Query Example

This example shows how to modify the query used in dir.asp. As example data, a few rows from the Products table in the AdvWorks database are shown in Table 2.

Table 2. Example data from the Products table of the AdvWorks database

ProductID ProductName ProductDescription ProductType
1 North Face Sunspot windproof, water-resistant, even heat distribution, storage sack included, red (re) SleepingBag
2 Polar Star ultralight goose-down insulated alpinist's bag, contour closure with locking zipper, storage sack included, green (gn) SleepingBag
3 Big Sur generously cut sleeping bag, goose down with polyester taffeta, cotton storage sack included, blue (bl) SleepingBag
4 Cascade backpacking sleeping bag, goose down with nylon shell, locking zipper, storage sack included, choice of zipper location, blue (bl) SleepingBag
5 Everglades medium-weight, waterproof leather boots, good traction, brown (br) Boot
7 Rockies lightweight Nubuck leather/nylon, water-resistant boots, polyurethane soles, green/gray (gg) or green/brown (gbn) Boot
13 Starlight 4-person dome tent, roomy, vaulted pole sleeves, black/gray (bg) Tent

If we only want to index sleeping bags, we can modify the dir.asp as follows:

Modified dir.asp

 1: <HTML><HEAD>
 2: <TITLE></TITLE>
 3: <META NAME="ROBOTS" CONTENT="NOINDEX">
 4: </HEAD><BODY>
 5: <% Set r = CreateObject("ADODB.Recordset.1.5") 
 6: r.open "SELECT ProductID FROM Products where ProductType='SleepingBag'", "UID=;PWD=;DSN=AdvWorks" 
 7: sf=0
 8: sf=Request("sf")
 9: i=0
10: if not sf = 0 then r.Move sf
11: while i < 1000 and not r.Eof %>
12: <a href="row.asp?id=<% =r ("ProductID") %>">Link &nbsp;</a>
13: <% i=i+1
14: r.movenext
15: Wend
16: sf = 1000+sf
17: if not r.eof then
18: Response.Write "<a href=dir.asp?sf=" & sf & "> NextPage </A>"
19: End If %>
20: </BODY></HTML>

Only line 6 of dir.asp needs to be changed. When viewed with the browser, the new dir.asp generates only one link for each sleeping bag in the database, resulting in a total of four links. Any valid query supported by ADO 1.5 can be used to customize the catalog being built. Row.asp and view.asp can be modified in a similar fashion.

Appendix: Handling Large Databases with an Updated Wizard

For large databases (half-a-million records or more), decent performance requires a powerful machine (recommended: 200-MHz Dual Pentium Pro processor with 256 MB of RAM) and modifications to the default configuration of IIS and Site Server. ASP script timeout should be set to at least 900 seconds. Both the "Wait for connection" and "Wait for request acknowledgement" timeouts should be set to at least 1000 seconds. Refer to the troubleshooting section for details on how to set the timeout values.

To handle large databases, changing the timeouts alone might not be sufficient. When there are millions of rows in a database, generating links to all the rows in dir.asp can be prohibitively expensive. In order to overcome this issue, a new wizard file (dbwiz6.asp) is available with this article. The original wizard maps the title column to DocTitle and uses the content column as the document content. The updated wizard will do the same, but it also makes the title column and content column available as HTML meta tags. This way, one can perform specific searches on title and content.

To use the new wizard file, follow the following steps:

  1. Rename dbwiz6.asp in <Install Directory>\SiteServer\Admin\Knowledge\Search\ to dbwiz6.bak.

  2. Copy the new dbwiz6.asp to <Install Directory>\SiteServer\Admin\Knowledge\Search\ as dbwiz6.asp.

Assuming the same catalog definition discussed in this document is created using the new wizard, a different dir.asp will be generated, as follows:

<Install Directory>\SiteServer\Knowledge\Search\database\Index\AdvWorks\dir.asp
 1: <HTML><HEAD>
 2: <TITLE></TITLE>
 3: <META NAME="ROBOTS" CONTENT="NOINDEX">
 4: </HEAD><BODY>
 5: <% Set r = CreateObject("ADODB.Recordset.1.5") 
 6:  r.open "SELECT ProductID FROM Products", "UID=;PWD=;DSN=AdvWorks" 
 7: sf=0
 8: sf=Request("sf")
 9: l2 = 0
10: l2 = Request("l2")
11: i=0
12: if not l2 = 0 then
13:  r.Move sf
14:   while i < 10000 and not r.Eof %>
15: <a href="row.asp?id=<% =r ("ProductID") %>">Link &nbsp;</a>
16: <%
17:     i=i+1
18:     r.movenext
19:   wend
20:   sf = 10000+sf
21:   if not r.eof then
22:     Response.Write "<a href=dir.asp?sf=" & sf & "&l2=1 " & "> NextPage </A>"
23:   end if %>
24: <%
25: else
26:   i=0
27:   while not r.Eof
28:     LinkName = "Page " & i & " "
29:     Response.Write "<a href=dir.asp?sf=" & i & "&l2=1 " & "> " & LinkName & "</A>"
30:     i = i+10000
31:     r.Move 10000
32:   wend
33: end if
34: %>
35: </BODY></HTML>

Instead of linking directly to row.asp for each row in the database, the new dir.asp uses a two-level hierarchy. The new dir.asp takes two parameters. When the second parameter does not exist (or is zero), a link (see line 29) back to dir.asp is created for every 10,000 rows in the database. These links use the first parameter to indicate which 10,000 rows the links need to be generated for (see line 13 and 15). With this new structure, links can be generated and processed simultaneously.