ID Number: Q64259
1.00 1.10 1.11 4.20
OS/2
Summary:
SYMPTOMS
We have a table search_list that is created using the following
command:
CREATE TABLE products..search_list
(keyword char(10) NOT NULL,
catalog_number char(7) NOT NULL)
Approximately 350,000 rows of data are downloaded into this table
from an ASCII file using the SQL bulk copy utility. The following
commands are then run to establish the file index:
CREATE INDEX search_index ON search_list(keyword)
UPDATE STATISTICS search_list
When we perform a text search, we want to find all catalog numbers
that match every keyword provided for the search. An individual key
search may produce up to 6000 matches.
Method 1, as follows, incorporates a self-join of the search_list
table:
SELECT s1.catalog_number FROM
search_list s1, search_list s2, search_list s3
WHERE s1.keyword LIKE 'ultra%' AND
s2.keyword LIKE 'film%' AND
s3.keyword LIKE 'color%' AND
s1.catalog_number = s2.catalog_number AND
s2.catalog_number = s3.catalog_number
Method 2, as follows, uses SELECT INTO to form several intermediate
tables, and has proven to be the most time efficient method:
SELECT catalog_number INTO S1 WHERE keyword LIKE 'ultra%'
SELECT catalog_number INTO S2 WHERE keyword LIKE 'color%'
SELECT catalog_number INTO S3 WHERE keyword LIKE 'film%'
SELECT S1.catalog_number FROM S1, S2, S3
WHERE S1.catalog_number = S2.catalog_number AND
S2.catalog_number = S3.catalog_number
DROP TABLE S1
DROP TABLE S2
DROP TABLE S3
When performing a single key search, the difference between the two
methods is minimal. However, as the number of keys grows, or the
number of matches per key increases, Method 2 is increasingly more
efficient than Method 1.
Is there something in our system or data configuration that could
cause the self-join approach to degrade?
RESOLUTION
To improve the performance of the self-join, drop the search_index
index and re-create it as a clustered index as follows:
CREATE CLUSTERED INDEX search_index ON search_list(keyword)
UPDATE STATISTICS search_list
SQL Server will then use the index as opposed to a table scan when
optimizing the Method 1 query. Please note that there may be only
one clustered index per table. The clustered index ensures that the
data is stored in the order of the specified keys (in this case,
keyword order). For queries on larger tables, the optimizer will
use a clustered index where applicable to reduce the number of rows
that need to be searched. While this can markedly improve your
query performance, there is an associated overhead involved when
new data is inserted into or deleted from the table because the
data must be stored in sorted order. Therefore, a clustered index
is most useful for relatively static tables.
If SET SHOWPLAN ON is activated prior to execution, some of the
internal steps taken by the Server in processing a query can be
viewed. In this case, it showed that the index was not being used
before it was changed to a clustered index.
Additional reference words: 1.00 1.10 4.20