PRB: Use of Clustered Index Improves Self-Join Performance

ID Number: Q64259

1.00 1.10 1.11 4.20

OS/2

Summary:

SYMPTOMS

We have a table search_list that is created using the following

command:

CREATE TABLE products..search_list

(keyword char(10) NOT NULL,

catalog_number char(7) NOT NULL)

Approximately 350,000 rows of data are downloaded into this table

from an ASCII file using the SQL bulk copy utility. The following

commands are then run to establish the file index:

CREATE INDEX search_index ON search_list(keyword)

UPDATE STATISTICS search_list

When we perform a text search, we want to find all catalog numbers

that match every keyword provided for the search. An individual key

search may produce up to 6000 matches.

Method 1, as follows, incorporates a self-join of the search_list

table:

SELECT s1.catalog_number FROM

search_list s1, search_list s2, search_list s3

WHERE s1.keyword LIKE 'ultra%' AND

s2.keyword LIKE 'film%' AND

s3.keyword LIKE 'color%' AND

s1.catalog_number = s2.catalog_number AND

s2.catalog_number = s3.catalog_number

Method 2, as follows, uses SELECT INTO to form several intermediate

tables, and has proven to be the most time efficient method:

SELECT catalog_number INTO S1 WHERE keyword LIKE 'ultra%'

SELECT catalog_number INTO S2 WHERE keyword LIKE 'color%'

SELECT catalog_number INTO S3 WHERE keyword LIKE 'film%'

SELECT S1.catalog_number FROM S1, S2, S3

WHERE S1.catalog_number = S2.catalog_number AND

S2.catalog_number = S3.catalog_number

DROP TABLE S1

DROP TABLE S2

DROP TABLE S3

When performing a single key search, the difference between the two

methods is minimal. However, as the number of keys grows, or the

number of matches per key increases, Method 2 is increasingly more

efficient than Method 1.

Is there something in our system or data configuration that could

cause the self-join approach to degrade?

RESOLUTION

To improve the performance of the self-join, drop the search_index

index and re-create it as a clustered index as follows:

CREATE CLUSTERED INDEX search_index ON search_list(keyword)

UPDATE STATISTICS search_list

SQL Server will then use the index as opposed to a table scan when

optimizing the Method 1 query. Please note that there may be only

one clustered index per table. The clustered index ensures that the

data is stored in the order of the specified keys (in this case,

keyword order). For queries on larger tables, the optimizer will

use a clustered index where applicable to reduce the number of rows

that need to be searched. While this can markedly improve your

query performance, there is an associated overhead involved when

new data is inserted into or deleted from the table because the

data must be stored in sorted order. Therefore, a clustered index

is most useful for relatively static tables.

If SET SHOWPLAN ON is activated prior to execution, some of the

internal steps taken by the Server in processing a query can be

viewed. In this case, it showed that the index was not being used

before it was changed to a clustered index.

Additional reference words: 1.00 1.10 4.20