PRB: Use of Clustered Index Improves Self-Join PerformanceLast reviewed: April 25, 1997Article ID: Q64259 |
The information in this article applies to:
- Microsoft SQL Server version 4.2 for OS/2
SYMPTOMSWe have a table search_list that is created using the following command:
CREATE TABLE products..search_list (keyword char(10) NOT NULL, catalog_number char(7) NOT NULL)Approximately 350,000 rows of data are downloaded into this table from an ASCII file using the SQL bulk copy utility. The following commands are then run to establish the file index:
CREATE INDEX search_index ON search_list(keyword) UPDATE STATISTICS search_listWhen we perform a text search, we want to find all catalog numbers that match every keyword provided for the search. An individual key search may produce up to 6000 matches. Method 1, as follows, incorporates a self-join of the search_list table:
SELECT s1.catalog_number FROM search_list s1, search_list s2, search_list s3 WHERE s1.keyword LIKE 'ultra%' AND s2.keyword LIKE 'film%' AND s3.keyword LIKE 'color%' AND s1.catalog_number = s2.catalog_number AND s2.catalog_number = s3.catalog_numberMethod 2, as follows, uses SELECT INTO to form several intermediate tables, and has proven to be the most time efficient method:
SELECT catalog_number INTO S1 WHERE keyword LIKE 'ultra%' SELECT catalog_number INTO S2 WHERE keyword LIKE 'color%' SELECT catalog_number INTO S3 WHERE keyword LIKE 'film%' SELECT S1.catalog_number FROM S1, S2, S3 WHERE S1.catalog_number = S2.catalog_number AND S2.catalog_number = S3.catalog_number DROP TABLE S1 DROP TABLE S2 DROP TABLE S3When performing a single key search, the difference between the two methods is minimal. However, as the number of keys grows, or the number of matches per key increases, Method 2 is increasingly more efficient than Method 1. Is there something in our system or data configuration that could cause the self-join approach to degrade?
WORKAROUNDTo improve the performance of the self-join, drop the search_index index and re-create it as a clustered index as follows:
CREATE CLUSTERED INDEX search_index ON search_list(keyword) UPDATE STATISTICS search_listSQL Server will then use the index as opposed to a table scan when optimizing the Method 1 query. Please note that there may be only one clustered index per table. The clustered index ensures that the data is stored in the order of the specified keys (in this case, keyword order). For queries on larger tables, the optimizer will use a clustered index where applicable to reduce the number of rows that need to be searched. While this can markedly improve your query performance, there is an associated overhead involved when new data is inserted into or deleted from the table because the data must be stored in sorted order. Therefore, a clustered index is most useful for relatively static tables. If SET SHOWPLAN ON is activated prior to execution, some of the internal steps taken by the Server in processing a query can be viewed. In this case, it showed that the index was not being used before it was changed to a clustered index.
|
Additional query words:
© 1998 Microsoft Corporation. All rights reserved. Terms of Use. |