Despite a long history of work on index selection, no widely deployed commercial product selects indexes automatically. Several factors make automating physical database design extremely difficult.
First, when viewed as a search problem, the space of alternative indexes is large. A database may have many tables, and each table may have many columns that are candidates for indexing. An index may be clustered or nonclustered. Furthermore, in recommending a set of indexes, we cannot restrict ourselves to single-column indexes. Considering multicolumn indexes increases the search space dramatically: because column order matters in a multicolumn index, a given set of k columns alone admits k! distinct multicolumn indexes.
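To make the combinatorics concrete, the following sketch (illustrative only; the column names are made up) enumerates every ordered choice of columns for a single small table. Even at four columns, the candidate count dwarfs the number of columns, and a real database multiplies this across many tables:

```python
from itertools import permutations
from math import factorial

def candidate_indexes(columns):
    """Enumerate every possible multicolumn index over a table's columns:
    each nonempty subset of columns, in every order, since column order
    matters in a multicolumn index."""
    cands = []
    for width in range(1, len(columns) + 1):
        cands.extend(permutations(columns, width))
    return cands

cols = ["a", "b", "c", "d"]
cands = candidate_indexes(cols)
print(len(cands))              # 4 + 12 + 24 + 24 = 64 ordered candidates
print(factorial(len(cols)))    # full-width indexes alone: k! = 24 orderings
```

This counts only the indexes on one table, before clustered-versus-nonclustered choices or interactions between indexes are considered.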
Second, the textbook solution of using semantic information such as uniqueness, reference constraints, and rudimentary statistics ("small" versus "big" tables) to produce a physical database design leads to poor performance because it ignores valuable usage information. For example, the indexing requirements of a decision-support application are very different from those of an online transaction processing (OLTP) application.
Third, even index selection tools that take usage statistics into account suffer from being disconnected from the query processor. Modern query optimizers use complex strategies such as index intersection and index-only access. For example, if a table has 100 columns but a query references only 4 of them, an index on those 4 columns may benefit the query significantly even if the query has no selection condition on any of the 4 columns: the index acts as a vertical partition and saves the cost of scanning the remaining 96 columns of the table. Similarly, if a query has selection conditions on columns A and B of a table, the query optimizer may choose to answer the query using indexes on both A and B and intersecting their results. An index selection tool that does not take these strategies into account can produce grossly inefficient, poor-quality designs. Therefore, even tools that adopt an expert-system approach are unsatisfactory, because they often rely on an inaccurate model of how the query processor uses indexes.
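The index-only (vertical partition) effect can be observed in any optimizer that supports covering-index scans. The sketch below uses SQLite through Python's standard library, purely because it is self-contained; it is a different engine from SQL Server, and its planner performs covering-index scans but not index intersection, so only the first strategy is shown. Table and index names are invented:

```python
import sqlite3

# Demonstrate index-only access: when a query touches only the columns an
# index contains, the optimizer can scan the (narrow) index instead of the
# (wide) base table, skipping the other columns entirely.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INT, b INT, c INT, d INT, e INT)")
con.execute("CREATE INDEX idx_cd ON t (c, d)")

# The query references only c and d, with no selection condition at all;
# the plan nevertheless uses idx_cd as a vertical partition of t.
plan = con.execute("EXPLAIN QUERY PLAN SELECT c, d FROM t").fetchall()
detail = " ".join(str(row[-1]) for row in plan)
print(detail)  # plan text mentions a COVERING INDEX scan of idx_cd
```

A tool that models index benefit only through selection conditions would never propose `idx_cd` for this query, which is exactly the disconnect described above.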
Finally, even if we succeed in identifying an ideal set of indexes for each Transact-SQL statement in the workload, it is challenging to derive a single set of indexes that is the best compromise across all of them, particularly when the workload mixes queries with INSERT, DELETE, and UPDATE statements.
The Index Tuning Wizard in SQL Server 7.0 can help you avoid these problems. It is guided by usage statistics, synchronized with the query processor in evaluating the promise of indexes, and it uses a unique search strategy to navigate the search space of indexes.