Index Selection

How indexes are chosen significantly affects the amount of disk I/O generated and, subsequently, performance. Nonclustered indexes are appropriate for retrieving a few rows and clustered indexes are good for range scans. In addition, you should try to keep indexes compact (few columns and bytes). This is especially true for clustered indexes because nonclustered indexes use the clustered index to locate row data. For more information, see SQL Server Books Online.

Consider selectivity for nonclustered indexes because, if a nonclustered index is created on a large table with only a few unique values, use of that nonclustered index does not save I/O during data retrieval. In fact, using the index causes much more I/O than a sequential table scan of the table. Possible candidates for a nonclustered index include invoice numbers, unique customer numbers, social security numbers, and telephone numbers.

Clustered indexes are much better than nonclustered indexes for queries that match columns or search for ranges of columns that do not have many unique values, because the clustered index physically orders the table data and allows for sequential 64-KB I/O on the key values. Possible candidates for a clustered index include states, company branches, date of sale, zip codes, and customer district. Defining a clustered index on the columns that have unique values is not beneficial unless typical queries on the system fetch large sequential ranges of the unique values. To pick the best column on each table to create the clustered index, ask if there will be many queries that must fetch many rows based on the order of this column. The answer is very specific to each user environment. One company may do more queries based on ranges of dates, whereas another company may do many queries based on ranges of bank branches.

These are examples of WHERE clauses that benefit from clustered indexes:

WHERE column_name >some_value

WHERE column_name BETWEEN some_value AND some_value

WHERE column_name < some_value