Index Organization

An index (clustered or nonclustered) speeds retrieval of the rows in a table. An index contains key values built from one or more columns in a table. These keys are stored in a structure that allows SQL Server to find rows associated with the key values quickly and efficiently. If a table has no indexes, the data rows are not stored in any particular order, which is called a heap.

Clustered Indexes

In a clustered index, the order of the values in the index is the same as the order of the data in the table. (To some degree, the storage of the data is the index.) A clustered index is analogous to the alphabetic listings contained in a telephone directory.

Because a clustered index dictates the storage of the table’s data, a table can contain only one clustered index; however, an index can be created on multiple columns. For example, a telephone directory is organized by last name, and then by first name.

A clustered index contains a hierarchical tree, with the range of values stored in a given area of the index. When searching for data based on a clustered-index value, SQL Server quickly isolates the page with the specified value, and then searches the page for the records with the specified value. The lowest level, or leaf node, of the index tree is the page that contains the data.

Nonclustered Indexes

A nonclustered index is analogous to an index in a textbook, in which the data is stored in one place and the index is stored in another. Pointers indicate the storage location of the index items in the data. The leaf node of a nonclustered index is the storage location (its page number and offset in the page) of the index entry. Therefore, a nonclustered index has an extra level between the index structure and the data itself.

When SQL Server searches for data based on a nonclustered index, it searches the index for the specified value to obtain the location of the rows of data, and then retrieves the data directly from its storage locations. This makes nonclustered indexes the optimal choice for exact-match queries.

Some nonclustered indexes contain multiple indexes. You can define a nonclustered index for each of the columns commonly used to find data in the table.

Because nonclustered indexes store clustered index keys as pointers to data rows, it is important to keep clustered index keys as small as possible. Avoid choosing large columns as the keys to clustered indexes if a table also has nonclustered indexes.

SQL Server 7.0 supports up to 249 nonclustered indexes on each table. Nonclustered indexes and clustered indexes share a similar B-tree structure. The difference is that nonclustered indexes have no effect on the order of the data rows. The collection of data pages for a heap is not affected if nonclustered indexes are defined for the table.

Index Distribution Statistics

All indexes have distribution statistics that describe the selectivity and distribution of the key values in the index. Selectivity is a property that relates to how many rows are typically identified by a key value. A unique key value has high selectivity; a nonunique key value (one found in 1,000 rows, for example) has poor selectivity.

The selectivity and distribution statistics are used by SQL Server to optimize its navigation through tables when processing Transact-SQL statements. The statistics for each index are not limited to a single page, but are stored as a long string of bits across multiple pages in the same way image data is stored.