Distribution Statistics

All indexes have distribution statistics that describe the selectivity and distribution of the key values in the index. Selectivity is a property that relates to how many rows are typically identified by a key value. A unique key has high selectivity, a key value that is found in 1,000 rows has poor selectivity. The selectivity and distribution statistics are used by Microsoft® SQL Server™ to optimize its navigation through tables when processing Transact-SQL statements. The distribution statistics are used to estimate how efficient an index would be in retrieving data associated with a key value or range specified in the query. The statistics for each index are not limited to a single page but are stored as a long string of bits across multiple pages in the same way image data is stored. The column sysindexes.statblob points to this distribution data. You can use the DBCC SHOW_STATISTICS statement to get a report on the distribution statistics for an index.

Distribution statistics may also be maintained for unindexed columns. These can be defined manually using the CREATE STATISTICS statement or created automatically by the query optimizer. Statistics on unindexed columns count against the limit of 249 nonclustered indexes allowed on a table.

To be useful to the optimizer, distribution statistics must be kept reasonably current. The distribution statistics should be refreshed anytime there are significant numbers of changes to keys in the index. Distribution statistics can be updated manually using the UPDATE STATISTICS statement. SQL Server 7.0 can also detect when distribution statistics are out of date and update the statistics automatically. This update is performed by the task that detected that the statistics needed to be updated. The update is performed using a sophisticated sampling method that minimizes the effect of the update on transaction throughput.

Distribution Statistics

See Also