Microsoft® SQL Server™ allows statistical information regarding the distribution of values in a column to be created. This statistical information can be used by the query processor to determine the optimal strategy for evaluating a query. When you create an index, SQL Server automatically stores statistical information regarding the distribution of values in the indexed column(s). The query optimizer in SQL Server uses these statistics to estimate the cost of using the index for a query. Additionally, when the auto create statistics database option is true (default), SQL Server automatically creates statistics for columns without indexes that are used in a predicate.
As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less-than-optimal decisions on how to process a query. For example, if you create a table with an indexed column and 1,000 rows of data, all with unique values in the indexed column, the query optimizer considers the indexed column a good way to collect the data for a query. If you update the data in the column so there are many duplicated values, the column is no longer an ideal candidate for use in a query; however, the query optimizer still considers it to be a good candidate based on the index’s outdated distribution statistics, which are based on the data before the update.
Note Out-of-date or missing statistics are indicated as warnings (table name in red text) when the execution plan of a query is graphically displayed using SQL Server Query Analyzer. For more information, see Graphically Displaying the Execution Plan Using SQL Server Query Analyzer. Additionally, monitoring the Missing Column Statistics event class using SQL Server Profiler indicates when statistics are missing. For more information, see Error and Warning Event Category.
Therefore, SQL Server automatically updates this statistical information periodically as the data in the tables changes. The sampling is random across data pages, and taken from the table or the smallest nonclustered index on the columns needed by the statistics. Once a data page has been read from disk, all the rows on the data page are used to update the statistical information. The frequency at which the statistical information is updated is determined by the volume of data in the column or index and the amount of changing data. For example, the statistics for a table containing 10,000 rows may need updating when 1,000 index values have changed because 1,000 values may represent a significant percentage of the table. However, for a table containing 10 million index entries, 1,000 changing index values is less significant, and so the statistics may not be automatically updated. SQL Server, however, always ensures that a minimum number of rows is sampled; tables that are smaller than 8 megabytes (MB) are always fully scanned to gather statistics.
The cost of this automatic statistics update is minimized by sampling the data rather than analyzing all of it. Under some circumstances, statistical sampling will not be able to accurately characterize the data in a table. You can control the amount of data that is sampled during manual statistics updates on a table-by-table basis by using the SAMPLE and FULLSCAN clauses of the UPDATE STATISTICS statement. The FULLSCAN clause specifies that all of the data in the table is scanned to gather statistics, whereas the SAMPLE clause can be used to specify either the percentage of rows to sample or the number of rows to sample. It is also possible to tell SQL Server not to maintain statistics for a given column or index in the following ways:
If you tell SQL Server not to maintain statistics automatically, you will need to manually update the statistical information.
Statistics can also be created on all eligible columns in all user tables in the current database in a single statement by using the sp_createstats system stored procedure. Columns not eligible for statistics include computed columns, or columns of image, text, and ntext data types.
The statistics generated for a column can be deleted if you no longer want to retain and maintain them. Statistics created on columns by SQL Server (when auto create statistics database option is true) are automatically deleted when no longer used.
Creating statistics manually allows you to create statistics that contain multiple column densities (average number of duplicates for the combination of columns). For example, a query contains the clause:
WHERE a = 7 and b = 9
Creating manual statistics on both columns together (a, b) can allow SQL Server to make a better estimate for the query because the statistics also contain the average number of distinct values for the combination of columns a and b.
To create statistics on a column
To create statistics on all eligible columns on all user tables
To manually update statistics
To view the statistics for a table
To delete the statistics for a column
CREATE INDEX | sp_autostats |
Distribution Statistics | sp_dboption |