Processing a single SQL statement is the simplest case of how Microsoft® SQL Server™ executes SQL statements. The steps used to process a single SELECT statement that only references local base tables (no views or remote tables) illustrates the basic process.
A SELECT statement does not state the exact steps the database server should use to retrieve the requested data. This means the database server must analyze the statement to determine the most efficient way to extract the data. This is called optimizing the SELECT statement, and the component that does this is called the query optimizer.
A SELECT statement only defines:
A query execution plan is a definition of:
There are typically many different sequences in which the database server can access the base tables to build the result set. For example, if the SELECT statement references three tables, the database server could first access TableA, use the data from TableA to extract matching rows from TableB, then use the data from TableB to extract data from TableC. Or the database server could access the tables in the reverse sequence, or in the sequence TableB, TableA, or TableC, or in the sequence TableB, TableC, TableA, or the sequence TableC, TableA, TableB.
There are also typically different methods for accessing the data in each table. If only a few rows with specific key values are needed, the database server can use an index. If all the rows in the table are needed, the database server can ignore the indexes and do a table scan. If all the rows in a table are needed, but there is an index whose key columns are in an ORDER BY, doing an index scan instead of a table scan may save a separate sort of the result set. If a table is very small, table scans may be the most efficient method for almost all access to the table.
The process of choosing one execution plan out of several possible plans is called optimization. The query optimizer is one of the most important components of an SQL database system. While some overhead is used by the optimizer to analyze the query and choose a plan, this overhead is saved back several-fold when the optimizer picks an efficient execution plan. For example, two construction companies can be given identical blueprints for a house. If one company spends a few days at the start to plan how they will build the house, and the other company just starts building without planning, the company that takes the time to plan their project will most likely finish first.
The SQL Server query optimizer is a cost-based optimizer. Each possible execution plan has an associated cost in terms of the amount of computing resources used. The optimizer must analyze the possible plans and choose the one with the lowest estimated cost. Some complex SELECT statements have thousands of possible execution plans. In these cases, the optimizer does not analyze all possible combinations. It instead uses sophisticated algorithms to quickly find an execution plan that has a cost reasonably close to the theoretical minimum.
The SQL Server query optimizer does not strictly choose the execution plan with the lowest resource cost; it chooses the plan that most quickly returns results to the user with a reasonable cost in resources. For example, processing a query in parallel typically uses more resources than processing it serially, but completes the query faster. The SQL Server optimizer will use a parallel execution plan to return results if the load on the server will not be adversely affected.
The optimizer relies heavily on distribution statistics when estimating the resource costs of different methods of extracting information from a table or index. Distribution statistics are kept for columns and indexes. They indicate the selectivity of the values in a particular index or column. For example, in a table representing cars, many cars have the same manufacturer, but each car has a unique vehicle identification number. An index on the vehicle identification number is more selective than an index on manufacturer. If the index statistics are not current, the optimizer may not make the best choice for the current state of the table. For more information about keeping index statistics current, see Statistical Information.
The optimizer is important because it lets the database server adjust dynamically to changing conditions in the database without needing input from a programmer or database administrator. This frees programmers to focus on describing the final result of the query. They can trust the optimizer to always build an efficient execution plan for the state of the database each time the statement is run.
The basic steps SQL Server uses to process a single SELECT statement are:
The basic steps described for processing a SELECT statement also apply to other SQL statements such as UPDATE, DELETE, and INSERT. UPDATE and DELETE statements both have to target the set of rows to be modified or deleted, the process of identifying these rows is the same as that used to identify the source rows that contribute to the result set of a SELECT statement. The UPDATE and INSERT statements may both contain embedded SELECT statements that provide the data values to be updated or inserted.
Even DDL statements such as CREATE PROCEDURE or ALTER TABLE are ultimately resolved to a series of relational operations on the system catalog tables and sometimes (such as ALTER TABLE ADD COLUMN) against the data tables.