Single SQL Statement Processing

Processing a single SQL statement is the simplest case of how Microsoft® SQL Server™ executes SQL statements. The steps used to process a single SELECT statement that references only local base tables (no views or remote tables) illustrate the basic process.

Optimizing SELECT Statements

A SELECT statement does not state the exact steps the database server should use to retrieve the requested data. This means the database server must analyze the statement to determine the most efficient way to extract the data. This is called optimizing the SELECT statement, and the component that does this is called the query optimizer.

A SELECT statement only defines:

The format of the result set. This is specified mostly in the select list, although other clauses such as ORDER BY and GROUP BY also affect the final form of the result set.
The tables containing the source data. This is specified in the FROM clause.
How the tables are logically related for the purposes of the SELECT statement. This is defined in the join specifications.
What conditions the rows in the source tables must satisfy to qualify for the SELECT statement. These are specified in the WHERE and HAVING clauses.
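
As a minimal sketch of these four parts, the following Python script runs one SELECT statement and marks which clause supplies each part of the definition. It uses the standard sqlite3 module as a stand-in for a database server, not SQL Server itself, and the Customers and Orders tables are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; the tables and
# columns are hypothetical and are not part of SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                         CustomerID INTEGER, Amount REAL);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders VALUES (10, 1, 50.0), (11, 1, 70.0), (12, 2, 5.0);
""")

query = """
    SELECT c.Name, SUM(o.Amount) AS Total   -- format of the result set
    FROM Customers AS c                     -- tables containing the source data
    JOIN Orders AS o
      ON o.CustomerID = c.CustomerID        -- how the tables are related
    WHERE o.Amount > 10                     -- conditions on the source rows
    GROUP BY c.Name                         -- also shapes the result set
    HAVING SUM(o.Amount) > 100              -- conditions on the groups
    ORDER BY c.Name                         -- also shapes the result set
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Nothing in the statement says which table to read first or whether to use an index; those decisions are left to the database server.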

A query execution plan is a definition of:

The sequence in which the source tables are accessed.
There are typically many different sequences in which the database server can access the base tables to build the result set. For example, if the SELECT statement references three tables, the database server could first access TableA, use the data from TableA to extract matching rows from TableB, then use the data from TableB to extract data from TableC. Or the database server could access the tables in the reverse sequence, or in the sequence TableB, TableA, TableC, or in the sequence TableB, TableC, TableA, or the sequence TableC, TableA, TableB.
The methods used to extract data from each table.
There are also typically different methods for accessing the data in each table. If only a few rows with specific key values are needed, the database server can use an index. If all the rows in the table are needed, the database server can ignore the indexes and do a table scan. If all the rows in a table are needed, but there is an index whose key columns are in an ORDER BY, doing an index scan instead of a table scan may save a separate sort of the result set. If a table is very small, table scans may be the most efficient method for almost all access to the table.
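
The two parts of a plan can be sketched in Python. The first half lists every sequence in which three tables could be accessed. The second half is a toy rule of thumb for picking an access method that mirrors the cases described above; the function name and both thresholds are hypothetical and do not reflect how SQL Server decides.

```python
from itertools import permutations

tables = ["TableA", "TableB", "TableC"]

# Every sequence in which the three tables could be accessed (3! = 6).
access_orders = list(permutations(tables))
for order in access_orders:
    print(" -> ".join(order))


def choose_access_method(rows_needed, table_rows, index_matches_order_by=False,
                         small_table_rows=100, few_rows_fraction=0.01):
    """Toy rule of thumb for one table; both thresholds are hypothetical."""
    if table_rows <= small_table_rows:
        return "table scan"        # very small table: a scan is cheap
    if rows_needed <= table_rows * few_rows_fraction:
        return "index lookup"      # only a few rows with specific keys
    if index_matches_order_by:
        return "index scan"        # many rows, but the index avoids a sort
    return "table scan"            # many rows, no helpful index order


print(choose_access_method(rows_needed=5, table_rows=1_000_000))
```

A real optimizer does not apply fixed rules like these. It estimates a cost for each combination of access sequence and access method and compares the estimates.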

The process of choosing one execution plan out of several possible plans is called optimization. The query optimizer is one of the most important components of an SQL database system. While the optimizer uses some overhead to analyze the query and choose a plan, this overhead is typically recovered several-fold when the optimizer picks an efficient execution plan. For example, two construction companies can be given identical blueprints for a house. If one company spends a few days at the start planning how it will build the house, and the other company just starts building without planning, the company that takes the time to plan its project will most likely finish first.

The SQL Server query optimizer is a cost-based optimizer. Each possible execution plan has an associated cost in terms of the amount of computing resources used. The optimizer must analyze the possible plans and choose the one with the lowest estimated cost. Some complex SELECT statements have thousands of possible execution plans. In these cases, the optimizer does not analyze all possible combinations. It instead uses sophisticated algorithms to quickly find an execution plan that has a cost reasonably close to the theoretical minimum.
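
The idea of cost-based selection can be illustrated with a toy model in Python. The row counts and the cost formula below are assumptions made up for this sketch; SQL Server's actual cost model is far more detailed and is not described here.

```python
from itertools import permutations
from math import factorial

# Hypothetical row counts; a real optimizer works from statistics.
row_counts = {"TableA": 1_000, "TableB": 50, "TableC": 200_000}


def estimated_cost(order):
    """Toy cost: read every row of the first table, then do one lookup
    into each later table per row carried forward (this sketch assumes
    each lookup finds exactly one matching row)."""
    carried = row_counts[order[0]]
    cost = carried
    for _table in order[1:]:
        cost += carried            # one lookup per carried row
    return cost


# Exhaustive search is feasible for three tables: cost all 6 orders.
best = min(permutations(row_counts), key=estimated_cost)
print(best, estimated_cost(best))

# The number of access sequences alone grows factorially.
for n in (3, 5, 7, 10):
    print(n, "tables:", factorial(n), "possible sequences")
```

With seven tables there are already 5,040 possible sequences before access methods are considered, which is why an optimizer cannot cost every combination for complex statements and instead searches for a plan that is close to the minimum.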

The SQL Server query optimizer does not strictly choose the execution plan with the lowest resource cost; it chooses the plan that most quickly returns results to the user with a reasonable cost in resources. For example, processing a query in parallel typically uses more resources than processing it serially, but completes the query faster. The SQL Server optimizer will use a parallel execution plan to return results if the load on the server will not be adversely affected.

The optimizer relies heavily on distribution statistics when estimating the resource costs of different methods of extracting information from a table or index. Distribution statistics are kept for columns and indexes. They indicate the selectivity of the values in a particular index or column. For example, in a table representing cars, many cars have the same manufacturer, but each car has a unique vehicle identification number. An index on the vehicle identification number is more selective than an index on manufacturer. If the index statistics are not current, the optimizer may not make the best choice for the current state of the table. For more information about keeping index statistics current, see Statistical Information.
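
The cars example can be made concrete with a short Python sketch. The sample rows are invented, and the ratio of distinct values to total rows is only one simple way to express selectivity; it is not the format in which SQL Server stores distribution statistics.

```python
# Hypothetical sample of a cars table: (vin, manufacturer).
cars = [
    ("VIN0001", "Ford"),
    ("VIN0002", "Ford"),
    ("VIN0003", "Toyota"),
    ("VIN0004", "Toyota"),
    ("VIN0005", "Toyota"),
    ("VIN0006", "Honda"),
]


def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every value is unique."""
    values = list(values)
    return len(set(values)) / len(values)


def average_rows_per_value(values):
    """Average number of rows an equality search on one value would match."""
    values = list(values)
    return len(values) / len(set(values))


vin_selectivity = selectivity(vin for vin, _ in cars)
manufacturer_selectivity = selectivity(make for _, make in cars)

print("VIN:", vin_selectivity)
print("Manufacturer:", manufacturer_selectivity)
print("Rows per manufacturer:", average_rows_per_value(make for _, make in cars))
```

The higher ratio for the vehicle identification number indicates that a search on that column narrows the table to fewer rows, which makes an index on it more attractive. If rows are added or changed and these figures are not recomputed, estimates based on them drift away from the real table.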

The optimizer is important because it lets the database server adjust dynamically to changing conditions in the database without needing input from a programmer or database administrator. This frees programmers to focus on describing the final result of the query. Provided the statistics are current, they can rely on the optimizer to build an efficient execution plan for the state of the database each time the statement is run.

Processing a SELECT Statement

The basic steps SQL Server uses to process a single SELECT statement are:

The parser scans the SELECT statement and breaks it down into logical units such as keywords, expressions, operators, and identifiers.
A query tree, sometimes called a sequence tree, is built describing the logical steps needed to transform the source data into the format needed by the result set.
The optimizer then analyzes the different ways the source tables can be accessed and selects the series of steps that returns the results fastest while consuming fewer resources. The query tree is updated to record this exact series of steps, and the final, optimized version of the query tree is called the execution plan.
The relational engine begins executing the execution plan. As steps that need data from the base tables are processed, the relational engine uses OLE DB to request that the storage engine pass up data from the rowsets requested by the relational engine.
The relational engine processes the data returned from the storage engine into the format defined for the result set, and returns the result set to the client.
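
The first of these steps, breaking the statement into logical units, can be sketched with a small tokenizer in Python. This is a simplified illustration and not SQL Server's parser: the keyword list is deliberately short, the token categories are assumptions, and a real parser also checks syntax and goes on to build the query tree.

```python
import re

# Deliberately short keyword list, for illustration only.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "JOIN", "ON",
            "GROUP", "ORDER", "BY", "HAVING"}

TOKEN_PATTERN = re.compile(r"""
    (?P<number>\d+(?:\.\d+)?)
  | (?P<string>'(?:[^']|'')*')
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<operator><=|>=|<>|[=<>+\-*/])
  | (?P<punctuation>[(),.])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(sql):
    """Split a statement into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_PATTERN.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "space":
            continue
        if kind == "word":
            # Words are keywords if listed above, otherwise identifiers.
            kind = "keyword" if text.upper() in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens


tokens = tokenize("SELECT Name FROM Cars WHERE Price >= 5000")
for kind, text in tokens:
    print(kind, text)
```

The later steps would take this token stream as their input: the tokens are assembled into a query tree, and the optimizer then rewrites that tree into an execution plan.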

Processing Other Statements

The basic steps described for processing a SELECT statement also apply to other SQL statements such as UPDATE, DELETE, and INSERT. UPDATE and DELETE statements both have to target the set of rows to be modified or deleted. The process of identifying these rows is the same as that used to identify the source rows that contribute to the result set of a SELECT statement. The UPDATE and INSERT statements may both contain embedded SELECT statements that provide the data values to be updated or inserted.
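
A short Python script can show the overlap between these statements. It uses the standard sqlite3 module as a stand-in rather than SQL Server, and the Orders and BigOrders tables are hypothetical. The same search condition identifies the rows for a SELECT and for a DELETE, and an INSERT takes its values from an embedded SELECT.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Amount REAL)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(1, 5.0), (2, 50.0), (3, 500.0)])

condition = "Amount < 100"   # the same search condition is reused below

# Rows that a SELECT with this WHERE clause returns.
selected = conn.execute(
    f"SELECT OrderID FROM Orders WHERE {condition} ORDER BY OrderID"
).fetchall()

# A DELETE with the same WHERE clause targets that same set of rows.
deleted = conn.execute(f"DELETE FROM Orders WHERE {condition}").rowcount

# An INSERT whose values come from an embedded SELECT.
conn.execute("CREATE TABLE BigOrders (OrderID INTEGER, Amount REAL)")
conn.execute(
    "INSERT INTO BigOrders SELECT OrderID, Amount FROM Orders WHERE Amount >= 100"
)
copied = conn.execute("SELECT OrderID FROM BigOrders").fetchall()

print(selected, deleted, copied)
```

The number of rows deleted equals the number of rows the SELECT returned, because both statements locate their rows by evaluating the same condition against the same table.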

Even DDL statements such as CREATE PROCEDURE or ALTER TABLE are ultimately resolved to a series of relational operations on the system catalog tables and sometimes (such as ALTER TABLE ADD COLUMN) against the data tables.