Gary Schroeder
SQL Server Escalation Engineer
Microsoft Corporate Support
Microsoft® SQL Server includes a very intelligent cost-based query optimizer which, given an ad-hoc query, can quickly determine the best access method for retrieving the data, including the order in which to join tables and whether or not to use indexes that may be on those tables. By using a cost-based query optimizer, the System Administrator or end user is released from having to determine the most efficient way of structuring the query to get optimal performance—instead, the optimizer looks at all possible join orders, and the cost of using each index, and picks the plan with the least cost in terms of page I/Os.
Detailed information on the final access method that the optimizer chooses can be displayed for the user by executing the Transact-SQL™ "SET SHOWPLAN ON" command. This command will show each step that the optimizer uses in joining tables and which, if any, indexes it chooses to be the least-cost method of accessing the data. This can be extremely beneficial when analyzing certain queries to determine if the indexes that have been defined on a table are actually being considered by the optimizer as useful in getting to the data. This document will define and explain each of the output messages from SHOWPLAN, and give example queries and the output from SHOWPLAN to illustrate the point. The format will be consistent throughout: a heading that corresponds to the exact text of a SHOWPLAN statement, followed by a description of what it means, a sample query that generates that particular message, and the full output from executing the query with the SHOWPLAN option on. Wherever possible, the queries will use the existing tables and indexes, unaltered, from the SQL Server "Pubs" sample database.
This statement will be included in the SHOWPLAN output for every query, where n is an integer, beginning with "STEP 1". For some queries, SQL Server cannot effectively retrieve the results in a single step, and must break the query plan into several steps. For example, if a query includes a GROUP BY clause, the query will need to be broken into at least two steps: one step to select the qualifying rows from the table, and another step to group them. The following query demonstrates a single-step query.
Query: | SELECT au_lname, au_fname |
FROM Authors | |
WHERE city = "Oakland" | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan |
This SHOWPLAN statement indicates that SQL Server needs to insert some of the query results into an intermediate worktable, and later in the query processing will then select the values out of that table. This is most often seen with a query that involves a GROUP BY clause, as the results are first put into a work table, and then the qualifying rows in the work table are grouped based on the given column in the GROUP BY clause. The following query returns a list of all cities and indicates the number of authors that live in each city. The query plan is composed of two steps: the first step selects the rows into a worktable, and the second step retrieves the grouped rows from the worktable:
Query: | SELECT city, total_authors = count(*) |
FROM Authors | |
GROUP BY city | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT (into a worktable) | |
GROUP BY | |
Vector Aggregate | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
FROM TABLE | |
Worktable | |
Nested iteration | |
Table Scan |
This statement describes the type of query for each step. For most user queries, the value for <query type> will be SELECT, INSERT, UPDATE, or DELETE. If SHOWPLAN is turned on while other commands are issued, the <query type> will reflect the command that was issued. The following examples show various outputs for different queries/commands:
Query 1: | CREATE TABLE Mytab (col1 int) |
SHOWPLAN 1: | STEP 1 |
The type of query is TABCREATE | |
Query 2: | INSERT Publishers |
VALUES ("9904", "NewPubs", "Seattle", "WA") | |
SHOWPLAN 2: | STEP 1 |
The type of query is INSERT | |
The update mode is direct | |
Table Scan | |
TO TABLE | |
publishers |
There are two methods or "modes" that SQL Server can use to perform update operations such as INSERT, DELETE, UPDATE, and SELECT INTO. These methods are called deferred update and direct update. When the deferred method is used, the changes are applied to all rows of the table by making log records in the transaction log to reflect the old and new value of the column(s) being modified (in the case of UPDATE operations), or the values that will be inserted or deleted (in the case of INSERT and DELETE, respectively). When all of the log records have been constructed, the changes are then applied to the data pages. This method generates more log records than a direct update (discussed later), but it has the advantage of allowing the execution of commands that may cascade changes throughout a table. For example, consider a table that has a column "col1" with a unique index on it, and data values numbered consecutively from 1 to 100 in that column. Assume an UPDATE statement is executed to increase the value in each row by 1:
Query 1: | UPDATE Mytable |
SET col1 = col1 + 1 | |
SHOWPLAN 1: | STEP 1 |
The type of query is UPDATE | |
The update mode is deferred | |
FROM TABLE | |
Mytable | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Mytable |
Consider the consequences of starting at the first row in the table, and updating each row, through the end of the table. Updating the first row (which has an initial value of 1) to 2 would cause an error, as the unique index would be violated since there is already a value of 2 in the table; likewise, updating the second row (which has an initial value of 2) to 3 would also cause a unique key violation, as would all rows through the end of the table, except for the last row. By using deferred updates, this problem is easily avoided. The log records are first constructed to show what the new values for each row will be, the existing rows are deleted, and the new values inserted.
Just as with UPDATE commands, INSERT commands may also be deferred for very similar reasons. Consider the following query (there is no clustered index or unique index on the "roysched" table):
Query 2: | INSERT roysched SELECT * FROM roysched |
SHOWPLAN 2: | STEP 1 |
The type of query is INSERT | |
The update mode is deferred | |
FROM TABLE | |
roysched | |
Nested iteration | |
Table Scan | |
TO TABLE | |
roysched |
Since there is no clustered index on the table, the new rows will be added to the end of the table. The query processor needs to be able to differentiate between the existing rows that are currently in the table (prior to the INSERT command) and the rows that will be inserted, so as to not get into a continuous loop of selecting a row, inserting it at the end of the table, selecting that row that it just inserted, and re-inserting it again. By using the deferred method of inserting, the log records can first be constructed to show all of the currently existing values in the table, then SQL Server will re-read those log records to insert them into the table.
Whenever possible, SQL Server will attempt to use the direct method of applying updates to tables, since it is faster and requires fewer log records to be generated than the deferred method. Depending on the type of command, one or more criteria must be met in order for SQL Server to perform the update using the direct method. Those criteria are:
Query 1: | DELETE |
FROM authors | |
WHERE au_id = "172-32-1176" | |
SHOWPLAN 1: | STEP 1 |
The type of query is DELETE | |
The update mode is direct | |
FROM TABLE | |
authors | |
Nested iteration | |
Using Clustered Index | |
TO TABLE | |
authors | |
Query 2: | UPDATE titles |
SET type = "popular_comp" | |
WHERE title_id = "BU2075" | |
(continued on next page) | |
SHOWPLAN 2: | STEP 1 |
The type of query is UPDATE | |
The update mode is direct | |
FROM TABLE | |
titles | |
Nested iteration | |
Using Clustered Index | |
TO TABLE | |
titles | |
Query 3: | UPDATE titles |
SET price = $5.99 | |
WHERE title_id = "BU2075" | |
SHOWPLAN 3: | STEP 1 |
The type of query is UPDATE | |
The update mode is deferred | |
FROM TABLE | |
titles | |
Nested iteration | |
Using Clustered Index | |
TO TABLE | |
titles |
Note that the only difference between the second and third example queries is the column of the table that is being updated. In the second query, the direct update method is used, whereas in the third query, the deferred method is used. This difference is due to the datatype of the column being updated: the titles.type column is defined as "char(12) NOT NULL", while the titles.price column is defined as "money NULL". Since the titles.price column is not a fixed-length datatype, the direct method cannot be used.
This statement appears in the SHOWPLAN output for any query that contains a GROUP BY clause. Queries that contain a GROUP BY clause will always be at least two-step queries: one step to select the qualifying rows into a worktable and group them, and another step to return the rows from the worktable. The following example illustrates this:
Query: | SELECT type, AVG(advance), SUM(ytd_sales) |
FROM titles | |
GROUP BY type | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT (into a worktable) | |
GROUP BY | |
Vector Aggregate | |
FROM TABLE | |
titles | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
FROM TABLE | |
Worktable | |
Nested iteration | |
Table Scan |
Transact-SQL includes the aggregate functions AVG(), COUNT(), COUNT(*), MAX(), MIN(), and SUM(). Whenever an aggregate function is used in a SELECT statement that does not include a GROUP BY clause, it produces a single value, regardless of whether it is operating on all of the rows in a table or on a subset of the rows defined by a WHERE clause. When an aggregate function produces a single value, the function is called a "scalar aggregate", and is listed as such by SHOWPLAN. The following example shows the use of scalar aggregate functions:
Query: | SELECT AVG(advance), SUM(ytd_sales) |
FROM titles | |
WHERE type = "business" | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT | |
Scalar Aggregate | |
FROM TABLE | |
titles | |
Nested iteration | |
Table Scan | |
STEP 2 | |
The type of query is SELECT | |
Table Scan |
Notice that SHOWPLAN considers this a two-step query, which is very similar to the SHOWPLAN from the GROUP BY query listed earlier. Since the query contains a scalar aggregate, which will return a single value, SQL Server keeps internally a "variable" to store the result of the aggregate function. It can be thought of as a temporary storage space to keep a running total of the aggregate function as the qualifying rows from the table are evaluated. After all rows have been evaluated from the table (Step 1), the final value from the "variable" is then selected (Step 2) to return the scalar aggregate result.
When a GROUP BY clause is used in a query that also includes an aggregate function, the aggregate function produces a value for each group. These values are called "vector aggregates". The "Vector Aggregate" statement from SHOWPLAN indicates that the query includes a vector aggregate. Below is an example query and SHOWPLAN that includes a vector aggregate:
Query: | SELECT title_id, AVG(qty) |
FROM sales | |
GROUP BY title_id | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT (into a worktable) | |
GROUP BY | |
Vector Aggregate | |
FROM TABLE | |
sales | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
FROM TABLE | |
Worktable | |
Nested iteration | |
Table Scan |
This SHOWPLAN step indicates the table that the query is reading from. In most queries, the "FROM TABLE" will be followed on the next line by the name of the table that is being selected from. In other cases, it may indicate that it is selecting from a worktable (discussed later). The main importance of examining the table names after the "FROM TABLE" output is to determine the order in which the query optimizer is joining the tables. The order of the tables listed after the "FROM TABLE" statements in the SHOWPLAN output indicate the same order that the tables were joined; this order may be (and often times is) different than the order that they are listed in the FROM clause of the query, or the order that they appear in the WHERE clause of the query. This is because the query optimizer examines all different join orders for the tables involved, and picks the join order that will require the least amount of I/Os.
Query: | SELECT authors.au_id, au_fname, au_lname |
FROM authors, titleauthor, titles | |
WHERE authors.au_id = titleauthor.au_id | |
AND titleauthor.title_id = titles.title_id | |
AND titles.type = "psychology" | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
titles | |
Nested iteration | |
Table Scan | |
FROM TABLE | |
titleauthor | |
Nested iteration | |
Table Scan | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan |
This query illustrates the order in which the SQL Server query optimizer chooses to join the tables, which is not the order that they were listed in the FROM clause or the WHERE clause. By examining the order of the "FROM TABLE" statements, it can be seen that the qualifying rows from the titles table are first located (using the search clause <titles.type = "psychology">). Those rows are then joined with the titleauthor table (using the join clause <titleauthor.title_id = titles.title_id>), and finally the titleauthor table is joined with the authors table to retrieve the desired columns (using the join clause <authors.au_id = titleauthor.au_id>).
When a command is issued that makes or attempts to make a modification to one or more rows of a table, such as INSERT, DELETE, UPDATE, or SELECT INTO, the "TO TABLE" statement will show the target table that is being modified. For some operations that require an intermediate step that inserts rows into a worktable (discussed later), the "TO TABLE" will indicate that the results are going to the "Worktable" table, rather than a user table. The following examples illustrate the use of the "TO TABLE" statement:
Query 1: | INSERT sales |
VALUES ("8042", "QA973", "7/15/92", 7, "Net 30", "PC1035") | |
SHOWPLAN 1: | STEP 1 |
The type of query is INSERT | |
The update mode is direct | |
Table Scan | |
TO TABLE | |
sales | |
Query 2: | UPDATE publishers |
SET city = "Los Angeles" | |
WHERE pub_id = "1389" | |
SHOWPLAN 2: | STEP 1 |
The type of query is UPDATE | |
The update mode is deferred | |
FROM TABLE | |
publishers | |
Nested iteration | |
Using Clustered Index | |
TO TABLE | |
publishers |
Notice that the SHOWPLAN for the second query indicates that the publishers table is used both as the "FROM TABLE" as well as the "TO TABLE". In the case of UPDATE operations, the optimizer needs to read the table that contains the row(s) to be updated, resulting in the "FROM TABLE" statement, and then needs to modify the row(s), resulting in the "TO TABLE" statement.
For some types of queries, such as those that require the results to be ordered or displayed in groups, the SQL Server query optimizer may determine that it is necessary to create its own temporary worktable. The worktable is used to hold the intermediate results of the query, at which time the result rows can be ordered or grouped, and then the final results selected from that worktable. When all results have been returned, the worktable is automatically dropped. The worktables are always created in the Tempdb database, so it is possible that the system administrator may have to increase the size of Tempdb to accommodate the queries that require very large worktables. Since the query optimizer creates these worktables for its own internal use, the names of the worktables will not be listed in the tempdb..sysobjects table.
Worktables will always need to be used when a query contains a GROUP BY clause. For queries involving ORDER BY, it is possible that the ordering can be done without the use of the worktable. If there is a clustered index on the column(s) in the ORDER BY clause, the optimizer knows that the rows are already stored in sorted order, so a sort in a worktable is not necessary (although there are exceptions to this, depending on the sort order that is installed on the server). Since the data is not stored in sorted order for nonclustered indexes, the worktable will not be necessary if the cheapest access plan is by using the nonclustered index. However, if the optimizer determines that scanning the entire table will require fewer I/Os than using the nonclustered index, then a worktable will need to be created for the ordering of the results. The following examples illustrate the use of worktables:
Query 1: | SELECT type, AVG(advance), SUM(ytd_sales) |
FROM titles | |
GROUP BY type | |
SHOWPLAN 1: | STEP 1 |
The type of query is SELECT (into a worktable) | |
GROUP BY | |
Vector Aggregate | |
FROM TABLE | |
titles | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
FROM TABLE | |
Worktable | |
Nested iteration | |
Table Scan | |
Query 2: | SELECT * |
FROM authors | |
ORDER BY au_lname, au_fname | |
SHOWPLAN 2: | STEP 1 |
The type of query is INSERT | |
The update mode is direct | |
Worktable created for ORDER BY | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
This step involves sorting | |
FROM TABLE | |
Worktable | |
Using GETSORTED | |
Table Scan | |
Query 3: | SELECT * |
FROM authors | |
ORDER BY au_id | |
SHOWPLAN 3: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan |
In the third example above, notice that no worktable was created for the ORDER BY clause. This is because there is a unique clustered index on the authors.au_id column, so the data is already stored in sorted order based on the au_id value, and an additional sort for the ORDER BY is not necessary. In the second example, there is a composite nonclustered index on the columns au_lname and au_fname. However, since the optimizer chose not to use the index, and due to the sort order on the SQL Server, a worktable needed to be created to accommodate the sort.
SQL Server's SELECT INTO operation performs two functions: it first creates a table with the exact same structure as the table being selected from, and then it insert all rows that meet the WHERE conditions (if a WHERE clause is used) of the table being selected from. The "Worktable created for SELECT_INTO" statement is slightly misleading, in that the "worktable" that it refers to is actually the new physical table that is created. Unlike other worktables, it is not dropped when the query finishes executing. In addition, the worktable is not created in Tempdb, unless the user specifies Tempdb as the target database for the new table.
Query: | SELECT * |
INTO seattle_stores | |
FROM stores | |
WHERE city = "seattle" | |
SHOWPLAN: | STEP 1 |
The type of query is TABCREATE | |
STEP 2 | |
The type of query is INSERT | |
The update mode is direct | |
Worktable created for SELECT_INTO | |
FROM TABLE | |
stores | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable |
When a query is issued that includes the DISTINCT keyword, all duplicate rows are excluded from the results so that only unique rows are returned. To accomplish this, SQL Server first creates a worktable to store all of the results of the query, including duplicates, just as though the DISTINCT keyword was not included. It then sorts the rows in the worktable, and is able to easily discard the duplicate rows. Finally, the rows from the worktable are returned, which insures that no duplicate rows will appear in the output.
Query: | SELECT DISTINCT city |
FROM authors | |
SHOWPLAN: | STEP 1 |
The type of query is INSERT | |
The update mode is direct | |
Worktable created for DISTINCT | |
FROM TABLE | |
authors | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
This step involves sorting | |
FROM TABLE | |
Worktable | |
Using GETSORTED | |
Table Scan |
As discussed previously, queries that include an ORDER BY clause will often require the use of a temporary worktable. When the optimizer cannot use an available index for the ordering, it creates a worktable for use in sorting the result rows prior to returning them. Below is an example that shows the worktable being created for the ORDER BY clause:
Query: | SELECT * |
FROM authors | |
ORDER BY city | |
SHOWPLAN: | STEP 1 |
The type of query is INSERT | |
The update mode is direct | |
Worktable created for ORDER BY | |
FROM TABLE | |
authors | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
This step involves sorting | |
FROM TABLE | |
Worktable | |
Using GETSORTED | |
Table Scan |
When joining tables, SQL Server may in some cases choose to use a "reformatting strategy" to join the tables and return the qualifying rows. This strategy is only considered as a last resort, when the tables are large and neither table in the join has a useful index to use. The reformatting strategy inserts the rows from the smaller of the two tables into a worktable. Then, a clustered index is created on the worktable, and the clustered index is then used in the join to retrieve the qualifying rows from each table. The main cost in using the reformatting strategy is the time and I/Os necessary to build the clustered index on the worktable; however, that cost is still cheaper than joining the tables with no index. If user queries are using the reformatting strategy, it is generally a good idea to examine the tables involved and create indexes on the columns of the tables that are being joined. The following example illustrates the reformatting strategy. Since none of the tables in the Pubs database are large enough for the optimizer to consider using this strategy, two new tables are used. Each table has 5 columns defined as "char(200)". Tab1 has 500 rows and Tab2 has 250 rows.
Query: | SELECT Tab1.col1 |
FROM Tab1, Tab2 | |
WHERE Tab1.col1 = Tab2.col1 | |
SHOWPLAN: | STEP 1 |
The type of query is INSERT | |
The update mode is direct | |
Worktable created for REFORMATTING | |
FROM TABLE | |
Tab2 | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
FROM TABLE | |
Tab1 | |
Nested iteration | |
Table Scan | |
FROM TABLE | |
Worktable | |
Nested iteration | |
Using Clustered Index |
This SHOWPLAN statement indicates that the query must sort the intermediate results before returning them to the user. Queries that specify DISTINCT will require an intermediate sort, as well as queries that have an ORDER BY clause that cannot use an available index. As stated earlier, the results are put into a worktable, and the worktable is then sorted. The following example demonstrates a query that requires a sort:
Query: | SELECT DISTINCT state |
FROM stores | |
SHOWPLAN: | STEP 1 |
The type of query is INSERT | |
The update mode is direct | |
Worktable created for DISTINCT | |
FROM TABLE | |
stores | |
FROM TABLE | |
stores | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
This step involves sorting | |
FROM TABLE | |
Worktable | |
Using GETSORTED | |
Table Scan |
This statement indicates one of the ways in which the result rows can be returned from a table. In the case of "Using GETSORTED", the rows will be returned in sorted order. However, not all queries that return rows in sorted order will have this step. In the case of a query that has an ORDER BY clause, and an index with the proper sort sequence exists on those columns being ordered, an intermediate sort may not be necessary, and the rows can simply be returned in order by using the available index. The "Using GETSORTED" method is used when SQL Server must first create a temporary worktable to sort the result rows, and then return them in the proper sorted order. The following example shows a query that requires a worktable to be created and the rows returned in sorted order:
Query: | SELECT au_id, au_lname, au_fname, city |
FROM authors | |
ORDER BY city | |
SHOWPLAN: | STEP 1 |
The type of query is INSERT | |
The update mode is direct | |
Worktable created for ORDER BY | |
FROM TABLE | |
authors | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan | |
TO TABLE | |
Worktable | |
STEP 2 | |
The type of query is SELECT | |
This step involves sorting | |
FROM TABLE | |
Worktable | |
Using GETSORTED | |
Table Scan |
The "Nested iteration" is the default technique used to join tables and/or return rows from a table. It simply indicates that the optimizer is using one or more sets of loops to go through a table and retrieve a row, qualify the row based on the search criteria given in the WHERE clause, return the row to the front-end, and loop again to get the next row. The method in which it gets the rows (such as using an available index) is discussed later. The following example shows the optimizer doing nested iterations through each of the tables in the join:
Query: | SELECT title_id, title |
FROM titles, publishers | |
WHERE titles.pub_id = publishers.pub_id | |
AND publishers.pub_id = '1389' | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
publishers | |
Nested iteration | |
Using Clustered Index | |
FROM TABLE | |
titles | |
Nested iteration | |
Table Scan |
This SHOWPLAN step is very similar to the previous one of "Nested iteration". The difference, however, is that this step indicates a nested iteration on a table that is part of an existence test in a query. There are several ways an existence test can be written in Transact-SQL, such as "EXISTS", "IN", or "=ANY". Prior to SQL Server version 4.2, queries that contained an IN clause followed by a subquery were treated as table joins. Beginning with version 4.2, these queries are now treated the same as if they were written with an EXISTS clause. The following examples demonstrate the SHOWPLAN output with queries that test for existence of values:
Query 1: | SELECT au_lname, au_fname |
FROM authors | |
WHERE EXISTS | |
(SELECT * | |
FROM publishers | |
WHERE authors.city = publishers.city) | |
SHOWPLAN 1: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan | |
FROM TABLE | |
publishers | |
EXISTS TABLE : nested iteration | |
Table Scan | |
Query 2: | SELECT title |
FROM titles | |
WHERE pub_id IN | |
(SELECT pub_id | |
FROM publishers | |
WHERE city LIKE "B%") | |
SHOWPLAN 2: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
titles | |
Nested iteration | |
Table Scan | |
FROM TABLE | |
publishers | |
EXISTS TABLE : nested iteration | |
Table Scan |
This SHOWPLAN statement indicates which method was used to retrieve the physical result rows from the given table. When the "table scan" method is used, the execution begins with the first row in the table; each row is then retrieved and compared with the conditions in the WHERE clause, and returned to the front-end if it meets the given criteria. Regardless of how many rows qualify, every row in the table must be looked at, so for very large tables, a table scan can be very costly in terms of page I/Os. If a table has one or more indexes on it, the query optimizer may still choose to do a table scan instead of using one of the available indexes if the optimizer determines that the indexes are too costly or are not useful for the given query. The following query shows a typical table scan:
Query: | SELECT au_lname, au_fname |
FROM authors | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
authors | |
Nested iteration | |
Table Scan |
This SHOWPLAN statement indicates that the query optimizer chose to use the clustered index on a table to retrieve the rows. Unlike a table scan, using an index to retrieve rows does not require the optimizer to examine every row in the table (unless the WHERE clause applies to all rows). For queries that return a small percentage of the rows from a large table, the savings in terms of I/Os of using an index versus doing a table scan can be very significant. The following query shows the clustered index being used to retrieve the rows from the table:
Query: | SELECT title_id, title |
FROM titles | |
WHERE title_id LIKE "PS2%" | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
titles | |
Nested iteration | |
Using Clustered Index |
Like the previous statement with the clustered index, this statement indicates that the optimizer chose to use an index to retrieve the rows instead of doing a table scan. The <index name> that follows the "Index :" label will always be the name of a nonclustered index on the table. Remember that each table can have no more than one clustered index, but can have up to 249 nonclustered indexes. The following query illustrates the use of a nonclustered index to find and return rows. This query uses the sysobjects table in the master database as an example, rather than a table in Pubs, since using a nonclustered index on the Pubs tables is generally more costly in terms of I/O than a straight table scan, due to the fact that most of the tables are only 1 page in size.
Query: | SELECT * |
FROM master..sysobjects | |
WHERE name = "mytable" | |
AND uid = 5 | |
SHOWPLAN: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
master..sysobjects | |
Nested iteration | |
Index : ncsysobjects |
This SHOWPLAN statement indicates that the query optimizer has chosen to build its own index during the execution of the query, for use in its "OR strategy". Since queries involving OR clauses are generally not very efficient in terms of being able to quickly access the data, the SQL Server optimizer may choose to use the OR strategy. When the OR strategy is used, the optimizer makes several passes through the table—one pass for each argument to each OR clause. The results of each pass are added to a single worktable, and the worktable is then sorted to remove any duplicate rows. The worktable does not contain the actual data rows from the table, but rather it contains the row IDs for the matching rows. The row IDs are simply a combination of the page number and row number on that page for each of the rows.
When the duplicates have been eliminated, the optimizer considers the worktable of row IDs to be, essentially, its own index ("Dynamic Index") pointing to the table's data rows. It can then simply scan through the worktable, get each row ID, and return the data row from the table that has that row ID.
The OR strategy is not limited only to queries that contain OR clauses. When an IN clause is used to list a group of possible values, SQL Server interprets that the same way as though the query had a separate equality clause for each of the values in the IN clause. To illustrate the OR strategy and the use of the Dynamic Index, the queries will be based on a table with 10,000 unique data rows, a unique nonclustered index on column "col1", and a unique nonclustered index on column "col2".
Query 1: | SELECT * |
FROM Mytable | |
WHERE col1 = 355 | |
OR col2 = 732 | |
SHOWPLAN 1: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
Mytable | |
Nested iteration | |
Index : col1_idx | |
FROM TABLE | |
Mytable | |
Nested iteration | |
Index : col2_idx | |
FROM TABLE | |
Mytable | |
Nested iteration | |
Using Dynamic Index | |
Query 2: | SELECT * |
FROM Mytable | |
WHERE col1 IN (700, 1503, 311) | |
SHOWPLAN 2: | STEP 1 |
The type of query is SELECT | |
FROM TABLE | |
Mytable | |
Nested iteration | |
Index : col1_idx | |
FROM TABLE | |
Mytable | |
Nested iteration | |
Index : col1_idx | |
FROM TABLE | |
Mytable | |
Nested iteration | |
Index : col1_idx | |
FROM TABLE | |
Mytable | |
Nested iteration | |
Using Dynamic Index |
SQL Server does not always resort to using the OR strategy for every query that contains OR clauses. The following conditions must be met before it will choose to use the OR strategy: