Views and FASTFIRSTROW

Kalen Delaney

Can you tell me if views are optimized like stored procedures are, in terms of being precompiled?

Views are pre-parsed and normalized, and an internal form of the code called a "query tree" is saved in sysprocedures, but they're not precompiled. For views, SQL Server doesn't store any optimized execution plans in procedure cache for reuse by other processes. In fact, SQL Server regenerates execution plans for views every time the view is accessed -- and that's logical, because as "virtual" tables, only their definitions are stored.

The query trees for views are brought into cache when a view is accessed, and you can see this by running DBCC MEMUSAGE. The third section of output shows you the 20 largest objects in procedure cache, so if you already have lots of large procedures taking up room, you might not see your views. They typically take up very little space. If you've just stopped and restarted your SQL Server -- so that cache is empty -- you'll have a much better chance of seeing the DBCC MEMUSAGE report for any views you've accessed. Here's what the DBCC MEMUSAGE output for a view called "Cookbooks" might look like:

Procedure Name: Cookbooks

Database Id: 4

Object Id: 1968010042

Version: 0

Uid: 1

Type: view

Number of trees: 1

Size of trees: 0.002647 Mb, 2776.000000 bytes, 2 pages

Number of plans: 0

Size of plans: 0.000000 Mb, 0.000000 bytes, 0 pages

Notice that although the output says "Procedure Name", a bit further down the output reports that this object is indeed a view. We can see that we have a tree for this view, but no plans. You should never see any plans for a view in cache, as they're never saved. As you can probably surmise, the reasons for using views typically don't include any performance benefits, but rather simplification and/or security. For example, if you have a SELECT statement that you use frequently, or that's complex, storing that statement as a view means that you'll do less coding every time you want to use that statement. It's not hard to imagine how views can be used to control data access -- if there are rows or columns in a table that shouldn't be visible to all users, you can create a view that selects only those rows or columns that the users are allowed to see. You can then give the users permission to access the view, but not give them permission to select from the underlying table. If the owner of the view and the table are the same, the user can access the view data without encountering any permission violations. The user won't be able to access the table directly.

For example, suppose you only wanted users to see cookbooks from the titles table in the pubs database. In fact, you don't even want them to know what other types of books are available. You could create a view using the following statement:

create view Cookbooks

as

select * from titles

where type like '%cook%'

Once you give SELECT permission on this view to your users, they can treat this view as if it were a table. They can access subsets of the rows or subsets of the columns, or they can join the view with another table or view. Every time they access the view, SQL Server will expand it to the underlying SELECT statement. Suppose the user issues the following SELECT:

select title_id, price

from Cookbooks

SQL Server will add the extra WHERE clause from the view definition, so it's really executing the following:

select title_id, price from titles where type like '%cook%'

Or, if you execute a query that groups by the type column, you'd expect to only see the types "mod_cook" and "trad_cook". The first SELECT:

select type, avg(price)

from Cookbooks

group by type

will be processed as:

select type, avg(price) from titles where type like '%cook%' group by type

The output will only show two rows -- for the two types of cookbooks. There is, however, one problem with assuming that views will provide complete security. There's an option to the SQL GROUP BY clause that allows you to see all the different values in the grouped column, even if the WHERE clause would have eliminated all the rows containing those values. This is fine in a normal SELECT from a table:

select type, avg(price)

from titles

where type like '%cook%'

group by ALL type

The preceding select would return something like this:

type                                    

------------ -------------------------- 

business     (null)                     

mod_cook     15.29                      

popular_comp (null)                     

psychology   (null)                     

trad_cook    21.25                      

UNDECIDED    (null)                     

(6 row(s) affected)

Note that we get a row back for every different type, even though the average price reported for all non-cookbooks is NULL. Those rows haven't been included in any aggregate computation, but the value for type does show up. In some cases, this can be useful; we might want to see all the possible types, but only get the average for rows that meet a certain condition in the WHERE clause. However, if we use this same GROUP BY with our view, we'll also get the previously shown output, including the rows for non-cookbooks!

select type, avg(price) from Cookbooks group by ALL type

So, in this case, even though we have the restriction in the view to disallow users from ever seeing type names other than for cookbooks, by adding the ALL keyword, we've basically found a "back door" into the data.

Why doesn't the SQL Server optimizer decide to use my index when I execute a SELECT with ORDER BY? I have an index created on the column I'm sorting by, but the optimizer doesn't use it. Using the optimizer hint to force the index seems to work, but I don't think that's the best solution. Do you have any other ideas?

Is this a clustered or nonclustered index? How do you know that the index isn't being used? The only way to know for sure is to look at the output from SET SHOWPLAN ON. For a clustered index, the data will automatically be in the right order, and no sorting at all will have to be done. Let's compare the SHOWPLAN output for two different ORDER BY queries. In the first one, the titles table in the pubs database has a clustered index on title-id, and we'll sort by that column. In the second query, we'll sort by price -- which has no index at all. Here's the input:

SET SHOWPLAN ON

SET NOEXEC ON

go

select * from titles

order by title_id

go

select * from titles

order by price

go

Here's the output for the first query:

STEP 1

The type of query is SELECT

FROM TABLE

titles 

Nested iteration

Table Scan

All SQL Server is doing in this case is scanning the table. It doesn't tell us that it's using the index, because technically it isn't. It knows the clustered index sorts the data itself according to the clustered index column, so that by doing a simple table scan, the data will be returned in the desired order. We can compare this to the output for the second query:

STEP 1

The type of query is INSERT

The update mode is direct

Worktable created for ORDER BY

FROM TABLE

titles 

Nested iteration

Table Scan

TO TABLE

Worktable 1

STEP 2

The type of query is SELECT

This step involves sorting

FROM TABLE

Worktable 1

Using GETSORTED Table Scan

Notice that when there's no clustered index to pre-sort the data, SQL Server must do a lot of work. It's actually building a temporary table to hold the result set, and then sorting that temporary table. The fact that SQL Server didn't have to do all this work in the first query, even though we specified ORDER BY, means that SQL Server is taking advantage of the clustered index.

A nonclustered index is organized differently than a clustered index. The leaf level of the (nonclustered) index does have all the key values stored in sorted order, but it only has the key values, and not the rest of the data in each row. If SQL Server were to use the ordering in the nonclustered index, it would have to scan the leaf level of the index. For every single value, it would then have to follow a pointer to the referenced page in the table (and possibly retrieve that page from disk) in order to return the complete contents of the row. (Our query did ask for all the columns by specifying SELECT *.)

SQL Server uses a very fast and efficient sorting algorithm. It turns out that the total time needed to build the temporary table and completely sort it is actually less than the time needed to follow all the pointers from the leaf level of the nonclustered index. Sorting is fast; page accesses that potentially require lots of disk reads aren't.

However, if SQL Server is doing a complete sort of the data, no results will be returned until the entire sort is completed. If you have a client application that needs to display some data on a form, you might want to be able to return the first few rows quickly. There's an optimizer hint that can help with this. The optimizer hint for specifying a particular index isn't intended to be used for indicating a sorting method. Instead, there's an optimizer hint called FASTFIRSTROW that tells SQL Server that you want to follow the leaf level pointers so the first few rows can be returned immediately. In fact, the rows will all be returned as they're read from the data pages, and they'll be read in sorted order.

The total time to completely read all the rows in sorted order might be much longer than the alternative of sorting the data first, but if you want to start getting data back immediately, use the FASTFIRSTROW hint. The following example shows the use of the hint with the titles table, which has a nonclusterered index on the title column:

SET SHOWPLAN ON

SET NOEXEC ON

go

select * from titles (FASTFIRSTROW)

order by title

go

Here's the output:

STEP 1

The type of query is SELECT

FROM TABLE

titles 

Nested iteration

Index : titleind

If you were to use this hint on a very large table, you'd actually see the rows start coming back immediately. Without the FASTFIRSTROW hint, you'd have a long delay while waiting for the sorting operation to complete.

We have automated processes that insert a great deal of data into tables on our 6.5 SQL Server. Every month, these processes create a new table and start inserting the new data into this table. Examples of the table names are Logs_011998, Logs_021998, and Logs_041998. Our problem is that there are numerous stored procedures that we need to run every month, but first we must modify all of them and change the table names to the previous month's table. We'd like to automate the process.

This is actually a very frequently asked question on Microsoft's SQL Server newsgroups, and you might want to get a newsreader (Microsoft's Outlook or Outlook Express work fine) and connect to the news server msnews.microsoft.com. No password is required. Search for newsgroups containing SQL. Lots of very experienced, helpful people regularly answer questions there, but keep in mind that it's all volunteer, peer-to-peer support.

The short answer to your question is that table names can only be parameterized if you use the "dynamic execute" capability. Following the word exec, you have a set of parentheses ( ), the contents of which must resolve to a complete SQL command. You can use concatenation inside the parentheses, but any function calls must be done prior to the exec and the results stored in variables. Your code would then look something like this:

create proc DoSomethingToTable @tname varchar(30)
as
...
exec ('insert into ' + @tname + ' values ( ........)' )

The exec is like a preprocessor. It will resolve the variable names and build the actual SQL command prior to sending it off to be processed by SQL Server.

Kalen Delaney started working with SQL Server in 1987, when she worked for Sybase in Tech Support. Since then, she's done SQL Server training both for Sybase and for Microsoft. Kalen currently provides independent training and consulting from her Seattle-area consultancy. Kalen_Delaney@compuserve.com.