Just-In-Time Record Set Creation

Bob Pfeiff and Al Noel

This article explains a very fast, scalable technique for creating and retrieving record sets -- even when dealing with a very large number of rows.

Part of Microsoft's Universal Data Access Strategy is the API known as ActiveX Data Objects (ADO). ADO has been designed to be a single data interface for both client/server and Web-based data-driven development. What might surprise you is that ADO, which requires less code than other techniques such as Remote Data Objects (RDO) or VB SQL, still offers good performance. Nevertheless, we all know that there are high data volume situations where any data access technology will be stretched to the limit. The "generic" approach that relies on default values rarely scales. In this article, we'll describe an easy-to-implement solution for tackling situations with large record sets -- a way to scale up the use of ADO.

Permit us to wax philosophic for a moment and note that scalability isn't just a modern problem. Arnold Pacey, author of The Maze of Ingenuity Ideas and Idealism in the Development of Technology, for example, comments on Renaissance-era scalability problems:

"Before Galileo left the Venetian Republic in 1610, he tackled another problem which had been confusing local artisans and machine-builders. Some of these people had made great claims for machines they had invented and had demonstrated their ideas by constructing models. But there were often disappointments, for it was found that full-sized machines made on the same pattern as the models were often not strong enough to perform their function, or failed in other ways.

"More experienced craftsmen, and men like Lorini, the architect of Palmanova, knew that there was a fundamental difference between models and fully scaled up versions of the same design but could not entirely explain the reason for it."

The architects' theories, concerned with the aesthetics and not the strengths of buildings, were still supposed to be founded on laws of nature and had sufficient authority to discourage serious thought about the effects of the size of a structure as opposed to just its shape.

The scale of some data retrieval requirements we're familiar with put the application concerned well beyond a model or prototype, and meeting the associated performance goals has been an interesting challenge. We've encountered situations where a SQL Server database is tasked to return a large number of rows -- for example, 100,000+ -- to a client application and to also allow the user to be able to "page" back and forth through these rows, browsing them. Now, before anyone yells, "Huh? Did you think to ask if this is a valid requirement?", yes, indeed, we did, and the answer received back from the customer has been, on more than one occasion, "Make it happen." (As we've said for years, nothing is impossible for the person who doesn't have to do it himself.) When you can't successfully push back, you have to forge ahead.

In such a situation, it's important to come to a reasonable understanding. Our refined version of the requirement is that the end user wishes to be able to see a small set of rows (limited to whatever will fit on the screen), and that small set can be changed very quickly. The point is that requiring that all the rows that might have to appear on the screen be retrieved at the same time is dictating a solution. As long as the user can see a small subset displayed, with very little delay when the data to be displayed on the screen changes, then it's not necessary to bring back all 100,000 rows. As long as the rows to be viewed get to the screen "just-in-time," then the requirement has been met. Because this "just-in-time" solution tasks the system to provide only what's needed for the user to observe at that very moment -- and no more -- it's very efficient. Only what's needed right now is retrieved, transported across the network, and displayed. Indeed, this approach has features at the server and the client that make this possible.

At the server, in the database, certain conditions must exist. For our example, we use the Jobs table of the familiar Pubs database. The Jobs table has a clustered index on the job_id column and the values in that column increase monotonically. This is an easy situation to exploit, but the existence of a clustered index isn't an essential condition. Others might choose to contrive a similar situation by using an identity column. The techniques will work equally well where there's a non-clustered index, as long as the high order column of the non-clustered index is unique. For example, if a table has order information in it, and the column order_id (of the integer data type) is the primary key but isn't clustered, the approach will work if the order_id is the only value retrieved in the query (the query is covered). The order_id values retrieved can then be used in the WHERE clause of a second query to retrieve other columns to display. Also, a non-clustered index of a composite nature -- such as (order_id, order_date) -- could be used if the stored procedures we use only look for order_id for a range of dates. We caution you that case sensitivity will impact this solution if the key in use is character data.

For the server component, our solution uses two stored procedures: sp_getnextpage and sp_getprevpage. In our example, we use SET ROWCOUNT and a WHERE clause to limit the number of rows retrieved (15, in our example) rather than try to build a result set that contains all rows. The result limits network traffic and places minimal strain on both client (which only has to cope with screen-size chunks of data) and server (the query optimizer takes advantage of SET ROWCOUNT) resources. The key to retrieving the previous pages is the use of a small temporary table of tempdb. If you want, you can go further with the concept and bring back only key values, and then run a second query to bring all the columns to the client.

In contrast, paging using the .PageSize, .PageCount, and .AbsolutePage properties in ADO relies on a keyset-driven cursor, so if you want to browse all rows in a big table, you get one of two things: a huge keyset table in the SQL Server tempdb (specifically, an API server cursor) that contains keys to every row in the table(s), or a huge structure on the client (IIS Server when using Active Server Pages) that also contains keys to every row in the table(s). As you can imagine, the results are less than desirable.

The code
Here's the code for the sp_getnextpage stored procedure. Note that the @id number parameter is what points you to the correct set of rows:

create proc sp_getnextpage
@id int = 0,
@rows int = 0
as
set nocount on
set rowcount @rows
SELECT
  j.job_id,
  j.job_desc,
  j.min_lvl,
  j.max_lvl
FROM 
  jobs j 
WHERE 
  j.job_id > @id
ORDER BY
  j.job_id
set rowcount 0
set nocount off
GO


The code for the stored procedure sp_getprevpage is available in the accompanying Download file. Note that it's necessary not only o pass the @id parameter from which you wish to retrieve records above, but also to specify a descending option to the ORDER BY clause that puts the rows into the temp table. From there, it's just a matter of now retrieving them out in the default, ascending order.

Here's the VB code that lets the user browse backward and forward in the result set using VCR buttons and a list box (see Figure 1). This subroutine gets pages based on a direction argument used in the call (next or previous). We save public variables for the maximum and minimum primary key values in the current recordset as references for navigating through the virtual recordset (the entire jobs table of the pubs database).



Private Sub GetPage(ByVal Direction As Integer)
Dim strSQL As String
Dim strResults As String
Dim adoCmd As adodb.Command
Dim adoRS As adodb.Recordset
Dim adoParam As adodb.Parameter
Dim adoField As adodb.Field
Dim intInput As Long
Set adoRS = New adodb.Recordset
'Open a recordset to get a rowcount for the jobs table.
'This is a faster way to get a rowcount than "Select 
'count(*) from jobs".
With adoRS
   .ActiveConnection = adoConn
   .Source = "select rows from sysindexes where id = _
      object_id('jobs') and indid < 2"
   .Open
End With
plngRows = adoRS(0)
adoRS.Close
Set adoRS = Nothing
'Get the number of pages based on the number of rows
'and the rows per page setting.
plngPages = CLng(plngRows / plngRowsPerPage)
'Correct the number of pages for the rounding 
'of the CInt function.
If (plngRows / plngRowsPerPage) > plngPages Then
    plngPages = plngPages + 1
End If
'If we're on the last page, don't try to move next. 
'If we're on the first page, don't move to previous.
If (plngPage = plngPages And Direction = MOVE_NEXT) _
   Or (plngPage = 1 And Direction = MOVE_PREV) Then
   Exit Sub
End If


Here's the event procedure to move to the very first page:

Private Sub cmdFirstPage_Click(Index As Integer)
'Move to the first "page" by setting the starting 
'point primary key value to 0 and the page # to 1.

plngMax = 0
plngPage = 1
GetPage MOVE_NEXT
'Set the page number text box value.
pstrPages = "Page " & plngPage & " of " & _
   Str(plngPages)
txtPageNo = pstrPages
End Sub


Here's the event procedure code for moving to the next page (the code for moving backwards is similar):

Private Sub cmdNextPage_Click(Index As Integer)
'Move to the next "page".
GetPage MOVE_NEXT
'If we're on the last page, then don't change the page
'number text box value.
If plngPage = plngPages Then
   Exit Sub
Else
   plngPage = plngPage + 1
   pstrPages = "Page " & plngPage & " of " & _
      Str(plngPages)
   txtPageNo = pstrPages
End If
End Sub


The key to going to the last page is to set the starting point primary key value to the appropriate value based on the rows per page setting, and then calculating the number of pages. You can view the source code in the accompanying Download file.

Download sample code for this article here.

Bob Pfeiff , an MCSD and SQL Server MVP, works for Spectrum Technology Group as a senior consultant and project manager in McLean, VA. Bob authored a chapter in SQL Server 6.5 Unleashed (third edition) and is writing a chapter for the forthcoming SQL Server 7.0 version. bobp@spectrumtech.com.

Al Noel, an MCSD, MCSE, and MCT, works for Spectrum as a consultant. Al has been in the computer business for 18 years, including a year at the AI lab at MIT as a visiting scientist. anoel@netcom.com.