Rob Macdonald
Have you ever considered using an ADO RecordSet as a data structure in place of an array or collection? Let's face it: ADO RecordSets provide built-in sorting, searching, and persistence capabilities. In this article, Rob shows you how the RecordSet stands up against more traditional data structures.
Yes, I realize that most people use ADO to interact with databases. But there's nothing stopping you from creating RecordSets in memory just like regular data structures and using your ADO skills to manipulate them. Personally, I've found the ability to sort, filter, and search RecordSets so valuable that I wanted to see how RecordSets stacked up against arrays and collections for a range of standard data processing tasks. This article shows you the tests I ran and what I discovered. Even if you don't like the idea of replacing more traditional data structures with RecordSets, the results should provide some interesting insights into how RecordSets behave with regular data, at least when you're using a client-side cursor.
From the top
In case you've never done it before, I'll quickly show you how to create a pseudo- or fabricated RecordSet: one that you build in code rather than obtain from a Provider. In order to run my performance tests, I defined a data structure with four columns: a unique string (different for each record/row), a non-unique string, a unique long, and a non-unique long. Not very imaginative, I know, but at least you'll know what I'm using in each test. The following code shows you how to define a fabricated RecordSet and stuff it with data:
Dim rsStandard As Recordset
Dim vColumns As Variant
Dim l As Long

Set rsStandard = New Recordset
rsStandard.Fields.Append "lUnique", adInteger
rsStandard.Fields.Append "sUnique", adVarChar, 20
rsStandard.Fields.Append "lNonUnique", adInteger
rsStandard.Fields.Append "sNonUnique", adVarChar, 20
rsStandard.Open
vColumns = Array("lUnique", "sUnique", _
    "lNonUnique", "sNonUnique")
For l = 0 To DATA_SIZE
    rsStandard.AddNew vColumns, _
        Array(usrBase(l).lUnique, usrBase(l).sUnique, _
        usrBase(l).lNonUnique, usrBase(l).sNonUnique)
Next
As you can see, I've created a standard RecordSet object and then appended fields to it, specifying the data type and size where appropriate. You can only do this to closed RecordSets, but once the fields are defined, you can open the RecordSet and then use AddNew to add the records. Note that when you're adding many records, it's a lot faster to use AddNew with the field names and values as arguments (as I did previously) than to specify each field value as a separate statement, DAO/RDO-style.
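For comparison, here's a sketch of the two AddNew styles side by side, using the same schema and the usrBase array from the listing above. The single-call form avoids a round of Fields collection lookups and change notifications for every field:

```vb
' Slower, DAO/RDO-style: one statement per field, then Update
rsStandard.AddNew
rsStandard("lUnique") = usrBase(l).lUnique
rsStandard("sUnique") = usrBase(l).sUnique
rsStandard("lNonUnique") = usrBase(l).lNonUnique
rsStandard("sNonUnique") = usrBase(l).sNonUnique
rsStandard.Update

' Faster: field names and values passed as arrays in a single call
rsStandard.AddNew _
    Array("lUnique", "sUnique", "lNonUnique", "sNonUnique"), _
    Array(usrBase(l).lUnique, usrBase(l).sUnique, _
          usrBase(l).lNonUnique, usrBase(l).sNonUnique)
```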
The contestants
Here are the contenders in the Data Structure Grand Prix:
1. Standard RecordSet: As created previously, with four columns.
2. Indexed RecordSet: This is a standard RecordSet, but with client-side indexes defined to speed up searching.
3. Standard collection: I've defined a VB class called TestObj with four properties to hold the four data fields already described. The Standard collection is a VB collection of instances of this class.
4. Keyed collection: This is a standard collection, but when objects are added to it, the unique string property value is used as a key in the collection. This will enhance search times when searching by the unique string.
5. Array of user-defined Types: A user-defined type called "TestData" holds the four field values. I then defined an array of this UDT.
6. Variant array: This is a 2-D variant-based data structure.
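To make the contestants concrete, here's a sketch of the UDT declaration and the keyed collection insertion. The type and class names (TestData, TestObj) come from the article; the variable names are my own:

```vb
' The UDT behind the array-based contestant
Private Type TestData
    lUnique As Long
    sUnique As String
    lNonUnique As Long
    sNonUnique As String
End Type

' Keyed collection: the unique string doubles as the collection key,
' which is what makes fast keyed retrieval possible later on
Dim colKeyed As New Collection
Dim oTest As TestObj
' ... create oTest and populate its four properties ...
colKeyed.Add oTest, oTest.sUnique
```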
The tests
I put each data structure through six tests using 10,000 records. Each data structure has the four fields/columns mentioned previously. To get accurate timings, some of the tests have been run 100 times, in which case I show the total, not the average. In each case, I've run the complete test four times and presented the average of the second, third, and fourth runs.
All of my tests were performed on a Pentium II running at 400 MHz with Windows 2000 Beta 3. I've also run the tests on smaller machines and older (all right, actually released!) operating systems, and the results scale pretty much as expected. All figures are based on compiled VB6 code. The usual warnings about relying too heavily on benchmarks apply as ever; nothing beats "suck it and see" when it comes to performance tuning.
Test 1: Creating data
Before you use a data structure, you need to be able to populate it. Each of the tests in this section places the "same" data into its data structure.
Data set size: 10,000    Iterations: 1

Standard RecordSet     2.87s
Indexed RecordSet      3.32s
Standard collection    2.07s
Keyed collection       2.22s
UDT array              0.03s
Variant array          1.53s
The results speak for themselves. UDT arrays are blisteringly fast to create. The others are all very slow by comparison, and RecordSets come out the worst.
Test 2: Iterating through data
In each of the following tests, the code loops through each record/row, copying its four fields into local variables:
Data set size: 10,000    Iterations: 1

Standard RecordSet     1.06s
Indexed RecordSet      1.08s
Standard collection    0.06s
Keyed collection       0.06s
UDT array              0.02s
Variant array          0.05s
Again, it doesn't look too good for the RecordSets.
Test 3: Finding one record based on one criterion
In each data structure, the sUnique field/column contains a unique string. The test creates a random string known to be one of those used in the data structure and searches the data structure for that string. The test is repeated 100 times to get a good figure.
Data set size: 10,000    Iterations: 100

Standard RecordSet     3.60s
Indexed RecordSet      0.03s
Standard collection    2.12s
Keyed collection       0.01s
UDT array              0.25s
Variant array          1.17s
This very common test puts a different slant on things. The Indexed RecordSet and the Keyed collection search at the speed of light, and even the UDT array is slow by comparison. As for the others, they look like they stumbled into some quicksand en route.
Of course, Keyed collections are designed specifically for this type of retrieval operation, and they appear to be a good bet at this stage. You might be wondering how the Indexed RecordSet performs so well. The time to create the Indexed RecordSet was slightly longer than the time for the Standard RecordSet (see Test 1). The extra time occurred because I asked the RecordSet to create a client-side index for each field. Please understand that this has nothing to do with database indexes; it's entirely a feature of the ADO client cursor library. When you build a client-side index for an ADO 2.x RecordSet that uses a client cursor, you'll benefit from greatly enhanced client-based searching and filtering. Here's the code for creating the four client-side indexes used in my Indexed RecordSet:
rsIndexed.Fields("lUnique").Properties("OPTIMIZE") _
= True
rsIndexed.Fields("sUnique").Properties("OPTIMIZE") _
= True
rsIndexed.Fields("lNonUnique").Properties("OPTIMIZE") _
= True
rsIndexed.Fields("sNonUnique").Properties("OPTIMIZE") _
= True
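One way to code the single-criterion search itself is with the RecordSet's Find method, which is what client-side indexes accelerate. Here's a sketch, where sTarget is an assumed variable holding one of the random strings known to be in the data:

```vb
' Search for a single record by the unique string column
rsIndexed.MoveFirst
rsIndexed.Find "sUnique = '" & sTarget & "'"
If Not rsIndexed.EOF Then
    ' Found: the current row is the matching record
End If
```

Note that Find accepts only a single criterion; for searches on more than one field, the Filter property (shown in Test 4) is the tool to reach for.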
Test 4: Finding n records based on one criterion
The lNonUnique field of each record/row is set to a random long between one and one-tenth of the data set size. There will therefore be about 10 records matching each possible value. This means that an iterative search can't stop on the first hit; it must search right through the data set to find all of the matches.
Data set size: 10,000    Iterations: 100

Standard RecordSet     3.42s
Indexed RecordSet      0.14s
Standard collection    3.12s
Keyed collection       3.18s
UDT array              0.03s
Variant array          0.81s
Again, the results show a very good performance for the Indexed RecordSet, but the Keyed collection no longer performs well. This is because a key has to be unique, so key-based searching isn't possible when more than one record might match the criteria. One result that surprised me was the staggering performance of the UDT array. In uncompiled code, it was significantly slower than the Indexed RecordSet. This result shows just how efficient a 32-bit compiler is at handling longs. It also shows that none of these compiler optimizations can apply when variants are used.
For this test, the collection- and array-based searches must perform an iterative search. However, the RecordSet-based search doesn't require this sort of coding. For RecordSets, all you need to do is set the Filter property of the RecordSet to the criteria. You can then use standard iteration code to go through the RecordSet, and you'll only see the records that meet the criteria. The following code shows how easy this is:
Dim fd As ADODB.Field
rsIndexed.Filter = "lNonUnique = " & _
    CLng(Rnd * (DATA_SIZE / 10))
Set fd = rsIndexed("lNonUnique")
While Not rsIndexed.EOF
    ' use fd.Value and the other fields here
    rsIndexed.MoveNext
Wend
rsIndexed.Filter = adFilterNone
Before the Filter property is set, the RecordCount property of this RecordSet would be equal to the full data set size (10,000 in my trials). After the filter is set, and up until it's reset in the last line, the RecordCount property will be approximately 10 (representing the records visible through the filter), depending on how the values were randomly allocated when the RecordSet was created.
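You can watch this happen in the Immediate window. The following sketch uses an arbitrary filter value of 42; with the random data described above, the filtered count will be roughly 10 rather than exactly 10:

```vb
Debug.Print rsIndexed.RecordCount   ' full data set: 10,000
rsIndexed.Filter = "lNonUnique = 42"
Debug.Print rsIndexed.RecordCount   ' only the visible rows: about 10
rsIndexed.Filter = adFilterNone
Debug.Print rsIndexed.RecordCount   ' back to 10,000
```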
Test 5: Finding one record based on two criteria
In this final "find-based" test, I'm searching on the two string fields. There's no guarantee that a match will be found; in fact, in 100 iterations, an average of only about 20 matches are made.
Data set size: 10,000    Iterations: 100

Standard RecordSet     8.44s
Indexed RecordSet      0.09s
Standard collection    4.97s
Keyed collection       4.94s
UDT array              0.87s
Variant array          3.93s
The compiler can't optimize the array handling quite as well as it did in Test 4, and collections can only have one key, so the Keyed collection offers no great advantage. The result: the Indexed RecordSet wins hands down. It's also the easiest to program, since the only thing that changes from Test 3 is the Filter criterion, which supports a range of Boolean operations for searching on multiple field values.
Test 6: Sorting
Lack of decent sorting capability has been one of VB's key weaknesses over the years. However, look at the following table, and see if it changes your mind. The timings include the time to sort the unsorted data and then iterate through it in sorted order:
Data set size          1,000     5,000        10,000
Iterations             1         1            1

Standard RecordSet     0.12s     0.64s        1.29s
Indexed RecordSet      0.12s     0.61s        1.25s
Standard collection    14.07s    >1,500s      ?
Keyed collection       10.28s    1,421.19s    ?
UDT array              0.58s     13.76s       53.02s
Variant array          0.88s     20.08s       76.16s
When considering sorting, I'd ask you to consider not only performance, but also the coding effort required to implement it. The speed of RecordSet sorting clearly leaves all of the other approaches in the dust. But just as important is the fact that I only needed to add one line of code to my RecordSet iteration routine to have the data sorted this quickly:
rsIndexed.Sort = "lNonUnique"
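The Sort property is richer than that one line suggests. A sketch of the fuller syntax, using the column names from the article's schema:

```vb
' Sort on multiple columns, mixing ascending and descending order
rsIndexed.Sort = "sNonUnique ASC, lUnique DESC"

' Assign an empty string to clear the sort and restore the
' RecordSet's original order
rsIndexed.Sort = ""
```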
Obviously, for array and collection sorting, there's serious coding involved. Yes, I know you can implement a BubbleSort quickly, but we all know how inefficient it is, especially when sorting large data sets. So I've taken the QuickSort algorithm from Bruce McKinney's excellent Hard Core Visual Basic and modified it for my own purposes. (Needless to say, if you find anything wrong with my sorting algorithm, it will be me, not Bruce, who's at fault.)
For array processing, Bruce's algorithm only takes up about two screenfuls of code, but it still took me half an hour to adapt it and get it working, and the end result is hardly snappy. As for collection sorting, well, yes, there are faster ways to sort collections, but life is just too short . . .
Conclusion
This article has focused on a performance comparison between RecordSets and other, more traditional data structures. In writing it, I'm basically encouraging you to think about using ADO RecordSets for more than just retrieving SQL results. What I've shown is that, while RecordSets do take longer to create and iterate through, they can deliver dazzling performance for some complex operations, and at virtually no programming cost.
No, nothing beats arrays for straight data churning, although arrays aren't that great when it comes to adding and deleting elements. And while UDT arrays can provide amazing performance, there's a definite performance penalty associated with variant arrays, especially in compiled code.
In other words, I'm proposing that RecordSets have something to offer as standalone data structures. Bear in mind that these tests show RecordSets can perform quite well even on what's undoubtedly their weakest ground (an unnetworked PC). A RecordSet, remember, is a data structure that can be saved to a file, marshaled efficiently across the Internet, bound to controls, and processed in 101 other ways that leave arrays and collections begging. Have I set you thinking?
Rob Macdonald is an independent software specialist based in London and southern England. In addition to consulting and training in Windows, client/server, VB, COM, and systems design and management, he also runs the UK ODBC User Group and is author of RDO and ODBC: Client Server Database Programming with Visual Basic, published by Pinnacle. Rob can be contacted in England at +44 1722 782 433 or via e-mail at rob@salterton.com.