Rob Macdonald
Have you ever considered using an ADO RecordSet as a data structure in place of an array or collection? Let's face it: ADO RecordSets provide built-in sorting, searching, and persistence capabilities. In this article, Rob shows you how the RecordSet stands up against more traditional data structures.
Yes, I realize that most people use ADO to interact with databases. But there's nothing stopping you from creating RecordSets in memory just like regular data structures and using your ADO skills to manipulate them. Personally, I've found the ability to sort, filter, and search RecordSets so valuable that I wanted to see how RecordSets stacked up against arrays and collections for a range of standard data processing tasks. This article shows you the tests I ran and what I discovered. Even if you don't like the idea of replacing more traditional data structures with RecordSets, the results should provide some interesting insights into how RecordSets behave with regular data, at least when you're using a client-side cursor.
From the top
In case you've never done it before, I'll quickly show you how to create a pseudo- or fabricated RecordSet: one that you build in code rather than obtain from a Provider. In order to run my performance tests, I defined a data structure with four columns: a unique string (different for each record/row), a non-unique string, a unique long, and a non-unique long. Not very imaginative, I know, but at least you'll know what I'm using in each test. The following code shows you how to define a fabricated RecordSet and stuff it with data:
Dim rsStandard As Recordset
Dim vColumns As Variant
Dim l As Long

Set rsStandard = New Recordset
rsStandard.Fields.Append "lUnique", adInteger
rsStandard.Fields.Append "sUnique", adVarChar, 20
rsStandard.Fields.Append "lNonUnique", adInteger
rsStandard.Fields.Append "sNonUnique", adVarChar, 20
rsStandard.Open
vColumns = Array("lUnique", "sUnique", _
    "lNonUnique", "sNonUnique")
For l = 0 To DATA_SIZE
    rsStandard.AddNew vColumns, _
        Array(usrBase(l).lUnique, usrBase(l).sUnique, _
        usrBase(l).lNonUnique, usrBase(l).sNonUnique)
Next
As you can see, I've created a standard RecordSet object and then appended fields to it, specifying the data type and size where appropriate. You can only do this to closed RecordSets, but once the fields are defined, you can open the RecordSet and then use AddNew to add the records. Note that when you're adding many records, it's a lot faster to use AddNew with the field names and values as arguments (as I did previously) than to specify each field value as a separate statement, DAO/RDO-style.
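For comparison, here's a sketch of the two AddNew styles side by side, using the same schema and the usrBase array from the listing above. The single-call form avoids a round of Fields collection lookups and change notifications for every field:

```vb
' Slower, DAO/RDO-style: one statement per field, then Update
rsStandard.AddNew
rsStandard("lUnique") = usrBase(l).lUnique
rsStandard("sUnique") = usrBase(l).sUnique
rsStandard("lNonUnique") = usrBase(l).lNonUnique
rsStandard("sNonUnique") = usrBase(l).sNonUnique
rsStandard.Update

' Faster: field names and values passed as arrays in a single call
rsStandard.AddNew _
    Array("lUnique", "sUnique", "lNonUnique", "sNonUnique"), _
    Array(usrBase(l).lUnique, usrBase(l).sUnique, _
          usrBase(l).lNonUnique, usrBase(l).sNonUnique)
```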
The contestants
Here are the contenders in the Data Structure Grand Prix:
1. Standard RecordSet: As created previously, with four columns.
2. Indexed RecordSet: This is a standard RecordSet, but with client-side indexes defined to speed up searching.
3. Standard collection: I've defined a VB class called TestObj with four properties to hold the four data fields already described. The Standard collection is a VB collection of instances of this class.
4. Keyed collection: This is a standard collection, but when objects are added to it, the unique string property value is used as a key in the collection. This will enhance search times when searching by the unique string.
5. Array of user-defined Types: A user-defined type called "TestData" holds the four field values. I then defined an array of this UDT.
6. Variant array: This is a 2-D variant-based data structure.
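To make the contestants concrete, here's a sketch of the UDT declaration and the keyed collection insertion. The type and class names (TestData, TestObj) come from the article; the variable names are my own:

```vb
' The UDT behind the array-based contestant
Private Type TestData
    lUnique As Long
    sUnique As String
    lNonUnique As Long
    sNonUnique As String
End Type

' Keyed collection: the unique string doubles as the collection key,
' which is what makes fast keyed retrieval possible later on
Dim colKeyed As New Collection
Dim oTest As TestObj
' ... create oTest and populate its four properties ...
colKeyed.Add oTest, oTest.sUnique
```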
The tests
I put each data structure through six tests using 10,000 records. Each data structure has the four fields/columns mentioned previously. To get accurate timings, some of the tests have been run 100 times, in which case I show the total, not the average. In each case, I've run the complete test four times and presented the average of the second, third, and fourth runs.
All of my tests were performed on a Pentium II running at 400 MHz with Windows 2000 Beta 3. I've also run the tests on smaller machines and older (all right, actually released!) operating systems, and the results scale pretty much as expected. All figures are based on compiled VB6 code. The usual warnings about relying too heavily on benchmarks apply as ever; nothing beats "suck it and see" when it comes to performance tuning.
Test 1: Creating data
Before you use a data structure, you need to be able to populate it. Each of the tests in this section places the "same" data into its data structure.
Data set size: 10,000    Iterations: 1

Standard RecordSet     2.87s
Indexed RecordSet      3.32s
Standard collection    2.07s
Keyed collection       2.22s
UDT array              0.03s
Variant array          1.53s
The results speak for themselves. UDT arrays are blisteringly fast to create. The others are all very slow by comparison, and RecordSets come out the worst.
Test 2: Iterating through data
In each of the following tests, the code loops through each record/row, copying its four fields into local variables:
Data set size: 10,000    Iterations: 1

Standard RecordSet     1.06s
Indexed RecordSet      1.08s
Standard collection    0.06s
Keyed collection       0.06s
UDT array              0.02s
Variant array          0.05s
Again, it doesn't look too good for the RecordSets.
Test 3: Finding one record based on one criterion
In each data structure, the sUnique field/column contains a unique string. The test creates a random string known to be one of those used in the data structure and searches the data structure for that string. The test is repeated 100 times to get a good figure.
Data set size: 10,000    Iterations: 100

Standard RecordSet     3.60s
Indexed RecordSet      0.03s
Standard collection    2.12s
Keyed collection       0.01s
UDT array              0.25s
Variant array          1.17s
This very common test puts a different slant on things. The Indexed RecordSet and the Keyed collection search at the speed of light, and even the UDT array is slow by comparison. As for the others, they look like they stumbled into some quicksand en route.
Of course, Keyed collections are designed specifically for this type of retrieval operation, and they appear to be a good bet at this stage. You might be wondering how the Indexed RecordSet performs so well. The time to create the Indexed RecordSet was slightly longer than the time for the Standard RecordSet (see Test 1). The extra time occurred because I asked the RecordSet to create a client-side index for each field. Please understand that this has nothing to do with database indexes; it's entirely a feature of the ADO client cursor library. When you build a client-side index for an ADO 2.x RecordSet that uses a client cursor, you'll benefit from greatly enhanced client-based searching and filtering. Here's the code for creating the four client-side indexes used in my Indexed RecordSet:
rsIndexed.Fields("lUnique").Properties("OPTIMIZE") _
= True
rsIndexed.Fields("sUnique").Properties("OPTIMIZE") _
= True
rsIndexed.Fields("lNonUnique").Properties("OPTIMIZE") _
= True
rsIndexed.Fields("sNonUnique").Properties("OPTIMIZE") _
= True
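One way to code the single-criterion search itself is with the RecordSet's Find method, which is what client-side indexes accelerate. Here's a sketch, where sTarget is an assumed variable holding one of the random strings known to be in the data:

```vb
' Search for a single record by the unique string column
rsIndexed.MoveFirst
rsIndexed.Find "sUnique = '" & sTarget & "'"
If Not rsIndexed.EOF Then
    ' Found: the current row is the matching record
End If
```

Note that Find accepts only a single criterion; for searches on more than one field, the Filter property (shown in Test 4) is the tool to reach for.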
Test 4: Finding n records based on one criterion
The lNonUnique field of each record/row is set to a random long between one and one-tenth of the data set size. There will therefore be about 10 records matching each possible value. This means that an iterative search can't stop on the first hit; it must search right through the data set to find all of the matches.
Data set size: 10,000    Iterations: 100

Standard RecordSet     3.42s
Indexed RecordSet      0.14s
Standard collection    3.12s
Keyed collection       3.18s
UDT array              0.03s
Variant array          0.81s
Again, the results show a very good performance for the Indexed RecordSet, but the Keyed collection no longer performs well. This is because a key has to be unique, so key-based searching isn't possible when more than one record might match the criteria. One result that surprised me was the staggering performance of the UDT array. In uncompiled code, it was significantly slower than the Indexed RecordSet. This result shows just how efficient a 32-bit compiler is at handling longs. It also shows that none of these compiler optimizations can apply when variants are used.
For this test, the collection- and array-based searches must perform an iterative search. However, the RecordSet-based search doesn't require this sort of coding. For RecordSets, all you need to do is set the Filter property of the RecordSet to the criteria. You can then use standard iteration code to go through the RecordSet, and you'll only see the records that meet the criteria. The following code shows how easy this is:
Dim fd As ADODB.Field
rsIndexed.Filter = "lNonUnique = " & _
    CLng(Rnd * (DATA_SIZE / 10))
Set fd = rsIndexed("lNonUnique")
While Not rsIndexed.EOF
    ' use fd.Value and the other fields here
    rsIndexed.MoveNext
Wend
rsIndexed.Filter = adFilterNone
Before the Filter property is set, the RecordCount property of this RecordSet would be equal to the full data set size (10,000 in my trials). After the filter is set, and up until it's reset in the last line, the RecordCount property will be approximately 10 (representing the records visible through the filter), depending on how the values were randomly allocated when the RecordSet was created.
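You can watch this happen in the Immediate window. The following sketch uses an arbitrary filter value of 42; with the random data described above, the filtered count will be roughly 10 rather than exactly 10:

```vb
Debug.Print rsIndexed.RecordCount   ' full data set: 10,000
rsIndexed.Filter = "lNonUnique = 42"
Debug.Print rsIndexed.RecordCount   ' only the visible rows: about 10
rsIndexed.Filter = adFilterNone
Debug.Print rsIndexed.RecordCount   ' back to 10,000
```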
Test 5: Finding one record based on two criteria
In this final "find-based" test, I'm searching on the two string fields. There's no guarantee that a match will be found; in fact, in 100 iterations, an average of only about 20 matches are made.
Data set size: 10,000    Iterations: 100

Standard RecordSet     8.44s
Indexed RecordSet      0.09s
Standard collection    4.97s
Keyed collection       4.94s
UDT array              0.87s
Variant array          3.93s
The compiler can't optimize the array handling quite as well as it did in Test 4, and collections can only have one key, so the Keyed collection offers no great advantage. The result: the Indexed RecordSet wins hands down. It's also the easiest to program, since the only thing that changes from Test 3 is the Filter criterion, which supports a range of Boolean operations for searching on multiple field values.
Test 6: Sorting
Lack of decent sorting capability has been one of VB's key weaknesses over the years. However, look at the following table, and see if it changes your mind. The timings include the time to sort the unsorted data and then iterate through it in sorted order:
Data set size          1,000     5,000        10,000
Iterations             1         1            1

Standard RecordSet     0.12s     0.64s        1.29s
Indexed RecordSet      0.12s     0.61s        1.25s
Standard collection    14.07s    >1,500s      ?
Keyed collection       10.28s    1,421.19s    ?
UDT array              0.58s     13.76s       53.02s
Variant array          0.88s     20.08s       76.16s
When considering sorting, I'd ask you to consider not only performance, but also the coding effort required to implement it. The speed of RecordSet sorting clearly leaves all of the other approaches in the dust. But just as important is the fact that I only needed to add one line of code to my RecordSet iteration routine to have the data sorted this quickly:
rsIndexed.Sort = "lNonUnique"
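The Sort property is richer than that one line suggests. A sketch of the fuller syntax, using the column names from the article's schema:

```vb
' Sort on multiple columns, mixing ascending and descending order
rsIndexed.Sort = "sNonUnique ASC, lUnique DESC"

' Assign an empty string to clear the sort and restore the
' RecordSet's original order
rsIndexed.Sort = ""
```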
Obviously, for array and collection sorting, there's serious coding involved. Yes, I know you can implement a BubbleSort quickly, but we all know how inefficient it is, especially when sorting large data sets. So I've taken the QuickSort algorithm from Bruce McKinney's excellent Hard Core Visual Basic and modified it for my own purposes. (Needless to say, if you find anything wrong with my sorting algorithm, it will be me, not Bruce, who's at fault.)
For array processing, Bruce's algorithm only takes up about two screenfuls of code, but it still took me half an hour to adapt it and get it working, and the end result is hardly snappy. As for collection sorting, well, yes, there are faster ways to sort collections, but life is just too short . . .
Conclusion
This article has focused on a performance comparison between RecordSets and other, more traditional data structures. In writing it, I'm basically encouraging you to think about using ADO RecordSets for more than just retrieving SQL results. What I've shown is that, while RecordSets do take longer to create and iterate through, they can deliver dazzling performance for some complex operations, and at virtually no programming cost.
No, nothing beats arrays for straight data churning, although arrays aren't that great when it comes to adding and deleting elements. And while UDT arrays can provide amazing performance, there's a definite performance penalty associated with variant arrays, especially in compiled code.
In other words, I'm proposing that RecordSets have something to offer as standalone data structures. Bear in mind that these tests show RecordSets can perform quite well even on what's undoubtedly their weakest ground (an unnetworked PC). A RecordSet, remember, is a data structure that can be saved to a file, marshaled efficiently across the Internet, bound to controls, and processed in 101 other ways that leave arrays and collections begging. Have I set you thinking?
Rob Macdonald is an independent software specialist based in London and southern England. In addition to consulting and training in Windows, client/server, VB, COM, and systems design and management, he also runs the UK ODBC User Group and is author of RDO and ODBC: Client Server Database Programming with Visual Basic, published by Pinnacle. Rob can be contacted in England at +44 1722 782 433 or via e-mail at rob@salterton.com.