May 1999

Optimize Data Access Using ADO

Use these 10 tips to get the most out of ADO.

by Dianne Siebold

Reprinted with permission from Visual Basic Programmer's Journal, May 1999, Volume 9, Issue 5, Copyright 1999, Fawcette Technical Publications, Palo Alto, CA, USA. To subscribe, call 1-800-848-5523, 650-833-7100, visit www.vbpj.com, or visit The Development Exchange at www.devx.com.

Performance and scalability are buzzwords in development today, but they are also real objectives for the applications we create. An application's performance is hard to quantify and measure because perceived performance is sometimes more important than actual performance. For example, data access can impose a performance penalty, especially in today's distributed environment where many clients can simultaneously access a single database. Understanding where your bottlenecks occur is half the battle to improving performance. Once these areas are identified, you can examine your options for remedying the situation. There are steps you can take to improve performance in your applications if you're using ActiveX Data Objects (ADO) to connect to and update your database. In this article, I'll detail 10 tips for optimizing data access in Visual Basic using ADO.

What you need:
Visual Basic 5.0 or 6.0
ADO 2.0
SQL Server
ODBC
There are many factors to consider when evaluating the overall performance of an application. The number of users, network speed, hardware capacity, and database size are all issues to keep in mind. Adding additional, faster hardware or distributing components across multiple computers can sometimes resolve these issues. As developers, we can ensure our code is as efficient as possible. You've taken a step in the right direction by choosing OLE DB and ADO as your data access method.

OLE DB and ADO are at the center of Microsoft's Universal Data Access (UDA) strategy. The basic concept is to enable access to data, regardless of its location or format. ADO is an object-based interface to OLE DB that works with any language that supports COM/ActiveX objects. ADO was designed with performance in mind. Its object model is simpler than other models, such as RDO, and it focuses less on the hierarchy, making it more efficient. In this flatter model, objects can be instantiated without their parent objects, giving you more flexibility and wider options. For example, you can create a Recordset object and retrieve data from the database without first creating and opening a Connection object.

Some of the tips outlined here are general database programming guidelines, while others are specific to ADO. Many of these tips can be used with any database, although I've used the familiar SQL Server Pubs database in the code listings. All recommendations work with ADO 2.0 and either VB5 or VB6, as long as you have the appropriate version of ADO installed.

1. Avoid Updating Through Recordsets
The versatile Recordset object is the heart of ADO and stores data that's returned as the result of a query. You can also update the database through the Recordset object. When you open a recordset, a connection to the database is created and you can add, update, and delete records; these changes are immediately made to the database. Although updating the database through the recordset is convenient, it requires a cursor with an open connection and uses resources. It's more efficient to update the database through stored procedures, especially when performing multiple inserts and deletes.

2. Use Stored Procedures
Instead of using embedded (or dynamic) SQL, use stored procedures. A stored procedure is a collection of SQL statements—often with variables, looping, and branching—that are stored and processed as a unit on a database. Stored procedures offer many advantages over embedded SQL, which is hard-coded in your VB code. (You should avoid hard-coding anything in an application anyway.)

For one thing, they are compiled to be faster. When you send a SQL statement to the database server, in this case SQL Server, the query processor has to parse it, analyze it, and create a query plan for execution. Not only does this slow down your queries, but it can create serious overhead when you multiply this by the number of users. Stored procedures are already compiled and their query plan resides on the server.

Other advantages to using stored procedures include easier maintenance and security. Some developers use embedded SQL to keep the code for an application in one location, but this is also a reason to move it out of your code. The smallest modification to your tables means combing through your source code to find each instance where that table is referenced. By removing dynamic SQL, you eliminate the tedious and inefficient task of formatting variables and SQL strings with single quotes, double quotes, and so on, so it can be read by the database. Stored procedures improve security in two ways: Users can only access the stored procedure and not the underlying tables in the database, and you remove the possibility of full SQL statements that include sensitive information traversing the network.

3. Create the Minimum Recordset
When opening a recordset, the rule of thumb is to create the minimum recordset with only the options your application requires. For example, you don't want to create an updatable, server-side recordset if you only need to view records. Use the CursorType, CursorLocation, and LockType properties to create the most efficient recordset.

The CursorType property specifies how you move through the recordset and whether changes made on the database are visible to the recordset after you retrieve it. Forward—only is the default if no CursorType is specified and, like the static cursor type, creates a static set of records. You can only scroll forward through the records when you use a forward-only cursor type, but this improves performance if you only have to make a single pass through the recordset. Changes made on the database are not visible to forward-only and static cursors. If you use a keyset cursor, updated records are visible, but additions and records deleted by other users are not. With a dynamic recordset, you are notified of all additions, updates, and deletions, but it also consumes the most resources.

The CursorLocation property defines where the recordset is created when it's opened—on the client or the server. The CursorLocation is not required to open a recordset, but the default is a server-side cursor, which might not be the most efficient. In most cases, a client-side cursor is preferred, because scrolling and updates are faster and more efficient, although returning data to the client increases network traffic. In addition, only client-side cursors can be disconnected.

The LockType property tells the provider what type of locks should be placed on records during editing. Read-only is the default and does not allow data editing. Use read-only whenever possible because updatable recordsets use resources to request additional metadata about the fields. Pessimistic locking locks each record while it's being edited and is available only on server-side cursors. Use pessimistic locking only when necessary because the lock is maintained as long as the cursor is open. Optimistic locking locks the record only when it's physically updated. BatchOptimistic locking is used with disconnected recordsets. These recordsets are updated locally and all modifications are sent back to the database in a batch. This code demonstrates the use of these properties to open a static, client-side, read-only recordset:

With adoRS
   .ActiveConnection = "Pubs"
   .CursorType = adOpenStatic
   .CursorLocation = adUseClient
   .LockType = adLockReadOnly
   .Source = "SELECT title_id, " & _
      "title, pub_id FROM Titles"
   .Open
End With

It's important to know your options when creating a recordset and the effect they will have on performance. Which properties provide the best performance will depend on what you're trying to accomplish. Keep in mind factors such as whether you'll be using Microsoft Transaction Server (MTS), or whether your code will be doing updates and inserts rather than only viewing data. In some cases, you might be restricted by the functionality you need. If you're going to pass recordsets to MTS components, you can only use disconnected (client-side) recordsets.

4. Use Disconnected Recordsets
A disconnected recordset no longer has a connection to the database and can only be used with a client-side cursor. Disconnected recordsets free valuable database connections and resources for other users and are recommended when you can use them. To disconnect a recordset, set its Active Connection property to Nothing:

Dim adoRS As New ADODB.Recordset

With adoRS
   .ActiveConnection = "Pubs"
   .CursorLocation = adUseClient

   .CursorType = adOpenStatic
   .LockType = adLockBatchOptimistic
   .Source = "SELECT emp_id, " & _
      "lname,hire_date FROM Employee"
   .Open

   Set .ActiveConnection = Nothing
End With

There are many advantages to using disconnected recordsets. Scalability is improved, because adding 50 users doesn't mean 50 open database connections and 50 open cursors on the database server. Time-consuming operations executed on a disconnected recordset will not affect the database server and all modifications are batched. The recordset must be reconnected to the data source to update the database. Note that with a disconnected recordset, changes made on the database by other users will not be visible—concurrency issues must be handled with code.

5. Define Command Parameters
The Command object has a Parameters collection used when calling stored procedures in ADO. Each Parameter object contains a parameter that's passed to the stored procedure when it's executed. When using stored procedures, it's more efficient to explicitly create the command parameters rather than call the Command object's Refresh method to query the information from the database. The Refresh method requires a database connection, and it causes unnecessary trips to the database server each time you call a stored procedure (see Figure 1), leading to increased network traffic and more processing on the database server.

 
Figure 1 Trace That Method. Click here

6. Encapsulate Data Access
Moving the data access code for an application into a data access layer is good design practice—as evidenced by the three-tier, client/server model that proliferates today (see Figure 2). This involves putting all the code that communicates with your database into objects. The goal of distributed architecture is to move logic and data access out of the client and into objects. Objects offer the benefit of easier debugging and reuse, as well as simplified maintainability. For example, you only have to change code in the data access layer if you decide to switch data providers, and your client application code remains untouched. You might need to spend more time in the design phase to ensure your data access layer provides the necessary functionality to the client. But you gain scalability because creating a data access layer means minimal effort to convert it into COM components you can move to an application server such as MTS.

 
Figure 2 The Three-Tier Client/Server Model. Click here

7. Use Data Shaping to Return Master/Detail Data
Data shaping is a new feature in ADO 2.0 that allows you to model one-to-many relationships within a single recordset. Typically, a query that returns publishers and the titles for each publisher creates a recordset in which the publisher information is duplicated for each title. With data shaping, you can return a single parent record (the publisher data) with a field that contains the child records in a recordset (the titles). The child recordset can be traversed and behaves as a normal recordset. Using data shaping you can also create grouped recordsets and use aggregate functions such as SUM, AVG, MIN, MAX, and COUNT. Although it takes some time to master the SHAPE language that's used to create these queries, it can result in significantly smaller resultsets. This means less network traffic and fewer client resources needed (see Figure 3).

 
Figure 3 Shape Up Your Performance With Data Shaping. Click here

8. Use MTS
Putting your components in MTS gives you the benefits of scalability, connection pooling, and transactions. Business logic and data access are moved to components that run remotely on MTS in a typical distributed application. This offloads processing from the client, and the components can then be distributed across multiple computers when the number of users increases. MTS also provides connection pooling in which database connections are cached between uses for faster access to the database. Automatic transactions are another advantage to using MTS. All transactions occur in an object's context, and all the transaction's operations are committed or rolled back using the SetComplete or SetAbort method. You can also take advantage of MTS's role-based security model by putting your components in it (see Anthony Sneed's article, "Unlock MTS Security," in the Fall 1998 issue of VBPJ Windows NT Enterprise Development).

There are some trade-offs to consider when using MTS. Increased overhead is involved when referencing MTS components because they reside remotely. Additional time is spent marshalling the data to the MTS component if parameters are passed from the client to an MTS component. Consider these issues when making design decisions before creating components to run in MTS. Depending on your requirements, it could be well worth the effort.

9. Use Output Parameters
Use output parameters instead of returning a recordset if you have a stored procedure that returns a single value or row of data. For example, if you have a stored procedure that returns the city and state for a ZIP code, return them as output parameters to eliminate the overhead of creating a recordset and retrieving the field metadata. The output parameters are defined and added to the Parameters collection of the Command object. Check for a valid return value and retrieve the output values after calling the stored procedure (see Listing 1).

10. Retrieve Only Required Data
Retrieving required data means fetching fewer rows and returning only the fields you need. You can optimize your queries by restricting the number of records with a WHERE clause in your SQL statement or specifying a default ROWCOUNT. I default to 100 rows if no record count is given. It's convenient to only type "SELECT * FROM table", but chances are you don't need all the fields from a table. Instead, specify only the fields you require in your stored procedures with statements such as "SELECT title_id, title, type, price FROM Titles."

The adExecuteNoRecords execute option is used to optimize non-data-returning commands. In previous versions of ADO, all commands executed against the database returned a Recordset object regardless of whether there were results to be returned. New to ADO 2.0, the adExecuteNoRecords execute option specifies that no rows should be returned to the client. Use this setting with the Connection or Command object's Execute method when no rows are returned or if you don't need to access the rows returned. This removes the overhead of creating the Recordset object and improves performance. Here's an example of how to use this option:

Dim adoCommand As New ADODB.Command

With adoCommand
   .ActiveConnection = "Pubs"
   .CommandText = "usp_InsertNewTitle"
   .CommandType = adStoredProc
   .Execute , , adExecuteNoRecords 
End With 

These 10 tips can help improve the performance and scalability of your application's data access—your results might vary depending on the nature of your environment. If you're just starting to use ADO, keep these tips in mind when writing data access code. If you're already using ADO, review your existing code and consider how you can refine and optimize it.

ADO's flexibility means more options for developers, but knowing when and how to use these options can be challenging. When deciding which features or technologies to employ, you need to make an informed decision by considering the benefits and trade-offs of each. The key in all development is to strike a balance between performance and functionality.


Dianne Siebold is a consultant and author specializing in Visual Basic programming. Reach her by e-mail at dsiebold@earthlink.net.