Duwamish Books Data Access Layer, Phase 3

Robert Coleridge
Microsoft Developer Network

September 1, 1998

Summary: Discusses issues surrounding object state and resource caching. Examines how these relate to the Duwamish Phase 3 data access layer. (13 printed pages) Includes API Reference to Phase 3 DAL.

Introduction

Phase 1 of the Duwamish Books sample contained a data access layer that was embedded in most of the sample's desktop applications. This layer encapsulated data access so that the data structure would be hidden, easier to maintain, and have increased data security. In Phase 2 we migrated the sample from a desktop model to a two-tier model and, in doing so, we removed the data access layer from the application code and encapsulated it into a stand-alone Component Object Model (COM) component. In Phase 3 we keep the data access layer as a separate COM component, but make some significant changes to it. The changes revolve around three issues: maintaining state, the caching of database connections, and hierarchical or shaped data. This article explains those changes, and why they were made.

For an introduction to the Duwamish Books sample, including a discussion of tiered architectures, see "An Introduction to the Duwamish Books Sample."

State

For our purposes, the term state refers to the condition or status of an object at any given point in time. Therefore, maintaining state is the ability of an object to retain or maintain information about itself and/or its surrounding environment by which a certain condition can be replicated or maintained over a period of time or over a series of events or invocations. In contrast, a stateless object does not maintain any state information about its condition over time or between invocations.

Stateful Objects

When we moved the data access code from Phase 1 to Phase 2, we simply moved the code module to its own project and created a COM dynamic-link library (DLL). This allowed us to create the data access layer COM component (which we call the DAL) very quickly, but it also meant that the statefulness of the Phase 1 code was carried over to Phase 2. That statefulness was kept in global variables and thus persisted across method invocations.

This persistence maintained database connections across calls and had the advantage of lessening the time it took to retrieve data because the database connection did not have to be established every time the application wanted data. This sped up the data access, but created certain disadvantages.

One major disadvantage is the assumption that the database server would be available for the lifetime of a DAL object. This creates a situation whereby the application may start correctly but not be able to continue later if the data server fails.

Another major disadvantage to maintaining state is that any object that does so will not scale very well. Because it can only maintain a limited amount of information, a stateful object is limited in how many callers it can have, whereas a stateless object does not usually suffer from this limitation.

Stateless Objects

While a stateful object has its advantages, a stateless object is much more versatile and scalable. A stateless object can be invoked multiple times and in multiple ways. The developer does not have to take into consideration the possible state of the object—the developer simply uses the object.

For example, if I have a stateful object that maintains database connections and I want to retrieve some data, I need to know whether a database connection has been previously established. If one has, I can retrieve my data; if not, I have to make the connection before I can retrieve my data. To retrieve data with a stateless object, I supply all of the information necessary to make the connection to the database and retrieve the data. The method invocation to retrieve data does not have to worry about previous connections because it will make one itself, retrieve the data, and close the connection. I, as the developer, don't need to worry about the existing state of the object because the state is set up for the duration of the method invocation.

The big advantage of all this? A stateless object is far more scalable because it does not need to maintain state for its callers and can be used in a much more encapsulated object-oriented manner.

The major disadvantage to writing stateless objects is the time it takes to design one properly. To truly ensure a stateless mode, I have to ensure that all of the relevant information is passed to the object when it is invoked. This takes more design time and energy, but in my opinion is worth the effort.

Synopsis

  1. The advantages of not maintaining any state in an object:

The advantages of maintaining state in an object:

The disadvantage of stateless objects:

The disadvantages of maintaining state:

Cache

When I use the term cache, I mean to store or persist something for future use or reference. In Phases 1 and 2, Duwamish cached database connections to avoid having to make a connection every time the calling application needed to access the database.

Caching

In Phase 1 and 2, an application opened a connection to the database early on in the application lifecycle and it held the connection open for the lifetime of that application session. This was acceptable with a desktop or a two-tier model. Once we migrated to a three-tier model, we reexamined the notion of caching a connection to the database. We knew that we did not have to continue to cache the connection because more modern servers, such as Microsoft SQL Server™, had the ability to cache the connection for us.

Making a connection and caching it may or may not speed up your database access. If you decide to cache the connection, the application or the database server could cache it. If the database server has the ability to cache the connection, caching the connection in an application is not such a wise idea. Instead, simply let the server do the work. On the other hand, if the database server does not cache the connection, it might be a wise idea to cache the connection in something like the DAL. Similarly, if the database server takes a long time (a few seconds) to reopen a connection, it would make sense to open the connection once and cache it.

Note   If you cache something in an object, you may be drastically reducing the scalability of the component. If scalability is critical, you'll want to avoid caching.

Noncaching

When you cache a resource, an application has an implied sense of security knowing there is no need to be concerned about whether the resource is there—it is cached, hence it is available. However, this implied security is detrimental on any scalable or distributed system due to the fact that no matter how large or powerful a server may be, it still has physical limitations, such as memory or disk space. Once all memory or disk space is used up, no more resources can be cached and therefore no more applications can run until resources are released. Because the trend is toward scalable and/or distributed systems, we need to examine how to live in a noncached world.

When I speak of a noncached world, I am not implying that there is absolutely no caching going on, simply that the client application or even the middle-tier components do very little or no caching. Most caching should be done on the server and thus free up the client or middle tier for scalability.

A noncached world is almost synonymous to a stateless world. Each function/method call to an object needs to have enough information passed to it so that it can complete its task without requesting more information from the caller or relying on state information. An object that uses cached information must make certain assumptions about that information, such as whether the information is still valid or whether the cached resource still exists.

Synopsis

"To cache, or not to cache?" That is the real question. The answer is found by asking a different question: "Can I afford to hold or lock the resource open and for how long?" If you can afford to hold or lock the resource, it may indeed be a good idea. However, if the resource is limited or shared, a noncached solution is the best answer.

State and Caching in the Duwamish Sample Phases 1–3

The following section outlines the changes made to the Duwamish project as it moved from Phase 2 to Phase 3 with regard to state and caching.

State: Phase 1 to 3

In Phase 1, the data access layer was embedded in each application and thus was able to maintain its own state. Because each application was unaware of the others, the objects in the data access layer could maintain whatever state they wanted to without consideration of the other applications.

In Phase 2, the data access layer was extracted out into its own COM component, which we call the DAL. At this point, we had to make a choice between using singleton or multiuse objects. The difference in these two types is that with a singleton object, there is only one copy of it no matter how many times an application instantiates it. The application may have several references to the object but it has only one copy of it. With a multiuse object, an application creates a new copy of the object every time it instantiates it. Because the Phase 2 DAL component was written for multiuse, we did not have to worry about objects maintaining state for each of its callers.

In Phase 3, the DAL object is still a multiuse object that can maintain state about itself. However, certain methods in the DAL have the capability to be stateless, depending on how you invoke them. For example, the GetRecordset method accepts a connection string, which, if passed in, uses the parameter to build its own connection rather than relying on the cached connection.

When we reimplement Phase 3 with Microsoft Transaction Server (MTS), we will be removing the stateful code in the DAL component to make it entirely stateless, thus facilitating the migration to a distributed three-tier model.

Cache: Phase 1 to 3

In Phase 1, the data access layer cached its database connection, but because the code was embedded in each application, this was not detrimental to the desktop-model applications.

In Phase 2, the DAL component still cached the connection. Because each caller creates its own instance of the object, we did not have to worry about the object maintaining state for each of its callers.

In Phase 3, we removed the ability to cache a connection to maximize scalability for the three-tier model. Because we migrated from a Microsoft Access database to an SQL Server database in Phase 2.5, it made sense to shift state paradigms in Phase 3.

Example: GetRecordset

This section will outline how the GetRecordset API changed from Phase 2 to Phase 3. As an example, let's consider how a sales order that consists of a sale header record and all of its associated sales detail records is retrieved.

Using the DAL from Phase 2 to retrieve a sales order requires that the code be written one of two ways.

Method 1: Caching the connection

This method would have involved the following steps:

  1. Opening and caching the database connection.

  2. Reading in the sales header record.

  3. Reading in all related sales detail records.

  4. Closing and releasing the cached connection.
    Dim oRSSalesHeader As ADODB.Recordset
    Dim oRSSalesDetail As ADODB.Recordset
    Dim oDAL As New db3dal.CDataAccess
    Dim sQuery As String
    
        'make connection to database
        oDAL.OpenConnection ("Duwamish")
        
        'get sales header record
        oDAL.GetRecordset "", _
                "SELECT * FROM Sales WHERE PKID = 2", _
                oRSSalesHeader, QO_UseText, False
        
        'get related sales detail records
        sQuery = "SELECT * FROM SalesDetail WHERE SalePKID = " & _
                    oRSSalesHeader.Fields("PKId")
        oDAL.GetRecordset "", sQuery, oRSSalesDetail, _
                QO_UseText, False
        
        'close connection to database
        oDAL.CloseConnection
    
    'walk through detail records
        Do While Not oRSSalesDetail.EOF
            '...
            'do some processing to the detail record
            '...
            oRSSalesDetail.MoveNext
        Loop
    

Note how this method opens the connection (which caches it), holds the connection open while the entire sales order is retrieved, and then closes the connection. Although this method works, it increases the amount of data going back and forth across the network because each call to get data must be marshaled and sent. This method is very stateful and thus not as scalable if used with results that have large, detailed recordsets.

Method 2: Not caching the connection

This method is slightly better in that it does not hold open a database connection and thus does not consume as many resources over the same length of time.

This method would have involved the following steps:

  1. Reading in the sales header record while passing in connection information.

  2. Reading in all related sales detail records while passing in connection information.
    Dim oDAL As New db3dal.CDataAccess
    Dim oRSSalesHeader As ADODB.Recordset
    Dim oRSSalesDetail As ADODB.Recordset
    Dim sQuery As String
    
        'get sales header record
        oDAL.GetRecordset "Duwamish", _
                "SELECT * FROM Sales WHERE PKID = 2", _
                oRSSalesHeader, QO_UseText, False
        
        'get related sales detail records
        sQuery = "SELECT * FROM SalesDetail WHERE SalePKID = " & _
                    oRSSalesHeader.Fields("PKId")
        oDAL.GetRecordset "Duwamish", sQuery, oRSSalesDetail, _
                QO_UseText, False
        
        'walk through detail records
        Do While Not oRSSalesDetail.EOF
            '...
            'do some processing to the detail record
            '...
            oRSSalesDetail.MoveNext
        Loop
    

This method is similar to the preceding example, except that is does not cache the connection, but rather passes in enough information to each function call so that the function can make the connection, retrieve the data, and close down the connection.

A better way

With the release of ADO 2.0, there is a new feature called the Shape Provider that allows us to retrieve this type of data in one round trip, thus maintaining statelessness and not requiring any cached connection information.

Using the DAL from Phase 3 to retrieve the same sales order, with its associated details, consists of defining the relationship in an SQL query and simply executing that query using the new Shape Provider.

The steps necessary to use the Shape Provider in this example are as follows:

  1. Create a query string that defines the sales order header and detail relationship.

  2. Read the data from the database while passing in connection information.
    Dim oDAL As New db3dal.CDataAccess
    Dim oRSSalesHeader As ADODB.Recordset
    Dim oRSSalesDetail As Variant
    Dim sQuery As String
    
        'get related sales detail records
        sQuery = "SHAPE  {SELECT * FROM Sales WHERE PKID = 2} " & _
                 "APPEND ({SELECT * FROM SalesDetail} AS chapter " & _
                 "RELATE PKId TO SalePKID)"
        oDAL.GetRecordset "Duwamish", sQuery, oRSSalesHeader, _
                QO_UseText, True
        
        'walk through detail records
        Set oRSSalesDetail = oRSSalesHeader.Fields("chapter")
        Do While Not oRSSalesDetail.EOF
            '...
            'do some processing to the detail record
            '...
            oRSSalesDetail.MoveNext
        Loop
    

Notice the three subtle differences:

This method is a lot easier to code, requires less resources because nothing is cached, and best of all, requires only one trip to the database server to obtain all the required records.

Note   There is a known bug in ADO 2.0 (which will be fixed for version 2.1) when using a disconnected, shaped recordset. ADO does not release an SQL connection until the recordset is destroyed. This problem occurs for each recordset created in this manner. In Duwamish Books Phase 3, we have minimized the problem as much as possible. Although the SQL connection is not used when accessing the recordset, this will somewhat limit the scalability of the application by increasing the resources SQL must manage.

Duwamish DAL Phase 3 API Reference

This section details the API functions that were added, modified, or deleted.

What's New

OpenConnection

Creates and caches a connection to a database.

Visual Basic signature

Public Function OpenConnection(ByVal Connect As String) As Boolean

Visual C++ signature

BOOL OpenConnection(BSTR Connect);

Parameter

Connect: Contains the connection string, database name, DSN, or other relevant identifiers of the connection.

Remarks

Although this API functions similarly to the old CacheConnection API when a connection string was passed in, we felt that the name change reflected a more accurate understanding of what the function was now doing.

CloseConnection

Closes a previously opened connection (via OpenConnection).

Visual Basic signature

Public Function CloseConnection() As Boolean

Visual C++ signature

BOOL CloseConnection ();

Remarks

Although this API functions similarly to the old CacheConnection API when no connection string was passed in, we felt that the name change reflected a more accurate understanding of what the function was now doing.

GetConnectedRecordset

Retrieves a recordset that is connected to the connection established by a call to the OpenConnection API.

Visual Basic signature

Public Function GetConnectedRecordset(   _
                ByVal SQLQuery As String, _
                ByRef Recordset As ADODB.Recordset, _
                Optional ByVal LockType As LOCKTYPES = LT_Optimistic, _
                Optional ByVal Options As QUERYOPTIONS = QO_UseDefault _
                ) As Boolean

Visual C++ signature

BOOL GetConnectedRecordset (BSTR SQLQuery, 
                  IDispatch ** Recordset, LOCKTYPES LockType,
                  QUERYOPTIONS Options); 

Parameter

Remarks

The returned recordset is available to manipulate in any way the developer feels is necessary but, unlike a disconnected recordset, any changes made to the connected database take effect immediately, depending on the transactional nature of the connection (see BeginTrans for more details).

DisconnectRecordset

Disconnects a connected recordset from a database connection.

Visual Basic signature

Public Function DisconnectRecordset( _
                ByRef Recordset As ADODB.Recordset _
                ) As Boolean

Visual C++ signature

BOOL DisconnectRecordset (IDispatch ** Recordset);

Parameter

Recordset: An ADO recordset that is obtained via the GetConnectedRecordset API.

Remarks

This API disconnects a connected recordset from a database connection. The recordset passed in must have been obtained from the GetConnectedRecordset API.

What's Changed

The following reflects changes in the DAL API.

BeginTrans

Begins transaction processing on a previously established connection (via the OpenConnection API).

Visual Basic signature

Public Function BeginTrans() As Boolean

Visual C++ signature

BOOL BeginTrans();

Remarks

The function contract changed from Phase 2 syntax that read:

Public Function BeginTrans(Optional ByVal Connect As String) As Boolean

to look like the following:

Public Function BeginTrans() As Boolean

By removing the KeepOpen parameter we felt that this presented a much cleaner API and allowed us to clean up the internal code.

CommitTrans

Commits changes to a data source.

Visual Basic signature

Public Function CommitTrans() As Boolean

Visual C++ signature

BOOL CommitTrans ();

Remarks

The function contract changed from Phase 2 syntax that read:

Public Function CommitTrans(Optional ByVal KeepOpen As TRANSTYPES = TT_DropConnection) As Boolean

to look like the following:

Public Function CommitTrans() As Boolean

By removing the KeepOpen parameter we felt that this presented a much cleaner API and allowed us to clean up the internal code.

RollbackTrans

Rolls back the changes to a data source.

Visual Basic signature

Public Function RollbackTrans() As Boolean

Visual C++ signature

BOOL RollbackTrans ();

Remarks

The function contract changed from Phase 2 syntax that read:

Public Function RollbackTrans(Optional ByVal KeepOpen As TRANSTYPES = TT_DropConnection) As Boolean

to look like the following:

Public Function RollbackTrans() As Boolean

By removing the KeepOpen parameter we felt that this presented a much cleaner API and allowed us to clean up the internal code.

GetRecordset

Returns a disconnected recordset as a result of an SQL query.

Visual Basic signature

Public Function GetRecordset( _
                      ByVal Connect As String, _
                      ByVal SQLQuery As String, _
                      ByRef Recordset As ADODB.Recordset, _
                      Optional ByVal Options _
                           As QUERYOPTIONS = QO_UseDefault, _
                      Optional ByVal UseShapeProvider _
                           As Boolean = False
                      ) As Boolean

Visual C++ signature

BOOL GetRecordset(BSTR Connect, BSTR SQLQuery, 
                  IDispatch ** Recordset,
                  QUERYOPTIONS Options, BOOL UseShapeProvider);

Parameters

Remarks

The API has two additional optional parameters added to the API contract so as to facilitate a more optimal usage of the ADO recordset and also to facilitate the use of the ADO Shape Provider.

The old API contract changed from:

Public Function GetRecordset(ByVal Connect As String, ByVal SQLQuery As String, ByRef Recordset As adodb.Recordset) As Boolean

to:

Public Function GetRecordset(ByVal Connect As String, ByVal SQLQuery As String, ByRef Recordset As adodb.Recordset, Optional ByVal Options As QUERYOPTIONS = QO_UseDefault, Optional ByVal UseShapeProvider As Boolean = False) As Boolean

What's Been Deleted

The following functions were deleted due to the paradigm shift in state and caching:

Conclusion

We have examined the advantages and disadvantages of maintaining state and of caching or not caching resources, how these concepts affect Duwamish, and how they changed in Phase 3. As Duwamish moves forward to the next phase, you may want to begin taking a look at the Microsoft Transaction Server, or MTS. This piece of technology is so revolutionary in dealing with the state/caching issues that Microsoft is building it into the next version of Windows NT®. What does Duwamish have to do with MTS? The next phase of Duwamish will be fully MTS-enabled, and thus all of the issues dealt with in this article will be eliminated or greatly reduced by letting MTS handle them.