Just-in-Time Instantiation

R. L. Parker

Although it might be heretical to say so in these days of object-oriented fervor, there are situations in typical data processing applications where a fully populated object model introduces unnecessary overhead that can severely degrade the performance of the application. In this article, R. L. Parker demonstrates a technique called "just-in-time instantiation" that allows the data to be handled in far more efficient two-dimensional arrays, while still retaining the benefits of a good object-oriented design.

Ever since the release of Visual Basic 4.0 with its support for classes, a veritable army of VB programmers has learned the techniques of object-oriented development and has largely become convinced of its benefits. I won't review those techniques or benefits in this article -- to do so adequately would require a book, and good ones such as Deborah Kurata's Doing Objects in Microsoft Visual Basic 5.0 already exist.

But perhaps the pendulum has swung too far. Applications are being developed and deployed that use objects too much for their own good. Is such a thing possible?  I think so, because objects have their own overhead; if an application uses too many of them, or uses them in inappropriate situations, its performance might suffer.

The problem

Let's assume that we're developing an application for a mail order company that sells music CDs, and that we're working on a subsystem that will be used by the customer service department clerks to answer questions about existing orders. During the analysis phase, we determined that the application must support these requirements (among others):

A logical model and user interface that will support these requirements are shown in Figure 1 and Figure 2.

Figure 1: A logical model that supports the storage and retrieval of customer and order information

Figure 2: A rudimentary user interface design that supports the display of customer and order information

An object-oriented solution

As object-oriented programmers, we'd look at the requirements and the logical model and decide that the following classes are needed in our application: 

Also, our business layer functionality should be contained in a class called clsBusinessLayer. This class has two jobs: to encapsulate any business rules that must be executed when data is retrieved, inserted, or updated, and to act as the go-between for the application layer and the data layer. This separation of functionality is often called three-tier or n-tier architecture.

As a first cut (see BL1.WRI and CUST1.WRI in the accompanying Download file), we decide that our business layer object needs to provide a method called GetCustomers on its interface. When GetCustomers is called, it returns a collection of fully populated Customer objects. By "fully populated," I mean that all the objects -- including each Customer and the Customer's address(es) and order(s) -- have been initialized from the data source. The appeal of this approach is that all the front end has to do is call this single method, then simply sit back and navigate through the object model as the customer service clerk moves around the application.
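
Conceptually, the first cut looks something like this. (The real code is in BL1.WRI; moDataLayer and its GetCustomerRows method are placeholder names for whatever the data layer actually exposes.)

'clsBusinessLayer -- conceptual sketch of the first-cut GetCustomers
Public Function GetCustomers() As Collection
   Dim colCustomers As Collection
   Dim oCustomer As clsCustomer
   Dim aData As Variant
   Dim lRow As Long
   Set colCustomers = New Collection
   'hypothetical data-layer call: customer rows come back as a 2-D array
   aData = moDataLayer.GetCustomerRows()
   For lRow = LBound(aData, 2) To UBound(aData, 2)
      Set oCustomer = New clsCustomer
      oCustomer.CustomerID = aData(1, lRow)
      oCustomer.FirstName = aData(2, lRow)
      oCustomer.LastName = aData(3, lRow)
      oCustomer.Phone = aData(4, lRow)
      'fully populate: fetch and attach this customer's Addresses
      'and Orders collections here as well (omitted)
      colCustomers.Add oCustomer
   Next
   Set GetCustomers = colCustomers
End Function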

In a perfect world, this design might be ideal. But during testing, it turns out that our application's performance makes molasses look zippy. Why? Because objects and, especially, collections of objects have a high runtime cost (see the sidebar "Rolling My Own Benchmark").

Okay, let's refine our design. We could change GetCustomers() so that instead of returning the fully populated object model, it returns only the top level of the model -- the Customers collection. Then we'd wait and fetch the Addresses and Orders collections for a specific Customer only when that Customer is selected. This makes a noticeable improvement; are we out of the woods?
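
A minimal sketch of that refinement, assuming the first-cut clsCustomer exposes Addresses and Orders collection properties and the business layer offers narrower GetAddresses and GetOrders methods (all of these names are illustrative), might look like this in the form:

Private Sub lstCustomers_Click()
   Dim oCustomer As clsCustomer
   'the Customers collection was fetched up front; its children weren't
   Set oCustomer = mcolCustomers(lstCustomers.ListIndex + 1)
   If oCustomer.Orders Is Nothing Then
      'fetch the child collections now, for this one customer only
      Set oCustomer.Addresses = _
         moBusinessLayer.GetAddresses(oCustomer.CustomerID)
      Set oCustomer.Orders = _
         moBusinessLayer.GetOrders(oCustomer.CustomerID)
   End If
   '...display the selected customer's addresses and orders...
End Sub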

Another solution

There are situations where neither of these approaches will provide acceptable performance. First of all, we might decide to put the business layer object in an OLE server executable. Why would we want to do that? Two reasons are usually cited: so the business rules object can run on a powerful "application server" machine, and so the business rules can be modified without re-distributing applications all the way to the desktop. If the business layer is put in a separate EXE, it will run out-of-process, and we'll find that it isn't a good idea to return objects or collections of objects from the methods of our business layer. Because of cross-process marshaling, performance suffers terribly.

Instead, the data needs to be returned from our business layer as a Variant variable holding a two-dimensional array. Variant variables can be marshaled relatively efficiently. (See BL2.WRI in the accompanying Download file.)

You'll notice that several changes have been made to clsBusinessLayer. First, the return type of GetCustomers() has been changed from "Collection" to "Variant". Instead of being packaged into a fully populated object model, the raw data is returned to the application layer in a two-dimensional array (in a Variant variable). This 2-D array has a row and column for each corresponding row and column in the result set that's obtained from the data layer.
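
In outline, the revised method might look like this (the real code is in BL2.WRI; moDataLayer.GetCustomerRows is again a placeholder name):

'clsBusinessLayer -- sketch of the array-returning GetCustomers
Public Function GetCustomers() As Variant
   Dim aData As Variant
   'the data layer hands back the customer result set as a 2-D array,
   'dimensioned (1 To cColLastCol, 1 To number-of-rows)
   aData = moDataLayer.GetCustomerRows()
   'apply any business rules to the raw data here, then return the
   'Variant as-is; only the Variant is marshaled across the process
   'boundary, not thousands of individual object references
   GetCustomers = aData
End Function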

The client-side code (see Listing 1) must manage this 2-D array. In particular, it must be able to find whatever it needs in this raw data. This is where our classes come back into the picture. We don't have to -- indeed, we shouldn't -- discard our class design. What we discard is mainly the collections, because of their inefficiency. And we can add a couple of methods to the remaining application classes that help the application find what it needs in the raw data.

Listing 1. The modified version of clsCustomer (see CUST2.WRI in the accompanying Download file).

Option Explicit
Private mlCustomerID As Long 'the primary key
Private msFirstName As String
Private msLastName As String
Private msPhone As String
'these constants define the order of the columns found
'in the raw data
Private Const cColID = 1
Private Const cColFirstName = 2
Private Const cColLastName = 3
Private Const cColPhone = 4
Private Const cColLastCol = 4
Public Property Get CustomerID() As Long
   CustomerID = mlCustomerID
End Property
Public Property Let CustomerID(lID As Long)
   mlCustomerID = lID
End Property
Public Property Get FirstName() As String
   FirstName = msFirstName
End Property
Public Property Let FirstName(sName As String)
   msFirstName = sName
End Property
Public Property Get LastName() As String
   LastName = msLastName
End Property
Public Property Let LastName(sName As String)
   msLastName = sName
End Property
Public Property Get Phone() As String
   Phone = msPhone
End Property
Public Property Let Phone(sPhone As String)
   msPhone = sPhone
End Property
Public Sub InitFromRow(aData As Variant, lRow As Long)
'initialize an instance of this class by
'extracting a row (specified by lRow) from the
'raw data (a 2-dimensional array in aData)
   If IsArray(aData) Then
      If (lRow >= LBound(aData, 2)) And (lRow <= _
         UBound(aData, 2)) Then
         mlCustomerID = aData(cColID, lRow)
         msFirstName = aData(cColFirstName, lRow)
         msLastName = aData(cColLastName, lRow)
         msPhone = aData(cColPhone, lRow)
      End If
   End If
End Sub
Public Function SaveAsRow() As Variant
'return a 1-d array image of the object
'in the same column order defined in InitFromRow
   Dim aRow As Variant
   ReDim aRow(1 To cColLastCol)   
   aRow(cColID) = mlCustomerID
   aRow(cColFirstName) = msFirstName
   aRow(cColLastName) = msLastName
   aRow(cColPhone) = msPhone   
   SaveAsRow = aRow
End Function

You'll notice that, like clsCustomer, each application class has two new methods -- InitFromRow and SaveAsRow. These methods move data from a two-dimensional array into a class instance, and back again, respectively. As we loop through our ersatz collection, now represented by the two-dimensional array, we can use InitFromRow to "reconstitute" an object from a row in the two-dimensional array. I call this "just-in-time" instantiation because, instead of a collection of object instances, we have only one object instance. The data is put into this object instance (via the call to InitFromRow) just in time for us to use its properties and methods; then the object is re-used to represent the next row in the result set.
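
For example, the application layer might fill its customer list by reconstituting one row at a time into a single clsCustomer instance. (This fragment is illustrative: maCustomers would hold the 2-D array returned by GetCustomers, and lstCustomers is a list box.)

Dim oCustomer As clsCustomer
Dim lRow As Long
Set oCustomer = New clsCustomer   'one instance, re-used for every row
For lRow = LBound(maCustomers, 2) To UBound(maCustomers, 2)
   oCustomer.InitFromRow maCustomers, lRow
   lstCustomers.AddItem oCustomer.LastName & ", " & oCustomer.FirstName
Next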

In our sample customer service application, we turn a row into an object so that the data can be displayed in detail panels. The following code fragment from the application layer (frmCustomerService.DisplayDetail) shows how we use InitFromRow and then use the reconstituted object's properties to display the details for an order line object:

Private Sub DisplayDetail()
   Dim oOrderLine As clsOrderLine
   ...
   'reconstitute the object from a row in the raw data
   Set oOrderLine = New clsOrderLine
   oOrderLine.InitFromRow maOrderLines, mlDetailIndex
   txt(cTxtDetailID) = oOrderLine.OrderLineID
   txt(cTxtArtist) = oOrderLine.Artist
   txt(cTxtTitle) = oOrderLine.Title
   txt(cTxtDesc) = oOrderLine.Description
   txt(cTxtPrice) = oOrderLine.UnitPrice
   txt(cTxtQty) = oOrderLine.Qty
   ...
End Sub

Then, when the user navigates to the next row of data, we reconstitute that row into our object instance so that it can be displayed. Because our sample application is read-only, there's no need to store any changes back into the 2-D array result set. But if we did, we could use the SaveAsRow method that we've added to each class.
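
A hypothetical write-back for the current order line would copy the object's one-row image into the raw data at the current row, something like this:

Dim aRow As Variant
Dim lCol As Long
aRow = oOrderLine.SaveAsRow()
For lCol = LBound(aRow) To UBound(aRow)
   maOrderLines(lCol, mlDetailIndex) = aRow(lCol)
Next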

The sample code in the accompanying Download file is in VB4, but if you were to implement this application in VB5, you might consider creating an "abstract base class" that has InitFromRow and SaveAsRow in its interface. Then, all appropriate application classes could implement that interface -- that way, you'd be guaranteed that InitFromRow and SaveAsRow are available in all necessary classes.
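
Here's a sketch of that arrangement; the interface class name IRowPersist is mine, not part of the sample code:

'IRowPersist.cls -- the "abstract base class"
Public Sub InitFromRow(aData As Variant, lRow As Long)
End Sub
Public Function SaveAsRow() As Variant
End Function

'clsCustomer.cls, VB5 version
Implements IRowPersist
Private Sub IRowPersist_InitFromRow(aData As Variant, lRow As Long)
   'same extraction code as in Listing 1
End Sub
Private Function IRowPersist_SaveAsRow() As Variant
   'same packaging code as in Listing 1
End Function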

Obviously, this technique requires more coding and maintenance on the client side. This is a trade-off that you must evaluate using your professional judgment. If you decide to use just-in-time instantiation, you'll almost certainly need to declare constants (probably maintained by the programmer responsible for the data layer) that define the column order for each "get" method in the data layer and business layer interfaces. The application class programmer will use these constants to implement InitFromRow and SaveAsRow. For example, here are the column constants and the InitFromRow implementation from clsCustomer again; clsOrderLine and the other application classes follow exactly the same pattern, each with its own set of column constants:

'these constants define the order of the columns found 
'in the raw data
Private Const cColID = 1
Private Const cColFirstName = 2
Private Const cColLastName = 3
Private Const cColPhone = 4
Private Const cColLastCol = 4
...
Public Sub InitFromRow(aData As Variant, lRow As Long)
'initialize an instance of this class by
'extracting a row (specified by lRow) from the
'raw data (a 2-dimensional array in aData)
   If IsArray(aData) Then
      If (lRow >= LBound(aData, 2)) And (lRow <= _
         UBound(aData, 2)) Then
         mlCustomerID = aData(cColID, lRow)
         msFirstName = aData(cColFirstName, lRow)
         msLastName = aData(cColLastName, lRow)
         msPhone = aData(cColPhone, lRow)
      End If
   End If
End Sub

Conclusion

Performance considerations must be taken into account early in the design process. Depending on the application requirements, deployment strategy, and object model complexity, business layer methods might need to return variants that contain raw data in 2-D arrays rather than returning collections of objects. If so, a technique by which objects can be "reconstituted" from a row in the raw data is necessary.

This technique isn't intended to replace good object-oriented design methodology. Rather, it's a compromise implementation made necessary in the interest of performance. It should be used after -- not instead of -- the object-oriented design.

R. L. Parker is a Microsoft Certified Solution Developer and senior technical lead at DBBasics, Inc. in Raleigh, NC, who specializes in custom development of mission-critical database applications. rlp@dbbasics.com.

Sidebar: Rolling My Own Benchmark

When I started to write the accompanying article ("Just-in-Time Instantiation"), I knew that large, fully populated object models cause performance problems. But how large is "large," and when exactly do the performance problems occur?

To find out, I wrote two different benchmark programs. The first one simply adds n instances of a simple object to a collection, and then deallocates the collection. Table 1 and Table 2 show the results. I was surprised to learn that deallocating the collection is a bigger performance problem than creating the instances and adding them to the collection in the first place. You'll see that allocation time increases roughly linearly with n, but deallocation time grows faster than linearly once n gets large.
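
The benchmark itself is nothing fancy; its shape is roughly the following (clsSimple stands in for any small, empty test class, and n is the instance count being timed):

Private Sub RunCollectionBenchmark(ByVal n As Long)
   Dim colTest As Collection
   Dim oItem As clsSimple
   Dim lLoop As Long
   Dim sngStart As Single
   Set colTest = New Collection
   sngStart = Timer
   For lLoop = 1 To n
      Set oItem = New clsSimple
      colTest.Add oItem
   Next
   Debug.Print n; "allocate (ms): "; (Timer - sngStart) * 1000
   sngStart = Timer
   Set colTest = Nothing   'all n members are released here
   Debug.Print n; "deallocate (ms): "; (Timer - sngStart) * 1000
End Sub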

Table 1. The initial Collection_Perf benchmark using a 133 MHz Pentium with 32MB of RAM running VB4 under Win95.

     n    allocate (ms)    deallocate (ms)
   100               21                 18
   200               41                 28
   400               83                 81
   800              164                206
  1600              419                599
  3200              798               1899
  6400             1604               7032
 12800             3134              30158
 25600             7354             133067

The same code run on a machine with a lot more horsepower, but running under NT 4.0 Workstation instead of Windows 95, shows similar behavior, except that the definition of "large" increases.

Table 2. The initial Collection_Perf benchmark using a 200 MHz Pentium with 64MB of RAM running VB4 under NT4 Workstation.

     n    allocate (ms)    deallocate (ms)
   100               20                  0
   200               40                 10
   400               80                 30
   800              190                 90
  1600              380                291
  3200              972               1071
  6400             2423               4507
 12800             7371              17245
 25600            26017              80666

The second benchmark (see Listing 1a) might be more representative of real-life object models with a parent class that owns several collections.

Listing 1a. A parent class that owns several collections.

Option Explicit
Public ClassName As String
Public RedChildren As Collection 'of clsRed instances
Public OrangeChildren As Collection 'of clsOrange insts
Public YellowChildren As Collection 'of clsYellow insts
Public GreenChildren As Collection 'of clsGreen insts
Public BlueChildren As Collection 'of clsBlue insts
Public IndigoChildren As Collection 'of clsIndigo insts
Public VioletChildren As Collection 'of clsViolet insts
Private Sub Class_Initialize()
   ClassName = "clsParent" 
   Set RedChildren = New Collection
   Set OrangeChildren = New Collection
   Set YellowChildren = New Collection
   Set GreenChildren = New Collection
   Set BlueChildren = New Collection
   Set IndigoChildren = New Collection
   Set VioletChildren = New Collection
End Sub
Private Sub Class_Terminate()
   Set RedChildren = Nothing
   Set OrangeChildren = Nothing
   Set YellowChildren = Nothing
   Set GreenChildren = Nothing
   Set BlueChildren = Nothing
   Set IndigoChildren = Nothing
   Set VioletChildren = Nothing
End Sub
Public Sub AddChildren(lNumber As Long)
   Dim lLoop As Long
   Dim oRed As clsRed
   Dim oOrange As clsOrange
   Dim oYellow As clsYellow
   Dim oGreen As clsGreen
   Dim oBlue As clsBlue
   Dim oIndigo As clsIndigo
   Dim oViolet As clsViolet   
   For lLoop = 0 To lNumber - 1
      Set oRed = New clsRed
      RedChildren.Add oRed      
      Set oOrange = New clsOrange
      OrangeChildren.Add oOrange      
      Set oYellow = New clsYellow
      YellowChildren.Add oYellow      
      Set oGreen = New clsGreen
      GreenChildren.Add oGreen     
      Set oBlue = New clsBlue
      BlueChildren.Add oBlue      
      Set oIndigo = New clsIndigo
      IndigoChildren.Add oIndigo      
      Set oViolet = New clsViolet
      VioletChildren.Add oViolet
   Next
End Sub
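
The driver for this benchmark isn't shown here, but it amounts to something like the following sketch, with clsParent being the class from Listing 1a:

Private Sub RunParentBenchmark(ByVal n As Long)
   Dim oParent As clsParent
   Dim sngStart As Single
   Set oParent = New clsParent
   sngStart = Timer
   oParent.AddChildren n   'n children in each of the seven collections
   Debug.Print n; "allocate (ms): "; (Timer - sngStart) * 1000
   sngStart = Timer
   Set oParent = Nothing   'Class_Terminate releases all seven collections
   Debug.Print n; "deallocate (ms): "; (Timer - sngStart) * 1000
End Sub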

You can see from the numbers in Table 3 that performance for a relatively large object model degrades very rapidly as n increases. In applications with large object models, especially if those models are repeatedly allocated and deallocated, a technique such as "just-in-time" instantiation might be necessary to achieve acceptable performance.

Table 3. The second benchmark (Listing 1a) using a 200 MHz Pentium with 64MB of RAM running VB4 under Windows NT 4 Workstation.

     n    allocate (ms)    deallocate (ms)
    10              110                 20
    20              421                160
    40             2093               1993
    80            11386              34189
   160           107875             652959
