Robert Coleridge
Microsoft Developer Network
June 1999
Summary: Demonstrates how to create a highly versatile cache object that works in any COM-enabled environment. (11 printed pages) Discusses reasons to create your own cache and provides implementation details, including plenty of Microsoft® Visual C++® code.
When I speak of a cache object I envision a container with particular properties that can hold, or cache, data. For our definition, a container is simply that: an object that is used to hold things. There does not have to be any organization or order to the things being contained; they are merely contained. Think of a bucket where you simply put things in and take things out. The items need not be stacked a certain way or ordered in any fashion; they are simply stored and contained for later retrieval. This idea of a container works great if the items are similar and can be retrieved in any order, but what if the items need to be retrieved in a particular order or identified individually? What we need is a container that can retrieve a particular object by some sort of reference or key and, though this is not strictly necessary, one that is very frugal in its resource usage. This is my picture of the Duwamish Books, Phase 4 cache object.
What we're going to examine in this article is the reason for writing a keyed cache, how to write one, and how such a cache fits in with the Duwamish sample in Phase 4 and with presentation caching in general. For the complete documentation of the cache interfaces, see "The Duwamish Books, Phase 4 Cache API Reference."
There are container, collection, and list classes in the Microsoft Foundation Classes (MFC) and collection objects in Microsoft Visual Basic®, so why write another similar class/object? As you can see from the following table, each of the three MFC implementation types (List, Array, and Map) has strengths and weaknesses. The List is fast to insert into but slow to search because it is not indexed; the Array is indexed but slow to insert into and to search; the Map, although fast to insert into and to search, uses a hashing algorithm to hash its keys. That algorithm may or may not perform well, depending on the assumptions it makes, such as English-only keys, alphabetic keys, and so on. Another disadvantage of using MFC for our container is not obvious here but has to do with the amount of resources required to use MFC. MFC is a great tool, but if we want something very lean we will have to look elsewhere.
Table 1. MFC Collection Shape Features
Shape | Ordered? | Indexed? | Insert an element | Search for specified element | Duplicate elements?
List  | Yes      | No       | Fast              | Slow                         | Yes
Array | Yes      | By int   | Slow              | Slow                         | Yes
Map   | No       | By key   | Fast              | Fast                         | No (keys), Yes (values)
What about the Visual Basic Collection class? Although it is a seemingly thin object, it requires the Visual Basic run time to work and therefore cannot be used outside the Visual Basic environment: it cannot be passed to anything other than a Visual Basic program or component.
Because MFC is bulky and Visual Basic collections are proprietary to Visual Basic, we chose to create our own container collection for use with the Duwamish Phase 4 Web application. Using the Visual C++ Active Template Library (ATL) allowed us to create a component that avoids bulky run times and class frameworks, and to implement the additional features we required while staying lean. These new features include keyed access, multiple indices (key and access time), event firing, automatic flushing of old content based on access time, and a dual threading model (both free and apartment) for use under Internet Information Services (IIS).
The Duwamish Books, Phase 4 Web store includes a catalog of books that users browse when shopping. When a user requests a page of details from the Web server, the Web server requests the raw details from the database server. The Web server receives the raw details and renders them (through the Workflow component) into the appropriate presentation data (for example, HTML, XML, and so on) for display purposes. This retrieval and rendering can be a very resource-expensive proposition. The cache object detailed in this article is used to minimize the hits to the database server and to lower the Web server CPU usage that rendering the details requires. The pseudocode for the usage of the cache object is as follows:
Obtain detail request from caller.
Are these details already in the cached data?
If not already cached then
    Retrieve raw data from data store.
    Render raw data into presentation data.
    Cache presentation data.
Else if cached then
    Retrieve relevant presentation data from cached data.
End if
Send presentation data back to caller.
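In C++, the same flow might look like the following sketch. The interface and helper names used here (ID4Cache, GetValue, SetValue, FetchRawDetails, RenderToHtml) are hypothetical stand-ins rather than the actual Cache API; see the API reference mentioned earlier for the real method signatures.

#include <windows.h>
#include <comdef.h>

// Hypothetical sketch of the lookup-render-cache flow; ID4Cache,
// GetValue, SetValue, FetchRawDetails, and RenderToHtml are assumed
// names, not the sample's actual Cache API.
_bstr_t GetDetailPage(ID4Cache *pCache, const _bstr_t &sKey)
{
    _variant_t vPage;

    // Already rendered and cached? Return it straight away.
    if (SUCCEEDED(pCache->GetValue(sKey, &vPage)))
        return _bstr_t(vPage);

    // Cache miss: one database round trip plus one rendering pass.
    _bstr_t sRaw  = FetchRawDetails(sKey);   // hit the data store
    _bstr_t sHtml = RenderToHtml(sRaw);      // Workflow rendering

    // Cache the presentation data so later requests skip both steps.
    pCache->SetValue(sKey, _variant_t(sHtml));
    return sHtml;
}

The point is simply that both the database round trip and the rendering pass disappear on a cache hit.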
Using the cache object in this way removes some of the burden from at least two processes in the chain of events: output rendering and data store retrieval. The cache object, although used in the workflow layer in Duwamish, could be used at any point in the architecture. For example, it could be used to cache the raw data retrieved from database requests, lightening the load on the data store itself. On our test platform we saw an order-of-magnitude increase in processing speed (from 2-3 pages per second without the cache object in place to 20-25 pages per second with it).
The following sections show the implementation details.
When creating the object I used the ATL Wizard "Both" option for the threading model. This means that the object supports both the apartment-threaded and free-threaded models. This allows for greater flexibility in usage, but it required that all of the critical sections within the code be written to handle the locking and unlocking of the data.
The object is coded, at present, as a multiply instantiated object. It can be made into a singleton object by simply setting the following constant, found in D4Cache.h, to TRUE:
//**************************************
//
// Set to TRUE for a Singleton object
//
#define SINGLETONOBJECT FALSE
//**************************************
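How the constant takes effect is not shown here, but a typical ATL arrangement, sketched below as an assumption rather than the sample's actual code, is to let it select the class factory declaration inside the CCache class:

#if SINGLETONOBJECT
    // Hand the same CCache instance to every client.
    DECLARE_CLASSFACTORY_SINGLETON(CCache)
#else
    // Give each CoCreateInstance call its own CCache instance.
    DECLARE_CLASSFACTORY()
#endif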
The object is coded, at present, to utilize critical sections. This can be turned off by simply setting the following constant, found in D4Cache.h, to TRUE:
//**************************************
//
// Set to TRUE to eliminate critical sections
//
#define NOCRITICALSECTIONS FALSE
//**************************************
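One straightforward way to honor such a constant, shown here as an assumed sketch rather than the sample's exact code, is to compile the locking macros used throughout the object down to nothing:

#if NOCRITICALSECTIONS
    // Single-threaded build: the lock macros compile away entirely.
    #define CACHEREADLOCK()
    #define CACHEREADUNLOCK()
    #define CACHEWRITELOCK()
    #define CACHEWRITEUNLOCK()
#else
    // Multithreaded build: the macros call the real locking routines
    // (CacheReadLock and friends are assumed names; see clock.h).
    #define CACHEREADLOCK()     CacheReadLock()
    #define CACHEREADUNLOCK()   CacheReadUnlock()
    #define CACHEWRITELOCK()    CacheWriteLock()
    #define CACHEWRITEUNLOCK()  CacheWriteUnlock()
#endif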
Critical sections are great but can create processing bottlenecks if not implemented and monitored carefully. The cache object implements a simple read/write locking model: multiple threads can read the cache concurrently, while inserts and updates block synchronously to prevent collisions.
The read lock allows multiple threads to access code and data by first checking for a write lock; if it is not blocked by the write lock, the read lock simply increments a reference count upon entry to the critical code and decrements that count upon exit. The following code shows how the read lock is used:
STDMETHODIMP CCache::get_EventsEnabled(BOOL *pVal)
{
    // return error if receiving pointer is invalid
    if (pVal == NULL)
        return E_POINTER;

    // Lock down critical section
    CACHEREADLOCK();

    *pVal = m_bEventsEnabled;

    // Unlock critical section
    CACHEREADUNLOCK();

    return S_OK;
}
Notice how the relevant code is bracketed by the read locking mechanism. The read lock is merely a reference-counting mechanism consulted by the write locking mechanism. The write lock waits for the read lock reference count to drop to zero; that is, once nobody is reading the data, it invokes an OS critical section lock, preventing anyone from reading the data while it changes. Once the data has changed, the OS critical section is released and any outstanding reads can proceed.
The following is an example of how the write lock is used:
STDMETHODIMP CCache::Flush(BOOL *pVal)
{
    // Lock down critical section
    CACHEWRITELOCK();

    if (InternalFlush() == -1)
        *pVal = FALSE;
    else
        *pVal = TRUE;

    // Unlock critical section
    CACHEWRITEUNLOCK();

    return S_OK;
}
The code that implements the locking mechanism can be found in the clock.cpp and clock.h files, with most of the code appearing in the clock.h file as inline code.
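As an illustration of the model just described, the following is a minimal sketch, with assumed names rather than the actual clock.h source, of how such a read/write lock can be built from a Win32 critical section and an interlocked reference count:

#include <windows.h>

static CRITICAL_SECTION g_csWrite;   // OS lock taken by writers
static LONG g_lReadCount = 0;        // threads currently reading
// (InitializeCriticalSection(&g_csWrite) must be called once at startup)

inline void CacheReadLock()
{
    // Passing through the critical section makes a new reader block
    // while a write is in progress; then the reader registers itself.
    EnterCriticalSection(&g_csWrite);
    InterlockedIncrement(&g_lReadCount);
    LeaveCriticalSection(&g_csWrite);
}

inline void CacheReadUnlock()
{
    InterlockedDecrement(&g_lReadCount);
}

inline void CacheWriteLock()
{
    // Keep new readers out, then wait for in-flight reads to drain.
    EnterCriticalSection(&g_csWrite);
    while (g_lReadCount > 0)
        Sleep(0);   // yield until the read count reaches zero
}

inline void CacheWriteUnlock()
{
    LeaveCriticalSection(&g_csWrite);
}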
The cache object is designed to monitor the age of the cached items and to flush itself of any old items when new items are added. An old item is determined by the length of time it has been cached; this time value is set via the FlushDeltaTime property. The automatic flushing can be turned on and off with the FlushEnabled property. The following code is an abbreviated example of the internal flushing code:
long CCache::InternalFlush()
{
    long lFlushFrom;
    long n;
    long lFlushCount;

    // if not flushing then simply return
    if (!m_FlushEnabled)
    {
        return -1;
    }

    // if flush time is 0 then simply return
    if (m_FlushDeltaTime == 0)
    {
        return -1;
    }

    // if not time for a flush then return
    if ((long)GetTickCount() - m_LastFlushAt < m_FlushDeltaTime)
    {
        return -1;
    }

    // check if flush required
    // return variable will contain index to flush from to end of index
    lFlushFrom = FlushRequired();
    lFlushCount = 0;

    // if required then flush
    if (lFlushFrom != -1)
    {
        // raise event to signal user
        if (m_bEventsEnabled)
            this->Fire_CacheFlushing();

        for (n = m_ElementCount - 1; n >= lFlushFrom; n--)
        {
            lFlushCount++;
            EraseElement(m_pIndex_Time[n]);
        }

        // store time flushed
        m_LastFlushAt = GetTickCount();

        // raise event to signal user
        if (m_bEventsEnabled)
            this->Fire_CacheFlushed();
    }

    return lFlushCount;
}
Note how the code checks the delta between the previous flush and the present. If the required time has not passed, the routine simply exits, which cuts down on the number of flush checks performed.
The indices are maintained in ascending order at all times by performing an insertion sort when an item is added: the new item's index values are simply inserted into the correct locations. This allows for a rapid binary search through the indices. The code to do this is shown here:
void CCache::InsertAndSortLists(long lIndex)
{
    long temp;

    // find insertion point in the key index and insert data
    temp = FindKeyInsertionPoint(m_pItem_Key[lIndex]);
    if (temp == -1)
        m_pIndex_Key[m_ElementCount] = lIndex;
    else
    {
        ::MoveMemory((void *)&m_pIndex_Key[temp + 1],
                     (void *)&m_pIndex_Key[temp],
                     (m_ElementCount - temp) * sizeof(long));
        m_pIndex_Key[temp] = lIndex;
    }

    // find insertion point in the time index and insert data
    temp = FindTimeInsertionPoint(m_pItem_Time[lIndex]);
    if (temp == -1)
        m_pIndex_Time[m_ElementCount] = lIndex;
    else
    {
        ::MoveMemory((void *)&m_pIndex_Time[temp + 1],
                     (void *)&m_pIndex_Time[temp],
                     (m_ElementCount - temp) * sizeof(long));
        m_pIndex_Time[temp] = lIndex;
    }

    return;
}
The FindXXXInsertionPoint routines are simplified binary search routines that return the index of the next greater item. Once this point is found, because the index arrays live in contiguous memory, everything from the returned index onward is shifted down one slot and the new index is simply inserted. This technique allows for an extremely fast insertion sort. The routines themselves are not listed here; a minimal sketch of one appears below, and the next section shows similar binary search code in full.
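This sketch assumes FindKeyInsertionPoint returns -1 when the new key belongs at the end of the index, matching how InsertAndSortLists uses it above; it is an illustration, not the sample's actual code.

// Assumed sketch of FindKeyInsertionPoint: a binary search that
// returns the index of the first element greater than the new key,
// or -1 if the key belongs at the end of the index.
long CCache::FindKeyInsertionPoint(BSTR sKey)
{
    long lLow = 0;
    long lHigh = m_ElementCount - 1;

    // empty index, or new key greater than everything: append at end
    if (m_ElementCount == 0 ||
        wcscmp(sKey, m_pItem_Key[m_pIndex_Key[lHigh]]) > 0)
        return -1;

    // chop toward the first element that compares greater
    while (lLow < lHigh)
    {
        long lMid = (lLow + lHigh) / 2;
        if (wcscmp(sKey, m_pItem_Key[m_pIndex_Key[lMid]]) < 0)
            lHigh = lMid;
        else
            lLow = lMid + 1;
    }
    return lLow;
}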
The search technique used to rapidly look up a key is a binary search, made possible by the presorted nature of the indices. This code, and code like it, is one of the key factors in the performance of the cache object: because a lookup requires only about log2(n) comparisons, where n is the number of elements in the array, the execution of this code is a minimal performance hit.
long CCache::KeyToIndex(BSTR sKey)
{
    long lIndex;
    long lLow = 0;
    long lHigh = m_ElementCount - 1;
    long lMid;
    long lRemainder = m_ElementCount;
    unsigned long lHalf;
    int iCmpResult;

    // Lock down critical section
    CACHEREADLOCK();

    // initialize to key not found to start
    lIndex = -1;

    // search for key only if data found
    if (m_ElementCount)
    {
        // if key less than data then exit
        if (wcscmp(sKey, m_pItem_Key[m_pIndex_Key[0]]) < 0)
            goto Bypass;

        // if key greater than data then exit
        if (wcscmp(sKey, m_pItem_Key[m_pIndex_Key[m_ElementCount - 1]]) > 0)
            goto Bypass;

        // chop to the middle of data and find matching key
        while (lLow <= lHigh && lIndex == -1)
        {
            if (lHalf = lRemainder / 2)
            {
                lMid = lLow + (lRemainder & 1 ? lHalf : (lHalf - 1));
                iCmpResult = wcscmp(sKey, m_pItem_Key[m_pIndex_Key[lMid]]);
                if (iCmpResult == 0) // key found
                {
                    lIndex = m_pIndex_Key[lMid];
                    break;
                }
                else if (iCmpResult < 0) // key in left half of data
                {
                    lHigh = lMid - 1;
                    lRemainder = lRemainder & 1 ? lHalf : lHalf - 1;
                }
                else // key in right half of data
                {
                    lLow = lMid + 1;
                    lRemainder = lHalf;
                }
            }
            else if (lRemainder)
            {
                iCmpResult = wcscmp(sKey, m_pItem_Key[m_pIndex_Key[lLow]]);
                lIndex = iCmpResult ? -1 : m_pIndex_Key[lLow];
                break;
            }
            else
            {
                break;
            }
        }
    }

Bypass:
    // Unlock critical section
    CACHEREADUNLOCK();

    // return index
    return lIndex;
}
Because of the various indexing requirements and the overarching need for performance, the cache object has a generalized index-creation mechanism whereby an index is created via a method call and the sorted data can then be enumerated. This relieves the critical code paths (insertion and lookup) of the overhead of superfluous indexing, thus maximizing performance where it is needed. The object method that does the sorting is listed here. Note the generalized use of the IndexSort routine, which simply accepts a pointer to an array and sorts it accordingly.
STDMETHODIMP CCache::CreateGeneralIndex(SORTON SortOn, BOOL *pVal)
{
    long n;

    // Lock down critical section
    CACHEWRITELOCK();

    this->Fire_CacheSorting();

    for (n = 0; n < m_ElementCount; n++)
        m_pIndex_General[n] = n;

    switch (SortOn)
    {
    case icSORT_HITCOUNT:
        IndexSort(m_pItem_Hits, m_pIndex_General, m_ElementCount,
                  IndexSort_CompHits);
        break;
    case icSORT_TIME:
        IndexSort(m_pItem_Time, m_pIndex_General, m_ElementCount,
                  IndexSort_CompTime);
        break;
    }

    this->Fire_CacheSorted();

    // Unlock critical section
    CACHEWRITEUNLOCK();

    return S_OK;
}
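The comparison callbacks passed to IndexSort are not listed in this article. A minimal sketch of one, with an assumed signature, might look like the following:

// Assumed sketch of a comparison callback such as IndexSort_CompHits:
// it compares two items through the array being sorted on, so the
// index entries are reordered while the item data itself stays put.
int IndexSort_CompHits(long *pItems, long lLeft, long lRight)
{
    if (pItems[lLeft] < pItems[lRight]) return -1;
    if (pItems[lLeft] > pItems[lRight]) return 1;
    return 0;
}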
A hit count is maintained for each item for enumeration and similar purposes. This hit count is used by the general index generation discussed in the previous section. The reference-counting code simply increments an associated array element for the specified item whenever that item is retrieved.
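In other words, the retrieval path contains something like the following sketch (an assumption about placement, using member names from the earlier listings):

// Assumed sketch: bump an item's hit count on retrieval.
long lIndex = KeyToIndex(sKey);
if (lIndex != -1)
    m_pItem_Hits[lIndex]++;   // one more hit for this item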
There are several indices maintained during the life of the cache object, each kept in ascending order. The caller can enumerate the cached data in one of the following orders:

- Nonsorted (the order in which the items were cached)
- Ascending or descending key order
- Ascending or descending time order

When the data is enumerated, the caller also has the choice of enumerating either the key or the value of each item.
The following gives several examples of how to enumerate through the cached data:
Dim oCache As MSDN_DUWAMISH4Lib.D4Cache
Dim v As Variant

Set oCache = New MSDN_DUWAMISH4Lib.D4Cache
'...

'enumerate unsorted keys
oCache.EnumType = Key
oCache.EnumSorted = NONSORTED
For Each v In oCache
    Debug.Print v
Next v

'enumerate unsorted values
oCache.EnumType = Value
oCache.EnumSorted = NONSORTED
For Each v In oCache
    Debug.Print v
Next v

'enumerate values sorted by ascending key
oCache.EnumType = Value
oCache.EnumSorted = ASCENDING_KEY
For Each v In oCache
    Debug.Print v
Next v

'enumerate values sorted by descending key
oCache.EnumSorted = DESCENDING_KEY
For Each v In oCache
    Debug.Print v
Next v
If an ascending enumeration is desired, the data is presented by simply walking the relevant index from beginning to end; if a descending enumeration is desired, the same index is simply walked in reverse order.
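For example, a descending-key walk might look like the following sketch (the method itself is hypothetical; the member names are those used in the earlier listings):

// Hypothetical helper showing the reverse walk behind a
// descending-key enumeration; no extra index is required.
void CCache::WalkKeysDescending()
{
    for (long n = m_ElementCount - 1; n >= 0; n--)
    {
        long lItem = m_pIndex_Key[n];     // slot in the item arrays
        BSTR sKey = m_pItem_Key[lItem];   // keys emerge in descending order
        // hand sKey (or the matching value) to the enumerator here
    }
}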
This article has shown you the features of the Duwamish Books, Phase 4 caching object. I would encourage you to examine the code base to see how to create a highly versatile cache object that works in any COM-enabled environment.
Comments? We welcome your feedback at duwamish@microsoft.com.