Robert Coleridge
Microsoft Developer Network
June 1999
Summary: Demonstrates how to create a highly versatile cache object that works in any COM-enabled environment. (11 printed pages) Discusses reasons to create your own cache and provides implementation details, including plenty of Microsoft® Visual C++® code.
When I speak of a cache object I envision a container with particular properties that can hold, or cache, data. For our definition, a container is simply that: an object that is used to hold things. There does not have to be any organization or order to the things being contained; they are merely contained. Think of a bucket where you simply put things in and take things out. The items need not be stacked a certain way or ordered in any fashion; they are simply stored and contained for later retrieval. This idea of a container works great if the items are similar and can be retrieved in any order, but what if the items need to be retrieved in a particular order or identified individually? What we need is a container that can retrieve a particular object by some sort of reference or key and, though this is not strictly necessary, one that is very frugal in its resource usage. This is my picture of the Duwamish Books, Phase 4 cache object.
What we're going to examine in this article is the reason for writing a keyed cache, how to write one, and how such a cache fits in with the Duwamish sample in Phase 4 and with presentation caching in general. For the complete documentation of the cache interfaces, see "The Duwamish Books, Phase 4 Cache API Reference."
There are container, collection, and list classes in the Microsoft Foundation Classes (MFC) and collection objects in Microsoft Visual Basic®, so why write another similar class/object? As you can see from the following table, each of the three MFC implementation types (List, Array, and Map) has strengths and weaknesses. The List is fast to insert into but slow to search because it is not indexed; the Array is indexed but slow to insert into and to search; the Map, although fast to insert into and to search, uses a hashing algorithm to hash its keys. That algorithm may or may not perform well, depending on the assumptions it makes, such as English-only keys, alphabetic keys, and so on. Another disadvantage of using MFC for our container is not obvious here but has to do with the amount of resources required to use MFC. MFC is a great tool, but if we want something very lean we will have to look elsewhere.
Table 1. MFC Collection Shape Features
Shape | Ordered? | Indexed? | Insert an element | Search for specified element | Duplicate elements?
List  | Yes      | No       | Fast              | Slow                         | Yes
Array | Yes      | By int   | Slow              | Slow                         | Yes
Map   | No       | By key   | Fast              | Fast                         | No (keys), Yes (values)
What about the Visual Basic Collection class? Although it is a seemingly thin object, it requires the Visual Basic run time to work and therefore cannot be used outside the Visual Basic environment: it cannot be passed to anything other than a Visual Basic program or component.
Because MFC is bulky and Visual Basic collections are proprietary to Visual Basic, we chose to create our own container collection for use with the Duwamish Phase 4 Web application. Using the Visual C++ Active Template Library (ATL) allowed us to create a component that avoids bulky run times and class frameworks, and to implement the additional features we required while staying lean. These new features include keyed access, multiple indices (key and access time), event firing, automatic flushing of old content based on access time, and a dual threading model (both free and apartment) for use under Internet Information Services (IIS).
The Duwamish Books, Phase 4 Web store includes a catalog of books that users browse when shopping. When a user requests a page of details from the Web server, the Web server requests the raw details from the database server. The Web server receives the raw details and renders them (through the Workflow component) into the appropriate presentation data (for example, HTML, XML, and so on) for display purposes. This retrieval and rendering can be a very resource-expensive proposition. The cache object detailed in this article is used to minimize the hits to the database server and to lower the Web server CPU usage that rendering the details requires. The pseudocode for the usage of the cache object is as follows:
Obtain detail request from caller.
Are these details already in the cached data?
If not already cached then
    Retrieve raw data from data store.
    Render raw data into presentation data.
    Cache presentation data.
Else if cached then
    Retrieve relevant presentation data from cached data.
End if
Send presentation data back to caller.
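In C++, the same flow might look like the following sketch. The interface and helper names used here (ID4Cache, GetValue, SetValue, FetchRawDetails, RenderToHtml) are hypothetical stand-ins rather than the actual Cache API; see the API reference mentioned earlier for the real method signatures.

#include <windows.h>
#include <comdef.h>

// Hypothetical sketch of the lookup-render-cache flow; ID4Cache,
// GetValue, SetValue, FetchRawDetails, and RenderToHtml are assumed
// names, not the sample's actual Cache API.
_bstr_t GetDetailPage(ID4Cache *pCache, const _bstr_t &sKey)
{
    _variant_t vPage;

    // Already rendered and cached? Return it straight away.
    if (SUCCEEDED(pCache->GetValue(sKey, &vPage)))
        return _bstr_t(vPage);

    // Cache miss: one database round trip plus one rendering pass.
    _bstr_t sRaw  = FetchRawDetails(sKey);   // hit the data store
    _bstr_t sHtml = RenderToHtml(sRaw);      // Workflow rendering

    // Cache the presentation data so later requests skip both steps.
    pCache->SetValue(sKey, _variant_t(sHtml));
    return sHtml;
}

The point is simply that both the database round trip and the rendering pass disappear on a cache hit.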
Using the cache object in this way removes some of the burden from at least two processes in the chain of events: output rendering and data store retrieval. The cache object, although used in the workflow layer in Duwamish, could be used at any point in the architecture. For example, it could be used to cache the raw data retrieved from database requests, lightening the load on the data store itself. On our test platform we saw an order-of-magnitude increase in processing speed (from 2-3 pages per second without the cache object in place to 20-25 pages per second with it).
The following sections show the implementation details.
When creating the object I used the ATL Wizard "Both" option for the threading model. This means that the object supports both the apartment-threaded and free-threaded models. This allows for greater flexibility in usage, but it required that all of the critical sections within the code be written to handle the locking and unlocking of the data.
The object is coded, at present, as a multiply instantiated object. It can be made into a singleton object by simply setting the following constant, found in D4Cache.h, to TRUE:
//**************************************
//
// Set to TRUE for a Singleton object
//
#define SINGLETONOBJECT FALSE
//**************************************
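How the constant takes effect is not shown here, but a typical ATL arrangement, sketched below as an assumption rather than the sample's actual code, is to let it select the class factory declaration inside the CCache class:

#if SINGLETONOBJECT
    // Hand the same CCache instance to every client.
    DECLARE_CLASSFACTORY_SINGLETON(CCache)
#else
    // Give each CoCreateInstance call its own CCache instance.
    DECLARE_CLASSFACTORY()
#endif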
The object is coded, at present, to utilize critical sections. This can be turned off by simply setting the following constant, found in D4Cache.h, to TRUE:
//**************************************
//
// Set to TRUE to eliminate critical sections
//
#define NOCRITICALSECTIONS FALSE
//**************************************
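One straightforward way to honor such a constant, shown here as an assumed sketch rather than the sample's exact code, is to compile the locking macros used throughout the object down to nothing:

#if NOCRITICALSECTIONS
    // Single-threaded build: the lock macros compile away entirely.
    #define CACHEREADLOCK()
    #define CACHEREADUNLOCK()
    #define CACHEWRITELOCK()
    #define CACHEWRITEUNLOCK()
#else
    // Multithreaded build: the macros call the real locking routines
    // (CacheReadLock and friends are assumed names; see clock.h).
    #define CACHEREADLOCK()     CacheReadLock()
    #define CACHEREADUNLOCK()   CacheReadUnlock()
    #define CACHEWRITELOCK()    CacheWriteLock()
    #define CACHEWRITEUNLOCK()  CacheWriteUnlock()
#endif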
Critical sections are great but can create processing bottlenecks if not implemented and monitored carefully. The cache object implements a simple read/write locking model: multiple threads can read the cache concurrently, while inserts and updates block synchronously to prevent collisions.
The read lock allows multiple threads to access code and data by first checking for a write lock; if it is not blocked by the write lock, the read lock simply increments a reference count upon entry to the critical code and decrements that count upon exit. The following code shows how the read lock is used:
STDMETHODIMP CCache::get_EventsEnabled(BOOL *pVal)
{
    // return error if receiving pointer is invalid
    if (pVal == NULL)
        return E_POINTER;

    // Lock down critical section
    CACHEREADLOCK();

    *pVal = m_bEventsEnabled;

    // Unlock critical section
    CACHEREADUNLOCK();

    return S_OK;
}
Notice how the relevant code is bracketed by the read locking mechanism. The read lock is merely a reference-counting mechanism consulted by the write locking mechanism. The write lock waits for the read lock reference count to drop to zero; that is, once nobody is reading the data, it invokes an OS critical section lock, preventing anyone from reading the data while it changes. Once the data has changed, the OS critical section is released and any outstanding reads can proceed.
The following is an example of how the write lock is used:
STDMETHODIMP CCache::Flush(BOOL *pVal)
{
    // Lock down critical section
    CACHEWRITELOCK();

    if (InternalFlush() == -1)
        *pVal = FALSE;
    else
        *pVal = TRUE;

    // Unlock critical section
    CACHEWRITEUNLOCK();

    return S_OK;
}
The code that implements the locking mechanism can be found in the clock.cpp and clock.h files, with most of the code appearing in the clock.h file as inline code.
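As an illustration of the model just described, the following is a minimal sketch, with assumed names rather than the actual clock.h source, of how such a read/write lock can be built from a Win32 critical section and an interlocked reference count:

#include <windows.h>

static CRITICAL_SECTION g_csWrite;   // OS lock taken by writers
static LONG g_lReadCount = 0;        // threads currently reading
// (InitializeCriticalSection(&g_csWrite) must be called once at startup)

inline void CacheReadLock()
{
    // Passing through the critical section makes a new reader block
    // while a write is in progress; then the reader registers itself.
    EnterCriticalSection(&g_csWrite);
    InterlockedIncrement(&g_lReadCount);
    LeaveCriticalSection(&g_csWrite);
}

inline void CacheReadUnlock()
{
    InterlockedDecrement(&g_lReadCount);
}

inline void CacheWriteLock()
{
    // Keep new readers out, then wait for in-flight reads to drain.
    EnterCriticalSection(&g_csWrite);
    while (g_lReadCount > 0)
        Sleep(0);   // yield until the read count reaches zero
}

inline void CacheWriteUnlock()
{
    LeaveCriticalSection(&g_csWrite);
}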
The cache object is designed to monitor the age of the cached items and to flush itself of any old items when new items are added. An old item is determined by the length of time it has been cached; this time value is set via the FlushDeltaTime property. The automatic flushing can be turned on and off with the FlushEnabled property. The following code is an abbreviated example of the internal flushing code:
long CCache::InternalFlush()
{
    long lFlushFrom;
    long n;
    long lFlushCount;

    // if not flushing then simply return
    if (!m_FlushEnabled)
    {
        return -1;
    }

    // if flush time is 0 then simply return
    if (m_FlushDeltaTime == 0)
    {
        return -1;
    }

    // if not time for a flush then return
    if ((long)GetTickCount() - m_LastFlushAt < m_FlushDeltaTime)
    {
        return -1;
    }

    // check if flush required
    // return variable will contain index to flush from to end of index
    lFlushFrom = FlushRequired();
    lFlushCount = 0;

    // if required then flush
    if (lFlushFrom != -1)
    {
        // raise event to signal user
        if (m_bEventsEnabled)
            this->Fire_CacheFlushing();

        for (n = m_ElementCount - 1; n >= lFlushFrom; n--)
        {
            lFlushCount++;
            EraseElement(m_pIndex_Time[n]);
        }

        // store time flushed
        m_LastFlushAt = GetTickCount();

        // raise event to signal user
        if (m_bEventsEnabled)
            this->Fire_CacheFlushed();
    }

    return lFlushCount;
}
Note how the code checks the delta between the previous flush and the present. If the required time has not passed, the routine simply exits, which cuts down on the number of flush checks performed.
The indices are maintained in ascending order at all times by performing an insertion sort when an item is added: the new item's index values are simply inserted into the correct locations. This allows for a rapid binary search through the indices. The code to do this is shown here:
void CCache::InsertAndSortLists(long lIndex)
{
    long temp;

    // find insertion point in the key index and insert data
    temp = FindKeyInsertionPoint(m_pItem_Key[lIndex]);
    if (temp == -1)
        m_pIndex_Key[m_ElementCount] = lIndex;
    else
    {
        ::MoveMemory((void *)&m_pIndex_Key[temp + 1],
                     (void *)&m_pIndex_Key[temp],
                     (m_ElementCount - temp) * sizeof(long));
        m_pIndex_Key[temp] = lIndex;
    }

    // find insertion point in the time index and insert data
    temp = FindTimeInsertionPoint(m_pItem_Time[lIndex]);
    if (temp == -1)
        m_pIndex_Time[m_ElementCount] = lIndex;
    else
    {
        ::MoveMemory((void *)&m_pIndex_Time[temp + 1],
                     (void *)&m_pIndex_Time[temp],
                     (m_ElementCount - temp) * sizeof(long));
        m_pIndex_Time[temp] = lIndex;
    }

    return;
}
The FindXXXInsertionPoint routines are simplified binary search routines that return the index of the next greater item. Once this point is found, because the index arrays live in contiguous memory, everything from the returned index onward is shifted down one slot and the new index is simply inserted. This technique allows for an extremely fast insertion sort. The routines themselves are not listed here; a minimal sketch of one appears below, and the next section shows similar binary search code in full.
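This sketch assumes FindKeyInsertionPoint returns -1 when the new key belongs at the end of the index, matching how InsertAndSortLists uses it above; it is an illustration, not the sample's actual code.

// Assumed sketch of FindKeyInsertionPoint: a binary search that
// returns the index of the first element greater than the new key,
// or -1 if the key belongs at the end of the index.
long CCache::FindKeyInsertionPoint(BSTR sKey)
{
    long lLow = 0;
    long lHigh = m_ElementCount - 1;

    // empty index, or new key greater than everything: append at end
    if (m_ElementCount == 0 ||
        wcscmp(sKey, m_pItem_Key[m_pIndex_Key[lHigh]]) > 0)
        return -1;

    // chop toward the first element that compares greater
    while (lLow < lHigh)
    {
        long lMid = (lLow + lHigh) / 2;
        if (wcscmp(sKey, m_pItem_Key[m_pIndex_Key[lMid]]) < 0)
            lHigh = lMid;
        else
            lLow = lMid + 1;
    }
    return lLow;
}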
The search technique used to rapidly look up a key is a binary search, made possible by the presorted nature of the indices. This code, and code like it, is one of the key factors in the performance of the cache object: because a lookup requires only about log2(n) comparisons, where n is the number of elements in the array, the execution of this code is a minimal performance hit.
long CCache::KeyToIndex(BSTR sKey)
{
    long lIndex;
    long lLow = 0;
    long lHigh = m_ElementCount - 1;
    long lMid;
    long lRemainder = m_ElementCount;
    unsigned long lHalf;
    int iCmpResult;

    // Lock down critical section
    CACHEREADLOCK();

    // initialize to key not found to start
    lIndex = -1;

    // search for key only if data found
    if (m_ElementCount)
    {
        // if key less than data then exit
        if (wcscmp(sKey, m_pItem_Key[m_pIndex_Key[0]]) < 0)
            goto Bypass;

        // if key greater than data then exit
        if (wcscmp(sKey, m_pItem_Key[m_pIndex_Key[m_ElementCount - 1]]) > 0)
            goto Bypass;

        // chop to the middle of data and find matching key
        while (lLow <= lHigh && lIndex == -1)
        {
            if (lHalf = lRemainder / 2)
            {
                lMid = lLow + (lRemainder & 1 ? lHalf : (lHalf - 1));
                iCmpResult = wcscmp(sKey, m_pItem_Key[m_pIndex_Key[lMid]]);
                if (iCmpResult == 0) // key found
                {
                    lIndex = m_pIndex_Key[lMid];
                    break;
                }
                else if (iCmpResult < 0) // key in left half of data
                {
                    lHigh = lMid - 1;
                    lRemainder = lRemainder & 1 ? lHalf : lHalf - 1;
                }
                else // key in right half of data
                {
                    lLow = lMid + 1;
                    lRemainder = lHalf;
                }
            }
            else if (lRemainder)
            {
                iCmpResult = wcscmp(sKey, m_pItem_Key[m_pIndex_Key[lLow]]);
                lIndex = iCmpResult ? -1 : m_pIndex_Key[lLow];
                break;
            }
            else
            {
                break;
            }
        }
    }

Bypass:
    // Unlock critical section
    CACHEREADUNLOCK();

    // return index
    return lIndex;
}
Because of the various indexing requirements and the overarching need for performance, the cache object has a generalized index-creation mechanism whereby an index is created via a method call and the sorted data can then be enumerated. This relieves the critical code paths (insertion and lookup) of the overhead of superfluous indexing, thus maximizing performance where it is needed. The object method that does the sorting is listed here. Note the generalized use of the IndexSort routine, which simply accepts a pointer to an array and sorts it accordingly.
STDMETHODIMP CCache::CreateGeneralIndex(SORTON SortOn, BOOL *pVal)
{
    long n;

    // Lock down critical section
    CACHEWRITELOCK();

    this->Fire_CacheSorting();

    for (n = 0; n < m_ElementCount; n++)
        m_pIndex_General[n] = n;

    switch (SortOn)
    {
    case icSORT_HITCOUNT:
        IndexSort(m_pItem_Hits, m_pIndex_General, m_ElementCount,
                  IndexSort_CompHits);
        break;
    case icSORT_TIME:
        IndexSort(m_pItem_Time, m_pIndex_General, m_ElementCount,
                  IndexSort_CompTime);
        break;
    }

    this->Fire_CacheSorted();

    // Unlock critical section
    CACHEWRITEUNLOCK();

    return S_OK;
}
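The comparison callbacks passed to IndexSort are not listed in this article. A minimal sketch of one, with an assumed signature, might look like the following:

// Assumed sketch of a comparison callback such as IndexSort_CompHits:
// it compares two items through the array being sorted on, so the
// index entries are reordered while the item data itself stays put.
int IndexSort_CompHits(long *pItems, long lLeft, long lRight)
{
    if (pItems[lLeft] < pItems[lRight]) return -1;
    if (pItems[lLeft] > pItems[lRight]) return 1;
    return 0;
}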
A hit count is maintained for each item for enumeration and similar purposes. This hit count is used by the general index generation discussed in the previous section. The reference-counting code simply increments an associated array element for the specified item whenever that item is retrieved.
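In other words, the retrieval path contains something like the following sketch (an assumption about placement, using member names from the earlier listings):

// Assumed sketch: bump an item's hit count on retrieval.
long lIndex = KeyToIndex(sKey);
if (lIndex != -1)
    m_pItem_Hits[lIndex]++;   // one more hit for this item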
There are several indices maintained during the life of the cache object, each kept in ascending order. The caller can enumerate the cached data in one of the following orders:

- Nonsorted (the order in which the items were cached)
- Ascending or descending key order
- Ascending or descending time order

When the data is enumerated, the caller also has the choice of enumerating either the key or the value of each item.
The following gives several examples of how to enumerate through the cached data:
Dim oCache As MSDN_DUWAMISH4Lib.D4Cache
Dim v As Variant

Set oCache = New MSDN_DUWAMISH4Lib.D4Cache
'...

'enumerate unsorted keys
oCache.EnumType = Key
oCache.EnumSorted = NONSORTED
For Each v In oCache
    Debug.Print v
Next v

'enumerate unsorted values
oCache.EnumType = Value
oCache.EnumSorted = NONSORTED
For Each v In oCache
    Debug.Print v
Next v

'enumerate values sorted by ascending key
oCache.EnumType = Value
oCache.EnumSorted = ASCENDING_KEY
For Each v In oCache
    Debug.Print v
Next v

'enumerate values sorted by descending key
oCache.EnumSorted = DESCENDING_KEY
For Each v In oCache
    Debug.Print v
Next v
If an ascending enumeration is desired, the data is presented by simply walking the relevant index from beginning to end; if a descending enumeration is desired, the same index is simply walked in reverse order.
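For example, a descending-key walk might look like the following sketch (the method itself is hypothetical; the member names are those used in the earlier listings):

// Hypothetical helper showing the reverse walk behind a
// descending-key enumeration; no extra index is required.
void CCache::WalkKeysDescending()
{
    for (long n = m_ElementCount - 1; n >= 0; n--)
    {
        long lItem = m_pIndex_Key[n];     // slot in the item arrays
        BSTR sKey = m_pItem_Key[lItem];   // keys emerge in descending order
        // hand sKey (or the matching value) to the enumerator here
    }
}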
This article has shown you the features of the Duwamish Books, Phase 4 caching object. I would encourage you to examine the code base to see how to create a highly versatile cache object that works in any COM-enabled environment.
Comments? We welcome your feedback at duwamish@microsoft.com.