Reference Counting

In short, reference counting is the way an object controls its own lifetime. Reference counting works on the same principles as memory management. Just as a component frees memory when the memory is no longer in use, objects are destroyed when they are no longer being used. The difference is that destroying an object is not a passive operation: instead of freeing the object directly, a client must tell the object to free itself. The overall difficulty of making this work is that COM objects are dynamically allocated from within the component, yet clients must be allowed to decide when the object is no longer needed. Furthermore, objects can be simultaneously connected to multiple clients, even in different processes, and the object must wait for all clients to release their hold on the object before it can destroy itself.

Hence the reference count, which is a ULONG variable, usually called m_cRef inside C++ implementations. This reference count maintains the number of independent interface pointers that exist to any of the object's interfaces. If there is only a single client of an object and that client has two different interface pointers on the object, the reference count will be 2. If there are three clients, each with a single interface pointer on the object, the reference count will be 3. While the overall reference count concerns the object, that object might maintain individual interface reference counts so that it creates interfaces only when they are requested from QueryInterface and destroys them when their individual reference counts go to 0, even though the object itself is still around.
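To make the counting concrete, here is a minimal sketch of the object's side of the bargain. It is not taken from any real component: CSomeObject, LiveCount, and PeakCountWithTwoPointers are hypothetical names, ULONG is declared locally so the fragment compiles without the Windows headers, and a real COM object would implement IUnknown rather than stand alone.

```cpp
// Portable stand-in for the Windows ULONG type (an assumption of this sketch).
typedef unsigned long ULONG;

// Hypothetical object. A real COM object would implement IUnknown.
class CSomeObject
{
public:
    CSomeObject() : m_cRef(0) { ++s_cLive; }

    ULONG AddRef() { return ++m_cRef; }

    ULONG Release()
    {
        ULONG cRef = --m_cRef;   // Copy first: m_cRef is gone after delete.
        if (0 == cRef)
            delete this;         // The object destroys itself.
        return cRef;
    }

    // Number of CSomeObject instances currently alive (for the demo only).
    static int LiveCount() { return s_cLive; }

private:
    ~CSomeObject() { --s_cLive; }   // Private: only Release may destroy.

    ULONG      m_cRef;
    static int s_cLive;
};

int CSomeObject::s_cLive = 0;

// One client holding two pointers on one object: the count peaks at 2.
ULONG PeakCountWithTwoPointers()
{
    CSomeObject *p1 = new CSomeObject;
    p1->AddRef();                  // count = 1
    CSomeObject *p2 = p1;
    ULONG cPeak = p2->AddRef();    // count = 2
    p2->Release();                 // count = 1
    p1->Release();                 // count = 0, object deletes itself
    return cPeak;
}
```

Making the destructor private is one way to ensure that nobody but Release can destroy the object. Whether a given object does this is a design choice, not a requirement.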

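The concepts governing reference counting can be distilled into two fundamental rules:

Any function that returns a pointer to an interface must call AddRef through that pointer.

Every AddRef call must be matched with a Release call.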

Because some objects internally use interface-specific reference counting, clients must always match AddRef and Release calls through each interface pointer.

The overall implications of these two rules are that whenever you (client or object) assign one pointer to another in some piece of code, you should call AddRef through the new pointer (the left operand). Before overwriting or otherwise destroying that pointer, call Release through it. These implications are illustrated in the following client code.


LPSOMEINTERFACE pISome1; //Some1 object
LPSOMEINTERFACE pISome2; //Some2 object
LPSOMEINTERFACE pCopy;

//A function that creates the pointer calls AddRef.
CreateISomeObject(&pISome1); //Some1 ref count=1
CreateISomeObject(&pISome2); //Some2 ref count=1

pCopy=pISome1; //Some1 count=1
pCopy->AddRef(); //AddRef new copy, Some1=2

[Do things.]

pCopy->Release(); //Release before overwrite, Some1=1
pCopy=pISome2; //Some2=1
pCopy->AddRef(); //Some2=2

[What kinds of things do you do?]

pCopy->Release(); //Release before overwrite, Some2=1
pCopy=NULL;

[Things that make us go.]

pISome2->Release(); //Release when done, Some2=0, Some2 freed
pISome2=NULL;
pISome1->Release(); //Release when done, Some1=0, Some1 freed
pISome1=NULL;

Again, an object's lifetime is controlled by all AddRef and Release calls on all of its interfaces combined. An object might use interface reference counting internally, so it is important that the AddRef and Release calls be matched through the same interface pointer (which can, of course, be in several client variables at once).

In any case, the first fundamental principle of reference counting is that any function that returns a pointer to an interface must call AddRef through that pointer. Functions that create an object and return the first pointer to an interface are such functions, as in the hypothetical CreateISomeObject function in the preceding example. Anytime you create a new copy of a pointer, you must also call AddRef through that new copy because you have two independent references—two independent pointer variables—to the same object. Then, according to the second principle of reference counting, all AddRef calls must be matched with a Release call. So before your pointer variables are destroyed (by an explicit overwrite or by going out of scope), you must call Release through each pointer. This process includes calling Release through any pointer copy (through which you called AddRef) and through the pointer you obtained from the function that created the object (and the pointer) that called AddRef implicitly.
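The earlier example never shows what happens inside CreateISomeObject, so here is one way it might look. Treat this as a sketch under assumptions: ISomeInterface, CSome, the bool return value, and CreateThenReleaseBalances are all invented for illustration, and the virtual destructor is a convenience of this portable mock that real COM interfaces do not have.

```cpp
typedef unsigned long ULONG;   // Stand-in for the Windows type.

// Hypothetical interface behind the LPSOMEINTERFACE pointer type.
struct ISomeInterface
{
    virtual ULONG AddRef() = 0;
    virtual ULONG Release() = 0;
protected:
    virtual ~ISomeInterface() {}   // Clients cannot delete; they Release.
};
typedef ISomeInterface *LPSOMEINTERFACE;

int g_cSomeObjects = 0;            // Live objects, for the demo only.

class CSome : public ISomeInterface
{
public:
    CSome() : m_cRef(0) { ++g_cSomeObjects; }
    ULONG AddRef() override { return ++m_cRef; }
    ULONG Release() override
    {
        ULONG cRef = --m_cRef;
        if (0 == cRef)
            delete this;
        return cRef;
    }
private:
    ~CSome() override { --g_cSomeObjects; }
    ULONG m_cRef;
};

// First principle: a function that returns an interface pointer
// calls AddRef through that pointer before handing it out.
bool CreateISomeObject(LPSOMEINTERFACE *ppISome)
{
    if (nullptr == ppISome)
        return false;
    CSome *pObj = new CSome;   // count = 0
    pObj->AddRef();            // count = 1, on behalf of the caller
    *ppISome = pObj;
    return true;
}

// Second principle: the caller matches that implicit AddRef with Release.
bool CreateThenReleaseBalances()
{
    int cBase = g_cSomeObjects;
    LPSOMEINTERFACE pISome = nullptr;
    if (!CreateISomeObject(&pISome))
        return false;
    bool fAliveWhileHeld = (g_cSomeObjects == cBase + 1);
    pISome->Release();         // count = 0, object frees itself
    return fAliveWhileHeld && (g_cSomeObjects == cBase);
}
```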

My Kingdom for Some Optimizations!

The stated rules and their effect on the code shown earlier probably seem rather harsh, and in fact, they are. However, when you have dependent pointer variables for the same object's interfaces and you know the relative lifetimes of those variables, you can bypass the majority of explicit AddRef and Release calls. There are two manifestations of such knowledge, nested lifetimes and overlapping lifetimes, which are illustrated in Figure 2-3.

Figure 2-3. Nested and overlapping interface pointers.

In the code fragment shown earlier, every instance of pCopy is nested within the lifetimes of pISome1 and pISome2—that is, the copy lives and dies within the lifetime of the original. After CreateISomeObject is called, both objects have a reference count of 1. The lifetimes of the objects are bounded by these create calls and the final Release calls. Because we know these lifetimes, we can eliminate any other AddRef and Release calls through copies of those pointers:


LPSOMEINTERFACE pISome1;
LPSOMEINTERFACE pISome2;
LPSOMEINTERFACE pCopy;

CreateISomeObject(&pISome1); //Some1 ref count=1
CreateISomeObject(&pISome2); //Some2 ref count=1

pCopy=pISome1; //Some1=1, pCopy nested in Some1's life

[Do things.]

pCopy=pISome2; //Some2=1, pCopy nested in Some2's life

[Do other things.]

pCopy=NULL; //No Release necessary

[Do anything, and then clean up.]

pISome2->Release(); //Release when done, Some2=0, Some2 freed
pISome2=NULL;
pISome1->Release(); //Release when done, Some1=0, Some1 freed
pISome1=NULL;

In other words, the lifetime of the first object is bounded by CreateISomeObject(&pISome1) and pISome1->Release. The lifetime of the second object is bounded by CreateISomeObject(&pISome2) and pISome2->Release. Therefore, you can make as many temporary pointers as you need as long as those variables have a scope nested within the object's lifetime. There are three instances in which you can take advantage of this optimization:

Overlapping lifetimes are those in which the original pointer dies after the copy is born but before the copy itself dies. If the copy is alive at the original's funeral, it can inherit ownership of the reference count on behalf of the original:


LPSOMEINTERFACE pISome1;
LPSOMEINTERFACE pCopy;

CreateISomeObject(&pISome1); //Some1 ref count=1

pCopy=pISome1; //Some1=1, pCopy nested in Some1's life
pISome1=NULL; //Pointer destroyed, pCopy inherits count, Some1=1

pCopy->Release(); //Release inherited ref count, Some1=0, Some1 freed
pCopy=NULL;

Again, the lifetime of the object is between CreateISomeObject and pCopy->Release. Release is still being called through the original interface pointer; it's just that the pointer variable changes.

With both of these optimizations, there are only four specific cases in which an AddRef must be called explicitly for a new copy of a pointer (and thus must have a Release call made through it when destroyed):

In all cases, some piece of code must call Release for every AddRef on a pointer. In the first of the preceding cases, the caller of an interface-creating function is responsible for the new pointer (that is, the object) and must call Release when finished. If the object's reference count is decreased to 0, the object can destroy itself at its leisure, although the client has to consider it gone. If you fail to call Release, you generally doom the object to the boredom of wasteful immortality—memory will not be freed, the object's server might not unload, and so on. Be humane to your objects; let them die with dignity: be sure to release them.

Call-Use-Release

I want to clarify a statement you might see in the OLE Programmer's Reference, which describes IUnknown::Release as follows: "If IUnknown::AddRef has been called on this object's interface[s] n times and this is the n+1th call to IUnknown::Release, the [object] will free itself." This might be confusing because n+1 calls to Release against n calls to AddRef seems like one Release too many. The statement refers specifically to explicit AddRef calls from within the client code and implicitly assumes that AddRef was called within the function that initially created the interface pointer. This is a valid assumption because a creation function is required to make the call, but the documentation does not make it clear. In every case, AddRef and Release calls are perfectly paired, even when they are made in different places.

In addition, although Release is specified as returning the new reference count of an object, you really cannot use this value for anything other than debugging purposes. Usually, an object will be destroyed when a client's call to Release has returned 0. An object doesn't have to free itself when its reference count is 0, however, so a client cannot use the zero return value from Release to know whether the object has been destroyed. In cases for which this information is important, a higher-level protocol such as OLE Documents will provide the necessary information, for example an explicit notification that says, "The object has been closed."

In all of OLE, Release is just about the only function you'll see anywhere along the lines of destroy or delete. (There are a few close functions in OLE Documents and OLE Automation.) A very common programming pattern is for a client to call some function to get an interface pointer (which might be QueryInterface), then call interface member functions for whatever purpose, and then call Release when it's finished with that pointer. The point is that there are many interface creation functions—not only QueryInterface, but other interface and API functions that effectively include a QueryInterface. Regardless of how you get the pointer or what you do with it, you must call Release through it when you have finished.

This pattern isn't much different from any other resource manipulation sequences involved with Windows programming, such as CreateWindow, use window, DestroyWindow; CreateFont, use font, DeleteObject; or OpenFile, use file, _lclose. But whereas the Windows API is befuddled with many destroy/delete/close functions, the names of which hardly match their respective creation functions (for example, OpenFile and _lclose, a truly well-matched pair! <sarcasm>), OLE has only Release. The final Release can do more than simply free the object. For example, releasing the root storage of a compound file effectively closes the file; a memory allocator object that we'll see later in this chapter will free any allocations it has made; a custom component will terminate its own EXE server. Thus, Release makes it much easier to remember how to get rid of something.
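The call-use-release sequence described in this section might be sketched as follows. This is a portable mock, not the real thing: ISketchUnknown, ISketchShape, CSquare, and the SketchIID enumeration are hypothetical, and the QueryInterface here takes an enum and returns bool, whereas the real IUnknown::QueryInterface takes a REFIID and returns an HRESULT.

```cpp
typedef unsigned long ULONG;   // Stand-in for the Windows type.

// Simplified interface identifiers (assumption: real COM uses GUIDs).
enum SketchIID { IID_SketchUnknown, IID_SketchShape, IID_SketchOther };

struct ISketchUnknown
{
    virtual bool  QueryInterface(SketchIID iid, void **ppv) = 0;
    virtual ULONG AddRef() = 0;
    virtual ULONG Release() = 0;
protected:
    virtual ~ISketchUnknown() {}
};

struct ISketchShape : ISketchUnknown
{
    virtual int Area() = 0;
};

class CSquare : public ISketchShape
{
public:
    explicit CSquare(int side) : m_cRef(0), m_side(side) {}

    bool QueryInterface(SketchIID iid, void **ppv) override
    {
        *ppv = nullptr;
        if (IID_SketchUnknown == iid)
            *ppv = static_cast<ISketchUnknown *>(this);
        else if (IID_SketchShape == iid)
            *ppv = static_cast<ISketchShape *>(this);
        else
            return false;      // Interface not supported.
        AddRef();              // Returning a pointer means calling AddRef.
        return true;
    }
    ULONG AddRef() override { return ++m_cRef; }
    ULONG Release() override
    {
        ULONG cRef = --m_cRef;
        if (0 == cRef)
            delete this;
        return cRef;
    }
    int Area() override { return m_side * m_side; }

private:
    ~CSquare() override {}
    ULONG m_cRef;
    int   m_side;
};

int AreaViaCallUseRelease(int side)
{
    ISketchUnknown *pUnk = new CSquare(side);
    pUnk->AddRef();                                   // count = 1

    int   area = -1;
    void *pv   = nullptr;
    if (pUnk->QueryInterface(IID_SketchShape, &pv))   // Call: count = 2
    {
        ISketchShape *pShape = static_cast<ISketchShape *>(pv);
        area = pShape->Area();                        // Use
        pShape->Release();                            // Release: count = 1
    }

    pUnk->Release();                                  // count = 0, freed
    return area;
}
```

Notice that the client never needs to know which destroy function goes with which interface. Whatever it got from QueryInterface, it gets rid of with Release.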

Circular Reference Counts

Imagine that human beings' lifetimes are determined by the number of acquaintances they have: as long as you know someone who in turn knows you, you'll both stay alive. At birth, you have an immediate acquaintance with your mother, and throughout your life you meet and befriend other people. Every new relationship is effectively a new reference count on both you and the other person. The only way the count would ever diminish would be for someone you knew to pass on, but that is impossible because you know them. Therefore, we'd all be immortal.

As appealing as this scenario might sound, there is the problem of resources: if no one ever dies, sooner or later there is not enough food, water, land, air, and so forth to maintain everyone. Then what? You simply cannot create more people—you'd need a few lightning bolts from Olympus to abruptly free a few resources.

The same is true on a computer: if objects are never destroyed and their resources are never freed, eventually there will be nothing left from which to create new objects. Hardly a workable situation. This is exactly what can happen, however, if two objects, such as a connectable object and an event sink, as we saw in Chapter 1, hold reference counts on each other! This problem is known as a circular reference count and requires special handling. In all cases in which such circular counts are possible, the interfaces involved are designed to include some other function besides Release that will force one of the two objects to call Release on the other.

For example, in the connectable object/event sink relationship, the client of the connectable object explicitly tells that object to terminate the notification relationship. This means that the object releases its reference to the event sink, thereby allowing the event sink to be destroyed. The client can then release the connectable object, destroying it. Other, slightly different, examples occur in OLE Documents. First, if the end user deletes an embedded or a linked object in the container, the container explicitly tells the object to close, which means that object releases any references it has to the client's site and shuts itself down. Second, if the end user directly closes the object's visible editing window, the object then releases its client references and shuts itself down.

In all cases, there is something other than Release, either another function call or a bolt of lightning from the Almighty End User, that causes one object in a circular relationship to terminate its relationship with another. The circle is broken, and objects can be freed.
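Here is a rough sketch of how such a circle forms and how an explicit call breaks it. Everything in it is hypothetical: CRefCounted, CSource, CSink, and RunTeardown are invented for illustration, and the Advise and Unadvise members are only loosely modeled on the connectable object idea, with much simpler signatures than the real connection point interfaces.

```cpp
typedef unsigned long ULONG;   // Stand-in for the Windows type.

int g_cLive = 0;               // Live objects, for the demo only.

class CRefCounted
{
public:
    CRefCounted() : m_cRef(0) { ++g_cLive; }
    ULONG AddRef() { return ++m_cRef; }
    ULONG Release()
    {
        ULONG cRef = --m_cRef;
        if (0 == cRef)
            delete this;
        return cRef;
    }
protected:
    virtual ~CRefCounted() { --g_cLive; }
private:
    ULONG m_cRef;
};

class CSink;

// Plays the connectable object: holds a count on its sink.
class CSource : public CRefCounted
{
public:
    CSource() : m_pSink(nullptr) {}
    void Advise(CSink *pSink);
    void Unadvise();           // The "something other than Release."
private:
    CSink *m_pSink;
};

// Plays the event sink: holds a count on the source it listens to.
class CSink : public CRefCounted
{
public:
    explicit CSink(CSource *pSource) : m_pSource(pSource)
        { m_pSource->AddRef(); }
protected:
    ~CSink() override { m_pSource->Release(); }
private:
    CSource *m_pSource;
};

void CSource::Advise(CSink *pSink)
{
    Unadvise();
    m_pSink = pSink;
    m_pSink->AddRef();
}

void CSource::Unadvise()
{
    if (nullptr == m_pSink)
        return;
    CSink *pSink = m_pSink;
    m_pSink = nullptr;         // Clear first; Release may destroy the sink.
    pSink->Release();
}

struct TeardownCounts
{
    int cAfterSinkRelease;
    int cAfterUnadvise;
    int cAfterSourceRelease;
};

TeardownCounts RunTeardown()
{
    TeardownCounts counts;
    int cBase = g_cLive;

    CSource *pSource = new CSource;
    pSource->AddRef();                   // source = 1 (client)
    CSink *pSink = new CSink(pSource);   // source = 2 (client + sink)
    pSink->AddRef();                     // sink = 1 (client)
    pSource->Advise(pSink);              // sink = 2 (client + source)

    pSink->Release();                    // sink = 1: the source keeps it alive
    counts.cAfterSinkRelease = g_cLive - cBase;

    // Releasing the source here instead would leave both counts stuck at 1.
    pSource->Unadvise();                 // sink = 0, destroyed; source = 1
    counts.cAfterUnadvise = g_cLive - cBase;

    pSource->Release();                  // source = 0, destroyed
    counts.cAfterSourceRelease = g_cLive - cBase;
    return counts;
}
```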

Artificial Reference Counts

As a final note about reference counts, let's examine the use of a technique called artificial reference counts. Suppose you're writing the code in method CMyObject::Init, and in the implementation of Init you invoke functions that might call your AddRef and then Release. If your reference count is 0, as happens during the creation of an object before any interfaces exist, a paired call to AddRef and Release would destroy the object, causing Init to crash. Using an artificial count means incrementing your reference counter directly at the beginning of the risky code and then decrementing it, usually to 0, directly afterward. Decrementing the counter directly bypasses Release and its potentially destructive behavior:


void CMyObject::Init(void)
{
    m_cRef++; //Increment count.

    //Risky code that might call AddRef and Release

    m_cRef--; //Decrement count.
    return;
}

The artificial reference count guarantees object stability within this function.
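To see the guard at work, here is a self-contained sketch in which the risky code is made concrete. The helper UseTemporarily, the Count accessor, the g_cLive counter, and InitSurvives are all hypothetical additions for the demonstration. Only the increment and decrement around the risky call come from the fragment above.

```cpp
typedef unsigned long ULONG;   // Stand-in for the Windows type.

int g_cLive = 0;               // Live objects, for the demo only.

class CMyObject
{
public:
    CMyObject() : m_cRef(0) { ++g_cLive; }

    ULONG AddRef() { return ++m_cRef; }
    ULONG Release()
    {
        ULONG cRef = --m_cRef;
        if (0 == cRef)
            delete this;
        return cRef;
    }

    void Init()
    {
        m_cRef++;               // Artificial count: 0 becomes 1.
        UseTemporarily(this);   // Count goes 1, 2, 1 and never reaches 0.
        m_cRef--;               // Back to 0 without going through Release.
    }

    // Hypothetical accessor so the demo can inspect the count.
    ULONG Count() const { return m_cRef; }

private:
    ~CMyObject() { --g_cLive; }

    // Stands in for any code that takes and drops a reference.
    static void UseTemporarily(CMyObject *pObj)
    {
        pObj->AddRef();
        pObj->Release();
    }

    ULONG m_cRef;
};

// Returns true if the object survives Init and is freed normally afterward.
bool InitSurvives()
{
    int cBase = g_cLive;
    CMyObject *pObj = new CMyObject;   // count = 0
    pObj->Init();

    // Check liveness first so Count is never called on a freed object.
    bool fAlive = (g_cLive == cBase + 1) && (0 == pObj->Count());
    if (!fAlive)
        return false;

    pObj->AddRef();                    // count = 1
    pObj->Release();                   // count = 0, object frees itself
    return g_cLive == cBase;
}
```

Without the two lines that adjust m_cRef directly, the Release inside UseTemporarily would take the count from 1 to 0 and delete the object while Init was still running.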