How OLE and COM Solve the Problems of Component Software Design

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

May 1996

How OLE and COM Solve the Problems of Component Software Design

Kraig Brockschmidt

Kraig Brockschmidt is part of the OLE design team at Microsoft, involved in many aspects of the continuing development and usage of this technology. He is the author of two books Inside OLE 2 and Inside OLE 2nd Edition (Microsoft Press).

Microsoft continues to make heavy investments in OLE-related technologies. OLE itself has been in development for over seven years, and essentially every new technology coming out of Microsoft somehow incorporates elements of OLE. Why does OLE deserve such an investment, and why is OLE significant for the independent software vendor (ISV) and the computer industry as a whole? The answer is that Microsoft created OLE to provide an object-oriented solution for practical problems encountered in developing operating systems and applications. OLE provides the necessary specifications and the key services that make it possible to create component software that ultimately benefits the entire computing industry.

The core of this infrastructure is a simple, elegant, yet very powerful and extensible architecture called the Component Object Model, or COM. Within COM you can find solutions to some of the most perplexing software problems, such as accessing objects outside application boundaries and versioning. Furthermore, these solutions are concerned with binary components in a running system rather than source code components in an application. In this article I explore these problems and the solutions that COM, and therefore OLE, provides for running binary systems.

A Brief History of OLE

Graphical user interfaces brought about the clipboard metaphor-with copy, cut, and paste operations-that greatly simplified the creation of compound documents. A compound document included text, graphics, and other types of content. Prior to this invention, you had to print the text and graphics separately, then cut and paste them together with scissors and glue.

The clipboard succeeded in initially allowing the creation of compound documents, but Microsoft soon realized its limitations. Editing the graphics required many difficult manual steps to get them from the compound document to the original editing tool-if that was possible at all.

To simplify these steps, Microsoft's Applications division created a complex DDE protocol. Out of the DDE protocol grew OLE version 1.0 (1991), which was made available to all developers as a standard. (At this point, OLE stood for Object Linking and Embedding.) OLE 1.0 greatly enhanced the creation and management of compound documents. You placed embedded objects or linked objects in a document. The document retained the native data, or a link to that data, as well as information about the editing tool. All the complexity of editing components of a compound document was reduced to a double-click of the mouse. Presto! The object data was automatically brought back into the original editor.

The OLE 1.0 designers realized, however, that these compound document objects were actually a specific case of software components-small elements of software than can be "plugged-in" to an application, extending the application's functionality without requiring new code. In the compound document paradigm, the document editor is a generic container that can hold any kind of content object; you can insert charts, sounds, videos, pictures, and many other kinds of objects into that container without having to update the container.

In the more general sense, component software has much broader applicability than compound documents. It provides a multi-purpose plug-in model that is more powerful and flexible than DLLs, VBXs, and the like. This was the guiding principle behind the design of OLE version 2.0, released in 1993. Not only did OLE 2.0 improve upon the compound document facilities of OLE 1.0, but a vast infrastructure was built to support a wide range of component software.

What makes COM and OLE unique is that the architecture is based on reusable design and reusable code. One of the fundamental concepts in COM, the interface, is based on the idea of design reuse. COM and OLE introduce a programming model based on reusable designs, and an implementation that provides the fundamental services to make both design and code reuse possible. As a result, Microsoft continues to introduce more and more "OLE Technologies" that build upon the original OLE 2.0 architecture, such as Distributed COM, shell extensions, and most of the ActiveX Technologies.

There is no "OLE 3.0," nor are there plans for such a release, because OLE's architecture accommodates new technologies-regardless of Microsoft's involvement-without requiring modification to the base designs. The name no longer stands for Object Linking and Embedding, and is simply OLE, pronounced "oh-lay." It has moved from being a specific technology for a specific purpose to being a reusable architecture for component software that accommodates new designs and technologies.

People have a hard time pinning down exactly what OLE is (even Microsoft's marketing groups). Throughout its history, OLE has been promoted as many things, from Visual Editing to OLE Automation to OLE controls to Distributed COM. As time moves forward, OLE (as based on COM) expands to accommodate new technology, never becoming obsolete as an architecture. OLE can be described as an extensible systems object technology whose architecture accommodates new and existing designs.

A systems object technology works with object-oriented principles in the context of a running operating system-encapsulated, polymorphic, reusable components exist and interoperate as binary entities, not as source code. New components, developed by anyone at any time, can be added to the running system. They immediately extend the services offered to applications, even those applications that are already running. This is what is meant by an extensible service architecture.

A newly installed operating system offers the basic set of services developers employ in creating applications. COM and OLE provide a way for anyone to extend the system with new services without requiring a change to the operating system. That is, COM and OLE make it possible for anyone to create new services. Developers can use these new services to create more innovative applications. Furthermore, this incremental development of system services is accomplished without requiring any kind of central control or coordination between vendors. This potential for ad hoc integration leads to significant improvements in how we develop software and how users experience software as a problem-solving tool.

Problems and Solutions: The COM Architecture

How did Microsoft end up betting its future on OLE? The COM and OLE designs were not just dreamed up by a Microsoft architect who drank too much Jolt one night and stayed awake playing Reversi. They are a result of many years of experience in the business of developing operating systems and applications. In particular, Microsoft has repeatedly faced the problem of offering new system services for applications. Some kinds of services, like device drivers and subsystems (as on Windows NT) are easy to manage because they usually ship with the operating system. However, in a component software environment nearly all new components that implement services are shipped separately-not just separate from the system but separate from each other. Component software requires an architecture through which you can deliver a component and have it become useful to applications immediately.

This really means that applications always check to see what components are available when they need one instead of assuming there is only a limited set. When you add a new component to the system, it should become available to applications instantly, even those already running. For example, consider a word processor that has a Check Spelling command that relies on the existence of a suitable spell-checking component. If you add a new spell-checking component to the system, the application can take advantage of it the next time you select that command.

A system that supports component software thus has to support a generic service abstraction: an architecture that defines how all types of components are instantiated and manipulated. The architecture also must be extensible to allow the introduction of new component types without having to revise the architecture. For instance, it might be easy to define an architecture that accommodates components for compound documents, but that architecture may not accommodate later specifications for custom controls. The architecture has to expect that new component types will be defined later on. An architecture also has to solve the problem of versioning. The first definition of a component type is easy to manage, as is the first implementation of any particular component. The difficulty is in managing revisions to designs and implementations over time.

From DLLs to COM

Let's say you have some kind of software service you'd like to provide, like a library of useful functions. You would usually implement a dynamic-link library (DLL) that exports all its functions. I call this DLL the server. The server exposes what is called a flat API because its functionality is described in a one-dimensional list of function calls. A client app calls these functions to perform various operations.

This flat API setup has been used for quite some time. It is considered the de facto method of exposing a software service and it works reasonably well for the code modules shipped with an operating system. Microsoft used this technique for creating VBX controls and MFC classes, but unexpected problems arise when new components are installed separately over a long period of time. You'll see that the basic DLL model is lousy for component software, and that COM was designed to overcome these problems.

Continuing the previous example, assume that someone wrote a specification that describes what a spell checker component does. The specification that describes the service of type spell checker might look like this:

A spell checker component is a DLL that exports one function to check whether or not a particular word exists in the current dictionary. The function prototype is as follows:

 BOOL WINAPI LookUpWord(wchar_t *pszWord)

This function must be exported as ordinal number 10.

Now, anyone can sit down and create an implementation of this service category, producing a server that supplies a spell-checker component. Typically there is only one component in any server DLL, and when that DLL is loaded into a process we just call it an object. The relationship between a component/server instance and the service category is just like the relationship between an object instance and a class definition in C++.

Let's say the vendor Basic Software creates a server and calls it BASICSPL.DLL, which they sell to ACME Software, Inc. ACME incorporates the DLL into their application, AcmeNote, implementing its Spelling command by parsing each word in the text and passing them one by one into LookUpWord.

 void OnToolsSpelling(void)
    {
    wchar_t *pszWord;

    pszWord=GetFirstWord();  //Retrieve first word in 
                             //text

     while (NULL!=pszWord)
        {
        if (FALSE==LookUpWord(pszWord))  //Check if 
                                //word is in dictionary
            [Alert user]

         pszWord=GetNextWord(pszWord);  //Go to next 
                                       //word in text
        }

     return;
    }

When an application uses a DLL function, the application must identify the function it wants to call with an absolutely unique name. In the basic DLL model, all absolute function identities include the code module plus the exported function ordinal (or text name). In my example, the LookUpWord function (ordinal 10) in BASICSPL.DLL is known as BASICSPL.DLL:10.

When the ACME developers compile and link AcmeNote, the call to LookUpWord is stored in the EXE as an import record for BASICSPL.DLL:10. (Do a dumpbin /imports command on some EXE and you'll see a list of module:ordinal records for all the dynamic links.) An import record is a dynamic link to the function in the DLL, because the actual address of the function is not known at compile time. Each module:ordinal record is essentially an alias for an address that is not supplied until run time. (In the Win32 SDK , there are tools called rebase and bind that allow you to fix DLL addresses prior to run time.) When the kernel loads the application, it loads the necessary DLLs into memory (always prior to calling WinMain), giving each exported function a unique address. The kernel then replaces each import record with a call to that address.

An application can perform these same steps manually to control when the DLL loads. You might do this to bypass the kernel's default error handling when the DLL or exported function does not exist. If you run AcmeNote but BASICSPL.DLL can't be found, the system will generate an error message. By performing these steps manually, AcmeNote could disable its Spelling command if the DLL isn't found. The kernel also generates an error if it finds the DLL but not the exported function.

In the case of AcmeNote, the app can boot faster if it doesn't load the DLL until it executes the Spelling command. To do this, you explicitly load the DLL with the Win32Ò API LoadLibrary, then retrieve the address of LookUpWord (ordinal 10) using the Win32 API GetProcAddress. You then call the function through a pointer (see Figure 1).

I can call the instance identified with hMod a component or an object (they are not quite equivalent, but are often used interchangeably), and the scenario I've just described actually meets the core object-oriented principles. The object is encapsulated behind the LookUpWord interface, because the function is specified outside any implementation. The object is polymorphic because the client code in AcmeNote would still work if I substituted another spell checker in place of BASICSPL.DLL-two implementations of the same specification are polymorphic. Finally, BASICSPL.DLL is reusable since a spell checker that checks only medical terms could load BASICSPL.DLL and pass it unrecognized words.

How would AcmeNote know to load another spellchecker instead of BASICSPL.DLL? This brings me to the first of many problems that arise when DLLs are used as a component software model. The solutions to such problems form the basis of COM design.

Hard-Coded DLL Names

It should be obvious from the previous piece of code that the name of BASICSPL.DLL is hard-coded into AcmeNote. This happens whether you call LoadLibrary or use an import library. In the latter case, the name is hard-coded in an import record that you don't even see. Such hardcoded names lead to a couple of problems.

First, the DLL must exist in either the application's directory, a system directory, or some directory included in the PATH environment variable. If the DLL is stored in the application directory, it cannot be shared between multiple applications that need the same service. You end up with multiple copies of the DLL wasting disk space. If the DLL is stored in a system directory, there can be only one provider of the service BASICSPL.DLL. The corollary to this is that each provider must have a different name, but then AcmeNote would not work if WESBTERS.DLL existed but BASICSPL.DLL didn't. You run the serious risk that someone else will install a BASICSPL.DLL written to a different specification. If the DLL is stored in some other path, then the application has to add that directory to the PATH environment variable. This is a non-issue on Win32 platforms, but the size of the PATH is severely restricted for Windows¨ 3.1.

Part of the solution is to remove the path dependencies by defining an abstract identifier for each DLL. I could define the number "44980" as the abstract ID for BASICSPL.DLL. I then require some way to map this number to the exact installation point of the DLL dynamically. This is the purpose of the system registry (or registration database). In the registry we might create entries like this:

 HKEY_CLASSES_ROOT
    ServerIDs
        44980 = c:\libs\44980\basicspl.dll

The ServerIDs section lists the exact locations of different DLLs. Instead of using "BASICSPL.DLL" directly, you write it like this:

 void OnToolsSpelling(void)
    {
    wchar_t    szPath[MAX_PATH];
    [Other locals]

    //Looks up ID in registry and returns path
    if (!MapIDToPath(44980, szPath, MAX_PATH))  
        [No mapping found, show error)]

    hMod=LoadLibrary(szPath);

    [Other code the same]
    }

This at least allows you to install one copy of BASICSPL.DLL in a specific directory so that multiple applications can share it. Each application hard-codes the ID "44980" rather than the exact location of the DLL.

Of course, this doesn't help the applications use alternate implementations of the same service category, which is a critical requirement for component software. AcmeNote should use whatever implementation of the spell checker service is available, and not rely solely on BASICSPL.DLL. Stated in object-oriented terms, all implementations of the same service category must be polymorphic so AcmeNote doesn't care which DLL actually provides the LookUpWord function. AcmeNote might ship with BASICSPL.DLL as a default component, but if a newer and better implementation-WEBSTERS.DLL (ID 56791), with more words and a faster algorithm-showed up with some other application, AcmeNote should benefit automatically.

The solution is an identifier for the service category along with registry entries that map the category ID to available implementations. If I assign the value "100587" to this spell checker category, I could create registry entries that map the category to the implementations, which are mapped elsewhere to the locations of the modules.

 HKEY_CLASSES_ROOT
    Categories
        100587 = Spell Checker
            56791 = Webster's Spell Checker
            44980 = Basic Spell Checker

    ServerIDs
        44980 = c:\libs\44980\basicspl.dll
        56791 = c:\libs\56791\websters.dll

Now I can write AcmeWord to use the most recently installed spell checker, or I can add a user interface that lets the user select from the spell checkers listed in the registry (see Figure 2).

Figure 2 Giving users a choice

The new code looks like this:

 void OnToolsSpelling(void)
    {
    DWORD      dwIDServer;
    wchar_t    szPath[MAX_PATH];
    [Other locals]

      //Magic to get a server ID, may involve UI
    dwID=MapCategoryIDToServerID(100587);  
    
     //Looks up ID in registry and returns path
    if (!MapIDToPath(dwID, szPath, MAX_PATH))  
        [No mapping found, show error)]

    [Other code the same]
    }

Abstract identifiers present another problem: who defines the category IDs and the server IDs? Some central organization could maintain a master list of which IDs are assigned to which categories, and which IDs are assigned to which DLLs. Instead, COM and OLE use globally unique identifiers (GUIDs, pronounced goo-ids or gwids)-128-bit values generated using an algorithm defined by the Open Systems Foundation that guarantees uniqueness across time and space. The COM library provides an API that implements the algorithm so anyone can obtain identifiers as needed and still be assured uniqueness.

Of course, it is also stupid for each application to define their own category IDs and registry structure, so Microsoft has defined some standards in these areas. COM and OLE provide APIs to manipulate GUIDs and their various uses in the registry. I'll show you more on this later. For now, it's enough to understand that COM eliminates dependencies on DLL names, thus allowing multiple polymorphic providers of the same service to coexist.

Management APIs

You might have noticed that the code shown earlier had a few undefined magic functions: MapCategoryIDToServerID and MapServerIDToPath. The first function provided a way to look up the ID of a server for a particular category and possibly display it. The second function retrieved the path of the server DLL associated with a particular ID. I could have simply combined all these steps, including the call to LoadLibrary, into one function called something like LoadCategoryServer. Such functions are management APIs, which free an application from having to walk through the registry and so on. The APIs shown here are pretty advanced because they assume you have a registry and it has a pre-defined structure. In reality, the API can be more generic and accommodate many different service types.

This was not always the case-for a long time there was no central registry or standards for storing registry information. As a result, whenever someone defined a new service category, they also defined a category-specific management API. Over time, many new categories were defined, each with its own management API. Each API made working with its service type easier, but in combination they complicated application programming because most applications employed many different services to achieve their desired ends.

The Win32 API itself, evolved over a decade, is a prime example. There are approximately 70 different creation functions that each deal with a different kind of object. Each type of object also has a different way to identify it (handle, pointer, structure, and so on) and a different set of APIs to manipulate it. Hence there are hundreds of API functions, each working in different ways, which often makes programming in Windows difficult. New service categories introduce new management APIs. These intend to simplify the programming model for that category, but tend to complicate the overall programming of applications.

The solution is to create generic abstractions for the common elements found in these APIs. This is what the MapCategoryIDToServerID and MapServerIDToPath functions do in my example. The process of finding existing implementations of a category is common in nearly all category definitions. Defining a generic API based on unique identifiers greatly reduces the overall number of API functions and thereby simplifies the overall programming model.

COM provides a single generic API for management of all categories and servers, based on the standards defined for how these things are listed in the registry. This generic API is extensible in that it accommodates new categories and services. Furthermore, this API includes only a handful of functions, the most important of which is CoCreateInstance. This function creates an instance of some class (given a class identifier, or CLSID) and returns its interface pointer. The key to this generic API is that object instances are always referenced through a COM interface pointer. All interface pointers are also polymorphic in allowing a generic API to manipulate any kind of interface pointer to any kind of object. This stands in stark contrast to the myriad non-polymorphic handles and references in the Win32 API.

Shared Server Instances and Lifetime Management

In the code I've shown you so far, the spell-checker object is identified with a module handle through which you can get at function pointers. This is all well and good for the application that loaded the DLL, because the module handle makes sense in that application's process space. However, that same handle is useless to another process. The AcmeNote application could not pass the module handle to another application (via RPC or some other inter-process communication mechanism) and expect that other process to make any sense of it. In general, identifiers for instances ofaserver or component cannot be shared with other processes, let alone with processes running on other machines.

On Win32 platforms, process separation means that each application using a handle-based service has to load that service into its own address space, wasting memory (even virtual memory), increasing boot time, and degrading performance. On Windows NTÔ, you can create a subsystem-type service where only one instance runs for the whole system. Much of Windows itself (like USER.EXE) works this way, which is why certain handles, like HWNDs, can be passed between processes. However, these types of system services are expensive to the system and are not suitable to component software.

Even if you could solve the problem for a single machine, you still can't share instances across machines-multiple applications on multiple machines accessing a service running on yet another machine, like a central database. To make distributed systems work, you have to employ RPC, named pipes, or similar mechanisms, which have different structures and programming models from anything you might do on a single machine or within a single process. The mechanisms for working with services differ greatly between the in-process service (DLLs), the local service (EXEs or other processes on the same machine), and the remote service (processes on other machines). A client application must understand three or more programming models to work with all three service locations.

With DLLs in particular, other problems arise if you share module handles between processes. Let's say AcmeNote loaded BASICSPL.DLL and then passed the module handle to another application. What happens if AcmeNote terminates and the other application still has the module handle? Since the DLL is loaded into AcmeNote's process, that DLL is unloaded, the module handle becomes invalid, and the other application will crash the next time it attempts to use that handle.

On 16-bit Windows, where all applications run in the same address space, one instance of a DLL can be shared by multiple applications. Unfortunately, abnormal termination of one application might cause the DLL to remain in memory even when all other applications using it closed normally. When instances of a service can be shared between processes, there is usually no robust means of handling abnormal termination of a process, especially the one that first loaded the service. Either the server remains orphaned in memory or other processes using that server may crash. DLLs, for example, have no way of knowing when other processes are referencing them.

COM solves these problems by virtue of the interface pointer structure and the ability to implement servers as either DLLs or EXEs. Through marshaling, you can share an interface pointer with other processes on the same machine or on other machines. Of course, you don't pass the exact same pointer value, but multiple processes can each own an interface pointer through which they call functions in the object attached to that pointer. This structure comes from RPC and guarantees robustness when the process that loaded the object is terminated. When other processes try to use that object, they receive disconnected error codes instead of just crashing. Also, if an object is implemented using a process in its own EXE, it will remain active as long as any client holds a reference to it. (COM automatically releases references held by clients that terminate abnormally, so the server process doesn't get orphaned itself.)

The only way you ever deal with an object instance in COM and OLE is through interface pointers. This applies regardless of the object's actual location. Therefore, the programming model is identical for the in-process, local, and remote cases. The technology that makes this works is called Location Transparency.

Multiple Services in a Single DLL

Earlier, I described a server and a component as pretty much the same thing, and I used the terms component and object to describe a loaded instance of a server. I could do this because there is the base assumption that each server DLL implements a single service only. However, anyone who has been programming for Windows for any length of time knows that a DLL implies a fixed amount of overhead, both in memory footprint and the time it takes to load. When performance is a big issue, the first thing you'll want to do is combine services into one DLL to improve the boot time and working set of applications that use your DLL.

When services are defined as a set of DLL exports, allowing one DLL to offer multiple services requires coordinating which ordinal numbers or function names are assigned to which service categories. If two categories want to use ordinal 10 for completely different operations, then you could not implement both services in one DLL! You already saw how COM and OLE use GUIDs to identify both service categories and implementations of a category. This allows anyone to independently define or implement a service. To allow multiple services per DLL, we have to solve this problem: a server cannot support multiple services without risking a conflict between the function names or ordinals used to reference those functions. The choice seems to be between central control over ordinal assignments or the inability to combine services together, which either stifles innovation or causes performance problems.

The real solution is to stop referencing functions with names and ordinals altogether. The absolute identity of any given function in a DLL-based system is of the form module:ordinal, such as BASICSPL:10. In COM and OLE, components and objects expose their functions not through singular exports, but in groups that can be referenced through a single interface pointer. An interface is defined as a group of related member functions, and is assigned an Interface Identifier (IID) GUID.

For example, if I redesigned the spell-checker specification to use COM, I'd define an interface called ISpellChecker (the "I" prefix stands for interface):

A spell checker component is any server supplying an object that implements the ISpellChecker interface, which is defined as follows using the Interface Definition Language (IDL):

 /*
IUnknown is the "base" interface for all other interfaces. The value inside the uuid() attribute is the IID, which is given the symbol IID_ISpellChecker. ISpellChecker is the interface type as recognized 
by compilers.
 */
[uuid(388a05f0-626d-11cf-a231-00aa003d7352), object]
interface ISpellChecker : IUnknown
    {
    HRESULT LookUpWord(OLESTR *pszWord);
    }

Notice how I've eliminated any reference to ordinals as well as any need to stipulate that the spell checker be implemented in a DLL. Implementing a server according to this spec is hardly any more work than implementing the DLL according to the old spec.

An interface pointer (a run-time entity) really points to a table of function pointers. In the case of ISpellChecker, the table has four entries: three for the members of the base interface IUnknown and one for LookUpWord. The table contains the addresses of the interface member function implementations. When a client calls a member function through an interface pointer, it actually calls whatever address is at the appropriate offset in the table. Programming languages make this easy, given the way interfaces are defined in header files.

As shown in Figure 3, AcmeNote can use COM's generic CoCreateInstance function and the interface to comply with this new specification (in C++). Given the interface definition, a compiler knows to generate the right machine code that calls the right offset for the LookUpWord member of ISpellChecker. To do this reliably requires a standard for the binary structure of an interface, which is one of the fundamental parts of the COM specification.

Any given function in any given object implementation is identified with three elements: the object's CLSID, the IID that types the pointer through which the client calls member functions, and the offset of the particular member function in the interface. The absolute function identity in COM and OLE is CLSID:IID:table_offset, as in CLSID_BasicSpellChecker:IID_ISpellChecker:4. Because anyone can generate CLSIDs and IIDs at will, anyone can design a service where the functions describing that service are ultimately identified with two guaranteed unique values.

COM allows any given server module to implement as many different classes as it wants, so any server can support as many CLSIDs as desired. When a client wants to access an instance of a class, COM passes the CLSID to the server and the server decides what to instantiate. As just mentioned, that interface pointer points to a table of the addresses of the unique object class code. In short, COM removes all barriers to multi-service implementations, regardless of who designed the service.

A small note about the pSC->Release line at the bottom of the sample code: Release is a member of the IUnknown interface that all interfaces share, so all objects have this member. A client tells the object through Release that it is no longer needed. The object maintains a reference count for all clients using it to know when to free itself from memory. When no objects remain, the server object can unload itself (or terminate if it's an EXE server). In this sense, Release is the universal delete abstraction across all of COM-there are only a handful of other special-case free or delete functions that work across the whole OLE API. Since all objects are manipulated through interface pointers, freeing the object always means calling its Release member. This single powerful abstraction significantly simplifies the overall programming model, just like the generic creation function CoCreateInstance. In fact, one of the most common sequences in COM/OLE programming is to first call CoCreateInstance to obtain the interface pointer, then call functions through that pointer, and finally call Release through that pointer when the object is no longer needed.

The Big One: Versioning

I've worked through many of the problems inherent in DLL-based service models and introduced many of the solutions found in COM and OLE. So far, everything applies to the first version of a particular service only-not only the first version of the category definition, but the first version of the server implementation and the first version of the AcmeNote application that uses such a server as well. What happens now when we want to change the service specification, update the server, and update the client? The primary issue is one of compatibility between different versions of services and the applications that use those services. A new service must work with applications that expect an old service, and new applications must be able to work with old services.

When an application talks directly to a service, how does that application dynamically discover the capabilities supported in that service? How does an application differentiate between different versions of the same service category and between different versions of a server implementation? This is called the versioning problem, and is the most important problem COM was designed to solve. Independently changing the specification of a service, the implementation of a server, or the expectations of a client typically results in significant interoperability problems. COM fundamentally solves this problem across the board.

A component that I called SpellCheck1.0 is a DLL that exports a function LookUpWord as ordinal 10. I also created BASICSPL.DLL, which I call BasicSpell1.0, that is used by AcmeNote version 1.0. At this stage, interoperability is not an issue as long as everyone sticks to the API specifications-this is always true with version 1.0 of any API. As long as they stick to the specifications, API vendors are free to make performance improvements that don't affect the interface between client and service.

Problems arise when I want to change the specification. For example, customers may want the ability to add their own words to the spell checker's dictionary. This requires new functionality in spell checker components. Working with a DLL, I would write the SpellCheck2.0 specifications as a DLL that exports three functions (see Figure 4). A SpellCheck2.0 implementation is polymorphic with SpellCheck1.0 because 1.0 clients will look for ordinal 10 only. It would be nice if I could overwrite the DLL for BasicSpell1.0 with a DLL for BasicSpell2.0, but there are many potential problems.

Remember the two different versions of AcmeNote code that used the LookUpWord function, one using an import library to resolve the function name, the other retrieving the address of the function directly with LoadLibrary and GetProcAddress? Both solutions work equally well when there's only one function to worry about. However, when adding the Add/Remove Word feature to AcmeNote2.0, the additional functions make it more complicated to program with GetProcAddress than to use import libraries. It's still tolerable in the case of AcmeNote2.0 because all these functions are used at the same time and I only need one call to LoadLibrary.

Most applications will not use just one service DLL, nor only a handful of functions in the same place in the code. Most likely there will be dozens of functions strewn all around the application. Calling LoadLibrary and GetProcAddress for each call introduces many more error conditions that rapidly increase the complexity of the application. In the real world, vendors are concerned with shipping applications that sell, which means they have to be robust. The more robust way to do dynamic linking is to use import libraries. After all, having implicit links like this isn't so bad, is it?

Say I ship AcmeNote2.0 with BasicSpell2.0, which happens to be in the same module as BasicSpell1.0-BASICSPL.DLL. When you install AcmeNote2.0, it overwrites the old BASICSPL.DLL with the new one. Any application that was using BasicSpell1.0 will still be able to use BasicSpell2.0, so compatibility is ensured. However, two different indistinguishable versions of BASICSPL.DLL exist. Microsoft discovered that older versions of a DLL end up overwriting newer versions no matter what rules you lay down; you might install some application that uses BasicSpell1.0 and overwrite the newer BASICSPL.DLL with the older one!

So what happens when the user now runs AcmeNote2.0? When the kernel tries to resolve the AddWord and RemoveWord calls, it will fail. If import libraries are used, which is most common, then the kernel will generate the errors-"Call to undefined dynlink" (Windows 3.x) or "The application failed to initialized properly (0xc0000139)" (Win32). Expect customer support calls to come streaming in. No matter how hard you try to prevent this situation, it invariably happens when multiple versions of the same service use the same DLL name. There are several ways to prevent this problem that are less than optimal. Let's look at these in detail before seeing how COM solves the problem for good.

The easiest way to avoid "Call to undefined dynlink" problems is to simply avoid using a new version of a DLL in the first place! This approach takes no work and has no risk, but it means no innovation. The result is called least-common denominator usage of a service DLL's feature set; no one uses functionality that is not guaranteed to be present in all versions of that service DLL. The vendor of the DLL then becomes hesitant to add any more features to the DLL because few applications will bother using them, resulting in less return on investment. Who is going to take the risk first, the DLL vendor or the application vendor? Quite commonly, neither.

The usual way to avoid this sort of trouble is to never distribute the service DLLs by themselves, thereby avoiding the possibility that an old DLL would overwrite a new DLL. This generally means that new service DLLs ship with a new version of an operating system, and that applications should be revised and recompiled to work on the new operating system. This, of course, doesn't work for anyone but the operating system's vendor!

Another solution is to always ship a new DLL (with a different name) when you have new features. This happens in one of two ways: either add the new code to the existing DLL code base and recompile the whole mess, or make a new DLL that contains only the new features. In either case, DLLs never get versioned at all! In the first case, the DLLs are given filenames that reflect the version number, such as BASSPL10.DLL and BASSPL20.DLL. All versions of the server DLL can coexist, and there is no chance of overwriting the old version because it has a different name.

This is how Microsoft traditionally solved the versioning problem, as with the Visual BasicÒ run-time libraries VBRUN100.DLL, VBRUN200.DLL, VBRUN300.DLL, and so on. The trouble, however, is that these things accumulate on the user's machine! On my machine, I have three versions of the 16-bit Visual Basic run time: VBRUN200.DLL (348K), VBRUN300.DLL (389K), and VB40016.DLL (913K). In all, this represents 1.65MB of Visual Basic run-time libraries.

How much code is repeated between these DLLs? Ideally, I should need only the most recent DLL to work with all applications that use the DLL. On my machine, I should be able to toss the 737K of old stuff. Multiply the problem over time with a few added versions and by dozens of DLLs. Sooner or later I might have 50MB of disk space eaten by worthless DLLs that never get used. Yet I'm afraid to delete them in case I'll need them!

So why does Microsoft put redundant code into each version of these DLLs? Why not just put only what is different from VBRUN100.DLL into VBRUN200.DLL, and do the same for subsequent versions? There are two reasons. First, this still bloats the user's machine with a bunch of DLLs, even if they are smaller. The second reason-the big one-is performance; each DLL has a fixed overhead in load time and memory footprint. Increasing the number of DLLs can exponentially increase application load time.

Another possible solution is to create a specific management API in the server that the application uses to determine if a server supports a particular level of service. Instead of having just one functional API exposed from the server, there are different API levels where each level is a group of functions that represent a set of additional features. New features are introduced in new levels. So the Level 2 API extends the base API, Level 3 extends Level 2, and so on. Each level of service is polymorphic with all previous service levels. Each application that wants to use a service through this API asks the management API to locate those services. Each service may use registry entries to implement a certain level of API. Thus the application can guarantee that the server implements the right functionality, or it can disable certain features to work with a less-functional server.

The application doesn't actually link to the functions exported from the server because it doesn't know the DLL name ahead of time. Rather, it links to a stub server that provides entry points for all functions in all levels. When the application asks the management API to load a server, that server is loaded and dynamically linked into the stub server by using GetProcAddress. Client calls into the stub server are then forwarded to the real server.

A key feature of this kind of architecture is the grouping together of functions as levels. Each level is a contract, so if a server implements one function in a level it has to implement all functions in that level. This enables the client application, having successfully located and loaded a server supporting a given level, to trust that all calls to a supported level will work as expected. The client only has to check for Level 3 support once before it can invoke any function in Level 3. This contract idea means the application doesn't have to ask the provider about each function call before invoking that function. Without the manager API, you'd have to call GetProcAddress for each function before trying to invoke that function.

So why isn't this level architecture the right solution? A number of previous problems are still present in this model. First of all, who defines each level? We still have the problems of centralized control over the specification, this time for levels. This also doesn't solve the versioning problem, because the architecture doesn't make any provision for changing the definition of a level. The only way to add new functionality is to define a new level. Unless you design each level perfectly from the beginning, the level approach just adds complexity.

The fourth way around the "call to undefined dynlink" problem is to return to the GetProcAddress programming model and avoid using import libraries altogether. While this is a difficult programming model to work with, it may be the only choice for an application that really has to be robust. Again, the complication is that every call into a DLL turns into two or three calls: LoadLibrary, GetProcAddress, and the actual call.

Developers have invented all kinds of interesting strategies to reduce the complexity of this programming model. One quick optimization is to load a given DLL only once. This means the application has one global HINSTANCE variable for each DLL it wants to load, and each variable is initialized to NULL. Whenever the application needs a certain DLL, it calls an internal function to retrieve the module handle. This function checks the global variable; if the variable is NULL, it loads the DLL, otherwise it simply returns the handle that already exists in the global.

For example, you might have a global array of module handles and symbols for each DLL you're going to use. The internal helper function checks the array and loads the module if necessary (see Figure 5). The array in g_rghMods is initialized to NULL on startup, and on shutdown the application calls FreeLibrary on anything in here that's not NULL. (This could also happen when freeing up memory.) Throwing an exception on a LoadLibrary failure would allow you to write the kind of code shown in OnToolsSpelling. You wouldn't have to check for failure of ModHandle-you could put that code in an exception handler. All of this goes a long way toward simplifying this programming model.

Another complication is that the pointers returned from GetProcAddress are defined as the type FARPROC, or as a pointer to a function that takes no arguments and has no return value. Unless you define specific types for each function pointer you're going to use (like PFNLOOKUP in As its name ), you'll get no compile-time type checking. Enormously critical run-time errors can arise when the application calls a function without the requisite arguments or when the app misinterprets a return value. Even if you take the time to write all the necessary typedefs, you'll probably mistype some of them or use the wrong type somewhere in the source code. You'll still have run-time errors, caused by extremely hard-to-find bugs in header files and sources that otherwise compile without warnings!

The server vendor should provide the right typedefs in their header files to begin with. The types should be defined by those implementing the service instead of placing the burden on the application. COM happens to enforce this in a way that allows strong type checking while eliminating the possibility of bugs caused by subtle definition or usage errors.

With workable solutions for the loading and typing issues, the next step is to simplify the process of obtaining the function pointers themselves. It complicates the programming model to have to call GetProcAddress each time you need a pointer. A more efficient way would be to cache all the pointers you might need when you first load the DLL. Use the same technique as with the module handles. For each, DLL define a table of function pointers, then change the ModHandle function to be a FunctionPointer so we can make calls like this:

 void OnToolsSpelling(void)
    {
    //other code

    while (//[loop on words]
        {
        if (!(*(PFNLOOKUP)(FunctionPointer(DLL_ID_SPELLCHECK,
                                         FUNC_ID_LOOKUP)))(pszWord))
           //[etc.]
        }

    return;
    }

You can use exceptions so that OnToolsSpelling doesn't have to check for a NULL return value from FunctionPointer. You might also write macros for each function you want to use so you don't have to write such ugly code. In the end, the code could look like this:

 #define CALL_SPELLCHECK_LOOKUP(p)  (*(PFNLOOKUP)(FunctionPointer(DLL_ID_SPELLCHECK, \
 FUNC_ID_LOOKUP)))(p))

void OnToolsSpelling(void)
    {
    [other code]

    while (//[loop on words]
        {
        if (!CALL_SPELLCHECK_LOOKUP(pszWord))
            [etc.]
        }

    return;
    }

At this point the programming model is simplified to the point where it is just as easy to work with as import libraries, without the problems associated with using import libraries.

What happens when there are multiple versions of the server to deal with? You have to expect that some of the functions you want to put in the table of pointers may not exist in the server. How do you handle this? A sophisticated application would start by defining multiple function tables for each server, where each table represents a group of functions that make up a certain feature. The application itself chooses the exact groupings. For example, there might be one table for basic spell checking that contains only LookUpWord, one for dictionary additions that contains only AddWord, one for dictionary editing that contains both AddWord and RemoveWord.

The application would define a flag for each feature that is set to TRUE only if all the necessary functions are available for that feature. Using these flags, other application code could enable or disable certain commands depending on the features available from the various servers. For example, if basic spell checking is available, the application enables the Spelling command. The spell check dialog might have Add and Edit buttons inside it, with Add enabled if the dictionary additions feature is available, and Edit enabled if the dictionary editing and dictionary additions features are available.

Rather than disable a feature altogether, a really sophisticated application would provide its own default code for functions that might not exist in all servers. For example, a sophisticated word processor may provide its own backup custom dictionary implementation. When the application discovers the absence of AddWord and RemoveWord functions in the server, it would store its own entry points in the function table mentioned earlier. Of course, the application would have to install its own proxy implementation of LookUpWord to filter out custom entries before calling the server's implementation. Alternately, the application might store a pointer to a "do nothing" function in the table. The rest of its code can trust that the table is completely full of valid pointers, even if some of the functions don't do anything. That's better than having to check for NULL entries in the table. In short, a really sophisticated application defines and guarantees its own idea of a contract for certain features without complicating its internal programming model.

Sounds great, doesn't it? Assuming that you can solve all the other problems with the registry, generic management APIs, and so on, in reality it is still very costly to create an application architecture like this. Such an architecture creates a bloated application that might contain 30-50 percent more code than one that just lived with import libraries. Sure, when robustness is the top priority and resources are not an issue, you might do this. But to be honest, 95 percent of all vendors probably cannot afford going to this extent.

What an awful burden to place on every application! Why should applications have to do this? Is it not more appropriate to place the definition of features on the servers themselves? Shouldn't the servers provide the do-nothing stub functions? Shouldn't the servers create the tables themselves?

Yes! They should! It is appropriate to have only one implementation of this table-construction code (and instances of the tables themselves!) inside the server rather than duplicating across many applications. The server should fulfill the contract defined for a certain feature, not the client! The client should just ask the server, "do you support this contract," and if so, invoke any function in that contract without checking for the individual functions, without creating tables, without providing default implementations, and most importantly, without defining the types, macros, and helper functions.

In the preceding sections I've hinted at how the Component Object Model (a combination of specification and standard system-provided implementation) provides the solutions to a wide range of requirements of a component software architecture. The table in Figure 6 reviews these points. In the rest of this article, I will explain how features of COM solve these problems through GUIDs, interfaces, implementation location and the registry, location transparency, and the IUnknown interface.

Globally Unique Identifiers

Within the scope of an application or project where you control all the source code, avoiding naming conflicts between functions, objects, and so on is trivial; the compiler will tell you. In a component software environment, however, especially a distributed one, you have the possibility of many different developers in different organizations having to name functions and modules and classes without running into conflicts. You can either coordinate this through a centralized organization, which is inefficient and expensive, or through a standardized algorithm such as the one created by the Open Systems Foundation for their Distributed Computing Environment (DCE).

In the DCE, RPCs needed to travel around large distributed networks knowing exactly what piece of code to call. The solution is the 16-byte Universally Unique Identifier (UUID) standard. A UUID is generated through an algorithm that uses a machine's unique 48-bit network adapter ID and the current date and time. Uniqueness in space and time means it will always produce a unique value.

COM and OLE use UUIDs for all unique identifications needs, simply calling them globally unique identifiers, or GUIDs. According to the DCE standard, GUIDs and UUIDs are spelled out in hexadecimal digits in the following format:

  {01234567-1234-1234-1234-012345678AB}

The UUID-generating algorithm is implemented in the RPC run time and made available through the UuidCreate API. The COM library provides a wrapper around this API: CoCreateGuid. The Win32 SDK provides a command-line tool, UUIDGEN, that spits out a UUID any time you ask. UUIDGEN supports a command-line switch, -nXXXX, to generate XXXX sequential UUIDs. Another Win32 SDK tool, GUIDGEN, generates one GUID at a time in a variety of text formats that are more directly useful in source code. Both tools simply call into UuidCreate to generate the UUIDs. Use these tools whenever you need a GUID for a category specification, a class implementation, or an interface definition. Because interfaces are the fundamental extension mechanism in COM and OLE, you can innovate without having to ask permission from anyone.

Interfaces

An interface is the point of contact between a client and an object-only through an interface can a client and object communicate. In general, an interface is a semantically related group of member functions that carry no implementation (nor any data members). The group of functions act as a single entity and essentially represent a feature or a design pattern. Objects expose their features through one or more interfaces.

In C++ parlance, an interface is an abstract base class. Interfaces are more formally described using the Microsoft's COM extensions to the standard DCE Interface Definition Language (IDL). You've seen one example already for the hypothetical ISpellChecker (the object attribute is the IDL extension that describes a COM interface as opposed to an RPC interface).

 [uuid(388a05f0-626d-11cf-a231-00aa003d7352), object]
interface ISpellChecker : IUnknown
    {
    HRESULT LookUpWord(OLESTR *pszWord);
    }

The MicrosoftÒ IDL compiler (MIDL) can generate header files and various other files from an IDL script like this. At compile time, the interface name (ISpellChecker in this example) is a data type that can be used for type checking. The other important element is the uuid attribute, which assigns an IID to the interface. The IID is used as the run-time type of the interface; it identifies the intent of the interface design itself. Once defined and assigned an IID, an interface is immutable-any change requires assignment of a new IID as the original intent has changed.

At run time, an interface is always seen as a pointer typed with an IID. This pointer points to another pointer that points to a table holding the addresses of each member function's implementation in the interface (see Figure 7). This binary structure is a core standard of COM. All of COM and OLE depend upon this standard for interoperability between software components written in arbitrary languages. As long as a compiler can reduce language structures to this binary standard, it doesn't matter how you program a component or a client-the point of contact is a run-time binary standard.

Figure 7 The Binary Interface Structure

This interface structure provides the ability to marshal one of these pointers between processes and machines. To implement an interface on some object means building this exact binary structure in memory and providing the pointer to the structure-this is what we want instead of having clients do it themselves! You can do this in assembly language if you want, but higher-level languages, especially C++, build the structures automatically. In fact, the interface structure is, by design, identical to that used for C++ virtual functions. This is also why COM calls the table portion the vtable and the pointer to that table lpVtbl. A pointer to an interface is a pointer to lpVtbl, which points to the vtable. Because this is what C++ expects to see, using an interface pointer to call an interface member is just like calling a C++ object's member function.

If I have a pointer to ISpellChecker in the variable pSC, I can call a member like this:

 pSC->LookUpWord(pszWord);

Because the interface definition describes all the argument types for each interface member function, the compiler does all the type checking for you. As a client, you never have to define function prototypes for these things yourself. In C, the same call would look like this:

 pSC->lpVtbl->LookUpWord(pSC, pszWord);

It contains an explicit indirection through lpVtbl and passes the interface pointer as the first argument. This is exactly what C++ does behind the scenes, where the first argument is the pointer. As you can see, programming COM and OLE in C++ saves you a lot of extra typing.

An important point about calls to interface member functions is that the address of the call itself is discovered at run time using the value of the interface pointer. Therefore the compiler generates code that calls whatever address is in the right offset in the vtable pointed to by the interface pointer. This value is known only at run time, so when the kernel loads a client application there are no import records to patch up with absolute addresses. Instead, those addresses are computed at run time. In short, using interfaces is a true form of dynamic linking to a component's functionality, but one that bypasses all the complexity of using GetProcAddress and bypasses all the risks of using import libraries.

Finally, note that interfaces are considered contracts. When an object implements an interface it must provide at least default or do-nothing code for every member function in the interface-every vtable element must contain a valid function pointer. Therefore a client that obtains an interface pointer can call any member function of the interface. Granted, some member functions may just return a "not implemented" error code, but the call will always occur. Therefore clients need not check for NULL entries nor provide their own default implementations.

Implementation Location and the Registry

When a client wants to use an object, it always starts by asking COM or OLE to locate the object's class server. COM or OLE asks the server to create an object and return an initial interface pointer back to the client. From that point the client obtains additional interface pointers from the same object through the member function IUnknown::QueryInterface.

For some of its native classes, OLE provides specific creation APIs to streamline the initialization process. For all custom components (those not implemented in OLE itself), COM provides a generic creation API, CoCreateInstance, that instantiates an object given its CLSID. You saw earlier how a client calls this function:

 if (SUCCEEDED(CoCreateInstance(clsID,NULL,CLSCTX_SERVER,
           IID_ISpellChecker,(void **)&pSC)))

The client passes a CLSID, some flags, and the IID of the initial interface-on output the pointer variable passed by reference in the last argument receives the pointer to the interface.

Internally, COM maps the CLSID to the server, loads that server into memory, and asks the server to create the object and return an interface pointer. If the object is in a different process from the client, COM automatically marshals that pointer to the client's process. The basic process of object instantiation is shown in Figure 8.

Figure 8 Locating and Activating an Object

To map the CLSID to its server, COM looks in the registry. Servers implemented as Win32 DLLs (in-process servers) are registered as follows, assuming {01234567-1234-1234-1234-012345678AB} is the CLSID:

 HKEY_CLASSES_ROOT
    CLSID
        {01234567-1234-1234-1234-012345678AB}
            InprocServer32 = <path to server DLL>

To load the DLL into memory, COM needs only to call LoadLibrary. A local (out-of-process) server, on the other hand, is implemented as an EXE and registered using the LocalServer32 key instead of InprocServer32. COM calls the Win32 API CreateProcess to launch the EXE, which then initializes COM in its own process space. When this happens, COM in the server's process connects to COM in the client's process.

Registry entries for a remote server (one that runs on another machine) include the machine name as well. When COM activates such a server it communicates with a resident Service Control Manager (SCM) process running on that other machine. The SCM loads or launches the server on that machine and sends a marshaled interface pointer back to the client machine and process.

The net result in all three cases is that the client has some interface pointer through which it can begin using the object's services. Now you're ready to see how COM's Location Transparency makes the programming model identical for all three cases.

Location Transparency

When you think of pointers to objects-such as objects implemented in C++-you don't even think about passing such a pointer to another process. This is exactly what COM's Location Transparency allows you to do if the pointer is an interface pointer (which can be generated from a C++ object pointer quite easily if the C++ class is itself derived from an interface type).

When an in-process object is involved, COM can simply pass the pointer directly from the object to the client since that pointer is valid in the client's address space. Calls through that pointer end up in the object code directly, making the in-process case the fastest calling model (as fast as using raw DLLs except for one additional pointer indirection).

COM cannot, obviously, just pass the object's exact pointer value to other processes when local or remote objects are involved. Instead, the marshaling mechanism builds the necessary interprocess communication structures. To marshal a pointer means to create a marshaling packet containing the necessary information for connecting to the object's process. This packet is created through the COM API function CoMarshalInterface. The packet is transported through any means available to the client process where another function, CoUnmarshalInterface, turns the packet into an interface pointer in the client's process. The client can then use this interface pointer to make calls.

This marshaling sequence creates a proxy object and a stub object that handle the cross-process communication details for the interface. COM creates the stub in the object's process and has the stub manage the real interface pointer. COM then creates the proxy in the client's process, and connects it to the stub. This proxy supplies the interface pointer given to the client. (Those familiar with RPC will recognize this proxy/stub setup as the same client/server stub architecture used in raw RPC.)

This proxy does not contain the actual implementation of the interface. Instead, each member function packages the arguments it receives into a remoting packet and passes that packet to the stub through remote procedure calls. The stub unpacks these arguments, pushes them on the stack, and calls (with the interface pointer the stub is managing) the real object. The object executes the function and returns its output. That output is packaged and sent back to the proxy, which unpacks the output and returns it to the client. This process is illustrated in Figure 9, which shows the differences between in-process, local, and remote cases. It also shows that the client sees in-process objects only, which is why we use the term transparency-remoting a call is transparent to the client.

Figure 9 Location Transparency

An important feature of this architecture is that, if the object's process is terminated abnormally, only the stubs for that object are destroyed. Proxies in client processes remain active but disconnected. Therefore a client can still call the proxy and not risk crashing-the proxy simply returns a "disconnected" error code. In addition, if a client process terminates abnormally, the proxy disappears and its associated stub detects the disconnection. The stub can then clean up any reference counts to the object on behalf of the missing client.

The necessary proxy and stub code for most standard interfaces (those defined by Microsoft) is built into the system. If you define your own custom interface, you have to supply your own proxy/stub code. Fortunately, the MIDL compiler will generate this code for you from an IDL file. Compile the code into a DLL and you have remoting support for your custom interface.

Note that, in the remote case, it is not necessary to have an OLE implementation on the server machine to interoperate with COM on the client machine. All remote interface calls are transmitted with DCE-compatible RPC so any DCE-aware system receives the calls and converts them to fit any object architecture.

IUnknown: Reference Counting and Multiple Interfaces

Each interface can be thought of as representing a feature or a design pattern. Any given object will be a combination of one or more features or patterns. The combined functionality of those features defines the object's own component category, which is described with a Category ID (CATID, a GUID). (Until recently, there were so few categories defined that CATIDs were not used to identify them; specific registry keys like Insertable and Control were used. Microsoft has now published a specification for CATID-based categorization, which is available on http://www.microsoft.com/intdev/inttech/compnent.htm.) This ability of an object to support multiple interfaces is a key COM/OLE innovation. It is precisely the idea of multiple interfaces that solves the versioning problem.

The capacity for multiple interfaces depends on the interface IUnknown, which is the core interface in COM and OLE. In fact, all other interfaces are derived from IUnknown and are polymorphic with IUnknown. The three IUnknown member functions are always the first three members of any interface: AddRef increments the object's reference count; Release decrements the object's reference count, freeing the object if the count becomes zero; and QueryInterface asks the object to return a pointer to another interface, given its IID.

The AddRef and Release members together provide for lifetime management of an object. Every independent external reference to any of the object's interfaces carries a reference count. This lets multiple clients use the same instance of an object independently. Only when all clients have released their reference counts will the object destroy itself and free its resources. As described in the last section, remoting stubs clean up the reference count automatically for any client process that terminates without first releasing its references.

QueryInterface makes multiple interfaces possible; once a client obtains the initial interface pointer to any object, it obtains other interface pointers to the same object through QueryInterface. To query, you pass the IID of the interface. In return, you get the interface pointer, if it is available, or an error code that says "that interface is not supported." If you get a pointer, you can call the member functions of that interface, calling Release when you're through with it. If not, you can't possibly call members of that interface and the object protects itself from unexpected calls.

A client uses QueryInterface, which is available through every interface pointer, to ask an object, "do you support the feature identified by this IID? "By asking, a client determines the greatest common set of interfaces understood by both client and object. This completely avoids both the least-common denominator problem and versioning problems.

The Solution to Versioning

How does the idea of multiple interfaces solve the real-world versioning issues? Consider the real issues: you need to quantify the difference between any two versions and get the client request a specific version. Any given revision of an object will support a certain set of interfaces. Between versions you can add or remove interfaces, or introduce new versions of old interfaces (you cannot change the old interfaces, but an object would support both old and new versions of the interface). QueryInterface lets the client check for whatever version or set of interfaces it requires.

A critical feature of the COM programming model is that a client must call QueryInterface to obtain any interface pointer from the object (creation calls like CoCreateInstance have built-in QueryInterface calls by definition). In other words, COM forces clients to check for the presence of an interface before trying to invoke members of that interface, and forces clients to expect the absence of an interface.

 //Assume pObj is an IUnknown * for the object

 if (SUCCEEDED(pObj->QueryInterface(IID_<xxx>,
                                   (void**)&pInterface)))
    {
    //Call members of IID through pInterface, then:
    pInterface->Release();
    }
else
    {
    //Interface is not available, handle degenerate 
    //case
    }

You might also write a client to query an object for a certain interface when the object is first created, thereby enabling or disabling certain features based on the interfaces available. A client can also cache interface pointers to avoid having to call QueryInterface.

In contrast, the DLL programming model is more complex, costly, and risky. If you program with import libraries, you have no way of checking for functional support in an object before invoking the function. You also risk "Call to undefined dynalink" errors. If you program with GetProcAddress, you can at best simulate what interfaces provide by default: grouping functions together as features and maintaining flags that say whether or not those features are available.

Although some of this may have been review, my intention was to illustrate how Microsoft designed COM and OLE to solve many problems you face in developing and distributing component software. In particular, I showed you how COM and OLE address those problems of integrating software components into a running system and managing different versions of components. With DLLs, this was a messy and complicated task even with a single version of a component or application. Adding a new version multiplied your problems. OLE and COM provides an architecture designed from the start to accommodate an evolving system of software components.

From the May 1996 issue of Microsoft Systems Journal.