October 1998

Effective COM Programming: Seven Tips for Building Better COM-based Applications

Don Box, Keith Brown, Timothy J. Ewald, Chris Sells

This article is adapted from the forthcoming book, Effective COM Programming: 50 Ways to Improve Your COM and MTS Programs and Designs, (ISBN 0-201-37968-6) to be published in fall 1998 by Addison Wesley.

If you're like most developers, you've probably had at least some run-ins with COM. As often as not, programmers emerge from such scuffles bloodied and bruised, wondering if there could be some way to make the experience less perilous. Designing COM-based systems doesn't have to be a treacherous experience. The following seven expert tips can help everyone from the novice to the experienced COM jockey.

1 Design With Distribution In Mind
One of the most common mistakes C++ developers make is forgetting that objects may be further away than they actually appear. Consider the following C++ code fragment:


 // rect.cpp
 int Rect::GetLeft() throw() {
     return m_nLeft;
 }
 
 // client.cpp
 void foo(Rect& rect) {
     cout << rect.GetLeft();
 }
It's pretty obvious that the call to GetLeft cannot possibly fail—hence the notation that the function throws no exceptions. Nothing could possibly go wrong unless the user terminates the process while GetLeft is executing, in which case the calling code (which resides in the same process) won't be around to know the difference.
      When designing a COM-based system where objects may reside on different host machines, you cannot be this complacent. Networks can be unreliable, processes can die, and objects can vanish without warning. Therefore, when writing COM code, keep in mind that method calls that would normally be trusted to succeed may fail due to circumstances beyond the control of your code. A large dose of paranoia is healthy as you sketch out your design on a whiteboard or write code in your development environment.
      To allow clients to detect dead objects, all methods must return an HRESULT so that COM's remoting layer can replace the object's return value with an error code that describes the communication failure. In theory, communication errors can be distinguished by checking for FACILITY_RPC via the HRESULT_FACILITY macro defined in winerror.h. However, recent versions of the COM library have begun to use FACILITY_WIN32 error codes as well. To preserve the convenient syntax for languages that support mapping HRESULTs to out-of-band exception mechanisms, use [retval] to specify the logical return value of the function. In most languages, this will allow the physical HRESULT return value to be translated into an exception automatically, allowing you to write client code that is easy to read:

 // rope.idl
 interface IRope : IUnknown {
     HRESULT GetLength([out, retval] long* pnLength);
 }
 // ropeclient.bas
 Sub foo(r As IRope)
     rem failed HRESULT will throw an exception
     MsgBox "The rope is " & r.GetLength() & " ft long"
 End Sub
Unlike developers using Java and Visual Basic®, C++ developers must inspect each HRESULT by hand and take the appropriate action.
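The facility check described above can be sketched in portable C++. This is only an illustration: HResult, MakeHResult, Failed, Facility, Code, and LooksLikeCommFailure are hypothetical stand-ins so the fragment compiles without winerror.h, and the choice of which FACILITY_WIN32 codes to treat as communication failures is an assumption of this sketch, not a rule from COM.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for HRESULT so the sketch compiles without <windows.h>.
using HResult = std::int32_t;

// Facility values as defined in winerror.h.
constexpr std::uint32_t kFacilityRpc = 1;    // FACILITY_RPC
constexpr std::uint32_t kFacilityWin32 = 7;  // FACILITY_WIN32

// Win32 RPC status codes that may arrive wrapped in FACILITY_WIN32.
// Treating exactly these two as "communication failures" is an assumption.
constexpr std::uint32_t kRpcServerUnavailable = 1722;  // RPC_S_SERVER_UNAVAILABLE
constexpr std::uint32_t kRpcCallFailed = 1726;         // RPC_S_CALL_FAILED

inline HResult MakeHResult(std::uint32_t bits) {
    return static_cast<HResult>(bits);
}

// Same tests the FAILED, HRESULT_FACILITY, and HRESULT_CODE macros perform.
inline bool Failed(HResult hr) { return hr < 0; }
inline std::uint32_t Facility(HResult hr) {
    return (static_cast<std::uint32_t>(hr) >> 16) & 0x1fffu;
}
inline std::uint32_t Code(HResult hr) {
    return static_cast<std::uint32_t>(hr) & 0xffffu;
}

// Hypothetical helper: true when a failed call looks like the remoting
// layer reporting a dead or unreachable object.
inline bool LooksLikeCommFailure(HResult hr) {
    if (!Failed(hr)) return false;
    if (Facility(hr) == kFacilityRpc) return true;
    if (Facility(hr) == kFacilityWin32) {
        const std::uint32_t code = Code(hr);
        return code == kRpcServerUnavailable || code == kRpcCallFailed;
    }
    return false;
}
```

A client would call LooksLikeCommFailure on every failed HRESULT before deciding whether to retry, reconnect, or give up on the object.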
      The discussion so far has dealt with objects dying prematurely. It is also possible that a client can die without properly notifying the objects that it is using. This effectively means that any method call could be your last. While COM will eventually release any held object references, no additional action will be taken on behalf of a terminated client. The following is an example of an interface not designed with this fact in mind:

 [ uuid(31CDF640-E91A-11d1-9277-006008026FEA), object]
 interface ISharedObject : IUnknown {
 // call prior to calling DoWork
     HRESULT LockExclusive();
 // ask object to do work while holding exclusive lock
     HRESULT DoWork();
 // release lock held by LockExclusive
     HRESULT UnlockExclusive();
 }
Given this interface definition, you might expect a client to look like this:

 // sharedclient.cpp
 void DoPrivateWork(ISharedObject *pso) {
     HRESULT hr = pso->LockExclusive(); 
     if (SUCCEEDED(hr)) {
         for (int i = 0; i < 10 && SUCCEEDED(hr); i++)
             hr = pso->DoWork();
         HRESULT hr2 = pso->UnlockExclusive();
     }
 }
What happens if the client process terminates prior to reaching the call to UnlockExclusive? Assuming that the object acquired a lock from the operating system in its implementation of LockExclusive, the lock will never be released. Granted, the client's outstanding reference will be released automatically by COM, but assuming that there are other outstanding clients, the object cannot detect that the client that held the lock just died.
      The problem of premature client death can be solved using COM's garbage collector. Since COM will automatically release any held object references when the client process terminates, the ISharedObject interface could have been based on having the lock released when an interface pointer was released, not when an explicit method is called. Consider the following modified version of the interface:

 [ uuid(31CDF641-E91A-11d1-9277-006008026FEA), object]
 interface ISharedObject2 : IUnknown {
 // call prior to calling DoWork 	
 // and release (*ppUnkLock)
 // to unlock object
     HRESULT LockExclusive([out] IUnknown **ppUnkLock);
 // ask object to do work while holding exclusive lock
     HRESULT DoWork();
 }
Given this interface definition, it is trivial for the implementation to create a secondary object that will release the lock upon the client's final release.
      The code in Figure 1 demonstrates this technique using a Win32 semaphore as the lock. With the enhanced interface definition (and implementation) in place, the client could be rewritten as follows:

 // sharedclient2.cpp
 void DoPrivateWork(ISharedObject2 *pso) {
     IUnknown *pUnkLock = 0;
 // acquire lock
     HRESULT hr = pso->LockExclusive(&pUnkLock); 
     if (SUCCEEDED(hr)) {
 // do work 10 times while holding lock
         for (int i = 0; i < 10 && SUCCEEDED(hr); i++)
             hr = pso->DoWork();
 // release lock by releasing "lock"  object
         pUnkLock->Release();
     }
 }
Note that in this example, if the client dies any time after the call to LockExclusive succeeds, COM will automatically release the reference to the intermediate object, which will trigger the release of the OS lock.
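The release-triggered unlock can be sketched without any COM headers. In the fragment below, LockCookie and SharedObject are hypothetical stand-ins for the intermediate object and the shared object, a plain atomic flag replaces the Win32 semaphore used in Figure 1, and a boolean return replaces the HRESULT. It is an illustration of the idea only, not the Figure 1 implementation.

```cpp
#include <atomic>
#include <cassert>

class SharedObject;

// Hypothetical stand-in for the intermediate "lock" object handed to the
// client. Its final Release is what frees the lock.
class LockCookie {
public:
    explicit LockCookie(SharedObject* owner) : owner_(owner), refs_(1) {}
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release();
private:
    ~LockCookie();  // only reachable through Release
    SharedObject* owner_;
    std::atomic<unsigned long> refs_;
};

class SharedObject {
public:
    // Returns false (instead of a failed HRESULT) if the lock is taken.
    bool LockExclusive(LockCookie** ppLock) {
        *ppLock = nullptr;
        bool expected = false;
        if (!locked_.compare_exchange_strong(expected, true)) return false;
        *ppLock = new LockCookie(this);
        return true;
    }
    bool IsLocked() const { return locked_.load(); }
private:
    friend class LockCookie;
    void Unlock() { locked_.store(false); }
    std::atomic<bool> locked_{false};
};

inline unsigned long LockCookie::Release() {
    const unsigned long remaining = --refs_;
    if (remaining == 0) delete this;  // destructor releases the lock
    return remaining;
}

inline LockCookie::~LockCookie() { owner_->Unlock(); }
```

Because the unlock lives in the cookie's destructor, it makes no difference whether the final Release comes from the client or from COM cleaning up after a dead client.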
      While the interface design shown here does not distinguish between the premature death of a client and a client deliberately releasing the lock, this could easily be addressed by the following slight modification:

 [ uuid(31CDF642-E91A-11d1-9277-006008026FEA), object]
 interface IUnlockCookie : IUnknown {
 // call to release the underlying lock
     HRESULT UnlockExclusive();
 }
 [ uuid(31CDF643-E91A-11d1-9277-006008026FEA), object]
 interface ISharedObject3 : IUnknown {
 // call prior to calling DoWork and call 
 // IUnlockCookie::UnlockExclusive to unlock object
     HRESULT LockExclusive([out] IUnlockCookie **ppuc);
 // ask object to do work while holding exclusive lock
     HRESULT DoWork();
 }
Given this interface definition, the object can now distinguish between an explicit unlock or a premature death based on whether the UnlockExclusive method is called prior to the intermediate object's final release.
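One way the intermediate object could tell the two cases apart is sketched below. UnlockCookie, ReleaseReason, and the callback are hypothetical names invented for this illustration, and the reference count is deliberately not thread-safe to keep the fragment short.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Why the lock was given up (hypothetical enum for this sketch).
enum class ReleaseReason { ExplicitUnlock, Abandoned };

// Hypothetical stand-in for an object implementing IUnlockCookie.
class UnlockCookie {
public:
    explicit UnlockCookie(std::function<void(ReleaseReason)> onRelease)
        : onRelease_(std::move(onRelease)) {}

    // Mirrors IUnlockCookie::UnlockExclusive: the client is done on purpose.
    void UnlockExclusive() {
        if (!released_) {
            released_ = true;
            onRelease_(ReleaseReason::ExplicitUnlock);
        }
    }

    unsigned long AddRef() { return ++refs_; }

    unsigned long Release() {
        const unsigned long remaining = --refs_;
        if (remaining == 0) {
            // Final release without an explicit unlock: the client most
            // likely died or forgot to unlock.
            if (!released_) onRelease_(ReleaseReason::Abandoned);
            delete this;
        }
        return remaining;
    }

private:
    ~UnlockCookie() = default;  // only reachable through Release
    std::function<void(ReleaseReason)> onRelease_;
    unsigned long refs_ = 1;
    bool released_ = false;
};
```

The shared object would pass a callback that frees the underlying lock in both cases and, for Abandoned, also logs the event or rolls back partial work.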
      Additional failure modes are only one problem that developers face when moving to distributed object computing. Another common problem is related to the latency that exists when executing a method call. Consider the case of public attributes or properties. When developers first begin building object-oriented software, they must learn many new concepts, including classes and encapsulation. To this end, most novice developers are told early in their careers that all data members should be private. In order to respect the sensibilities of OO purists, many developers reflexively make all instance data private and then proceed to define individual accessor and mutator functions to expose their members to the public. This technique is known as "just enough encapsulation," and in many ways eliminates any potential benefit of encapsulation by exposing implementation details to the outside world.
      While this technique is only marginally better than simply using public data members, it is tolerated in the C++ community because it typically has no impact on the performance of member access, as these functions are usually inlined by the compiler.
      The real problem with just enough encapsulation arises when it is applied to COM. When designing interfaces for use in a distributed system, you must avoid this technique at all costs. Due to the independent compilation and dynamic binding of method calls inherent in COM, inline implementation is impossible. More importantly, each call normally implies a round-trip, which may involve an RPC, depending on the locality of the caller and callee.
      As an interface designer, you should always consider the impact of round-trips on both the performance and the semantics of an interface. An interface that requires multiple round-trips to perform a single logical operation is an interface that will subject its users to poor performance and potential race conditions. As an example, consider the following interface:

 interface IRect : IUnknown {
     HRESULT SetLeft  ([in] long nLeft);
     HRESULT SetTop   ([in] long nTop);
     HRESULT SetRight ([in] long nRight);
     HRESULT SetBottom([in] long nBottom);
     HRESULT GetLeft  ([out, retval] long* pnLeft);
     HRESULT GetTop   ([out, retval] long* pnTop);
     HRESULT GetRight ([out, retval] long* pnRight);
     HRESULT GetBottom([out, retval] long* pnBottom);
 }
This interface requires four round-trips to obtain the state of the rectangle. While IRect is obviously far from optimal in terms of performance, the lack of atomicity is a far more subtle but harmful flaw. While you are busy making round-trips to get the entire state of the rectangle, another client may be changing the state underneath you. Because the entire state of the rectangle cannot be fetched atomically in one round-trip, you cannot be guaranteed a consistent view of the rectangle's state.
      With this in mind, you set off to create a better rectangle interface:

 interface IRect2 : IUnknown {
     HRESULT SetRect( [in] long nLeft,
                      [in] long nTop,
                      [in] long nRight,
                      [in] long nBottom );
     HRESULT GetRect( [out] long* pnLeft,
                      [out] long* pnTop,
                      [out] long* pnRight,
                      [out] long* pnBottom );
 }
This new and improved interface requires only one round-trip to get or set the state, and thus ensures that you will indeed obtain a consistent view of the rectangle, with much improved performance across apartment boundaries.
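On the object side, the consistency guarantee comes from reading and writing all four coordinates under one lock. The sketch below assumes a hypothetical SharedRect class protected by a std::mutex; RectState and Width are likewise names invented for this illustration.

```cpp
#include <cassert>
#include <mutex>

// All four coordinates travel together, as in IRect2.
struct RectState {
    long left;
    long top;
    long right;
    long bottom;
};

// Hypothetical object-side implementation behind an IRect2-style interface.
class SharedRect {
public:
    // One call sets the whole state under the lock.
    void SetRect(const RectState& r) {
        std::lock_guard<std::mutex> guard(mutex_);
        state_ = r;
    }
    // One call returns a consistent snapshot of the whole state.
    RectState GetRect() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return state_;
    }
private:
    mutable std::mutex mutex_;
    RectState state_{};
};

// Computed from a snapshot, so it can never mix old and new coordinates.
inline long Width(const RectState& r) { return r.right - r.left; }
```

With four separate getters, the lock would be dropped between calls, and a concurrent SetRect could slip in between GetLeft and GetRight.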
      Note that IRect2 has lost some of the flexibility of IRect. What if you wanted to set only one coordinate to 100 and leave the other three coordinates as is? This is impossible to do atomically with the current design of IRect2 because it requires two round-trips. One totally acceptable solution is to add an extra parameter to the SetRect function to allow finer-grained control:

 HRESULT SetRect( [in] long nLeft,
                  [in] long nTop,
                  [in] long nRight,
                  [in] long nBottom,
                  [in] DWORD grfWhichCoords );
 typedef enum tagSRWC {
     SRWC_LEFT   = 0x0001,
     SRWC_TOP    = 0x0002,
     SRWC_RIGHT  = 0x0004,
     SRWC_BOTTOM = 0x0008
 } SRWC; // SetRectWhichCoords
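An implementation of the flag-based SetRect might apply the mask as sketched below. ApplySetRect, RectState, and the k-prefixed constants are hypothetical names for this illustration; the constants mirror the SRWC values above.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the SRWC flags from the IDL above.
enum SetRectWhichCoords : std::uint32_t {
    kLeft   = 0x0001,
    kTop    = 0x0002,
    kRight  = 0x0004,
    kBottom = 0x0008
};

struct RectState {
    long left;
    long top;
    long right;
    long bottom;
};

// Hypothetical body of SetRect: only coordinates whose flag is set in
// `which` are updated; the others keep their current values.
inline void ApplySetRect(RectState& r, long left, long top, long right,
                         long bottom, std::uint32_t which) {
    if (which & kLeft)   r.left = left;
    if (which & kTop)    r.top = top;
    if (which & kRight)  r.right = right;
    if (which & kBottom) r.bottom = bottom;
}
```

A caller that wants to move only the left edge passes kLeft and any values for the ignored coordinates, and the change still costs a single round-trip.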
Another solution would be to lump the members of IRect and IRect2 into a single interface. This would create a somewhat cumbersome interface (as with C++ interfaces, COM interfaces should be minimal and complete). Also, such an interface would not allow setting exactly two or three attributes at a time instead of one or four.
      Another common race condition with an interface like IRect2 surfaces when you consider offsetting the rectangle by some value because you must make two round-trips to perform a single logical operation. In this case, you might consider that rectangles are only one species of many that may require translation, and design a separate interface that eliminates race conditions for many types of two-dimensional objects:

 interface I2DObject : IUnknown {
     HRESULT Translate([in] long dx, [in] long dy);
     HRESULT Inflate([in] long dx, [in] long dy);
     HRESULT Rotate([in] double degreesInRadians,
                    [in] long xCenter, 
                    [in] long yCenter);
 }
You may notice a common theme here. To avoid race conditions, be sure that each logical operation can be performed by a single round-trip. Unnecessary round-trips are evil—avoid them.
      When designing interfaces, it is often an interesting exercise to use a packet sniffer to actually look at what is being sent across the wire, both on a per-method and per-object/client basis. (The Intel Network Monitor that ships with Windows NT Server and Systems Management Server is a reasonable product that parses COM packet traces fairly well, although any DCE RPC compatible product will do.) To do this, you must have a client communicating with an object on a remote host, since COM does not use the TCP loopback interface for local communications. One particularly useful technique is to observe the actual size and structure of the marshaled parameters for any given remote call.
      Developers who anticipate that their interfaces will only be used by in-process (or more specifically, same-apartment) clients often feel that they are exempt from worrying about network failures, round-trips, wire representations, and race conditions. COM is no different from any other technology in that it is important to design in the future tense. Realize that "in-process" and "out-of-process" are often temporary, present-tense implementation details and that most well-designed interfaces can be used effectively in both scenarios. The third tip described here involves a well-known example of the "it will only be used in-process" trap.
      Finally, remember that one of the big wins of interface-based programming is design reuse. As you spend more of your waking hours poring over the documentation of the hundreds of new interfaces springing up to solve various problems, you will learn to love the simple, well-designed interfaces (such as IUnknown) whose well-known semantics you simply take for granted after a while. When learning a completely new technology exposed via COM interfaces, these well-known interfaces become good friends.

2 Beware the COM Singleton
People put a lot of effort into implementing singletons in COM. A singleton is an object that is the one and only one instance of its class. The Patterns movement formalized the concept of a singleton as a technique for allowing multiple clients to acquire references to the same object via a well-known access point.
      Singletons are often used to provide an object-based rendezvous point that can replace class-level methods (such as the static member function of a C++ class). The most direct translation of this technique into COM is to expose a custom interface from your class object. However, this requires you to step out of the default behavior of most COM development tools (like ATL and MFC), and therefore is not as common a practice as it perhaps should be. Instead, many developers choose to overload their implementation of IClassFactory::CreateInstance to achieve the same effect.


 class Dog : public IDog { 
 // implementation of Dog deleted for clarity
 };
 
 // singleton version of CreateInstance
 STDMETHODIMP DogClass::CreateInstance(
             IUnknown *pUnkOuter, REFIID riid, 
             void **ppv)
 {
 // declare a "singleton" object
     static Dog s_Dog;
 // disallow aggregation
     if (pUnkOuter) 
         return (*ppv = 0), CLASS_E_NOAGGREGATION;
 // return a pointer to the "singleton"
     return s_Dog.QueryInterface(riid, ppv);
 }
ATL uses a variation on this technique when you use the DECLARE_CLASSFACTORY_SINGLETON macro in your class definition. Technically, this meets the formal definition of a singleton: every client gets a reference to the same instance of the class Dog through a well-known access point—in this case, through the Dog class object. However, there are some problems with this approach.
      First, when you expose singleton objects this way, you are violating the semantics of CreateInstance. CreateInstance is documented as creating a new, uninitialized object. Clients that expect two calls to CreateInstance to provide two separate objects may be surprised by the behavior of this deviant implementation.
      Semantically, it would be better to simply use some other interface to gain access to the singleton. You could define your own custom interface for this purpose or reuse a standard interface that matches your semantic requirements (such as IOleItemContainer).

 // Custom interface implemented by class object
 interface ISingletonFactory : IUnknown {
     HRESULT GetSingleInstance([in] REFIID riid,
                        [out, iid_is(riid)] void **ppv);
 }
 // Implementation of GetSingleInstance
 HRESULT DogClass::GetSingleInstance(REFIID riid,
                                     void **ppvObject)
 {
     static Dog s_Dog;
     return s_Dog.QueryInterface(riid, ppvObject);
 }
Note that this technique is difficult to implement and use from languages that hide the details of class objects, such as Visual Basic, Java, and the various scripting languages. You can implement a variation on this theme that works with all languages using a separate Manager class, instances of which can be used to access the singleton.
      Maybe you're worried about playing fast and loose with CreateInstance's semantics—and maybe you're not. Either way, there's another problem that should cause concern. If your implementation of CreateInstance (or GetSingleInstance or equivalent) always returns a reference to the same COM object, it may violate COM's concurrency laws. COM objects live in apartments, and COM requires that you marshal interface pointers if you want to move them across apartment boundaries. If your singleton is being used from multiple apartments in a single process and your class object returns this interface pointer directly, your code violates this rule. This means that if your class is deployed as an in-process server, you must mark it ThreadingModel=Free or leave the ThreadingModel attribute out of the registry. Otherwise you are subject to the same pitfalls as objects that use the free-threaded marshaler.
      So perhaps you're a tough-as-nails COM developer and you've gotten all the threading details right (OK, it really isn't that hard once you understand the issues). Now you've got to think about design issues. First and foremost, the singleton pattern assumes a naïve association between state and behavior. Even outside the scope of COM, it is commonplace for two or more objects to share state. More often than not, developers use the singleton idiom in COM to achieve a rendezvous point for two or more clients to access shared state. If this is all that is needed, this singleton-style implementation

 class Dog {
     long m_nHairs; 
     HRESULT Shed(void) {
         m_nHairs -= 100;
         return S_OK;
     }
 };
could be replaced with the following:

 class Dog {
     static long s_nHairs; // shared state
     Dog(void) : m_nHairs(s_nHairs) {}
     long& m_nHairs; // note that a reference is now
                     // used
     HRESULT Shed(void) {
         m_nHairs -= 100;
         return S_OK;
     }
 };
In the former case, some technique is needed to ensure that only one instance of Dog is ever created. In the latter case, there is no restriction on the number of Dogs created, as each dog shares its hair count using a static variable. The advantage of the latter approach is that the implementor can change the state management policy silently in the constructor to implement per-client behavior. This is not possible in a naïve singleton-based approach, as each client gets a reference to one particular object. Additionally, the latter approach is extremely straightforward to implement in any language irrespective of its support for low-level COM plumbing.
      COM singletons activated in a client's process are essentially identical to the original singleton put forth in Gamma et al.'s Design Patterns. Implementation issues are similar (although more extensive, as noted above), and the benefits are the same.
      It is important to remember that the singleton solution was originally conceived as the remedy for a problem that often arises during the development of standalone applications. What is the role of a singleton in a distributed application that includes multiple processes on multiple machines?
      First, what is the singleton's scope? Remember that a COM object can be activated in a client's process, in a separate process on a client's machine, or in a process on some other machine. If you implement your class object to return references to a singleton object, is that singleton unique in its process, on its machine, or across several machines on a network?
      When applied to distributed computing, singletons break down fairly rapidly, as they limit the implementor's options for load balancing, concurrency management, prioritization, and per-client state management. Consider applying singletons to an airline reservation system. If you represented a particular flight—say, United Flight 162 from Boston's Logan Airport to LAX—as a network-wide singleton, you'd create a massive bottleneck. Thousands of users all over the world might need to deal with that one object simultaneously, and that's more than a single COM object can be expected to bear. The singleton mechanism was not designed with this problem in mind.
      What's the solution? Realize that singletons are built on the notion of a shared physical identity. In other words, singleton implies that there is a single COM identity in a single apartment of a single process on a single machine that is the one and only representation of some entity in your distributed system. Discard this idea and instead concentrate on logical identity. Allow multiple COM objects in different apartments of separate processes on more than one machine to represent the same logical entity. To achieve the effects of a singleton, all of these objects, perhaps as many as one per client, simply access the same shared state.
      If you follow this path, you open the door to load balancing. Also, if you create a separate COM object per client, you can cache per-client state in each object. You can also detect individual client death based on individual per-client objects being released by COM's garbage collector. Assuming each object refers to shared state, you technically are pushing the concurrency issues down one level, which means you (not COM) are on the hook for protecting against concurrent access. Transactions help with this problem immensely—in fact, this is essentially the Microsoft® Transaction Server (MTS) model.
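Within a single process, the per-client-object idea can be sketched as follows. FlightState and FlightSession are hypothetical names for this illustration, and a std::mutex stands in for whatever protection (transactions, in the MTS model) guards the shared state in a real system.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Shared state for one logical entity (here, one flight). Protecting it is
// the implementor's job, not COM's.
struct FlightState {
    explicit FlightState(int seats) : seatsAvailable(seats) {}
    std::mutex guard;
    int seatsAvailable;
};

// Hypothetical per-client object: many sessions, one logical flight.
class FlightSession {
public:
    explicit FlightSession(std::shared_ptr<FlightState> shared)
        : shared_(std::move(shared)) {}

    bool ReserveSeat() {
        std::lock_guard<std::mutex> lock(shared_->guard);
        if (shared_->seatsAvailable == 0) return false;
        --shared_->seatsAvailable;
        ++seatsHeld_;  // per-client state cached in this object
        return true;
    }

    int SeatsHeld() const { return seatsHeld_; }

private:
    std::shared_ptr<FlightState> shared_;
    int seatsHeld_ = 0;
};
```

Each client gets its own FlightSession, so per-client bookkeeping such as seatsHeld_ needs no lookup table keyed by client, and the release of a session can be treated as that client going away.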
      So where are we? If you're a big fan of singletons, that's fine. Just understand where a singleton really helps and where it really hinders. As you build larger and larger distributed systems, realize that network-wide singletons become less and less useful. Also be aware that if you plan on using MTS as part of your infrastructure, singletons are verboten in MTS.

3 Prefer Typed Data to Opaque Data
COM is a slight variation on classic RPC, which has been proven to be an effective mechanism for distributed computing. Like classic RPC, COM can be mapped onto the OSI reference model (see Figure 2). Note that developers using only sockets lose the Session and Presentation layers. The Session layer models application-level connection multiplexing and message passing. For the purposes of this discussion we will focus on the Presentation layer.
      The Presentation layer is responsible for ensuring that information presented by an application can be mapped to a uniform transmissible representation that can be decoded on any host platform. In COM, the MIDL-generated proxies and stubs carry out this mapping. COM proxies and stubs are responsible for translating the presented call stack into Network Data Representation (NDR) prior to transmission. When incoming requests or responses arrive in a process, the proxies and stubs unmarshal the NDR payload into the native format of the destination machine.
      Sockets programmers are used to implementing this layer by hand, flattening data structures into a stream to be sent across the wire and then rehydrating the data structures on the other end. They have to deal with platform dependency issues such as byte ordering, Unicode/multibyte strings, character sets, floating-point formats, alignment, and so on. COM programmers never need to write this code. Simply describe all data types in IDL, and the IDL compiler happily generates the proxy/stub code that can be compiled and linked into a proxy/stub DLL.
      In case you're wondering, the NDR format is incredibly efficient. It's a multicanonical format, using the optimistic assumption that the remote host may have the same native data format as the local host, in many cases avoiding the overhead of conversion altogether. COM's presentation layer is truly a big win for cross-platform compatibility as well as rapid application development. It is silly to duplicate this effort. Additionally, the MIDL-generated proxies and stubs are incredibly efficient (especially when compiled with the /Oicf option). It is unlikely you can do better with a handcrafted implementation.
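The multicanonical idea, where the sender writes in its native format and the receiver converts only when the formats differ, can be illustrated with a toy encoding. The fragment below is not the NDR wire layout: Packet, ByteOrder, Encode, and Decode are hypothetical, and only a single 32-bit integer with a one-byte format label is handled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy format label: which byte order the sender used for the payload.
enum class ByteOrder : std::uint8_t { Big = 0, Little = 1 };

inline ByteOrder LocalByteOrder() {
    const std::uint16_t probe = 1;
    std::uint8_t first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1 ? ByteOrder::Little : ByteOrder::Big;
}

// One label byte followed by four payload bytes.
using Packet = std::array<std::uint8_t, 5>;

// The sender never converts: it copies its native representation and
// records which representation that was.
inline Packet Encode(std::uint32_t value) {
    Packet p{};
    p[0] = static_cast<std::uint8_t>(LocalByteOrder());
    std::memcpy(&p[1], &value, 4);
    return p;
}

// The receiver swaps bytes only if the sender's format differs from its own.
inline std::uint32_t Decode(const Packet& p) {
    std::uint32_t value = 0;
    std::memcpy(&value, &p[1], 4);
    if (static_cast<ByteOrder>(p[0]) != LocalByteOrder()) {
        value = (value >> 24) | ((value >> 8) & 0x0000ff00u) |
                ((value << 8) & 0x00ff0000u) | (value << 24);
    }
    return value;
}
```

When both hosts share a byte order, which is the common case, Decode reduces to a copy, and that is the optimistic assumption the format is built on.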
      Because the first versions of COM (namely, 16-bit COM) did not provide an IDL compiler, many old-time COM developers have fallen prey to seductive interfaces such as IDataObject, which was designed for out-of-process rendering. IDataObject supports transferring blobs of data that represent a bitmap or metafile rendering of an out-of-process object. The key word you should note here is blob. Blobs are opaque data structures that must be marshaled manually (you must flatten your data structure manually in order to expose it to clients via IDataObject). This works fine for local use in OLE document-style applications, where platform independence is not an issue (traditional OLE document apps run on the same machine). And OLE's default handler provides built-in support for extracting and caching metafile renderings via IDataObject. However, IDataObject is clearly inappropriate as a generic means of IPC when the IDL compiler is available.
      To take full advantage of COM, define your data types in IDL and let the IDL compiler generate the marshaling code. High-level languages like Visual Basic know nothing about how to deal with opaque data, while they are perfectly happy (within reason) dealing with data structures defined in IDL.
      Here's an example of a design that hasn't come to grips with modern COM:


 interface IJukeBox : IUnknown {
     //...
     HRESULT GetDiskInfo(
         [in] long nDiskNumber,
         [out, retval] IDataObject** ppDataObject );
 }
The designer of this jukebox interface determined that IDataObject was a flexible way to provide data about each disk. The client can ask for any clipboard format and get data about the musician who recorded the tracks, the track titles, the actual music on the tracks, and so on—all for the price of a single interface!
      Now imagine that you are presented with this interface and asked to develop the front-end controller for the jukebox (imagine the old 1950s style controls at each seat in a diner). One obvious task is to simply list the titles that are available in the jukebox along with the recording musician. So you look at the documentation and find a list of all the supported clipboard formats and the associated data types. Then you start coding.
      In Figure 3, IDataObject is clearly being used as a substitute for QueryInterface. Note the reliance on clipboard formats (which are hacked up to work across machines, by the way) to determine the supported data types. You may also need to call IDataObject::QueryGetData (or even worse, IDataObject::EnumFormatEtc) to allow the extensibility mechanism of IDataObject to kick in (unnecessary round-trips are evil). Yet another problem is that you are typically left guessing which media types (TYMED_XXX) the data object supports. All this leads to code that is difficult to read and maintain. And once again, keep in mind that you will not be able to use this interface from high-level languages like Visual Basic.
      Here is a better solution:

     HRESULT GetDiskInfo(
         [in] long nDiskNumber,
         [out, retval] IDiskInfo ** ppInfo);
This solution allows the simple use of COM interfaces to obtain information about the disks.

 interface IDiskInfo : IUnknown {
     HRESULT GetTitleInfo(
         [out] BSTR * pbstrTitle,
         [out] BSTR * pbstrAuthor );
     //...
 }
With the IDiskInfo interface, we once again achieve language and platform independence, and the client code becomes much more readable:

 #include <jukebox.h>
 // BSTRs are wide strings, so write them to a wide stream
 void DisplayItems(IJukeBox* pjb, wostream& os) {
     IDiskInfo* pdi = 0;
     for ( long i = 0;
           SUCCEEDED(pjb->GetDiskInfo(i, &pdi));
           i++ ) {
         BSTR bstrTitle = 0;
         BSTR bstrAuthor = 0;
         if (SUCCEEDED(pdi->GetTitleInfo(&bstrTitle,
                                         &bstrAuthor)))
         {
             os << bstrTitle  << L": "
                << bstrAuthor << endl;  
             SysFreeString( bstrTitle );
             SysFreeString( bstrAuthor );
         }
         pdi->Release();
     }
 }
Judge for yourself. The modern version spends time dealing with jukeboxes and disks, rather than interpreting blobs of data. If the extra round-trip to get the two strings from the intermediate IDiskInfo interface is unacceptable, the object implementor could elect to use marshal-by-value on the intermediate object to avoid the second round-trip.
      Another common interface that was used for intertask communication in 16-bit COM was IStream. As its name suggests, IStream represents an unstructured, opaque stream of bytes that can be read or written sequentially. This interface is often (ab)used to perform client-side flow control (see the IEnumXXX interface for a better approach). Note that IStream may make sense in cases where truly opaque byte streams are required (for instance, transmitting large medical images), but it could be argued that simply dropping down to sockets for transferring this type of data would be preferable anyway. There is nothing wrong with using COM to transmit a dynamic TCP endpoint to set up a transient connection for purposes like this.
      You may wonder what direction to take when confronted with an existing application that needs to become distributed via COM. The resounding answer is to start your project by designing and writing IDL. Use QueryInterface to discover types at runtime. Redefine your data structures in IDL. Follow the example of Win32 (see wtypes.idl). Like retrofitting const-correctness into a C++ class hierarchy, retrofitting IDL is a big job, but the benefits are enormous.

4 Only Use Connection Points to Support Scripting Events
A common misconception among COM programmers is that connection points enable bidirectional communication between clients and objects. They don't. What connection points provide is a standard mechanism for establishing bidirectional communication. Bidirectional communication is enabled when the client has an interface pointer to the object and vice versa. While you can use connection points to establish this bidirectional communication channel, you shouldn't for one important reason: round-trips. It takes five logical round-trips to establish bidirectional communication using connection points and four round-trips to tear it down; it really should only take one.
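To make the "one round-trip" target concrete, here is a sketch of registration done in a single method call. Everything in it is hypothetical and in-process: WorkerEvents, Worker::Advise, Worker::Unadvise, and CountingSink are names invented for this illustration, and in COM terms Advise would be one method on a custom interface, so handing the callback pointer to the object costs one call.

```cpp
#include <cassert>
#include <map>

// Callback interface the client implements (hypothetical).
struct WorkerEvents {
    virtual ~WorkerEvents() = default;
    virtual void OnWorkCompleted() = 0;
    virtual void OnWorkDelayed() = 0;
};

// Hypothetical object that accepts callback registrations directly.
class Worker {
public:
    // One call hands the object the client's callback pointer and returns
    // a cookie that identifies the registration.
    unsigned long Advise(WorkerEvents* sink) {
        const unsigned long cookie = nextCookie_++;
        sinks_[cookie] = sink;
        return cookie;
    }
    // One call tears the registration down again.
    bool Unadvise(unsigned long cookie) { return sinks_.erase(cookie) == 1; }

    // Pretend the work finishes immediately and notify every sink.
    void StartWork() {
        for (auto& entry : sinks_) entry.second->OnWorkCompleted();
    }

private:
    std::map<unsigned long, WorkerEvents*> sinks_;
    unsigned long nextCookie_ = 1;
};

// Minimal client-side sink used to observe the notifications.
struct CountingSink : WorkerEvents {
    int completed = 0;
    int delayed = 0;
    void OnWorkCompleted() override { ++completed; }
    void OnWorkDelayed() override { ++delayed; }
};
```

Once each side holds a pointer to the other, communication is bidirectional; the rest of the connection point protocol exists only to negotiate that exchange.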
      It can be hard to resist the siren song of connection points. The tools (MFC, ATL, and Visual Basic) support both implementing connection points and using them. IDL even has a special connection point syntax for declaring the difference between an incoming interface that an object will implement and an outgoing interface that an object is willing to call back on. For example, imagine a Worker object willing to notify all concerned when its work is completed or delayed:


 library WorkerLib {
     // Outgoing interfaces are often dispinterfaces
     // to support late-binding clients, e.g. IE
     dispinterface DWorkerEvents {
     properties:
     methods:
         [id(1)] void OnWorkCompleted();
         [id(2)] void OnWorkDelayed();
     }
     interface IWorker : IUnknown {
         HRESULT StartWork();
     }
     coclass Worker {
         interface IWorker; // incoming
         [source] dispinterface DWorkerEvents; 
                            // outgoing
     }
 }
Notice that the Worker class declares its willingness to call back on client implementations of the DWorkerEvents interface using the source attribute. This implies that the object will support the IConnectionPointContainer interface to allow clients to begin the bidirectional communication negotiation process:

 interface IConnectionPointContainer : IUnknown {
     HRESULT EnumConnectionPoints(
                 [out] IEnumConnectionPoints ** ppEnum);
     HRESULT FindConnectionPoint(
                 [in]  REFIID riid,
                 [out] IConnectionPoint ** ppCP);
 }
To obtain this interface, the client uses QueryInterface to request the IConnectionPointContainer interface from the object (round-trip 1). The client then calls FindConnectionPoint, passing in the IID for the interface that the client has implemented, such as DIID_DWorkerEvents (round-trip 2). If this succeeds, the object has expressed its ability to notify clients of the requested events and will return an IConnectionPoint interface to allow the client to attach its event sink:

 interface IConnectionPoint : IUnknown {
     HRESULT GetConnectionInterface([out] IID * piid);
     HRESULT GetConnectionPointContainer(
            [out] IConnectionPointContainer ** ppCPC);
     HRESULT Advise([in]  IUnknown *pUnkSink,
                    [out] DWORD *pdwCookie);
     HRESULT Unadvise([in] DWORD dwCookie);
     HRESULT EnumConnections(
                         [out] IEnumConnections ** pp);
 }
      The object's implementation of IConnectionPoint will maintain a list of clients interested in receiving notifications. For the client to add itself to the list, it calls the Advise method, passing in an interface pointer for the event interface that it has implemented (round-trip 3). Notice that even though the client may pass a specifically typed interface as the event sink parameter, the object (and the underlying remoting layer) can only assume that the parameter is an IUnknown reference. This means that to cache the event interface it wants, the object must call QueryInterface against the sink's IUnknown interface to acquire the typed event interface (round-trip 4). This last round-trip could have been avoided had the designer of connection points known about IDL's [iid_is] attribute. Finally, because the implementation of IConnectionPoint is often a distinct COM identity, when the client subsequently releases it, COM will perform one final round-trip (number 5) to tell the object that it has been released.
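      The five logical round-trips just described can be tallied in a small portable model. This is only a counting sketch, not COM code: RoundTripLog, connectViaConnectionPoint, and connectViaDirectAdvise are hypothetical names, and each recorded string simply stands for one cross-apartment call named in the text.

```cpp
// Portable model (no Windows headers). Every name below is illustrative;
// each record() call stands for one logical cross-apartment round-trip.
#include <string>
#include <vector>

struct RoundTripLog {
    std::vector<std::string> trips;   // one entry per round-trip
    void record(const std::string& what) { trips.push_back(what); }
    int count() const { return static_cast<int>(trips.size()); }
};

// The connection-point handshake, step by step as the article lists it.
inline RoundTripLog connectViaConnectionPoint() {
    RoundTripLog log;
    log.record("client: QueryInterface(IID_IConnectionPointContainer)");
    log.record("client: FindConnectionPoint(DIID_DWorkerEvents)");
    log.record("client: IConnectionPoint::Advise(pUnkSink)");
    log.record("object: pUnkSink->QueryInterface(DIID_DWorkerEvents)");
    log.record("client: IConnectionPoint::Release()");
    return log;
}

// A purpose-built advise method on an interface the client already holds.
inline RoundTripLog connectViaDirectAdvise() {
    RoundTripLog log;
    log.record("client: Advise(pTypedSink)");
    return log;
}
```

Calling count() on each result gives 5 for the connection-point handshake and 1 for the direct call, which is the gap the rest of this tip is about.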
      In general, it's obvious that connection points were designed for in-process objects where round-trips are not as important as in the distributed case. In fact, they are required to write event handlers in Visual Basic or current ActiveX Script engines. If you want to support Visual Basic-style event handlers, you're going to have to live with connection points. Otherwise, you should replace them with another mechanism for establishing bidirectional communications. IViewObject is a perfect example.
      The IViewObject interface is implemented by an object that is willing to render itself in the client area of a container's window. It does this by defining a Draw method. To notify the container that its rendered state has changed and that it would like to be redrawn, it holds an implementation of the client's IAdviseSink interface. If the client is interested in this notification from the self-rendering object, it can pass its IAdviseSink interface to the object via the IViewObject method SetAdvise. If the client is already holding the object's IViewObject interface (the most common case for a container), it is a single round-trip to establish bidirectional communication between the client and the object. In case you missed it, that is one round-trip where the connection point protocol needs five.
      In addition, when calling SetAdvise, the client passes flags that the object can use for filtering events—for example, only sending the events that the client is interested in—further reducing round-trips. This illustrates another downside of connection points, that there is no mechanism for filtering or prioritizing events. Every client gets all events whether or not they're interested.
      At this point, you might think that all objects should be shoehorning their bidirectional communications through IViewObject and IAdviseSink. This is definitely not the case. IAdviseSink is an event interface that has been paired with IViewObject for events specific to self-rendering objects. The concept of having domain-specific protocols like these is a good idea. However, unless you are building rendering plumbing, you have no business using IAdviseSink or IViewObject. When one of your custom interfaces implies events, add a pair of methods for managing interested clients. Where appropriate, include flags to filter or prioritize events to further reduce round-trips. For example, consider an augmented IWorker interface and another events interface (see Figure 4).
      For a client interested in worker events, it is a single round-trip to add it to the list for the events it's interested in. There are a few things worth mentioning about the design of these two interfaces. First, notice that the IWorkerEvents methods pass in an interface pointer for the object sending the event. This makes building a client holding multiple instances of a class more convenient. Now the client can have one implementation of IWorkerEvents instead of one implementation per held object. Since the client-side proxy has already cached the IWorker interface, this represents no additional marshaling requirements.
      Second, notice that the names of the Advise and Unadvise methods are prefixed with the word Worker. If an object implements several interfaces that imply events, it's often more convenient to implement them if they have different names.
      Third, notice that the flags and the cookie arguments are signed longs instead of the more intuitive unsigned longs (DWORDs). Certain clients (notably the current implementation of Visual Basic) do not support unsigned types. While signed types aren't quite what we mean, they work and they allow more restricted clients to use this mechanism. Since we're explicitly supporting Visual Basic, Figure 5 is an example of establishing a bidirectional communication channel between a worker object and a Visual Basic-based client.
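      The three design points above can be sketched in portable C++. Figure 4 is not reproduced here, so everything in this sketch is an assumption: the method names WorkerAdvise and WorkerUnadvise, the flag values, and the bool return type (real COM methods would return HRESULT and would AddRef the sink). It only illustrates a sender argument on each event, Worker-prefixed registration methods, and signed long flags and cookies.

```cpp
// Portable model, not real COM: no IUnknown, no HRESULT, no marshaling.
#include <map>

struct IWorker;  // forward declaration so events can name their sender

// Hypothetical filter flags; kept as signed long for restricted clients.
enum WorkerEventFlags : long {
    WORKER_EVENT_COMPLETED = 0x1,
    WORKER_EVENT_DELAYED   = 0x2
};

struct IWorkerEvents {
    virtual ~IWorkerEvents() = default;
    // The sender is passed so one sink can serve many worker objects.
    virtual void OnWorkCompleted(IWorker* sender) = 0;
    virtual void OnWorkDelayed(IWorker* sender) = 0;
};

struct IWorker {
    virtual ~IWorker() = default;
    virtual bool WorkerAdvise(IWorkerEvents* sink, long flags, long* cookie) = 0;
    virtual bool WorkerUnadvise(long cookie) = 0;
};

class Worker : public IWorker {
    struct Registration { IWorkerEvents* sink; long flags; };
    std::map<long, Registration> m_sinks;   // cookie -> registration
    long m_nextCookie = 1;
public:
    bool WorkerAdvise(IWorkerEvents* sink, long flags, long* cookie) override {
        if (!sink || !cookie) return false;
        *cookie = m_nextCookie++;
        m_sinks[*cookie] = Registration{sink, flags};
        return true;
    }
    bool WorkerUnadvise(long cookie) override {
        return m_sinks.erase(cookie) == 1;
    }
    // Only sinks that asked for an event receive it.
    void FireCompleted() {
        for (auto& entry : m_sinks)
            if (entry.second.flags & WORKER_EVENT_COMPLETED)
                entry.second.sink->OnWorkCompleted(this);
    }
    void FireDelayed() {
        for (auto& entry : m_sinks)
            if (entry.second.flags & WORKER_EVENT_DELAYED)
                entry.second.sink->OnWorkDelayed(this);
    }
};

// A sink that just counts what it receives.
struct CountingSink : IWorkerEvents {
    int completed = 0;
    int delayed = 0;
    IWorker* lastSender = nullptr;
    void OnWorkCompleted(IWorker* sender) override { ++completed; lastSender = sender; }
    void OnWorkDelayed(IWorker* sender) override { ++delayed; lastSender = sender; }
};
```

A sink registered with only WORKER_EVENT_COMPLETED never sees a delay notification, so in the distributed case the filtered-out events would cost no round-trips at all.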
      If an object supports events independently of any specific interface, it's often necessary to factor the methods for setting up and tearing down the communication into a separate interface. If you do this, it's convenient to be able to support multiple event interfaces via a single Advise method. The [iid_is] attribute can be used to make this easier and more efficient.

5 Don’t Provide
More Than One
Implementation
of the Same Interface
on a Single Object
COM authorities love to talk and write about per-interface reference counting. They also are known to expound on the virtues of defining all interfaces as dual interfaces. Both of these techniques are interesting, but are in conflict with the basic tenets of QueryInterface.
      The COM specification is fairly clear about the rules of QueryInterface. It states that the set of interfaces an object exports is static for the lifetime of the object. It also states that the set of interfaces an object exports can be thought of as a collection of fully connected nodes; that is, one can acquire any available interface from any other interface and in any order. These rules form the notion of COM identity and must be followed for the COM remoting architecture to work properly.
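      The "fully connected nodes" rule can be stated as a small check. This is a portable sketch under assumptions: QueryGraph and IsFullyConnected are hypothetical names, and an object is modeled as nothing more than a table saying which interface names each of its interfaces will grant through QueryInterface.

```cpp
// Portable model: an object is reduced to its QueryInterface answers.
#include <map>
#include <set>
#include <string>

// For each interface name, the set of names its QueryInterface grants.
using QueryGraph = std::map<std::string, std::set<std::string>>;

// True when every interface can reach every interface, itself included,
// which is the "fully connected" property the rules of identity require.
inline bool IsFullyConnected(const QueryGraph& graph) {
    for (const auto& from : graph)
        for (const auto& to : graph)
            if (from.second.count(to.first) == 0)
                return false;
    return true;
}
```

A graph in which ICat grants IDog but IDog refuses ICat fails the check, because the order of acquisition would then matter.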
      Most discussions of COM identity frame the traversal of interface pointers on an object in terms of yes/no answers; that is, do you support the interface or don't you? In general, this is a reasonable level of detail except for the special case of QueryInterface(IID_IUnknown). However, in glossing over what it means to hand out an interface pointer from QueryInterface, a subtle problem that bites most COM developers is often overlooked.
      As stated in the COM specification, an interface is an immutable contract that implies concrete syntax and loose, abstract semantics. When a client requests an interface from an actual object, it can safely assume that the object implementor was aware of this contract and will provide a pointer (vptr) to a vtable chock full of interface-conformant code. The COM specification explicitly provides a loophole that allows implementors to dynamically allocate these vptrs on an as-needed basis, potentially returning different physical results for identical QueryInterface requests. This technique is often referred to as tearoffs, as a new vptr is "torn off" on demand. What is not explicitly stated in the COM specification is that trouble will ensue when you give out different logical results for identical QueryInterface requests.
      To grasp the problem of returning different logical results from QueryInterface, consider the following code fragment that asks a cat to bark like a dog:


 HRESULT BarkLikeADogYouCat(ICat *pCat) {
     IDog *pDog = 0;
     HRESULT hr = pCat->QueryInterface(IID_IDog, 
                                       (void**)&pDog);
     if (SUCCEEDED(hr)) {
         hr = pDog->Bark();
         pDog->Release();
     }
     return hr;
 }
If the following function were also available that asked birds to bark like a dog

 HRESULT BarkLikeADogYouBird(IBird *pBird) {
     IDog *pDog = 0;
     HRESULT hr = pBird->QueryInterface(IID_IDog, 
                                        (void**)&pDog);
     if (SUCCEEDED(hr)) {
         hr = pDog->Bark();
         pDog->Release();
     }
     return hr;
 }
it would be strange (to say the least) if the following code resulted in two different barks:

 HRESULT CreateAndBarkTwice(void) {
 // create a DogCatBird
     ICat *pCat = 0;
     HRESULT hr = CoCreateInstance(CLSID_DogCatBird, 0,
                                   CLSCTX_ALL, IID_ICat,
                                   (void**)&pCat);
     if (SUCCEEDED(hr)) {
         IBird *pBird = 0;
         hr = pCat->QueryInterface(IID_IBird,
                                   (void**)&pBird);
         if (SUCCEEDED(hr)) {
 // ask object to bark via two different interfaces
             hr = BarkLikeADogYouCat(pCat);
             if (SUCCEEDED(hr))
                 hr = BarkLikeADogYouBird(pBird);
             pBird->Release();
         }
         pCat->Release();
     }
     return hr;
 }
Since the ICat and IBird interface pointers both point to the same object identity, it is obvious that either both QueryInterface(IID_IDog) requests will succeed or both requests will fail.
      The subtlety of this code surrounds what happens when the Bark method is called from each of the helper functions. The call will always be dispatched to the same object identity. However, due to the loophole in the COM specification, it is possible that the object implementor has given out two different physical results from QueryInterface(IID_IDog). Because there may be two different physical results, there also may be two different logical results; that is, the vtbl referenced by the result of

 pBird->QueryInterface(IID_IDog, (void**)&pDog1);
may be different than that referenced by

 pCat->QueryInterface(IID_IDog, (void**)&pDog2);
This means that the actual code that is executed via

 pDog1->Bark();
may be different than the code executed via

 pDog2->Bark();
While technically this can still be considered legal COM, it flies in the face of the basic tenets of QueryInterface—that is, the order that you acquire interfaces doesn't matter. In this example, the code that will execute when you ask the object to bark will be different depending on the order in which you acquired the IDog interface.
      So you may be saying to yourself that this seems like a largely academic discussion and that your objects are probably exempt from this semantic brouhaha. As long as your objects follow the letter of the COM specification, you just want to be left alone. If your objects will only be used within a single apartment, you may be able to get away with this assumption. However, once your object is accessed across apartment boundaries, your main physical client is now the COM remoting layer, and the architects of the COM remoting layer read a lot into the COM specification that you may not have. (Actually, the architects of the COM remoting layer wrote the COM specification, but that isn't important for this discussion.)
      To understand the issue at hand, it is useful to reexamine the top-level client code shown earlier:

 HRESULT CreateAndBarkTwice(void) {
 // create a DogCatBird
     ICat *pCat = 0;
     HRESULT hr = CoCreateInstance(CLSID_DogCatBird, 0,
                                   CLSCTX_ALL, IID_ICat,
                                   (void**)&pCat);
     if (SUCCEEDED(hr)) {
         IBird *pBird = 0;
         hr = pCat->QueryInterface(IID_IBird,
                                   (void**)&pBird);
         if (SUCCEEDED(hr)) {
 // ask object to bark via two different interfaces
             hr = BarkLikeADogYouCat(pCat);
             if (SUCCEEDED(hr))
                 hr = BarkLikeADogYouBird(pBird);
             pBird->Release();
         }
         pCat->Release();
     }
     return hr;
 }
Assume that the DogCatBird object will reside in a distinct apartment from the caller (the server is remote or out-of-process or the CLSID is marked with an incompatible ThreadingModel attribute). The call to CoCreateInstance will cause a stub manager to be created in the apartment of the object, and because the resultant pointer is of type ICat, an ICat interface stub will be created as well. The stub manager will hold an IUnknown reference to the actual object, and the interface stub will hold an ICat reference that will be used to dispatch the incoming ICat requests. The client will receive a pointer to an ICat interface proxy that has been bound to a new proxy manager that represents the identity of the object in the client's apartment.
      When the call to QueryInterface(IID_IBird) is made, the proxy manager notes that it does not yet have an IBird interface proxy, so it forwards the request to the object's apartment where an IBird interface stub is bound to the object (the new interface stub will hold an IBird reference to the object). When the remote QueryInterface request returns, a new IBird interface proxy will be aggregated dynamically into the proxy manager so that the overall proxy will appear to implement both ICat and IBird.
      The interesting leap happens when the two helper functions call QueryInterface(IID_IDog). The first request for IDog will go through the same steps as the request for IBird; a new interface stub will be created as will a new interface proxy. The fireworks begin at the second request for IDog. Since the proxy manager has already aggregated an IDog interface proxy in the call to BarkLikeADogYouCat, the second QueryInterface request will be satisfied by returning a pointer to the cached interface proxy. The good news is that there will be no second round-trip to the object, improving performance. The bad news is that there will be no second round-trip to the object, which robs the object of the chance to return the second implementation of IDog.
      The moral of the story is that when you are accessed remotely, the first implementation of an interface that you return is the one you are stuck with for the lifetime of the object. COM never tears down the interface stub until the last proxy manager releases its connection to the object. If you had designed your object around having two or more distinct logical implementations of an interface, your code will break once proxies are used. As a corollary, if you designed your object around being notified of individual interfaces being released, your code will break once proxies are used. This means that techniques such as per-interface reference counting or tearoff interfaces are largely useless in scenarios where more than one apartment is involved.
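      The caching behavior described above can be imitated in portable C++. This is a deliberately simplified model, not the COM remoting layer: DogImpl, QueryDog, ProxyManager, and the use of a string in place of an IID are all assumptions made for illustration. It shows only the one effect that matters here, that the second request for IDog never reaches the object.

```cpp
// Portable model of "first implementation wins" behind a proxy.
#include <map>
#include <string>

// Stand-in for an IDog implementation; 'sound' tells two of them apart.
struct DogImpl { std::string sound; };

// An object that hands out a different logical IDog on each request.
class DogCatBird {
    DogImpl m_first{"Woof"};
    DogImpl m_second{"Arf"};
    int m_requests = 0;
public:
    DogImpl* QueryDog() { return (m_requests++ == 0) ? &m_first : &m_second; }
    int requestsSeen() const { return m_requests; }
};

// Models the proxy manager: one cached interface proxy per interface.
class ProxyManager {
    DogCatBird& m_object;
    std::map<std::string, DogImpl*> m_cache;
public:
    explicit ProxyManager(DogCatBird& object) : m_object(object) {}
    DogImpl* QueryInterface(const std::string& iid) {
        auto it = m_cache.find(iid);
        if (it != m_cache.end()) return it->second;  // cached: no round-trip
        if (iid != "IID_IDog") return nullptr;       // only IDog is modeled
        DogImpl* p = m_object.QueryDog();            // one round-trip
        m_cache[iid] = p;
        return p;
    }
};
```

Called directly, the object gives two different barks. Called through the ProxyManager, both requests yield the first implementation and requestsSeen() stays at 1.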

6 Beware Mixing
Interface-Based
Programming and
Typeless Languages
The power and expressiveness of the COM programming model is based on one simple idea: program against abstract interfaces, not concrete implementations. To determine which interfaces an implementation supports, the client must ask explicitly via calls to QueryInterface. This coarse-grained negotiation is the foundation of the COM type system. It provides encapsulation, abstraction, and polymorphism (all the goodness of object-orientation) while avoiding the problems related to versioning and coupling that often occur in classic object-orientation. Unfortunately, typeless languages such as JavaScript or VBScript throw these additional benefits of COM away in favor of increased simplicity.
      To support the use of COM objects, typeless languages currently expect all objects to support the IDispatch interface. IDispatch provides the functionality of a mini-interpreter to the runtime system used by these languages. When invoking methods or accessing properties on an object, the language runtime will ask the object via IDispatch::GetIDsOfNames if it supports a given named operation. If the answer is yes, the runtime will package the parameters into a self-describing stack frame for interpretation by the object via IDispatch::Invoke. It is the object's job to unpack the parameters and perform the operation.
      For example, consider a dispatch interface that is defined in IDL:


 dispinterface IDog {
 properties:
 methods:
     void Bark([in] long nVolume);
 };
The IDog interface has one logical method, Bark. An object that supports this interface must implement IDispatch to allow clients to physically access this operation. However, because typeless languages do not allow the programmer to associate type/interface names with variables, the language runtime will not be able to distinguish IDog from any other dispatch interface on the object. In this way, typeless languages limit objects to a single interface.
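      The name lookup followed by invocation that a script runtime performs can be modeled without any Windows headers. The sketch below is an assumption-laden simplification: ScriptableDog, GetIdOfName, and CallByName are hypothetical, arguments are plain longs rather than VARIANTs in a DISPPARAMS, and success is a bool rather than an HRESULT.

```cpp
// Portable model of dispatch-style invocation by name.
#include <map>
#include <string>
#include <vector>

class ScriptableDog {
    std::map<std::string, long> m_ids{{"Bark", 1}};  // name -> dispatch ID
    long m_lastVolume = 0;
public:
    // Plays the role of GetIDsOfNames: fails if the name is unknown.
    bool GetIdOfName(const std::string& name, long* id) const {
        auto it = m_ids.find(name);
        if (it == m_ids.end()) return false;
        *id = it->second;
        return true;
    }
    // Plays the role of Invoke: the object unpacks the arguments itself.
    bool Invoke(long id, const std::vector<long>& args) {
        if (id == 1 && args.size() == 1) {
            m_lastVolume = args[0];
            return true;
        }
        return false;
    }
    long lastVolume() const { return m_lastVolume; }
};

// Roughly what a script runtime does for a call such as dog.Bark 100.
inline bool CallByName(ScriptableDog& obj, const std::string& name,
                       const std::vector<long>& args) {
    long id = 0;
    return obj.GetIdOfName(name, &id) && obj.Invoke(id, args);
}
```

Note that the caller never names an interface. Only the method name travels, which is why the runtime cannot tell one dispatch interface on the object from another.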
      It is not completely impossible to support interface-based programming from typeless languages. Consider the following IDL:

 [object,dual,uuid(92284211-9221-2412-11d1-552124391232)]
 interface ICat : IDispatch {
     [id(1)] HRESULT Meow([in] long nVolume);
 }
 [object,dual,uuid(92284212-9221-2412-11d1-552124391232)]
 interface IDog : IDispatch {
     [id(1)] HRESULT Bark([in] long nVolume);
 }
 [uuid(92284211-9221-2412-11d1-552124391232)]
 coclass DogCat {
     interface ICat;
     [default] interface IDog;
 }
If the client is written in a typed language (such as C++, Visual Basic, or Java), the following code will work as expected:

 Sub MeowAndBark( ) 
     Dim dog as IDog
     Dim cat as ICat 
     Set dog = new DogCat
     Set cat = dog
     cat.Meow 100
     dog.Bark 200
 End Sub
If, however, the client is written in a typeless language (like VBScript or JavaScript), variables cannot be typed, making it impossible to bind two different IDispatch-based references to the same object identity.
      Object implementors who want to allow typeless languages to use their objects in an interface-based model often resort to adding properties to each interface to allow the typeless client to call QueryInterface indirectly via a property access. The modified IDL shown in Figure 6 illustrates this technique. This code assumes that the object implementor wrote the property accessors as follows:

 STDMETHODIMP DogCat::get_AsDog(IDog **ppDog) {
     (*ppDog = static_cast<IDog*>(this))->AddRef();
     return S_OK;
 }
 STDMETHODIMP DogCat::get_AsCat(ICat **ppCat) {
     (*ppCat = static_cast<ICat*>(this))->AddRef();
     return S_OK;
 }
      Ignoring the problems related to having tight coupling between two otherwise unrelated interfaces, this technique basically works provided that the object resides in the same apartment as the client (that is, there is no proxy or stub). However, once the object resides in a distinct apartment, all access is via proxy/stub connections. Because the results of IDispatch::Invoke must be marshaled through VARIANTs, the IDog-ness or ICat-ness of the result is lost. Instead, the client will always get the first IDispatch pointer that has been marshaled from the object. In the case of the previous VBScript fragment, this means that all references to the object will be of type IDog simply because the initial interface pointer was acquired via a call to QueryInterface(IID_IDispatch). Again, refer to the discussion about objects not providing multiple implementations of the same interface.
      Instead of relying on objects with multiple interfaces, typeless languages typically access multiple functionalities via multiple distinct COM objects. These objects are typically exposed via relationship hierarchies called object models. An object model is simply a set of related objects, typically traversed via a top-level object (often called the Application object in classic Automation hierarchies) and a set of lower-level objects that provide various finer-grained services.
      Consider the automation-style model for Dogs and Cats shown in Figure 7. Object models are designed and implemented in a class-oriented style (each object has a single public interface). This makes the model simple to understand. However, this design style also leads to versioning problems. If management requests a new feature for this object hierarchy, you're in trouble because each object can only support a single implementation of IDispatch. However, because we're using dynamic invocation instead of vtbl-based invocation, you might suspect we have a bit more flexibility. If we were to add a new method or property to one of the interfaces, new clients would have access to that functionality via GetIDsOfNames and Invoke. If, on the other hand, a new client gets hold of an old implementation of the revised interface expecting that the new functionality be there, whamo! It's not going to be there and that's a runtime error.
      Why does this happen? It's true that GetIDsOfNames is basically a fine-grained QueryInterface (it checks if the object supports a named method or property). However, for simplicity, most scripting environments bundle GetIDsOfNames and Invoke into a single operation. No typeless languages that are currently available allow the client application to ask for the availability of an operation before calling it.
      One way to support evolution of objects and object hierarchies for typeless clients is to fall back on version management. That's one reason all type libraries are tagged with a major and minor version number. Unfortunately, most typeless clients don't have direct access to the object's type library or its version number. To allow typeless clients to check the version number, the top-level object should support a CheckVersion method:

 HRESULT PetStore::CheckVersion(
         short         major,
         short         minor,
         VARIANT_BOOL* pbCompatible)
 {
     if (!pbCompatible) return E_POINTER;
     if (major == MAJOR_VERSION && minor <= MINOR_VERSION)
         *pbCompatible = VARIANT_TRUE;
     else
         *pbCompatible = VARIANT_FALSE;
     return S_OK;
 }
Unfortunately, many existing object hierarchies do not have this method. Rather, they expect typeless clients to deal with the runtime errors. This manual version management technique is not for the faint of heart, though, and should be avoided.
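      For readers who do adopt it, the compatibility rule inside CheckVersion can be exercised on its own. This is a portable sketch: the values chosen for MAJOR_VERSION and MINOR_VERSION and the helper name IsCompatible are assumptions, and the VARIANT_BOOL and HRESULT plumbing is left out.

```cpp
// Portable restatement of the CheckVersion rule; version values are made up.
const short MAJOR_VERSION = 2;  // hypothetical version the object implements
const short MINOR_VERSION = 3;

// A client is compatible when it was written against the same major
// version and a minor version no newer than the one implemented.
inline bool IsCompatible(short major, short minor) {
    return major == MAJOR_VERSION && minor <= MINOR_VERSION;
}
```

With these values, a client written against 2.0 or 2.3 is accepted, while one written against 2.4 or 1.3 is refused before it can trip over a missing method.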

7 Don’t
Require
People to
Implement Dual
Interfaces
Today's COM interfaces fall into one of three categories: pure-vtable interfaces, which are usable from most typed languages; pure-dispatch interfaces or dispinterfaces, which are usable from most untyped languages; and dual interfaces, which are hybrids of vtable and dispatch interfaces. Dual interfaces are part of the Visual Basic architecture and were introduced with the release of Visual Basic 4.0. Dual interfaces are the only type of interface that can be defined within the Visual Basic environment (although programmers using Visual Basic can implement externally defined pure-vtable and pure-dispatch interfaces). In theory, dual interfaces combine the best aspects of both vtable and dispatch interfaces. That's the theory at least.
      Most COM-aware development environments make it easy to implement dual interfaces to allow your object to be accessed from both typed and untyped languages. For this purpose, dual interfaces are an adequate solution given the nature of COM's circa-1998 dynamic invocation. However, it is a common mistake to define callback or extensibility interfaces as duals, which ultimately makes no sense.
      Consider the following pure vtable-based interface hierarchy:


 [ uuid(A9FC76E0-FDC1-11d1-927F-006008026FEA), object ]
 interface IStopWatchEvents : IUnknown {
     HRESULT OnStart();
     HRESULT OnTicking();
     HRESULT OnStop();
 }
 [ uuid(A9FC76E1-FDC1-11d1-927F-006008026FEA), object ]
 interface IStopWatch : IUnknown {
     HRESULT Advise([in] IStopWatchEvents *pswe);
     HRESULT Start();
     HRESULT Stop();
 }
This hierarchy allows IStopWatch clients to register an event sink interface to be notified when certain temporal events happen in a stopwatch object. Given this interface hierarchy, the IStopWatch::Stop method might look something like this:

 STDMETHODIMP Stop() {
 // stop running stopwatch
     this->InternalStopTimer();
 // notify event sink (attached in Advise method)
     if (m_pswe)
         m_pswe->OnStop();
     return S_OK;
 }
It is commonplace to define callback or extensibility interfaces as pure dispatch interfaces:

 [ uuid(A9FC76E2-FDC1-11d1-927F-006008026FEA)]
 dispinterface IStopWatchEvents {
 properties:
 methods:
     void OnStart();
     void OnTicking();
     void OnStop();
 }
Had this been the case with IStopWatchEvents, the event would then be fired by calling IDispatch::Invoke:

 STDMETHODIMP Stop() {
 // stop running stopwatch
     this->InternalStopTimer();
 // notify event sink (attached in Advise method)
     if (m_pswe) {
         DISPPARAMS params = { 0, 0, 0, 0 };
         m_pswe->Invoke(DISPID_ONSTOP, IID_NULL, 0,
                        DISPATCH_METHOD, &params, 0, 0, 0);
     }
     return S_OK;
 }
In each of these two cases, the stopwatch object knows exactly which mechanism to use: a direct vtable call for the pure vtable interface, or IDispatch::Invoke for the dispinterface.
      Now, consider the case where IStopWatchEvents is a dual interface:

 [ 
     uuid(A9FC76E4-FDC1-11d1-927F-006008026FEA), 
     object, dual 
 ]
 interface IStopWatchEvents : IDispatch {
     HRESULT OnStart();
     HRESULT OnTicking();
     HRESULT OnStop();
 }
Given this interface, how should the implementation of IStopWatch::Stop notify the event sink? If you say it should use IDispatch::Invoke, then what purpose was there in defining the interface as dual instead of as a pure dispinterface? If you say it should use the OnStop vtable entry directly, then why not just define the interface as a pure vtable interface? Granted, most tools make implementing dual interfaces trivial. However, in this situation there are two equally legal mechanisms for invoking the methods with no clear guideline as to which one is preferred.
      For an example of what can happen when a dual interface is used as an extensibility/callback interface, consider what happens when you define the event sink of a control as dual and then deploy it in Microsoft Internet Explorer 4.0. Yes, Internet Explorer will gladly give you an implementation of your dual callback interface at initialization-time. And yes, Internet Explorer will fail miserably when you try to fire an event using a vtable entry. The bottom line is, dual interfaces were designed to solve a problem that never should have affected COM programmers. Hopefully, a future version of COM will render them completely obsolete.
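      The ambiguity can be made concrete with a portable model. Everything here is assumed for illustration: the interface shape, the DISPID value, and in particular the way the vtable path fails (the article does not say how Internet Explorer fails, so the sketch simply returns false). It models a host that wires up only the dispatch half of a dual callback interface.

```cpp
// Portable model of a dual callback interface with two ways to fire.
const long DISPID_ONSTOP = 3;  // hypothetical dispatch ID

struct IStopWatchEventsDual {
    virtual ~IStopWatchEventsDual() = default;
    virtual bool OnStop() = 0;             // vtable half
    virtual bool Invoke(long dispid) = 0;  // dispatch half
};

// A sink from a scripting host that only services the dispatch half.
class DispatchOnlySink : public IStopWatchEventsDual {
public:
    int stops = 0;
    bool OnStop() override { return false; }  // models the vtable path failing
    bool Invoke(long dispid) override {
        if (dispid != DISPID_ONSTOP) return false;
        ++stops;
        return true;
    }
};

// The two equally legal ways an event source could fire the event.
inline bool FireStopViaVtable(IStopWatchEventsDual& sink) {
    return sink.OnStop();
}
inline bool FireStopViaInvoke(IStopWatchEventsDual& sink) {
    return sink.Invoke(DISPID_ONSTOP);
}
```

An event source that picks FireStopViaVtable against such a sink delivers nothing, while FireStopViaInvoke works. The interface definition alone gives the source no way to know which to choose.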

From the October 1998 issue of Microsoft Systems Journal.