July 1998
Don Box is a co-founder of DevelopMentor, where he manages the COM curriculum. Don is currently breathing deep sighs of relief as his new book, Essential COM (Addison-Wesley), is finally complete. Don can be reached at http://www.develop.com/dbox.
COM is five years old. As I write this column in April 1998, developers all over the world are celebrating the fifth anniversary of the OLE2 Professional Developer's Conference. At this 1993 event, Microsoft® shipped the release version of OLE 2.0 and the Component Object Model. To commemorate this anniversary, I am dedicating this month's column to looking at the current state of COM, reflecting on what the COM designers got right, and what needs improvement. Hopefully, COM+ (or whatever Microsoft marketing winds up christening the next generation of COM) will build on the strengths of today's COM and repair the mistakes of the past. It's always best to preface criticism with praise, so I'll begin by looking at aspects of COM that were right on the money and withstood the test of time.
IUnknown
In Java, you can use the C-style cast syntax to achieve the same effect:
Both of these examples interrogate the object's type information at runtime to determine whether the object is compatible with the type Cat. If the object is a simple Dog, the cast will throw an exception, indicating that the cast makes no sense for this particular Dog. If the object is a Dog that is also a Cat (via multiple inheritance), the cast will succeed and return a valid reference to the "Cat-ness" of the object, allowing the client to access the Cat functionality of the object (in this case, the meow method). In a dynamically composed system, this ability to interrogate an object's type information is critical for allowing clients to access extended functionality from an object.
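To make the Dog and Cat scenario concrete, here is a minimal C++ sketch of the kind of runtime type interrogation described above. It is not the original listing. The CatDog class, the bark method, and the tryMeow and castDemo helpers are hypothetical names invented for illustration; only Dog, Cat, and meow come from the discussion. Note that it is the reference form of dynamic_cast that throws on failure; the pointer form returns null instead.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical types: only the names Dog, Cat, and meow come from the text.
struct Dog {
    virtual ~Dog() = default;   // a virtual member makes the type polymorphic, which RTTI requires
    virtual std::string bark() const { return "woof"; }
};

struct Cat {
    virtual ~Cat() = default;
    virtual std::string meow() const { return "meow"; }
};

// A Dog that is also a Cat via multiple inheritance.
struct CatDog : Dog, Cat {};

// Ask the object at runtime whether it is compatible with Cat.
// Returns the result of meow(), or an empty string when the cast makes no sense.
std::string tryMeow(Dog& dog) {
    try {
        Cat& cat = dynamic_cast<Cat&>(dog);   // reference form throws std::bad_cast on failure
        return cat.meow();
    } catch (const std::bad_cast&) {
        return std::string();
    }
}

// Usage sketch: a simple Dog fails the cast, a CatDog passes it.
void castDemo() {
    Dog simpleDog;
    CatDog both;
    assert(tryMeow(simpleDog).empty());
    assert(tryMeow(both) == "meow");
}
```

The client never needs to know the concrete type of the object it was handed; it only asks whether the "Cat-ness" is there and proceeds accordingly.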
The capability for runtime type interrogation was deemed so important that the COM designers mandated that all objects (and object references) support it by making QueryInterface a method in the IUnknown interface. In languages that do not use a runtime layer or virtual machine (VM), QueryInterface can be called directly:
In languages that have a runtime or virtual machine, it is up to the VM implementor to thunk the VM's cast operation to COM's QueryInterface, mapping the native language's intrinsic runtime type identifiers to COM GUIDs.
Different languages and systems also have different mechanisms for reclaiming resources held by object references. To allow interoperation of such systems, IUnknown acts as an impedance matching layer for the lifecycle management of object references. Rather than simply add a delete method to IUnknown, the COM designers determined that it was none of the client's business when an object should reclaim its resources. Rather, all the client can expect is that the object should remain valid as long as the client's reference is still "live." In COM, clients notify objects when an object reference is no longer live by calling IUnknown's Release method independent of the client's policy for invalidating object references (termination of scope, mark and sweep garbage collection, and so on). In languages that have a runtime or VM, it is up to the VM implementor to notify the object that an object reference is no longer live. In languages that do not have a runtime or VM, it is up to the client programmer to release the reference explicitly (which is no different than calling free to offset a call to malloc). The use of a passive Release in lieu of an explicit delete operation also simplifies the case where the client holds multiple types of references to a single object. Since there may not be a one-to-one correspondence between object references and objects, using an explicit delete operation per object would add undue complexity to the client's program or supporting runtime. Finally, as an optimization, the COM designers added the AddRef method to make duplicating homogeneous object references more efficient. While this is legal COM
the following implementation is likely to be considerably more efficient:
Most implementations of AddRef simply increment an integer. Most implementations of QueryInterface perform multiple GUID comparisons prior to returning an object reference. This is just one example of the compromise between theoretical purity and runtime performance reality that permeates much of COM's design.
Perhaps one of the most compelling aspects of IUnknown is its simplicity. It is completely minimal and requires no runtime support from the COM library. You can implement this part of COM on any operating system and in virtually any language that allows direct manipulation of memory. In fact, IUnknown was deemed so powerful that the Netscape developers used it in their cross-platform Web browser, Netscape Navigator, even though COM was not (and is still not) available on all of Netscape's target platforms.
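Because IUnknown needs nothing from the COM library, its three methods can be sketched in portable C++ without any platform headers. The following is an illustration under loud assumptions, not real COM: UnknownSketch, DogSketch, Beagle, the Iid enumeration, and the two duplicate helpers are all hypothetical names, and a small enum stands in for the 128-bit GUIDs and HRESULTs that the real interface uses. The two helpers contrast duplicating a reference through the query method with duplicating it through the reference count.

```cpp
#include <cassert>

// Hypothetical stand-in for interface IDs; real COM uses 128-bit GUIDs.
enum class Iid { Unknown, Dog, Cat };

// Hypothetical stand-in for IUnknown; real COM returns HRESULT, not bool.
struct UnknownSketch {
    virtual bool queryInterface(Iid iid, void** ppv) = 0;
    virtual unsigned long addRef() = 0;
    virtual unsigned long release() = 0;
protected:
    virtual ~UnknownSketch() = default;   // clients never delete; they call release()
};

struct DogSketch : UnknownSketch {
    virtual int bark() = 0;
};

class Beagle final : public DogSketch {
    unsigned long refs_ = 1;   // the creator holds the first reference
    int barks_ = 0;
public:
    static int liveObjects;    // bookkeeping for the sketch only
    Beagle() { ++liveObjects; }
    ~Beagle() { --liveObjects; }

    // Runtime type interrogation: compare IDs, hand back a counted reference.
    bool queryInterface(Iid iid, void** ppv) override {
        if (iid == Iid::Unknown)  *ppv = static_cast<UnknownSketch*>(this);
        else if (iid == Iid::Dog) *ppv = static_cast<DogSketch*>(this);
        else { *ppv = nullptr; return false; }
        addRef();
        return true;
    }
    // Duplicating a reference simply increments an integer.
    unsigned long addRef() override { return ++refs_; }
    // The object, not the client, decides when to reclaim its resources.
    unsigned long release() override {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
    int bark() override { return ++barks_; }
};
int Beagle::liveObjects = 0;

// Legal, but pays for ID comparisons to get an interface the caller already holds.
DogSketch* duplicateViaQuery(DogSketch* dog) {
    void* pv = nullptr;
    return dog->queryInterface(Iid::Dog, &pv) ? static_cast<DogSketch*>(pv) : nullptr;
}

// Cheaper: the reference type is unchanged, so only the count needs to move.
DogSketch* duplicateViaAddRef(DogSketch* dog) {
    dog->addRef();
    return dog;
}
```

A client that creates a Beagle and duplicates the reference twice must call release three times before the object goes away, regardless of which helper produced each duplicate.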
Interface-based Programming
The COM Remoting Architecture
Apartments
What happens if two threads execute this function simultaneously? Since the barkLikeADog method is not synchronized, the two threads will execute the for loop concurrently. This means that the two threads may issue calls to the same object simultaneously. Since the interface IDog is abstract, each element in the array may point to different implementations of IDog, some of which were written with thread safety in mind, others which were not. This means that depending on the concrete type of the object, this code may or may not be correct. So, what do Java developers do? One solution is to do nothing, assuming that naïve implementors marked their bark method as synchronized.
Unfortunately, Java methods are unsynchronized by default, which means that only developers who are somewhat naïve will remember to add the synchronized keyword. Completely naïve developers tend to forget this kind of thing. This is one reason the Java language allows clients to add synchronization to other people's objects using synchronized blocks. The following modified method illustrates this technique:
The problem with this method is that it assumes that all implementations of IDog are not thread-safe. This means that objects that could tolerate concurrent bark requests will never be given the opportunity, since the client must program to the lowest common denominator.
The problem with the Java approach to components and threads is that it requires the client to guess what level of thread-awareness the object implementor may have had and program to the worst-case scenario. This limits opportunities for concurrency and can introduce bottlenecks in multithreaded code. The COM designers decided that thread-awareness was yet another implementation detail that the client had no business worrying about. This is the motivation for apartments. A COM apartment is a group of one or more threads in a process that can execute method calls. Threads that use COM must first enter an apartment by calling CoInitializeEx. All threads that call CoInitializeEx with the COINIT_MULTITHREADED flag enter the lone multithreaded apartment (MTA) of the process. Each thread that uses the COINIT_APARTMENTTHREADED flag (or calls CoInitialize or OleInitialize) enters a new single-threaded apartment (STA) that no other thread will ever enter. By default, a COM object belongs to exactly one apartment, and only threads within that apartment can execute methods on the object. This means that objects that live in the MTA must be robust in the face of concurrent access. Conversely, objects that live in an STA do not need to worry about concurrent access, since only one thread will ever enter the apartment. In-process component implementors annotate which types of apartments they are compatible with by using the ThreadingModel registry entry. At activation time, COM ensures that your component is created and accessed in the correct type of apartment. If the client thread is in the wrong apartment type, COM will create your object on a COM-managed thread of the correct apartment type. When this happens, the client will get a proxy to the object that is appropriate for use in their apartment. This simple model allows non-thread-safe components to be accessed in a multithreaded manner, since COM will return a thread-safe proxy to MTA-based clients.
This means that multithreaded clients never need to serialize method invocations on an object reference. If the reference points to a proxy, the proxy serializes the calls to the actual object by forwarding the request to the object's STA thread. If the reference points to an actual object, it must have marked itself as MTA-safe in the registry and therefore has taken deliberate steps to allow clients to access the object concurrently. While the notion of an apartment is fundamental to the COM programming model, COM provides facilities for bypassing the semantics of an apartment (such as the freethreaded marshaler and the global interface table) for objects that have special threading requirements. These facilities are optional and only to be used by developers with a strong understanding of threads and apartments. Nonetheless, even objects that use advanced facilities, such as the freethreaded marshaler, can be accessed safely by clients with disparate threading awareness.
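The idea of a proxy that serializes calls onto the object's own thread can be sketched in portable C++. This is only an illustration of the principle, not how COM implements it: real apartments are entered with CoInitializeEx and rely on marshaled proxies, whereas the ApartmentSketch, UnsafeDog, and DogProxy classes and the barkFromManyThreads helper below are hypothetical names built on nothing more than a queue and one worker thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical single-threaded apartment: one thread runs every posted call, one at a time.
class ApartmentSketch {
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> calls_;
    bool stopping_ = false;
    std::thread thread_;   // declared last so the members above exist before it starts

    void run() {
        for (;;) {
            std::function<void()> call;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !calls_.empty(); });
                if (calls_.empty()) return;   // asked to stop and nothing left to run
                call = std::move(calls_.front());
                calls_.pop();
            }
            call();
        }
    }
public:
    ApartmentSketch() : thread_([this] { run(); }) {}
    ~ApartmentSketch() {
        { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
        wake_.notify_one();
        thread_.join();
    }
    // Post a call to the apartment thread and block until it has run.
    int invoke(const std::function<int()>& fn) {
        assert(std::this_thread::get_id() != thread_.get_id());   // would deadlock
        auto done = std::make_shared<std::promise<int>>();
        std::future<int> result = done->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            calls_.push([&fn, done] {
                try { done->set_value(fn()); }
                catch (...) { done->set_exception(std::current_exception()); }
            });
        }
        wake_.notify_one();
        return result.get();
    }
};

// An object written with no thread safety in mind.
struct UnsafeDog {
    int barks = 0;
    int bark() { int seen = barks; barks = seen + 1; return barks; }   // unsynchronized update
};

// What a multithreaded client holds: every call is forwarded to the apartment thread.
class DogProxy {
    ApartmentSketch& apartment_;
    UnsafeDog& dog_;
public:
    DogProxy(ApartmentSketch& apartment, UnsafeDog& dog) : apartment_(apartment), dog_(dog) {}
    int bark() { return apartment_.invoke([this] { return dog_.bark(); }); }
};

// Usage sketch: client threads bark through the proxy without any locks of their own.
int barkFromManyThreads(int threads, int callsPerThread) {
    ApartmentSketch apartment;
    UnsafeDog dog;
    DogProxy proxy(apartment, dog);
    std::vector<std::thread> clients;
    for (int t = 0; t < threads; ++t)
        clients.emplace_back([&] { for (int i = 0; i < callsPerThread; ++i) proxy.bark(); });
    for (auto& client : clients) client.join();
    return apartment.invoke([&] { return dog.barks; });
}
```

The client code contains no synchronization at all, yet the unsynchronized counter inside UnsafeDog never loses an update, because only the apartment thread ever touches it. An object that can tolerate concurrent calls would simply be handed out directly instead of behind a proxy.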
Type Information is a Mess
IDispatch
Figure 2 Server-side Dynamic Invocation
For a great example of how dynamic invocation should have been handled, look at either Java's reflection package or CORBA's DII/DSI infrastructure. The former makes it trivial for a client to build a method call on the fly solely on a text-based description of what the call should look like (this is what scripting engines and interpreters need to do). The object simply implements the interfaces that it needs to expose independent of dynamic invocation, and the reflection plumbing deals with forming the correct type of stack frame based on type information buried in the Java .class file. As far as the object is concerned, the call came from an early-bound compiled client.
CORBA's DII/DSI does everything reflection does for dynamic invocation, in addition to providing server-side hooks for allowing object implementors to actually participate in method/type resolution. Both CORBA and Java reflect the assumption that dynamic method invocation is a client-side decision that most object implementors really don't care about. At the time of this writing, there was no easy way for an object to munge around with the server-side of dynamic invocation in Java (à la CORBA's DSI), but there's always version 1.3 of the JDK to solve this. The saddest part of this discussion is that one potential solution to the problem has been with us since 1993. Microsoft's type library parser knows how to do what the Java reflection package does; that is, create a method call based on some uniform generic API. Figure 1 shows the ATL implementation of IDispatch::GetIDsOfNames and IDispatch::Invoke. Like most modern implementations, ATL simply forwards these calls to the type library parser, where the low-level thunking of VARIANTs and DISPIDs down to stack frames and vtable offsets occurs. This technique is illustrated in Figure 2. There is no reason why clients that now use IDispatch (like the scripting engines) couldn't do the same thing on the client side against non-IDispatch-based interfaces, provided that sufficient type information is available.
Figure 3 Client-side Dynamic Invocation
Figure 3 shows what this technique would look like if it were to be adopted by today's IDispatch clients. Determining the type information of an object at runtime is trivial given today's simple IProvideClassInfo interface. Arguably, all that is really needed for most objects is a CLSID and LIBID from which the client could then load the type library directly. Of course, this is not the way today's scripting engines behave, so we're still implementing IDispatch in every object known to man. The Java and CORBA architects were correct in pushing this responsibility onto the client, not the object. One could argue that IDispatch should be akin to IMarshal; that is, a completely optional interface implemented only by the one percent of objects that have special needs, not by 100 percent of the objects that want to interoperate with brain-dead client environments.
C++ Language Mapping Needs an Overhaul
Given this template, one can write the following code:
Note that this code does not suffer from the common type mismatches that QueryInterface's void ** parameter would normally allow.
So what's the problem with __uuidof? For one, it requires header files compiled with MIDL version 3.01.75 or later. If you only have the C++-based interface definitions but not the original IDL files, you must explicitly add the __declspec(uuid) attributes by hand (see comdef.h for an example of how this is done). Perhaps more problematic, __uuidof is a Visual C++ language extension, not part of COM. If you write code that uses __uuidof, you are married to using Visual C++ 5.0 or greater from now until the end of time. No amount of macro magic will dig you out of this dependency. Had the COM team elected to provide a standard way to associate COM and C++ typenames, the Visual C++ team would never have been compelled to add the useful but proprietary __uuidof. Ideally, the system headers would contain the following generic template definitions:
Given these templates, the MIDL could then emit the following template specialization along with the standard C++ interface definition:
Assuming this standard type infrastructure was in place, one could easily rewrite the com_cast template as follows:
Note that the only change was the use of the uuidof template function as opposed to the Visual C++-specific __uuidof.
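One possible shape for such a compiler-neutral uuidof, sketched in portable C++, looks like this. It is a guess at the general idea rather than a reproduction of the original listings: GuidSketch, UnknownSketch, IDogSketch, ICatSketch, interface_traits, Beagle, and comCastDemo are hypothetical names, the ID values are made up, reference counting is omitted for brevity, and the com_cast signature shown here (returning a typed pointer, or null on failure) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 128-bit identifier standing in for a real GUID.
struct GuidSketch {
    std::uint64_t hi, lo;
    bool operator==(const GuidSketch& other) const { return hi == other.hi && lo == other.lo; }
};

// Hypothetical stand-in for IUnknown (reference counting omitted for brevity).
struct UnknownSketch {
    virtual bool QueryInterface(const GuidSketch& iid, void** ppv) = 0;
    virtual ~UnknownSketch() = default;
};

// Generic definitions the system headers could supply. The primary template is
// declared but never defined, so asking for the ID of an unmapped type fails to compile.
template <typename Interface> struct interface_traits;

template <typename Interface>
const GuidSketch& uuidof() { return interface_traits<Interface>::iid(); }

// What an IDL compiler could emit next to each interface definition (IDs are made up).
struct IDogSketch : UnknownSketch { virtual const char* Bark() = 0; };
template <> struct interface_traits<IDogSketch> {
    static const GuidSketch& iid() { static const GuidSketch id{0x1, 0xD06}; return id; }
};

struct ICatSketch : UnknownSketch { virtual const char* Meow() = 0; };
template <> struct interface_traits<ICatSketch> {
    static const GuidSketch& iid() { static const GuidSketch id{0x1, 0xCA7}; return id; }
};

// Type-safe wrapper: the ID is derived from the destination type, so the ID and
// the pointer type cannot disagree the way they can with a raw void** call.
template <typename To, typename From>
To* com_cast(From* source) {
    void* pv = nullptr;
    if (source && source->QueryInterface(uuidof<To>(), &pv)) return static_cast<To*>(pv);
    return nullptr;
}

// A sample object that exposes only the dog interface.
class Beagle : public IDogSketch {
public:
    bool QueryInterface(const GuidSketch& iid, void** ppv) override {
        if (iid == uuidof<IDogSketch>()) { *ppv = static_cast<IDogSketch*>(this); return true; }
        *ppv = nullptr;
        return false;
    }
    const char* Bark() override { return "woof"; }
};

// Usage sketch: the cast succeeds for the dog interface and fails for the cat interface.
void comCastDemo() {
    Beagle beagle;
    UnknownSketch* unknown = &beagle;
    IDogSketch* dog = com_cast<IDogSketch>(unknown);
    assert(dog != nullptr && std::strcmp(dog->Bark(), "woof") == 0);
    assert(com_cast<ICatSketch>(unknown) == nullptr);
}
```

Nothing in this sketch depends on a particular compiler; the association between typename and identifier lives entirely in ordinary template specializations.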
The hypothetical C++ interface definition shown previously also provides a mechanism for determining the base type of a COM interface. Given such a facility, it would be trivial to construct a template class that implements IUnknown using fairly simple syntax:
Because the entire type hierarchy is available to the compiler, it is possible to build a template that traverses the inheritance hierarchy in its implementation of QueryInterface. Note also that because the IIDs are intrinsically linked to the C++ typenames, no additional MFC/ATL-style interface map is needed. (Maintaining these maps is a common source of errors among novice COM developers.) Assuming a similar mapping of COM class names to CLSIDs, constructing an in-process server could become as simple as this:
Of course, to get the self-registration right, you'd need higher fidelity type information to capture things like ProgIDs, component categories, and threading models. You could hack in this additional information via custom IDL attributes, but it would be nice if COM could provide the corresponding registration code as part of the infrastructure.
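The hierarchy-walking QueryInterface described earlier in this section can also be sketched in portable C++. As before, this is an illustration under assumptions rather than the original listing: GuidSketch, UnknownSketch, IAnimalSketch, IDogSketch, interface_traits, matches_hierarchy, UnknownImpl, and Beagle are hypothetical names, the ID values are made up, and the sketch handles only a single chain of interfaces. It leans on the same layout assumption COM itself makes, namely that with single inheritance one pointer serves every interface in the chain.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit identifier standing in for a real GUID.
struct GuidSketch {
    std::uint64_t hi, lo;
    bool operator==(const GuidSketch& other) const { return hi == other.hi && lo == other.lo; }
};

// Hypothetical stand-in for IUnknown.
struct UnknownSketch {
    virtual bool QueryInterface(const GuidSketch& iid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual ~UnknownSketch() = default;
};

// Each interface is mapped to its ID and to its base interface.
template <typename Interface> struct interface_traits;

template <> struct interface_traits<UnknownSketch> {
    using base = void;   // root of the hierarchy
    static GuidSketch iid() { return {0, 0x1}; }
};

struct IAnimalSketch : UnknownSketch { virtual int Legs() = 0; };
template <> struct interface_traits<IAnimalSketch> {
    using base = UnknownSketch;
    static GuidSketch iid() { return {0, 0x2}; }
};

struct IDogSketch : IAnimalSketch { virtual const char* Bark() = 0; };
template <> struct interface_traits<IDogSketch> {
    using base = IAnimalSketch;
    static GuidSketch iid() { return {0, 0x3}; }
};

// Walk from Interface up through its declared bases, comparing IDs at each level.
template <typename Interface>
struct matches_hierarchy {
    static bool test(const GuidSketch& iid) {
        return iid == interface_traits<Interface>::iid()
            || matches_hierarchy<typename interface_traits<Interface>::base>::test(iid);
    }
};
template <> struct matches_hierarchy<void> {
    static bool test(const GuidSketch&) { return false; }   // ran off the top: no match
};

// Implements all three IUnknown methods with no hand-maintained interface map.
template <typename Interface>
class UnknownImpl : public Interface {
    unsigned long refs_ = 1;   // the creator holds the first reference
public:
    bool QueryInterface(const GuidSketch& iid, void** ppv) override {
        if (!matches_hierarchy<Interface>::test(iid)) { *ppv = nullptr; return false; }
        *ppv = static_cast<Interface*>(this);
        AddRef();
        return true;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
};

// The component author writes only the methods that are specific to the object.
class Beagle : public UnknownImpl<IDogSketch> {
public:
    int Legs() override { return 4; }
    const char* Bark() override { return "woof"; }
};
```

A Beagle created with new answers queries for the dog, animal, and root interface IDs and rejects everything else, all without the component author writing a single comparison by hand.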
Deployment
Security
Oversimplification in the Tools
The Next Five Years
Have a question about programming with ActiveX or COM? Send your questions via email to Don Box: dbox@develop.com or http://www.develop.com/dbox. From the July 1998 issue of Microsoft Systems Journal.