House of COM, MSJ March, 1999

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

March 1999

Code for this article: Mar99Com.exe (5KB)

Don Box is a co-founder of DevelopMentor, a COM think tank that educates the software industry in COM, MTS, and ATL. Don wrote Essential COM and coauthored the follow-up Effective COM (Addison-Wesley). Reach Don at http://www.develop.com/dbox.

Just when I think I am finished writing about marshal-by-value (MBV) in this column, something pulls me back to my networking roots and forces me to retread this path, where I find new nuances or subtleties each time. Since my previous column (MSJ, January 1999), I wrote a clever piece of COM code that provides MBV service to objects implemented in C++, the Java language, or Visual Basic®. This month I present what is hopefully the final chapter in my MBV opus.

    If you dig through your back issues of MSJ, you will find that two years ago this month (March 1997) I covered the topic of MBV rather thoroughly. Recall that, by default, COM objects marshal by reference. That is, when passing object references as method parameters, CoMarshalInterface creates a stub to allow method calls to be dispatched to the object from foreign apartments. Clients wind up with COM-provided proxies that send the stub request messages that trigger remote method invocations.

    CoMarshalInterface provides an extensibility hook in the form of the IMarshal interface. Objects that implement IMarshal are said to custom marshal; objects that don't implement IMarshal are said to standard marshal. This has nothing to do with the distinction between custom and standard interfaces, nor does IMarshal have anything to do with the MIDL-generated proxies used in standard marshaling.

    CoMarshalInterface uses the IMarshal interface to allow the object to specify a custom proxy that will be instantiated in the client's apartment. The IMarshal::GetUnmarshalClass method is called to discover the CLSID of the custom proxy. Prior to sending the marshaled object reference to the client's apartment, CoMarshalInterface asks the object's IMarshal::MarshalInterface method to write an initialization message that will be given to the custom proxy once it is instantiated. When the marshaled object reference is received in the client's apartment, CoUnmarshalInterface is used to read in the object reference. CoUnmarshalInterface simply calls CoCreateInstance to instantiate the custom proxy, and calls the proxy's IMarshal::UnmarshalInterface to initialize the new proxy.

    By far the two most common applications of custom marshaling are to implement apartment neutrality and MBV. Apartment neutrality simply means that the object can be accessed freely without a proxy from anywhere inside of the object's process. This is achieved by aggregating the freethreaded marshaler (FTM) using the CoCreateFreeThreadedMarshaler API. MBV is typically implemented by hand and is based on the idea that a proxy is not actually necessary; rather, a disconnected clone of the object that exists in the client's apartment is sufficient. MBV causes the entire state of the object (that is, its value) to be transmitted instead of a network-aware reference.

Figure 1 Marshal-by-Reference


Figures 1 and 2 show the difference between marshal-by-reference (the default) and MBV. The advantage of MBV is that the client gets its own private copy of the object and can access its properties and methods without any additional network traffic. The disadvantage of MBV is that the client's copy is disconnected and any changes made to the client's copy are not communicated back to the original object.

Figure 2 Marshal-by-Value


My March 1997 column discussed the similarity of IMarshal and IPersistStream. Both interfaces allow the object to serialize state to a caller-provided byte stream (MarshalInterface and Save), and both allow the object to specify the CLSID of an object that can read the byte stream (GetUnmarshalClass and GetClassID). IMarshal and IPersistStream assume that a new object can be created from the specified CLSID that will read the state written by the original object (UnmarshalInterface and Load). Finally, both interfaces allow the caller to discover the upper bound on the size of the serialized state (GetMarshalSizeMax and GetSizeMax). My March 1997 column presented an aggregatable implementation of IMarshal that used the aggregating object's IPersistStream implementation to serialize the object when CoMarshalInterface requested a custom proxy. Figure 3 shows the basic architecture of that implementation.
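To make the parallel concrete, here is a minimal sketch in portable C++ rather than real COM. It is a model under stated assumptions, not the code from either column: `Persistent` stands in for IPersistStream, `Registry` stands in for the class lookup that CoCreateInstance performs, and `MarshalByValue`, `UnmarshalByValue`, and `Point` are hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Stand-in for IPersistStream (assumption: strings and iostreams replace CLSIDs and IStream).
struct Persistent {
    virtual ~Persistent() = default;
    virtual std::string GetClassID() const = 0;      // cf. GetClassID / GetUnmarshalClass
    virtual void Save(std::ostream& out) const = 0;  // cf. Save / MarshalInterface
    virtual void Load(std::istream& in) = 0;         // cf. Load / UnmarshalInterface
};

// Stand-in for the class registry that CoCreateInstance consults.
using Factory = std::function<std::unique_ptr<Persistent>()>;
inline std::map<std::string, Factory>& Registry() {
    static std::map<std::string, Factory> registry;
    return registry;
}

// Sender side: write the class name, then let the object serialize its state.
std::string MarshalByValue(const Persistent& obj) {
    std::ostringstream packet;
    packet << obj.GetClassID() << '\n';
    obj.Save(packet);
    return packet.str();
}

// Receiver side: create a fresh instance by class name and load the state into it.
// Throws std::out_of_range if the class name was never registered.
std::unique_ptr<Persistent> UnmarshalByValue(const std::string& bytes) {
    std::istringstream packet(bytes);
    std::string classId;
    std::getline(packet, classId);
    std::unique_ptr<Persistent> clone = Registry().at(classId)();
    clone->Load(packet);
    return clone;
}

// Hypothetical persistent class used only for this illustration.
struct Point : Persistent {
    int x = 0, y = 0;
    std::string GetClassID() const override { return "Example.Point"; }
    void Save(std::ostream& out) const override { out << x << ' ' << y; }
    void Load(std::istream& in) override { in >> x >> y; }
};

void RoundTripDemo() {
    Registry()["Example.Point"] = [] { return std::make_unique<Point>(); };
    Point original;
    original.x = 3;
    original.y = 4;
    std::unique_ptr<Persistent> clone = UnmarshalByValue(MarshalByValue(original));
    Point* copy = static_cast<Point*>(clone.get());
    assert(copy->x == 3 && copy->y == 4);  // the state travelled with the packet
    copy->x = 99;                          // the clone is disconnected...
    assert(original.x == 3);               // ...so the original never sees the change
}
```

The last two lines of `RoundTripDemo` show the trade-off described earlier: the receiver works on a private copy with no further round-trips, and nothing it changes flows back to the original.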

Figure 3 Aggregating the MBV Object


The primary flaw of that implementation was that it required participation from the object; that is, to marshal an object by value, it had to want to be marshaled by value and aggregate my code. This was great if you were writing the object in question. But even so, many languages such as Visual Basic and Java are not capable of aggregating other COM objects. However, these languages provide great support for implementing COM persistence and are ripe for MBV support.

Lately, I've been doing a lot of work with OLE DB as well as with the Windows® 2000 handler marshaling and non-blocking method invocation. All of these technologies make heavy use of COM aggregation to merge the identities of two or more COM objects in extremely novel ways. In working with these technologies, I've started to see the world in terms of COM aggregation—not as a reuse mechanism, but as a way to design extensibility into an object hierarchy.

Figure 4 MBV Aggregating the Object


My original MBV code relied on the MBV object aggregating my object (which requires explicit participation). My new plan for adding MBV support relies on having my code aggregate the MBV object and exposing my IMarshal interface in addition to all of the real object's interfaces, as shown in Figure 4.

Of course, you all know what they say about the best-laid plans. This time I was thwarted by the complete lack of support for aggregation in Visual Basic. Granted, the Java Virtual Machine (JVM) supports aggregating Java objects. But let's face facts; it's a Visual Basic-based programmer's world and we are all just visitors in it.

Not one to be thwarted by less-than-stellar support for COM functionality in Visual Basic, I decided to employ the ideas of one of my MSJ cohorts, Keith Brown, who has just published his work on COM delegators (MSJ, January 1999). Keith's work is based on the idea that you can build an interception layer on top of an existing COM object that can delegate all calls to the underlying object—with or without knowledge of the underlying method signatures. Due to the way that the __stdcall calling convention works, the this pointer is always available at a well-known offset into the stack. This is true no matter how many parameters a given method may have.

Figure 5 COM Method Call on the Stack


Figure 5 shows a typical COM method call on the stack. Note that the interface pointer is always stored four bytes off the top of the stack (just below the return address). Blind delegation is based on replacing the delegator's pointer on the stack with the real interface pointer of the delegatee and then jumping to the delegatee's method (which is stored in a well-known location relative to the delegatee's interface pointer).
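The mechanics can be modeled in portable C++ by simulating the stack instead of touching the real one. This is only a sketch of the idea under loud assumptions: `Frame` is a vector standing in for the stack (slot 0 is the return address, slot 1 is the this pointer, the rest are arguments), and `Header`, `Counter`, `Delegator`, `Thunk`, `MethodThunk`, and `Call` are hypothetical names. The real technique rewrites the hardware stack in assembly and jumps rather than calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulated call frame: [0] = return address, [1] = this, [2...] = arguments.
using Frame = std::vector<std::uintptr_t>;
using Method = std::uintptr_t (*)(Frame&);

// Every simulated object begins with a pointer to its method table,
// just as a COM interface pointer points at a vptr.
struct Header { const Method* vtbl; };

// The real object (the delegatee).
struct Counter { Header hdr; std::uintptr_t total; };

std::uintptr_t Counter_Add(Frame& f) {
    Counter* self = reinterpret_cast<Counter*>(f[1]);
    self->total += f[2];
    return self->total;
}
std::uintptr_t Counter_Get(Frame& f) {
    return reinterpret_cast<Counter*>(f[1])->total;
}
const Method kCounterVtbl[] = { Counter_Add, Counter_Get };

// The blind delegator: a vptr followed by the real interface pointer.
struct Delegator { Header hdr; Header* real; };

// Shared thunk: swap the delegator's this for the delegatee's, then forward
// to the same slot in the real method table. It never looks at the arguments.
std::uintptr_t Thunk(Frame& f, std::size_t slot) {
    Delegator* self = reinterpret_cast<Delegator*>(f[1]);
    Header* real = self->real;
    f[1] = reinterpret_cast<std::uintptr_t>(real);
    return real->vtbl[slot](f);
}

// One tiny entry per slot that only supplies the slot number.
template <std::size_t N>
std::uintptr_t MethodThunk(Frame& f) { return Thunk(f, N); }
const Method kDelegatorVtbl[] = { MethodThunk<0>, MethodThunk<1> };

// Simulated caller: build the frame and dispatch through the method table.
std::uintptr_t Call(Header* itf, std::size_t slot, std::vector<std::uintptr_t> args) {
    Frame f{0, reinterpret_cast<std::uintptr_t>(itf)};  // 0 = placeholder return address
    f.insert(f.end(), args.begin(), args.end());
    return itf->vtbl[slot](f);
}

void BlindDelegationDemo() {
    Counter counter{{kCounterVtbl}, 0};
    Delegator wrapper{{kDelegatorVtbl}, &counter.hdr};
    assert(Call(&wrapper.hdr, 0, {5}) == 5);   // Add(5) reaches the real object
    assert(Call(&wrapper.hdr, 0, {7}) == 12);  // Add(7)
    assert(Call(&wrapper.hdr, 1, {}) == 12);   // Get()
    assert(counter.total == 12);
}
```

Because `Thunk` only touches slot 1 of the frame, the same code serves methods with any number of parameters, which is the property that makes the delegation blind.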

Figure 6 Blind Delegation in Action


Figure 6 shows how blind delegation is glued together. Note that each vtable entry for a blind delegator points to the following ASM stub:



 void __declspec(naked) Thunk() {
   __asm {
 // move BD::this into eax
     mov eax, DWORD PTR [esp + 4]
 // move BD::this->m_pUnkRealItf into eax
     mov eax, DWORD PTR [eax + 4]
 // move eax into "this" position on stack
     mov [esp + 4], eax
 // move base of real vtable into eax
     mov eax, DWORD PTR [eax]
 // add offset for current method (set by the per-method thunk)
     add eax, ecx 
 // jmp to the actual function
     jmp DWORD PTR [eax]
   }
 }

This code relies on the ECX register containing the offset into the vtable for the method in question, so technically each VTABLE entry is populated with the following two-liner, which shoves the offset into ECX and JMPs to the thunking code:



 #define METHODTHUNK(n) \
 void __declspec(naked) methodthunk##n() {  \
 _asm { mov ecx, (n * 4) } _asm { jmp Thunk } }

This code should look remarkably familiar if you paid close attention to my last column, as it is the same ASM used by the /Oicf proxy. Of course, to maintain the identity laws, the IUnknown methods of the blind delegators must forward to a fairly boilerplate implementation of QueryInterface, AddRef, and Release. Figure 7 shows the implementation of a generic blind delegator.

Figure 8 Object Model of the MBV Delegation Code

With the blind delegation code in place, all you need is some glue code to stitch together the delegators on top of a persistent object. Figure 8 shows the object model of the MBV delegation code. The MBV code will hand out blind delegators for any interface except IUnknown and IMarshal. IUnknown must be implemented manually to maintain identity laws. IMarshal needs to be implemented because that's the whole idea behind building this infrastructure. Note that the IMarshal implementation simply thunks down to the inner object's IPersistStream implementation. This is identical to how the model worked in my March 1997 column. The only difference is that now my code is the controlling unknown and the persistent object is "pseudo-aggregated" behind the MBV implementation.

I use the term pseudo-aggregated because blind delegation is implemented differently than classic COM aggregation. With blind delegation, no raw references to the inner object are ever passed out to the client. Instead, the blind delegation code acts as the client to the object, and the object is completely oblivious that any monkey business is taking place. Another tremendous advantage of this approach is that it can be added to an object that has already been created, unlike classic COM aggregation that requires the object to be informed at creation time of its aggregate status.

Once I got the blind delegation code in place, I found that there were some sticky problems related to the way Visual Basic implemented IPersistStream. First, the Visual Basic implementation of IPersistStream::GetSizeMax returns E_NOTIMPL (as do many other implementations of GetSizeMax). In my implementation of IMarshal::GetMarshalSizeMax I needed to get an accurate estimate. The most obvious approach was to go ahead and serialize the object in GetMarshalSizeMax and cache the serialized stream for later use in MarshalInterface. This requires some extra logic to keep track of the serialized stream as long as necessary, but it works reasonably well. An alternative approach that was recommended by ATL guru Chris Sells is to call the object's IPersistStream::Save against a null IStream implementation that does nothing in its Write method except increment an integer by the number of bytes "written."
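The counting-stream idea is easy to picture with a portable C++ analogue. The sketch below is not an IStream implementation; it assumes a `std::streambuf` can stand in for the null stream, and `CountingBuf`, `Record`, and `SizeBySaving` are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// A "null" stream buffer: it stores nothing and only counts the bytes written.
class CountingBuf : public std::streambuf {
public:
    std::size_t count() const { return count_; }

protected:
    // Called for single characters, since no put area is ever set up.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) ++count_;
        return traits_type::not_eof(ch);
    }
    // Called for block writes.
    std::streamsize xsputn(const char*, std::streamsize n) override {
        count_ += static_cast<std::size_t>(n);
        return n;
    }

private:
    std::size_t count_ = 0;
};

// Hypothetical persistent object whose Save plays the role of IPersistStream::Save.
struct Record {
    std::string name;
    int id;
    void Save(std::ostream& out) const { out << name << ':' << id; }
};

// Measure the serialized size by saving into the counting sink.
std::size_t SizeBySaving(const Record& r) {
    CountingBuf sink;
    std::ostream out(&sink);
    r.Save(out);
    return sink.count();
}

void SizeDemo() {
    Record r{"abc", 42};
    assert(SizeBySaving(r) == 6);  // "abc:42" is six bytes
}
```

Compared with caching the serialized stream, this costs a second Save call at marshal time but holds no state between the size query and the actual write.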

Another issue that arose was related to IPersistStreamInit (a close cousin to IPersistStream), which requires the client to tell the object that it is being created from thin air and will never get a Load call. Some implementations (the Microsoft® JVM, for example) are fairly forgiving and allow you to call IPersistStreamInit::Save without first calling InitNew or Load. Unfortunately, the Visual Basic implementation of IPersistStream::Save gets quite upset if the object doesn't first receive an IPersistStreamInit::InitNew method call. Strangely enough, Visual Basic itself will call InitNew when it creates an object via the New operator or CreateObject. However, VBScript doesn't do this, which means that if you want to marshal a Visual Basic object by value from within VBScript, you must call InitNew before passing the object as a method parameter.

To make it easy to create Visual Basic objects that support MBV from any environment (including VBScript), my code is exposed via the dispatch-based interface:



 [ dual ] interface IMBVRef : IDispatch {
 // This routine wraps an existing object 
 // behind an MBV delegator
   HRESULT MBVRef([in] IUnknown *pUnk, 
     [out, retval] IUnknown **pvarResult);
 // This routine calls IPersistStreamInit::InitNew 
 // for VB objects created outside of VB 
 // (which calls InitNew for you). Call it 
 // before using an object reference returned 
 // from CoCreateInstance.
   HRESULT InitNew([in] IUnknown *pUnk);
 // This routine creates a new instance and wraps 
 // it behind a MarshalByValue delegator. This 
 // routine calls IPersistStreamInit::InitNew 
 // automatically.
   HRESULT CreateInstance([in] BSTR bstrProgId, 
     [in, defaultvalue("")] BSTR bstrHostName, 
     [out, retval] IUnknown **ppUnkNewObject);
 };

This interface (and its corresponding implementation) allows Visual Basic scripters to wrap arbitrary persistent objects with an MBV delegator, as the following script shows:



 Sub CallThroughRemoteObject(remObj)
 rem create MBV wrapper factory
     Dim mbvwrapper, persistentObj, mbv
     Set mbvwrapper = CreateObject("bvref.mbvref")
 rem create and init an inproc (persistent) object
     Set persistentObj = CreateObject("structwrap")
     mbvwrapper.InitNew persistentObj
     persistentObj.prop = "Some property"
 rem create MBV wrapper to persistent object
     Set mbv = mbvwrapper.MBVRef(persistentObj)
 rem pass mbv delegator to remote object
     remObj.LickThisLollipop mbv
 End Sub

Note that this code needs to trigger a call to IPersistStreamInit::InitNew prior to setting any properties or invoking any methods on the persistent object. Also note that the MBV delegator was passed in lieu of the actual object. This triggers the MBV code and sends a clone of the object to the remote component.

The MBV delegator brings up the basic question of who should decide whether MBV should be used. Both COM and Java RMI assume that it should be the object being passed as a method parameter. If you look at parameter passing in C++, it is the interface designer's choice: consider f(FOO&) versus f(FOO). The MBV delegator gives the object's user the power to decide. Given the fact that the MBV delegator only works with persistent objects, you could argue that the user is in no way violating the semantic contracts that the object expects. After all, how can the object distinguish between sending serialized state via method parameters and sending serialized state via floppy disks?
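For readers who want the C++ comparison spelled out, here is a small sketch. The struct `FOO` and its `prop` member are hypothetical stand-ins, and the function names are invented for the illustration.

```cpp
#include <cassert>
#include <string>

struct FOO { std::string prop; };  // hypothetical parameter type

// The signature, chosen by the interface designer, decides how the argument travels.
void ByReference(FOO& foo) { foo.prop = "changed by callee"; }  // like marshal-by-reference
void ByValue(FOO foo)      { foo.prop = "changed by callee"; }  // like marshal-by-value

void ParameterPassingDemo() {
    FOO a{"original"};
    ByValue(a);                             // the callee modifies its own copy
    assert(a.prop == "original");
    ByReference(a);                         // the callee modifies the caller's object
    assert(a.prop == "changed by callee");
}
```

In both calls the caller writes the same expression; only the declaration differs, which is why the choice belongs to whoever designs the interface rather than to the object or its user.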

One final issue arose when working with Java-based COM objects. Supporting COM persistence from the Java language is trivial. When a class implements the java.io.Serializable interface, the JVM provides an implementation of IPersistStreamInit as well as IPersistStorage, both of which simply use Java's intrinsic serialization infrastructure. However, the COM-callable wrappers created by the JVM already implement IMarshal in order to make Java objects apartment-neutral. In an earlier implementation of my MBV delegator, the apartment neutrality was lost, resulting in undue overhead when Java objects were passed from thread to thread. Since I discovered this problem, I've fixed it so that the MBV delegator checks to see if the delegatee uses the FTM. If it does, the delegator uses the FTM as well.

Figure 9 shows the interesting bits of the MBV delegator code. You can download the whole enchilada, complete with Visual Basic and VBScript samples, from the link at the top of this article or from http://www.develop.com/dbox/com/mbvref.

Have a question about programming with COM? Send your questions via email to Don Box: dbox@develop.com or http://www.develop.com/dbox.

From the March 1999 issue of Microsoft Systems Journal