Building a Lightweight COM Interception Framework, Part II: The Guts of the UD

February 1999

Keith Brown

Code for this article: Delegate.exe (82KB)

This article assumes you're familiar with C++ and COM.

The Universal Delegator provides the code necessary to compose arbitrary services on top of existing COM objects, with or without the explicit cooperation of those objects. It encapsulates some rather tricky interception and marshaling code, making it easy to add new interception policies.

Keith Brown works at DevelopMentor, developing the COM and Windows NT Security curriculum. He is coauthor of Effective COM (Addison-Wesley, 1999), and is writing a developer's guide to distributed security. Reach Keith at http://www.develop.com/kbrown.

Last month I provided a rationale for an extensible interception framework, along with a specific example called the Universal Delegator (UD). The UD provides the grungy code necessary to compose arbitrary services on top of existing COM objects, with or without the explicit cooperation of those objects. This month I will dive down into the guts of the UD to see how it works. I will also discuss how COM developers can write composable services that plug into the UD framework, and present a server-side application of the UD.

The UD encapsulates some rather tricky interception and marshaling code, making it quite easy to add new interception policies. These policies are packaged in "hooks," which are simply COM objects that implement one or two interfaces, depending on what the hook needs to accomplish. All hooks must implement IDelegatorHookQI, and hooks that need to perform preprocessing or postprocessing for individual method calls will also provide implementations of IDelegatorHookMethods. Both of these interfaces are shown in Figure 1.

Simple hooks like the Anonymous and Alternate Credential hooks (described in last month's article) do not require actual method call interception. Rather, they only need to get a peek at each individual interface pointer as the client requests new interfaces. This allows these hooks to call SetBlanket on each interface proxy. IDelegatorHookQI allows this behavior because the delegator's implementation of QueryInterface invokes the hook's OnFirstDelegatorQIFor method each time the client queries for a different interface pointer. This is a very natural extensibility point for the UD, as it needs to wrap each individual interface pointer anyway.

After asking the wrapped object for the requested IID, the UD gives the hook a chance to see the object's interface pointer (this only occurs the first time a client queries for a particular IID). IUnknown is handled separately via the Init method of IDelegatorHookQI. The parameter passed via Init is the result of querying the wrapped object for IID_IUnknown, so it is guaranteed to represent the COM identity for that object, in case the hook cares about this. The hooks described earlier use this method to call SetBlanket to control the security settings for IRemUnknown calls. Now would be a good time to start delving down into the implementation of the UD, to understand exactly how hooks fit into the big picture.

Figure 2 shows the delegator's implementation of QueryInterface. Note that the UD explicitly implements IUnknown. It also implements IMarshal if any of the DO_MBV_XXX flags have been specified; otherwise, IMarshal methods are delegated just like any other interface. To provide thread safety, each UD holds a critical section that it acquires wherever potential race conditions could occur. In QueryInterface, the UD first checks to see if the requested IID has already been wrapped. If not, the UD asks the original object if it supports the requested interface and, if so, gives the hook a peek at it. Note that the hook is allowed to hide interfaces that the inner object exposes by returning E_NOINTERFACE from OnFirstDelegatorQIFor. To help the hook avoid breaking the rules of COM identity, the UD remembers that this interface was hidden via an internal flag, which ensures that the hook will not be asked again about an IID that the hook has chosen to hide. After some thread-local storage (TLS) bookkeeping (discussed later), the UD wraps the interface pointer in an internal structure called a Delegator. This structure exposes a vptr that points to a single vtable of generic interception entry points, instead of the original object's implementation. A pointer to a Delegator structure is what the client eventually sees. Figure 3 shows how all these pieces fit together, along with some of the key member variables in each class.
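The control flow just described can be modeled in portable C++. This is only a minimal sketch, not the code from Figure 2: IIDs are plain strings, interface pointers are void*, a std::mutex stands in for the critical section, and the names InnerObject, Hook, onFirstQi, and DelegatorModel are hypothetical stand-ins for the real COM types.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

enum class QiResult { Ok, NoInterface };

// The wrapped object: maps an IID (a string here) to an interface pointer.
struct InnerObject {
    std::map<std::string, void*> supported;
    void* query(const std::string& iid) const {
        auto it = supported.find(iid);
        return it == supported.end() ? nullptr : it->second;
    }
};

// Hypothetical hook: returning false plays the role of E_NOINTERFACE.
struct Hook {
    std::string hiddenIid;
    int calls = 0;
    bool onFirstQi(const std::string& iid, void* /*innerItf*/) {
        ++calls;
        return iid != hiddenIid;
    }
};

class DelegatorModel {
    struct Entry { void* inner; bool hidden; };   // one wrapper per IID
public:
    DelegatorModel(InnerObject& inner, Hook& hook) : inner_(inner), hook_(hook) {}

    QiResult queryInterface(const std::string& iid, void** out) {
        std::lock_guard<std::mutex> lock(mutex_);  // stands in for the critical section
        *out = nullptr;
        auto it = entries_.find(iid);
        if (it == entries_.end()) {                // first query for this IID
            void* innerItf = inner_.query(iid);
            if (!innerItf) return QiResult::NoInterface;
            bool hidden = !hook_.onFirstQi(iid, innerItf);
            it = entries_.emplace(iid, Entry{innerItf, hidden}).first;
        }
        if (it->second.hidden) return QiResult::NoInterface;  // hook is not asked again
        *out = &it->second;                        // client sees the wrapper, not the inner pointer
        return QiResult::Ok;
    }

private:
    InnerObject& inner_;
    Hook& hook_;
    std::mutex mutex_;
    std::map<std::string, Entry> entries_;
};
```

Querying twice for the same IID returns the same wrapper and consults the hook only once, and a hidden IID stays hidden on every later query.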

Figure 3 Inside the Universal Delegator

The CoDelegator class represents the COM identity of the UD, and holds pointers to individual instances of the Delegator structure, a lightweight wrapper that satisfies the requirements for a COM interface implementation (a vptr that points to a vtable). The CoDelegator class holds a pointer to the hook specified in the CreateDelegator call (an implementation of IDelegatorHookQI), and the individual Delegator structures hold backpointers to the CoDelegator class, where QueryInterface is implemented (thus allowing the UD to correctly maintain the laws of COM identity).

The Delegator structure also holds a pointer to a hook. This is an implementation of IDelegatorHookMethods, which is an optional interface that hooks can implement if they would like to perform per-method preprocessing or postprocessing. This interface is shown in Figure 1, and provides the hook with the zero-based index of the method that was invoked. (The lowest index any hook will see is 3, since 0 through 2 are the IUnknown methods implemented by the UD.) Along with the method index, the hook is passed a pointer to the method's arguments on the stack. To allow the hook to maintain context between the preprocessing and postprocessing phases, the UD stashes a hook-defined cookie in TLS for the duration of the method call, and gives it back to the hook during postprocessing.

A hook may want to provide an implementation of IDelegatorHookMethods only for certain interfaces, and may prefer to use different implementations for each interface. The UD does not care about these details, so the hook is free to implement this interface in any way it sees fit. To install a method-level hook for a given interface, the implementation of OnFirstDelegatorQIFor should specify a pointer to the hook as well as one or both of the DHO_XXXPROCESS_METHODS options, as the sample code in Figure 4 demonstrates.

Implementing Simple Delegation

To understand exactly how the UD implements interception without relying on type information, it is first important to understand the binary standard for a COM method call. On all 32-bit Windows® platforms today, the standard calling convention used by COM interfaces is __stdcall. This calling convention specifies that arguments are pushed onto the stack from right to left (consistent with the __cdecl calling convention), and that the receiver of the method call adjusts the stack pointer upon returning from the method (consistent with the now defunct __pascal calling convention). The fact that the implementor of the method is responsible for cleaning up the stack means that as long as the delegator actually invokes the method on the original object, the delegator itself does not have to know how many parameters were actually pushed on the stack.

Consider the stack layout for a given method call, shown in Figure 5. Imagine that this is a method call being made on the UD, in which case pFoo actually points to a Delegator struct. Assuming for the moment that you don't care about preprocessing or postprocessing, all you need to do to blindly delegate the call is peek at the wrapped interface pointer's vtable, then jump through the appropriate function pointer directly. If you don't change the stack pointer (esp), the wrapped object will happily implement the method, adjust the stack pointer, and return directly to the original caller. The only hole in this scheme is that one of the parameters on the stack will be incorrect: the pFoo parameter is the implicit this pointer, which currently points to the Delegator struct, not the wrapped interface pointer. Since this stack slot holds a copy of a pointer variable passed by value, the caller has no guarantee about (and no reason to depend on) what the slot contains when the call returns. So it is completely safe and reasonable to simply swap out pFoo for the wrapped interface pointer (directly on the stack) before delegating the call. Fortunately, since the __stdcall calling convention dictates a right-to-left argument push, upon method invocation the this pointer can always be found at esp + sizeof(void*) (to skip over the return address on the stack), regardless of the signature of the method.

Figure 5 Stack Layout

One other piece of information required is the method being invoked. This is necessary so the delegation code knows which method to invoke via the wrapped interface pointer's vtable. Figure 6 demonstrates how a single function can be used to delegate any method call, coupled with small shim entry points to determine which method of the interface was invoked. It's surprisingly simple and efficient.
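For readers who want to experiment without inline assembly, the shim-plus-delegate idea can be approximated in portable C++. The following is a simplified model, not the code from Figure 6: every method is assumed to share one signature, int(void*, int), so the compiler can forward the arguments, whereas the real UD forwards an unknown argument list by leaving the stack untouched. The names Counter, shim, delegateCall, and callMethod are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Simplification: every method has the same signature.
using Method = int (*)(void* self, int arg);

// A wrapped object laid out COM-style: the first member is the vptr.
struct Counter {
    const Method* vptr;
    int total;
};
int counterAdd(void* self, int n) {
    Counter* c = static_cast<Counter*>(self);
    c->total += n;
    return c->total;
}
int counterScale(void* self, int n) {
    Counter* c = static_cast<Counter*>(self);
    c->total *= n;
    return c->total;
}
static const Method kCounterVtbl[] = { counterAdd, counterScale };

// The wrapper handed to the client.
struct Delegator {
    const Method* vptr;   // points at the table of shims
    void* inner;          // wrapped interface pointer, at a fixed position
};

// Generic delegation: replace `this` with the wrapped pointer, then
// dispatch through the wrapped object's vtable using the slot index.
int delegateCall(void* self, std::size_t index, int arg) {
    void* inner = static_cast<Delegator*>(self)->inner;
    const Method* innerVtbl = *static_cast<const Method* const*>(inner);
    return innerVtbl[index](inner, arg);
}

// One tiny shim per vtable slot; its only job is to supply the index.
template <std::size_t Index>
int shim(void* self, int arg) { return delegateCall(self, Index, arg); }

static const Method kShimTable[] = { shim<0>, shim<1> };

// What a client does: call slot `index` through whatever vptr it sees.
int callMethod(void* itf, std::size_t index, int arg) {
    const Method* vtbl = *static_cast<const Method* const*>(itf);
    return vtbl[index](itf, arg);
}
```

A client holding a Delegator whose inner member points at a Counter can call callMethod on the Delegator and observe exactly the results it would get from the Counter itself.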

For those who are not familiar with it, __declspec(naked) suppresses the compiler-generated stack frame that normally exists to allow local variables to be declared on the stack via standard C++ syntax. When the jmp instruction actually delegates the call, the delegation code must guarantee that the stack pointer has the same value it did when the client made the original call. A standard compiler-generated stack frame usually looks like this:

push ebp
mov  ebp, esp

To avoid relying on this code (which is not guaranteed to always look like this), the delegator must take over the management of the stack frame. __declspec(naked) is incredibly useful for plumbing like this, and my hat is off to the Visual C++® team for exposing it. (It's been a long time since I've had to use a tool other than the inline assembler provided by Visual C++.)

Note that the delegate function assumes that the wrapped interface pointer is stored at a fixed offset from the top of the Delegator struct. This structure is carefully laid out to allow these assumptions to be made. With this code in place, all that you need to do is provide enough shim entry points to support any interface with a reasonable number of methods. The UD provides 1024 entry points, and therefore will support interfaces that have 1024 methods or fewer, which should not be a problem for most systems. A single vtable is constructed from these entry points, and each Delegator struct has a vptr member that points to this vtable, thus creating a generic implementation that behaves correctly when substituted for virtually any other COM interface (specifically, those that don't have more than 1024 methods).
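One way to picture the shared vtable and the layout guarantee is the sketch below. It is an illustration under my own assumptions rather than the UD's source: the entry points are generated with templates instead of hand-written assembly, each one merely reports its slot index, and the Delegator shown has only the two members needed to state the layout assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// In this model an entry point just reports which slot was invoked.
using Entry = std::size_t (*)(void* self);

struct Delegator {
    const Entry* vptr;   // must come first so the struct can pose as an interface
    void* inner;         // wrapped interface pointer at a known, fixed offset
};

// The delegation code depends on this layout, so check it at compile time.
static_assert(std::is_standard_layout<Delegator>::value, "layout must be predictable");
static_assert(offsetof(Delegator, vptr) == 0, "vptr must be the first member");
static_assert(offsetof(Delegator, inner) == sizeof(void*), "inner must follow the vptr");

template <std::size_t Index>
std::size_t entryPoint(void*) { return Index; }

// Expand entryPoint<0>, entryPoint<1>, ... into a single table.
template <std::size_t... I>
constexpr std::array<Entry, sizeof...(I)> makeTable(std::index_sequence<I...>) {
    return {{ &entryPoint<I>... }};
}

constexpr std::size_t kMaxMethods = 1024;
static const std::array<Entry, kMaxMethods> kSharedVtable =
    makeTable(std::make_index_sequence<kMaxMethods>{});
```

Every Delegator instance can point its vptr at kSharedVtable.data(), so the cost of the 1024 entry points is paid once per process rather than once per wrapped interface.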

Implementing Interception

The code in Figure 6 is interesting, but is only the starting point for a true interception architecture. The first thing to consider is adding the capability for preprocessing. This is not difficult because after making a method call to invoke a preprocessing routine, the stack pointer will naturally be adjusted back to its original value, and delegation can proceed with the code in Figure 6.

The tricky part is postprocessing. Since this minor detail was ignored in the delegation code already discussed, a simple jmp instruction was used to transfer control directly to the wrapped object, whose final ret instruction returns control directly to the original caller. The UD is out of the loop and cannot possibly perform postprocessing.

What the UD really needs to do is temporarily swap out the return address on the stack to point back to the delegator code. This lets the delegated call return to the UD for postprocessing. Finally, the original return address can be restored, and the UD can return to the original caller. This is totally reasonable, and would work reliably if only there were a safe place to store the original return address. Pushing it on the stack won't work since the UD must be very careful to leave the stack pointer unchanged when delegating the call. There is no option to make a copy of the stack arguments and create an entirely new stack frame for the call because the delegator has no idea how many arguments there are in the first place (remember, the UD operates without type information). The UD needs some kind of out-of-band storage, which is exactly what TLS was designed to provide. (This article assumes you are familiar with TLS, or that you can become familiar with it by reading the Platform SDK docs coupled with Jeffrey Richter's book, Advanced Windows.)

When a hook answers OnFirstDelegatorQIFor with the DHO_POSTPROCESSMETHODS flag set, the UD ensures that a unique TLS slot has been allocated within the process. This was the bookkeeping code referred to earlier in the delegator's QueryInterface implementation (see Figure 2). The UD uses this slot to hold context information while allowing the wrapped object to execute the delegated call. Technically, the only piece of out-of-band information that is required to return control to the delegator is the return address; however, there are other important items that need out-of-band storage, such as the pointer to the delegator itself, which holds the hook that will ultimately perform the postprocessing.

For the convenience of the hook, room for a hook-defined cookie is also reserved in the structure the UD allocates and references via TLS:

struct CallContext {
  Delegator*  m_pDelegator;
  const void* m_pReturnAddr;
  DWORD       m_nCookie;
  DWORD       m_nVtblOffset;
};

Since TLS effectively provides private storage on a per-thread basis, it might seem safe to simply allocate and initialize one of these structures and tuck the pointer away in the preallocated TLS slot for the duration of the delegated call. However, in the presence of recursion (or a nested call to some other UD in the same process), this would break down. You literally need a parallel stack in TLS to support the safe maintenance of call contexts on each thread. Because of this constraint, CallContext objects are stored on an intrusive linked list whose head is stored in TLS. Generally, only one element will be stored on the list, but in rare cases like recursion the list may grow.
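The parallel stack can be sketched with standard C++ thread_local storage in place of a Win32 TLS slot. This is a model under stated assumptions: the next member is my own addition to carry the intrusive link (the struct shown earlier does not list one), the contexts live on the C++ stack instead of coming from the UD's allocator, and nestedCall is a hypothetical stand-in for a delegated call that re-enters a delegator.

```cpp
#include <cassert>
#include <cstdint>

struct Delegator;   // opaque in this sketch

struct CallContext {
    Delegator*     delegator;
    const void*    returnAddr;
    std::uintptr_t cookie;
    CallContext*   next;       // intrusive link (assumed; not in the original struct)
};

// Head of the per-thread list; plays the role of the TLS slot.
thread_local CallContext* t_head = nullptr;

void pushContext(CallContext* ctx) {
    ctx->next = t_head;
    t_head = ctx;
}

CallContext* popContext() {
    CallContext* top = t_head;
    if (top) t_head = top->next;
    return top;
}

// Simulates a delegated call that re-enters a delegator `depth` more times.
// Each level must get back exactly the context it pushed.
bool nestedCall(int depth) {
    CallContext ctx{nullptr, nullptr, static_cast<std::uintptr_t>(depth), nullptr};
    pushContext(&ctx);
    bool innerOk = (depth == 0) ? true : nestedCall(depth - 1);
    CallContext* top = popContext();
    return innerOk && top == &ctx &&
           top->cookie == static_cast<std::uintptr_t>(depth);
}
```

With a single slot holding one pointer, the innermost call would overwrite the outer call's context; the list lets each level pop its own context in last-in, first-out order.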

Since a CallContext object must be allocated and freed for every single method call, the UD provides a custom fixed-sized memory allocator that makes sure these allocs and frees are extremely fast and reliable. The allocator obtains fixed-size blocks of memory from the OS and suballocates from within them, weaving a linked list of free elements through the block. This ensures that virtually all allocation requests are satisfied immediately, without the searching that normally takes place within a generic heap, and ensures that memory fragmentation does not occur due to many small allocations. As the allocator acquires blocks of memory from the OS, it reuses those blocks and only gives them back when the UD is unloaded from the process, which makes the allocator much less prone to out-of-memory failures.
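A free-list allocator of the kind described here might look roughly like the following. Treat it as a sketch rather than the UD's allocator: it uses malloc in place of a direct OS allocation, it is not thread-safe (a real version would need synchronization or per-thread pools), and the FixedAllocator name and its interface are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

class FixedAllocator {
    struct Node { Node* next; };   // a free element doubles as a list node

    static std::size_t roundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;   // keep every element suitably aligned
    }

public:
    FixedAllocator(std::size_t elemSize, std::size_t elemsPerBlock)
        : elemSize_(roundUp(elemSize)), elemsPerBlock_(elemsPerBlock) {}
    FixedAllocator(const FixedAllocator&) = delete;
    FixedAllocator& operator=(const FixedAllocator&) = delete;

    // Blocks are only returned when the allocator itself goes away.
    ~FixedAllocator() {
        for (void* block : blocks_) std::free(block);
    }

    void* allocate() {
        if (!free_ && !grow()) return nullptr;   // out of memory
        Node* n = free_;
        free_ = n->next;                          // pop: no searching required
        return n;
    }

    void deallocate(void* p) {
        free_ = new (p) Node{free_};              // push back onto the free list
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    bool grow() {
        char* block = static_cast<char*>(std::malloc(elemSize_ * elemsPerBlock_));
        if (!block) return false;
        blocks_.push_back(block);
        for (std::size_t i = 0; i < elemsPerBlock_; ++i)
            free_ = new (block + i * elemSize_) Node{free_};   // weave the list
        return true;
    }

    std::size_t elemSize_;
    std::size_t elemsPerBlock_;
    Node* free_ = nullptr;
    std::vector<void*> blocks_;
};
```

Allocation and deallocation are each a couple of pointer assignments, and a freed element is the first one handed out again, which keeps the working set small.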

Figure 7 shows the actual code used by the UD when postprocessing is required. During preprocessing, the UD attempts to allocate a CallContext. If this memory allocation fails, the call cannot possibly be postprocessed since there will be no place to store the original return address of the caller. In this case, the UD is in a rather ugly situation. The UD cannot simply fail the call with E_OUTOFMEMORY since the method implementor is required to clean up the stack (recall the __stdcall convention) and the UD doesn't have type information to know how to do this. The call must be delegated to the original object without postprocessing. This is one of the reasons why the extra attention has been paid to the robustness of the CallContext memory allocator.

After acquiring and initializing a new CallContext object, the hook is given a chance to preprocess the call and the cookie returned by the hook is copied into the CallContext object for later use by the hook's postprocessing function. At this point, the UD swaps out the this pointer on the stack (as described earlier). If the CallContext allocation was successful, the UD simply pops the original return address off the stack and makes a call (as opposed to a jmp) via the wrapped interface pointer's vtable. This intrinsically pushes a new return address on the stack, which points back to the delegator's code. Once the call returns, the UD adjusts the stack pointer to make room for the original return address, which it writes after invoking the hook's DelegatorPostprocess method using the CallContext object tucked away earlier in TLS. Finally, the UD frees the CallContext object and returns control to the original caller.
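Setting the stack manipulation aside, the ordering of these steps can be modeled in portable C++. This is a sketch, not the code from Figure 7: the wrapped method is a std::function, MethodHook is a hypothetical stand-in loosely modeled on IDelegatorHookMethods, and I assume (the text does not say) that preprocessing still runs when the CallContext allocation fails.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <new>
#include <vector>

// Hypothetical hook: records what it saw and hands back a cookie.
struct MethodHook {
    std::vector<int> log;
    std::uintptr_t lastCookie = 0;

    std::uintptr_t preprocess(int methodIndex) {
        log.push_back(methodIndex);
        return 1000u + static_cast<std::uintptr_t>(methodIndex);
    }
    void postprocess(int methodIndex, std::uintptr_t cookie) {
        log.push_back(-methodIndex);
        lastCookie = cookie;
    }
};

struct CallContext {
    std::uintptr_t cookie;
    int methodIndex;
};

int interceptedCall(MethodHook& hook, int methodIndex,
                    const std::function<int()>& wrappedMethod,
                    bool simulateAllocFailure = false) {
    // Acquire the context first; without it the call cannot be postprocessed.
    std::unique_ptr<CallContext> ctx(
        simulateAllocFailure ? nullptr
                             : new (std::nothrow) CallContext{0, methodIndex});

    // Assumption: the hook still gets to preprocess on the degraded path.
    std::uintptr_t cookie = hook.preprocess(methodIndex);

    if (!ctx) {
        // Degraded path: the call must still be delegated, because only the
        // wrapped method knows how to clean up its own arguments.
        return wrappedMethod();
    }

    ctx->cookie = cookie;                             // saved for postprocessing
    int result = wrappedMethod();                     // the delegated call
    hook.postprocess(ctx->methodIndex, ctx->cookie);  // cookie handed back
    return result;                                    // context freed on return
}
```

The point to notice is that failure to allocate a context never fails the call; it only costs the hook its postprocessing step.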

Considering that postprocessing is so much more involved than a simpler scenario that requires (at most) preprocessing, the UD allows hooks to select the quality of interception service required for any given interface. This is provided as an optimization to avoid any unnecessary overhead for hooks that don't require postprocessing, and is manifested in the options specified when installing a new method-level hook (DHO_POSTPROCESSMETHODS). When this option is specified, the Delegator's vptr points to an alternate vtable that uses the more sophisticated interception algorithm. This vtable also supports interfaces with up to 1024 methods.

Inhibiting Method Calls

The major limitation of this no-type-info-required architecture is that there is absolutely no possibility of inhibiting delegated method calls because the Delegator relies on the wrapped object's implementation to clean up the stack. If COM instead preferred the __cdecl calling convention, this would not be an issue since the caller would be responsible for cleaning up the stack, and it wouldn't matter whether the call was actually delegated to the object or blocked by a hook. So with the current architecture, the DelegatorPreprocess method on the IDelegatorHookMethods interface does not return an HRESULT; it returns void to indicate that there is no way to stop a method from being delegated. If this problem were to remain unsolved, there would be no way to implement a hook to do server-side security access checks and return E_ACCESSDENIED directly from the preprocessing code.

Interestingly enough, soon after I submitted this article to MSJ, I successfully reverse-engineered the /Oicf string format produced by the MIDL compiler. Since the editors at MSJ thankfully decided to split the article into a two-part series, I had time to revise this section and present some good news. The bottom line is that I added a new option for installing a method-level hook: DHO_MAY_BLOCK_CALLS. If you specify this flag, the delegator will search for a standard interface marshaler (packaged in a proxy/stub DLL built with the MIDL /Oicf option, or in a type library for [dual] or [oleautomation] interfaces) and extract the stack sizes for each of the methods in the interface. This information is cached in a global table that is keyed based on the IID of the interface. The delegator holds a pointer into this table, allowing it to correctly unwind the stack if DelegatorPreprocess returns a failure code (such as E_ACCESSDENIED).
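To make the idea of a per-method stack size concrete, here is a small sketch of the arithmetic and of a cache keyed by IID. It does not parse /Oicf format strings or type libraries; the argument sizes are supplied by hand, IIDs are plain strings, and the names stackBytes and StackSizeCache are hypothetical. The one fact it relies on is that on 32-bit x86 each __stdcall argument occupies a multiple of 4 bytes on the stack, with the this pointer taking one 4-byte slot.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

constexpr std::size_t kSlot = 4;   // stack slot size on 32-bit x86

// Bytes the callee must pop on return: `this` plus every argument,
// each rounded up to a whole number of 4-byte slots.
std::size_t stackBytes(const std::vector<std::size_t>& argSizes) {
    std::size_t total = kSlot;     // the implicit this pointer
    for (std::size_t size : argSizes)
        total += (size + kSlot - 1) / kSlot * kSlot;
    return total;
}

// One entry per vtable slot, indexed by method number.
using MethodStackSizes = std::vector<std::size_t>;

// Process-wide cache keyed by IID (a string in this model).
class StackSizeCache {
public:
    const MethodStackSizes* find(const std::string& iid) const {
        auto it = table_.find(iid);
        return it == table_.end() ? nullptr : &it->second;
    }
    const MethodStackSizes* store(const std::string& iid, MethodStackSizes sizes) {
        return &(table_[iid] = std::move(sizes));
    }
private:
    std::map<std::string, MethodStackSizes> table_;
};
```

For a method taking a long, a double, and a short, stackBytes({4, 8, 2}) yields 20: 4 bytes for this, 4 for the long, 8 for the double, and 4 for the short after padding. That is the amount a delegator would have to remove from the stack itself when a hook blocks the call.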

If this new option is specified when a method-level hook is installed, the delegator will ask the method-level hook for IDelegatorHookMethods2 instead of IDelegatorHookMethods. This new interface looks similar to the old one, except it returns an HRESULT from DelegatorPreprocess rather than void, allowing hooks to fail calls during preprocessing. This also gave me a chance to make the interface more Win64™-friendly by changing the hook-defined cookie from a DWORD to a void* (pointers will be 64 bits in Win64, while DWORDs will remain 32 bits).

To demonstrate the ability of the delegator to block calls, I built the canonical hook that implements declarative security based on custom attributes in a type library. Figure 8 shows some IDL that uses these attributes to grant or deny access to various groups on a class-by-class basis, an interface-by-interface basis, or even on a method-by-method basis. This sample demonstrates custom IDL attributes and traversal of type information at runtime (which is rather grungy code to write). This hook is known as the Access Control hook, and you can download it along with the code for this article from http://www.microsoft.com/msj.

Cool Marshaling Techniques

One of the most interesting features of the UD is its ability to custom marshal. This allows the Delegator to automatically propagate itself by value, taking along the hook that controls the interception policy. However, there are several issues that need to be dealt with to make this work correctly and robustly:

· The UD may be asked to marshal-by-value (MBV) only within a limited context

· The wrapped object may implement IMarshal explicitly

· Hooks may or may not be stateful

· The wrapped object may desire apartment neutrality by aggregating the freethreaded marshaler (FTM)

· Some people still think COM identity is important

· Interface pointers may be marshaled into the Global Interface Table (GIT)

Rather than attempt to walk through the UD's marshaling code (which I'm sure all of you will do during your copious free time), I think it would be more useful to convey the rationale and general techniques the UD uses to deal with the issues discussed here. This particular discussion is targeted at intermediate to advanced COM developers who are familiar with the basics of custom marshaling, the FTM, and the GIT. (Read Don Box's book, Essential COM, if you want to become familiar with these topics.) My hope is that a discussion of the issues I encountered when developing the UD's marshaling infrastructure may help clarify some things regarding the somewhat esoteric art of using COM's custom marshaling architecture. In any case, it will at least be fodder for flames on the DCOM listserver.

If the UD is asked to MBV in any of the three contexts (INPROC, LOCAL, or DIFFERENTMACHINE), it must take over the implementation of IMarshal rather than blindly delegating those calls. However, when marshaling occurs, a context hint is provided by COM that specifies how far away the marshaled interface pointer could potentially travel. If, for instance, a UD is created with only the DO_MBV_INPROC option, and is later asked to marshal to a different process or a remote machine, the UD simply backs off and delegates to the standard marshaler (by calling CoGetStandardMarshal to get the standard implementation of IMarshal).

There is nothing new about this technique; objects that custom marshal within limited contexts use this feature all the time. The UD adds a twist: if the wrapped object exposes an implementation of IMarshal, using the standard marshaler would be incorrect. For instance, recall that standard proxies implement IMarshal to avoid creating proxies to proxies. If the UD were to use the standard marshaler to marshal an interface pointer to a proxy, the standard marshaler would ignore the proxy's IMarshal implementation and would instead create a stub for the proxy, leading to a middleman situation. To solve this problem, the UD always verifies that the wrapped object does not implement IMarshal before invoking the standard marshaler. If the object does custom marshal, the UD simply delegates directly to the object's implementation of IMarshal.
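The decision described in the last two paragraphs boils down to a small function. The sketch below is a model only: the enumerations are portable stand-ins for the MSHCTX values and the DO_MBV_XXX options, the numeric flag values are invented for illustration, and I assume each option enables marshal-by-value for exactly one context (the article does not say whether a wider option implies a narrower one).

```cpp
#include <cassert>

// Stand-ins for MSHCTX_INPROC, MSHCTX_LOCAL, and MSHCTX_DIFFERENTMACHINE.
enum class MarshalContext { InProc, Local, DifferentMachine };

// Stand-ins for the DO_MBV_XXX options; the values are hypothetical.
enum DelegatorOptions : unsigned {
    kMbvInProc           = 1u << 0,
    kMbvLocal            = 1u << 1,
    kMbvDifferentMachine = 1u << 2,
};

enum class MarshalStrategy {
    ByValue,                 // the UD's own IMarshal implementation
    WrappedObjectIMarshal,   // delegate to the wrapped object's IMarshal
    StandardMarshaler,       // fall back to CoGetStandardMarshal
};

unsigned optionFor(MarshalContext ctx) {
    switch (ctx) {
        case MarshalContext::InProc: return kMbvInProc;
        case MarshalContext::Local:  return kMbvLocal;
        default:                     return kMbvDifferentMachine;
    }
}

MarshalStrategy chooseStrategy(unsigned options, MarshalContext ctx,
                               bool wrappedImplementsIMarshal) {
    if (options & optionFor(ctx))
        return MarshalStrategy::ByValue;
    // Backing off: never hide a custom marshaler (such as a standard
    // proxy) behind the standard marshaler.
    if (wrappedImplementsIMarshal)
        return MarshalStrategy::WrappedObjectIMarshal;
    return MarshalStrategy::StandardMarshaler;
}
```

A UD created with only the in-process option and later marshaled to another machine therefore falls through to the wrapped object's IMarshal if there is one, and to the standard marshaler otherwise.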

Some hooks, like the Anonymous Hook, are very simple and don't require any initialization. Other hooks, like the Alternate Credentials Hook, require state to function properly (for example, an authority, a principal, and a password). In any case, when the UD marshals by value, it needs to bring the hook along, state or no state. Stateless hooks simply need to provide a CLSID so when the UD is unmarshaled it can create a new copy of the hook by calling CoCreateInstance. This is a requirement of all hooks: they must implement IPersist to provide their CLSID.

Stateful hooks, on the other hand, are required to implement IPersistStream, which allows the UD to marshal not only the CLSID of the hook, but also its entire state. Due to the fact that IPersistStream has no mechanism for releasing resources held by the stream, hook developers should avoid marshaling interface pointers into this stream unless it is known in advance that those interfaces don't need to be released (for example, interface pointers that are marshaled with MSHLFLAGS_NOPING). If something should go wrong later in the UD's marshaling code, there would be no way to let the hook know that it needs to call CoReleaseMarshalData. I chose to use IPersistStream over IMarshal for marshaling hooks because of its simplicity.

Some objects prefer to be apartment-neutral; that is, they don't ever want a proxy to stand between them and their in-process clients, even when those clients are in a different apartment than the one the object was originally created in. These objects float from apartment to apartment within a process by aggregating the FTM. Wrapping the UD around an apartment-neutral object and asking the UD to MBV would normally change these semantics a bit, since a new copy of the UD would be created in each apartment. This is not the normal behavior of objects that aggregate the FTM, and so to avoid this (potentially surprising) change in semantics, the UD explicitly checks whether the wrapped object has aggregated the FTM. This is a simple check since the UD already needs to determine whether the object implements IMarshal. To check for the FTM, the UD simply calls IMarshal::GetUnmarshalClass and checks the CLSID against that of the FTM (which is undocumented, but easily obtainable from the FTM itself):

// in Windows NT 4.0,
CLSID_FTM = {0000001C-0000-0000-C000-000000000046}
// and in Windows 2000,
CLSID_FTM = {0000033A-0000-0000-C000-000000000046}
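The comparison itself is trivial once the CLSID is in hand. Here is a portable sketch that uses a hand-rolled Guid struct in place of the Windows GUID type and hard-codes the two values listed above; in real code the CLSID being tested would be the one returned by the wrapped object's IMarshal::GetUnmarshalClass. The names Guid, sameGuid, and looksLikeFtm are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same field layout as the Windows GUID structure.
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

bool sameGuid(const Guid& a, const Guid& b) {
    return a.data1 == b.data1 && a.data2 == b.data2 && a.data3 == b.data3 &&
           std::memcmp(a.data4, b.data4, sizeof a.data4) == 0;
}

// {0000001C-0000-0000-C000-000000000046}, as listed for Windows NT 4.0.
static const Guid kFtmClsidNt4 =
    {0x0000001C, 0x0000, 0x0000, {0xC0, 0, 0, 0, 0, 0, 0, 0x46}};

// {0000033A-0000-0000-C000-000000000046}, as listed for Windows 2000.
static const Guid kFtmClsidWin2000 =
    {0x0000033A, 0x0000, 0x0000, {0xC0, 0, 0, 0, 0, 0, 0, 0x46}};

// True if the unmarshal class reported by the wrapped object is the FTM.
bool looksLikeFtm(const Guid& unmarshalClass) {
    return sameGuid(unmarshalClass, kFtmClsidNt4) ||
           sameGuid(unmarshalClass, kFtmClsidWin2000);
}
```

Because these values are undocumented, obtaining the CLSID at run time from the FTM itself, as the article suggests, is more robust than hard-coding it as this sketch does.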

If the UD discovers that the wrapped object is using the FTM, it uses the FTM to perform the marshaling when asked to marshal by value, which avoids creating new copies of the UD. For example, if the UD is marshaled by value between apartments in the same process, the interface pointers handed out all have the same physical pointer values rather than having separate copies made for each apartment. This maintains physical COM identity. Because the UD now also becomes apartment-neutral, it avoids surprising a client that expects to be using an apartment-neutral object and isn't aware that the UD has been injected on top of that object.

An interesting consequence of this behavior is that hooks also need to be apartment-neutral. Hook developers must be aware of this and should avoid holding apartment- or thread-sensitive resources (for instance, interface pointers or window handles) as data members. If a hook really needs to hold an interface pointer, it should store it in the GIT and hold a GIT cookie instead.

Regardless of the MTS revolution, some people still think that physical COM identity is important (and it really can be useful in some cases where scalability is not important). Specifically, the identity I'm referring to is the ability to QueryInterface for IID_IUnknown on two random interface pointers and compare the resulting physical pointer values for equality. This precisely determines whether the interface pointers refer to the same physical COM object.

Typically, when implementing custom marshaling, COM identity gets broken; MBV is a very obvious example. When passing an object by value from apartment A to apartment B, a new copy of the object is created in apartment B. If the same physical object is again passed by value to apartment B, another copy is made. COM identity is clearly lost in this case. Most of the time when a COM developer chooses to use MBV, physical identity is not much of an issue anyway. The UD is a clear exception to this rule, however. A developer may want the convenient behavior of propagating the UD around wherever an interface pointer is marshaled, but may not want to pay the price of breaking COM identity to make this happen. In this case, the UD can be constructed with the DO_MBV_XXX flags coupled with the DO_MAINTAIN_IDENTITY flag. This forces the UD to respect COM identity, even when marshaling by value.

The normal way to maintain physical COM identity (as standard proxies do) when custom marshaling is to have a separate unmarshaler class whose job is to perform unmarshaling and maintain a table of existing identities within each apartment of the process. This means that the object's implementation of IMarshal::GetUnmarshalClass should not return its own CLSID (which is standard practice when implementing MBV); rather, it should return the CLSID of a separate unmarshaler. Often this unmarshaler can simply be an apartment-neutral singleton that either hands out a pointer to an existing object (based on a table lookup) or creates a new object, adding it to the table.

The UD uses a separate unmarshaler class to support the potential to maintain identity. This class manages a dictionary whose entries are pairs in which the key is the IID_IUnknown pointer to the wrapped object (in other words, its physical COM identity in a particular apartment), and the value is a pointer to the UD for that object. The UD therefore maintains COM identity based on the identity of the underlying object. As long as the wrapped object correctly maintains identity, the UD will as well. I felt this was a reasonable guarantee, satisfying the principle of least surprise. Keep in mind that these tables are only used for UDs created with the DO_MAINTAIN_IDENTITY option.
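The lookup-or-create behavior of such an unmarshaler can be modeled as follows. This is a sketch under simplifying assumptions: there is a single table rather than one per apartment, the table owns the delegators it creates (a real table would more likely hold non-owning pointers that the UD removes on final release), and the names IdentityTable and DelegatorModel are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>

// Stand-in for a UD; it remembers which wrapped identity it belongs to.
struct DelegatorModel {
    const void* innerIdentity;
};

class IdentityTable {
public:
    // Key: the wrapped object's IUnknown pointer (its COM identity).
    // Returns the existing delegator for that identity, or creates one.
    DelegatorModel* findOrCreate(const void* innerIdentity) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::unique_ptr<DelegatorModel>& slot = table_[innerIdentity];
        if (!slot) slot.reset(new DelegatorModel{innerIdentity});
        return slot.get();
    }

    // Called when the delegator for this identity goes away.
    void remove(const void* innerIdentity) {
        std::lock_guard<std::mutex> lock(mutex_);
        table_.erase(innerIdentity);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return table_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<const void*, std::unique_ptr<DelegatorModel>> table_;
};
```

Unmarshaling the same wrapped object twice yields the same delegator pointer, which is what lets a client's IUnknown comparison succeed.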

The GIT is becoming a very popular tool for multiple-apartment processes because it tremendously simplifies the work needed to export an interface pointer in such a way that many apartments can import it. However, being marshaled into the GIT is very different from being marshaled via the deprecated CoMarshalInterThreadInterfaceInStream API, and objects that custom marshal quickly become aware of this difference. The older API performs a normal marshal (MSHLFLAGS_NORMAL). The GIT, on the other hand, performs a strong table marshal (MSHLFLAGS_TABLESTRONG). Objects that custom marshal often neglect this context and have trouble being marshaled into the GIT.

When implementing IMarshal, seeing the combination of MSHLFLAGS_TABLESTRONG and MSHCTX_INPROC provides a pretty strong hint that the object is being marshaled into the GIT. In this case, the UD simply marshals the wrapped object into the GIT and stores the GIT cookie in the marshal data. This allows the unmarshaler to retrieve the wrapped interface pointer from the GIT. This is the only table-marshaling context supported by the UD, which is a simple way of avoiding any weird lifetime issues that often crop up during table marshals.

Wrap-up

I've discussed the design and implementation of a lightweight interception framework that provides for pluggable interception policies known as hooks. The Universal Delegator can be used in client code as well as in server code, for many different purposes. I can't speak for anyone else, but I fell in love with COM all over again when I saw the cool plumbing I could build on top of an object model that has such a simple and elegant binary definition at its core.

If you would like to download a complete copy of the delegator, the sample hooks discussed here, as well as all the source code, the latest version can be found at http://www.develop.com/kbrown in my COM sample gallery.

For related information see: DCOM Technical Overview at http://msdn.microsoft.com/library/backgrnd/html/msdn_dcomtec.htm. Also check http://msdn.microsoft.com for daily updates on developer programs, resources, and events.

From the February 1999 issue of Microsoft Systems Journal.