

April 1996


OLE Q & A

Don Box

Don Box has been working in networking and distributed object systems since 1989. He is currently chronicling the COM lifestyle in book form for Addison Wesley, and gives seminars on OLE and COM across the globe. Don can be reached at dbox@braintrust.com.

Q: I have finally convinced my manager to abandon 16-bit Windows® as a target platform, but now he wants me to write a Windows® 95 logo-compliant, multithreaded application. I am having a difficult time getting a handle on how threads interact with OLE. What gives?

A: The first release of 32-bit OLE shipped with Windows NT™ 3.5 and did not allow multiple threads in a single process to use OLE. In this initial release, only one thread per process was allowed to call CoInitialize, upon which it became the only thread in its process allowed to make OLE API calls or method calls through interface pointers. This made it impossible to create multithreaded servers or clients.

With the release of Windows NT 3.51 and Windows 95, it is now possible to write multithreaded OLE servers and clients. In these releases of OLE, there are no fixed limitations on the number of threads per process that can make OLE API calls. Once a thread calls CoInitialize, it can make as many OLE API calls as it wants. There are some interesting limitations on the number of threads that can make method calls on an interface pointer, and this is where confusion often occurs. To better understand the threading model of OLE, it is useful to examine the techniques commonly used to write thread-safe objects in C++.

Figure 1 illustrates a simple helper class (MTObject) that can be used to make an object thread safe. It has a single mutex data member that acts as the lock for the object. Acquire, a nested helper class, allows exception-safe locking of the object for some fixed scope (the constructor acquires the lock using WaitForSingleObject, and the destructor releases the lock using ReleaseMutex). The correct usage of this class is as follows:

 class Accumulator : protected MTObject {
  double m_val;
public:
  double GetValue() const {
    Acquire lock(this);  // get the lock in ctor
    return m_val;
  }                      // release the lock in dtor

  void Add(double val) {
    Acquire lock(this);  // get the lock in ctor
    m_val += val;
  }                      // release the lock in dtor
};

Acquiring the lock at the beginning of each member function ensures that no more than one thread will be performing an operation on the object at any instant in time. Many threads may be blocked at the initial locking statement, but only one thread at a time will be manipulating the actual data members of the object.
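Figure 1 itself is not reproduced here, so as a point of reference, here is a minimal portable sketch of the same pattern. It substitutes std::mutex and std::lock_guard for the article's Win32 mutex (WaitForSingleObject/ReleaseMutex); the Counter class and run_counter_demo helper are illustrative names, not part of the article's code.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Portable sketch of the MTObject/Acquire pattern; the article's
// version wraps a Win32 mutex where this sketch uses std::mutex.
class MTObject {
    mutable std::mutex m_mutex;   // one lock per object instance
protected:
    // Constructor takes the lock, destructor releases it, so a stack
    // instance locks the object for the enclosing scope, even if an
    // exception is thrown.
    class Acquire {
        std::lock_guard<std::mutex> m_guard;
    public:
        explicit Acquire(const MTObject* p) : m_guard(p->m_mutex) {}
    };
};

// Same shape as the Accumulator above, reduced to an int.
class Counter : protected MTObject {
    int m_val = 0;
public:
    int  GetValue() const { Acquire lock(this); return m_val; }
    void Add(int v)       { Acquire lock(this); m_val += v; }
};

// Hammer one Counter from several threads; without the lock the
// increments would race and the total would be unreliable.
int run_counter_demo(int nthreads, int nadds) {
    Counter c;
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&] { for (int i = 0; i < nadds; ++i) c.Add(1); });
    for (auto& th : pool) th.join();
    return c.GetValue();
}
```

Note that m_mutex is declared mutable so that const member functions such as GetValue can still acquire the lock.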

The current threading model of OLE is inspired by the simple example shown previously. To allow multiple threads to access an object safely, OLE assumes that all interprocess and interthread access to the object will occur through proxy/stub pairs. As shown in Figure 2, any number of threads can have outstanding proxies on an object. However, the stub always belongs to the thread that initially marshals the first interface from the object. This thread is called the "owning apartment" of the object, as it is the thread where the object resides. The term "apartment" is used to indicate that a single piece of real estate (a process) can support multiple dwellings (threads). All method calls on a proxy result in messages being queued on the apartment that owns the corresponding stub. These messages are serviced sequentially as a byproduct of running a standard GetMessage/DispatchMessage loop, and the result is that only one physical thread is ever in direct contact with the object. This single-threaded access policy is logically equivalent to using the MTObject class shown earlier, and has the positive side effect of not requiring the object implementation to perform the serialization explicitly.

Figure 2 Apartment Model Threading
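The queue-and-dispatch discipline described above can be sketched portably: callers on any thread post closures (standing in for marshaled method calls), and a single owner thread drains the queue in order, much like a GetMessage/DispatchMessage loop. The Apartment class below is an illustrative analogue of the idea, not OLE's implementation.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Sketch of an apartment: one owner thread services queued "calls"
// sequentially, so only that thread ever touches state captured by
// the closures -- the moral equivalent of the stub dispatching
// messages from a GetMessage/DispatchMessage loop.
class Apartment {
    std::queue<std::function<void()>> m_calls;
    std::mutex m_lock;
    std::condition_variable m_wake;
    bool m_quit = false;
    std::thread m_owner;

    void Pump() {                         // the "message loop"
        for (;;) {
            std::unique_lock<std::mutex> hold(m_lock);
            m_wake.wait(hold, [&] { return m_quit || !m_calls.empty(); });
            if (m_calls.empty()) return;  // quit posted, queue drained
            auto call = std::move(m_calls.front());
            m_calls.pop();
            hold.unlock();
            call();                       // dispatch on the owner thread
        }
    }
public:
    Apartment() : m_owner([this] { Pump(); }) {}
    ~Apartment() {                        // the WM_QUIT analogue
        { std::lock_guard<std::mutex> hold(m_lock); m_quit = true; }
        m_wake.notify_one();
        m_owner.join();
    }
    void Post(std::function<void()> call) {   // the proxy side
        { std::lock_guard<std::mutex> hold(m_lock); m_calls.push(std::move(call)); }
        m_wake.notify_one();
    }
};
```

Because exactly one thread dispatches the queue, an object "living in" the apartment needs no locks of its own, which is precisely the benefit the apartment model delivers.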

Armed with the information presented so far, you could easily write a multithreaded OLE server that creates one apartment per COM class, allowing your server to take advantage of multiple CPUs or make blocking system calls without hanging the entire server. For each class, you simply create a thread that registers the class factory and runs a message pump to dispatch incoming calls for objects created within that apartment. Assuming the generic thread proc below,

 struct APT_INIT {
  REFCLSID rclsid;
  IClassFactory *pcf;
};
DWORD WINAPI AptProc(void* tparam) {
  APT_INIT *pai = ((APT_INIT*)tparam);
  DWORD dwReg;
  CoInitialize(0);
  CoRegisterClassObject(pai->rclsid, 
                        pai->pcf,
                        CLSCTX_LOCAL_SERVER, 
                        REGCLS_MULTIPLEUSE,
                        &dwReg);
  MSG msg;
  while (GetMessage(&msg, 0, 0, 0))
      DispatchMessage(&msg);
  CoRevokeClassObject(dwReg);
  CoUninitialize();
  return msg.wParam;
}

you could easily write a multithreaded server as follows:

 int WINAPI WinMain(HINSTANCE,HINSTANCE,LPSTR,int)
{
  CoClassFactory<CoRectangle> factory1;
  CoClassFactory<CoEllipse>   factory2;
  CoClassFactory<CoPolygon>   factory3;

  APT_INIT ai[3] = {{CLSID_Rectangle, &factory1},
                    {CLSID_Ellipse,   &factory2},
                    {CLSID_Polygon,   &factory3}};
  HANDLE h[3];
  DWORD dw;
  for (size_t i = 0; i < 3; i++)
    h[i] = CreateThread(0,0,AptProc,&ai[i],0,&dw);
  WaitForMultipleObjects(3, h, TRUE, INFINITE);
  return 0;
}

The resulting concurrency model of this server is shown in Figure 3.

Figure 3 Multithreaded Shape Server

The code shown above sidesteps one very sticky problem that all multithreaded servers need to deal with: server lifetime control. If you assume the normal ObjectDestroyed routine as it is implied in the COM specification,

 DWORD g_cLocks = 0;
void ObjectDestroyed() {
  if (--g_cLocks == 0)
    PostQuitMessage(0);
}

the last object to be destroyed will terminate only the currently executing thread. In the previous example, this means that if the last shape to be destroyed is a rectangle, only the rectangle apartment will be shut down, leaving the polygon and ellipse apartments running. Since LocalServers are started at the process level, there is no way to restart the rectangle apartment within the current process. To ensure the correct behavior, we need to track the total number of object instances in our server and shut down all apartments when no instances of any class are left to be serviced. One approach would be simply to call ExitProcess. This approach is extremely heavy-handed and doesn't guarantee that each thread can perform an orderly termination sequence (for example, executing destructors for objects defined at thread scope). Ideally, we would like to call PostQuitMessage for each apartment in our server. Calling PostQuitMessage on an arbitrary thread can be simulated with the PostThreadMessage API function.

 BOOL PostQuitMessageToThread(DWORD id, WPARAM wp){
  return PostThreadMessage(id, WM_QUIT, wp, 0);
}

To shut down a multi-apartment server correctly, you must maintain a list of thread IDs for all apartments and use that list to multicast the WM_QUIT message upon the final object destruction.
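As a portable sketch of that bookkeeping, the class below keeps an atomic instance count and a list of registered quit callbacks, one per apartment. In the real server each callback would be a PostThreadMessage(id, WM_QUIT, 0, 0) call and the atomic decrement would be InterlockedDecrement; ModuleLifetime and its members are illustrative names, not the article's code.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <vector>

// Sketch of multi-apartment lifetime control: objects bump the shared
// lock count in their constructors and drop it in their destructors;
// when the count hits zero, a quit notification is multicast to every
// registered apartment. The count is atomic because destructors run
// on different apartment threads -- a plain --g_cLocks would race.
class ModuleLifetime {
    std::atomic<long> m_cLocks{0};
    std::mutex m_lock;                          // guards the quit list
    std::vector<std::function<void()>> m_quitFns;
public:
    void RegisterApartment(std::function<void()> postQuit) {
        std::lock_guard<std::mutex> hold(m_lock);
        m_quitFns.push_back(std::move(postQuit));
    }
    void LockServer(bool lock) {
        if (lock) { ++m_cLocks; return; }
        if (--m_cLocks == 0) {                  // last instance gone:
            std::lock_guard<std::mutex> hold(m_lock);
            for (auto& fn : m_quitFns) fn();    // multicast "WM_QUIT"
        }
    }
};
```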

CoModuleLifetime is an implementation of this technique (see Figure 4). CoModuleLifetime is a C++ class that maintains the lock count for a server and shuts down a set of managed apartments. It is assumed that a single instance of this class will be used by all objects (which call its LockServer method in their constructors and destructors) and by all apartments (which register themselves to receive WM_QUIT messages upon server shutdown). To register itself as a managed apartment, our generic thread proc needs only slight modification.

 CoModuleLifetime g_module; // single instance
DWORD WINAPI AptProc(void* tparam) {
  APT_INIT *pai = ((APT_INIT*)tparam);
  DWORD dwApt, dwReg;
  CoInitialize(0);
  //register to receive WM_QUIT message
  g_module.RegisterManagedApartment(&dwApt);
  CoRegisterClassObject(pai->rclsid, pai->pcf,
                        CLSCTX_LOCAL_SERVER, 
                        REGCLS_MULTIPLEUSE, &dwReg);
  MSG msg;
  while (GetMessage(&msg, 0, 0, 0))
      DispatchMessage(&msg);
  g_module.RevokeManagedApartment(dwApt);
  CoRevokeClassObject(dwReg);
  CoUninitialize();
  return msg.wParam;
}

Assuming that the corresponding WinMain uses the class factory shown in Figure 5, we have a functioning multithreaded server. While a method call on a rectangle is blocked, calls to ellipses and polygons are also allowed to be serviced. However, the rectangle thread can service only one method call on a rectangle at a time.

This concurrency model is reasonable for scenarios where multiple independent classes need to exist in a single process. On a multiprocessor computer running Windows NT, it is conceivable that your server could utilize up to three processors at a time (one processor per apartment). Of course, this utilization assumes that the demand on your server is spread evenly across rectangle, ellipse, and polygon method calls. If this is not the case, you would probably prefer allocating more than one apartment to service concurrent method calls for a single COM class.

Under the current COM threading model, the thread that services method calls for an object is fixed at the initial marshaling (normally via the CreateInstance method on the class factory). To allow more than one apartment to own objects of a given class, you would ideally like to have several worker apartments register a class factory for the same CLSID and have the system select the appropriate worker thread (based on CPU utilization or some other scheduling mechanism). Unfortunately, you cannot do this, as only one call to CoRegisterClassObject will succeed. Instead, you must register a master class factory with the system once, and have its implementation of CreateInstance multiplex to one of the worker apartments that will create the instance and perform the initial marshaling.
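The selection step at the heart of such a multiplexing factory can be as simple as round-robin over an atomic counter. A sketch, with Worker standing in for a per-apartment class factory proxy (illustrative names, not the article's code):

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Sketch of round-robin selection over n worker class factories.
// Each call to Next() picks the next worker in turn; the atomic
// counter keeps the choice well-defined even when CreateInstance is
// forwarded from several client threads at once.
template <typename Worker>
class RoundRobin {
    std::vector<Worker*> m_workers;
    std::atomic<std::size_t> m_next{0};
public:
    explicit RoundRobin(std::vector<Worker*> workers)
        : m_workers(std::move(workers)) {}
    Worker* Next() {
        // fetch-and-increment, then wrap around the worker list
        return m_workers[m_next++ % m_workers.size()];
    }
};
```

In the multiplexing factory, CreateInstance would forward to Next()->CreateInstance(...), where each Worker pointer is actually a proxy whose stub lives in one of the worker apartments.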

The OLE SDK sample (OLEAPT) uses a somewhat arcane technique to signal the worker apartment to create and marshal the object. Here, I'm going to illustrate a more general technique that can be used outside the context of this article. Figure 6 illustrates a generic multiplexing class factory that uses round-robin scheduling to select one of n class factories to perform the CreateInstance call. The proxy/stub connections between the multiplexor and the actual class factories are required for two reasons. First, COM requires all interthread communication to take place through COM marshaling. Second, the initial marshaling must originate from the worker thread, not the multiplexing thread. The advantages of this technique are that the message pump of the worker apartments needs no special-case coding to accommodate interthread communication, and standard COM instantiation techniques are used.

Figure 6 Multiplexing Class Factory

It may not be obvious from looking at Figure 6 how you pass an interface pointer across thread boundaries to create the proxy/stub connection. To understand how this is done, it is useful to review how COM marshaling takes place: when an interface pointer is to be passed across a marshaling boundary, the remoting code for the sender uses the CoMarshalInterface API function to marshal the interface into the marshaling packet via the packet's IStream interface.

 HRESULT CoMarshalInterface(IStream *pStmPacket,
                           REFIID riid, 
                           IUnknown *pUnkObject,
                           DWORD dwDestContext,
                           void *pvDestContext,
                           DWORD mshlflags);

The dwDestContext parameter describes the type of boundary that the interface is being marshaled across, and currently must be one of the four values shown in Figure 7. Using CoMarshalInterface and CoUnmarshalInterface, you can pass interfaces across thread boundaries.

 HGLOBAL MarshalFromServerThread(IUnknown *punk) {
  // allocate memory for the packet (must be moveable
  // so the stream can grow it as needed)
  HGLOBAL result = GlobalAlloc(GMEM_MOVEABLE, 0);
  IStream *pstrm;
  // create an IStream interface on the packet memory
  CreateStreamOnHGlobal(result, FALSE, &pstrm);
  // write the marshaling info into the packet
  CoMarshalInterface(pstrm, IID_IUnknown, punk,
                     MSHCTX_INPROC, 0, MSHLFLAGS_NORMAL);
  // the stream may have reallocated the packet memory
  GetHGlobalFromStream(pstrm, &result);
  pstrm->Release();
  return result;
}

IUnknown* UnmarshalIntoClientThread(HGLOBAL hg) {
  IUnknown *punkResult;
  IStream *pstrm;
  // create an IStream interface on the packet memory
  CreateStreamOnHGlobal(hg, TRUE, &pstrm);
  // read the marshaling info from the packet and
  // create the handler/proxy to the object
  CoUnmarshalInterface(pstrm, IID_IUnknown,
                       (void**)&punkResult);
  pstrm->Release();
  return punkResult;
}

To automate the task of interthread marshaling, COM provides two helper functions that perform the operations just shown.

 HRESULT 
CoMarshalInterThreadInterfaceInStream(REFIID riid,
                                      IUnknown* punk,
                                      IStream** ppstm);
HRESULT 
CoGetInterfaceAndReleaseStream(IStream *pstm,REFIID riid,
                               void** ppvObj);
 

The multiplexing class factory shown in Figure 8 uses these API functions to create the proxy/stub connection between the class factories in the worker threads (which call CoMarshalInterThreadInterfaceInStream) and the class factory in the multiplexing apartment (which calls CoGetInterfaceAndReleaseStream). Once these connections are established, the forwarding of CreateInstance from the multiplexor crosses the thread boundary to one of the workers, causing the initial marshal to originate in the worker thread, not the multiplexing thread.

Given the nonmultiplexing and multiplexing class factories shown in Figures 5 and 8, you can build a multithreaded server that creates a pool of worker apartments and spreads the object instantiations across each worker thread. Figure 9 shows a COM class that performs a blocking operation in its Sleep method, and Figure 10 shows the implementation of the worker and multiplexing apartments that are used to implement the server.

To test the server, one simply needs to instantiate a small number of objects and invoke the Sleep method on each object. Note that when an object is blocked in its Sleep method, objects that exist in other apartments can execute freely, but objects in the same apartment are blocked. Figures 11 and 12 show a Visual Basic® client program that instantiates an object and makes a blocking call.

Figure 12 SleepClient in action.

Figure 12a This client puts apartment 1 to sleep for one minute

Figure 12b This client will not block (apartment 2 is free)

Figure 12c This client will not block (apartment 3 is free)

Figure 12d This client will block (apartment 1 is busy)

Is this the perfect scenario for a concurrent server? Probably not. As is often the case, compromises are made to balance efficiency, ease of programming, and functionality. Apartment-model threading favors ease of programming in the general case (as thread safety is not required at the object instance level), but makes it difficult to implement certain concurrent designs. While OLE provides hooks to allow free-threaded access to an object within the same address space, all interprocess method calls are sent to the originating apartment. One could extrapolate from the design of the multiplexing class factory to have a new apartment spawned for each new instance, but this would result in a larger than necessary number of threads and still not allow multiple clients to access a single object concurrently.

Future versions of OLE are likely to support a free-threading model similar to that used by Microsoft RPC, where a pool of anonymous worker threads is dispatched on a method-by-method basis with no regard to thread/object relationships. While this approach would yield the greatest flexibility, it also places a greater burden on the object implementor to guarantee the thread safety of the server.

Have a question about programming in OLE? You can mail it directly to Q&A, Microsoft Systems Journal, 825 Eighth Avenue, 18th Floor, New York, New York 10019, or send it to MSJ (re: OLE Q&A) via:

Internet:

Don Box
dbox@braintrust.com

Eric Maffei
ericm@microsoft.com

From the April 1996 issue of Microsoft Systems Journal.