Introducing Distributed COM and the New OLE Features in Windows NT™ 4.0--MSJ, May, 1996

Don Box

Don Box is a co-founder of DevelopMentor where he manages the COM curriculum. Don is currently breathing deep sighs of relief as his new book, Essential COM (Addison-Wesley), is finally complete. Don can be reached at http://www.develop.com/dbox/default.asp.

The wait is over. The most highly anticipated release of OLE is now in beta, and will be released upon all of humanity within a few months of your reading this. I'm not talking about Cairo; I'm talking about Windows NT 4.0.
      The most exciting OLE-related feature in Windows NT 4.0 will likely be Distributed COM (DCOM). This means that in this version of Windows NT you can instantiate and bind to objects across the network. Windows NT 4.0 introduces many other features that represent a maturation of the API, most of which have nothing to do with sending packets on a network. The key highlights of the Windows NT 4.0 release of OLE include:
MIDL 3.0
Enhanced threading model
Security
Support for Windows NT services
Remote instantiation
System support for persistent property sets
      Many of these features will creep back into Windows 95 in the future, of course. This article is based on a beta prerelease version of DCOM. Any and all features are subject to change.
Ground Zero: MIDL
      Real COM programmers start work in IDL. IDL is a language that has its origins in OSF DCE and was originally used to define functional RPC interfaces in DCE and Microsoft RPC. In Windows NT 3.5, the Microsoft Interface Definition Language (MIDL) compiler was extended to support COM interfaces as well. For both RPC and COM interfaces, the MIDL-generated code marshals the parameters passed onto a call stack into packets that can be transmitted across process or host boundaries so the function can execute remotely. With the newest version of MIDL, these same interface descriptions can be used to generate type libraries, which are binary descriptions of COM interfaces and implementations often used in OLE Automation. This merging of functionality makes IDL the best language for defining all things COM.
      MIDL allows developers to describe the OLE interfaces used in a project in a platform- and language-independent manner. These descriptions unambiguously indicate the direction in which parameter values must be passed (caller to function, function to caller) and the run-time dimension of arrays. Based on these descriptions, the MIDL compiler can generate C and C++ language definitions for the interface as well as the proxy and stub implementations used in standard marshaling. This functionality has been available since MIDL 2.0 in Windows NT 3.5, although the first version of Visual C++ to ship with the MIDL compiler was Visual C++® 4.0 (see the sidebar Using MIDL with Visual C++ for details).
      At its core, IDL is an attribute-extended subset of C++ that supports standard C constructs like enum, struct, union, and typedef. It also supports the COM notion of an interface, which is defined as a family of logically related operations or methods. By defining your interfaces in IDL, you are free to generate language bindings to access your objects from whatever language happens to be in fashion (Object Cobol, Perl++, Latte, or whatever). Using IDL as your initial starting point instead of C++ seems strange at first, but the long-term benefits far outweigh any initial discomfort.
      In IDL, anything can have attributes associated with its definition. These attributes precede the actual C-style definition and tell the IDL compiler additional information about what's being defined.

[ attr1, attr2 ] long var1;

The attributes attr1 and attr2 apply to the variable var1. Except for the extended attributes, writing IDL is much like writing a standard C or C++ header file. IDL files typically consist of one or more interface definitions. Each interface definition contains the structure and enumeration statements used by the interface and definitions of each method exported by the interface. Figure 1 shows an IDL file that defines a simple COM interface named IPager. Note that, in addition to describing the vtable signature of the interface used for in-process implementations, the IDL file also describes the RPC messages—or Protocol Data Units (PDUs), as they're typically referred to by network gurus. PDUs are used to remote the method calls to objects in different address spaces (which may be on different host machines). Each method in an interface defines two messages or PDUs. These messages are sent back and forth between the MIDL-generated proxy and stub when the target object exists in a different thread, process, or host machine. The Request PDU is sent by the client-side proxy to the stub to invoke the object's method, and it contains the values of any parameters with the [in] attribute. The Response PDU is sent as a reply from the stub back to the proxy to indicate that the method call executed. The Response PDU contains the values of any parameters with the [out] attribute. Based on the method definitions in the IDL file, the MIDL compiler can generate proxies and stubs that translate between the stack frame and the Request and Response PDUs correctly. I don't have enough space to explain all the subtleties of IDL here, but Figure 2 describes some of the common IDL attributes used in COM.
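The Request/Response split can be sketched in portable C++. This is only an illustration of the idea, not what MIDL actually emits: the Add method, the Adder class, and the flat byte-buffer "PDU" are all made up for the example, and the real wire format is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical method, which would be written in IDL as:
//   HRESULT Add([in] long a, [in] long b, [out] long *sum);

using Pdu = std::vector<std::uint8_t>;  // a PDU modeled as a flat byte buffer

// Append a 32-bit value to a PDU (host byte order, for simplicity).
inline void put32(Pdu& pdu, std::int32_t v) {
    std::uint8_t raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    pdu.insert(pdu.end(), raw, raw + sizeof v);
}

// Read a 32-bit value from a PDU at the given byte offset.
inline std::int32_t get32(const Pdu& pdu, std::size_t offset) {
    std::int32_t v = 0;
    std::memcpy(&v, pdu.data() + offset, sizeof v);
    return v;
}

// The object that lives on the far side of the channel.
struct Adder {
    std::int32_t Add(std::int32_t a, std::int32_t b, std::int32_t* sum) {
        *sum = a + b;
        return 0;  // stand-in for S_OK
    }
};

// Stub: unpacks the [in] values from the Request PDU, calls the object,
// then packs the return code and the [out] value into the Response PDU.
inline Pdu stub_invoke(Adder& object, const Pdu& request) {
    std::int32_t sum = 0;
    const std::int32_t a = get32(request, 0);
    const std::int32_t b = get32(request, 4);
    const std::int32_t hr = object.Add(a, b, &sum);
    Pdu response;
    put32(response, hr);
    put32(response, sum);
    return response;
}

// Proxy: same signature as the object, but it only moves bytes around.
struct AdderProxy {
    Adder& target;  // stands in for the channel that reaches the stub

    std::int32_t Add(std::int32_t a, std::int32_t b, std::int32_t* sum) {
        Pdu request;            // Request PDU carries only the [in] values
        put32(request, a);
        put32(request, b);
        const Pdu response = stub_invoke(target, request);
        *sum = get32(response, 4);   // Response PDU carries the [out] value
        return get32(response, 0);
    }
};
```

A caller holding an AdderProxy writes `proxy.Add(2, 3, &sum)` exactly as it would against the object itself, which is the whole point of standard marshaling.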
      The new MIDL 3.0 compiler bundled with the Win32® SDK incorporates the functionality of MKTYPLIB, the ODL compiler used to generate type libraries (binary descriptions of interfaces and implementations used by development environments such as Visual Basic® and Visual C++). By extending IDL to support ODL keywords and constructs, MIDL 3.0 eliminates the need to use different languages for proxy/stub implementations and type libraries. This also means that many of the extremely useful features of IDL—like inserting code into the generated header files and producing const-correct parameter lists—are now available when defining type libraries for OLE Automation.
      To generate a type library with MIDL, your IDL file must contain a type library definition. Unlike ODL, IDL files can contain definitions outside the scope of a library definition. (In ODL, the library statement must be the topmost definition in the file.) Like the previous version, MIDL 3.0 will generate C/C++ bindings and proxy/stub implementations for all of the interface definitions in the file (proxy/stub implementations can be suppressed by using the local attribute). The generated type library will contain only type descriptions for interfaces that are defined or referred to within the scope of a library definition. For the IDL file shown in Figure 3, the generated type library contains only descriptions of IInside (since it was defined inside the scope of the library statement) and IOutside (since there is a reference to IOutside by a statement inside the library). The library would not contain any references to IOutside2, so it is not present in the generated type library.
      MIDL is the future, and MKTYPLIB is the past. You will be compiling your type libraries with MIDL very soon. The MIDL 3.0 compiler can compile most existing ODL files with little or no modification, but there are several incompatibilities related to ODL and MKTYPLIB. The ODL boolean data type differs from the IDL boolean data type. ODL treats boolean and BOOL as VARIANT_BOOL, while IDL treats boolean as an unsigned char and BOOL as a long. If you are looking for Visual Basic compatibility (Automation-compliant Booleans), IDL supports the VARIANT_BOOL type directly.
      The scope of typedef names and structure/union/enum tags is different. Given the statements

typedef enum tagWIDTH { WIDE, NARROW } WIDTH;
typedef enum { TALL, NOTTALL } HEIGHT;
enum COLOR { RED = 0, GREEN, BLUE };

the MIDL-generated type library differs from the MKTYPLIB-generated library. (In fact, MKTYPLIB will not even compile the third statement.) The first statement is legal IDL and ODL, but MKTYPLIB ignores the enum tag name (tagWIDTH) and generates a TKIND_ENUM entry in the type library based on the typedef name WIDTH. Like C++, MIDL assumes the tag is the actual type name and generates a TKIND_ENUM entry for tagWIDTH and a TKIND_ALIAS entry for the name WIDTH. If no tag is present (as is the case in the second statement), MIDL creates a unique name for its TKIND_ENUM entry and generates a TKIND_ALIAS entry that refers to it (in this case, HEIGHT). Since MKTYPLIB ignores tags, it generates a single TKIND_ENUM entry for HEIGHT. The third statement shows the IDL-style definition, which is similar to C++. The generated type library would contain a single TKIND_ENUM entry for COLOR, which is the desired result. Unfortunately, MKTYPLIB does not recognize this syntax, which presents a problem if you need to maintain backwards compatibility with older development environments.
      ODL-generated header files use the DEFINE_GUID macro to declare and define the GUIDs used in the ODL file. IDL-generated files simply declare the GUIDs as extern in the header file, and define them separately in a generated C file (xxx_i.c). In addition, ODL supported both integral and floating point constants. IDL supports integral constants only.
      The scope of an enumeration name was local to the enum under ODL. Like C and C++, enumeration names have global scope in IDL. The following is legal ODL:

typedef struct { short s; } RED;
typedef enum { RED, GREEN } COLOR;

IDL prohibits reuse of the typedef name as an enumeration name.
      To support legacy ODL files, MIDL can compile in MKTYPLIB compatibility mode. (It's enabled by the command line switch /mktyplib203.) When using this mode, your file must comply with the old ODL syntax and proxy/stub implementations are not supported. Fortunately, the C preprocessor is supported by both MIDL and MKTYPLIB, so differences between the two can be addressed using conditional compilation.
Threading and More Threading
      From day one, OLE has been based on two assumptions. First, objects that live in separate address spaces with separate threads of control may need to collaborate to accomplish some task. This implies that most programs are neither pure clients nor pure servers, but a hybrid of both. They are considered clients when accessing someone else's objects. They are considered servers when they are servicing method calls on behalf of other programs. Second, most COM interface methods are synchronous and have blocking semantics; the client program must wait for the method to complete before returning from the proxy. Since the client may also be a server—specifically, a server that is needed by the method originally invoked when acting as a client—some technique for allowing reentrancy into the client must be supported to prevent deadlock.
      These two assumptions were as true in 1993 as they are today. However, Windows NT 4.0 provides a new threading model for addressing reentrancy and concurrency. It also has an improved implementation of the original threading model that should yield higher performance than Windows NT 3.51 in single-machine scenarios.
      The original OLE threading model is called the Apartment model and is still the default model in Windows NT 4.0. The motivation for the Apartment model is that most objects are not thread safe. Objects created by Apartment-model threads are guaranteed to have method calls dispatched only at well-defined control points (API calls where incoming messages can be dispatched), and only by the thread that originally created the object. To achieve this, the remoting infrastructure dispatches method calls by posting a window message to the message queue of the thread that created the object. Since most single-threaded Windows applications are already servicing the queue regularly, this results in a sane interaction between incoming method requests and user input.
      Under the Apartment model, each thread that calls CoInitialize or OleInitialize is an OLE Apartment (a place where objects reside). Both API calls result in a call to CreateWindowEx. This creates an invisible HWND that receives a private window message for each incoming method request. The OLE-supplied WndProc for this window looks up the stub—based on the contents of the request—and calls the stub's Invoke routine. The first time an interface pointer on an object is marshaled, its owning apartment is established based on the thread executing the call to CoMarshalInterface. (For objects created via CoCreateInstance, this is determined by the thread that made the initial call to CoRegisterClassObject.) From that point forward, all proxies to the object route their request PDUs to the message queue of the object's apartment. These messages are normally serviced in the main message pump of the server application, but some technique for allowing incoming calls to be serviced is needed to avoid deadlock if the server application is also a client and is making a synchronous method call.
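The queue-then-dispatch idea can be modeled in a few lines of portable C++. This is a toy, single-threaded sketch under my own assumptions: the Apartment class and its post and pump methods are invented for illustration, with post standing in for the private window message and pump standing in for the message loop. No real HWND or RPC is involved.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for an apartment: incoming calls are queued, never run in place.
class Apartment {
public:
    // Plays the role of posting the private window message for a request.
    void post(std::function<void()> call) { pending_.push(std::move(call)); }

    // Plays the role of the message pump: dispatches queued calls in arrival
    // order and returns how many were dispatched. Calls posted while pumping
    // (nested requests) are dispatched by the same pump.
    int pump() {
        int dispatched = 0;
        while (!pending_.empty()) {
            std::function<void()> call = std::move(pending_.front());
            pending_.pop();
            call();
            ++dispatched;
        }
        return dispatched;
    }

private:
    std::queue<std::function<void()>> pending_;
};
```

The property to notice is that nothing posted to the apartment executes until the owner decides to pump, which is what lets an object that is not thread safe survive callers on other threads.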
      All versions of 32-bit OLE use the RPC run-time layer for sending and receiving PDUs. The RPC routine for sending the request PDU and receiving the response PDU is a blocking system call, which means that the physical thread making the call is blocked until the response is available. If the client thread sends the request directly using the RPC run time, it cannot service incoming calls. This can potentially cause a deadlock.
      As shown in Figure 4, the Apartment model uses additional threads to achieve the reentrancy requirements for OLE. In the Apartment model, when the client thread makes a method call on an object in a different thread, process, or host, the client is actually making a method call on a proxy. The proxy marshals the appropriate parameters into the request PDU and dispatches the call to the object by calling the channel's SendReceive method, which does not return until the response PDU is received from the object. For those of you new to standard marshaling, a channel is just a COM-based wrapper around the RPC run-time layer. Once inside the implementation of the channel, the client thread does not call the low-level RPC run-time routine directly. Instead, the client thread spawns a worker thread to perform the blocking call and then waits in a modal message loop so external method requests can be serviced as well. (An application can install a message filter inside this modal loop to allow or disallow incoming method requests and non-OLE window messages. See CoRegisterMessageFilter for details.) This modal loop continues until the worker thread notifies the client thread that the response PDU was returned by the RPC run time. When the modal loop exits, the channel's SendReceive method returns the response PDU to the proxy for unmarshaling to the client. Because all of this happens behind the channel's IRpcChannelBuffer interface, the client and proxy are blissfully unaware that anything other than a blocking call took place. To avoid excessive thread creation, COM maintains a pool of worker threads that perform blocking RPC calls. The number of threads in the pool grows and shrinks based on the number of outstanding calls.
      There is additional thread-switching that has to take place on the object side of the channel. When COM first initializes in a process, the RPC run time is notified that incoming requests will be arriving. This causes the RPC layer to create its own pool of server threads that listen for incoming connection requests and receive and dispatch request PDUs. Since these threads are not the owning apartment of any object, they must forward the incoming request PDU to the correct apartment by posting a private window message to the apartment's invisible HWND and waiting for a response. When the object's apartment retrieves the message from the queue, the call is sent to the stub as a result of the HWND's WndProc being called in DispatchMessage. When the stub returns from the method call on the object, WndProc signals the RPC worker thread that the response PDU is available. The RPC thread then sends this PDU back to the client and returns to the server-side thread pool.
       Figure 4 shows the Apartment model as implemented in Windows NT 3.51, which is also the way it works in Windows NT 4.0 when the RPC connection is between two different machines. To improve performance in Windows NT 4.0, if the RPC connection is between two threads on the same machine, it uses an optimized RPC transport that requires considerably less thread switching. As shown in Figure 5, this optimized transport requires no additional threads to carry out the call since the RPC transport layer posts the window message containing the request PDU directly to the owning apartment. No intermediate thread is used on the client side of the connection. To ensure that the client can support incoming calls and non-OLE window messages, OLE supplies a callback function to the RPC transport that gets incoming window messages from the queue and dispatches them with the help of the user-supplied message filter. Since PostMessage does not work across host boundaries, this transport works in the local case only. When the object is on a remote host, it uses the original Windows NT 3.51 threading model.
      The greatest feature of the Apartment model is that each object is "touched" by exactly one thread. This is also its greatest limitation. The added complexity and thread switching present in Apartment dispatching slows down the performance of interthread method calls, making it less than optimal for high-bandwidth communications. Additionally, there are classes of objects that have concurrency requirements that conflict with protections afforded by the Apartment model. For example, a directory or database application may have a single object that services requests from many clients simultaneously. Under the Apartment model, this single object would service each incoming method call sequentially. As is the case with normal Windows messages, this severely limits the amount of time any method call could spend without returning or servicing the message queue so that other methods could begin work. While workable in theory, implementing a multi-user server application by using PeekMessage loops for concurrency would send the most dedicated OLE programmer into early retirement. Also, the Apartment model does not take advantage of multiple processor machines (unless you have multiple objects on multiple threads).
      To address the limitations of the Apartment model, Windows NT 4.0 introduces a new threading model to OLE: freethreading. Objects created by freethreaded servers have no owning apartment. Freethreaded objects cannot make any assumptions about which thread will invoke their methods. Freethreaded objects cannot make any assumptions about how many threads will be invoking their methods simultaneously. Freethreaded objects can assume that anything can happen at any time, with absolutely no regard to message queues. In short, the responsibility for guaranteeing thread-safety is pushed up from the OLE layer into the object's domain. If your objects and all resources they access are thread-safe, then this may be the threading model for you. If you are not prepared for this level of responsibility, then the Apartment model is always available and is a completely reasonable choice.
      To enable the freethreading model, Windows NT 4.0 introduces a new API function, CoInitializeEx.

typedef enum tagCOINIT {
    COINIT_MULTITHREADED     = 0x0,
    COINIT_APARTMENTTHREADED = 0x2
    // ...additional flags deleted for clarity
} COINIT;

HRESULT CoInitializeEx(void *rsv, DWORD dwCoInit);

Under the Apartment model, each thread that will use OLE must call CoInitializeEx or CoInitialize (which is now shorthand for calling CoInitializeEx with COINIT_APARTMENTTHREADED as the second parameter). If the process is to be freethreaded, only one thread needs to call CoInitializeEx with the COINIT_MULTITHREADED flag. In freethreaded processes, objects created by any thread are freethreaded and method calls are not serialized in any way. As we go to press, the threading model is an attribute of a process, which means all threads within a single process must share the same threading model (freethreading or Apartment). Calls to CoInitializeEx with a model other than that used by the first thread will fail with an HRESULT value of RPC_E_CHANGED_MODE. Support for mixed-model processes is planned for a future OLE release.
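The one-model-per-process rule is easy to state as code. The following is a portable model of the rule as described for this prerelease, not a call into the real API: ProcessComState, Model, and InitResult are names I made up, and ChangedMode merely stands in for RPC_E_CHANGED_MODE.

```cpp
#include <cassert>

enum class Model { Apartment, Free };

// ChangedMode stands in for the RPC_E_CHANGED_MODE failure.
enum class InitResult { Ok, ChangedMode };

// Models the process-wide threading model: the first initialization picks
// the model, and every later initialization must ask for the same one.
class ProcessComState {
public:
    InitResult initialize(Model requested) {
        if (!chosen_) {
            chosen_ = true;
            model_ = requested;
            return InitResult::Ok;
        }
        return requested == model_ ? InitResult::Ok : InitResult::ChangedMode;
    }

private:
    bool chosen_ = false;
    Model model_ = Model::Apartment;  // meaningful only once chosen_ is true
};
```

In a real program the same check happens inside CoInitializeEx, so the practical advice is to decide on a model before the first thread initializes OLE.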
      Freethreaded OLE is actually a simplification of the Apartment model. As Figure 6 illustrates, when an incoming method request arrives at the RPC layer, the RPC receive thread that receives the request calls into the stub's Invoke method directly. There's no thread switch and no message queue. Since the object was marshaled as a freethreaded object originally, no guarantees are made about which thread would invoke the method. On the client side, when a freethreaded program invokes a method on a proxy, no additional thread is needed to make the blocking RPC call. The client thread can block since there were no guarantees on which thread would service incoming method requests for existing objects created by the client. If incoming calls arrive while the client is blocked, the RPC receive thread that receives the PDU calls into the stub directly. This implies that nested calls (or callbacks) are always executed by different threads and that any thread local storage used by the originating client thread is not available in the thread of the nested call.
      In the current release of Windows NT 4.0, the client's threading model does not have to match the threading model of local or remote servers accessed by the client. This makes sense since the client and server control their own concurrency. In the case of Inproc servers and handlers, the DLL must be loaded into the client's world and be capable of correct operation in the client's threading model. The registry supports the named value "ThreadingModel" which must be present at the InprocServer32 or InprocHandler32 key for each CLSID exported by the DLL, so inproc implementations can advertise their level of concurrency support. The supported values of ThreadingModel are Apartment (client must be an apartment model process), Free (client must be a freethreaded process), and Both (client can be either apartment or freethreaded).
      Needless to say, there is also support for single-threaded DLLs. Absence of the ThreadingModel key implies a single-threaded DLL. Single-threaded DLLs can be used by only one thread in an apartment process. DLLs written before the Apartment model (that have no ThreadingModel key) are not guaranteed to be thread safe.
      In the prerelease version of Windows NT 4.0 available as I write this, there is no support for loading DLLs with threading models that are incompatible with the client's threading model. Instead, the call to CoGetClassObject or CoCreateInstance simply fails. The following is an example of a REG file for a free and Apartment-safe Inproc server:

REGEDIT4

[HKEY_CLASSES_ROOT\CLSID\{00000000-E3F0-101B-8488-B0BB0BB0BB0B}\InProcServer32]
@="C:\\temp\\Bob.dll"
"ThreadingModel"="Both"

Of course, modern implementations would not use REG files, but would call RegSetValueEx from within the DLL's DllRegisterServer function.
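The compatibility rules for inproc DLLs boil down to a small decision function. Here is a sketch in portable C++ under a few assumptions of mine: the function name can_load_inproc and the ClientModel enum are hypothetical, a null pointer represents a missing ThreadingModel value, and unrecognized values are simply treated as incompatible. The further restriction that a single-threaded DLL may be used by only one thread of the process is not modeled.

```cpp
#include <cassert>
#include <string>

enum class ClientModel { Apartment, Free };

// threadingModel is the registry value for the CLSID, or nullptr when the
// value is absent (which marks a single-threaded DLL).
inline bool can_load_inproc(ClientModel client, const char* threadingModel) {
    if (threadingModel == nullptr) {
        // Single-threaded DLL: usable only from an apartment process.
        return client == ClientModel::Apartment;
    }
    const std::string value(threadingModel);
    if (value == "Both") return true;
    if (value == "Apartment") return client == ClientModel::Apartment;
    if (value == "Free") return client == ClientModel::Free;
    return false;  // assumption: unknown values are rejected in this sketch
}
```

When this function would return false, the prerelease behavior described above is that CoGetClassObject or CoCreateInstance fails rather than loading the DLL.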
Security
      The implementation of OLE in Windows NT 3.51 has very little support for security. Local servers running under Windows NT 3.51 use the same permissions (access token) as the interactive user. Also, any CLSID that's visible in the registry can be instantiated irrespective of the current login. This makes it difficult to write an object implementation that performs privileged operations (like reading sensitive files and manipulating system configurations) without compromising security.
      In Windows NT 4.0, the Windows NT security model has been incorporated into OLE. Additionally, OLE supplies hooks for external security providers to allow a single interface for controlling object security. With the default security provider in Windows NT 4.0, you can control the following security attributes simply by properly configuring the registry:
Allow/disallow launching of servers of a given CLSID based on the client's access token.
Allow/disallow connections to running objects or class objects of a given CLSID based on the client's access token.
Configure a particular CLSID to always run as a specific user.
      These attributes are best understood by examining what happens when a client calls CoCreateInstance. Assume that a client attempts to call CoCreateInstance with CLSID_Pager as the implementation name. This results in a call to the Service Control Manager (SCM), which must first verify that the client has the proper access rights to the server for this class. OLE's default security model relies on the fact that the Registry is secure. Like NTFS files, keys in the registry can have access control lists (ACLs) that limit who can read or modify the contents of a key. To verify that the client has launch rights on the class, the SCM impersonates the client by assuming the client's access token and trying to call RegQueryValue on the key.

HKEY_CLASSES_ROOT\CLSID\{CLSID_Pager}\LaunchPermission

If the call succeeds and the value is "Y" or "y," the client can instantiate the object. If the call fails due to a security violation or if the key has any other value than "Y" or "y," the SCM assumes that the caller does not have permission to launch the server and CoCreateInstance fails. If the RegQueryValue fails because the LaunchPermission key is not present, then this class has no per-class launch permissions and the default launch permission for this machine must be consulted.
      If a CLSID does not have a LaunchPermission subkey, the machine-wide DefaultLaunchPermission key should be used to verify the client's credentials.

HKEY_LOCAL_MACHINE\Software\Microsoft\OLE\DefaultLaunchPermission

The SCM will try to read this key while running with the client's access token. If the call to RegQueryValue succeeds and the value is "Y" or "y," then the SCM assumes that the client can instantiate the object. If the call fails because of a security violation, or it returns any other value, then the SCM assumes that the client does not have permission to launch the server and the SCM fails the call to CoCreateInstance. If this key is missing altogether, the SCM assumes the client does not have permission to launch and CoCreateInstance fails.
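The two-level launch check can be summarized as a small decision function. This is a portable sketch of the logic as I have described it, not the SCM's actual code: KeyRead, grants, and may_launch are names invented for the example, and each KeyRead represents the outcome of a RegQueryValue attempt made while impersonating the client.

```cpp
#include <cassert>
#include <string>

// Outcome of reading a permission key with the client's access token.
struct KeyRead {
    enum Status { Missing, AccessDenied, Found } status;
    std::string value;  // meaningful only when status == Found
};

// A key grants permission only if it could be read and holds "Y" or "y".
inline bool grants(const KeyRead& key) {
    return key.status == KeyRead::Found &&
           (key.value == "Y" || key.value == "y");
}

// perClass is the CLSID's LaunchPermission key; machineDefault is the
// machine-wide DefaultLaunchPermission key.
inline bool may_launch(const KeyRead& perClass, const KeyRead& machineDefault) {
    if (perClass.status != KeyRead::Missing) {
        // The class has its own setting: it alone decides.
        return grants(perClass);
    }
    // No per-class key: fall back to the machine default. A missing or
    // unreadable default denies the launch.
    return grants(machineDefault);
}
```

Note that a per-class key the client cannot read denies the launch outright; the machine default is consulted only when the per-class key does not exist at all.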
      Once the SCM verifies that the client has permission to launch, it must create the server process if it is not already running. By default, it starts the server using the same login and permissions as the client. This default behavior is reasonable because it does not require any explicit participation from either the client or object to achieve the correct, secure behavior. Upon creation, the window station and desktop of the server process are the same as the client. Windows NT supports multiple window stations and desktops. This prevents system-controlled processes from accessing the user-interface of the interactive login and decouples the lifetime of a process from the current login session. Normal OLE client applications that are started interactively by the user run under the interactive window station, and any servers that are started can create windows, access the clipboard, and behave as if they were also started interactively. In short, Word and Microsoft® Excel work fine when launched from the Windows® Explorer.
      You may find it useful to run a server as a particular user account, to either enhance or restrict the operations that the object can perform. OLE now supports the RunAs subkey, so the object implementation can run the server as a given user.

HKEY_CLASSES_ROOT\CLSID\{CLSID_Pager}\RunAs = "Domain\UserName"

      When this key is present, the default value contains a user name (it can be qualified with a domain name). The correct password must also be present in a separate, private area of the registry that cannot be manipulated by using RegEdt32. (At this time, these keys must be added using the olecnfg tool, a simple registry manipulation tool that ships with the prerelease version of DCOM. API support should be available in a subsequent release.) If the RunAs key is present, the SCM creates the server process to run as the specified user. Note that the server runs in a different desktop and window station from the current interactive user. With separate desktops, when the current interactive user logs off, the server is not forced to shut down. The downside is that you cannot display any windows or collect user input without considerable effort. You can get UI-intensive objects to interact with the user by configuring the object's CLSID to run using the access permissions, desktop, and window station of the current interactive user.
      While it is useful to specify which user account the server should use, you can also impersonate the client's access rights for the duration of a method call dynamically. To support this, COM provides the following two API functions:

HRESULT CoImpersonateClient();
HRESULT CoRevertToSelf();

You can call these functions only while a thread is servicing a method on behalf of a client. Like RpcImpersonateClient, ImpersonateNamedPipeClient, and DdeImpersonateClient, they toggle the access token between the native access token and the client's access token based on a connection to a process and thread. To see how this functionality is useful, consider a COM class that lets clients make file system calls and logs the results in a secure logfile. To let objects write to the secure logfile, you must configure the CLSID to run as a privileged user. However, when actually making file system calls on behalf of the client, the object should not allow operations to succeed unless the client could perform the operation with native API calls directly. Figure 7 shows how a method on such an object would be implemented.
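Because every successful impersonation must be paired with a revert, a scope guard is a natural fit. The class below is my own portable sketch and is not part of COM: ImpersonationScope is a hypothetical helper that takes the two operations as callables, so it compiles anywhere. In a real server method you would presumably pass CoImpersonateClient and CoRevertToSelf. It treats a non-negative return value as success, following the HRESULT convention.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs `begin` on construction and, if it succeeded, `end` on destruction,
// so the revert cannot be skipped by an early return or an exception.
class ImpersonationScope {
public:
    ImpersonationScope(const std::function<long()>& begin,
                       std::function<long()> end)
        : end_(std::move(end)), active_(begin() >= 0) {}

    ~ImpersonationScope() {
        if (active_) end_();
    }

    ImpersonationScope(const ImpersonationScope&) = delete;
    ImpersonationScope& operator=(const ImpersonationScope&) = delete;

    // True when impersonation took effect and will be reverted on exit.
    bool active() const { return active_; }

private:
    std::function<long()> end_;
    bool active_;
};
```

Inside a method, the file system work done on the client's behalf goes in a block that owns an ImpersonationScope, while the write to the secure logfile stays outside that block so it runs with the server's own token.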
      While the security functionality described here is sufficient for most OLE clients and objects, you can control access to your objects with much more detail. OLE supports interfaces and API functions for controlling message integrity, encryption, and other security attributes. Consult the Win32 SDK documentation for more details.
Windows NT Services
      Windows NT Services are normal Win32 processes that run independent of user logins. Services can be configured to run as a particular user, to start on demand or at boot time, and to shut down or pause. Services can be configured to start other Services that are required for correct operation. Services are identified by name, not by the path name of their executable. Services share a single API and control panel for controlling execution on local and remote hosts. In short, Services are great for implementing functionality that needs to be available at all times.
      Some characteristics of Services have been inherent in COM from the beginning. Like Windows NT Services, COM servers are started on demand (CoCreateInstance indicates that there is a demand to run the COM server). With the RunAs functionality described previously, you can run a COM server as a specific user and not be tied to a particular interactive login. However, there are times when it is really useful to have COM objects implemented as Windows NT Services. With Windows NT 4.0, this is now possible.
      Implementing a Windows NT Service that exports COM objects is no different from writing any other Windows NT Service. In at least one thread, your Service must call CoInitializeEx to enable COM and must export one or more class factories by calling CoRegisterClassObject. Due to the way Services are implemented, you should not do this in WinMain or main, since the thread that executes main is usually stolen by the system to run the Service dispatch routine. When moving from a normal LocalServer to a Service, you also need to decide what to do when the last object in the Service is destroyed. LocalServers usually shut down unless the UI is in use. This may or may not be appropriate when you move to a Windows NT Service; Services tend to be long-lived resources on a machine.
      After you implement an object as a Service, make sure the SCM knows to start it via the Services API instead of just calling CreateProcess, by using the LocalService key. When instantiating out-of-process servers, the SCM examines the registry key for the desired CLSID and first tries to read the default value of the LocalService subkey. If this key is not present, the SCM then looks for a LocalServer32 subkey. If it finds the LocalServer32 subkey, the SCM launches the server using the CreateProcess API function. If the LocalService subkey is present, however, the SCM uses the OpenService and StartService API functions to launch the Service named at that subkey.
      The following REG file shows the registry keys needed to bind a CLSID to a service:

REGEDIT4

[HKEY_CLASSES_ROOT\CLSID\{EEA83374-6CCC-11cf-B171-0080C7BC7884}\LocalService]
@="BasicSvc"
"ServiceParameters"="/RunByOLE"

Note that the LocalService key value is not an absolute path name, but instead is a simple Service name suitable for passing to OpenService. OpenService consults the registry to resolve the Service name to a path name. If the named value ServiceParameters is present, it is used to construct the command line passed to the Service's ServiceMain routine. These registry keys assume that the Service has been installed properly through the CreateService API function.
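The lookup order the SCM follows for out-of-process servers can be modeled in a few lines. This is a portable sketch, not SCM source code: ClsidKey is a hypothetical in-memory stand-in for the subkeys under one CLSID, and LaunchPlan and PlanOutOfProcLaunch are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for one CLSID key: subkey name -> default value.
typedef std::map<std::string, std::string> ClsidKey;

enum class LaunchKind { None, Service, Process };

struct LaunchPlan {
    LaunchKind  kind;
    std::string target;   // Service name or executable path
};

// LocalService is consulted first; LocalServer32 is the fallback.
LaunchPlan PlanOutOfProcLaunch(const ClsidKey& key) {
    ClsidKey::const_iterator svc = key.find("LocalService");
    if (svc != key.end())
        return LaunchPlan{LaunchKind::Service, svc->second};  // OpenService + StartService
    ClsidKey::const_iterator exe = key.find("LocalServer32");
    if (exe != key.end())
        return LaunchPlan{LaunchKind::Process, exe->second};  // CreateProcess
    return LaunchPlan{LaunchKind::None, std::string()};
}
```

With both subkeys present, the plan names the Service (BasicSvc in the REG file above) and the executable path is never used.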
Remote Instantiation
      Windows NT 4.0 provides the ability to instantiate objects on remote machines and to bind to objects running on remote machines. In keeping with COM's tradition of in-proc/out-of-proc transparency, you can access objects that exist on remote hosts as if they were instantiated in your own address space. Because COM is a binary object model, it is possible to add support for DCOM without recompiling either the client or the object. To understand how this is accomplished, reexamine the most fundamental COM API function, CoGetClassObject.
      Clients use CoGetClassObject to locate the class factory for a particular implementation. Here's the original Windows NT 3.51 signature of this function:

HRESULT CoGetClassObject(REFCLSID rclsid, DWORD dwClsCtx, void *pvReserved, REFIID riid, void** ppvClsObj);

The first two parameters are used by the SCM to locate the implementation of the object. The dwClsCtx parameter lets the client indicate a preference as to loading the object in-process or out-of-process. The dwClsCtx parameter is a bit-mask that supports the following values:

enum CLSCTX {
    CLSCTX_INPROC_SERVER  = 1,
    CLSCTX_INPROC_HANDLER = 2,
    CLSCTX_LOCAL_SERVER   = 4
};

#define CLSCTX_ALL    CLSCTX_INPROC_SERVER  \
                    | CLSCTX_INPROC_HANDLER \
                    | CLSCTX_LOCAL_SERVER

#define CLSCTX_SERVER CLSCTX_INPROC_SERVER  \
                    | CLSCTX_LOCAL_SERVER

Based on the bits set in dwClsCtx, the SCM consults the appropriate keys in the registry and loads the appropriate implementation. Figure 8 shows the algorithm used by the SCM to load an implementation via CoGetClassObject or CoCreateInstance.
      The implementation of CoGetClassObject and CoCreateInstance in Windows NT 4.0 supports an additional subkey that allows administrators to specify the machine on which a CLSID should be instantiated.

HKEY_CLASSES_ROOT\CLSID\{CLSID_Pager}\RemoteServerName = "da.ics.uci.edu"

In the current implementation, if no inproc or local implementations are found using the algorithm in Figure 8, the SCM attempts to read the RemoteServerName key. If the SCM finds the RemoteServerName key, it contacts the SCM on the specified host (in this case, da.ics.uci.edu) and requests that the class implementation be loaded as a local server. When the remote SCM successfully launches the server, the interface pointer passed by the server's call to CoRegisterClassObject is marshaled across the network back to the client. From that point on, the client communicates with a remote host machine each time a method call is invoked.
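The fallback to RemoteServerName can be pictured with a small model. This sketch makes two assumptions that the text does not confirm: the subkeys are tried in the order in-process server, in-process handler, local server (Figure 8 is the authority on the real order), and the registry is reduced to a hypothetical ClsidKey map. ResolveActivation and the CTX_ constants are illustrative names that mirror the CLSCTX values above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for one CLSID key: subkey name -> default value.
typedef std::map<std::string, std::string> ClsidKey;

const unsigned long CTX_INPROC_SERVER  = 1;
const unsigned long CTX_INPROC_HANDLER = 2;
const unsigned long CTX_LOCAL_SERVER   = 4;

// Returns "inproc:...", "handler:...", "local:...", "remote:<host>", or ""
// when nothing suitable is registered.
std::string ResolveActivation(unsigned long dwClsCtx, const ClsidKey& key) {
    assert(dwClsCtx != 0);   // the caller must request at least one context
    struct Rule { unsigned long bit; const char* subkey; const char* prefix; };
    static const Rule rules[] = {
        { CTX_INPROC_SERVER,  "InprocServer32",  "inproc:"  },
        { CTX_INPROC_HANDLER, "InprocHandler32", "handler:" },
        { CTX_LOCAL_SERVER,   "LocalServer32",   "local:"   },
    };
    for (const Rule& r : rules) {
        if ((dwClsCtx & r.bit) == 0)
            continue;                         // client did not ask for this context
        ClsidKey::const_iterator it = key.find(r.subkey);
        if (it != key.end())
            return r.prefix + it->second;
    }
    // No in-process or local implementation matched: try the remote host.
    ClsidKey::const_iterator remote = key.find("RemoteServerName");
    if (remote != key.end())
        return "remote:" + remote->second;
    return std::string();
}
```

Because the remote lookup happens only after the requested contexts fail, a legacy client that passes only the original three bits can still end up talking to a remote server.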
      This technique for specifying host names for CLSIDs lets legacy applications communicate with remote objects without recompilation. It also allows you to specify the location of the implementation at installation time instead of hard-coding it into the application. For clients that want to launch a remote server, the following CLSCTX value was added to the API for DCOM:

CLSCTX_REMOTE_SERVER = 0x10

The CLSCTX_ALL and CLSCTX_SERVER macros now include this bit as well. The following structure has been defined so the client can indicate the host name explicitly:

typedef struct _COSERVERINFO {
    DWORD    dwSize;
    OLECHAR *pszName;
} COSERVERINFO;

The dwSize member must contain the structure size and is used for versioning. The pszName member must contain a Winsock-compatible host name.
      The signature of the CoGetClassObject API function was subtly changed to allow clients to specify host names.

HRESULT CoGetClassObject(REFCLSID rclsid, DWORD dwClsCtx, COSERVERINFO *psi, REFIID riid, void** ppvClsObj);

In the pre-DCOM version of CoGetClassObject, the reserved third parameter had to be zero. Passing null to the current version simply means that the client does not specify a remote host name. If the third parameter is non-null and points to a valid COSERVERINFO structure, then the SCM on the client machine uses the pszName member to contact the remote SCM that launches the class implementation. The code shown in Figure 9 launches a server on the machine LOLA. Note that the host name can be specified using UNC, DNS, or raw IP addresses.
      The original version of CoGetClassObject had a reserved parameter and was easily extended to support explicit host names. CoCreateInstance has no reserved parameters to spare and simply cannot support explicit host names. To support explicit host names in a single API for instantiation, Windows NT 4.0 introduces a new API function, CoCreateInstanceEx.

typedef struct tagMULTI_QI {
    const IID *pIID;
    IUnknown  *pItf;
    HRESULT    hr;
} MULTI_QI;

HRESULT CoCreateInstanceEx(REFCLSID rclsid,
                           IUnknown *pUnkOuter,
                           DWORD dwClsCtx,
                           COSERVERINFO *psi,
                           DWORD dwCount,
                           MULTI_QI *rgResults);

CoCreateInstanceEx differs from CoCreateInstance in two ways. First, it allows an explicit host name via the COSERVERINFO struct. Second, it lets the client receive more than one interface pointer to the newly created object, eliminating the need for multiple QueryInterface calls to bind each interface. As has always been the case in COM, performance suffers as more trips are made from the client to the object. By using the MULTI_QI array, the number of round-trips to instantiate and access the object can be reduced. Figure 10 shows the process of creating an object using CoCreateInstanceEx. Note that the client only needs one call to get the class factory, call CreateInstance on the class factory, and call QueryInterface for each interface pointer. CoCreateInstanceEx is optimized to perform all three steps in one trip.
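Since every MULTI_QI entry carries its own result code, a client has to inspect the array entry by entry after the call. The following portable sketch models that bookkeeping only. MultiQiModel, CreateAndQueryAll, kOk, and kNoInterface are hypothetical stand-ins (strings replace IIDs, a flag replaces the interface pointer, and a long replaces HRESULT), so nothing here calls the real API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Portable stand-in for MULTI_QI.
struct MultiQiModel {
    std::string iid;    // requested interface (stands in for pIID)
    bool        bound;  // stands in for a non-null pItf
    long        hr;     // per-interface result (stands in for hr)
};

const long kOk          = 0;
const long kNoInterface = -1;   // placeholder failure code, not a real HRESULT

// Fills in every entry in a single pass, the way one round trip would, and
// returns how many interfaces were bound.
std::size_t CreateAndQueryAll(const std::set<std::string>& supported,
                              MultiQiModel* results, std::size_t count) {
    assert(results != nullptr || count == 0);
    std::size_t bound = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const bool ok = supported.count(results[i].iid) > 0;
        results[i].bound = ok;
        results[i].hr    = ok ? kOk : kNoInterface;
        if (ok)
            ++bound;
    }
    return bound;
}
```

A caller would request every interface it expects to need up front, then check each entry's result before using the corresponding pointer.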
      Client programs call CoCreateInstance to create new objects. Often, clients want to connect to objects that are already running. This was traditionally accomplished by using monikers and the Running Object Table (ROT). Monikers are COM objects that identify particular instances of some COM class. The ROT is a directory service maintained by the SCM. The ROT maps monikers onto the running instances they identify. Objects that want to be found in this manner must register their moniker in the ROT. Typically, this is a file moniker that identifies the object's persistent state. If a client needs to connect to the object, it creates a file moniker and binds to it by calling either its IMoniker::BindToObject method directly or the BindMoniker API function.

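void GetPager(IPager **ppPager)
{
    *ppPager = 0;
    IMoniker *pmk = 0;
    // Build a file moniker that names the object's persistent state.
    HRESULT hr = CreateFileMoniker(L"\\\\Lola\\pub\\pages.pgf", &pmk);
    if (SUCCEEDED(hr)) {
        // Bind to the running (or newly loaded) object, then drop the moniker.
        BindMoniker(pmk, 0, IID_IPager, (void**)ppPager);
        pmk->Release();
    }
}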

If the object's moniker was registered in the ROT, the moniker returns a new interface pointer to the object by calling the object's QueryInterface method. If the object is not running, the file moniker instantiates the object and instructs it to initialize from the specified file via the object's IPersistFile::Load method. Once the object is initialized, the moniker calls QueryInterface for the pointer to return to the client. Once loaded, the object should register itself in the ROT so future bindings on the same moniker yield the same instance.
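The bind-or-load behavior can be modeled as a lookup table keyed by moniker. This is a deliberately simplified sketch: RunningObjectTableModel and PagerObject are hypothetical names, a string stands in for the file moniker, and the table holds a strong reference to each object, which the real ROT does not necessarily do.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical object whose persistent state lives in a file.
struct PagerObject {
    std::string loadedFrom;   // file the object initialized itself from
};

class RunningObjectTableModel {
public:
    // Returns the running instance for the moniker; if none exists, creates
    // one, loads it from the file, and registers it for later binds.
    std::shared_ptr<PagerObject> Bind(const std::string& fileMoniker) {
        assert(!fileMoniker.empty());
        Table::iterator it = running_.find(fileMoniker);
        if (it != running_.end())
            return it->second;                    // already running
        std::shared_ptr<PagerObject> obj = std::make_shared<PagerObject>();
        obj->loadedFrom = fileMoniker;            // stands in for IPersistFile::Load
        running_[fileMoniker] = obj;              // the object registers itself
        return obj;
    }
    std::size_t Count() const { return running_.size(); }
private:
    typedef std::map<std::string, std::shared_ptr<PagerObject> > Table;
    Table running_;
};
```

Binding twice to the same file name yields the same instance, which is the identity guarantee that registration in the ROT is meant to provide.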
      Irrespective of the file's location, in Windows NT 3.51 the object would always be instantiated on the client's machine. This is still the default behavior under Windows NT 4.0. Since this is the way COM worked in the past, it's a reasonable default. However, if the object uses incremental reads and writes, the performance costs for doing file I/O to the remote file system could swamp any advantages gained by instantiating on the client's host machine. Additionally, if clients on different machines bind to the same file moniker, there will be multiple instances, violating the object identity of the file. You can avoid these problems through the use of the ActivateAtStorage registry key.
      The file moniker's BindToObject method first finds the CLSID of the file by calling the API function GetClassFile. If the CLSID does not have an ActivateAtStorage subkey in the registry, or if the subkey's value is something other than "Y" or "y," the object is activated locally and the normal moniker binding takes place on the client's machine. If the CLSID does have an ActivateAtStorage subkey and its value is "Y" or "y," the object must always be instantiated at the host machine where the file is located. The file moniker extracts the location of the file from the UNC name and then asks the SCM on the remote machine to bind the moniker at the remote host. Once the remote SCM binds the object (by consulting its ROT or by instantiating and loading a new instance), the interface pointer is marshaled back to the client. If the object was not running already, the instantiation is checked against the client's permissions (as in CoCreateInstance). If the object is running, the CLSID's AccessPermission subkey (or, if that's not present, the host machine's DefaultAccessPermission key) is consulted, much as LaunchPermission is consulted for CoCreateInstance. This parallel scheme for verifying moniker binds makes it possible to let clients link to existing objects while prohibiting them from creating new instances.
      For efficient bindings to persistent objects, Windows NT 4.0 introduces two new API functions that allow the caller greater flexibility than normal moniker binding. CoGetInstanceFromFile and CoGetInstanceFromIStorage both allow the client to provide: either a file name (CoGetInstanceFromFile) or an IStorage pointer to a compound file (CoGetInstanceFromIStorage); a hard-coded CLSID or a default derived from the specified file's content; an explicit host name using COSERVERINFO or the default based on the RemoteServerName and ActivateAtStorage registry keys; and an array of MULTI_QI structures to bind all of the interface pointers to the object in one trip. Like CoCreateInstanceEx, these functions are implemented to perform all operations in one trip between object and client.
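The activation decision comes down to one registry value and the host portion of the UNC name. A minimal portable sketch, assuming the subkey's value is passed in as a string (empty when the subkey is absent); HostFromUnc and ActivationHost are hypothetical helper names, not OLE functions.

```cpp
#include <cassert>
#include <string>

// Extracts "Lola" from "\\Lola\pub\pages.pgf"; returns "" for non-UNC paths.
std::string HostFromUnc(const std::string& path) {
    if (path.size() < 3 || path[0] != '\\' || path[1] != '\\')
        return std::string();
    const std::string::size_type end = path.find('\\', 2);
    if (end == std::string::npos)
        return path.substr(2);
    return path.substr(2, end - 2);
}

// Returns the host where the object should be activated; "" means the
// client's own machine. Only "Y" or "y" moves activation to the file's host.
std::string ActivationHost(const std::string& path,
                           const std::string& activateAtStorage) {
    const bool atStorage = (activateAtStorage == "Y" || activateAtStorage == "y");
    return atStorage ? HostFromUnc(path) : std::string();
}
```

In this model, any value other than "Y" or "y" (including a missing subkey) leaves activation on the client, which matches the Windows NT 3.51 behavior.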
Property Persistence
      Property sets are useful for serializing the state of an object in a uniform, platform-independent manner so that the contents of the property set can be parsed without requiring the creator to interpret the contents. Property sets serialize the Summary Information used by most Windows-based applications and allow users to examine document attributes, such as author or subject, without opening the file. They are used in OLE Controls as well and ultimately will allow content-based queries across application boundaries.
      The OLE 2 Programmer's Reference did not document any property set system calls or interfaces because none existed when it was published. The OLE sample source code had a hard-coded implementation that could read and write Summary Information streams, but was not suitable for general property sets. Charlie Kindel wrote an excellent MSDN article about this, which included source code that eventually wound up in the implementation of MFC (check out \mfc\src\ctlimpl.h and \mfc\src\olepset.cpp). However, there was no supported API for reading and writing property sets.
      Windows NT 4.0 introduces two new interfaces for reading and writing property sets. IPropertySetStorage is a manager interface that supports creating, opening, deleting, and enumerating property sets. Each property set exposes an IPropertyStorage interface that allows the client to read, write, delete, and enumerate the properties of a property set. The Windows NT 4.0 implementation of compound files supports these interfaces natively. To begin using property sets, you simply QueryInterface an IStorage pointer for the IPropertySetStorage interface. The interface looks like this:

interface IPropertySetStorage : IUnknown
{
    HRESULT Create(REFFMTID rfmtid, CLSID *pclsid,
                   DWORD grfFlags, DWORD grfMode,
                   IPropertyStorage **ppprstg);
    HRESULT Open(REFFMTID rfmtid, DWORD grfMode,
                 IPropertyStorage **ppprstg);
    HRESULT Delete(REFFMTID rfmtid);
    HRESULT Enum(IEnumSTATPROPSETSTG **ppenum);
}

      Property sets are stored as streams or storages and are uniquely identified by their Format ID GUID (FMTID). If you consult the OLE 2 Programmer's Reference, you will note that a single property set could contain multiple formats or sections, and that the property set is identified by some distinguished stream name ("\005SummaryInformation" for the Summary Information stream). This functionality was redundant in the face of structured storage, so property sets are now identified by their FMTID. It is assumed that, if the property set is stored in a stream, it contains exactly one section. To map the FMTID onto a stream name, the implementations of Create and Open first check for the well-known FMTIDs. (There are two: one for Summary Information, and one for Office Document Summaries.) If the requested FMTID identifies either of these streams, the hard-coded names are used to open the stream that contains the properties. If the requested FMTID is not one of the two well-known FMTIDs, the FMTID GUID is ASCII-ized (a la uuencode) into a unique string less than 32 characters long that is used to identify the storage or stream containing the persistent properties.
      Windows NT 4.0 implements two kinds of property sets: simple and non-simple. A simple property set is implemented in a single stream and contains flat data types only (like ints, floats, strings, blobs). A non-simple property set is implemented as a sub-storage. It contains a single stream, named Contents, that contains the flat properties. Any hierarchical properties (such as nested streams and storages) are stored as siblings to the Contents stream. The property set is created as either simple or non-simple through the use of the PROPSETFLAG_NONSIMPLE flag that's passed to IPropertySetStorage::Create. Once created or opened, both simple and non-simple property sets are manipulated via their IPropertyStorage interface (see Figure 11).
      The main goal of this interface is to let clients read and write groups of named properties in the property set efficiently. In a property set, individual properties are identified by both a unique integer (called the PID) and by a human-readable name. To allow accessing properties based on either type of identifier, properties are identified at run time by the PROPSPEC structure.

const ULONG PRSPEC_LPWSTR = 0;
const ULONG PRSPEC_PROPID = 1;

typedef struct tagPROPSPEC {
    ULONG ulKind;
    union {
        PROPID   propid;
        LPOLESTR lpwstr;
    };
} PROPSPEC;
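
The idea of naming a property by either identifier can be illustrated with a portable model. This is a sketch under stated assumptions, not the IPropertyStorage implementation: PropSpecModel, PropertySetModel, SPEC_BY_NAME, and SPEC_BY_ID are hypothetical names, the union is replaced by two ordinary members, and values are plain strings instead of PROPVARIANTs.

```cpp
#include <cassert>
#include <map>
#include <string>

const unsigned long SPEC_BY_NAME = 0;   // plays the role of PRSPEC_LPWSTR
const unsigned long SPEC_BY_ID   = 1;   // plays the role of PRSPEC_PROPID

// Portable mirror of the PROPSPEC idea.
struct PropSpecModel {
    unsigned long kind;
    unsigned long propid;   // meaningful when kind == SPEC_BY_ID
    std::wstring  name;     // meaningful when kind == SPEC_BY_NAME
};

class PropertySetModel {
public:
    void Add(unsigned long pid, const std::wstring& name, const std::string& value) {
        values_[pid] = value;
        names_[name] = pid;
    }
    // Resolves a name to its PID when needed, then reads the value.
    // Returns false if the property does not exist.
    bool Read(const PropSpecModel& spec, std::string* out) const {
        assert(out != nullptr);
        unsigned long pid = spec.propid;
        if (spec.kind == SPEC_BY_NAME) {
            std::map<std::wstring, unsigned long>::const_iterator n = names_.find(spec.name);
            if (n == names_.end())
                return false;
            pid = n->second;
        }
        std::map<unsigned long, std::string>::const_iterator v = values_.find(pid);
        if (v == values_.end())
            return false;
        *out = v->second;
        return true;
    }
private:
    std::map<unsigned long, std::string>  values_;
    std::map<std::wstring, unsigned long> names_;
};
```

Either form of the spec reaches the same stored value, so callers can choose whichever identifier they have at hand.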

To access the property value, an extended version of the OLE Automation VARIANT data type is used (see Figure 12). The CA types are counted arrays, which are simply length-prefixed arrays of some data type. As with VARIANTs, when a function passes PROPVARIANTs to the caller as [out] parameters, it is the caller's responsibility to free any resources held by the PROPVARIANT. OLE provides the FreePropVariantArray API function that frees all of the resources held by an array of PROPVARIANTs.
      Given the new implementation of property sets, it is trivial to read and write Summary Information streams. Figure 13 illustrates how to read the Author and Title properties from an arbitrary file. Note that the Summary Information format stores its string properties as ANSI strings. As is always the case in OLE, any new property set formats should store their strings as Unicode.
Conclusion
      Will Distributed COM change the world and render all other communications technologies obsolete? Perhaps, but I doubt anyone expects it to. The Windows NT 4.0 release of COM represents the next logical step for the technology. Several holes have now been filled. Since one of these holes is networking, people have extremely high expectations given the current sensitivity to all things Internet.
      I doubt that Distributed COM will suddenly allow every application to become "distributed" simply by adding a key to the registry or by passing an additional parameter to an API function. I also doubt that the lack of Distributed COM was holding anyone back from writing distributed applications. People have been sending packets back and forth between computers since long before COM (or Windows for that matter), and at one level Distributed COM is "yet another protocol" and not all that revolutionary.
      I see the impact of Distributed COM as being more evolutionary than revolutionary. More people will program in COM because there is now one more incentive (networking). This will improve the quality of tools and libraries for COM programming. This is a good thing. More people will use IDL because now it really is the "official" language of COM. This will improve the level of support for IDL in the tools we all use every day. Again, this is a good thing. More people will start to play with networking because COM is easier to work with than most other communications technologies. Once they get over the thrill of seeing windows pop up on other machines, developers will go back to solving real problems, some of which require remote communication.

Have a question about programming with ActiveX or COM? Send your questions via email to Don Box at dbox@develop.com or http://www.develop.com/dbox/default.asp

From the May 1996 issue of Microsoft Systems Journal.