

March 1998

Microsoft Systems Journal

Understanding the DCOM Wire Protocol by Analyzing Network Data Packets

Guy Eddon and Henry Eddon

We're going to examine COM from the bottom up. By analyzing the data packets physically transmitted across a network during the execution of COM-enabled applications, you can learn how COM's remoting architecture works. This will help you develop better components.

This article assumes you're familiar with COM.

Guy and Henry Eddon are the authors of Active Visual Basic 5.0 (Microsoft Press, 1997). They are currently working on a book entitled Inside Distributed COM, to be published by Microsoft Press in March 1998. Both can be reached at guyeddon@msn.com.

Most articles about COM approach this gargantuan subject from the perspective of the programming architecture. They tell you what COM function to call in order to perform a specific task. We're going to examine COM from the bottom up. By analyzing the data packets physically transmitted across a network during the execution of COM-enabled applications, you can learn how COM's remoting architecture works. This will help you better understand the COM programming model, and can help you design and develop better components.
      While COM is a specification for building interoperable components, Distributed COM (DCOM) is simply a high-level network protocol designed to enable COM-based components to interoperate across a network. We consider DCOM a high-level network protocol because it is built on top of several layers of existing protocols. For example, assume that a computer has an Ethernet network interface card and is using the User Datagram Protocol (UDP). The layering of protocols ranges from the Ethernet frame at the lowest level to DCOM at the highest (see Figure 1). Sandwiched in the middle are IP, UDP, and RPC. Figure 1 shows only one of many possible configurations; many protocols could be substituted below the RPC layer. DCOM automatically chooses the best underlying network protocol based on the protocols available on the client and server machines.
Figure 1 Protocol Layers


      It may also be useful to think of a DCOM protocol stack in terms of the Open Systems Interconnection (OSI) seven-layer model. In Figure 2, the OSI seven-layer model is juxtaposed with the sample protocol stack discussed here. Note that this figure shows a Windows® platform. Other operating systems might implement the protocols at different layers.

Figure 2  OSI Seven-layer Cake


      The packet sent across the network for each protocol in the protocol stack consists of a header followed by the actual data. Each protocol considers the one directly above it in the protocol stack to be part of its data. For example, the Internet Protocol consists of a header followed by data. IP data actually consists of the header for the User Datagram Protocol followed by that protocol's data. Thus a packet that is transmitted across the network contains the header and data sections of each protocol in the protocol stack (see Figure 3).
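      The nesting of headers can be made concrete with a short sketch that walks a captured frame down to the point where the RPC PDU would begin. It assumes the sample stack from the text (Ethernet II framing, IPv4, and UDP) and ignores IP fragmentation; the function name FindRpcPayloadOffset is ours and not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed values for the sample stack (assumed: Ethernet II, IPv4, UDP).
constexpr std::size_t   kEthernetHeaderSize = 14;
constexpr std::size_t   kMinIpHeaderSize    = 20;
constexpr std::size_t   kUdpHeaderSize      = 8;
constexpr std::uint16_t kEtherTypeIPv4      = 0x0800;
constexpr std::uint8_t  kIpProtocolUdp      = 17;

// Finds where the UDP data (the RPC PDU) starts inside a raw frame.
// Returns false if the frame is not a well-formed Ethernet/IPv4/UDP packet.
bool FindRpcPayloadOffset(const std::vector<std::uint8_t>& frame,
                          std::size_t& payloadOffset) {
    if (frame.size() < kEthernetHeaderSize + kMinIpHeaderSize) return false;

    // Ethernet: the last two header bytes name the protocol carried as data.
    const std::uint16_t etherType =
        static_cast<std::uint16_t>((frame[12] << 8) | frame[13]);
    if (etherType != kEtherTypeIPv4) return false;

    // IP: high nibble is the version, low nibble the header length in
    // 32-bit words; byte 9 names the protocol carried as IP data.
    const std::size_t ipStart = kEthernetHeaderSize;
    if ((frame[ipStart] >> 4) != 4) return false;
    const std::size_t ipHeaderSize = (frame[ipStart] & 0x0F) * 4u;
    if (ipHeaderSize < kMinIpHeaderSize) return false;
    if (frame[ipStart + 9] != kIpProtocolUdp) return false;

    // UDP: a fixed eight-byte header, then the data handed up to RPC.
    const std::size_t udpStart = ipStart + ipHeaderSize;
    if (frame.size() < udpStart + kUdpHeaderSize) return false;
    payloadOffset = udpStart + kUdpHeaderSize;
    return true;
}
```

      Each step treats everything after its own header as opaque data, which is exactly how each protocol in the stack views the one above it.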
      As you can see in Figure 3, DCOM is not really an independent network protocol layered on top of the RPC protocol. Instead, DCOM merges with the RPC header and data, using the fields of the RPC structures for its own purposes. The DCOM network protocol is often called Object RPC or ORPC to emphasize the close relationship between RPC and the DCOM protocol at the network level. ORPC relies heavily on the functionality of the OSF DCE RPC network protocol. For example, the authentication, authorization, and message integrity/encryption features of RPC are present in ORPC.

Figure 3  Protocol Stack


      The ORPC protocol extends the standard RPC protocol in two particular areas: how calls are made on remote objects, and how object references are represented, transmitted, and maintained.

Spying on the Network Protocol
      Since nearly every aspect of the network protocol is hidden from the COM programmer, the most interesting and concrete way to examine the network protocol is to spy on the transmissions between computers during the execution of a DCOM client and component. To do this, a special type of software (or hardware) popularly known as a network sniffer is needed. A variety of tools are available that enable the capture and viewing of network traffic. One is Network Monitor (see Figure 4), a Microsoft utility that ships with Windows NT Server and Microsoft Systems Management Server (SMS). While the SMS version of Network Monitor is fully featured, the Windows NT Server version has limited protocol support and will only show server traffic.
      Simply turn on Network Monitor's capture facility by selecting Start from the Capture menu. Then run a DCOM test program to generate the desired network traffic. After the program has completed execution, return to Network Monitor and select Stop and View from the Capture menu. The network capture facility will be disabled and captured packets presented. Network Monitor is quite a clever program in that it actually understands many standard network protocols, rather than simply displaying the raw captured packets, and thus presents the captured data in an intelligent and descriptive format. As far as DCOM-related protocols are concerned, Network Monitor can recognize and comprehend Ethernet, IP, UDP, and RPC. To view only the packets captured for certain network protocols, select Filter from the Display menu.
      Currently, Network Monitor does not support the DCOM protocol itself, so you're forced to look at DCOM through the eyes of RPC. This does not have to be a permanent affliction: Network Monitor does have a publicly documented interface for building parser DLLs that understand a specific network protocol and interpret the captured data in an intelligent manner. Developing a Network Monitor parser DLL that understands the DCOM network protocol is left as an exercise for the reader.
      To analyze the DCOM network protocol, we ran Network Monitor to capture packets as a COM class named InsideDCOM executed. The client ran on one computer named Thing1 and was configured to activate the InsideDCOM object on a server named Thing2. After calling CoInitialize, the client calls CoCreateInstanceEx to instantiate the remote object, as shown here:


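 // Initialize the COM library on this thread.
 CoInitialize(NULL);
 
 // Name the server machine and request IUnknown.
 COSERVERINFO ServerInfo = { 0, L"Thing2", 0, 0 };
 MULTI_QI qi = { &IID_IUnknown, NULL, 0 };
 HRESULT hr = CoCreateInstanceEx(CLSID_InsideDCOM, NULL, 
                    CLSCTX_REMOTE_SERVER, 
                    &ServerInfo, 1, &qi);
 // On success, qi.pItf holds the interface pointer.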
This call to CoCreateInstanceEx resulted in the transmission of the network packet shown in Figure 5. The client program ran on Thing1, which was configured with an IP address of 199.34.58.3, while the component ran on Thing2, which was configured with an IP address of 199.34.58.4.
      The layering of protocols can be easily seen in the network packet in Figure 5. In the RPC header, you can see that the interface identifier is B8 4A 9F 4D 1C 7D CF 11 86 1E 00 20 AF 6E 7C 57. In accordance with the rules described in the section "Interpreting Marshaled GUIDs," the actual IID is 4D9F4AB8-7D1C-11CF-861E-0020AF6E7C57, otherwise known as IRemoteActivation.

Remote Activation
      IRemoteActivation is an RPC interface (not a COM interface) exposed by the Service Control Manager (SCM) on each machine—not to be confused with the SCM that manages Windows NT services. The SCM process that exists on all machines is named RPCSS.EXE. The IRemoteActivation interface has only one method, RemoteActivation, which is designed to activate a COM object on a remote machine. This is a very powerful feature missing in pure RPC, where the server must always be running before the client can connect. Windows 95 and Windows 98 lack the necessary security features to support remote launch of server processes. However, remote activation and the IRemoteActivation interface apply to these platforms as well. The IDL definition of the IRemoteActivation interface is shown in Figure 6.
      Through the IRemoteActivation interface, the SCM on one machine contacts the SCM on another machine to request that it activate an object. When launching an object on a remote machine, the client machine's SCM calls the IRemoteActivation::RemoteActivation method of the server's SCM to request activation of the class object identified by the CLSID in the fourth parameter. The RemoteActivation method returns a marshaled interface pointer of the requested object and two special values. The interface pointer identifier (IPID) is a special value that identifies an interface belonging to a specific instance of an object in a process. The object exporter identifier (OXID) is a special value that identifies the RPC string binding information needed to connect to the interface specified by an IPID. We'll discuss these values in greater detail later on.
      The SCM resides at well-known endpoints, one for each supported network protocol. An endpoint identifies the virtual channel through which you are communicating, and is based on the network protocol in use. For example, when using the TCP or UDP protocols the endpoint will be a port number such as 1066; if named pipes are used then the endpoint is a pipe name such as \\pipe\mypipe. Figure 7 shows the endpoints for the SCM when using some of the more popular protocols.

Interpreting Marshaled GUIDs
      GUIDs transmitted over the network must be interpreted in accordance with the IDL definition of a GUID:


 typedef struct _GUID
 {   DWORD Data1;
     WORD  Data2;
     WORD  Data3;
     BYTE  Data4[8];
 } GUID;
Since the GUIDs are marshaled in little-Endian format, the original GUID can be reconstructed via a two-step process. First, the GUID found in a captured network packet needs to be formatted to look like a standard GUID. For example, imagine that you located the GUID 78 56 34 12 34 12 34 12 12 34 12 34 56 78 9A BC in a network packet. In the first step, shown in Figure 8, the bytes are grouped into standard form. Better already, isn't it? Now the little-Endian format needs to be taken into account to obtain the actual GUID. The first three elements of the GUID structure (Data1, Data2, and Data3 in Figure 8) need to be reversed on a byte-by-byte basis. The last element of the GUID structure (Data4) does not need to be modified since it is already stored as a simple byte array. Therefore, after reversing the first three elements of the GUID structure, the completed second step of the process provides the true GUID: 12345678-1234-1234-1234-123456789ABC.
Figure 8  GUID in Standard Form
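      The two-step process can be collapsed into a small helper. The sketch below is ours (WireGuidToString is a hypothetical name); it assumes the 16 bytes were copied straight out of a little-endian packet, reverses Data1, Data2, and Data3, and leaves Data4 alone.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Converts 16 GUID bytes, as they appear in a little-endian packet, into
// the familiar XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX text form.
std::string WireGuidToString(const std::uint8_t wire[16]) {
    char text[37];  // 36 characters plus the terminating null
    std::snprintf(text, sizeof(text),
        "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-"
        "%02X%02X%02X%02X%02X%02X",
        wire[3], wire[2], wire[1], wire[0],   // Data1: DWORD, bytes reversed
        wire[5], wire[4],                     // Data2: WORD, bytes reversed
        wire[7], wire[6],                     // Data3: WORD, bytes reversed
        wire[8], wire[9],                     // Data4: byte array, as is
        wire[10], wire[11], wire[12], wire[13], wire[14], wire[15]);
    return std::string(text);
}
```

      Feeding it the bytes 78 56 34 12 34 12 34 12 12 34 12 34 56 78 9A BC yields 12345678-1234-1234-1234-123456789ABC, and the interface identifier bytes from the RPC header in Figure 5 yield the IID of IRemoteActivation.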

Calling All Remote Objects
      Method calls made on remote COM objects are considered true DCE RPC invocations in that a standard Request Protocol Data Unit (PDU) is transmitted across the network requesting that a specific method be executed. A PDU is the basic unit of communication between two machines. The Request PDU contains all of the in parameters that the method expects to receive. When method execution is complete, a Response PDU containing the method's out parameters is transmitted back to the client. While this sounds rather obvious, it is really quite amazing. A remote COM method call requires two packets to be transmitted across the network: one from the client to the server containing the in parameters and the other from the server to the client for the out parameters. The 19 defined PDU types are shown in Figure 9. Note that some of the PDUs are specific to either a connection-oriented (CO) or connectionless (CL) protocol.
      A connection-oriented protocol, such as TCP, maintains a virtual connection for the client and server between transmissions and guarantees that messages are delivered in the same order that they were sent. A connectionless protocol, such as UDP, does not maintain a connection between the client and server, and does not guarantee that a message from the client will actually be delivered to the server. Furthermore, even if the messages are delivered, they may arrive in a different order from that in which they were sent. By default, DCOM employs the connectionless UDP between Windows NT machines. But this does not make DCOM unreliable. When using a connectionless protocol, RPC ensures robustness by employing a custom mechanism for message ordering and acknowledgment.
      An RPC PDU contains up to three parts, only the first of which is required:
  • A PDU header that contains protocol control information.
  • A PDU body that contains data. For example, the body of a request or response PDU contains data representing the input or output parameters for an operation. This information is stored in the Network Data Representation (NDR) format.
  • An authentication verifier that contains data specific to an authentication protocol. For example, an authentication protocol may ensure the integrity of a packet via inclusion of an encrypted checksum in the authentication verifier.
      The PDU header used for connectionless protocols is defined by the IDL structure shown in Figure 10. The packet type field (named ptype) of a PDU identifies the PDU type. This value will be one of the 19 PDU types defined in the table shown in Figure 9. ORPC uses the object identifier field (named object) of a PDU to store an IPID. The interface identifier field (named if_id) must contain the IID of the COM interface. This is somewhat redundant given that the object field contains the IPID, which already identifies the interface. However, placing the IID in the if_id field allows DCOM to work correctly when run on a standard implementation of OSF DCE RPC. On Windows, the RPC implementation has been optimized such that method calls can be dispatched solely based on the information contained in the IPID, ignoring the IID. Finally, the interface version number (named if_vers) must always be 0.0. This is because a COM interface may never be modified after it is published. COM interfaces are not versioned; a new interface is defined instead. All of these fields can be found in the RPC header section of Figure 5.
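      As a rough illustration, the sketch below pulls the fields just discussed out of a raw connectionless PDU. The byte offsets are our assumption, taken from the 80-byte connectionless header layout in the DCE RPC specification as we understand it rather than from Figure 10, so check them against the figure before relying on them. The names ClHeaderSummary and SummarizeClHeader are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte offsets ASSUMED from the DCE RPC connectionless header layout.
constexpr std::size_t kClHeaderSize = 80;
constexpr std::size_t kOffPtype     = 1;
constexpr std::size_t kOffDrep      = 4;   // data representation label
constexpr std::size_t kOffObject    = 8;   // ORPC stores the IPID here
constexpr std::size_t kOffIfId      = 24;  // ORPC stores the IID here
constexpr std::size_t kOffIfVers    = 60;
constexpr std::size_t kOffOpnum     = 68;

struct ClHeaderSummary {
    std::uint8_t  ptype;       // PDU type
    std::uint8_t  object[16];  // raw IPID bytes
    std::uint8_t  ifId[16];    // raw IID bytes
    std::uint32_t ifVers;      // expected to be 0.0 for COM interfaces
    std::uint16_t opnum;       // method number within the interface
};

// Reads an unsigned integer of `count` bytes in the sender's byte order.
static std::uint32_t ReadUnsigned(const std::uint8_t* p, std::size_t count,
                                  bool littleEndian) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t index = littleEndian ? (count - 1 - i) : i;
        value = (value << 8) | p[index];
    }
    return value;
}

bool SummarizeClHeader(const std::uint8_t* pdu, std::size_t size,
                       ClHeaderSummary& out) {
    if (pdu == nullptr || size < kClHeaderSize) return false;
    // Assumed: the high nibble of the first drep byte is 1 for
    // little-endian integers and 0 for big-endian integers.
    const bool littleEndian = (pdu[kOffDrep] >> 4) == 1;
    out.ptype = pdu[kOffPtype];
    std::memcpy(out.object, pdu + kOffObject, 16);
    std::memcpy(out.ifId, pdu + kOffIfId, 16);
    out.ifVers = ReadUnsigned(pdu + kOffIfVers, 4, littleEndian);
    out.opnum = static_cast<std::uint16_t>(
        ReadUnsigned(pdu + kOffOpnum, 2, littleEndian));
    return true;
}
```

      The object and ifId bytes are left raw; they are marshaled GUIDs and can be decoded with the rules in the section "Interpreting Marshaled GUIDs."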

This and That
      All COM method invocations are transmitted across the network in a request PDU containing a special first parameter called ORPCTHIS, which is inserted prior to all the other inbound parameters of the method. Thus a COM method defined as


 HRESULT Sum(int x, int y, [out, retval] int* result)
is transmitted in a request PDU as

 Sum(ORPCTHIS orpcthis, int x, int y)
The definition of the ORPCTHIS structure is shown here:

 // Implicit 'this' pointer which is the first [in] 
 // parameter on every ORPC call.
 typedef struct tagORPCTHIS {
    COMVERSION version;  // COM version number (5.2)
    unsigned long flags; // ORPCF flags for presence of
                         // other data
    unsigned long reserved1; // set to zero
    CID  cid;                // causality id of caller
    ORPC_EXTENT_ARRAY* extensions; // [unique] extensions
 } ORPCTHIS;
The first field of the ORPCTHIS structure specifies the version of the DCOM protocol used to make the method call. In DCOM for Windows 95 version 1.0 and releases of Windows NT 4.0 prior to service pack 3, the COM version is 5.1. For Windows NT 4.0 service pack 3, the COM version is 5.2. For DCOM for Windows 95 version 1.1 and Windows NT 4.0 post service pack 3, the COM version is 5.3. Since each remote method call contains an ORPCTHIS structure, the version of DCOM on the client machine is always transmitted to the server. On the server, the client's version of DCOM is compared to the server's, and unless the major version numbers match, the RPC_E_VERSION_MISMATCH error will be returned to the client. It is permissible for the server to have a higher minor version number than the client. In such cases the server must curtail its use of the DCOM protocol to match those features available in the client's version.
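      The version rule is simple enough to sketch. The code below is an illustration only, not the COM runtime's implementation: the ComVersion struct mirrors what we assume COMVERSION looks like (a major and a minor 16-bit number), NegotiateVersion is a hypothetical name, and taking the lower minor version when the client is the newer side is our assumption, since the text covers only the case of a newer server.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed shape of COMVERSION: two 16-bit numbers.
struct ComVersion {
    std::uint16_t MajorVersion;
    std::uint16_t MinorVersion;
};

// Returns false when the major versions differ, the case in which the
// server would answer with RPC_E_VERSION_MISMATCH. On success, `effective`
// holds the protocol level both sides can use.
bool NegotiateVersion(const ComVersion& client, const ComVersion& server,
                      ComVersion& effective) {
    if (client.MajorVersion != server.MajorVersion) return false;
    effective.MajorVersion = server.MajorVersion;
    // A newer server curtails itself to the client's minor version.
    // (Assumption: the same min rule applies if the client is newer.)
    effective.MinorVersion =
        std::min(client.MinorVersion, server.MinorVersion);
    return true;
}
```

      With a 5.1 client and a 5.3 server, for example, the call succeeds and the effective version is 5.1.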
      The causality identifier (CID) is a GUID used to link together what might be a long chain of method calls. For example, if client A on machine A calls component B on machine B, and component B, before ever returning to client A, proceeds to call component C on machine C, these calls are said to be causally related. Every time a new method call is made (not while handling an incoming method call), a new causality identifier is generated by the DCOM protocol. The same CID will be propagated in any subsequent calls made by component B on behalf of client A. This is true even if component B uses connection points or some other mechanism to call back into client A. The extensions field of the ORPCTHIS structure is meant to enable extra data to be sent with a COM method call.
      Currently, only two extensions are defined: one for extended error information (IErrorInfo) and the other for ORPC debugging. Custom extensions to the ORPCTHIS structure can also be defined using an undocumented technique called channel hooking. For more information on channel hooks, see the January 1998 installment of Don Box's ActiveX®/COM column.
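      The causality rule described above (a fresh CID for a call that starts a new chain, the caller's CID for any call made while servicing an incoming one) can be modeled in a few lines. This is a single-threaded toy of our own devising: a 64-bit counter stands in for the GUID, and CausalityTracker is a hypothetical class, not part of COM.

```cpp
#include <cstdint>
#include <vector>

typedef std::uint64_t Cid;  // hypothetical stand-in for the 128-bit GUID

class CausalityTracker {
public:
    explicit CausalityTracker(Cid firstCid) : nextCid_(firstCid) {}

    // Called when an incoming method call starts and finishes.
    void BeginIncomingCall(Cid callerCid) { active_.push_back(callerCid); }
    void EndIncomingCall() {
        if (!active_.empty()) active_.pop_back();
    }

    // CID to place in ORPCTHIS for the next outgoing call.
    Cid CidForOutgoingCall() {
        // While handling an incoming call, propagate the caller's CID.
        if (!active_.empty()) return active_.back();
        // Otherwise this call starts a new causal chain.
        return nextCid_++;
    }

private:
    Cid nextCid_;
    std::vector<Cid> active_;  // CIDs of calls currently being handled
};
```

      If client A starts a chain and component B calls component C (or calls back into A) before returning, every call in the chain carries A's original CID.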
      In every response PDU for COM methods, a special outbound parameter called ORPCTHAT is inserted before all of the other out parameters of the method. Thus, a COM method defined as

 HRESULT Sum(int x, int y, [out, retval] int* result)
is transmitted in a response PDU as

 HRESULT Sum(ORPCTHAT orpcthat, int result)
The definition of the ORPCTHAT structure is shown here:

 // Implicit 'that' pointer which is the first [out] 
 // parameter on every ORPC call.
 typedef struct tagORPCTHAT {
    unsigned long  flags;    // ORPCF flags for presence
                             // of other data
    ORPC_EXTENT_ARRAY *extensions; // [unique] extensions
 } ORPCTHAT;

Meow!
      The DCOM network protocol transmits method parameters in the Network Data Representation (NDR) format specified by OSF DCE RPC. NDR specifies exactly how all the primitive data types understood by IDL should be marshaled into data packets for network transmission. The only extension made to the NDR standard by DCOM is support for marshaled interface pointers. Use of the iid_is IDL keyword in an interface definition constitutes what can be thought of as a new primitive data type that can be marshaled: an interface pointer. The term "interface pointer" is problematic because it conjures a mental picture of a pointer to a pointer to a vtable structure that contains pointers to functions. But once marshaled into a data packet, an interface pointer does not look like that at all. It's merely a symbolic representation of access to an object, and thus is simply an object reference. The format of a marshaled interface pointer is governed by the MInterfacePointer structure:


 // Wire representation of a marshaled interface 
 // pointer, always the little-endian form of an OBJREF
 typedef struct tagMInterfacePointer {
    ULONG             ulCntData; // size of data
    byte              abData[];  // [size_is(ulCntData)] 
                                 // data
 } MInterfacePointer, *PMInterfacePointer;
      The byte array that follows the ulCntData field contains the actual object reference in a structure called an OBJREF (see Figure 11). An OBJREF is the data type used to represent a reference to an object. The OBJREF structure assumes one of three forms, depending on the type of marshaling being employed: standard, handler, or custom.
      The OBJREF structure begins with a signature field defined as the unsigned long hexadecimal value 0x574F454D. Interestingly, if you arrange this value in little-Endian format (4D 45 4F 57), and then convert each byte to its ASCII equivalent, the resulting characters spell MEOW. Some speculate that this is an acronym for Microsoft Extended Object Wire representation, but no one really knows for sure. The great thing about the MEOW structure is that when scanning through the mountains of packets captured by the Network Monitor utility, it is very easy to tell when you have hit upon an object reference: just say MEOW. Note that regardless of the format of the remainder of the NDR data, the wire representation of a marshaled interface pointer is always stored in little-Endian form. While object references can be stored in a variety of forms, it is not always possible or desirable to pass a header/flag indicating the byte-order (thus also increasing packet size). So COM object references are always transmitted as little-Endian.
      Following the MEOW signature is the flags field of the OBJREF structure, which identifies the type of object reference. The flags field can be set to OBJREF_STANDARD (1), OBJREF_HANDLER (2), or OBJREF_CUSTOM (4). The IID is the last field common to all forms of the OBJREF structure. It specifies the interface identifier of the interface being marshaled. Figure 12 shows a network packet captured as a result of the request PDU shown in Figure 5. There the CoCreateInstanceEx function had been called to request the object's IUnknown interface pointer. In Figure 12, you can see the response PDU returned to the client. It contains the marshaled interface pointer (decorated by "MEOW") of the object's IUnknown interface.
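      Hunting for object references in a capture can be automated. The sketch below scans a packet for the MEOW signature and reads the flags field that follows it; the helper names (FindObjref, ReadObjrefFlags) are ours. A real parser would also confirm that the match falls inside a marshaled interface pointer, since the same four bytes could in principle appear in ordinary parameter data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kObjrefSignature = 0x574F454D;  // 4D 45 4F 57 on the wire
constexpr std::size_t   kNotFound = static_cast<std::size_t>(-1);

// Reads a 32-bit little-endian value; object references always use this order.
std::uint32_t ReadLe32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns the offset of the first MEOW signature, or kNotFound.
std::size_t FindObjref(const std::vector<std::uint8_t>& packet) {
    for (std::size_t i = 0; i + 4 <= packet.size(); ++i) {
        if (ReadLe32(&packet[i]) == kObjrefSignature) return i;
    }
    return kNotFound;
}

// Reads the flags field that immediately follows the signature.
bool ReadObjrefFlags(const std::vector<std::uint8_t>& packet,
                     std::size_t objrefOffset, std::uint32_t& flags) {
    if (objrefOffset > packet.size() ||
        packet.size() - objrefOffset < 8) {
        return false;
    }
    flags = ReadLe32(&packet[objrefOffset + 4]);
    return true;
}
```

      A flags value of 1, 2, or 4 then tells you whether the bytes that follow describe a standard, handler, or custom object reference.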

The Standard Object Reference
      When the flags field of the OBJREF structure indicates that standard marshaling (OBJREF_STANDARD) is being used, the remainder of the structure consists of a structure of the type STDOBJREF followed by a structure of the type DUALSTRINGARRAY. Here's the STDOBJREF structure:


 typedef struct tagSTDOBJREF {
    unsigned long  flags;        // SORF_ flags
    unsigned long  cPublicRefs;  // count of references 
                                 // passed
    OXID           oxid; // oxid of server with this oid
    OID            oid;  // oid of object with this ipid
    IPID           ipid; // ipid of Interface
 } STDOBJREF;
      The first field of the STDOBJREF structure specifies flags relating to the object reference. Although most of the possible settings for the flags field are reserved for the system, the SORF_NOPING flag (0x1000) can be used to indicate that the object does not need to be pinged. (The DCOM network protocol uses pinging to implement a sophisticated garbage collection mechanism that is covered later on.) The second field of the STDOBJREF structure, cPublicRefs, specifies the number of reference counts on the IPID that are being transferred in this object reference. Allocating multiple reference counts on an interface obviates the need to make remote method calls every time the client calls IUnknown::AddRef.
      The third field of the STDOBJREF structure specifies the OXID of the server that owns the object. While an IPID identifies a specific interface of a specific object instance in a process, an IPID alone does not contain enough information to carry out a method invocation. Both DCOM and RPC use strings to specify the binding information needed to carry out a remote call. RPC string bindings contain information such as the underlying network protocol that should be used to carry out the call, as well as the network address of the server machine on which the component is running. Security binding strings help DCOM figure out which parameters to pass on to the RPC infrastructure when it is ready to connect to a particular OXID. An unsigned hyper (64-bit) variable, OXID, represents this connection information. Before making a call, the client translates an OXID into a set of string bindings that the RPC system understands. The details of this translation are covered below.
      The fourth field of the STDOBJREF structure specifies the object identifier (OID) of the object that implements the interface being marshaled. OIDs are 64-bit values used as part of the pinging mechanism. The final field of the STDOBJREF structure is the actual IPID of the interface being marshaled.
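      Putting the five fields together, a STDOBJREF occupies 40 bytes on the wire (4 + 4 + 8 + 8 + 16). The following sketch decodes one from a little-endian byte buffer. The struct and function names are ours, and the IPID is kept as raw bytes rather than being converted to GUID text.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t   kStdObjRefWireSize = 40;  // 4 + 4 + 8 + 8 + 16
constexpr std::uint32_t kSorfNoPing = 0x1000;     // SORF_NOPING

struct StdObjRef {
    std::uint32_t flags;        // SORF_ flags
    std::uint32_t cPublicRefs;  // references transferred with this OBJREF
    std::uint64_t oxid;         // object exporter that owns the object
    std::uint64_t oid;          // object identifier, used for pinging
    std::uint8_t  ipid[16];     // raw marshaled GUID bytes
};

// Assembles an unsigned little-endian value of up to eight bytes.
std::uint64_t ReadLittleEndian(const std::uint8_t* p, std::size_t byteCount) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < byteCount; ++i) {
        value |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    }
    return value;
}

bool ParseStdObjRef(const std::uint8_t* data, std::size_t size,
                    StdObjRef& out) {
    if (data == nullptr || size < kStdObjRefWireSize) return false;
    out.flags       = static_cast<std::uint32_t>(ReadLittleEndian(data, 4));
    out.cPublicRefs = static_cast<std::uint32_t>(ReadLittleEndian(data + 4, 4));
    out.oxid        = ReadLittleEndian(data + 8, 8);
    out.oid         = ReadLittleEndian(data + 16, 8);
    std::memcpy(out.ipid, data + 24, 16);
    return true;
}

// True unless the exporter set SORF_NOPING on this reference.
bool NeedsPinging(const StdObjRef& ref) {
    return (ref.flags & kSorfNoPing) == 0;
}
```

      In a captured packet, data would point just past the signature, flags, and IID fields of the enclosing OBJREF.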

DUALSTRINGARRAY
      As part of an object reference, the STDOBJREF structure is followed by the DUALSTRINGARRAY structure. This structure is a container for a large array that contains two parts, STRINGBINDING elements and SECURITYBINDING elements.


 // DUALSTRINGARRAYs are the return type
 // for arrays of network addresses, arrays
 // of endpoints, and arrays of both used in
 // many ORPC interfaces
 typedef struct tagDUALSTRINGARRAY {
    // # of entries in array
    unsigned short    wNumEntries;
    // Offset of security info
    unsigned short    wSecurityOffset;
 
    // The array contains two parts, a set
    // of STRINGBINDINGs and a set of
    // SECURITYBINDINGs. Each set is
    // terminated by an extra zero. The
    // shortest array contains four zeros.
    // [size_is(wNumEntries)]
    unsigned short    aStringArray[];
 } DUALSTRINGARRAY;
      The first two fields of the structure simply specify the total number of entries in the array (wNumEntries) and the offset where the STRINGBINDINGs end and the SECURITYBINDINGs begin (wSecurityOffset). The array itself is pointed to by the aStringArray element.
      A STRINGBINDING structure represents the connection information needed to bind to an object.

 // This is the return type for arrays of string 
 // bindings or protseqs used by many ORPC interfaces.

 typedef struct tagSTRINGBINDING {
    unsigned short    wTowerId;      // Cannot be zero.
    unsigned short    aNetworkAddr;  // Zero terminated.
 } STRINGBINDING;
The first element of the STRINGBINDING structure, wTowerId, specifies the network protocol that can be used to reach the server at the address given in the second field, aNetworkAddr. Figure 13 describes the valid tower identifiers for common protocols. The NCA prefix for each tower identifier stands for "Network Computing Architecture." "CN" indicates a connection-oriented protocol, whereas "DG" refers to connectionless, datagram-based protocols.
      The second element of a STRINGBINDING structure, aNetworkAddr, is a Unicode string specifying the network address of the server. For example, if the wTowerId value was NCADG_IP_UDP, then a valid network address would be 199.34.58.4.
      Each STRINGBINDING structure ends with a null character to indicate the end of the aNetworkAddr string. The last STRINGBINDING in a DUALSTRINGARRAY is indicated by the presence of two extra zero-bytes. After that come the SECURITYBINDINGs. The SECURITYBINDING structure contains fields indicating the authentication service (wAuthnSvc) and the authorization service (wAuthzSvc) to be used. The wAuthzSvc is typically set to 0xFFFF, which indicates that default authorization should be used.

 // This value indicates to use default authorization
 const unsigned short COM_C_AUTHZ_NONE = 0xffff;
 
 typedef struct tagSECURITYBINDING {
    unsigned short    wAuthnSvc;     // Must not be zero
    unsigned short    wAuthzSvc;     // Must not be zero
    unsigned short    aPrincName;    // NULL terminated
 } SECURITYBINDING;
      This information, taken in total, represents all the information needed to marshal an interface pointer to the client. In the client's address space, a proxy will be loaded and used to communicate with the stub on the server side.
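      To show how the two halves of a DUALSTRINGARRAY are told apart, here is a sketch that walks aStringArray once its contents have been decoded into 16-bit values. The container types and the function name ParseDualStringArray are ours. The parsing rules follow the comments in the IDL above: each string is zero-terminated, and each set ends with one extra zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct StringBinding {
    std::uint16_t  towerId;      // protocol identifier, never zero
    std::u16string networkAddr;  // Unicode network address
};

struct SecurityBinding {
    std::uint16_t  authnSvc;
    std::uint16_t  authzSvc;
    std::u16string princName;
};

struct DualStringArray {
    std::vector<StringBinding>   strings;
    std::vector<SecurityBinding> security;
};

// `a` holds aStringArray (wNumEntries values); `securityOffset` is
// wSecurityOffset, the index at which the SECURITYBINDINGs begin.
bool ParseDualStringArray(const std::vector<std::uint16_t>& a,
                          std::size_t securityOffset,
                          DualStringArray& out) {
    if (securityOffset > a.size()) return false;

    // First part: STRINGBINDINGs, ended by a zero where a tower id belongs.
    std::size_t i = 0;
    while (i < securityOffset && a[i] != 0) {
        StringBinding sb;
        sb.towerId = a[i++];
        while (i < securityOffset && a[i] != 0) {
            sb.networkAddr.push_back(static_cast<char16_t>(a[i++]));
        }
        if (i >= securityOffset) return false;  // string terminator missing
        ++i;                                    // skip the terminator
        out.strings.push_back(sb);
    }

    // Second part: SECURITYBINDINGs, ended the same way.
    i = securityOffset;
    while (i < a.size() && a[i] != 0) {
        SecurityBinding sec;
        sec.authnSvc = a[i++];
        if (i >= a.size()) return false;
        sec.authzSvc = a[i++];
        while (i < a.size() && a[i] != 0) {
            sec.princName.push_back(static_cast<char16_t>(a[i++]));
        }
        if (i >= a.size()) return false;        // string terminator missing
        ++i;
        out.security.push_back(sec);
    }
    return true;
}
```

      The sketch is deliberately strict: it rejects an array whose strings run past the security offset or past the end of the array instead of guessing where a binding was meant to stop.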

The IRemUnknown Interface
      IRemUnknown is a COM interface designed to handle reference counting and interface querying for remote objects. As its name suggests, IRemUnknown is the remote version of the holy IUnknown. Clients use the IRemUnknown interface to manipulate reference counts and request new interfaces based on IPIDs held by the client. In accordance with standard reference counting rules in COM, references are kept per interface rather than per object. The definition of the IRemUnknown interface is shown in Figure 14.
      You will never implement the IRemUnknown interface, as the OXID object associated with each COM apartment already provides an implementation of this interface. The standard IUnknown interface is never remoted in COM. In its place, the IRemUnknown interface is remoted and results in local calls to QueryInterface, AddRef, and Release on the server. The IRemUnknown::RemQueryInterface method differs from the IUnknown::QueryInterface method in that it can request several interface pointers in one call. The standard IUnknown::QueryInterface method is actually used to carry out this request on the server side. This optimization is designed to cut down on the number of round-trips executed. The array of REMQIRESULT structures returned by RemQueryInterface contains the HRESULT from the QueryInterface call executed for each requested interface, as well as the STDOBJREF containing the marshaled interface pointer itself.


 typedef struct tagREMQIRESULT
     {
        HRESULT     hResult;    // result of call
        STDOBJREF   std;        // data for returned 
                                // interface
     } REMQIRESULT;
      The IRemUnknown::RemAddRef and RemRelease methods increase and decrease the reference count of the object referred to by an IPID. Like RemQueryInterface, RemAddRef and RemRelease differ from their local counterparts in that they can increase and decrease the reference count of multiple interfaces on multiple objects in the apartment by an arbitrary amount in a single remote call. Imagine a scenario where an object that received a marshaled interface pointer wants to pass it to some other object. According to the COM reference counting rules, AddRef must be called before this interface pointer can be passed to another object. This results in two round-trips: one to get the interface pointer and another to increment the reference counter. The caller can optimize this scenario by requesting multiple references in one call. Thereafter, the interface pointer can be given out multiple times without the need for additional remote calls to increment the reference counter.
      The Windows implementation of DCOM typically requests five references when marshaling an interface pointer. This means that the client process receiving the interface pointer can now marshal it to four different apartments in the current process or to four other processes. Only when the client attempts to marshal the interface pointer for the fifth time will the COM runtime make a remote call to the object to request an additional reference. In the interest of performance, client-side AddRef and Release calls never directly translate to RemAddRef and RemRelease. Instead, a remote call to the RemRelease method is deferred until all interfaces on an object have been released locally. Only then is a single RemRelease call made, with instructions to decrement the reference counter for all interfaces by the necessary amount. A call to RemAddRef is made when an interface pointer is unmarshaled and also when the client needs to remarshal an interface pointer as mentioned above.
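      The arithmetic of the five-reference batch is easy to get wrong, so here is a toy model of the client-side bookkeeping. It is not how the COM runtime is written: RemoteRefCache is a hypothetical class, the remote call is reduced to a counter, and refilling with another batch of the same size is our assumption (the text says only that an additional reference is requested).

```cpp
// Toy model of the references a client holds on one remote interface.
class RemoteRefCache {
public:
    // `received` is the count that arrived with the marshaled interface
    // pointer (cPublicRefs). One reference is kept by the local proxy.
    explicit RemoteRefCache(unsigned received)
        : batch_(received > 0 ? received : 1),
          spare_(received > 0 ? received - 1 : 0),
          remoteCalls_(0) {}

    // Hands one reference to another apartment or process.
    void MarshalToAnotherClient() {
        if (spare_ == 0) {
            // Out of spare references: this stands in for a remote
            // IRemUnknown::RemAddRef call. Assumed refill size: one batch.
            ++remoteCalls_;
            spare_ += batch_;
        }
        --spare_;
    }

    unsigned SpareRefs() const { return spare_; }
    unsigned RemoteCalls() const { return remoteCalls_; }

private:
    unsigned batch_;
    unsigned spare_;
    unsigned remoteCalls_;
};
```

      Starting from five references, four calls to MarshalToAnotherClient cost nothing on the network; the fifth is the first to trigger a remote call.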
      It is important to note that in the scenario described above, where one component returns the interface pointer of another component to a client process, COM never allows one proxy to communicate with another proxy. For example, if client process A calls object B, which then returns an interface pointer for object C, any subsequent calls made by client A to object C are direct. This happens because the marshaled interface pointer contains information on how to reach the machine where the actual object instance exists. For object B to call object C, object B must keep track of object C's OXID, IP address, IPID, and so on. When object B hands client A a pointer to object C, it scribbles all that information into a new OBJREF for client A. Object B is no longer part of the relationship, which saves network bandwidth and improves overall performance and reliability. Had the calls been routed through B, A would lose access to C whenever the machine hosting B went down.
      After issuing the CoCreateInstanceEx function to instantiate the remote component, your client process is in possession of an initial IUnknown interface pointer. This is typically followed by a call to the IUnknown::QueryInterface method to request another interface, as shown in the following code fragment:

 hr=pUnknown->QueryInterface(
    IID_ISum, (void**)&pSum);
In practice, clients often acquire all needed interface pointers as part of the CoCreateInstanceEx call. When the client process calls the IUnknown::QueryInterface method to request an interface pointer for ISum, the proxy manager in the client's address space calls the IRemUnknown::RemQueryInterface method on the server. Figure 15 shows the network packet transmitted for the RemQueryInterface method call. You can clearly see that DCOM is requesting a count of five references for the ISum interface pointer.
      On the server side, the actual IUnknown::QueryInterface call is executed to request an ISum interface pointer from the component. This interface pointer is then returned to the client in the marshaled form of a STDOBJREF. Figure 16 shows the response PDU returned to the client.
      To prevent a malicious application from calling IRemUnknown::RemRelease and forcing an object to unload while other clients may still be using it, a client can request private references. Private references are stored with the client's identity so that one client cannot release the private references of another. Note that private references cannot be provided from one object to another when passing an interface pointer. Each client must request and release its own private references by explicitly calling RemAddRef and RemRelease. Both the RemAddRef and RemRelease methods accept an argument that is an array of REMINTERFACEREF structures. The REMINTERFACEREF specifies an IPID and the number of public and private references that are being requested or released by the client. Programmatically, the client specifies that it needs private references by calling CoInitializeSecurity and specifying the EOAC_SECURE_REFS capability.

 typedef struct tagREMINTERFACEREF
     {
         IPID           ipid; // ipid to AddRef/Release
         unsigned long  cPublicRefs;
         unsigned long  cPrivateRefs;
     } REMINTERFACEREF;
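
      To see why private references protect against a rogue RemRelease, consider the following sketch of the bookkeeping a server might perform. It is a simplified model under our own assumptions, not the real stub manager: the IPID is reduced to an integer, client identity is reduced to a string, and every name ending in Model, along with InterfaceRefTable, is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-ins: the real IPID is a GUID and the real structure is
// REMINTERFACEREF. The "Model" names are hypothetical.
using IpidModel = unsigned int;

struct RemInterfaceRefModel {
    IpidModel     ipid;
    unsigned long cPublicRefs;
    unsigned long cPrivateRefs;
};

// Per-interface counters: one shared public pool, one private pool per client.
struct RefEntryModel {
    unsigned long publicRefs = 0;
    std::map<std::string, unsigned long> privateRefs;
};

class InterfaceRefTable {
public:
    void RemAddRef(const std::string& client, const RemInterfaceRefModel& ref) {
        RefEntryModel& e = table_[ref.ipid];
        e.publicRefs += ref.cPublicRefs;
        if (ref.cPrivateRefs > 0) {
            e.privateRefs[client] += ref.cPrivateRefs;
        }
    }

    // Returns false, changing nothing, if the caller asks to release more than
    // the public pool holds or more private references than it owns itself.
    bool RemRelease(const std::string& client, const RemInterfaceRefModel& ref) {
        auto it = table_.find(ref.ipid);
        if (it == table_.end()) return false;
        RefEntryModel& e = it->second;

        auto owner = e.privateRefs.find(client);
        unsigned long owned = (owner == e.privateRefs.end()) ? 0 : owner->second;
        if (ref.cPublicRefs > e.publicRefs || ref.cPrivateRefs > owned) return false;

        e.publicRefs -= ref.cPublicRefs;
        if (ref.cPrivateRefs > 0) {
            assert(owner != e.privateRefs.end());  // implied by owned >= cPrivateRefs > 0
            owner->second -= ref.cPrivateRefs;
        }
        return true;
    }

    // Public plus all private references outstanding on one interface.
    unsigned long TotalRefs(IpidModel ipid) const {
        auto it = table_.find(ipid);
        if (it == table_.end()) return 0;
        unsigned long total = it->second.publicRefs;
        for (const auto& kv : it->second.privateRefs) total += kv.second;
        return total;
    }

private:
    std::map<IpidModel, RefEntryModel> table_;
};
```

      Because the private counts are keyed by client identity, a second client that tries to release references it never acquired is simply refused, while the public pool remains shared by everyone.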

The IRemUnknown2 Interface
      The IRemUnknown2 interface was introduced in version 5.2 of the DCOM protocol. Derived from the IRemUnknown interface, IRemUnknown2 adds the RemQueryInterface2 method, enabling clients to retrieve interface pointers to objects supplying something other than a STDOBJREF in their marshaled interface packets. Like RemQueryInterface, this method queries for one or more interfaces using the interface behind the IPID. Instead of returning the STDOBJREF marshaled interface packet, this method can return any marshaled data packet in the form of a blob of bytes (including a traditional STDOBJREF). The IDL definition of the IRemUnknown2 interface is shown here:

 interface IRemUnknown2 : IRemUnknown
 {
     HRESULT RemQueryInterface2
     (
         [in] REFIPID                           ripid,
         [in] unsigned short                    cIids,
         [in, size_is(cIids)]  IID              *iids,
         [out, size_is(cIids)] HRESULT          *phr,
         [out, size_is(cIids)] MInterfacePointer **ppMIF
     );
 }

The OXID Resolver
      The OXID Resolver, like the SCM, is a part of RPCSS.exe. The OXID Resolver stores and provides local clients with the RPC string bindings necessary to connect with remote objects. It also sends ping messages to remote objects for which the local machine has clients, and receives ping messages for objects running on the local machine. This aspect of the OXID Resolver supports the DCOM garbage collection mechanism.
      Similar to the way CoCreateInstanceEx incorporates the functionality of CoGetClassObject and IClassFactory::CreateInstance, the IRemoteActivation interface incorporates the functionality of both the IRemUnknown and the IOXIDResolver interfaces so that only one round-trip is needed to activate an object. The OXID Resolver service resides at the same endpoints as the SCM, described previously. Like the IRemoteActivation interface, the OXID Resolver service implements an RPC interface (not a COM interface) called IOXIDResolver, shown in IDL notation in Figure 17. Note that in Figure 17 the object keyword is conspicuously absent from the interface header, indicating that this is not a COM interface.
      When presented with an object exporter identifier, it is the job of the OXID Resolver to obtain the associated RPC string binding necessary to connect to the object. This is done on each machine by maintaining a cache of mappings of OXIDs and their associated RPC string bindings. The OXID Resolver maintains this local table, of which one is present on each machine. When asked by a client to resolve an OXID into the associated string binding, the OXID Resolver first checks its cached local table for the OXID. If found, the string binding can be returned immediately. Otherwise, the OXID Resolver contacts the OXID Resolver of the server to request resolution of the OXID into a string binding.
      The client machine's OXID Resolver then caches the string binding information provided by the server. This optimization enables the OXID Resolver to quickly resolve that OXID for other clients on the same machine that may wish to connect in the future. Should the client pass the object reference to a process running on a third machine, that computer's OXID Resolver service would not have a cached copy of the OXID's string bindings and thus would be obliged to make a remote call to the server in order to resolve the OXID for itself.
      The purpose of the first method, IOXIDResolver::ResolveOxid, is to resolve an OXID into the string bindings necessary to access an OXID object. The OXID being resolved is passed as the first parameter, pOxid, to the ResolveOxid method. When calling the ResolveOxid method, the client specifies what protocol sequences (Tower IDs), from most preferred to least preferred, it is prepared to use when accessing the object. The client passes this information in the arRequestedProtseqs array argument. The OXID Resolver service on the server attempts to resolve the OXID and then returns an array of DUALSTRINGARRAYS, ppdsaOxidBindings, containing the string binding information, again in decreasing order of preference, that may be used to connect to the specified OXID. The steps below explain the OXID resolution process. Assume that the client is unmarshaling an interface pointer from a new OXID:
  1. If the COM runtime within the client process has not seen this OXID before, the client asks the OXID Resolver on its machine to resolve the OXID for the server object.
  2. If the client's OXID Resolver has not seen this OXID before, the client's OXID Resolver calls IOXIDResolver::ResolveOxid to request that the server's OXID Resolver return the string bindings for the OXID.
  3. The OXID Resolver on the server does a lookup on its local table and returns the desired string bindings to the client's OXID Resolver. If the OXID Resolver can't find the bindings in its local table, it will call into the server process so that it can start using the requested protocols. Once this happens, the new bindings are cached locally and returned to the client's OXID Resolver.
  4. The client's OXID Resolver caches the best string binding in its local table for future use and then returns the string binding for the OXID to the client process.
  5. The client binds to the object using the given string binding.
  6. The client can now invoke methods on the object.
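      The caching behavior in steps 1 through 4 can be sketched as a pair of lookup tables. The classes below are a hypothetical model, not the RPCSS implementation: the string binding is reduced to a single std::string, the remote call is reduced to a direct method call that counts round-trips, and the binding text used in any example is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using OxidModel = std::uint64_t;  // stand-in for the OXID type

// Models the OXID Resolver on the server machine, which owns the
// authoritative table of OXIDs and their string bindings.
class ServerResolverModel {
public:
    void Register(OxidModel oxid, const std::string& binding) {
        assert(!binding.empty());  // an exporter always has at least one binding
        table_[oxid] = binding;
    }

    // Stand-in for a remote IOXIDResolver::ResolveOxid call (steps 2 and 3).
    bool ResolveOxid(OxidModel oxid, std::string& bindingOut) {
        ++remoteCalls;
        auto it = table_.find(oxid);
        if (it == table_.end()) return false;
        bindingOut = it->second;
        return true;
    }

    int remoteCalls = 0;  // number of simulated network round-trips

private:
    std::map<OxidModel, std::string> table_;
};

// Models the OXID Resolver on one client machine.
class ClientResolverModel {
public:
    explicit ClientResolverModel(ServerResolverModel& server) : server_(server) {}

    bool Resolve(OxidModel oxid, std::string& bindingOut) {
        auto it = cache_.find(oxid);
        if (it != cache_.end()) {       // seen before: no network traffic
            bindingOut = it->second;
            return true;
        }
        if (!server_.ResolveOxid(oxid, bindingOut)) return false;
        cache_[oxid] = bindingOut;      // step 4: remember it for later clients
        return true;
    }

private:
    ServerResolverModel& server_;
    std::map<OxidModel, std::string> cache_;
};
```

      Two ClientResolverModel instances sharing one ServerResolverModel behave like two client machines: the first resolution on each machine costs a round-trip, and every later resolution of the same OXID on that machine is answered from the local cache.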
      Because machines may have many different network protocols installed, it can be a time-consuming and resource-intensive operation to allocate endpoints for each available protocol sequence. Most commonly, the server registers all available protocol sequences at initialization time. As an optimization, the OXID Resolver may decide to defer protocol registration. To implement lazy protocol registration, the server OXID Resolver waits until a client machine calls its IOXIDResolver::ResolveOxid method. Rather than registering all available protocols at initialization time, the implementation of the ResolveOxid method registers only those protocols requested by the client at the time of OXID resolution.
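      Lazy protocol registration amounts to postponing work until a request proves it is needed. The following sketch shows the idea with a hypothetical LazyResolverModel class; the protocol sequence identifiers are arbitrary small integers chosen for illustration, not real Tower IDs, and "registering" a protocol is reduced to inserting it into a set.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using ProtseqId = unsigned short;  // stand-in for a protocol sequence (Tower ID)

// Hypothetical model of a server-side resolver that registers protocol
// sequences only when a client asks for them.
class LazyResolverModel {
public:
    explicit LazyResolverModel(const std::set<ProtseqId>& installed)
        : installed_(installed) {}

    // Returns the requested protocols this machine can offer, preserving the
    // client's order of preference, and registers any not yet registered.
    std::vector<ProtseqId> ResolveOxid(const std::vector<ProtseqId>& requested) {
        std::vector<ProtseqId> usable;
        for (ProtseqId id : requested) {
            if (installed_.count(id) == 0) continue;  // not installed here
            registered_.insert(id);                   // no-op if already registered
            usable.push_back(id);
        }
        assert(registered_.size() <= installed_.size());
        return usable;
    }

    std::size_t RegisteredCount() const { return registered_.size(); }

private:
    std::set<ProtseqId> installed_;   // protocols available on the machine
    std::set<ProtseqId> registered_;  // protocols with endpoints allocated so far
};
```

      A machine with three installed protocols starts with RegisteredCount() equal to zero and pays the registration cost only for the protocols that clients actually request.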
      In version 5.2 of the DCOM network protocol, the ResolveOxid2 method was added to the IOXIDResolver interface. This method enables a client to determine the version of the DCOM network protocol used by the server when requesting OXID resolution. Note the addition of the last parameter in the IDL definition of the IOXIDResolver::ResolveOxid2 method:

  [idempotent] error_status_t ResolveOxid2
     (
     [in]       handle_t        hRpc,
     [in]       OXID           *pOxid,
     [in]       unsigned short  cRequestedProtseqs,
     [in,  ref, size_is(cRequestedProtseqs)]
                unsigned short  arRequestedProtseqs[],
     [out, ref] DUALSTRINGARRAY **ppdsaOxidBindings,
     [out, ref] IPID            *pipidRemUnknown,
     [out, ref] DWORD           *pAuthnHint,
     [out, ref] COMVERSION      *pComVersion
     );

DCOM Garbage Collection
      While a distributed system may offer excellent availability and protection against catastrophic failure, the probability of a failure somewhere in the system is greatly increased. From a client's perspective, a failure of the network or the server will be identified by the failure of a remote method call. In such cases, an HRESULT value such as RPC_S_SERVER_UNAVAILABLE or RPC_S_CALL_FAILED will be returned.
      A more complicated situation exists on the server if a client fails. The failure of a client may or may not wreak havoc on the server. For example, a stateless object that always remains running and that simply gives out the current time to any client that asks will not be affected by the loss of a client process. An object that does maintain state for its clients will obviously be very interested in the death of one of those clients. Such objects typically have a method, called something like ByeByeNow, that clients call before calling Release on the proxy object. But the client may never have the opportunity to notify the object of its intentions if it or the network fails. This will leave the server in an unstable state, since it is still maintaining information for clients that may no longer exist.
      RPC deals with this situation using a logical connection between the client and the server processes called a context handle. If the connection between the two processes is broken for any reason, a special function called a rundown routine can be invoked on the server to notify it that a client connection has been broken. For performance reasons, DCOM does not use RPC context handles. Instead, the ORPC protocol defines a pinging mechanism that determines if a client is still alive. A pinging mechanism is quite simple. Every so often, the client sends a ping message to an object saying "I'm alive, I'm alive!" If the server does not receive a ping message for a certain period, then the client is assumed to have died and all its references are freed.
      The simplistic type of pinging algorithm described above is not sufficient for DCOM because it creates too much network traffic. In a distributed environment where there might be hundreds, thousands, or hundreds of thousands of clients and components in use, network capacity could be overwhelmed simply by the number of ping messages being transmitted. To reduce network traffic, DCOM relies on the OXID Resolver service running on each machine to detect whether its local clients are alive, and then send a single ping message on a per-machine basis instead of per-object.
      Even with one message being sent to each computer, ping messages may still grow quite hefty, as the ping data for each OID is 16 bytes. For example, if a client computer held 5000 object references to objects running on another machine, each ping message would be approximately 78KB! To further reduce the amount of network traffic, DCOM introduces a special mechanism called delta pinging. Often, a server has a relatively stable set of objects that are used by clients. With delta pinging, instead of including data for each individual OID in the ping message, a set of OIDs are pinged by a single identifier called a ping set that refers to all the OIDs in that set. When delta pinging is employed, the ping message for five OIDs is the same size as that for one million OIDs.
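      The arithmetic behind these figures is easy to check. The snippet below is a back-of-the-envelope sketch: the 16 bytes per OID comes from the text above, while the 8-byte size of a ping set identifier is our assumption, and protocol headers are ignored entirely.

```cpp
#include <cstddef>

constexpr std::size_t kBytesPerOid = 16;   // per-OID ping data, as quoted in the text
constexpr std::size_t kBytesPerSetId = 8;  // assumption: one 64-bit set identifier

// Payload of a ping that lists every OID individually.
constexpr std::size_t FullPingBytes(std::size_t oidCount) {
    return oidCount * kBytesPerOid;
}

// Payload of a delta ping: one set identifier, however many OIDs it covers.
constexpr std::size_t DeltaPingBytes(std::size_t /*oidCount*/) {
    return kBytesPerSetId;
}

constexpr double ToKilobytes(std::size_t bytes) {
    return static_cast<double>(bytes) / 1024.0;
}

static_assert(FullPingBytes(5000) == 80000, "5000 OIDs at 16 bytes each");
static_assert(DeltaPingBytes(5) == DeltaPingBytes(1000000),
              "a delta ping does not grow with the size of the set");
```

      Dividing 80,000 bytes by 1,024 gives 78.125, which is the approximately 78KB quoted above.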
      To establish a ping set, the client calls the IOXIDResolver::ComplexPing method. The AddToSet parameter of the ComplexPing method accepts an array of OIDs that should define the ping set. Once defined, all the OIDs in the set can be pinged simply by calling IOXIDResolver::SimplePing and passing it the SETID value returned from the ComplexPing method. The ComplexPing method can be called again at any time to add or remove OIDs from the ping set.
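      The interplay between the two methods can be modeled in a few lines. This is a sketch under stated assumptions rather than the real IOXIDResolver implementation: PingSetTableModel and its methods are hypothetical, the RPC handle and sequence number parameters are omitted, time is a plain integer supplied by the caller, and we assume that passing a set identifier of zero asks the server to create a new set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using OidModel = std::uint64_t;    // stand-in for an OID
using SetIdModel = std::uint64_t;  // stand-in for a SETID

struct PingSetModel {
    std::set<OidModel> oids;
    long lastPing = 0;  // time of the most recent ping, in arbitrary units
};

// Hypothetical server-side table of ping sets.
class PingSetTableModel {
public:
    // Modeled on ComplexPing: creates a set when setId is 0, then applies
    // the additions and removals. Counts as a ping for the whole set.
    SetIdModel ComplexPing(SetIdModel setId,
                           const std::vector<OidModel>& addToSet,
                           const std::vector<OidModel>& delFromSet,
                           long now) {
        if (setId == 0) setId = nextId_++;
        assert(setId != 0);  // zero is reserved to mean "no set yet"
        PingSetModel& s = sets_[setId];
        for (OidModel oid : addToSet) s.oids.insert(oid);
        for (OidModel oid : delFromSet) s.oids.erase(oid);
        s.lastPing = now;
        return setId;
    }

    // Modeled on SimplePing: refreshes every OID in the set with one identifier.
    bool SimplePing(SetIdModel setId, long now) {
        auto it = sets_.find(setId);
        if (it == sets_.end()) return false;  // unknown set
        it->second.lastPing = now;
        return true;
    }

    std::size_t SetSize(SetIdModel setId) const {
        auto it = sets_.find(setId);
        return it == sets_.end() ? 0 : it->second.oids.size();
    }

    long LastPing(SetIdModel setId) const {
        auto it = sets_.find(setId);
        return it == sets_.end() ? -1 : it->second.lastPing;
    }

private:
    std::map<SetIdModel, PingSetModel> sets_;
    SetIdModel nextId_ = 1;
};
```

      Note that the argument list of SimplePing is the same size whether the set holds three OIDs or three million, which is the whole point of delta pinging.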
      The pinging mechanism activates the garbage collection based on two values: the time that should elapse between each ping message and the number of ping messages that must be missed before the server can consider the client MIA. Taken together, the product of these two values determines the maximum amount of time that can elapse without a ping message being received, before the server assumes the client is dead. By default the ping period is set to 120 seconds; three ping messages must be missed before the client can be presumed dead. At the present time these default values are not user-configurable. Thus, 6 minutes (3 x 120 seconds) must elapse before a client's references are implicitly reclaimed. Once the OXID Resolver on the server machine decides that an OID is to be run down, the OID's stub manager is destroyed. The object itself is thus notified that it has no external references. It may have internal (in-apartment) references, in which case it may elect to stay alive. Usually the object will destroy itself at this point, thus reclaiming resources allocated to the client.
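      The timeout calculation can be written down directly. The constants below come from the defaults quoted above; the function names are hypothetical, times are whole seconds supplied by the caller, and treating exactly 360 seconds of silence as expired is our assumption about the boundary.

```cpp
constexpr unsigned long kPingPeriodSeconds = 120;       // default ping period
constexpr unsigned long kMissedPingsBeforeRundown = 3;  // pings that must be missed

// Longest silence the server tolerates before reclaiming references.
constexpr unsigned long RundownTimeoutSeconds() {
    return kPingPeriodSeconds * kMissedPingsBeforeRundown;
}

// True when a client last heard from at lastPingSeconds should be presumed
// dead at nowSeconds. Assumes nowSeconds >= lastPingSeconds.
constexpr bool ClientPresumedDead(unsigned long lastPingSeconds,
                                  unsigned long nowSeconds) {
    return nowSeconds - lastPingSeconds >= RundownTimeoutSeconds();
}

static_assert(RundownTimeoutSeconds() == 360, "3 x 120 seconds = 6 minutes");
```

      A server applying this check would compare the result against the last ping time recorded for each ping set and run down the OIDs of any set whose client has gone silent.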
      Some stateless objects, like the time server example discussed previously, have no need for the DCOM garbage collection mechanism. These objects usually run forever and don't really care about a client after a method call has finished executing. For such objects, the pinging mechanism can be switched off by passing the MSHLFLAGS_NOPING flag to the CoGetStandardMarshal function. Figure 18 shows how to use the MSHLFLAGS_NOPING flag in an implementation of the IClassFactory::CreateInstance method.
      Just before exiting, the object should execute the following code to free the standard marshaler:

 pMarshal->DisconnectObject(0);
 pMarshal->Release();
      Objects for which the MSHLFLAGS_NOPING flag has been specified will never receive calls to their IUnknown::Release methods. Clients may call Release, but such calls will not be remoted to the object itself. Because of the highly efficient delta pinging mechanism, turning off pinging for an object will typically not cause a corresponding reduction in network traffic. As there are other objects on the server that do require ping messages, DCOM must still send a ping message to that machine by calling the IOXIDResolver::SimplePing method. The only difference is that the object that has specified the MSHLFLAGS_NOPING flag will not be added to the SETID that is being pinged.

A Remote Method Call
      With an understanding of the ORPC network protocol under our belts, let's examine the data transmitted across the network during an actual remote method invocation. Figure 19 shows the Request PDU sent when the client process calls the ISum::Sum method. Immediately following the ORPCTHIS parameter are the x and y inbound parameters of the Sum method. Here the Sum method has been called with the values 4 and 9.
      After the Sum method executes on the server, the Response PDU is generated and sent back to the client (see Figure 20). Clearly visible following the ORPCTHAT parameter in the response PDU is the outbound value of 13 (4 + 9).
      Once you understand how COM marshals interface pointers in the context of the DCOM network protocol, examining individual method calls becomes a simple exercise. The network packets in the figures will be very similar for most method calls, with the exception of the actual arguments transmitted. We hope this article has given you a taste for what goes on under the covers of DCOM. Large parts of the DCOM network protocol covered in this article are not configurable. In Windows NT 5.0, new COM functions and standard interfaces have been introduced to allow developers to control these options directly.

From the March 1998 issue of Microsoft Systems Journal.