May 1996
Introducing Distributed COM and the New OLE Features in Windows NT 4.0
Don Box is a co-founder of DevelopMentor where he manages the COM curriculum. Don is currently breathing deep sighs of relief as his new book, Essential COM (Addison-Wesley), is finally complete. Don can be reached at http://www.develop.com/dbox/default.asp.
The wait is over. The most highly anticipated release of OLE is now in beta, and will be released upon all of humanity within a few months of your reading this. I'm not talking about Cairo; I'm talking about Windows NT 4.0.
The most exciting OLE-related feature in Windows NT 4.0 will likely be Distributed COM (DCOM). This means that in this version of Windows NT you can instantiate and bind to objects across the network. Windows NT 4.0 introduces many other features that represent a maturation of the API, most of which have nothing to do with sending packets on a network. The key highlights of the Windows NT 4.0 release of OLE include:
Ground Zero: MIDL
The attributes attr1 and attr2 apply to the variable var1. Except for the extended attributes, writing IDL is much like writing a standard C or C++ header file. IDL files typically consist of one or more interface definitions. Each interface definition contains the structure and enumeration statements used by the interface and definitions of each method exported by the interface. Figure 1 shows an IDL file that defines a simple COM interface named IPager.
Note that, in addition to describing the vtable signature of the interface used for in-process implementations, the IDL file also describes the RPC messages, or Protocol Data Units (PDUs) as they're typically referred to by network gurus. PDUs are used to remote the method calls to objects in different address spaces (which may be on different host machines). Each method in an interface defines two messages or PDUs. These messages are sent back and forth between the MIDL-generated proxy and stub when the target object exists in a different thread, process, or host machine. The Request PDU is sent by the client-side proxy to the stub to invoke the object's method, and it contains the values of any parameters with the [in] attribute. The Response PDU is sent as a reply from the stub back to the proxy to indicate that the method call executed. The Response PDU contains the values of any parameters with the [out] attribute.
Based on the method definitions in the IDL file, the MIDL compiler can generate proxies and stubs that translate between the stack frame and the Request and Response PDUs correctly. I don't have enough space to explain all the subtleties of IDL here, but Figure 2 describes some of the common IDL attributes used in COM.
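To make the Request/Response split concrete, here is a minimal C++ sketch of what a proxy and stub do for one method. It is a model only: the Add method, the Pdu type, and the helper names are hypothetical, the "network" is a direct function call, and real MIDL-generated code uses NDR encoding rather than a flat vector. Returning the method's result code in the Response alongside the [out] values is also an assumption of this sketch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical method modeled here (not the article's IPager):
//   HRESULT Add([in] long a, [in] long b, [out] long* sum);
// A PDU is modeled as a flat list of 32-bit values.
using Pdu = std::vector<std::int32_t>;

// The object's real implementation, living in the server's address space.
std::int32_t AddImpl(std::int32_t a, std::int32_t b, std::int32_t* sum) {
    *sum = a + b;
    return 0;  // stands in for S_OK
}

// Stub: unpack the Request PDU, call the object, build the Response PDU.
Pdu StubInvoke(const Pdu& request) {
    std::int32_t sum = 0;
    std::int32_t hr = AddImpl(request[0], request[1], &sum);
    return Pdu{hr, sum};  // result code followed by the [out] values
}

// Proxy: exposes the same signature the client would call in-process.
std::int32_t ProxyAdd(std::int32_t a, std::int32_t b, std::int32_t* sum) {
    Pdu request{a, b};                   // Request PDU carries only [in] values
    Pdu response = StubInvoke(request);  // stands in for the RPC round trip
    *sum = response[1];                  // Response PDU carries the [out] values
    return response[0];
}
```

The client calls ProxyAdd exactly as it would call the object directly, which is the transparency the proxy/stub pair exists to provide.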
The new MIDL 3.0 compiler bundled with the Win32® SDK incorporates the functionality of MKTYPLIB, the ODL compiler used to generate type libraries (binary descriptions of interfaces and implementations used by development environments such as Visual Basic® and Visual C++). By extending IDL to support ODL keywords and constructs, MIDL 3.0 eliminates the need to use different languages for proxy/stub implementations and type libraries. This also means that many of the extremely useful features of IDL (like inserting code into the generated header files and producing const-correct parameter lists) are now available when defining type libraries for OLE Automation.
To generate a type library with MIDL, your IDL file must contain a type library definition. Unlike ODL, IDL files can contain definitions outside the scope of a library definition. (In ODL, the library statement must be the topmost definition in the file.) Like the previous version, MIDL 3.0 will generate C/C++ bindings and proxy/stub implementations for all of the interface definitions in the file (proxy/stub implementations can be suppressed by using the local attribute). The generated type library will contain only type descriptions for interfaces that are defined or referred to within the scope of a library definition. For the IDL file shown in Figure 3, the generated type library contains only descriptions of IInside (since it was defined inside the scope of the library statement) and IOutside (since there is a reference to IOutside by a statement inside the library). Nothing inside the library statement refers to IOutside2, so it is not present in the generated type library.
MIDL is the future, and MKTYPLIB is the past. You will be compiling your type libraries with MIDL very soon. The MIDL 3.0 compiler can compile most existing ODL files with little or no modification, but there are several incompatibilities related to ODL and MKTYPLIB.
The ODL Boolean data type differs from the IDL Boolean data type. ODL treats Boolean and BOOL as VARIANT_BOOL, while IDL treats Boolean as an unsigned char and BOOL as a long. If you are looking for Visual Basic compatibility (Automation-compliant Booleans), IDL supports the VARIANT_BOOL type directly.
The scope of typedef names and structure/union/enum tags is different. Given three enum statements (a typedef with the tag tagWIDTH and the typedef name WIDTH, a tagless typedef named HEIGHT, and an IDL-style enum definition named COLOR), the MIDL-generated type library differs from the MKTYPLIB-generated library. (In fact, MKTYPLIB will not even compile the third statement.) The first statement is legal IDL and ODL, but MKTYPLIB ignores the enum tag name (tagWIDTH) and generates a TKIND_ENUM entry in the type library based on the typedef name WIDTH. Like C++, MIDL assumes the tag is the actual type name and generates a TKIND_ENUM entry for tagWIDTH and a TKIND_ALIAS entry for the name WIDTH. If no tag is present (as is the case in the second statement), MIDL creates a unique name for its TKIND_ENUM entry and generates a TKIND_ALIAS entry that refers to it (in this case, HEIGHT). Since MKTYPLIB ignores tags, it generates a single TKIND_ENUM entry for HEIGHT. The third statement shows the IDL-style definition, which is similar to C++. The generated type library would contain a single TKIND_ENUM entry for COLOR, which is the desired result. Unfortunately, MKTYPLIB does not recognize this syntax, which presents a problem if you need to maintain backwards compatibility with older development environments.
ODL-generated header files use the DEFINE_GUID macro to declare and define the GUIDs used in the ODL file. IDL-generated files simply declare the GUIDs as extern in the header file, and define them separately in a generated C file (xxx_i.c).
ODL supported both integral and floating point constants. IDL supports integral constants only.
The scope of an enumeration name was local to the enum under ODL. Like C and C++, enumeration names have global scope in IDL. ODL therefore accepts a typedef whose name is reused as one of its own enumeration names; IDL prohibits reuse of the typedef name as an enumeration name.
To support legacy ODL files, MIDL can compile in MKTYPLIB compatibility mode. (It's enabled by the command line switch /mktyplib203.) When using this mode, your file must comply with the old ODL syntax and proxy/stub implementations are not supported. Fortunately, the C preprocessor is supported by both MIDL and MKTYPLIB, so differences between the two can be addressed using conditional compilation.
Threading and More Threading
Under the Apartment model, each thread that will use OLE must call CoInitializeEx or CoInitialize (which is now shorthand for calling CoInitializeEx with COINIT_APARTMENTTHREADED as the second parameter). If the process is to be freethreaded, only one thread needs to call CoInitializeEx with the COINIT_MULTITHREADED flag. In freethreaded processes, objects created by any thread are freethreaded and method calls are not serialized in any way. As we go to press, the threading model is an attribute of a process, which means all threads within a single process must share the same threading model (freethreading or Apartment). Calls to CoInitializeEx with a model other than that used by the first thread will fail with an HRESULT value of RPC_E_CHANGED_MODE. Support for mixed-model processes is planned for a future OLE release.
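The process-wide rule just described (the first initializing thread picks the model, and later mismatches fail) can be modeled in a few lines of portable C++. This is a sketch of the rule only, not of CoInitializeEx itself: the class and enum names are hypothetical, per-thread bookkeeping and locking are deliberately left out, and only the numeric value of RPC_E_CHANGED_MODE is taken from the Win32 headers.

```cpp
#include <cstdint>

enum class Model { Unset, Apartment, Free };

const std::uint32_t kOk = 0x00000000u;           // S_OK
const std::uint32_t kChangedMode = 0x80010106u;  // RPC_E_CHANGED_MODE

// Hypothetical stand-in for the state the OLE runtime keeps per process.
class ProcessThreadingState {
public:
    // Models one thread's initialization call. The first caller fixes the
    // process-wide model; any later call must request the same model.
    // (Callers are expected to pass Apartment or Free, never Unset.)
    std::uint32_t Initialize(Model requested) {
        if (model_ == Model::Unset) {
            model_ = requested;
            return kOk;
        }
        return model_ == requested ? kOk : kChangedMode;
    }

    Model model() const { return model_; }

private:
    Model model_ = Model::Unset;
};
```

A second thread that asks for the freethreaded model in a process whose first thread chose Apartment gets kChangedMode back, which is the failure the text describes.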
Freethreaded OLE is actually a simplification of the Apartment model. As Figure 6 illustrates, when an incoming method request arrives at the RPC layer, the RPC receive thread that receives the request calls into the stub's Invoke method directly. There's no thread switch and no message queue. Since the object was marshaled as a freethreaded object originally, no guarantees are made about which thread would invoke the method.
On the client side, when a freethreaded program invokes a method on a proxy, no additional thread is needed to make the blocking RPC call. The client thread can block since there were no guarantees on which thread would service incoming method requests for existing objects created by the client. If incoming calls arrive while the client is blocked, the RPC receive thread that receives the PDU calls into the stub directly. This implies that nested calls (or callbacks) are always executed by different threads and that any thread local storage used by the originating client thread is not available in the thread of the nested call.
In the current release of Windows NT 4.0, the client's threading model does not have to match the threading model of local or remote servers accessed by the client. This makes sense since the client and server control their own concurrency. In the case of Inproc servers and handlers, the DLL must be loaded into the client's world and be capable of correct operation in the client's threading model. The registry supports the named value "ThreadingModel" which must be present at the InprocServer32 or InprocHandler32 key for each CLSID exported by the DLL, so inproc implementations can advertise their level of concurrency support. The supported values of ThreadingModel are Apartment (client must be an apartment model process), Free (client must be a freethreaded process), and Both (client can be either apartment or freethreaded). Needless to say, there is also support for single-threaded DLLs.
Absence of the ThreadingModel value implies a single-threaded DLL. Single-threaded DLLs can be used by only one thread in an apartment process. DLLs written before the Apartment model (that have no ThreadingModel value) are not guaranteed to be thread safe. In the prerelease version of Windows NT 4.0 available as I write this, there is no support for loading DLLs with threading models that are incompatible with the client's threading model. Instead, the call to CoGetClassObject or CoCreateInstance simply fails. The following is an example of a REG file for a free and Apartment-safe Inproc server:
Of course, modern implementations would not use REG files, but would call RegSetValueEx from within the DLL's DllRegisterServer function.
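The ThreadingModel compatibility rules described above reduce to a small predicate. The following C++ sketch models the prerelease behavior as the text states it; the function and enum names are hypothetical, an empty string stands for a missing ThreadingModel value, and the "only one thread" restriction on single-threaded DLLs is noted in a comment rather than enforced. Treating an unrecognized value as incompatible is an assumption of this sketch.

```cpp
#include <string>

enum class ClientModel { Apartment, Free };

// Returns true when an inproc DLL advertising `threadingModel` could be
// loaded into a client process that uses `client`. An empty string means
// the ThreadingModel value is absent (a single-threaded DLL).
bool CanLoadInproc(ClientModel client, const std::string& threadingModel) {
    if (threadingModel == "Both") return true;
    if (threadingModel == "Apartment") return client == ClientModel::Apartment;
    if (threadingModel == "Free") return client == ClientModel::Free;
    if (threadingModel.empty()) {
        // Single-threaded DLL: apartment processes only, and (not modeled
        // here) usable by just one thread within that process.
        return client == ClientModel::Apartment;
    }
    return false;  // unrecognized value: treated as incompatible (assumption)
}
```

When the predicate is false, the behavior described in the text is that CoGetClassObject or CoCreateInstance simply fails.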
Security
The implementation of OLE in Windows NT 3.51 has very little support for security. Local servers running under Windows NT 3.51 use the same permissions (access token) as the interactive user. Also, any CLSID that's visible in the registry can be instantiated irrespective of the current login. This makes it difficult to write an object implementation that performs privileged operations (like reading sensitive files and manipulating system configurations) without compromising security.
If the call succeeds and the value is "Y" or "y," the client can instantiate the object. If the call fails due to a security violation or if the key has any other value than "Y" or "y," the SCM assumes that the caller does not have permission to launch the server and CoCreateInstance fails. If the RegQueryValue fails because the LaunchPermission key is not present, then this class has no per-class launch permissions and the default launch permission for this machine must be consulted.
If a CLSID does not have a LaunchPermission subkey, the machine-wide DefaultLaunchPermission key should be used to verify the client's credentials.
The SCM will try to read this key while running with the client's access token. If the call to RegQueryValue succeeds and the value is "Y" or "y," then the SCM assumes that the client can instantiate the object. If the call fails because of a security violation, or it returns any other value, then the SCM assumes that the client does not have permission to launch the server and the SCM fails the call to CoCreateInstance. If this key is missing altogether, the SCM assumes the client does not have permission to launch and CoCreateInstance fails.
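The two-level check (per-class LaunchPermission first, then the machine-wide DefaultLaunchPermission) is easy to get subtly wrong, so here is a minimal C++ sketch of the decision as the text describes it. The types and function names are hypothetical; each KeyResult stands for the outcome of a registry read performed under the client's access token.

```cpp
#include <string>

// Outcome of trying to read a permission key with the client's token.
enum class KeyRead { Missing, AccessDenied, Ok };

struct KeyResult {
    KeyRead status;
    std::string value;  // meaningful only when status == KeyRead::Ok
};

static bool IsYes(const std::string& v) { return v == "Y" || v == "y"; }

// Returns true when the client may launch the server.
bool ClientMayLaunch(const KeyResult& perClass, const KeyResult& machineDefault) {
    // A per-class key that exists is decisive: readable and "Y"/"y"
    // grants the launch; a security failure or any other value denies it.
    if (perClass.status != KeyRead::Missing) {
        return perClass.status == KeyRead::Ok && IsYes(perClass.value);
    }
    // No per-class key: fall back to the machine-wide default. A missing
    // default key denies the launch.
    return machineDefault.status == KeyRead::Ok && IsYes(machineDefault.value);
}
```

Note that the default key is consulted only when the per-class key is absent; a per-class key the client cannot read denies the launch outright.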
Once the SCM verifies that the client has permission to launch, it must create the server process if it is not already running. By default, it starts the server using the same login and permissions as the client. This default behavior is reasonable because it does not require any explicit participation from either the client or object to achieve the correct, secure behavior. Upon creation, the window station and desktop of the server process are the same as the client's.
Windows NT supports multiple window stations and desktops. This prevents system-controlled processes from accessing the user interface of the interactive login and decouples the lifetime of a process from the current login session. Normal OLE client applications that are started interactively by the user run under the interactive window station, and any servers that are started can create windows, access the clipboard, and behave as if they were also started interactively. In short, Word and Microsoft® Excel work fine when launched from the Windows® Explorer.
You may find it useful to run a server as a particular user account, to either enhance or restrict the operations that the object can perform. OLE now supports the RunAs subkey, so the object implementation can run the server as a given user.
When this key is present, the default value contains a user name (it can be qualified with a domain name). The correct password must also be present in a separate, private area of the registry that cannot be manipulated by using RegEdt32. (At this time, these keys must be added using the olecnfg tool, a simple registry manipulation tool that ships with the prerelease version of DCOM. API support should be available in a subsequent release.) If the RunAs key is present, the SCM creates the server process to run as the specified user. Note that the server runs in a different desktop and window station from the current interactive user. With separate desktops, when the current interactive user logs off, the server is not forced to shut down. The downside is that you cannot display any windows or collect user input without considerable effort. You can get UI-intensive objects to interact with the user by configuring the object's CLSID to run using the access permissions, desktop, and window station of the current interactive user.
While it is useful to specify which user account the server should use, you can also impersonate the client's access rights dynamically for the duration of a method call. To support this, COM provides two API functions, CoImpersonateClient and CoRevertToSelf.
You can call these functions only while a thread is servicing a method on behalf of a client. Like RpcImpersonateClient, ImpersonateNamedPipeClient, and DdeImpersonateClient, they toggle the access token between the native access token and the client's access token based on a connection to a process and thread. To see how this functionality is useful, consider a COM class that lets clients make file system calls and logs the results in a secure logfile. To let objects write to the secure logfile, you must configure the CLSID to run as a privileged user. However, when actually making file system calls on behalf of the client, the object should not allow operations to succeed unless the client could perform the operation with native API calls directly. Figure 7 shows how a method on such an object would be implemented.
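Because impersonation must always be undone before the method returns, a scope guard is a natural way to structure such a method. The following portable C++ sketch shows the pattern only; ImpersonationScope is a hypothetical helper, and the two callables are placeholders for the real impersonate and revert calls (CoImpersonateClient and CoRevertToSelf in the Win32 API), which are not invoked here.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII helper: runs `impersonate` on entry and, if that
// succeeded, guarantees `revert` runs when the scope is left, even on
// an early return or an exception.
class ImpersonationScope {
public:
    ImpersonationScope(std::function<bool()> impersonate,
                       std::function<void()> revert)
        : revert_(std::move(revert)), active_(impersonate()) {}

    ~ImpersonationScope() {
        if (active_) revert_();
    }

    ImpersonationScope(const ImpersonationScope&) = delete;
    ImpersonationScope& operator=(const ImpersonationScope&) = delete;

    // True when the client's identity is currently in effect.
    bool active() const { return active_; }

private:
    std::function<void()> revert_;
    bool active_;
};
```

In a method like the one described, the file system call would go inside the guarded scope so it runs with the client's rights, while the write to the secure logfile would happen after the scope ends, under the server's own privileged account.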
While the security functionality described here is sufficient for most OLE clients and objects, you can control access to your objects with much more detail. OLE supports interfaces and API functions for controlling message integrity, encryption, and other security attributes. Consult the Win32 SDK documentation for more details.
Windows NT Services
Windows NT Services are normal Win32 processes that run independent of user logins. Services can be configured to run as a particular user, to start on demand or at boot time, and to shut down or pause. Services can be configured to start other Services that are required for correct operation. Services are identified by name, not by the path name of their executable. Services share a single API and control panel for controlling execution on local and remote hosts. In short, Services are great for implementing functionality that needs to be available at all times.
Note that the LocalService key value is not an absolute path name, but instead is a simple Service name suitable for passing to OpenService. OpenService consults the registry to resolve the Service name to a path name. If the named value ServiceParameters is present, it is used to construct the command line passed to the Service's ServiceMain routine. These registry keys assume that the Service has been installed properly through the CreateService API function.
Remote Instantiation
Windows NT 4.0 provides the ability to instantiate objects on remote machines and to bind to objects running on remote machines. In keeping with COM's tradition of in-proc/out-of-proc transparency, you can access objects that exist on remote hosts as if they were instantiated in your own address space. Because COM is a binary object model, it is possible to add support for DCOM without recompiling either the client or the object. To understand how this is accomplished, reexamine the most fundamental COM API function, CoGetClassObject.
The first two parameters are used by the SCM to locate the implementation of the object. The dwClsCtx parameter lets the client indicate a preference as to loading the object in-process or out-of-process. The dwClsCtx parameter is a bit-mask that supports the following values:
Based on the bits set in dwClsCtx, the SCM consults the appropriate keys in the registry and loads the appropriate implementation. Figure 8 shows the algorithm used by the SCM to load an implementation via CoGetClassObject or CoCreateInstance.
The implementation of CoGetClassObject and CoCreateInstance in Windows NT 4.0 supports an additional subkey that allows administrators to specify the machine on which a CLSID should be instantiated.
In the current implementation, if no inproc or local implementations are found using the algorithm in Figure 8, the SCM attempts to read the RemoteServerName key. If the SCM finds the RemoteServerName key, it contacts the SCM on the specified host (in this case, da.ics.uci.edu) and requests that the class implementation be loaded as a local server. When the remote SCM successfully launches the server, the interface pointer passed by the server's call to CoRegisterClassObject is marshaled across the network back to the client. From that point on, the client communicates with a remote host machine each time a method call is invoked.
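The fallback to RemoteServerName can be pictured as one more step at the end of the SCM's lookup. The C++ sketch below models that lookup under stated assumptions: the order of the inproc and local checks is a guess (Figure 8 is not reproduced here), the struct and function names are hypothetical, and only the three CLSCTX bit values are taken from the Win32 headers.

```cpp
#include <cstdint>
#include <string>

// Bit values of the corresponding CLSCTX_* flags in the Win32 headers.
const std::uint32_t kInprocServer  = 0x1;  // CLSCTX_INPROC_SERVER
const std::uint32_t kInprocHandler = 0x2;  // CLSCTX_INPROC_HANDLER
const std::uint32_t kLocalServer   = 0x4;  // CLSCTX_LOCAL_SERVER

// Hypothetical summary of what the registry holds for one CLSID.
struct ClassRegistration {
    bool hasInprocServer32;
    bool hasInprocHandler32;
    bool hasLocalServer32;
    std::string remoteServerName;  // empty when the key is absent
};

enum class Activation { InprocServer, InprocHandler, LocalServer,
                        RemoteServer, NotRegistered };

// Assumed lookup order: inproc server, inproc handler, local server,
// and only then the RemoteServerName fallback described in the text.
Activation Resolve(std::uint32_t clsCtx, const ClassRegistration& reg) {
    if ((clsCtx & kInprocServer) && reg.hasInprocServer32)
        return Activation::InprocServer;
    if ((clsCtx & kInprocHandler) && reg.hasInprocHandler32)
        return Activation::InprocHandler;
    if ((clsCtx & kLocalServer) && reg.hasLocalServer32)
        return Activation::LocalServer;
    // The remote fallback is not gated on a client-supplied bit here,
    // since unmodified legacy clients are meant to benefit from it.
    if (!reg.remoteServerName.empty())
        return Activation::RemoteServer;
    return Activation::NotRegistered;
}
```

With a registration that has no inproc or local entries but names a remote host, Resolve reports RemoteServer regardless of the bits the client passed, which is what lets an administrator redirect an existing client.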
This technique for specifying host names for CLSIDs lets legacy applications communicate with remote objects without recompilation. It also allows you to specify the location of the implementation at installation time instead of hard-coding it into the application. For clients that want to launch a remote server, the CLSCTX_REMOTE_SERVER value was added to the API for DCOM.
The CLSCTX_ALL and CLSCTX_SERVER macros now include this bit as well. The following structure has been defined so the client can indicate the host name explicitly:
The dwSize parameter must contain the structure size and is used for versioning. The pszName member must contain a Winsock-compatible host name.
The CoGetClassObject API function was subtly overloaded to allow clients to specify host names.
In the pre-DCOM version of CoGetClassObject, the reserved third parameter had to be zero. Passing null to the current version simply means that the client does not specify a remote host name. If the third parameter is non-null and points to a valid COSERVERINFO structure, then the SCM on the client machine uses the pszName member to contact the remote SCM that launches the class implementation. The code shown in Figure 9 launches a server on the machine LOLA. Note that the host name can be specified using UNC, DNS, or raw IP addresses.
The original version of CoGetClassObject had a reserved parameter and was easily extended to support explicit host names. CoCreateInstance has no reserved parameters to spare and simply cannot support explicit host names. To support explicit host names in a single API for instantiation, Windows NT 4.0 introduces a new API function, CoCreateInstanceEx.
CoCreateInstanceEx differs from CoCreateInstance in two ways. First, it allows an explicit host name via the COSERVERINFO struct. Second, it lets the client receive more than one interface pointer to the newly created object, eliminating the need for multiple QueryInterface calls to bind each interface. As has always been the case in COM, performance suffers as more trips are made from the client to the object. By using the MULTI_QI array, the number of round-trips to instantiate and access the object can be reduced. Figure 10 shows the process of creating an object using CoCreateInstanceEx. Note that the client only needs one call to get the class factory, call CreateInstance on the class factory, and call QueryInterface for each interface pointer. CoCreateInstanceEx is optimized to perform all three steps in one trip.
Client programs call CoCreateInstance to create new objects. Often, clients want to connect to objects that are already running. This was traditionally accomplished by using monikers and the Running Object Table (ROT). Monikers are COM objects that identify particular instances of some COM class. The ROT is a directory service maintained by the SCM. The ROT maps monikers onto the running instances they identify. Objects that want to be found in this manner must register their moniker in the ROT. Typically, this is a file moniker that identifies the object's persistent state. If a client needs to connect to the object, it creates a file moniker and binds to it by calling either its IMoniker::BindToObject method directly or the BindMoniker API function.
If the object's moniker was registered in the ROT, the moniker returns a new interface pointer to the object by calling the object's QueryInterface method. If the object is not running, then the file moniker instantiates the object and instructs it to initialize from the specified file via the object's IPersistFile::Load method. Once initialized, the moniker would QueryInterface for the pointer to return to the client. Once loaded, the object should register itself in the ROT so future bindings on the same moniker yield the same instance.
Irrespective of the file's location, in Windows NT 3.51 the object would always be instantiated on the client's machine. This is still the default behavior under Windows NT 4.0. Since this is the way COM worked in the past, it's a reasonable default. However, if the object uses incremental reads and writes, the performance costs for doing file I/O to the remote file system could swamp any advantages gained by instantiating on the client's host machine. Additionally, if clients on different machines bind to the same file moniker, there will be multiple instances, violating the object identity of the file.
You can avoid these problems through the use of the ActivateAtStorage registry key. In the implementation of the file moniker's BindToObject method, it first finds the CLSID of the file by calling the API function GetClassFile. If the CLSID does not have an ActivateAtStorage subkey in the registry, or if it does have one but its value is something other than "Y" or "y," it is assumed that the object should be activated locally and the normal moniker binding takes place on the client's machine. If the CLSID does have an ActivateAtStorage subkey and its value is "Y" or "y," the object must always be instantiated at the host machine where the file is located. The file moniker extracts the location of the file from the UNC name and then asks the SCM on the remote machine to bind the moniker at the remote host. Once the remote SCM binds the object (by consulting its ROT or by instantiating and loading a new instance), the interface pointer is marshaled back to the client.
If the object was not running already, the instantiation is checked against the client's permissions (as in CoCreateInstance). If the object is running, the CLSID's AccessPermission subkey (or, if that's not present, the host machine's DefaultAccessPermission key) is consulted (much like the use of LaunchPermission and CoCreateInstance).
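The ActivateAtStorage decision boils down to two questions: is the value "Y" or "y", and does the file name carry a host? The following C++ sketch models just that decision. The function name is hypothetical, an empty activateAtStorage argument stands for a missing subkey, and treating non-UNC paths (including mapped drive letters) as local is a simplification of this sketch, not a statement about what the file moniker does.

```cpp
#include <string>

// Returns the host on which the object should be activated, or an empty
// string when activation should take place on the client's own machine.
std::string ActivationHost(const std::string& path,
                           const std::string& activateAtStorage) {
    const bool atStorage =
        (activateAtStorage == "Y" || activateAtStorage == "y");
    if (!atStorage) return "";  // missing or any other value: activate locally

    // UNC names have the form \\host\share\path; anything else is
    // treated as a local file in this sketch.
    if (path.size() < 3 || path.compare(0, 2, "\\\\") != 0) return "";
    const std::string::size_type end = path.find('\\', 2);
    if (end == std::string::npos || end == 2) return "";
    return path.substr(2, end - 2);
}
```

A non-empty result corresponds to the case where the file moniker asks the SCM on that host to perform the bind and marshal the interface pointer back.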
This parallel scheme for verifying moniker binds makes it possible to let clients link to existing objects while prohibiting them from creating new instances.
For efficient bindings to persistent objects, Windows NT 4.0 introduces two new API functions that allow the caller greater flexibility than normal moniker binding. CoGetInstanceFromFile and CoGetInstanceFromIStorage both allow the client to provide:
either a file name (CoGetInstanceFromFile) or an IStorage pointer to a compound file (CoGetInstanceFromIStorage);
a hard-coded CLSID or a default derived from the specified file's content;
an explicit host name using COSERVERINFO or the default based on the RemoteServerName and ActivateAtStorage registry keys;
and an array of MULTI_QI structures to bind all of the interface pointers to the object in one trip.
Like CoCreateInstanceEx, these functions are implemented to perform all operations in one trip between object and client.
Property Persistence
Property sets are useful for serializing the state of an object in a uniform, platform-independent manner so that the contents of the property set can be parsed without requiring the creator to interpret the contents. Property sets serialize the Summary Information used by most Windows-based applications and allow users to examine document attributes, such as author or subject, without opening the file. They are used in OLE Controls as well and ultimately will allow content-based queries across application boundaries.
Property Sets are stored as streams or storages and are uniquely identified by their Format ID GUID (FMTID). If you consult the OLE 2 Programmer's Reference, you will note that a single property set could contain multiple formats or sections, and that the property set is identified by some distinguished stream name ("\005SummaryInformation" for the Summary Information stream). This functionality was redundant in the face of structured storage, so property sets are identified now by their FMTID. It is assumed that, if the property set is stored in a stream, it contains exactly one section. To map the FMTID onto a stream name, the implementations of Create and Open first check for the well-known FMTIDs. (There are two: one for Summary Information, and one for Office Document Summaries.) If the requested FMTID identifies either of these streams, the hard-coded names are used to open the stream that contains the properties. If the requested FMTID is not one of the two well-known names, the FMTID GUID is ASCII-ized (a la uuencode) into a unique string less than 32 characters long that is used to identify the storage or stream containing the persistent properties.
Windows NT 4.0 implements two kinds of property sets: simple and non-simple. A simple property set is implemented in a single stream and contains flat data types only (like ints, floats, strings, blobs). A non-simple property set is implemented as a sub-storage. It contains a single stream, named Contents, that contains the flat properties. Any hierarchical properties (such as nested streams and storages) are stored as siblings to the Contents stream. The property set is created as either simple or non-simple through the use of the PROPSETFLAG_NONSIMPLE flag that's passed to IPropertySetStorage::Create.
Once created or opened, both simple and non-simple property sets are manipulated via their IPropertyStorage interface (see Figure 11). The main goal of this interface is to let clients read and write groups of named properties in the property set efficiently. In a property set, individual properties are identified by both a unique integer (called the PID) and by a human-readable name. To allow accessing properties based on either type of identifier, properties are identified at run time by the PROPSPEC structure.
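The idea behind PROPSPEC (one lookup key that is either a PID or a name) can be modeled in portable C++ as a tagged record plus a small in-memory property set. This is an illustration of the addressing scheme only: PropSpec, SpecKind, and PropertySet are hypothetical stand-ins for PROPSPEC and IPropertyStorage, values are plain strings instead of PROPVARIANTs, and the PID and name used in any example are made up.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Which member of the spec is meaningful, as in PROPSPEC's kind field.
enum class SpecKind { Name, Id };

struct PropSpec {
    SpecKind kind;
    std::uint32_t id;   // used when kind == SpecKind::Id
    std::wstring name;  // used when kind == SpecKind::Name
};

// Toy in-memory property set: every property has a PID and may also
// have a human-readable name that maps onto that PID.
class PropertySet {
public:
    void Write(std::uint32_t id, const std::wstring& name,
               const std::string& value) {
        values_[id] = value;
        if (!name.empty()) ids_[name] = id;
    }

    // Returns true and fills *out when the property exists.
    bool Read(const PropSpec& spec, std::string* out) const {
        std::uint32_t id = spec.id;
        if (spec.kind == SpecKind::Name) {
            auto named = ids_.find(spec.name);
            if (named == ids_.end()) return false;
            id = named->second;  // resolve the name to its PID first
        }
        auto found = values_.find(id);
        if (found == values_.end()) return false;
        *out = found->second;
        return true;
    }

private:
    std::map<std::uint32_t, std::string> values_;
    std::map<std::wstring, std::uint32_t> ids_;
};
```

Either form of PropSpec reaches the same stored value, which is the point of letting callers mix PIDs and names in a single array of specs.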
To access the property value, an extended version of the OLE Automation VARIANT data type is used (see Figure 12). The CA types are counted arrays, which are simply length-prefixed arrays of some data type. As with VARIANTs, when a function passes PROPVARIANTs to the caller as [out] parameters, it is the caller's responsibility to free any resources held by the PROPVARIANT. OLE provides the FreePropVariantArray API function that frees all of the resources held by an array of PROPVARIANTs.
Given the new implementation of property sets, it is trivial to read and write Summary Information streams. Figure 13 illustrates how to read the Author and Title properties from an arbitrary file. Note that the Summary Information format stores its string properties as ANSI strings. As is always the case in OLE, any new property set formats should store their strings as Unicode.
Conclusion
Will Distributed COM change the world and render all other communications technologies obsolete? Perhaps, but I doubt anyone expects it to. The Windows NT 4.0 release of COM represents the next logical step for the technology. Several holes have now been filled. Since one of these holes is networking, people have extremely high expectations given the current sensitivity to all things Internet.
Have a question about programming with ActiveX or COM? Send your questions via email to Don Box at dbox@develop.com or http://www.develop.com/dbox/default.asp
From the May 1996 issue of Microsoft Systems Journal.