May 1999
Windows 2000 Brings Significant Refinements to the COM(+) Programming Model
The next release of Windows NT, Windows 2000, is slated to bring more new features than any of its predecessors. None of these features is likely to have as profound an impact as the integration of Microsoft Transaction Server (MTS) into the COM programming model.
This article assumes you're familiar with COM
This article was written in January 1999 based on a pre-beta 3 version of Windows 2000. Details mentioned here may change prior to the release of Windows 2000. To ensure that readers don't make assumptions based on prerelease information, the author is maintaining a Web page that will track the post-publication changes he wishes he could make to this article due to changes in the product. Please surf to http://www.develop.com/dbox/com2k to see what's changed since January 1999.
Don Box is a co-founder of DevelopMentor, a COM think tank that educates the software industry in COM, MTS, and ATL. Don wrote Essential COM and coauthored the follow-up Effective COM (Addison-Wesley). Reach Don at http://www.develop.com/dbox.
Each release of Windows NT® since version 3.5 has brought refinements to the COM programming model. Windows NT 3.5 introduced the first 32-bit implementation of COM, although only one thread per process could actually do anything with it. Windows NT 3.51 (and Windows® 95) made it possible to use COM from any thread in a process, and introduced the concept of an apartment.
The Windows NT 4.0 release of COM ushered in the new multithreaded apartment type, better integration with Windows NT security, and a new IDL compiler that eliminated the need for separate IDL and ODL files. Windows NT 4.0 also added support for cross-machine activation and method invocation. However, Distributed COM had relatively little impact on the programming model itself, which had supported cross-process invocation from its inception. Security, MIDL 3.0, and the multithreaded apartment (MTA) had far more impact on the COM programming experience than did the ability to perform remote method invocation.
The next release of Windows NT, Windows 2000, is slated to bring more new features than any of its predecessors. The lengthy list of features described in my House of COM column in this issue of MSJ attests to the fact that the COM team at Microsoft has had their fair share of on-campus dinners over the last 24 months. While many of these new features were developed to solve common COM problems faced by developers, none of them is likely to have as profound an impact as the integration of Microsoft® Transaction Server (MTS) into the COM programming model, which is the focus of this article. (For more on COM, see What is COM+? and MTS Anachronisms.)
The Motivation for Change
Looking back to the 1980s and early 1990s, Microsoft (as well as most other major platform vendors) used handle-based APIs to expose a platform's services to application programmers. With each new technology, one or more new handle types would be defined (such as HWND, HSTMT, and HINTERNET) that represented opaque references to resources managed by the underlying technology. Additionally, a DLL would be shipped as part of the technology, exposing functions that took the new handle type as a first parameter. (Typically a CreateXXX function would be provided to allocate the resource that the handle represented.)
This made the programming model fairly easy and regular for C or C++ programmers, who routinely wrote code that followed the same pattern: call the CreateXXX function to obtain a handle, pass that handle as the first parameter to the technology's other functions, and finally release the handle.
In theory, there is nothing wrong with exposing technology in this manner; however, this approach has its downsides. Every C++ programmer who uses the technology will notice three or more functions with the same first parameter and will feel a compulsion to write a wrapper around the technology. Not that there is anything wrong with wrappers, mind you; but if every C++ developer writes their own wrapper, literally hundreds of person-hours will be wasted on wrappers that do little beyond familiarizing each writer with the intricacies of the API.
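The handle-based pattern can be sketched in portable C++. The WIDGET_HANDLE type and the CreateWidget, WidgetAddItem, WidgetCount, and CloseWidget functions are hypothetical stand-ins for a technology like the ones behind HWND or HSTMT. They are not a real Windows API. The point is the shape of the client code: allocate a handle, pass it as the first parameter to every function, then release it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical handle-based API (not a real Windows API).
// The handle is an opaque pointer to a struct the client never inspects.
struct WidgetImpl;
typedef WidgetImpl* WIDGET_HANDLE;

// In a real technology this definition would be private to the DLL.
struct WidgetImpl {
    std::vector<int> items;   // the resource managed by the technology
};

// CreateXXX-style function that allocates the resource.
WIDGET_HANDLE CreateWidget() { return new WidgetImpl(); }

// Every other function takes the handle as its first parameter.
bool WidgetAddItem(WIDGET_HANDLE h, int value) {
    if (!h) return false;
    h->items.push_back(value);
    return true;
}

int WidgetCount(WIDGET_HANDLE h) {
    return h ? static_cast<int>(h->items.size()) : -1;
}

// Releases the resource; the handle must not be used afterward.
void CloseWidget(WIDGET_HANDLE h) { delete h; }

// Typical client code: alloc/use/release.
int UseWidget() {
    WIDGET_HANDLE h = CreateWidget();
    WidgetAddItem(h, 10);
    WidgetAddItem(h, 20);
    int n = WidgetCount(h);
    CloseWidget(h);
    return n;
}
```

Three functions share the same first parameter. That repetition is exactly what tempts every C++ programmer to write a wrapper class.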
A more fundamental problem with handle-based APIs is that they often assume that the entire world programs in C or C++. Any Visual Basic® programmer worth his or her salt knows how to write a Declare statement to allow a DLL to be used from Visual Basic. While it is possible for the technology implementor to ship a .BAS file containing the appropriate Declare statements, what about other languages? In practice, everything beyond C, C++, and perhaps Visual Basic is ignored, as most development organizations can't justify hiring additional programmers who are familiar with every known development environment just to generate additional language bindings. This means that developers who don't work in C or C++ wind up spending nuisance time translating C-based function signatures into a form their development tool can consume.
This problem was solved when Microsoft shipped COM in 1993. As COM has become more accessible and popular, more technologies have been exposed as COM object models and interfaces in lieu of handle-based APIs. Instead of handles and APIs, a technology or platform implementor defines one or more COM interfaces that expose the desired functionality. In most cases, the object model implied by these interfaces gets bootstrapped using CoCreateInstance. This means that the handle-based pattern can be rewritten so that CoCreateInstance replaces the CreateXXX function, method calls on the returned interface pointer replace the functions that took the handle, and a call to Release replaces the function that freed the handle. Note that except for some syntactic differences, the basic programming model of alloc/use/release is unchanged.
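The same hypothetical widget technology, exposed through an interface rather than a handle, might look like the following portable C++ sketch. IWidget is declared by hand rather than generated from IDL. It mimics only the reference-counting part of IUnknown; QueryInterface is omitted. CreateWidgetObject is a hypothetical stand-in for CoCreateInstance.

```cpp
#include <cassert>
#include <vector>

// Hypothetical interface loosely modeled on COM style (AddRef/Release).
// This is plain C++, not the real COM headers.
struct IWidget {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual bool AddItem(int value) = 0;
    virtual int Count() = 0;
protected:
    virtual ~IWidget() = default;   // clients call Release, never delete
};

class Widget : public IWidget {
    unsigned long refs_ = 1;        // the creator owns the initial reference
    std::vector<int> items_;
public:
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long r = --refs_;
        if (r == 0) delete this;    // the object frees itself on the last Release
        return r;
    }
    bool AddItem(int value) override { items_.push_back(value); return true; }
    int Count() override { return static_cast<int>(items_.size()); }
};

// Stand-in for CoCreateInstance: the only creation call the client makes.
IWidget* CreateWidgetObject() { return new Widget(); }

// Same alloc/use/release shape as a handle-based client.
int UseWidgetObject() {
    IWidget* w = CreateWidgetObject();
    w->AddItem(10);
    w->AddItem(20);
    int n = w->Count();
    w->Release();
    return n;
}
```

Methods on the interface pointer take the place of functions whose first parameter was the handle. The object's lifetime is governed by reference counting instead of an explicit close call.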
One of the primary advantages of using COM is that by shipping either an IDL file or a type library, any COM-aware development environment can consume the technology's interface definitions and translate them into something usable in the programmer's language of choice. (Of course, this assumes that the interfaces are simple enough to work in pointer-challenged languages and script.) This allows the technology implementor to focus on getting the technology to work, without being distracted by five different programming languages. Additionally, because of the object-oriented nature of COM, fewer C++ programmers feel compelled to write wrappers and will instead actually write code that solves application-domain problems.
While exposing new functionality through either COM interfaces or API functions is simple and effective, both approaches have some significant drawbacks. First and foremost, the problem with both approaches is that the application programmer must write executable statements to take advantage of the platform or technology. Not that there is anything inherently wrong with executable statements, but it is extremely difficult for the underlying platform to glean your intent if all that is available to analyze is x86 (or Alpha) machine code.
To understand why this is a problem, consider the following scenario. A component developer wants access to his or her object to be serialized. Under Windows NT 4.0 or earlier, there are several ways to do this. The programmer can elect to program against the Win32® critical section API in the method code of his component. Alternatively, the programmer can elect to simply annotate his class as ThreadingModel=Apartment, at which point COM will guarantee that no concurrent access will take place. The first approach uses explicit programming against the platform to achieve the desired effect. The second approach requires no explicit code, as the platform (in this case COM) will implicitly provide the service you want.
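The explicit, in-band approach can be shown in a minimal portable sketch. Here std::mutex stands in for a Win32 critical section (EnterCriticalSection/LeaveCriticalSection), and Counter and RunConcurrently are hypothetical names. Every method has to carry its own locking statement.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// In-band approach: the component serializes access itself.
// std::mutex stands in for a Win32 critical section in this sketch.
class Counter {
    std::mutex lock_;
    long value_ = 0;
public:
    void Increment() {
        std::lock_guard<std::mutex> guard(lock_);   // lock code in this method...
        ++value_;
    }
    long Get() {
        std::lock_guard<std::mutex> guard(lock_);   // ...and again in this one
        return value_;
    }
};

// Calls the object from several threads at once and returns the final count.
long RunConcurrently(int threads, int perThread) {
    Counter c;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&c, perThread] {
            for (int i = 0; i < perThread; ++i) c.Increment();
        });
    for (auto& th : pool) th.join();
    return c.Get();
}
```

Forgetting the guard in any one method silently reintroduces a race. Under this approach, each new method is another chance for that mistake.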
In general, having the platform provide a service is preferable to writing it yourself. One obvious reason is that it reduces development costs, since there should be less application code to write and maintain. A related benefit is that relying on shared infrastructure code and letting the platform do some of the work usually reduces code size and working set. It is also likely that the underlying platform will have gone through more testing than the average piece of application code.
This still doesn't quite explain why the ThreadingModel=Apartment approach is superior to the use of critical sections, since both rely on the platform to do much of the work. The reason is somewhat subtle. The critical sections approach requires small bits of executable code to be sprinkled throughout the component implementation. Each sprinkling, no matter how small, results in another opportunity for a human being to make a mistake. The ThreadingModel=Apartment approach requires zero lines of code. This means that there is virtually no chance that you'll write a buggy line of code that can only be repaired by recompiling your component. This also means that as long as the underlying platform gets it right, your code will be in reasonable shape.
Another advantage of using ThreadingModel=Apartment is related to coupling. Because no code relies on hardcoded function/method signatures or semantics, the platform vendor (in this case Microsoft) has more freedom to change the underlying implementation without breaking your code. In theory, as long as the service semantics don't change, your code will continue to work. The primary advantage of this declarative approach is that it introduces the least possible amount of coupling between the component and the platform. ThreadingModel=Apartment is the canonical example of declaratively indicating your intent through "out-of-band" attributes instead of "in-band" explicit code. This is precisely the idea that permeates the MTS programming model.
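The out-of-band idea can be approximated with a compile-time attribute that platform-side code reads. This is only an analogy, since COM reads ThreadingModel from the registry at activation time rather than from the class's source. In the sketch, PlainCounter declares kSerialize and contains no locking code. The hypothetical PlatformProxy template applies serialization on the component's behalf. It requires C++17 for if constexpr.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Declarative approach (an analogy, not COM itself): the class states its
// requirement as an attribute and contains no locking code at all.
class PlainCounter {
public:
    static constexpr bool kSerialize = true;   // the "attribute"
    void Increment() { ++value_; }             // no lock in method code
    long Get() const { return value_; }
private:
    long value_ = 0;
};

// Hypothetical platform-side wrapper: it reads the attribute and, if asked,
// serializes every call before forwarding it to the object.
template <class T>
class PlatformProxy {
    T object_;
    std::mutex lock_;
public:
    template <class F>
    auto Call(F f) {
        if constexpr (T::kSerialize) {
            std::lock_guard<std::mutex> guard(lock_);
            return f(object_);
        } else {
            return f(object_);
        }
    }
};

long RunDeclarative(int threads, int perThread) {
    PlatformProxy<PlainCounter> proxy;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&proxy, perThread] {
            for (int i = 0; i < perThread; ++i)
                proxy.Call([](PlainCounter& c) { c.Increment(); });
        });
    for (auto& th : pool) th.join();
    return proxy.Call([](PlainCounter& c) { return c.Get(); });
}
```

The locking decision lives in one place, outside the component. Changing how the wrapper serializes calls requires no change to PlainCounter.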
Instead of writing reams of code to take advantage of transactions, serialization, or security, MTS uses declarative attributes to control and extend the behavior of a component. This doesn't mean that you don't have to write any code; rather, it means that the amount of code required to take advantage of these services is dramatically reduced. MTS raises the level of abstraction used in component development primarily through the mechanism of attributed components. Windows 2000 is slated to bring a fundamental shift to COM: the formalization of declarative programming using attributed classes and context as part of the core COM programming model.
The Catalog Manager
If you are going to annotate your classes with attributes, the values of these attributes need to be stored somewhere. While storing attributes as static resources or data inside the component DLL is great for attributes that only the developer can correctly determine (such as transactioning or concurrency management), some attributes are best chosen at deployment time (security IDs and resource thresholds, for example). Rather than relinking the DLL every time the administrator wants to add a new security ID, component configuration information must ultimately be stored separately from the component DLL for ease of modification.
Prior to Windows 2000, MTS stored component configuration information separately from COM's HKEY_CLASSES_ROOT\CLSID key. Configuration management tools (including the MTS Explorer) used the Catalog Manager to manipulate the configuration for a class. As shown in Figure 1, the MTS Catalog Manager was a scriptable component that not only stored the attributes of a class, but also stored the file name of the DLL that contained the class code. This was needed because the Catalog Manager had to overwrite your InprocServer32 entry to point CoCreateInstance to the MTS Executive instead of your code.
It was the job of the MTS Executive to carry out all of the services you had configured prior to forwarding the call to your object (more on this later).
Figure 1 Catalog Managers
Under Windows 2000, things should work more or less the same as they did under MTS. As shown in Figure 1, COM provides a Catalog Manager that controls not only HKEY_CLASSES_ROOT\CLSID, but also an auxiliary configuration database (currently called RegDB). Note that under Windows 2000, the file name of the component DLL can remain under InprocServer32. This is because support for component configuration will be built into COM; no MTS Executive is required. At activation time, CoCreateInstance inspects the catalog to determine which (if any) additional services your class needs and ensures that your new object receives the services it requested.
Not all COM components are registered with the Catalog Manager. When REGSVR32.EXE self-registers a component that simply uses the Registry API to insert an InprocServer32 key and perhaps a ThreadingModel entry, the component is considered a nonconfigured or legacy component. Only components that are explicitly installed into the Catalog can have extended attributes (such as synchronization and transactioning). These components are called configured components.
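The split between configured and nonconfigured classes can be modeled as two lookups at activation time. In this hypothetical sketch, the registry map plays the role of HKEY_CLASSES_ROOT\CLSID and the extended map plays the role of the Catalog Manager's auxiliary database. The class IDs and attribute names are invented for illustration, and RegDB itself is not modeled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Classic registration data: what REGSVR32 self-registration writes.
struct ClassRegistration {
    std::string inprocServer32;   // DLL path stays here under Windows 2000
    std::string threadingModel;   // e.g. "Apartment", "Both", "Neutral"
};

// Hypothetical model of the catalog consulted by CoCreateInstance.
struct Catalog {
    std::map<std::string, ClassRegistration> registry;       // HKCR\CLSID
    std::map<std::string, std::set<std::string>> extended;   // configured classes only

    // A class is configured only if it was explicitly installed in the catalog.
    bool IsConfigured(const std::string& clsid) const {
        return extended.count(clsid) != 0;
    }

    // Extended services the activation must arrange; empty for legacy classes,
    // which get classic COM semantics.
    std::set<std::string> ServicesFor(const std::string& clsid) const {
        auto it = extended.find(clsid);
        return it == extended.end() ? std::set<std::string>{} : it->second;
    }
};
```

A self-registered legacy class appears only in the registry map. A class installed with the Component Services Explorer would also have an entry in the extended map.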
Figures 2 and 3 show the range of configuration options that are planned for Windows 2000. Figure 2 shows the attributes that can be set on a class, interface, and method basis. COM also supports the idea of an application, which is a collection of COM classes that share certain configuration settings. Figure 3 shows these shared attributes and their possible settings.
At the time of this writing, the only way to become a configured component is to use the Component Services Explorer or to explicitly program against the Catalog Manager interfaces. Additionally, except for transactioning (which can be set in your IDL file), the component developer has no control over the initial attributes that will be used when the component is installed in the Catalog Manager. The current plan is to allow components to carry their configuration attributes with them as custom resources to allow the component developer to set the required attributes in their development environment. Exactly how this will work has yet to be fully documented.
Interception Basics
At this point, it should be reasonably clear how a class is configured at development and deployment time. What remains to be seen is how COM will enforce the configured semantics at runtime. The key is interception, a fairly simple concept that can be traced at least as far back as MTS, if not further. Here is the basic idea:
That's it. Read those five items again. The idea described above is one of the cornerstones of modern COM programming and is the focus of the remainder of this article.
To make things concrete, let's examine how these five steps are applied in Windows NT 4.0. Consider a class that resides in a DLL and is registered as ThreadingModel=Apartment.
The key observation is that the proxy exists to intercept method calls and cause them to execute in an environment that keeps the object happy. My use of apartments as the example is an accident of history. As the next section will explain, the idea of partitioning objects into execution scopes far more general than apartments permeates the COM programming model under Windows 2000.
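The interception idea can be reduced to a few lines of portable C++. This is a model, not COM's implementation. A "context" is a plain struct, and the thread-local g_current pointer stands in for the caller's runtime environment. Account, AccountProxy, and ContextSwitch are hypothetical names. The proxy enters the object's context, forwards the call, and restores the caller's context on the way out. A raw reference used from the wrong context runs the method in the caller's environment instead.

```cpp
#include <cassert>
#include <string>

// A context here is just a name; real contexts carry transactions, locks, etc.
struct Context { std::string name; };

thread_local Context* g_current = nullptr;   // the caller's runtime environment

// RAII helper: enter a context now, restore the previous one on scope exit.
struct ContextSwitch {
    Context* saved;
    explicit ContextSwitch(Context* target) : saved(g_current) { g_current = target; }
    ~ContextSwitch() { g_current = saved; }
    ContextSwitch(const ContextSwitch&) = delete;
    ContextSwitch& operator=(const ContextSwitch&) = delete;
};

// An object that only works in the environment it was created for.
class Account {
    Context* home_;
    int balance_ = 0;
public:
    explicit Account(Context* home) : home_(home) {}
    bool Deposit(int amount) {
        if (g_current != home_) return false;   // wrong environment (e.g. no transaction)
        balance_ += amount;
        return true;
    }
    int Balance() const { return balance_; }
    Context* Home() const { return home_; }
};

// The proxy intercepts each call and switches environments around it.
class AccountProxy {
    Account* target_;
public:
    explicit AccountProxy(Account* t) : target_(t) {}
    bool Deposit(int amount) {
        ContextSwitch enter(target_->Home());   // pre-call interception
        return target_->Deposit(amount);        // forward; destructor restores caller's context
    }
};
```

Using RAII for the switch means the caller's context is restored even if the forwarded call throws.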
Contexts and Interception
As shown in Figure 4, COM under Windows 2000 partitions a process into contexts. A context is a collection of objects that share runtime requirements. Because different classes can be configured with different requirements, a process often contains more than one context to separate incompatible objects from one another. Some configuration settings allow an object to reside in a shared context with other like-minded objects. Other configuration settings force an object to reside in a private context that no other object will ever reside in. Exactly which configurations cause an object to reside in a private context has yet to be finalized at the time of this writing. One exception to all of this is that CoCreateInstance against a nonconfigured class always results in an object that resides in the activator's context (provided the class's ThreadingModel attribute is compatible with the activator's apartment type). More on this later.
Figure 4 COM Process Contexts
Each context in a process has a unique COM object that represents it. This object is called the object context (OC). Objects can access their context's OC by calling the CoGetObjectContext API. Through the OC, components interact with the services that are provided by their context such as transactioning and just-in-time (JIT) activation, typically using the IObjectContextInfo interface.
Figure 5 Using Proxies Across Contexts
As Figure 5 shows, proxies are used to allow objects to call across context boundaries. These proxies perform whatever interception services are required to switch the runtime environment from the caller's configuration to the target object's configuration. These interception services may include transaction management, lock acquisition, thread switching, JIT activation, or something even more exotic.
Proxies are needed whenever an object will be called from outside of its context. Under Windows 2000, all object references returned by an API function or method call are context-relative. This means that the reference you get back from CoCreateInstance (or any COM API or method call) can only be used within the current context. The reasoning behind this is fairly simple.
Consider the case where CoCreateInstance returns a raw reference. This means that the object resides in the current context and depends on the current runtime environment for proper operation. It is illegal to share this reference with another context without explicit support from COM. For example, simply storing a reference in a global variable for objects in other contexts to use is strictly illegal. If an object in another context were to use the raw reference, the object's methods would execute without the benefit of interception. This means that the caller's runtime environment would be used instead of the one the target object expects. If the object relied on having a transaction as part of its runtime environment, tough! It may not have one, or worse yet, it might wind up doing work against the caller's transaction (which may be a different transaction). Virtually all other configured services would also malfunction if the call is processed in the wrong context.
Context Relativity
Understanding why object references are still context-relative even though they refer to a proxy requires more explanation. The proxy returned by any API or method call is configured to run a certain set of interception services based on the differences between the object's context and the context where the reference is initialized. Passing this proxy to another context is not guaranteed to work, as this third context may need entirely different interception services to properly switch execution contexts. While you could argue that the proxy should be smart enough to work from any context, this would make the proxy implementation inherently more complex and inefficient. Additionally, adding this support to proxies would make the programming model more complex because references to proxies would be treated differently than raw references to real objects. So the bottom line is, don't share object references between contexts without support from COM.
Figure 6 Passing Object References Between Contexts
To allow object references to be passed from one context to another, COM provides two APIs, CoMarshalInterface and CoUnmarshalInterface, that translate context-relative object references to and from context-neutral byte streams. These byte streams can be passed freely to any other context. In general, application programmers never call these APIs. Rather, these routines are called automatically by CoCreateInstance when an object is created to translate the raw object reference of the new object into a proxy suitable for use in the activator's context (see Figure 6). To make component composition easy, whenever a proxy sees an object reference as a method parameter, the proxy will marshal the object reference to ensure that the proper references are always used (see Figure 7). It is only when object references are shared outside the scope of a method call that care must be taken to explicitly marshal and unmarshal object references between contexts, either by calling CoMarshalInterface and CoUnmarshalInterface directly or by using more exotic techniques such as the Global Interface Table (GIT), a facility of the COM library that translates context-relative object references to and from context-neutral DWORDs.
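The GIT's bookkeeping can be modeled in a few lines. COM's actual facility is the IGlobalInterfaceTable interface, with methods RegisterInterfaceInGlobal, GetInterfaceFromGlobal, and RevokeInterfaceFromGlobal. The InterfaceTable class below is a hypothetical model, not that interface. A std::uint32_t cookie stands in for the DWORD, and a Ref carries a flag saying whether the caller got a proxy. The key property is that the cookie is context-neutral, while each reference handed out is valid only in the context that asked for it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model; not the real IGlobalInterfaceTable interface.
struct Context { int id; };
struct Object { const Context* home; };

// What a caller receives: the object itself, or a proxy to it.
struct Ref {
    Object* target;
    bool isProxy;   // true if calls must be intercepted
};

class InterfaceTable {
    std::map<std::uint32_t, Object*> entries_;
    std::uint32_t next_ = 1;
public:
    // Called from the object's own context with a raw reference;
    // returns a context-neutral cookie that may be stored anywhere.
    std::uint32_t Register(Object* obj) {
        entries_[next_] = obj;
        return next_++;
    }
    // Called from any context. The reference is raw only when the caller
    // is the object's home context; otherwise it is a proxy.
    Ref Get(std::uint32_t cookie, const Context* caller) const {
        Object* obj = entries_.at(cookie);
        return Ref{obj, obj->home != caller};
    }
    void Revoke(std::uint32_t cookie) { entries_.erase(cookie); }
};
```

Storing the cookie in a global variable is safe. Storing the Ref returned to one context and using it from another is the mistake the text warns against.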
Figure 7 Marshaling Object References
The thought of all of these proxies may sound very expensive, especially when you look at the cost of using in-process proxies under Windows NT 4.0. The primary cost of Windows NT 4.0 proxies (I call them heavyweight proxies here) is due to the OS thread switch needed to cross apartment boundaries. The serialization of the call stack can also impact call performance under Windows NT 4.0, but the dominant cost is still the thread switch. The cross-context proxies that are planned to be used by Windows 2000 do not mandate a thread switch or call stack serialization. Rather, all that a proxy will need to do to cross a context boundary is to run whatever interception services that particular boundary crossing requires. If object references are passed as method parameters, the proxy must marshal them across the context boundary; otherwise the stack frame can simply be shared across the context boundary. In general, the smaller the delta between the caller's and the object's context, the lower the performance costs.
Even though cross-context calls are considerably cheaper than cross-apartment calls were under Windows NT 4.0, it is possible to configure a class to always activate in its creator's context. This is useful for utility components that want to run in their creator's context and don't require configured services of their own. If for some reason the creator's context has been configured in such a way that it can't support the new object, CoCreateInstance will fail and return CO_E_ATTEMPT_TO_CREATE_OUTSIDE_CLIENT_CONTEXT.
Assuming that CoCreateInstance succeeds, all calls on the new object will be serviced in the context of the creating component (even if references to the new object are passed to other contexts). This, by the way, is astonishingly similar to creating an instance of a nonconfigured class. The primary difference is that for nonconfigured classes, COM may wind up using a secondary context due to issues such as incompatible ThreadingModel settings, which results in a proxy being returned to the creator rather than the CO_E_ATTEMPT_TO_CREATE_OUTSIDE_CLIENT_CONTEXT error.
Figure 8 Context Neutrality
If an object always wants to execute in its caller's context, even when its references are passed to other contexts, it must indicate (at runtime) that it is context-neutral by aggregating the free-threaded marshaler (FTM) returned by CoCreateFreeThreadedMarshaler. Figure 8 shows the effects of becoming context-neutral. Note that no context ever gets a proxy to the context-neutral object, even when references to the object are marshaled across context boundaries. One motivation for avoiding proxies and interception entirely is performance. However, a more likely reason is that the component needs to perform some utility service that requires direct access to the caller's environment (such as a component that directly accesses the caller's transaction), even when passed from context to context.
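The effect of aggregating the FTM on marshaling can be modeled with a flag. In this hypothetical sketch, Object, Ref, and MarshalTo are not COM APIs. An ordinary object yields a proxy in any context other than its own. A context-neutral object always yields a raw reference, so its methods run in whichever context calls them.

```cpp
#include <cassert>

// Hypothetical model of marshaling with and without the FTM.
struct Context { int id; };

struct Object {
    const Context* home;   // unused for context-neutral objects
    bool contextNeutral;   // true if the object aggregated the FTM
};

struct Ref { Object* target; bool isProxy; };

// Produce the reference a caller in `dest` should use.
Ref MarshalTo(Object* obj, const Context* dest) {
    if (obj->contextNeutral)
        return Ref{obj, false};            // always raw: runs in the caller's context
    return Ref{obj, obj->home != dest};    // proxy unless dest is the home context
}
```

Because the neutral object never gets a proxy, it also never gets interception. It must not rely on any runtime environment of its own.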
Contexts and Apartments
At this point, veteran COM programmers are probably wondering if contexts replace apartments. The answer is both yes and no. The yes part of the answer is that contexts replace apartments as the innermost execution scope of an object. Object references are now context-relative (not just apartment-relative), and the CoMarshalInterface APIs and the GIT are now used for cross-context work, not just cross-apartment work. All of this doesn't mean that apartments will disappear. The role of apartments should be somewhat diminished in the programming model, but they will in fact still exist.
Figure 9 Apartments, Contexts, and Processes
Under Windows 2000, an apartment is intended to be a grouping of contexts in a process. As Figure 9 shows, a process can contain multiple apartments, and an apartment contains one or more contexts. A context resides in exactly one apartment, and an apartment resides in exactly one process. The primary role of apartments is to determine which threads in the process are allowed to dispatch calls in a particular context.
As under Windows NT 4.0, threads must first call CoInitializeEx prior to using COM. Prior to calling CoInitializeEx, the thread exists outside of all apartments and contexts and cannot use COM. After a thread calls CoInitializeEx, it enters the default context of a particular apartment. The default context is the context that has classic COM semantics (no transaction and no JIT activation, for example) and is primarily used to hold instances of nonconfigured classes created from other apartments. By definition, configured components (that is, classes that have extended attributes) never run in the default context of their apartment.
The algorithm for determining which apartment an object is created in hasn't changed since Windows NT 4.0. CoCreateInstance inspects the ThreadingModel attribute of the target class to determine which apartment to create the object in. If the threading model is compatible with the creator's apartment, the new object is created in the creator's apartment. Otherwise, COM creates the object in an apartment of the appropriate type. For same-apartment activation calls, COM will try to create the new object in the creator's context unless the target class's configuration prohibits it.
For cross-apartment activation calls, COM will create the new object in a properly configured context in the target apartment (using the default context for nonconfigured classes). This means that the ThreadingModel attribute as well as the extended attributes are consulted to determine whether the activator will get a proxy or a raw object reference. If the ThreadingModel and all extended attributes are compatible with the activator, a raw reference is returned. If only the ThreadingModel is compatible with the activator, a cross-context proxy will be returned that does not need to perform a thread switch. If the threading model is incompatible, then a cross-apartment/context proxy will be returned regardless of any other extended attributes. (Remember, a context cannot span multiple apartments.)
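The three-way rule in the preceding paragraph can be written as a small decision function. The two boolean inputs are an assumed simplification of the comparisons COM performs between the activator and the class's ThreadingModel and extended attributes. RefKind and ReferenceFor are hypothetical names.

```cpp
#include <cassert>

// What the activator receives, per the rule described above.
enum class RefKind {
    Raw,                   // ThreadingModel and all extended attributes compatible
    CrossContextProxy,     // same apartment, different context: no thread switch
    CrossApartmentProxy    // cross-apartment/context proxy
};

RefKind ReferenceFor(bool threadingModelCompatible, bool extendedAttrsCompatible) {
    if (!threadingModelCompatible)
        return RefKind::CrossApartmentProxy;   // a context cannot span apartments
    if (!extendedAttrsCompatible)
        return RefKind::CrossContextProxy;
    return RefKind::Raw;
}
```

The ThreadingModel check comes first because an incompatible apartment forces a cross-apartment proxy regardless of the extended attributes.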
Recall that CoInitializeEx accepts a parameter that indicates which type of apartment the thread would like to call home. Passing COINIT_MULTITHREADED tells COM to put the thread in the lone multithreaded apartment of the process. Passing the unfortunately named COINIT_APARTMENTTHREADED flag tells COM to put the thread into a new STA that no other threads can ever enter.
STAs are designed for user interface code and rely on the Windows message queue to process incoming calls. To ensure that deadlock does not occur when calling out of the STA, COM runs a Windows message pump while waiting for an outbound call to return, allowing incoming calls to be processed as well as allowing core window messages (such as WM_NCACTIVATE) to be handled. Additionally, STA-based threads cannot perform blocking operations (WaitForSingleObject, for example) without servicing the message pump periodically to avoid deadlock.
Both MTAs and STAs bind a set of threads to a set of contexts. For the MTA, the set of threads is all threads that called CoInitializeEx(COINIT_MULTITHREADED). For an STA, the set of threads consists of the lone thread that called CoInitializeEx(COINIT_APARTMENTTHREADED) to create the apartment. Because threads in a process are partitioned into apartments, there was no simple way under Windows NT 4.0 to indicate that a component could be freely accessed from any thread in the process, independent of the apartment the thread belonged to. This should change with Windows 2000.
Figure 10 Thread-neutral Apartments
Microsoft plans to introduce a third apartment type in Windows 2000, the thread-neutral apartment (TNA). As shown in Figure 10, each process has at most one TNA. Classes indicate that they want to run in the TNA using the ThreadingModel=Neutral setting in the registry. Unlike the MTA and STA, no threads call the TNA home. Rather, when a thread executing in the MTA or an STA creates a ThreadingModel=Neutral object, it gets back a lightweight proxy that switches to the object's context without causing a thread switch. In fact, no thread in the process ever needs to perform a thread switch when entering a context in the TNA. As of Windows 2000, ThreadingModel=Neutral is expected to be the preferred setting for components that have no user interface. (User interface components should still be marked ThreadingModel=Apartment due to the thread affinity of window handles.)
At first, developers often confuse the thread-neutral apartment with the free-threaded marshaler. From the 10,000-foot view they seem similar because both ensure that all incoming calls are serviced by the caller's thread. The fundamental difference is that TNA-based objects are still apartment (and context) relative. That is, they belong to a context and can hold context-relative resources such as object references. In contrast, FTM-based objects are context-neutral and have no context of their own (they use their caller's context). FTM-based objects cannot hold context-relative resources across method calls. In general, the FTM is for very specific uses and the thread-neutral apartment is preferable in almost all cases.
Figure 11 The ThreadingModel Decoder Ring
Given this new apartment type, you may be wondering how the interpretation of the ThreadingModel setting will change. Figure 11 shows which apartment the new object will reside in for all possible situations. Note that ThreadingModel=Both really means "create me in my activator's apartment." Just like under Windows NT 4.0, this is substantially different than being apartment-neutral (or context-neutral). ThreadingModel=Both simply means that the new object should be initialized in the activator's apartment, with all subsequent method calls being serviced there as well. Using the FTM to indicate apartment/context neutrality has a very different meaning. The FTM indicates that the object should straddle the apartment and context boundaries of the process and that each method call should be executed in the context and apartment that issued the call. While ThreadingModel=Both and the FTM are often used in tandem, they solve very different problems and can be used independently of one another.
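Figure 11 is the authoritative table. The function below is only a partial sketch of it. It assumes the Windows NT 4.0 rules for Apartment, Free, Both, and an absent ThreadingModel, plus the Neutral rule described earlier. It covers only creators running in an STA or the MTA; creators executing in the TNA are deliberately not modeled. The enum labels are chosen for this sketch: HostSTA is the STA that COM creates when an MTA thread activates an Apartment class, and MainSTA is the process's first STA.

```cpp
#include <cassert>
#include <string>

// Where a new object lands (partial sketch; creator is in an STA or the MTA).
enum class Apt { CreatorsSTA, HostSTA, MainSTA, MTA, TNA };

Apt ApartmentFor(const std::string& threadingModel, bool creatorInSTA) {
    if (threadingModel == "Neutral") return Apt::TNA;   // never the creator's STA/MTA
    if (threadingModel == "Free")    return Apt::MTA;
    if (threadingModel == "Both")                       // "create me in my activator's apartment"
        return creatorInSTA ? Apt::CreatorsSTA : Apt::MTA;
    if (threadingModel == "Apartment")
        return creatorInSTA ? Apt::CreatorsSTA : Apt::HostSTA;
    return Apt::MainSTA;   // no ThreadingModel value: legacy class, main STA
}
```

Note that Both never produces apartment neutrality. It only picks the activator's apartment at creation time, which is the distinction the paragraph above draws against the FTM.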
Contexts and Activities
Earlier, I indicated that the preferred ThreadingModel setting for nonvisual components is ThreadingModel=Neutral. This doesn't mean that component developers are now going to manage concurrency themselves. It means that as of Windows 2000, you won't use apartments to manage concurrency; rather, apartments will be simply used to associate particular threads with particular contexts. No more, no less.
Under Windows 2000, objects indicate that they need synchronized access using the Synchronization extended attribute. The Synchronization attribute is largely independent of the ThreadingModel setting, although some combinations of the two are incompatible. The ThreadingModel setting indicates which threads in a process can dispatch calls to the object. The Synchronization attribute controls when these calls can be dispatched. By coupling Synchronization=Required with the ThreadingModel=Neutral setting, you can achieve the mythical rental/worker/hotel model where any thread can call into the object, but only one thread at a time. To grasp how the Synchronization attribute affects the programming model, you have to look at the underlying abstraction that this attribute controls. This abstraction is called an activity and is inherited from MTS.
An activity is a collection of one or more contexts that share concurrency characteristics. Activities really serve two roles in COM. In their more obvious role, activities are used to enforce call serialization, which I'll describe shortly. More subtly, activities are also used as hints to help the COM thread allocator effectively manage its thread pool. Because activities can span multiple contexts and objects, they are far more powerful than the standard locking primitives such as critical sections and mutexes used by most developers working in multithreaded environments today.
Figure 12 Activity Examples
As shown in Figure 12, every context in a process belongs to at most one activity, but many contexts do not belong to an activity at all. Contexts that do not belong to an activity (such as the default context of an apartment) get no intrinsic call serialization; that is, any thread in the context's apartment can enter the context at any time. Of course, if the context's apartment is an STA, then no more than one thread will ever access the context. But if the context's apartment is the MTA or the TNA, then objects in the context had better be prepared for concurrent access. An activity is very different from an apartment because activities can span contexts from multiple processes and hosts, although as you'll soon see, COM weakens the concurrency guarantees somewhat when an activity spans two or more processes.
Figure 13 shows how the Synchronization attribute affects which activity a new object will live in (if any). In general, the new object will reside in either the creator's activity, a new activity, or no activity at all. Be aware that not all Synchronization settings are supported in all situations. In particular, classes marked ThreadingModel=Apartment require an activity and must be marked Synchronization=Required or Requires New. Also, classes marked as supporting JIT activation or transactions require an activity as well.
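The placement rules in Figure 13 reduce to a small decision function. This hedged Python sketch shows my reading of them. The activity IDs are plain integers invented for the example, and the Disabled setting is left out. The configuration check encodes only the restrictions just mentioned: Apartment classes, JIT activation, and transactions all need an activity.

```python
import itertools
from enum import Enum

class Sync(Enum):
    NOT_SUPPORTED = "Not Supported"
    SUPPORTED = "Supported"
    REQUIRED = "Required"
    REQUIRES_NEW = "Requires New"

_next_activity = itertools.count(1)   # hypothetical activity IDs

def activity_for_new_object(setting, creator_activity):
    """Return the activity a new object joins, or None for no activity."""
    if setting is Sync.NOT_SUPPORTED:
        return None                          # never in an activity
    if setting is Sync.SUPPORTED:
        return creator_activity              # join the creator's activity, if any
    if setting is Sync.REQUIRED:
        if creator_activity is not None:
            return creator_activity          # join the creator's activity...
        return next(_next_activity)          # ...or start a new one
    if setting is Sync.REQUIRES_NEW:
        return next(_next_activity)          # always a fresh activity
    raise ValueError(f"unknown setting: {setting}")

def check_configuration(threading_model, setting, jit=False, transactional=False):
    """Reject combinations that need an activity but cannot guarantee one."""
    needs_activity = threading_model == "Apartment" or jit or transactional
    if needs_activity and setting not in (Sync.REQUIRED, Sync.REQUIRES_NEW):
        raise ValueError("class requires Synchronization=Required or Requires New")
```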
COM does its best to ensure that concurrency will not occur within an activity. In particular, COM allocates a processwide lock for each activity. When a proxy tries to enter a context in an activity, the proxy's interceptor first attempts to acquire the lock prior to dispatching the call to the object. If someone else already holds the lock, the call is blocked until the lock is released. If the lock is not held by anyone (or is already held by the current caller), the interceptor acquires (or reacquires) the lock and allows the call to be processed. Once the method call returns control to the interceptor, the lock is released, potentially allowing the next caller to enter the activity. This has the effect of serializing access to all objects within an activity on a process-by-process basis.
Activities use the COM notion of causality to prevent deadlock. A COM causality is simply a chain of nested method calls in an object hierarchy. Consider the case where method A calls method B, which then calls method C, which calls method D. The invocation of A triggered the invocation of B, C, and D as nested method invocations. In COM, we say that all four invocations belong to the same causality and are related. COM tracks the causality automatically by tagging all cross-context method calls with a causality ID that follows a chain of calls from object to object, even across host machines.
The causality begins when a thread calls through a proxy while running outside the scope of a method. The causality ends when the call chain returns control to where the causality began. Due to the synchronous nature of method invocation, a causality has a single logical thread of control throughout the network, despite the fact that several physical threads may be used to service the calls.
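Causality tracking can be modeled as an ID carried along with each call. In this hedged Python sketch, the `call` helper and the UUID-based IDs are assumptions and do not reflect COM's wire format. A call made outside any method starts a new causality, and nested calls inherit it. Passing the ID explicitly models a request that arrives on a different physical thread.

```python
import threading
import uuid

_tls = threading.local()

def current_causality():
    return getattr(_tls, "causality", None)

def call(method, *args, causality_id=None):
    """Simulated cross-context call tagged with a causality ID."""
    outer = current_causality()
    # An explicit ID models an incoming request; otherwise inherit the caller's,
    # or start a new causality if we are not inside any method.
    cid = causality_id or outer or uuid.uuid4()
    _tls.causality = cid
    try:
        return method(*args)
    finally:
        _tls.causality = outer   # control returns to where it came from

def leaf():
    return current_causality()

def middle():
    return current_causality(), call(leaf)

def hop_threads():
    """Service a nested call on another physical thread, same causality."""
    cid, result = current_causality(), []
    t = threading.Thread(target=lambda: result.append(call(leaf, causality_id=cid)))
    t.start()
    t.join()
    return cid, result[0]
```

Two top-level `call(middle)` invocations yield two different causalities, while A's nested call shares its caller's ID. The ID is preserved even when `hop_threads` services the nested call on another thread.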
Causality is used to determine what to do when an object in an activity calls out to another object outside of the activity or process. In general, you don't want concurrency within an activity at any time. This means that one causality at most should be allowed to execute within an activity anywhere in the network. Given this goal, what should happen when an object calls out of an activity? Should the lock be released? The answer is no, with a caveat.
Here's how it works. Consider what happens when an object calls out of an activity or process. As you know, the object is blocked, waiting for the response. If, while waiting for the response, an incoming call arrives that is part of the same causality that is waiting for the response, COM must allow the call to be serviced. Otherwise, deadlock will certainly ensue. If, however, the incoming call is from a different causality, then COM correctly blocks its entrance to the activity, since allowing it to come in and execute the method would surely result in concurrent execution within the activity. (Remember, you were blocked waiting for a response from some other object that is hopefully executing while you wait.)
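Here is a hedged Python sketch of that rule. The class and its causality IDs are hypothetical, and a real interceptor would learn the caller's causality from the incoming call rather than from a parameter. The sketch shows only that the owning causality may reenter while every other causality must wait.

```python
import threading

class ActivityLock:
    """Causality-aware activity lock (sketch)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owner = None   # causality currently executing in the activity
        self._depth = 0      # how many nested entries that causality has made

    def enter(self, causality_id):
        with self._cond:
            # Same causality: a nested callback, so let it in (avoids deadlock).
            # Different causality: wait, or the activity would see concurrency.
            self._cond.wait_for(lambda: self._owner in (None, causality_id))
            self._owner = causality_id
            self._depth += 1

    def leave(self, causality_id):
        with self._cond:
            if self._owner != causality_id:
                raise RuntimeError("leave() from a causality that is not inside")
            self._depth -= 1
            if self._depth == 0:
                self._owner = None
                self._cond.notify_all()

lock = ActivityLock()
lock.enter("X")                  # causality X is inside, waiting on an outgoing call

callback_ran = threading.Event()
def callback():                  # X calls back in on a different physical thread
    lock.enter("X")
    callback_ran.set()
    lock.leave("X")
t1 = threading.Thread(target=callback)
t1.start()
t1.join(timeout=1)

other_entered = threading.Event()
def other_caller():              # an unrelated causality Y must wait
    lock.enter("Y")
    other_entered.set()
    lock.leave("Y")
t2 = threading.Thread(target=other_caller)
t2.start()
y_got_in_early = other_entered.wait(timeout=0.1)
lock.leave("X")                  # X's call chain unwinds; Y may now enter
t2.join(timeout=1)
```

A plain mutex would have deadlocked on the callback. This lock lets the callback through but still keeps causality Y out until X has finished.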
This causality/activity-aware concurrency management scheme is extremely difficult to implement yourself without tons of low-level goo that most developers can't easily access. While I've discussed some slightly esoteric COM concepts, the bottom line is that activity-based concurrency control works exactly the way you expect, virtually always.
The attorney types reading this article have probably noticed my use of the weasel word "virtually." I use the word virtually because there is at least one concurrency problem that activities do not solve. Consider the case where thread X calls into object A and thread Y calls into object B, where A and B are in the same activity. Ideally, either X's or Y's call will gain access first, blocking the other thread's request until the first call is completely processed (including any nested calls). Based on my previous description, if A and B are in the same process this is exactly what will happen.
If objects A and B are in two different processes, then it is entirely possible that X's and Y's requests will execute concurrently since there is no cross-process lock. Worse yet, if objects A and B were to call one another, deadlock would almost certainly ensue since X and Y represent two distinct causalities and would not be considered nested calls. This is one reason why objects are conventionally not shared by multiple clients. Rather, MTS and COM+ applications are typically based on private objects that access shared state protected by transactions.
Contexts, Activities, and Transaction Streams
Although activities are fairly effective for controlling concurrency within a process, they have some severe limitations when it comes to controlling concurrency across process/host boundaries. To deal with these limitations, COM provides an additional abstraction for controlling concurrency across the process and host boundaries: transactions and transaction streams.
Like causalities, transactions are temporal, and like activities, transaction streams are spatial. A transaction is a collection of operations in time that are protected by the ACID properties (atomicity, consistency, isolation, and durability). COM relies on the Distributed Transaction Coordinator (DTC) to manage transactions. A lot has been written about the concept of transactions, including several solid texts (see Principles of Transaction Processing by Bernstein and Newcomer, Morgan Kaufmann, 1997, for a great introduction). However, very little has been written about transaction streams since their introduction in MTS 1.0.
Figure 14 Transaction Streams
A transaction stream is a collection of one or more contexts in space that share a transaction. As shown in Figure 14, a transaction stream is completely contained inside an activity, but an activity can contain more than one transaction stream. All objects within a transaction stream share access to a single transaction in time. Because transactions are transient, COM will automatically start a new transaction if the stream's previous transaction has ended. Objects can access their transaction using IObjectContextInfo::GetTransaction to support resources that require manual transaction enlistment. More importantly, transaction-aware plumbing (such as ODBC, OLE DB and the Microsoft Message Queue) can access the context's transaction automatically when an object tries to use a transactional resource (a database or a message queue). This auto-enlistment feature is fundamental to the declarative style of programming that is slated to permeate COM under Windows 2000.
While COM itself does little with the transaction other than make it available to transactional objects and the plumbing they access, the DTC handles the coordination of transaction outcome. Beyond the obvious atomicity characteristics related to rollbacks, the DTC implies a lock management strategy based on transaction-affinity locks. This means that when a lock is held in a transactional resource manager (such as a database), it can be reentered from anywhere in the transaction stream since all objects in a transaction stream share a single DTC transaction. Additionally, transactional resource managers typically use a two-phase locking strategy that holds every lock acquired on behalf of a transaction until the transaction ends, so no other transaction can observe or modify that state in the meantime. This style of lock management is extremely difficult to implement without the help of a transaction manager like DTC.
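A toy lock table makes the two properties concrete. This Python sketch is hypothetical and is not how any particular resource manager is built. First, locks are owned by a transaction ID rather than a thread, so any object in the stream can reenter them. Second, all locks are released together when the transaction ends, which is strict two-phase locking.

```python
import threading

class TxLockManager:
    """Toy resource-manager lock table with transaction-affinity locks."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owners = {}    # resource -> owning transaction ID

    def acquire(self, tx_id, resource, timeout=None):
        with self._cond:
            # Free, or already owned by this transaction: grant immediately.
            ok = self._cond.wait_for(
                lambda: self._owners.get(resource) in (None, tx_id), timeout)
            if not ok:
                return False       # another transaction still holds it
            self._owners[resource] = tx_id
            return True

    def end_transaction(self, tx_id):
        # Strict two-phase locking: nothing is released until commit or abort.
        with self._cond:
            for r in [r for r, t in self._owners.items() if t == tx_id]:
                del self._owners[r]
            self._cond.notify_all()

locks = TxLockManager()
locks.acquire("T1", "row42")          # object A in T1's stream
locks.acquire("T1", "row42")          # object B, same stream: reenters freely
blocked = not locks.acquire("T2", "row42", timeout=0.05)   # another stream waits
locks.end_transaction("T1")
```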
Unlike activities, not all objects in a transaction stream are created equal. In particular, the first context in a transaction stream, the root of the stream, plays a special role. The root context is always a private context that can support only one object at a time. This object is often referred to as the root of the transaction. This root object's lifecycle is intimately tied to the lifecycle of the current transaction in the stream. In particular, when the root object deactivates, COM tries to end the transaction.
When an object is disconnected from its client, you say that it is deactivated. In general, objects deactivate when the client releases its last reference to the object. However, any object that supports JIT activation can hasten its deactivation by calling IContextState::SetDeactivateOnReturn on its context object. This method sets or clears the context's "done" bit, which, when set, tells COM to disconnect the current instance from the client as soon as no method calls are in progress. The next time the client issues a call through the now-disconnected proxy, COM will silently attach another (new) instance of the same class to service the call.
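A rough Python model of the done bit and JIT activation follows. The proxy and context classes are hypothetical. The method name `set_deactivate_on_return` merely echoes IContextState::SetDeactivateOnReturn. The sketch clears the done bit at the start of every call as a modeling choice, and it makes no claim about COM's exact defaults.

```python
class ContextState:
    """Hypothetical per-call stand-in for the context's done bit."""
    def __init__(self):
        self.done = False
    def set_deactivate_on_return(self, flag):
        self.done = flag

class JitProxy:
    """Client-side proxy that attaches instances on demand."""
    def __init__(self, cls):
        self._cls = cls
        self._instance = None
        self.activations = 0

    def call(self, method_name, *args):
        if self._instance is None:        # silently attach a new instance
            self._instance = self._cls()
            self.activations += 1
        ctx = ContextState()              # done bit starts clear for each call
        result = getattr(self._instance, method_name)(ctx, *args)
        if ctx.done:                      # no call in progress: disconnect it
            self._instance = None
        return result

class Counter:
    def __init__(self):
        self.count = 0
    def bump(self, ctx, finished=False):
        self.count += 1
        ctx.set_deactivate_on_return(finished)
        return self.count

proxy = JitProxy(Counter)
results = [proxy.call("bump"), proxy.call("bump", True), proxy.call("bump")]
```

The third call lands on a brand-new `Counter`, so the client sees the count start over. The client never learns that a different instance is now on the other side of the proxy.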
The combination of JIT activation and transactions is fairly interesting. When the root object of a transaction stream calls SetDeactivateOnReturn (TRUE), it is indicating that it wants the current transaction to end. Once this happens, COM will allocate a new transaction for the stream the next time a call arrives for any object in the transaction stream. Remember that the only way this second transaction will end is if an object in the root context deactivates. This means that someone needs to call a method on the root during the second transaction to trigger the deactivation/commit. It is important to note that the root context of the stream is established at object creation time, not method invocation time. This means that clients must remember which objects are roots to guarantee that subsequent transactions in the stream will commit in a timely manner.
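The stream's transaction lifecycle can be sketched as a tiny state machine. This is hypothetical Python, not COM's implementation. A transaction is allocated lazily on the first call after the previous one ended, and only deactivation of the root ends it. Deactivating a non-root object leaves the transaction running.

```python
import itertools

class TransactionStream:
    """Toy transaction stream: lazily allocated transactions, root-driven ends."""
    _ids = itertools.count(1)        # hypothetical transaction IDs

    def __init__(self):
        self.current_tx = None
        self.finished = []           # transactions ended so far, in order

    def on_call(self):
        # A call arriving for any object in the stream needs a transaction.
        if self.current_tx is None:
            self.current_tx = next(self._ids)
        return self.current_tx

    def on_deactivate(self, is_root):
        # Only the root's deactivation tries to end the current transaction.
        if is_root and self.current_tx is not None:
            self.finished.append(self.current_tx)
            self.current_tx = None

stream = TransactionStream()
t1 = stream.on_call()
stream.on_deactivate(is_root=False)   # a non-root finishing changes nothing
stream.on_deactivate(is_root=True)    # root deactivates: t1 ends
t2 = stream.on_call()                 # next call starts a second transaction
```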
Any object in a stream (root or non-root) can prevent a transaction from committing by clearing its context's "happy" bit. Each context in a transaction stream keeps track of whether or not its objects are happy with the current state of the transaction. An object can set or clear this bit using IContextState::SetMyTransactionVote. A transaction can only commit if all contexts in the transaction stream are happy. When the root object deactivates, COM consults all contexts in the transaction stream. If the happy bit of any context is clear, the transaction will be aborted and any operations that are protected by the transaction will be rolled back. Note that the happy bit is only consulted when the root deactivates, so it is common to have a method leave the context unhappy, knowing that in a subsequent method call the happy bit can be set to allow the changes in the transaction to roll forward.
One interesting exception to this rule is when an object returns control with the happy bit clear and the done bit set. This tells COM that the object has detected an error it cannot recover from, and dooms the current transaction to failure. Note that both of these IContextState methods simply manipulate two bits in Thread Local Storage (TLS). These bits are not consulted until your method returns control to COM, which means that only the last call you make inside a method call counts.
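The voting rules in the last two paragraphs fit in a short Python sketch. The names are hypothetical, and the happy bit defaults to set here as a modeling assumption. An unhappy context only blocks the commit until it votes again, but returning with the done bit set and the happy bit clear dooms the transaction for good.

```python
from dataclasses import dataclass

@dataclass
class ContextBits:
    happy: bool = True    # cleared/set via SetMyTransactionVote in COM
    done: bool = False    # set via SetDeactivateOnReturn in COM

@dataclass
class StreamState:
    doomed: bool = False

def on_method_return(bits, stream):
    """The bits are read only here, when the method hands control back,
    so only the last value written during the call matters."""
    if bits.done and not bits.happy:
        stream.doomed = True          # unrecoverable: no later vote can save it

def outcome(stream, contexts):
    """Evaluated when the root object deactivates."""
    if stream.doomed or not all(c.happy for c in contexts):
        return "abort"
    return "commit"

stream = StreamState()
a, b = ContextBits(), ContextBits(happy=False)   # b left unhappy by an earlier call
first = outcome(stream, [a, b])
b.happy = True                                   # a later call on b votes to commit
second = outcome(stream, [a, b])
```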
Classes control how they relate to transaction streams using extended attributes. As shown in Figure 15, a new object will reside in either the creator's transaction stream, a new transaction stream, or no transaction stream at all. Also noteworthy is that marking a class TRANSACTION_REQUIRES_NEW results in an object that is always the root of a transaction stream (and therefore should call SetDeactivateOnReturn to hasten transaction closure). Marking a class TRANSACTION_SUPPORTED results in an object that is never the root of a transaction stream (and therefore has little reason to call SetDeactivateOnReturn, at least with respect to transaction time). Marking a class TRANSACTION_REQUIRED results in an object that may or may not be the root of a transaction stream.
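This hedged Python sketch expresses those rules as a function that returns both the stream and whether the new object is its root. Stream IDs are invented integers, and the enum mirrors the TRANSACTION_* names only for readability.

```python
import itertools
from enum import Enum

class Tx(Enum):
    NOT_SUPPORTED = 1
    SUPPORTED = 2
    REQUIRED = 3
    REQUIRES_NEW = 4

_next_stream = itertools.count(1)     # hypothetical stream IDs

def place_in_stream(setting, creator_stream):
    """Return (stream, is_root) for a new object; stream None = no stream."""
    if setting is Tx.NOT_SUPPORTED:
        return None, False
    if setting is Tx.SUPPORTED:
        return creator_stream, False           # joins one if present, never a root
    if setting is Tx.REQUIRED:
        if creator_stream is not None:
            return creator_stream, False       # joins the creator's stream
        return next(_next_stream), True        # nothing to join: becomes a root
    if setting is Tx.REQUIRES_NEW:
        return next(_next_stream), True        # always the root of a new stream
    raise ValueError(f"unknown setting: {setting}")
```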
Finally, note that for any two transaction streams, the outcomes of their transactions are completely independent. Despite what your intuition may lead you to believe, the second transaction implied by TRANSACTION_REQUIRES_NEW is not a nested transaction. Rather, it is a completely independent transaction that is in no way coupled to the creator's transaction (unless, of course, the first transaction notes the outcome of the second transaction and cascades the failure of the second transaction by explicitly calling SetMyTransactionVote).
MTS does not allow transactional objects to know the outcome of their transactions. The same should be true in Windows 2000, where objects give passive consent via the happy bit, secure in the knowledge that either all or none of the transaction stream's work will be accepted. Because they have no idea of the success or failure of their transaction, all objects in a transaction stream are deactivated at transaction boundaries to avoid any violation of transaction isolation. This is done for efficiency and to simplify the programming model. There are times when a transactional object would like to have additional code execute at the end of the transaction, either to influence the outcome of the transaction and/or to perform a compensating transaction in the case of transaction failure. Windows 2000 will add support for user-defined compensators (sometimes referred to as Compensating Resource Managers or CRMs) to make this considerably easier than under MTS.
A compensator is a nontransactional object that is used by the system to represent a transactional component at the end of the transaction. Transactional objects that want to use a compensator use a system-provided log manager called the CRM Clerk. The Clerk records the user-specified CLSID of the desired compensator as well as a variable number of BLOBs that the transactional object writes to the log during its normal operation. When the transaction ends, the system creates an instance of the user-specified compensator class.
The system then uses the compensator's ICrmCompensator interface to carry out the transactional object's desires during the two-phase commit. This means that compensators (like any resource manager) can abort the transaction by failing during the prepare phase. This also means that the compensator (like any resource manager) gets notified of the eventual outcome of the transaction in phase two. In the case of a successful commit, the compensator typically releases any resources it had held. In the case of an aborted transaction, the compensator can issue a second compensating transaction to undo the work done by the transactional object (or by the compensator during the prepare phase). The management of the log as well as reenlistment at recovery time is completely managed by the system.
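This flow can be sketched in Python, under heavy assumptions. `Clerk`, `end_transaction`, and the three-method compensator (`prepare`, `commit`, `abort`) are hypothetical simplifications, and the real ICrmCompensator interface delivers the log through its own callbacks. Still, the sketch shows the division of labor: the transactional object logs what it did, a compensator may veto in phase one, and on abort the compensator undoes the work.

```python
inventory = {"widget": 10}                     # hypothetical shared resource

class Clerk:
    """Hypothetical stand-in for the CRM Clerk."""
    def __init__(self, compensator_cls):
        self.compensator_cls = compensator_cls   # plays the role of the CLSID
        self.log = []                            # records written during normal work
    def write_record(self, record):
        self.log.append(record)

def end_transaction(clerks, others_vote_commit=True):
    compensators = [(c.compensator_cls(), c.log) for c in clerks]
    # Phase 1: each compensator may veto by failing prepare.
    commit = others_vote_commit and all([comp.prepare(log) for comp, log in compensators])
    # Phase 2: every compensator learns the outcome.
    for comp, log in compensators:
        (comp.commit if commit else comp.abort)(log)
    return commit

class ReservationCompensator:
    MAX_PER_ITEM = 5                             # hypothetical business rule
    def prepare(self, log):
        return all(qty <= self.MAX_PER_ITEM for _, qty in log)
    def commit(self, log):
        pass                                     # nothing to release in this toy
    def abort(self, log):
        for item, qty in log:                    # compensating action: undo the work
            inventory[item] += qty

def reserve(clerk, item, qty):
    inventory[item] -= qty                       # the transactional object's real work
    clerk.write_record((item, qty))

clerk = Clerk(ReservationCompensator)
reserve(clerk, "widget", 3)
first_committed = end_transaction([clerk])

clerk = Clerk(ReservationCompensator)
reserve(clerk, "widget", 6)                      # breaks the rule; prepare will veto
second_committed = end_transaction([clerk])
```

After the second transaction aborts, the compensator restores the six widgets, leaving the inventory as the first committed transaction left it.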
Finally, to give application developers more flexibility, Windows 2000 is slated to support a feature known as Bring Your Own Transaction (BYOT). BYOT allows an application-controlled transaction to be associated with a transaction stream. An interesting use of BYOT is to manually create a DTC transaction with an arbitrarily long timeout and associate new transactional objects with the long-lived transaction. While this can have a devastating effect on throughput if used carelessly, it does solve the problem of having a single transaction timeout setting for an entire machine. It is also possible to use BYOT to run transactional objects at lower isolation levels than serializable, another relatively tough task to accomplish under MTS.
Summary
Windows 2000 further refines the COM programming model. While Microsoft plans to ship tons of new and sexy features with this release, the most profound change in store for the programming model is the integration of component configuration and interception into the core model and runtime. Contexts replace apartments as the innermost execution scope of an object. Contexts are used to segregate incompatible objects from one another as well as to provide an environment for method execution that is consistent with the object's configuration. Additionally, activities and transaction streams replace apartments as the primary concurrency control abstractions in the model, with apartments being downgraded to simply control thread-to-context mappings.
For related information see: Transaction Context Objects in Microsoft Transaction Server at http://msdn.microsoft.com/library/backgrnd/html/msdn_transact.htm. Also check http://msdn.microsoft.com/developer/default.htm for daily updates on developer programs, resources and events.
From the May 1999 issue of Microsoft Systems Journal