How Microsoft Transaction Server Changes the COM Programming Model --MSJ, January 1998

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

January 1998

How Microsoft Transaction Server Changes the COM Programming Model

David Chappell

Microsoft Transaction Server (MTS) isn't magic, but it does let you write simple, COM-based servers that are still powerful and scalable. MTS provides a range of services to those apps, including thread management, efficient object activation policies, and support for transactions.

This article assumes you're familiar with COM

David Chappell (david@chappellassoc.com) is principal of Chappell and Associates, an education and consulting firm in Minneapolis, MN. He is the author of Understanding ActiveX and OLE (Microsoft Press, 1996). His next book, due out in 1998, will describe Windows NT distributed services, including MTS.

The Component Object Model has changed everything. COM's benefits—well-defined interfaces, local and remote transparency, and much more—allow developers to use it in all kinds of software. But while it solves many of the problems developers have faced, it's arguable whether COM makes creating software a whole lot easier. In particular, building powerful COM servers can be daunting.
      Of course, some COM servers are very simple. It's pretty easy, for example, to write a server that never has more than one client at a time, doesn't share information with other servers, and doesn't make changes to persistent data, such as records in a database. This server can be single-threaded and doesn't have to worry about how long it holds database locks. Since it doesn't change anything except its own internal state, this simple server can ignore the possibility of leaving corrupted data behind after an unexpected failure.
      It would be great if all COM-based servers could be as simple as this. But servers this simple often don't fit the bill. Solving real problems frequently requires building servers that handle many simultaneous clients, effectively share data with other servers, and atomically make changes to one or more databases. Writing servers that meet these requirements can get really hard.
      But suppose developers had a way to write simple servers—servers that support only one client at a time, don't need to worry about committing multiple database changes, and so on—yet somehow could magically use them to handle many clients, atomically change multiple databases, and more.
      This is exactly what the Microsoft® Transaction Server (MTS) allows. MTS isn't magic, but it does let you write simple, COM-based servers that are still powerful and scalable. MTS provides a range of services to those applications, including thread management, efficient object activation policies, and support for transactions. By providing all these things automatically, MTS shoulders much of the burden in developing powerful, scalable COM servers. For a general introduction to MTS, see "Microsoft Transaction Server Helps You Write Scalable, Distributed Internet Apps" in the August 1997 issue of MSJ.
      MTS is built entirely on COM. In many ways, it's just an extension to the COM you already know and love. But MTS also brings some changes to the traditional COM programming model. Using MTS effectively requires understanding what those changes are and how you can take advantage of them. The goal of this article is to explain how MTS changes the familiar COM programming model and how you can take advantage of those changes to create powerful applications more easily.
      I know that some of you are thinking, "I don't need no stinkin' MTS—I'll just write my COM servers like I always have." There are a couple of reasons why you should shake this bad attitude. First, MTS will make your life simpler—really—so using its services makes a lot of sense. It's a standard part of Windows NT®, so there are no extra checks to write. Plus, MTS is actually just an extension of COM (and like COM, it's becoming part of the operating system), so the integration between the two will only get stronger over time. In other words, the changes that MTS brings to the traditional COM programming model are probably unavoidable. The time to start understanding them is now.
The Five Rules of MTS Programming
      There are five key changes that MTS makes to the traditional COM programming model, most of them affecting the server. Each one represents a rule you'll need to follow to get the most out of MTS. I'll describe all of them in detail, but let me list them up front. The five rules are:
Servers should call SetComplete as often as possible, indicating that they no longer have state to maintain. Doing this helps servers scale well.

Clients should acquire and hold onto interface pointers, even for COM objects that they won't use until much later. Getting a reference to an object can be costly, but if the object is stateless, holding it is free with MTS.

Servers should get resources such as database connections as late as possible, then give them up as soon as possible. MTS makes acquiring server resources cheap.

Servers should use roles and declarative security. The traditional approach—impersonation and ACLs with an entry for each user—just doesn't scale well.

Servers should use transactions wherever appropriate. Transactions automatically ensure that either all changes made by an application succeed or none of them do.

Calling SetComplete
      What is SetComplete, and why would I want to call it? An MTS application consists of one or more MTS objects, each of which is really an in-process COM object. As shown in Figure 1, these MTS objects are loaded into a process with the MTS Executive, implemented in mtxex.dll. This DLL and the DLLs that provide the code for the application's MTS objects can be loaded into the same process as the client, but this isn't common. Instead, the MTS Executive and the application DLLs (commonly referred to as MTS components) are typically loaded into the simple MTS-provided container mtx.exe.

Figure 1 A Typical MTS Application

Figure 1 A Typical MTS Application

      A client accesses the MTS objects like any other COM object—MTS is invisible to clients, and there's no MTS-specific code on the client—but the client's calls are actually intercepted by MTS before being passed on to their destination object. Because of this, I'll typically show the MTS Executive as sitting between the client and the MTS objects that client is using. And while it's possible for an MTS server and its client to run on the same machine, it's more common for them to be on different machines, with the client accessing MTS objects via DCOM.
      To create an MTS object, a client can call CoCreateInstance (from C++) or CreateObject (from Visual Basic®) or some other COM creation function, just as with any COM object. MTS catches this request, however, and does a few special things with it. The most important of these is that, for every MTS object it creates, MTS also creates an associated context object. This context object is (of course) just another COM object, one that supports the IObjectContext interface. The methods in this interface are the primary means that an MTS object has to communicate with the MTS Executive. To call those methods, an MTS object must acquire a pointer to the IObjectContext interface of its context object. Doing this is easy: the MTS object just calls the MTS-supplied API GetObjectContext, as shown in Figure 2. This function returns a pointer to the IObjectContext interface of the appropriate context object.

Figure 2 Getting IObjectContext Pointer

Figure 2 Getting IObjectContext Pointer

      IObjectContext contains several methods, the most important of which are SetComplete and SetAbort. When an MTS object calls either of these, it's telling the MTS Executive a couple of different things. First, if the object is part of a transaction, this object's work is ready to be committed (if SetComplete was called) or rolled back (if SetAbort was called). I'll say more about transactions later, but whether or not transactions are in use, by calling either of these methods the MTS object is telling the MTS Executive that the object no longer has any state to maintain.
      Think about this for a minute. Having no state to maintain means that the object really doesn't need to exist any more—you can just conjure up a new instance when it's needed again. Getting rid of objects that aren't needed right now means that the process those objects are in can take up less memory. For servers that must scale to handle hundreds or thousands of clients, this is a good thing. But at the same time, you don't want each object's client to know about its demise—the client should remain blissfully unaware that its object has gone away. Doing this allows you to build scalable servers while still preserving the appearance of the classic COM programming model for the client.

Figure 3 How MTS Wraps Objects

Figure 3 How MTS Wraps Objects

      To make this possible, the MTS Executive wraps every MTS object as shown in Figure 3. When an MTS object calls SetComplete or SetAbort, MTS releases every interface pointer it holds on this object. Since only MTS actually holds direct pointers to this object's interfaces, this causes the object to self-destruct, just as with any COM object. As shown in Figure 4, the client still holds what it thinks is an interface pointer to the object. In fact, this pointer references the MTS-supplied wrapper. In the jargon of MTS, the object has been deactivated. MTS can now reuse any resources the object maintained, including the memory that was taken up by its data.

Figure 4 A Deactivated MTS Object

Figure 4 A Deactivated MTS Object

      When the client invokes another method on the object (remember, the client doesn't know that it's gone), MTS just uses the appropriate class factory to create another instance of the object, then passes the client's call on to this instance. Known as Just-In-Time (JIT) activation, this causes the situation to return to that shown in Figure 3. Since the object wasn't maintaining any state, this newly created instance is indistinguishable from the one that was destroyed earlier—the client can't tell the difference. Calling SetComplete as often as possible allows MTS to get rid of objects that are no longer needed. Doing this means that your application will scale much better, but it doesn't really affect your clients—MTS shields them from what's going on.
      Some observers have criticized MTS for requiring the creation of stateless objects. Obviously, this criticism isn't correct. Instead, MTS allows developers to manage object state in an intelligent way. The simplest way to understand this is to look at an example. Imagine, for instance, an MTS object that implements the interface IOrderEntry. This interface's three methods—Add, Remove, and Submit—allow the object's client to create, build, and submit orders for products. Here's the psuedocode for this interface's (very simple) Add method:

Add(Item){ If (NoOrderExists) CreateOrder() Lookup(Item) If (Available) AddItemToOrder(Item) DecrementInventory(Item) Else Error() }

If no current order exists, the first call to this method creates one. Next, the database is checked to make sure that the requested item exists. If it does, the item is added to the current order and the current inventory count for this item is decremented in the database. Since this method does not end with a call to either SetComplete or SetAbort, the object isn't deactivated, so its state—the in-memory order being built—is available for subsequent calls.
      Here's the Remove method:

Remove(Item){ If (ItemInOrder) RemoveItemFromOrder(Item) IncrementInventory(Item) Else Error() }

It also is very simple—the object just checks that this item is actually in the current order, then deletes it and increments the inventory count in the database. Once again, the method doesn't end with a call to SetComplete or SetAbort, so the object and its state are maintained.
      When the client calls Submit, things get more interesting:

Submit(){ Context = GetObjectContext() If (EverythingIsOK()) SubmitOrder() Context.SetComplete() Else Context.SetAbort() }

This method begins with a call to the MTS API function GetObjectContext, which returns a pointer to the appropriate context object. The object then submits the completed order—exactly how doesn't matter here. What does matter is that if all went well, this method's final task is to call SetComplete. If anything failed, the method calls SetAbort. I'll talk later about how these calls affect any transaction this object is part of. What's relevant here is that they cause MTS to call Release on every interface pointer it holds on the object. The object and any state that it is maintaining go away. The next time its client calls Add, MTS will transparently create a new instance of the object.
      Implementing IOrderEntry in this way is entirely plausible, and an application built like this would scale reasonably well. A client could use what appeared to be the same object to create and submit many orders—it wouldn't need to create a new one each time—but the server would maintain an object only when an order was in progress. An even more scalable design, though, would be for the object to call SetComplete at the end of every method, not just when the order was submitted. For this to work, the Add and Remove methods would need to read the current order (if there was one) from disk, perform their functions, then save the order back to disk at the end of each call. Once the order was safely on persistent storage, each method could end with a call to SetComplete.
      Building the object in this way would allow MTS to deactivate it after every call, making it even more scalable. The trade-off, of course, is the extra disk accesses required in Add and Remove. There are cases, though, where the increase in scalability matters more than the performance hit. It may even be desirable to save the order persistently after each change. Microsoft Merchant Server, for example, does exactly this, allowing a Web-based client to begin creating an order, lose and reestablish its network connection, then pick up where it left off.
      The important point is that MTS allows both options. When designing MTS-based servers (something you will do—trust me), you must determine how to manage your state. The rule that MTS encourages is simple: call SetComplete as often as possible.
Acquiring and Holding Interface Pointers
      This rule is a reasonably obvious corollary to the one just described. In a bare COM or DCOM application, it may not be a great idea for a client to create a COM object and hold onto the object's interface pointers if it has no intention of using the object for a while. Doing this requires a traditional COM server to maintain that object in memory even though it's not being used. But with MTS, holding onto references to a stateless object is free—MTS deletes the object anyway as soon as it has no state to maintain. While acquiring interface pointers to objects is still a relatively expensive operation, holding onto them is not.
      Although most of the changes MTS brings to COM programming are on the server side, this is a good place to think about how MTS affects clients. I said earlier that MTS is invisible to clients, which is for the most part true. There is a subtle way, though, in which the semantics of MTS objects leak through to their clients. Imagine a Visual Basic client that's setting various properties in an MTS object. The client may think it's working with an ordinary COM object—all the language syntax is exactly the same. But suppose that after setting several properties in the object the client invokes a method in the MTS object that calls SetComplete. If this happens, the object will be deactivated, as described above, losing all of its state. If the client isn't aware this has occurred, it may be very surprised. The client might, for example, attempt to get the value of some property it has just set in this object, only to discover that that value has been reset to its default. If the client doesn't know which methods in an object's interfaces call SetComplete, it might be confused by the object's behavior. This in turn implies something else: the client must know that this is an MTS object, not just a vanilla COM object.
      And that's not all. Remember COM's rule that making any change to an interface requires giving it a new interface ID (IID), a new GUID? We all know that, for example, adding a new method to an existing interface requires defining an entirely new interface with its own IID. But with MTS, it's possible to make changes to an interface's semantics that require a new IID without changing anything in the interface definition itself. For instance, in the IOrderEntry example just described, there were two possible ways of implementing the interface's methods: calling SetComplete only at the end of the Submit method or calling it at the end of every method. The definition of IOrderEntry itself would be exactly the same in both, but distinct IIDs would be required for each of these two cases.
      Changing only which methods call SetComplete also changes the behavior of that interface as seen by a client. If this is done, that new interface must be assigned a new IID, even though it is syntactically identical to the earlier interface. It might be nice if there were some Interface Definition Language (IDL) keyword to indicate which methods call SetComplete, thus providing an obvious marker in the interface definition of what has changed. There isn't, though, so you just have to understand what's going on and follow the rules.
Servers and Resources
      The last rule allowed clients to be lazy—just getting whatever interface pointers are needed, then letting MTS worry about efficiently managing the objects those pointers refer to. By contrast, this rule requires servers to be better citizens than they have been up to now.
      Traditionally, acquiring a server resource such as an ODBC connection to a database has been an expensive operation. Accordingly, developers would write COM objects to acquire a resource early, then hold onto it for the life of the object. While this made the application run faster—it didn't need to incur the cost of acquiring the resource over and over—it also didn't scale very well. ODBC connections are a finite resource on a machine, and if every object acquires and holds onto its own for the object's entire lifetime, there might not be enough to go around. But the alternative—acquiring, using, and then freeing a connection every time it was needed—has historically been too slow.
      By defining the notion of a resource dispenser, MTS fixes this. A resource dispenser provides an efficient way of acquiring and freeing shared, non-persistent resources on a system. The most important resource dispenser in the MTS world (today, at least) is the one that allocates ODBC connections. Implemented by the ODBC 3.0 driver manager, this resource dispenser maintains a pool of available ODBC connections—and since the newer ADO and OLE DB interfaces typically use ODBC to access relational databases, everything described here applies to their clients, too. When an MTS object requests a connection to the database, the resource dispenser hands it one from this pool. When the object releases the connection, the resource dispenser returns it to the pool. All of this is transparent to the ODBC client—it makes the same calls as always.
      Because connections to the database aren't really allocated and freed, acquiring and releasing them is much faster. An MTS object can afford to ask for a connection only when it needs one, use it, then immediately give it back. The performance penalty of doing this is greatly reduced by the resource dispenser's connection caching. An object might, for example, request a database connection at the beginning of each method, use it, and then release it. Writing objects that behave in this way allows those objects to scale much better since they can more effectively share scarce resources.
Server Roles and Declarative Security
      When a client invokes a method in a COM object, that object may need to verify that the client has the right to execute that method. If the method accesses some other resource, such as a file, the object may also need to verify that the client has the right to use that resource in the desired way. COM and DCOM servers can use the methods in IServerSecurity to learn about the client, then do whatever is necessary to determine whether the client is authorized to carry out the requested function. One common technique is to have the server thread that's handling this client request impersonate the client, then try to access the requested resource. Assuming the server operating system is Windows NT, the resource's ACL will be checked automatically, saving the programmer the trouble of doing it herself. If the access works, great. If not, the call can be rejected.
      Does anybody really like this approach to authorization? I'm inclined to doubt it. This approach is complicated to get right, and it doesn't scale well. The creators of MTS were among the non-fans of this approach, so they provided an alternative way to let MTS objects make authorization decisions. It's called declarative security, and it's much simpler to use. (And for the hardcore among us, you can still use more detailed mechanisms if desired, an approach that's now known as imperative or programmatic security.)
      Declarative security depends on the idea of roles. A role is just a collection of Windows NT users and/or groups assigned a particular character-string name. (Role names are unique only within a defined collection of MTS components known as a package—they aren't globally unique.) In a banking application, for example, some obvious roles might be Teller, Manager, and Loan Officer. The MTS administrator defines which users and groups are associated with each role by using the all-purpose MTS administrative tool, the MTS Explorer.
      Once roles have been defined, the administrator can also use the MTS Explorer to specify exactly which roles are allowed to access each MTS component. It's even possible to assign role-based permissions on a per-interface basis. For instance, you can allow access to an interface on some object to Managers but not Tellers. When an MTS object is executing, MTS itself will check every incoming call on every interface, determining what role the caller is in. If the caller is not in a role that has access to this object or interface, the call is rejected. Otherwise, the call is passed through to the object.
      Notice what the object itself must do to make this work: nothing. You don't need any code for security in an MTS object that's using declarative security. Once an administrator has set things up, MTS itself blocks unauthorized calls.
      Of course, there are some things that you just can't do this way. Suppose my object wants to allow Tellers to make transfers under $10,000, but let Managers make transfers for any amount. If I want the same code to handle both cases, I can't rely only on declarative security. Instead, I'll need to include code to check what role the client is in, then make an appropriate decision. To support this, the IObjectContext interface (the same one that includes the SetComplete and SetAbort methods) provides a method called IsCallerInRole. Using this, I can make MTS check what role the current caller is in. If the caller is a Manager, I can allow a transfer of any size. If the caller is a Teller, however, I can reject transfers over $10,000.
      Roles are much simpler to use than per-user authorization checks. They're not the only option when using MTS, but whenever feasible, they're the best choice.
Server Transactions
      It's probably fair to say that "Microsoft Transaction Server" is a bit of a misnomer, since much of what MTS provides has nothing to do with transactions. Instead, a large part of the MTS functionality focuses on making it easier to build scalable COM servers. But support for transactions is a requirement for many kinds of servers, and it's certainly an important part of what MTS offers.
      What exactly does it mean to "use transactions"? In general, a transaction allows you to group together two or more units of work into a single, indivisible unit. For example, in the order entry scenario described earlier, a record in the inventory database must be changed for each item the client requests. For a client ordering several items, several records will be changed. If the order is submitted, all of those changes must be made permanent. If the client cancels the order, all of those changes must be rolled back, leaving things just as if the order had never existed. Allowing some of the order's changes to persist while others do not would leave the data in an inconsistent state. Rather than worrying about this itself, an application can rely on MTS to ensure that either all the changes occur or none of them do.
      But wait—can't databases do this themselves? Of course they can. If you're using ODBC to access a database, for example, you can use ODBC calls to start a transaction, make your changes, then commit or roll back those changes. The DBMS will make sure that either all of the changes happen or none of them do. But what if you're making changes to more than one database and you want all of those changes to be part of a single, atomic transaction? Those ODBC calls for transactions are no help here, since they only work on a single database. And even if your object makes changes to only one database, you'll be better off letting MTS take care of transactions for you rather than relying on the database directly (and I'll explain why shortly).
      So when should you use transactions? It's easy: if your COM object needs to make two or more changes, such as updates in a database, and you want all of those changes to either succeed or fail—partial success isn't allowed—use transactions. If your object doesn't need to do this, then you'll still probably want to use MTS to write your server (it makes your life easier), but you won't need to use its transaction features.
      Using transactions in MTS turns out to be surprisingly easy. When it's installed, an MTS component can have its transaction attribute set appropriately. You (or an administrator) can do this using the MTS Explorer. As shown in Figure 5, the transaction attribute is set using a simple dialog box. Choosing "Requires a transaction" or "Requires a new transaction" will guarantee that all data accesses made by this component will be committed or rolled back as a unit (and in some cases, "Supports transactions" will have the same effect). That's all it takes—you don't have to write any extra code to handle transactions. (To see what really goes on, though, see the "How Transactions Work" sidebar).

Figure 5 Configuring Transaction Requirements

Figure 5 Configuring Transaction Requirements

      The way an MTS developer deals with transactions is simple. It's also quite novel, since it merges the idea of transactions with existing notions of components. Unlike traditional transaction systems where a client makes explicit calls to begin and end transactions, MTS hides transaction boundaries from clients. This is another example of how MTS strives to preserve the traditional client semantics of COM, even when transactions are being used. Even more atypically, MTS hides transaction boundaries from the MTS objects themselves.
      The benefit of designing things this way is that the same component can run in its own transaction or be combined with others into a larger transaction. If each component made its own BeginTransaction and EndTransaction calls, this wouldn't be possible. By allowing MTS to automatically create transactions when required, it's possible to combine components in various ways. This allows you to create transactional applications from objects that were written by different organizations at different times, yet can still work together.
      For example, imagine that I have a component that performs order entry functions, like the one described earlier, and another component that knows how to transfer money between two different bank accounts. Each component is useful on its own, and each requires a transaction (otherwise, the changes the component makes might wind up only partially done). But suppose I want to use both components in a single transaction, combining the order entry with actually getting paid for that order. With MTS, doing this is straightforward. Here's how it works.
      Assume that both components have been configured (using the MTS Explorer, as shown in Figure 5) to require a transaction. A client creates an instance of the order entry component using CoCreateInstance. Since MTS intercepts this request, it can determine that this component needs a transaction, so it automatically starts one. Any changes the order entry component makes will be part of this transaction.
      Now suppose that the order entry component creates an instance of the money transfer component (it can do this through a method called CreateInstance in IObjectContext). When MTS loads this component, it again notices that a transaction is required. But since the creator of this component is already part of a transaction, the new instance of the money transfer component automatically becomes part of this existing transaction. When the money transfer component completes its task, it will call either SetComplete or SetAbort, like any other MTS object. MTS takes note of this, but does not end the transaction. Instead, the transaction ends only when the order entry component, the root of the transaction, calls SetComplete or SetAbort. If both components called SetComplete, all changes made by the components will be committed. If either one called SetAbort, all changes will be rolled back. The important point here is that the very same component binary can run in its own transaction or can be grouped with one or more other components into a single transaction. Microsoft calls this feature Automatic Transactions. It allows you to combine the traditional idea of transactions with the much newer notion of components—something that's essential for building component-based servers. It also explains why, even if your component only accesses a single database, you don't want to use the ODBC transaction features when MTS is available. Using ODBC transactions means your component will explicitly start and end a transaction, making it impossible to combine with other components into a single transaction. If you rely on MTS for transactions, your component and other MTS components can be combined in arbitrary ways to create transactions. Using MTS allows you to create transactional components that can be used much more flexibly.
      Using transactions with components also affects how you design those components. To be most useful, each component should encapsulate some discrete chunk of work. Doing this lets you combine that component with others in arbitrary ways. Adding transactions means those useful chunks of work shouldn't span transaction boundaries. For example, think about an application that must submit items of work to a queue, then process those items. It would be entirely possible to create one component that does both tasks, but it might also make sense to handle each of these functions—submitting work and processing it—in separate transactions. If both tasks are implemented in a single component, doing them in separate transactions isn't possible. Think about transaction boundaries before you design your components—the result will be more useful components.

Conclusion
The easiest way to think about MTS is to view it as just an extension to COM. This makes learning to use it easier, since COM has become an ingrained part of development. But MTS also brings a few changes to the traditional way COM programmers have done things. Those changes bring benefits, but change can also be painful. Following the rules described here will help maximize the benefits and minimize the pain.

HOW TRANSACTIONS WORK
It's possible to use MTS transactions without understanding what's really going on. But you may find it helpful to have a mental picture of how the parts fit together. To provide this, I'll walk through the order entry example described in the article (the original, not the stateless variant), this time showing how transactions are accomplished.
      In the order entry program, adding an item to an order requires decrementing the inventory count for that item in a database. To make the example more interesting, let's assume that the inventory count for the items in the sample order is kept in two different databases (although MTS is useful even if only one database is involved). When that order is completed, either both databases should contain the new inventory counts (if the order is submitted) or both should be restored to their original state (if the order is canceled). Configuring the MTS object that implements these functions to use transactions provides an easy way to make sure that these outcomes are the only ones possible.
      Before I launch into the description, there's an important point to make: database management systems (DBMS) that can be used with transaction processing (TP) monitors typically support an X/Open-defined interface called XA. MTS can be used with databases that support XA, and that's the process I'm about to describe. Microsoft has also defined another interface for interaction between a TP monitor and a database called OLE Transactions. SQL Server currently supports this new interface, and it's possible that other DBMSs eventually will, too. The way transactions are performed varies depending on whether XA or OLE Transactions is being used. The scenario described here uses XA.

Figure A Starting a Transaction

Figure A Starting a Transaction

      As shown in Figure A, the process begins when the client calls CoCreateInstance or some other COM creation function to create the MTS object. This call is actually caught by the MTS Executive, which (among other things) checks the transaction attribute for this component. I'm assuming that this attribute has been set so that the component requires a transaction. This fact, along with an identifier for the transaction, is stored in the context object associated with the newly created MTS object. To start the transaction, the MTS Executive contacts the Distributed Transaction Coordinator (DTC), which records the fact that this transaction has begun. The DTC is a Windows NT service running in a separate process, and as you'll see it actually handles the work required to coordinate the transaction (another reason why MTS is somewhat misnamed).

Figure B Adding RM 1 to the Transaction

Figure B Adding RM 1 to the Transaction

      Figure B shows what happens next: the client invokes the Add method, attempting to add ItemX to the order. When Add is invoked for the first time, an in-memory order is created. The Add method then makes an ODBC call to get a connection to the appropriate database. (Transaction-processing people think of a DBMS as just one kind of resource manager, so I've called this database Resource Manager 1.) When an MTS object requests a connection to the database, the ODBC driver queries the object's context object to see whether this object belongs to a transaction. If it does, as in this case, the driver contacts the DTC and enlists this database in the transaction. Finally, the driver informs the database itself that all requests until further notice are part of a transaction, then carries out the object's request.
      This same process occurs for each database (or resource manager) this MTS object accesses. Figure C shows the same steps occurring when the client adds ItemY to the order. ItemY's inventory information is maintained in the database Resource Manager 2, so it must be enlisted in the transaction as well.

Figure C Adding RM 2 to the Transaction

Figure C Adding RM 2 to the Transaction

      Finally, as shown in Figure D, the client indicates that the order is complete by calling Submit. A call to Submit results in the MTS object calling SetComplete (if anything had gone wrong, of course, it would call SetAbort instead). When the context object receives the SetComplete call, it tells the DTC to commit the transaction. The DTC then carries out the two-phase commit interactions needed to atomically commit the transaction. When XA is used, the DTC communicates with the ODBC drivers, and they in turn talk with the databases to commit the transaction. If OLE Transactions is used instead, the DTC communicates directly with the databases involved.

Figure D Committing the Transaction

Figure D Committing the Transaction

      All of the work required to accomplish this is invisible to the MTS object. Once it's configured to require a transaction, MTS (and the DTC) takes over. Even enlisting the databases in the transaction is invisible, since it's done by the ODBC driver. The object on whose behalf all this is being performed can just blithely call SetComplete—MTS does the heavy lifting.

THE THIRD WAVE

Don Box

MTS represents a new programming model for building distributed, object-oriented applications. To understand how MTS extends the traditional object-oriented paradigm, it is useful to retrace the evolution of object-orientation over the past decade.
      Object-oriented software development achieved widespread commercial acceptance in the late 1980s. The hallmark of object-orientation at that time was the use of classes, which allowed developers to model state and behavior as a single unit of abstraction. This bundling of state and behavior helped enforce modularity through the use of encapsulation. In classic object-orientation, objects belonged to classes and clients manipulated objects via class-based references. This is the programming model implied by most C++ and SmallTalk environments and libraries of the era.
      While it had been possible to achieve many of the benefits of classes using procedural-based languages through the use of disciplined programming styles, widespread acceptance of object-oriented software development did not happen until there was explicit support for object-orientation from tool and language vendors. Environments that were pivotal to the success of object-orientation include Apple's MacApp framework based on Object Pascal, the early ParcPlace and Digitalk SmallTalk environments, and Borland's Turbo C++.
      One of the benefits of development environments that support object-orientation was the ability to use polymorphism to treat groups of similar objects as if they were all type-compatible with one another. To support polymorphism, object-orientation introduced the notion of inheritance and dynamic binding, which allowed similar classes to be explicitly grouped into collections of related abstractions.
      Consider the following very simple C++ class hierarchy:

class Dog { public: virtual void Bark(void); }; class Pug : public Dog { public: virtual void Bark(void); }; class Collie : public Dog { public: virtual void Bark(void); };

Because the classes Collie and Pug are both type-compatible with class Dog, clients can write generic code as follows:

void BarkLikeADog(Dog& rdog) { rdog.Bark(); }

Since the Bark method is virtual and bound dynamically, the method-dispatching machinery of C++ ensures that the correct code is executed. This means that the BarkLikeADog function does not rely on the precise type of the referenced object, only that it is type-compatible with Dog. This example could easily be recast in any number of programming languages that support object-oriented programming.
      The class hierarchy I just described typifies the techniques that were practiced during the first wave of object-oriented development. One characteristic that dominated this first wave was the use of implementation inheritance. Implementation inheritance is a powerful programming technique when used in a disciplined fashion. However, when misused, the resultant type hierarchy can exhibit undue coupling between the base class and the derived class. One common aspect of this coupling is that it is often unclear whether the base class implementation of a method must be called from a derived class's version. For example, consider the Pug class's implementation of Bark:

void Pug::Bark(void) { this->BreathIn(); this->ConstrictVocalChords(); this->BreathOut(); }

      What happens if the underlying Dog class's implementation of Bark is not called, as is the case in this code fragment? Perhaps the base class method records the number of times that a particular dog barks for later use. If this is the case, the Pug class has violated that part of the underlying Dog implementation class. To use implementation inheritance properly, a nontrivial amount of internal knowledge is required to maintain the integrity of the underlying base class. This amount of detailed knowledge exceeds the level needed to simply be a client of the base class. For this reason, implementation inheritance is often viewed as white-box reuse.
      One approach to object-orientation that reduces excessive type system coupling while retaining the benefits of polymorphism is to only inherit type signatures, not implementation code. This is the fundamental principle behind interface-based development, which can be viewed as the second wave of object-orientation. Interface-based programming is a refinement of classical object-orientation. It assumes that inheritance is primarily a mechanism for expressing type relationships, not implementation hierarchies. Interface-based development is built on the principle of separation of interface from implementation. In interface-based development, interfaces and implementations are two distinct concepts. Interfaces model the abstract requests that can be made of an object. Implementations model concrete instantiable types that can support one or more interfaces.
      While it had been possible to achieve many of the benefits of interface-based development using traditional first-wave environments and disciplined programming styles, widespread acceptance of interface-based development did not happen until there was explicit support from tool and language vendors. Environments that were pivotal to the success of interface-based development include Microsoft's COM, Iona's Orbix Object Request Broker environment, and Java's explicit support for interface-based development.
      One of the key benefits of using an environment that supported interface-based development was the ability to model the "what" and the "how" of an object as two distinct concepts. Consider the following very simple Java type hierarchy:

interface IDog { public void Bark( ); }; class Pug implements IDog { public void Bark( ) { ... } }; class Collie implements IDog { public void Bark( ) { ... } };

Because the classes Collie and Pug are both type-compatible with interface IDog, clients can write generic code as follows:

void BarkLikeADog(IDog dog) { dog.Bark(); }

From the client's perspective, this type hierarchy is virtually identical to the previous C++-based example. However, because the IDog interface's Bark method cannot have an implementation, there is no coupling between the IDog interface definition and the Pug or Collie classes. While this implies that Pug and Collie each have to fully define their own notion of what it means to bark, the Pug and Collie implementers do not need to wonder what side-effects their derived classes will impose on the underlying IDog base type.
      One striking similarity between the first and second waves is that each could be characterized by a simple concept (class and interface, respectively). In both cases, it was not the concept itself that acted as the catalyst for success, but instead one or more key environments were required to spark the interest of the software development industry at large.
      An interesting aspect of second-wave systems is that the implementation is viewed as a black box—that is, all implementation details are considered opaque to the clients of an object. Often, when developers begin to use interface-based technologies such as COM, the degree of freedom afforded by this opacity is ignored, causing novice developers to view the relationship between interface, implementation, and object fairly simplistically. This often results in designs in which the state of an object winds up being very tightly coupled to the COM aspects of the object (its virtual function pointers). While advanced programming techniques such as tearoffs and flyweights can be used to more effectively manage state in complex object hierarchies, these techniques are not formally part of the COM programming model, but instead have evolved as de facto techniques of the COM development community.
      When applying COM to distributed application development, additional state management issues arise, including distributed error recovery, concurrency management, load balancing, and data consistency. Unfortunately, COM is ignorant of how an object manages its state, so there is little that COM can do to address these issues. While it is possible for developers to concoct their own schemes for dealing with state management, there are clear benefits in having a common infrastructure for developing objects in a state-conscious manner. MTS is one such infrastructure.
      The COM programming model extended the traditional object-oriented programming model by forcing developers to be conscious of the relationship between interface and implementation. The MTS programming model extends the COM programming model by forcing developers to also be conscious of the relationship between state and behavior. The fundamental principle of MTS is that an object may be logically modeled as state and behavior, but its physical implementation needs to distinguish the two explicitly.
      By explicitly allowing MTS to manage the state of an object, the application developer can leverage the infrastructure's support for concurrency and lock management, error recovery, load balancing, and data consistency. This means the majority of an object's state must not be stored contiguously with its virtual function pointers (which represent the object's behavior). Instead, MTS provides facilities for storing object state either in durable or transient storage. This storage is under the control of the MTS runtime environment and can be safely accessed from an object's methods without concern for lock management or data consistency. Object state that must remain consistent in the face of machine failure or abnormal program termination is stored in durable storage, with MTS ensuring that all updates are atomic throughout the network. State that is transient can be stored in MTS-managed memory, with MTS ensuring that memory accesses are serialized to prevent data corruption.
      As with class-based and interface-based development, the state-conscious programming model of MTS requires additional discipline and attention on the part of the developer. Fortunately, as with class-based and interface-based development, the state-conscious model of MTS can be adopted incrementally. Of course, incremental adoption means that the benefits of MTS will be realized incrementally as well. This allows developers to adopt MTS at a pace that is appropriate for the local development culture.
From the January 1998 issue of Microsoft Systems Journal.