House of COM -- Microsoft Systems Journal, January 2000

Don Box is a co-founder of DevelopMentor, a COM think tank that educates the software industry in COM, MTS, and ATL. Don wrote Essential COM and coauthored the follow-up Effective COM (Addison-Wesley). Reach Don at http://www.develop.com/dbox.

This article is adapted from Lessons from the Component Wars: An XML Manifesto, which can be found on MSDN Online.

Over the past two years, XML seems to have overtaken the Java language and Design Patterns as the solution to whatever ills plague the software industry. In this month's column I will discuss the emerging use of XML as a component technology after first taking a look at other more established component technologies such as COM and CORBA.
      As someone who has spent the last six years working with COM, I believe the primary goal of component software is to enable collaboration and cooperation among software development organizations, and the primary function of a component technology is to act as glue between multiple pieces of software. This is true of COM, the Java language, and CORBA. These three technologies provide infrastructure for integrating software components written by independent organizations. From the 10,000-foot view, these technologies look the same. If you look up close, however, each technology uses radically different techniques and programming styles to achieve its goals.

In-memory Interoperation
      Component technology is about interoperation. It is interesting to look at the degree to which each component technology enables interoperation. You can view these degrees of interoperation according to the layering model shown in Figure 1.

Figure 1 Degrees of Interoperation

      Mixing multiple components in memory is easily the most intimate interoperation possible. By standardizing an in-memory representation that all components must adhere to, a component technology can offer extremely high performance. Additionally, having a standardized in-memory representation allows the supporting runtime to offer a wider array of component management services at a substantially lower performance cost.
      COM standardizes the in-memory representation of object references based on simple C++-style virtual function tables. This makes in-process COM easy to support on just about any platform.
      The Java language standardizes the representation of component code, and each Java virtual machine defines its own in-memory representation for objects. The advantage of this approach is that each virtual machine implementor is theoretically free to innovate while still building on a common component format. The disadvantage is that components must run in the same virtual machine to interoperate, which, in the presence of versioning, is not always possible.
      The CORBA specification punts on in-memory representation, as the original goal of CORBA was to provide an object-based Remote Procedure Call (RPC) system.

Source Code Interoperation
      Component technologies often require the developer to program explicitly against an API of some sort. By standardizing an API for accessing component services, a technology can enable source-level interoperation, allowing component source code to be recompiled against another vendor's implementation of the technology (modulo OS-specific system calls like fork or CreateFile).
      COM exposes its services via the COM library and APIs (CoCreateInstance, CoInitializeEx, and so on). A significant subset of these APIs is consistent across platforms (including Windows NT®, Windows® 95, Solaris, and Linux), and this consistency allows COM source code to be recompiled on multiple platforms.
      The CORBA specification defines a set of standard interfaces that an ORB vendor must support to be considered CORBA-compliant. This set of interfaces is considered a bare minimum, and most ORB vendors augment the standard CORBA API with proprietary extensions.
      Most Java-based component services are simply integrated into the language and don't necessarily have an explicit API. This makes the component aspects of the Java language fairly transparent. However, Java critics often complain that you must port all of your software to the Java programming language, in essence tying your entire source code base to Java technology.

Type Information Interoperation
      Components need to be described to be useful to the programmers who will use them. Components also need to be described to the underlying component system to ensure proper integration. All three component technologies provide a standardized method for representing type information for consumption by both the supporting infrastructure and developers.
      CORBA provides a text-based Interface Definition Language (IDL) that allows objects to be described in a language-neutral manner. By defining all publicly accessible data types in IDL, it is possible to access a CORBA object from any programming language that has ORB support. CORBA IDL is required to enable integration with most CORBA products.
      COM also has a text-based IDL that is more or less equivalent to CORBA IDL. (COM IDL supports more data types; CORBA IDL is easier to author and parse.) As Microsoft is quick to point out, COM IDL is optional. However, it is used extensively on most COM-based development projects.
      Both COM and CORBA IDL tend to be good for authoring, but not so good for interoperation and interchange. Due to their text-based nature, IDL-aware tools and infrastructure must parse a fairly rich language that has some tricky grammar rules as well as dependencies on the C preprocessor. To address this problem, COM also provides a binary form of type information called type libraries. Type libraries contain most (but not all) of the information from a COM IDL file in a representation that is easily parsed using the system-provided type library parser (see the LoadTypeLib API for more information).
      Because all Java-language components adhere to a standard, self-describing class file format, no additional type information support is needed. As of Java 1.1, it is possible to traverse a component's public interface using intrinsic reflection services exposed by all Java virtual machines.

Wire Interoperation
      Distributed computing is the Holy Grail of component technology. Components are often viewed as the enabling technology that will make building distributed applications easy. In an attempt to satisfy this goal, component technologies often define new network protocols to allow components to communicate across host machines.
      Because Windows NT leans heavily toward the Open Software Foundation's Distributed Computing Environment (DCE) RPC mechanism, COM employs the DCE RPC protocol for framing and transport and uses Network Data Representation (NDR) for parameter encoding. The Distributed COM (DCOM) protocol simply defines a handful of DCE RPC interfaces that are used for object activation, type coercion, and lifecycle management. In essence, DCOM is just another DCE RPC application.
      CORBA supports a variety of protocols, with Internet Inter-ORB Protocol (IIOP) being the most common protocol for interoperation. IIOP layers simple framing and conversation management over TCP and uses the Common Data Representation (CDR) for parameter encoding. The Java language supports both IIOP and CDR as well as the native Remote Method Invocation (RMI) protocol, JRMP. JRMP is based loosely on the Java serialization format and can work over vanilla TCP or HTTP.

Components and Culture Clash
      Because COM opens up more possibilities for collaboration with other programmers, it presents more opportunities for disagreement. Many C++ developers are horrified by the "don't just stand there, ship something" ethic that many Visual Basic®-based shops live by. In contrast, most developers working with Visual Basic are amused by the syntactic preoccupation that C++ programmers exhibit as they write yet another template-based generic wrapper for a three-function API. The culture clash is only exacerbated by the varying degrees of COM support each programming environment offers. C++ programmers often feel constricted by the lack of support for arbitrary pointers and arrays in Visual Basic. Programmers using Visual Basic have their own list of gripes related to versioning, GUID management, and support for interfaces and events.
      C++ developers prefer strong typing. This can be traced back to Bjarne Stroustrup's mantra of "prefer compile-time errors to runtime errors." Programmers using Visual Basic tend to prefer loosely typed systems. Perhaps this can be traced back to the Basic language's lack of support for typed variables. Or perhaps it's because Visual Basic-based projects are typically of short duration and there is little time (or need) for strong typing.
      It is easy to take the puritanical stance that strong typing is superior to loose typing. However, many of the applications that programmers build with Visual Basic are highly suited to loose typing. This is especially true for code that is written to be disposable or transient. Many applications developed by internal corporate development departments (the primary bastion of Visual Basic) need to be written to satisfy an immediate business need that is often transient or volatile. A majority of today's Web development also falls into this category, as most commercial Web sites change from month to month to hold consumer attention as well as to take advantage of newer Web technologies.
      Programmers using Visual Basic have expressed their preference for loose typing in their utilization of the ActiveX® Data Objects (ADO) Recordset. The ADO Recordset is a generic, extensible, self-describing data structure that most programmers who work with Visual Basic have come to depend on. While the ADO Recordset was originally designed to present an API to databases, it has evolved into a generic data structure that is useful even when no databases are in use. It is common to define component interfaces largely in terms of the Recordset.
      Because Recordsets can easily marshal by value, they are more efficient for data transfer than most of the other object-based solutions available in Visual Basic. Also, because the schema used by a Recordset is defined at runtime, not compile time, Recordsets offer a back door for interface evolution that does not require COM-style interface versioning. This style of evolution is not without its downside. Changing a Recordset schema requires that version negotiation be done manually at the application level. However, in some deployment scenarios, this is not a problem.
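      The following Python sketch illustrates the idea of a runtime-defined schema and the manual version negotiation it implies. It is not the ADO API: make_rowset and column are hypothetical helpers invented for this illustration, and the field names are borrowed from the order example used later in this column.

```python
# A hypothetical, Recordset-like structure: the schema travels with the data.
def make_rowset(fields, rows):
    """Bundle field names and row values so the receiver can inspect both."""
    return {"fields": list(fields), "rows": [list(r) for r in rows]}


def column(rowset, name, default=None):
    """Read one column by name, falling back when the sender omitted it."""
    if name not in rowset["fields"]:
        # Application-level negotiation: an older sender simply lacks the field.
        return [default] * len(rowset["rows"])
    index = rowset["fields"].index(name)
    return [row[index] for row in rowset["rows"]]


# An "old" sender and a "new" sender that added a transid field.
v1 = make_rowset(["custno", "itemno"], [[4462, 3352], [4462, 1829]])
v2 = make_rowset(["custno", "itemno", "transid"], [[4462, 3352, 55291]])

print(column(v1, "transid"))
print(column(v2, "transid"))
```

      The receiver works with both shapes without any interface being reversioned, but only because it checks for the new field itself at runtime.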

Component Technology for the Web
      It should be clear from the previous discussion that there are many valid approaches to integrating software components. Each of the technologies discussed has loyal followers who have committed considerable resources to their technology of choice. For this reason alone, it is unlikely that any of these technologies will disappear from the development landscape any time soon.
      However, it is also unlikely that any of these technologies will dominate the Internet. The network protocols they use tend to require a nontrivial amount of runtime support to function properly. Ironically, while Microsoft and the Object Management Group (OMG) were arguing over whether the Internet would be run on DCOM or CORBA, HTTP took over as the dominant Internet protocol. Like many other successful Internet protocols, HTTP is simple, text-based, and requires very little runtime support to work properly. Additionally, many corporate firewalls block DCOM and CORBA traffic while allowing HTTP packets into their guarded networks. Finally, when you consider the fact that HTTP servers are scalable, reliable, and easy to administer, it seems wise to expose your software components using HTTP technology.

XML as a Component Technology
      Many programmers view XML as a fourth component integration technology. While originally designed as a solution for adding extensions to HTML, XML is rapidly becoming the technology of choice for integrating heterogeneous component-based systems. Here's why:
      XML has minimal standards. Recall the four degrees of interoperation discussed earlier and shown in Figure 1. XML is fundamentally about defining a minimal wire representation for data/message interchange. This is the minimal level of standardization needed to ensure that components can communicate. The core XML specification is extremely simple. It only lays down the syntactic ground rules for forming well-formed XML messages. While the W3C is rapidly layering additional standards on top of XML (XLink, XML Schemas, and so on), the base XML syntax has remained fairly stable. It has proven to be flexible and adaptable to many applications, and despite its hierarchical nature, XML lends itself reasonably well to nonhierarchical data types.
      As shown in Figure 2, XML does not mandate a type information representation per se. It provides a standard mechanism for describing an XML data stream known as a Document Type Definition (DTD). While DTDs can be useful for building validating XML parsers, they have several problems. For one, DTDs are barely readable to developers and difficult to write because the syntax of a DTD is not XML, but rather a DTD-specific grammar that is slightly different from XML. Because DTDs are not valid XML themselves, infrastructure and tool vendors need to develop two parsers, editors, or APIs, one for XML and one for DTDs. Worse yet, DTDs have a hard time dealing with scoping and namespaces, which makes them unusable in some scenarios. Currently, many XML-based systems simply define their own type information representations as XML vocabularies. The Microsoft XML Data specification is one example of such a vocabulary.
      As for source code or in-memory representations, XML doesn't go far enough. XML makes no attempt to address in-memory component or object representations except that XML data streams can be read into memory prior to parsing. While the W3C is currently working on an API recommendation known as the Document Object Model (DOM), this API is only a recommendation and is not required to host XML in an application or system.
      XML is platform-agnostic. Despite the hopes of platform vendors or open-source zealots, the computing world will always consist of different programming languages, operating systems, and computing hardware. Since XML is only a wire representation, it has no particular affinity to one operating system, programming language, or hardware architecture. As long as two systems can exchange XML messages, they can potentially interoperate despite their differences. Because XML does not mandate an API or in-memory representation, it is simple to host XML in an application. There are XML parsers available for most, if not all, programming languages. While there are several standardized programmatic interfaces for parsing XML (such as the W3C DOM and SAX), there is no mandate that you must support a particular API in order to interoperate with other XML-based systems.
      XML is accessible. XML is incredibly easy to understand, read, and author. This accessibility has been key to XML's rapid acceptance. Unlike binary wire protocols like DCOM, CORBA, or Java/RMI, you can easily create XML messages using a simple text editor or scripting language.
      While many XML parsers provide facilities for generating well-formed XML, it is also possible to generate XML using standard string manipulation facilities in your programming language of choice. The simple text-based nature of XML also makes it easier to debug and monitor distributed applications, since all component-to-component messages are human-readable when using a network monitoring tool.
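      As a minimal sketch of that point, the following Python fragment assembles a small message using nothing but string operations and then hands it to a parser to confirm that the result is well-formed. The build_order function and the order/customer/item names are assumptions made for this illustration. The standard library's quoteattr helper is used so that attribute values are quoted and escaped correctly.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET


def build_order(orderno, custno, itemnos):
    """Assemble an <order> message with ordinary string operations."""
    # quoteattr returns the value already wrapped in quotes and escaped.
    lines = ["<order orderno=%s>" % quoteattr(str(orderno))]
    lines.append("  <customer custno=%s />" % quoteattr(str(custno)))
    for itemno in itemnos:
        lines.append("  <item itemno=%s />" % quoteattr(str(itemno)))
    lines.append("</order>")
    return "\n".join(lines)


message = build_order(33512, 4462, [3352, 1829])
print(message)

# Round-trip through a parser to confirm the text is well-formed XML.
root = ET.fromstring(message)
```

      Because the message is plain text, the same string could just as easily be typed into an editor or captured from a network monitor.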
      XML is extensible. XML provides an elegant mechanism for allowing arbitrary parties to extend a given XML data stream. XML namespaces use the Uniform Resource Identifier (URI) namespace to allow arbitrary attributes and elements to be added to an existing XML vocabulary.
      For example, consider this simple XML fragment:

<order orderno="33512">
  <customer custno="4462" />
  <item itemno="3352" />
  <item itemno="1829" />
</order>

Here, customer number 4462 is ordering items 3352 and 1829. Assuming that both the sender and receiver understand what this means, everything is great. But what if the sender wanted to annotate this message with additional information—for example, by adding an identifier to the order that associates it with a larger financial transaction? You could imagine the sender simply adding the attribute as follows:

<order orderno="33512" transid="55291">
  <customer custno="4462" />
  <item itemno="3352" />
  <item itemno="1829" />
</order>

However, because the application receiving the message may have been developed independently from the sending application, there are several potential problems. For one, the receiver may not allow additional attributes or elements to be added to a message. If the receiver interprets the presence of this attribute as a parsing error, the request will fail.
      To deal with this problem, newer XML description technologies (such as Microsoft XML Data) allow XML vocabularies to be defined as either open or closed. A closed vocabulary cannot be extended beyond what is described in the base vocabulary schema. An open vocabulary can be extended, with the receiving application deciding how to interpret extended elements and attributes. Depending on the application, unrecognized extensions to a vocabulary can often be ignored.
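      The difference between the two policies can be sketched in a few lines of Python. This is a hand-rolled check, not an XML Data schema processor. The read_order function, its open_vocabulary flag, and the KNOWN_ORDER_ATTRS set are all hypothetical names introduced for this illustration.

```python
import xml.etree.ElementTree as ET

# Attributes the receiver was written to understand (an assumption for this sketch).
KNOWN_ORDER_ATTRS = {"orderno"}


def read_order(xml_text, open_vocabulary=True):
    """Return the order number, tolerating or rejecting unknown attributes."""
    root = ET.fromstring(xml_text)
    unknown = set(root.attrib) - KNOWN_ORDER_ATTRS
    if unknown and not open_vocabulary:
        # Closed vocabulary: anything outside the base schema is an error.
        raise ValueError("unexpected attributes: %s" % sorted(unknown))
    # Open vocabulary: unrecognized extensions are simply ignored.
    return root.get("orderno")


msg = '<order orderno="33512" transid="55291"><customer custno="4462" /></order>'

print(read_order(msg, open_vocabulary=True))
try:
    read_order(msg, open_vocabulary=False)
except ValueError as err:
    print("rejected:", err)
```

      The same message succeeds under the open policy and fails under the closed one, which is exactly the receiver-side decision described above.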
      Assuming that the order message shown earlier is part of an open XML vocabulary, it should be safe to add the transid attribute. However, what if the receiver of the request also wanted to extend the vocabulary? What if the receiver had defined a new attribute for associating orders with low-level database transactions? If the receiver had the misfortune of choosing transid as the attribute name, then the sender's financial transaction ID would be misinterpreted as a low-level database transaction ID. To solve this problem, the W3C added namespaces to XML.
      XML namespaces allow attributes and elements to be scoped by a URI. The following XML fragment illustrates how XML namespaces can be used to unambiguously add the transid attribute to the order request:

<order orderno="33512"
       xmlns:fin="http://money.org/FinancialXML/ns"
       fin:transid="55291" >
  <customer custno="4462" />
  <item itemno="3352" />
  <item itemno="1829" />
</order>

When a receiver parses this XML fragment, it can detect that the transid attribute is scoped by the namespace http://money.org/FinancialXML/ns and is not the same as the transid attribute used to represent database transactions (which would have a different namespace URI). In fact, XML namespaces allow both transid attributes to appear in the same request unambiguously:

<order orderno="33512"
       xmlns:fin="http://money.org/FinancialXML/ns"
       xmlns:db="urn:xmltpc:XMLTransactions"
       fin:transid="55291"
       db:transid="46722" >
  <customer custno="4462" />
  <item itemno="3352" />
  <item itemno="1829" />
</order>

Despite the current energy being dedicated to XML-based type description, to date namespaces are still the most enabling enhancement to XML that has come out of the W3C.
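      To see how a receiver might tell the two transid attributes apart, consider this Python sketch, which uses the standard library's ElementTree parser as one possible namespace-aware parser. The FIN_NS and DB_NS constants are names chosen for this illustration. The URIs are the ones from the fragment above.

```python
import xml.etree.ElementTree as ET

FIN_NS = "http://money.org/FinancialXML/ns"
DB_NS = "urn:xmltpc:XMLTransactions"

message = """<order orderno="33512"
    xmlns:fin="http://money.org/FinancialXML/ns"
    xmlns:db="urn:xmltpc:XMLTransactions"
    fin:transid="55291" db:transid="46722">
  <customer custno="4462" />
  <item itemno="3352" />
  <item itemno="1829" />
</order>"""

root = ET.fromstring(message)

# ElementTree exposes namespaced attributes as "{namespace-uri}localname",
# so the prefix chosen by the sender does not matter, only the URI.
financial_id = root.get("{%s}transid" % FIN_NS)
database_id = root.get("{%s}transid" % DB_NS)

print("financial transaction:", financial_id)
print("database transaction:", database_id)
```

      Note that the lookup is keyed on the namespace URI rather than on the fin or db prefix, so a sender that picks different prefixes for the same URIs is still understood.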
      XML can be loosely typed. Due to the use of open vocabularies and namespaces, XML can support loosely typed communications. While strong typing has many benefits (and is supported by XML using DTDs or their equivalents), it is extremely easy to build loosely typed systems using XML. This makes XML adaptable to generic application frameworks, data-driven applications, and rapid development scenarios such as disposable or transient Web-based applications. Many ADO Recordset fans are using the Microsoft XML parser (MSXML) to replace the Recordset as a data transfer mechanism for both its cross-platform benefits and its superior support for nontabular data.
      XML translates XML. Simply adopting XML as a component integration technology does not completely solve the interoperability problem. Though much of the industry is embracing XML as an interoperability technology, this only pushes the interoperability problem up one level of abstraction. Even if the entire industry were to shift to XML overnight, this alone would not help, as different organizations are likely to use different XML vocabularies to represent the exact same information. Granted, there are currently industry-wide initiatives to standardize domain-specific XML vocabularies (such as BizTalk, FinXML, and OASIS). However, it is not known whether any of these efforts will achieve 100 percent penetration in a particular application domain.
      Fortunately, the lack of standardized vocabularies can be addressed using XML technology. In particular, in the presence of two competing vocabularies, it is likely that application-level gateways will transform requests from vocabulary A into requests in vocabulary B. An even more promising solution lies in XML transforms, also called XSL transformations (XSLT). XSLT allows one XML vocabulary to be transformed into another by specifying the transformation rules (in XML, of course). XSLT was originally devised to map XML to HTML, but is currently being applied in a variety of more interesting scenarios.
      To understand the power and elegance of XSLT, take a look at the following XSLT document, which converts all of the attributes of an XML document into elements:

<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <xsl:apply-templates />
  </xsl:template>
  <xsl:template match="*" >
    <xsl:element>
      <xsl:for-each select="@*" >
        <xsl:element>
          <xsl:value-of />
        </xsl:element>
      </xsl:for-each>
      <xsl:apply-templates select="*"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

If you were to run this transform over the <order> fragment shown earlier, you would wind up with the following as output:

<order>
  <orderno>33512</orderno>
  <customer><custno>4462</custno></customer>
  <item><itemno>3352</itemno></item>
  <item><itemno>1829</itemno></item>
</order>

Each XML parser can have its own proprietary way to apply an XSLT. Using the MSXML parser, the following JScript program will do the job:

var dom = new ActiveXObject("Microsoft.XMLDOM");
dom.load("input.xml");
var xsl = new ActiveXObject("Microsoft.XMLDOM");
xsl.load("transform.xsl");
WScript.echo(dom.transformNode(xsl));

Note that this generic five-line script will work on arbitrary XSLT transformations. Also note that the XSLT does not have to generate valid XML. For example, consider the following XSLT:

<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <xsl:apply-templates select="order" />
  </xsl:template>
  <xsl:template match="order" >
    Order <xsl:value-of select="@orderno" />
    Customer <xsl:value-of select="customer/@custno" />
    Items: <xsl:apply-templates select="item" />
  </xsl:template>
  <xsl:template match="order/item" >
    Item <xsl:value-of select="@itemno" />
  </xsl:template>
</xsl:stylesheet>

When run over the same input <order> element, it generates the following:

Order 33512 Customer 4462 Items: Item 3352 Item 1829

As these two examples illustrate, XSLT makes it easy to automate translation from an XML vocabulary to other formats, including other XML vocabularies.
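      For comparison, an application-level gateway of the kind mentioned earlier could perform the same attribute-to-element mapping in ordinary code. The Python sketch below is a hand-written equivalent of the first transform, not an XSLT processor (the Python standard library does not include one). The attrs_to_elements function is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET


def attrs_to_elements(node):
    """Return a copy of node with every attribute rewritten as a child element."""
    result = ET.Element(node.tag)
    # Each attribute becomes a child element whose text is the attribute value.
    for name, value in node.attrib.items():
        ET.SubElement(result, name).text = value
    # Recurse so that nested elements are converted the same way.
    for child in node:
        result.append(attrs_to_elements(child))
    return result


source = ET.fromstring(
    '<order orderno="33512">'
    '<customer custno="4462" />'
    '<item itemno="3352" />'
    '<item itemno="1829" />'
    '</order>'
)

converted = attrs_to_elements(source)
print(ET.tostring(converted, encoding="unicode"))
```

      The hand-written version does the job, but the mapping is buried in program logic. With XSLT the same rules live in a document that can be exchanged, versioned, and applied by any conforming processor.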

Summary
      Despite the hype, XML will not solve all of your problems. It may or may not help you ship software faster. XML will never replace programming languages such as C++ or Java or component technology such as COM. XML will, however, become widely used to enable software components to interoperate, acting as a gateway between autonomous, heterogeneous systems.

Have a question about programming with COM? Send your questions via email to Don Box: dbox@develop.com or http://www.develop.com/dbox.

From the January 2000 issue of Microsoft Systems Journal.