This article assumes you're familiar with COM, XML, JScript
Download the code (9KB)
SOAP: The Simple Object Access Protocol
Aaron Skonnard
Remote objects can give a program almost unlimited power over the Internet, but most firewalls block non-HTTP requests. SOAP, an XML-based protocol, gets around this limitation to provide interprocess communication across machines.
Developers have a hard time agreeing on anything. They are fiercely loyal to the technologies they use and they don't like to compromise by adding support for the other guy's technology. Long and bitter technical wars have ensued, leaving interoperability the victim. See Don Box's recent article, "Lessons from the Component Wars: An XML Manifesto" (http://msdn.microsoft.com/workshop/xml/articles/xmlmanifesto.asp) for another perspective on the problem.
For the most part, software component developers from opposite sides of the tracks stay as far away from each other's technology as possible. This polarity makes it difficult to achieve any level of interoperability. It would be great to find a component technology standard that everyone could agree on. It could be that a subset of minimal technologies is the best answer to this quandary. XML and HTTP are two such minimal technologies.
The Simple Object Access Protocol (SOAP) defines the use of XML and HTTP to access services, objects, and servers in a platform-independent manner. SOAP is a protocol that acts as the glue between heterogeneous software components. If developers can agree on HTTP and XML, SOAP offers a mechanism for bridging competing technologies in a standard way. The main goal of SOAP is to facilitate interoperability.
The industry has accepted HTTP. It's used everywhere, on all platforms. This ubiquity makes HTTP a good choice for an interoperable transport mechanism. XML is becoming as ubiquitous as HTTP. Today you can find an XML processor for just about any common platform or language.
XML is a simple and extensible text markup language. Because XML is just text, any application can understand it as long as the app understands the character encoding in use. By default, XML assumes that all characters belong to ISO/IEC 10646, known as the Universal Character Set (UCS).
The XML specification (http://www.w3.org/XML/) mandates that all XML processors must accept character data encoded using the UCS Transformation Formats UTF-8 or UTF-16. Therefore, any XML data stream encoded in UTF-8 or UTF-16 can be understood regardless of platform or programming language. (Note that since the first 128 character codes of UTF-8 match up with ASCII, a UTF-8-capable processor can understand straight ASCII text files.) This portability makes XML a good choice for describing method invocations in a platform- and language-neutral fashion.
Combining HTTP and XML into a single solution gives you a whole new level of interoperability. For example, lathered with SOAP, clients written in Microsoft® Visual Basic® can easily invoke CORBA services running on UNIX boxes, JavaScript clients can easily invoke code running on the mainframe, and Macintosh clients can start invoking Perl objects running on Linux. The list goes on. While some interoperability is achieved today through cross-platform bridges for specific technologies, once SOAP becomes standard, bridges will no longer be necessary.
As you'll see, much of what SOAP represents is simple, and it's nothing new. SOAP simply codifies existing practices into an industry standard from which everyone can benefit.
Firewall Woes
SOAP-like Technologies
Like CIS, the object creation and method invocation happen over HTTP, making it possible for DCOM to get through the firewall. But like CIS, this solution is platform-specific, and the server also relies on an ISAPI DLL running on IIS. All business components that you want to expose through this mechanism must be registered differently from a standard DCOM component. Furthermore, the client must have the appropriate RDS infrastructure in place for this communication link to work properly.
Remote scripting is another technology that makes it possible to call server-side script functions directly from a browser's client-side script over HTTP. Microsoft implements remote scripting through a Java applet that runs in either Netscape Navigator or Microsoft Internet Explorer. Once you have the remote scripting runtime files configured properly on your Web server and within your client-side HTML files, you can start calling functions that live in your server-side ASP pages directly from DHTML. The following snippet illustrates some client-side JScript code that gets called in response to a button's onclick event:
You can think of the ASP file as the class, and RSGetASPObject as a special type of moniker that binds you to the class (similar to CoCreateInstance). If you think about it, this is just another way to call methods over HTTP. Although this technology works from within different scripting languages on the client, it's tied to ASP and IIS on the server.
And finally, even the most basic Visual Basic WebClasses or custom ASP pages are similar to SOAP. Both of these technologies make it possible to call server-side business objects over HTTP. For example, the following ASP page (which is similar to the functionality of a WebClass-generated ASP file) creates an instance of the Lib.BusinessObj component and invokes SomeMethod, passing the string sent by the client:
Simply sending an HTTP request for the given ASP page makes it possible to invoke SomeMethod remotely. A client would invoke SomeMethod using the following URL:
At first, technologies like CIS, RDS, and Remote Scripting seem to resemble SOAP, but this last example is actually a closer fit. Since ASP pages are called via HTTP, any client application capable of sending HTTP requests can invoke SomeMethod on Lib.BusinessObj regardless of platform or language. Similarly, if you write a Perl CGI script that happens to call a Perl object, any HTTP client capable of requesting the CGI script can also indirectly invoke the Perl object. As you can see, SOAP is not an entirely new concept. I've described several similar technologies available on the Windows platform today. SOAP attempts to codify the main concept behind these existing practices into a simple and generic protocol that can serve as an industry standard.
SOAP Requests
This HTTP request points to a uniform resource identifier (URI) of /cgi-bin/purchase-book.cgi. Since the SOAP specification says nothing about how a component is activated, it's up to the code behind this URI to decide how to activate the component and invoke the specified method.
SOAP HTTP Headers
Again, the M-POST method specifies that an extension declaration is mandatory for this HTTP request, and therefore a Man header must be present. The Man header contains the extension identifier along with a two-digit extension prefix. The prefix is used on each custom HTTP header that belongs to the extension, as shown in this example. This mechanism is similar to the XML namespace standard, which allows an arbitrary URI to scope a set of identifiers.
You may be wondering why additional HTTP headers for MessageType, MethodName, and InterfaceName are used. Couldn't this information be embedded in the SOAP XML payload? The answer is yes: SOAP could have left off these headers and carried the information in the XML payload. However, the reason for using HTTP headers has to do with firewall filtering.
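To make the header mechanics concrete, here's a minimal Python sketch of how a client might assemble the SOAP HTTP headers for either POST or M-POST. The header names MessageType, MethodName, and InterfaceName come from the discussion above, and the Man header syntax follows the HTTP Extension Framework. Everything else is an assumption made for illustration: the function name, the `text/xml` content type, the `Call` message type value, and the extension identifier are placeholders, not values taken from the SOAP specification.

```python
def build_soap_headers(method_name, interface_name, body,
                       use_mpost=False,
                       extension_id="urn:example:soap-extension",  # hypothetical identifier
                       prefix="42"):                               # hypothetical two-digit prefix
    """Return (http_method, headers) for a SOAP call request."""
    # The three SOAP-specific headers named in the article.
    soap_headers = {
        "MessageType": "Call",          # assumed value
        "MethodName": method_name,
        "InterfaceName": interface_name,
    }
    headers = {
        "Content-Type": "text/xml",     # assumed content type
        "Content-Length": str(len(body.encode("utf-8"))),
    }
    if use_mpost:
        # M-POST requires a mandatory extension declaration (the Man header).
        # Each header that belongs to the extension carries the declared prefix.
        headers["Man"] = f'"{extension_id}"; ns={prefix}'
        for name, value in soap_headers.items():
            headers[f"{prefix}-{name}"] = value
        return "M-POST", headers
    # Plain POST sends the same headers without a prefix.
    headers.update(soap_headers)
    return "POST", headers
```

With `use_mpost=True` the headers come back as `42-MessageType`, `42-MethodName`, and `42-InterfaceName` alongside the Man declaration; with the default, the same three headers are sent unprefixed on an ordinary POST.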
Firewall Filtering
SOAP XML Payload
The SOAP root element is the top element in the XML payload tree. It provides the serialization context for the SOAP method call data stream. Hence, the SOAP specification gives it the name SerializedStream. If SerializedStream is going to have only a single child element for a given method call (meaning it won't use any SOAP header elements), using the root SerializedStream element is optional. For example, this XML payload
is equivalent to this one:
The SerializedStream element must contain an attribute named main, whose value is a standard URI fragment identifier that points to the call element for the given request. If a headers element exists, it must also contain another URI fragment identifier named headers, which references the headers element:
These reference attributes make it possible for SOAP implementations to do quick retrievals of the call or headers elements by ID. The SerializedStream element may also contain a serializationPattern attribute, which identifies the version of the SOAP specification that this message was coded against.
The call element contains all the information related directly to the method invocation. The element name must be the same as the method name used in the MethodName HTTP header. The element must contain an id attribute that distinguishes it from other child elements of the root SerializedStream element. The call element also contains a child element for each [in] and [in/out] parameter for the given method. These elements can use either the defined parameter name as the element name or a more generic approach that allows you to use __param0, __param1, ... __paramn, where n is the number of parameters minus one. For example, this payload
is equivalent to this one:
The headers element may contain additional information not directly related to the method invocation for the given request. The headers element must contain an id attribute that distinguishes it from other child elements of the root SerializedStream element. The headers element also contains a list of header entries. The SOAP specification lists some standard header entries you can use including InterfaceName, MethodSignature, and UnorderedParams. You can also pass any other type of implicit information that needs to flow with the request, but is not directly related to the method invocation, such as a transaction ID or session state information (think cookies).
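The payload rules above can be sketched in a few lines of Python using the standard `xml.etree.ElementTree` module. This is only an illustration under stated assumptions: the function name, the id values `call-1` and `headers-1`, the `#id` form of the fragment identifiers, and the parameter name `input` are all hypothetical. Only the SerializedStream, headers, and InterfaceName element names, the main, headers, and id attributes, and the __paramN convention come from the description above.

```python
import xml.etree.ElementTree as ET


def build_payload(method_name, params, interface_name=None, use_generic_names=False):
    """Build a SOAP call payload.

    params is a list of (name, value) pairs, one per [in] or [in/out] parameter.
    """
    # The root element's main attribute points at the call element by id.
    root = ET.Element("SerializedStream", {"main": "#call-1"})  # assumed fragment form

    if interface_name is not None:
        # When header entries are present, the root also references the headers element.
        root.set("headers", "#headers-1")
        headers = ET.SubElement(root, "headers", {"id": "headers-1"})
        ET.SubElement(headers, "InterfaceName").text = interface_name

    # The call element is named after the method and carries its own id.
    call = ET.SubElement(root, method_name, {"id": "call-1"})
    for index, (name, value) in enumerate(params):
        # Either the declared parameter name or the generic __paramN form.
        tag = f"__param{index}" if use_generic_names else name
        ET.SubElement(call, tag).text = str(value)

    return ET.tostring(root, encoding="unicode")
```

Calling `build_payload("reverse_string", [("input", "abc")], use_generic_names=True)` produces the same call with a `__param0` child in place of `input`, which is the equivalence between named and generic parameters that the text describes.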
SOAP Response
It will also contain one child element for each [in/out] or [out] parameter. Again, these element names can either be the same as the defined method names or you can use the __paramN approach. If the method returns an error, the response element will contain a child element named __fault. The fault element can contain detailed error information for the given request including a fault code, a fault string, and a special run code that tells you if the request reached the destination:
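On the receiving side, a client has to check the response for a __fault element before trusting any output values. Here's a minimal Python sketch of that check. The __fault element name comes from the text above, but the child element names `faultcode`, `faultstring`, and `runcode`, the `SoapFault` class, and the use of the root's main attribute to locate the response element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


class SoapFault(Exception):
    """Raised when the response element contains a __fault child."""

    def __init__(self, code, message, run_code):
        super().__init__(message)
        self.code = code
        self.run_code = run_code


def parse_response(xml_text):
    """Return the output values as a dict, or raise SoapFault on error."""
    root = ET.fromstring(xml_text)
    response = root
    if root.tag == "SerializedStream":
        # Assumed: the root's main attribute points at the response element by id.
        main_id = root.get("main", "").lstrip("#")
        response = next((c for c in root if c.get("id") == main_id), None)
        if response is None:
            raise ValueError("SerializedStream has no element matching its main attribute")

    fault = response.find("__fault")
    if fault is not None:
        raise SoapFault(
            fault.findtext("faultcode"),    # assumed element name
            fault.findtext("faultstring"),  # assumed element name
            fault.findtext("runcode"),      # assumed element name
        )
    # One child element per returned value ([in/out] and [out] parameters).
    return {child.tag: child.text for child in response}
```

Surfacing the fault as an exception keeps error handling out of the normal return path, which is how a language binding would most naturally expose it to calling code.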
SOAP Security
A Generic SOAP Client
Figure 1: Generic SOAP Client
This SOAP client allows you to easily build a SOAP request to any SOAP endpoint. You type in the endpoint you want to call along with the method and interface names for the given method call. The sample builds a standard SOAP payload and includes the appropriate SOAP HTTP headers in the request (including Content-Type, MessageType, MethodName, and InterfaceName). It also gives you a starting point for the XML payload. As you type in the method and interface names, it will add the call tag using the method name and a header tag using the interface name. Then all you do is type in any parameter values inside of the call tag (where it says ENTER_PARAMETERS). The client also allows you to specify whether you will use POST or M-POST for a given request. If you will use M-POST, it allows you to enter the extension identifier along with an associated prefix. The client will then include the extension declaration in the HTTP request and prefix the SOAP HTTP headers with the supplied extension prefix. Not all HTTP servers are configured to support the M-POST method, so the client also lets you fall back to plain POST.
Figure 2: SOAP Request
After you've built the SOAP request, press the Call Method button to send the request to the specified endpoint. After the call returns, the client will display all of the SOAP request and response information to help you become more familiar with the protocol specifics (see Figures 2 and 3).
Figure 3: SOAP Response
The sample also includes a handful of predefined SOAP endpoints that exist on test SOAP servers at DevelopMentor (http://www.develop.com/soap). Some of the endpoints are COM components implemented in Visual Basic, while others are Perl objects running on a Linux box. To use one of these endpoints, select it from the Test Endpoints combobox and the entire SOAP request will automatically appear in the form. For more detailed descriptions of the different test SOAP endpoints, click on [show descriptions] (see Figure 4). When you choose one of the Perl endpoints (reverse_string), you're calling a Perl object directly from a Windows script.
Figure 4: Test Endpoints
Although I've covered the basics of SOAP in this section, you'll want to read the SOAP specification for more details. As you can see, SOAP itself is very simple. It only defines how to encode and transmit method invocations and responses using HTTP and XML. To do something useful with SOAP, you'll have to write code to build and send the SOAP request from the client as well as code to understand the SOAP request on the server and return a response. While you could surely do this by hand, it would make things much easier if there were some natural language bindings for SOAP.
Language Bindings
To expose this Perl object through SOAP, you could create the following simple CGI script that uses the DevelopMentor Perl SOAP implementation:
This CGI script simply uses SOAP::Dispatcher to process the SOAP request. The $classes variable passed to handle_cgi_post lets you specify the class names that are allowed to be called from the outside world; this adds an additional layer of application security. Remember, using SOAP you can call this Perl object from any client on any platform using any language that supports HTTP and XML.
The Perl language binding makes it especially easy to make SOAP method calls from Perl clients. For example, the following Perl script calls the reverse_string method using SOAP:
Notice how simple this script looks. There are no signs of HTTP, and no signs of XML. It creates a new SOAP::Proxy, sets the endpoint and interface name, and starts calling methods through the proxy. Everything else happens transparently. The proxy will take care of setting the appropriate HTTP headers in the request and encoding the reverse_string method call using XML. It will also receive the HTTP response and process the XML payload for the result.
DevelopMentor has also developed a SOAP binding for standard COM clients and servers. Any COM server can be exposed through either an ASP page or an ISAPI DLL. The following ASP allows you to call methods on the VBSoapSrv.VBSoapTest component implemented in Visual Basic:
This creates an instance of the VBSoapSrv.VBSoapTest component through an <OBJECT runat=server> tag and delegates the work to the SoapCall component (which acts like a COM stub) to take care of the rest. The SoapCall component performs the same role as the SOAP::Dispatcher in the Perl implementation. On the client side, DevelopMentor provides a transparent proxy that hides all the SOAP details. And to make things even easier, they offer a SOAP moniker to bind your code to the proxy. Check out the following example, which uses the SOAP moniker:
SOAP can't get much more transparent than that. The proxy builds, sends, and receives the underlying SOAP request for each method call (like GetOrigin). Besides the Perl and COM implementations, there is also a Java language binding. As of the writing of this article, both the COM and Java bits were still under development. Point your browser to http://www.develop.com/soap for more details on the current state of these implementations. At this point, I want to emphasize that these language bindings are not part of SOAP itself. They only make SOAP easier to use from their respective environments. Other organizations are free to produce their own SOAP implementations that improve upon these and make SOAP a more integral part of their products and services.
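To show what "transparent" means in practice, here's a minimal Python sketch of the proxy idea behind a language binding. It is not the DevelopMentor API: the `SoapProxy` class, the `transport` callable, the endpoint URL, and the interface name are all hypothetical. The transport is a local stand-in so the example runs without a network; a real binding would build the HTTP headers and XML payload there and parse the response.

```python
class SoapProxy:
    """Turns ordinary attribute access into remote method calls."""

    def __init__(self, endpoint, interface_name, transport):
        self._endpoint = endpoint
        self._interface_name = interface_name
        self._transport = transport

    def __getattr__(self, method_name):
        # Only called for names not found normally, i.e. the remote methods.
        if method_name.startswith("_"):
            raise AttributeError(method_name)

        def call(*args):
            # The caller never sees HTTP or XML; the transport handles both.
            return self._transport(self._endpoint, self._interface_name, method_name, args)

        return call


def fake_transport(endpoint, interface_name, method_name, args):
    """Stand-in for the HTTP and XML round trip (hypothetical, runs locally)."""
    if method_name == "reverse_string":
        return args[0][::-1]
    raise NotImplementedError(method_name)


proxy = SoapProxy("http://soap.example.invalid/endpoint", "IExample", fake_transport)
print(proxy.reverse_string("hello"))  # prints "olleh"
```

Because the proxy resolves method names at call time, the client code doesn't need generated stubs for each interface; swapping `fake_transport` for one that speaks HTTP is the only change needed to go over the wire.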
Additional SOAP Resources
Conclusion
From the January 2000 issue of Microsoft Internet Developer.