

MIND


This article assumes you're familiar with XML, Microsoft Message Queue, and Visual Basic
Download the code (22KB)

Make Your Legacy Apps Work on the Internet
Scott Howlett and Jeff Dunmall

The Internet is increasingly becoming an important path for business-to-business data services. You can use BizTalk and COM to keep your existing systems useful for longer.
In the coming years, most companies will integrate their Internet presence with their mission-critical line-of-business systems. Creating these applications will be the most difficult challenge Internet developers have yet to face. It means enabling interoperability between legacy systems, possibly from different companies, and doing it with the availability and scalability that Internet applications demand. By getting into the right design mindset and making the right technology choices, notably Extensible Markup Language (XML) and Microsoft® Message Queue Services (MSMQ), Internet applications can provide the return-on-investment that has been promised in the past, but rarely delivered.
      First, let's explore what legacy systems are and why they're important. We'll walk you through interoperable Internet application development and the issues involved in building them. Then we'll build an Internet-based Address Change facility as a traditional Internet application that integrates with existing systems. We'll also show you how the design is extensible, enabling business-to-business communication using the BizTalk framework. Surprisingly, the framework can be written with only about 100 lines of ASP and Visual Basic® code.
      Microsoft will expand the technology available for integration with legacy systems when it releases the Microsoft Enterprise Interop Server, codenamed "Babylon" (see Figure 1). Babylon delivers application integration with COM Transaction Integrator (COMTI) and an MSMQ-to-MQSeries bridge, data integration with OLE DB providers and ODBC drivers, and network/platform integration via an SNA gateway or direct TCP/IP access. We'll only address the challenges of getting data to the Interop server. To find out more about Microsoft's interoperability strategy, go to http://www.microsoft.com/interoperability/.

Figure 1: Microsoft Enterprise Interop Server Architecture

Background
      If you were at Microsoft TechEd this past May, you probably heard Paul Maritz's discussion about the upcoming third generation of Internet applications. In his analysis, the first generation involved content publishing based on the HTML standard. The second generation was based on dynamic content using Windows® DNA technology, and the third generation will feature integrated Internet systems that enable more effective line-of-business applications.
      Building third-generation Internet systems will demand integration with legacy systems from one or more companies. Companies will be aggressively developing these applications because they know that customers and suppliers will demand integrated systems based on open Internet standards. Increased efficiency and streamlined business processes should have pleasant effects on the bottom line, in part through the competitive advantage gained by providing better and faster service. If a company doesn't or can't provide integrated services, business opportunities will likely be lost.
      If you read Don Box's article, "Windows 2000 Brings Significant Refinements to the COM(+) Programming Model" (Microsoft Systems Journal, May 1999), you were probably surprised at the notion that COM components developed before Windows 2000 are often referred to as "legacy" components. You've never written any COBOL, so how could you have any legacy code?
      The terminology shouldn't surprise anyone. It simply means that you have written software that is part of a production system. Having legacy code is a good thing because the alternative is that none of your code is working in production! So, simply put, a legacy system is any system that exists in production. It's made of legacy code and the manual processes, such as call centers, that support the system.
      Legacy systems often have hundreds of development-years invested in them. This is why it's so important to evaluate legacy systems and build Internet systems on top of their foundations. With the right design, you will be able to overcome the pitfalls associated with integration and interoperability.
      Building line-of-business systems in Internet time (less than six months) demands that existing systems be exploited. Usually, legacy systems have some manual processing where the combined expertise and experience of people is critical. You simply can't reproduce that effort in your timeframe, and you probably don't want to anyway—unless you have a penchant for pain or like to rewrite COBOL source code.
      Of course, legacy systems aren't all good news for Internet applications. When you bring them into the fold, you're adding the most dangerous software villain: the unknown quantity of legacy systems. But on the bright side, you've got an excellent starting point because you already have a working system.
      Legacy systems are frequently not documented, leaving everyone afraid that doing anything to them might break them. You can't change any of their code and no one can or will tell you how they work. So there have to be legacy experts on the team, at least for the design period. They'll be responsible for investigating, researching, and documenting the interfaces, business rules, and other facets of the legacy systems. Finding legacy experts is usually the first problem in building these systems.
      On the technology side—where we'll focus the rest of this article—there are a number of hurdles to overcome. Integration isn't always easy. In fact, it almost always involves some pain. But with the introduction of technologies such as XML and application services such as MSMQ, building interoperable systems has never been easier. Microsoft Windows 2000 offers more technology advancements that will make interoperability even easier, notably native support for queued components. Babylon, due in beta in the last quarter of 1999, will provide even more opportunity for legacy integration.

Availability, Performance, and Scalability
      Generally, integration problems fall into three familiar categories: availability, performance/scalability, and exception handling.
      The availability of your application is related directly to the availability of the legacy systems you connect with. Your Internet application likely needs to run 24/7, while your legacy systems may not have the same constraints. They may be scheduled for nightly maintenance or weekend shutdowns. What will your Internet application do when legacy systems are unavailable?
      From a performance and scalability perspective, the legacy system probably can't handle the volume of traffic that the Internet site can, at least not in the same form. Including legacy resources in your application increases transaction time, thereby decreasing scalability. Sometimes even connecting to the legacy systems can be a costly operation. If the only interface to the legacy system is through screen scraping (don't laugh, it happens all the time), then you definitely have a back-end resource bottleneck to overcome in your design.
      If you have any combination of these issues, you must find a way to decouple your legacy systems from your Internet application, making it transparent to your customers. The most effective way to accomplish this is with MSMQ. On the Internet side, you can concurrently send thousands of requests and get excellent performance from MSMQ. On the legacy side, you can process those messages at a pace that the legacy system can handle. If there are only five database connections available through SNA, for example, you can process five messages at a time. This effectively allows you to throttle the load on the legacy system, while at the same time addressing your availability concerns and allowing your Internet application to scale appropriately.
      A third grade French teacher once said, "Pour toutes les règles, il y a des exceptions." (Every rule has an exception.) We don't remember much French, but after years in the software business, we've refined these words of wisdom: "Every business rule that has existed for more than five years has at least one exception." And herein lies the third category of integration problem: if every rule can be broken, how do you write a middle tier that enforces the business rules? As a further complicating factor, many of these rules are entrenched in the legacy systems and manual processes (and people) that surround them. So to build a completely automated system, it's necessary to discover every business rule that's grown up around the process, and to uncover all the exceptions to the rules.
      Realistically, it takes time to flush all of these rules out and code them into your application, and they typically aren't documented very well. In many cases, this is not possible in a six-month timeframe, so you'll need to incorporate the manual processes to handle the remaining exceptions, at least in the first version. The golden rule is to make most things automatic and everything else possible, even if it means invoking a manual process.
      For example, if you move to another town, it's likely that your auto insurance policy needs to be updated to reflect your new neighborhood. So when building an Internet address change system for an insurance company, you have two design choices. You can either reprogram the system to automate the adjustment of the policy, or you can integrate this change within the existing process.
      If you've ever worked with an insurance company, option one should send shivers down your spine, especially if you have to deliver your application in six months. Changing a policy involves numerous rules that need to be enforced, many of them for legal reasons. Besides, you're supposed to be building an address change system, not an automatic policy update system. So the best alternative is to integrate the existing policy update process into the new system. This will save you time (the scarcest resource) in the development cycle because you won't have to discover every rule and exception up front.
      By incorporating the existing manual processes, you can rely on the expertise of the people who know the system best. Integration makes your life easier by saving you time and headaches. In version 2, after you get the app up and running, you can seek to uncover and incorporate more business rules and thereby automate the system more completely. Don't underestimate how complex this will be. Even though a division might look like a call center to you, it's also the brain center for many businesses, and it can't be easily replaced.

How to Bring Them Together
      So how do you overcome the availability, performance/scalability, and exception handling pitfalls? How many new design patterns and technologies do you need to learn? There's some good news. All three areas can be addressed by one technology choice: a combination of MSMQ and XML. But technology alone won't save the day. To do it right, you'll need to change the way you think about application development.
      Generally speaking, people are on-demand thinkers. When we're hungry, we eat; when we're thirsty, we drink. This type of thinking translates into a design pattern that will leave you dead in the water when it comes to building integrated Internet systems. You have to break this design pattern by thinking asynchronously first, and designing synchronous transactions only as a last resort. If you must use a synchronous transaction, hold on to your hat—in some cases it simply may not be possible with hundreds of users. Benchmark synchronous transactions early to avoid late-breaking scalability problems.
      You also have to get over the common mindset that goes something like this: "If I (re)build everything in the system, all the code will be mine and everything will work." While this may be true in some cases, it is not the proper approach, especially if you want to finish on time. The key here is to make use of the enormous amount of effort that has been invested in existing systems and processes. If you integrate with an existing process, it's likely that the development time spent handling exceptions can be reduced by half because the existing process already has built-in exception handling mechanisms.
      Finally, you must embrace the versioned approach to software development. If version 1.0 of your application is also meant to be the last version, it is almost certain to fail. A one-version approach will compromise proper system design, which is the first casualty when scope increases within a fixed development schedule. Furthermore, you'll lose the ability to make course corrections, both in overall architecture and specific features.

MSMQ and XML
      As we mentioned previously, the three main problems with integrated systems (availability, performance/scalability, and exception handling) can be addressed with a single technology choice—MSMQ and XML.
      Many systems already communicate by passing a simple string. In order for both systems to understand the format and location of the data in the string, it's probably marked up with some kind of token system or based on fixed-length fields. XML is the formalization of this concept. It also includes the tools and techniques to make development easier. By using valid XML—XML that conforms to a document type definition (DTD) or schema—industries can standardize on a data format, giving applications the ability to exchange data with a much larger audience. XML provides a great way to model data and represent that data in a simple and powerful format.
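      For example, a legacy interface might pass an address change as the delimited string SMITH|JANE|123 ELM ST|TORONTO|M5V 2T6, where both systems must agree in advance on the field order and delimiter. The XML equivalent carries the same data in self-describing markup (the element names here are illustrative, not taken from the sample application):

<Customer>
  <LastName>SMITH</LastName>
  <FirstName>JANE</FirstName>
  <Street>123 ELM ST</Street>
  <City>TORONTO</City>
  <PostalCode>M5V 2T6</PostalCode>
</Customer>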
      What XML does for data, MSMQ does for transport. MSMQ gives systems a reliable and disconnected path for the transfer of XML between different systems in an organization. It guarantees that the XML is delivered once (and only once) when network conditions permit. If you're integrating with systems that are widely distributed geographically, not always available, slow, or nonscalable, MSMQ allows you to isolate these systems from your customers. Your customers need not be concerned with your legacy systems because customers never interact directly with them.

Sample Application
      About a year ago, one of our cars was stolen from the airport. The settlement for the car showed up in the requisite 60 days, but a check for the car's contents never arrived. After six months or so, we gave the insurance company a call. After all, the Smiths CD that was in the car had to be replaced. The agent said that the second check was indeed sent over five months ago. It turns out that while the auto policy had the new address, the home policy (which insured the contents of the car) still had the old address. The same insurance company held both policies, so the cause of this mishap was surely the lack of integration across their line-of-business systems.
      Let's examine an Internet-based address change facility that solves the problem we just outlined. The insurance company has many internal processes, but we'll focus on these three parts of the call center:

  • Auto policy address change: when a customer calls, the change is entered into a database via a terminal session.
  • Auto policy change: when a customer calls, an email is sent to an insurance agent, who then reviews the policy and makes the appropriate changes.
  • Home policy address change: for a simple address change, a fax is sent to the home policy division, which handles the address change.
      In our case, it's likely the fax was never sent and thus the home policy address was never updated.
      Let's take a quick look at the typical Internet solution and highlight some of the pitfalls. We'll then show you a good second-generation application and its extension to a third-generation app using the approach advocated by BizTalk.org, which was announced at TechEd in May 1999.

Typical Approach
      The typical approach toward building an Internet-based front end over a legacy system would be to build an HTML form that posts to an ASP page. The ASP code would connect to each of the data sources and execute some SQL to make the address changes. Finally, you'd write some HTML back to the client, indicating the success or failure of the operation. The HTML form would look something like Figure 2.

Figure 2: A Traditional HTML Form
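      To make the problems concrete, a minimal sketch of such an ASP page might look like the following; the connection strings, table, and field names are hypothetical, and the real back-end systems would of course differ:

<%
' Traditional approach: update every back-end data source synchronously,
' while the user waits for the response.
Dim conn, sql

sql = "UPDATE Policy SET Street = '" & Request.Form("Street") & "', " & _
      "City = '" & Request.Form("City") & "' WHERE PolicyId = '" & _
      Request.Form("PolicyId") & "'"

Set conn = Server.CreateObject("ADODB.Connection")

conn.Open "DSN=AutoPolicy"        ' auto policy database
conn.Execute sql
conn.Close

conn.Open "DSN=HomePolicy"        ' home policy database, possibly on the mainframe
conn.Execute sql
conn.Close

Response.Write "<HTML><BODY>Your address has been updated.</BODY></HTML>"
%>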

      This solution does not address the availability, performance, scalability, or exception handling challenges. First, availability is not optimal because if any of the data sources are unavailable at submit time, the user will receive an error message. Furthermore, if the underlying data sources run on a mainframe (as most legacy systems do), there may also be transient problems with connectivity through the SNA gateway. And, of course, there are the usual network problems, which may be intensified if the database is located across the WAN.
      There is a big problem with performance: the user is waiting for all database transactions to complete. This can be disastrous, especially at busy periods (end-of-month, holidays, and so on) when legacy systems typically run at or near full capacity. Having this many database connections on a single page also presents scalability problems, especially if connections are limited (which they frequently are in legacy systems) and if there are contention issues. There may even be a show stopper if the transactions are lengthy and are locking resources.
      This system does not exploit existing systems, and there is no exception handling mechanism. There are also less obvious problems with this solution. What if complex business rules need to be enforced? Even worse, what if they had traditionally been enforced by a manual process? What if connectivity to the back-end database is simply not possible (existing instead in a flat file on the mainframe) or a subsystem is down for regular maintenance? What if you want to share this information with an affiliate company? This may sound like worst-case planning, but these are common concerns at large corporations.

A Better Solution
      Now, let's take a look at how this system could be built to address the shortcomings of the traditional approach. As you read, keep three "VIA" (Version, Integrate, Asynchronous) rules in mind:

  • Plan for another version (you don't have to do everything in version 1.0).
  • Integrate existing systems and processes (don't start from scratch).
  • Think asynchronous first (isolate existing systems).
      From the user's perspective, the interface is still an HTML form with the same appearance as the one shown in Figure 2. Instead of posting the HTML form, you're going to represent the data in XML and post the XML to the server. An excellent source of information about XML is Dino Esposito's June 1999 Cutting Edge column.
      The project starts with a DTD, which defines the structure of the XML. The source code is shown in Figure 3. DTDs have their place, but they don't describe data in a way that is useful to data architects. A new standard, schemas, is emerging. Schemas use XML itself to describe the XML document structure. We chose to use XML schemas for this reason. Schemas are also a key part of the BizTalk initiative, which was started to facilitate business-to-business communications using XML.
      The DTD and schema serve the same goal: they allow the XML parser to validate an XML document against a reference to make sure it conforms to the published standard. An XML document that conforms to a DTD or schema is said to be valid. Schemas are currently only supported by Microsoft Internet Explorer 5.0. A sample Customer schema is shown in Figure 4.
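      As a rough illustration of validation in action (the file name is a placeholder), the MSXML parser that ships with Internet Explorer 5.0 can validate a document as it loads it and report any violation through the parseError object:

Dim xmlDoc

Set xmlDoc = CreateObject("Microsoft.XMLDOM")
xmlDoc.async = False            ' load synchronously
xmlDoc.validateOnParse = True   ' validate against the DTD or schema the document references
xmlDoc.Load "customer.xml"      ' placeholder file containing a Customer document

If xmlDoc.parseError.errorCode <> 0 Then
    MsgBox "Invalid document: " & xmlDoc.parseError.reason
End If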
      Before we get into the nitty-gritty implementation details, let's go over the entire solution first. We're taking a straightforward, six-step approach (see Figure 5):
  1. Generate XML on the client machine.
  2. Submit the XML to the server.
  3. Package the XML into an MSMQ message and send it to the Distributor.
  4. Send XML confirmation back to client.
  5. Based on data stored in the registry, the Distributor will send one or more additional messages (one for each registered system).
  6. These messages will in turn be received by Handlers, which will initiate a manual process or execute an address change in a particular system.
      Now that you have the schema for the XML document, let's take a look at how to generate the XML document on the client (see Figure 6). Most of the work is done in AddXMLNode. Notice that the schema definition is added to each node. This ensures that every node in the document is bound to the XML schema. By setting the DOMDocument.async property to false and then checking the DOMDocument.parseError code, you can determine whether the generated XML is valid. The resulting XML document is shown in Figure 7.
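      Figure 6 contains the actual client code; the fragment below is only a rough sketch of the same approach, with a placeholder schema URL and field values. Each element is created in the x-schema namespace so it is bound to the Customer schema, and one simple way to confirm that the finished document is valid is to reload its text with validation turned on and check parseError:

Const SCHEMA_NS = "x-schema:http://myserver/schemas/customer-schema.xml"   ' placeholder URL
Const NODE_ELEMENT = 1

Dim xmlDoc, checkDoc, root, node

Set xmlDoc = CreateObject("Microsoft.XMLDOM")
xmlDoc.async = False

' Creating each node in the schema namespace binds it to the Customer schema
Set root = xmlDoc.createNode(NODE_ELEMENT, "Customer", SCHEMA_NS)
xmlDoc.appendChild root

Set node = xmlDoc.createNode(NODE_ELEMENT, "LastName", SCHEMA_NS)
node.Text = "Smith"
root.appendChild node

' ...repeat for the remaining address fields...

' Reload the serialized document with validation on to verify it against the schema
Set checkDoc = CreateObject("Microsoft.XMLDOM")
checkDoc.async = False
checkDoc.validateOnParse = True
checkDoc.loadXML xmlDoc.xml

If checkDoc.parseError.errorCode <> 0 Then
    MsgBox "The generated XML is not valid: " & checkDoc.parseError.reason
End If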
Figure 7: Customer XML Document

      At this point, you've defined your XML schema, created an XML document on the client, and validated it against the schema. Now it's time to submit it to the server. To post the XML to the page, use the XMLHTTPRequest object included in the msxml.dll that ships with Internet Explorer 5.0. Using this object offers the best performance (by providing the leanest possible HTTP POST forms without compression) and the most straightforward code. The client-side source code to post the XML is shown in the SendXML method in Figure 6. Prior to Internet Explorer 5.0, the preferred method for HTTP posting would have been through the WinInet API. We've seen the code that does this, and it's not pretty.
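      A minimal sketch of such a post (the URL is a placeholder, and xmlDoc is the DOM document created earlier) looks like this:

Dim xmlHttp

Set xmlHttp = CreateObject("Microsoft.XMLHTTP")

' Post the XML document to the ASP page synchronously (third argument is False)
xmlHttp.open "POST", "http://myserver/addresschange/submit.asp", False
xmlHttp.setRequestHeader "Content-Type", "text/xml"
xmlHttp.send xmlDoc

If xmlHttp.status = 200 Then
    MsgBox "Server replied: " & xmlHttp.responseText
End If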
      The code to receive the XML on the server is shown in Figure 8. Note the call to set the async property of the XMLDOM object to False. If you omit this line, the resolution of the schema and the eventual parsing of the document will be performed asynchronously, which is not what you want in this case. The source code that receives XML on the server is shown in the ProcessRequest method in Figure 8. The SendMessage function sends an MSMQ message to the Distributor here as well.
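      Figure 8 shows the actual server-side code. As a stripped-down sketch of the same steps (the queue name is a placeholder, and the source queue is assumed to be transactional), the ASP page loads the posted XML directly from the Request object, forwards it to the Distributor's source queue, and writes a small XML confirmation back to the client:

<%
Const MQ_SEND_ACCESS = 2
Const MQ_DENY_NONE = 0
Const MQ_SINGLE_MESSAGE = 3

Dim xmlDoc, msmqInfo, msmqDest, msmqMsg

' Load the posted XML straight from the ASP Request object
Set xmlDoc = Server.CreateObject("Microsoft.XMLDOM")
xmlDoc.async = False
xmlDoc.load Request

If xmlDoc.parseError.errorCode <> 0 Then
    Response.Write "<Result>Error: " & xmlDoc.parseError.reason & "</Result>"
Else
    ' Forward the XML to the Distributor's source queue
    Set msmqInfo = Server.CreateObject("MSMQ.MSMQQueueInfo")
    msmqInfo.FormatName = "DIRECT=OS:myserver\private$\addresschange"   ' placeholder
    Set msmqDest = msmqInfo.Open(MQ_SEND_ACCESS, MQ_DENY_NONE)

    Set msmqMsg = Server.CreateObject("MSMQ.MSMQMessage")
    msmqMsg.Label = "Address Change"
    msmqMsg.Body = xmlDoc.xml
    msmqMsg.Send msmqDest, MQ_SINGLE_MESSAGE   ' send as its own transaction
    msmqDest.Close

    Response.Write "<Result>OK</Result>"
End If
%>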
      The address change message is sent from the ASP page to the source queue shown in Figure 9. When it arrives, MSMQ notifies a Listener that calls the Distributor, a Microsoft Transaction Server (MTS) component. The Distributor pulls the message off the queue and sends the XML body to the destination queues as specified in the registry. Should something go wrong while sending the messages, the transaction rolls back, leaving the message on the source queue and nothing in the destination queues. More detailed information about sending and receiving MSMQ messages is available in Ted Pattison's article, "Using Visual Basic to Integrate MSMQ into Your Distributed Applications" (Microsoft Systems Journal, May 1999).
Figure 9: Distributor Architecture

      To take a message off a queue in an MTS transaction, the application removing the message must be running locally on the same machine as the queue and it must be running in MTS.

The Listener
      MSMQ notifies the Listener application (see Figure 10) when a message arrives. To receive that notification, the Listener declares an MSMQEvent object variable using the WithEvents keyword:

Dim WithEvents msmqMsgEvent As MSMQEvent
Dim msmqQue As MSMQQueue
To set up notification in the Visual Basic-based listener, you'd then use the following code:

Dim msmqInfo As MSMQQueueInfo

Set msmqInfo = CreateObject("MSMQ.MSMQQueueInfo")
Set msmqMsgEvent = CreateObject("MSMQ.MSMQEvent")

' Open the source queue for receive access and ask MSMQ to fire
' msmqMsgEvent_Arrived when a message becomes available
msmqInfo.FormatName = txtSourceQueue
Set msmqQue = msmqInfo.Open(MQ_RECEIVE_ACCESS, MQ_DENY_NONE)
msmqQue.EnableNotification Event:=msmqMsgEvent
      Instead of using a path name here, you should use a format name to increase performance. Using a path name requires a query to the Message Queue Information Store (MQIS). That RPC call adds significant overhead to the open queue request. Using a format name, on the other hand, requires only a single RPC call to the MQIS (if it is not a direct format name). After the first call, MSMQ caches the connection information, which removes the site controller from the picture and increases performance. This will be particularly relevant in the MTS Distributor component.
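      For example, the same public queue could be opened either way (the machine and queue names are hypothetical); the direct format name avoids the lookup entirely:

' Path name: MSMQ resolves the queue with a query to the MQIS when it is opened
msmqInfo.PathName = "myserver\addresschange"

' Direct format name for the same queue: no MQIS lookup is needed to open it
msmqInfo.FormatName = "DIRECT=OS:myserver\addresschange"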
Figure 10: The Listener Application

      When an MSMQEvent_Arrived event is fired, the Visual Basic-based component calls the Distributor in MTS with the format name of the source queue. It does not pass a reference to the source queue directly; calling ReceiveCurrent on the reference would not include the source message in the transaction because the queue was not opened in the context of the transaction. If the transaction aborted, the message would neither appear in the destination queue nor remain in the source queue.
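      A rough sketch of that event handler follows; the Distributor's ProgID and method name are illustrative here, not the actual names used in Figure 12:

Private Sub msmqMsgEvent_Arrived(ByVal Queue As Object, ByVal Cursor As Long)
    Dim objDistributor As Object

    ' Hand the MTS component the format name of the source queue so it can
    ' open the queue and receive the message inside its own transaction
    Set objDistributor = CreateObject("AddressChange.Distributor")   ' hypothetical ProgID
    objDistributor.Distribute Queue.QueueInfo.FormatName             ' hypothetical method

    ' Re-arm notification so the next message fires Arrived again
    msmqQue.EnableNotification Event:=msmqMsgEvent
End Sub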

The Distributor
      Based on data stored in the registry (see Figure 11), the Distributor will send one or more additional messages, one for each registered destination. The source code for the Distributor is shown in Figure 12. The messages sent by the Distributor will in turn be received by Handlers, which will initiate and execute an address change for a particular system. The architecture of a Handler is very similar to the Distributor, so we're leaving out the details.

Figure 11: Registry Values
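
      Figure 12 contains the actual source. The following is only a skeletal sketch of the Distributor's transactional receive-and-forward logic; error handling is omitted, and GetDestinations is a hypothetical helper that reads the registered destination queues (see Figure 11) from the registry:

Public Sub Distribute(ByVal strSourceFormatName As String)
    Dim msmqInfo As MSMQQueueInfo
    Dim msmqSource As MSMQQueue
    Dim msmqDest As MSMQQueue
    Dim msmqMsg As MSMQMessage
    Dim msmqOut As MSMQMessage
    Dim vntDest As Variant

    ' Open the source queue and remove the message inside the MTS transaction
    Set msmqInfo = CreateObject("MSMQ.MSMQQueueInfo")
    msmqInfo.FormatName = strSourceFormatName
    Set msmqSource = msmqInfo.Open(MQ_RECEIVE_ACCESS, MQ_DENY_NONE)
    Set msmqMsg = msmqSource.Receive(Transaction:=MQ_MTS_TRANSACTION, ReceiveTimeout:=0)

    If Not msmqMsg Is Nothing Then
        ' Send a copy of the XML body to every registered destination queue
        For Each vntDest In GetDestinations()
            Set msmqInfo = CreateObject("MSMQ.MSMQQueueInfo")
            msmqInfo.FormatName = CStr(vntDest)
            Set msmqDest = msmqInfo.Open(MQ_SEND_ACCESS, MQ_DENY_NONE)

            Set msmqOut = CreateObject("MSMQ.MSMQMessage")
            msmqOut.Label = msmqMsg.Label
            msmqOut.Body = msmqMsg.Body
            msmqOut.Send msmqDest, MQ_MTS_TRANSACTION   ' same transaction as the receive
            msmqDest.Close
        Next
    End If

    msmqSource.Close
    GetObjectContext.SetComplete   ' an unhandled error would abort the transaction instead
End Sub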

      In a more robust system, the Distributor might determine the destination based on a database lookup and the type of XML schema used in the body of the message. This more flexible architecture could be used to process other message types as well.
      Figure 9 gives the impression that the Distributor and destination queues are all running side-by-side on the same machine. While this is possible, it is equally likely that the Handlers would be running in separate offices, maybe even in different countries. This transparency gives your application the ability to communicate over slow links or WAN connections, knowing that your message will be processed when network conditions permit.

Benefits of this Solution
      Before going into the benefits of this architecture, take a look at the amount of code written for this sample. Granted, the sample app is straightforward, but the table in Figure 13 shows just how little source code is required to build the framework around an integrated Internet site.
      As you can see in Figure 13, there is not much source code involved in our sample application. Remember VIA? This solution fulfills all three parts:

  • Versioned: the first version of this system was simple and could be delivered in a realistic timeframe. However, the design and technologies chosen allow for easy extensibility in the future (see the BizTalk section that follows).
  • Integrated: we did not start this project from scratch. For example, the existing fax-based submission process was incorporated rather than attempting to redo it from scratch. By doing this, we did not have to incorporate the business rules surrounding modifications to insurance policies.
  • Asynchronous: the system has very few synchronous processes and the legacy systems are isolated from the customer. This optimizes the availability of the system. Furthermore, the performance and scalability of the system will be exceptional because there are no synchronous database connections.
      Let's take a look at a possible extension of this simple application. Suppose an insurance company were going to provide an address change service as part of their policies. This service would update the customer's profile at their bank at the same time. How would the insurance company be able to provide this service? This is where the BizTalk framework fits in. BizTalk is dedicated to establishing, publishing, and maintaining industry-specific XML schemas to facilitate the exchange of information between businesses. Instead of using a local XML schema, this application might use a schema located directly on the BizTalk site (http://www.BizTalk.org).
      The next step would be to register another data service with the Distributor. This data service would simply post valid XML—according to the BizTalk customer schema—to the bank, which would then process the request. In effect, the bank would not be able to tell if the XML request came directly from a customer or via the customer's insurance company. Version 3 of MSMQ is slated to support native HTTP message delivery that extends the guaranteed delivery concept across the Internet (version 2 is scheduled for release as part of Windows 2000).
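      As a rough sketch (the bank's URL is obviously hypothetical, and msmqMsg is the message the Handler received from its queue), the registered Handler could forward the message body with the same XMLHTTP object used on the client:

Dim xmlHttp

Set xmlHttp = CreateObject("Microsoft.XMLHTTP")
xmlHttp.open "POST", "http://www.somebank.com/addresschange.asp", False   ' hypothetical endpoint
xmlHttp.setRequestHeader "Content-Type", "text/xml"
xmlHttp.send msmqMsg.Body       ' the BizTalk-conformant XML from the queued message

If xmlHttp.status <> 200 Then
    ' The request failed; leave the message on the queue (or abort the
    ' transaction) so the post can be retried later
End If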
       Figure 14 extends the conceptual diagram of the system, making this a business-to-business solution. As you can see, extending a second-generation application to include business-to-business operations is quite natural—especially if the BizTalk framework is used to standardize on the use of a single XML schema.

Conclusion
      To build third-generation Internet applications, you'll need to build an interoperable system. Look at legacy systems and processes as opportunities to be exploited because the technology exists to address their shortcomings. By making the right technology choices now, communication with the systems of other companies will be a natural extension in future versions.
      It's clear that XML is emerging as the standard data format for communication between systems. On the transport side, MSMQ is an easy-to-use, asynchronous mechanism that can increase both performance and scalability. It will also isolate legacy systems and increase the availability of the system.
      To build this new brand of applications, you'll need to focus on the VIA design techniques we discussed in this article. Using these technologies and the VIA design concepts, you should be able to find your way to your third-generation Internet application. Good luck!

MSDN:
http://msdn.microsoft.com/xml/articles/hess061499.asp
http://msdn.microsoft.com/xml/default.asp

From the September 1999 issue of Microsoft Internet Developer.