Cutting Edge: What XML Can Do For You; Microsoft Interactive Developer September 1998

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

cutting@microsoft.com Download the code (58KB)

Dino Esposito

What XML Can Do For You

You may wonder why you, a developer creating Windows®-based desktop applications, should care about XML (Extensible Markup Language). If you create database applications, you should care quite a lot! XML is a universal method for describing structured data. It’s unique in that it complements HTML, a universal method for displaying data.
To understand what XML can do for you, it’s important to know the categories of problems it addresses. If you need a primer, then I suggest that you take a look at the proposed standard from the W3C organization at http://www.w3c.org/xml, as well as the documentation gathered on the Microsoft® XML site, http://www.microsoft.com/xml. In brief, XML is a data format that’s structured almost exactly like HTML. The difference is that instead of a set list of acceptable tags, the designer of an XML file can specify any tags that are appropriate to represent their structured data. XML cannot, however, provide effective screen formatting and layout. When used in tandem on the Web, XML and HTML serve complementary purposes.

The XML Mission
      The Web interface has popularized the Internet because it makes it easy for people to exchange information. First CGI and Perl, then ISAPI, ASP, and server scriptlets have provided the means for sending and receiving data between the client and the server. The data transferred back and forth reside in HTML code pages. HTML pages are the atomic units that glue together the presentation layer and the data.
      When you issue a query to a remote database, the server gives you back an HTML page (say, a table of records). The logical data that makes up the recordset has meaning only if you already know the data format. It doesn’t come with a "data stylesheet" that defines the information’s layout. In HTML, tags can only be used to describe display features.
      XML provides a way of defining structured data that’s independent from the application that reads it. The tags used in XML documents are not predefined. Any descriptive string can be used (as long as it uses the approved character set). Moreover, the data defined by these tags can be displayed in any way you like.
      XML adds a new, intermediate level of abstraction between the data source on one hand and the user interface on the other. This layer lets you access cross-platform data from any system that does XML. Since the data is completely separate from the user interface, you can perform client-side processing before displaying the data.
      XML is already used internally in a number of commercial products, notably the new version of Microsoft Commerce Server. Flavors of XML also can be found inside Microsoft Office, server scriptlets, Active Channels™, and Microsoft Internet Explorer. Internet Explorer 4.0 was the first browser to fully support XML, and it offers a powerful COM object to parse and componentize the data stream. A typical and effective use of XML was presented by Joe Graf in the July 1998 issue of MIND.
      Although you can use any tag you like to describe your data, it is expected that certain standards or traditions will develop over time. For example, there may be a standard for describing books in print. This will allow users to search across databases for books by a particular author. With current technology, there is no way to distinguish between books about an author versus those by an author.
      A universal syntax for describing data also makes it much easier to move data between disparate systems. If you’re designing a system that needs to integrate with the rest of the world, you’ll find XML especially useful.
      Channel Definition Format (CDF) and Open Software Distribution (OSD) are XML-based languages meant for use in specific contexts like Web subscriptions and auto-update software. Both CDF and OSD were covered in Cutting Edge last year by John P. Grieb (MIND, November and December 1997).
      XML also can help if you need to store information for use by your programs. You can create a generic layer of data between a client and a server (see Figure 1). For example, a server can send its data formatted according to the XML syntax so that you can arrange the final output knowing what each chunk of data represents. The same advantage can be gained in reverse, when a client sends information to the server. A server on any platform will be able to parse and store the data.

Figure 1: Creating a Generic Layer of Data
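
To make the idea of a generic data layer concrete, here is a minimal sketch of both directions of the exchange. It is written in Python with the standard xml.etree.ElementTree module rather than the tools used later in this article, and the function names, the <recordset> and <record> tags, and the sample book are all invented for illustration.

```python
import xml.etree.ElementTree as ET

def records_to_xml(records):
    """Server side: wrap each record (a dict) in descriptive tags."""
    root = ET.Element("recordset")
    for rec in records:
        item = ET.SubElement(root, "record")
        for field, value in rec.items():
            ET.SubElement(item, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_records(text):
    """Client side: recover the records without knowing the fields in advance."""
    return [{child.tag: child.text for child in item}
            for item in ET.fromstring(text).iter("record")]

# Hypothetical sample data, not taken from the article.
books = [{"title": "XML Basics", "author": "A. Writer", "year": 1998}]
payload = records_to_xml(books)
print(payload)
print(xml_to_records(payload))
```

Note that everything comes back as text: the client knows what each chunk of data represents from its tag, but converting "1998" back to a number is still up to the application.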

Another use for XML is configuring programs (see Figure 2). Most of today’s programs need some kind of configuration. It may vary quite a bit from user preferences to program settings, or from state information to internally created documents. XML can do the work with greater flexibility than INI files or registry keys.

Figure 2: Configuring Programs
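
As a rough illustration of XML-based configuration, the sketch below reads a window size and a list of recent files from a small settings document. It uses Python's standard xml.etree.ElementTree module; the <settings> layout, the load_settings name, and the default values are assumptions made up for this example, not a format used by any program in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings file: nested, repeated values that would be
# awkward to express in a flat INI file.
CONFIG = """
<settings>
  <window width="640" height="480"/>
  <recent>
    <file>notes.xml</file>
    <file>todo.xml</file>
  </recent>
</settings>
"""

def load_settings(text):
    """Return the settings as a dict, falling back to defaults."""
    root = ET.fromstring(text)
    window = root.find("window")
    attrs = window.attrib if window is not None else {}
    return {
        "width": int(attrs.get("width", "800")),
        "height": int(attrs.get("height", "600")),
        "recent": [f.text for f in root.findall("recent/file")],
    }

print(load_settings(CONFIG))
```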

XML can also be used instead of traditional databases to build easy-to-read and easy-to-share systems. Small repositories, user documents, and structures for internal use are all contexts in which XML may be of help (see Figure 3).

Figure 3: An Easy-to-Read System

Figure 4 provides a brief list of the pros and cons of XML. One of the major pluses of XML is its intrinsic flexibility. But this doesn’t mean that there are no syntactic rules. Let’s go over a few of them. First, each attribute value must be enclosed in double or single quotes.

 <tag attrib="data">…</tag>

Each <tag> tag must be closed, either with a matching </tag> or, for an element with no content, with a final slash as in <tag/>.

 <tag>…</tag>

You can’t have two or more overlapping tags. Nested tags must be closed at each level.

 <tag1>…<tag2>…</tag2>…</tag1>

Finally, special characters such as < and > must be expressed as escape sequences.

 This is a &lt;tag&gt;.
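
These rules can be checked mechanically with any conforming parser. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module (not the MSXML component discussed later); the is_well_formed helper and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One sample per rule: a version that obeys it and one that breaks it.
samples = {
    "quoted attribute":   '<tag attrib="data">x</tag>',
    "unquoted attribute": '<tag attrib=data>x</tag>',
    "empty element":      '<doc><tag/></doc>',
    "unclosed tag":       '<doc><tag>x</doc>',
    "overlapping tags":   '<doc><tag1>x<tag2>y</tag1>z</tag2></doc>',
    "escaped characters": '<doc>This is a &lt;tag&gt;.</doc>',
    "raw less-than sign": '<doc>1 < 2</doc>',
}

for label, text in samples.items():
    print(f"{label:20} {is_well_formed(text)}")
```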

XML-driven Program Configuration
Most presentations of XML discuss its value solely in terms of the Web. However, its easy-to-read format also makes it adaptable for traditional desktop applications. In the rest of this article, I’ll show you how to take advantage of XML to store configuration settings for a Windows-based program. I’ve always been a fan of INI files, and always used them to store the information that my programs rely upon. Sure, the registry is "in," and INI files are "out," but sometimes it’s still easier to pop a couple of values in an INI file where users can quickly get to them and transport them across machines. INI files, however, are flat files. While it’s possible to store structured data in an INI file, the work is up to you.
The registry, in contrast, is a generic hierarchical repository for any kind of data. It does have certain advantages, like allowing a program to handle multiple users without additional work. But it’s not a simple text file. Let’s see how XML helped me write a program whose behavior is coded once, but whose user interface and data can change. New behavior is often needed when data changes, but thanks to XML’s flexibility there’s no need to update the code.

XML Explorer
This month’s sample source program is a Visual Basic-based application called XML Explorer. It was built using the Visual Basic 5.0 AppWizard, and employs an interface much like that used by Windows Explorer. When started, it reads a template file that tells it how to fill the left-hand tree view pane, where to search for document files, and how to display them.
This program's interface is data-driven, but the behavior is always the same. As you'll soon see, when you change the underlying XML templates, it will appear to be two different applications. Figure 5 shows the demo program running one of the predefined templates. As you can see, I endeavored to give the program a compelling user interface.

Figure 5: XML Explorer in Action

      The idea behind my XML Explorer program is the same idea that makes, for example, Microsoft Transaction Server (MTS) Explorer work as a particular instance of Microsoft Management Console (MMC). Both provide a common layer of code that is customized to some extent by added modules. In the case of MTS Explorer, things are a bit more complex since you have to write a snap-in module made up of several COM interfaces in order to communicate with and extend the MMC. For XML Explorer, it is sufficient to use an XML file to completely change the program's user interface, while the global behavior of the program remains unchanged.
      The XML Explorer client area is divided into three panes: a tree view, a report list view, and a WebBrowser control. The tree view delineates a kind of hierarchical repository with multiple root nodes and two additional levels called folder and field. Both the icons displayed and the names used may be decided at runtime by reading from an XML template file. The caption of the window may be set from the template, too. Figure 6 shows the XML template file that renders the structure depicted in Figure 5. This template lets me store information about my MIND and MSJ articles.

Figure 7: XML Template Structure

      The adopted XML structure uses a <root> tag to identify a first-level node. In this case, there are two roots, one titled MIND and one called MSJ. The next level contains multiple <folder> tags to help organize the final content. Each folder may include one or more <field> tags.
      I've designed the structure so that these three levels of depth are necessary and only the <field> tags can define actual information as the leaves of the tree. The whole collection of nodes defines the namespace of the application. When you change the namespace (via a new template file), you change the application. Each node may use a varying number of the attributes, as depicted in Figure 7. The XML Explorer core application provides a default value for nodes.
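
      The three-level structure is easy to walk programmatically. The sketch below is a Python approximation (using the standard xml.etree.ElementTree module, not the MSXML component the program relies on) that lists one root/folder/field path per leaf. The template text, the leaf_paths name, and the "(untitled)" fallback are assumptions for illustration; only the <template>, <root>, <folder>, and <field> tags and the title attribute come from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical template with two roots, in the spirit of the MIND/MSJ example.
TEMPLATE = """
<template title="Articles" caption="XML Explorer">
  <root title="MIND">
    <folder title="Published">
      <field title="1998"></field>
    </folder>
  </root>
  <root title="MSJ">
    <folder title="Ideas">
      <field title="Windows"></field>
    </folder>
  </root>
</template>
"""

def leaf_paths(template_text):
    """Return one (root, folder, field) title triple per <field> leaf."""
    paths = []
    for root in ET.fromstring(template_text).findall("root"):
        for folder in root.findall("folder"):
            for field in folder.findall("field"):
                # A missing title gets a default, as the core application
                # is described as doing for its nodes.
                paths.append(tuple(node.get("title", "(untitled)")
                                   for node in (root, folder, field)))
    return paths

for path in leaf_paths(TEMPLATE):
    print(" / ".join(path))
```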

Figure 8: XML Explorer Without a Template

Figure 9: Template Selections

      The application uses a base path for searching the templates. The default is hardcoded as c:\xml\templates. Figure 8 shows how the XML Explorer looks when no template is loaded. In Figure 9, you can see the templates defined on my home PC.

How XML Explorer Treats Documents
      At startup, XML Explorer looks in the registry for the last template opened. If no template is found, it displays an empty user interface. Users can use the menu or the toolbar to select and open an existing template or create a new one. When a new template is created, a dialog box pops up with default content for customization.
      By definition, a template is a file called template.xmt that is located in a given subdirectory of a base path. The template defines the namespace used by the program and the HTML layout for displaying the documents. The <template> tag can have four attributes, as shown in Figure 10.
      The namespace will be populated by all files with an XML file extension and a valid format that are located in the same directory as the template. These documents are scanned and parsed, and their headings fill the report list view on the right pane (see Figure 5). A typical document is listed in Figure 11.
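
      The scan-and-parse step might look roughly like the following. This is a Python sketch with the standard library, not the program's Visual Basic code; the load_documents name and the sample file contents are invented, and "valid format" is approximated here as "parses without error."

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def load_documents(template_dir):
    """Parse every *.xml file next to the template; skip ill-formed ones."""
    docs = {}
    for path in sorted(Path(template_dir).glob("*.xml")):
        try:
            docs[path.name] = ET.parse(str(path)).getroot()
        except ET.ParseError:
            continue  # not parseable, so it stays out of the namespace
    return docs

# Demonstration in a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "good.xml").write_text("<doc><info/></doc>")
    Path(folder, "bad.xml").write_text("<doc><info></doc>")
    Path(folder, "template.xmt").write_text("<template/>")
    found = load_documents(folder)
    print(sorted(found))
```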
      While you are free to adopt the tags that best suit your data, there are a few rules for linking documents and templates. To start out, you need to indicate which root, folder, and field will contain the document data. This is accomplished through the <link> tag.

<link root="MIND" folder="Published" field="1998"> </link>

      The information section (marked by an <info> tag) contains the records that describe the items defined in the XML file. A <record> tag marks the description of each item—be it an article idea, a memo, a to-do list, or other type. A single XML file can include multiple <record> tags, making it a kind of database (see Figure 12).
      All of the records in an XML file share the same link information. Each record in this file has two required attributes, called What and Who. They are concatenated and displayed in the tree view, and also form the rows in the report list view shown in Figure 5. In Figure 12, the records are depicted by the turquoise blocks that replace the "Free Tags" blocks of the tree.
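
      As a rough sketch of how the link information and the record headings can be pulled out of a document, consider the following Python fragment (standard xml.etree.ElementTree module). The outer <document> tag, the second sample record, the read_document name, and the " - " separator used to join What and Who are assumptions; the <link>, <info>, and <record> tags follow the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document file; the outer tag name is an assumption.
DOCUMENT = """
<document>
  <link root="MIND" folder="Published" field="1998"> </link>
  <info>
    <record what="Scriptlets" who="Dino Esposito">
      <column>Cutting Edge</column>
    </record>
    <record what="XML Explorer" who="Dino Esposito">
      <column>Cutting Edge</column>
    </record>
  </info>
</document>
"""

def read_document(text):
    """Return the (root, folder, field) target and one heading per record."""
    doc = ET.fromstring(text)
    link = doc.find("link")
    target = (link.get("root"), link.get("folder"), link.get("field"))
    headings = [rec.get("what") + " - " + rec.get("who")
                for rec in doc.findall("info/record")]
    return target, headings

print(read_document(DOCUMENT))
```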

Figure 12: Record Tags

      I've filled in the tree view with the available folders, and I've filled in the report list view with the current folder's records. I have the name of an XML file and a reference to a given record inside it. The next step is figuring out a creative way to render the XML data stream for the user.
      HTML is a great language for presenting data. As mentioned earlier, the lower portion of the right pane in XML Explorer is a WebBrowser control, so displaying an HTML page is no problem.
      I decided to associate each template (and hence, each namespace) with a fixed layout for displaying data. The template's display attribute is the name of an HTML file to be displayed by the WebBrowser. This file will initially contain every object that does not need to be replaced when a given record is displayed. These mostly consist of images, separators, and labels. The file also will include placeholders for the specific record tags.
      Once you've created a new namespace (that is, a new, custom collection of folders and record descriptions), it's easy to produce an HTML file that can be used to display the information. All you need is a way for the program to associate each tag with a given HTML element. In other words, you need a way to tell the WebBrowser control to fill specific fields with specific record content.
      Basically, there are two solutions. The first one involves the creation of a new, temporary HTML page that uses the first one as a template. This is the typical approach of most existing CGI applications. A better approach is to use Dynamic HTML (DHTML) to update the page being viewed.
      Linking the existing HTML fields with the record information can be done via the tag ID. DHTML lets you assign a string ID to any tag, be it visual such as <A> or <IMG> or nonvisual like <DIV> or <SPAN>. The idea is to arrange an HTML layout where many of the tags have IDs whose names match the XML tags.
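
      One way to sanity-check such a layout is to compare the IDs it declares with the names used in a record. The sketch below does this in Python with the standard html.parser and xml.etree.ElementTree modules. It is not part of XML Explorer: the IdCollector class, the unmatched_names helper, the case-insensitive comparison, and the sample layout and record are all assumptions made for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class IdCollector(HTMLParser):
    """Collect the (lowercased) id of every element in an HTML layout."""
    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value.lower())

# Hypothetical layout and record.
LAYOUT = '<div id="Content"><span id="what"></span><a id="contact"></a></div>'
RECORD = ('<record what="Scriptlets" who="Dino Esposito">'
          '<contact type="email">cutting@microsoft.com</contact>'
          '<duedate>next month</duedate>'
          '</record>')

def unmatched_names(layout_html, record_xml):
    """Return record tags and attributes that have no placeholder ID."""
    collector = IdCollector()
    collector.feed(layout_html)
    record = ET.fromstring(record_xml)
    names = [child.tag for child in record] + list(record.attrib)
    return sorted(n for n in names if n.lower() not in collector.ids)

print(unmatched_names(LAYOUT, RECORD))
```

      Any name reported here would simply not be displayed by a layout that relies on ID matching.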

The Default DHTML Layout
      XML Explorer assumes the presence of a default HTML layout called default.htm, located in the root directory of the template. (Users must specify this path the first time they run the program, or when they remove all the registry settings.) The layout can be replaced by setting the display attribute in the template file.
       Figure 13 shows the contents of default.htm. As you can see, the layout is described by two mutually exclusive DIVs. The first one is used only when there's no record to display. Its ID is hardcoded to "Empty." The second DIV (whose ID is hardcoded to "Content") is left blank by default and is filled in through direct writing on the document object.
       Figure 14 defines custom content for this DIV so it looks like the display in Figure 5. Notice that the IDs of the tags match the names of the XML tags that form a record. For instance, if your documents need a <DUEDATE> tag, then you have to place an HTML element somewhere and give it an ID of duedate.
      XML Explorer takes care of the rest, assigning to each field the text contained in the XML file for that tag. Sometimes, however, a text element is not sufficient and a hyperlink or an image works better. In this case, just add a TYPE attribute to the XML tag, and give it one of the following values: email, link, or img. XML Explorer handles it from there, as shown in Figure 15. Usually, XML Explorer sets the innerText property

wbShow.Document.All("what").innerText = sText

but if the type is, say, img, then it sets the src property of the underlying <IMG> tag. If the type is email, the href property will be prefixed with "mailto:".
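
The dispatch on the type attribute can be summarized in a few lines. The following Python sketch (standard xml.etree.ElementTree module) only computes which DHTML property would be set and to what value; it does not touch a browser. The html_update name is invented, and treating the link type as a plain href assignment is an assumption, since the text above spells out only the img and email cases.

```python
import xml.etree.ElementTree as ET

def html_update(element):
    """Map one record tag to (element id, DHTML property, value)."""
    kind = element.get("type", "").lower()
    text = (element.text or "").strip()
    if kind == "img":
        return element.tag, "src", text
    if kind == "email":
        return element.tag, "href", "mailto:" + text
    if kind == "link":
        return element.tag, "href", text  # assumption: href used as-is
    return element.tag, "innerText", text

record = ET.fromstring(
    '<record what="Scriptlets" who="Dino Esposito">'
    '<column>Cutting Edge</column>'
    '<logo type="img">mind.gif</logo>'
    '<contact type="email">cutting@microsoft.com</contact>'
    '</record>')

for child in record:
    print(html_update(child))
```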

Figure 16: Default Layout View

       Figure 16 illustrates what happens in the absence of a custom layout for the view. In this case, the document is formatted dynamically:

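' Build one "tag : text" line per child tag of the current record
For i = 0 To xmlItem.Children.length - 1
    sTag = xmlItem.Children.Item(i).tagName
    s = s + "<b>" + sTag + " : </b><i>" + _
        xmlItem.Children.Item(i).Text + _
        "<br></i>"
Next
' Overwrite the content DIV with the generated HTML
wbShow.Document.All("content").innerHTML = s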

The resulting string overwrites the body of the existing document, no matter what the record's tags are. If you realize that you forgot an important tag, just add it to the XML file and press F5 to refresh the view.
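
For readers who want to experiment with this fallback outside Visual Basic, here is a rough Python equivalent of the string-building loop, using the standard xml.etree.ElementTree and html modules. The default_layout name and the sample record are invented, and the call to html.escape is an addition of this sketch rather than something the original loop does.

```python
import html
import xml.etree.ElementTree as ET

def default_layout(record):
    """Build one '<b>tag : </b><i>text<br></i>' fragment per child tag."""
    parts = []
    for child in record:
        parts.append("<b>" + html.escape(child.tag) + " : </b><i>"
                     + html.escape(child.text or "") + "<br></i>")
    return "".join(parts)

record = ET.fromstring("<record><column>Cutting Edge</column>"
                       "<note>Q &amp; A</note></record>")
print(default_layout(record))
```

Because the loop never looks at tag names, a tag added to the XML file later shows up in the output without any code change.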

The MSXML Component
      The sample program demonstrates the flexibility of XML in conjunction with msxml.dll (MSXML). This component ships with Internet Explorer 4.x, and can parse any well-formed XML document. Once you reference it in your Visual Basic project, you can load any XML file with the lines:

Dim xml As New XMLDocument
xml.URL = sXmlFileName

      The XML object then exposes a hierarchy of items that fully describe the content of the XML document. Traditionally, the XML file is rendered as a tree, where each subtree corresponds to a browsable collection. The main node is accessed via

Dim cRoot As IXMLElementCollection
Set cRoot = xml.Root.Children

while each subsequent node, say the nth, is given by

Dim cRoot As IXMLElementCollection
Set cRoot = xml.Root.Children.Item(n).Children

Alternatively, you can get a reference to a given tag by name:

Dim cRoot As IXMLElementCollection
Set cRoot = xml.Root.Children.Item("info").Children
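
The same three ways of navigating (all children of the root, the children of the nth child, and the children of a child found by name) exist in most XML libraries. As a point of comparison only, here is a sketch with Python's standard xml.etree.ElementTree module; the sample document is invented and the code says nothing about how MSXML itself behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a <link> and an <info> section.
doc = ET.fromstring(
    "<document><link/><info><record/><record/></info></document>")

top_level = list(doc)                   # children of the root element
nth_children = list(doc[1])             # children of the child at index 1
info_children = list(doc.find("info"))  # children of the child named "info"

print([element.tag for element in top_level])
print(len(nth_children), len(info_children))
```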

For a complete reference for this component, refer to the article by Joe Graf in the July 1998 issue of MIND, or to the Internet Client SDK.

A Quick XML Viewer
      The source code for XML Explorer makes no assumptions about the tags it may encounter. Still, it is perfectly able to handle any well-formed XML document. This flexibility can also be exploited to build a quick-and-dirty viewer that offers a tree-based view of the XML content. This month's sample code includes an ActiveX® control that does just this. Based upon the MSXML component, the viewer is an enhanced version of a tree view control and exposes a URL property.
      When you programmatically point the viewer to an XML file, the viewer code recursively loops through the collections provided by the MSXML component.

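Dim nRoot As Node
Dim s As String

' Load the document and label the root node with its tag and file name
Set g_xmlDoc = New XMLDocument
g_xmlDoc.URL = sXmlFile
s = g_xmlDoc.root.tagName + " (" + sXmlFile + ")"
Set nRoot = tvTreeView.Nodes.Add(, , , s)

' Recursively add the child elements under the root node
ScanDocument nRoot, g_xmlDoc.root.children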

While it succeeds in scanning all the nodes, it fails to enumerate the attributes since they're not available as a collection. Figure 17 shows how it handles one of the demo channel files (color.cdf) provided with the Internet Client SDK.
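
The recursive scan is simple enough to sketch in a few lines. The version below is written in Python with the standard xml.etree.ElementTree module, which does expose attributes as a collection, so the outline can include them. The outline function and the CDF-like sample are invented for illustration; the sample is not the color.cdf file mentioned above.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return indented lines for element and all of its descendants."""
    attrs = " ".join(f'{k}="{v}"' for k, v in element.attrib.items())
    label = element.tag + (f" [{attrs}]" if attrs else "")
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

# Hypothetical channel-style document.
channel = ET.fromstring(
    '<CHANNEL HREF="http://example.com/">'
    '<TITLE>Sample</TITLE>'
    '<ITEM HREF="http://example.com/page.htm"><TITLE>Page</TITLE></ITEM>'
    '</CHANNEL>')

print("\n".join(outline(channel)))
```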

Figure 17: XML Tree Viewer

      I attempted to load the code from the XML-based server scriptlet presented in the May 1998 Cutting Edge column, but I encountered some errors! These errors had two main causes. First, while it is a commonly accepted practice for a server scriptlet to omit quotes when specifying the value of an attribute, MSXML reports an error when it parses an unquoted attribute value. Second, the presence of a < sign inside a <script> tag confuses the parser.
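
      Both failures are what a strict XML parser is expected to do, and they can be reproduced with other parsers as well. The sketch below uses Python's standard xml.etree.ElementTree module; the scriptlet-like fragments are simplified inventions, not the code from the May column. Quoting the attribute value and wrapping the script body in a CDATA section is one common way to make such content parse, shown here as a suggestion rather than as what the scriptlet runtime requires.

```python
import xml.etree.ElementTree as ET

def try_parse(text):
    """Return 'ok' or the parser's error message."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as err:
        return f"error: {err}"

# Hypothetical, simplified fragments.
unquoted = '<scriptlet><implements id=Automation/></scriptlet>'
bare_lt = '<scriptlet><script>if (a < b) a = b;</script></scriptlet>'
fixed = ('<scriptlet><implements id="Automation"/>'
         '<script><![CDATA[if (a < b) a = b;]]></script></scriptlet>')

for name, text in (("unquoted", unquoted), ("bare <", bare_lt), ("fixed", fixed)):
    print(name, "->", try_parse(text))
```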

Using XML Explorer
      So far I've presented an XML-driven application and an ActiveX control capable of providing a hierarchical representation of an XML file. To conclude, let me show you the power of XML Explorer and, incidentally, the extolled flexibility of XML.
      As I mentioned earlier, it's easy to set up a new namespace and transform the XML viewer into another application. Suppose you want to keep an eye on a full year of MIND issues. The template will have a single root called MIND 1998 and twelve folders, one for each month. Each folder can be divided into, say, three fields for feature articles, columns, and contributors. The beginning of the template will look like this:

<template title="MIND" caption="Reminder" display="view.htm">
  <root title="MIND 1998">
    <folder title="January">
      <field title="Feature"></field>
      <field title="Columns"></field>
      <field title="Contributors"></field>
    </folder>
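
Typing twelve nearly identical folders by hand is tedious, so a template like this can also be generated. The following Python sketch (standard xml.etree.ElementTree module) builds the full-year structure described above; the build_template name is invented, and the output is only assumed to match what XML Explorer expects, based on the excerpt shown.

```python
import xml.etree.ElementTree as ET

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
FIELDS = ("Feature", "Columns", "Contributors")

def build_template():
    """Build a one-root template with one folder per month."""
    template = ET.Element("template", title="MIND", caption="Reminder",
                          display="view.htm")
    root = ET.SubElement(template, "root", title="MIND 1998")
    for month in MONTHS:
        folder = ET.SubElement(root, "folder", title=month)
        for name in FIELDS:
            ET.SubElement(folder, "field", title=name)
    return template

template = build_template()
print(ET.tostring(template, encoding="unicode")[:120])
```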

      The file view.htm is the HTML layout for presenting information. Since the information is mostly about articles, typical fields would include the title, author, summary, and so on. Here's an example:

<record what="Scriptlets" who="Dino Esposito">
  <column>Cutting Edge</column>
  <logo type="img">mind.gif</logo>
  <contact type="email">cutting@microsoft.com</contact>
</record>

Figure 18 shows the same application discussed previously with this completely different data set.

Figure 18: Another View of XML Explorer

Summary
      My primary goal was to present a concrete demonstration of XML rather than discuss its features in the abstract. My focus was not the structure of the MSXML component or the syntactical particulars of XML. There are other sources of information for that. Instead, this article presents some applications of XML you could use today.
      Incidentally, this project also gave me the opportunity to develop some interesting code in Visual Basic that uses the coolbar control, flat toolbar, resizable panels, and a hyperlink control (see XML Explorer's About box). The downloadable source code includes the Visual Basic-based project for XML Explorer and an ActiveX control for viewing XML documents.
      Internet Explorer 5.0 is expected to ship with a new version of the MSXML component in which several aspects of the exposed XML object model will change, though it is not clear at press time what this will entail. At the time of this writing, the first beta of Internet Explorer 5.0 has only been available a short time. The final version of the XML Document object model is still waiting for approval by the W3C, and this will affect the object model exposed by Internet Explorer 5.0. Only one thing is certain: there will be enhancements and changes to the current model.

From the September 1998 issue of Microsoft Interactive Developer.