Greater Office

Putting XML to Work

Parsing XML Data with MSXML

By Mike Gunderloy

Unless you've been on a long vacation lately, you've probably run across Extensible Markup Language (XML), which is being widely touted as the next great breakthrough for communication between applications. But you may be a bit fuzzy on how to use it in your own applications.

In brief, XML provides a text format for transmitting structured information. The good news is that you don't have to write your own parser for this format. Starting with Internet Explorer 4.01, Microsoft has shipped their own XML parser, MSXML, as a COM component. In this article, I'll show you how to use the MSXML parser from VBA to quickly and easily extract information from an XML document.

Why would you want to do this? The fact is that XML is one of this year's hot new terms in personal computing, and Microsoft is making a concerted push to incorporate it into all aspects of their product line. Other major software companies - from IBM to Oracle - have also produced XML tools and interfaces. With that much pressure behind XML as a standard, there's little doubt that you'll be faced with an XML data stream, sooner rather than later. Knowing how to read it quickly and easily will let you make the best use of your time in working with this data.

Understanding XML

XML is a huge topic, and one which very few people understand in depth. If you want to see the full standard, you need to go the World Wide Web Consortium (W3C) Web site at http://www.w3.org/TR/1998/REC-xml-19980210. A warning, though: The standard is very heavy going if you're not used to interpreting such legalistic documents. You might find the annotated version of the specification at http://xml.com/pub/axml/axmlintro.html more useful, or some of the articles at Microsoft's MSDN XML Developer Center (http://msdn.microsoft.com/xml/default.asp).

Fortunately, if all you want to do is read and parse an XML file from someone else, and you can make the assumption that the document is valid XML, you don't have to know the entire standard. In fact, there are just a few key terms you need to understand to get started in the XML world. You can always pick up more later, but for now, just concentrate on these four ideas:

Of course, I've left a lot out to produce this simple picture of XML. In particular, I'm ignoring the topic of Document Type Declarations (DTDs). A DTD is a section of XML that explains the rules for constructing a particular XML document, and it's important because it allows you to validate that an XML document is actually in the intended format. Since I'm assuming the source document comes from someone who knows how to construct XML, I can ignore this piece of the puzzle - at least for this article.

An XML Example

For this article, I'll be parsing the sample file shown in FIGURE 1, traffic.xml. This file tracks Web site visits to a fictitious Web site by country of origin. If you refer to the figure, you'll be able to identify all the parts of XML that I talked about in the previous section:

<?xml version="1.0"?> 
      <SiteVisits> 
        <Country 
      CountryName="USA"> 
          <TotalVisits>1348</TotalVisits> 
      
          <LatestVisit>1/4/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="UK"> 
          <TotalVisits>764</TotalVisits> 
      
          <LatestVisit>1/4/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Argentina"> 
           
      <TotalVisits>175</TotalVisits> 
          <LatestVisit>1/2/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Brazil"> 
          <TotalVisits>182</TotalVisits> 
      
          <LatestVisit>1/4/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Canada"> 
          <TotalVisits>688</TotalVisits> 
      
          <LatestVisit>1/3/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Denmark"> 
          <TotalVisits>204</TotalVisits> 
      
          <LatestVisit>1/1/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Germany"> 
          <TotalVisits>351</TotalVisits> 
      
          <LatestVisit>1/4/2000</LatestVisit> 
      
        </Country> 
        <Country CountryName="Hong 
      Kong"> 
          <TotalVisits>97</TotalVisits> 
      
          <LatestVisit>12/30/1999</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Ireland"> 
          <TotalVisits>522</TotalVisits> 
      
           
      <LatestVisit>1/4/2000</LatestVisit> 
        </Country> 
        <Country 
      CountryName="Malaysia"> 
          <TotalVisits>14</TotalVisits> 
      
          <LatestVisit>12/31/1999</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Netherlands"> 
          <TotalVisits>542</TotalVisits> 
      
          <LatestVisit>1/4/2000</LatestVisit> 
      
        </Country> 
        <Country CountryName="New 
      Zealand"> 
          <TotalVisits>599</TotalVisits> 
      
          <LatestVisit>1/3/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Norway"> 
          <TotalVisits>452</TotalVisits> 
      
          <LatestVisit>1/3/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Scotland"> 
          <TotalVisits>538</TotalVisits> 
      
          <LatestVisit>1/4/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Sweden"> 
          <TotalVisits>422</TotalVisits> 
      
          <LatestVisit>1/2/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Wales"> 
          <TotalVisits>301</TotalVisits> 
      
          <LatestVisit>1/1/2000</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Yugoslavia"> 
          <TotalVisits>37</TotalVisits> 
      
          <LatestVisit>12/30/1999</LatestVisit> 
      
        </Country> 
        <Country 
      CountryName="Zambia"> 
          <TotalVisits>42</TotalVisits> 
      
          <LatestVisit>1/4/2000</LatestVisit> 
      
        </Country> 
      </SiteVisits>

FIGURE 1: The example XML file, traffic.xml, as ASCII text

You can see that the traffic.xml file consists of markup (the mechanics of XML: angle brackets, element names, and so on) and data (the country names and numbers of visitors). Suppose someone sent you this XML document once a day, and your job was to extract the data and display it in Microsoft Excel? Because the structure is so simple, it's easy to see how you might start breaking it apart with VBA string functions, looking for angle brackets, element names, and data. You could build a special-purpose parser that understood the structure of this particular XML document, and feed the new document into your parser on a daily basis.

But there's a better way. Starting with Internet Explorer 4.01, Microsoft has been shipping a generalized XML parser with every Web browser. Even better, this generalized parser is available via COM. So there's no need to create your own!

The MSXML Object Model

The parser in question is called simply the Microsoft XML library, or MSXML. To understand the object model used by MSXML, it's helpful to think of an XML document, not as a nested set of elements, but as a tree. FIGURE 2 shows the traffic.xml document open in Microsoft XML Notepad, which uses the tree metaphor. (Microsoft XML Notepad is in beta as of this writing. You can download it at http://msdn.microsoft.com/xml/notepad/intro.asp.) Rather than showing the TotalVisits and LatestVisit elements as nested within the Country element, they're presented as child nodes on a tree of nodes. This tree metaphor is also implemented by the MSXML parser.


FIGURE 2: Another view of traffic.xml, i.e. as it appears in Microsoft XML Notepad

The MSXML library provides a wide range of objects that you can use to deal with the complexity of XML. Just as most of that complexity is beyond the scope of this article, most of the objects are unnecessary for the simple job of parsing straightforward XML. In fact, you can do the job with only four objects:

You'll note that the naming of these objects is a bit peculiar. DOM stands for Document Object Model, a general notion that documents of any sort can be assigned to a particular object model. This library provides the particular object model for XML. The "I" stands for interface, as most of these objects are actually COM interfaces.

The table in FIGURE 3 shows some of the properties and methods of these four objects, including all of the ones that I'll be using in this article. For a complete list, refer to the XML Developer's Guide in the Web Services section of the Platform SDK, or just use the object browser built into VBA to explore the objects. In particular, I've omitted the rich set of methods concerned with modifying XML documents, since the example here is only designed to read an existing document.

Object

Member

Type

Description

XMLDocument

async

Property

Set to True to enable asynchronous loading

XMLDocument

attributes

Property

List of attributes for the root node of the document

XMLDocument

childNodes

Property

Returns a NodeList containing all the children of the root node

XMLDocument

documentElement

Property

Returns the root node itself

XMLDocument

getElementsByTag

Method

Returns a collection of elements having the specified tag

XMLDocument

hasChildNodes

Property

True if the root node has children

XMLDocument

load

Method

Loads an XML document from a disk file or URL

XMLDocument

loadXML

Method

Loads an XML document specified as a string

XMLDocument

url

Property

Returns the URL for the most recently loaded document

XMLDocument

validateOnParse

Property

Set to True to validate the XML as it's loaded

IXMLDOMNode

attributes

Property

List of attributes for this node

IXMLDOMNode

childNodes

Property

Returns a NodeList containing all the children of this node

IXMLDOMNode

hasChildNodes

Property

True if this node has children

IXMLDOMNode

nodeName

Property

The XML name of the node

IXMLDOMNode

nodeTypedValue

Property

Formatted value of the node

IXMLDOMNode

parentNode

Property

Pointer to the parent node of this node

IXMLDOMNode

parsed

Property

True if this node and its children have been parsed

IXMLDOMNode

text

Property

Unformatted text of this node

IXMLDOMNodeList

item

Property

Zero-based index to the members of this collection

IXMLDOMNodeList

length

Property

Count of members in this collection

IXMLDOMNamedNodeMap

getNamedItem

Method

Returns the attribute with the specified name (if any)

IXMLDOMNamedNodeMap

item

Property

Zero-based index to the members of this collection

IXMLDOMNamedNodeMap

length

Property

Count of members in this collection

FIGURE 3: Representative methods and properties of selected XML objects

Using MSXML

Now that you've seen the basic objects you need to read XML, it's time to see what you can do with them. FIGURE 4 shows a presentation of the data from traffic.xml in an Excel 2000 worksheet. Note that the worksheet really doesn't contain any more information than the original XML file; it's just been transformed into a format that's easier for human beings to grasp.

FIGURE 4: Data from traffic.xml file as an Excel 2000 worksheet.

The code to generate the worksheet and embedded chart is shown in Listing One. It's tied to the Load button on the worksheet. When the user clicks this button, the VBA code loads and parses an XML file, and uses it to generate the worksheet.

The procedure starts, of course, by declaring variables and setting up an error handler. Then it calls the Load method of the XMLDocument object. This sets up the association between the object model and the actual XML. In this particular case, I don't want the code to proceed until the XML is fully loaded, and I don't care about validating the XML (because I trust the XML supplier). So the applicable part of the code looks like this:

oDoc.async = False oDoc.validateOnParse = False fSuccess = oDoc.Load(ActiveWorkbook.Path & "\traffic.xml")

Note that in this example, the XML is being loaded from a disk file in the same folder as the worksheet itself. However, the Load method is a good deal more flexible than that; it can also accept a URL. In fact, the odds are good that you'll actually be using that method to retrieve XML from a Web server somewhere that a supplier is maintaining for you. So in a real example, the load line would look something like this:

fSuccess = oDoc.Load( _   "http://TheirServer.com/traffic/traffic.xml")

Once the XML document is loaded, the MSXML object model makes it simple to pick it apart into its component pieces. The documentElement property of the XMLDocument object returns an IXMLDOMNode object that represents the root node of the XML tree. This particular code then uses the childNodes property of that root node object with a For Each loop to visit each of the top-level child nodes in turn. If you refer back to the XML in FIGURE 1, you'll see that this amounts to passing through the loop once for each country in the XML file.

The name of each country is available as an XML attribute. To get this attribute requires three steps. First, use the attributes property of the node to get the IXMLDOMNamedNodeMap object representing all the attributes of the node. Then, use the getNamedItem method of the node map to get the particular attribute we're interested in. This attribute is returned as a node. Finally, use the text property of the node to retrieve the country name:

SetoAttributes 
      = oCountry.Attributes
      Set
      oCountryName = oAttributes.getNamedItem("CountryName") 
      ActiveSheet.Cells(intI, 1).Value = 
      oCountryName.Text

Retrieving the visit information makes use of the fact that the properties and methods of nodes are recursive (representing the nested nature of XML). Given the country node, the code can walk through its children to find the nodes representing the total visits and the most recent visit. The only complication comes about in that the IXMLDOMNodeList object (the collection of nodes) doesn't support retrieving items by name. So the code has to step through all the children and decide what to do with each child based on its nodeName property:

ForEachoChild In
      oChildren
         If
      oChild.nodeName = "TotalVisits" Then
          ActiveSheet.Cells(intI, 2) = 
      oChild.nodeTypedValue
         
      EndIf
         If
      oChild.nodeName = "LatestVisit" Then
          ActiveSheet.Cells(intI, 3) = 
      oChild.nodeTypedValue
         
      EndIf
      Next
      oChild

Once you understand these code snippets, you've grasped all you need to know to parse simple XML documents using the MSXML object model. There is other code in the VBA procedure in Listing One, but it's all Excel code, concerned with clearing the worksheet and generating the chart. The core XML parsing code is really quite simple.

XML in Your Future

If you're a VBA developer, it's very likely that there is XML somewhere in your future. Consider just a few recent moves from Microsoft:

Remember 1995, when just about every new Microsoft application included HTML support in one form or another? Well, the year 2000 is shaping up as the year for pervasive XML support in Microsoft products. While I'm not aware of any XML announcements directly related to VBA, I'd say it's a safe bet that the VBA and Office teams won't be left out of this initiative.

Fortunately, although XML can be complex, you can dodge most of the complexity if you're just using it for data interchange. I hope this article has convinced you that parsing XML is no big deal, and that the tools already exist to integrate this into your VBA applications more easily than many other types of data.

The XML and Excel files referenced in this article are available for download.

Mike Gunderloy (mailto:MikeG1@mcwtech.com) is a Senior Consultant with MCW Technologies, a Microsoft Solution Provider. He's also the author of Visual Basic Developer's Guide to ADO (SYBEX, 1999) and the forthcoming Visual Basic and VBA Developer's Guide to the Windows Installer (SYBEX, 2000).

Begin Listing One - The cmdLoad_Click Sub procedure

PrivateSubcmdLoad_Click()
         
      DimoDoc As
MSXML.DOMDocument
         
      DimfSuccess AsBoolean
         
      DimoRoot As
      MSXML.IXMLDOMNode
         
      DimoCountry As
      MSXML.IXMLDOMNode
         
      DimoAttributes As
      MSXML.IXMLDOMNamedNodeMap
        Dim
      oCountryName AsMSXML.IXMLDOMNode
         
      DimoChildren As
      MSXML.IXMLDOMNodeList
         
      DimoChild As
      MSXML.IXMLDOMNode
         
      DimintI AsInteger
        
         On
      ErrorGoTo
      HandleErr
        
         
      SetoDoc = New
      MSXML.DOMDocument
        
         ' Load the 
      XML from disk, without validating it. Wait 
         ' for the 
      load to finish before proceeding. 
        oDoc.async = False
        oDoc.validateOnParse = False
        fSuccess = oDoc.Load( 
      _
          ActiveWorkbook.Path 
      & "\traffic.xml") 
        
         ' If 
      anything went wrong, quit now. 
         If
      NotfSuccess Then
           GoToExitHere
         
      EndIf
        
         ' Set up a 
      row counter. 
        intI = 5
        
         ' Delete 
      the previous information. 
        ActiveSheet.Cells(4, 
      1).CurrentRegion.ClearContents
        ActiveSheet.Shapes(2).Delete
         ' Create 
      column headers. 
        ActiveSheet.Cells(4, 1) = 
      "Country" 
        ActiveSheet.Cells(4, 2) = 
      "Total Visits" 
        ActiveSheet.Cells(4, 3) = 
      "Latest Visit" 
        
         ' Get the 
      root of the XML tree. 
         
      SetoRoot = oDoc.documentElement
        
         ' Go 
      through all children of the root. 
         
      ForEachoCountry InoRoot.childNodes
           ' Collect the attributes for this country. 
      
           SetoAttributes = oCountry.Attributes
           ' Extract the country name and
          ' place it on the worksheet. 
      
           SetoCountryName = _
            oAttributes.getNamedItem("CountryName") 
      
          ActiveSheet.Cells(intI, 1).Value = 
      oCountryName.Text
           ' Go through all the children of the country node. 
      
           SetoChildren = oCountry.childNodes
           ForEachoChild InoChildren
             ' Get information from each child node to the sheet. 
      
             IfoChild.nodeName = "TotalVisits" Then
              ActiveSheet.Cells(intI, 
      2) = oChild.nodeTypedValue
             EndIf
             IfoChild.nodeName = "LatestVisit" Then
              ActiveSheet.Cells(intI, 
      3) = oChild.nodeTypedValue
             EndIf
           NextoChild
          intI = intI + 
      1
         
      NextoCountry
        
         ' Now for 
      some eye candy; build a chart of the data. 
        Charts.Add
         
      WithActiveChart
          .ChartType = 
      xl3DPieExploded
          .SetSourceData 
      Source:=Sheets("Sheet1"). _
            Range("A5:B" & CStr(intI - 1)), PlotBy :=xlColumns
          .Location 
      Where:=xlLocationAsObject, Name:="Sheet1" 
         
      EndWith
        
        ActiveChart.HasTitle = True
        ActiveChart.ChartTitle.Characters.Text = 
      _
          "Web Site Visits" 
      
        
        ActiveSheet.Shapes(2).Top = 
      0
        ActiveSheet.Shapes(2).Left = 
      200
        
      ExitHere: 
         
      ExitSub
        
      HandleErr: 
        MsgBox "Error " & 
      Err.Number & ": " & Err.Description
         
      ResumeExitHere
         
      Resume
      EndSub

End Listing One

Copyright © 2000 Informant Communications Group. All Rights Reserved. • Site Use Agreement • Send feedback to the Webmaster • Important information about privacy