Greater Office
Unless you've been on a long vacation lately, you've probably run across Extensible Markup Language (XML), which is being widely touted as the next great breakthrough for communication between applications. But you may be a bit fuzzy on how to use it in your own applications.
In brief, XML provides a text format for transmitting structured information. The good news is that you don't have to write your own parser for this format. Starting with Internet Explorer 4.01, Microsoft has shipped their own XML parser, MSXML, as a COM component. In this article, I'll show you how to use the MSXML parser from VBA to quickly and easily extract information from an XML document.
Why would you want to do this? The fact is that XML is one of this year's hot new terms in personal computing, and Microsoft is making a concerted push to incorporate it into all aspects of their product line. Other major software companies - from IBM to Oracle - have also produced XML tools and interfaces. With that much pressure behind XML as a standard, there's little doubt that you'll be faced with an XML data stream, sooner rather than later. Knowing how to read it quickly and easily will let you make the best use of your time in working with this data.
XML is a huge topic, and one which very few people understand in depth. If you want to see the full standard, you need to go the World Wide Web Consortium (W3C) Web site at http://www.w3.org/TR/1998/REC-xml-19980210. A warning, though: The standard is very heavy going if you're not used to interpreting such legalistic documents. You might find the annotated version of the specification at http://xml.com/pub/axml/axmlintro.html more useful, or some of the articles at Microsoft's MSDN XML Developer Center (http://msdn.microsoft.com/xml/default.asp).
Fortunately, if all you want to do is read and parse an XML file from someone else, and you can make the assumption that the document is valid XML, you don't have to know the entire standard. In fact, there are just a few key terms you need to understand to get started in the XML world. You can always pick up more later, but for now, just concentrate on these four ideas:
Of course, I've left a lot out to produce this simple picture of XML. In particular, I'm ignoring the topic of Document Type Declarations (DTDs). A DTD is a section of XML that explains the rules for constructing a particular XML document, and it's important because it allows you to validate that an XML document is actually in the intended format. Since I'm assuming the source document comes from someone who knows how to construct XML, I can ignore this piece of the puzzle - at least for this article.
For this article, I'll be parsing the sample file shown in FIGURE 1, traffic.xml. This file tracks Web site visits to a fictitious Web site by country of origin. If you refer to the figure, you'll be able to identify all the parts of XML that I talked about in the previous section:
<?xml version="1.0"?>
<SiteVisits>
<Country
CountryName="USA">
<TotalVisits>1348</TotalVisits>
<LatestVisit>1/4/2000</LatestVisit>
</Country>
<Country
CountryName="UK">
<TotalVisits>764</TotalVisits>
<LatestVisit>1/4/2000</LatestVisit>
</Country>
<Country
CountryName="Argentina">
<TotalVisits>175</TotalVisits>
<LatestVisit>1/2/2000</LatestVisit>
</Country>
<Country
CountryName="Brazil">
<TotalVisits>182</TotalVisits>
<LatestVisit>1/4/2000</LatestVisit>
</Country>
<Country
CountryName="Canada">
<TotalVisits>688</TotalVisits>
<LatestVisit>1/3/2000</LatestVisit>
</Country>
<Country
CountryName="Denmark">
<TotalVisits>204</TotalVisits>
<LatestVisit>1/1/2000</LatestVisit>
</Country>
<Country
CountryName="Germany">
<TotalVisits>351</TotalVisits>
<LatestVisit>1/4/2000</LatestVisit>
</Country>
<Country CountryName="Hong
Kong">
<TotalVisits>97</TotalVisits>
<LatestVisit>12/30/1999</LatestVisit>
</Country>
<Country
CountryName="Ireland">
<TotalVisits>522</TotalVisits>
<LatestVisit>1/4/2000</LatestVisit>
</Country>
<Country
CountryName="Malaysia">
<TotalVisits>14</TotalVisits>
<LatestVisit>12/31/1999</LatestVisit>
</Country>
<Country
CountryName="Netherlands">
<TotalVisits>542</TotalVisits>
<LatestVisit>1/4/2000</LatestVisit>
</Country>
<Country CountryName="New
Zealand">
<TotalVisits>599</TotalVisits>
<LatestVisit>1/3/2000</LatestVisit>
</Country>
<Country
CountryName="Norway">
<TotalVisits>452</TotalVisits>
<LatestVisit>1/3/2000</LatestVisit>
</Country>
<Country
CountryName="Scotland">
<TotalVisits>538</TotalVisits>
<LatestVisit>1/4/2000</LatestVisit>
</Country>
<Country
CountryName="Sweden">
<TotalVisits>422</TotalVisits>
<LatestVisit>1/2/2000</LatestVisit>
</Country>
<Country
CountryName="Wales">
<TotalVisits>301</TotalVisits>
<LatestVisit>1/1/2000</LatestVisit>
</Country>
<Country
CountryName="Yugoslavia">
<TotalVisits>37</TotalVisits>
<LatestVisit>12/30/1999</LatestVisit>
</Country>
<Country
CountryName="Zambia">
<TotalVisits>42</TotalVisits>
<LatestVisit>1/4/2000</LatestVisit>
</Country>
</SiteVisits>
FIGURE 1: The example XML file, traffic.xml, as ASCII text
You can see that the traffic.xml file consists of markup (the mechanics of XML: angle brackets, element names, and so on) and data (the country names and numbers of visitors). Suppose someone sent you this XML document once a day, and your job was to extract the data and display it in Microsoft Excel? Because the structure is so simple, it's easy to see how you might start breaking it apart with VBA string functions, looking for angle brackets, element names, and data. You could build a special-purpose parser that understood the structure of this particular XML document, and feed the new document into your parser on a daily basis.
But there's a better way. Starting with Internet Explorer 4.01, Microsoft has been shipping a generalized XML parser with every Web browser. Even better, this generalized parser is available via COM. So there's no need to create your own!
The parser in question is called simply the Microsoft XML library, or MSXML. To understand the object model used by MSXML, it's helpful to think of an XML document, not as a nested set of elements, but as a tree. FIGURE 2 shows the traffic.xml document open in Microsoft XML Notepad, which uses the tree metaphor. (Microsoft XML Notepad is in beta as of this writing. You can download it at http://msdn.microsoft.com/xml/notepad/intro.asp.) Rather than showing the TotalVisits and LatestVisit elements as nested within the Country element, they're presented as child nodes on a tree of nodes. This tree metaphor is also implemented by the MSXML parser.
FIGURE 2: Another view of traffic.xml, i.e. as it appears in
Microsoft XML Notepad
The MSXML library provides a wide range of objects that you can use to deal with the complexity of XML. Just as most of that complexity is beyond the scope of this article, most of the objects are unnecessary for the simple job of parsing straightforward XML. In fact, you can do the job with only four objects:
You'll note that the naming of these objects is a bit peculiar. DOM stands for Document Object Model, a general notion that documents of any sort can be assigned to a particular object model. This library provides the particular object model for XML. The "I" stands for interface, as most of these objects are actually COM interfaces.
The table in FIGURE 3 shows some of the properties and methods of these four objects, including all of the ones that I'll be using in this article. For a complete list, refer to the XML Developer's Guide in the Web Services section of the Platform SDK, or just use the object browser built into VBA to explore the objects. In particular, I've omitted the rich set of methods concerned with modifying XML documents, since the example here is only designed to read an existing document.
Object |
Member |
Type |
Description |
XMLDocument |
async |
Property |
Set to True to enable asynchronous loading |
XMLDocument |
attributes |
Property |
List of attributes for the root node of the document |
XMLDocument |
childNodes |
Property |
Returns a NodeList containing all the children of the root node |
XMLDocument |
documentElement |
Property |
Returns the root node itself |
XMLDocument |
getElementsByTag |
Method |
Returns a collection of elements having the specified tag |
XMLDocument |
hasChildNodes |
Property |
True if the root node has children |
XMLDocument |
load |
Method |
Loads an XML document from a disk file or URL |
XMLDocument |
loadXML |
Method |
Loads an XML document specified as a string |
XMLDocument |
url |
Property |
Returns the URL for the most recently loaded document |
XMLDocument |
validateOnParse |
Property |
Set to True to validate the XML as it's loaded |
IXMLDOMNode |
attributes |
Property |
List of attributes for this node |
IXMLDOMNode |
childNodes |
Property |
Returns a NodeList containing all the children of this node |
IXMLDOMNode |
hasChildNodes |
Property |
True if this node has children |
IXMLDOMNode |
nodeName |
Property |
The XML name of the node |
IXMLDOMNode |
nodeTypedValue |
Property |
Formatted value of the node |
IXMLDOMNode |
parentNode |
Property |
Pointer to the parent node of this node |
IXMLDOMNode |
parsed |
Property |
True if this node and its children have been parsed |
IXMLDOMNode |
text |
Property |
Unformatted text of this node |
IXMLDOMNodeList |
item |
Property |
Zero-based index to the members of this collection |
IXMLDOMNodeList |
length |
Property |
Count of members in this collection |
IXMLDOMNamedNodeMap |
getNamedItem |
Method |
Returns the attribute with the specified name (if any) |
IXMLDOMNamedNodeMap |
item |
Property |
Zero-based index to the members of this collection |
IXMLDOMNamedNodeMap |
length |
Property |
Count of members in this collection |
FIGURE 3: Representative methods and properties of selected XML objects
Now that you've seen the basic objects you need to read XML, it's time to see what you can do with them. FIGURE 4 shows a presentation of the data from traffic.xml in an Excel 2000 worksheet. Note that the worksheet really doesn't contain any more information than the original XML file; it's just been transformed into a format that's easier for human beings to grasp.
FIGURE 4: Data from traffic.xml file as an Excel 2000 worksheet.
The code to generate the worksheet and embedded chart is shown in Listing One. It's tied to the Load button on the worksheet. When the user clicks this button, the VBA code loads and parses an XML file, and uses it to generate the worksheet.
The procedure starts, of course, by declaring variables and setting up an error handler. Then it calls the Load method of the XMLDocument object. This sets up the association between the object model and the actual XML. In this particular case, I don't want the code to proceed until the XML is fully loaded, and I don't care about validating the XML (because I trust the XML supplier). So the applicable part of the code looks like this:
oDoc.async = False oDoc.validateOnParse = False fSuccess = oDoc.Load(ActiveWorkbook.Path & "\traffic.xml")Note that in this example, the XML is being loaded from a disk file in the same folder as the worksheet itself. However, the Load method is a good deal more flexible than that; it can also accept a URL. In fact, the odds are good that you'll actually be using that method to retrieve XML from a Web server somewhere that a supplier is maintaining for you. So in a real example, the load line would look something like this:
fSuccess = oDoc.Load( _ "http://TheirServer.com/traffic/traffic.xml")Once the XML document is loaded, the MSXML object model makes it simple to pick it apart into its component pieces. The documentElement property of the XMLDocument object returns an IXMLDOMNode object that represents the root node of the XML tree. This particular code then uses the childNodes property of that root node object with a For Each loop to visit each of the top-level child nodes in turn. If you refer back to the XML in FIGURE 1, you'll see that this amounts to passing through the loop once for each country in the XML file.
The name of each country is available as an XML attribute. To get this attribute requires three steps. First, use the attributes property of the node to get the IXMLDOMNamedNodeMap object representing all the attributes of the node. Then, use the getNamedItem method of the node map to get the particular attribute we're interested in. This attribute is returned as a node. Finally, use the text property of the node to retrieve the country name:
SetoAttributes
= oCountry.Attributes
Set
oCountryName = oAttributes.getNamedItem("CountryName")
ActiveSheet.Cells(intI, 1).Value =
oCountryName.Text
Retrieving the visit information makes use of the fact that the properties and methods of nodes are recursive (representing the nested nature of XML). Given the country node, the code can walk through its children to find the nodes representing the total visits and the most recent visit. The only complication comes about in that the IXMLDOMNodeList object (the collection of nodes) doesn't support retrieving items by name. So the code has to step through all the children and decide what to do with each child based on its nodeName property:
ForEachoChild In
oChildren
If
oChild.nodeName = "TotalVisits" Then
ActiveSheet.Cells(intI, 2) =
oChild.nodeTypedValue
EndIf
If
oChild.nodeName = "LatestVisit" Then
ActiveSheet.Cells(intI, 3) =
oChild.nodeTypedValue
EndIf
Next
oChild
Once you understand these code snippets, you've grasped all you need to know to parse simple XML documents using the MSXML object model. There is other code in the VBA procedure in Listing One, but it's all Excel code, concerned with clearing the worksheet and generating the chart. The core XML parsing code is really quite simple.
If you're a VBA developer, it's very likely that there is XML somewhere in your future. Consider just a few recent moves from Microsoft:
Remember 1995, when just about every new Microsoft application included HTML support in one form or another? Well, the year 2000 is shaping up as the year for pervasive XML support in Microsoft products. While I'm not aware of any XML announcements directly related to VBA, I'd say it's a safe bet that the VBA and Office teams won't be left out of this initiative.
Fortunately, although XML can be complex, you can dodge most of the complexity if you're just using it for data interchange. I hope this article has convinced you that parsing XML is no big deal, and that the tools already exist to integrate this into your VBA applications more easily than many other types of data.
The XML and Excel files referenced in this article are available for download.
Mike Gunderloy (mailto:MikeG1@mcwtech.com) is a Senior Consultant with MCW Technologies, a Microsoft Solution Provider. He's also the author of Visual Basic Developer's Guide to ADO (SYBEX, 1999) and the forthcoming Visual Basic and VBA Developer's Guide to the Windows Installer (SYBEX, 2000).
PrivateSubcmdLoad_Click()
DimoDoc As
MSXML.DOMDocument
DimfSuccess AsBoolean
DimoRoot As
MSXML.IXMLDOMNode
DimoCountry As
MSXML.IXMLDOMNode
DimoAttributes As
MSXML.IXMLDOMNamedNodeMap
Dim
oCountryName AsMSXML.IXMLDOMNode
DimoChildren As
MSXML.IXMLDOMNodeList
DimoChild As
MSXML.IXMLDOMNode
DimintI AsInteger
On
ErrorGoTo
HandleErr
SetoDoc = New
MSXML.DOMDocument
' Load the
XML from disk, without validating it. Wait
' for the
load to finish before proceeding.
oDoc.async = False
oDoc.validateOnParse = False
fSuccess = oDoc.Load(
_
ActiveWorkbook.Path
& "\traffic.xml")
' If
anything went wrong, quit now.
If
NotfSuccess Then
GoToExitHere
EndIf
' Set up a
row counter.
intI = 5
' Delete
the previous information.
ActiveSheet.Cells(4,
1).CurrentRegion.ClearContents
ActiveSheet.Shapes(2).Delete
' Create
column headers.
ActiveSheet.Cells(4, 1) =
"Country"
ActiveSheet.Cells(4, 2) =
"Total Visits"
ActiveSheet.Cells(4, 3) =
"Latest Visit"
' Get the
root of the XML tree.
SetoRoot = oDoc.documentElement
' Go
through all children of the root.
ForEachoCountry InoRoot.childNodes
' Collect the attributes for this country.
SetoAttributes = oCountry.Attributes
' Extract the country name and
' place it on the worksheet.
SetoCountryName = _
oAttributes.getNamedItem("CountryName")
ActiveSheet.Cells(intI, 1).Value =
oCountryName.Text
' Go through all the children of the country node.
SetoChildren = oCountry.childNodes
ForEachoChild InoChildren
' Get information from each child node to the sheet.
IfoChild.nodeName = "TotalVisits" Then
ActiveSheet.Cells(intI,
2) = oChild.nodeTypedValue
EndIf
IfoChild.nodeName = "LatestVisit" Then
ActiveSheet.Cells(intI,
3) = oChild.nodeTypedValue
EndIf
NextoChild
intI = intI +
1
NextoCountry
' Now for
some eye candy; build a chart of the data.
Charts.Add
WithActiveChart
.ChartType =
xl3DPieExploded
.SetSourceData
Source:=Sheets("Sheet1"). _
Range("A5:B" & CStr(intI - 1)), PlotBy :=xlColumns
.Location
Where:=xlLocationAsObject, Name:="Sheet1"
EndWith
ActiveChart.HasTitle = True
ActiveChart.ChartTitle.Characters.Text =
_
"Web Site Visits"
ActiveSheet.Shapes(2).Top =
0
ActiveSheet.Shapes(2).Left =
200
ExitHere:
ExitSub
HandleErr:
MsgBox "Error " &
Err.Number & ": " & Err.Description
ResumeExitHere
Resume
EndSub
Copyright © 2000 Informant Communications Group. All Rights Reserved. • Site Use Agreement • Send feedback to the Webmaster • Important information about privacy