This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.


MIND

Beyond the Browser
beyond@microsoft.com         Download the code (3KB)
Ken Spencer

An XML Parser in Visual Basic
XML seems to be one of the hottest things going these days. People are touting XML as this super technology that is going to solve everyone's problems. Hmmm, I think I've heard that one before. What is the difference between the hype and the reality of XML?
      Back in the July 1998 issue of MIND, Joe Graf covered XML in the article, "Data Over the Web? XML Marks the Spot." Joe explained how to use XML with HTML, COM, C++, and JScript®. This month, I am going to revisit XML from a different perspective.
      My sample application this month was created using Visual Basic® and SQL Server. In a Web application that uses this architecture, the XML would never be sent to the client for parsing, unless the user wanted to pull down the XML and let Microsoft® Internet Explorer 5.0 parse it as a data island or straight XML. It's really interesting to see what Internet Explorer 5.0 does with XML all by itself, but that's the stuff for a future column.

XML Overview

      First, let's take a look at what XML is. Many people think that XML is a file type or programming language. XML is really a specification for serving, transporting, and saving data. Here's the W3C definition:

The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
This is from the Extensible Markup Language 1.0 recommendation dated February 10, 1998. You can find this document at http://www.w3.org/TR/1998/REC-xml-19980210.
      The specification is very particular about the structure of XML data, unlike the much more forgiving HTML. The first thing you need in an XML file or data stream is the XML declaration.

 <?xml version="1.0"?> 
      The declaration lets any application that uses this file or data know that it is XML and conforms to XML version 1.0. This paves the way for future applications to handle data created for different XML versions. The XML 1.0 recommendation says a document should begin with the declaration rather than strictly requiring it, but many commercial parsers expect it before they will use an XML stream or file.
      XML uses tags to define elements in a document.

 <?xml version="1.0"?>
 <Books>
     <Book>
         <Title_ID>BU1032</Title_ID>
         <Title>The Busy Executive's Database Guide</Title>
         <Notes>An overview of available database systems with emphasis on common business applications. Illustrated.</Notes>
     </Book>
     <Book>
         <Title_ID>BU1111</Title_ID>
         <Title>Cooking with Computers: Surreptitious Balance Sheets</Title>
         <Notes>Helpful hints on how to use your electronic resources to the best advantage.</Notes>
     </Book>
 </Books>
It's easy to see how XML is structured. Each tag (for instance, <Books>) delimits an entry in the data. The <Books> tag defines the entire stream in this case. Each <Book> tag defines the data for one book. Inside each <Book> tag, you find the data for that book such as Title_ID, Title, and Notes.
      XML is pretty simple, right? So far, so good. XML does get a bit more complicated. Notice the tags in the previous data. Where did they come from? Well, XML uses what's called a Document Type Definition (DTD) to define the tags in a document. If you do not use a DTD, XML parsers generally do not complain and go merrily about, using the data in the stream. If you have a DTD and the parser validates against it, then the tags must match the DTD entries.
      How do you create a DTD for your data? A simple and practical way is to base the DTD on your data model. The DTD for the Books XML stream is loosely based upon the Pubs sample database that comes with SQL Server. The <Title>, <Title_ID>, and <Notes> tags are all columns in the Titles table. Using the data model as the basis for the DTD makes sense. It's easy to create the DTD based upon the data model. It is also easier to maintain the DTD because it is linked directly to the data model and not a separate specification. If users know about the names in the data model, it's easy for them to understand the XML DTD. You'll see why this is important later on when I look at what Internet Explorer 5.0 does with an XML document.
      The structure of your data is important. XML has certain rules that apply to any XML parser. First, the data should begin with the declaration I showed earlier. Second, the data must be well-formed. Well-formed data has exactly one top-level element, and every opening tag has a matching closing tag that is properly nested. Figure 1 shows what happens when you try to use a malformed XML document with Internet Explorer 5.0.
Figure 1: XML Error Message in Internet Explorer 5.0

      Figure 1 shows a common XML pitfall. The XML data the browser is reading contains several tags, but they are all at the same level; there is no top-level parent tag. This triggers the error. Adding a surrounding parent tag (such as Book) will fix the problem.
      XML is also case-sensitive. That's right: Title, TITLE, and title are all considered different tag names.
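      Case sensitivity matters if you write your own parsing code in Visual Basic, because a module's string comparisons can be switched to case-insensitive with Option Compare Text. The following is a minimal sketch, not part of the sample application; FindTag and DemoCase are hypothetical names. It passes vbBinaryCompare to InStr so that tag names are always matched exactly.

```vb
Option Explicit

' Sketch only: FindTag is a hypothetical helper, not part of the article's code.
' Returns the 1-based position of <sTag> in sStream, or 0 if it is absent.
' vbBinaryCompare forces a case-sensitive match even if the module
' were compiled with Option Compare Text.
Public Function FindTag(ByVal sStream As String, ByVal sTag As String) As Long
    FindTag = InStr(1, sStream, "<" & sTag & ">", vbBinaryCompare)
End Function

' Run from the Immediate window to see the difference.
Public Sub DemoCase()
    Dim sXML As String
    sXML = "<Book><Title>Sushi, Anyone?</Title></Book>"
    Debug.Print FindTag(sXML, "Title")   ' 7: found right after <Book>
    Debug.Print FindTag(sXML, "TITLE")   ' 0: a different tag name
End Sub
```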
      One last point on the XML overview. XML is useful for a number of things and not useful for others. Some are proposing that XML be used directly in the browser. This requires client script and a parser to work with the XML on the client. The client script may be in the HTML document or it may be in an Extensible Stylesheet Language (XSL) file specified as a tag in the XML.
      I personally think that XML in the browser works OK, but that it's not necessarily the best way to use XML. XML works great when you are using data that has a hierarchical format because it nests the tags in a data stream. The Books example actually demonstrates a simple hierarchical view. ADO recordsets are another great way to work with data. The Recordset object allows you to work with connected or disconnected records and pass them around to different applications. You can even save a recordset to a file and reuse it later.
      So, where do you use what? If you are using relational data, use a relational data access technology like ADO. If the data is hierarchical, use XML. You can mix and match ADO and XML; just be aware of the limitations of the technology or specification that you are using.

XML and Applications

      Let's dive into some XML application code. To demonstrate how to use XML with components and Web applications, I created a simple application in Visual Basic. The main interface is shown in Figure 2.

Figure 2: The Sample App Interface

      The application uses two COM components. The Publication.Title class (see Figure 3) interacts with the Pubs database. Its methods are listed in Figure 4. The second component is XML.xmlparser (see Figure 5), which does all of the XML work. The xmlparser class has three methods, which are described in Figure 6.
      Now, let's make this application do some useful work. Both the Visual Basic form and the Title class use the XML parser to do the XML work. But why use your own parser instead of the one built into Internet Explorer? That's simple: a custom parser does anything you want it to. You can also make it easy to use. With my sample XML parser, there is no huge number of methods or properties to learn and no object hierarchy to walk through, just three methods. This makes using XML quite simple. Another advantage is portability. You can't be sure that the Microsoft parser will be everywhere you need it. Besides, the Microsoft parser and others try to do everything with XML, and sometimes that's just too much overhead.
      The first step in generating an XML stream is to create the XML declaration. This is trivial, but I created a method to do this so I would not have to type it over and over again. This will also make it easy to update the parser to a new version of XML at some point. The declaration is placed in a string using this syntax:

 vOutPut = oXML.XMLDeclaration()
      This method is called from the cmdStream_Click event code in Form1. This results in vOutPut containing the following text:

 <?xml version="1.0"?>
If you are using XML data between objects or between an object and ASP, you do not need the declaration. But it's a good idea to include it so the XML stream can be used in any application.
      Once you have created the declaration, you can create the rest of the XML stream. The Title class does this in the RetrieveTitle method by calling the Format method. The Format method generates the XML tags and packages the text in them:

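 ' Wrap each column of the current record in its own tag.
 sTitle = oXML.Format("Title", rsTitle("title"))
 sID = oXML.Format("Title_ID", rsTitle("title_id"))
 sNotes = oXML.Format("Notes", rsTitle("notes"))
 ' Return the three elements as one XML fragment with no parent tag.
 sReturn = sID & sTitle & sNotes
 RetrieveTitle = sReturn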
This code generates the entire XML stream that is returned from the RetrieveTitle method. This is not a well-formed stream, and it will not work in Internet Explorer 5.0 because it doesn't have parent tags around the detail tags. This code results in the following return data for RetrieveTitle:

 <Title_ID>BU1111</Title_ID>
 <Title>Cooking with Computers: Surreptitious Balance Sheets</Title>
 <Notes>Helpful hints on how to use your electronic resources to the best advantage.</Notes>
      This data represents one record from the Titles table. Returning the data from the RetrieveTitle method in an XML stream lets your application do anything it needs to with the data. The GetTitle method in Form1 executes the RetrieveTitle method to fetch the title the user selects from the list.

 Set objPub = CreateObject("Publication.Title")
 sReturn = objPub.RetrieveTitle(cboTitles.Text)
 Set oXML = New xmlparser
 sTitle = oXML.Parse(sReturn, "Title")
 sNotes = oXML.Parse(sReturn, "Notes")
 txtTitle = sTitle
 txtNotes = sNotes
The Parse method of the xmlparser class extracts the Title and the Notes data from the data returned from RetrieveTitle. Then the parsed data is placed in the textboxes txtTitle and txtNotes.
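      As a rough idea of what the XMLDeclaration and Format methods called earlier might look like, here is a minimal sketch. The function bodies are assumptions based only on the calls and output shown above, not the actual code from Figure 5. DemoFormat is a hypothetical test routine, and the code is written for a standard module so it can be run from the Immediate window.

```vb
Option Explicit

' Sketch only: these bodies are assumptions, since the text shows
' just how the methods are called and what they return.

' Returns the XML 1.0 declaration.
Public Function XMLDeclaration() As String
    XMLDeclaration = "<?xml version=""1.0""?>"
End Function

' Wraps vValue in an opening and closing sTag element.
' Concatenating with "" turns a Null value into an empty string.
' Assumption: the text holds no & or < characters, which would need
' escaping as &amp; and &lt; before other parsers could read the stream.
Public Function Format(ByVal sTag As String, ByVal vValue As Variant) As String
    Dim sText As String
    sText = vValue & ""
    Format = "<" & sTag & ">" & sText & "</" & sTag & ">"
End Function

' Hypothetical test routine.
Public Sub DemoFormat()
    Debug.Print XMLDeclaration()               ' <?xml version="1.0"?>
    Debug.Print Format("Title_ID", "BU1032")   ' <Title_ID>BU1032</Title_ID>
    Debug.Print Format("Notes", Null)          ' <Notes></Notes>
End Sub
```

      Keeping the declaration in one function means a later XML version needs a change in only one place, which is the reason given earlier for making it a method.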
      So far you've seen how to package and use a simple XML stream. What happens when you need to create a well-formed XML stream that can be used by Internet Explorer or another XML parser? The cmdcreateXML_Click event code in Form1 uses the same objects to retrieve each title from the Titles table and output all the titles in an XML stream. The first line of code generates the declaration; the second line adds the <Books> parent tag to the stream. Then the For...Next loop iterates through each title in cboTitles and extracts data for that title from the database. The data is returned from RetrieveTitle in XML format, so the procedure simply places a parent <Book> tag around it. Finally, the code adds the closing </Books> tag and outputs the stream to the c:\temp\Titles.xml file.

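 ' Start the stream with the declaration and the opening parent tag.
 vOutPut = oXML.XMLDeclaration()
 vOutPut = vOutPut & "<Books>"
 ' Wrap the XML fragment for each title in its own <Book> element.
 For i = 0 To cboTitles.ListCount - 1
     sReturn = objPub.RetrieveTitle(cboTitles.List(i))
     vOutPut = vOutPut & "<Book>"
     vOutPut = vOutPut & sReturn
     vOutPut = vOutPut & "</Book>"
 Next
 vOutPut = vOutPut & "</Books>"
 ' Write the finished stream to disk.
 Open "c:\temp\Titles.xml" For Output As #1
     Print #1, vOutPut
 Close #1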
      Now Internet Explorer 5.0 can display this data using its XML parser (see Figure 7). It displays the XML data in a nice hierarchical view that allows the user to hide/display various levels of detail. Internet Explorer 5.0 can do a lot more than that with XML, but I will leave that for another time.
Figure 7: The Internet Explorer 5.0 XML Parser in Action

      This brings up an interesting question. Now that you can generate an XML stream with more than one record in it, how can you pull the data from within that stream? Use the Parse method. The cmdStream_Click event code in Form1 uses the same technique I just covered to build an XML stream of all the titles in the listbox. Instead of saving it to a file, it extracts the titles from the first five entries and displays them in Form2.

 Load Form2
 sNewStuff = oXML.Parse(vOutPut, "Title", 1) & vbCrLf
 sNewStuff = sNewStuff & oXML.Parse(vOutPut, "Title", 2) & vbCrLf
 sNewStuff = sNewStuff & oXML.Parse(vOutPut, "Title", 3) & vbCrLf
 sNewStuff = sNewStuff & oXML.Parse(vOutPut, "Title", 4) & vbCrLf
 sNewStuff = sNewStuff & oXML.Parse(vOutPut, "Title", 5) & vbCrLf
 Form2.txtTitles.Text = sNewStuff
 Form2.Show
      The Parse method takes two optional parameters after the tag name. The first of these is the instance of the tag in the XML stream that should be parsed. You can see above that you can simply increment the instance number to extract all tags. In this example, the code pulls the first five titles and stores them in the sNewStuff string. The last two lines load the txtTitles textbox on Form2, then display the form.
      The last parameter of the Parse method is the default value for the tag. This parameter is not used here.
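      To make the two optional parameters concrete, here is a minimal sketch of how a Parse method with this signature could be written. The body, the parameter names (lInstance, sDefault), and the DemoParse routine are assumptions, not the code from Figure 5. DemoParse also shows the incrementing technique in a loop, using the default value as an end-of-data signal.

```vb
Option Explicit

' Sketch only: an assumed implementation based on the calls shown in the text.
' sStream   - XML text to search
' sTag      - tag name without angle brackets (case-sensitive)
' lInstance - which occurrence of the tag to return, counting from 1
' sDefault  - value returned when that occurrence is not found
Public Function Parse(ByVal sStream As String, ByVal sTag As String, _
                      Optional ByVal lInstance As Long = 1, _
                      Optional ByVal sDefault As String = "") As String
    Dim sOpen As String, sClose As String
    Dim lStart As Long, lEnd As Long, i As Long

    sOpen = "<" & sTag & ">"
    sClose = "</" & sTag & ">"
    Parse = sDefault
    If lInstance < 1 Then Exit Function

    ' Walk forward to the requested occurrence of the opening tag.
    lStart = 0
    For i = 1 To lInstance
        lStart = InStr(lStart + 1, sStream, sOpen, vbBinaryCompare)
        If lStart = 0 Then Exit Function
    Next i

    ' The text runs from the end of the opening tag to the closing tag.
    lStart = lStart + Len(sOpen)
    lEnd = InStr(lStart, sStream, sClose, vbBinaryCompare)
    If lEnd = 0 Then Exit Function
    Parse = Mid$(sStream, lStart, lEnd - lStart)
End Function

' Hypothetical test routine: lists every Title in the stream.
Public Sub DemoParse()
    Dim sXML As String, sItem As String, i As Long
    sXML = "<Books><Book><Title>First</Title></Book>" & _
           "<Book><Title>Second</Title></Book></Books>"

    ' Keep incrementing the instance until Parse falls back to the default.
    ' vbNullChar is used as the default because it cannot be a real title.
    i = 1
    Do
        sItem = Parse(sXML, "Title", i, vbNullChar)
        If sItem = vbNullChar Then Exit Do
        Debug.Print i & ": " & sItem
        i = i + 1
    Loop
End Sub
```

      A string-search approach like this assumes that tags carry no attributes and that an element never contains another element of the same name. That is true of the Books stream, but not of XML in general.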
      

Conclusion

      You may be wondering why I used a Visual Basic-based GUI application for this column, not an ASP application. There were two reasons. First, it's much simpler to test COM components in Visual Basic than it is in Visual InterDev, so the testing went very quickly. Second, the code for the ASP application is the same as that used in the Visual Basic form with a few minor tweaks. I routinely create code in Visual Basic to test applications, and then paste the code into an ASP file in Visual InterDev. I simply remove the type declarations, then change the properties for the various controls and view the page in the browser.
      XML is a hot technology, but it is not a panacea. XML should be used with discretion. It's quite handy to allow users to output an XML version of their data or to send the data in XML format to systems that cannot use ADO or other Microsoft features. You can also save ADO recordsets in XML format using the Save method, further extending the reach of both ADO and XML. One final note: to run the application, don't forget to change the getDSN function in the Title class to point to your SQL Server.
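      As an illustration of the Save method mentioned above, the following sketch builds a small recordset in memory and writes it out as XML. It assumes a version of ADO that supports XML persistence (2.1 or later) and an existing c:\temp folder. The routine name, field sizes, and output path are made up for the example, and the ADO constants are declared by hand because the code uses late binding.

```vb
Option Explicit

' Sketch only: the routine name, fields, and path are illustrative.
' ADO constants declared locally because the code is late bound.
Private Const adVarChar As Long = 200
Private Const adUseClient As Long = 3
Private Const adPersistXML As Long = 1

Public Sub SaveTitlesAsXML()
    Dim rs As Object
    Dim sPath As String
    sPath = "c:\temp\TitlesRS.xml"

    ' Build a disconnected recordset with two text fields.
    Set rs = CreateObject("ADODB.Recordset")
    rs.CursorLocation = adUseClient
    rs.Fields.Append "title_id", adVarChar, 6
    rs.Fields.Append "title", adVarChar, 80
    rs.Open

    ' Add one sample row.
    rs.AddNew
    rs.Fields("title_id").Value = "BU1032"
    rs.Fields("title").Value = "The Busy Executive's Database Guide"
    rs.Update

    ' Save raises an error if the destination file already exists.
    If Len(Dir$(sPath)) > 0 Then Kill sPath
    rs.Save sPath, adPersistXML

    rs.Close
    Set rs = Nothing
End Sub
```

      The file that Save produces uses ADO's own rowset layout rather than the <Books> structure built by the sample parser, so the two formats are not interchangeable without a conversion step.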

From the September 1999 issue of Microsoft Internet Developer.