Scott Stanfield
Vertigo Software, Inc.
September 1998
Summary: Discusses the decision to use XML to encode FMCorp expense report data. (8 printed pages) Covers:
Introduction
XML Overview
Original Architecture
Creating the XML Document
XML Parsing Technology
Tools
For More Information
The World Wide Web Consortium (W3C) recently defined a new standard for data interchange called the Extensible Markup Language (XML). FMCorp uses XML to encode the expense report data for transmission over the Internet. This document describes why the decision to use XML was made, how the .xml files are created, and how they are parsed on the server.
Hypertext Markup Langauge (HTML) is a great way to display data, but it's pretty lousy when it comes to describing data. The W3C recognized the need for a new language to address this problem. The solution, XML, provides a standard for defining your own markup tags and data structure so that data can easily be exchanged online.
XML tags can be used directly within an HTML page to make the page "smarter." In fact, most of the information on the Internet now describes how XML will revolutionize Web pages. Understanding the design goals will make it easier to see why it was adapted for our project.
One of these goals is to define an exact grammar for XML to make it easy to parse and understand. The rules for start/end tags, name/value pairs, and layout look similar to HTML. But the rules only determine if the document is well formed, not if it makes sense. The semantics of any XML data stream are dependent on the implementation.
What does an XML document look like? The .xml file that describes a single-entry expense report is shown here:
<?XML VERSION="1.0"?>
<EXPENSEREQUEST>
<VERSION>Expense Report 1.0</VERSION>
<USEREMAIL>alice@fmcorp.com</USEREMAIL>
<DESCRIPTION>Enter Description here</DESCRIPTION>
<ITEMS>
<ITEM>
<TYPE>Travel</TYPE>
<DESCRIPTION>rental car</DESCRIPTION>
<COST>44</COST>
<DATE>6/1/98</DATE>
</ITEM>
</ITEMS>
</ExpenseRequest>
In order to understand how XML is used in FMCorp, it's important to understand the site's original architecture flaws.
FMCorp's original architecture didn't use XML. A user's expense report was transferred to the Web server as a native Microsoft® Excel document (an *.xls file). When the Web server received the .xls file, a Microsoft Visual Basic® Component Object Model (COM) component used Excel's Automation interface to parse the expense report data. The data was then entered into the database and the file was discarded.
This technique should be raising some eyebrows. It put an incredible burden on the Web server. Not only did it require an installation of Excel on the Web server, but it also took approximately 20 seconds to complete the entire process. Most of this time was spent starting the Excel engine.
No Webmaster would ever allow this extra processing burden on the server. Instead, we rethought the whole problem and reasoned that the processing should be done on the client. After all, the employee already had the data in a parsable format: Excel.
Because Excel supported macros, we discovered that we could add a macro (see the discussion to follow) that was executed when the file was saved. This macro could use the Excel object model to walk through the data and create an ASCII file representing the contents on the local hard drive. Later, the contents of the file could be uploaded via the Web site.
The question remained as to what format the data should take as it is transferred across to the server. We considered creating our own proprietary ASCII format, but that would require two pieces of code: one to build the file, the other to parse the file.
Because Microsoft Internet Explorer 4.0 comes with an XML Parser object, we reasoned that we could use it to decompose the .xml file created by the Excel macro. Because our Web site required Microsoft Internet Information Server (IIS) 4.0, which installs Internet Explorer 4.0, we knew the parser would be available.
XML Parser support is provided by a COM object whose implementation lives in a file called msxml.dll. The ProgID for this component is "MSXML."
Although you can use the XML Parser to programmatically create an XML document, we couldn't ensure the user had Internet Explorer 4.0 on the machine that was running Excel. We opted to create the file "by hand," using a long macro in Excel written in Visual Basic for Applications.
The Excel macro is embedded in the downloadable template. The code looks like this:
Private Sub Workbook_BeforeSave(ByVal SaveAsUI As Boolean, Cancel As Boolean)
createXMLfile (Left(ActiveWorkbook.FullName, _
Len(ActiveWorkbook.FullName) - 4) & ".xml")
End Sub
Sub createXMLfile(filepath As String)
Dim vbNewLine As String
vbNewLine = Chr$(13) + Chr$(10)
Dim XMLFileText As String 'working XML conversion of this speadsheet
Dim ERsheet As Worksheet
Set ERsheet = ActiveWorkbook.Worksheets("Expense Report")
Const tagItemDescription = "ItemDescription"
Const tagItemDate = "ItemDate"
'list of Expense types
Dim eTypes(6) As String
eTypes(1) = "ItemRoom"
eTypes(2) = "ItemTransport"
eTypes(3) = "ItemFuel"
eTypes(4) = "ItemPhone"
eTypes(5) = "ItemEntertain"
eTypes(6) = "ItemOther"
eTypes(0) = "ItemMeals"
XMLFileText = ""
XMLFileText = XMLFileText & "<?XML VERSION=" & Chr(34) & "1.0" & Chr(34) & "?>" & vbNewLine
XMLFileText = XMLFileText & tabs(0) & UCase("<ExpenseRequest>") & vbNewLine
XMLFileText = XMLFileText & tabs(1) & "<VERSION>" & ERsheet.Range("ExpenseVersion") & "</VERSION>" & vbNewLine
XMLFileText = XMLFileText & tabs(1) & "<USEREMAIL>" & ERsheet.Range("Email") & "</USEREMAIL>" & vbNewLine
XMLFileText = XMLFileText & tabs(1) & "<DESCRIPTION>" & ERsheet.Range("description") & "</DESCRIPTION>" & vbNewLine
XMLFileText = XMLFileText & tabs(1) & "<ITEMS>" & vbNewLine
Dim nMaxItems As Integer
nMaxItems = ERsheet.Range("ItemDescription").Count
With ERsheet
' note we start with an index of 2 instead of 1
' the excel column ranges first index is the title
' of the column, the actual items start at index 2
For col = 0 To UBound(eTypes)
'for each column of data
'set the column we are working on
sColumn = eTypes(col)
For nIndex = 2 To nMaxItems
If .Range(sColumn)(nIndex) <> Empty And .Range(tagItemDescription)(nIndex) <> Empty Then
' get the item information
XMLFileText = XMLFileText & tabs(2) & "<ITEM>" & vbNewLine
XMLFileText = XMLFileText & tabs(3) & "<TYPE>" & .Range(sColumn)(1) & "</TYPE>" & vbNewLine
XMLFileText = XMLFileText & tabs(3) & UCase("<Description>") & .Range(tagItemDescription)(nIndex) & UCase("</Description>") & vbNewLine
XMLFileText = XMLFileText & tabs(3) & UCase("<Cost>") & CCur(.Range(sColumn)(nIndex)) & UCase("</Cost>") & vbNewLine
XMLFileText = XMLFileText & tabs(3) & UCase("<Date>") & .Range(tagItemDate)(nIndex) & UCase("</Date>") & vbNewLine
XMLFileText = XMLFileText & tabs(2) & "</ITEM>" & vbNewLine
End If
Next nIndex
Next col
End With
XMLFileText = XMLFileText & tabs(1) & "</ITEMS>" & vbNewLine
XMLFileText = XMLFileText & tabs(0) & "</ExpenseRequest>" & vbNewLine
'MsgBox XMLFileText
Dim FSO As Object
Dim newFile As Object
Set FSO = CreateObject("Scripting.FileSystemObject")
Set newFile = FSO.CreateTextFile(filepath)
newFile.Write (XMLFileText)
newFile.Close
Set newFile = Nothing
Set FSO = Nothing
End Sub
Function tabs(n As Integer) As String
tabs = ""
For x = 0 To n - 1
tabs = tabs & vbTab
Next
End Function
Workbook_BeforeSave is called right before the spreadsheet is saved. Because it is called before the employee has a chance to name the file, the default file name will be ExpenseReportTemplate.xml. Subsequent saves will pick up the saved file name, like Jack1.xml.
Once the .xml file has been created on the client, it's the user's responsibility to upload the file to the server. The file upload technique is discussed in "The Fitch & Mather Corporation Web Site."
Once the file has been received on the server, Active Server Pages (ASP) code uses the FMCorp ExpenseReports component SubmitXMLFile method to take apart the file. This method uses the IXMLDocument interface that is exposed by the XML Parser. The full details of the interfaces and methods exposed by MSXML can be found in the links referenced at the end of this document.
Let's examine the InitialParse method, because it shows the minimal amount of code necessary to take apart an .xml file:
Private Function InitialParse(ByVal XMLFilePath As String, ByRef xml As IXMLDocument, _
ByRef Email As String, ByRef Description As String) As Boolean
On Error GoTo InitialParse_Error
Dim xd As New XMLDocument
Dim root As IXMLElement
InitialParse = False ' assume the worst
' Check version
xd.URL = XMLFilePath
Set root = xd.root
If root.children.item("VERSION").Text <> "Expense Report 1.0" Then
Set xd = Nothing
Exit Function
End If
Email = root.children.item("USEREMAIL").Text
Description = root.children.item("DESCRIPTION").Text
Dim item As Object
For Each item In root.children.item("ITEMS").children
Debug.Print item.children.item("TYPE").Text
Debug.Print item.children.item("DESCRIPTION").Text
Debug.Print item.children.item("DATE").Text
Debug.Print item.children.item("COST").Text
Next
InitialParse = True
InitialParse_Exit:
If Not xd Is Nothing Then
Set xml = xd ' possible memory leak here--call set xd = nothing next
End If
Exit Function
InitialParse_Error:
InitialParse = False
Resume InitialParse_Exit
End Function
An instance of the IXMLDocument interface is instantiated as the variable "xd." Its URL property points to the full path representing the .xml file on the server. The next block of code checks the version string. If this attribute isn't found, the code returns.
Next, the USEREMAIL tag and DESCRIPTION tag are extracted into two output parameters. The For Each syntax provides a convenient way to enumerate each item in the ITEMS collection. Each item should contain the TYPE, DESCRIPTION, DATE, and COST attributes. For debugging purposes, these items are sent to the Debug window.
The return value for the method is set to true and the XML Parser interface is returned. The code that uses InitialParse later enumerates the items in the collection and writes them to the database.
For general editing of any XML-based data, you can try Microsoft's XML Notepad tool, which is available free of charge at http://msdn.microsoft.com/xml/notepad/intro.asp. XML Notepad offers a simple user interface that shows the tree structure of XML data. As of this writing, the tool is in beta, but it was stable enough for our purposes.
Figure 1 shows an expense report .xml file that was generated by the Excel template.
Figure 1. XML Notepad showing a simple expense report