Charles Heinemann
Program Manager, XML
Microsoft Corporation
August 19, 1998
Updated: November 4, 1998
The following article was originally published in the Site Builder Network Magazine "Extreme XML" column (now MSDN Online Voices Extreme XML).
Download the source code for this article (zipped, 1.68K)
The whole thing began when I ran up to the fourth floor to make a quick pizza delivery. Now, the delivery went fine. It wasn't until I got down to the parking lot -- and realized that I hadn't got my parking ticket stamped -- that things went awry.
I got involved in the pizza business about four weeks ago, right around the last time I wrote to you. My uncle Edd recently started an online pizza business, a franchise of the All-American Pizza Co. Well, every month, the All-American Pizza Co. posts a menu update. This update is marked up in XML, so that each franchisee can use the menu in whatever way he or she sees fit. Along with this monthly update, the parent company also posts a Document Type Definition (DTD) describing the content model for these monthly updates. The point of posting the DTD is that whatever application Uncle Edd or his fellow pizza brokers write to display the menu can rely on the logical structure of the XML document containing the update. With the DTD, Uncle Edd can both prepare for the data he will receive each month and validate the updates.
There was one small problem, however. Uncle Edd had no idea what to do with said DTD. He, of course, immediately got me on the horn, asking for my help.
Over the phone, I briefly explained that an XML document can be both well formed and valid. To be well formed, the XML need only adhere to the syntax rules laid out in the XML specification. To be valid, the document must also conform to the logical structure described in its DTD. After a short pause, which I was sure signaled an intense lack of understanding, I told Uncle Edd that I'd be over in a minute to explain the DTD.
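One well-formedness rule is easy to see in code: every start tag must be closed by a matching end tag, and tags must nest properly. The following JavaScript sketch checks only that rule. The function name tagsAreBalanced is hypothetical, and a real XML parser checks far more, including attributes, entities, comments, and CDATA sections.

```javascript
// Simplified sketch: checks only that start and end tags match and nest.
// Comments, CDATA sections, processing instructions, DOCTYPE declarations
// and attribute syntax are NOT handled the way a real XML parser would.
function tagsAreBalanced(xml) {
  // Captures: optional "/" of an end tag, the tag name, optional "/" of an empty tag.
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^>]*?(\/?)>/g;
  const open = []; // stack of start tags still waiting for their end tag
  let match;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closing, name, selfClosing] = match;
    if (selfClosing) continue;               // <empty/> opens and closes itself
    if (closing) {
      if (open.pop() !== name) return false; // end tag must match the latest start tag
    } else {
      open.push(name);
    }
  }
  return open.length === 0;                  // every start tag must have been closed
}
```

For instance, `<topping>greens</toppings>` fails the check because the end tag does not match the most recent start tag.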
The DTD itself, a file called pizzas.dtd, is pretty simple:
<!ELEMENT pizzas (pizza)*>
<!ELEMENT pizza (name, toppings, description, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT toppings (topping)+>
<!ELEMENT topping (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
It basically describes an XML document such as the following (the example being a subset of the actual menu):
<pizzas>
  <pizza>
    <name>The Nebraskan</name>
    <toppings>
      <topping>corn nibblets</topping>
      <topping>mozzarella cheese</topping>
      <topping>tomato sauce</topping>
    </toppings>
    <description> With every corn-laden slice, the memories of those wild Omaha nights become more and more vivid.</description>
    <price>7.99</price>
  </pizza>
</pizzas>
<!ELEMENT pizzas (pizza)*> declares that a "pizzas" element contains zero or more "pizza" elements. <!ELEMENT pizza (name, toppings, description, price)> declares that a "pizza" element contains exactly one "name", one "toppings", one "description", and one "price" element, in that order. <!ELEMENT toppings (topping)+> declares that a "toppings" element contains one or more "topping" elements. And <!ELEMENT name (#PCDATA)> declares that a "name" element contains only text. The "topping", "description", and "price" declarations work the same way.
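A DTD content model is essentially a regular expression over the names of an element's children. The declarations above can therefore be sketched as ordinary JavaScript regular expressions. This is a hypothetical illustration, not how the Internet Explorer 5 parser is implemented. The names contentModels and matchesModel are invented here, and only the element-content models are covered; the #PCDATA declarations are left out.

```javascript
// Hypothetical sketch: each element-content model from pizzas.dtd written as a
// regular expression over the space-terminated names of an element's children.
// DTD "*" = zero or more, "+" = one or more, "," = a fixed sequence.
const contentModels = {
  pizzas:   /^(pizza )*$/,                          // (pizza)*
  pizza:    /^name toppings description price $/,   // (name, toppings, description, price)
  toppings: /^(topping )+$/,                        // (topping)+
};

// Returns true if the list of child element names satisfies the declared model.
function matchesModel(elementName, childNames) {
  const model = contentModels[elementName];
  if (!model) throw new Error("no content model declared for " + elementName);
  const sequence = childNames.map(name => name + " ").join("");
  return model.test(sequence);
}
```

For example, a "pizza" whose children are only name, toppings, and description fails, because the model requires a price to follow. An empty "toppings" element also fails, because "+" demands at least one topping.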
To validate the menu against this DTD, all you need to do is place the following DOCTYPE declaration at the top of your XML document (after the XML declaration, if there is one):
<!DOCTYPE pizzas SYSTEM "pizzas.dtd">
Now, when the parser loads the XML, it will check the validity of each node within the XML document and fail to load the document if it is invalid.
Uncle Edd, having taken all this in, turned to me and inquired, "What if I want to add my own pizzas to the list?"
"What exactly do you have in mind?"
"I dunno. What if I got a whole list of pizzas -- monthly specials, let's say -- and I want Junior, here (pointing at my cousin), to add them to the XML file they give me each month? What'll I do with this DTD thing then?"
"As long as the new entries conform to the DTD, it won't matter," I assured him.
"Well, how do I know if they will or won't?"
"I'll show you."
And with that, I wrote a little helper function that would validate Junior's work:
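function validateJunior(){
  // Load Junior's file synchronously into the "xmlid" data island.
  xmlid.async = false;
  xmlid.load("specialtyPizzas.xml");
  pizzaIsland.async = false;
  var pizzaList = xmlid.documentElement.childNodes;
  // Walk backward: childNodes is live, so removing a node shifts the
  // nodes after it, and a forward loop would skip the next pizza.
  for (var i = pizzaList.length - 1; i >= 0; i--){
    // Prefix one pizza with a DOCTYPE so the parser validates it against pizza.dtd.
    var xmlString = "<!DOCTYPE pizza SYSTEM 'pizza.dtd'>" + pizzaList.item(i).xml;
    pizzaIsland.loadXML(xmlString);
    // A nonzero errorCode means this pizza failed to parse or validate.
    if (pizzaIsland.parseError.errorCode != 0)
      xmlid.documentElement.removeChild(pizzaList.item(i));
  }
  return xmlid;
}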
I also added the following XML data islands to the page:
<XML ID="xmlid"></XML>
<XML ID="pizzaIsland"></XML>
The function above processes the following XML, authored by Junior. (Notice his carelessness in not supplying a price for "The Texan".)
<pizzas>
  <pizza>
    <name>The Washingtonian</name>
    <toppings>
      <topping>apple slices</topping>
      <topping>salmon</topping>
      <topping>mozzarella cheese</topping>
      <topping>tomato sauce</topping>
    </toppings>
    <description>Who says you can't mix seafood with America's number one fruit pie filling?</description>
    <price>7.99</price>
  </pizza>
  <pizza>
    <name>The Texan</name>
    <toppings>
      <topping>barbeque brisket</topping>
      <topping>dill pickles</topping>
      <topping>onions</topping>
      <topping>mozzarella cheese</topping>
      <topping>tomato sauce</topping>
    </toppings>
    <description>Put the lone in lone star state! Ask for extra onions</description>
  </pizza>
  <pizza>
    <name>The Mississippian</name>
    <toppings>
      <topping>fried catfish</topping>
      <topping>greens</topping>
      <topping>mozzarella cheese</topping>
      <topping>tomato sauce</topping>
    </toppings>
    <description>This Southern treat will have you dreaming of pine trees, Faulkner, and floating casinos.</description>
    <price>7.99</price>
  </pizza>
</pizzas>
The function iterates through the "pizza" elements and validates each one against the following DTD, saved as pizza.dtd:
<!ELEMENT pizza (name, toppings, description, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT toppings (topping)+>
<!ELEMENT topping (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
The parser validates a document on load against the DTD named in its DOCTYPE declaration. This means I can get the parser to validate one particular "pizza" element at a time. How? By building an XML string that contains a DOCTYPE declaration pointing at "pizza.dtd" followed by the XML for a single "pizza" element, and loading that string with the loadXML method.
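Validating each pizza is only half of the job. The function also has to drop the pizzas that fail. That removal loop is easiest to reason about on a plain JavaScript array.

In this sketch:
- Each pizza is reduced to a hypothetical object listing its child element names.
- A simple sequence check stands in for the parser's DTD validation.
- The loop runs from the last entry to the first, so deleting one entry never shifts an entry that has not been checked yet. A forward loop over a list that shrinks as you remove from it skips the entry after each removal.

The names juniorsPizzas, isValidPizza, and removeInvalidPizzas are made up for illustration.

```javascript
// Hypothetical model of Junior's file: each pizza reduced to its name and
// the names of its child elements, in document order.
const juniorsPizzas = [
  { name: "The Washingtonian", children: ["name", "toppings", "description", "price"] },
  { name: "The Texan",         children: ["name", "toppings", "description"] }, // no price
  { name: "The Mississippian", children: ["name", "toppings", "description", "price"] },
];

// Stand-in for <!ELEMENT pizza (name, toppings, description, price)>.
function isValidPizza(pizza) {
  return pizza.children.join(",") === "name,toppings,description,price";
}

// Removes invalid pizzas in place. Iterating from the last index to the first
// means a removal never changes the index of an entry not yet visited.
function removeInvalidPizzas(pizzas) {
  for (let i = pizzas.length - 1; i >= 0; i--) {
    if (!isValidPizza(pizzas[i])) pizzas.splice(i, 1);
  }
  return pizzas;
}
```

Here "The Texan" is dropped and the other two pizzas survive. The backward loop also handles two invalid entries in a row, which is the case where a forward loop would miss one.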
If a "pizza" element is not valid (the second one, in the case above), that node is removed from the tree. Consequently, regardless of Junior's carelessness, the function always returns a valid XML document for the application to process and display, provided Junior's file is at least well formed.
"The darn thing's Junior-proof!" yelled Uncle Edd.
"That's the idea."
I sat there for a moment while Uncle Edd looked back over the DTD and my code. He still looked a little puzzled. After a couple of minutes he spoke again, expressing concern over Junior's ability to learn a new syntax. "It was tough enough," he claimed, "teaching him the XML thing."
This gave me a perfect opportunity to tell him about the new XML Schema support in Internet Explorer 5. The XML Schema syntax is a subset of the XML-Data Submission to the World Wide Web Consortium (W3C), and the XML Schema functionality reflects the Document Content Description (DCD) Submission to the W3C. XML Schemas describe XML documents using an XML-based syntax. Junior would therefore be spared from learning another syntax and could simply validate his documents against another XML document: a schema. These schemas, I continued, also allow a more precise description than DTDs, because they will incorporate data typing and inheritance.
To give you an idea of what these schemas look like, here is one I drew up for Uncle Edd to validate his specialty pizza data against:
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="name" content="textOnly"/>
  <ElementType name="topping" content="textOnly"/>
  <ElementType name="toppings" content="eltOnly" model="closed">
    <element type="topping" maxOccurs="*"/>
  </ElementType>
  <ElementType name="description" content="textOnly"/>
  <ElementType name="price" content="textOnly"/>
  <ElementType name="pizza" content="eltOnly" model="closed">
    <element type="name"/>
    <element type="toppings"/>
    <element type="description"/>
    <element type="price"/>
  </ElementType>
  <ElementType name="pizzas" content="eltOnly" model="closed">
    <element type="pizza" maxOccurs="*"/>
  </ElementType>
</Schema>
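The schema expresses the same constraints through occurrence attributes rather than DTD punctuation. Here, maxOccurs="*" plays the role of "+" or "*" in a DTD. The sketch below turns the element references from the schema into regular expressions. It assumes the XML-Data defaults of minOccurs="1" and maxOccurs="1" when an attribute is omitted, and it assumes child elements are checked in sequence. The object elementTypes and the functions occurrence, buildModel, and conforms are hypothetical.

```javascript
// Hypothetical sketch: the <element> references from the schema above.
// ASSUMPTION: omitted minOccurs/maxOccurs default to "1", and children
// are required in the order listed (a sequence).
const elementTypes = {
  pizzas:   [{ type: "pizza", maxOccurs: "*" }],
  toppings: [{ type: "topping", maxOccurs: "*" }],
  pizza:    [{ type: "name" }, { type: "toppings" }, { type: "description" }, { type: "price" }],
};

// Turns one <element> reference into a regular-expression fragment.
function occurrence({ type, minOccurs = "1", maxOccurs = "1" }) {
  const item = "(?:" + type + " )";
  if (minOccurs === "0" && maxOccurs === "*") return item + "*"; // zero or more
  if (minOccurs === "0") return item + "?";                      // optional
  if (maxOccurs === "*") return item + "+";                      // one or more
  return item;                                                   // exactly one
}

// Builds an anchored regular expression for an ElementType's children.
function buildModel(elementName) {
  const parts = elementTypes[elementName].map(occurrence);
  return new RegExp("^" + parts.join("") + "$");
}

// Returns true if the child element names satisfy the ElementType.
function conforms(elementName, childNames) {
  return buildModel(elementName).test(childNames.map(n => n + " ").join(""));
}
```

Under those assumed defaults, an empty "pizzas" element would not conform, whereas the DTD's (pizza)* allows one. Adding minOccurs="0" to the pizza reference would make the two descriptions match.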
"And I can access them using the XML object model," announced Uncle Edd.
"Sure can."
The problem now solved, Uncle Edd asked me if I could do one more thing for him. "Of course," I said, always willing to answer one more question about XML and related technologies.
"Great," said Uncle Edd, and with that he went into the next room. A second or two later, he came back with an armload of pizzas and a list of addresses.
Charles Heinemann is a program manager for Microsoft's Weblications team. Coming from Texas, he knows how to think big.
Dear Charles:
An example in your Happy Days are Here Again... column on XML issued this statement: Set CUSTLIST = Server.CreateObject("Microsoft.XMLDOM"). There is no such object on my machine (Windows 95 / Internet Explorer 4.0 / Personal Web Server 4.0). Where do I get it? I tried searching the Microsoft site, but to no avail.
Mark
Charles answers:
Microsoft.XMLDOM is the progID for the parser shipped with the Internet Explorer 5 Developer Preview Release. The progID for the parser shipped with Internet Explorer 4.0 is "MSXML". The particular sample app Mark mentions works only in Internet Explorer 5.