Handling Documents and Irregular Data

[This is preliminary documentation and subject to change.]

One of the benefits of using XML is the ability to model irregular data hierarchies, including data with these characteristics:

Collections of heterogeneous elements
Structures with many optional elements
Structures where the order is important
Recursive structures
Structures with complex containment requirements

This all sounds complex, but most of these conditions are present in XML that represents documents. The following Pole example exhibits many of these characteristics.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="pole.xsl"?>
<document>
  <title>To the Pole and Back</title>
  <section>
    <title>The First Day</title>
    <p>It was the <emph>best</emph> of days, it was the
      <emph>worst</emph> of days.</p>
    <list>
      <item><emph>best</emph> in that the sun was out.</item>
      <item><emph>worst</emph> in that it was 39 degrees below zero.</item>
    </list>
    <section>
      <title>Lunch Menu</title>
      <list>
        <item>ice cream</item>
        <item>popsicles</item>
      </list>
    </section>
  </section>
  <section>
    <title>The Second Day</title>
    <p>Ditto the first day.</p>
  </section>
</document>

Comparing this sample with the characteristics of irregular data, you can see that it contains heterogeneous collections of elements—a section can contain an arbitrary collection of "title" elements, "p" elements, "list" elements, and so on. Many elements are indeed optional—a section need not contain "p" or "list" elements, or other "section" elements. The order of most elements is important to preserve in the output—the first section comes before the second section. The structure is recursive since a "section" element can contain another "section". The "emph" element is probably allowed anywhere—indicating a complex set of containment requirements. Who would have guessed such a simple document represents such complex data?

XSL's ability to handle such irregular and recursive data makes it useful for transforming documents into a display language such as HTML—hence the name and origins of the Extensible Stylesheet Language.

The mechanism for handling data-driven transformations is similar to subroutines in programming languages. Template fragments (subroutines) can be defined and called. Instead of calling the templates by name, however, XSL chooses the most appropriate fragment based on the type of element the template is designed for.

To manage this data, start by writing an output template for the HTML "wrapper," which inserts the document title into the output in two places, and then asks XSL to find the appropriate template (call the appropriate subroutine) for "section" elements. For example:

<HTML>
  <HEAD>
    <TITLE><xsl:value-of select="document/title"/></TITLE>
  </HEAD>
  <BODY>
    <H1><xsl:value-of select="document/title"/></H1>
    <xsl:apply-templates select="document/section"/>
  </BODY>
</HTML>

The <xsl:apply-templates> element selects the "section" children of the "document" element (only the top-level sections, not those nested inside other sections) and asks XSL to find and apply an appropriate template. The next step is to write a template that is appropriate for "section" elements.

<xsl:template match="section">
  <DIV>
    <xsl:apply-templates />
  </DIV>
</xsl:template>

The XSL processor will output this template fragment for each of the "section" elements selected by the <xsl:apply-templates> element. The value of the match attribute indicates the kinds of nodes for which this template is appropriate. In this case it indicates that this template is appropriate for "section" elements. XSL takes the nodes selected by <xsl:apply-templates> and matches them up with the correct template.

Note that the template for sections itself contains an <xsl:apply-templates> element. Without a "select" attribute all the children will be selected, and the XSL processor will take each one in order (title, p, list, section) and look for an appropriate template. There already is a section template—this one—and the XSL processor will recursively apply it, resulting in a nested structure of DIV elements that mirrors the nested structure of "section" elements in the source document.

Now define some more templates to handle other element types.

<xsl:template match="title">
  <H2><xsl:apply-templates /></H2>
</xsl:template>
<xsl:template match="p">
  <P><xsl:apply-templates /></P>
</xsl:template>
<xsl:template match="list">
  <UL>
    <xsl:for-each select="item">
      <LI><xsl:apply-templates /></LI>
    </xsl:for-each>
  </UL>
</xsl:template>
<xsl:template match="emph">
  <I><xsl:apply-templates /></I>
</xsl:template>

In each case you can include <xsl:apply-templates> to continue selecting the children (whatever they may be) and finding the appropriate template.

The <xsl:apply-templates> element is not limited to selecting element children, but can select other child nodes as well, including text. You can add a template to copy text children to the output.

<xsl:template match="text()"><xsl:value-of /></xsl:template>

This template for handling text nodes is used quite often when processing documents with XSL. In fact, the XSL Working Draft provides for this template to be among the templates built into the XSL language. While these built-in templates are not yet implemented in Microsoft® Internet Explorer 5, they can be added easily to style sheets that need them, as shown above. More on the built-in templates can be found at Simulating Built-in Templates.
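As a rough illustration of how little is involved, the fragment below sketches a catch-all element template next to the text() template above. The match="*" pattern and the placement of these templates ahead of the more specific ones are assumptions for illustration, not taken from this text. How a processor chooses between several matching templates may vary, so check the Simulating Built-in Templates topic before relying on this.

```xml
<!-- Sketch only: catch-all templates, assumed rather than quoted from the XSL Working Draft. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <!-- Assumed fallback: for an element with no specific template, keep processing its children. -->
  <xsl:template match="*"><xsl:apply-templates /></xsl:template>
  <!-- Copy text children to the output (the same template shown above). -->
  <xsl:template match="text()"><xsl:value-of /></xsl:template>
  <!-- More specific templates, such as those for "section" or "p", would follow here. -->
</xsl:stylesheet>
```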

When run against the Pole sample document, the templates above produce the following output:

<HTML>
  <HEAD>
    <TITLE>To the Pole and Back</TITLE>
  </HEAD>
  <BODY>
    <H1>To the Pole and Back</H1>
    <DIV>
      <H2>The First Day</H2>
      <P>It was the <I>best</I> of days, it was the
        <I>worst</I> of days.</P>
      <UL>
        <LI><I>best</I> in that the sun was out.</LI>
        <LI><I>worst</I> in that it was 39 degrees below zero.</LI>
      </UL>
      <DIV>
        <H2>Lunch Menu</H2>
        <UL>
          <LI>ice cream</LI>
          <LI>popsicles</LI>
        </UL>
      </DIV>
    </DIV>
    <DIV>
      <H2>The Second Day</H2>
      <P>Ditto the first day.</P>
    </DIV>
  </BODY>
</HTML>

By recursively processing the source document with <xsl:apply-templates>, this style sheet essentially converts the element types in the source XML to HTML element types. Even though this example is fairly trivial as far as documents go, you can already see some additional structural modifications occurring, notably the creation of the HEAD element and the duplication of the document title in both the H1 element and the TITLE element.

This collection of templates can be packaged into a style sheet file by placing them within an <xsl:stylesheet> element. The XSL namespace must be declared here.

The top-level template (or root template) needs to be marked as such by placing it within an <xsl:template> element and giving it the special match pattern / to indicate that this is the template for the document root. Here is the final complete style sheet.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE><xsl:value-of select="document/title"/></TITLE>
      </HEAD>
      <BODY>
        <H1><xsl:value-of select="document/title"/></H1>
        <xsl:apply-templates select="document/section"/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="section">
    <DIV>
      <xsl:apply-templates />
    </DIV>
  </xsl:template>
  <xsl:template match="title">
    <H2><xsl:apply-templates /></H2>
  </xsl:template>
  <xsl:template match="p">
    <P><xsl:apply-templates /></P>
  </xsl:template>
  <xsl:template match="list">
    <UL>
      <xsl:for-each select="item">
        <LI><xsl:apply-templates /></LI>
      </xsl:for-each>
    </UL>
  </xsl:template>
  <xsl:template match="emph">
    <I><xsl:apply-templates /></I>
  </xsl:template>  
  <xsl:template match="text()"><xsl:value-of /></xsl:template>
  
</xsl:stylesheet>

This example illustrates the data-driven model of XSL processing. For most of a document's structure, you cannot predict which element will come next. Instead, you create isolated templates for the types of nodes you expect to see in the source document, without much concern for how they are arranged. In places where the structure is locally known, you can use <xsl:for-each> and <xsl:value-of> to populate the template. For instance, "list" and "item" elements appear in a regular and predictable structure. The ability to switch smoothly between data-driven and template-driven transformation is an important feature of XSL.
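To make the contrast concrete, the sketch below rewrites the "list" template in the data-driven style. Rather than looping over "item" elements with <xsl:for-each>, it hands them to a separate "item" template. This variant is an illustration only and is not part of the style sheet above; it is intended to produce the same UL and LI structure for the Pole sample.

```xml
<!-- Sketch only: a data-driven alternative to the for-each version of the "list" template. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <!-- Emit the list wrapper, then let the processor find a template for each "item" child. -->
  <xsl:template match="list">
    <UL>
      <xsl:apply-templates select="item" />
    </UL>
  </xsl:template>
  <!-- Each "item" becomes an LI; its children (text, "emph", and so on) are processed in turn. -->
  <xsl:template match="item">
    <LI><xsl:apply-templates /></LI>
  </xsl:template>
</xsl:stylesheet>
```

The for-each form keeps the whole list layout in one place, which suits a structure that is known in advance. The separate "item" template is easier to extend if items later appear in other contexts.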