Writing Well-Formed XML Documents

[This is preliminary documentation and subject to change.]

An XML document is well-formed if it follows a small set of basic syntax rules, described below. The XML format has a simpler set of parsing rules than HTML, allowing an XML parser to read and expose XML data without an external description or special knowledge about the meaning of the XML data.

Start tags and end tags must match

XML elements can contain text and other elements; the exact rules for a specific document type are given in its schema. However, elements must be strictly nested, and each start tag must have a corresponding end tag with the same name.

Elements cannot overlap

The following example is not well-formed XML: the <title> element is closed before the <sub> element that was opened inside it, so the two elements overlap.

<title>Evolution of Culture <sub>in Animals
      </title> by John T. Bonner</sub>

The following syntax corrects the overlap problem:

<title>Evolution of Culture
  <sub>in Animals</sub>
  <author>by John T. Bonner</author>
</title>
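
As a rough illustration of the two nesting rules above, the following Python sketch feeds both versions of the example to the standard library parser. The helper name is_well_formed is an assumption made for this sketch, not part of any XML API; it simply reports whether the parser accepted the text.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# <title> is closed while <sub> is still open, so the elements overlap.
overlapping = (
    "<title>Evolution of Culture <sub>in Animals"
    "</title> by John T. Bonner</sub>"
)

# Every element is closed before its parent is closed.
nested = (
    "<title>Evolution of Culture"
    "<sub>in Animals</sub>"
    "<author>by John T. Bonner</author>"
    "</title>"
)

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```

A parser that enforces well-formedness rejects the overlapping version outright instead of guessing at the intended structure.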

XML tags are case-sensitive

Each of the following tags names a different element.

<City> <CITY> <city>
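
The sketch below shows one way to observe case sensitivity with Python's standard library parser. The <places> wrapper element and the text "Seattle" are assumptions added only so the fragments form complete documents.

```python
import xml.etree.ElementTree as ET

# The three children differ only in case, yet the parser reports
# three distinct element names.
root = ET.fromstring("<places><City/><CITY/><city/></places>")
tags = [child.tag for child in root]
print(tags)  # ['City', 'CITY', 'city']

# Because names are case-sensitive, </city> does not close <City>.
try:
    ET.fromstring("<City>Seattle</city>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False

print(mismatch_accepted)  # False
```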

Denote empty elements

XML has a shorthand for an empty element: a single tag ending with /> signals that the element has no content. For example, the following two lines are equivalent:

<title/>
<title></title>
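
As a small check of this equivalence, the following Python sketch parses both forms with the standard library and compares the results. It is only an illustration; other parsers may expose empty elements through a different API.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<title/>")
long_form = ET.fromstring("<title></title>")


def summary(element):
    # Name, text content, and number of child elements.
    return (element.tag, element.text, len(element))


# Both forms produce an element named "title" with no text and no children.
print(summary(short_form) == summary(long_form))  # True

# Serializing either element yields the same bytes.
print(ET.tostring(short_form) == ET.tostring(long_form))  # True
```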

Reserved characters

Several characters are part of the syntactic structure of XML and will not be interpreted as themselves if simply placed within an XML data source. You need to substitute a special character sequence (a predefined "entity reference" in XML terminology). Entity names are case-sensitive, so they must be written exactly as shown.

<   &lt;
&   &amp;
>   &gt;
"   &quot;
'   &apos;

For example, "Melons cost < $1 at the A&P" would be encoded as "Melons cost &lt; $1 at the A&amp;P".
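
In practice the substitution is usually done by a library rather than by hand. The following is a minimal sketch using the escape and unescape functions from Python's xml.sax.saxutils module. By default escape replaces only &, < and >; the extra mapping for the quote characters and the sample string containing quotes are choices made for this sketch.

```python
from xml.sax.saxutils import escape, unescape

text = "Melons cost < $1 at the A&P"

# escape() replaces &, < and > with their entity references.
encoded = escape(text)
print(encoded)  # Melons cost &lt; $1 at the A&amp;P

# Decoding restores the original text.
print(unescape(encoded) == text)  # True

# Quote characters are only replaced when a mapping is supplied.
quote_entities = {'"': "&quot;", "'": "&apos;"}
print(escape('say "5 > 3"', quote_entities))  # say &quot;5 &gt; 3&quot;
```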

Each XML document must have a unique root element

A well-formed document has exactly one root element, and every other element is nested inside it. For instance, in the weather report example from Introduction to XML, the element <weather-report> denotes the unique root element of the XML document.
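
The single-root rule can be observed with Python's standard library parser, as in the sketch below. The <date> and <area> child elements are placeholders invented for this sketch and are not taken from the weather report example.

```python
import xml.etree.ElementTree as ET

# One top-level element containing everything else: accepted.
single_root = "<weather-report><date/><area/></weather-report>"
root = ET.fromstring(single_root)
print(root.tag)  # weather-report

# Two top-level elements: the parser stops after the first one closes.
two_roots = "<weather-report/><weather-report/>"
try:
    ET.fromstring(two_roots)
    second_root_accepted = True
except ET.ParseError:
    second_root_accepted = False

print(second_root_accepted)  # False
```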