Microsoft XML 2.5 SDK


 

Lesson 6: Authoring XML Schemas

[This is preliminary documentation and subject to change.]

What is an XML Schema?

An XML Schema is an XML-based syntax, or schema, for defining how an XML document is marked up. XML Schema is the schema specification recommended by Microsoft, and it has many advantages over the Document Type Definition (DTD), the original schema specification for defining an XML model. DTDs have several drawbacks, including non-XML syntax, no support for data typing, and no extensibility. For example, a DTD cannot define element content as anything other than another element or a string. To find out more about DTDs, see the W3C XML Recommendation. XML Schema improves on DTDs by using XML syntax and by supporting data typing and namespaces. For example, an XML Schema allows you to type an element as an integer, a float, a Boolean, a URL, and so on.
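
As a rough illustration of the data typing described above, the following sketch types four elements with the dt:type attribute. The element names credits, enrolled, and homepage are hypothetical and do not appear elsewhere in this lesson; only the dt:type values matter here. A DTD could describe each of these elements only as character data.

```xml
<!-- Hypothetical element names, used only to illustrate dt:type values. -->
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
  xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- An integer -->
  <ElementType name="credits" content="textOnly" dt:type="int"/>
  <!-- A floating-point number -->
  <ElementType name="GPA" content="textOnly" dt:type="float"/>
  <!-- A Boolean -->
  <ElementType name="enrolled" content="textOnly" dt:type="boolean"/>
  <!-- A URL, expressed with the uri data type -->
  <ElementType name="homepage" content="textOnly" dt:type="uri"/>
</Schema>
```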

The XML parser in Internet Explorer 5 can validate an XML document against either a DTD or an XML Schema.

How can I create an XML Schema?

The following XML document references a schema that declares each of its nodes.

  <class xmlns="x-schema:classSchema.xml">
    <student studentID="13429">
      <name>James Smith</name>
      <GPA>3.8</GPA>
    </student>
  </class>

You'll notice in the above document that the default namespace is "x-schema:classSchema.xml". The x-schema: prefix tells the parser to validate the entire document against the schema located at the URL that follows it (classSchema.xml).
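
The same namespace can presumably be bound to a prefix instead of being made the default. The sketch below is an assumption that goes beyond this lesson: the prefix c is hypothetical, and it assumes the parser validates the elements that carry that prefix against classSchema.xml.

```xml
<!-- The prefix "c" is hypothetical; it binds the x-schema namespace explicitly. -->
<c:class xmlns:c="x-schema:classSchema.xml">
  <c:student studentID="13429">
    <c:name>James Smith</c:name>
    <c:GPA>3.8</c:GPA>
  </c:student>
</c:class>
```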

Below is the entire schema for the above document. The schema begins with the Schema element containing the declaration of the schema namespace and, in this case, the declaration of the datatypes namespace as well. The first, xmlns="urn:schemas-microsoft-com:xml-data", indicates that this XML document is an XML Schema. The second, xmlns:dt="urn:schemas-microsoft-com:datatypes", allows you to type element and attribute content by using the dt prefix on the type attribute within their ElementType and AttributeType declarations.

<Schema xmlns="urn:schemas-microsoft-com:xml-data" 
  xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name='studentID' dt:type='string' required='yes'/>
  <ElementType name='name' content='textOnly'/>
  <ElementType name='GPA' content='textOnly' dt:type='float'/>
  <ElementType name='student' content='mixed'>
    <attribute type='studentID'/>
    <element type='name'/>
    <element type='GPA'/>
  </ElementType>
  <ElementType name='class' content='eltOnly'>
    <element type='student'/>
  </ElementType>  
</Schema>

The declaration elements that you use to define elements and attributes are described as follows:

ElementType: Assigns a type and conditions to an element, and specifies which child elements, if any, it can contain.

AttributeType: Assigns a type and conditions to an attribute.

attribute: Declares that a previously defined attribute type can appear within the scope of the named ElementType element.

element: Declares that a previously defined element type can appear within the scope of the named ElementType element.
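
The four declaration elements above can be seen together in a small sketch. The names book, isbn, and title are hypothetical and were chosen only for illustration; the comments mark which declaration does what.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
  xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- AttributeType: defines an attribute named isbn -->
  <AttributeType name="isbn" dt:type="string" required="yes"/>
  <!-- ElementType: defines a text-only element named title -->
  <ElementType name="title" content="textOnly"/>
  <!-- ElementType: defines book, which may contain only elements -->
  <ElementType name="book" content="eltOnly">
    <!-- attribute: allows the isbn attribute on book -->
    <attribute type="isbn"/>
    <!-- element: allows a title child inside book -->
    <element type="title"/>
  </ElementType>
</Schema>
```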

The content of the schema begins with the AttributeType and ElementType declarations of the innermost elements.

<AttributeType name='studentID' dt:type='string' required='yes'/>
<ElementType name='name' content='textOnly'/>
<ElementType name='GPA' content='textOnly' dt:type='float'/>

The next ElementType declaration contains the declarations of its attribute and child elements. When an element has attributes or child elements, they must be included this way in its ElementType declaration. Each must also have been declared previously in its own ElementType or AttributeType declaration.

<ElementType name='student' content='mixed'>
  <attribute type='studentID'/>
  <element type='name'/>
  <element type='GPA'/>
</ElementType>

This process is continued throughout the rest of the schema, until every element and attribute has been declared.
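
Continuing in the same way, the outermost class element is declared last. The following is a minimal sketch, assuming you want a class to hold more than one student. The minOccurs and maxOccurs attributes are not used elsewhere in this lesson; they are shown here only as one way to state how often a child element may occur.

```xml
<!-- Fragment: belongs inside the Schema element, after the student ElementType. -->
<ElementType name='class' content='eltOnly'>
  <!-- At least one student is required; '*' places no upper limit. -->
  <element type='student' minOccurs='1' maxOccurs='*'/>
</ElementType>
```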

Unlike DTDs, XML Schemas support an open content model, which lets you do such things as type elements and apply default values without necessarily restricting content.

In the following schema, the GPA element is typed and has an attribute with a default value, but no other nodes are declared within the student element.

<Schema xmlns="urn:schemas-microsoft-com:xml-data" 
  xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="scale" default="4.0"/>
  <ElementType name="GPA" content="textOnly" dt:type="float">
    <attribute type="scale"/>
  </ElementType>
  <AttributeType name="studentID"/>
  <ElementType name="student" content="eltOnly" model="open" order="many">
    <attribute type="studentID"/>
    <element type="GPA"/>
  </ElementType>  
</Schema>

The above schema allows you to validate only the area with which you are concerned. This gives you more control over the level of validation for your document and allows you to use some of the features provided by the schema without having to employ strict validation.
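
A document along the following lines would be expected to validate against the open schema above. This is a sketch under stated assumptions: the file name studentSchema.xml is hypothetical, and the undeclared major element is placed in a separate, hypothetical namespace (urn:example:extras) to keep it clearly outside the schema's own declarations.

```xml
<!-- studentSchema.xml and the ext namespace are hypothetical names. -->
<student xmlns="x-schema:studentSchema.xml"
  xmlns:ext="urn:example:extras"
  studentID="13429">
  <!-- Declared and typed: must be a float; scale defaults to 4.0 -->
  <GPA>3.8</GPA>
  <!-- Not declared in the schema; permitted because model="open" -->
  <ext:major>Physics</ext:major>
</student>
```

Only the GPA element and the two declared attributes are checked; the extra element passes through without a declaration.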