XML Glossary

[This is preliminary documentation and subject to change.]

attribute

XML structural construct. A name-value pair, separated by an equals sign, that appears inside an element's start tag and modifies certain features of the element. All attribute values, including things like size and width, are in fact text strings and not numbers. In XML, every value must be enclosed in quotation marks, either single (') or double (").

You can declare attributes for an XML element type using an attribute list declaration.
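The following sketch uses Python's standard-library xml.dom.minidom parser as a stand-in (an assumption; the glossary does not name a parser). It shows that an attribute value comes back as a string even when it looks numeric, that single quotes work as well as double quotes, and that an unquoted value is rejected. The image element and its attributes are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical element: width looks numeric, but XML stores it as text.
image = minidom.parseString('<image src="logo.bmp" width="120"/>').documentElement

width_text = image.getAttribute("width")  # a str, not an int
width = int(width_text)                   # converting is the application's job

# Single quotes are as valid as double quotes.
single_quoted = minidom.parseString("<image width='120'/>").documentElement.getAttribute("width")

# An unquoted value is a well-formedness error.
try:
    minidom.parseString("<image width=120/>")
    unquoted_accepted = True
except ExpatError:
    unquoted_accepted = False
```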

Cascading Style Sheets (CSS)

Formatting descriptions that provide augmented control over presentation and layout of HTML and XML elements. CSS can be used to describe the formatting of simply structured XML documents, but it cannot produce a display structure that differs from the structure of the source data. See also Extensible Stylesheet Language.

CDF

See Channel Definition Format.

Channel Definition Format (CDF)

An XML-based data format used in Microsoft® Internet Explorer 4.0 and later to describe Active Channel™ content and desktop components.

CDF permits a Web publisher to offer frequently updated collections of information, or channels, for automatic delivery to compatible Web clients. The user only needs to choose the channel once; after that, the channel information is delivered to the client on a schedule without further intervention.

character data

All the text content of an element or attribute that is not markup. XML differentiates this plain text from binary data. In the XML OM, character data is stored in text nodes, which are implemented as DOMText objects.
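A minimal sketch, assuming Python's xml.dom.minidom as a stand-in for the XML OM: an element's character data is exposed as Text child nodes, and a predefined entity reference such as &amp; has already been turned into the character it stands for. The note element is hypothetical.

```python
from xml.dom import minidom
from xml.dom import Node

# "&amp;" is markup (a predefined entity reference); the parser turns it into "&".
note = minidom.parseString("<note>Call &amp; confirm</note>").documentElement

# The element's character data lives in its Text child nodes.
text_nodes = [n for n in note.childNodes if n.nodeType == Node.TEXT_NODE]
content = "".join(n.data for n in text_nodes)
```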

CIP

See Commerce Interchange Pipeline.

Commerce Interchange Pipeline (CIP)

An infrastructure used by Microsoft® Site Server Commerce Edition to exchange data between applications in XML over HTTP. It is an underlying mechanism used for tying Web payment technologies to applications.

CSS

See Cascading Style Sheets.

data island

An XML document, enclosed in an <XML> or <SCRIPT language="XML"> element, that exists within an HTML page. It allows you to script against the XML document without having to load it through script or through the <OBJECT> tag. Almost anything that can be in a well-formed XML document can be inside a data island.

HTML is used as the primary document or display format, and XML is used to embed data within the document.

Data Source Object

Provides a way to bind HTML controls directly to an XML data island. It assists developers in connecting to structured XML data and supplying it to an HTML page by using the data-binding facility of dynamic HTML.

The XML Data Source Object lets you work with data one node at a time, or with multiple nodes at once without having to walk the document tree. It binds the data to specific controls on the page, and those controls are automatically populated with data from the Data Source Object.

document element

The element in an XML document that contains all other elements. It is the top-level element of an XML document and must be the first element in the document. There is exactly one document element, no part of which appears in the content of any other element. The document element represents the document as a whole; every other element represents a component of the document.

The terms root element and document element are interchangeable.
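A short sketch, again assuming Python's xml.dom.minidom as the parser: the document element is available directly from the parsed document, and a second top-level element makes the parse fail. The catalog and book names are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

doc = minidom.parseString("<catalog><book/><book/></catalog>")
root = doc.documentElement  # the single top-level element
root_name = root.tagName

# A second top-level element makes the document not well formed.
try:
    minidom.parseString("<catalog/><catalog/>")
    two_roots_accepted = True
except ExpatError:
    two_roots_accepted = False
```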

document entity

The starting point for an XML parser. Unlike other entities, the document entity has no name and cannot be referenced. It is the entity in which the XML declaration and document type declaration can occur.

Document Object Model (DOM)

A platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model provides a standard set of objects for representing HTML and XML documents, a standard model of how these objects can be combined, and a standard interface for accessing and manipulating them. Vendors can support the DOM as an interface to their proprietary data structures and APIs, and content authors can write to the standard DOM interfaces rather than product-specific APIs, thus increasing interoperability on the Web.
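Python's xml.dom.minidom is a minimal implementation of the DOM interfaces, so it can illustrate the read-and-update pattern described above through standard DOM methods such as getElementsByTagName, setAttribute, createElement, and appendChild. This is a sketch with hypothetical element names, not the Internet Explorer object model.

```python
from xml.dom import minidom

doc = minidom.parseString("<order><item sku='A1'>Pen</item></order>")
order = doc.documentElement

# Read: standard DOM navigation.
first_item = order.getElementsByTagName("item")[0]
sku = first_item.getAttribute("sku")

# Update: change an attribute and append a new element with text content.
first_item.setAttribute("qty", "2")
new_item = doc.createElement("item")
new_item.setAttribute("sku", "B7")
new_item.appendChild(doc.createTextNode("Notebook"))
order.appendChild(new_item)

# Elements are returned in document order.
skus = [el.getAttribute("sku") for el in order.getElementsByTagName("item")]
```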

document type declaration

XML structural construct. Consists of markup that identifies the grammar rules, or Document Type Definition (DTD), for the particular class of document. The document type declaration can also point to an external file that contains all or part of the DTD. It must appear after the XML declaration and before the document element. Its basic syntax is <!DOCTYPE name ...>, where name matches the name of the document element and the remainder gives an external identifier, an internal subset enclosed in square brackets, or both.
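A sketch with Python's xml.dom.minidom (an assumption; this parser reads the document type declaration but does not validate the document against the DTD). It shows the declaration sitting between the XML declaration and the document element, with a name that matches the document element. The note vocabulary is hypothetical.

```python
from xml.dom import minidom

# Hypothetical document with an internal DTD subset.
source = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>Hello</note>"""

doc = minidom.parseString(source)

# The name in <!DOCTYPE note ...> matches the document element's name.
doctype_name = doc.doctype.name
root_name = doc.documentElement.tagName
```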

Document Type Definition (DTD)

Can accompany a document, essentially defining the rules of the document, such as which elements are present and the structural relationship between the elements. It defines what tags can go in your document, what tags can contain other tags, the number and sequence of the tags, the attributes your tags can have, and optionally, the values those attributes can have.

DTDs help to validate the data when the receiving application does not have a built-in description of the incoming data. The DTD is declared within, or referenced from, the document type declaration of the XML file. With XML, however, DTDs are optional.

See also schema.

DOM

See Document Object Model.

DTD

See Document Type Definition.

EDI

See Electronic Data Interchange.

Electronic Data Interchange (EDI)

An existing format used to exchange data and support transactions. EDI transactions can be conducted only between sites that have been specifically set up with compatible systems. Proprietary EDI formats are more difficult to write than XML, and cannot be transmitted over HTTP like XML can.

element

XML structural construct. An XML element consists of a start tag, an end tag, and the information between the tags, which is often referred to as the contents. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications. Each attribute specification has a name and a value. An element instance is written as a start tag such as <price>, its contents, and a matching end tag such as </price>; an element with no contents can instead use a single empty-element tag such as <br/>.

Elements used in an XML file are described by a DTD or schema, either of which can provide a description of the structure of the data.

entity

XML structural construct. A file, database record, or another item that contains data. The primary purpose of an entity is to hold content—not structure, rules, or grammar. Each entity is identified by a unique name and contains its own content, from a single character inside the document to a large file that exists outside the document. The function of an XML entity is similar to that of a macro definition.

The entity can be referred to by an entity reference to insert the entity's contents into the tree at that point. Entity declarations occur in the DTD.

entity reference

XML structural construct. Acts as a placeholder for the content author; the XML parser places the actual content at each reference site. To include an entity reference, type an ampersand (&), then the entity name, then a semicolon (;), as follows: &YourEntityName;. When the document is parsed, the reference is replaced with the entity's content.

It is used in much the same way as a macro.
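Python's xml.etree.ElementTree parser expands internal entities declared in the DTD internal subset, which makes it a convenient way to watch the substitution described above. This is a sketch under that assumption; the entity name company and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal entity declared in the DTD internal subset.
source = """<!DOCTYPE letter [
  <!ENTITY company "Contoso">
]>
<letter>Thank you for choosing &company;.</letter>"""

letter = ET.fromstring(source)
expanded = letter.text  # the reference has been replaced by the entity's content

# A reference to an entity that was never declared is an error.
try:
    ET.fromstring("<letter>&missing;</letter>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
```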

Extensible Linking Language (XLL)

An XML vocabulary that provides links in XML similar to those in HTML but with more functionality. In addition to offering URL-based hyperlinks and anchors, XLL also supports linking to an arbitrary position in a document and multidirectional links. These features make XLL suitable for many new uses, as well as many traditional uses that are problematic in pure HTML, such as cross-references, footnotes, endnotes, and interlinked data. Because XLL is designed for XML documents, links can exist at the object level rather than only at the page level.

Extensible Markup Language (XML)

A subset of SGML that is optimized for delivery over the Web, XML provides a uniform method for describing and exchanging structured data that is independent of applications or vendors.

The key is that with XML, the information is in the document, while the rendering instructions are elsewhere. In other words, content and presentation are separate. XML is the Web's language for data interchange and HTML is the Web's language for rendering.

At the time of this writing, XML 1.0 is a World Wide Web Consortium Recommendation, which means that it has reached the final stage of the approval process.

Extensible Stylesheet Language (XSL)

A language used to transform XML-based data into HTML or other presentation formats for display in a Web browser. The transformation of XML into formats such as HTML is done declaratively, which often makes it easier and more accessible than scripting. In addition, XSL uses XML as its syntax, so XML authors don't have to learn another markup language.

In contrast to CSS, which "decorates" the XML tree with formatting properties, XSL transforms the XML tree into a new tree (the HTML), allowing extensive reordering, generated text, and calculations—all without modification to the XML source. The source can be maintained from the perspective of "pure content" and can simultaneously be delivered to different channels or target audiences by just switching style sheets.

XSL consists of two parts: a vocabulary for transformation and the XSL Formatting Objects.

invalid document

A document that doesn't follow the rules defined in its DTD or schema. A document that doesn't follow the XML syntax rules (that is, one that is not well formed) is also not valid.

mixed content

Element types with mixed content are allowed to hold either character data alone or character data interspersed with child elements. In this case, the types of the child elements can be constrained, but not their order or their number of occurrences.
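A sketch using Python's xml.etree.ElementTree (an assumption; it reads the mixed-content declaration but does not enforce it). ElementTree stores the character data that precedes the first child in the parent's text and the character data following each child in that child's tail, so mixed content can be reassembled in document order. The p, b, and i declarations are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content declaration: text interleaved with <b> and <i>
# in any order and any number.
source = """<!DOCTYPE p [
  <!ELEMENT p (#PCDATA | b | i)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<p>Order <b>now</b> and save <i>today</i>!</p>"""

p = ET.fromstring(source)

# Rebuild the text: parent text, then each child's text and tail.
pieces = [p.text]
for child in p:
    pieces.append(child.text)
    pieces.append(child.tail)
flattened = "".join(pieces)

child_tags = [child.tag for child in p]
```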

namespace

A mechanism that allows developers to uniquely qualify element names and relationships and to make these names recognizable. By doing so, they can avoid name collisions between elements that have the same name but are defined in different vocabularies. Namespaces allow tags from multiple vocabularies to be mixed, which is essential if data is coming from multiple sources. They ensure that element names do not conflict, and clarify who defined which term.

A namespace identifies an XML vocabulary by a URI, such as a URN. An xmlns:prefix attribute on an element associates a short prefix with the URI that defines the namespace; that prefix is then attached to element and attribute names to indicate which namespace they belong to. Namespace declarations have scope: a declaration applies to the element that carries it and to all of that element's descendants, unless a descendant redeclares it. An xmlns attribute without a prefix declares a default namespace, which lets unqualified element names (but not unqualified attribute names) belong to that namespace. See also RDF namespace.
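A sketch with Python's xml.etree.ElementTree, which reports each qualified name as {namespace-URI}local-name. The two URNs and the order vocabulary are hypothetical; namespace URIs only need to be unique, not reachable.

```python
import xml.etree.ElementTree as ET

source = """<order xmlns="urn:example:orders"
       xmlns:inv="urn:example:inventory">
  <item inv:sku="A1" note="gift">Pen</item>
</order>"""

order = ET.fromstring(source)

# <item> has no prefix, so it inherits the default namespace from <order>.
item = order.find("{urn:example:orders}item")

# The prefixed attribute belongs to the inventory namespace.
sku = item.get("{urn:example:inventory}sku")

# An unprefixed attribute is NOT in the default namespace.
note = item.get("note")

# A prefix map shortens lookups; "o" is a local choice, not taken from the document.
same_item = order.find("o:item", {"o": "urn:example:orders"})
```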

notation

Tells the parser what type of object is being referenced. Usually refers to a data format of non-XML data, such as BMP. A notation identifies by name the format of unparsed entities, the format of elements that bear a notation attribute, or the application to which a processing instruction is addressed.

notation declaration

Tells the parser how to deal with a specific binary file type, and provides a name and an external identifier for a notation.

The notation declaration gives an internal name to an existing notation so that it can be referred to in attribute list declarations, unparsed entity declarations, and processing instructions.

The external identifier for the notation can allow an XML parser or its client application to locate a helper application capable of processing data in the given notation.

parsed entity

An entity whose content is parsed as part of the document. Each reference to a parsed entity is replaced with its content, which is called the replacement text. Parsed entities can contain only character data and XML markup.

processing instruction

XML structural construct. A mechanism for embedding information in a file intended for proprietary applications rather than the XML parser or browser. The XML parser passes the instructions to the application.

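A processing instruction is a string of text that can be included almost anywhere in an XML document, outside other markup, between <? and ?> marks. It is not part of the document's character data. It begins with a name, called the target, that identifies the application for which the PI is intended, followed by the data for the instruction.

The XML declaration that can begin an XML file uses the same <? ?> form, although the XML specification treats it as a separate construct: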

<?xml version="1.0" standalone="yes" ?>
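A sketch using Python's xml.sax module (an assumption; the glossary names no API). The parser hands each processing instruction to the application as a (target, data) pair; the XML declaration is not reported this way, because it is not a processing instruction. The xml-stylesheet target is a common convention for attaching a style sheet; the report.xsl file name is hypothetical.

```python
import xml.sax


class PICollector(xml.sax.ContentHandler):
    """Records (target, data) for every processing instruction the parser reports."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


source = b"""<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="report.xsl"?>
<report/>"""

handler = PICollector()
xml.sax.parseString(source, handler)
```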

reference node

The reference node for a search context is the node that is the immediate parent of all nodes in the search context. Every search context has an associated reference node.

replacement text

The content of parsed entities, after replacement of character references and parameter-entity references.

SAX

See Simple API for XML.

schema

A formal specification of element names that indicates which elements are allowed in an XML document, and in what combinations. It also defines the structure of the document: which elements are child elements of others, the sequence in which the child elements can appear, and the number of child elements. It defines whether an element is empty or can include text. The schema can also define default values for attributes.

A schema is functionally equivalent to a DTD, but is written in XML. A schema also provides for extended functionality such as data typing, inheritance, and presentation rules. Consequently, the new schema languages are far more powerful than DTDs.

SGML

See Standard Generalized Markup Language.

Simple API for XML (SAX)

An XML API that allows developers to take advantage of event-driven XML parsing. Unlike the DOM specification, SAX doesn't require the entire XML file to be loaded into memory. SAX notifies you when certain events happen as it parses your document. When you respond to an event, any data you don't specifically store is discarded. If your document is very large, using SAX will save significant amounts of memory when compared to using DOM. This is especially true if you only need a few elements in a large document.
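A sketch of the event-driven style using Python's xml.sax module (an assumption; not the SAX implementation the glossary refers to). The handler keeps only the text of title elements and lets everything else go as the parser streams past it. The catalog vocabulary is hypothetical.

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Keeps only <title> text; all other data is discarded as it streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text, so buffer it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


source = b"""<catalog>
  <book><title>XML Basics</title><price>10</price></book>
  <book><title>Streaming Data</title><price>12</price></book>
</catalog>"""

handler = TitleCollector()
# parse() reads from a stream, so the whole document need not be held as a tree.
xml.sax.parse(io.BytesIO(source), handler)
```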

Simple Object Access Protocol (SOAP)

Provides an open, extensible way for applications to communicate using XML-based messages over the Web, regardless of what operating system, object model, or language they use. SOAP provides a way to use the existing Internet infrastructure to enable applications to communicate directly with each other without being unintentionally blocked by firewalls.

SOAP

See Simple Object Access Protocol.

Standard Generalized Markup Language (SGML)

The international standard for defining descriptions of structure and content of electronic documents. Despite its name, SGML is not a language in itself, but a way of defining languages that are developed along its general principles. SGML defines the way that a markup language is built by specifying the syntax and definitions for the elements and attributes that compose it.

XML is a subset of SGML designed to deliver SGML-type information over the Web, while HTML is an application of SGML.

template

The basis of an XSL style sheet is the template rule, which tells a user agent how to construct a styled result node from a source node. A template has two parts:

The matching part identifies the source (XML) node to which the processing action is to be applied. The matching information is contained in the match attribute.

The processing part defines how the children are to be processed and what styling is to be applied to them. The processing information is contained in the template's child elements.

tokenized attribute type

In a tokenized type, the parser collapses each run of whitespace to a single space character and removes leading and trailing whitespace altogether. A validating parser also checks the value against the declared type.

Seven attribute types are characterized as tokenized types because each value represents either a single token (ID, IDREF, ENTITY, NMTOKEN) or a list of tokens (IDREFS, ENTITIES, and NMTOKENS).
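The whitespace rule above can be sketched as a small Python function. This is a simplified model, not a parser's code: it works on literal attribute text, ignores character references and entity expansion, and performs no type validation such as checking that ID values are unique. The function names are hypothetical.

```python
import re
from typing import List

# XML's whitespace characters are exactly space, tab, carriage return, line feed.
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def normalize_tokenized(value: str) -> str:
    """Collapse runs of XML whitespace to one space and trim both ends,
    as for ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, and NMTOKENS."""
    return _XML_WHITESPACE.sub(" ", value).strip(" ")


def split_tokens(value: str) -> List[str]:
    """Split a list type (IDREFS, ENTITIES, NMTOKENS) into its tokens."""
    normalized = normalize_tokenized(value)
    return normalized.split(" ") if normalized else []
```

For example, an IDREFS value written as " a1\t b2\n\n c3 " normalizes to "a1 b2 c3" and yields three tokens.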

Uniform Resource Identifier (URI)

A superclass that includes both URNs and URLs. Presently, URI means URL in nearly all cases when discussing XML, although URNs are expected to become more common in the future. A URI supplies a unique number or name that can identify an element or attribute unambiguously, anywhere.

URIs are a slightly more general scheme for identifying resources on the Internet, one that focuses more on the resource and less on its location. In theory, a URI could find the closest copy of a mirrored document or locate a document moved from one site to another.

Uniform Resource Locator (URL)

The set of URI schemes that have explicit instructions on how to access the resource on the Internet.

URLs are uniform in that they have the same basic syntax no matter what specific type of resource (Web page, newsgroup) is being addressed or what mechanism is described to fetch it.

Uniform Resource Name (URN)

Identifies a persistent Internet resource. A URN can provide a mechanism for locating and retrieving a schema file that defines a particular namespace. While an ordinary URL could provide similar functionality, a URN is more robust and easier to manage for this purpose because a URN can refer to more than one URL.

Unlike URLs, URNs are not location-dependent.

unparsed entity

Any block of non-XML data, sometimes referred to as a binary entity because its content is often a binary file (such as an image) that is not directly interpreted by the XML parser. An unparsed entity could contain plain text, so the term binary is a bit misleading.

Unlike a parsed entity, an unparsed entity requires a notation, which identifies the format or type of the resource. Beyond a requirement that an XML parser make the identifiers for the entity and notation available to the application, XML places no constraints on the contents of unparsed entities.
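A sketch using Python's xml.sax DTDHandler interface (an assumption; the glossary names no API). It shows the parser passing the notation and unparsed-entity identifiers to the application without ever reading the entity's data. The bmp notation, the logo entity, and the logo.bmp file name are hypothetical; the file does not need to exist.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler


class DTDRecorder(DTDHandler):
    """Records the notation and unparsed-entity declarations the parser reports."""

    def __init__(self):
        self.notations = {}
        self.unparsed_entities = {}

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = systemId

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        # ndata is the notation name; the entity's bytes are never read.
        self.unparsed_entities[name] = (systemId, ndata)


source = b"""<!DOCTYPE page [
  <!NOTATION bmp SYSTEM "image/bmp">
  <!ENTITY logo SYSTEM "logo.bmp" NDATA bmp>
  <!ATTLIST page image ENTITY #IMPLIED>
]>
<page image="logo"/>"""

recorder = DTDRecorder()
parser = xml.sax.make_parser()
parser.setDTDHandler(recorder)
parser.setContentHandler(ContentHandler())
parser.parse(io.BytesIO(source))
```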

URI

See Uniform Resource Identifier.

URL

See Uniform Resource Locator.

URN

See Uniform Resource Name.

valid XML

XML that conforms to the rules defined in the XML specification, as well as the rules defined in the DTD or schema.

A validating parser must understand the validity constraints of the XML specification and check the document for possible violations. If it finds any errors, it must report them to the XML application. It must also read the DTD, validate the document against it, and again report any violations to the XML application.

Because all of this parsing and checking can take time and because validation might not always be necessary, XML supports the notion of the well-formed document.

vocabulary

See XML vocabulary.

W3C

See World Wide Web Consortium.

well-formed XML

XML that follows the XML syntax rules listed in the W3C Recommendation for XML 1.0. A well-formed document need not have a DTD or schema, and well-formedness alone does not involve checking the document against one. A well-formed XML document contains one or more elements; it has a single document element, with any other elements properly nested under it; and each of the parsed entities referenced directly or indirectly within the document is well formed.

Well-formed XML documents are easy to create because they don't require the additional work of creating a DTD. Well-formed XML can save download time because the client does not need to download the DTD, and it can save processing time because the XML parser doesn't need to process the DTD.
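A minimal well-formedness check, assuming Python's xml.etree.ElementTree (built on the non-validating expat parser) as the checker. It tests only the syntax rules described above and does no DTD or schema validation; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML; no DTD validation is done."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


checks = {
    "<a><b/></a>": True,       # one document element, properly nested
    "<a><b></a></b>": False,   # improperly nested tags
    "<a/><a/>": False,         # two document elements
    "<a x=1/>": False,         # unquoted attribute value
}
results = {doc: is_well_formed(doc) for doc in checks}
```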

World Wide Web Consortium (W3C)

A standards body, physically located at MIT and virtually on the Web, that sets standards for XML, HTML, XSL, and many other Web technologies.

XLL

See Extensible Linking Language.

XML

See Extensible Markup Language.

XML-Data

A language used to create a schema, which identifies the structure and constraints of a particular XML document. XML-Data carries out the same basic tasks as a DTD, but with more power and flexibility. Unlike a DTD, which requires its own language and syntax, XML-Data uses XML syntax.

XML declaration

An XML file can optionally begin with an XML declaration. If present, it must be the very first thing in the file, before any whitespace, and it uses the same <? ?> form as a processing instruction with the name xml. The XML declaration can contain pseudo-attributes to indicate the XML language version, the character encoding, and whether the document can be used as a standalone entity.

For example:

<?xml version="1.0" standalone="yes" ?>
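A sketch with Python's xml.etree.ElementTree (an assumption; any conforming parser should behave the same way): a declaration at the very start is accepted, the declaration may be omitted, and a declaration preceded by even one space is rejected. The greeting element is hypothetical.

```python
import xml.etree.ElementTree as ET

ok = ET.fromstring('<?xml version="1.0" standalone="yes"?><greeting>Hi</greeting>')

# The declaration is optional.
no_declaration = ET.fromstring("<greeting>Hi</greeting>")

# The declaration must come first; a leading space makes the document malformed.
try:
    ET.fromstring(' <?xml version="1.0"?><greeting/>')
    misplaced_accepted = True
except ET.ParseError:
    misplaced_accepted = False
```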

XML document

A document object that is well formed, according to the XML recommendation, and that might (or might not) be valid. The XML document has a logical structure (composed of declarations, elements, comments, character references, and processing instructions) and a physical structure (composed of entities, starting with the root, or document entity).

XML engine

Software that supports XML functionality on the client; Internet Explorer 4.0 and Internet Explorer 5 include XML engines. Its components include the XML parser, the XSL processor, and schema support.

XML Object Model

An API that defines a standard way in which developers can interact with the elements of the XML structured tree. The XML object model exposes properties, methods, and the actual content (data) contained in an object. It controls how users communicate with trees, and exposes all tree elements as objects, which can be accessed without any return trips to the server. The XML OM uses the W3C standard Document Object Model.

XML parser

A software module used to read XML documents and provide access to their content and structure. The XML parser generates a hierarchically structured tree, then hands off data to viewers and other applications for processing, and finally returns the results to the browser. Every XML parser reports well-formedness errors; a validating XML parser also checks the document against its DTD or schema and reports any violations.

XPath

The result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations (XSLT) and XPointer. The primary purpose of XPath is to address parts of an XML document. It also provides basic facilities for manipulation of strings, numbers and booleans. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath gets its name from its use of a path notation as used in URLs for navigating through the hierarchical structure of an XML document.
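Python's xml.etree.ElementTree supports only a limited subset of XPath, but that subset is enough to illustrate path-style addressing: slash-separated steps, attribute predicates, positional predicates, and descendant search. This is a sketch, not a full XPath engine; the catalog document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document.
catalog = ET.fromstring("""<catalog>
  <book lang="en"><title>XML Basics</title></book>
  <book lang="fr"><title>Les bases de XML</title></book>
  <book lang="en"><title>Streaming Data</title></book>
</catalog>""")

# Steps separated by "/" walk down the hierarchy, much like a URL path.
all_titles = [t.text for t in catalog.findall("book/title")]

# A predicate in [...] filters on an attribute value.
english_titles = [t.text for t in catalog.findall("book[@lang='en']/title")]

# A numeric predicate selects by position; positions start at 1.
second_title = catalog.find("book[2]/title").text

# ".//" searches all descendants, not just children.
title_count = len(catalog.findall(".//title"))
```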

XML Pointer Language (XPointer)

A W3C initiative that specifies constructs for addressing the internal structures of XML documents. In particular, it provides for specific reference to elements, character strings, and other parts of XML documents, whether or not they bear an explicit ID attribute.

An XPointer consists of a series of location terms, each of which specifies a location, usually relative to the location specified by the prior location term. Each location term has a keyword (such as id, child, ancestor, and so on) and can have arguments, such as an instance number, element type, or attribute. For example, the XPointer:

child(2,precocious)

refers to the second child element whose type is precocious.

XML Query Language (XQL)

A set of extensions to XSL Patterns proposed to the W3C.

XQL is an extension to the capabilities of XSL that will provide for searching into, and data retrieval from, XML documents. It provides ways to manipulate XML in order to create new documents, to control the content of existing documents, and to manage the ordering and presentation of these documents along with XSL.

XML Schema

See schema.

XML vocabulary

A set of actual elements and the structure for a specific document type used in particular data formats. Vocabularies, along with the structural relationships between the elements, are defined in a DTD that serves as the rulebook for that vocabulary.

One of the first and probably best-known vocabularies is the Channel Definition Format, used to define Web pages that are designed to be sent automatically, or "pushed," to client users.

XPointer

See XML Pointer Language.

XQL

See XML Query Language.

XSL

See Extensible Stylesheet Language.

XSL Formatting Objects

A set of formatting semantics expressed as an XML vocabulary.

Conceptually, these objects form a tree. The formatting objects denote typographic elements such as page, paragraph, rule, and so forth. Finer control over the presentation of these elements is provided by a set of formatting properties, such as indents; word- and letter-spacing; and widow, orphan, and hyphenation control. The formatting objects and formatting properties provide the vocabulary for expressing presentation intent.

XSL Patterns

Provide a simple query language for identifying nodes in an XML document, based on their type, name, and values, as well as the relationship of the node to other nodes in the document.

Just like XSL, XSL Patterns is a declarative, not procedural, language. That is, its queries specify what should be found in an XML document, not how to find it. This provides the application with much more flexibility to determine the most efficient method to use to find a piece of data.

Internet Explorer 5 supports XSL Patterns with some of the extensions described in XML Query Language.

XSL Transformations (XSLT)

Makes use of the expression language defined by XPath for selecting elements for conditional processing and for generating text.

XSLT provides two "hooks" for extending the language, one hook for extending the set of instruction elements used in templates and one hook for extending the set of functions used in XPath expressions. These hooks are both based on XML namespaces.