Frequently Asked Questions About Extensible Markup Language (XML)

Microsoft Corporation

December 1997

Contents

XML Language
Standards
XML Vocabularies and Data Formats
Competition
Tools Support

XML Language

What is XML?

Extensible Markup Language (XML) is the universal language for data on the Web. It gives developers the power to deliver structured data from a wide variety of applications to the desktop for local computation and presentation. XML allows the creation of unique data formats for specific applications. It is likewise an ideal format for the server-to-server transfer of structured data.

Does XML replace HTML?

No. XML complements HTML by describing data. HTML, along with Cascading Style Sheets (CSS), will continue to be used to describe and present the physical rendition of pages. Microsoft Corporation expects many authors and developers to use XML and HTML in tandem.

What are the benefits of adding XML to HTML?

There are many benefits of using XML on the Web:

Where will XML be used on the Web?

Because XML describes data in a consistent, self-describing, open format, it could be used anywhere there is a need for data interchange and delivery. We expect that XML will initially be used to describe information about HTML pages, as is the case today with Channel Definition Format (CDF) for building Active Channel™ content, and later in applications such as searching and distributed printing.

More important, since XML can describe data itself, it will be useful for delivering any kind of data such as financial transactions, news updates, weather information, patient records, or legal libraries to the desktop. Once on the desktop, applications can perform computations on the data and dynamically present the data.

Does Microsoft Internet Explorer 4.0 support XML?

Yes, Microsoft Internet Explorer version 4.0 supports XML. It supports the following features:

A generalized XML parser that reads XML files and hands them off for processing to applications, such as viewers. Microsoft has two parsers: the MSXML Parser, a high-performance, nonvalidating parser written in C++ that ships with Internet Explorer 4.0; and the MSXML Parser in Java, available for download from the Extensible Markup Language (XML) section of the Microsoft Site Builder Network Web site (http://www.microsoft.com/xml/).

The XML Object Model (XML OM) uses the W3C standard Document Object Model (DOM) to allow programmatic access to the structured data, through the XML parsers, giving developers the power to interact and compute on the data. For more information on the DOM, see the W3C Web site (http://www.w3.org/DOM/).

The XML Data Source Object (XML DSO) allows developers to connect to XML data and supply it to the HTML page using the Dynamic HTML data-binding facility.
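
As a rough illustration of the kind of programmatic access a DOM-style object model provides, the following Python sketch reads structured data and computes on it locally. It uses the xml.dom.minidom module from the Python standard library, not the MSXML parsers described above, and the purchase-order document and its element names are invented for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical purchase-order data; element names are illustrative only.
ORDER_XML = """<ORDER>
  <ITEM><NAME>Widget</NAME><PRICE>2.50</PRICE><QTY>4</QTY></ITEM>
  <ITEM><NAME>Gadget</NAME><PRICE>10.00</PRICE><QTY>1</QTY></ITEM>
</ORDER>"""


def text_of(parent, tag):
    """Return the text content of the first descendant element named `tag`."""
    node = parent.getElementsByTagName(tag)[0]
    return node.firstChild.data


def order_total(xml_text):
    """Walk the DOM tree and compute the order total on the client."""
    doc = parseString(xml_text)
    total = 0.0
    for item in doc.getElementsByTagName("ITEM"):
        total += float(text_of(item, "PRICE")) * int(text_of(item, "QTY"))
    return total


if __name__ == "__main__":
    print(order_total(ORDER_XML))
```

The point of the sketch is that the computation happens on data that is already on the desktop, with no further request to the server.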

What is the difference between SGML and XML?

Standard Generalized Markup Language (SGML) is the international standard (ISO 8879) for defining descriptions of structure and content in electronic documents. XML is a simplified version of SGML; XML was designed to maintain the most useful parts of SGML. Whereas SGML requires that structured documents reference a Document Type Definition (DTD) to be "valid," XML allows for "well-formed" data and can be delivered without a DTD. XML was designed so that SGML can be delivered, as XML, over the Web.

What is the relationship between HTML, Dynamic HTML, and XML?

HTML, in conjunction with CSS, is used today to format and present linked pages. Dynamic HTML, through the Document Object Model, makes all elements in HTML accessible through language-independent scripting and other programming languages, thus dramatically increasing client-side interactivity without additional requests to the server. The page's object model allows any aspect of its content (including additions, deletions, and movement) to be changed dynamically. By adding XML for structured data, developers have the technologies needed to build the next generation of rich Web applications. With these technologies, they can deliver structured data to the desktop, perform computations on the data via the object model, apply formatting rules to the data with Extensible Style Language (XSL), and present the data with HTML.

Will it be necessary to compress XML for transmission over the Web?

In general, the need to compress XML data will be application-dependent and largely a function of the amount of data being moved between the server and the client. XML compresses extremely well because of the repetitive nature of the tags used to describe the structure of the data. Benchmarks will be provided in the future to assist with determining whether compression is necessary. It is worth noting that compression is a standard feature of HTTP 1.1 servers and clients, and XML will automatically benefit from this.
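
The effect of repeated tags on compression can be observed with a small experiment. The Python sketch below builds a made-up weather feed (the element names and values are assumptions for illustration) and compresses it with gzip from the standard library. It is not a benchmark, and the ratio will vary with the data.

```python
import gzip

# Hypothetical weather records; the repeated tags are what compress well.
records = "".join(
    f"<READING><CITY>City{i}</CITY><TEMP>{20 + i % 10}</TEMP></READING>"
    for i in range(500)
)
xml_bytes = f"<WEATHER>{records}</WEATHER>".encode("utf-8")

# gzip is one of the content codings an HTTP 1.1 server may apply.
compressed = gzip.compress(xml_bytes)
ratio = len(compressed) / len(xml_bytes)

if __name__ == "__main__":
    print(len(xml_bytes), len(compressed), round(ratio, 2))
```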

How secure is XML as a data format? Are there plans to add security to XML?

XML is as secure as HTML. Just as HTTPS can be used to add encryption to HTTP, protecting HTML, it can be used to protect XML. XML represents structured data in a text-based format, which maximizes simplicity and interoperability. A number of steps can be taken to add security and authentication to the XML format. First, XML can be encrypted on the server before transmission to the client, then decrypted on the client. In addition, XML can be authenticated by digital signatures applied to the data itself.

How will XML be generated from existing databases?

In general, this will be handled using a three-tier architecture. Agents will be built to run on the middle tier to access multiple existing Database Management Systems (DBMS) and output XML. These agents will also support the ability to generate XML updategrams bidirectionally, to inform the client of changes made to the data on the middle tier or database server, and vice versa. Consequently, the agents will be able to receive updategrams from the client and send updates to the DBMS.
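
The read side of such a middle-tier agent might look roughly like the Python sketch below, which queries a database and emits one XML element per row. The in-memory SQLite database, the quotes table, and the tag names are all assumptions made for the example. The updategram format is not specified here, so the sketch does not attempt it.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, table, root_tag, row_tag):
    """Serialize every row of `table` as one XML element per row."""
    # `table` is trusted here; a real agent would validate identifiers.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_el = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            # Each column becomes a child element named after the column.
            ET.SubElement(row_el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def demo():
    """Build a tiny hypothetical database and export it as XML."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quotes (symbol TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO quotes VALUES (?, ?)",
        [("AAA", 12.5), ("BBB", 7.25)],
    )
    return rows_to_xml(conn, "quotes", "QUOTES", "QUOTE")


if __name__ == "__main__":
    print(demo())
```

Mapping each column to a child element keeps the output self-describing, so a client can process it without knowing the database schema in advance.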

What is a DTD? What is it used for?

The Document Type Definition (DTD) is a reference file that formally describes a particular class of XML document. The DTD describes the semantics of a specific "type" of XML document—the unique elements (or namespace) and the relationship between the elements—and typically would reside on a Web server. The syntax of the DTD today uses a special grammar, derived from the SGML specification. The MSXML Java parser references the DTD to validate the XML data.

Do Web developers have to include a DTD when they use XML to describe data?

No. XML can be used to describe data with or without a DTD. "Valid" XML is data that references a DTD and conforms to it, while "well-formed" XML follows the syntax rules of the language and can be used without a DTD. The addition of well-formed XML is one of the fundamental differences between XML and SGML. In both cases, the XML itself must conform to the standards for the language (so, for example, all tags must be closed and tags may not overlap).
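
The two rules mentioned above (tags must be closed and must not overlap) can be checked without any DTD. The Python sketch below uses the nonvalidating parser in the standard library's xml.etree.ElementTree module; the is_well_formed helper and the sample documents are invented for illustration, and the check says nothing about validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Properly closed and nested tags.
    print(is_well_formed("<BOOK><TITLE>XML Basics</TITLE></BOOK>"))
    # Overlapping tags.
    print(is_well_formed("<BOOK><TITLE>XML Basics</BOOK></TITLE>"))
    # Unclosed tag.
    print(is_well_formed("<BOOK><TITLE>XML Basics</BOOK>"))
```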

What are XML schemata? How are they different from DTDs?

Schemata combine concepts from DTDs, relational databases, and object-oriented design. Schemata can describe the structure of XML documents, databases, directed-labeled-graphs, and other similar organizations of data. Schemata supply additional semantic information to documents, and contain new facilities such as data types, inheritance, and extensibility that are not available in DTDs. Schemata use the same syntax as XML documents. Schema components are reusable through the facility of "XML namespaces."

What are namespaces? Why are they important?

Namespaces are another advanced feature of XML, outlined in a W3C note as part of the XML 1.0 specification. They allow developers to qualify uniquely the element names and relationships and make these names recognizable, to avoid name collisions on elements that have the same name but are defined in different vocabularies. They allow tags from multiple namespaces to be mixed, which is essential if data is coming from multiple sources.

For example, a bookstore may define the <TITLE> tag to mean the title of a book, contained only within the <BOOK> element. A directory of people, however, might define <TITLE> to indicate a person's position, for instance: <TITLE>President</TITLE>. Namespaces help define this distinction clearly.
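
The bookstore and directory example can be sketched in Python as follows. The namespace URIs are made up, and the xmlns prefix syntax shown is the one understood by the standard library's xml.etree.ElementTree parser, which may differ from the syntax in the W3C note mentioned above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs below are made up for illustration.
BOOKS = "http://example.com/bookstore"
PEOPLE = "http://example.com/directory"
NS = {"b": BOOKS, "p": PEOPLE}

DOC = f"""<RECORD xmlns:b="{BOOKS}" xmlns:p="{PEOPLE}">
  <b:BOOK><b:TITLE>Moby Dick</b:TITLE></b:BOOK>
  <p:PERSON><p:TITLE>President</p:TITLE></p:PERSON>
</RECORD>"""

root = ET.fromstring(DOC)

# The same local name, TITLE, is looked up in two different vocabularies.
book_title = root.find("b:BOOK/b:TITLE", NS).text
job_title = root.find("p:PERSON/p:TITLE", NS).text

if __name__ == "__main__":
    print(book_title)
    print(job_title)
```

Because each TITLE element is qualified by its namespace, data from both sources can sit in one document without a name collision.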

What is XSL? What does it let Web developers do that they can't do today?

Extensible Style Language (XSL) is a style-sheet language that defines the rules for mapping structured XML data to, for example, HTML for presentation. A group of these rules defines a style sheet. With XSL, developers can generate a presentation structure that may be quite different from the original data structure. XSL allows an element to be formatted and displayed in multiple places on a page, rearranged or removed from display. For example, an <ITEM> element described in an XML-based purchase order could be presented in HTML in a list <UL> or in a table <TD>. Many style sheets can exist for one set of data, describing various delivery platforms or output devices.
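
The idea of several presentations for one set of data can be approximated in a few lines of Python. This is not XSL: the two functions below are hand-written stand-ins for style-sheet rules, and the purchase-order content is invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical purchase order with two ITEM elements.
ORDER = "<ORDER><ITEM>Pens</ITEM><ITEM>Paper &amp; ink</ITEM></ORDER>"


def as_list(xml_text):
    """Present each ITEM as an entry in an HTML list."""
    items = ET.fromstring(xml_text).iter("ITEM")
    entries = "".join(f"<LI>{escape(item.text)}</LI>" for item in items)
    return f"<UL>{entries}</UL>"


def as_table(xml_text):
    """Present the same ITEM elements as rows of an HTML table."""
    items = ET.fromstring(xml_text).iter("ITEM")
    rows = "".join(f"<TR><TD>{escape(item.text)}</TD></TR>" for item in items)
    return f"<TABLE>{rows}</TABLE>"


if __name__ == "__main__":
    print(as_list(ORDER))
    print(as_table(ORDER))
```

The data is untouched in both cases; only the mapping from ITEM to HTML changes, which is the separation a style sheet is meant to provide.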

How is XSL different from Cascading Style Sheets? Why is a new style-sheet language needed?

XSL is compatible with Cascading Style Sheets (CSS) and is designed to handle the new capabilities of XML that CSS can't handle. XSL is derived from Document Style Semantics and Specification Language (DSSSL), a complex style-sheet language with roots in the SGML community. The syntax of XSL is quite different from CSS, which could be used to display simple XML data but isn't general enough to handle all the possibilities generated by XML. XSL adds the capability to handle these possibilities. For instance, CSS cannot add new items or generated text (for instance, to assign a purchase order number) or add a footer (such as an order confirmation). XSL allows for these capabilities. (For more information about XSL, see the W3C Web site [http://www.w3.org/].)

Standards

What is the relationship between XML and the W3C?

The W3C has an active XML Working Group. Microsoft was one of the co-founders of this group in June 1996, and since then numerous industry players have joined, including Netscape Communications Corporation. For more information on the XML standards process, see the W3C Web site (http://www.w3.org/).

What is the status of XML with the W3C?

XML version 1.0 has just moved from the working draft phase to the proposed recommendation phase, the last step in the approval process before becoming a W3C recommendation. For more information on the current XML specification, and on the submission and review process within the W3C, please refer to their Web site (http://www.w3.org/).

What is the status of CDF in the W3C?

Channel Definition Format (CDF), an application based on XML, was recently resubmitted to the W3C. This resubmission of CDF takes advantage of some of the recent advances in the XML world at the W3C, namely XML/RDF (Resource Description Framework). For example, it includes an RDF diagram of the CDF vocabulary showing the relationships between various elements within CDF. For more information, see the "Resource Description Framework (RDF)" document available at the W3C Web site (http://www.w3.org/metadata/rdf/overview.html).

What is the status of XSL in the W3C?

The specification was jointly submitted to the W3C by Microsoft, ArborText Inc., and Inso Corporation in September 1997 and is now under review. For more information, see "A Proposal for Extensible Style Language (XSL)," available on the Specs & Standards page of the Microsoft Site Builder Network Web site (http://www.microsoft.com/standards/xsl/).

XML Vocabularies and Data Formats

What are XML vocabularies?

XML vocabularies are the elements used in particular applications or data formats—the definitions of the meanings of those formats. For example, in CDF, element names such as <schedule>, <channel>, and <item> make up the vocabulary for describing collections of pages, when these pages should be downloaded, and so on. Vocabularies, along with the structural relationships between the elements, are defined in XML DTDs or XML schemata.

What is CDF?

Channel Definition Format (CDF) is an XML-based data format used in the Microsoft Internet Explorer version 4.0 browser, for describing Active Channel content and the Desktop components. It is used by content developers and end users to describe collections of pages and data about pages, such as channel bar display, download behavior, Web page usage, and page-hit logging. For more information on CDF, see "Channel Definition Format (CDF)," available on the Specs & Standards page of the Microsoft Site Builder Network Web site (http://www.microsoft.com/standards/cdf-f.htm).

What is OSD?

Open Software Description (OSD) is an XML-based data format fully supported in Microsoft Internet Explorer version 4.01, for advertising and installing software components over the Internet. When new versions of software become available, OSD provides a mechanism to notify the user (referred to as publishing). In addition, OSD provides the functionality to describe in great detail how to install ActiveX® controls and Java packages and class files, adding functionality to the use of .inf files for setup. Microsoft and Marimba Inc. submitted this specification to the W3C in August 1997. For more information, see "Open Software Description (OSD)," available on the Specs & Standards page of the Microsoft Site Builder Network Web site (http://www.microsoft.com/standards/osd/).

What is OFX?

Open Financial Exchange (OFX) is a data format that Microsoft Money and Intuit Quicken personal finance applications use to communicate with financial institutions over the Web. Although it is currently described using SGML, OFX will soon be based on XML.

What is RDF?

Resource Description Framework (RDF) is a future XML-based application being developed in the W3C. It brings together ideas from Meta Content Format (MCF), a technology acquired by Netscape from Apple Computer Inc., and XML-Data (defined in a position paper written by Microsoft, Inso, ArborText, and other experts).

RDF enables generalized searching of information without application-specific rules, such as those defined in DTDs. RDF allows a complementary view of data through graphs and nodes, rather than through a structured tree, which the current XML technology enables. RDF, together with XML schemata, will provide a standard way for developers to write these relationships down for broad classes of XML elements.

The crucial technologies that will deliver value this year and next year are XML for structured data, XML namespaces to make names unique and recognizable, and new XML tags that add meaning to data so smarter search engines can perform better searches.

Competition

Does Netscape Navigator support XML?

No. Netscape has recently talked about support for XML, and the company recently joined the XML Working Group in the W3C, but it has referred to XML as a "futures" technology, for release in 1998. Microsoft supports XML today in Internet Explorer 4.0.

Tools Support

What tools support XML today?

Many of the top SGML vendors have made generalized XML versions of their products available, such as ArborText Adept7 (http://www.arbortext.com/), Inso Dynabase (http://www.inso.com/), Chrystal Software Astoria (http://www.chrystal.com/), and POET Object Server (http://www.poet.com/) for authoring, editing, and database publishing. Other vendors, such as DataChannel Inc. (http://www.datachannel.com/) have products based on XML for data management.

What companies have promised XML support in their products in the near future?

Allaire Corp., ExperTelligence Inc., InterMax Solutions Inc., Pictorius Inc., Powersoft, and SoftQuad Inc. recently committed to providing XML support in their products by March 1998.

Where will the tools come from in the future?

Microsoft expects a wide variety of applications to be developed in the coming months that convert information currently stored in documents and databases into XML for delivery to the desktop. In addition, Microsoft expects XML-centric databases, rich authoring and application developer tools, as well as data format-specific tools such as wizards to be developed as new vocabularies are defined.