Robert Carter
Writer
Microsoft Corporation
August 14, 1997
Updated: September 11, 1997
The following article was originally published in the MSDN Online Magazine.
If you're designing data-hungry sites, especially for intranets, you should be getting excited about XML (Extensible Markup Language).
Why? Two reasons:
Do you want smarter pages that allow users to reconfigure the data on their screen without having to re-access the server? Grander schemes envision using common XML elements to allow intelligent agents to search the Web automatically on behalf of the user, offload processing from the server to the client, or mediate between and extract information from different databases. While you may now be able to implement many of these features using a combination of HTML and some kind of custom programming (via Java or Visual Basic®, for instance), XML can handle them all -- by itself -- saving you and your site's viewers lots of time and trouble.
HTML, as you should know, is short for Hypertext Markup Language. The "Markup" refers to the set of commands that describe to the browser how to lay out data on a page (how the browser actually interprets the commands is a different question). Put some italics here, start a new table row, or jump to another section of the document.
One thing HTML doesn't do, though, is give any information about what the data means, called metadata. Without metadata, search engines and other data-filtering techniques have to rely on brute-force methods of selection, such as keyword or even content searches, to isolate information. Even then, they can miss the boat entirely (returning information about potato chips instead of computer chips).
XML is all about metadata and the idea that certain groups of people have similar needs for describing and organizing the data they use. Like HTML, XML is a set of tags and declarations -- but rather than being concerned with formatting information on a page, XML focuses on providing information about the data itself and how it relates to other data.
Some data types are pretty much universal (<FirstName>, <Address>, <City>, and so forth; note that XML element names cannot contain spaces). Others are industry- or even company-specific (<price>, <manufacturer>, <componentID>). Healthcare organizations, for example, have a whole set of data types and acronyms understandable (some would say penetrable) only to claims processors. XML allows each of these data types to be easily recognized and, for site developers, used to create sites optimized around both the data and the people using it.
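To make the potato-chip problem concrete, here is a minimal sketch in Python using the standard library's XML parser. The catalog, its element names (<catalog>, <product>, <category>, <name>), and the product data are all invented for illustration; they are not drawn from any published vocabulary. The sketch contrasts a brute-force keyword match with a query that relies on a descriptive element.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; element names and data are illustrative only.
CATALOG = """
<catalog>
  <product>
    <category>snack</category>
    <name>Kettle potato chips</name>
    <price>2.49</price>
  </product>
  <product>
    <category>semiconductor</category>
    <name>Memory chips, 16 MB</name>
    <price>89.00</price>
  </product>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Brute-force keyword search: every product whose name mentions "chips".
keyword_hits = [
    p.findtext("name")
    for p in root.iter("product")
    if "chips" in p.findtext("name").lower()
]

# Metadata-aware search: the <category> element says what the data means.
semiconductor_hits = [
    p.findtext("name")
    for p in root.iter("product")
    if p.findtext("category") == "semiconductor"
]

print(keyword_hits)        # both products match the keyword
print(semiconductor_hits)  # only the computer chips match the category
```

The keyword search cannot tell the two kinds of chips apart, while the category query can, because the markup carries the meaning.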
I can hear you saying, "This is all very nice, but what does it mean to me?" In the near term, it provides an immediate opportunity for intranet database-driven site development. As is often the case within large organizations, many departments may use the same database, but in different ways. Accounting needs payable and receivable information for the current quarter. Sales wants to monitor information by salesperson to figure out commission structures. Marketing wants data organized by product, industry segment, or whatever to figure out future release strategies.
Using XML, you will be able to customize the presentation of the queried data in a fashion most useful to the person making the query. Accounting can view a general ledger output; sales, a bar chart of results by salesperson for the last four quarters; marketing, a pretty pie chart of sales by segment. Once the data has been sent, it can be re-compiled without having to re-access the server. For instance, if the sales team wants to see breakouts by region instead of by salesperson, the page can regroup the data it already holds rather than issuing a new query. Just as important, you will be able to compile that information even if it exists on separate databases (as long as the databases use similar data attributes).
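The regrouping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the <sales> document, its element names, and the figures are hypothetical, and the helper totals_by is a name made up for this example. The data is parsed once, as if it had just arrived from the server, and then summarized two different ways on the client.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical query result; element names and amounts are illustrative only.
SALES = """
<sales>
  <sale><salesperson>Avery</salesperson><region>West</region><amount>1200</amount></sale>
  <sale><salesperson>Blake</salesperson><region>East</region><amount>800</amount></sale>
  <sale><salesperson>Avery</salesperson><region>East</region><amount>500</amount></sale>
</sales>
"""

def totals_by(root, field):
    """Sum <amount> over all <sale> elements, grouped by the given child element."""
    totals = defaultdict(int)
    for sale in root.iter("sale"):
        totals[sale.findtext(field)] += int(sale.findtext("amount"))
    return dict(totals)

# Parse once; no further round trips are needed for the views below.
root = ET.fromstring(SALES)

by_person = totals_by(root, "salesperson")
by_region = totals_by(root, "region")

print(by_person)
print(by_region)
```

Switching from the salesperson view to the region view changes only the grouping field, not the data that was delivered.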
Beyond company-specific intranets, XML will facilitate information-sharing across companies. The electronics and other industries have developed a data-transfer standard called EDI, short for electronic data interchange, which enables the electronic exchange of such things as purchase orders, invoices, and requests for quotations (RFQs). Already, XML and EDI interfaces are under development to streamline the migration of proprietary EDI offerings to Internet-based products. More information about this interface can be found at http://www.geocities.com/WallStreet/Floor/5815/ediindex.htm.
The architects of XML wanted to do more than aid intranet development. They envisioned the development of de facto data description standards for large groups with common data needs -- in short, data communities. Doctors, for example, work with unique data that interrelates differently than other types of information, and they need to sort through lots of it. XML presents an opportunity to develop standard medical templates (X-RAY, INSURANCETYPE, INSULINLEVEL) that could enable easy information sharing with patients, other doctors, and insurance examiners. In fact, rather than showing your health coverage card when you check into a hospital, you might instead establish a new account via an Internet terminal, and allow the hospital direct access to the records it needs to conduct its tests. For more about this vision, review the Jon Bosak article, XML, Java, and the Future of the Web.
By establishing a standard for how to characterize data, instead of establishing the characterizations themselves, the XML architects are allowing data communities to emerge organically, rather than by committee. This is accomplished through a Document Type Definition (DTD). A DTD gives you the ability to specify all the characteristics of the data elements you will be accessing. DTDs can be made public and available for anyone to use, or they can exist only within one company's private server.
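As a rough illustration of what a DTD looks like, the sketch below embeds a small one in a document and reads it with Python's standard library parser. The element names borrow the <price>, <manufacturer>, and <componentID> examples mentioned earlier; the <part> wrapper, the currency attribute, and the values are assumptions made up for this example. Note that this parser reads the declarations but does not validate the document against them; checking conformance requires a validating parser.

```python
import xml.etree.ElementTree as ET

# A hypothetical document carrying its own DTD in the internal subset.
# The DTD declares which elements may appear and what they may contain.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE part [
  <!ELEMENT part (componentID, manufacturer, price)>
  <!ELEMENT componentID (#PCDATA)>
  <!ELEMENT manufacturer (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
  <!ATTLIST price currency CDATA #REQUIRED>
]>
<part>
  <componentID>C-1001</componentID>
  <manufacturer>Contoso</manufacturer>
  <price currency="USD">19.95</price>
</part>
"""

# ElementTree is a non-validating parser: it accepts the DOCTYPE
# but does not enforce the content model declared in it.
root = ET.fromstring(DOCUMENT)

fields = {child.tag: child.text for child in root}
currency = root.find("price").get("currency")

print(fields)
print(currency)
```

The same declarations could live in a separate file shared by a whole data community, which is what makes a public DTD useful.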
For example, software and Internet companies are proposing a DTD for distributing software and updates over the Internet. Microsoft and Marimba recently announced their Open Software Description, which uses XML (and Channel Definition Format) to characterize software installation variables (for example, operating system and CPU type) and to allow automated "push" capability. The XML descriptors are robust enough to self-select a channel that identifies the software configuration that applies, locate a download site, get the necessary files, and deliver them to your hard drive. Pretty nifty.
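The selection step can be sketched as follows. This is not the actual Open Software Description vocabulary: the <package> and <build> elements, their attributes, the example.com URLs, and the pick_download helper are all hypothetical, chosen only to show how a client could match its operating system and CPU type against a description and find the right download location.

```python
import xml.etree.ElementTree as ET

# Hypothetical software description; NOT the real OSD element names.
PACKAGE = """
<package name="ExampleApp" version="1.1">
  <build os="win95" cpu="x86" href="http://example.com/app-win95-x86.cab"/>
  <build os="winnt" cpu="alpha" href="http://example.com/app-winnt-alpha.cab"/>
</package>
"""

def pick_download(xml_text, os_name, cpu):
    """Return the download URL of the first build matching this machine, or None."""
    root = ET.fromstring(xml_text)
    for build in root.iter("build"):
        if build.get("os") == os_name and build.get("cpu") == cpu:
            return build.get("href")
    return None

print(pick_download(PACKAGE, "winnt", "alpha"))
print(pick_download(PACKAGE, "mac", "ppc"))
```

A client that finds no matching build simply has nothing to download, which is the behavior you would want from an automated "push" channel.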
Moreover, DTDs are interoperable; you can easily take an XML document generated under one DTD and re-format it for a different view of the data using another (acceptable) DTD.
Because XML is really about specifying characteristics of data, and not simply presenting it, you will need to write style sheets to use it. Since Dynamic HTML, cascading style sheets, and Channel Definition Format are all supported by Internet Explorer 4.0 (and that other browser), you can start using XML today. Also, new tools are constantly emerging to evaluate your XML conventions and ensure that other parsers can use them as you intended. Visit Robin Cover's XML home page to keep abreast of XML developments.
Microsoft has been a participant in the XML standards committee since its announcement last year, and has promised complete adherence to the eventual standard in all its Internet products. Netscape has been on the panel since April.
One final thought: Remember that XML is at this writing still a working-group proposal at the W3C, which means that it's passed most of the hurdles needed for recognition by the W3C, but still has a little way to go. Keep up to date with the status of XML by checking on the official W3C XML site.
Robert Carter recently moved to the Pacific Northwest and refuses to support the Seattle rain misinformation campaign. Before writing and editing for MSDN Online, Robert worked as an emissions trader; he can spot a lot of hot air from fifty paces.
Although XML is relatively new, a bunch of articles are already out there. (Remember, XML is being proposed as a standard -- and if there's anything that standards require, it's documentation.)
Microsoft, as part of its ongoing commitment to recommended standards, maintains an XML page that includes references to some of our whitepapers, proposals, and articles.
Robin Cover's XML home page tracks all things XML-related: specification developments, white papers, conferences, software, FAQs, and more.
The Web is Ruined and I Ruined It, a two-part diatribe by David Siegel in Web Review, contains some useful perspective.
XML, Java, and the Future of the Web is written by Jon Bosak, one of the key architects of XML.
Michael Edwards's article on XML, XML: Data the Way You Want It, focuses on XML declarations, structure, and programming requirements.
Microsoft is one of a trio of software makers to jointly submit a spec to the W3C for an XML style sheet language, dubbed XSL, which will add powerful formatting capabilities to Extensible Markup Language.