Updated: September 23, 1999
(See the XML Technical FAQ for additional information in Q&A format.)
Download this document in Microsoft Word (.DOC) format (zipped).
XML, the language
What is XML? 
Does XML replace HTML? 
What are the benefits of adding XML to HTML? 
How does XML fit into the Microsoft® Windows® Distributed interNet Applications (Windows DNA) strategy for building three-tier, Web-enabled applications?
Where will XML be used on the Web? 
Does Microsoft Internet Explorer 4 support XML? 
What is the level of XML support in Internet Explorer 5? 
What is the difference between SGML and XML? 
How are HTML, Dynamic HTML, and XML related? 
Will it be necessary to compress XML for transmission over the Web?
How will XML be generated from existing databases? 
What is a DTD? What is it used for? 
Do Web developers have to include a DTD when they use XML to describe data?
What are XML schemas? How are they different from DTDs? 
What are namespaces? Why are they important? 
Extensible Stylesheet Language (XSL)
What is XSL? What can you do with XSL today?
XML-Data
Standards
What is the relationship between XML and the World Wide Web Consortium?
What is the status of XML with the W3C? 
What is the status of DOM with the W3C? 
Where does XSL stand in the W3C? 
XML vocabularies and data formats
What are XML vocabularies? 
What is CDF? 
What is OSD? 
What is OFX? 
What is RDF? 
Tool support
What tools support XML today? 
Where will the tools come from in the future? 
Issues and solutions
Why is my document object still empty after I call the Load() method? 
How do I load a document with foreign and special characters?
How do I use MSXML COM components in Visual Studio 6.0 C++?
How do I use HTML Entities in my XML?
How is white space handled in element content?
How is white space handled for attributes?
How is white space handled in the XML object model?
What does the XML declaration do?
How do I print my XML document in a readable format?
How do I use namespaces in DTDs?
How do I use XMLDSO in Visual Basic?
How do I use the XML DOM with Java?
Extensible Markup Language (XML) is the universal language for data on the Web. It gives developers the power to deliver structured data from a wide variety of applications to the desktop for local computation and presentation. XML allows the creation of unique data formats for specific applications. It is also an ideal format for server-to-server transfer of structured data.
Microsoft expects many authors and developers to use XML and HTML in tandem, for example by using XSL to generate HTML.
There are many benefits to using XML on the Web:
XML is quickly becoming the vehicle for delivering structured data from the middle tier to the desktop. XML-based data can be integrated from multiple server (database) sources, using agents on the middle tier. Schemas (see the XML-Data section) can improve this process, as developers can describe and exchange data more precisely.
Because XML describes data in a consistent, self-describing, open format, XML could potentially be used anywhere there is a need for data interchange and delivery. Microsoft expects that initially XML will be used to describe information about HTML pages, as is the case today with the channel definition format (CDF) for building Active Channel content, as well as future applications such as searching and distributed printing.
More important, because XML can describe data itself, it will be useful for delivering any kind of data, such as financial transactions, news updates, weather information, patient records, and legal libraries, to the desktop. Once on the desktop, applications can compute with the data and dynamically present the data.
Yes, Internet Explorer 4 supports XML. It supports the following features:
Internet Explorer 5 has the following XML support:
The Standard Generalized Markup Language, or SGML (ISO 8879), is the international standard for defining descriptions of structure and content in electronic documents. XML is a simplified version of SGML; XML was designed to maintain the most useful parts of SGML. While SGML requires that structured documents reference a document type definition (DTD) to be valid, XML allows for "well-formed" data and can be delivered without a DTD. XML was designed so that SGML can be delivered, as XML, over the Web.
HTML is used in conjunction with CSS to format and present hyperlinked pages. Dynamic HTML, through the Document Object Model, makes all elements in HTML accessible through language-independent scripting and other programming languages, thus dramatically increasing client-side interactivity without additional requests to the server. The page's object model allows any aspect of its content (including additions, deletions, and movement) to be changed dynamically.
By adding XML for structured data, developers have the technologies they need to build the next generation of rich, flexible Web applications. With XML, they can deliver structured data to the desktop and compute on the data via the XML Object Model. Today developers can display XML-based data through scripting, either in a browser such as Microsoft Internet Explorer 4.0 or Microsoft Internet Explorer 5, or in other applications. They can also apply formatting rules to the data without complex scripting by using XSL style sheets, which transform the XML-based data into a form suitable for display. These two methods of displaying XML-based data make it possible to generate multiple views of complex data.
In general, the need to compress XML data will be application-dependent and largely a function of the amount of data being moved between the server and the client. XML compresses extremely well because of the repetitive nature of the tags used to describe the structure of the data. Benchmarks will be provided in the future to assist in determining whether compression is necessary. It is worth noting that HTTP 1.1 defines standard content encodings for compression between servers and clients, and XML transferred over HTTP 1.1 can benefit from them.
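A quick way to check the claim that repetitive tags compress well is to gzip a synthetic document. The sketch below uses java.util.zip from the standard Java library. The <orders>/<order> element names and the record count are made up for illustration. Gzip stands in for whatever content encoding an HTTP 1.1 server and client agree on, and real ratios depend on the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class XmlCompression {
    // Builds a synthetic payload of repeated records (hypothetical element names).
    static String sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<orders>");
        for (int i = 0; i < records; i++) {
            sb.append("<order><id>").append(i)
              .append("</id><status>shipped</status></order>");
        }
        return sb.append("</orders>").toString();
    }

    // Returns the size in bytes of the gzip-compressed UTF-8 text.
    static int gzipSize(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is finished once try-with-resources has closed it.
        return bytes.size();
    }

    public static void main(String[] args) {
        String xml = sampleXml(1000);
        int raw = xml.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("raw: " + raw + " bytes, gzip: " + gzipSize(xml) + " bytes");
    }
}
```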
In general, this will be handled using a three-tier architecture. Agents will be built to run on the middle tier to access multiple existing database management systems (DBMSs) and output XML. XML enables the generation of common logical views on these databases. These agents will also support the ability to generate XML "updategrams" bidirectionally, that is, to inform the client of changes made to the data on the middle tier or database server, and vice versa. Consequently, the agents will be able to receive updategrams from the client and send updates to the DBMS.
The document type definition (DTD) defines the valid syntax of a class of XML documents. That is, it lists a number of element names, which elements can appear in combination with which other ones, what attributes are available for each element type, and so on. A DTD uses a different syntax from that used by XML documents.
No. XML can be used to describe data with or without a DTD. The term "valid" XML refers to XML data that references a DTD and conforms to it, while "well-formed" XML refers to XML that is syntactically correct but is not checked against a DTD. The addition of well-formed XML is one of the fundamental differences between XML and SGML. In both cases, the XML itself must conform to the rules of the language (for example, all tags must be closed and tags may not overlap).
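The well-formed/valid distinction can be tried out with the standard Java XML parser (JAXP, javax.xml.parsers), used here in place of MSXML. This is a minimal sketch that assumes a non-validating parser, which is the JAXP default. It accepts any well-formed text without a DTD and rejects unclosed or overlapping tags. The class name WellFormedCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if the text is well-formed XML. The parser does not
    // validate, so no DTD is needed and validity is never checked.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but prints nothing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // for example, unclosed or overlapping tags
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b>text</b></a>")); // closed and properly nested
        System.out.println(isWellFormed("<a><b>text</a></b>")); // overlapping tags
    }
}
```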
[From the W3C XML Activity Page at http://www.w3.org/XML/Activity.html]
While XML 1.0 supplies a mechanism, the Document Type Definition (DTD), for declaring constraints on the use of markup, automated processing of XML documents requires more rigorous and comprehensive facilities in this area. Requirements are for constraints on how the component parts of an application fit together, the document structure, attributes, datatyping, and so on. The W3C XML Schema Working Group is addressing means for defining the structure, content and semantics of XML documents.
In Internet Explorer 5, Microsoft is providing a release of XML Schema as a technology preview that may be useful for developers interested in building prototypes and gaining experience with schema. This technology preview is based on the XML-Data note submitted to the W3C. XML Schema, as implemented in this technology preview, can be thought of as the subset of the XML-Data submission that corresponds to the feature set proposed for Document Content Description (DCD). Microsoft is actively involved in defining the emerging W3C XML schema standard and will track this effort. Developers should note that the version of XML Schema released with Internet Explorer 5 is subject to change.
The namespace facility is another advanced feature of XML, outlined in a W3C Working Draft. Namespaces allow developers to uniquely qualify element names and relationships and to make these names recognizable. By doing so, they can avoid collisions between elements that have the same name but are defined in different vocabularies. Namespaces allow tags from multiple vocabularies to be mixed, which is essential if data is coming from multiple sources.
For example, a bookstore may define the <TITLE> tag as the title of a book, contained only within the <BOOK> element. A directory of people, however, might define <TITLE> as a person's position. Consider, for instance, <TITLE>President</TITLE>. Namespaces help define this distinction clearly.
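The TITLE collision can be illustrated with a namespace-aware JAXP parser. In this sketch, the namespace URIs (urn:example:bookstore, urn:example:directory), the PERSON element, and the book title are all hypothetical. The point is only that getElementsByTagNameNS separates the two TITLE elements by namespace even though their local names match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TitleNamespaces {
    // Hypothetical namespace names for the two vocabularies.
    static final String BOOKSTORE = "urn:example:bookstore";
    static final String DIRECTORY = "urn:example:directory";

    // Both vocabularies use an element named TITLE; the prefixes bind each
    // one to a different namespace.
    static final String XML =
        "<catalog xmlns:b='" + BOOKSTORE + "' xmlns:p='" + DIRECTORY + "'>"
        + "<b:BOOK><b:TITLE>A Sample Book</b:TITLE></b:BOOK>"
        + "<p:PERSON><p:TITLE>President</p:TITLE></p:PERSON>"
        + "</catalog>";

    // Returns the text of every TITLE element that belongs to namespaceUri.
    static List<String> titlesIn(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP ignores namespaces unless asked
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = doc.getElementsByTagNameNS(namespaceUri, "TITLE");
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Book titles: " + titlesIn(BOOKSTORE));
        System.out.println("Job titles: " + titlesIn(DIRECTORY));
    }
}
```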
The W3C Working Draft for XSL divides the language into two main parts: transformation and formatting semantics. This release supports the transformation part of the W3C XSL specification. Microsoft is tracking the W3C Working Draft and will be updating this implementation to match the final W3C recommendation.
XSL is defined as an XML grammar that consists of a set of XSL elements. This grammar can be used to transform XML documents into HTML or XML documents.
You can use XSL for direct browsing of XML files and from the XML DOM. The XML DOM transformNode method supports the use of XSL elements to perform transformations. The DOM selectNodes and selectSingleNode methods support the XSL pattern-matching syntax, which enables sophisticated queries for nodes within a particular context of the overall tree structure.
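For readers working outside Internet Explorer, the standard Java XSLT and XPath APIs (javax.xml.transform, javax.xml.xpath) offer rough counterparts to transformNode and selectNodes. They implement the later XSLT 1.0 and XPath 1.0 Recommendations, whose syntax differs in places from the Working Draft pattern syntax described here. The sample data, style sheet, and class name are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XslSketch {
    static final String BOOKS =
        "<books><book><title>First</title></book><book><title>Second</title></book></books>";

    // XSLT 1.0 style sheet that renders the titles as an HTML list.
    static final String STYLE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'><ul><xsl:for-each select='books/book'>"
        + "<li><xsl:value-of select='title'/></li>"
        + "</xsl:for-each></ul></xsl:template></xsl:stylesheet>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Similar in spirit to selectNodes: count the nodes a path expression matches.
    static int countMatches(Document doc, String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Similar in spirit to transformNode: apply a style sheet and return the output.
    static String transform(Document doc, String stylesheet) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(BOOKS);
        System.out.println(countMatches(doc, "/books/book/title"));
        System.out.println(transform(doc, STYLE));
    }
}
```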
XML-Data, a specification that has been submitted to the W3C for review, makes XML even more powerful and extensible. It outlines a richer method of describing and validating data, making XML even more powerful for integrating data from multiple disparate sources and building three-tier Web applications.
In January 1998, the W3C acknowledged the Extensible Markup Language XML-Data submission from Microsoft, ArborText Inc., DataChannel Inc., and Inso Corp. The specification is available for public review at http://www.w3.org/TR/1998/NOTE-XML-data-0105/ or http://msdn.microsoft.com/standards/.
The W3C has an active XML Working Group. Microsoft was one of the co-founders of this group in June 1996, and since then numerous industry players have joined, including Netscape Communications Corp., IBM, and Oracle. For more information on the XML standards process, see http://www.w3.org/.
XML version 1.0 recently moved from the proposed recommendation phase to the recommendation phase, which is the last step in the approval process at the W3C, and is a very stable standard. For more information on the current XML specification, and on the submission and review process within the W3C, see http://www.w3.org/.
The XML DOM recently moved from the proposed recommendation phase to the recommendation phase, which is the last step in the approval process at the W3C, and is a very stable standard. For more information on the current DOM specification, and on the submission and review process within the W3C, see http://www.w3.org/.
XSL is currently in the Working Draft stage in the W3C. It was submitted by ArborText, Inso, and Microsoft in September 1997. Microsoft plans to update its XSL code to track changes as it moves forward in the standard-development process.
XML vocabularies are the sets of elements used in particular applications or data formats; they make up the definitions of those formats. For example, in Channel Definition Format (CDF), element names such as <Schedule>, <Channel>, and <Item> make up the vocabulary for describing collections of pages, when these pages should be downloaded, and so on. Vocabularies, along with the structural relationships between the elements, are defined in XML DTDs and XML-Data schemas.
The channel definition format (CDF) is an XML-based data format used in Microsoft Internet Explorer 4.0, for describing Active Channel content and Active Desktop components. It is used by thousands of content developers and millions of end users to describe collections of pages and data about pages, such as channel bar display, download behavior, Web page usage, and page-hit logging. For more information on CDF, see the Content & Component Delivery section of the Web Workshop.
The open software description (OSD) is an XML-based data format, fully supported in Microsoft Internet Explorer 4.01, for advertising and installing software components over the Internet. When new versions of software become available, OSD provides a mechanism to notify the user (a process referred to as publishing). In addition, OSD provides the functionality to describe in great detail how to install ActiveX® Controls, as well as Java packages and class files, adding functionality to the use of .INFs for setup. Microsoft and Marimba Inc. submitted this specification to the W3C in August 1997. For more information, see http://msdn.microsoft.com/standards/.
The open financial exchange (OFX) is a data format that Microsoft Money and Intuit Quicken personal finance applications use to communicate with financial institutions over the Web. Although it is currently described using SGML, OFX will soon be based on XML.
The resource description framework (RDF) is an XML-based application being developed under the direction of the W3C. It brings together ideas from the meta content format, or MCF (technology acquired by Netscape from Apple Computer Inc.), and XML-Data (defined in a proposal recently submitted to the W3C by Microsoft, ArborText, DataChannel, and Inso).
RDF allows for generalized searching of information without application-specific rules, such as those defined in DTDs. RDF allows a complementary view of data through graphs and nodes, rather than through a structured tree, which the current XML technology enables. RDF, together with XML schemas, will provide a standard way for developers to write these relationships for broad classes of XML elements.
The crucial technologies that will deliver value this year and next are XML for structured data, XML namespaces to make names unique and recognizable, and new XML tags that add meaning to data, so smarter search engines can perform better searches.
Many vendors offer support for XML in their products today. See the XML Tools page for a listing of the top third-party vendors.
Microsoft expects a wide variety of applications to be developed in the coming months that convert information currently stored in documents and databases into XML for delivery to the desktop. In addition, Microsoft expects XML-centric databases, rich authoring and application developer tools, and data format-specific tools such as wizards to be developed as new vocabularies are defined.
By default, documents are loaded asynchronously. This means that if you provide an http URL location, the load() method will return immediately, and your document object will still be empty because the data hasn't come back from the server yet. To fix this, add the following line to your code:
xmldoc.async = false;
Also, if you are loading http XML documents from a standalone C++ application, you will have to process messages from the message queue for the download to continue.
A document may contain foreign characters such as the following:
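<test>foreign characters (úóíá) </test>
Without an encoding declaration, the parser assumes UTF-8, so characters such as úóíá will not load correctly if the file was saved in a different encoding. Such characters must either be UTF-8 encoded or be covered by an explicit declaration of the encoding actually used, as follows (alternatively, each character can be written as a numeric character reference, such as &#250; for ú):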
<?xml version="1.0" encoding="iso-8859-1"?> <test>foreign characters (úóíá) </test>
Now your XML will load correctly.
Other characters are reserved in XML and also need to be handled differently. The following XML:
<foo>This & that</foo>
generates this error:
Whitespace is not allowed at this location. Line 0000001: <foo>This & that</foo> Pos 0000012: ----------^
The ampersand is part of the syntactic structure of XML and will not be interpreted as an ampersand if simply placed within an XML data source. You need to substitute a special character sequence called an "entity".
<foo>This &amp; that</foo>
The following characters require the corresponding entities:
| Character | Entity | 
| < | &lt; | 
| & | &amp; | 
| > | &gt; | 
| " | &quot; | 
| ' | &apos; | 
Quote characters are used as delimiters for attribute values inside a tag, and therefore cannot always be used inside the value of an attribute. For example, the following will return an error:
<foo description='John's Stuff'>
The single quote is used both as an attribute delimiter and in the attribute value itself. To fix this, you can either use double quotes as the attribute delimiter, as follows:
<foo description="John's Stuff">
Or you can replace the single quote in the value with the entity &apos;:
<foo description='John&apos;s Stuff'>
Both of the above will return the attribute value John's Stuff via the getAttribute method in the XML object model. Similarly, for the double quote you can use the entity &quot;.
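When XML text is generated by a program rather than typed by hand, the entity substitutions described above are usually applied by a small helper. The sketch below is a hypothetical Java method, not part of MSXML, that replaces each of the five reserved characters with its built-in entity. It escapes all five everywhere. That is more than strictly required in element content, but it is always safe.

```java
public class XmlEscape {
    // Replaces each reserved character with its built-in entity.
    // Each input character is examined once, so an '&' produced by an
    // earlier substitution is never escaped a second time.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("<foo>" + escape("This & that") + "</foo>");
        System.out.println("<foo description='" + escape("John's Stuff") + "'>");
    }
}
```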
You can also handle special characters in element content by putting your text inside a CDATA section. The following is valid:
<xml> <![CDATA[ This & that <stuff> is just "text" content. ]]> </xml>
In this example, the XML Object Model will show a CDATA node as a child of the xml node which will return the string
This & that <stuff> is just "text" content.
as the nodeValue.
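The same CDATA behavior can be observed with the standard Java DOM parser, used here as a stand-in for the XML Object Model. This sketch relies on JAXP's default setting of coalescing = false, which keeps CDATA sections as their own nodes. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataDemo {
    // Returns the nodeValue of the first CDATA section directly under the
    // root element, or null if there is none.
    static String firstCdata(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                    return n.getNodeValue(); // raw text; no entity processing inside CDATA
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<foo> <![CDATA[This & that <stuff> is just \"text\" content.]]> </foo>";
        System.out.println(firstCdata(xml));
    }
}
```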
The easiest way to use MSXML COM components in Visual C++ 6.0 is to use the #import directive:
#import "msxml.dll" named_guids no_namespace
This defines all the IXML* interfaces and interface IDs so you can use them in your application. You can also get the MSXML type libraries and header files, and the uuid.lib that contains the class IIDs, from the INETSDK.
The following XML contains an HTML entity:
<copyright>Copyright &copy; 1999, Microsoft Inc, All rights reserved.</copyright>
It generates the following error:
Reference to undefined entity 'copy'. Line: 1, Position: 23, ErrorCode: 0xC00CE002 <copyright>Copyright &copy; 1999, ... ----------------------^
This is because XML has only five built-in entities. See "How do I load a document with foreign and special characters?" for more information about built-in entities.
To use HTML entities, you need to define them in a DTD. To find out more about DTDs, see the W3C XML Recommendation. A DTD that defines the HTML entities is available at the URL below. To use it, reference it in a DOCTYPE declaration as follows:
<!DOCTYPE copyright SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<copyright>Copyright &copy; 1999, Microsoft Inc, All rights reserved.</copyright>
For this to load, you need to turn off the validateOnParse property of the IXMLDOMDocument interface. Try pasting this into the Validator Test Page, turn off DTD validation, and click Validate. Notice that the document loads and the copyright character is available in the DOM tree shown at the end of the validator page.
If you are already doing DTD validation, then you must include the HTML entities as a parameter entity in your existing DTD as follows:
<!ENTITY % HTMLENT SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
%HTMLENT;
This will define all the HTML entities so you can use them in your XML document.
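The effect of declaring an entity can be checked with a non-validating Java parser. To keep the sketch self-contained, it declares only copy in an internal DTD subset rather than fetching htmlentities.dtd. The class name and the null-on-error convention are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HtmlEntitySketch {
    // Declares only the copy entity, in the internal subset. &#169; is the
    // character reference for the copyright sign.
    static final String WITH_DTD =
        "<!DOCTYPE copyright [<!ENTITY copy \"&#169;\">]>"
        + "<copyright>Copyright &copy; 1999, Microsoft Inc, All rights reserved.</copyright>";

    // Same content with no declaration for copy.
    static final String WITHOUT_DTD =
        "<copyright>Copyright &copy; 1999, Microsoft Inc, All rights reserved.</copyright>";

    // Returns the root element's text, or null if the parser rejects the document.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors still throw
            return builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // for example, a reference to an undeclared entity
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText(WITH_DTD));
        System.out.println(rootText(WITHOUT_DTD));
    }
}
```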
The XML DOM has three properties for accessing the text content of elements:
| Property | Behavior | 
| nodeValue | Returns the original text content (including white space) on TEXT, CDATA, COMMENT, and PI nodes as specified in the original XML source. Returns null on ELEMENT nodes and on the DOCUMENT itself. | 
| data | Same as nodeValue | 
| text | Recursively concatenates multiple TEXT and CDATA nodes in a specified subtree and returns the combined result. | 
Note: White space consists of newline, tab, and space characters.
The nodeValue property always returns what is in the original document independent of how the document is loaded and current xml:space scope.
The text property concatenates all text in the specified subtree and expands entities. Its result depends on how the document is loaded, the current state of the preserveWhiteSpace switch, and the current xml:space scope, as follows:
preserveWhiteSpace = true when the document is loaded
| preserveWhiteSpace=true | preserveWhiteSpace=true | preserveWhiteSpace=false | preserveWhiteSpace=false | 
| xml:space=preserve | xml:space=default | xml:space=preserve | xml:space=default | 
| preserved | preserved | preserved | preserved and trimmed | 
preserveWhiteSpace = false when the document is loaded
| preserveWhiteSpace=true | preserveWhiteSpace=true | preserveWhiteSpace=false | preserveWhiteSpace=false | 
| xml:space=preserve | xml:space=default | xml:space=preserve | xml:space=default | 
| half preserved | half preserved and trimmed | half preserved | half preserved and trimmed | 
Here, preserved means the exact original text content as found in the original XML document, trimmed means the leading and trailing white space has been removed, and half preserved means that "significant white space" is preserved and "insignificant white space" is normalized. Significant white space is white space inside of text content. Insignificant white space is white space between tags, as follows:
<name>\n \t<first> Jane</first>\n \t<last>Smith </last>\n </name>
In this example, the white space between tags (the newline, space, and tab before <first> and before <last>, and the newline and space before </name>) is insignificant and can be ignored. The space before Jane and the space after Smith are significant white space: they are part of the text content and therefore cannot be ignored. So in this example, the text property returns the following results:
| state | returned value | 
| preserved | "\n\t Jane\n\tSmith \n" | 
| preserved and trimmed | "Jane\n\tSmith" | 
| half preserved | " Jane Smith " | 
| half preserved and trimmed | "Jane Smith" | 
Notice that "half preserved" normalizes insignificant white space, for example, the newlines and tab characters are collapsed down into a single space character. You can change the xml:space attributes and the preserveWhiteSpace switch and the text property will return a different value accordingly.
CDATA and xml:space="preserve" subtree boundaries
In the following example, the contents of the CDATA node or the "preserved" node are concatenated as they are and do not participate in the insignificant white space normalization:
<name>\n \t<first> Jane </first>\n \t<last><![CDATA[ Smith ]]></last>\n </name>
In this case, the white space inside the CDATA node is never "merged" with "insignificant" white space and is never trimmed. Therefore, the "half preserved and trimmed" case will return the following:
"Jane Smith "
Here, the insignificant white space between the </first> and <last> tags is included regardless of the contents of the CDATA node. The same result is returned if the CDATA is replaced with the following:
<last xml:space="preserve"> Smith </last>
Entities are special
Entities are loaded and parsed as part of the DTD and appear under the DOCTYPE node. They do not necessarily have any xml:space scope. For example:
<!DOCTYPE foo [ <!ENTITY Jane "<employee>\n \t<name> Jane </name>\n \t<title>Software Design Engineer</title>\n </employee>"> ]> <foo xml:space="preserve">&Jane;</foo>
Assuming that preserveWhiteSpace=false (in the scope of the DOCTYPE tag), the insignificant white space is lost when the entity is parsed. The entity will not have white space nodes. The tree will look like this:
DOCTYPE foo
    ENTITY: Jane
        ELEMENT: employee
            ELEMENT: name
                TEXT: Jane 
            ELEMENT: title
                TEXT: Software Design Engineer
    ELEMENT: foo
       ATTRIBUTE: xml:space="preserve"
       ENTITYREF: Jane
   
Notice that the DOM tree exposed under the ENTITY node inside the DOCTYPE does not contain any WHITESPACE nodes. This means that the children of the ENTITYREF node will also have no WHITESPACE nodes even though the entity reference is in the scope of xml:space="preserve".
Every instance of an ENTITY referenced in a given document always has the identical tree.
If an entity absolutely must preserve white space, then it must specify its own xml:space attribute inside itself or the document preserveWhiteSpace switch must be set to true.
There are several ways of accessing an attribute value. The IXMLDOMAttribute interface has a value property, which is equal to nodeValue, and a text property, which is a Microsoft extension. These properties return the following:
| property | text returned | 
| attrNode.nodeValue, attrNode.value, getAttribute("name") | Returns exact content (with entities expanded) as found in the original document. | 
| attrNode.nodeTypedValue | Null | 
| attrNode.text | Same as nodeValue except the leading and trailing white space is trimmed. | 
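The XML Language specification defines the following behavior for XML applications:
| Attribute type | Text returned | 
| CDATA | half normalized | 
| ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION, enumeration | fully normalized | 
Half normalized means that each tab, newline, and carriage return is replaced with a space. Fully normalized additionally collapses each run of spaces into a single space and removes leading and trailing spaces.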
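The standard Java parser also performs attribute-value normalization, so it can show the two behaviors side by side. In this hypothetical sketch, the tags attribute is declared NMTOKENS in an internal DTD subset, while note is left undeclared and is therefore treated as CDATA. Under the XML 1.0 rules, tags should be fully normalized and note should be half normalized.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeNormalization {
    // 'tags' is declared NMTOKENS; 'note' is undeclared, so it is treated as
    // CDATA. The Java escapes \t and \n put a real tab and newline into the
    // attribute value that the parser sees.
    static final String XML =
        "<!DOCTYPE item [<!ATTLIST item tags NMTOKENS #IMPLIED>]>"
        + "<item tags='  red   green  ' note='Tom &amp;\tJerry\nShow'/>";

    // Returns the normalized value of the named attribute on the root element.
    static String attribute(String name) {
        try {
            Element item = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML))).getDocumentElement();
            return item.getAttribute(name);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + attribute("note") + "]"); // CDATA: half normalized
        System.out.println("[" + attribute("tags") + "]"); // NMTOKENS: fully normalized
    }
}
```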
Sometimes the XML Object Model will show TEXT nodes containing white space characters. This can be confusing, since most of the time white space is stripped. For example, the following XML:
<?xml version="1.0" ?>
<!DOCTYPE person [
<!ELEMENT person (#PCDATA|lastname|firstname)*>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
]>
<person> <lastname>Smith</lastname> <firstname>John</firstname> </person>
Generates the following tree:
Processing Instruction: xml DocType: person ELEMENT: person TEXT: ELEMENT: lastname TEXT: ELEMENT: firstname TEXT:
The first name and last name are surrounded by TEXT nodes containing only white space because the content model for the "person" element is MIXED; it contains the #PCDATA keyword. A MIXED content model indicates that the elements can have text interspersed between them. Therefore, the following is also valid:
<person> My last name is <lastname>Smith</lastname> and my first name is <firstname>John</firstname> </person>
And this results in the following similar looking tree:
ELEMENT: person TEXT: My last name is ELEMENT: lastname TEXT: and my first name is ELEMENT: firstname TEXT:
Without the white space after the word "is" and before <lastname>, and the white space after the </lastname> and before the word "and", the sentence would be unintelligible. So, for MIXED content models, the combination of text, white space, and elements is relevant. For non-MIXED content models this is not the case.
To make the white-space-only TEXT nodes go away, remove the #PCDATA keyword from the "person" element declaration:
<!ELEMENT person (lastname,firstname)>
which results in the following clean tree:
Processing Instruction: xml DocType: person ELEMENT: person ELEMENT: lastname ELEMENT: firstname
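The effect of the content model on white-space-only TEXT nodes can be reproduced with JAXP. This sketch parses the person example under both declarations. It relies on JAXP's setIgnoringElementContentWhitespace, which drops only the white space that the DTD marks as ignorable and therefore requires a validating parser. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MixedContentDemo {
    static final String MIXED = "<!ELEMENT person (#PCDATA|lastname|firstname)*>";
    static final String ELEMENT_ONLY = "<!ELEMENT person (lastname,firstname)>";

    // Parses the person example with the given declaration for "person" and
    // returns how many child nodes the person element ends up with.
    static int personChildCount(String personDecl) {
        String xml = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE person [" + personDecl
            + "<!ELEMENT lastname (#PCDATA)><!ELEMENT firstname (#PCDATA)>]>"
            + "<person> <lastname>Smith</lastname> <firstname>John</firstname> </person>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Only the DTD says which content is element-only, so validation is on.
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler ignores validity errors and rethrows fatal ones.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mixed: " + personChildCount(MIXED));               // TEXT nodes kept
        System.out.println("element-only: " + personChildCount(ELEMENT_ONLY)); // TEXT nodes dropped
    }
}
```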
The XML declaration must be listed at the top of the XML document:
<?xml version="1.0" encoding="utf-8"?>
It specifies the following items:
Note: The XML declaration must appear at the very beginning of an XML document, before any comment or processing instruction, so the following XML file:
<!--HEADLINE="Dow closes as techs get hammered"--> <?xml version="1.0"?>
generates the following parse error:
Invalid xml declaration. Line 0000002: <?xml version="1.0"?> Pos 0000007: ------^
Note: The XML declaration is optional. If you need to put a comment or processing instruction at the top, leave the XML declaration out entirely. In that case, however, the document must use the default encoding, UTF-8 (or UTF-16, identified by a byte order mark).
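The placement rule can be checked with the standard Java parser, which rejects an XML declaration that appears after a comment. The following is a minimal sketch; the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlDeclarationPosition {
    // Returns false when the parser reports a fatal (well-formedness) error.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors silently
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\"?>";
        String comment = "<!--HEADLINE=\"Dow closes as techs get hammered\"-->";
        System.out.println(parses(decl + comment + "<doc/>")); // declaration first: accepted
        System.out.println(parses(comment + decl + "<doc/>")); // comment first: rejected
        System.out.println(parses(comment + "<doc/>"));        // no declaration: accepted
    }
}
```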
When you build a document from scratch using the DOM and save it, everything is written on a single line with no white space in between. This is the default behavior.
The default XSL style sheet built into Internet Explorer 5 displays and prints XML documents in a readable format. For example, if you have IE5 installed, try viewing the nospace.xml file. You should see the following tree display in your browser:
- <ORDER>
 - <ITEM NAME="123">
    <NAME>XYZ</NAME> 
    <PRICE>12.56</PRICE> 
   </ITEM> 
  </ORDER>
No white space is inserted into the XML.
Printing readable XML is quite tricky, especially when you have a DTD that defines different kinds of content models. For example in the mixed content model (#PCDATA), you may not want to insert spaces because this may change the meaning of the content. For example, consider the following XML:
<B>E</B><I>lephant</I>
This better not be output as:
<B>E</B> <I>lephant</I>
because then the word boundaries are no longer correct.
All this makes automatic printing problematic. If you do need to print readable XML, you can use the DOM to insert white space as text nodes in the appropriate places.
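The approach suggested above, inserting white space as text nodes through the DOM, can be sketched in Java. The sketch indents only elements whose children are all elements. It also takes a caller-supplied set of element names with mixed content, which is a hypothetical stand-in for reading the #PCDATA declarations from a DTD. The JAXP identity Transformer is used only to serialize the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class Indenter {
    // Inserts newline-plus-indent text nodes inside element-only content.
    // Elements named in mixedContent, and any element that already has a
    // non-element child, are left untouched together with their descendants.
    static void indent(Element e, int depth, Set<String> mixedContent) {
        if (mixedContent.contains(e.getTagName())) return;
        List<Node> kids = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) return; // text, comment, etc.
            kids.add(n);
        }
        if (kids.isEmpty()) return;
        Document doc = e.getOwnerDocument();
        String pad = String.join("", Collections.nCopies(depth + 1, "  "));
        for (Node n : kids) {
            e.insertBefore(doc.createTextNode("\n" + pad), n);
            indent((Element) n, depth + 1, mixedContent);
        }
        String closePad = String.join("", Collections.nCopies(depth, "  "));
        e.appendChild(doc.createTextNode("\n" + closePad));
    }

    static String prettyPrint(String xml, Set<String> mixedContent) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            indent(doc.getDocumentElement(), 0, mixedContent);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(prettyPrint(
            "<ORDER><ITEM NAME=\"123\"><NAME>XYZ</NAME><PRICE>12.56</PRICE></ITEM></ORDER>",
            Collections.<String>emptySet()));
        // P is declared mixed, so the word boundaries inside it are preserved.
        System.out.println(prettyPrint("<P><B>E</B><I>lephant</I></P>", Collections.singleton("P")));
    }
}
```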
<!ELEMENT x:customer ANY >
<!ATTLIST x:customer xmlns:x CDATA #FIXED "urn:...">
The namespace declaration has to be of type #FIXED. Namespaces on attributes work the same way:
<!ELEMENT customer ANY >
<!ATTLIST customer
          x:value CDATA #IMPLIED
          xmlns:x CDATA #FIXED "urn:...">
Namespaces and XML Schemas
DTDs and XML Schemas cannot be mixed. For example, the following declaration:
xmlns:x CDATA #FIXED "x-schema:myschema.xml"
will not result in the use of the schema definitions in myschema.xml. The use of DTDs and XML Schemas is mutually exclusive.
Using the following XML as an example:
<contacts> <person> <name>Mark Hanson</name> <telephone>206 765 4583</telephone> </person> <person> <name>Jane Smith</name> <telephone>425 808 1111</telephone> </person> </contacts>
You can bind to an ADO Recordset as follows:
Dim dso As New XMLDSOControl
Dim doc As IXMLDOMDocument
Set doc = dso.XMLDocument
doc.Load ("d:\test.xml")
Dim da As New DataAdapter
Set da.Object = dso
Dim rs As New ADODB.Recordset
Set rs.DataSource = da
MsgBox rs.Fields("name").Value
This displays the string "Mark Hanson"
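Outside Visual Basic, the same rows can be read directly from the DOM. This sketch uses the standard Java DOM instead of the XMLDSO and ADO objects. It flattens each person element into a {name, telephone} pair, which is roughly the shape the recordset exposes. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContactRows {
    static final String XML =
        "<contacts>"
        + "<person><name>Mark Hanson</name><telephone>206 765 4583</telephone></person>"
        + "<person><name>Jane Smith</name><telephone>425 808 1111</telephone></person>"
        + "</contacts>";

    // Turns each person element into a {name, telephone} row.
    static List<String[]> rows(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList people = doc.getElementsByTagName("person");
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < people.getLength(); i++) {
                Element person = (Element) people.item(i);
                rows.add(new String[] { text(person, "name"), text(person, "telephone") });
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text of the first descendant element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        // First row's name field, comparable to rs.Fields("name").Value.
        System.out.println(rows(XML).get(0)[0]);
    }
}
```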
The IE5 version of MSXML.DLL must have already been installed. In Visual J++ 6.0, from the Project menu, select Add COM Wrapper, and choose "Microsoft XML 1.0" from the list of COM objects. This builds the required Java wrappers into a new package called "msxml". These pre-built Java wrappers are also available for download. The classes can be used as follows:
import com.ms.com.*;
import msxml.*;
public class Class1
{
  public static void main (String[] args)
  {
    DOMDocument doc = new DOMDocument();
    doc.load(new Variant("file://d:/samples/ot.xml"));
    System.out.println("Loaded " + doc.getDocumentElement().getNodeName());
  }
}
The code sample loads a 3.8 MB test file, "ot.xml", from the Sun religion sample. The Variant class is used for wrapping the Win32 VARIANT primitive type.
You cannot use pointer comparisons on the nodes since each time you retrieve a node you actually get a new wrapper. So, rather than using the following code,
IXMLDOMNode root1 = doc.getDocumentElement();
IXMLDOMNode root2 = doc.getDocumentElement();
if (root1 == root2) ...
use the following instead:
if (ComLib.isEqualUnknown(root1, root2)) ....
The total size of the .class wrappers is about 160 KB. However, to be fully compliant with the W3C specification, you should use only the IXMLDOM* wrappers. The following classes are old IE 4.0 XML interfaces and can be deleted from the msxml folder:
This brings the size down to 147 KB. You may also want to delete the following additional items:
This brings the size down to 116 KB. To get it even smaller, consider the fact that the DOM itself comes in two layers: a core layer consisting of:
and DTD information that you probably want to keep:
All nodes in an XML document are of type IXMLDOMNode, which provides complete functionality, but higher level wrappers exist for each node type. Therefore, all the following interfaces can also be deleted if you modify the DOMDocument wrapper and change these specific types to use IXMLDOMNode instead:
Deleting these brings the size down to 61 KB. However, with IXMLDOMElement, the getAttribute and setAttribute methods are useful. Otherwise, you will need to use:
IXMLDOMNode.getAttributes().setNamedItem(...)