PRB: Strings Passed to loadXML must be UTF-16 Encoded BSTRs

ID: Q247708


The information in this article applies to:
  • Microsoft Internet Explorer (Programming) version 5


SYMPTOMS

When using the loadXML method of the MSXML parser, including character sequences that are not UTF-16 in the BSTR parameter passed to loadXML may result in the following error message:

An Invalid character was found in text content.

Furthermore, attempting to change the encoding of the string, for example by specifying an "encoding" attribute in the XML declaration at the top of the document, results in the following error message:

Switch from current encoding to specified encoding not supported.


CAUSE

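The string parameter must be in BSTR format, and BSTR strings are always UTF-16 encoded. Consequently, loadXML treats its input as UTF-16 and cannot switch to a different encoding partway through the document.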


RESOLUTION

Scripting developers have two options available:

  1. Convert your XML documents to UTF-16-formatted Unicode, either automatically or by hand.
  2. Escape every non-ASCII character in the XML document using XML numeric character references. Any XML character can be written in plain ASCII using the form &#nnnn; (decimal) or &#xhhhh; (hexadecimal), where the number is the character's code point in the Unicode character set.
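
The two scripting options above can be sketched outside of MSXML. The following is a minimal illustration in Python, using only the standard library; the function names and the assumption that the source document is stored as ISO-8859-1 bytes are hypothetical and chosen only for the example.

```python
# Illustrative sketch only: shows the two text transformations, not MSXML itself.
# Assumption: the original document is stored as ISO-8859-1 bytes.

def to_utf16(raw: bytes, source_encoding: str) -> bytes:
    """Option 1: transcode the document to UTF-16 (a byte order mark is added)."""
    return raw.decode(source_encoding).encode("utf-16")


def escape_non_ascii(text: str) -> str:
    """Option 2: replace each non-ASCII character with a decimal
    numeric character reference of the form &#nnnn;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# "\u00e9" is LATIN SMALL LETTER E WITH ACUTE (code point 233).
raw = "<name>Jos\u00e9</name>".encode("iso-8859-1")

utf16_doc = to_utf16(raw, "iso-8859-1")
ascii_doc = escape_non_ascii(raw.decode("iso-8859-1"))

print(ascii_doc)  # <name>Jos&#233;</name>
```

With either option, an "encoding" attribute that names the old encoding would presumably still need to be removed or updated, since the string handed to loadXML is UTF-16 regardless.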

Microsoft Visual C++ developers have a third option: load data into MSXML using a method other than loadXML. loadXML is typically misused out of a desire to load XML data from memory; however, the IXMLDOMDocument::Load method accepts several kinds of source in its VARIANT parameter, and these are better alternatives to loadXML.

See the following Knowledge Base article for more information:
Q223337 INFO:Loading/Saving XML Data with Internet Explorer XML Parser

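Specifically, the Load method can be passed a SAFEARRAY of bytes containing XML data in any encoding the parser supports. Because the parser receives raw bytes rather than a BSTR, it can determine the encoding from the document itself.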
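
The difference between byte input and string input is not specific to MSXML. As an analogy only, the sketch below uses Python's standard xml.etree.ElementTree parser (not MSXML) to show a parser honoring the "encoding" attribute when it is given raw bytes; the sample document is hypothetical.

```python
# Analogy only: Python's standard-library parser, not MSXML.
import xml.etree.ElementTree as ET

# Raw ISO-8859-1 bytes; 0xE9 is "e with acute" in that encoding.
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><name>Jos\xe9</name>'

# Given bytes, the parser reads the declaration and decodes accordingly.
root = ET.fromstring(raw)

print(root.text == "Jos\u00e9")  # True
```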


STATUS

This behavior is by design.

Additional query words:

Keywords : kbGrpInet kbIE500 kbXML kbDSupport
Version : WINDOWS:5
Platform : WINDOWS
Issue type : kbprb


Last Reviewed: January 11, 2000
© 2000 Microsoft Corporation. All rights reserved.