INF: Extended Characters Different Under Windows

ID Number: Q83461

3.00 3.10

WINDOWS

Summary:

Applications in the Windows environment must typically deal with two
different character sets: the ANSI (American National Standards
Institute) character set and the OEM (original equipment manufacturer)
character set. In contrast, applications in the MS-DOS environment
deal only with the OEM character set. This article describes how
Windows deals with the ANSI and OEM character sets.

1. When ALT+xxx is used to enter a character from the OEM character
   set into an application in the Windows environment that uses the
   ANSI character set, Windows displays the character in the ANSI
   character set that most closely matches the entered character.

2. When a character from the OEM character set is entered into a file
   using a text editor under MS-DOS and the file is displayed under
   Windows, the character from the ANSI character set that has the
   same character code number as the OEM character is displayed.

More Information:

OEM and ANSI Character Sets
---------------------------

MS-DOS uses the OEM character set. This character set varies between
computers and depends on the code page ROM (read-only memory)
installed by the computer manufacturer. For example, personal
computers manufactured for use in the United States use a character
set called code page 437, while computers manufactured for use in
Portugal use code page 860. MS-DOS uses the OEM character set in
applications and to create files and filenames.

For the most part, Windows uses fonts organized according to the ANSI
character set (called ANSI-set fonts in this article). Windows also
supports fonts that use the same OEM character set that MS-DOS uses
(called OEM-set fonts in this article).

Character positions 32 through 127 are identical in the ANSI and OEM
character sets for most code pages (including code pages 437, 850,
852, 860, 861, and 865). The remaining characters of the OEM character
set (character positions 0 through 31 and 128 through 255) either do
not appear in the ANSI character set or exist at a different position
in the ANSI character set. Therefore, some characters in the OEM
character set cannot be displayed in Windows using an ANSI-set font.
If an application must display such characters under Windows, an
OEM-set font is required.
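
The overlap described above can be checked with a short Python
sketch. It assumes that Python's cp437 codec stands in for the OEM
character set and that its cp1252 codec stands in for the Windows ANSI
character set. Neither assumption comes from this article, and the
names below are illustrative only.

```python
# Assumption: cp437 models the OEM set and cp1252 models the ANSI set.
OEM, ANSI = "cp437", "cp1252"

def same_character(position):
    """Return True if both sets define the same character at position."""
    raw = bytes([position])
    try:
        return raw.decode(OEM) == raw.decode(ANSI)
    except UnicodeDecodeError:
        # The position is undefined in one of the two sets.
        return False

# Positions 32 through 127 hold the same characters in both sets.
shared = all(same_character(p) for p in range(32, 128))

# Extended positions that hold a different character, or none at all.
differing = [p for p in range(128, 256) if not same_character(p)]
```

Under these assumptions, positions such as 128 and 251 land in the
"differing" list, which is why text that uses them needs either
conversion or an OEM-set font.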

Typing ANSI and OEM Characters in Windows
-----------------------------------------

In the Windows environment, a user can enter any character in the
current character set by holding down the ALT key and typing 0xxx on
the numeric keypad, where "xxx" is the decimal number of the desired
character position in the font. For example, with an ANSI-set font in
use, ALT+0123 displays the character at position 123 in the ANSI
character set. Similarly, with an OEM-set font in use, ALT+0123
displays the character at position 123 in the OEM character set.

In the MS-DOS environment, a user can enter any character in the OEM
character set by holding down the ALT key and typing xxx (no leading
zero), where "xxx" is the decimal number of the desired character
position in the font.

If a user enters an MS-DOS OEM character set code (ALT+xxx) in an
application for Windows that uses an ANSI-set font, Windows converts
the OEM-set character to the character that most closely matches in
the ANSI set. This conversion is governed by a mapping table that is
installed with Windows. Because some OEM-set characters with positions
greater than 127 do not exist in the ANSI character set, the result of
the conversion in Windows may differ from the character in the OEM
set. The OemToAnsi function uses the same mapping table to perform its
character conversions.

For example, while OEM character-set code page 437 contains a
square-root symbol at position 251, the ANSI character set does not
contain this character. Consequently, when the user types ALT+251 in
an edit control that uses the ANSI character set, an underscore
character appears because Windows defines the character mapping in
this manner.

As another example, the C-cedilla character exists in both the ANSI
character set and in the OEM character-set code page 437. Therefore,
typing ALT+128 in an edit control creates the desired C-cedilla
character. Note that while the character exists in both character
sets, its position is different in each set (128 in the OEM character
set and 199 in ANSI). The alternative method to request a C-cedilla is
to type ALT+0199, which specifies the character's position in the ANSI
character set.
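
The keypad rules and the two examples above can be modeled with a
minimal Python sketch. It is not the Windows mapping table: it assumes
the cp437 and cp1252 codecs approximate the OEM and ANSI sets, and it
substitutes a single underscore for every OEM character that has no
ANSI equivalent, whereas the real table chooses a closest match per
character. The function names oem_to_ansi and alt_entry are
hypothetical.

```python
def oem_to_ansi(code, fallback="_"):
    """Map an OEM position (0-255) to an approximate ANSI position."""
    ch = bytes([code]).decode("cp437")      # assumption: OEM is cp437
    try:
        return ch.encode("cp1252")[0]       # assumption: ANSI is cp1252
    except UnicodeEncodeError:
        # Simplification: one fallback for every unmapped character.
        return ord(fallback)

def alt_entry(digits):
    """Model ALT+digits typed into a control that uses an ANSI-set font."""
    code = int(digits)
    if digits.startswith("0"):
        return code                # ALT+0xxx: position used directly
    return oem_to_ansi(code)       # ALT+xxx: OEM code, then converted

c_cedilla_oem = alt_entry("128")    # OEM position of C-cedilla
c_cedilla_ansi = alt_entry("0199")  # ANSI position typed directly
square_root = alt_entry("251")      # no ANSI equivalent in this model
```

In this model, both ways of typing C-cedilla produce the same ANSI
position, while the square-root code falls back to the underscore.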

An edit control that uses the ES_OEMCONVERT style and a combo box that
uses the CBS_OEMCONVERT style behave differently from the controls
described above. These two styles cause the text contents to be
converted from lowercase letters to uppercase letters, then from the
ANSI set to the OEM set, and then back to the ANSI set for display.
This behavior is important for an edit control in which the user
specifies a filename. Without the conversion, if the user entered
characters that do not exist in the underlying OEM character set, the
name of the file would differ from the name specified by the user,
which would be confusing. Because the characters are mapped into
characters that exist in the OEM character set, the filename specified
always matches the filename actually used. The contents are converted
to uppercase characters because it is customary in some languages to
eliminate diacritical marks when a character is in uppercase, and the
OEM character set does not contain uppercase characters with these
diacritical marks.
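
The effect of this round trip on a filename can be approximated in
Python. This is only a sketch: it works on Unicode strings rather than
ANSI bytes, assumes cp437 as the OEM set, and approximates the closest
match by stripping the diacritical mark, which is not necessarily what
the Windows mapping table does. The names closest_oem and oem_convert
are hypothetical.

```python
import unicodedata

def closest_oem(ch):
    """Return ch if cp437 has it, else its base letter, else '_'."""
    try:
        ch.encode("cp437")
        return ch
    except UnicodeEncodeError:
        pass
    # Assumption: dropping the diacritical mark gives the closest match.
    base = unicodedata.normalize("NFD", ch)[0]
    try:
        base.encode("cp437")
        return base
    except UnicodeEncodeError:
        return "_"

def oem_convert(text):
    """Uppercase the text, then keep only characters the OEM set has."""
    return "".join(closest_oem(ch) for ch in text.upper())

kept = oem_convert("müller.txt")    # U-umlaut exists in cp437
folded = oem_convert("crème.doc")   # uppercase E-grave does not
```

Here "kept" retains its umlaut, while "folded" loses the grave accent
because cp437 has no uppercase E-grave, so the name shown to the user
matches the name that can actually be stored.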

Displaying a String Containing OEM-Set Characters
in an Application that Uses the ANSI Character Set
--------------------------------------------------

Text editors running under MS-DOS use the OEM character set for
display and in the files they create. When a Windows-based text editor
loads a file that uses the OEM character set, the editor interprets
the characters according to the ANSI character set. Character
positions 32 through 127 are not affected under most code pages
because both the ANSI and OEM character sets have identical
characters. However, character positions greater than 127 may be
displayed differently than in the MS-DOS-based text editor because the
character positions represent different characters in the ANSI
character set.
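
The mismatch can be reproduced outside Windows with a short Python
sketch that writes text as OEM bytes and then reads the same bytes as
ANSI. It assumes cp437 for the OEM set and cp1252 for the ANSI set;
the function name show_as_ansi and the sample word are illustrative
only.

```python
def show_as_ansi(oem_bytes):
    """Interpret bytes written by an MS-DOS editor as ANSI text."""
    # Undefined ANSI positions become U+FFFD instead of raising.
    return oem_bytes.decode("cp1252", errors="replace")

# An MS-DOS editor stores the sharp s at OEM position 225.
oem_bytes = "Straße".encode("cp437")

# A Windows editor shows whatever ANSI defines at position 225.
shown = show_as_ansi(oem_bytes)
```

The letters in positions 32 through 127 survive unchanged, but the one
extended character is displayed as a different letter because no
conversion took place.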

The solution to this difficulty is to use a Windows-based text editor
that uses the ANSI character set when the text contains characters in
both the OEM and ANSI character sets. A Windows-based editor accepts
ANSI-set characters directly and converts OEM-set characters to the
closest matching ANSI-set characters. The resulting text contains only
ANSI-set characters, which can be displayed by any application running
under the Windows environment. If an application must display OEM-set
characters that are not in the ANSI character set, it must use an
OEM-set font.

Consider the following example: An MS-DOS-based text editor is used to
edit an application's resource file on a system with OEM character-set
code page 437 installed. The user types ALT+129 as part of the static
text to label a button in a dialog box. However, when the dialog box
is displayed, the text is not as expected but contains a black
rectangle where the u-umlaut character belongs. The black rectangle is
used to signify character positions that are not defined in the ANSI
character set.

The workaround for this problem is to edit the resource file with a
Windows-based text editor that uses the ANSI character set. Typing
ALT+129 will create a u-umlaut as desired because the editor will
convert the OEM-set character to the closest matching ANSI-set
character. In this case, OEM-set character position 129 maps to
ANSI-set character position 252. The alternative method to specify
u-umlaut in the Windows-based editor is to type ALT+0252, using its
ANSI character-set position directly.
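
When the resource file already exists with OEM bytes in it, the same
conversion can be applied to the file contents in bulk. The following
Python sketch shows the idea under stated assumptions: cp437 models
the OEM set, cp1252 models the ANSI set, unmapped characters become an
underscore, and the PUSHBUTTON line is an invented sample. The
function name convert_oem_to_ansi is hypothetical; on Windows, the
OemToAnsi function described earlier performs the real conversion.

```python
def convert_oem_to_ansi(data, fallback="_"):
    """Re-encode OEM (cp437) bytes as approximate ANSI (cp1252) bytes."""
    out = bytearray()
    for ch in data.decode("cp437"):
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Simplification: one fallback for every unmapped character.
            out += fallback.encode("ascii")
    return bytes(out)

# Sample line as saved by an MS-DOS editor: u-umlaut is byte 129.
oem_line = b'PUSHBUTTON "Men\x81", 101'
ansi_line = convert_oem_to_ansi(oem_line)
```

After conversion, the u-umlaut sits at ANSI position 252, so the
dialog box label displays as intended with an ANSI-set font.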

As another example, an application requires the square-root symbol,
which does not exist in the ANSI character set, as part of a button
label. Assuming that code page 437 is installed and that the resource
file is edited under Windows, enter ALT+0251 in the button label
because the square-root symbol is at position 251 in the OEM
character set. When the application runs, send a WM_SETFONT message
to the control, specifying an OEM-set font. An OEM-set font is always
available from the GetStockObject function through its OEM_FIXED_FONT
index.

For more information on code pages and character sets under Windows,
query on the following words in the Microsoft Knowledge Base:

   prod(winsdk) and code and pages and character and sets

For a reference to a Windows Developer's Note regarding this subject,
query on the following word in the Microsoft Knowledge Base:

   INTLAPPS

Additional reference words: 3.00 3.10 folding