Code Pages and Unicode

A code page is an ordered encoding of a standard set of characters for a specific locale. This encoding gives computers a consistent way to exchange and process data. Most code pages share a common set of core characters (the first 128 characters of the code page). Windows NT supports several types of code pages, including ANSI and OEM code pages. ANSI code pages are supported for Windows 3.1 compatibility; OEM code pages are supported for MS-DOS and OS/2 compatibility. Depending on the installed locale, other code pages are available for use in data translation. These include secondary OEM code pages, Macintosh (MAC) code pages, and EBCDIC code pages.
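The shared core and the differing upper half can be seen with a short sketch. It assumes Python's built-in codecs "cp1252" and "cp437" as stand-ins for the Windows 3.1 US (ANSI) and MS-DOS U.S. (OEM) code pages; those codec names come from Python, not from Windows NT.

```python
# Assumption: Python's "cp1252" and "cp437" codecs stand in for the
# ANSI 1252 and OEM 437 code pages.
core = bytes(range(128))

# The first 128 byte values decode to the same characters in both code pages.
same_core = core.decode("cp1252") == core.decode("cp437")

# Above 127 the same byte value means different characters.
extended = b"\xe9"
ansi_char = extended.decode("cp1252")  # LATIN SMALL LETTER E WITH ACUTE
oem_char = extended.decode("cp437")    # GREEK CAPITAL LETTER THETA

print(same_core, repr(ansi_char), repr(oem_char))
```

Text that stays within the first 128 characters therefore survives a move between these two code pages unchanged, while any byte above 127 has to be translated.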

The following table shows the various code pages supported in Windows NT.

Table C.2 Windows NT Code Pages

Code page name                        Number   Type

Windows 3.1 Eastern European          1250     ANSI
Windows 3.1 Cyrillic                  1251     ANSI
Windows 3.1 US (ANSI)                 1252     ANSI
Windows 3.1 Greek                     1253     ANSI
Windows 3.1 Turkish                   1254     ANSI
MS-DOS U.S.                           437      OEM
MS-DOS Greek                          737      OEM
MS-DOS Multilingual (Latin I)         850      OEM
MS-DOS Slavic (Latin II)              852      OEM
IBM Cyrillic (primarily Russian)      855      OEM
IBM Turkish                           857      OEM
MS-DOS Portuguese                     860      OEM
MS-DOS Icelandic                      861      OEM
MS-DOS Canadian-French                863      OEM
MS-DOS Nordic                         865      OEM
MS-DOS Russian (former USSR)          866      OEM
IBM Modern Greek                      869      OEM
Macintosh Roman                       10000
Macintosh Greek I                     10006
Macintosh Cyrillic                    10007
Macintosh Latin II                    10029
Macintosh Icelandic                   10079
Macintosh Turkish                     10081
EBCDIC                                037
EBCDIC "500V1"                        500
EBCDIC                                1026
EBCDIC                                875
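As a rough illustration of using the code page numbers in Table C.2 for data translation, the sketch below maps a few of them to Python codec names. The mapping and the helper decode_with_code_page are hypothetical and chosen for illustration. The codec names are Python's closest counterparts and are not part of Windows NT.

```python
import codecs

# Hypothetical mapping from a few code page numbers in the table to the
# names of Python's built-in codecs (an assumption, not a Windows NT API).
PYTHON_CODEC = {
    1252: "cp1252",      # Windows 3.1 US (ANSI)
    437: "cp437",        # MS-DOS U.S.
    850: "cp850",        # MS-DOS Multilingual (Latin I)
    10000: "mac_roman",  # Macintosh Roman
    37: "cp037",         # EBCDIC
    500: "cp500",        # EBCDIC "500V1"
}

def decode_with_code_page(data: bytes, number: int) -> str:
    """Decode bytes using the codec assumed for a code page number."""
    codec = codecs.lookup(PYTHON_CODEC[number])  # raises LookupError if missing
    return data.decode(codec.name)

# The letter "A" as stored under each code page.
encoded_a = {number: "A".encode(name) for number, name in PYTHON_CODEC.items()}
print(encoded_a)
```

In this sketch the ANSI, OEM, and Macintosh entries all store "A" as byte 0x41, while the two EBCDIC entries store it as 0xC1, which is why EBCDIC data always needs translation.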


Windows NT uses Unicode (the Basic Multilingual Plane, or BMP, of ISO/IEC 10646) for all internal text processing. In the version that Windows NT implements, Unicode is a 16-bit, fixed-width character encoding standard with enough encoding space to accommodate most of the world's modern characters. All character sets and code pages supported by Windows NT can be mapped to Unicode.
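The mapping from code pages to Unicode can be sketched as follows. The example assumes Python's built-in codecs for four of the code pages listed in Table C.2 and uses "utf-16-le" as a stand-in for a 16-bit representation; neither choice comes from Windows NT itself.

```python
# Assumption: Python codec names stand in for the Windows NT code pages.
E_ACUTE = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
CODE_PAGES = ["cp1252", "cp437", "cp850", "mac_roman"]

# The same character is stored as a different byte in different code pages.
byte_values = {cp: E_ACUTE.encode(cp) for cp in CODE_PAGES}

# Decoding each byte with its own code page yields the same Unicode code point.
code_points = {cp: ord(raw.decode(cp)) for cp, raw in byte_values.items()}

# In a 16-bit encoding the character occupies exactly two bytes.
utf16 = E_ACUTE.encode("utf-16-le")

print(byte_values, code_points, utf16)
```

Because every code page value maps to a single Unicode code point, text can be converted from one code page to another by passing through Unicode.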

By using Unicode-enabled applications, users can benefit from multilingual processing and a rich selection of characters.

For more information, see The Unicode Standard, Version 1.0 (The Unicode Consortium; Addison-Wesley Publishing Company, Inc., 1991).

Note: Most code pages have a core set of characters in common (the ASCII characters, which are the first 128 characters in the code page). In addition, each code page includes some unique "extended" characters that are not available in other code pages. Do not use these extended characters in server names, computer names, or share names, and do not use them with applications that are used across the network. The FAT and HPFS file systems, which use the OEM code page, must translate any character in a file name that they do not recognize to a best-fit character, to no character, or to an unrecognized character.
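The three translation outcomes described in the note can be approximated with a short sketch. It assumes Python's "cp437" codec as the OEM code page and uses Unicode NFKD decomposition as a rough substitute for best-fit mapping. The real FAT and HPFS translation tables are not reproduced here, and the file name is hypothetical.

```python
import unicodedata

# Hypothetical file name; its first character exists in ANSI code page 1252
# but not in OEM code page 437.
name = "\u0160koda.txt"  # LATIN CAPITAL LETTER S WITH CARON + "koda.txt"

# Outcome "no character": the unsupported character is dropped.
no_character = name.encode("cp437", errors="ignore")

# Outcome "unrecognized character": it is replaced by a placeholder.
unrecognized = name.encode("cp437", errors="replace")

# Outcome "best fit" (approximation): split off the accent, keep the base letter.
decomposed = unicodedata.normalize("NFKD", name)
best_fit = decomposed.encode("cp437", errors="ignore")

print(no_character, unrecognized, best_fit)
```

All three results differ from the original name, which is why names shared across the network are safest when limited to the first 128 characters.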