Character Codes

Characters are represented by character codes. Character codes are generated and stored when a user inputs a document. Single-Byte character sets (SBCS) provide 256 character codes (2^8). This is an adequate number to encode most of the characters needed for Western Europe. For example, the Windows Extended ANSI character set contains 256 characters consisting of Latin letters, Arabic numerals, punctuation, and drawing characters.
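As a rough illustration of the one-byte-per-character property, the following sketch encodes a short string with Python's built-in "cp1252" codec, used here as a stand-in for the Windows Western European code page. The sample string is an assumption chosen for illustration.

```python
# Each character in a Single-Byte character set maps to exactly one byte.
text = "Café £5"
encoded = text.encode("cp1252")  # Windows code page 1252 (Western European)

# One character code (0-255) per character.
codes = list(encoded)
print(codes)

# The number of bytes equals the number of characters.
print(len(text), len(encoded))

# A single byte can hold 2^8 distinct values.
print(2 ** 8)
```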

However, 256 character codes are not enough to represent all the characters needed by multilingual users in a single font, or by users in the Far East, where over 12,000 characters may need to be addressed at any one time. Consequently, Multi-Byte character sets (commonly known as Double-Byte character sets) are necessary. Double-Byte character sets (DBCS) are a mixture of Single-Byte and Double-Byte character encodings and provide over 65,000 character codes (2^16).

Unicode

Unicode is a 16-bit encoding that encompasses many characters used in general text interchange throughout the world. Each Unicode index refers unambiguously to a given character. Unicode allows a larger range of characters to be addressed than is possible using a Single-Byte character encoding. All Unicode values are Double-Byte, which simplifies the way a Unicode-based system reads a string of text. In comparison, a Double-Byte system must determine which values in a string are Single-Byte character codes and which are Double-Byte character codes.

NT internally uses Unicode for character encoding. Under NT, applications can still support existing Single-Byte codepages (discussed below) using the NLS APIs. DBCS-to-Unicode mappings are handled via the MultiByteToWideChar and WideCharToMultiByte APIs.
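The round trip between a codepage and Unicode can be sketched in a portable way. The two helpers below are hypothetical stand-ins built on Python's codecs; they only mimic the direction of the mapping performed by MultiByteToWideChar and WideCharToMultiByte and do not call or reproduce the Windows APIs themselves.

```python
def multibyte_to_wide(data: bytes, codepage: str) -> bytes:
    # Hypothetical helper: codepage bytes -> 16-bit Unicode code units (UTF-16LE).
    return data.decode(codepage).encode("utf-16-le")


def wide_to_multibyte(wide: bytes, codepage: str) -> bytes:
    # Hypothetical helper: 16-bit Unicode code units (UTF-16LE) -> codepage bytes.
    return wide.decode("utf-16-le").encode(codepage)


sjis = "日本語".encode("cp932")          # DBCS input (Shift-JIS)
wide = multibyte_to_wide(sjis, "cp932")  # mapped to Unicode
round_trip = wide_to_multibyte(wide, "cp932")

print(len(sjis), len(wide))
print(round_trip == sjis)
```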

Windows 95 does not use Unicode internally for character encoding. However, Windows 95 is able to handle Multi-Byte character sets, and is able to map to Unicode using the international APIs (such as MultiByteToWideChar, mentioned above).