Summary

Windows 95 encodes characters using the code page model inherited from Windows 3.1, though Windows 95 can support more than one code page at a time. (See Chapter 6.) Single-byte code pages are limited to 256 characters. Each double-byte code page, which is what Far East editions of Windows 95 use, covers several thousand characters.

Double-byte character sets on Windows include a mix of 1-byte and 2-byte characters. When parsing strings in a DBCS environment, always treat a 2-byte character as a single unit. Never separate a lead byte from its trail byte.
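The parsing rule above can be sketched in portable C. This is only an illustration, not the Windows implementation: the helper names are hypothetical, and the lead-byte ranges are hard-coded on the assumption that the string is in code page 932 (Japanese). A real Windows program would ask the system whether a byte is a lead byte (for example, with IsDBCSLeadByte) and would step through the string with CharNext rather than hard-coding ranges.

```c
#include <stddef.h>

/* Assumption: code page 932 lead bytes fall in 0x81-0x9F and 0xE0-0xFC.
   Hypothetical helper; on Windows, query the system instead. */
static int is_lead_byte_cp932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count characters (not bytes) in a NUL-terminated DBCS string.
   A lead byte and its trail byte are always consumed together. */
size_t dbcs_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        if (is_lead_byte_cp932(*p) && p[1] != '\0')
            p += 2;   /* 2-byte character: skip lead and trail byte as a unit */
        else
            p += 1;   /* 1-byte character, or a stray lead byte at end of string */
        count++;
    }
    return count;
}
```

Because the loop never advances by a single byte when it is positioned on a lead byte, it cannot land in the middle of a 2-byte character, which is the failure mode that byte-at-a-time parsing produces.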

Windows NT encodes characters in Unicode, which is a fixed-width, 16-bit industry standard that encompasses most of the characters currently used on computers. Unicode can simplify the process of sharing multilingual data if your goal is to create a single, internationally aware code base for Windows-based applications.

Because it is only a character encoding, Unicode does not directly address sorting, font, or layout issues.

Programs that create Unicode-based documents can use compression algorithms to keep files from doubling in size.

Commercial products based on the Unicode standard are becoming available. Even if you do not choose a full Unicode implementation for your Windows-based software, add support for converting Unicode data.

In the long term, all Microsoft operating-system technology will be based on Unicode. At present, the Win32 API string-handling functions exist in two forms: one that expects string parameters to be in Unicode (-W entry points) and one that expects string parameters to be expressed in Windows code pages (-A entry points).

Windows NT supports both -A and -W API function calls, whereas Win32s and Windows 95 support only -A API function calls. You can create a single source-code base for both types of function calls by using generic prototypes.
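The idea behind generic prototypes can be sketched as follows. The names here (DEMO_UNICODE, DEMO_TCHAR, DEMO_TEXT, demo_tcslen) are hypothetical stand-ins chosen so that the example compiles anywhere; in a real Win32 program the corresponding pieces are the UNICODE and _UNICODE symbols, the TCHAR type, the TEXT macro, and generic function names that the headers map to the -A or -W entry point.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the generic types and macros in the
   Windows headers. Defining DEMO_UNICODE selects the wide forms. */
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_TEXT(s) L##s
#define demo_tcslen  wcslen
#else
typedef char DEMO_TCHAR;
#define DEMO_TEXT(s) s
#define demo_tcslen  strlen
#endif

/* Written once; compiles as either a code-page or a Unicode routine. */
size_t demo_title_length(void)
{
    const DEMO_TCHAR *title = DEMO_TEXT("Untitled");
    return demo_tcslen(title);   /* length in characters, not bytes */
}

/* Buffer sizes are derived from the character type, never assumed. */
size_t demo_title_bytes(void)
{
    return (demo_title_length() + 1) * sizeof(DEMO_TCHAR);
}
```

The source contains no direct mention of char or wchar_t outside the conditional block, so switching the whole program between the two forms is a matter of one compile-time definition.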

The Win32 API contains two functions, WideCharToMultiByte and MultiByteToWideChar, for converting between Unicode and Windows code pages. Windows also supports clipboard formats for converting between Unicode and other encodings.

Both the Win32 SDK and Visual C++ 2 ship with sample code that can help you learn techniques for programming for Unicode.

The Visual C++ 2 run-time libraries contain functions for processing wide characters. The same functions also exist in versions that process single-byte characters and multibyte characters.
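As a rough illustration of the parallel function families, the sketch below uses only standard C run-time functions: strlen and wcslen are the single-byte and wide-character versions of the same operation, and mbstowcs converts a multibyte string to wide characters using the current C locale. The wrapper name widen is hypothetical, and this is a stand-in for, not an example of, the Win32 conversion functions, which take an explicit code page.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper: convert a multibyte string to wide characters.
   dst_count is the size of dst in wide characters, including the
   terminator. Returns the number of wide characters written (without
   the terminator), or (size_t)-1 on error. */
size_t widen(const char *src, wchar_t *dst, size_t dst_count)
{
    size_t n;

    if (dst_count == 0)
        return (size_t)-1;

    /* Leave room for the terminator; mbstowcs does not add one
       when it fills the whole buffer. */
    n = mbstowcs(dst, src, dst_count - 1);
    if (n == (size_t)-1)
        return n;            /* invalid multibyte sequence */

    dst[n] = L'\0';
    return n;
}
```

After a successful call, wcslen(dst) plays the role that strlen(src) plays for the original string: both report a count of characters, but each wide character occupies sizeof(wchar_t) bytes rather than one.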

By following several basic steps, you can easily change existing programs so that they can use Unicode. You need to convert code to use wide-character function calls and data types, revise code that assumes that characters are 8 bits wide, and add support for special Unicode characters such as the byte order mark or combining characters.
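One of the steps above, handling the byte order mark, can be sketched in a few lines. The enum and function names are hypothetical. The sketch relies on the byte order mark being U+FEFF, which appears as the bytes FF FE when 16-bit text is stored least significant byte first and as FE FF when it is stored most significant byte first.

```c
#include <stddef.h>

/* Hypothetical result type for the sketch. */
enum byte_order { ORDER_UNKNOWN, ORDER_LITTLE_ENDIAN, ORDER_BIG_ENDIAN };

/* Inspect the first two bytes of a buffer for a byte order mark. */
enum byte_order detect_byte_order(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return ORDER_LITTLE_ENDIAN;   /* U+FEFF stored low byte first */
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return ORDER_BIG_ENDIAN;      /* U+FEFF stored high byte first */
    }
    return ORDER_UNKNOWN;                 /* no mark; caller must decide */
}
```

A file without a mark is not necessarily non-Unicode, so a caller that receives ORDER_UNKNOWN still needs some other way to decide how to interpret the data.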