Summary
- Windows 95 encodes characters using the code page model
inherited from Windows 3.1, though Windows 95 can support
more than one code page at a time. (See Chapter 6.)
Single-byte code pages are limited to 256 characters.
Each double-byte code page, which is what Far East
editions of Windows 95 use, covers several thousand
characters.
- Double-byte character sets on Windows include a mix of
1-byte and 2-byte characters. When parsing strings in a
DBCS environment, always treat 2-byte characters as a
unit. Never separate a lead byte from its trail byte.
- Windows NT encodes characters in Unicode, which is a
fixed-width, 16-bit industry standard that encompasses
most of the characters currently used on computers.
Unicode can simplify the process of sharing multilingual
data if your goal is to create a single,
international-aware code base for Windows-based
applications.
- Because it is only a character encoding, Unicode does not
directly address sorting, font, or layout issues.
- Programs that create Unicode-based documents can use
compression algorithms to keep files from doubling in
size.
- Commercial products based on the Unicode standard are
becoming available. Even if you do not choose a full
Unicode implementation for your Windows-based software,
add support for converting Unicode data.
- In the long-term future, all Microsoft operating-system
technology will be based on Unicode. At present, the
Win32 API string-handling functions exist in two forms:
one that expects string parameters to be in Unicode (-W
entry points) and one that expects string parameters to
be expressed in Windows code pages (-A entry points).
- Windows NT supports both -A and -W API function calls,
whereas Win32s and Windows 95 support only -A API
function calls. You can create a single source-code base
for both types of function calls by using generic
prototypes.
- The Win32 API contains two functions, WideCharToMultiByte
and MultiByteToWideChar, for converting
between Unicode and Windows code pages. Windows also
supports clipboard formats for converting between Unicode
and other encodings.
- Both the Win32 SDK and Visual C++ 2 ship with sample code
that can help you learn techniques for programming for
Unicode.
- The Visual C++ 2 run-time libraries contain functions for
processing wide characters. The same functions also exist
in versions that process single-byte characters and
multibyte characters.
- By following several basic steps, you can easily change
existing programs so that they can use Unicode. You need
to convert code to use wide-character function calls and
data types, revise code that assumes that characters are
8 bits wide, and add support for special Unicode
characters such as the byte order mark or combining
characters.
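
The DBCS parsing rule above (never separate a lead byte from its trail byte) can be sketched in portable C. This is a minimal illustration under stated assumptions, not code from this book: the names is_lead_byte_cp932, dbcs_next, dbcs_char_count, and dbcs_find_char are hypothetical, and the lead-byte test hard-codes the ranges of code page 932 (0x81-0x9F and 0xE0-0xFC). A Win32 program would more likely call IsDBCSLeadByte and CharNext, which consult the active code page.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: lead-byte ranges of code page 932 (Shift-JIS).
   Win32 code would normally ask the system via IsDBCSLeadByte. */
int is_lead_byte_cp932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Advance one character: 2 bytes for a lead/trail pair, otherwise
   1 byte. A lead byte that is cut off by the terminator is stepped
   over as a single byte so the scan never runs past the end. */
const char *dbcs_next(const char *p)
{
    assert(p != NULL);
    if (*p == '\0')
        return p;
    if (is_lead_byte_cp932((unsigned char)*p) && p[1] != '\0')
        return p + 2;
    return p + 1;
}

/* Count characters (not bytes) in a NUL-terminated DBCS string. */
size_t dbcs_char_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s != '\0') {
        s = dbcs_next(s);
        n++;
    }
    return n;
}

/* Find a single-byte character c without ever matching a trail
   byte that happens to have the same value. Returns NULL if absent. */
const char *dbcs_find_char(const char *s, char c)
{
    assert(s != NULL);
    while (*s != '\0') {
        if (!is_lead_byte_cp932((unsigned char)*s) && *s == c)
            return s;
        s = dbcs_next(s);
    }
    return NULL;
}
```

For the 4-byte string 'A', 0x83, 0x41, 'B', dbcs_char_count reports 3 characters. The trail byte 0x41 has the same value as 'A', so a byte-by-byte search for 'A' in the 2-byte string 0x83, 0x41 would report a false match, whereas dbcs_find_char returns NULL.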