Do Not Limit Character Parsing to Latin Script

Operations that check whether a character falls between A and Z might work for English, Hawaiian, and Indonesian, but they exclude important characters in almost every other language in the world. A similar mistake is to assume that every character can be expressed in only 7 bits. That assumption holds only for the ASCII character set.

// Search until you find a nonalphabetic character.
while ((*pch >= 'A' && *pch <= 'Z') ||
       (*pch >= 'a' && *pch <= 'z'))
    pch++;

In this case, it is safer to call a system function, which returns the correct information based on the locale of the calling thread. (See Chapter 5.)

// Use the Win32 API call IsCharAlpha instead.
while (IsCharAlpha(*pch))
    pch++;
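To see why the range test fails, the following sketch compares it with the C run-time function isalpha. It is an illustration in standard C rather than Win32, and the helper names SkipAsciiLetters and SkipLocaleLetters are hypothetical. What isalpha accepts beyond A through Z depends on the locale the program has selected with setlocale, so the second function is only as good as that setting.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Range test from the text: stops at any character outside A-Z, a-z.
   Returns the number of characters skipped. */
size_t SkipAsciiLetters(const char *pch)
{
    size_t n = 0;
    assert(pch != NULL);
    while ((pch[n] >= 'A' && pch[n] <= 'Z') ||
           (pch[n] >= 'a' && pch[n] <= 'z'))
        n++;
    return n;
}

/* Locale-aware version: isalpha consults the current LC_CTYPE locale.
   The cast to unsigned char is needed because passing a negative
   char value to isalpha is undefined. */
size_t SkipLocaleLetters(const char *pch)
{
    size_t n = 0;
    assert(pch != NULL);
    while (isalpha((unsigned char)pch[n]))
        n++;
    return n;
}
```

For the Latin 1 string "caf\xE9" (café), SkipAsciiLetters returns 3 because it stops at the accented letter. SkipLocaleLetters can return 4, but only if the program has selected a locale whose character set is Latin 1; in the default "C" locale it recognizes only A through Z and a through z.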

Windows NT supports Unicode, and Windows 95 supports multiple code pages. Therefore, you cannot assume that the active character set is always Latin 1 ANSI. Neither can you assume a homogeneous network environment in which all machines use the same character encoding. The following code fragments assume a specific Windows code page and will not work on all systems:

if (ch == 223) // special case for German eszett (sharp s)

...

if ((*pch >= 0x81) && (*pch <= 0x9F))
    // Test to see whether the character is in lead-byte range
    // for Japanese CP 932.
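One way to avoid hard-coding such ranges is to treat them as data supplied by the system. The sketch below assumes the layout that the Win32 GetCPInfo function uses for the LeadByte member of CPINFO: pairs of starting and ending values, terminated by a pair of zeros. The function name IsLeadByteInRanges and the sample tables are hypothetical. In a Windows program the table would come from GetCPInfo, or the code would simply call IsDBCSLeadByte. Note that code page 932 has a second lead-byte range, 0xE0 through 0xFC, which the hard-coded test above misses.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if b falls in one of the lead-byte ranges, 0 otherwise.
   ranges holds (first, last) pairs and ends with a (0, 0) pair,
   the layout assumed here for CPINFO.LeadByte. */
int IsLeadByteInRanges(const unsigned char *ranges, unsigned char b)
{
    size_t i;
    assert(ranges != NULL);
    for (i = 0; ranges[i] != 0 || ranges[i + 1] != 0; i += 2) {
        if (b >= ranges[i] && b <= ranges[i + 1])
            return 1;
    }
    return 0;
}

/* Sample data for illustration only: lead-byte ranges of code page 932. */
static const unsigned char kCp932LeadBytes[] = { 0x81, 0x9F, 0xE0, 0xFC, 0, 0 };

/* A single-byte code page has no lead bytes, so its table is all zeros. */
static const unsigned char kNoLeadBytes[] = { 0, 0 };
```

Because the ranges are data rather than constants in the code, the same function gives the right answer for a single-byte code page: with kNoLeadBytes, no byte is ever reported as a lead byte.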

Chapter 4 describes how to localize your user interface into languages from different character sets. Chapter 6 explains in greater detail how to create more flexible applications by saving language and character set information with the application's documents. You need this information in order to display text with fonts that contain the right characters. Chapter 6 also describes layout functions introduced with Windows 95 that are useful for writing generic code and that will work for right-to-left or vertical text streams as well as for left-to-right text streams.