DBCS-Enabling Your Core-Code Base

As mentioned in Chapter 2, a good internationalization shortcut is to use a single source-code base for all language editions of a program. This means that all language editions are built from the same source files, but it can also mean that all language editions share some or all of the executable code. With a fully run-time DBCS-enabled code base, any language edition can handle double-byte characters when running on a DBCS edition of the operating system. A user can run an English application on a Japanese edition of Windows, freely typing in and editing kanji strings without problems. For example, Microsoft Visual C++ 2's integrated editing environment is fully DBCS-enabled. If you run Visual C++ 2 on Japanese Windows, you can put kanji literal characters and strings in your source files. The following is an example of fully run-time DBCS-enabled code:

// an example of a fully run-time DBCS-enabled function
int charcount (char *pszStr)
{
    int count;
    for (count = 0; *pszStr; pszStr = CharNext(pszStr))
        ++count;
    return count;
}

This might seem like a great scheme at first glance, but constantly calling the system API CharNext in the inner loop is needlessly expensive, especially when the application is running on a non-DBCS platform. (How many French users will actually run your program on a Japanese edition of Windows?) Not only will the code be less efficient, but its buffers will have to be up to twice as large in order to hold double-byte characters.

Run-Time Optimization

If having fully run-time DBCS-enabled code is important, optimization can help. One option for run-time optimization is to bypass the system by writing your own version of CharNext, using information about the code page provided by the Win32 API GetCPInfo. The example in Figure 3-6 avoids the overhead of making a system call in the inner loop and uses an inline function to keep the code readable.

CPINFO CPInfo;      // a Windows-defined structure for code-page info
BYTE *vbLBRange;    // table of lead-byte range values, which can vary
                    // in length depending on the code page
BOOL vfDBCS;        // Are we running on a DBCS edition of Windows?

{
    // ...somewhere in the initialization code...
    GetCPInfo(CP_ACP, &CPInfo);
    vbLBRange = CPInfo.LeadByte;
    vfDBCS = (CPInfo.MaxCharSize > 1);  // Is the max length in bytes of
                                        // a character in this code
                                        // page more than 1?
}

...

inline char* MyCharNext (char *pszStr)
{
    BYTE bRange = 0;
    BYTE bCh = (BYTE)*pszStr;   // Compare as an unsigned value; lead
                                // bytes are 0x80 or greater.

    // Check to see whether *pszStr is a lead byte. The constant 12
    // allows for up to 6 pairs of lead-byte range values.
    while ((bRange < 12) && (vbLBRange[bRange] != 0))
    {
        if ((bCh >= vbLBRange[bRange]) &&
            (bCh <= vbLBRange[bRange+1]))
            return (pszStr + (pszStr[1] ? 2 : 1)); // Skip two bytes, but
                                                   // never step past the
                                                   // terminating null.

        bRange += 2; // Go to the next pair of range values.
    }

    return (pszStr + 1); // Skip one byte.
}

Figure 3-6 By writing your own version of CharNext, you optimize performance by avoiding a system call for a heavily used operation.
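To see the table walk in isolation, the following is a minimal sketch that compiles without windows.h. It assumes a hard-coded array laid out the same way as the LeadByte member of CPINFO (pairs of inclusive low and high values, ended by a pair of zeros) and fills it with the Shift-JIS (code page 932) lead-byte ranges. The names kLeadByteRanges, is_lead_byte, my_char_next, and char_count are hypothetical. The table is hard-coded only to keep the sketch self-contained; production code would take the ranges from GetCPInfo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CPINFO.LeadByte: up to 6 (low, high) pairs,
   terminated by a pair of zeros. These two pairs are the Shift-JIS
   (code page 932) lead-byte ranges; the rest is zero-filled. */
#define LEAD_TABLE_BYTES 12
static const unsigned char kLeadByteRanges[LEAD_TABLE_BYTES] = {
    0x81, 0x9F, 0xE0, 0xFC, 0, 0
};

/* Returns 1 if ch falls inside any lead-byte range in the table. */
static int is_lead_byte(unsigned char ch)
{
    size_t i;
    for (i = 0; i + 1 < LEAD_TABLE_BYTES && kLeadByteRanges[i] != 0; i += 2)
    {
        if (ch >= kLeadByteRanges[i] && ch <= kLeadByteRanges[i + 1])
            return 1;
    }
    return 0;
}

/* Advances by one character: two bytes for a lead byte followed by a
   trail byte, one byte otherwise. Never moves past the terminator. */
static const char *my_char_next(const char *psz)
{
    assert(psz != NULL);
    if (*psz == '\0')
        return psz;
    if (is_lead_byte((unsigned char)*psz) && psz[1] != '\0')
        return psz + 2;
    return psz + 1;
}

/* Counts characters, not bytes. */
static int char_count(const char *psz)
{
    int count = 0;
    while (*psz)
    {
        psz = my_char_next(psz);
        ++count;
    }
    return count;
}
```

With this table, char_count("A\x93\xFA\x96\x7B") counts three characters in five bytes, because 0x93 and 0x96 each begin a double-byte character. Note the cast to unsigned char before the range comparison: on compilers where char is signed, bytes of 0x80 and above would otherwise compare as negative values and never match.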

A further optimization would be to make DBCS-related calls only when the program is running on a DBCS platform. (See Figure 3-7.) You'll find that this amount of effort pays off only in the most frequently called code.

// fully DBCS-enabled code
int charcount (char *pszStr)
{
    int count;
    if (vfDBCS)
    {
        for (count = 0; *pszStr; pszStr = MyCharNext(pszStr))
            ++count;
    }
    else
    {
        for (count = 0; *pszStr; pszStr++)
            ++count;
    }

    return count;
}

Figure 3-7 Making DBCS-related calls only when a program runs on a DBCS platform enhances the performance of frequently called code.
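One possible refinement, which is not part of the figures above, is to expand the lead-byte ranges into a 256-entry lookup table during initialization so that the inner loop tests a byte with a single array access instead of walking the range pairs. The sketch below assumes range tables laid out like the LeadByte member of CPINFO; g_isLead, g_fDBCS, init_lead_table, char_count, and the two sample tables are hypothetical names, and the Shift-JIS values are hard-coded only so that the sketch compiles without windows.h.

```c
#include <assert.h>
#include <string.h>

/* Sample range tables in CPINFO.LeadByte layout (hypothetical data):
   (low, high) pairs terminated by a pair of zeros. */
static const unsigned char kShiftJisRanges[12]   = { 0x81, 0x9F, 0xE0, 0xFC, 0, 0 };
static const unsigned char kSingleByteRanges[12] = { 0 };

static unsigned char g_isLead[256]; /* 1 if the byte value is a lead byte */
static int g_fDBCS;                 /* nonzero if any lead-byte range exists */

/* Expands the range pairs into the lookup table. Call once at startup. */
static void init_lead_table(const unsigned char *ranges, size_t cb)
{
    size_t i;
    unsigned int v;

    assert(ranges != NULL);
    memset(g_isLead, 0, sizeof g_isLead);
    g_fDBCS = 0;

    for (i = 0; i + 1 < cb && ranges[i] != 0; i += 2)
    {
        for (v = ranges[i]; v <= ranges[i + 1]; ++v)
            g_isLead[v] = 1;
        g_fDBCS = 1;
    }
}

/* Counts characters, taking the single-byte fast path when possible. */
static int char_count(const char *psz)
{
    int count = 0;

    if (!g_fDBCS)
        return (int)strlen(psz);

    while (*psz)
    {
        unsigned char ch = (unsigned char)*psz;
        /* Two bytes for a lead byte with a trail byte, else one byte. */
        psz += (g_isLead[ch] && psz[1] != '\0') ? 2 : 1;
        ++count;
    }
    return count;
}
```

The table costs 256 bytes and one pass at startup. Whether that trade is worthwhile depends on how often the test runs, so it is worth measuring before adopting it.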

Dual Compilation

Another widely used approach to DBCS enabling is dual compilation. Sections of string-handling code bracketed by

#ifdef DBCS
...
#else
...
#endif

allow you to use one set of source code files and substitute code using a compile-time switch. With this approach, the DBCS-enabled code doesn't affect your program when it's compiled with the DBCS switch off, which is a great advantage. The main disadvantage of this method is that in effect it creates a dual code base that you have to compile, test, and maintain separately. An example of this approach is shown below.

int charcount (char *pszStr)
{
    int count;
#ifndef DBCS
    for (count = 0; *pszStr; pszStr++)
#else
    for (count = 0; *pszStr; pszStr = CharNext(pszStr))
    // (or you could use MyCharNext)
#endif
        ++count;
    return count;
}

Macros and Inline Functions

You could greatly reduce the number of #ifdef DBCS blocks (and greatly increase the ease of maintaining the code) by defining several macros.

#ifndef DBCS
#define CharNext(pc)          ((*(pc)) ? (pc) + 1 : (pc))
#define CharPrev(pcStart, pc) (((pc) > (pcStart)) ? (pc) - 1 : (pcStart))

#ifndef WIN32
#define IsDBCSLeadByte(bByte) (FALSE)
#endif
#endif

// The macro Dbcs can surround code that is DBCS-only.
#ifdef DBCS
#define Dbcs(x) (x)
#else
#define Dbcs(x) // Do nothing.
#endif

Visual C++ developers can sometimes use inline functions instead of macros, thus gaining the benefit of easy code maintenance without the potential traps that come with simple text substitution.

Notice that because the sample code in Figure 3-6 calls GetCPInfo, it will work on any Far East edition of Windows. Unfortunately, a large body of existing software hard-codes the lead-byte or trail-byte ranges shown in Figure 3-5 and thus has to be edited and recompiled to work with different DBCS code pages. Keep these functions in mind as you try to spare yourself additional work.