International Enabling
Most traditional C and C++ code makes assumptions about character and string manipulation that do not work well for international applications. While both MFC and the run-time library support Unicode or MBCS, there is still work for you to do. To guide you, this section explains the meaning of “international enabling” in Visual C++:
- Both Unicode and MBCS are enabled by means of portable data types in MFC function parameter lists and return types. These types are conditionally defined in the appropriate ways, depending on whether your build defines the symbol _UNICODE or the symbol _MBCS (which means DBCS). Different variants of the MFC libraries are automatically linked with your application, depending on which of these two symbols your build defines.
- Class library code uses portable run-time functions and other means to ensure correct Unicode or MBCS behavior.
- You still must handle certain kinds of internationalization tasks in your code:
  - Use the same portable run-time functions that make MFC portable under either environment.
  - Make literal strings and characters portable under either environment, using the _T macro. For more information, see Generic-Text Mappings in TCHAR.H.
  - Take precautions when parsing strings under MBCS. These precautions are not needed under Unicode. For more information, see MBCS Programming Tips.
  - Take care if you mix ANSI (8-bit) and Unicode (16-bit) characters in your application. It’s possible to use ANSI characters in some parts of your program and Unicode characters in others, but you cannot mix them in the same string.
  - Don’t “hard-code” strings in your application. Instead, make them STRINGTABLE resources by adding them to the application’s .rc file. Your application can then be localized without requiring source code changes or recompilation. For more information on STRINGTABLE resources, see the String Editor documentation in the Visual C++ User’s Guide.
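The generic-text idea behind the _T macro can be sketched as follows. This is a minimal illustration of the mechanism, not the contents of TCHAR.H: the names sketch_char, SKETCH_T, sketch_strlen, and the SKETCH_UNICODE switch are hypothetical stand-ins for TCHAR, _T, _tcslen, and _UNICODE, chosen so that the example compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for the TCHAR.H names (TCHAR, _T, _tcslen).
// Defining SKETCH_UNICODE plays the role of defining _UNICODE.
#ifdef SKETCH_UNICODE
typedef wchar_t sketch_char;
#define SKETCH_T(x) L##x            // literal becomes L"..." or L'.'
inline std::size_t sketch_strlen(const sketch_char* s) {
    assert(s != nullptr);
    return std::wcslen(s);
}
#else
typedef char sketch_char;
#define SKETCH_T(x) x               // literal is left as a narrow literal
inline std::size_t sketch_strlen(const sketch_char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}
#endif

// The same source line yields a narrow or a wide string,
// depending only on the build setting.
inline const sketch_char* greeting() {
    return SKETCH_T("Hello");
}
```

In real code you would include TCHAR.H and write _T("Hello") with the generic-text run-time functions instead of defining your own macros.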
Note: European and MBCS character sets have some characters, such as accented letters, with character codes of 0x80 or greater. Because most code uses signed characters, these characters are sign-extended to negative values when converted to int. This is a problem for array indexing because a negative index refers to memory outside the array.
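One common way to avoid the sign-extension problem is to convert each character to unsigned char before using it as an index. The sketch below assumes a 256-entry table; the helper names table_index and count_bytes are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Maps any char value to an index in the range 0..255.
// Going through unsigned char prevents sign extension.
inline std::size_t table_index(char c) {
    return static_cast<unsigned char>(c);
}

// Counts how often each byte value occurs in a NUL-terminated string.
// The caller supplies a table of 256 counters.
inline void count_bytes(const char* s, unsigned counts[256]) {
    assert(s != nullptr);
    for (; *s != '\0'; ++s) {
        // counts[*s] would index before the array for bytes >= 0x80
        // when char is signed; table_index keeps the index in range.
        ++counts[table_index(*s)];
    }
}
```

The same cast applies when passing a character to a function that takes an int character code.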
Languages that use MBCS, such as Japanese, need additional care. Because a character may consist of one or two bytes, always manipulate both bytes of a double-byte character together.
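The following sketch shows what stepping through an MBCS string one character at a time might look like. It assumes the Shift-JIS lead-byte ranges of code page 932 (0x81 to 0x9F and 0xE0 to 0xFC) and uses hypothetical helpers, is_lead_byte_sketch and count_mbcs_chars, so that it compiles anywhere. In a Visual C++ build you would normally rely on the run-time library's MBCS routines, such as _ismbblead and _mbsinc, which follow the active code page.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: Shift-JIS (code page 932) lead bytes are
// 0x81-0x9F and 0xE0-0xFC. Other code pages use other ranges.
inline bool is_lead_byte_sketch(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters (not bytes) in a NUL-terminated MBCS string.
inline std::size_t count_mbcs_chars(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (*s != '\0') {
        const unsigned char b = static_cast<unsigned char>(*s);
        // Advance over the lead and trail byte together. A lead byte
        // at the very end of the string is treated as a single byte
        // so the loop never steps past the terminator.
        s += (is_lead_byte_sketch(b) && s[1] != '\0') ? 2 : 1;
        ++n;
    }
    return n;
}
```

Note that a trail byte can have the same value as an ASCII character (0x7B, for example, is also '{'), which is why scanning such a string byte by byte can misread the second half of a character.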
See Also Internationalization Strategies