To simplify transporting code for international use, the Microsoft run-time library provides Microsoft-specific “generic-text” mappings for many data types, routines, and other objects. You can use these mappings, which are defined in TCHAR.H, to write generic code that can be compiled for single-byte, multibyte, or Unicode character sets, depending on a manifest constant you define with a #define statement. Generic-text mappings are Microsoft extensions that are not ANSI compatible.
Using the header file TCHAR.H, you can build single-byte, MBCS, and Unicode applications from the same sources. TCHAR.H defines macros prefixed with _tcs, which, with the correct preprocessor definitions, map to str, _mbs, or wcs functions as appropriate. To build MBCS, define the symbol _MBCS. To build Unicode, define the symbol _UNICODE. To build a single-byte application, define neither (the default). By default, _MBCS is defined for MFC applications.
The _TCHAR data type is defined conditionally in TCHAR.H. If the symbol _UNICODE is defined for your build, _TCHAR is defined as wchar_t; otherwise, for single-byte and MBCS builds, it is defined as char. (In the Microsoft implementation, wchar_t, the basic Unicode wide-character data type, is the 16-bit counterpart to an 8-bit signed char.) For international applications, use the _tcs family of functions, which operate in _TCHAR units, not bytes. For example, _tcsncpy copies n _TCHARs, not n bytes.
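The difference between _TCHAR units and bytes matters most when sizing buffers. The following is a minimal sketch rather than a definitive implementation: COUNT_OF and copy_text are hypothetical names introduced here, and the #else branch is an assumed SBCS-only fallback so that the fragment also builds with toolchains that do not ship TCHAR.H.

```c
#include <stddef.h>

#if defined(_WIN32)
#include <tchar.h>            /* the real generic-text mappings */
#else
/* Assumed fallback for toolchains without TCHAR.H: SBCS mappings only. */
#include <string.h>
typedef char _TCHAR;
#define _T(x)     x
#define _tcsncpy  strncpy
#endif

/* Element count of a fixed-size array, in _TCHAR units rather than bytes. */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: copy src into dst, where dst_count is a number of
   _TCHARs. Truncates if necessary and always null-terminates. */
static void copy_text(_TCHAR *dst, size_t dst_count, const _TCHAR *src)
{
    if (dst_count == 0)
        return;
    _tcsncpy(dst, src, dst_count - 1);
    dst[dst_count - 1] = 0;
}
```

A call such as copy_text(buf, COUNT_OF(buf), _T("abcdef")) passes the capacity in elements. Passing sizeof(buf) instead would overstate the capacity in a _UNICODE build, where each _TCHAR occupies more than one byte.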
Because some SBCS string-handling functions take (signed) char* parameters, a type mismatch compiler warning will result when _MBCS is defined. There are three ways to avoid this warning, listed in order of efficiency:
Preprocessor Directives for Generic-Text Mappings
#define | Compiled Version | Example |
_UNICODE | Unicode (wide-character) | _tcsrev maps to _wcsrev |
_MBCS | Multibyte-character | _tcsrev maps to _mbsrev |
None (the default: neither _UNICODE nor _MBCS defined) | SBCS (ASCII) | _tcsrev maps to strrev |
For example, the generic-text function _tcsrev, defined in TCHAR.H, maps to _mbsrev if you have defined _MBCS in your program, or to _wcsrev if you have defined _UNICODE. Otherwise, _tcsrev maps to strrev. TCHAR.H also provides other data type mappings for programming convenience, but _TCHAR is the most useful.
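The mapping mechanism itself is ordinary conditional compilation. The sketch below is a hypothetical miniature of a generic-text header, not the contents of TCHAR.H: every DEMO_ and demo_ name is invented for illustration, and the multibyte case is left out because the _mbs routines are Microsoft-specific.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical miniature of a generic-text header. The DEMO_/demo_ names
   are invented here; the real TCHAR.H is larger and differs in detail. */
#if defined(DEMO_UNICODE)
typedef wchar_t demo_tchar;
#define DEMO_T(x)    L##x        /* turn "text" into L"text" */
#define demo_tcslen  wcslen
#else
typedef char demo_tchar;
#define DEMO_T(x)    x           /* leave the literal unchanged */
#define demo_tcslen  strlen
#endif

/* Generic code: written once, compiled against whichever mapping is active. */
static size_t greeting_length(void)
{
    const demo_tchar *greeting = DEMO_T("hello");
    return demo_tcslen(greeting);
}
```

Compiling the same file with and without DEMO_UNICODE defined yields a wide-character build and a single-byte build of greeting_length, which is the pattern the table above describes for _UNICODE.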
Generic-Text Data Type Mappings
Generic-Text Data Type Name | _UNICODE & _MBCS Not Defined | _MBCS Defined | _UNICODE Defined |
_TCHAR | char | char | wchar_t |
_TINT | int | int | wint_t |
_TSCHAR | signed char | signed char | wchar_t |
_TUCHAR | unsigned char | unsigned char | wchar_t |
_TXCHAR | char | unsigned char | wchar_t |
_T or _TEXT | No effect (removed by preprocessor) | No effect (removed by preprocessor) | L (converts following character or string to its Unicode counterpart) |
For a complete list of generic-text mappings of routines, variables, and other objects, see Appendix B, Generic-Text Mappings in the Run-Time Library Reference.
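As an illustration of the _TCHAR and _T rows in the table above, here is a small sketch in which both the string type and the literals are generic. The helper has_prefix is a hypothetical name, and the #else branch is an assumed SBCS-only fallback for toolchains without TCHAR.H.

```c
#include <stddef.h>

#if defined(_WIN32)
#include <tchar.h>            /* the real generic-text mappings */
#else
/* Assumed fallback for toolchains without TCHAR.H: SBCS mappings only. */
#include <string.h>
typedef char _TCHAR;
#define _T(x)     x
#define _tcslen   strlen
#define _tcsncmp  strncmp
#endif

/* Hypothetical helper: nonzero if text begins with prefix. Both arguments
   use _TCHAR, so the function follows the active character model. */
static int has_prefix(const _TCHAR *text, const _TCHAR *prefix)
{
    return _tcsncmp(text, prefix, _tcslen(prefix)) == 0;
}
```

Callers wrap literals in _T, for example has_prefix(name, _T("uni")), so that the literal becomes a wide string only when _UNICODE is defined.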
Note: Do not use the str family of functions with Unicode strings, which are likely to contain embedded null bytes. Similarly, do not use the wcs family of functions with MBCS (or SBCS) strings.
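The effect of embedded null bytes can be seen with standard C alone. The sketch below deliberately misuses strlen on wide-character data to show the failure; byte_view_length is a hypothetical name, and the exact value it returns depends on the platform's byte order and the size of wchar_t.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical demonstration helper: the length a byte-oriented str
   routine reports when handed wide-character data. This is deliberately
   wrong usage, shown only to illustrate the problem. */
static size_t byte_view_length(const wchar_t *wide)
{
    /* Each ASCII-range wide character contains at least one zero byte,
       so strlen stops inside or before the first character. */
    return strlen((const char *)wide);
}
```

For the two-character string L"AB", wcslen reports 2, while byte_view_length reports a smaller value because strlen stops at the first zero byte.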
The following code fragments illustrate the use of _TCHAR and _tcsrev for mapping to the MBCS, Unicode, and SBCS models.
_TCHAR *RetVal, *szString;
RetVal = _tcsrev(szString);
If _MBCS has been defined, the preprocessor maps this fragment to the code:
char *RetVal, *szString;
RetVal = _mbsrev(szString);
If _UNICODE has been defined, the preprocessor maps this fragment to the code:
wchar_t *RetVal, *szString;
RetVal = _wcsrev(szString);
If neither _MBCS nor _UNICODE has been defined, the preprocessor maps the fragment to single-byte ASCII code:
char *RetVal, *szString;
RetVal = strrev(szString);
Thus you can write, maintain, and compile a single source code file to run with routines that are specific to any of the three kinds of character sets.
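The fragments above leave szString uninitialized for brevity. A fuller sketch might look like the following, assuming a writable buffer, since the reversal happens in place. The names reverse_in_place and fallback_strrev are hypothetical; the fallback exists only because strrev is not part of standard C and TCHAR.H is not available on every toolchain.

```c
#include <stddef.h>

#if defined(_WIN32)
#include <tchar.h>            /* the real generic-text mappings */
#else
/* Assumed fallback for toolchains without TCHAR.H: SBCS mappings only. */
#include <string.h>
typedef char _TCHAR;
#define _T(x) x

/* Hypothetical stand-in for strrev: reverse a string in place. */
static char *fallback_strrev(char *s)
{
    size_t i = 0, j = strlen(s);
    while (i + 1 < j) {       /* swap characters from both ends */
        char tmp = s[i];
        --j;
        s[i] = s[j];
        s[j] = tmp;
        ++i;
    }
    return s;
}
#define _tcsrev fallback_strrev
#endif

/* Reverse text in place and return it; text must be a writable buffer. */
static _TCHAR *reverse_in_place(_TCHAR *text)
{
    return _tcsrev(text);
}
```

A caller would declare an array, for example _TCHAR word[] = _T("abc");, and pass it to reverse_in_place. Passing a string literal directly would attempt to modify read-only storage.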