Unicode and the Win32 API

Unicode is part of Microsoft's long-term strategy. Currently, most Windows-based applications use Windows 3.1 character sets; some, such as those tailored to run on Windows NT, are just beginning to make the transition to Unicode. Win32s, Windows 95, and Windows NT all support the Win32 API, but only Windows NT contains full Unicode support. The Win32 API is designed so that every system function that accepts string parameters exists in two flavors: one that expects strings expressed as "traditional" Windows characters and one that expects strings expressed in Unicode. Only a single name for each function appears in the Win32 documentation, but there are two different system entry points.

For example,

SetWindowText(HWND, LPCTSTR);

in source code becomes either

SetWindowTextA(HWND, LPCSTR); // UNICODE not defined (default)

or

SetWindowTextW(HWND, LPCWSTR); // UNICODE defined

Each generic Win32 function name in WINDOWS.H is a macro that expands based on whether the compile-time symbol UNICODE is defined (usually by adding -DUNICODE to the compiler's command line). If UNICODE is defined, the macro expands to the function name with a W (for wide character) appended. If UNICODE is not defined, it expands to the function name with an A (for ANSI) appended. The substitution is done by the preprocessor, so the compiler proper only ever sees the A or W name. WINDOWS.H also defines generic data types (TCHAR, LPTSTR) and data structures. With generic declarations, it is possible to maintain a single set of source files and compile them for either Unicode or ANSI support, as the following figure illustrates.
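The name mapping can be sketched in portable C without WINDOWS.H. The functions DemoSetTitleA and DemoSetTitleW and the generic name DemoSetTitle below are hypothetical stand-ins, not Win32 functions; they only illustrate how one generic name can resolve to one of two entry points at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for a pair of system entry points.
   Each simply returns the number of characters in the title it receives. */
size_t DemoSetTitleA(const char *title)
{
    assert(title != NULL);
    return strlen(title);
}

size_t DemoSetTitleW(const wchar_t *title)
{
    assert(title != NULL);
    return wcslen(title);
}

/* One generic name; the preprocessor picks the entry point. */
#ifdef UNICODE
#define DemoSetTitle DemoSetTitleW
#else
#define DemoSetTitle DemoSetTitleA
#endif

/* Reports which flavor the generic name resolved to in this build. */
char DemoFlavor(void)
{
#ifdef UNICODE
    return 'W';
#else
    return 'A';
#endif
}
```

Compiled without -DUNICODE, a call such as DemoSetTitle("abc") reaches DemoSetTitleA; compiled with -DUNICODE, the same generic name reaches DemoSetTitleW and the argument must then be a wide string.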

Most editors and compilers cannot accept Unicode text directly for string and character literals. Visual C++ lets you prefix a literal with an L to indicate Unicode, as shown here:

LPWSTR str = L"This is a Unicode string";

In the source file, the string is expressed in the code page that the editor or compiler understands. When compiled, the characters are converted to Unicode. The Win32 SDK resource compiler also supports the L prefix notation, even though it can interpret Unicode source files directly. WINDOWS.H defines a macro called TEXT() that marks a string literal as Unicode (by applying the L prefix) when UNICODE is defined and leaves the literal unchanged otherwise.

LPTSTR str = TEXT("This is a generic string");
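A generic-text macro of this kind can be approximated with token pasting. The sketch below is self-contained and does not use WINDOWS.H; DEMO_TCHAR, DEMO_TEXT, and DemoLength are hypothetical names chosen for illustration, and the real TEXT() definition may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical generic character type and literal macro. */
#ifdef UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_TEXT_(s) L##s   /* paste the L prefix onto the literal */
#else
typedef char DEMO_TCHAR;
#define DEMO_TEXT_(s) s      /* leave the literal unchanged */
#endif

/* The extra level lets a macro argument expand before pasting. */
#define DEMO_TEXT(s) DEMO_TEXT_(s)

/* Counts characters (not bytes) in a generic string. */
size_t DemoLength(const DEMO_TCHAR *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

Code written against DEMO_TCHAR and DEMO_TEXT compiles unchanged in either mode: the character count of a literal stays the same, while its size in bytes follows the width of the character type selected by UNICODE.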

The L prefix also tells the resource compiler that hex escape sequences will consist of four digits instead of two.

Literal        Result
L"\x2326"      "⌦"
"\x2326"       "#26"

In this way, Unicode characters that do not exist in the code page of the source file, such as the keyboard symbol ⌦ (U+2326) or an ideographic character, can be expressed as literals.
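The two results in the table can be reproduced in ordinary C, with one caveat: a C compiler, unlike the resource compiler described here, lets a hex escape consume every hex digit that follows it. The narrow literal is therefore split into two adjacent pieces so that only \x23 is read as an escape. The names demo_wide, demo_narrow, and DemoEscapesMatchTable are hypothetical, and the sketch assumes an ASCII-compatible execution character set.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Wide literal: the four-digit escape names one character, U+2326. */
const wchar_t demo_wide[] = L"\x2326";

/* Two-digit reading: \x23 is '#', followed by the characters '2' and '6'. */
const char demo_narrow[] = "\x23" "26";

/* Returns 1 when both literals hold the contents listed in the table. */
int DemoEscapesMatchTable(void)
{
    assert(sizeof demo_narrow == 4);   /* '#', '2', '6', terminator */
    return wcslen(demo_wide) == 1
        && demo_wide[0] == 0x2326
        && strcmp(demo_narrow, "#26") == 0;
}
```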