In Win32, Unicode is a 16-bit character set: each individual character is 16 bits wide, and such characters are often called “wide characters”. The range of 65,536 code values was designed to hold all the modern characters of the world. Only displayable strings need to be Unicode, that is, those that the user is going to see or edit on the screen. Strings that represent internal identifiers, such as object names, window class names, and resource names, do not need Unicode equivalents.
Win32 allows 32-bit applications to be either Unicode or ANSI applications, or even to mix Unicode and ANSI calls. A strict Unicode or ANSI “mode” approach was not taken because of the problems such an approach implies. Therefore, every function that can take a displayable string has two counterparts: a Unicode version and an ANSI version of that function.
To keep this straightforward, and to allow applications to share code between 16-bit ANSI Win3 and 32-bit Win32, a new type, TCHAR, was defined. This is a compile-dependent type that can refer to either an ANSI char or a Unicode char. Developers using this type can choose to compile for Unicode or ANSI by #define-ing the UNICODE label: when UNICODE is defined, TCHAR is a Unicode character; otherwise it is an ANSI character. Additional types are LPTSTR, for a pointer to a string, and LPTCH, for a pointer to a character. The programmer's reference currently uses these types for displayable strings and characters.
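The compile-time switch can be sketched roughly as follows. This is a minimal, portable illustration rather than the text of the actual Win32 headers: it assumes an `unsigned short` as a 16-bit stand-in for the wide character, and the helpers `TextLength` and `TextBytes` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: stand-ins for the character types described above.
   Assumption: unsigned short models the 16-bit wide character. */
typedef char           CHAR;
typedef unsigned short WCHAR;
static_assert(sizeof(WCHAR) == 2, "WCHAR stand-in must be 16 bits");

#ifdef UNICODE
typedef WCHAR TCHAR;    /* built with UNICODE defined: wide characters */
#else
typedef CHAR  TCHAR;    /* default build: ANSI characters */
#endif

typedef TCHAR *LPTSTR;  /* pointer to a string of TCHARs */
typedef TCHAR *LPTCH;   /* pointer to a single TCHAR */

/* Hypothetical helper: counts characters (not bytes) up to the
   terminator, so the same source compiles in either build. */
size_t TextLength(const TCHAR *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: bytes needed to store `count` characters
   plus the terminating zero character. */
size_t TextBytes(size_t count)
{
    return (count + 1) * sizeof(TCHAR);
}
```

In this sketch, `TextBytes(3)` yields 4 in an ANSI build and 8 when the file is compiled with UNICODE defined, while `TextLength` returns the same character count in both builds.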
ANSI functions end with the character “A”, and Unicode functions end with the character “W”. Based on the UNICODE compile switch, the type-independent function names are #define-ed to resolve to the “A” or “W” versions of the functions. Applications that want to call the Unicode or ANSI version of a function specifically can reference the “A” or “W” version directly. For example:
SetWindowText | type-independent version (macro) |
SetWindowTextA | ANSI-specific version |
SetWindowTextW | Unicode-specific version |
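The naming scheme in the table can be illustrated with a small, self-contained sketch. The functions `SetTitleA` and `SetTitleW`, the macro `SetTitle`, and the buffer `g_title` are hypothetical stand-ins, not Win32 functions, and the widening shown (zero-extension of each byte) is an assumption that is only correct for ASCII text.

```c
#include <assert.h>
#include <stddef.h>

typedef char           CHAR;
typedef unsigned short WCHAR;  /* assumption: 16-bit stand-in for a wide character */
static_assert(sizeof(WCHAR) == 2, "WCHAR stand-in must be 16 bits");

#define TITLE_MAX 64

/* Hypothetical stand-in for a window's title storage. */
static WCHAR g_title[TITLE_MAX];

/* "W" version: takes wide characters and stores them as-is. */
void SetTitleW(const WCHAR *text)
{
    size_t i = 0;
    while (text[i] != 0 && i + 1 < TITLE_MAX) {
        g_title[i] = text[i];
        i++;
    }
    g_title[i] = 0;
}

/* "A" version: widens each 8-bit character, then reuses the "W" version.
   Assumption: zero-extension, which is only correct for ASCII text. */
void SetTitleA(const CHAR *text)
{
    WCHAR wide[TITLE_MAX];
    size_t i = 0;
    while (text[i] != 0 && i + 1 < TITLE_MAX) {
        wide[i] = (unsigned char)text[i];
        i++;
    }
    wide[i] = 0;
    SetTitleW(wide);
}

/* The type-independent name is only a macro chosen at compile time. */
#ifdef UNICODE
#define SetTitle SetTitleW
#else
#define SetTitle SetTitleA
#endif
```

Code that calls `SetTitle` follows the compile switch, while code that names `SetTitleA` or `SetTitleW` explicitly gets that version regardless of how the file is compiled.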
This approach allows existing ANSI apps to use existing functions without changing how they are compiled. It allows developers to share code between 16-bit and 32-bit platforms when that code is ANSI on the 16-bit platform and Unicode on the 32-bit platform. It also allows developers to mix types. The full range of string types is therefore:
Character Set | String Pointer | Char Pointer | Char |
ANSI (Win 3) | LPSTR | LPCH | CHAR |
Unicode | LPWSTR | LPWCH | WCHAR |
Either | LPTSTR | LPTCH | TCHAR |
The programmer's reference always refers to displayable characters as being of the TCHAR type, and to non-displayable characters (object names, for example) as being of the CHAR type. Additionally, the programmer's reference does not refer specifically to the “W” or “A” versions of the functions.
Window classes can be either ANSI or Unicode. An app can determine whether a window is Unicode or ANSI by calling the function IsWindowUnicode. See the programmer's reference for more detail on window classes and on this function.
Window messages also follow the “W” and “A” conventions. If an application sends an ANSI window message to a Unicode application, that message is translated en route so that the receiving window procedure understands it. This will probably not be too common, since applications are likely to be all Unicode or all ANSI, but it does allow mixing. It also allows transparent communication between the windows of different applications.
This also affects window subclassing. If an ANSI subclass procedure subclasses a Unicode window, automatic translation of text-sensitive messages will occur. Likewise, if an ANSI application sets a “windows hook” that a Unicode application calls, automatic translation of text-sensitive messages will occur.
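The automatic translation of text-sensitive messages can be modeled with a small sketch. This is a hypothetical model, not the system's actual implementation: the `Window` structure, `SendTextA`, `SendTextW`, and the conversion helpers are invented for illustration, widening is plain zero-extension, and narrowing replaces any character outside the ASCII range with “?”.

```c
#include <assert.h>
#include <stddef.h>

typedef char           CHAR;
typedef unsigned short WCHAR;  /* assumption: 16-bit stand-in for a wide character */
static_assert(sizeof(WCHAR) == 2, "WCHAR stand-in must be 16 bits");

#define MAX_TEXT 64

/* Hypothetical model of a window: a flag saying which kind of text its
   procedure expects, plus a record of the last text it received. */
typedef struct Window {
    int   isUnicode;
    CHAR  lastA[MAX_TEXT];  /* written only by the ANSI procedure */
    WCHAR lastW[MAX_TEXT];  /* written only by the Unicode procedure */
} Window;

/* Assumption: zero-extension, correct only for ASCII text. */
static void Widen(const CHAR *src, WCHAR *dst)
{
    size_t i = 0;
    while (src[i] != 0 && i + 1 < MAX_TEXT) {
        dst[i] = (unsigned char)src[i];
        i++;
    }
    dst[i] = 0;
}

/* Assumption: characters outside ASCII become '?'. */
static void Narrow(const WCHAR *src, CHAR *dst)
{
    size_t i = 0;
    while (src[i] != 0 && i + 1 < MAX_TEXT) {
        dst[i] = (src[i] < 0x80) ? (CHAR)src[i] : '?';
        i++;
    }
    dst[i] = 0;
}

/* Window procedures: each understands exactly one kind of text. */
static void ProcA(Window *w, const CHAR *text)
{
    size_t i = 0;
    while (text[i] != 0 && i + 1 < MAX_TEXT) { w->lastA[i] = text[i]; i++; }
    w->lastA[i] = 0;
}

static void ProcW(Window *w, const WCHAR *text)
{
    size_t i = 0;
    while (text[i] != 0 && i + 1 < MAX_TEXT) { w->lastW[i] = text[i]; i++; }
    w->lastW[i] = 0;
}

/* "A" send: translate on the way only if the receiver is Unicode. */
void SendTextA(Window *w, const CHAR *text)
{
    if (w->isUnicode) {
        WCHAR wide[MAX_TEXT];
        Widen(text, wide);
        ProcW(w, wide);
    } else {
        ProcA(w, text);
    }
}

/* "W" send: translate on the way only if the receiver is ANSI. */
void SendTextW(Window *w, const WCHAR *text)
{
    if (w->isUnicode) {
        ProcW(w, text);
    } else {
        CHAR narrow[MAX_TEXT];
        Narrow(text, narrow);
        ProcA(w, narrow);
    }
}
```

In this model the sender never needs to know what kind of window it is talking to: the conversion happens between the send call and the receiving procedure, which is the same point at which a subclass procedure or hook of the other type would be accommodated.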