Unicode in Win32s and Windows 95

Glossary

Clipboard: A Windows utility used as a buffer for copying and pasting text.
Win32s API: A subset of the Win32 API that makes it possible to create a single binary that runs on Windows 3.1 and all 32-bit versions of the Windows platform.

The 16-bit Windows (Win16) API contains no Unicode support at all. It's possible to create a Unicode-based program for Windows 3.1, but the application must carry character conversion mapping tables and routines, and possibly specialized font support. Creating a new 16-bit Unicode-based application for Windows 3.1 doesn't make sense at this point, because 32-bit applications for Windows are now standard.

The most Unicode-accessible alternative available for Windows 3.1 is the Win32s API. Applications written for Win32s, a subset of the Win32 API, can run on Windows 3.1, Windows NT, and Windows 95 without being recompiled. Win32s supports ANSI versions of the Win32 API entry points but excludes virtually all of the wide-character entry points because neither Windows 3.1 nor Windows 95 can support them. For example, the Windows 3.1 graphics device interface (GDI) and file allocation table (FAT) do not support Unicode. An application that calls the wide-character variants of the Win32 API—a Win32 application compiled with the UNICODE flag defined, for instance—cannot run on Windows 3.1 using Win32s or on Windows 95. Errors indicating that the -W API calls are not implemented will appear under Windows 3.1. Under Win32s and Windows 95, the -W API entry points are stubbed; your application will get a return value indicating that the call failed.

Although it does not support wide-character API entry points, Win32s does contain two important API functions—MultiByteToWideChar and WideCharToMultiByte—that can convert data between Unicode and local Windows code pages. For example, an application running on a Japanese edition of Windows 3.1 could call these two functions to convert between the Unicode and Shift-JIS encodings.

Windows 95 inherited these conversion functions and supports a handful of low-level wide-character API calls for Unicode text output, such as TextOutW, GetCharWidthW, and GetTextExtentPointW. Thus a non-Unicode application running on Windows 95 can share a Unicode-based file format with a sister application that is targeted for Windows NT. Using these functions, it's also possible to develop an application for Windows 95 that processes data in Unicode. Because the system API is still limited to the native character set (except for the text functions mentioned above), such applications need to explicitly convert data before calling the system, and only those characters in the native character set can be successfully converted.

For example, if you want a list box to display data from your document, you will either have to convert the data or use an owner-draw list box. An owner-draw control informs you when each item within it needs to be drawn; you can then use TextOutW to do the drawing. If you are using edit controls for input, remember that in Windows 95 the edit control will rely on one of the local character sets. You will have to convert the text returned from the edit control by calling MultiByteToWideChar.

The Win32 API also handles converting data between character sets via the clipboard. Windows 95 supports the same clipboard formats as Windows NT: CF_UNICODETEXT, CF_OEMTEXT, and CF_TEXT (which should really be called CF_ANSITEXT). Any text copied to the clipboard is enumerated in all three formats, so it's possible to cut and paste across applications supporting different character sets. For example, a simple piece of text can be copied from one application in ANSI format and pasted to another in Unicode format or vice versa. Windows 95 also adds the clipboard format CF_LOCALE, which allows applications to mark clipboard text with a specific language and character set.

Functions such as GetStringType and LCMapString operate on only one code page at a time in Windows 95. If you have decided to support Unicode data, you might need to call these functions several times with different code page arguments and then manually combine their output. This requires a little work. Be sure to use the full Unicode versions of these functions when your application is running on Windows NT. Programs not using Unicode can use the Windows 95 font charset property to create multilingual rich-text documents.

Because converting data to and from Unicode adds system overhead, Win32-based applications running on Windows 95 might perform more efficiently when based on Windows code pages, depending on how much text processing they do. Keep in mind, however, that Unicode is always more efficient for processing Far Eastern languages, processing multilingual text, and creating a global code base.

You'll need to decide what's best for your application. The developers of the Windows 95 help system weighed these factors and decided to base the full-text search portion of the help engine on Unicode, primarily because the parsing, searching, and indexing algorithms they used were not easily portable to variable-width character sets. (See Chapter 5.) The developers of 32-bit OLE also decided to base their system on Unicode.

The following sample illustrates a typical approach to handling conversion between Unicode and local character sets. To optimize performance, you should use a temporary variable on the stack, allocating a buffer only when a string is too long.

To calculate the necessary buffer size, you have to determine whether the Windows character set is single-byte or multibyte. GetCPInfo will return the maximum number of bytes in a character for the given code page:

CPINFO CPInfo;
GetCPInfo(CP_ACP, &CPInfo); // Get info on current code page.
int cmaxCharSize = CPInfo.MaxCharSize;

CP_ACP is a predefined constant that always refers to the currently installed Windows character set. For some sample "data," the literal string constant lpWide is set to a Unicode string using the prefix L notation.

const WCHAR* lpWide = L"Unicode";

BOOL WideSetWindowText(HWND hwnd, LPCWSTR lpWide)
{
    CHAR ach[20]; // Try a small buffer on the stack.
    LPSTR lpsz = ach;
    int cchpsz = sizeof ach; // buffer size in bytes

    DWORD dwFlags = 0;

    // Do the conversion. A source length of -1 means the string is
    // null-terminated, so the terminating null is converted as well.
    int len = WideCharToMultiByte(CP_ACP, dwFlags,
                                  lpWide, -1, lpsz,
                                  cchpsz, NULL, NULL);
    if ( !len )
    {
        if (GetLastError() == ERROR_INSUFFICIENT_BUFFER)
        {
            // Allow cmaxCharSize bytes (from GetCPInfo above) for each
            // character plus the terminating null; usually enough.
            int cchWide = (int) wcslen(lpWide);
            cchpsz = (cchWide+1)*cmaxCharSize;
            lpsz = (LPSTR) malloc(cchpsz);
            if ( !lpsz )
            {
                return FALSE; // out of memory
            }

            // Try conversion again with the larger buffer.
            len = WideCharToMultiByte(CP_ACP, dwFlags,
                                      lpWide, -1, lpsz,
                                      cchpsz, NULL, NULL);
            if ( !len )
            {
                free(lpsz);
                return FALSE;
            }
            BOOL returnVal = SetWindowTextA(hwnd, lpsz);
            free(lpsz);
            return returnVal;
        }
        else
        {
            return FALSE; // some other error
        }
    }

    // Set the window text and return the result. SetWindowTextA is named
    // explicitly because lpsz is always a code-page (ANSI) string.
    return SetWindowTextA(hwnd, lpsz);
}