Analyzing and Preprocessing Text

Windows 95 supports two convenient, generic API calls that analyze and preprocess text for display: GetCharacterPlacement and GetFontLanguageInfo. These functions are particularly useful for handling Arabic and Hebrew text streams, which always require reordering. You'll notice that many of the flags in Figures 6-7 and 6-8 are specific to Middle Eastern fonts. If you are writing a new Middle Eastern application for Windows 95 or porting a Windows 3.1–based Middle Eastern application, be sure to use these new API calls and the Win32 extended styles for the Middle East instead of the extended styles and API calls defined by Arabic and Hebrew Windows 3.1. Windows 3.1–based Middle Eastern applications will run on Middle Eastern editions of Windows 95, and applications written using the Win32 text layout API calls will also run on other systems if the Middle Eastern libraries are installed. For more information on compatibility issues between Arabic and Hebrew editions of Windows 3.1 and Windows 95, please consult "Writing a BiDi Application" in the Windows 95 SDK documentation.

GetFontLanguageInfo Return Flag    Meaning
FLI_GLYPHS The font contains additional glyphs that are generally not encoded in the code page. Use GetCharacterPlacement to access these glyphs.
   
GCP_DBCS The charset is DBCS.
   
GCP_DIACRITIC The font language contains glyphs with diacritics.
   
GCP_GLYPHSHAPE The font language contains multiple glyphs per code point or per code point combination (to support shaping and/or ligation), as well as advanced glyph tables that provide glyphs for the extra shapes. If this flag is set, the glyph index array (lpGlyphs) should be used in calls to GetCharacterPlacement, and the ETO_GLYPH_INDEX flag should also be passed into the ExtTextOut call when the string is drawn.
   
GCP_KASHIDA The font and language support kashidas.
   
GCP_LIGATE The font language contains glyphs representing ligatures that can be substituted for specific character combinations.
   
GCP_REORDER The font covers languages that require the reordering of characters for display—for example, Arabic and Hebrew.
   
GCP_USEKERNING The font contains a kerning table that can be used to improve spacing between the characters or glyphs.


Figure 6-7 Return flags for the function GetFontLanguageInfo.

GetCharacterPlacement is a preprocessing function that will return detailed information about a string. When you call GetCharacterPlacement, you need to specify what kind of preprocessing you would like it to do. Your first step, then, should be to call GetFontLanguageInfo, which analyzes a device context and tells you, by setting one or more of the flags listed in Figure 6-7 (above), whether text displayed in the currently selected font requires any special processing. If the return value is 0, the selected font represents plain Latin characters and has no special properties. If the return value contains any flags, you can mask them with the constant FLI_MASK and pass the result to GetCharacterPlacement.
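The query-and-mask step described above can be sketched in C as follows. This is a minimal illustration, not code from the SDK: the helper names base_placement_flags and placement_flags_for_dc are hypothetical, and the constants in the #else branch are copies of the wingdi.h values, repeated only so that the pure helper also compiles on a system without windows.h. The sketch also assumes that a failed query, which GetFontLanguageInfo reports as GCP_ERROR, should be treated like a font with no special properties.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Copies of the wingdi.h values, so the helper also compiles off Windows. */
#define GCP_REORDER    0x0002
#define GCP_GLYPHSHAPE 0x0010
#define GCP_ERROR      0x8000
#define FLI_MASK       0x103B
#define FLI_GLYPHS     0x00040000L
#endif

/* Hypothetical helper: reduce a GetFontLanguageInfo result to the flags
   that can be handed to GetCharacterPlacement. Returns 0 when the query
   failed (GCP_ERROR) or when the font needs no special processing. */
uint32_t base_placement_flags(uint32_t fontInfo)
{
    if (fontInfo == GCP_ERROR)
        return 0;
    return fontInfo & FLI_MASK;   /* drops bits such as FLI_GLYPHS */
}

#ifdef _WIN32
/* Hypothetical wrapper: query the font currently selected into hdc. */
uint32_t placement_flags_for_dc(HDC hdc)
{
    return base_placement_flags((uint32_t)GetFontLanguageInfo(hdc));
}
#endif
```

A caller would typically compute these base flags once per font selection and then OR in any request flags that GetFontLanguageInfo cannot report.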

In addition to the flags you get back from GetFontLanguageInfo, you can specify other flags to request preprocessing from GetCharacterPlacement. (See Figure 6-8.)

GetCharacterPlacement Flag    Can Be Retrieved Using GetFontLanguageInfo    Meaning
     
GCP_CLASSIN No The lpClass array contains preset classifications for characters. Any unknown classifications should be set to zero in the lpClass array.
     
GCP_DIACRITIC Yes If the string contains diacritics, you must specify this flag or the function will ignore them and remove them from output arrays. This is useful for languages that support diacritics but do not always display them, such as Hebrew.
     
GCP_DISPLAYZWG No Display characters that do not typically display, such as left-to-right and right-to-left markers.
     
GCP_GLYPHSHAPE Yes Display characters using alternate shapes, if appropriate. Arabic characters, for example, change shape depending on their position in a string (initial, medial, final, or isolated).
     
GCP_JUSTIFY No Justify the lpDx array by microspacing the characters. The call will pad the extents until the string length reaches nMaxExtent, and it will strip the last word from the result if it extends beyond the limit. You must also specify the GCP_MAXEXTENT flag.
     
GCP_JUSTIFYIN No Justify the string by adjusting the characters as specified in the lpDx array. For example, for non-Arabic fonts, a value of 1 in the lpDx array means that a character can be microspaced.
     
GCP_KASHIDA Yes Use kashidas in addition to or instead of adjusted extents to justify text. You must also specify the GCP_JUSTIFY flag. Call GetFontLanguageInfo first to determine whether the font supports kashidas.
     
GCP_LIGATE Yes Where characters ligate, use the ligations. To get meaningful results, you must also specify the GCP_REORDER flag if it is usually required for the charset.
     
GCP_MAXEXTENT No Process the string only until the logical width reaches nMaxExtent, or until all the characters in the string have been processed.
     
GCP_NEUTRALOVERRIDE No Treat neutral characters, such as punctuation, as characters with strong directionality that matches the directionality of the rest of the string.
     
GCP_NUMERICOVERRIDE No Treat numeric characters in Arabic and Hebrew text as characters with strong directionality that matches the directionality of the rest of the string.
     
GCP_NUMERICSLATIN No Override the system default and use standard Latin glyphs for numeric characters.
     
GCP_NUMERICSLOCAL No Override the system default and use local glyphs for numeric characters.
     
GCP_REORDER Yes Reorder the string for display. Used primarily in the context of Arabic and Hebrew languages, which store text elements in logical order but display them right to left.
     
GCP_SYMSWAPOFF No Do not swap characters such as the open parenthesis and close parenthesis ( and ) in a right-to-left string.
     
GCP_USEKERNING Yes If the font supports kerning, use kerning to adjust the lpDx array. Some charsets require kerning for proper font rendering.

Figure 6-8 Flags that you can specify for GetCharacterPlacement.
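Several flags in Figure 6-8 depend on one another: GCP_KASHIDA requires GCP_JUSTIFY and a font that supports kashidas, GCP_JUSTIFY requires GCP_MAXEXTENT, and GCP_LIGATE needs GCP_REORDER where reordering is usual for the charset. The following C sketch shows one way an application might enforce those rules before making the call. The helper name complete_placement_flags is hypothetical, the policy of silently dropping an unsupported GCP_KASHIDA request is an assumption rather than documented behavior, and "reordering is usual for the charset" is approximated by checking whether GetFontLanguageInfo reported GCP_REORDER. The constants in the #else branch are copies of the wingdi.h values.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Copies of the wingdi.h values, so the helper also compiles off Windows. */
#define GCP_REORDER   0x0002
#define GCP_LIGATE    0x0020
#define GCP_KASHIDA   0x0400
#define GCP_JUSTIFY   0x00010000L
#define GCP_MAXEXTENT 0x00100000L
#endif

/* Hypothetical helper: take the flags the caller wants plus the value
   returned by GetFontLanguageInfo, and add or drop flags so that the
   dependencies listed in Figure 6-8 hold. */
uint32_t complete_placement_flags(uint32_t wanted, uint32_t fontInfo)
{
    uint32_t flags = wanted;

    /* Kashidas only make sense if the font reports support for them. */
    if ((flags & GCP_KASHIDA) && !(fontInfo & GCP_KASHIDA))
        flags &= ~(uint32_t)GCP_KASHIDA;

    /* GCP_KASHIDA must be accompanied by GCP_JUSTIFY. */
    if (flags & GCP_KASHIDA)
        flags |= GCP_JUSTIFY;

    /* GCP_JUSTIFY must be accompanied by GCP_MAXEXTENT. */
    if (flags & GCP_JUSTIFY)
        flags |= GCP_MAXEXTENT;

    /* Ligation needs reordering when the font's languages require it. */
    if ((flags & GCP_LIGATE) && (fontInfo & GCP_REORDER))
        flags |= GCP_REORDER;

    return flags;
}
```

Because GCP_MAXEXTENT may be added here, a caller using this helper would also need to pass a meaningful nMaxExtent value to GetCharacterPlacement.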

GetCharacterPlacement is a very useful, multipurpose function, especially for non-Latin languages. It can kern, shape, justify, and reorder the string you pass in; set the caret position; and clip a string, if necessary, according to a specified maximum extent. Particularly if your application will handle text in different languages, you're better off calling GetCharacterPlacement instead of GetTextExtent—and in some cases, GetCharWidth—because GetCharacterPlacement will work in any international setting. Figure 6-9 (below) and Figure 6-10 (below) illustrate the properties of Arabic text that GetCharacterPlacement flags help you identify.

Figure 6-9 Glyph shaping in Arabic.

Figure 6-10 Arabic text justified with kashidas, which are horizontal connecting lines added between some characters.
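As a sketch of measuring a string with GetCharacterPlacement rather than GetTextExtent, the C fragment below relies on the function's DWORD return value, which packs the string's width into the low-order word and its height into the high-order word, with 0 indicating failure. The helper names unpack_extent and measure_text are hypothetical. The fragment assumes the Unicode entry point GetCharacterPlacementW from a current SDK, and it assumes that a GCP_RESULTS structure whose output pointers are all NULL is acceptable when only the extent is wanted.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: split the packed return value into width
   (low-order word) and height (high-order word), in logical units. */
void unpack_extent(uint32_t packed, int *width, int *height)
{
    *width  = (int)(packed & 0xFFFFu);
    *height = (int)(packed >> 16);
}

#ifdef _WIN32
/* Hypothetical wrapper: measure `count` characters of `text` in the font
   selected into hdc. Returns 1 on success, 0 on failure. */
int measure_text(HDC hdc, const WCHAR *text, int count, DWORD flags,
                 int *width, int *height)
{
    GCP_RESULTSW res;
    DWORD packed;

    ZeroMemory(&res, sizeof(res));       /* no output arrays requested */
    res.lStructSize = sizeof(res);
    res.nGlyphs = (UINT)count;

    packed = GetCharacterPlacementW(hdc, text, count, 0, &res, flags);
    if (packed == 0)
        return 0;

    unpack_extent((uint32_t)packed, width, height);
    return 1;
}
#endif
```

The flags argument would normally be the masked GetFontLanguageInfo result, so that the measurement reflects the same kerning, shaping, and ligation that will be used when the string is drawn.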

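GetCharacterPlacement fills in a GCP_RESULTS structure that you pass to it by pointer; the function's own return value is the width and height of the processed string. You can pass the lpDx and lpGlyphs fields directly to the ExtTextOut API call.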

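typedef struct tagGCP_RESULTS {
    DWORD  lStructSize;   /* size of this structure, in bytes */
    LPTSTR lpOutString;   /* processed output string */
    UINT*  lpOrder;       /* maps input characters to output positions */
    INT*   lpDx;          /* distances between adjacent character cells */
    INT*   lpCaretPos;    /* caret position for each character */
    LPTSTR lpClass;       /* character classifications (see GCP_CLASSIN) */
    UINT*  lpGlyphs;      /* glyph indices to pass to ExtTextOut */
    UINT   nGlyphs;       /* in: size of the arrays; out: glyphs used */
    UINT   nMaxFit;       /* number of characters that fit in nMaxExtent */
} GCP_RESULTS;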
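The following C sketch shows one way to prepare a GCP_RESULTS structure, call GetCharacterPlacement, and hand the lpDx and lpGlyphs arrays to ExtTextOut. It is an illustration under stated assumptions, not the SDK's sample code: the helper names placement_buffers_init, placement_buffers_free, and draw_placed_text are hypothetical; the code uses the Unicode variants (GCP_RESULTSW, GetCharacterPlacementW, ExtTextOutW) from a current SDK, whose headers declare lpGlyphs as a wide-character pointer rather than the UINT pointer shown in the listing above; and it always draws by glyph index. The struct in the #else branch is a stand-in with the same field names, present only so that the buffer setup compiles on a system without windows.h.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-in types so the setup helpers also compile off Windows. */
typedef uint16_t WCHAR;
typedef struct {
    uint32_t  lStructSize;
    WCHAR    *lpOutString;
    unsigned *lpOrder;
    int      *lpDx;
    int      *lpCaretPos;
    char     *lpClass;
    WCHAR    *lpGlyphs;
    unsigned  nGlyphs;
    int       nMaxFit;
} GCP_RESULTSW;
#endif

/* Hypothetical helper: zero the structure, record its size, and allocate
   the two arrays that ExtTextOut needs. count must be greater than 0.
   Returns nonzero on success. */
int placement_buffers_init(GCP_RESULTSW *res, unsigned count)
{
    memset(res, 0, sizeof(*res));
    res->lStructSize = (uint32_t)sizeof(*res);
    res->lpDx     = (int *)calloc(count, sizeof(int));
    res->lpGlyphs = (WCHAR *)calloc(count, sizeof(WCHAR));
    res->nGlyphs  = count;               /* capacity of the arrays */
    return res->lpDx != NULL && res->lpGlyphs != NULL;
}

/* Hypothetical helper: release the arrays allocated above. */
void placement_buffers_free(GCP_RESULTSW *res)
{
    free(res->lpDx);
    free(res->lpGlyphs);
    res->lpDx = NULL;
    res->lpGlyphs = NULL;
}

#ifdef _WIN32
/* Hypothetical wrapper: preprocess `text` and draw it at (x, y). */
BOOL draw_placed_text(HDC hdc, int x, int y, const WCHAR *text,
                      int count, DWORD flags)
{
    GCP_RESULTSW res;
    BOOL ok = FALSE;

    if (count <= 0)
        return FALSE;

    if (placement_buffers_init(&res, (unsigned)count) &&
        GetCharacterPlacementW(hdc, text, count, 0, &res, flags) != 0) {
        /* nGlyphs now holds the number of glyphs actually produced. */
        ok = ExtTextOutW(hdc, x, y, ETO_GLYPH_INDEX, NULL,
                         res.lpGlyphs, res.nGlyphs, res.lpDx);
    }
    placement_buffers_free(&res);
    return ok;
}
#endif
```

An application that also needs caret positioning or the reordered string would allocate lpCaretPos, lpOrder, or lpOutString in the same way before making the call.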