Platform SDK: International Features

Displaying Text with Uniscribe

A simple approach to formatting and display does not work for an application that uses complex scripts. First, the width of a complex script character depends on its context, so widths cannot be stored in simple tables. Second, breaking between words in scripts like Thai requires dictionary support, since there is no separator character between Thai words. Third, bidirectional text such as Arabic, Hebrew, Farsi, and Urdu requires reordering before display. Finally, some form of font association is often required to use complex scripts easily.

To deal adequately with these issues, Uniscribe uses the paragraph as the unit for display. Note that this means Uniscribe must be used for the entire paragraph, even if sections of the paragraph are not complex scripts.

Before using Uniscribe, an application divides the paragraph into runs. A run is a string of characters with the same style. The style depends on what the application has implemented, but typically includes such attributes as font, size, and color. Uniscribe then divides the paragraph into items -- strings that have the same script and direction. The application applies the item information to its runs to produce runs that are each uniform in style, script, and direction.
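The merge of style runs and items can be sketched as a walk over two sorted lists of boundaries. The following is a minimal, portable model that does not call Uniscribe: the StyleRun, Item, and MergedRun types and the merge_runs function are hypothetical names introduced for illustration, with Item standing in for the character position, script, and direction that an item carries.

```c
#include <assert.h>

/* Hypothetical types for illustration; these are not Uniscribe structures. */
typedef struct { int start; int style; } StyleRun;        /* application style run */
typedef struct { int start; int script; int rtl; } Item;  /* stand-in for an item  */
typedef struct { int start, len, style, script, rtl; } MergedRun;

/* Splits the paragraph [0, n_chars) at every style boundary and every item
   boundary. Both input arrays must be sorted by start and begin at 0.
   Returns the number of merged runs written, or -1 if out is too small. */
int merge_runs(const StyleRun *styles, int n_styles,
               const Item *items, int n_items,
               int n_chars, MergedRun *out, int max_out)
{
    assert(n_styles > 0 && n_items > 0);
    assert(styles[0].start == 0 && items[0].start == 0);

    int si = 0, ii = 0, pos = 0, n = 0;
    while (pos < n_chars) {
        /* End of the current style run and of the current item. */
        int style_end = (si + 1 < n_styles) ? styles[si + 1].start : n_chars;
        int item_end  = (ii + 1 < n_items)  ? items[ii + 1].start  : n_chars;
        int end = (style_end < item_end) ? style_end : item_end;

        if (n == max_out)
            return -1;
        out[n++] = (MergedRun){ pos, end - pos, styles[si].style,
                                items[ii].script, items[ii].rtl };

        /* Advance whichever list (or both) ended at this boundary. */
        pos = end;
        if (pos == style_end) si++;
        if (pos == item_end)  ii++;
    }
    return n;
}
```

With two style runs starting at characters 0 and 5 and three items starting at 0, 3, and 8 in a 10-character paragraph, this produces four merged runs covering [0,3), [3,5), [5,8), and [8,10).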

Uniscribe identifies the clusters in each run and determines the size of each cluster. A cluster is a script-defined, indivisible character grouping. For European languages, a cluster is a single character, but in languages such as Thai, it is a grouping of glyphs. Uniscribe sums the cluster sizes to determine the size of a run. The application then sums the sizes of the runs until they overflow a line (or reach the margin), and divides the run that overflows between the current line and the next line. For each line, a map is built from visual position to a run. For each run, the code points are shaped into glyphs, which are then positioned and rendered.
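The summing described above can be sketched in portable C. This is an illustrative model only: the Run type and the run_width and first_overflowing_run functions are hypothetical, and the cluster widths are assumed to have been obtained already (in a real application they would come from the placement step).

```c
#include <assert.h>

/* Hypothetical run record for illustration: the width of each cluster in
   the run, in device units. This is not a Uniscribe structure. */
typedef struct {
    const int *cluster_widths;
    int n_clusters;
} Run;

/* A run's width is the sum of its cluster widths. */
int run_width(const Run *run)
{
    int width = 0;
    for (int i = 0; i < run->n_clusters; i++)
        width += run->cluster_widths[i];
    return width;
}

/* Sums run widths in logical order until one no longer fits.
   Returns the index of the first run that overflows line_width, or n_runs
   if every run fits. *used receives the total width of the runs that fit. */
int first_overflowing_run(const Run *runs, int n_runs, int line_width, int *used)
{
    assert(n_runs >= 0 && used != 0);

    int total = 0;
    int i;
    for (i = 0; i < n_runs; i++) {
        int w = run_width(&runs[i]);
        if (total + w > line_width)
            break;              /* this run must be divided across lines */
        total += w;
    }
    *used = total;
    return i;
}
```

The run at the returned index is the one the application divides; line_width minus *used is the space left for the part of that run that stays on the current line.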

With this overview in mind, we can look at the process in detail and how Uniscribe fits in. An application does text layout, or formatting, one time. Then it either saves the glyphs and positions for display purposes or it generates them each time it displays the text. The trade-off is speed vs. memory. Typically, an application will generate the glyphs and positions each time for display, so the process is presented as a layout procedure and a display procedure.

To Lay out Text Using Uniscribe

This procedure assumes that the application has already divided the paragraph into runs.

  1. Call ScriptRecordDigitSubstitution only when the application starts, or when receiving a WM_SETTINGCHANGE message.
  2. (optional) Call ScriptIsComplex to determine if the paragraph requires complex processing.
  3. For automatic digit substitution, call ScriptApplyDigitSubstitution to prepare the SCRIPT_CONTROL and SCRIPT_STATE structures as inputs to ScriptItemize. If the application does its own reordering and layout, it must substitute the proper digits for Unicode U+0030 through U+0039 (the Western digits).
  4. Call ScriptItemize to divide the paragraph into items. If an application already knows the bidirectional order -- for example, because of the keyboard layout used to enter the character -- it can call ScriptItemize with NULL for the SCRIPT_CONTROL and SCRIPT_STATE parameters. In that case, ScriptItemize generates items by shaping engine only, and the application can then reorder the items using its own information.
  5. Merge the item information with the run information to produce runs with a single style, script, and direction.
  6. Call ScriptGetCMap to check that the font chosen for a run has glyphs for the characters in the run. If some characters are not supported by the font, either substitute another font or set the eScript member of the SCRIPT_ANALYSIS structure to SCRIPT_UNDEFINED.
  7. Call ScriptShape to identify clusters and generate glyphs.
  8. Call ScriptPlace to generate advance widths and x and y offsets for the glyphs in the run, from which the run width is obtained.
  9. Sum the run widths until the line overflows.
  10. Break the run on a word boundary by using the fSoftBreak and fWhiteSpace members in the logical attributes (SCRIPT_LOGATTR) returned by ScriptBreak. To break a single character cluster off the run, also use the information returned by calling ScriptBreak.
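The word-boundary search in step 10 can be sketched as follows. This is a simplified, portable model, not Uniscribe code: CharAttr is a hypothetical stand-in for the two logical attribute members named in the step (soft_break meaning a line may break in front of the character, white_space meaning the character is breakable white space), and it assumes one advance width per character, which does not hold in general for complex scripts.

```c
#include <assert.h>

/* Hypothetical stand-in for two members of the logical attributes. */
typedef struct {
    unsigned soft_break  : 1;  /* a line may break in front of this character */
    unsigned white_space : 1;  /* this character is breakable white space     */
} CharAttr;

/* Returns how many characters of the overflowing run stay on the current
   line: the largest index i with soft_break set such that the characters
   before i fit in avail. Trailing white space is allowed to hang past the
   margin. Returns 0 if no word boundary fits. */
int find_word_break(const CharAttr *attrs, const int *widths,
                    int n_chars, int avail)
{
    assert(n_chars >= 0 && avail >= 0);

    int best = 0;
    int total = 0;    /* width of characters [0, i)                         */
    int visible = 0;  /* same, but ending at the last non-white-space char  */
    for (int i = 0; i < n_chars; i++) {
        if (i > 0 && attrs[i].soft_break && visible <= avail)
            best = i;
        total += widths[i];
        if (!attrs[i].white_space)
            visible = total;
        if (visible > avail)
            break;    /* nothing further can fit */
    }
    return best;
}
```

A return value of 0 is the case where no word boundary fits and the application falls back to breaking off individual clusters.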

This completes layout of the line. Repeat steps 6 through 10 for each line in the paragraph. However, if the application needed to break the last run on the line, call ScriptShape to reshape the remaining part of the run as the first run on the next line.
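For an application that does its own reordering and layout, the manual digit substitution mentioned in step 3 might look like the sketch below. It is a deliberately simple model: the substitute_digits function is hypothetical, it replaces every Western digit unconditionally, and it assumes the caller has already decided which digit shapes to use (for example, 0x0660 as the zero of the Arabic-Indic digits U+0660 through U+0669). Context-dependent substitution is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replaces the Western digits U+0030..U+0039 in a UTF-16 buffer, in place,
   with the ten consecutive digits that start at zero_digit.
   Returns the number of characters replaced. */
size_t substitute_digits(uint16_t *text, size_t len, uint16_t zero_digit)
{
    assert(text != NULL || len == 0);

    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        if (text[i] >= 0x0030 && text[i] <= 0x0039) {
            text[i] = (uint16_t)(zero_digit + (text[i] - 0x0030));
            replaced++;
        }
    }
    return replaced;
}
```

Calling substitute_digits(buffer, length, 0x0660) maps U+0031 to U+0661 and leaves all other characters untouched.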

To Display Text Using Uniscribe

This procedure is done for each line. It assumes that the text has already been laid out using Uniscribe, and that the glyphs and positions from the layout process were not saved. If speed is a concern, an application can save the glyphs and positions from the layout procedure and start at step 2.

  1. For each run, in logical order:
    1. If the style has changed since the last run, update the device context (hdc).
    2. Call ScriptShape to generate glyphs for the run.
    3. Call ScriptPlace to generate an advance width and an x,y offset for each glyph.
  2. Call ScriptLayout to establish the correct display order for the runs within this line.
  3. (optional) To justify the text, either call ScriptJustify or use specialized knowledge of the text. For more information, see Related Processing by Uniscribe.
  4. For each run, in visual order, call ScriptTextOut to render the glyphs.
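To make the difference between logical and visual order concrete, the sketch below models the kind of reordering that step 2 establishes. It is not ScriptLayout itself and does not call Uniscribe: the visual_order function is a hypothetical name, and it applies the standard reversal rule of the Unicode bidirectional algorithm to one embedding level per run (even levels are left-to-right, odd levels are right-to-left).

```c
#include <assert.h>

/* Fills visual_to_logical[0..n) so that visual_to_logical[v] is the logical
   index of the run displayed at visual position v, counted from the left.
   levels[i] is the bidirectional embedding level of logical run i. */
void visual_order(const unsigned char *levels, int n, int *visual_to_logical)
{
    assert(n >= 0);

    int max_level = 0;
    int min_odd = 256;   /* stays above every level if there is no odd level */
    for (int i = 0; i < n; i++) {
        visual_to_logical[i] = i;
        if (levels[i] > max_level)
            max_level = levels[i];
        if ((levels[i] & 1) && levels[i] < min_odd)
            min_odd = levels[i];
    }

    /* From the highest level down to the lowest odd level, reverse every
       maximal sequence of runs whose level is at least the current level. */
    for (int level = max_level; level >= min_odd; level--) {
        int i = 0;
        while (i < n) {
            if (levels[visual_to_logical[i]] < level) {
                i++;
                continue;
            }
            int j = i;
            while (j + 1 < n && levels[visual_to_logical[j + 1]] >= level)
                j++;
            for (int lo = i, hi = j; lo < hi; lo++, hi--) {
                int tmp = visual_to_logical[lo];
                visual_to_logical[lo] = visual_to_logical[hi];
                visual_to_logical[hi] = tmp;
            }
            i = j + 1;
        }
    }
}
```

For run levels 0, 1, 1, 0 the visual order is 0, 2, 1, 3: the two right-to-left runs in the middle swap places, and the application renders the runs in that order.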