Platform SDK: International Features

ScriptItemize

The ScriptItemize function breaks a Unicode string into individually shapeable items.

HRESULT WINAPI ScriptItemize(
  const WCHAR *pwcInChars, 
  int cInChars, 
  int cMaxItems, 
  const SCRIPT_CONTROL *psControl, 
  const SCRIPT_STATE *psState, 
  SCRIPT_ITEM *pItems, 
  int *pcItems 
);

Parameters

pwcInChars
[in] Pointer to a Unicode string to be itemized.
cInChars
[in] Number of characters in pwcInChars to be itemized.
cMaxItems
[in] Maximum number of SCRIPT_ITEM structures to process.
psControl
[in] Pointer to a SCRIPT_CONTROL structure containing flags indicating the type of itemization to be performed. Use NULL if this is not needed.
psState
[in] Pointer to a SCRIPT_STATE structure indicating the initial bidirectional algorithm state. Use NULL if this is not needed.
pItems
[out] Pointer to a buffer to receive the SCRIPT_ITEM structures processed. The buffer pointed to by pItems should be cMaxItems * sizeof(SCRIPT_ITEM) bytes in length.
pcItems
[out] Pointer to a variable to receive the number of SCRIPT_ITEM structures processed.

Return Values

If the function succeeds, the return value is zero.

If the function fails, it returns a nonzero value. The function returns E_INVALIDARG if pwcInChars is NULL, cInChars is 0, pItems is NULL, or cMaxItems is less than 2.

The function returns E_OUTOFMEMORY if the output buffer (cMaxItems entries) is too small. Note that in this case, as in all error cases, no items have been fully processed, so no part of the output array contains defined values.

If any other unrecoverable error is encountered, it is also returned as an HRESULT. For example, error returns from Win32 API functions are converted using the HRESULT_FROM_WIN32 macro and returned to the client.
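Because E_OUTOFMEMORY only signals that cMaxItems was too small, a client can retry with a larger buffer. The following is a minimal, portable sketch of such a loop, not code from the SDK. The names hresult_t, Item, itemize_fn, itemize_with_retry, and fake_itemize are hypothetical stand-ins so that the example builds without Usp10.h; in a real client the callback would presumably be a thin wrapper around ScriptItemize, and Item would be SCRIPT_ITEM.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins so the sketch builds without Usp10.h (assumptions, not SDK types). */
typedef int32_t hresult_t;                          /* plays the role of HRESULT */
#define HR_OK            ((hresult_t)0)
#define HR_E_OUTOFMEMORY ((hresult_t)0x8007000EL)   /* numeric value of E_OUTOFMEMORY */

typedef struct { int iCharPos; } Item;              /* plays the role of SCRIPT_ITEM */

/* Hypothetical callback: a wrapper that calls ScriptItemize on one fixed string. */
typedef hresult_t (*itemize_fn)(int cMaxItems, Item *pItems, int *pcItems);

/* Calls `itemize` with a growing buffer until it stops reporting E_OUTOFMEMORY.
   On success the caller owns *ppItems and must free() it. */
hresult_t itemize_with_retry(itemize_fn itemize, Item **ppItems, int *pcItems)
{
    int cMaxItems = 2;                  /* smallest buffer the function accepts */
    *ppItems = NULL;
    *pcItems = 0;
    for (;;) {
        Item *buf = malloc((size_t)cMaxItems * sizeof *buf);
        if (buf == NULL)
            return HR_E_OUTOFMEMORY;
        hresult_t hr = itemize(cMaxItems, buf, pcItems);
        if (hr == HR_OK) {
            *ppItems = buf;
            return HR_OK;
        }
        free(buf);                      /* output is undefined after any failure */
        *pcItems = 0;
        if (hr != HR_E_OUTOFMEMORY || cMaxItems > INT_MAX / 2)
            return hr;                  /* other error, or cannot grow further */
        cMaxItems *= 2;
    }
}

/* Fake itemizer for demonstration only: reports 3 items plus the terminal
   item, so it needs room for 4 entries. */
hresult_t fake_itemize(int cMaxItems, Item *pItems, int *pcItems)
{
    static const int starts[] = { 0, 4, 9, 12 };
    if (cMaxItems < 4)
        return HR_E_OUTOFMEMORY;
    for (int i = 0; i < 4; i++)
        pItems[i].iCharPos = starts[i];
    *pcItems = 3;
    return HR_OK;
}
```

With fake_itemize, the first attempt (2 entries) fails and the second (4 entries) succeeds. The failed buffer is discarded rather than inspected, since no part of the output is defined after an error.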

Remarks

Items are delimited by either a change of shaping engine or a change of direction.

The client may create multiple runs from each SCRIPT_ITEM returned by ScriptItemize, but should not combine multiple items into a single run. The reason for this is that later the client will call ScriptShape for each run (when measuring or rendering), and must pass the SCRIPT_ANALYSIS structure that ScriptItemize returned. Each SCRIPT_ITEM contains a SCRIPT_ANALYSIS structure.
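As an illustration of this rule, the sketch below splits text into runs at every item boundary and at every client-defined style boundary, so that no run spans two items. It is a portable model only: Item stands in for SCRIPT_ITEM (only iCharPos is modeled), and Run, build_runs, and the styleBreaks array are hypothetical client-side names. The code assumes the item array ends with the terminal item that ScriptItemize appends, and that styleBreaks is strictly ascending.

```c
typedef struct { int iCharPos; } Item;            /* stand-in for SCRIPT_ITEM */
typedef struct { int start, length, item; } Run;  /* hypothetical client-side run */

/* Writes runs covering items 0..cItems-1. Each run lies inside one item and
   records that item's index, so the matching SCRIPT_ANALYSIS can be looked up
   when the run is later shaped. Returns the number of runs written, or -1 if
   cMaxRuns is too small. */
int build_runs(const Item *pItems, int cItems,
               const int *styleBreaks, int cBreaks,
               Run *runs, int cMaxRuns)
{
    int n = 0;   /* runs written so far */
    int b = 0;   /* next unused style break */
    for (int i = 0; i < cItems; i++) {
        int pos = pItems[i].iCharPos;
        int end = pItems[i + 1].iCharPos;   /* start of next (or terminal) item */

        /* Breaks at or before the item start add no new boundary. */
        while (b < cBreaks && styleBreaks[b] <= pos)
            b++;

        while (pos < end) {
            int next = (b < cBreaks && styleBreaks[b] < end) ? styleBreaks[b++] : end;
            if (n == cMaxRuns)
                return -1;
            runs[n].start = pos;
            runs[n].length = next - pos;
            runs[n].item = i;
            n++;
            pos = next;
        }
    }
    return n;
}
```

For items starting at 0, 4, and 9 in a 12-character string, with style breaks at 2, 4, and 10, this yields five runs: [0,2) and [2,4) in item 0, [4,9) in item 1, and [9,10) and [10,12) in item 2.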

If psControl and psState are both NULL on entry, ScriptItemize breaks the Unicode string purely by character code. If both parameters are non-NULL, ScriptItemize performs a full Unicode bidirectional analysis.

The ScriptItemize function always adds a terminal item to the item analysis array (pItems), so that the length of the item at pItem is always available as the difference between the next item's iCharPos and its own. In the case of one item:

pItem[1].iCharPos - pItem[0].iCharPos

For this reason, it is invalid to call ScriptItemize with a buffer of less than two SCRIPT_ITEM structures.
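The subtraction above generalizes to any item index. A minimal sketch, assuming a stand-in Item type that models only the iCharPos member of SCRIPT_ITEM (Item and item_length are hypothetical names, not part of Usp10.h):

```c
#include <assert.h>

typedef struct { int iCharPos; } Item;   /* stand-in for SCRIPT_ITEM */

/* Length in characters of item i. pItems must hold cItems items followed by
   the terminal item, so pItems[i + 1] is valid for every i < cItems. */
int item_length(const Item *pItems, int cItems, int i)
{
    assert(i >= 0 && i < cItems);
    return pItems[i + 1].iCharPos - pItems[i].iCharPos;
}
```

For a 12-character string itemized at offsets 0, 4, and 9, the array holds iCharPos values 0, 4, 9, and 12 (the terminal item), giving lengths 4, 5, and 3.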

To perform a correct Unicode bidirectional analysis, the SCRIPT_STATE structure should be initialized according to the reading order at paragraph start, and ScriptItemize should be passed the whole paragraph.

The bidirectional stack is not large, just 16 bytes. It should be shared between calls.

The fRTL member of SCRIPT_ANALYSIS (referenced in SCRIPT_ITEM) and the fNumeric member of SCRIPT_PROPERTIES (which is returned by ScriptGetProperties) together provide the same classification as the lpClass member of GCP_RESULTS that is referenced by lpResults in GetCharacterPlacement.

If shaping is disabled (fDisableGlyphShape in SCRIPT_STATE), complex scripts are substituted by SCRIPT_UNDEFINED, causing shaping to be performed without contextual substitution, using only the one-to-one code point to glyph mapping provided by the font's cmap table. The rendering direction is still set appropriately.

European digits U+0030 through U+0039 may be rendered as national digits as shown in the following table.

fDigitSubstitute  fContextDigits  Digit shapes displayed for Unicode U+0030 through U+0039
False             Any             Western (European / American) digits
True              False           As specified in SCRIPT_CONTROL.uDefaultLanguage
True              True            As prior strong text, defaulting to SCRIPT_CONTROL.uDefaultLanguage

Note that in context digit mode, any digits encountered before the first letters are rendered in SCRIPT_CONTROL.uDefaultLanguage if that script is in the same direction as the output, and in Western (also known as Arabic) digits if the direction is opposite. For example, if SCRIPT_CONTROL.uDefaultLanguage is LANG_ARABIC, initial digits will be Arabic-Indic in an RTL embedding, but Western in an LTR embedding.
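The digit substitution rules can be restated as a small decision function. This is a sketch of the documented behavior only, not Uniscribe's implementation: DigitShapes, choose_digit_shapes, and the two boolean inputs describing the surrounding text (hasPriorStrongText, defaultLangMatchesDirection) are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

typedef enum {
    DIGITS_WESTERN,            /* European / American digits */
    DIGITS_DEFAULT_LANGUAGE,   /* per SCRIPT_CONTROL.uDefaultLanguage */
    DIGITS_PRIOR_STRONG_TEXT   /* follow the preceding strong text */
} DigitShapes;

/* hasPriorStrongText: letters have already been seen before these digits.
   defaultLangMatchesDirection: the script of uDefaultLanguage runs in the
   same direction as the output embedding. */
DigitShapes choose_digit_shapes(bool fDigitSubstitute, bool fContextDigits,
                                bool hasPriorStrongText,
                                bool defaultLangMatchesDirection)
{
    if (!fDigitSubstitute)
        return DIGITS_WESTERN;              /* row 1: fContextDigits is ignored */
    if (!fContextDigits)
        return DIGITS_DEFAULT_LANGUAGE;     /* row 2 */
    if (hasPriorStrongText)
        return DIGITS_PRIOR_STRONG_TEXT;    /* row 3 */
    /* Row 3 with no letters seen yet: fall back as described in the note. */
    return defaultLangMatchesDirection ? DIGITS_DEFAULT_LANGUAGE : DIGITS_WESTERN;
}
```

With uDefaultLanguage set to LANG_ARABIC, initial digits in an RTL embedding correspond to choose_digit_shapes(true, true, false, true), and in an LTR embedding to choose_digit_shapes(true, true, false, false).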

Effect of Unicode control characters on SCRIPT_STATE.

SCRIPT_STATE flag  Set by  Cleared by
fDigitSubstitute   NADS    NODS
fInhibitSymSwap    ISS     ASS
fCharShape         AAFS    IAFS
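The table above can be modeled as a function that updates the three flags when one of these control characters is encountered. This is an illustrative sketch, not the Uniscribe implementation: StateFlags and apply_control_char are hypothetical names, and StateFlags does not reproduce the real SCRIPT_STATE layout. The code points are the Unicode assignments for these deprecated format characters (U+206A through U+206F).

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the three SCRIPT_STATE flags named in the table. */
typedef struct {
    bool fDigitSubstitute;
    bool fInhibitSymSwap;
    bool fCharShape;
} StateFlags;

/* Applies one UTF-16 code unit to the flags. Returns true if ch was one of
   the six control characters, false if it left the flags untouched. */
bool apply_control_char(StateFlags *s, uint16_t ch)
{
    switch (ch) {
    case 0x206A: s->fInhibitSymSwap  = true;  return true;  /* ISS  */
    case 0x206B: s->fInhibitSymSwap  = false; return true;  /* ASS  */
    case 0x206C: s->fCharShape       = false; return true;  /* IAFS */
    case 0x206D: s->fCharShape       = true;  return true;  /* AAFS */
    case 0x206E: s->fDigitSubstitute = true;  return true;  /* NADS */
    case 0x206F: s->fDigitSubstitute = false; return true;  /* NODS */
    default:     return false;
    }
}
```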

SCRIPT_STATE.fArabicNumContext controls the Unicode EN-AN rule. At the beginning of a paragraph it should normally be initialized to TRUE for an Arabic locale, FALSE for any other. The ScriptItemize function will update it as it processes strong text.
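The paragraph-start initialization described in these remarks might look like the sketch below. It is an assumption-laden model rather than SDK code: ParagraphState is a stand-in that carries only two members and does not match the real SCRIPT_STATE layout, initial_paragraph_state is a hypothetical helper, and the sketch assumes the reading order is expressed through the uBidiLevel member (0 for a left-to-right paragraph, 1 for right-to-left).

```c
#include <stdbool.h>

/* Stand-in for the two SCRIPT_STATE members used here; not the real layout. */
typedef struct {
    unsigned uBidiLevel        : 5;  /* 0 = left-to-right base level, 1 = right-to-left */
    unsigned fArabicNumContext : 1;  /* initial state for the EN-AN rule */
} ParagraphState;

/* Builds the state to pass for the first (and ideally only) ScriptItemize
   call on a paragraph. All other members are left at zero. */
ParagraphState initial_paragraph_state(bool rtlReadingOrder, bool arabicLocale)
{
    ParagraphState s = { 0 };
    s.uBidiLevel = rtlReadingOrder ? 1u : 0u;
    s.fArabicNumContext = arabicLocale ? 1u : 0u;
    return s;
}
```

A real client would set the same members on a zero-initialized SCRIPT_STATE and then pass the whole paragraph to ScriptItemize in one call.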

Requirements

  Windows NT/2000: Requires Windows 2000.
  Header: Declared in Usp10.h.
  Library: Use Usp10.lib.

See Also

Uniscribe Overview, Uniscribe Functions, GetCharacterPlacement, ScriptItemize, ScriptShape, GCP_RESULTS, SCRIPT_ANALYSIS, SCRIPT_CONTROL, SCRIPT_ITEM, SCRIPT_PROPERTIES, SCRIPT_STATE