Platform SDK: International Features

SCRIPT_PROPERTIES

The SCRIPT_PROPERTIES structure contains information about the special processing required for each script.

typedef struct {
  DWORD   langid              :16;  
  DWORD   fNumeric            :1;
  DWORD   fComplex            :1;
  DWORD   fNeedsWordBreaking  :1;   
  DWORD   fNeedsCaretInfo     :1;
  DWORD   bCharSet            :8;   
  DWORD   fControl            :1;   
  DWORD   fPrivateUseArea     :1;   
  DWORD   fNeedsCharacterJustify :1;
  DWORD   fInvalidGlyph       :1;
  DWORD   fInvalidLogAttr     :1;
  DWORD   fCDM                :1;
  DWORD   fAmbiguousCharSet   :1;
  DWORD   fClusterSizeVaries  :1;
  DWORD   fRejectInvalid      :1;
} SCRIPT_PROPERTIES;

Members

langid
Primary language and sublanguage associated with the script. When a script is used for many languages, langid represents a default language. For example, Western script is represented by LANG_ENGLISH although it is also used for French, German, and other European languages.
fNumeric
If set, the script contains only digits and the other characters used in writing numbers by the rules of the Unicode bidirectional algorithm. For example, currency symbols, the thousands separator, and the decimal point are classified as numeric when adjacent to or between digits.
fComplex
If set, this is a language whose script requires special shaping or layout. If fComplex is false, the script contains no combining characters and requires no contextual shaping or reordering.
fNeedsWordBreaking
If set, this is a language for which the application must call ScriptBreak to place word breaks, and word break placement must include the character positions marked as fWordStop in SCRIPT_LOGATTR.

If not set, word break placement is identified by scanning for characters marked as fWhiteSpace in SCRIPT_LOGATTR, or for glyphs whose uJustification member in SCRIPT_VISATTR is SCRIPT_JUSTIFY_BLANK or SCRIPT_JUSTIFY_ARABIC_BLANK.

fNeedsCaretInfo
If set, this is a language that restricts caret placement to cluster boundaries, for example, Thai and the Indic languages. To determine valid caret positions, inspect the fCharStop flag in the logical attributes returned by ScriptBreak, or compare adjacent values in the pwLogClust array returned by ScriptShape.

Note that ScriptXtoCP and ScriptCPtoX automatically apply caret placement restrictions.

bCharSet
The nominal charset associated with the script. This charset may be used in a log font when creating a font suitable for displaying this script. Note that for a new script where no charset is defined, bCharSet may be inappropriate. In this case, DEFAULT_CHARSET should be used instead. See the description of fAmbiguousCharSet.
fControl
If set, the script contains only control characters. The converse is not necessarily true: not every control character ends up in a SCRIPT_CONTROL structure.
fPrivateUseArea
If set, the script uses a special set of characters that is privately defined for the Unicode range U+E000 through U+F8FF.
fNeedsCharacterJustify
If set, justification for the script is achieved by increasing the space between all letters, not just between words. When performing inter-character justification, insert extra space only after glyphs whose SCRIPT_VISATTR.uJustification value is SCRIPT_JUSTIFY_CHARACTER.
fInvalidGlyph
If set, this is a script for which ScriptShape generates an invalid glyph to represent invalid sequences. That is, it generates wgInvalid in the glyph buffer. The glyph index of the invalid glyph for a particular font may be obtained by calling ScriptGetFontProperties.
fInvalidLogAttr
If set, this is a script for which ScriptBreak marks invalid combinations by setting fInvalid in the logical attributes buffer.
fCDM
If set, the script contains an item that was analyzed by ScriptItemize as including Combining Diacritical Marks (U+0300 through U+036F).
fAmbiguousCharSet
If set, the script contains characters that are supported by more than one charset. See the Remarks section for more information. The bCharSet member should be set to DEFAULT_CHARSET.
fClusterSizeVaries
If set, this is a script, such as Arabic, in which contextual shaping may cause a string to increase in size when characters are removed.
fRejectInvalid
If set, this is a script, such as Thai, where invalid sequences conventionally cause an editor program, such as Notepad, to beep and ignore keystrokes.
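
The two word-breaking cases described under fNeedsWordBreaking can be sketched in portable C. This is a minimal illustration, not Uniscribe code: LogAttrSketch is a hypothetical stand-in for the SCRIPT_LOGATTR flags (on Windows, use the real structure from Usp10.h), and CollectWordBreakPositions is a name invented for this example. It assumes that white space is always of interest and that fWordStop positions are added only when fNeedsWordBreaking is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the SCRIPT_LOGATTR flags this sketch reads.
   On Windows, the real structure comes from Usp10.h. */
typedef struct {
    unsigned int fSoftBreak  : 1;
    unsigned int fWhiteSpace : 1;
    unsigned int fCharStop   : 1;
    unsigned int fWordStop   : 1;
    unsigned int fInvalid    : 1;
} LogAttrSketch;

/* Writes to `out` the character positions relevant to word break placement
   and returns how many were written (never more than outCapacity).
   White space is always reported; fWordStop positions are reported only
   when the script's fNeedsWordBreaking flag is set. */
size_t CollectWordBreakPositions(const LogAttrSketch *attrs, size_t cChars,
                                 int fNeedsWordBreaking,
                                 size_t *out, size_t outCapacity)
{
    size_t n = 0;
    size_t i;

    assert(attrs != NULL && out != NULL);
    for (i = 0; i < cChars && n < outCapacity; ++i) {
        if (attrs[i].fWhiteSpace ||
            (fNeedsWordBreaking && attrs[i].fWordStop)) {
            out[n++] = i;
        }
    }
    return n;
}
```

In a real client, the attribute array would be the one filled by ScriptBreak, and the flag would come from the SCRIPT_PROPERTIES entry for the item's script.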

Remarks

This structure is filled by the ScriptGetProperties function.

Many Uniscribe scripts do not correspond directly to 8-bit character sets. In the case where some of their characters are supported by more than one charset, the fAmbiguousCharSet member is set. The Uniscribe client should do further processing to determine which charset to use when requesting a font suitable for the run. For example, it may determine that the run consists of multiple languages and split the run so that a different font is used for each language.

Use the following code during initialization to get a pointer to the SCRIPT_PROPERTIES array:

const SCRIPT_PROPERTIES **g_ppScriptProperties; // Array of pointers 
                                                // to properties
int g_iMaxScript;
HRESULT hr;

hr = ScriptGetProperties(&g_ppScriptProperties, &g_iMaxScript);

Then inspect the properties of the script of an item 'iItem' as follows:

hr = ScriptItemize( ... , pItems, ... );
...
if (g_ppScriptProperties[pItems[iItem].a.eScript]->fNeedsCaretInfo) 
    {
       // Use ScriptBreak to restrict the caret from entering clusters (for example).
    }

Requirements

  Windows NT/2000: Requires Windows 2000.
  Header: Declared in Usp10.h.

See Also

Uniscribe Overview, Uniscribe Structures, ScriptBreak, ScriptCPtoX, ScriptGetFontProperties, ScriptItemize, ScriptGetProperties, ScriptShape, ScriptXtoCP, SCRIPT_CONTROL, SCRIPT_LOGATTR, SCRIPT_VISATTR