SCRIPT_PROPERTIES
The SCRIPT_PROPERTIES structure has information about special processing for each script.
typedef struct {
DWORD langid :16;
DWORD fNumeric :1;
DWORD fComplex :1;
DWORD fNeedsWordBreaking :1;
DWORD fNeedsCaretInfo :1;
DWORD bCharSet :8;
DWORD fControl :1;
DWORD fPrivateUseArea :1;
DWORD fNeedsCharacterJustify :1;
DWORD fInvalidGlyph :1;
DWORD fInvalidLogAttr :1;
DWORD fCDM :1;
DWORD fAmbiguousCharSet :1;
DWORD fClusterSizeVaries :1;
DWORD fRejectInvalid :1;
} SCRIPT_PROPERTIES;
Members
- langid
- Primary language and sublanguage associated with the script. When a script is used for many languages, langid represents a default language. For example, Western script is represented by LANG_ENGLISH although it is also used for French, German, and other European languages.
- fNumeric
- If set, the script contains only digits and the other characters used in writing numbers by the rules of the Unicode bidirectional algorithm. For example, currency symbols, the thousands separator, and the decimal point are classified as numeric when adjacent to or between digits.
- fComplex
- If set, this is a language whose script requires special shaping or layout. If fComplex is false, the script contains no combining characters and requires no contextual shaping or reordering.
- fNeedsWordBreaking
- If set, this is a language whose word break placement requires that the application call ScriptBreak and that word break placement include character positions marked as fWordStop in SCRIPT_LOGATTR.
If not set, word break placement is identified by scanning for characters marked as fWhiteSpace in SCRIPT_LOGATTR, or for glyphs marked as uJustify == SCRIPT_JUSTIFY_BLANK or SCRIPT_JUSTIFY_ARABIC_BLANK in SCRIPT_VISATTR.
- fNeedsCaretInfo
- If set, this is a language that restricts caret placement to cluster boundaries, for example, Thai and the Indic languages. To determine valid caret positions, inspect the fCharStop flag in the logical attributes returned by ScriptBreak, or compare adjacent values in the pwLogClust array returned by ScriptShape.
Note that ScriptXtoCP and ScriptCPtoX automatically apply caret placement restrictions.
- bCharSet
- The nominal charset associated with the script. This charset may be used in a log font when creating a font suitable for displaying this script. Note that for a new script where no charset is defined, bCharSet may be inappropriate. In this case, DEFAULT_CHARSET should be used instead. See the description of fAmbiguousCharSet.
- fControl
- If set, the script contains only control characters. Note that the converse is not necessarily true: not every control character ends up in a SCRIPT_CONTROL structure.
- fPrivateUseArea
- If set, the script uses a special set of characters that is privately defined for the Unicode range U+E000 through U+F8FF.
- fNeedsCharacterJustify
- If set, justification for the script is achieved by increasing the space between all letters, not just between words. When performing inter-character justification, insert extra space only after glyphs marked with SCRIPT_VISATTR.uJustify == SCRIPT_JUSTIFY_CHARACTER.
- fInvalidGlyph
- If set, this is a script for which ScriptShape generates an invalid glyph to represent invalid sequences. That is, it generates wgInvalid in the glyph buffer. The glyph index of the invalid glyph for a particular font may be obtained by calling ScriptGetFontProperties.
- fInvalidLogAttr
- If set, this is a script for which ScriptBreak marks invalid combinations by setting fInvalid in the logical attributes buffer.
- fCDM
- If set, the script contains an item that was analyzed by ScriptItemize as including Combining Diacritical Marks (U+0300 through U+036F).
- fAmbiguousCharSet
- If set, the script contains characters that are supported by more than one charset. See the Remarks section for more information. The bCharSet member should be set to DEFAULT_CHARSET.
- fClusterSizeVaries
- If set, this is a script, such as Arabic, in which contextual shaping may cause a string to increase in size when characters are removed.
- fRejectInvalid
- If set, this is a script, such as Thai, where invalid sequences conventionally cause an editor program, such as Notepad, to beep and ignore keystrokes.
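The word break rule described under fNeedsWordBreaking can be sketched as follows. This is a minimal illustration, not Uniscribe code: LogAttrSketch is a hypothetical stand-in that models only the fWhiteSpace and fWordStop flags of SCRIPT_LOGATTR so that the sketch compiles without Usp10.h, and collect_word_breaks is a hypothetical helper name. It assumes the logical attributes have already been obtained (in a real client, from ScriptBreak).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SCRIPT_LOGATTR so the sketch compiles without
   Usp10.h; only the two flags discussed above are modelled. */
typedef struct {
    unsigned fWhiteSpace : 1;
    unsigned fWordStop   : 1;
} LogAttrSketch;

/* Writes up to outCap candidate break positions into out and returns the
   total number of candidates found (which may exceed outCap).
   White space is always a candidate; fWordStop positions are included
   only when the script's fNeedsWordBreaking property is set. */
size_t collect_word_breaks(int fNeedsWordBreaking,
                           const LogAttrSketch *attrs, size_t cChars,
                           size_t *out, size_t outCap)
{
    size_t count = 0;
    assert(attrs != NULL || cChars == 0);
    assert(out != NULL || outCap == 0);
    for (size_t i = 0; i < cChars; ++i) {
        int candidate = attrs[i].fWhiteSpace ||
                        (fNeedsWordBreaking && attrs[i].fWordStop);
        if (candidate) {
            if (count < outCap)
                out[count] = i;
            ++count;
        }
    }
    return count;
}
```

In a real client, the first argument would typically come from the fNeedsWordBreaking bit of the SCRIPT_PROPERTIES entry for the item's script.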
Remarks
This structure is filled by the ScriptGetProperties function.
Many Uniscribe scripts do not correspond directly to 8-bit character sets. In the case where some of their characters are supported by more than one charset, the fAmbiguousCharSet member is set. The Uniscribe client should do further processing to determine which charset to use when requesting a font suitable for the run. For example, it may determine that the run consists of multiple languages and split the run so that a different font is used for each language.
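One way a client might pick the charset for a font request is sketched below. This is an illustration under stated assumptions: ScriptPropsSketch is a hypothetical stand-in that models only the bCharSet and fAmbiguousCharSet members so that the sketch compiles without Usp10.h, choose_charset is a hypothetical helper name, and SKETCH_DEFAULT_CHARSET mirrors the value of DEFAULT_CHARSET (1) from the Windows GDI headers. Falling back to DEFAULT_CHARSET is only a starting point; the further per-language processing described above is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors DEFAULT_CHARSET (value 1) from the Windows GDI headers. */
#define SKETCH_DEFAULT_CHARSET 1u

/* Hypothetical stand-in for SCRIPT_PROPERTIES; only the two members
   relevant to charset selection are modelled. */
typedef struct {
    unsigned bCharSet          : 8;
    unsigned fAmbiguousCharSet : 1;
} ScriptPropsSketch;

/* Returns the charset to place in the font request. If the script's
   charset is ambiguous, returns the default charset and sets
   *needsFurtherProcessing so the caller can refine the choice. */
unsigned choose_charset(const ScriptPropsSketch *sp,
                        int *needsFurtherProcessing)
{
    assert(sp != NULL);
    assert(needsFurtherProcessing != NULL);
    *needsFurtherProcessing = sp->fAmbiguousCharSet ? 1 : 0;
    if (sp->fAmbiguousCharSet)
        return SKETCH_DEFAULT_CHARSET;
    return sp->bCharSet;
}
```

When the flag comes back set, the caller could, for example, split the run by language before requesting fonts.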
Use the following code during initialization to get a pointer to the SCRIPT_PROPERTIES array:
const SCRIPT_PROPERTIES **g_ppScriptProperties; // Array of pointers to properties
int g_iMaxScript;                               // Number of scripts in the array
HRESULT hr;
hr = ScriptGetProperties(&g_ppScriptProperties, &g_iMaxScript);
Then inspect the properties of the script of an item 'iItem' as follows:
hr = ScriptItemize( ... , pItems, ... );
...
if (g_ppScriptProperties[pItems[iItem].a.eScript]->fNeedsCaretInfo)
{
// Use ScriptBreak to restrict the caret from entering clusters (for example).
}
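As an alternative to ScriptBreak, the fNeedsCaretInfo description mentions comparing adjacent values in the pwLogClust array. A minimal sketch of that comparison follows. The names WordSketch, is_cluster_start, and snap_caret_back are hypothetical, WordSketch stands in for WORD so that the sketch compiles without Windows headers, and the logical cluster array is assumed to have been filled already (in a real client, by ScriptShape).

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short WordSketch; /* stand-in for the Windows WORD type */

/* Returns 1 if the caret may sit before character cp, that is, if cp is
   the start or end of the run or the first character of a cluster.
   Only inequality of adjacent entries is tested, so the check does not
   depend on whether cluster values ascend or descend. */
int is_cluster_start(const WordSketch *logClust, size_t cChars, size_t cp)
{
    assert(logClust != NULL || cChars == 0);
    assert(cp <= cChars);
    if (cp == 0 || cp == cChars)
        return 1;
    return logClust[cp] != logClust[cp - 1];
}

/* Moves cp backward to the nearest valid caret position. */
size_t snap_caret_back(const WordSketch *logClust, size_t cChars, size_t cp)
{
    assert(cp <= cChars);
    while (!is_cluster_start(logClust, cChars, cp))
        --cp;
    return cp;
}
```

Position 0 is always valid, so the backward loop in snap_caret_back always terminates.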
Windows NT/2000: Requires Windows 2000.
Header: Declared in Usp10.h.
See Also
Uniscribe Overview, Uniscribe Structures, ScriptBreak, ScriptCPtoX, ScriptGetFontProperties, ScriptItemize, ScriptGetProperties, ScriptShape, ScriptXtoCP, SCRIPT_CONTROL, SCRIPT_LOGATTR, SCRIPT_VISATTR