Appendix B - Calculation of font (FTC) and language (LID)

Certain Unicode characters are shared between Far East and non-Far East scripts, so their font and language must be calculated from both the Unicode character code and the chp.idctHint property.

Each character is classified into one of four groups: ASCII, Far East, shared (also called floating), and non-Far East. Font and language are then calculated as follows:

Character type	Font (ftc)	Language (lid)
ASCII	sprmCRgftc0	sprmCRglid0
non-Far East	sprmCRgftc2	sprmCRglid0
Far East	sprmCRgftc1	sprmCRglid1
shared character	sprmCRgftc2 if chp.idctHint is 0; sprmCRgftc1 if chp.idctHint is 1	sprmCRglid0 if chp.idctHint is 0; sprmCRglid1 if chp.idctHint is 1
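The selection table above can be expressed as a small lookup. The following is a minimal C sketch, not a reference implementation: the enum, the PropSource struct, and SelectPropSource are hypothetical names, and the function only reports which sprmCRgftc*/sprmCRglid* index applies rather than reading any property values. The table defines chp.idctHint only for 0 and 1; as an assumption, this sketch treats any value other than 1 like 0.

```c
/* Character groups described in this appendix (hypothetical names). */
typedef enum {
    CC_ASCII,
    CC_FAR_EAST,
    CC_NON_FAR_EAST,
    CC_SHARED
} CharClass;

/* Which sprm supplies the font and the language for a character. */
typedef struct {
    int ftcIndex; /* 0 -> sprmCRgftc0, 1 -> sprmCRgftc1, 2 -> sprmCRgftc2 */
    int lidIndex; /* 0 -> sprmCRglid0, 1 -> sprmCRglid1 */
} PropSource;

/* Hypothetical helper; idctHint is the value of chp.idctHint. */
static PropSource SelectPropSource(CharClass cc, int idctHint)
{
    PropSource ps;
    switch (cc) {
    case CC_ASCII:
        ps.ftcIndex = 0; ps.lidIndex = 0;
        break;
    case CC_NON_FAR_EAST:
        ps.ftcIndex = 2; ps.lidIndex = 0;
        break;
    case CC_FAR_EAST:
        ps.ftcIndex = 1; ps.lidIndex = 1;
        break;
    case CC_SHARED:
    default:
        /* Shared characters follow the hint: 1 -> Far East sprms,
           anything else (assumption) -> non-Far East sprms. */
        if (idctHint == 1) {
            ps.ftcIndex = 1; ps.lidIndex = 1;
        } else {
            ps.ftcIndex = 2; ps.lidIndex = 0;
        }
        break;
    }
    return ps;
}
```

For example, a shared character with chp.idctHint equal to 1 takes its font from sprmCRgftc1 and its language from sprmCRglid1, which is exactly the Far East row.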

The table below defines the classification of each Unicode subrange. Character ranges are inclusive.

Unicode subrange	Character range	Classification
usrBasicLatin	0x20->0x7f	ASCII
usrLatin1	0xa0->0xff	some shared (see notes below)
usrLatinXA	0x100->0x17f	some shared (see notes below)
usrLatinXB	0x180->0x24f	some shared (see notes below)
usrIPAExtensions	0x250->0x2af	some shared (see notes below)
usrSpacingModLetters	0x2b0->0x2ff	shared
usrCombDiacritical	0x300->0x36f	shared
usrBasicGreek	0x370->0x3cf	shared
usrGreekSymbolsCop	0x3d0->0x3ff	non-Far East
usrCyrillic	0x400->0x4ff	shared
usrArmenian	0x500->0x58f	non-Far East
usrBasicHebrew	0x5d0->0x5ff	non-Far East
usrHebrewXA	0x590->0x5cf	non-Far East
usrBasicArabic	0x600->0x652	non-Far East
usrArabicX	0x653->0x6ff	non-Far East
usrDevangari	0x900->0x97f	non-Far East
usrBengali	0x980->0x9ff	non-Far East
usrGurmukhi	0xa00->0xa7f	non-Far East
usrGujarati	0xa80->0xaff	non-Far East
usrOriya	0xb00->0xb7f	non-Far East
usrTamil	0xb80->0xbff	non-Far East
usrTelugu	0xc00->0xc7f	non-Far East
usrKannada	0xc80->0xcff	non-Far East
usrMalayalam	0xd00->0xd7f	non-Far East
usrThai	0xe00->0xe7f	non-Far East
usrLao	0xe80->0xeff	non-Far East
usrBasicGeorgian	0x10d0->0x10ff	non-Far East
usrGeorgianExtended	0x10a0->0x10cf	non-Far East
usrHangulJamo	0x1100->0x11ff	non-Far East
usrLatinExtendedAdd	0x1e00->0x1eff	shared
usrGreekExtended	0x1f00->0x1fff	non-Far East
usrGeneralPunct	0x2000->0x206f	shared
usrSuperAndSubscript	0x2070->0x209f	shared
usrCurrencySymbols	0x20a0->0x20cf	shared
usrCombDiacriticsS	0x20d0->0x20ff	shared
usrLetterlikeSymbols	0x2100->0x214f	shared
usrNumberForms	0x2150->0x218f	shared
usrArrows	0x2190->0x21ff	shared
usrMathematicalOps	0x2200->0x22ff	shared
usrMiscTechnical	0x2300->0x23ff	shared
usrControlPictures	0x2400->0x243f	shared
usrOpticalCharRecog	0x2440->0x245f	shared
usrEnclosedAlphanum	0x2460->0x24ff	shared
usrBoxDrawing	0x2500->0x257f	shared
usrBlockElements	0x2580->0x259f	shared
usrGeometricShapes	0x25a0->0x25ff	shared
usrMiscDingbats	0x2600->0x26ff	shared
usrDingbats	0x2700->0x27bf	shared
usrCJKSymAndPunct	0x3000->0x303f	Far East
usrHiragana	0x3040->0x309f	Far East
usrKatakana	0x30a0->0x30ff	Far East
usrBopomofo	0x3100->0x312f	Far East
usrHangulCompatJamo	0x3130->0x318f	Far East
usrCJKMisc	0x3190->0x319f	Far East
usrEnclosedCJKLtMnth	0x3200->0x32ff	Far East
usrCJKCompatibility	0x3300->0x33ff	Far East
usrHangul	0xac00->0xd7a3	Far East
usrReserved1
usrReserved2
usrCJKUnifiedIdeo	0x4e00->0x9fff	Far East
usrPrivateUseArea	0xe000->0xf8ff	shared
usrCJKCompatibilityIdeographs	0xf900->0xfaff	Far East
usrAlphaPresentationForms	0xfb00->0xfb4f	shared
usrArabicPresentationFormsA	0xfb50->0xfdff	shared
usrCombiningHalfMarks	0xfe20->0xfe2f	Far East
usrCJKCompatForms	0xfe30->0xfe4f	Far East
usrSmallFormVariants	0xfe50->0xfe6f	Far East
usrArabicPresentationFormsB	0xfe70->0xfefe	shared
usrHFWidthForms	0xff00->0xffef	Far East
usrSpecials	0xfff0->0xfffd	non-Far East
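A classifier for this table can be sketched as a binary search over sorted ranges. In the minimal C sketch below, adjacent subranges with the same classification are merged to keep the array short, the four "some shared" subranges (0xa0 through 0x2af) return CC_PER_CHAR so that the per-character tables can be consulted, and code points the table does not list (including usrReserved1 and usrReserved2, which have no range here) return CC_UNLISTED. All names are hypothetical, and the sketch covers only 16-bit code points because the table itself stops at 0xfffd.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    CC_ASCII,
    CC_FAR_EAST,
    CC_NON_FAR_EAST,
    CC_SHARED,
    CC_PER_CHAR, /* 0xa0-0x2af: consult the per-character tables */
    CC_UNLISTED  /* code point not covered by the table */
} CharClass;

typedef struct {
    uint16_t first, last; /* inclusive bounds */
    CharClass cc;
} UsrRange;

/* Sorted by first code point; adjacent subranges with equal class merged. */
static const UsrRange kRanges[] = {
    {0x0020, 0x007f, CC_ASCII},        /* usrBasicLatin */
    {0x00a0, 0x02af, CC_PER_CHAR},     /* usrLatin1 .. usrIPAExtensions */
    {0x02b0, 0x03cf, CC_SHARED},       /* usrSpacingModLetters .. usrBasicGreek */
    {0x03d0, 0x03ff, CC_NON_FAR_EAST}, /* usrGreekSymbolsCop */
    {0x0400, 0x04ff, CC_SHARED},       /* usrCyrillic */
    {0x0500, 0x06ff, CC_NON_FAR_EAST}, /* usrArmenian .. usrArabicX */
    {0x0900, 0x0d7f, CC_NON_FAR_EAST}, /* usrDevangari .. usrMalayalam */
    {0x0e00, 0x0eff, CC_NON_FAR_EAST}, /* usrThai, usrLao */
    {0x10a0, 0x11ff, CC_NON_FAR_EAST}, /* Georgian subranges, usrHangulJamo */
    {0x1e00, 0x1eff, CC_SHARED},       /* usrLatinExtendedAdd */
    {0x1f00, 0x1fff, CC_NON_FAR_EAST}, /* usrGreekExtended */
    {0x2000, 0x27bf, CC_SHARED},       /* usrGeneralPunct .. usrDingbats */
    {0x3000, 0x319f, CC_FAR_EAST},     /* usrCJKSymAndPunct .. usrCJKMisc */
    {0x3200, 0x33ff, CC_FAR_EAST},     /* usrEnclosedCJKLtMnth, usrCJKCompatibility */
    {0x4e00, 0x9fff, CC_FAR_EAST},     /* usrCJKUnifiedIdeo */
    {0xac00, 0xd7a3, CC_FAR_EAST},     /* usrHangul */
    {0xe000, 0xf8ff, CC_SHARED},       /* usrPrivateUseArea */
    {0xf900, 0xfaff, CC_FAR_EAST},     /* usrCJKCompatibilityIdeographs */
    {0xfb00, 0xfdff, CC_SHARED},       /* usrAlphaPresentationForms, ...FormsA */
    {0xfe20, 0xfe6f, CC_FAR_EAST},     /* usrCombiningHalfMarks .. usrSmallFormVariants */
    {0xfe70, 0xfefe, CC_SHARED},       /* usrArabicPresentationFormsB */
    {0xff00, 0xffef, CC_FAR_EAST},     /* usrHFWidthForms */
    {0xfff0, 0xfffd, CC_NON_FAR_EAST}, /* usrSpecials */
};

/* Hypothetical helper: binary search for the range containing wch. */
static CharClass ClassifyCoarse(uint16_t wch)
{
    size_t lo = 0, hi = sizeof kRanges / sizeof kRanges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (wch < kRanges[mid].first)
            hi = mid;
        else if (wch > kRanges[mid].last)
            lo = mid + 1;
        else
            return kRanges[mid].cc;
    }
    return CC_UNLISTED;
}
```

Merging is safe only because the merged subranges are contiguous and share a classification; note that 0xd80 through 0xdff is a gap in the table and stays unlisted.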

The table below gives the classification of every character in the Unicode subrange usrLatin1 (0xa0 through 0xff). A 1 marks a shared character; a 0 marks a non-Far East character.

   // 0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
      0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, // 0x00a0-0x00af
      1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, // 0x00b0-0x00bf
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00c0-0x00cf
      0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00d0-0x00df
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00e0-0x00ef
      0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00f0-0x00ff
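The usrLatin1 table maps directly onto a 96-entry lookup array indexed by the offset from 0xa0. This C sketch copies the rows above unchanged; kLatin1Shared and IsSharedLatin1 are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 = shared, 0 = non-Far East; rows copied from the usrLatin1 table. */
static const uint8_t kLatin1Shared[96] = {
    0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, /* 0x00a0-0x00af */
    1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, /* 0x00b0-0x00bf */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00c0-0x00cf */
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00d0-0x00df */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00e0-0x00ef */
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00f0-0x00ff */
};

/* Hypothetical helper: true if wch is a shared character in usrLatin1. */
static bool IsSharedLatin1(uint16_t wch)
{
    if (wch < 0x00a0 || wch > 0x00ff)
        return false; /* outside usrLatin1 */
    return kLatin1Shared[wch - 0x00a0] != 0;
}
```

For instance, 0xd7 (multiplication sign) and 0xf7 (division sign) are shared, while accented letters such as 0xe9 are non-Far East.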

The table below gives the classification of characters 0x100 through 0x16f in the Unicode subrange usrLatinXA. A 1 marks a shared character; a 0 marks a non-Far East character. The remaining characters in this subrange (0x170 through 0x17f) are non-Far East.

   // 0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
      1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0100-0x010f
      0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0110-0x011f
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0120-0x012f
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0130-0x013f
      0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, // 0x0140-0x014f
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0150-0x015f
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0160-0x016f
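The usrLatinXA table can be handled the same way, with one extra check because the table stops at 0x16f while the subrange runs to 0x17f. In this C sketch the array and function names are hypothetical; the rows are copied from the table above.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 = shared, 0 = non-Far East; rows copied from the usrLatinXA table. */
static const uint8_t kLatinXAShared[112] = {
    1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x0100-0x010f */
    0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, /* 0x0110-0x011f */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, /* 0x0120-0x012f */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x0130-0x013f */
    0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, /* 0x0140-0x014f */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x0150-0x015f */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, /* 0x0160-0x016f */
};

/* Hypothetical helper: true if wch is a shared character in usrLatinXA. */
static bool IsSharedLatinXA(uint16_t wch)
{
    if (wch < 0x0100 || wch > 0x017f)
        return false; /* outside usrLatinXA */
    if (wch > 0x016f)
        return false; /* 0x0170-0x017f: not in the table, non-Far East */
    return kLatinXAShared[wch - 0x0100] != 0;
}
```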

In usrLatinXB, the shared characters are 0x192 and 0x1fa through 0x1ff. All other characters in this Unicode subrange are non-Far East.

In usrIPAExtensions, the shared characters are 0x251 and 0x261.
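Because the shared characters in usrLatinXB and usrIPAExtensions are short explicit lists, a direct comparison is simpler than a lookup table. Both function names in this C sketch are hypothetical, and neither checks whether wch actually lies in the subrange; callers are assumed to have classified the subrange first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: shared characters listed for usrLatinXB
   (0x192 and 0x1fa-0x1ff). */
static bool IsSharedLatinXB(uint16_t wch)
{
    return wch == 0x0192 || (wch >= 0x01fa && wch <= 0x01ff);
}

/* Hypothetical helper: shared characters listed for usrIPAExtensions. */
static bool IsSharedIPA(uint16_t wch)
{
    return wch == 0x0251 || wch == 0x0261;
}
```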

As an optimization, if the Far East font chp.ftcFE is 0, chp.idctHint is 0, and chp.ftcAscii equals chp.ftcOther, the font is chp.ftcAscii and the language is chp.lidDefault, so the character does not need to be classified.
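The shortcut can be checked before any per-character work. This C sketch uses a hypothetical ChpSubset struct holding only the CHP fields named above; the field widths are assumptions for illustration, not the on-disk layout. The function returns false when the full classification is still required.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of CHP fields; types are assumptions. */
typedef struct {
    uint16_t ftcAscii;
    uint16_t ftcFE;
    uint16_t ftcOther;
    uint8_t  idctHint;
    uint16_t lidDefault;
} ChpSubset;

/* Fills *ftc and *lid and returns true when the shortcut applies;
   returns false (outputs untouched) when characters must be classified. */
static bool TryFastFontAndLang(const ChpSubset *chp, uint16_t *ftc, uint16_t *lid)
{
    if (chp->ftcFE == 0 && chp->idctHint == 0 && chp->ftcAscii == chp->ftcOther) {
        *ftc = chp->ftcAscii;
        *lid = chp->lidDefault;
        return true;
    }
    return false;
}
```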