Appendix B – Calculation of font (FTC) and language (LID)

Certain Unicode characters are shared between Far East and non-Far East scripts, so the font and language of a character must be calculated from its Unicode character code and the chp.idctHint property.

Characters are classified into one of four groups: ASCII, Far East, shared (also called floating), and non-Far East. Properties are calculated as follows:

Character type     Font (ftc)                          Language (lid)
ASCII              sprmCRgftc0                         sprmCRglid0
non-Far East       sprmCRgftc2                         sprmCRglid0
Far East           sprmCRgftc1                         sprmCRglid1
shared character   sprmCRgftc2 if chp.idctHint is 0    sprmCRglid0 if chp.idctHint is 0
                   sprmCRgftc1 if chp.idctHint is 1    sprmCRglid1 if chp.idctHint is 1
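The selection rules in the table above can be sketched in C as follows. This is a minimal illustration, not part of the file format: the names CharClass, ChpFontLang, FontLang and select_font_lang are hypothetical, the field widths are assumptions, and any chp.idctHint value other than 1 is treated as 0 because only the values 0 and 1 are described here.

```c
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef enum { CLS_ASCII, CLS_NON_FAR_EAST, CLS_FAR_EAST, CLS_SHARED } CharClass;

typedef struct {
    uint16_t ftc[3];   /* values set by sprmCRgftc0, sprmCRgftc1, sprmCRgftc2 */
    uint16_t lid[2];   /* values set by sprmCRglid0, sprmCRglid1 */
    uint8_t  idctHint; /* chp.idctHint: 0 or 1 */
} ChpFontLang;

typedef struct { uint16_t ftc; uint16_t lid; } FontLang;

/* Pick the font and language for one character of the given class. */
static FontLang select_font_lang(const ChpFontLang *chp, CharClass cls)
{
    FontLang r;
    switch (cls) {
    case CLS_ASCII:
        r.ftc = chp->ftc[0]; r.lid = chp->lid[0];
        break;
    case CLS_FAR_EAST:
        r.ftc = chp->ftc[1]; r.lid = chp->lid[1];
        break;
    case CLS_SHARED:
        if (chp->idctHint == 1) {          /* follow the Far East properties */
            r.ftc = chp->ftc[1]; r.lid = chp->lid[1];
        } else {                           /* assumption: anything else acts as 0 */
            r.ftc = chp->ftc[2]; r.lid = chp->lid[0];
        }
        break;
    case CLS_NON_FAR_EAST:
    default:
        r.ftc = chp->ftc[2]; r.lid = chp->lid[0];
        break;
    }
    return r;
}
```

A caller would first classify the character, then pass the class to select_font_lang together with the property values collected from the sprms.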


The table below defines the classification of various ranges of Unicode characters:

Unicode subrange               Character range   Classification
usrBasicLatin                  0x20->0x7f        ASCII
usrLatin1                      0xa0->0xff        some shared (see notes below)
usrLatinXA                     0x100->0x17f      some shared (see notes below)
usrLatinXB                     0x180->0x24f      some shared (see notes below)
usrIPAExtensions               0x250->0x2af      some shared (see notes below)
usrSpacingModLetters           0x2b0->0x2ff      shared
usrCombDiacritical             0x300->0x36f      shared
usrBasicGreek                  0x370->0x3cf      shared
usrGreekSymbolsCop             0x3d0->0x3ff      non-Far East
usrCyrillic                    0x400->0x4ff      shared
usrArmenian                    0x500->0x58f      non-Far East
usrBasicHebrew                 0x5d0->0x5ff      non-Far East
usrHebrewXA                    0x590->0x5cf      non-Far East
usrBasicArabic                 0x600->0x652      non-Far East
usrArabicX                     0x653->0x6ff      non-Far East
usrDevangari                   0x900->0x97f      non-Far East
usrBengali                     0x980->0x9ff      non-Far East
usrGurmukhi                    0xa00->0xa7f      non-Far East
usrGujarati                    0xa80->0xaff      non-Far East
usrOriya                       0xb00->0xb7f      non-Far East
usrTamil                       0x0b80->0x0bff    non-Far East
usrTelugu                      0x0c00->0x0c7f    non-Far East
usrKannada                     0x0c80->0x0cff    non-Far East
usrMalayalam                   0x0d00->0x0d7f    non-Far East
usrThai                        0x0e00->0x0e7f    non-Far East
usrLao                         0x0e80->0x0eff    non-Far East
usrBasicGeorgian               0x10d0->0x10ff    non-Far East
usrGeorgianExtended            0x10a0->0x10cf    non-Far East
usrHangulJamo                  0x1100->0x11ff    non-Far East
usrLatinExtendedAdd            0x1e00->0x1eff    shared
usrGreekExtended               0x1f00->0x1fff    non-Far East
usrGeneralPunct                0x2000->0x206f    shared
usrSuperAndSubscript           0x2070->0x209f    shared
usrCurrencySymbols             0x20a0->0x20cf    shared
usrCombDiacriticsS             0x20d0->0x20ff    shared
usrLetterlikeSymbols           0x2100->0x214f    shared
usrNumberForms                 0x2150->0x218f    shared
usrArrows                      0x2190->0x21ff    shared
usrMathematicalOps             0x2200->0x22ff    shared
usrMiscTechnical               0x2300->0x23ff    shared
usrControlPictures             0x2400->0x243f    shared
usrOpticalCharRecog            0x2440->0x245f    shared
usrEnclosedAlphanum            0x2460->0x24ff    shared
usrBoxDrawing                  0x2500->0x257f    shared
usrBlockElements               0x2580->0x259f    shared
usrGeometricShapes             0x25a0->0x25ff    shared
usrMiscDingbats                0x2600->0x26ff    shared
usrDingbats                    0x2700->0x27bf    shared
usrCJKSymAndPunct              0x3000->0x303f    Far East
usrHiragana                    0x3040->0x309f    Far East
usrKatakana                    0x30a0->0x30ff    Far East
usrBopomofo                    0x3100->0x312f    Far East
usrHangulCompatJamo            0x3130->0x318f    Far East
usrCJKMisc                     0x3190->0x319f    Far East
usrEnclosedCJKLtMnth           0x3200->0x32ff    Far East
usrCJKCompatibility            0x3300->0x33ff    Far East
usrHangul                      0xac00->0xd7a3    Far East
usrReserved1
usrReserved2
usrCJKUnifiedIdeo              0x4e00->0x9fff    Far East
usrPrivateUseArea              0xe000->0xf8ff    shared
usrCJKCompatibilityIdeographs  0xf900->0xfaff    Far East
usrAlphaPresentationForms      0xfb00->0xfb4f    shared
usrArabicPresentationFormsA    0xfb50->0xfdff    shared
usrCombiningHalfMarks          0xfe20->0xfe2f    Far East
usrCJKCompatForms              0xfe30->0xfe4f    Far East
usrSmallFormVariants           0xfe50->0xfe6f    Far East
usrArabicPresentationFormsB    0xfe70->0xfefe    shared
usrHFWidthForms                0xff00->0xffef    Far East
usrSpecials                    0xfff0->0xfffd    non-Far East
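A range lookup over this table might be sketched as below. The array holds only a handful of rows copied from the table as an excerpt; a real implementation would need every row. RangeClass, SubrangeEntry, kSubranges and classify_subrange are hypothetical names, and RC_UNKNOWN is an assumption used here for code points the excerpt does not cover.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef enum {
    RC_UNKNOWN,      /* not covered by the excerpt below */
    RC_ASCII,
    RC_NON_FAR_EAST,
    RC_FAR_EAST,
    RC_SHARED,
    RC_SOME_SHARED   /* needs the per-character notes for the subrange */
} RangeClass;

typedef struct { uint16_t first, last; RangeClass cls; } SubrangeEntry;

/* Excerpt only: a few rows copied from the classification table. */
static const SubrangeEntry kSubranges[] = {
    { 0x0020, 0x007f, RC_ASCII },        /* usrBasicLatin */
    { 0x00a0, 0x00ff, RC_SOME_SHARED },  /* usrLatin1 */
    { 0x0400, 0x04ff, RC_SHARED },       /* usrCyrillic */
    { 0x0e00, 0x0e7f, RC_NON_FAR_EAST }, /* usrThai */
    { 0x3040, 0x309f, RC_FAR_EAST },     /* usrHiragana */
    { 0x4e00, 0x9fff, RC_FAR_EAST },     /* usrCJKUnifiedIdeo */
    { 0xe000, 0xf8ff, RC_SHARED },       /* usrPrivateUseArea */
};

/* Linear scan; both range bounds are inclusive, as in the table. */
static RangeClass classify_subrange(uint16_t ch)
{
    size_t n = sizeof kSubranges / sizeof kSubranges[0];
    for (size_t i = 0; i < n; i++) {
        if (ch >= kSubranges[i].first && ch <= kSubranges[i].last)
            return kSubranges[i].cls;
    }
    return RC_UNKNOWN;
}
```

A linear scan keeps the sketch short; with the full table sorted by starting code point, a binary search would be a natural substitute.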


The table below describes the behavior of the Unicode subrange usrLatin1. Characters marked with a 1 are shared; characters marked with a 0 are considered "non-Far East". The table has one entry for every character from 0x00a0 to 0x00ff.

   // 0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
      0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, // 0x00a0-0x00af
      1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, // 0x00b0-0x00bf
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00c0-0x00cf
      0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00d0-0x00df
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00e0-0x00ef
      0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00f0-0x00ff

The table below describes the behavior of the Unicode subrange usrLatinXA. Characters marked with a 1 are shared; characters marked with a 0 are considered "non-Far East". All other characters in this Unicode subrange (0x0170-0x017f, which the table does not list) are considered "non-Far East".

   // 0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
      1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0100-0x010f
      0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0110-0x011f
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0120-0x012f
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0130-0x013f
      0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, // 0x0140-0x014f
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0150-0x015f
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0160-0x016f

In usrLatinXB, the shared characters are 0x192, 0x1FA, 0x1FB, 0x1FC, 0x1FD, 0x1FE and 0x1FF. All other characters in this Unicode subrange are considered "non-Far East".

In usrIPAExtensions, the shared characters are 0x251 and 0x261.
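The per-character rules for the four "some shared" subranges can be combined into one test, sketched below. The two arrays are copied from the usrLatin1 and usrLatinXA tables above. The names kLatin1Shared, kLatinXAShared and is_shared_in_mixed_subrange are hypothetical, and returning 0 for usrIPAExtensions characters other than 0x251 and 0x261 is an assumption, since the text states the default only for the other three subranges.

```c
#include <stdint.h>

/* 1 = shared, 0 = non-Far East; rows copied from the usrLatin1 table. */
static const uint8_t kLatin1Shared[0x60] = {
    0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, /* 0x00a0-0x00af */
    1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, /* 0x00b0-0x00bf */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00c0-0x00cf */
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00d0-0x00df */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00e0-0x00ef */
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00f0-0x00ff */
};

/* Rows copied from the usrLatinXA table; 0x0170-0x017f has no row. */
static const uint8_t kLatinXAShared[0x70] = {
    1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x0100-0x010f */
    0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, /* 0x0110-0x011f */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, /* 0x0120-0x012f */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x0130-0x013f */
    0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, /* 0x0140-0x014f */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x0150-0x015f */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, /* 0x0160-0x016f */
};

/* Returns 1 if ch is shared, 0 otherwise.
   Intended only for code points in 0x00a0-0x02af. */
static int is_shared_in_mixed_subrange(uint16_t ch)
{
    if (ch >= 0x00a0 && ch <= 0x00ff)            /* usrLatin1 */
        return kLatin1Shared[ch - 0x00a0];
    if (ch >= 0x0100 && ch <= 0x016f)            /* usrLatinXA, listed rows */
        return kLatinXAShared[ch - 0x0100];
    if (ch >= 0x0180 && ch <= 0x024f)            /* usrLatinXB */
        return ch == 0x0192 || (ch >= 0x01fa && ch <= 0x01ff);
    if (ch >= 0x0250 && ch <= 0x02af)            /* usrIPAExtensions */
        return ch == 0x0251 || ch == 0x0261;
    return 0;  /* includes 0x0170-0x017f: non-Far East */
}
```

Byte-per-character arrays mirror the layout of the tables in the text; a bitmap would be more compact but harder to compare against them.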

An optimization is available. If the Far East font chp.ftcFE is 0 and chp.idctHint is 0 and chp.ftcAscii is equal to chp.ftcOther, the font is chp.ftcAscii and the language is chp.lidDefault.
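The optimization amounts to a single check made before any per-character classification. The sketch below assumes a hypothetical ChpFastPath structure holding only the fields named in the text; the field widths and the function name try_fast_path are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical subset of the CHP; field widths are assumptions. */
typedef struct {
    uint16_t ftcAscii;
    uint16_t ftcFE;
    uint16_t ftcOther;
    uint16_t lidDefault;
    uint8_t  idctHint;
} ChpFastPath;

/* Returns 1 and fills *ftc and *lid when the shortcut applies.
   Returns 0, leaving the outputs untouched, when the full
   classification is still needed. */
static int try_fast_path(const ChpFastPath *chp, uint16_t *ftc, uint16_t *lid)
{
    if (chp->ftcFE == 0 && chp->idctHint == 0 &&
        chp->ftcAscii == chp->ftcOther) {
        *ftc = chp->ftcAscii;
        *lid = chp->lidDefault;
        return 1;
    }
    return 0;
}
```

When try_fast_path returns 0, the caller falls back to classifying each character as described earlier in this appendix.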