Identifying Writing System Information within a Font

As mentioned earlier, Windows cannot determine the intended writing system or language from the glyphs in a font alone. Before it can offer writing system options to the user or to an application, Windows must know which writing systems the font covers.

Fortunately, fonts contain a great deal of information about their glyphs: in well-designed fonts you'll find hinting instructions, metrics, language information, attachment points for diacritical marks, underline and strikethrough information, and more. Fonts are composed of many data structures, commonly referred to as tables, each containing specific information.
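To make the idea of "tables" concrete, here is a minimal sketch of listing the tables in a font file. It assumes the standard sfnt layout used by TrueType fonts (a 12-byte header followed by 16-byte directory records, all big-endian); the function name read_table_directory is hypothetical and is not part of any Windows API.

```python
import struct

def read_table_directory(data: bytes) -> dict:
    """Return {tag: (offset, length)} for each table listed in an sfnt font.

    Assumes the usual sfnt header: a 4-byte version, a 2-byte table count,
    and three more 2-byte fields, followed by one 16-byte record per table.
    """
    _sfnt_version, num_tables = struct.unpack_from(">IH", data, 0)
    tables = {}
    for i in range(num_tables):
        # Each record: 4-byte tag, checksum, offset from file start, length.
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i
        )
        tables[tag.decode("latin-1")] = (offset, length)
    return tables
```

Given the raw bytes of a font file, read_table_directory(data).get("OS/2") would return the offset and length of the OS/2 table, or None if the font has no such table.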

Language information about a font is stored in the font's "OS/2" table. This table also holds a variety of other information: typeface weight, superscript and strikeout metrics, ascender/descender values, PANOSE classification, licensing information, and more. For more information about the structure of TrueType font files, see the TrueType 1.0 Font File Specification (available on MSDN).

Writing systems covered by the glyphs in a font can be specified either by the Unicode script ranges or by the codepages that the font covers. A font manufacturer declares them by setting the appropriate bits of the ulUnicodeRange fields or the ulCodePageRange fields in the OS/2 table of the font. Multiple ranges can be specified for a single font. These settings cannot be changed by the user.
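As an illustration of where these bits live, the following sketch pulls both bit fields out of the raw bytes of an OS/2 table. It rests on assumptions not stated in this text: that the table follows the published OS/2 layout, in which four 32-bit ulUnicodeRange values start at byte offset 42 and two 32-bit ulCodePageRange values start at byte offset 78 (present only in version 1 and later tables), and that bit 0 is the least significant bit of the first 32-bit value. The function name read_os2_ranges is hypothetical.

```python
import struct

def read_os2_ranges(os2: bytes):
    """Return (unicode_range, code_page_range) as Python integers.

    Bit n of the OS/2 bit field becomes bit n of the returned integer.
    Offsets 42 and 78 are assumptions taken from the published table layout.
    """
    version = struct.unpack_from(">H", os2, 0)[0]

    # Four consecutive 32-bit values hold Unicode range bits 0-127.
    u1, u2, u3, u4 = struct.unpack_from(">4I", os2, 42)
    unicode_range = u1 | (u2 << 32) | (u3 << 64) | (u4 << 96)

    # Version 0 tables end before the codepage fields, so report no bits.
    code_page_range = 0
    if version >= 1 and len(os2) >= 86:
        c1, c2 = struct.unpack_from(">2I", os2, 78)
        code_page_range = c1 | (c2 << 32)
    return unicode_range, code_page_range
```

Combining the 32-bit values into one integer keeps the bit numbers identical to the ones used in the tables that follow, so a test such as unicode_range >> 9 & 1 checks bit 9 directly.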

ulUnicodeRange bit settings (OS/2 table)

Bit       Description
0         Basic Latin
1         Latin-1 Supplement
2         Latin Extended-A
3         Latin Extended-B
4         IPA Extensions
5         Spacing Modifier Letters
6         Combining Diacritical Marks
7         Basic Greek
8         Greek Symbols And Coptic
9         Cyrillic
10        Armenian
11        Basic Hebrew
12        Hebrew Extended (A and B blocks combined)
13        Basic Arabic
14        Arabic Extended
15        Devanagari
16        Bengali
17        Gurmukhi
18        Gujarati
19        Oriya
20        Tamil
21        Telugu
22        Kannada
23        Malayalam
24        Thai
25        Lao
26        Basic Georgian
27        Georgian Extended
28        Hangul Jamo
29        Latin Extended Additional
30        Greek Extended
31        General Punctuation
32        Superscripts And Subscripts
33        Currency Symbols
34        Combining Diacritical Marks For Symbols
35        Letterlike Symbols
36        Number Forms
37        Arrows
38        Mathematical Operators
39        Miscellaneous Technical
40        Control Pictures
41        Optical Character Recognition
42        Enclosed Alphanumerics
43        Box Drawing
44        Block Elements
45        Geometric Shapes
46        Miscellaneous Symbols
47        Dingbats
48        CJK Symbols And Punctuation
49        Hiragana
50        Katakana
51        Bopomofo
52        Hangul Compatibility Jamo
53        CJK Miscellaneous
54        Enclosed CJK Letters And Months
55        CJK Compatibility
56        Hangul
57        Hangul Supplementary-A
58        Hangul Supplementary-B
59        CJK Unified Ideographs
60        Private Use Area
61        CJK Compatibility Ideographs
62        Alphabetic Presentation Forms
63        Arabic Presentation Forms-A
64        Combining Half Marks
65        CJK Compatibility Forms
66        Small Form Variants
67        Arabic Presentation Forms-B
68        Halfwidth And Fullwidth Forms
69        Specials
70–127    Reserved for Unicode SubRanges
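The bit assignments above can be turned into a lookup, sketched here in Python. The dictionary deliberately holds only a handful of the entries listed above to keep the example short, and the names UNICODE_RANGE_NAMES, set_bits, and describe_unicode_ranges are hypothetical. The input is assumed to be the 128-bit ulUnicodeRange value held as a single integer, with bit 0 as the least significant bit.

```python
# A subset of the ulUnicodeRange bit assignments listed in this section.
UNICODE_RANGE_NAMES = {
    0: "Basic Latin",
    1: "Latin-1 Supplement",
    7: "Basic Greek",
    9: "Cyrillic",
    11: "Basic Hebrew",
    13: "Basic Arabic",
    24: "Thai",
    49: "Hiragana",
    50: "Katakana",
    59: "CJK Unified Ideographs",
}

def set_bits(value: int, width: int = 128) -> list:
    """Return the indices of the bits that are set, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]

def describe_unicode_ranges(unicode_range: int) -> list:
    """Name each set bit, falling back to 'bit N' for bits not in the subset."""
    return [
        UNICODE_RANGE_NAMES.get(bit, f"bit {bit}")
        for bit in set_bits(unicode_range)
    ]
```

For example, a font that sets bits 0, 9, and 59 would be described as covering Basic Latin, Cyrillic, and CJK Unified Ideographs.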


ulCodePageRange bit settings (OS/2 table)

Bit       Codepage  Description
0         1252      Latin 1
1         1250      Latin 2: Eastern Europe
2         1251      Cyrillic
3         1253      Greek
4         1254      Turkish
5         1255      Hebrew
6         1256      Arabic
7         1257      Windows Baltic
8–15                Reserved for Alternate ANSI
16        874       Thai
17        932       JIS/Japan
18        936       Chinese: Simplified chars--PRC and Singapore
19        949       Korean Wansung
20        950       Chinese: Traditional chars--Taiwan and Hong Kong SAR, China
21        1361      Korean Johab
22–28               Reserved for Alternate ANSI & OEM
29                  Macintosh Character Set (Standard Roman)
30                  OEM Character Set
31                  Symbol Character Set
32–47               Reserved for OEM
48        869       IBM Greek
49        866       MS-DOS Russian
50        865       MS-DOS Nordic
51        864       Arabic
52        863       MS-DOS Canadian French
53        862       Hebrew
54        861       MS-DOS Icelandic
55        860       MS-DOS Portuguese
56        857       IBM Turkish
57        855       IBM Cyrillic; primarily Russian
58        852       Latin 2
59        775       MS-DOS Baltic
60        737       Greek; former 437 G
61        708       Arabic; ASMO 708
62        850       WE/Latin 1
63        437       US
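As a rough sketch of how the codepage bits might be set and tested, the following Python example builds a 64-bit mask from a list of codepages and checks it again. It covers only a subset of the assignments (the ANSI codepages in bits 0 through 7 and a few OEM codepages in bits 48 through 63). The names CODE_PAGE_BITS, code_page_mask, split_code_page_range, and claims_code_page are hypothetical, and the split into two 32-bit halves assumes the field is stored as a low half followed by a high half.

```python
# A subset of the ulCodePageRange bit assignments: codepage -> bit number.
CODE_PAGE_BITS = {
    1252: 0, 1250: 1, 1251: 2, 1253: 3,
    1254: 4, 1255: 5, 1256: 6, 1257: 7,
    869: 48, 866: 49, 850: 62, 437: 63,
}

def code_page_mask(code_pages) -> int:
    """Build the 64-bit value a font would carry to claim these codepages."""
    mask = 0
    for code_page in code_pages:
        mask |= 1 << CODE_PAGE_BITS[code_page]
    return mask

def split_code_page_range(mask: int):
    """Split the 64-bit value into (low 32 bits, high 32 bits)."""
    return mask & 0xFFFFFFFF, mask >> 32

def claims_code_page(mask: int, code_page: int) -> bool:
    """Return True if the bit for the given codepage is set in the mask."""
    return bool((mask >> CODE_PAGE_BITS[code_page]) & 1)
```

A font claiming codepages 1252, 1251, and 437 would set bits 0, 2, and 63, so the low half would be 0x00000005 and the high half 0x80000000.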