cmap: Character To Glyph Index Mapping

The 'cmap' table defines the mapping of character codes to the glyph index values used in the font.
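To make the mapping concrete, the following is a minimal sketch of resolving one character code to a glyph index, assuming the subtable uses Format 4 (the format recommended under Requirements). The function name `lookup_format4` and the hand-built `SAMPLE` subtable are illustrative assumptions, not part of any font library. A production implementation would binary-search the segments and validate the subtable length.

```python
import struct

def lookup_format4(subtable: bytes, char_code: int) -> int:
    """Return the glyph index for char_code, or 0 (missing glyph) if unmapped."""
    fmt, _length, _language, seg_count_x2 = struct.unpack_from(">HHHH", subtable, 0)
    if fmt != 4:
        raise ValueError("not a Format 4 subtable")
    seg_count = seg_count_x2 // 2

    # Parallel arrays follow the 14-byte header:
    # endCode[], reservedPad, startCode[], idDelta[], idRangeOffset[]
    end_off = 14
    start_off = end_off + seg_count_x2 + 2  # +2 skips reservedPad
    delta_off = start_off + seg_count_x2
    range_off = delta_off + seg_count_x2

    # Linear scan for the first segment whose endCode >= char_code.
    for i in range(seg_count):
        (end_code,) = struct.unpack_from(">H", subtable, end_off + 2 * i)
        if end_code < char_code:
            continue
        (start_code,) = struct.unpack_from(">H", subtable, start_off + 2 * i)
        if start_code > char_code:
            return 0  # falls in a gap between segments
        (id_delta,) = struct.unpack_from(">h", subtable, delta_off + 2 * i)
        (id_range_offset,) = struct.unpack_from(">H", subtable, range_off + 2 * i)
        if id_range_offset == 0:
            # Glyph index is computed directly, modulo 65536.
            return (char_code + id_delta) & 0xFFFF
        # Otherwise the offset is relative to the idRangeOffset entry itself.
        pos = range_off + 2 * i + id_range_offset + 2 * (char_code - start_code)
        (glyph,) = struct.unpack_from(">H", subtable, pos)
        return (glyph + id_delta) & 0xFFFF if glyph else 0
    return 0

# Hypothetical two-segment subtable: U+0041..U+005A map to glyphs 1..26,
# followed by the required terminating 0xFFFF segment.
SAMPLE = (
    struct.pack(">7H", 4, 32, 0, 4, 4, 1, 0)   # format, length, language, segCountX2, search fields
    + struct.pack(">2H", 0x005A, 0xFFFF)       # endCode
    + struct.pack(">H", 0)                     # reservedPad
    + struct.pack(">2H", 0x0041, 0xFFFF)       # startCode
    + struct.pack(">2h", -64, 1)               # idDelta
    + struct.pack(">2H", 0, 0)                 # idRangeOffset
)
```

With this sample, `lookup_format4(SAMPLE, 0x41)` resolves to glyph 1, and any code outside U+0041..U+005A resolves to glyph 0, the missing glyph.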

Requirements

Fonts should support the full character set defined by the code page(s) for the target market(s). Only those characters should be included in the 'cmap', to ensure no conflicts with present and future EUDC (End-User Defined Characters).
All 'cmap' subtables for Microsoft platforms should use Format 4. Problems with complex Format 4 subtables under Windows 3.1 have been addressed in all later Microsoft platforms.
The character set is separate from the encoding. Microsoft strongly recommends using the Unicode encoding. The 'cmap' table should contain a subtable with Platform ID = 3 (Microsoft), Encoding ID = 1 (Unicode), Format = 4.
The subtables must be stored in sorted order by Platform ID and Encoding ID.
For every 'cmap' subtable defined, a corresponding complete set of NameRecords must exist in the 'name' table, using the same Platform and Encoding IDs as the 'cmap' subtable.
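The structural requirements above lend themselves to an automated check. The sketch below assumes the raw bytes of the 'cmap' table are already in hand (a 4-byte header followed by 8-byte encoding records, each pointing to a subtable that begins with its format number). The function names `read_encoding_records`, `check_cmap_requirements`, and `missing_name_records` are hypothetical, and the 'name' table check only tests that at least one NameRecord shares each Platform ID and Encoding ID pair; it cannot judge whether the set is "complete".

```python
import struct

def read_encoding_records(cmap: bytes):
    """Return [(platform_id, encoding_id, subtable_format)] in stored order."""
    _version, num_tables = struct.unpack_from(">HH", cmap, 0)
    records = []
    for i in range(num_tables):
        platform_id, encoding_id, offset = struct.unpack_from(">HHI", cmap, 4 + 8 * i)
        # Every subtable starts with a uint16 format field.
        (fmt,) = struct.unpack_from(">H", cmap, offset)
        records.append((platform_id, encoding_id, fmt))
    return records

def check_cmap_requirements(cmap: bytes):
    """Return a list of human-readable problems; an empty list means none found."""
    records = read_encoding_records(cmap)
    problems = []

    # Subtables must be sorted by Platform ID, then Encoding ID.
    keys = [(p, e) for p, e, _ in records]
    if keys != sorted(keys):
        problems.append("subtables are not sorted by Platform ID and Encoding ID")

    # A Microsoft Unicode subtable in Format 4 should be present.
    if (3, 1, 4) not in records:
        problems.append("no subtable with Platform ID 3, Encoding ID 1, Format 4")

    # Subtables for Microsoft platforms should use Format 4.
    for p, e, f in records:
        if p == 3 and f != 4:
            problems.append(f"Microsoft subtable ({p}, {e}) uses Format {f}, not 4")
    return problems

def missing_name_records(cmap_pairs, name_pairs):
    """Return the (platform, encoding) pairs used in 'cmap' but absent from 'name'."""
    return sorted(set(cmap_pairs) - set(name_pairs))
```

For example, `missing_name_records([(1, 0), (3, 1)], [(3, 1)])` reports `[(1, 0)]`, meaning the Macintosh subtable has no matching NameRecords.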

Recommendations

In addition to the appropriate local code page(s), Microsoft suggests supporting the WGL4 character set, defined in the TrueType Font File specification. The WGL4 set includes complete sets of Latin, Greek, and Cyrillic characters, which extend the subsets included in X-JIS, GB2312-80, and other Far East (FE) standards. This extension supports better document portability and electronic communication between Far East and European platforms.