I.2. Key
- nnnn: This is the Unicode code number, in hex.
- *: This is a single mark indicating the origin of the character:
* = Unicode 1.0 character
´ = Unicode 1.0 character (unnamed Hangul syllable block)
† = Unicode 1.0.1 character (moved from the 1.0 position)
‡ = Unicode 1.1 character
- new-name: This is the Unicode 1.1 name, which is identical to the ISO/IEC 10646 name. In many cases, it differs only trivially from the Unicode 1.0 name. For example, the ISO names insert WITH, as in LATIN CAPITAL LETTER A WITH GRAVE versus the Unicode 1.0 LATIN CAPITAL LETTER A GRAVE. The ISO names also use British English spellings, such as OPEN CENTRE CROSS versus the Unicode 1.0 OPEN CENTER CROSS. In some cases, a name was retained for compatibility with existing ISO standards, even though it is less clear or is inaccurate:
APOSTROPHE {old: APOSTROPHE-QUOTE}
LEFT PARENTHESIS {old: OPENING PARENTHESIS}
PILCROW SIGN {old: PARAGRAPH SIGN}
LATIN SMALL LETTER CLOSED REVERSED OPEN E {old: LATIN SMALL LETTER CLOSED REVERSED EPSILON}
Note: These names are important, since the ISO/IEC names are used to identify characters across different standards. However, names are not always sufficient to identify character usage; consult the Unicode 1.0 name, cross references, and aliases for clarity.
- comment: (optional) This is the comment from ISO/IEC 10646.
- old-name: (optional) This is the Unicode 1.0 name, where it differs from the new name.
- decomposition: (optional) This is the canonical decomposition of the Unicode character, where it exists. This is used to determine character equivalency, as per Section 4.4, p. 10. The maximal decomposition normally consists of a sequence of characters separated by "&". Curly brackets (braces) may be used to indicate that a character is optional. There may also be indicators in angle brackets, which show that a substitution is dependent on context, or that a particular format is indicated. These are listed below.
Note: The decompositions exist only for the purposes of character equivalency. They do not imply that the character on the left is preferred to the characters on the right. In particular, a few halves of characters are included for compatibility, such as the following:
[3032]* VERTICAL KANA REPEAT WITH VOICED SOUND MARK = [3034]* VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF & [3035]* VERTICAL KANA REPEAT MARK LOWER HALF
For the purposes of the canonical equivalence algorithm, it is simplest to view the whole character as the sequence of the two parts.
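
The decomposition field described in this key can be read mechanically. The following is a minimal sketch in Python, not part of the standard, that splits the right-hand side of a decomposition on "&" and records the code number, origin mark, and name of each part. The function name parse_decomposition, the Component record, and the assumption that braces wrap a whole "[nnnn]mark NAME" component are all hypothetical; indicators in angle brackets are not handled.

```python
import re
from dataclasses import dataclass

# Origin marks, copied from the key above.
ORIGIN_MARKS = {
    "*": "Unicode 1.0 character",
    "´": "Unicode 1.0 character (unnamed Hangul syllable block)",
    "†": "Unicode 1.0.1 character (moved from the 1.0 position)",
    "‡": "Unicode 1.1 character",
}


@dataclass
class Component:
    code_point: int   # value of the hex code number nnnn
    origin: str       # one of the keys of ORIGIN_MARKS
    name: str         # character name as printed
    optional: bool    # True if the component was wrapped in braces


# Assumed component shape: "[nnnn]<mark> NAME".
_ENTRY = re.compile(r"^\[([0-9A-Fa-f]{4})\]([*´†‡])\s+(.+)$")


def parse_decomposition(text):
    """Split a decomposition on '&' and parse each component."""
    components = []
    for part in text.split("&"):
        part = part.strip()
        # Assumption: an optional character is written as {...}.
        optional = part.startswith("{") and part.endswith("}")
        if optional:
            part = part[1:-1].strip()
        match = _ENTRY.match(part)
        if match is None:
            raise ValueError(f"unrecognized component: {part!r}")
        hex_code, mark, name = match.groups()
        components.append(
            Component(int(hex_code, 16), mark, name.strip(), optional)
        )
    return components


if __name__ == "__main__":
    halves = parse_decomposition(
        "[3034]* VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF"
        " & [3035]* VERTICAL KANA REPEAT MARK LOWER HALF"
    )
    print([hex(c.code_point) for c in halves])  # ['0x3034', '0x3035']
```

Applied to the example above, the sketch yields the two halves in order, which matches the view of the whole character as the sequence of its two parts.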