4.4.1. Maximal Decomposition

The maximal decomposition given in Appendix I: Unicode 1.1 Character List, p. 43, includes some compatibility characters that lie outside the compatibility zone. In general, compatibility characters are those that would not otherwise have been encoded, because they are in some sense variants of characters that are already encoded. The prime examples are the glyph variants in the compatibility zone: halfwidth characters, Arabic contextual form glyphs, and Arabic ligatures.

Historically, a number of such characters were added to Unicode before a "Compatibility Zone" was recognized. Examples include Roman numerals, such as the IV "character." By the time such a zone was distinguished, it was impractical to move those characters into it. However, for the companies that form the Unicode Consortium, it is important to be able to identify which characters are compatibility characters so that Unicode systems can treat them uniformly.
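One present-day way to identify compatibility characters is the decomposition field of the Unicode Character Database, which Python exposes through `unicodedata.decomposition()`. A mapping that begins with a tag such as `<compat>` or `<wide>` marks a compatibility decomposition, while an untagged mapping is canonical. The sketch below assumes the current character data bundled with Python, not the Unicode 1.1 tables, and the helper name `is_compatibility_char` is hypothetical.

```python
import unicodedata

def is_compatibility_char(ch: str) -> bool:
    """True if ch has a tagged (compatibility) decomposition in Python's bundled UCD."""
    decomp = unicodedata.decomposition(ch)
    # Compatibility mappings start with a tag like "<compat>" or "<wide>";
    # canonical mappings are bare code point lists; most characters have none ("").
    return decomp.startswith("<")

# ROMAN NUMERAL FOUR lies outside the compatibility zone but is still tagged.
for ch in ["\u2163", "\uFF21", "\u00C5", "A"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_compatibility_char(ch))
```

Note that U+00C5 does have a decomposition (A plus combining ring), but because it is untagged the helper correctly reports it as not a compatibility character.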

Extreme care must be taken not to make artificial distinctions among characters. This is why, for example, the IPA characters are identified with the Latin characters where possible. Users become very confused when they see an Å on the screen but their search dialog does not find it, because it is not an Å (A-ring, U+00C5) but an Å (Angstrom sign, U+212B). Many characters also have more than one usage, such as "," as either a thousands separator (English) or a decimal separator (French). Unicode tries to avoid duplicating characters just because they are used differently in different languages.
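The Å problem is the reason search should compare normalized text rather than raw code points. A minimal sketch, assuming Python's `unicodedata.normalize` with current Unicode data, in which U+212B ANGSTROM SIGN is canonically equivalent to U+00C5; `contains_normalized` is a hypothetical helper name.

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"  # Å encoded as the Angstrom sign
A_RING = "\u00C5"         # Å encoded as the Latin letter A with ring above

def contains_normalized(haystack: str, needle: str) -> bool:
    """Substring search after mapping both strings to NFC."""
    return unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", haystack)

text = f"1 {ANGSTROM_SIGN} = 0.1 nm"
print(A_RING in text)                     # False: the code points differ
print(contains_normalized(text, A_RING))  # True: both normalize to U+00C5
```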

Identifying a character A as a compatibility variant of another character B implies that A can generally be remapped to B without loss of information other than formatting. Such remapping cannot always take place: many compatibility characters exist only to allow systems to maintain one-to-one mappings to existing code sets. In such cases, remapping would lose information that is considered important in the original set. A complete set of mappings is supplied in Appendix I: Unicode 1.1 Character List, p. 43, but implementations may choose to use a subset of these mappings in specific domains for these reasons.
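As an illustration of using only a subset of the mappings, the sketch below applies just the width-related compatibility mappings (the `<wide>` and `<narrow>` tags in Python's bundled UCD) and leaves every other compatibility character, such as the Roman numerals, untouched. Full NFKC normalization, by contrast, applies all of them. The set of folded tags is a hypothetical domain policy, and the function applies one level of mapping, which is all this policy needs.

```python
import unicodedata

# Hypothetical domain policy: fold width variants only.
FOLDED_TAGS = {"<wide>", "<narrow>"}

def fold_width_variants(text: str) -> str:
    out = []
    for ch in text:
        fields = unicodedata.decomposition(ch).split()
        if fields and fields[0] in FOLDED_TAGS:
            # Replace the character with the code points listed after the tag.
            out.append("".join(chr(int(cp, 16)) for cp in fields[1:]))
        else:
            # Keep everything else, including other compatibility characters.
            out.append(ch)
    return "".join(out)

sample = "\uFF21\uFF22\uFF23 \u2163"            # fullwidth ABC, space, ROMAN NUMERAL FOUR
print(fold_width_variants(sample))           # ABC Ⅳ
print(unicodedata.normalize("NFKC", sample))  # ABC IV
```

The Roman numeral survives the partial fold, preserving the distinction a one-to-one mapping to a legacy code set may depend on, while still letting fullwidth and ASCII letters match.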

It is important to realize that the compatibility mappings are specific to a version of Unicode, and future character additions may change them. For example, if new precomposed characters are added, mappings may be added between those characters and the corresponding composed character sequences. If a new non-spacing mark is added, it may produce decompositions for precomposed characters that previously had none.
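Because the mappings are version-specific, an implementation that stores folded text may want to record which version of the character data produced it. A minimal sketch, assuming Python, whose `unicodedata.unidata_version` reports the Unicode data version built into the interpreter; the `FoldedKey` record and its helpers are hypothetical names.

```python
import unicodedata
from dataclasses import dataclass

@dataclass(frozen=True)
class FoldedKey:
    text: str         # text after compatibility (NFKC) normalization
    ucd_version: str  # Unicode data version that produced `text`

def make_key(s: str) -> FoldedKey:
    return FoldedKey(unicodedata.normalize("NFKC", s), unicodedata.unidata_version)

def needs_refold(key: FoldedKey) -> bool:
    # A key built from different character data may disagree with a fresh fold.
    return key.ucd_version != unicodedata.unidata_version

key = make_key("\u2163")
print(key.text, key.ucd_version, needs_refold(key))
```

A key whose recorded version differs can be recomputed from the original text; whether that is worth the cost depends on how the keys are used.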

Note: Many of these mappings arise from the large number of characters that Unicode 1.1 added from ISO/IEC 10646-1. If errors are found in the mapping list in the future, errata notices will be made available through Unicode, Inc., and on the corresponding unicode.org FTP site.[4]