Far Eastern Input Method Editors

Glossary

Input method: Any method of entering text that doesn't involve typing each character directly. Input methods are widely used for entering ideographs and other characters phonetically or component by component.
Romaji: A writing system based on the Latin alphabet that is used to represent Japanese text.
Bopomofo: A Chinese standard phonetic script developed in 1913.
Determined string: A string that has been converted from a phonetic representation into ideographs.
Conversion or composition window: The window of an Input Method Editor that displays text typed by the user, either as entered or as converted to ideographic form.
Status window: The window of an Input Method Editor in which the user can change the IME's conversion mode or input mode.
Candidate window: The window of an Input Method Editor that lists characters that the user can choose to replace the text highlighted in the composition window.

Input Method Editors, also called front-end processors, are applets that allow the user to enter the thousands of different characters used in Far Eastern written languages using a standard 101-key keyboard. The user composes each character in one of several ways: by radical, by phonetic representation, or by typing in the character's numeric code page index. IMEs are widely available; Windows ships with standard IMEs that are based on the most popular input methods used in each target country, and a number of third-party vendors sell IME packages.
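Of the three composition methods, entry by code page index is the simplest to illustrate. The following Python fragment is a minimal sketch, not how any shipping IME is implemented: it assumes the user types the index as hexadecimal digits, and the function name char_from_code_index is hypothetical. It relies only on the Shift-JIS, Big5, and GB 2312 codecs in Python's standard library.

```python
def char_from_code_index(hex_index: str, code_page: str = "shift_jis") -> str:
    """Return the character that a hexadecimal code page index names.

    hex_index is the index as the user would type it, for example "889F".
    code_page is the name of a Python codec for a double-byte code page.
    """
    # Turn the typed hex digits into the raw bytes of the code page index.
    raw = bytes.fromhex(hex_index)
    # Decoding fails with UnicodeDecodeError if the index is not assigned.
    return raw.decode(code_page)
```

For example, char_from_code_index("889F") returns the first kanji in the Shift-JIS table, U+4E9C, and char_from_code_index("A440", "big5") returns U+4E00, the ideograph for "one".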

An IME consists of an engine that converts keystrokes into phonetic and ideographic characters, plus a dictionary of commonly used ideographic words. As the user enters keystrokes, the IME engine attempts to guess which ideographic character or characters the keystrokes should be converted into. Because many ideographs have identical pronunciation, the IME engine's first guess isn't always correct. When the suggestion is incorrect, the user can choose from a list of homophones; the homophone that the user selects then becomes the IME engine's first guess the next time around. This process is summarized in Figure 7-5.

Figure 7-5 The process through which an IME engine converts keystrokes into ideographic characters.
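The guess, choose, and remember cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the design of any real IME engine: the class ToyImeEngine, its method names, and the one-entry sample dictionary are all hypothetical, and a real engine also segments sentences and weighs context.

```python
class ToyImeEngine:
    """Toy model of an IME engine that learns from the user's choices."""

    def __init__(self, dictionary):
        # Map each phonetic reading to its homophones, best guess first.
        # Copy the lists so reordering never touches the caller's data.
        self._candidates = {r: list(c) for r, c in dictionary.items()}

    def convert(self, reading):
        """Return the engine's first guess, or the reading itself if unknown."""
        candidates = self._candidates.get(reading)
        return candidates[0] if candidates else reading

    def candidates(self, reading):
        """Return the homophones a candidate window would list."""
        return list(self._candidates.get(reading, []))

    def select(self, reading, choice):
        """Record the user's choice so it becomes the next first guess."""
        candidates = self._candidates.get(reading, [])
        if choice not in candidates:
            raise ValueError("choice is not a candidate for this reading")
        candidates.remove(choice)
        candidates.insert(0, choice)
        return choice


# Illustrative data only: four Japanese words that are all read "kisha".
SAMPLE_DICTIONARY = {"きしゃ": ["記者", "汽車", "帰社", "貴社"]}
```

With this dictionary, convert("きしゃ") first returns 記者. After the user picks 汽車 through select("きしゃ", "汽車"), the same call returns 汽車, which mirrors the way a chosen homophone becomes the first guess the next time around.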

You don't have to use a localized keyboard to enter ideographic characters. Localized keyboards can generate phonetic syllables (such as kana or hangul) directly, but the user can also represent phonetic syllables using Latin characters. In Japanese, Latin characters that represent kana are called romaji. Japanese keyboards contain extra keys that allow the user to toggle between entering romaji and entering kana. If you are using a non-Japanese keyboard, you must type romaji to generate kana.
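To make the romaji-to-kana step concrete, here is a minimal Python sketch that converts romaji to hiragana by greedy longest match. It assumes a deliberately tiny table; the names ROMAJI_TO_HIRAGANA and romaji_to_hiragana are hypothetical, and the sketch ignores details that a real IME must handle, such as doubled consonants, long vowels, and katakana.

```python
# A small, incomplete romaji table, for illustration only.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ho": "ほ", "go": "ご", "sha": "しゃ", "n": "ん",
}


def romaji_to_hiragana(text: str) -> str:
    """Convert romaji to hiragana, trying the longest syllable first."""
    result = []
    i = 0
    while i < len(text):
        # Syllables in the table are at most three letters long.
        for length in range(min(3, len(text) - i), 0, -1):
            kana = ROMAJI_TO_HIRAGANA.get(text[i:i + length])
            if kana is not None:
                result.append(kana)
                i += length
                break
        else:
            # No syllable matched: pass the character through unchanged.
            result.append(text[i])
            i += 1
    return "".join(result)
```

Calling romaji_to_hiragana("nihongo") yields にほんご, and romaji_to_hiragana("kisha") yields きしゃ. Trying the longest match first is what lets "sha" become the single syllable しゃ instead of stalling on the "s", which has no entry of its own.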

The best way to learn how an IME works from the user's perspective is to try using it. The next sections take a look at the Chinese, Japanese, and Korean IMEs that ship with Windows NT 3.5 and Windows 95.