Appendix A Glossary

–A APIs The Win32 API entry points that expect string parameters to be encoded in ANSI (the local Windows 3.1 code page).

Accelerator An Alt+character combination used to activate menus, menu items, and dialog items in Windows. The character that activates the menu or dialog item is underlined. It is also called a "hot key."

Accented character A character that has a diacritic attached to it. See Extended characters.

Accessibility The extent to which computers are easy to use and available to a wide range of users, including people with disabilities.

Alphabet A set of characters used to spell words in a particular language.

AltGr The Alt key on the right on some non–US Windows keyboard layouts. The AltGr key is equivalent to the Ctrl+Alt key combination and is used to create an alternative shift state for accessing additional characters on some keys.

Alt+Numpad A method of entering characters, usually accented characters, by typing in the character's decimal code with the Numpad keys (Num Lock turned on). In Windows, pressing Alt+<Num> generates an ASCII character. Pressing Alt+0<Num> generates an ANSI character.

ANSI (1) Acronym for the American National Standards Institute. (2) The Microsoft Windows ANSI character set, essentially ISO 8859/x plus additional characters, which was originally based on an ANSI draft standard.

ANSI C The standardized C programming language.

Application programming interface (API) A set of functions supported by the operating system.

Array input method An input method that builds characters using radicals. This method defines 10 basic keystrokes, numbered 0 through 9, that represent basic radicals. The columns of keys beneath each number—for example, on a US keyboard, the letters QAZ beneath 1 and the letters WSX beneath 2—are used to select specific characters.

ASCII Acronym for American Standard Code for Information Interchange, a 7-bit code that is the US national variant of ISO 646.

Banja Double-byte Latin letters.

Base character (1) A character that has meaning independent of other characters. (2) Any graphical character that is not a diacritic.

Beta testing Distributing prerelease software to users and potential customers in order to get feedback and bug reports.

Bidirectional (BiDi) text A mixture of characters that are read from left to right and from right to left. Most Arabic and Hebrew characters, for example, are read from right to left, but numbers and quoted Western terms within Arabic or Hebrew text are read from left to right.

Big-5 The multibyte encoding standardized by Taiwan.

Big font A single font file that contains glyphs representing characters from multiple charsets.

Binary file A file that has been encrypted, encoded, or compiled, as opposed to a plaintext file.

Bitmap font A font whose characters are represented by bitmaps or by a pattern of dots, as opposed to a TrueType font, whose characters are represented by lines and curves. Bitmap fonts are generally less scalable and more jagged than TrueType fonts.

Bopomofo A standard Chinese phonetic script developed in 1913.

Boundary The point of interaction between systems or applications that use different character encodings.

Byte order mark (BOM) The Unicode character U+FEFF (or its noncharacter mirror image, U+FFFE) used to indicate the byte order of a text stream. The presence of a BOM is a strong clue that a file is encoded in Unicode.
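
As an illustration of how a BOM reveals byte order, the sketch below inspects the first two bytes of a stream. It assumes a 16-bit Unicode stream only; the function name detect_byte_order is hypothetical, and a fuller check would also have to consider other Unicode encoding forms.

```python
import codecs

def detect_byte_order(data: bytes) -> str:
    """Classify a 16-bit Unicode byte stream by its leading BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):
        # Bytes FE FF: U+FEFF stored most significant byte first.
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        # Bytes FF FE: the same character stored least significant byte first.
        # Read with the wrong byte order, it would appear as U+FFFE.
        return "little-endian"
    return "unknown"

sample = "\ufeffHello".encode("utf-16-le")
print(detect_byte_order(sample))
```

A result of "unknown" does not prove the text is not Unicode; it only means no BOM was present.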

Cairo See Windows NT "Cairo."

Candidate window The window of an Input Method Editor that lists characters that the user can choose to replace the text highlighted in the composition window.

Case The capitalized (uppercase) or uncapitalized (lowercase) form of an alphabetic character, usually in Latin script.

Chang Jei An input method that uses radicals to build Chinese characters. See Radical. Twenty-five radicals are assigned to the letters A through Y. The letter X is used to generate more complex radicals.

Character (1) The smallest abstract element of a writing system or script. A character refers to an abstract idea rather than to a specific shape. (2) A code element. See Glyph.

Charset A set of characters used in Windows. Charsets refer to the same collections of characters as those defined by Windows code pages, but their ID numbers can be expressed in a single byte.

CJK Abbreviation for Chinese, Japanese, and Korean.

Clipboard A Windows utility used as a buffer for copying and pasting text.

Code page An ordered set of characters in which a numeric index (code-point value) is associated with each character. This term is generally used in the context of code pages defined by Windows 3.1 or MS-DOS, and may also be called a "character set" or "charset."

Code point, or code element (1) The minimum bit combination that can represent a unit of encoded text for processing or exchange. (2) An index into a code page.

Common dialogs Standard dialog boxes defined by Windows—such as Open, Save As, Print, and Find—that are used for operations in numerous applications. Applications can call common dialog API functions directly instead of having to supply a custom dialog template and dialog procedure.

Compatibility zone The area in Unicode from U+F900 through U+FFEF that is assigned to characters from other standards. These characters are variants of other Unicode characters.

Composed character, or composite character A text element consisting of a base character (usually Latin) and a diacritic or accent mark.

Console The Windows subsystem that runs character-based applications, as opposed to applications that have a graphical user interface (GUI).

Constant A numeric value, typically an integer, that refers to a character value, the size of a buffer, the position of a character in a string, and so forth. It is assumed that the value does not change during the time a program is running.

Contextual analysis A process for determining how to handle text based on surrounding characters, as in Arabic, in which a letter changes shape depending on its position in a word.

Control Panel A group of Windows utilities used to edit system settings, including international preferences.

Conversion window, or composition window The window of an Input Method Editor that displays text typed by the user, either as entered or as converted to ideographic form.

Country setting The set of preferences in the Windows NT 3.x Control Panel that determine the user default date, time, currency, and number formats.

Cross-platform Portable to more than one operating system.

CTYPE Character type. A flag sent to the API GetStringType that specifies a character property test for the string parameter.

Cultural convention Data or data formats that are specific to a language, local dialect, or geographic location. Examples are currency symbols, date formats, calendars, numerical separators, and sort orders.

Cyrillic script The script used to represent characters in Slavic languages.

Date picture string/time picture string A string used to represent a date or time format—for example, "MMMM dddd yyyy".
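
To show how a picture string drives formatting, here is a minimal sketch of an interpreter for the d, M, and y fields. The function format_date_picture and the English name tables are hypothetical; a real implementation would take names from the locale and would also handle quoted literal text, which this sketch does not.

```python
import re
from datetime import date

# English names only, for illustration.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]  # date.weekday(): Monday == 0

# Longest fields first so "dddd" is not read as four separate "d" fields.
_TOKEN = re.compile(r"dddd|ddd|dd|d|MMMM|MMM|MM|M|yyyy|yy|.", re.DOTALL)

def format_date_picture(value: date, picture: str) -> str:
    """Expand the date fields in a picture string; copy everything else."""
    fields = {
        "d": str(value.day),
        "dd": f"{value.day:02d}",
        "ddd": DAYS[value.weekday()][:3],
        "dddd": DAYS[value.weekday()],
        "M": str(value.month),
        "MM": f"{value.month:02d}",
        "MMM": MONTHS[value.month - 1][:3],
        "MMMM": MONTHS[value.month - 1],
        "yy": f"{value.year % 100:02d}",
        "yyyy": f"{value.year:04d}",
    }
    return "".join(fields.get(tok, tok) for tok in _TOKEN.findall(picture))

print(format_date_picture(date(1995, 8, 24), "MMMM dddd yyyy"))
print(format_date_picture(date(1995, 8, 24), "dd/MM/yy"))
```

Because the field order lives in the string rather than in code, a translator can reorder day, month, and year without a recompile.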

Da Yi input method An input method that builds characters using radicals. This method defines 40 basic radicals, arranged on a standard 101-key keyboard and corresponding to the stroke order in which characters are handwritten.

Dead key A key that produces a WM_CHAR value of VK_DEADKEY when pressed. By itself, the dead key does not generate a character. Pressing a dead key followed by another key is one way to generate accented characters. See Appendix R.

Decomposition The breakdown of an accented character or a precomposed character into an ordered set of components. For ã, the components are a followed by the combining character ~.
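
The ã example can be reproduced with Python's standard unicodedata module, which is used here only as a convenient way to inspect decompositions; it is not part of any Windows API.

```python
import unicodedata

precomposed = "\u00e3"  # ã stored as a single code point

# Canonical decomposition (NFD) splits it into base character + combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]
print(code_points)                      # base 'a', then the combining tilde
print(unicodedata.name(decomposed[1]))  # name of the combining character

# Composition (NFC) reverses the process.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == precomposed)
```

Note that the two forms compare as unequal strings until one is normalized, which matters when searching or sorting text.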

Determined string A string that has been converted from a phonetic representation into ideographs.

Device Driver Kit (DDK) A set of tools and libraries for creating Windows-based software to run hardware devices such as printers.

Diacritic (1) Any mark placed over, under, or through a Latin-based character, usually to indicate a change in phonetic value from the unmarked state. (2) A character that is attached to or overlays a preceding base character. Most diacritics are nonspacing characters that don't increase the width of the base character.

Diaeresis Two dots placed over a vowel to indicate that the vowel is pronounced as a separate syllable. Typically used when two vowels are adjacent but should be pronounced separately rather than as a diphthong, as in coöperation. See Umlaut.

Digraph A combination of characters that is written separately but forms a single lexical unit—for example, the Danish aa and the Spanish ch and ll.

Double-byte character set (DBCS) Any 2-byte form of character encoding. See Multibyte character set.

Dynamic link library (DLL) A module containing functions that other programs or DLLs can call. DLLs cannot run by themselves; other programs have to load them.

Enabling Altering program code to handle input, display, and editing of bidirectional languages, such as Arabic, and double-byte languages, such as Japanese.

Encoding A system of assigning numeric values to characters.

End-User Defined Character (EUDC) A special character, such as a rare ideograph, that the user creates with an editor and assigns to a code point within a reserved range.

Extended characters (1) Characters above the ASCII range (32 through 127) in Windows-based single-byte character sets. (2) Accented characters.

Floating accent See Diacritic and Floating diacritic.

Floating diacritic A nonspacing diacritic that overlays the preceding base character and might change position or shape according to the shape of the base character.

Following characters Characters (such as closing quotation marks, closing parentheses, and punctuation marks) that shouldn't be separated from preceding characters.

Font Any of numerous sets of graphical representations of characters that can be installed on a computer or a printer.

Font association The automatic pairing of a font that contains ideographs with a font that does not contain ideographs. This allows the user to enter ideographic characters regardless of which font is selected.

Front-end processor See Input Method Editor (IME).

Full-width character In a double-byte character set, a character that is represented by 2 bytes and typically has a half-width variant.

GB 2312-80 The multibyte encoding standardized by the People's Republic of China.

Generic data type A macro, such as TCHAR, that resolves to either an ANSI type or a wide-character (Unicode) type, depending on compile-time flags.

Generic prototype A macro representing an API call or a function call. The macro resolves to an entry point that expects either ANSI parameters or wide-character (Unicode) parameters, depending on compile-time flags.

Globalization See Internationalization.

Glyph The actual shape (bit pattern, outline, and so forth) of a character image. For example, an italic "a" and a roman "a" are two different glyphs representing the same underlying character.

Half-width character In a double-byte character set, a character that is represented by 1 byte and typically has a full-width variant.
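
The relationship between display width and byte count can be checked with a short sketch. It assumes Shift-JIS as the double-byte character set and uses Python's unicodedata width classes ("F" full-width, "H" half-width, "W" wide, "Na" narrow); the helper describe_width is hypothetical.

```python
import unicodedata

def describe_width(ch: str) -> tuple:
    """Return (East Asian width class, byte count in Shift-JIS) for one character."""
    return unicodedata.east_asian_width(ch), len(ch.encode("shift_jis"))

# Latin A, full-width Latin A, half-width katakana A, ordinary katakana A
for ch in ("A", "\uff21", "\uff71", "\u30a2"):
    width, size = describe_width(ch)
    print(f"U+{ord(ch):04X}  width class {width}  {size} byte(s)")
```

The full-width Latin A and the ordinary katakana A each occupy 2 bytes, while their half-width counterparts occupy 1.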

Han unification The process of assigning the same code point to characters historically perceived as being the same character but represented as unique in more than one East Asian ideographic character standard. This results in a group of ideographs shared by several cultures and significantly reduces the number of code points needed to encode them.

Hangul The native name for the Korean script.

Hanja The Korean name for ideographic characters of Chinese origin.

Hanzi (hantsu) The Chinese name for ideographic characters of Chinese origin.

Hard-coding (1) Putting string or character literals in the main body of code, such as .C files or .H files, instead of in Windows resource files. (2) Basing numeric constants on the assumed length of a string.

Hiragana The Japanese cursive script. Each hiragana character represents a phonetic syllable.

HKL See Input language handle (HKL) and Language/layout pair.

Ideographic character A character of Chinese origin representing a word or a syllable that is generally used in more than one Asian language. Sometimes referred to as a Chinese character.

Input context An internal structure that stores IME-related status information. Windows 95 supports multiple IME contexts, automatically creating an input context for each active thread.

Input language handle (HKL) A type of variable that Windows 95 uses to track language/layout pairs.

Input method Any method used to enter text that doesn't involve typing each character directly. Input methods are widely used for entering ideographs and other characters phonetically or component by component.

Input Method Editor (IME) A program that performs the conversion between keystrokes and ideographs or other characters, usually by user-guided dictionary lookup.

Input Method Manager (IMM) The module on Windows that handles communication between Input Method Editors (IMEs) and applications.

Input Method Profiler (IMP) The module on Windows NT 3.5 that keeps track of Input Method Editors (IMEs) installed on the system.

Internal Code input method An input method that allows the user to select a character by typing in its Big-5 code-point index.

Internationalization, or globalization The process of developing a program core whose feature and code designs don't make assumptions based on a single language or locale and whose source code base simplifies the creation of different language editions of a program.

ISO 8859 The International Standards Organization's 8-bit encoding that served as the basis for the Windows ANSI code page (also called code page 1252, Western European, or Latin 1).

ISO 10646 The International Standards Organization's encoding that is code-for-code equivalent to Unicode.

Isolate, initial, medial, and final character forms The different shapes of an Arabic character that correspond to its position in a word.

Jamos The 24 basic elements of the Korean script.

Johab The Korean standard character set (KS C-5601-1992), which corresponds to Windows code page 1361. This character set includes all possible hangul character combinations.

Kana The set of Japanese hiragana and katakana characters.

Kanji The Japanese name for ideographic characters of Chinese origin.

Katakana A Japanese script of phonetic syllables, chiefly used to spell words borrowed from other languages. Each katakana character represents a phonetic syllable.

Keyboard layout A standard arrangement of characters on a keyboard that defines which keys produce particular characters or scan codes.

KS C-5601-1987 The multibyte Wansung encoding standardized by Korea.

KS C-5601-1992 The multibyte Johab encoding standardized by Korea.

Language ID (LANGID) A 16-bit value defined by Windows, consisting of a primary language ID and a secondary language ID. Used as a parameter to several Win32 functions and messages.
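
The packing of the two IDs can be sketched as follows. The bit layout assumed here (primary language in the low 10 bits, secondary language in the high 6 bits) mirrors the Win32 MAKELANGID macro; the Python function names are hypothetical stand-ins for the C macros.

```python
def make_langid(primary: int, sub: int) -> int:
    """Pack a primary and a secondary language ID into a 16-bit LANGID."""
    return ((sub & 0x3F) << 10) | (primary & 0x3FF)

def primary_langid(langid: int) -> int:
    """Extract the primary language ID (low 10 bits)."""
    return langid & 0x3FF

def sub_langid(langid: int) -> int:
    """Extract the secondary language ID (high 6 bits)."""
    return (langid >> 10) & 0x3F

LANG_ENGLISH = 0x09
SUBLANG_ENGLISH_US = 0x01
SUBLANG_ENGLISH_UK = 0x02

us_english = make_langid(LANG_ENGLISH, SUBLANG_ENGLISH_US)
uk_english = make_langid(LANG_ENGLISH, SUBLANG_ENGLISH_UK)
print(hex(us_english), hex(uk_english))
```

Two LANGIDs that share a primary language ID identify variants of the same language.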

Language/layout pair (1) A language installed on the system and the keyboard layout associated with it. (2) The input language.

Latin script The set of 26 characters (A–Z) inherited from the Roman Empire that, together with later additions, is used to write languages throughout Africa, the Americas, parts of Asia, Europe, and Oceania. The Windows 3.1 Latin 1 character set covers Western European languages and languages that use the same alphabet, while the Latin 2 character set covers Central and Eastern European languages.

Layout The order and spacing of displayed text.

LCTYPE A constant defined by Windows that specifies a particular type of locale information. See Appendix I.

Lead byte The byte value that is the first half of a double-byte character. See Double-byte character set (DBCS).

Leading characters Characters (such as opening quotation marks, opening parentheses, and currency signs) that shouldn't be separated from succeeding characters.

Letter (1) The basic element of a script as understood by the end user. (2) A higher level of abstraction than character. For example, both the Spanish ch and the Danish aa can be considered as single letters for some purposes (both sort as a single character). See Text element.

Levels of localization The amount of translation and customization necessary to create different language editions. The levels, which are determined by balancing risk and return, range from translating nothing to shipping a completely translated product with customized features.

Ligature Two or more characters combined to represent a single typographical character. The modern Latin script uses only a few. Other scripts use many ligatures that depend on font and style. Some languages, such as Arabic, have mandatory ligatures; other languages have characters that were derived from ligatures, such as the German ligature of long and short "s" (ß) and the ampersand (&), which is the contracted form of the Latin word et.

Literal In program code, a string surrounded by double quotation marks or a character surrounded by single quotation marks.

Locale The features of the user's environment that are dependent on language, country, and cultural conventions. The locale determines conventions such as sort order; keyboard layout; and date, time, number, and currency formats. In Windows, locales usually provide more information about cultural conventions than about languages.

Locale ID (LCID) A 32-bit value defined by Windows that consists of a language ID, a sort ID, and reserved bits.
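
A minimal sketch of how the fields fit together, assuming the layout used by the Win32 MAKELCID macro: the language ID in the low 16 bits, the sort ID in the next 4 bits, and the remaining high bits reserved. The Python function names are hypothetical stand-ins for the C macros.

```python
SORT_DEFAULT = 0x0
SORT_GERMAN_PHONE_BOOK = 0x1

def make_lcid(langid: int, sort_id: int) -> int:
    """Combine a 16-bit language ID and a 4-bit sort ID into a 32-bit LCID."""
    return ((sort_id & 0xF) << 16) | (langid & 0xFFFF)

def langid_from_lcid(lcid: int) -> int:
    """Extract the language ID (low 16 bits)."""
    return lcid & 0xFFFF

def sortid_from_lcid(lcid: int) -> int:
    """Extract the sort ID (bits 16 through 19)."""
    return (lcid >> 16) & 0xF

german = 0x0407  # language ID for German (Germany)
dictionary_order = make_lcid(german, SORT_DEFAULT)
phone_book_order = make_lcid(german, SORT_GERMAN_PHONE_BOOK)
print(f"{dictionary_order:#010x} {phone_book_order:#010x}")
```

The same language can therefore appear in more than one locale when it has alternative sort orders.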

Locale-sensitive Exhibiting different behavior or returning different data, depending on the locale. For example, the Win32 sort functions return different results depending on the locale parameter sent to each function.

Localizable resource Any element of a program's user interface that requires translation or modification for different languages.

Localization The process of adapting a program for a specific international market, which includes translating the user interface, resizing dialog boxes, customizing features (if necessary), and testing results to ensure that the program still works.

Localization kit A subset of tools, source files, and binary files that can be used to create a localized edition of a program. Generally given to translators or third-party contractors.

Logical order The order in which something is typed. Generally refers to text that can be displayed in a different order, such as Arabic, Hebrew, or bidirectional text.

Logograph, or logographic From the Greek logo, meaning word: a letter, symbol, or sign used to represent an entire word. Chinese characters are more properly termed logographic than ideographic because they represent words or parts of words rather than abstract concepts.

Message table A Win32 resource that uses sequential numbers rather than escape letters to mark replacement parameters, making it convenient to store alert messages and error messages that contain several replacement parameters.
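
The advantage of numbered parameters shows up when a translation needs the inserts in a different order. The sketch below imitates the %1, %2 convention with a hypothetical expand_message helper; it handles only plain numbered inserts and is not the Win32 message-formatting implementation.

```python
import re

def expand_message(template: str, *inserts: str) -> str:
    """Replace %1, %2, ... with the corresponding insert (1-based)."""
    def replace(match):
        index = int(match.group(1))
        return inserts[index - 1]
    return re.sub(r"%(\d+)", replace, template)

original = "Cannot copy %1 to %2."
# Hypothetical translated template whose grammar puts the destination first.
reordered = "%2 did not accept the file %1."

print(expand_message(original, "REPORT.DOC", "drive A"))
print(expand_message(reordered, "REPORT.DOC", "drive A"))
```

With escape letters such as %s, the code would have to pass the arguments in a different order for each language; with numbered parameters, only the template changes.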

Mixed environment A computer environment, usually a network, in which the operating systems of different machines are based on different character encodings.

Morpheme The smallest meaningful unit of a word. The word dog is one morpheme. The word dogs is two morphemes: dog + the plural marker s. Many ideographs are based on morphemes.

Multibyte character set (MBCS) A mixed-width character set, in which some characters consist of more than 1 byte. A double-byte character set (DBCS), which is a specific type of multibyte character set, includes some characters that consist of 2 bytes.

Multilingual Supporting more than one language simultaneously. Often implies the ability to handle more than one script or character set.

Multilingual API The set of system functions in Windows 95 that supports multilingual content in documents.

National standard A linguistic rule, measurement, educational guideline, or technology-related convention as defined by a government or an industry standards organization. Examples include character sets, keyboard layouts, and some cultural conventions, such as punctuation. Windows incorporates many International Standards Organization (ISO) naming conventions.

Neutral character A character that can be considered as either right-to-left or left-to-right, depending on the direction of the surrounding context.
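
One way to see which characters are neutral is to look at the bidirectional class that the Unicode character database assigns to each character. The sketch below uses Python's unicodedata module for this; the grouping into strong, weak, and neutral follows the Unicode bidirectional classes, and the helper direction_class is hypothetical.

```python
import unicodedata

NEUTRAL_CLASSES = {"B", "S", "WS", "ON"}  # separators, whitespace, other neutrals

def direction_class(ch: str) -> str:
    """Group a character by its Unicode bidirectional class."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "strong left-to-right"
    if bidi in ("R", "AL"):
        return "strong right-to-left"
    if bidi in NEUTRAL_CLASSES:
        return "neutral"
    return "weak"  # digits, numeric separators, nonspacing marks, and so on

for ch in ("a", "\u05d0", "\u0628", "1", " ", "!"):
    print(f"U+{ord(ch):04X}: {direction_class(ch)}")
```

A space or an exclamation mark has no direction of its own, so a layout engine resolves it from the strong characters on either side.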

NLSAPI Abbreviation for National Language Support API. The set of system functions in 32-bit Windows that contain national language support (information that is based on language and cultural convention).

No-compile Refers to source code that doesn't require recompiling when you create international editions of a program.

Nonspacing character A character, such as a diacritic, that has no meaning by itself but overlaps an adjacent character to form a third character.

OLE A set of standard software services built on top of the OLE Component Object Model (COM) that allows software components from different vendors (possibly written in different programming languages) to be combined with one another to form complete applications.

Original Equipment Manufacturer (OEM) Often used to refer to MS-DOS standards, such as OEM code pages.

Overflow characters Punctuation characters that are allowed to extend beyond the right margin for horizontal text or below the bottom margin for vertical text.

Phoneme A unique individual sound used in a language.

Plaintext Computer-encoded text that contains only code elements and no other formatting or structural information (for example, font size, font type, or other layout information). Plaintext exchange is commonly used between computer systems that might have no other way to exchange information.

Points The vowel signs in written Hebrew, which are sets of dots and/or short lines written below consonants.

Precomposed character A single Unicode character that represents a sequence of characters, usually a combination of a base character and one or more diacritics.

Private-use zone The area in Unicode from U+E000 through U+F8FF that is set aside for vendor-specific or user-designed characters.

Radical A group of strokes in a Chinese character that are treated as a unit for the purposes of sorting, indexing, and classification. A character can contain more than one element that is recognized as a radical, but each character contains only one element, called the main radical, that is used as the indexing radical. The main radical often gives a hint as to the general meaning of the character, and other radicals in the character might indicate how the character is pronounced.

RCDATA resource A custom Windows resource element.

Registry A Windows file that stores user preferences, including international settings.

Release delta The time between the release of the domestic product and the release of the localized edition.

Rendering The way in which a character is graphically displayed.

Resource (1) An element, such as a string, icon, bitmap, cursor, dialog, accelerator, or menu, that is included in a Microsoft Windows resource (.RC) file. (2) Any item that needs to be translated.

Rich text Text saved with formatting instructions that multiple applications, including compatible Microsoft applications, can read and interpret.

Romaji A writing system based on the Latin alphabet that is used to represent Japanese text.

Round-trip conversion Mapping a character from one character encoding to another and back. Of particular interest is how well information is preserved during round-trip conversion.
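
A quick way to test how well information is preserved is to convert text to the target encoding and back, then compare. The sketch below uses Python's built-in codecs for code page 1252 and Shift-JIS as stand-ins for the Windows code pages; the helper survives_round_trip is hypothetical.

```python
def survives_round_trip(text: str, code_page: str) -> bool:
    """Return True if text is unchanged after Unicode -> code page -> Unicode."""
    # Characters missing from the code page are replaced with '?',
    # which is where the information loss occurs.
    encoded = text.encode(code_page, errors="replace")
    return encoded.decode(code_page) == text

print(survives_round_trip("caf\u00e9", "cp1252"))           # é exists in 1252
print(survives_round_trip("\u65e5\u672c", "cp1252"))        # kanji do not
print(survives_round_trip("\u65e5\u672c", "shift_jis"))     # kanji exist in Shift-JIS
```

A failed round trip usually means the intermediate encoding has no code point for one of the characters.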

Run-time library Functions included with a C-compiler that programs can call to perform various basic operations.

Screen dump A bitmap of an element in a program's graphical user interface, such as a dialog or menu.

Script A system of characters used to write one or several languages. Characters denote isolated sounds, syllables, or word elements and are governed by a general set of rules for creating text, such as default writing direction.

Separators Symbols used to separate items in a list, mark the thousands place in numbers, or represent the decimal point. Different locales follow different conventions for separators.

Shift-JIS The Japan Industry Standard multibyte encoding. The codes are numerically shifted from the codes used by the JIS standard X-0208; hence the name.

Shortcut key A keyboard combination that activates a program command directly, as an alternative to activating the command through the program menus.

Simplified Chinese The Chinese script used in the People's Republic of China. It consists of several thousand ideographic characters that are simplified versions of traditional Chinese characters.

Simultaneous ship, or "sim-ship" The release of localized editions of a product at the same time as or soon after the release of the domestic edition, usually within 30 days.

Single-byte character set (SBCS) A character encoding in which each character is represented by 1 byte. Single-byte character sets are mathematically limited to 256 characters.

Software Development Kit (SDK) A set of tools and libraries for creating software applications for Windows operating systems.

Sort key A numeric representation of a sort element based on locale-specific sort rules. A sort key consists of several weighted components that represent a character's script, diacritics, case, and so on.
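
The idea of weighted components can be illustrated with a toy key that ranks base letters first, diacritics second, and case last. This is a simplified sketch under those assumptions, not the algorithm Windows uses, and the helper sort_key is hypothetical.

```python
import unicodedata

def sort_key(text: str) -> tuple:
    """Build a three-level key: base letters, then diacritics, then case."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    marks = tuple(ord(ch) for ch in decomposed if unicodedata.combining(ch))
    primary = base.casefold()                        # letters, ignoring case
    secondary = marks                                # diacritics break ties next
    tertiary = tuple(ch.isupper() for ch in base)    # lowercase before uppercase
    return (primary, secondary, tertiary)

words = ["Resume", "r\u00e9sum\u00e9", "rest", "resume"]
by_code_point = sorted(words)
by_sort_key = sorted(words, key=sort_key)
print(by_code_point)
print(by_sort_key)
```

Comparing keys instead of raw code points keeps "Resume", "resume", and "résumé" next to each other rather than scattering them by case and accent.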

Spacing character A character with a non-zero width.

Specification, or "spec" A detailed plan of a program's user interface design and the expected functionality of program features.

Status window The window of an Input Method Editor (IME) in which the user can change the IME's conversion mode or input mode.

Syllabary A set of written characters in which each character represents a syllable—for example, a consonant sound followed by a vowel sound.

Tab order In Windows, the order in which the Tab key activates the list boxes, radio buttons, and other elements of a dialog box.

Text element A script's smallest unit of text that can be displayed or edited.

Traditional Chinese The set of Chinese characters, used in such countries/regions as Hong Kong SAR, China, Singapore, and Taiwan, that is consistent with the original form of Chinese ideographic characters that are several thousand years old.

Trail byte The byte value that is the second half of a double-byte character.
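
Lead bytes and trail bytes matter because a double-byte string cannot be walked one byte at a time. The sketch below splits a Shift-JIS byte string into characters, assuming the Shift-JIS lead-byte ranges 0x81 through 0x9F and 0xE0 through 0xFC; the function names are hypothetical.

```python
def is_sjis_lead_byte(value: int) -> bool:
    """Return True if the byte starts a double-byte Shift-JIS character."""
    return 0x81 <= value <= 0x9F or 0xE0 <= value <= 0xFC

def split_sjis_characters(data: bytes) -> list:
    """Split a Shift-JIS byte string into 1-byte and 2-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        # A lead byte claims the next byte as its trail byte.
        step = 2 if is_sjis_lead_byte(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + step])
        i += step
    return chars

data = "\u65e5\u672c\u8a9eABC".encode("shift_jis")  # three kanji, three letters
pieces = split_sjis_characters(data)
print(len(data), "bytes,", len(pieces), "characters")
```

A trail byte can have the same value as an ordinary single-byte character (the katakana ソ ends in 0x5C, the backslash), so scanning for a byte value without tracking lead bytes can split a character in half.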

Umlaut The two dots placed above a vowel, such as ä, ö, and ü, which are used in German and other European languages to indicate a change in the pronunciation of the vowel. See Diaeresis.

Unicode A fixed-width, 16-bit worldwide character encoding that was developed and is maintained and promoted by the Unicode Consortium, a nonprofit computer industry organization.

Usability testing A series of tests in which users are observed trying to complete a given set of tasks. The purpose of usability testing is to determine how intuitive and easy-to-use test subjects find new program features.

User-defined character See End-User Defined Character (EUDC).

Version stamp In Windows, the information included in the resource file that specifies the company name, application name, copyright, version number, and language edition of a program.

Visual C++ Microsoft's object-oriented C-compiler.

Wansung The Korean standard character set (KS C-5601-1987), which corresponds to Windows code page 949. It covers the most common hangul character combinations. Extended Wansung covers all possible hangul combinations.

–W APIs The Win32 API entry points that expect string parameters to be wide characters (encoded in Unicode).

wchar_t The ANSI C–defined wide-character type, usually either 16 or 32 bits. ANSI rules say that wchar_t should be at least as wide as the char data type, and that the wide-character equivalents of the C language source character set should be created by simple zero or sign extension.

Wide character A 16-bit or 32-bit character. Often used to refer to Unicode-encoded characters.

Win32 API The set of 32-bit functions supported by Windows.

Win32s API A subset of the Win32 API that makes it possible to create a single binary that runs on Windows 3.1 and on all 32-bit versions of the Windows platform.

Windows 95 The 32-bit successor to Windows 3.1, Microsoft's low-end operating system.

Windows Intelligent Font Environment (WIFE) An operating system layer, introduced with the Far East editions of Windows 3, that manages multiple font technologies and font drivers that can be installed.

Windows NT "Cairo" The code name for the next version of Microsoft's high-end operating system, Windows NT.

Windows NT Server The high-end version of Windows NT that includes additional features for servers.