Appendix A Phases of Translation

A C++ program consists of one or more “source files,” each of which contains some of the text of the program. A source file, together with its “include files” (files that are included using the #include preprocessor directive) but excluding any sections of code removed by conditional-compilation directives such as #if, is called a “translation unit.”
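
As a minimal sketch of this definition, the source file below pulls in one include file and uses conditional compilation to keep only one of two function definitions. The macro USE_FAST_PATH and the functions selected_path and check_selected_path are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>   // the text of this include file joins the translation unit

// Hypothetical configuration macro; set to 1 to select the other branch.
#define USE_FAST_PATH 0

#if USE_FAST_PATH
// Removed by conditional compilation when USE_FAST_PATH is 0, so this
// definition is not part of the translation unit.
std::string selected_path() { return "fast"; }
#else
// Kept by conditional compilation, so this definition is part of the
// translation unit.
std::string selected_path() { return "portable"; }
#endif

// Hypothetical self-check: only the surviving definition is ever compiled.
void check_selected_path() {
    assert(selected_path() == "portable");
}
```

Because the discarded branch never reaches the later phases, the two definitions of selected_path do not conflict.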

Source files can be translated at different times; in fact, it is common to translate only out-of-date files. Translated translation units can be stored either in separate object files or in object-code libraries. These separate translation units are then linked to form an executable program (for example, a .EXE or .COM file).

Translation units can communicate using:

Calls to functions that have external linkage.

Calls to class member functions that have external linkage.

Direct modification of objects that have external linkage.

Direct modification of files.

Interprocess communication (for Microsoft Windows applications only).
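
The first three mechanisms in the list above can be sketched as follows. To keep the example self-contained, everything is shown in one file; the comments mark what would normally be split across a header and two source files. The file names counter.h, counter.cpp, and client.cpp, and every identifier in the code, are hypothetical.

```cpp
#include <cassert>

// ---- Would normally live in a shared header (hypothetical counter.h) ----
extern int g_call_count;   // declares an object with external linkage
void record_call();        // declares a function with external linkage

struct Tally {
    int total = 0;
    void add(int n);       // member function with external linkage
};

// ---- Would normally live in one translation unit (hypothetical counter.cpp) ----
int g_call_count = 0;      // the single definition of the object
void record_call() { ++g_call_count; }
void Tally::add(int n) {
    assert(n >= 0);        // hypothetical precondition, for illustration only
    total += n;
}

// ---- Would normally live in another translation unit (hypothetical client.cpp) ----
int use_counter() {
    record_call();         // call to a function with external linkage
    g_call_count += 10;    // direct modification of an object with external linkage
    return g_call_count;
}
```

In a real program, client.cpp would see only the declarations from the header, and the linker would resolve its references to the definitions in counter.cpp.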

An implementation is not strictly required to perform the following translation phases literally, but every implementation of C++, including Microsoft C++, must behave “as if” the phases were carried out as described. (The order in which an implementation actually does the work is not important.)

1. Character mapping. Characters in the source file are mapped to the internal source representation. Trigraph sequences are converted to their single-character internal representations in this phase.

2. Line splicing. Each line that ends in a backslash (\) immediately followed by a newline character is joined with the next line in the source file, forming logical lines from the physical lines. Unless it is empty, a source file must end in a newline character that is not preceded by a backslash.

3. Tokenization. The source file is broken into preprocessing tokens and white-space characters. Comments in the source file are replaced with one space character each. Newline characters are retained.

4. Preprocessing. Preprocessing directives are executed and macros are expanded into the source file. An #include directive causes translation phases 1 through 4 to be applied to the included text.

5. Character-set mapping. All source-character-set members and escape sequences are converted to their equivalents in the execution-character set. For Microsoft C++, both the source and the execution character sets are ASCII.

6. String concatenation. All adjacent string and wide-string literals are concatenated. For example, "String " "concatenation" becomes "String concatenation".

7. Translation. All tokens are analyzed syntactically and semantically and are then converted into object code.

8. Linkage. All external references are resolved to create an executable program.
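
The observable effects of phases 2, 3, 5, and 6 can be sketched in a short source file. This is an illustration only: the names SUM3, spliced_total, letter_a, joined, and check_phases are hypothetical, and the phase 5 comparison assumes an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <cstring>

// Phase 2 (line splicing): each backslash at the end of a physical line is
// removed together with the newline, so this macro is one logical line.
#define SUM3(a, b, c) \
    ((a) +            \
     (b) + (c))

// Phase 3 (tokenization): the comment is replaced by one space, so it
// separates the tokens "int" and "spliced_total" like ordinary white space.
int/* replaced by a space */spliced_total = SUM3(1, 2, 3);

// Phase 5 (character-set mapping): the escape sequence is converted to a
// member of the execution character set. Comparing it with 'A' below
// ASSUMES an ASCII-compatible execution character set.
const char letter_a = '\x41';

// Phase 6 (string concatenation): adjacent literals become one literal.
const char* const joined = "String " "concatenation";

// Hypothetical self-check that exercises the results of the phases above.
void check_phases() {
    assert(spliced_total == 6);
    assert(letter_a == 'A');
    assert(std::strcmp(joined, "String concatenation") == 0);
}
```

If comments were simply deleted rather than replaced by a space, the declaration of spliced_total would collapse into a single identifier and fail to compile.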

The compiler issues warnings or errors during whichever phase of translation it encounters a syntax error.