7.2 Phases of Translation

A C program consists of one or more source files, each of which contains some of the text of the program. A source file, together with all of its “include files” (files inserted at the location of an #include preprocessor directive), is called a “translation unit.”

Source files are translated in a series of phases. Preprocessing treats a source file as a sequence of text lines. You can specify directives and macros to insert, delete, and alter source text. Once translated, the translation units can be kept either in separate object files or in object-code libraries. These separate translation units are then linked to form an executable program (.EXE or .COM file).

Functions in different translation units can pass values through:

Calls to functions that have external linkage.

Direct modification of identifiers that have external linkage.

Direct modification of files.

Interprocess communication (Windows only).

Modification of environment variables.
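The first two mechanisms in the list can be sketched in a few lines. This is a minimal illustration only: it is shown as a single listing so that it compiles on its own, and the names call_count, next_id, and report, as well as the file names in the comments, are hypothetical. In a real program the two halves would sit in separate source files that share the declarations through a header.

```c
#include <stdio.h>

/* Declarations that a shared header would normally provide.
   Both identifiers have external linkage. */
extern int call_count;          /* defined exactly once, below */
int next_id(void);

/* --- would live in one translation unit (hypothetical counter.c) --- */
int call_count = 0;

int next_id(void)
{
    call_count++;               /* direct modification of an external identifier */
    return 100 + call_count;
}

/* --- would live in another translation unit (hypothetical report.c) --- */
int report(void)
{
    int id = next_id();         /* value passed through a function call */
    printf("id=%d calls=%d\n", id, call_count);
    return id;
}
```

Calling report() from main shows both channels at once: the return value travels through the function call, and call_count is visible to every translation unit that declares it.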

The following list describes the phases in which the compiler translates files:

Character mapping

Characters in the source file are mapped to the internal source representation. Trigraph sequences are converted to their single-character internal representation in this phase. See the topic on trigraphs for more information.

Line splicing

Every line that ends in a backslash (\) immediately followed by a newline character is joined with the next line in the source file, forming logical lines from the physical lines. A non-empty source file must end in a newline character that is not preceded by a backslash.
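As a small sketch of line splicing, the fragment below uses it in the two places it most often appears: a macro definition that spans several physical lines, and a string literal continued on the next line. The names SUM3, spliced, sum3_demo, and spliced_length are made up for this example.

```c
#include <string.h>

/* A macro definition must fit on one logical line; the backslashes
   splice three physical lines into one before tokenization. */
#define SUM3(a, b, c) \
    ((a) +            \
     (b) + (c))

/* Splicing also happens inside a string literal. The continuation
   starts in column one because any leading spaces on that line
   would become part of the string. */
const char spliced[] = "one logical \
line";

int sum3_demo(void)
{
    return SUM3(1, 2, 3);
}

size_t spliced_length(void)
{
    return strlen(spliced);
}
```

Because splicing happens before tokenization, the compiler sees the macro as a single line and the literal as "one logical line".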

Tokenization

The source file is broken into preprocessing tokens and white-space characters. Each comment in the source file is replaced with a space character. Newline characters are retained.
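The effect of replacing each comment with a space can be seen in a short sketch. The identifiers answer, not_a_comment, get_answer, and marker_length are hypothetical.

```c
#include <string.h>

/* The comment between "int" and "answer" becomes one space, so the
   two tokens stay separate and this reads as: int answer = 42; */
int/* replaced by a space */answer = 42;

/* Comment delimiters inside a string literal are ordinary characters;
   the literal is a single preprocessing token. */
const char not_a_comment[] = "/* not a comment */";

int get_answer(void)
{
    return answer /* a comment may sit between any two tokens */ + 0;
}

size_t marker_length(void)
{
    return strlen(not_a_comment);
}
```

If comments were simply deleted rather than replaced with a space, the first declaration would collapse into the single identifier intanswer and fail to compile.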

Preprocessing

Preprocessing directives are executed and macros are expanded into the source file. An #include directive causes the included text to be processed recursively, beginning with the first three translation phases above.
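A minimal sketch of what this phase does with macros and conditional directives follows. BUFFER_SIZE, SQUARE, MODE, and the two functions are illustrative names only.

```c
#define BUFFER_SIZE 64                 /* object-like macro */
#define SQUARE(x)   ((x) * (x))        /* function-like macro */

#if BUFFER_SIZE > 32                   /* evaluated by the preprocessor */
#define MODE "large"
#else
#define MODE "small"
#endif

int squared_size(void)
{
    /* After expansion, the later phases see:
       return ((64 + 1) * (64 + 1)); */
    return SQUARE(BUFFER_SIZE + 1);
}

const char *buffer_mode(void)
{
    /* Only the branch selected by #if survives preprocessing. */
    return MODE;
}
```

The parentheses in SQUARE matter because expansion is textual: without them, the argument BUFFER_SIZE + 1 would be substituted into an expression with different operator grouping.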

Character set mapping

All source-character-set members and escape sequences are converted to their equivalents in the execution character set. For Microsoft C/C++, both the source and the execution character sets are ASCII.
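The conversion of escape sequences can be checked with a few comparisons. The sketch below assumes an ASCII execution character set, as stated above. On an implementation with a different execution character set the numeric values would differ. The function name escapes_match_ascii is hypothetical.

```c
/* Returns 1 if the escape sequences below map to the ASCII codes
   given in the comments, 0 otherwise. */
int escapes_match_ascii(void)
{
    return '\n'   == 10     /* line feed */
        && '\t'   == 9      /* horizontal tab */
        && '\x41' == 'A'    /* hexadecimal escape for code 65 */
        && '\101' == 'A'    /* octal escape for code 65 */
        && '\\'   == 92;    /* the backslash character itself */
}
```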

String concatenation

All adjacent string literals and wide-string literals are concatenated. For example, "String " "concatenation" becomes "String concatenation".
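The example in the text can be verified directly, as in this brief sketch. The names joined, long_message, and joined_matches are invented for the illustration.

```c
#include <string.h>

/* Two adjacent literals become a single literal in this phase. */
const char joined[] = "String " "concatenation";

/* A common use: breaking a long message across source lines
   without relying on line splicing. */
const char *long_message =
    "Adjacent literals let a long message "
    "span several source lines.";

int joined_matches(void)
{
    /* Same contents and same size (including the terminating null)
       as the literal written in one piece. */
    return strcmp(joined, "String concatenation") == 0
        && sizeof joined == sizeof "String concatenation";
}
```

Only one terminating null character is appended, after the combined literal. The individual pieces do not keep their own terminators.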

Translation

All tokens are analyzed syntactically and semantically; these tokens are converted into object code.

The linker resolves all external references and creates an executable program by combining one or more separately processed translation units along with standard libraries.