1.1 Tokens

A token is the smallest element of a C++ program that is meaningful to the compiler. The C++ parser recognizes these kinds of tokens: identifiers, keywords, literals (called constants in the grammar below), operators, punctuators, and other separators. A stream of these tokens makes up a translation unit.

Tokens are most commonly separated by “white space.” White space can be one or more of the following:

Blanks

Horizontal or vertical tabs

New lines

Formfeeds

Comments
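As a rough illustration of the list above, the following sketch writes the same statements twice: once with ordinary blanks between tokens, and once with new lines and comments doing the separating. The function names sum_plain, sum_scattered, and check_white_space_forms are hypothetical and chosen only for this example. Both functions should present the same token stream to the compiler.

```cpp
#include <cassert>

// Conventional layout: single blanks between tokens.
int sum_plain() {
    int a = 1;
    int b = 2;
    return a + b;
}

// Same tokens, separated by comments and new lines instead of blanks.
int sum_scattered() {
    int/* a comment separates tokens just as a blank does */a = 1;
    int
        b
        =
        2;
    return a/**/+/**/b;
}

// Hypothetical helper: both layouts should compute the same value.
void check_white_space_forms() {
    assert(sum_plain() == sum_scattered());
}
```

Calling check_white_space_forms() should complete without tripping the assertion, since only the white space between the tokens differs.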

Syntax

token:
keyword
identifier
constant
operator
punctuator

preprocessing-token:
header-name
identifier
pp-number
character-constant
string-literal
operator
punctuator
each non-white-space character that cannot be one of the above

The parser separates tokens out of the input stream by creating the longest token possible using the input characters. Consider the following code fragment:

a = i+++j;

The intention of the programmer who wrote the code might have been one of the following:

Preincrement j, add the values of i and j, and assign the sum to a (where the tokens are i, +, and ++j). For more information about prefix incrementing, see “Increment and Decrement Operators” in Chapter 4.

This interpretation is equivalent to the expression a = i + (++j).

Add the values of i and j, assign the sum to a, then postincrement i (where the tokens are i, ++, +, and j). For more information about postincrementing, see “Postfix Increment and Decrement Operators” in Chapter 4.

This interpretation is equivalent to the expression a = (i++) + j.

Because the parser creates the longest token possible from the input stream, it chooses the second interpretation, making the tokens i, ++, +, and j.
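The longest-token rule can be sketched in code. The example below is only an illustration under stated assumptions: toy_tokens is a hypothetical scanner that knows nothing beyond identifiers and the characters used in the fragment, and it is not how any real compiler is implemented. The three small functions (also hypothetical names) evaluate the unparenthesized fragment and both parenthesized interpretations so that their results can be compared.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Toy longest-match scanner (hypothetical; not a real compiler's lexer).
// It recognizes identifiers, "++", and any other single character.
std::vector<std::string> toy_tokens(const std::string& src) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[pos]);
        if (std::isspace(c)) { ++pos; continue; }   // white space only separates tokens
        std::size_t len = 1;
        if (std::isalpha(c) || c == '_') {
            // Extend the identifier as far as the input characters allow.
            while (pos + len < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[pos + len])) ||
                    src[pos + len] == '_')) {
                ++len;
            }
        } else if (src.compare(pos, 2, "++") == 0) {
            len = 2;                                // prefer the longer "++" over "+"
        }
        out.push_back(src.substr(pos, len));
        pos += len;
    }
    return out;
}

// Values of a, i, and j after one of the expressions has been evaluated.
struct Result { int a; int i; int j; };

Result unparenthesized(int i, int j) {
    int a = i+++j;        // scanned as i ++ + j
    return {a, i, j};
}

Result postfix_on_i(int i, int j) {
    int a = (i++) + j;    // second interpretation in the text
    return {a, i, j};
}

Result prefix_on_j(int i, int j) {
    int a = i + (++j);    // first interpretation in the text
    return {a, i, j};
}
```

Starting from i = 1 and j = 2, unparenthesized and postfix_on_i both leave a = 3, i = 2, j = 2, whereas prefix_on_j leaves a = 4, i = 1, j = 3. Writing the parentheses explicitly, as in the last two functions, avoids relying on the reader knowing the rule.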