A token is the smallest element of a C++ program that is meaningful to the compiler. The C++ parser recognizes these kinds of tokens: identifiers, keywords, literals, operators, and other separators. A stream of these tokens makes up a translation unit.
Tokens are usually separated by “white space.” White space can be one or more of the following:
Blanks
Horizontal or vertical tabs
New lines
Formfeeds
Comments
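As a minimal sketch of how the separators listed above behave, the following hypothetical functions (the names spaced, commented, multiline, and check_equivalent are illustrative only) declare the same variable using blanks, a comment, and new lines between tokens. All three should produce the same token stream.

```cpp
#include <cassert>

// Blanks between tokens.
int spaced() {
    int value = 2 + 3;
    return value;
}

// A comment is treated as white space, so it separates "int" from "value".
// Without it, "intvalue" would be read as a single identifier.
int commented() {
    int/* a comment acts like a blank */value=2+3;
    return value;
}

// New lines separate tokens just as blanks do.
int multiline() {
    int
    value
    =
    2
    +
    3
    ;
    return value;
}

// Small self-check: every layout yields the same result.
void check_equivalent() {
    assert(spaced() == commented());
    assert(commented() == multiline());
}
```

White space is needed only where two adjacent tokens would otherwise run together; between value, =, 2, +, and 3 it is optional.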
token:
    keyword
    identifier
    constant
    operator
    punctuator

preprocessing-token:
    header-name
    identifier
    pp-number
    character-constant
    string-literal
    operator
    punctuator
    each non-white-space character that cannot be one of the above
The parser separates tokens from the input stream by reading the input characters from left to right and creating the longest token possible at each step. Consider the following code fragment:
a = i+++j;
The intention of the programmer who wrote the code might have been one of the following:
Preincrement j, add the values of i and j, and assign the sum to a (where the tokens are i, +, ++, and j). For more information about prefix incrementing, see “Increment and Decrement Operators” in Chapter 4.
This interpretation is equivalent to the expression a = i + (++j).
Add the values of i and j, assign the sum to a, then postincrement i (where the tokens are i, ++, +, and j). For more information about postincrementing, see “Postfix Increment and Decrement Operators” in Chapter 4.
This interpretation is equivalent to the expression a = (i++) + j.
Because the parser creates the longest token possible from the input stream, it chooses the second interpretation, producing the tokens i, ++, +, and j.
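The longest-token rule can be illustrated with a small sketch. The tokenize function below is a hypothetical toy scanner, not a real C++ lexer: it is assumed to know only identifiers, white space, the operator ++, and single-character tokens. The sum_then_increment helper (also an illustrative name) lets the actual compiler apply the same rule to i+++j.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy scanner (assumption: only identifiers, "++", and one-character
// tokens exist). At each position it takes the longest token it can.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[pos]);
        if (std::isspace(c)) {
            ++pos;  // white space only separates tokens
        } else if (std::isalpha(c) || c == '_') {
            std::size_t end = pos;  // extend the identifier as far as possible
            while (end < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[end])) ||
                    src[end] == '_')) {
                ++end;
            }
            tokens.push_back(src.substr(pos, end - pos));
            pos = end;
        } else if (src.compare(pos, 2, "++") == 0) {
            tokens.push_back("++");  // prefer the two-character operator
            pos += 2;
        } else {
            tokens.push_back(std::string(1, src[pos]));  // one-character token
            ++pos;
        }
    }
    return tokens;
}

// The compiler reads i+++j as (i++) + j: j is added to the old value
// of i, and i is incremented afterward.
int sum_then_increment(int& i, int j) {
    return i+++j;
}

// Small self-check of the second interpretation.
void check_example() {
    int i = 1;
    int a = sum_then_increment(i, 10);
    assert(a == 11);
    assert(i == 2);
}
```

Under these assumptions, tokenize("a = i+++j;") yields a, =, i, ++, +, j, and ; in that order. The same rule explains why i+++++j does not compile: it is scanned as i ++ ++ + j rather than as i ++ + ++ j.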