A token is the smallest element of a C++ program that is meaningful to the compiler. The C++ parser recognizes these kinds of tokens: identifiers, keywords, literals, operators, punctuators, and other separators. A stream of these tokens makes up a translation unit.
Tokens are usually separated by white space, which can be one or more blanks, horizontal or vertical tabs, new lines, form feeds, or comments.
Syntax
token:
    keyword
    identifier
    constant
    operator
    punctuator
preprocessing-token:
    header-name
    identifier
    pp-number
    character-constant
    string-literal
    operator
    punctuator
    each non-white-space character that cannot be one of the above
The parser separates tokens from the input stream by creating the longest token possible using the input characters in a left-to-right scan. Consider this code fragment:
a = i+++j;
The programmer who wrote the code might have intended either of these two statements:
a = i + (++j);
a = (i++) + j;
Because the parser creates the longest token possible from the input stream, it chooses the second interpretation, making the tokens i++, +, and j.
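The difference between the two interpretations can be observed at run time. The following is a minimal sketch, not part of the original reference: the names Values, as_written, pre_increment_j, and check_longest_token_rule are hypothetical, and the starting values i = 1 and j = 10 are chosen only for illustration.

```cpp
#include <cassert>

// Values of a, i, and j after one of the statements has been evaluated.
struct Values { int a, i, j; };

// The fragment from the text, unchanged. The scanner forms the tokens
// i, ++, +, j, so the expression is parsed as (i++) + j.
Values as_written(int i, int j) {
    int a = i+++j;
    return {a, i, j};
}

// The other reading has to be spelled out with parentheses or white space.
Values pre_increment_j(int i, int j) {
    int a = i + (++j);
    return {a, i, j};
}

// Hypothetical check: starting from i = 1 and j = 10, the two readings
// produce different results and modify different variables.
void check_longest_token_rule() {
    Values v = as_written(1, 10);
    assert(v.a == 11 && v.i == 2 && v.j == 10);  // i incremented, j untouched

    Values w = pre_increment_j(1, 10);
    assert(w.a == 12 && w.i == 1 && w.j == 11);  // j incremented, i untouched
}
```

Calling check_longest_token_rule() from main should complete without a failed assertion. The same rule explains why an expression such as i+++++j is rejected: the scanner forms i, ++, ++, +, j, and the second ++ cannot be applied to the result of i++, even though i++ + ++j would be valid.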