C++ Tokens

A token is the smallest element of a C++ program that is meaningful to the compiler. The C++ parser recognizes these kinds of tokens: identifiers, keywords, literals, operators, punctuators, and other separators. A stream of these tokens makes up a translation unit.

Tokens are usually separated by “white space.” White space can be one or more:

Syntax

token :

keyword
identifier
constant
operator
punctuator

preprocessing-token :

header-name
identifier
pp-number
character-constant
string-literal
operator
punctuator
each nonwhite-space character that cannot be one of the above

The parser separates tokens from the input stream by creating the longest token possible using the input characters in a left-to-right scan. Consider this code fragment:

a = i+++j;

The programmer who wrote the code might have intended either of these two statements:

a = i + (++j)

a = (i++) + j

Because the parser creates the longest token possible from the input stream, it chooses the second interpretation, making the tokens i++, +, and j.