1.1 Tokens

In a C source program, the basic element recognized by the compiler is the “token.” A token is source-program text that the compiler does not break down into component elements.

Syntax

token :
keyword
identifier
constant
string-literal
operator
punctuator

Note:

See the introduction to Appendix B for an explanation of the ANSI grammar conventions.

The keywords, identifiers, constants, string literals, and operators described in this chapter are examples of tokens. Punctuation characters such as brackets ([ ]), braces ({ }), parentheses (( )), and commas (,) are also tokens.

White-Space Characters

Space, tab, linefeed, carriage-return, formfeed, vertical-tab, and newline characters are called “white-space characters” because they serve the same purpose as the spaces between words and lines on a printed page—they make reading easier. Tokens are delimited (bounded) by white-space characters and by other tokens, such as operators and punctuation. When parsing code, the C compiler ignores white-space characters unless you use them as separators or as components of character constants or string literals. Use white-space characters to make a program more readable. Note that the compiler also treats comments as white space.
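The following is a minimal sketch of the white-space rule described above. The function names compact, spaced, and spaces_kept are hypothetical and exist only for this illustration. The first two functions differ only in white space between tokens and behave identically; the third shows that white space inside a string literal is kept as part of the token.

```c
#include <string.h>

/* White space is needed only where two tokens would otherwise run
   together (for example, between "int" and "a"). */
int compact(int a,int b){return a*b+1;}

/* The same function, with white space added for readability. */
int spaced(int a, int b)
{
    return a * b
         + 1;
}

/* Inside a string literal, white space is significant: the two
   spaces between a and b count toward the length. */
size_t spaces_kept(void)
{
    return strlen("a  b");
}
```

Under these assumptions, compact(3, 4) and spaced(3, 4) both return 13, and spaces_kept() returns 4.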

Comments

A “comment” is a sequence of characters beginning with a forward slash/asterisk combination (/*) that is treated as a single white-space character by the compiler and is otherwise ignored. A comment can include any combination of characters from the representable character set, including newline characters, but excluding the “end comment” delimiter (*/). Comments can occupy more than one line but cannot be nested.

Comments can appear anywhere a white-space character is allowed. Since the compiler treats a comment as a single white-space character, you cannot include comments within tokens. The compiler ignores the characters in the comment.
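As an illustrative sketch (the function names sum_with_comment and literal_length are assumptions made for this example), the code below places comments between tokens, where each one acts as a single white-space character. It also shows that the characters /* inside a string literal do not start a comment, because a string literal is itself a token.

```c
#include <string.h>

/* Each comment below sits between two tokens and is treated
   as one white-space character. */
int sum_with_comment(int a, int b)
{
    return a /* separates a from the operator */ + /* and so on */ b;
}

/* The slash and asterisk characters inside the string literal are
   ordinary text, so the literal is 13 characters long. */
size_t literal_length(void)
{
    return strlen("/* not one */");
}
```

With these definitions, sum_with_comment(2, 3) returns 5 and literal_length() returns 13.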

Use comments to document your code. This example is a comment accepted by the compiler:

/* Comments can contain keywords such as
   for and while without generating errors. */

Comments can also appear on the same line as a code statement:

printf( "Hello\n" ); /* Comments can go here */

You may choose to precede functions with a descriptive comment block:

/* MATHERR.C illustrates writing an error routine
 * for math functions.
 * The error function must be:
 *      matherr
 *
 * To use matherr, you must turn on the No Extended Dictionary in
 * Library flag within the PWB environment (LINK Options from the
 * Options menu) or use the /NOE linker option outside the environment.
 * For example:
 *      CL matherr.c /link /NOE
 */

Since comments cannot contain nested comments, this example causes an error:

/* Comment out this routine for testing
   /* Open file */
   fh = _open( "myfile.c", _O_RDONLY );
   .
   .
   .
*/

The error occurs because the compiler recognizes the first */, after the words Open file, as the end of the comment. It tries to process the remaining text and produces an error when it finds the */ outside a comment.

While you can use comments to render certain lines of code inactive for test purposes, the preprocessor directives #if and #endif and conditional compilation are a useful alternative for this task. For more information, see “Conditional Compilation”.
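A minimal sketch of this alternative follows. The function scale and the statement it disables are hypothetical, chosen only for illustration. Because the preprocessor skips everything between #if 0 and #endif, the disabled region can safely contain comments of its own, which avoids the nesting problem shown earlier.

```c
/* Hypothetical helper used only for this illustration. */
int scale(int value)
{
    int result = value;

#if 0
    /* Disabled for testing. This comment causes no error,
       because the whole group is skipped by the preprocessor. */
    result = result * 100;   /* original scaling */
#endif

    return result;
}
```

With the block disabled, scale(7) returns 7. Changing #if 0 to #if 1 re-enables the statement without touching the comments.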

Microsoft Specific

The Microsoft compiler also supports single-line comments preceded by two forward slashes (//). If you compile with /Za (ANSI standard), these comments generate errors. These comments cannot extend to a second line.

// This is a valid comment in C 7.0

Comments beginning with two forward slashes (//) are terminated by the next newline character that is not preceded by an escape character. In the next example, the newline character is preceded by a backslash (\), creating an “escape sequence.” This escape sequence causes the compiler to treat the next line as part of the previous line. (For information on escape sequences, see topic .)

// my comment \
i++;

Therefore, the i++; statement is commented out.
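The same effect can be observed in a small function. This is only a sketch: the function name spliced_comment_demo is an assumption made for this example, and the code relies on the compiler accepting // comments (they are a Microsoft extension here and are also part of standard C from C99 onward). Some compilers warn about a comment that continues onto the next line.

```c
/* Returns the value of i after a // comment that ends in a backslash. */
int spliced_comment_demo(void)
{
    int i = 0;
    // the backslash at the end of this line continues the comment \
    i++;
    return i;   /* the increment was part of the comment, so i is 0 */
}
```

Under these assumptions, spliced_comment_demo() returns 0, because the increment is never compiled.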

The default for Microsoft C is that the Microsoft extensions are enabled. Use the /Za command-line option to disable these extensions.

Evaluation of Tokens

When the compiler interprets tokens, it includes as many characters as possible in a single token before moving on to the next token. Because of this behavior, the compiler may not interpret tokens as you intended if they are not properly separated by white space. Consider the following expression:

i+++j

In this example, the compiler first makes the longest possible operator ( ++ ) from the three plus signs, then processes the remaining plus sign as an addition operator ( + ). Thus, the expression is interpreted as (i++) + (j), not (i) + (++j). In this and similar cases, use white space and parentheses to avoid ambiguity and ensure proper expression evaluation.
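The interpretation can be checked with a short sketch. The function munch_demo and its output parameters are hypothetical, introduced only to make the result observable.

```c
/* Evaluates i+++j with i = 1 and j = 2, and reports the final
   values of i and j through the pointer arguments. */
int munch_demo(int *i_after, int *j_after)
{
    int i = 1;
    int j = 2;
    int sum = i+++j;    /* tokenized as i ++ + j, that is, (i++) + (j) */

    *i_after = i;
    *j_after = j;
    return sum;
}
```

Under these assumptions, munch_demo returns 3 and leaves i equal to 2 and j equal to 2. Had the expression been read as (i) + (++j), the result would be 4, with i equal to 1 and j equal to 3.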

Microsoft Specific

The C compiler treats a CTRL+Z character as an end-of-file indicator. It ignores any text after CTRL+Z.