Character Constants

A “character constant” is formed by enclosing a single character from the representable character set within single quotation marks (' '). Character constants are used to represent characters in the execution character set.

Syntax

character-constant :
    'c-char-sequence'
    L'c-char-sequence'

c-char-sequence :
    c-char
    c-char-sequence c-char

c-char :
    Any member of the source character set except the single quotation mark ('), backslash (\), or newline character
    escape-sequence

escape-sequence :
    simple-escape-sequence
    octal-escape-sequence
    hexadecimal-escape-sequence

simple-escape-sequence : one of
    \a \b \f \n \r \t \v
    \' \" \\ \?

octal-escape-sequence :
    \ octal-digit
    \ octal-digit octal-digit
    \ octal-digit octal-digit octal-digit

hexadecimal-escape-sequence :
    \x hexadecimal-digit
    hexadecimal-escape-sequence hexadecimal-digit

Character Types

An integer character constant not preceded by the letter L has type int. The value of an integer character constant containing a single character is the numerical value of the character interpreted as an integer. For example, in the ASCII character set the numerical value of the character a is 97 in decimal and 61 in hexadecimal.

Syntactically, a “wide-character constant” is a character constant prefixed by the letter L. A wide-character constant has type wchar_t, an integer type defined in the STDDEF.H header file. For example:

char schar = 'x';       /* A character constant */
wchar_t wchar = L'x';   /* A wide-character constant for
                           the same character */

Wide-character constants are 16 bits wide and specify members of the extended execution character set. They allow you to express characters in alphabets that are too large to be represented by type char. See “Multibyte and Wide Characters” for more information about wide characters.

Execution Character Set

This manual often refers to the “execution character set.” The execution character set is not necessarily the same as the source character set used for writing C programs. The execution character set includes all characters in the source character set as well as the null character, newline character, backspace, horizontal tab, vertical tab, carriage return, and escape sequences. The source and execution character sets may differ in other implementations.

Escape Sequences

Character combinations consisting of a backslash (\) followed by a letter or by a combination of digits are called “escape sequences.” To represent a newline character, single quotation mark, or certain other characters in a character constant, you must use escape sequences. An escape sequence is regarded as a single character and is therefore valid as a character constant.

Escape sequences are typically used to specify actions such as carriage returns and tab movements on terminals and printers. They are also used to provide literal representations of nonprinting characters and characters that usually have special meanings, such as the double quotation mark ("). Table 1.4 lists the ANSI escape sequences and what they represent.

Table 1.4 Escape Sequences

Escape Sequence    Represents

\a                 Bell (alert)
\b                 Backspace
\f                 Formfeed
\n                 New line
\r                 Carriage return
\t                 Horizontal tab
\v                 Vertical tab
\'                 Single quotation mark
\"                 Double quotation mark
\\                 Backslash
\?                 Literal question mark
\ooo               ASCII character in octal notation
\xhhh              ASCII character in hexadecimal notation

Note that the question mark preceded by a backslash (\?) specifies a literal question mark in cases where the character sequence would be misinterpreted as a trigraph. See “Trigraphs” for more information.

Microsoft Specific

If a backslash precedes a character that does not appear in Table 1.4, the compiler handles the undefined character as the character itself. For example, \c is treated as a c.

Escape sequences allow you to send nongraphic control characters to a display device. For example, the ESC character (\033) is often used as the first character of a control command for a terminal or printer. Some escape sequences are device-specific. For instance, the vertical-tab and formfeed escape sequences (\v and \f) do not affect screen output, but they do perform appropriate printer operations.

You can also use the backslash (\) as a continuation character. When a newline character (equivalent to pressing the RETURN key) immediately follows the backslash, the compiler ignores the backslash and the newline character and treats the next line as part of the previous line. This is useful primarily for preprocessor definitions longer than a single line. For example:

#define assert(exp) \
    ( (exp) ? (void) 0:_assert( #exp, __FILE__, __LINE__ ) )

In previous versions of the compiler, this feature was also used to create strings longer than one line. However, the string-concatenation feature (see “String Literals”) is now preferable when creating long string literals.

Octal and Hexadecimal Character Specifications

The sequence \ooo lets you specify any character in the ASCII character set as an octal character code of up to three digits. The numerical value of the octal integer specifies the value of the desired character or wide character.

Similarly, the sequence \xhhh allows you to specify any ASCII character as a hexadecimal character code. For example, you can give the ASCII backspace character as the normal C escape sequence (\b), or you can code it as \010 (octal) or \x008 (hexadecimal).

You can use only the digits 0 through 7 in an octal escape sequence. Octal escape sequences can never be longer than three digits and are terminated by the first character that is not an octal digit. Although you do not need to use all three digits, you must use at least one. For example, the octal representation is \10 for the ASCII backspace character and \101 for the letter A, as given in an ASCII chart.

Similarly, you must use at least one digit for a hexadecimal escape sequence, but you can omit the second and third digits. Therefore you could specify the hexadecimal escape sequence for the backspace character as either \x8, \x08, or \x008.

The value of the octal or hexadecimal escape sequence must be in the range of representable values for type unsigned char for a character constant and type wchar_t for a wide-character constant. See “Multibyte and Wide Characters” for information on wide-character constants.

Unlike octal escape sequences, hexadecimal escape sequences have no limit on the number of digits. A hexadecimal escape sequence terminates at the first character that is not a hexadecimal digit. Because hexadecimal digits include the letters a through f, take care that the escape sequence terminates at the intended digit. To avoid confusion, you can place octal or hexadecimal character definitions in a macro definition:

#define Bell '\x07'

For hexadecimal values, you can break the string to show the correct value clearly:

"\xabc" /* one character */

"\xab" "c" /* two characters */