C++ String Literals

A string literal consists of zero or more characters from the source character set surrounded by double quotation marks ("). A string literal represents a sequence of characters that, taken together, form a null-terminated string.

Syntax

string-literal :

"s-char-sequence opt"
L"s-char-sequence opt"

s-char-sequence :

s-char
s-char-sequence s-char

s-char :

any member of the source character set except the double quotation mark ("), backslash (\), or newline character
escape-sequence

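C++ strings have these types:

Array of char[n], where n is the length of the string (in characters) plus 1 for the terminating '\0' that marks the end of the string

Array of wchar_t[n], for wide-character strings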

The result of modifying a string constant is undefined. For example:

char *szStr = "1234";
szStr[2] = 'A';      // Results undefined
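When the characters do need to change, a safe alternative is to initialize a char array from the literal, which copies the characters into storage that the program owns. The following is a minimal sketch; the names szCopy and modified_copy are illustrative and do not come from the text above. (Declaring the pointer in the example above as const char * would also let the compiler reject the write.)

```cpp
#include <string>

// Hypothetical helper, for illustration only: returns the result of
// modifying a private copy of the literal "1234".
std::string modified_copy()
{
    // Initializing an array copies the literal's characters (and the
    // terminating '\0') into storage the program is allowed to modify.
    char szCopy[] = "1234";
    szCopy[2] = 'A';            // well defined: only the copy changes
    return std::string(szCopy);
}
```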

Microsoft Specific

In some cases, identical string literals can be “pooled” to save space in the executable file. In string-literal pooling, the compiler causes all references to a particular string literal to point to the same location in memory, instead of having each reference point to a separate instance of the string literal. The /Gf compiler option enables string pooling.

END Microsoft Specific
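One way to observe pooling is to compare the addresses of two identical literals. The sketch below assumes nothing about the outcome: whether the addresses match depends on the compiler and its options, so a program should never rely on the result. The function names are hypothetical.

```cpp
#include <cstring>

// Hypothetical helpers, for illustration only.
const char* first_literal()  { return "pooled text"; }
const char* second_literal() { return "pooled text"; }

// True when both literals occupy the same address. The language leaves
// this unspecified; it depends on the compiler and its options.
bool literals_share_storage()
{
    return first_literal() == second_literal();
}

// The contents compare equal whether or not pooling took place.
bool literals_have_equal_contents()
{
    return std::strcmp(first_literal(), second_literal()) == 0;
}
```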

When specifying string literals, adjacent strings are concatenated. Therefore, this declaration:

char szStr[] = "12" "34";

is identical to this declaration:

char szStr[] = "1234";

This concatenation of adjacent strings makes it easy to specify long strings across multiple lines:

cout << "Four score and seven years "
        "ago, our forefathers brought forth "
        "upon this continent a new nation.";

In the preceding example, the entire string "Four score and seven years ago, our forefathers brought forth upon this continent a new nation." is spliced together. This string can also be specified using line splicing as follows:

cout << "Four score and seven years \
ago, our forefathers brought forth \
upon this continent a new nation.";
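The two forms can be checked against each other. The sketch below stores each form in an array (the names szConcat, szSpliced, and same_text are illustrative) and compares them. Note that with line splicing the continuation lines must start in the first column, because any leading whitespace would become part of the string.

```cpp
#include <cstring>

// Adjacent literals: the compiler joins the three pieces.
const char szConcat[] = "Four score and seven years "
                        "ago, our forefathers brought forth "
                        "upon this continent a new nation.";

// Line splicing: each backslash at the end of a line is removed together
// with the newline that follows it. Continuation lines start in the first
// column so that no indentation ends up inside the string.
const char szSpliced[] = "Four score and seven years \
ago, our forefathers brought forth \
upon this continent a new nation.";

static_assert(sizeof(szConcat) == sizeof(szSpliced),
              "both forms have the same length");

// Returns true when both arrays hold exactly the same text.
bool same_text()
{
    return std::strcmp(szConcat, szSpliced) == 0;
}
```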

After all adjacent strings in the constant have been concatenated, the null character, '\0', is appended to provide an end-of-string marker for C string-handling functions.
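A short sketch makes the point that only one terminator is added for the whole concatenated literal, not one per piece. The function name visible_length is illustrative.

```cpp
#include <cstddef>
#include <cstring>

const char szStr[] = "12" "34";

// Four characters plus a single terminating '\0'.
static_assert(sizeof(szStr) == 5, "one terminator for the whole literal");

// Returns the number of characters before the terminating '\0'.
std::size_t visible_length()
{
    return std::strlen(szStr);
}
```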

When the first string contains an escape character, string concatenation can yield surprising results. Consider the following two declarations:

char szStr1[] = "\01" "23";
char szStr2[] = "\0123";

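Although it is natural to assume that szStr1 and szStr2 contain the same values, they do not, as shown in Figure 1.1. In szStr1 the escape sequence \01 ends at the closing quotation mark, so the array holds the characters \01, 2, and 3, followed by the terminating null character. In szStr2 the octal escape absorbs up to three digits, so the array holds \012 and 3, followed by the terminating null character.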

Figure 1.1   Escapes and String Concatenation
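The contents of the two arrays can also be checked at compile time. The sketch below repeats the two declarations, with constexpr added (an assumption made here so that static_assert can inspect the elements), and verifies each array's size and leading characters.

```cpp
#include <cstddef>

// Same initializers as in the declarations above; constexpr is added only
// so the elements can be examined in static_assert.
constexpr char szStr1[] = "\01" "23";
constexpr char szStr2[] = "\0123";

// szStr1: '\01', '2', '3', '\0'
static_assert(sizeof(szStr1) == 4, "three characters plus terminator");
static_assert(szStr1[0] == 1 && szStr1[1] == '2' && szStr1[2] == '3',
              "the escape ends at the closing quotation mark");

// szStr2: '\012', '3', '\0'
static_assert(sizeof(szStr2) == 3, "two characters plus terminator");
static_assert(szStr2[0] == 012 && szStr2[1] == '3',
              "the octal escape absorbs three digits");
```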

Microsoft Specific

The maximum length of a string literal is approximately 2,048 bytes. This limit applies to strings of type char[] and wchar_t[]. If a string literal consists of parts enclosed in double quotation marks, the preprocessor concatenates the parts into a single string, and for each line concatenated, it adds an extra byte to the total number of bytes.

For example, suppose a string consists of 40 lines with 50 characters per line (2,000 characters), and one line with 7 characters, and each line is surrounded by double quotation marks. This adds up to 2,007 bytes plus one byte for the terminating null character, for a total of 2,008 bytes. On concatenation, an extra character is added to the total number of bytes for each of the first 40 lines. This makes a total of 2,048 bytes. (The extra characters are not actually written to the string.) Note, however, that if line continuations (\) are used instead of double quotation marks, the preprocessor does not add an extra character for each line.

END Microsoft Specific

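Determine the size of a string object by counting its characters and adding 1 for the terminating null character. In a wide-character string every element, including the terminator, occupies sizeof(wchar_t) bytes, so in Microsoft C++ the terminator adds 2 bytes.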
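The sizeof operator reports these sizes directly. The following sketch (the helper names are illustrative) checks a narrow and a wide string of four characters each; the wide size is expressed in terms of sizeof(wchar_t) because that value differs between compilers.

```cpp
#include <cstddef>

const char    szStr[]  = "1234";
const wchar_t wszStr[] = L"1a1g";

// Four characters plus the terminating '\0'.
static_assert(sizeof(szStr) == 5, "4 characters + 1 terminator");

// Five elements (four characters plus terminator), each sizeof(wchar_t) bytes.
static_assert(sizeof(wszStr) == 5 * sizeof(wchar_t),
              "5 wide elements including the terminator");

// Size of the narrow string in bytes.
std::size_t narrow_bytes()  { return sizeof(szStr); }

// Number of wchar_t elements in the wide string, terminator included.
std::size_t wide_elements() { return sizeof(wszStr) / sizeof(wszStr[0]); }
```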

Because the double quotation mark (") encloses strings, use the escape sequence (\") to represent enclosed double quotation marks. The single quotation mark (') can be represented without an escape sequence. The backslash character (\) is a line-continuation character when placed at the end of a line. If you want a backslash character to appear within a string, you must type two backslashes (\\). (See Phases of Translation in the Preprocessor Reference for more information about line continuation.)
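As a brief illustration of these rules, the sketch below embeds double quotation marks, single quotation marks, and backslashes in string literals. The text and the file path are made-up examples.

```cpp
#include <cstring>

// \" produces an embedded double quotation mark; ' needs no escape.
const char szQuoted[] = "Say \"hello\" to 'them'";

// \\ produces a single backslash (hypothetical path, for illustration).
const char szPath[] = "C:\\temp\\log.txt";

// Each escape sequence counts as one character in the resulting string.
static_assert(sizeof(szPath) == 16, "15 characters plus terminator");

// Returns how many backslash characters the path actually contains.
int backslash_count()
{
    int count = 0;
    for (const char* p = szPath; *p != '\0'; ++p)
        if (*p == '\\')
            ++count;
    return count;
}
```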

To specify a string of type wide-character (wchar_t[]), precede the opening double quotation mark with the character L. For example:

wchar_t wszStr[] = L"1a1g";

All normal escape codes listed in Character Constants are valid in string constants. For example:

cout << "First line\nSecond line";
cout << "Error! Take corrective action\a";

Because the escape code terminates at the first character that is not a hexadecimal digit, specification of string constants with embedded hexadecimal escape codes can cause unexpected results. The following example is intended to create a string literal containing ASCII 5, followed by the characters five:

"\x05five"

The actual result is a hexadecimal 5F, which is the ASCII code for an underscore, followed by the characters ive. The following example produces the desired results:

"\005five"     // Use octal constant.
"\x05" "five"  // Use string splicing.