1.3 Identifiers

“Identifiers” or “symbols” are the names you supply for variables, types, functions, and labels in your program. Identifier names must differ in spelling or case from any keywords. You cannot use keywords (either C or Microsoft) as identifiers; they are reserved for special use. You create an identifier by specifying it in the declaration of a variable, type, or function. In this example, result is an identifier for an integer variable, and main and printf are identifier names for functions (printf is declared in the standard header stdio.h).

    #include <stdio.h>

    int main( void )
    {
        int result = 0;   /* result: identifier for an int variable */

        if ( result != 0 )
            printf( "Bad file handle\n" );
        return 0;
    }

Once declared, you can use the identifier in later program statements to refer to the associated value.

A special kind of identifier, called a statement label, can be used in goto statements. (Declarations are described in Chapter 3. Statement labels are described in “The goto and Labeled Statements”.)

Syntax

identifier:
    nondigit
    identifier nondigit
    identifier digit

nondigit: one of
    _ a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

digit: one of
    0 1 2 3 4 5 6 7 8 9

The first character of an identifier name must be a nondigit (that is, an underscore or an uppercase or lowercase letter). ANSI requires only the first six characters of an external identifier's name, and the first 31 characters of an internal identifier's name, to be significant. Internal identifiers are those without external linkage, such as names declared within a function. External identifiers (ones declared at file scope without static, or declared with storage class extern) may be subject to additional naming restrictions because these identifiers have to be processed by other software, such as linkers.

Microsoft Specific

The Microsoft C compiler allows 32 characters in an external identifier's name and 247 for internal identifier names. Compiling with the /W4 command-line option, or selecting warning level 4 in the Compiler Options dialog box (reached by selecting Language Options on the Option menu in PWB), generates warnings for any names that exceed the ANSI-specified length. If you are not concerned with ANSI compatibility, you can modify this default to a smaller number using the /H (restrict length of external names) option.

The C compiler considers uppercase and lowercase letters to be distinct characters. This feature, called “case sensitivity,” enables you to create distinct identifiers that have the same spelling but different cases for one or more of the letters.

Since uppercase and lowercase letters are considered distinct characters, each of the following identifiers is unique:

add

ADD

Add

aDD

Microsoft Specific

Do not select names for identifiers that begin with two underscores or with an underscore followed by an uppercase letter. The ANSI standard reserves identifier names that begin with these character combinations for compiler use. Identifiers with file-level scope also should not begin with an underscore followed by a lowercase letter; such names are also reserved. By convention, Microsoft uses an underscore and an uppercase letter to begin macro names and double underscores for Microsoft-specific keyword names. To avoid any naming conflicts, always select identifier names that do not begin with one or two underscores.

The following are examples of valid identifiers that conform to both ANSI and Microsoft naming restrictions:

j

count

temp1

top_of_page

skip12

LastNum

Microsoft Specific

Although identifiers in source files are case sensitive by default, symbols in object files are not. The /Zc command-line option tells the compiler to ignore case for any identifier name declared with the __pascal keyword. The /Gc command-line option specifies the FORTRAN/Pascal calling convention, causing all function names to be translated to uppercase. (The __pascal and __fortran keywords perform the same operation on a function-by-function basis.) For information on the command-line options, see Chapter 13 in the Environment and Tools manual.

Externally linked identifiers may or may not be case sensitive, depending on whether you use the /NOIGNORECASE option when you invoke the linker. The default for the Microsoft linker is to ignore case, making externally linked identifiers case insensitive. Note that the external name of an identifier may be different from its internal name; this matters only when you link C object files with non-C object files. For more information about linking, see Chapter 14 in the Environment and Tools manual.

The “source character set” is the set of legal characters that can appear in source files. For Microsoft C, the source set is the standard ASCII character set. Both the source character set and the execution character set include the ASCII characters that are represented by escape sequences. See “Character Constants” for information about the execution character set.

An identifier has “scope,” which is the region of the program in which it is known, and “linkage,” which determines whether the same name in another scope refers to the same identifier. These topics are explained in “Understanding Lifetime, Scope, Visibility, and Linkage”.

Multibyte and Wide Characters

A multibyte character is a character composed of a sequence of one or more bytes. Each byte sequence represents a single character in the extended character set. Multibyte characters are used in character sets such as Kanji.

Wide characters are multilingual character codes of a fixed size; in Microsoft C they are always 16 bits wide. Ordinary characters are stored in objects of type char (in C, a character constant such as 'a' itself has type int); wide characters have type wchar_t. Because every wide character is the same size, using wide characters simplifies programming with international character sets.

The wide-character string literal L"hello" becomes an array of six elements of type wchar_t, including the terminating null character:

{L'h', L'e', L'l', L'l', L'o', 0}

The Unicode specification is the developing specification for wide characters. The run-time library routines for examining and translating between multibyte and wide characters include mblen, mbstowcs, mbtowc, wcstombs, and wctomb.

Trigraphs

The source character set of C source programs is contained within the 7-bit ASCII character set but is a superset of the ISO 646-1983 Invariant Code Set. Trigraph sequences allow C programs to be written using only the ISO (International Organization for Standardization) Invariant Code Set. Trigraphs are sequences of three characters (introduced by two consecutive question marks) that the compiler replaces with their corresponding punctuation characters. You can use trigraphs in C source files with a character set that does not contain convenient graphic representations for some punctuation characters.

Table 1.1 shows the nine trigraph sequences. All occurrences in a source file of the trigraphs in the first column are replaced with the corresponding punctuation character in the second column.

Table 1.1 Trigraph Sequences

Trigraph    Punctuation Character
??=         #
??(         [
??/         \
??)         ]
??'         ^
??<         {
??!         |
??>         }
??-         ~

A trigraph is always treated as a single source character. Trigraphs are translated in the first translation phase, before escape sequences in string literals and character constants are recognized. (See the topic on translation phases for more information.) Only the nine trigraphs shown in Table 1.1 are recognized; all other character sequences are left untranslated.

The character escape sequence \? prevents the misinterpretation of trigraph-like character sequences. (See the topic on escape sequences for more information.) For example, if you attempt to print the string What??! with this printf statement

printf( "What??!\n" );

the string printed is What| because ??! is a trigraph sequence that is replaced with the | character. You need to write the statement as follows to correctly print the string:

printf( "What?\?!\n" );

In this printf statement, a backslash escape character in front of the second question mark prevents the misinterpretation of ??! as a trigraph.