Character Set

The C language does not define the character set used in an implementation. This means that any programs that assume the character set to be ASCII are nonportable.

The only restrictions on the character set are these:

Every character in the implementation's character set must fit in an object of type char.

Each character in the set must be represented as a nonnegative value by type char, whether char is treated as signed or unsigned. So, in the case of the ASCII character set and an eight-bit signed char, the maximum value is 127 (128 becomes a negative number when stored in a signed char variable).
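As a minimal sketch of the second restriction, the following C fragment checks whether a value can be stored in a plain char as a nonnegative number. The helper name fits_as_nonnegative_char is hypothetical and not part of any library; CHAR_MAX comes from the standard header limits.h.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if value can be held in a plain char
   as a nonnegative number on this implementation, 0 otherwise. */
int fits_as_nonnegative_char(int value)
{
    /* CHAR_MAX is the largest value a plain char can hold; it is 127
       when char is an eight-bit signed type. */
    return value >= 0 && value <= CHAR_MAX;
}
```

With an eight-bit signed char, CHAR_MAX is 127, so fits_as_nonnegative_char(128) returns 0. With an eight-bit unsigned char, the same call returns 1.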

Character Classification

The standard C run-time support contains a complete set of character classification macros and functions. These functions are defined in the CTYPE.H file and are guaranteed to be portable:

isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit

The following code fragment is not portable to implementations that do not use the ASCII character set:

/* Nonportable */
if( c >= 'A' && c <= 'Z' )
    /* uppercase alphabetic */

Instead, consider using this:

/* Portable */
if( isalpha(c) && isupper(c) )
    /* uppercase alphabetic */

The first example above is nonportable because it assumes that uppercase A is represented by a smaller value than uppercase Z, and that only uppercase letters fall between the values of A and Z. In EBCDIC, for example, the uppercase letters are not contiguous. The second example is portable because it uses the character classification functions to perform the tests.

In a portable program, you should not perform any comparison on variables of type char except strict equality (==). You cannot assume that the character set follows an increasing sequence; that may not be true on a different machine.
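The classification approach can be sketched as a small routine that counts uppercase letters without comparing character values. The function name count_upper is hypothetical; isupper is the standard classification function from ctype.h. The cast to unsigned char is a defensive assumption for strings that may hold bytes outside the basic character set.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the uppercase letters in a
   null-terminated string using isupper instead of range tests. */
size_t count_upper(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        /* The cast avoids passing a negative value to isupper,
           which is undefined for values other than EOF. */
        if (isupper((unsigned char)*s))
            ++n;
    }
    return n;
}
```

The only direct comparison on a char here is the inequality test against the terminating null character, which is an equality-style test and does not depend on the ordering of the character set.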

Case Translation

Translation of characters from upper- to lowercase or from lower- to uppercase is called “case translation.” The following example shows a coding technique for case translation that is not portable to implementations using a non-ASCII character set.

#define make_upper(c) ((c)&0xdf)
#define make_lower(c) ((c)|0x20)

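This code takes advantage of the fact that, in ASCII, the uppercase and lowercase forms of a letter differ only in bit 6 (the bit with value 0x20, counting the least significant bit as bit 1), so you can translate case simply by setting or clearing that bit. The technique is extremely efficient but nonportable, and it also alters characters that are not letters. To write portable code, use the case-translation macros toupper and tolower (defined in CTYPE.H).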
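A minimal sketch of portable case translation, assuming null-terminated strings that are converted in place. The names str_to_upper and str_to_lower are hypothetical; toupper and tolower are the standard routines from ctype.h.

```c
#include <ctype.h>

/* Hypothetical helper: converts a null-terminated string to
   uppercase in place using toupper. */
void str_to_upper(char *s)
{
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}

/* Hypothetical helper: converts a null-terminated string to
   lowercase in place using tolower. */
void str_to_lower(char *s)
{
    for (; *s != '\0'; ++s)
        *s = (char)tolower((unsigned char)*s);
}
```

Unlike the bit-masking macros, toupper and tolower leave characters that are not letters unchanged. In ASCII, for instance, make_lower('@') yields '`', whereas tolower('@') returns '@'.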