Binary and Text Files

Normally, you wouldn't write a file in text mode and then read it in binary mode. As a general rule, you pick whichever mode is more appropriate (text mode for text or binary mode for data) and stick with it.

A somewhat baffling thing happened in the example above, however. The WRFILE.C program wrote "Example string" (14 characters) to a disk file and then added a newline character. That should be a total of 15 characters. But if you examine the directory, you'll see the file uses 16 bytes.

Where did the extra byte come from?

Testing Text Mode

If you ran the RDFILE.C program, you probably noticed two characters followed the line: a carriage return (ASCII 13) and a linefeed (ASCII 10). If you make the following change to the program, the output of RDFILE.C is different:

if( (fp = fopen( "c:\\testfile.asc","rt" )) != NULL )

The only modification is that the second string is "rt" instead of "rb". The t represents text mode; the b is binary mode. If you don't specify a mode, the fopen function normally defaults to text mode.

The list below shows the output of the two programs.

RDFILE.C                    RDFILE.C
(binary mode)               (text mode)

E 69                        E 69
x 120                       x 120
a 97                        a 97
m 109                       m 109
p 112                       p 112
l 108                       l 108
e 101                       e 101
  32                          32
s 115                       s 115
t 116                       t 116
r 114                       r 114
i 105                       i 105
n 110                       n 110
g 103                       g 103
  13                          10
  10                        End-of-file marker: -1
End-of-file marker: -1

In binary mode there seem to be two characters after the string. In text mode there's only one.

End-of-Line and End-of-File Characters

The two modes, binary and text, treat end-of-line (EOL) characters and end-of-file (EOF) characters in different ways.

In DOS, a line of text ends with a carriage return (CR) and a linefeed (LF), which appear above as ASCII 13 plus ASCII 10. In the UNIX operating system, which has close ties to the C language, a single ASCII 10 (the newline character) marks the end of a line.

The once-popular CP/M operating system signals the end of files with a CTRL+Z character (ASCII 26, 0x1A)—a tradition that carried forward to DOS. This is not the case with UNIX (and C), which don't use a unique EOF character.

Text Mode Translations

It's important to understand the differences between text mode and binary mode when writing and reading disk files. No translations are made in binary mode. In text mode, however, the end-of-line and end-of-file characters are translated.

When you read a file in text mode and a CR–LF combination appears in the stream, the two characters are translated to one newline character. The opposite translation occurs when you write a file in text mode: each LF character is translated to a CR–LF pair. In other words, the new line is represented by two characters on disk and one character in memory. These translations do not occur when you read and write a file in binary mode.

When you read a file in text mode and a CTRL+Z (0x1A) character appears in the stream, the character is interpreted as the end-of-file character. However, when you're in text mode and you close a file to which you've been writing, a CTRL+Z is not placed in the file as the last character. In binary mode, the CTRL+Z character has no special meaning (it is not interpreted as the end-of-file character).

The difference between text mode and binary mode is relatively minor when you're handling strings, but it's important when you're writing numeric values to disk files.