Using Binary Format

When you're processing strings of ASCII characters and writing them to disk files, it matters little whether you use text mode or binary mode, as long as you're consistent. The advantage of text mode is that it translates newline characters to the carriage-return–line-feed combination, making it possible to use the DOS TYPE command to view the file.

When you're processing numeric values (integers and floating-point numbers), however, you may wish to save your variables in binary mode files, in binary format, for the following reasons:

Binary format almost always saves disk space. In text format, the number 12345.678 requires eight bytes for the ASCII digits, one byte for the decimal point, and at least one more byte for a separator between values. In binary format, a float uses four bytes regardless of its value (a double uses eight), and a short integer uses only two.
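To make the space comparison concrete, here is a minimal sketch. It assumes a C99 compiler (for snprintf with a NULL buffer, which only reports the length) and the nearly universal 4-byte float. The helper names text_bytes and binary_bytes are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: how many characters fprintf would need to store
   value in text format with three decimal places (separator not counted). */
int text_bytes(double value)
{
    /* snprintf with a NULL buffer and size 0 writes nothing and
       returns the length the formatted text would have */
    return snprintf(NULL, 0, "%.3f", value);
}

/* Hypothetical helper: how many bytes fwrite would store for a float.
   Assumes sizeof(float) == 4, true on virtually all current compilers. */
int binary_bytes(void)
{
    return (int)sizeof(float);
}
```

Calling text_bytes(12345.678) gives 9, before any separator is added, while binary_bytes() gives 4 for any float value.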

Binary format generally saves computer time. When you use fprintf to print a numeric value to disk, the computer must translate the internal binary representation to a series of characters. Likewise, when fscanf reads characters into memory, the ASCII values must be translated to the internal binary format. In binary format, none of these translations takes place.

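Binary format preserves the precision of floating-point numbers. Translating a value from binary to decimal ASCII and back can lose precision: the %f conversion of fprintf, for example, prints only six digits after the decimal point, so the value of 1.0/3.0 comes back as 0.333333.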
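The precision loss is easy to demonstrate. This sketch imitates the two paths in memory instead of on disk: sprintf plus strtod stand in for fprintf plus fscanf, and copying the raw bytes with memcpy stands in for fwrite plus fread. The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: round-trips a double through text,
   the way fprintf("%f") followed by fscanf would. */
double text_round_trip(double value)
{
    char buf[64];
    sprintf(buf, "%f", value);      /* default: six digits after the point */
    return strtod(buf, NULL);       /* convert the characters back */
}

/* Hypothetical helper: round-trips a double through its raw bytes,
   the way fwrite followed by fread would. */
double binary_round_trip(double value)
{
    unsigned char bytes[sizeof(double)];
    double copy;
    memcpy(bytes, &value, sizeof value);   /* "write" the bytes */
    memcpy(&copy, bytes, sizeof copy);     /* "read" them back */
    return copy;
}
```

For the value 1.0/3.0, text_round_trip returns 0.333333, which no longer compares equal to the original, while binary_round_trip returns exactly the original value.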

A binary save of arrays or structures is fast. Rather than looping through an array of 100 items and printing each one to the disk file, you call the fwrite function (discussed below) once, passing it the size of the array to be saved.

NOTE:

Binary mode is separate from binary format. The modes (binary and text) are parameters you pass to the fopen function. They affect the translation of newline characters and the handling of the CTRL+Z end-of-file marker. The formats (binary and text) are ways of representing numeric values. An integer in binary format always occupies the same number of bytes on disk: two, with the 16-bit int used in this chapter. An integer in text format uses a variable number of bytes: it might contain one character (5) or six (–10186).
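The difference between the two formats can be measured directly. This sketch uses short rather than int, because many current compilers make int four bytes while short is two bytes on virtually all of them; that size is an assumption, not a guarantee of the C standard. The function names are hypothetical, and snprintf with a NULL buffer requires C99.

```c
#include <stdio.h>

/* Hypothetical helper: characters an integer takes in text format,
   as fprintf with "%d" would write it. */
int int_text_bytes(int value)
{
    return snprintf(NULL, 0, "%d", value);
}

/* Hypothetical helper: bytes a short takes in binary format.
   The size is fixed, whatever the value. */
int short_binary_bytes(void)
{
    return (int)sizeof(short);
}
```

With these helpers, int_text_bytes(5) is 1 and int_text_bytes(-10186) is 6, matching the note, while short_binary_bytes() is 2 for every value.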

Opening a Binary File

The SVBIN.C program below creates two binary mode files with the variables saved in binary format:

/* SVBIN.C: Save integer variables in binary format. */

#include <stdio.h>

#define ASIZE 10

int main( void )
{
    FILE *ap;
    int zebra[ASIZE], acopy[ASIZE], bcopy[ASIZE];
    int i;

    /* Fill zebra with the values 7700 through 7709 */
    for( i = 0; i < ASIZE; i++ )
        zebra[i] = 7700 + i;

    /* Write the whole array as a single item */
    if( (ap = fopen( "binfile", "wb" )) != NULL )
    {
        fwrite( zebra, sizeof(zebra), 1, ap );
        fclose( ap );
    }
    else
        perror( "Write error" );

    /* Write the same data as ASIZE separate elements */
    if( (ap = fopen( "morebin", "wb" )) != NULL )
    {
        fwrite( &zebra[0], sizeof(zebra[0]), ASIZE, ap );
        fclose( ap );
    }
    else
        perror( "Write error" );

    /* Dump binfile byte by byte, then read it back two ways */
    if( (ap = fopen( "binfile", "rb" )) != NULL )
    {
        printf( "Hexadecimal values in binfile:\n" );
        while( (i = fgetc( ap )) != EOF )
            printf( "%02X ", i );

        rewind( ap );
        fread( acopy, sizeof(acopy), 1, ap );
        rewind( ap );
        fread( &bcopy[0], sizeof( bcopy[0] ), ASIZE, ap );

        for( i = 0; i < ASIZE; i++ )
            printf( "\nItem %d = %d\t%d", i, acopy[i], bcopy[i] );
        printf( "\n" );
        fclose( ap );
    }
    else
        perror( "Read error" );

    return 0;
}

Focus your attention on the zebra array. It contains 10 integers, because the array size ASIZE was defined as 10. First, some values are stored in zebra (in a moment, we'll see why 7700–7709 are significant):

for( i = 0; i < ASIZE; i++ )
    zebra[i] = 7700 + i;

Next, we open a file and use fwrite to write the entire array to disk:

if( (ap = fopen( "binfile", "wb" )) != NULL )
{
    fwrite( zebra, sizeof(zebra), 1, ap );
    fclose( ap );
}

Writing an Array in One Line

The fwrite function requires four pieces of information:

1. The address of the item (a variable, array, or structure)

2. The size of the item in bytes

3. The number of items to be written

4. The FILE pointer for a previously opened file
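The same four arguments work for a structure as well as an array. In this sketch, struct point and save_and_reload are hypothetical names, and tmpfile (a standard function that opens a temporary file in "wb+" mode) replaces a named disk file so the example leaves nothing behind. It also checks fwrite's return value, which is the number of complete items written.

```c
#include <stdio.h>

/* Hypothetical record type, used only for illustration */
struct point {
    short x;
    short y;
};

/* Writes one structure with fwrite, reads it back with fread, and
   returns 1 if the copy matches the original, 0 otherwise. */
int save_and_reload(struct point p)
{
    struct point copy;
    FILE *fp = tmpfile();               /* temporary file, opened "wb+" */
    if (fp == NULL)
        return 0;

    /* address, size of one item, number of items, FILE pointer */
    if (fwrite(&p, sizeof p, 1, fp) != 1) {
        fclose(fp);
        return 0;
    }
    rewind(fp);                         /* back to the start before reading */
    if (fread(&copy, sizeof copy, 1, fp) != 1) {
        fclose(fp);
        return 0;
    }
    fclose(fp);
    return copy.x == p.x && copy.y == p.y;
}
```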

In this example, the first argument, zebra, is an array and, as you may remember from Chapter 8, “Pointers,” the name of an array is the address of the array.

To provide the second argument for fwrite, SVBIN.C uses the sizeof operator, which returns the number of bytes a variable requires. Because zebra is an array of 10 integers and each integer uses 2 bytes (as it does with a 16-bit DOS compiler), sizeof(zebra) is 20. If you view a directory of your disk after running this program, you'll notice that the file BINFILE is exactly 20 bytes long.

The third argument tells fwrite how many items to write to the file. We have 1 array, so this parameter is 1.

The fourth argument is the FILE pointer returned by fopen.

There's another way to copy the 20 bytes of zebra to the file. After writing to BINFILE, the program uses the fopen function to create a second file called MOREBIN. The following fwrite line writes 10 integers instead of 1 array:

fwrite( &zebra[0], sizeof(zebra[0]), ASIZE, ap );

The second and third arguments have changed. Instead of passing the size of the array (20) and writing 1 copy of the array, we're accessing the size of 1 element (2 bytes) and writing 10 of them (using the symbolic constant ASIZE). The contents of this disk file should match, byte for byte, the contents of BINFILE.
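You can confirm that the two fwrite calls produce identical files. This sketch assumes a 2-byte short (common, though not guaranteed by the C standard) so that the file is 20 bytes, as in the text. Inside the hypothetical function write_and_dump, zebra is a pointer parameter, so sizeof(zebra) would give the size of a pointer; the whole-array size is therefore computed as sizeof(short) * ASIZE.

```c
#include <stdio.h>
#include <string.h>

#define ASIZE 10

/* Hypothetical helper: writes zebra to a temporary file, either as one
   array-sized item (one_item != 0) or as ASIZE element-sized items, then
   reads the raw bytes back into buf. Returns the byte count, or -1. */
long write_and_dump(const short zebra[ASIZE], int one_item,
                    unsigned char *buf, size_t bufsize)
{
    size_t got;
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    if (one_item)
        fwrite(zebra, sizeof(short) * ASIZE, 1, fp);   /* 1 array */
    else
        fwrite(&zebra[0], sizeof zebra[0], ASIZE, fp); /* ASIZE elements */

    rewind(fp);
    got = fread(buf, 1, bufsize, fp);   /* read every byte in the file */
    fclose(fp);
    return (long)got;
}
```

Calling it once with one_item set and once with it clear yields 20 bytes each time, and memcmp reports the two buffers equal.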

Examining the Binary Contents

Finally, we look at what's inside the file BINFILE. It is opened for reading as a binary file:

if( (ap = fopen( "binfile", "rb" )) != NULL )

A short while loop reads the bytes from BINFILE and displays them in hexadecimal notation:

printf( "Hexadecimal values in binfile:\n" );
while( (i = fgetc( ap )) != EOF )
    printf( "%02X ", i );

After running SVBIN.C, the screen displays these values:

14 1E 15 1E 16 1E 17 1E 18 1E 19 1E 1A 1E 1B 1E 1C 1E 1D 1E

The low byte precedes the high byte because 80x86 processors store integers in little-endian order. So the first two bytes represent the number 0x1E14, which is 7700 in decimal. The next two bytes equal 7701, and so on.
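Reassembling a value from the dump is a matter of shifting the high byte left by 8 bits and combining it with the low byte. The function name from_little_endian is hypothetical; this sketch handles 16-bit values only.

```c
/* Hypothetical helper: rebuilds a 16-bit value from two bytes
   stored low byte first, as in the BINFILE dump. */
unsigned int from_little_endian(unsigned char low, unsigned char high)
{
    return (unsigned int)low | ((unsigned int)high << 8);
}
```

For the first pair in the dump, from_little_endian(0x14, 0x1E) gives 7700; for the last pair, from_little_endian(0x1D, 0x1E) gives 7709.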

A curious thing happens when you run SVBIN.C and then try to treat the 20-byte file as text. If you TYPE BINFILE from the DOS command line, the file appears as gibberish (of course), and you see only 12 of the 20 characters on the screen. Where did the other characters go? Recall the previous discussion of binary and text files. In DOS, a CTRL+Z (0x1A) marks the end of a text file. And in the midst of our binary file is one of those EOF characters. It's not acting as an EOF; it's part of the number 0x1E1A. But if you ever open this file in text mode, you'll be unable to read past the twelfth byte.
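The position of that stray CTRL+Z can be predicted without DOS. This sketch lays out the values the way BINFILE stores them (2-byte, low byte first, built explicitly so the result does not depend on the host processor) and searches for 0x1A with memchr. The function name first_ctrl_z is hypothetical.

```c
#include <string.h>

#define ASIZE 10

/* Hypothetical helper: lays out ASIZE consecutive values starting at
   first as 2-byte little-endian integers, then returns the offset of the
   first CTRL+Z (0x1A) byte, or -1 if none appears. */
long first_ctrl_z(unsigned int first)
{
    unsigned char bytes[2 * ASIZE];
    unsigned char *hit;
    int i;

    for (i = 0; i < ASIZE; i++) {
        unsigned int v = first + (unsigned int)i;
        bytes[2 * i] = (unsigned char)(v & 0xFF);            /* low byte */
        bytes[2 * i + 1] = (unsigned char)((v >> 8) & 0xFF); /* high byte */
    }
    hit = memchr(bytes, 0x1A, sizeof bytes);
    return hit ? (long)(hit - bytes) : -1L;
}
```

For a starting value of 7700, the function returns 12: the low byte of 7706 (0x1E1A) is the thirteenth byte, so a text-mode reader stops after the first twelve.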

Retrieving the Values from Disk

Most of the time, you won't want to read a binary file one byte at a time. Instead, you call fread, which reads a disk file and stores the values in a variable, an array, or a structure. The fread function complements fwrite. It takes four parameters:

1. The address of the variable

2. The size of the variable in bytes

3. The number of values to read

4. The FILE pointer that references a file opened for reading
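Like fwrite, fread returns the number of complete items it transferred, which is how a program notices that a file was shorter than expected. In this sketch the function items_read is hypothetical, and a temporary file from tmpfile stands in for a real data file.

```c
#include <stdio.h>

/* Hypothetical helper: writes count shorts to a temporary file, then
   asks fread for want shorts. fread returns how many whole items it
   actually read, fewer than want if the end of the file comes first. */
size_t items_read(size_t count, size_t want)
{
    short data[32] = {0};
    size_t got;
    FILE *fp;

    if (count > 32 || want > 32)    /* stay inside the buffer */
        return 0;
    fp = tmpfile();
    if (fp == NULL)
        return 0;

    fwrite(data, sizeof data[0], count, fp);
    rewind(fp);
    got = fread(data, sizeof data[0], want, fp);
    fclose(fp);
    return got;
}
```

Asking for 10 items from a file that holds only 7 returns 7; asking for 10 from a file that holds 10 returns 10.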

Here's one way to read values into an array:

rewind( ap );
fread( acopy, sizeof( acopy ), 1, ap );

The call to rewind is necessary because the while loop has already read through the file once; rewind moves the file position back to the first byte and clears the end-of-file indicator. The acopy and bcopy arrays are the same size as the original zebra array. To fill an array with this technique, pass its address, the size of the entire array, an item count of 1, and the FILE pointer.
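The effect of rewind is visible in a small experiment: after a loop like the hex dump has consumed the whole file, fread gets nothing unless the position is reset first. The function read_after_rewind is hypothetical, and it uses a temporary file rather than BINFILE.

```c
#include <stdio.h>

/* Hypothetical helper: reads a file to the end with fgetc, optionally
   rewinds, then returns how many of 3 shorts fread manages to read. */
size_t read_after_rewind(int do_rewind)
{
    short values[3] = {7700, 7701, 7702};
    short copy[3];
    size_t got;
    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;

    fwrite(values, sizeof values[0], 3, fp);
    rewind(fp);                 /* switch from writing to reading */
    while (fgetc(fp) != EOF)
        ;                       /* consume every byte, as the hex dump does */

    /* feof(fp) is now nonzero and the position is at the end */
    if (do_rewind)
        rewind(fp);             /* back to byte 0, EOF indicator cleared */
    got = fread(copy, sizeof copy[0], 3, fp);
    fclose(fp);
    return got;
}
```

With do_rewind set, fread reads all 3 values; without it, fread returns 0.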

A second way to fill an array is to pass the size of a single element and the number of elements you want to read:

rewind( ap );
fread( &bcopy[0], sizeof( bcopy[0] ), ASIZE, ap );

In the first example of fread, we pass the information that the array acopy is 20 bytes long and we want to read it once. In the second example, we pass the size of an integer (2 bytes) and ask for 10 of them. In either case, 20 bytes are transferred.

Just to make sure both arrays are equal, we can print them out:

for( i = 0; i < ASIZE; i++ )
    printf( "\nItem %d = %d\t%d", i, acopy[i], bcopy[i] );
fclose( ap );

The screen displays the values 7700 through 7709 in both columns. They were stored in the zebra array, written to BINFILE, and read back into the acopy and bcopy arrays, surviving the round trip intact.
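Printing works for 10 items, but a program can also check the arrays itself. One option, sketched here with the hypothetical function arrays_match, is memcmp, which compares the two arrays byte for byte.

```c
#include <string.h>

#define ASIZE 10

/* Hypothetical helper: returns 1 if the two arrays hold identical
   bytes, 0 otherwise, instead of printing them side by side. */
int arrays_match(const int a[ASIZE], const int b[ASIZE])
{
    return memcmp(a, b, ASIZE * sizeof a[0]) == 0;
}
```

After the two fread calls in SVBIN.C, arrays_match(acopy, bcopy) would be expected to return 1; changing any element of either array makes it return 0.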