9.1 Data Compression

Data compression is an operation that reduces the size of a file by minimizing redundant data. In a file that contains text, redundant data could be frequently occurring characters, such as the space character or common vowels such as the letters e and a; it could also be frequently occurring character strings. A data-compression operation creates a compressed version of a file by representing this redundant data more compactly.
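To see this redundancy concretely, the following minimal sketch counts how often each character occurs in a short string. It is written in Python purely for illustration, and the function name `most_common_chars` is hypothetical; it only measures redundancy and does not compress anything.

```python
from collections import Counter


def most_common_chars(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent characters in text, with their counts."""
    return Counter(text).most_common(n)


sample = "the cat sat on the mat"
# The space and the letter 't' each occur 5 times; 'a' occurs 3 times.
print(most_common_chars(sample))
```

In a 22-character string, just three distinct characters account for 13 of the positions. That repetition is what a compression algorithm exploits.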

Each of the many types of data-compression operations minimizes redundant data in its own way. For example, the Huffman encoding algorithm assigns a code to each character in a file based on how frequently that character occurs, giving shorter codes to characters that occur more often. Run-length encoding generates a two-part value for each run of a repeated character: The first part specifies the number of times the character is repeated, and the second part identifies the character. The Lempel-Ziv algorithm converts variable-length strings into fixed-length codes, which consume less space than the original strings.
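The two character-oriented techniques just described can be sketched as follows. This is an illustrative Python sketch, not a production encoder: the function names are hypothetical, run-length output is kept as a list of (count, character) pairs rather than packed bytes, and Huffman codes are returned as strings of "0" and "1" for readability rather than as packed bits.

```python
import heapq
from collections import Counter
from itertools import groupby


def rle_encode(text: str) -> list[tuple[int, str]]:
    """Run-length encoding: each run becomes a (repeat count, character) pair."""
    return [(len(list(run)), char) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Expand (count, character) pairs back into the original text."""
    return "".join(char * count for count, char in pairs)


def huffman_codes(text: str) -> dict[str, str]:
    """Huffman coding: frequent characters receive shorter bit strings."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct character still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (frequency, unique tie-breaker, {char: partial code}).
    heap = [(count, i, {char: ""}) for i, (char, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix 0 on one side, 1 on the other.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


print(rle_encode("AAAABBBCCD"))  # [(4, 'A'), (3, 'B'), (2, 'C'), (1, 'D')]
sample = "aaaaabbbc"
codes = huffman_codes(sample)
bits = "".join(codes[ch] for ch in sample)
print(codes, len(bits))  # 'a' gets a 1-bit code; 13 bits in total
```

For `"aaaaabbbc"`, the most frequent character gets a 1-bit code and the other two get 2-bit codes, so the nine characters fit in 13 bits instead of 72 at 8 bits per character. A real compressed file would also have to store the code table (or the frequencies) so that the decompressor can rebuild it.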

To compress large applications or data files, you can run COMPRESS.EXE from the Microsoft MS-DOS® command line. COMPRESS.EXE uses the Lempel-Ziv compression algorithm.
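The idea of replacing variable-length strings with codes can be illustrated with LZW, one member of the Lempel-Ziv family. This Python sketch is an assumption for illustration only: it is not the file format that COMPRESS.EXE writes and does not reproduce that tool's behavior. The function names are hypothetical. Codes are returned as Python integers; a real encoder would pack them at a fixed bit width and cap the dictionary size, which this sketch does not do.

```python
def lzw_compress(data: bytes) -> list[int]:
    """LZW: replace variable-length byte strings with integer codes."""
    # Start with a dictionary holding every single-byte string (codes 0-255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate  # keep extending the longest known match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # remember the new string
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary while decoding, then join the strings."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the string being defined at this very step.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(data)
print(len(data), len(codes))  # 24 16
print(lzw_decompress(codes) == data)  # True
```

The 24 input bytes become 16 codes because repeated strings such as `TO`, `BE`, and `OR` are emitted as single dictionary entries. The decompressor never receives the dictionary; it reconstructs it from the code stream.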