The Workbook Compound File

An OLE 2 compound file is essentially "a file system within a file." The compound file contains a hierarchical system of storages and streams. A storage is analogous to a directory, and a stream is analogous to a file in a directory. Each Microsoft Excel workbook is stored in a compound file, an example of which is shown in the following illustration. This file is a workbook that contains three sheets: a worksheet with a PivotTable, a Visual Basic module, and a chart.

If a workbook contains embedded objects, then the file will also contain storages written by the applications that created the objects. The PivotTable data cache storage and VBA PROJECT storage are not documented. The CompObj stream contains OLE 2 component object data, and the Summary Info stream contains the standardized file summary information such as title, subject, author, and so on.

The Book stream begins with a BOF record, followed by the workbook global records up to the first EOF record. The workbook global section contains one BOUNDSHEET record for each sheet in the workbook. To read selected sheets quickly without scanning the whole stream, use the BOUNDSHEET fields: dt (document type), lbPlyPos (stream position of the BOF record for the sheet), and cch/rgch (sheet name as a byte-counted string).
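
The BOUNDSHEET lookup described above can be sketched in Python as follows. This is a minimal illustration, not a complete reader. It assumes the Book stream has already been extracted from the compound file into a bytes object. It also assumes details taken from the BIFF5/BIFF7 record descriptions rather than from this section: each record has a 2-byte type and a 2-byte length header (little-endian), the record numbers are BOF = 0x0809, EOF = 0x000A, and BOUNDSHEET = 0x0085, and BOUNDSHEET data is laid out as lbPlyPos (4 bytes), grbit (2 bytes, with dt in the high byte), cch (1 byte), and rgch. The names iter_records, read_boundsheets, and make_record are hypothetical, and SAMPLE is synthetic data built only for the demonstration.

```python
import struct

# Record numbers assumed from the BIFF5/BIFF7 record descriptions.
BOF, EOF, BOUNDSHEET = 0x0809, 0x000A, 0x0085


def iter_records(stream, pos=0):
    """Yield (offset, record_type, data) for each record starting at pos."""
    while pos + 4 <= len(stream):
        rec_type, length = struct.unpack_from("<HH", stream, pos)
        yield pos, rec_type, stream[pos + 4 : pos + 4 + length]
        pos += 4 + length


def read_boundsheets(stream):
    """Collect BOUNDSHEET entries from the workbook global section."""
    sheets = []
    for _, rec_type, data in iter_records(stream):
        if rec_type == EOF:  # first EOF ends the workbook globals
            break
        if rec_type == BOUNDSHEET:
            # Assumed layout: lbPlyPos (4), grbit (2), cch (1), rgch (cch bytes).
            lb_ply_pos, grbit, cch = struct.unpack_from("<IHB", data, 0)
            name = data[7 : 7 + cch].decode("latin-1")
            sheets.append({"name": name, "dt": grbit >> 8, "lbPlyPos": lb_ply_pos})
    return sheets


def make_record(rec_type, data=b""):
    """Helper for building synthetic records in this example only."""
    return struct.pack("<HH", rec_type, len(data)) + data


# Synthetic stream: globals (BOF, one BOUNDSHEET, EOF), then one sheet.
# The globals section is 12 + 17 + 4 = 33 bytes, so the sheet BOF is at 33.
SAMPLE = (
    make_record(BOF, struct.pack("<HHHH", 0x0500, 0x0005, 0, 0))
    + make_record(BOUNDSHEET, struct.pack("<IHB", 33, 0x0000, 6) + b"Sheet1")
    + make_record(EOF)
    + make_record(BOF, struct.pack("<HHHH", 0x0500, 0x0010, 0, 0))
    + make_record(EOF)
)

for sheet in read_boundsheets(SAMPLE):
    print(sheet["name"], sheet["dt"], sheet["lbPlyPos"])
```

Once lbPlyPos is known, a reader can seek directly to that offset and start parsing at the BOF record for the sheet, skipping any sheets it does not need.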

Each sheet in the workbook is stored after the workbook global section, beginning with a BOF record and ending with an EOF record. If you read the file as one continuous stream (instead of using the BOUNDSHEET records), you can test the dt field of each BOF record to determine the sheet type.
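
The continuous-stream approach can be sketched in the same way. The example below walks every record in order and reports the dt value of each BOF record it meets. As before, it is only a sketch under stated assumptions: the 4-byte record header, the BOF and EOF record numbers, the position of dt as the second 2-byte field of the BOF data, and the dt values in BOF_DT_NAMES are taken from the BIFF5/BIFF7 BOF record description, not from this section. The names scan_substreams and make_record are hypothetical, and SAMPLE is synthetic.

```python
import struct

# Record numbers assumed from the BIFF5/BIFF7 record descriptions.
BOF, EOF = 0x0809, 0x000A

# BOF dt values as assumed from the BIFF5/BIFF7 BOF record description.
BOF_DT_NAMES = {
    0x0005: "workbook globals",
    0x0006: "Visual Basic module",
    0x0010: "worksheet or dialog sheet",
    0x0020: "chart",
    0x0040: "Microsoft Excel 4.0 macro sheet",
    0x0100: "workspace file",
}


def scan_substreams(stream):
    """Return (bof_offset, dt) for every BOF record, in stream order."""
    found = []
    pos = 0
    while pos + 4 <= len(stream):
        rec_type, length = struct.unpack_from("<HH", stream, pos)
        if rec_type == BOF and length >= 4:
            # BOF data starts with vers (2 bytes) followed by dt (2 bytes).
            _vers, dt = struct.unpack_from("<HH", stream, pos + 4)
            found.append((pos, dt))
        pos += 4 + length
    return found


def make_record(rec_type, data=b""):
    """Helper for building synthetic records in this example only."""
    return struct.pack("<HH", rec_type, len(data)) + data


def make_substream(dt):
    """A minimal BOF/EOF pair with the given document type (16 bytes)."""
    return make_record(BOF, struct.pack("<HHHH", 0x0500, dt, 0, 0)) + make_record(EOF)


# Synthetic stream: globals, a worksheet, a Visual Basic module, and a chart.
SAMPLE = b"".join(make_substream(dt) for dt in (0x0005, 0x0010, 0x0006, 0x0020))

for offset, dt in scan_substreams(SAMPLE):
    print(offset, hex(dt), BOF_DT_NAMES.get(dt, "unknown"))
```

This trades speed for simplicity: every record header in the stream is visited, whereas the BOUNDSHEET approach jumps straight to the sheets of interest.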