The Evolution of File Systems

Years ago, in the days before disk operating systems, each computer was built to run a single, proprietary application, which had complete and exclusive control of the entire machine. The application would write its persistent data directly to a disk, or drum, by sending commands directly to the disk controller. The application was responsible for managing the absolute locations of data on the disk, making sure that it was not overwriting data that was already there, but since the application was the only one running on the machine, this task was not too difficult.

The advent of computer systems that could run more than one application required some sort of mechanism to make sure that applications did not write over each other's data. Application developers addressed this problem by adopting a single standard for marking which disk sectors were in use and which were free. In time, these standards coalesced to become a disk operating system, which provided various services to the applications, including a file system for managing persistent storage. With the advent of a file system, applications no longer had to deal directly with the physical storage medium. Instead, they simply told the file system to write blocks of data to the disk and let the file system worry about how to do it. In addition, the file system allowed applications to create data hierarchies through the abstraction known as a directory. A directory could contain not only files but other directories. which in turn could contain their own files and directories, and so on.

The file system provided a single level of indirection between applications and the disk, and the result was that every application saw a file as a single contiguous stream of bytes on the disk even though the file system was actually storing the file in discontiguous sectors. The indirection freed the applications from having to care about the absolute position of data on a storage device.

Today, virtually all system APIs for file input and output provide applications with some way to write information into a flat file that applications see as a single stream of bytes that can grow as large as necessary until the disk is full. For a long time these APIs have been sufficient for applications to store their persistent information. Applications have made significant innovations in how they deal with a single stream of information to provide features like incremental "fast" saves.

In a world of component objects, however, storing data in a single flat file is no longer efficient. Just as file systems arose out of the need for multiple applications to share the same storage medium, so now, component objects require a system that allows them to share storage within the conceptual framework of a single file. Even though it is possible to store the separate objects using conventional flat file storage, should one of the objects increase in size, or should you simple add another object, it becomes necessary to load the entire file into memory, insert the new object, and then save the whole file. Such a process can be extremely time-consuming.

The solution provided by COM is to implement a second level of indirection: a file system within a file. Instead of requiring that a large contiguous sequence of bytes on the disk be manipulated through a single file handle with a single seek pointer, COM structured storage defines how to treat a single file system entity as a structured collection of two types of objects — storages and streams — that act like directories and files.