Platform SDK: Files and I/O

Defragmentation

A file is stored on a disk drive (or other media) in one or more clusters. A cluster is the atomic unit of data allocation and is made up of one or more sectors. Sectors, in turn, are the physical storage units.
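
The relationship between sectors, clusters, and file size can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the SDK: the function name clusters_needed and its parameters are hypothetical, and it ignores any file-system-specific storage optimizations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of whole clusters needed to hold file_bytes.
   A cluster is sectors_per_cluster sectors of bytes_per_sector bytes each,
   and allocation always happens in whole clusters. */
uint64_t clusters_needed(uint64_t file_bytes,
                         uint32_t sectors_per_cluster,
                         uint32_t bytes_per_sector)
{
    assert(sectors_per_cluster > 0 && bytes_per_sector > 0);

    uint64_t bytes_per_cluster =
        (uint64_t)sectors_per_cluster * bytes_per_sector;

    /* Round up: a partly filled final cluster is still a full allocation. */
    return file_bytes / bytes_per_cluster +
           (file_bytes % bytes_per_cluster != 0 ? 1 : 0);
}
```

With 8 sectors of 512 bytes per cluster (a 4096-byte cluster), a 4097-byte file would occupy two clusters under this model.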

As a file is written to the disk, it may not be written to contiguous clusters. Noncontiguous clusters slow down the process of reading and writing the file. The farther apart on the disk the noncontiguous clusters are, the worse the problem, because of the time it takes to move the hard drive's read/write head. A file with noncontiguous clusters is said to be fragmented. To optimize files for fast access, a volume may be defragmented.

Defragmentation is the process of moving a file's clusters on the disk so that they become contiguous.
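
To make the notion of a fragmented file concrete, the following sketch counts the fragments of a file from a list of its cluster runs. The cluster_run type and the count_fragments function are hypothetical illustrations, not SDK definitions; the sketch assumes the runs are listed in file order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one run of clusters belonging to a file. */
struct cluster_run {
    long long first_cluster;  /* position of the run's first cluster on the volume */
    long long cluster_count;  /* number of clusters in the run */
};

/* Count the fragments of a file whose runs are listed in file order.
   Two adjacent runs form a single fragment when the second run starts at
   the cluster immediately after the last cluster of the first run. */
size_t count_fragments(const struct cluster_run *runs, size_t n)
{
    size_t fragments = 0;

    for (size_t i = 0; i < n; i++) {
        assert(runs[i].cluster_count > 0);
        if (i == 0 ||
            runs[i - 1].first_cluster + runs[i - 1].cluster_count
                != runs[i].first_cluster) {
            fragments++;  /* a gap (or the first run) starts a new fragment */
        }
    }
    return fragments;
}
```

Under this model, a file is fragmented when count_fragments returns a value greater than 1, and a defragmenter aims to bring that value down to 1.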

In a simple single-tasking operating system, defragmentation is straightforward: the defragmentation software is the sole task, and there are no other processes to read from or write to the disk. However, in a multitasking operating system, some processes may be reading from and writing to the hard drive while another process is trying to defragment that hard drive. The trick is to avoid writes to the file being defragmented without stopping the writing process for very long. Solving this problem is not trivial, but it is possible.

Some file systems are publicly documented, such as the FAT16 and FAT32 file systems used in the Microsoft® MS-DOS® and Windows® 98 operating systems. This allows programmers to manipulate on-disk data structures (such as file allocation tables, or FATs) directly. However, NTFS, the file system that the Windows NT® operating system uses, is deliberately opaque. To allow defragmentation of NTFS without requiring detailed knowledge of the disk structure of NTFS, a set of three DeviceIoControl operations is provided. The three operations allow applications to locate empty clusters, determine the disk location of file clusters, and move clusters on the disk. The DeviceIoControl operations transparently handle the problem of inhibiting and allowing other processes to read from and write to files during moves.

These same DeviceIoControl operations also work with FAT volumes.

These operations can be performed without inhibiting other processes from running. However, the other processes will have slower response times while a disk drive is being defragmented.

Clusters may be referred to from two different perspectives: within the file and on the volume. Any cluster in a file has a virtual cluster number (VCN), which is its relative offset from the beginning of the file. For example, a seek to twice the size of a cluster, followed by a read, will return data beginning at the third cluster of the file, which is VCN 2 because VCNs are counted from zero. A logical cluster number (LCN) describes the offset of a cluster from some arbitrary point within the volume. LCNs should be treated only as ordinal, or relative, numbers. There is no guaranteed mapping of logical clusters to physical hard drive sectors.
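
The seek example can be expressed as a one-line calculation. This is a minimal sketch: vcn_from_offset is a hypothetical helper, and the caller is assumed to already know the volume's cluster size in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: zero-based VCN of the cluster that contains the
   given byte offset within a file. */
uint64_t vcn_from_offset(uint64_t file_offset, uint32_t bytes_per_cluster)
{
    assert(bytes_per_cluster != 0);
    return file_offset / bytes_per_cluster;
}
```

For a 4096-byte cluster, an offset of 2 * 4096 maps to VCN 2, the third cluster of the file, and every offset from 0 through 4095 maps to VCN 0.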

An extent is a run of contiguous clusters. For example, suppose a file consisting of 30 clusters is recorded in two extents. The first extent might consist of 5 contiguous clusters, and the second of the remaining 25 clusters.

The position of one extent on the disk has no guaranteed relationship to the position of any other extent. For example, the first extent may be at a higher LCN than a subsequent extent.
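
The following sketch shows how a VCN might be translated to an LCN by walking a list of extents. The extent type here is a simplified stand-in defined only for this example; it is loosely modeled on the idea that each extent records where it starts on the volume and the VCN at which the next extent begins. The sketch assumes every extent is actually allocated on the volume, and the LCN values used in the note below are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one extent in a file's cluster map. The extent
   covers the VCNs from the end of the previous extent (or the map's
   starting VCN) up to, but not including, next_vcn. */
struct extent {
    int64_t next_vcn;  /* first VCN that belongs to the following extent */
    int64_t lcn;       /* LCN of the first cluster of this extent */
};

/* Translate a VCN to an LCN. Returns -1 if the VCN is not covered by the
   map. Assumption: no extent represents an unallocated range. */
int64_t lcn_from_vcn(const struct extent *extents, size_t count,
                     int64_t starting_vcn, int64_t vcn)
{
    assert(extents != NULL || count == 0);

    if (vcn < starting_vcn)
        return -1;

    int64_t first_vcn = starting_vcn;  /* first VCN of the current extent */
    for (size_t i = 0; i < count; i++) {
        if (vcn < extents[i].next_vcn)
            return extents[i].lcn + (vcn - first_vcn);
        first_vcn = extents[i].next_vcn;
    }
    return -1;  /* past the end of the file */
}
```

For the 30-cluster file described above, with a first extent of 5 clusters at LCN 900 and a second extent of 25 clusters at LCN 200, VCN 4 maps to LCN 904 and VCN 5 maps to LCN 200. Note that the first extent sits at a higher LCN than the second.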

To defragment a file:

  1. Use the DeviceIoControl FSCTL_GET_VOLUME_BITMAP operation to find a place on the volume large enough to accept the entire file. If necessary, move other files to make a place that's large enough. Ideally, there will be enough unallocated clusters after the first extent of the file that you can simply move subsequent extents into the space after the first extent.
  2. Use the DeviceIoControl FSCTL_GET_RETRIEVAL_POINTERS operation to get a map of the current layout of the file on the disk.
  3. Walk the RETRIEVAL_POINTERS_BUFFER structure returned by FSCTL_GET_RETRIEVAL_POINTERS. Use the DeviceIoControl FSCTL_MOVE_FILE operation to move each cluster as you walk the structure. You may need to renew either the bitmap or the retrieval structure, or both, from time to time as other processes write to the disk.
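
The bookkeeping behind the steps above can be sketched in portable C. This is an illustration under stated assumptions, not a definitive implementation: it makes no DeviceIoControl calls, and the extent and planned_move types and both functions are hypothetical stand-ins. The sketch assumes the bitmap holds one bit per cluster, that a set bit means the cluster is allocated, and that the least significant bit of the first byte describes the first cluster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one extent of the file's current layout. */
struct extent {
    int64_t next_vcn;  /* first VCN that belongs to the following extent */
    int64_t lcn;       /* LCN of the first cluster of this extent */
};

/* Simplified stand-in for the numeric parameters of one move request. */
struct planned_move {
    int64_t starting_vcn;   /* first VCN of the run to move */
    int64_t starting_lcn;   /* destination LCN for that run */
    int64_t cluster_count;  /* number of clusters in the run */
};

/* Step 1: find the first run of at least `needed` free clusters.
   Assumption: bit set = allocated, least significant bit first.
   Returns the cluster index relative to the start of the bitmap, or -1. */
int64_t find_free_run(const uint8_t *bitmap, int64_t cluster_count,
                      int64_t needed)
{
    assert(bitmap != NULL && needed > 0);

    int64_t run_start = 0, run_len = 0;
    for (int64_t i = 0; i < cluster_count; i++) {
        int allocated = (bitmap[i / 8] >> (i % 8)) & 1;
        if (allocated) {
            run_len = 0;                 /* the free run is broken */
        } else {
            if (run_len == 0)
                run_start = i;           /* a new free run begins here */
            if (++run_len >= needed)
                return run_start;
        }
    }
    return -1;
}

/* Steps 2 and 3: walk the extents and plan one move per extent so that
   the file ends up contiguous starting at target_lcn. The caller supplies
   room for `count` entries in `moves`. Returns the number of moves. */
size_t plan_moves(const struct extent *extents, size_t count,
                  int64_t starting_vcn, int64_t target_lcn,
                  struct planned_move *moves)
{
    assert((extents != NULL && moves != NULL) || count == 0);

    int64_t first_vcn = starting_vcn;
    for (size_t i = 0; i < count; i++) {
        moves[i].starting_vcn = first_vcn;
        moves[i].starting_lcn = target_lcn + (first_vcn - starting_vcn);
        moves[i].cluster_count = extents[i].next_vcn - first_vcn;
        first_vcn = extents[i].next_vcn;
    }
    return count;
}
```

In a real defragmenter, the index returned by find_free_run would be added to the starting LCN of the bitmap that was requested, and each planned move would be issued through FSCTL_MOVE_FILE. Because other processes may write to the disk in the meantime, a move can fail or the plan can become stale, in which case the bitmap and the retrieval pointers would be fetched again and the plan rebuilt.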

Two of the operations used for defragmentation, FSCTL_GET_VOLUME_BITMAP and FSCTL_MOVE_FILE, require handles to volumes. Only administrators can open handles to volumes, so only administrators can run defragmentation software. Your program should check the privileges of the user executing it and gracefully refuse to run for nonadministrators.

The DeviceIoControl FSCTL_MOVE_FILE operation operates only on NTFS volumes with a cluster size of 4K or less. NTFS format defaults to cluster sizes of 4K or less, so volumes with cluster sizes larger than 4K are rare.

Defragmentation DeviceIoControl Operations

FSCTL_GET_VOLUME_BITMAP

FSCTL_GET_RETRIEVAL_POINTERS

FSCTL_MOVE_FILE

The following table lists the defragmentation structures and the DeviceIoControl operation associated with each.

Defragmentation structure     Operation
MOVE_FILE_DATA                FSCTL_MOVE_FILE
RETRIEVAL_POINTERS_BUFFER     FSCTL_GET_RETRIEVAL_POINTERS
STARTING_LCN_INPUT_BUFFER     FSCTL_GET_VOLUME_BITMAP
STARTING_VCN_INPUT_BUFFER     FSCTL_GET_RETRIEVAL_POINTERS
VOLUME_BITMAP_BUFFER          FSCTL_GET_VOLUME_BITMAP