Disk Striping with Parity

Disk striping with parity is a method where multiple partitions are combined as a single logical drive (like disk striping, described earlier). As illustrated in Figure 5.9, the partitions are arranged in a way that ensures there is no single point of failure in the array.

Figure 5.9 Disk Striping with Parity

There must be at least three disks and no more than 32 disks in a striped set with parity. A partition of approximately the same size must be selected from each disk. The disks can be on the same or different controllers. SCSI disks are best since advanced recovery features such as bad block remapping can be used during the recovery process. Data is written in stripes across all partitions in the set. In addition to the data, a parity stripe is written interleaved with the data stripes. The parity stripe is simply a byte parity of the data stripes at a given stripe level or row.

For example, suppose you have five disks in the striped set. At level 0, stripe block 0 is on disk 0, block 1 on disk 1, block 2 on disk 2, and block 3 on disk 3, and the parity (eXclusive OR, XOR) of those four stripe blocks is on disk 4. The size of the stripes (also called the striping factor) is currently 64K. The parity stripe is the same size as the data stripes. On the next row, the parity stripe is on disk 0 and the data is on the rest of the disks. Because the parity stripes are not all on the same disk, there is no single point of failure for the set, and the load is evenly distributed.
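The five-disk example can be sketched in a few lines of Python. This is only an illustration, not the FTDISK.SYS implementation: the function names are hypothetical, the blocks are 4 bytes long instead of 64K, and the rotation rule in parity_disk is an assumption that merely reproduces the two rows described above (parity on disk 4 for row 0, on disk 0 for row 1).

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_disk(row, num_disks):
    """Disk assumed to hold the parity stripe for a row.

    Assumption: parity moves one disk to the right on each row and wraps
    around. The text only confirms rows 0 and 1 of the five-disk case.
    """
    return (row + num_disks - 1) % num_disks

# Four tiny data blocks standing in for the 64K stripes of row 0.
data = [
    bytes([1, 2, 3, 4]),
    bytes([5, 6, 7, 8]),
    bytes([9, 10, 11, 12]),
    bytes([13, 14, 15, 16]),
]
parity = xor_blocks(data)

# XOR of all data blocks plus the parity block is all zeros,
# which is the property that makes recovery possible.
check = xor_blocks(data + [parity])
```

Because XOR is its own inverse, combining every block in a row, parity included, always yields zeros; any one missing block is therefore the XOR of the others.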

When using any of the fault-tolerant disk schemes, Windows NT uses a device driver called FTDISK.SYS to receive commands and respond appropriately based on the type of fault tolerance that is being used. Thus, when the file system generates a request to read a section of a file, the normal disk system receives the request from the file system and passes it to the FTDISK.SYS driver. This driver then determines which stripe the data is in. From this and the number of disks in the set, it determines the disk and the location on that disk. The data is then read into memory. Striping can actually increase read performance since each disk in the set can have an outstanding read at the same time.
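The lookup the driver performs can be approximated as follows. This is a sketch under stated assumptions, not the driver's actual algorithm: locate is a hypothetical name, the parity rotation matches the five-disk example given earlier, and data blocks are assumed to fill the non-parity disks of a row in ascending disk order.

```python
STRIPE_SIZE = 64 * 1024  # striping factor given in the text

def locate(offset, num_disks):
    """Map a logical byte offset to (row, disk, offset within the stripe)."""
    data_disks_per_row = num_disks - 1           # one stripe per row is parity
    block = offset // STRIPE_SIZE                # logical stripe block number
    row, index = divmod(block, data_disks_per_row)
    # Assumed rotation: parity on the last disk for row 0, then disk 0, 1, ...
    parity = (row + num_disks - 1) % num_disks
    # Assumed ordering: data blocks skip over the parity disk.
    disk = index if index < parity else index + 1
    return row, disk, offset % STRIPE_SIZE
```

With five disks, locate(0, 5) lands on disk 0 of row 0, while the first block of row 1 lands on disk 1 because disk 0 holds that row's parity.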

Writing to a parity striped set is more involved. First, the original data from the stripe that is to be written must be read, along with the parity information for that stripe level. The difference between the old data and the new data is calculated and applied (with XOR) to the parity stripe. Finally, both the updated parity and the new data are written to disk. The two reads can be issued concurrently, as can the two writes, since by design the data stripe and the parity stripe are on different disks.
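The parity update described above reduces to two XOR operations. The sketch below uses hypothetical helper names and 2-byte stripes to show that updating the parity from the difference gives the same result as recomputing it from every data stripe in the row.

```python
def xor(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_parity, old_data, new_data):
    """Apply the difference between old and new data to the parity stripe."""
    difference = xor(old_data, new_data)
    return xor(old_parity, difference)

# Three small data stripes and their parity.
d0 = bytes([0x0F, 0xF0])
d1 = bytes([0x33, 0xCC])
d2 = bytes([0x55, 0xAA])
parity = xor(xor(d0, d1), d2)

# Overwrite d1: only d1 and the parity stripe need to be read and written.
new_d1 = bytes([0xFF, 0x00])
new_parity = updated_parity(parity, d1, new_d1)

# Recomputing from scratch would require reading every data stripe.
recomputed = xor(xor(d0, new_d1), d2)
```

The practical point is that a write touches two disks regardless of how many disks are in the set.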

Fault Tolerance with Parity Striping

There are two general cases of fault tolerance with parity striping.

The first case is when a data stripe is no longer readable. Though the data stripe is not readable, the system may still function. When the bad data stripe is to be read, all of the remaining good data stripes are read along with the parity stripe. Each data stripe is subtracted (with XOR) from the parity stripe; the order isn't important. The result is the missing data stripe. Writing is a little more complicated but works very much the same way. All the data stripes are read and backed out of the parity stripe, leaving the missing data stripe. The modifications needed to the parity stripe can now be calculated and made. Since the system knows the data stripe is bad, it is not written; only the parity stripe is written.
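Both operations can be illustrated with a short sketch. The names are hypothetical and the stripes are 2 bytes long for readability; the only technique used is the XOR "subtraction" described above.

```python
def xor_all(blocks):
    """Byte-wise XOR of a list of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

def reconstruct(surviving_data, parity):
    """Recover the unreadable data stripe from the rest of the row."""
    return xor_all(surviving_data + [parity])

def parity_after_degraded_write(surviving_data, new_data):
    """Parity to store when new_data targets the unreadable stripe.

    The data itself is not written; only this parity stripe is.
    """
    return xor_all(surviving_data + [new_data])

stripes = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
parity = xor_all(stripes)

# Pretend the disk holding stripes[1] has failed.
surviving = [stripes[0], stripes[2]]
recovered = reconstruct(surviving, parity)

# A write to the failed stripe is captured entirely in the parity stripe.
new_parity = parity_after_degraded_write(surviving, b"\xff\x00")
read_back = reconstruct(surviving, new_parity)
```

Here recovered equals the lost stripe, and read_back returns the newly written bytes even though they were never stored on a data disk.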

The other general case is when a parity stripe is lost. During data reads this does not present a problem. The parity stripe is not used during normal reads. Writes become much less complicated as well. Since there is no way to maintain the parity stripe, the writes behave as a data stripe write without parity. The parity stripe can be recalculated during regeneration.

Identifying When a Set Is Broken

The process of error detection and recovery is very similar for both mirrored sets and parity striped sets. The exact system response to the problem depends on when the problem occurred.

A set is considered broken whenever one of the partitions in a mirrored or duplexed set cannot be written, or whenever a stripe in a parity striped set can no longer be written.

When an I/O error is first detected, the system performs some routines in an attempt to keep the set from breaking. The system's first priority is to try reassigning the sector that failed. This is done by issuing a command to the disk to remap the sector.

Windows NT attempts remapping only if the disk is supported by a small computer system interface (SCSI) controller. SCSI devices are designed to support the concept of remapping. This is why SCSI devices work well as fault-tolerant devices. (Note that some fixed hard disk devices also support the concept of remapping, but there is no standard for this support.)

If the disk does not support sector remapping, or if the other attempts to maintain the set fail, a high severity error is logged to the event log.

The partition that has failed is called an orphan. It is important to note that the process of orphaning a partition does not occur during a read, only during writes. This is because the read cannot possibly affect the data on the disks, so performing orphan processing would be superfluous.

During system initialization, if the system cannot locate each partition in a mirrored set, a severe error is recorded in the event log, and the remaining partition of the mirror is used. If the partition is part of a parity striped set, a severe error is recorded in the event log, and the partition is marked as an orphan. The system then continues to function using the fault-tolerant capabilities inherent in such sets.

If all of the partitions within a set cannot be located, the drive is not activated, but the partitions are not marked as orphans. This saves recovery time for simple problems like disconnecting the SCSI chain from the computer.
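The initialization rules for a parity striped set can be summarized as a small decision function. This is a reading aid only: startup_action and its return strings are hypothetical, and the case where more than one but not all members are missing is not described in the text, so the sketch flags it rather than guessing.

```python
def startup_action(total_members, located_members):
    """Summarize what the text says happens to a parity striped set at startup."""
    missing = total_members - located_members
    if missing == 0:
        return "activate normally"
    if located_members == 0:
        # For example, the SCSI chain was disconnected from the computer.
        return "do not activate; mark no orphans"
    if missing == 1:
        return "log severe error; mark orphan; continue with fault tolerance"
    # Assumption boundary: the text does not cover this situation.
    return "not described in the text"
```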

Recovering Orphans

When a partition is marked as an orphan, the system continues processing until a replacement disk or partition is available to recover from the problem and ensure fault tolerance again. A set with an orphan is not fault tolerant. Another failure in the set can, and most likely will, cause the loss of data.

Recovery procedures should be performed as soon as the problem is discovered.

To recover a mirrored set

1. Break the mirror-set relationship using the Break Mirror option in the Disk Administrator utility.
   This converts the remaining active partition of the set into a "normal" partition. This partition receives the drive letter of the set. The orphan partition receives the next available drive letter.
2. Create a new set relationship with existing free space on another disk in the local computer, or replace the orphan drive and reestablish the relationship with space from this disk.
3. Once the relationship is established, restart the computer.
   During system initialization, the data from the original good partition is copied over to the new mirrored partition.

When a member of a parity striped set is orphaned, it can be regenerated from the remaining data. This uses the same logic discussed earlier for the dynamic regeneration of data from the parity and remaining stripes. Select a new free space area that is as large as the other members in the set. Then choose the Regenerate command from the Fault Tolerance menu. When the system is restarted, the missing stripes are recalculated and written to the new space provided.

For more information about using Windows NT fault-tolerance features, see the Windows NT Server Concepts and Planning Guide.