How to Tell When the Set Is Broken

Error detection and recovery are similar for mirrored sets and for parity-striped sets. The exact system response depends on when the problem occurs. A set is broken any time a partition in a mirrored or duplexed set cannot be written or any time a stripe cannot be written. When an I/O error is first detected, the system takes steps to keep the set from breaking. The primary step is to attempt to reassign the sector that failed. This is done by issuing a command to the disk to remap the sector. SCSI defines a standard command for this, which is why SCSI works better for fault-tolerant devices. Some ESDI devices also support the concept of remapping, but there is no standard for the command. Microsoft Windows NT attempts remapping only if the disk is attached to a SCSI controller. If the disk does not support sector remapping, or if the other attempts to maintain the set fail, a high-severity error is logged to the event log. The failed partition is called an orphan. It is important to note that a partition is orphaned only during writes, never during reads. This is because a read cannot affect the data on the disks, so performing orphan processing would be superfluous.
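
The write-error handling described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Windows NT driver logic. The names Member and handle_write_error are hypothetical, and the remap_ok flag simply stands in for the result of the sector-remap command.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One partition of a mirrored or parity-striped set (hypothetical model)."""
    name: str
    on_scsi: bool            # remapping is attempted only behind a SCSI controller
    orphaned: bool = False


def handle_write_error(member, remap_ok, event_log):
    """Decide what happens after a failed write to `member`.

    `remap_ok` is an assumption of this sketch: it represents whether the
    sector-remap command succeeded. Read errors never reach this function,
    because orphaning happens only on writes.
    """
    # First line of defense: remap the failed sector (SCSI disks only).
    if member.on_scsi and remap_ok:
        return "remapped"
    # Remapping was unavailable or failed: log the error and orphan the member.
    event_log.append(("high-severity", member.name + " orphaned"))
    member.orphaned = True
    return "orphaned"
```

In this model a disk that is not on a SCSI controller is orphaned on the first failed write, because no remap attempt is made for it.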

During system initialization, if the system cannot locate one of the partitions in a mirrored set, a severe error is logged in the event log and the remaining partition of the mirror is used. If the missing partition is part of a parity-striped set, a severe error is logged in the event log and the partition is marked as an orphan. The system continues to function using the fault-tolerant capabilities inherent in such sets. If none of the partitions within a set can be located, the drive is not activated, but the partitions are not marked as orphans. This avoids unnecessary recovery time after simple problems such as the SCSI chain being disconnected from the computer.
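
The initialization rules above can be summarized as a small decision function. This is a sketch under stated assumptions: the function name startup_action and the returned strings are hypothetical, and the sketch assumes the last rule applies when no member of the set can be found. The text does not say what happens when several, but not all, members of a parity-striped set are missing, so that case is reported as not covered.

```python
def startup_action(set_type, located):
    """Return the action taken for a fault-tolerant set at initialization.

    set_type: "mirror" or "parity" (hypothetical labels).
    located:  dict mapping partition name -> True if the partition was found.
    """
    if set_type not in ("mirror", "parity"):
        raise ValueError("unknown set type: " + repr(set_type))

    missing = [name for name, found in located.items() if not found]

    if not missing:
        return "activate"
    if len(missing) == len(located):
        # Nothing was found (for example, the SCSI chain is unplugged):
        # the drive is not activated and no partition is orphaned.
        return "do-not-activate"
    if len(missing) == 1:
        # A severe error is logged in both of these cases.
        if set_type == "mirror":
            return "use-remaining-partition"
        return "mark-orphan-and-continue"
    # Several (but not all) members missing: not described in the text.
    return "not-covered-by-text"
```

For example, startup_action("parity", {"d0": True, "d1": True, "d2": False}) yields "mark-orphan-and-continue".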

The system continues processing until a replacement disk or partition is available to recover from the problem and restore fault tolerance. A set with an orphan is not fault tolerant, and another failure in the set is likely to cause loss of data. Recovery should therefore begin as soon as the problem is discovered.

Recovery of an orphaned mirror partition is done in a number of steps. First, break the mirror-set relationship using the Break Mirror option within the Disk Administrator utility. This converts the remaining active partition of the set into a "normal" partition. This partition receives the drive letter of the set. The orphan partition receives the next available drive letter. You can then either create a new set relationship with existing free space on another disk in the local computer, or replace the orphan drive and re-establish the relationship with space from the replacement disk. Once the relationship has been established, the system can be restarted. During system initialization, the data from the original good partition is copied over to the new mirrored partition.

When a member of a parity-striped set is orphaned, it can be regenerated from the remaining data. This uses the same logic (discussed earlier) that is used for the dynamic regeneration of data from the parity and remaining stripes. Select a new free-space area that is as large as the other members in the set, and then choose the Regenerate command from the Fault Tolerance menu. When the system is restarted, the missing stripes are recalculated and written to the new space provided.
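
As an illustration of how a missing member can be recalculated, the following Python sketch assumes that the parity block of each stripe is the bytewise XOR of the data blocks in that stripe, as in conventional striping with parity. The function names xor_blocks and regenerate_member are hypothetical, and the sketch works on small in-memory byte strings rather than on real partitions.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def regenerate_member(surviving_blocks):
    """Rebuild the one missing block of a stripe from all surviving blocks.

    Because parity = d0 ^ d1 ^ ... ^ dn, XOR-ing every surviving block
    (data and parity alike) yields the block that is missing.
    """
    return xor_blocks(surviving_blocks)


# Usage: a stripe with two data blocks and one parity block.
d0 = b"\x01\x02\x03\x04"
d1 = b"\x10\x20\x30\x40"
parity = xor_blocks([d0, d1])

# Pretend the member holding d1 was orphaned and rebuild its block.
rebuilt = regenerate_member([d0, parity])
```

The same call rebuilds a lost parity block, since the parity is itself the XOR of the surviving data blocks.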