Recovering a Mirror Set or Stripe Set With Parity

The process of error detection and recovery for software fault-tolerant volumes is very similar for mirror sets and stripe sets with parity. How Windows NT responds depends on when the failure occurs: during normal operation or during system initialization. For recovery of a hardware fault-tolerant volume, see the documentation for your controller.

When a disk that is part of a mirror set or a stripe set with parity fails during normal operation, it becomes an orphan. When FtDisk (the fault-tolerant driver) determines that a disk has been orphaned, it directs all reads and writes to the other disk(s) in the set.

Note that a partition is orphaned only when a write fails, never during a read. Because a read does not change the data on the disks, the members of the set remain consistent, and orphan processing is not necessary.

The following error message is displayed:

The operating system should continue to work normally. Users accessing resources over the network should not be affected.

You should back up important data immediately, since the volume is no longer fault tolerant. Use a new tape for backup, not an existing tape. You should replace the failed disk and begin the recovery of the mirror set or stripe set with parity as soon as possible.
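To make the difference between failed reads and failed writes concrete, here is a toy Python model of a two-member mirror set. It is a sketch under stated assumptions, not FtDisk's implementation: the `MirrorSet` class, its `failing` flags (which simulate a disk that returns I/O errors), and the fallback order are all hypothetical. A failed write orphans the member, because that member no longer holds current data; a failed read only falls back to the other member and leaves the orphan state unchanged.

```python
from __future__ import annotations


class MirrorSet:
    """Toy model of a two-member mirror set (hypothetical, not FtDisk code)."""

    def __init__(self, size: int) -> None:
        self.members = [bytearray(size), bytearray(size)]
        self.orphaned = [False, False]
        # Assumption: a True flag simulates a disk that returns I/O errors.
        self.failing = [False, False]

    def _healthy(self) -> list[int]:
        """Indexes of members that have not been orphaned."""
        return [i for i in range(len(self.members)) if not self.orphaned[i]]

    def write(self, offset: int, data: bytes) -> None:
        """Write to every healthy member; a failed write orphans that member."""
        for i in self._healthy():
            if self.failing[i]:
                # The member now holds stale data, so stop using it.
                self.orphaned[i] = True
                continue
            self.members[i][offset:offset + len(data)] = data
        if not self._healthy():
            raise OSError("no healthy members remain")

    def read(self, offset: int, length: int) -> bytes:
        """Read from the first healthy member that responds.

        A failed read falls back to the other member but never orphans
        anything: a read does not change the data, so the members stay
        consistent.
        """
        for i in self._healthy():
            if self.failing[i]:
                continue
            return bytes(self.members[i][offset:offset + length])
        raise OSError("no readable member")


if __name__ == "__main__":
    mirror = MirrorSet(8)
    mirror.write(0, b"abc")
    mirror.failing[0] = True      # disk 0 starts failing
    print(mirror.read(0, 3), mirror.orphaned)  # b'abc' [False, False]
    mirror.write(0, b"xyz")       # the failed write orphans disk 0
    print(mirror.read(0, 3), mirror.orphaned)  # b'xyz' [True, False]
```

After the orphaning write, all further reads and writes go to the remaining member only, which is why the volume is no longer fault tolerant until the failed disk is replaced.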

During system initialization, if the system cannot locate a partition in a mirror set or a stripe set with parity, it logs a severe error in the event log, marks the partition as an orphan, and uses the remaining partition(s) of the mirror set or stripe set with parity. The system continues to function by using the fault-tolerant capabilities inherent in such volumes.

In Disk Administrator, if you select a mirror set or stripe set with parity that has had a failure, the status bar displays the message <Volume> #N [RECOVERABLE], where <Volume> is either mirror set or stripe set with parity.

Recovering a Mirror Set

The method you use to recover a mirror set depends on which partition failed and whether it contains the system or boot partition. When the disk or controller for the original system or boot partition fails, you will probably have to use the Windows NT startup floppy disk to start from the shadow partition, or reconfigure the shadow disk as the original disk. For information about when you need to use the Windows NT startup floppy disk, see "Creating a Windows NT startup floppy disk," presented earlier in this chapter.

If the failure does not disrupt service, you can continue running in a non-fault-tolerant configuration and schedule a time to reconstruct the mirror set, such as a regularly scheduled maintenance period or a less busy time.

You must first break the mirror-set relationship to expose the remaining secondary partition as a separate volume. This step prevents problems when restarting the system.

To break the mirror set

1. Open Disk Administrator and select the mirror set you want to break.

2. On the Fault Tolerance menu, click Break Mirror.

3. In the confirmation message, click Yes.

The remaining, working member of the mirror set receives the drive letter that was previously assigned to the complete mirror set. The orphaned partition receives the next available drive letter, or whatever letter you want to assign.

You can now shut down the system and replace the failed disk. You can replace the failed disk with any disk of the same size or larger; ideally, use a disk that is as similar as possible to the remaining disk. If the failed disk contained the system partition, see the section titled "Configuring the System Partition on a Mirror Set," presented earlier in this chapter.

Note

When you move or replace a disk that was at the end of a SCSI bus, be sure that you terminate only the disk that is now at the end of the bus.

To reconstruct the mirror set

1. Perform a low-level format of the new disk on the same controller that the disk will be attached to.

This step avoids translation problems that can arise when a disk formatted on one controller is used on another.

If the failed disk was the shadow disk, use the same SCSI ID as the failed disk.

If the failed disk was the original disk, you might want to swap SCSI IDs, so that the remaining disk becomes SCSI ID 0.

2. Restart the system.

3. Once you have restarted the computer, follow the procedure in the earlier section titled "Creating a Mirror Set or Stripe Set With Parity" to reconstruct the mirror.

This step requires a second restart of the system to reconstruct the mirror.

4. After the mirror initialization is complete, update your system information, if necessary, as described in the section "Maintaining Configuration and Essential System Information," earlier in this chapter.

Recovering a Stripe Set With Parity

When a member of a stripe set with parity is orphaned, you can reconstruct the data for the orphaned member from the remaining members. Use the following procedure to start the recovery of the stripe set with parity. When you restart the computer, the FtDisk driver reads the data and parity information from the stripes on the other member disks, reconstructs the data of the missing member, and writes it to the new member.
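The reconstruction works because of parity. The following Python sketch assumes XOR parity, as used in RAID 5, and equal-sized blocks; the function names `xor_blocks`, `make_stripe`, `reconstruct`, and `regenerate_member` are hypothetical and do not correspond to FtDisk code. Because the data blocks and the parity block of a stripe XOR to zero, any single missing block equals the XOR of the surviving blocks, whether the lost block held data or parity.

```python
from __future__ import annotations


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("blocks must be the same size")
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Return one stripe: the data blocks followed by their parity block.

    Simplification: in RAID 5 the parity block is rotated across the
    members from stripe to stripe; here it is always stored last.
    """
    return data_blocks + [xor_blocks(data_blocks)]


def reconstruct(stripe: list[bytes], missing: int) -> bytes:
    """Rebuild the orphaned member's block from the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)


def regenerate_member(stripes: list[list[bytes]], missing: int) -> list[bytes]:
    """Rebuild every block of one member, stripe by stripe."""
    return [reconstruct(stripe, missing) for stripe in stripes]


if __name__ == "__main__":
    stripe = make_stripe([b"DATA", b"MORE", b"TEXT"])
    # Pretend member 1 was orphaned and rebuild its block.
    print(reconstruct(stripe, 1))  # b'MORE'
```

Regenerating a replacement member means repeating this calculation for every stripe, so every remaining member must be read in full; on a large volume this is one reason the process can take some time.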

If your computer running Windows NT Server has failed and you need to have the data available sooner than the expected repair time, you can move the disks containing the stripe set with parity to another computer and build the Registry key HKEY_LOCAL_MACHINE\SYSTEM\DISK by using the FtEdit program. The procedure is described in Chapter 7, "Disk, File System, and Backup Utilities."

To reconstruct a stripe set with parity
1. Open Disk Administrator, and select the recoverable stripe set with parity.

2. Select an area of unpartitioned space of the same size or larger on the replacement disk.

If the failure was caused by a power failure or a cabling failure on a single device, you can instead regenerate the data within the orphaned member of the original stripe set with parity once the hardware is restored.

3. On the Fault Tolerance menu, click Regenerate.

4. Quit Disk Administrator, and restart your computer.

The reconstruction process occurs in the background. If you open Disk Administrator, the message in the status bar is Stripe set with parity #n [INITIALIZING].

You might receive the following error message when attempting to reconstruct a stripe set with parity:


The drive cannot be locked for exclusive use...

This error occurs if Disk Administrator does not have exclusive access to the stripe set with parity, which happens when the page file or a service such as Microsoft SQL Server or Microsoft Systems Management Server is using the disk. To regenerate the stripe set with parity, you must temporarily shut down these services and relocate the page file.

Note

Do not put your page file on a stripe set with parity, because every write to such a volume also requires a parity update, which degrades performance. If you want your page file on a fault-tolerant volume, use a mirror set instead.