Avoiding Single Points of Failure

Minimize the number of components whose failure will result in the failure of the computer. The "Contingency Planning" section, presented earlier in this chapter, discusses these components and how to reduce the likelihood that they fail.

Computers running Windows NT Server have fault-tolerance features built into the operating system. Fault tolerance is the ability of a system to continue functioning when a component of the computer fails. The expression fault tolerance is normally used to describe disk subsystems, but it can also apply to other parts of the system or to the entire system. Fully fault-tolerant computers use redundant disk controllers and uninterruptible power supplies (UPSs) as well as fault-tolerant disk subsystems.

Although the data are always available and current in a fault-tolerant disk configuration, you still need to make backups to protect the information on your disk subsystem from:

Disk fault tolerance is not an alternative to a backup strategy with offsite storage. For more information about disk fault tolerance, see "Planning a Fault-tolerant Disk Configuration," presented later in this chapter.

Consider having replacement disks and controllers available on site. For instance, SCSI controllers cost as little as a few hundred dollars, while the cost of 50 to 500 users who cannot work while waiting for a replacement could be many thousands of dollars.
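The comparison above can be worked through as simple arithmetic. The following Python sketch is illustrative only: the hourly cost per user, the length of the outage, and the spare controller price are assumed figures chosen for the example, and the function names are hypothetical.

```python
# Assumed figures for illustration only; none of them come from measured data.
SPARE_CONTROLLER_COST = 300.0   # "a few hundred dollars" for a SCSI controller
HOURLY_COST_PER_USER = 25.0     # assumed cost of one idle user for one hour
HOURS_DOWN = 8.0                # assumed wait for a replacement part


def downtime_cost(users, hourly_cost_per_user, hours_down):
    """Estimate the cost of lost work while users wait for a replacement."""
    return users * hourly_cost_per_user * hours_down


def spare_pays_off(users,
                   spare_cost=SPARE_CONTROLLER_COST,
                   hourly_cost_per_user=HOURLY_COST_PER_USER,
                   hours_down=HOURS_DOWN):
    """Return True if a single avoided outage costs more than the spare part."""
    return downtime_cost(users, hourly_cost_per_user, hours_down) > spare_cost


if __name__ == "__main__":
    for users in (50, 500):
        cost = downtime_cost(users, HOURLY_COST_PER_USER, HOURS_DOWN)
        print(f"{users} users idle for {HOURS_DOWN:g} hours: ${cost:,.0f} "
              f"versus a ${SPARE_CONTROLLER_COST:,.0f} spare controller")
```

Under these assumptions, even the smallest site in the range loses far more in a single outage than the spare costs. Substitute your own figures to decide which parts are worth keeping on site.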

Consider providing UPS protection for individual computers and for the network itself, including hubs, bridges, and routers. Windows NT provides UPS support on individual computers. These UPSs typically provide power for 5 to 20 minutes, long enough for Windows NT to perform an orderly shutdown when power fails. If you have a history of frequent or prolonged power outages, investigate sources of power for your critical computers other than your local power company. Remember that individual UPSs, even one for every computer on the network, will not necessarily prevent data loss or corruption caused by power fluctuations. The network is itself an electrical system, and intermediary devices such as routers, bridges, and hubs require the same UPS protection to prevent loss of network functionality.
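When sizing a UPS, it can help to check that its runtime covers both the time you wait for power to return and the time the shutdown takes. The following Python sketch shows one way to express that check. The function name, the safety margin, and the sample timings are assumptions made for illustration; they are not Windows NT settings or vendor specifications.

```python
def ups_covers_shutdown(runtime_min, shutdown_min,
                        wait_before_shutdown_min=0.0, safety_margin=0.2):
    """Return True if the UPS can carry the computer through an orderly shutdown.

    runtime_min              -- rated battery runtime at the expected load (minutes)
    shutdown_min             -- time the computer needs to shut down (minutes)
    wait_before_shutdown_min -- time to wait for power to return before shutting down
    safety_margin            -- assumed fraction of rated runtime held in reserve,
                                for example to allow for battery aging
    """
    usable_min = runtime_min * (1.0 - safety_margin)
    return wait_before_shutdown_min + shutdown_min <= usable_min


if __name__ == "__main__":
    # Sample timings are hypothetical.
    for runtime, wait in ((5, 0), (5, 2), (20, 10)):
        ok = ups_covers_shutdown(runtime, shutdown_min=3,
                                 wait_before_shutdown_min=wait)
        print(f"runtime {runtime} min, wait {wait} min, shutdown 3 min: "
              f"{'sufficient' if ok else 'insufficient'}")
```

The same check applies to hubs, bridges, and routers only in the sense that they must stay powered for as long as the computers need the network. They have no shutdown step of their own to budget for.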

For more information about UPSs, see Chapter 7, "Protecting Data," in the Windows NT Server Concepts and Planning book, and Chapter 5, "Preparing For and Performing Recovery," in this book.