Planning a Reliable Configuration
Typically, a server failure is the most costly kind of failure for a business, whether the server is a file server, a print server, or an application server. You need to consider the effects of individual components when deciding how to configure company computers and when to run diagnostic tests.
Motherboards consist of electronics that can and do fail, although the motherboard and the CPU are generally reliable computer components. There is little that you can do to avoid a motherboard failure or CPU fault, except to run regular system checks to ensure that the components are functioning correctly. Some systems include built-in diagnostic tools that operate with Windows 2000.
With respect to error detection and correction, the three major types of RAM are parity RAM, error-correction coding (ECC) RAM, and nonparity RAM.
Parity RAM. Parity RAM stores an extra bit with each byte, which makes it possible to detect whether that byte has been corrupted. When parity RAM detects a parity difference, it signals the CPU through a nonmaskable interrupt (NMI). Depending on where and when detection happens, Windows 2000 determines whether this is an I/O board parity error, a memory bus error, or some other kind of parity error. Windows 2000 can also report I/O channel parity errors from cards in slots. In these cases, Windows 2000 generates an error message, and sometimes the computer stops.
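The parity check described above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming even parity and a one-byte cell; the function names `parity_bit`, `store`, and `check` are hypothetical, and real parity RAM performs the check in hardware.

```python
from typing import Tuple


def parity_bit(byte: int) -> int:
    """Return the even-parity bit for an 8-bit value (1 if the count of 1 bits is odd)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    return bin(byte).count("1") % 2


def store(byte: int) -> Tuple[int, int]:
    """Model one parity RAM cell: the data byte plus its extra parity bit."""
    return byte, parity_bit(byte)


def check(cell: Tuple[int, int]) -> bool:
    """Return True if the stored parity bit still matches the data byte."""
    byte, stored_parity = cell
    return parity_bit(byte) == stored_parity


cell = store(0b10110010)
one_bit_flipped = (cell[0] ^ 0b00000100, cell[1])
two_bits_flipped = (cell[0] ^ 0b00000110, cell[1])

print(check(cell))              # intact cell passes the check
print(check(one_bit_flipped))   # a single flipped bit is detected
print(check(two_bits_flipped))  # two flipped bits cancel out and go unnoticed
```

Note that a single parity bit can only detect an odd number of flipped bits and cannot say which bit is wrong, so it offers detection without correction.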
Error-correction coding RAM. High-end systems often use ECC RAM, which can detect a two-bit failure and correct a single-bit failure in system memory. Windows 2000 continues to run in spite of a single-bit failure. Depending on the hardware design, the corrective action might or might not be reported.
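To make "correct a single-bit failure, detect a two-bit failure" concrete, here is a minimal Python sketch of one well-known code with that property: a Hamming(7,4) code extended with an overall parity bit. It is an assumption chosen for illustration, not a description of how any particular ECC module works; real ECC memory protects much wider words in hardware. The names `encode` and `decode` are hypothetical.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit codeword (index 0 holds the overall parity bit)."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("expected a value in 0..15")
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d      # data bits at positions 3, 5, 6, 7
    word[1] = word[3] ^ word[5] ^ word[7]       # check bit covering positions 1, 3, 5, 7
    word[2] = word[3] ^ word[6] ^ word[7]       # check bit covering positions 2, 3, 6, 7
    word[4] = word[5] ^ word[6] ^ word[7]       # check bit covering positions 4, 5, 6, 7
    word[0] = sum(word[1:]) % 2                 # overall parity makes the total even
    return word


def decode(word: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos                     # XOR of the positions that hold a 1
    overall_ok = sum(word) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        # Odd overall parity: assume one flipped bit; the syndrome is its position
        # (0 means the overall parity bit itself was hit).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


codeword = encode(0b1011)
single = list(codeword)
single[5] ^= 1                                  # simulate a single-bit failure
double = list(single)
double[2] ^= 1                                  # simulate a second failure
print(decode(codeword), decode(single), decode(double))
```

In this sketch a single-bit failure is repaired silently and the original data is returned, which mirrors how the operating system can keep running, while a two-bit failure is reported because it cannot be repaired.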
Nonparity RAM. If you use nonparity RAM, Windows 2000 has no way to detect memory problems, and your computer might crash randomly. Nonparity RAM costs less than parity RAM, and parity RAM is not available for all computers. If your computers do not have parity RAM, ask your vendor whether the computer supports it and whether it can be installed.
Some vendors supply products that you can use to check the RAM in a computer.
Video cards drive the screen and render images for display. They rarely cause computer failures, but might cause the computer to behave erratically. More often, video cards cause screen painting problems, application page faults, and the like. Such problems are typically not critical enough to require you to shut down the computer. To minimize video problems, be sure that your computer is running the most recent release of a supported video driver.
You have many choices for your disk configuration, including fault-tolerant configurations. EIDE and SCSI technologies each offer different benefits for fault tolerance and recovery. The mean time between failures (MTBF) gives you a measure of expected disk and controller reliability.
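An MTBF figure is easier to reason about when converted into a yearly failure probability. The sketch below does that under a common simplifying assumption, a constant failure rate (exponential model), which vendors do not necessarily guarantee. The function name and the 500,000-hour MTBF are hypothetical values chosen for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation


def annual_failure_probability(mtbf_hours: float, units: int = 1) -> float:
    """Probability that at least one of `units` identical devices fails within a year.

    Assumes a constant failure rate of 1 / mtbf_hours per device and
    independent failures; both are simplifications.
    """
    if mtbf_hours <= 0 or units < 1:
        raise ValueError("mtbf_hours must be positive and units at least 1")
    return 1.0 - math.exp(-units * HOURS_PER_YEAR / mtbf_hours)


# Hypothetical disk with a 500,000-hour MTBF.
print(f"one disk:  {annual_failure_probability(500_000):.2%}")
print(f"ten disks: {annual_failure_probability(500_000, units=10):.2%}")
```

Under these assumptions a single disk has about a 1.7 percent chance of failing in a year, but a server with ten such disks has roughly a 16 percent chance of losing at least one, which is one reason fault-tolerant disk configurations matter.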
Be sure to run disk and controller diagnostics during every preventive maintenance check. Diagnostics are typically available from your hardware vendor. When you start the computer, Windows 2000 automatically checks any volume that is flagged as needing it (through Autochk, the startup version of Chkdsk), and you can run a surface scan of the disks by specifying chkdsk /r. Chkdsk is also available from the Recovery Console.
Asynchronous Transfer Mode (ATM) adapters and some other network adapters can have dual-channel connections. If one channel fails, the adapter automatically uses the other.
Ethernet and Token Ring network adapters do not have dual-channel capability. If the manufacturer provides a diagnostic program, run it on the network adapters during scheduled preventive maintenance or downtime.
You can evaluate network segments with network packet trace programs, called sniffers. Network Monitor can check for the following problems: