Chapter 7: Maintenance

Most maintenance operations in a cluster can be performed while at least one node remains online, usually without taking the entire cluster offline. This keeps cluster resources available while maintenance is under way.

Installing Service Packs

Microsoft Windows NT service packs can normally be installed on one node at a time and tested before you move resources to that node. This is one advantage of having a cluster: if something goes wrong while updating one node, the other node is untouched and continues to make resources available. Some service packs may be exceptions and cannot be applied to a single node at a time, so consult the release notes for the service pack for any special instructions on installing it on a cluster.
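The one-node-at-a-time approach described above can be sketched as an ordered checklist. The following Python sketch is illustrative only: the function name and node names are hypothetical, it does not call any cluster tool, and it assumes the service pack's release notes permit installation on a single node at a time.

```python
def rolling_update_plan(nodes, update="service pack"):
    """Return an ordered list of steps for updating one node at a time.

    Assumption: every node can temporarily host all groups while its
    partner is being updated.
    """
    if len(nodes) < 2:
        raise ValueError("a rolling update needs at least two nodes")
    steps = []
    for i, node in enumerate(nodes):
        # The partner keeps resources online while this node is updated.
        partner = nodes[(i + 1) % len(nodes)]
        steps.append(f"Move all groups from {node} to {partner}")
        steps.append(f"Install the {update} on {node}")
        steps.append(f"Test {node} before moving resources to it")
    steps.append("Move groups back to their preferred owners")
    return steps


# Hypothetical two-node cluster.
for step in rolling_update_plan(["NODE-A", "NODE-B"]):
    print(step)
```

Generating the checklist before starting makes it easier to confirm that one node is always left untouched and serving resources at each step.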

Service Packs and Interoperability Issues

To avoid potential issues or compatibility problems with other applications, check the Microsoft Knowledge Base for articles that may apply. For example, the articles in Table 5 discuss installation steps or interoperability issues with Windows NT Option Pack (NTOP), Microsoft SQL Server, and Windows NT Service Pack 4:

Table 5. Knowledge Base Articles

Reference Number    Article
Q218922             Installing NTOP on Cluster Server with SP4
Q223258             How to Install NTOP on MSCS 1.0 with SQL
Q223259             How to Install FTP from NTOP on Microsoft Cluster Server 1.0
Q191138             How to Install Windows NT Option Pack on Cluster Server

Replacing Adapters

Adapter replacement can usually be performed after moving resources and groups to the other node. If you replace a network adapter, ensure that the TCP/IP configuration of the new adapter exactly matches that of the old adapter. If you replace a SCSI adapter and use Y cables with external termination, it may be possible to disconnect the SCSI adapter without affecting the remaining cluster node. In some configurations, replacement may be possible without shutting down the entire cluster; check with your hardware vendor for the proper technique before you attempt it.
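One way to confirm that the new network adapter matches the old one is to record the TCP/IP settings before the replacement and compare them afterward. The sketch below is a minimal illustration; the setting names (ip_address, subnet_mask, default_gateway) and the sample addresses are hypothetical, and the values are assumed to have been written down by hand rather than read from the system.

```python
def tcpip_differences(old, new,
                      keys=("ip_address", "subnet_mask", "default_gateway")):
    """Return {setting: (old_value, new_value)} for each setting that differs."""
    return {k: (old.get(k), new.get(k))
            for k in keys
            if old.get(k) != new.get(k)}


# Hypothetical settings recorded before and after the replacement.
old_adapter = {"ip_address": "10.0.0.11",
               "subnet_mask": "255.255.255.0",
               "default_gateway": "10.0.0.1"}
new_adapter = {"ip_address": "10.0.0.11",
               "subnet_mask": "255.255.0.0",
               "default_gateway": "10.0.0.1"}

print(tcpip_differences(old_adapter, new_adapter))
```

An empty result means the recorded settings match; any entry in the result identifies a setting to correct before moving resources back to the node.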

Shared Disk Subsystem Replacement

With most clusters, replacing the shared disk subsystem may require shutting down the cluster. Check with your manufacturer and with Microsoft Product Support Services for proper procedures. Some replacements may require little intervention, while others may require configuration adjustments. Further information on this topic is available in the Microsoft Cluster Server Administrator's Guide and in the Microsoft Knowledge Base.

Emergency Repair Disk

The emergency repair disk (updated with Rdisk.exe) contains vital information about a particular system. You can use it to help recover a system that will not start, so that you can then restore a backup if necessary. Update the disk whenever the system configuration changes.

Note that the cluster configuration is not stored on the emergency repair disk. The service and driver information for the Cluster Service is stored in the system registry. However, cluster resource and group configuration is stored in a separate registry hive, which may be restored from a recent system backup. NTBackup will back up this hive when backing up registry files (if selected). Other backup software may or may not include the cluster hive. The file associated with the cluster hive is CLUSDB, and it is stored with the other cluster files (usually in c:\winnt\cluster). Be sure to check system backups to ensure this hive is included.
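Checking a backup for the cluster hive can be as simple as searching the list of backed-up files for CLUSDB. The sketch below assumes that your backup software can export such a list as plain file paths; that export, the function name, and the sample paths are hypothetical and not features of any particular product.

```python
import ntpath  # Windows path rules, usable on any platform

HIVE_NAME = "CLUSDB"  # cluster hive file name given in this chapter


def backup_includes_cluster_hive(backed_up_paths, hive_name=HIVE_NAME):
    """Return True if any backed-up path names the cluster hive file.

    The comparison ignores case because Windows file names are
    case-insensitive.
    """
    target = hive_name.lower()
    return any(ntpath.basename(p).lower() == target
               for p in backed_up_paths)


# Hypothetical file list exported from a backup catalog.
catalog = [
    r"c:\winnt\system32\config\system",
    r"c:\winnt\cluster\clusdb",
]
print(backup_includes_cluster_hive(catalog))
```

A check like this only confirms that the file was backed up. It does not replace a test restore, which is the only way to verify that the hive can actually be recovered.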

System Backups and Recovery

The configuration for cluster resources and groups is stored in the cluster registry hive. This registry hive may be backed up and restored with NTBackup. Some third-party backup software may not include this registry hive when backing up system registry files. It is important, if you rely on a third-party backup solution, that you verify your ability to back up and restore this hive. The registry file for the cluster hive may be found in the directory where the cluster software was installed—not on the quorum disk.

As most backup software (at the time of this writing) is not cluster-aware, it may be important to establish a network path to shared data for use in system backups. For example, if you use a local path to the data (example: G:\), and if the node loses ownership of the drive, the backup operation may fail because it cannot reach the data using the local device path. However, if you create a cluster-available share to the disk structure, and map a drive letter to it, the connection may be re-established if ownership of the actual disk changes. Although the ultimate solution would be a fully cluster-aware backup utility, this technique may be a better alternative until such a utility is available.
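To illustrate the difference between the two kinds of path, the following sketch rewrites a local path on the shared drive as the equivalent network path through a cluster-available share. The network name VSERVER1 and the share name GData are hypothetical, and the sketch assumes the share is published at the root of the shared drive.

```python
import ntpath  # Windows path rules, usable on any platform


def to_cluster_share_path(local_path, network_name, share, shared_drive):
    """Rewrite a local path on the shared drive as a UNC path.

    Assumption: \\\\network_name\\share points at the root of shared_drive.
    """
    drive, rest = ntpath.splitdrive(local_path)
    if drive.lower() != shared_drive.lower():
        raise ValueError(f"{local_path} is not on {shared_drive}")
    return "\\\\" + network_name + "\\" + share + rest


# Hypothetical network name and share for the disk mounted as G:.
print(to_cluster_share_path(r"G:\Data\report.doc", "VSERVER1", "GData", "G:"))
```

A backup job pointed at the network path, or at a drive letter mapped to it, follows the share to whichever node currently owns the disk, whereas a job pointed at G:\ depends on the local node keeping ownership.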

What Not to Do on a Cluster Server

Below is a list of things not to do with a cluster. Other actions may also cause problems, but the items listed here are ones to avoid in every case. Article numbers for related Microsoft Knowledge Base articles are noted where applicable.