Disaster Recovery Planning

Disaster recovery is the process by which information systems are recovered in the event of a catastrophe: a natural disaster such as a fire, or technical disaster such as a two-disk failure in a RAID-5 array. Disaster recovery planning is the work devoted to preparing all the actions that will occur in response to a catastrophic event. Disaster recovery assessment is the simulation of a catastrophic event and/or the evaluation of the disaster recovery plan's capability to deliver the specified recovery needs. Disaster recovery planning should be considered in light of your own environment, modified accordingly, specified, and verified.

For example, suppose a fire occurs and wipes out your 24-hour data center. Are you certain you can recover? How long will it take you to recover and have your system available? How much data loss can your users tolerate? These should be some of the key concerns of every system administrator and database administrator charged with maintaining invaluable system data.

Ideally, the disaster recovery plan should state how long recovery should take, and the final database state the users can expect. For example, "After the acquisition of specified hardware, recovery should be completed in 48 hours and data will be guaranteed only up to the end of the previous week." It is typically important that management be kept clearly informed of these specifications. Disaster recovery assessment should be able to substantiate the specification.

A disaster recovery plan can be structured many different ways and can contain many types of information (how to get hardware, who is to communicate what, who are the people to be contacted in the event of a disaster, how are they to be contacted, who owns the administration of the plan, and so on).

Preparing for a Disaster

To prepare for disaster, it is recommended that you perform these steps periodically:

  1. Back up all system and user databases.
  2. Maintain system logs in a secure fashion. Keep records of all service packs installed for both Microsoft® Windows NT® Server and Microsoft SQL Server™. Keep records of network libraries used, the security mode, and the sa password.
  3. Maintain a base functionality script for assessing minimal capability quickly.

    Note A base functionality script is a batch of code that can be used to demonstrate the successful functioning of the database from a specific application's perspective. Most commonly this is an .sql file with batched SQL statements run into the server from osql. For other applications, a .bat file is more appropriate because it can contain bcp and osql commands. This base functionality script is very application-specific, and can take many different forms. For example, on a decision support/reporting system, the script may merely be a copy of a couple of your key reporting queries; for an online transaction processing (OLTP) application it may be the execution of a batch of stored procedures to execute INSERT, UPDATE, and DELETE statements. The goal is to confirm that everything is working as intended. The base functionality script provides a dependable tool for the system administrator or database administrator to be able to see that the database is back in a viable state, without depending on end users for verification.


  4. To minimize the amount of data lost, perform regular database and transaction log backups. Performing a backup operation has minimal effect on running transactions, so backup operations can be run during normal operations.
  5. Assess the following steps in Recovering from a Disaster ahead of time on another server, and amend the steps as necessary to suit your environment.
Recovering from a Disaster

To recover from a disaster, perform these steps after acquiring suitable replacement hardware:

  1. Install Windows NT Server and load the appropriate service pack. Verify that appropriate domain functionality exists.

    For example, verify that file sharing works properly.

  2. Install SQL Server and load the appropriate service pack. Restore the master and msdb database backups. It will be necessary to shut down the server after restoring the master database. Reconfigure the server for the appropriate network libraries and security mode.
  3. Confirm that SQL Server is running properly by checking the SQL Server Service Manager and the Windows NT application log. If the Windows NT Server name was changed, use sp_dropserver and sp_addserver to match the Windows NT Server name with the SQL Server computer name.
  4. Verify the availability of the system. Run a functionality script to ensure correct operation.

    Optionally, run DBCC CHECKDB on each database.

  5. Allow users to resume normal usage.
See Also
Planning to Back Up and Restore Databases sqlservr Application
Managing Permissions  

  


(c) 1988-98 Microsoft Corporation. All Rights Reserved.