INF: Disaster Recovery Planning for SQL Server

ID: Q169039

The information in this article applies to:

Microsoft SQL Server versions 4.2x, 6.0, 6.5

SUMMARY

This article provides two examples of simple disaster recovery plans that a site may consider while proactively planning for data recovery from a catastrophic disaster. The first example is targeted for sites having available system maintenance windows; the second example is designed for sites operating on a 24-hour basis.

The intention of this article is to provide a starting point for disaster recovery planning efforts. This article is not your Disaster Recovery Plan. It is for you to consider in light of your own environment, modify accordingly, specify, and verify.

MORE INFORMATION

Suppose a fire occurs and wipes out your 24-hour data center. Are you certain you can recover? How long will it take you to recover and have your system available? How much data loss can your users tolerate? These should be some of the key concerns of every system administrator (SA) and database administrator (DBA) charged with maintaining invaluable system data. Disaster recovery is the process by which information systems are recovered in the event of a catastrophe: a natural or manmade disaster such as a fire, or technical disaster such as a two-disk failure in a RAID-5 array. Disaster Recovery Planning is the work devoted to preparing all the actions that will occur in response to a catastrophic event. Disaster Recovery Assessment is the simulation of a catastrophic event and/or the evaluation of the Disaster Recovery Plan's capability to deliver the specified recovery needs.

Ideally, the Disaster Recovery Plan should state how long recovery should take, and the final database state the users can expect. For example, "After the acquisition of specified hardware, recovery should be completed in 48 hours and data will be guaranteed only up to the end of the previous week." It is typically important that management be kept clearly informed of these specifications. Disaster Recovery Assessment should be able to substantiate the specification.

A Disaster Recovery Plan can be structured many different ways, and it can contain many types of information (how to get hardware, who is to communicate what, who are the people to be contacted in the event of a disaster, how are they to be contacted, who owns the administration of the plan, and so on). This article is devoted only to proposing some initial avenues for the technical recovery of SQL Server.

The following is an example for sites that do not operate on a 24-hour basis (that is, sites that have available maintenance windows):

To prepare for disaster, do the following every day (or whenever the maintenance window is):

Shut down SQL Server.

Copy all database device files, preferably to another computer in another building (but beware of network load), and also to a tape device (with the server down, the device files can be copied just like any other files).

Maintain system logs in a secure fashion. Record the directory where all SQL Server files are located, especially the Master.dat file. Keep records of all service packs installed for both Windows NT Server and SQL Server. Keep records of Net-libraries used, the security mode, and the SA password.

Maintain a base functionality script for quickly assessing minimal capability (see the note at the end of this article).

To minimize the amount of data lost during the day, perform database and transaction log dumps while the system is live. See the SQL Server Books Online for more information on dump, load, and recovery procedures.

Assess the following disaster recovery steps ahead of time on another server, and amend the steps as necessary.

To recover after a disaster has occurred, do the following after acquiring suitable replacement hardware:

Install Windows NT Server and load the appropriate service pack. Verify that appropriate domain functionality exists. For example, verify that file sharing works properly.

Install SQL Server and load the appropriate service pack. Place the master database device in same directory as it was initially installed. Also select the same Net-library, security mode, and SA password as before.

Confirm that SQL Server is running properly. If the Windows NT Server name was changed, use sp_dropserver and sp_addserver to match the Windows NT Server name.

Stop SQL Server.

Move all of the database device files back to their original locations, including the Master.dat file.

Restart the SQL Server.

If any database or transaction logs are available after this time, load them.

Verify the availability of the system. Run a functionality script to ensure adequate operation. Ideally, before the users are released onto the system, time should be provided to run DBCC CHECKDB and NEWALLOC on each database, and DBCC TEXTALL and TEXTALLOC on those databases and tables containing TEXT fields. This is to ensure that the migration process did not alter the files in an undesirable fashion.

After running the DBCC statements shows the database to be consistent and the functionality test script succeeds, allow users to resume.

The following is an example for sites that have no online maintenance windows and that run seven days a week, 24 hours a day:

To prepare for a disaster, do the following:

Periodically dump all databases, preferably to a disk on another computer in another building (but beware of network load), and also to a tape device. Transaction logs can be handled similarly.

Maintain system logs in a secure fashion. Record the directory where all SQL Server files are located, especially the Master.dat file. Keep records of all service packs installed for both Windows NT Server and SQL Server. Keep records of Net-libraries used, the security mode, and the SA password. Keep records of the specified database options.

Record in scripts ALL size changes for ALL devices and databases. This is crucial to simplify recovery in this situation!

Maintain a base functionality script for quickly assessing minimal capability (see the note at the bottom of this article).

Assess the following disaster recovery steps ahead of time on another server, and amend the steps as necessary.

To recover after a disaster has occurred after acquiring suitable hardware:

Install Windows NT Server and load the appropriate service pack. Verify that appropriate domain functionality exists. For example, verify that file sharing works properly.

Install SQL Server and load the appropriate service pack. Make sure to place the master database device into same directory as before. Also select the same Net-library, security mode, and SA password as before.

Confirm that SQL Server is running properly. If the Windows NT Server name was changed, run sp_dropserver and sp_addserver to match the Windows NT Server name.

Create or alter all devices and databases from the scripts made in step 3 of the previous section above. Databases can be created for LOAD.

After all device files and databases are sized as they were at the time of the last dump, if either the user logon information or the remote server logon information is significant from the dumped master database, proceed with step 5a. Otherwise, if they are not crucial, proceed with step 6.
1. Stop the SQL Server.
2. Start the SQL Server in single user mode from the command line "SQLSERVR -c -m".
3. Load the master database from the last dump of it before the catastrophe occurred.
4. After success, stop and restart SQL Server normally. Continue with step 6.

Load each of the user databases from the dumped files (and the transaction log dumps too, if appropriate).

Stop and restart the SQL Server.

Verify the availability of the system. If the master database was not reloaded in step 5c, set the database options for each database. Run a functionality script to ensure the adequate operation of SQL Server. Ideally, before the users are released onto the system, time should be provided to run DBCC CHECKDB and NEWALLOC on each database, and DBCC TEXTALL and TEXTALLOC on those databases and tables containing TEXT fields. This is to ensure that the migration process did not alter the files in an undesirable fashion.

After running the DBCC statements shows the database to be consistent and the functionality test script succeeds, allow users to resume.

Disaster Recovery Assessment provides the verification of the plan, and is achieved by getting sufficient hardware, providing the documented disaster recovery guidelines, and having a backup SA or DBA (someone who is not involved with plan development) recover the system on this computer. Perform periodic Disaster Recovery Assessments to verify the vitality of the current Disaster Recovery Plan.

If your data is valuable, the importance of Disaster Recovery Assessment cannot be overstated. What is the business risk if you cannot get your data back? What is the cost for each hour's delay in getting your system back up and running? This is not a situation to assume that your data is quickly recoverable; verify it! Understand the steps very thoroughly ahead of time, and you will minimize the stress and uncertainty imposed by the circumstances of some future catastrophe.

This article was written as an expansion to the Database Recovery section on page 48 of the Microsoft SQL Server 6.5 Deployment Guide (found on the World Wide Web at http://www.microsoft.com/sql/deploy.htm). Additional information about DUMP LOAD SQLSERVR master database can be found in the SQL Server Books Online and in the Microsoft Knowledge Base.

NOTE: A "base functionality script" is a batch of code that can be used to quickly demonstrate the successful functioning of the database from a specific application's perspective. Most commonly this is a .sql file with batched SQL commands run into the server from ISQL. For other applications, a .bat file is more appropriate because it can contain BCP and ISQL commands. This base functionality script is very application-specific, and can take many different forms. For example, on a Decision Support/Reporting system, the script may merely be a copy of a couple of your key reporting queries; for an online transaction processing (OLTP) application it may be the execution of a batch of stored procedures to execute INSERT, UPDATE, and DELETE statements. The goal is to confirm, from a gross perspective, that everything is working as intended. The base functionality script provides a nice tool for the SA or DBA to be able to see that the database is back in a viable state, without depending on the end users for verification.

Keywords : kbenv kbusage SSrvAdmin SSrvGen
Version : 4.21 6.0 6.5
Platform : WINDOWS
Issue type : kbhowto