INF: Disaster Recovery Planning for SQL Server
ID: Q169039
|
The information in this article applies to:
-
Microsoft SQL Server versions 4.2x, 6.0, 6.5
SUMMARY
This article provides two examples of simple disaster recovery plans that a
site may consider while proactively planning for data recovery from a
catastrophic disaster. The first example is targeted for sites having
available system maintenance windows; the second example is designed for
sites operating on a 24-hour basis.
The intention of this article is to provide a starting point for disaster
recovery planning efforts. This article is not your Disaster Recovery Plan.
It is for you to consider in light of your own environment, modify
accordingly, specify, and verify.
MORE INFORMATION
Suppose a fire occurs and wipes out your 24-hour data center. Are you
certain you can recover? How long will it take you to recover and have your
system available? How much data loss can your users tolerate? These should
be some of the key concerns of every system administrator (SA) and database
administrator (DBA) charged with maintaining invaluable system data.
Disaster recovery is the process by which information systems are recovered
in the event of a catastrophe: a natural or manmade disaster such as a
fire, or technical disaster such as a two-disk failure in a RAID-5 array.
Disaster Recovery Planning is the work devoted to preparing all the actions
that will occur in response to a catastrophic event. Disaster Recovery
Assessment is the simulation of a catastrophic event and/or the evaluation
of the Disaster Recovery Plan's capability to deliver the specified
recovery needs.
Ideally, the Disaster Recovery Plan should state how long recovery should
take, and the final database state the users can expect. For example,
"After the acquisition of specified hardware, recovery should be completed
in 48 hours and data will be guaranteed only up to the end of the previous
week." It is typically important that management be kept clearly informed
of these specifications. Disaster Recovery Assessment should be able to
substantiate the specification.
A Disaster Recovery Plan can be structured many different ways, and it can
contain many types of information (how to get hardware, who is to
communicate what, who are the people to be contacted in the event of a
disaster, how are they to be contacted, who owns the administration of the
plan, and so on). This article is devoted only to proposing some initial
avenues for the technical recovery of SQL Server.
The following is an example for sites that do not operate on a 24-hour
basis (that is, sites that have available maintenance windows):
To prepare for disaster, do the following every day (or whenever the
maintenance window is):
- Shut down SQL Server.
- Copy all database device files, preferably to another computer in
another building (but beware of network load), and also to a tape
device (with the server down, the device files can be copied just like
any other files).
- Maintain system logs in a secure fashion. Record the directory where all
SQL Server files are located, especially the Master.dat file. Keep
records of all service packs installed for both Windows NT Server and
SQL Server. Keep records of Net-libraries used, the security mode, and
the SA password.
- Maintain a base functionality script for quickly assessing minimal
capability (see the note at the end of this article).
- To minimize the amount of data lost during the day, perform database and
transaction log dumps while the system is live. See the SQL Server Books
Online for more information on dump, load, and recovery procedures.
- Assess the following disaster recovery steps ahead of time on another
server, and amend the steps as necessary.
To recover after a disaster has occurred, do the following after acquiring
suitable replacement hardware:
- Install Windows NT Server and load the appropriate service pack. Verify
that appropriate domain functionality exists. For example, verify that
file sharing works properly.
- Install SQL Server and load the appropriate service pack. Place the
master database device in same directory as it was initially installed.
Also select the same Net-library, security mode, and SA password as
before.
- Confirm that SQL Server is running properly. If the Windows NT Server
name was changed, use sp_dropserver and sp_addserver to match the
Windows NT Server name.
- Stop SQL Server.
- Move all of the database device files back to their original locations,
including the Master.dat file.
- Restart the SQL Server.
- If any database or transaction logs are available after this time, load
them.
- Verify the availability of the system. Run a functionality script to
ensure adequate operation. Ideally, before the users are released onto
the system, time should be provided to run DBCC CHECKDB and NEWALLOC on
each database, and DBCC TEXTALL and TEXTALLOC on those databases and
tables containing TEXT fields. This is to ensure that the migration
process did not alter the files in an undesirable fashion.
- After running the DBCC statements shows the database to be consistent
and the functionality test script succeeds, allow users to resume.
The following is an example for sites that have no online maintenance
windows and that run seven days a week, 24 hours a day:
To prepare for a disaster, do the following:
- Periodically dump all databases, preferably to a disk on another
computer in another building (but beware of network load), and also to a
tape device. Transaction logs can be handled similarly.
- Maintain system logs in a secure fashion. Record the directory where all
SQL Server files are located, especially the Master.dat file. Keep
records of all service packs installed for both Windows NT Server and
SQL Server. Keep records of Net-libraries used, the security mode, and
the SA password. Keep records of the specified database options.
- Record in scripts ALL size changes for ALL devices and databases. This
is crucial to simplify recovery in this situation!
- Maintain a base functionality script for quickly assessing minimal
capability (see the note at the bottom of this article).
- Assess the following disaster recovery steps ahead of time on another
server, and amend the steps as necessary.
To recover after a disaster has occurred after acquiring suitable hardware:
- Install Windows NT Server and load the appropriate service pack. Verify
that appropriate domain functionality exists. For example, verify that
file sharing works properly.
- Install SQL Server and load the appropriate service pack. Make sure to
place the master database device into same directory as before. Also
select the same Net-library, security mode, and SA password as before.
- Confirm that SQL Server is running properly. If the Windows NT Server
name was changed, run sp_dropserver and sp_addserver to match the
Windows NT Server name.
- Create or alter all devices and databases from the scripts made in step
3 of the previous section above. Databases can be created for LOAD.
- After all device files and databases are sized as they were at the time
of the last dump, if either the user logon information or the remote
server logon information is significant from the dumped master database,
proceed with step 5a. Otherwise, if they are not crucial, proceed with
step 6.
- Stop the SQL Server.
- Start the SQL Server in single user mode from the command line
"SQLSERVR -c -m".
- Load the master database from the last dump of it before the
catastrophe occurred.
- After success, stop and restart SQL Server normally. Continue with
step 6.
- Load each of the user databases from the dumped files (and the
transaction log dumps too, if appropriate).
- Stop and restart the SQL Server.
- Verify the availability of the system. If the master database was not
reloaded in step 5c, set the database options for each database. Run
a functionality script to ensure the adequate operation of SQL Server.
Ideally, before the users are released onto the system, time should be
provided to run DBCC CHECKDB and NEWALLOC on each database, and DBCC
TEXTALL and TEXTALLOC on those databases and tables containing TEXT
fields. This is to ensure that the migration process did not alter the
files in an undesirable fashion.
- After running the DBCC statements shows the database to be consistent
and the functionality test script succeeds, allow users to resume.
Disaster Recovery Assessment provides the verification of the plan, and is
achieved by getting sufficient hardware, providing the documented disaster
recovery guidelines, and having a backup SA or DBA (someone who is not
involved with plan development) recover the system on this computer.
Perform periodic Disaster Recovery Assessments to verify the vitality of
the current Disaster Recovery Plan.
If your data is valuable, the importance of Disaster Recovery Assessment
cannot be overstated. What is the business risk if you cannot get your data
back? What is the cost for each hour's delay in getting your system back up
and running? This is not a situation to assume that your data is quickly
recoverable; verify it! Understand the steps very thoroughly ahead of time,
and you will minimize the stress and uncertainty imposed by the
circumstances of some future catastrophe.
This article was written as an expansion to the Database Recovery section
on page 48 of the Microsoft SQL Server 6.5 Deployment Guide (found on the
World Wide Web at http://www.microsoft.com/sql/deploy.htm). Additional
information about DUMP LOAD SQLSERVR master database can be found in the
SQL Server Books Online and in the Microsoft Knowledge Base.
NOTE: A "base functionality script" is a batch of code that can be used to
quickly demonstrate the successful functioning of the database from a
specific application's perspective. Most commonly this is a .sql file with
batched SQL commands run into the server from ISQL. For other applications,
a .bat file is more appropriate because it can contain BCP and ISQL
commands. This base functionality script is very application-specific, and
can take many different forms. For example, on a Decision Support/Reporting
system, the script may merely be a copy of a couple of your key reporting
queries; for an online transaction processing (OLTP) application it may be
the execution of a batch of stored procedures to execute INSERT, UPDATE,
and DELETE statements. The goal is to confirm, from a gross perspective,
that everything is working as intended. The base functionality script
provides a nice tool for the SA or DBA to be able to see that the database
is back in a viable state, without depending on the end users for
verification.
Keywords : kbenv kbusage SSrvAdmin SSrvGen
Version : 4.21 6.0 6.5
Platform : WINDOWS
Issue type : kbhowto
|