Watchdog: Distributing Tasks Across SQL Servers

Chris Westbrook

In this article, Chris demonstrates an efficient technique to coordinate T-SQL task execution across multiple servers running Microsoft SQL Server. Chris's approach can result in good performance with less network traffic than alternate techniques using C or VB.

Once upon a time, life was simple. One big box had all of the company's data. If you needed to run a feed, a report, or maintenance, you wrote a batch job and ran it overnight. Today, however, that same company is likely to have its data spread across half a dozen or more Microsoft NT servers, each with its own SQL Server databases. And, as you've undoubtedly discovered, there's no easy way to access all the data as a whole. Managing and using these discrete, but often similar, databases can be a challenge, and there are a handful of different ways to tackle the challenge.

My approach keeps the data processing on the servers and uses T-SQL. I've found that it offers much better performance (and lower network traffic) than client programs written in VB or C/C++.

I originally developed this technique, which I prosaically called the "watchdog," for a company that had its database spread over 12 servers in four centers around the country. Each server had the same database structure but different data. There was no replication or central data repository; the data was simply left on the individual servers. The company only had a small window each night within which to run its reports, feeds, and maintenance. Many of these processes were written in VB and ran on Pentium 100s. These programs sequentially queried each of the servers and did most of the recordset processing on the local PC, but eventually there was simply too much data for this approach to work. What watchdog did was move all of the queries and processing to the servers and, when needed, combined the results on one main server for exporting to an external location. Server-based T-SQL processing could accomplish in 45 minutes what client-side VB processing had taken eight hours to do. In this article, I'll give you the high-level outline of the parts that make up a watchdog, and I'll also show you step by step how to create one for your process.

Westbrook's watchdog
My watchdog is a stored procedure that starts other stored procedures, tracks their status, and takes appropriate action when those stored procedures are done. The tracking occurs through an audit table to which all of the stored procedures write status information. Figure 1 outlines the general flow of the watchdog technique as follows:

The watchdog stored procedure is started by a scheduled task.
The watchdog then starts tasks on the servers that contain the data to be processed.
Those tasks run stored procedures that do the processing.
Those stored procedures write their status (started) to the audit table.
Often, part of the processing involves inserting records into a table that's used to temporarily store results.
When done, the data-processing stored procedures write their status (finished) to the audit table.
At this time, the watchdog notes that they've finished and copies the information from their local results table into a main results table. The watchdog repeats this step for each data-processing stored procedure.
The main results table now has all of the results. The watchdog then runs the final stored procedure.
The final stored procedure processes the main results table and exports the data if required.

Initial setup
In order to use the watchdog technique, you need to set up a couple of things on each of the servers involved (this setup can be duplicated for as many processes as you need). First, pick a server to be the main server. This is where the watchdog stored procedure will run and where the feeds or reports will end up. The rest of the servers will be data servers. Note that these are only logical distinctions; the main server can be on the same physical server as one of the data servers.

Now you need to run the main.sql schema (see Listing 1) on your main server. Main.sql creates an WDAudit table, an sp_add_WDAudit stored procedure, an sp_upd_WDAudit stored procedure, and a WDServers table. WDAudit stores the status information of the data processing stored procedures. Sp_add_WDAudit and sp_upd_WDAudit are used to add and update information in the WDAudit table. WDServers stores the list of servers and databases on which the watchdog should run the data-processing stored procedures.

Listing 1. Schema for audit and server tables and related stored procedures.

CREATE TABLE dbo.WDAudit (
 audit_add_datetime datetime NOT NULL ,
 audit_server char (20) NOT NULL ,
 audit_code char (1) NOT NULL ,
 audit_process_name char (30) NOT NULL ,
 audit_message varchar (255) NULL 
)
GO
CREATE PROCEDURE sp_add_WDAudit
 @server char(20),
 @code char(1),
 @process char(30),
 @message varchar(255)
AS
 INSERT INTO WDAudit
 VALUES(
  getdate(),
  @server,
  @code,
  @process,
  @message
 )
GO
CREATE PROCEDURE sp_upd_WDAudit
 @server char(20),
 @code char(1),
 @process char(30),
 @date datetime
AS
 UPDATE WDAudit
 SET audit_code = @code
 where audit_server = @server
 and audit_process_name = @process
 and audit_add_datetime = @date
GO
CREATE TABLE dbo.WDServers (
 server_name char (6) NOT NULL ,
 server_database char (8) NOT NULL ,
)
GO

Next, run the data.sql schema (see Listing 2) on each of your data servers. Data.sql creates an sp_generic stored procedure and an sp_generic_runtask stored procedure. Sp_generic is a simple but powerful stored procedure that executes whatever is passed as a parameter. Watchdog uses sp_generic to move data between the servers. Watchdog uses sp_generic_runtask to start tasks on the data servers.

Finally, you or your DBA needs to set up some remote server information within SQL Server. Each data server needs to have the main server as a remote server, but the main server needs to have each data server as a remote server. See your DBA or the SQL documentation on how to accomplish this.

Listing 2 Schema for sp_generic and sp_generic_runtask (data.sql).

CREATE PROCEDURE sp_generic
 @SQL text
AS
 EXEC (@SQL)
GO
CREATE PROCEDURE sp_generic_runtask
 @taskname varchar(100),
 @command varchar(255)
AS
 --This stored procedure kicks off a task 10 seconds
 --from now. This variable holds the time 10 seconds
 --from now.
 DECLARE @datetime datetime
 --Set up the time to run the task.
 SELECT @datetime = dateadd(second, 10, getdate())
 --getting the time in the format needed for
 --sp_updatetask
 DECLARE @starttime1 char(8)
 DECLARE @starttime2 char(6)
 SELECT @starttime1 = convert(char, @datetime, 108)
 SELECT @starttime2 = substring(@starttime1, 1, 2)
  + substring(@starttime1, 4, 2)
  + substring(@starttime1,7, 2)
 --getting the date in the format needed for
 --sp_updatetask
 DECLARE @startdate char(8)
 SELECT @startdate = convert(char, @datetime, 112)
 --converting to int for sp_updatetask
 DECLARE @time int
 DECLARE @date int
 SELECT @time = convert(int, @starttime2)
 SELECT @date = convert(int, @startdate)
 --Run the stored procedure to setup task
 --with correct command and a starttime
 --of 10 seconds from now.
 EXEC msdb..sp_updatetask @taskname
  , @enabled = 1
  , @nextrundate = @date
  , @nextruntime = @time
  , @activestartdate = @date
  , @activestarttimeofday = @time
  , @command = @command
GO

Making a watchdog
Now that the initial setup is complete, you can make and run individual watchdogs. I recommend creating a separate watchdog for each process you need to run. To demonstrate creating a watchdog, let's assume I have a table called Actions that holds every action that was taken on an account (see Listing 3 for table schema). Let's also assume that I need to create a nightly feed of any actions of type "P" of the previous day and send them to an external group via ftp. I'll call this the "P-Feed". I need to set up the WDServers table to list all of the servers I want P-Feed to run on. I also need to set up a watchdog stored procedure, a watchdog task stored procedure, and a watchdog final stored procedure. I'll name these sp_wd_PFeed, sp_wdtask_PFeed, and sp_wdfinal_PFeed. The code can be found in Listings 4, 5, and 6 in the accompanying Download file. I also need to create the tables that will temporarily store my data, and I'll call these PFeedResults on the data servers and PFeedMainResults on the main server. Finally, I need to create scheduled tasks to run the sp_wd_PFeed and sp_wdtask_PFeed stored procedures.

Listing 3 Schema for example tables.


CREATE TABLE dbo.Accounts (
 ID int NOT NULL ,
 Name char(50) NOT NULL ,
 Address char(100) NOT NULL
)
GO
CREATE TABLE dbo.Actions (
 ID int NOT NULL ,
 Code char(1) NOT NULL ,
 AddDate datetime
)
GO
CREATE TABLE dbo.PFeedResults (
 ID int NOT NULL ,
 Name char(50) NOT NULL ,
 Address char(100) NOT NULL
)
GO
CREATE TABLE dbo.PFeedMainResults (
 ID int NOT NULL ,
 Name char(50) NOT NULL ,
 Address char(100) NOT NULL
)
GO

The watchdog stored procedure, sp_wd_PFeed, runs each of the data-processing stored procedures on each of the servers in WDServers. As each of those stored procedures finishes, it merges its results into the PFeedMainResults table on the main server. Finally, it runs the final stored procedure, which exports the results. This stored procedure will remain mostly unchanged between each process for which you create a watchdog. In most cases, you only need to change the names of the data-processing and final stored procedures within the watchdog.

On the other hand, you need to change the wdtask stored procedure dramatically for each new process, since it's based on the data processing required for the process. First, you need to develop the query that accomplishes your data processing. My example query is:

SELECT Account.ID, Account.Name, 
   Account.Address, Actions.Code
FROM Accounts, Actions
WHERE Actions.AccountID = Accounts.ID
AND Actions.Code = 'P'
AND (Actions.AddDate >= @start_date and 
   Actions.AddDate < @end_date)

Since my query is a select, I place an "INSERT INTO PFeedResults" in front of the query. If your query isn't a select, then you don't need to change it. The sp_wdtask_PFeed runs this query and writes status information to the WDAudit table. Next, I create the sp_wdfinal_PFeed stored procedure. This stored procedure exports the data to another server via bcp and ftp.

The final step is to create tasks on the main and data servers. On the main server, create a task that runs the sp_wd_PFeed stored procedure at the time when the PFeed process should start. On the data servers, create tasks called "WDTASK- PFeed". It doesn't matter what you set it up to do as long as you name it correctly, associate it with the correct database, and choose something other than "On Demand". The sp_generic_runtask stored procedure takes care of the rest.

That's it. The watchdog will start your processing on each server in WDServers, merge the results into a table on the main server, and ftp the results to its destination. You can use this technique to accomplish simple tasks, like the P-Feed, or more complicated tasks. The first process I wrote using this technique had an sp_wdtask stored procedure that was 20 printed pages long. As long as you can write your processing in T-SQL, and almost everything related to your data can be done in T-SQL, you can use the watchdog technique.

Chris Westbrook currently runs CM Domain Industries, Inc., a computer consulting company specializing in Internet integration and Web site development for businesses. Chrisw@cmdomain.com.