Capturing SQL Server Blocking Information

Stuart McKee

An unavoidable characteristic of Microsoft SQL's transaction-based relational database engine is contention. It's very easy to mistake contention for "poor performance." With a simple understanding and a few tools, you might be able to turn a problematic situation into a responsive system. This article outlines how to identify blocking situations, a simple method for monitoring blocking conditions, and a list of resources that can give you more information.

If you support production MS SQL databases and pack a pager, this article is for you. Too many times, I've responded to a frantic call that "the server is down" or "the server is slow" and found that processes were simply stumbling over each other. I won't discuss the details of MS SQL locking behavior -- instead, I'll steer you toward some additional resources -- but will endeavor to provide some pragmatic tips that will let you quickly resolve blocking problems. Another interesting discussion can be found in Paul Munkenbeck's article, "Trace, Trap, and Analyze Blocks and Deadlocks" in the May 1997 issue of SQL Server Professional.

Troubleshooting 101
From my experience supporting production systems and responding to my attached pager, I've boiled down "first-tier" MS SQL support to a very simple list:

1. Can you connect to NT server? (ping, net view, dir)
2. Can you connect to MS SQL Server? (isql, isqlw, odbcping)
3. Can you run a system stored procedure utilizing tempdb? (sp_who)
4. Is there any blocking?
5. Can you select from a database table?
6. Is the SQL errorlog clean?
7. Are the NT system and application event logs clean?

Although each of these steps has many possible outcomes, very often the issue is resolved in Step 4 by answering, "Is there any blocking?"

What's a blocker?
Simply, one process (transaction) holds a resource that another needs. Until the first process lets go, the second process will be "blocked." In a bit more complex situation, two processes might block each other (process 1 has table A and needs table B, process 2 has table B and needs table A). This is sometimes called a "deadly embrace" and will result in a deadlock.

Let's say user 1 (Rick) makes a simple update to table A. Rick is very careful with his ad hoc query and has wrapped it in a "begin tran" statement. His technique is on target -- begin a transaction, do the update, verify the results, then commit the transaction. However, after doing the update and before committing the transaction, Rick heads out for a cup of coffee. User 2 (Pete) has his boss on the phone and is trying to answer a simple question (the answer, of course, is in table A). Pete tries to run a select against table A, but he only gets a spinning globe. Pete is blocked by Rick and gets no results until Rick returns with his coffee and commits the transaction.

This simple example can get very complex when you drastically (and realistically) increase the number of users and objects. The complexity continues to compound with three types of locks (shared, update, and exclusive) that can exist on four types of items (page, extent, table, and intent).

Although this example sounds silly, I know some of you aren't laughing. Unfortunately, blocking situations aren't usually this simple to remedy (solution: make sure Rick has coffee before he does an update). There are many other items that can expose blocking problems -- long-running queries, slow networks, poor design, missing or fragmented indexes, maintenance processes, bugs (ugh), and so on.

Finding the culprit
Most DBAs will be familiar with the "blk" column of sp_who and/or the bright red highlighting of Enterprise Manager's current activity window. Both of these methods can indicate blocking within a database, but it might take you a significant amount of time to filter through to the underlying "culprit." The good news is that the information is primarily gathered from the master..sysprocesses system table. The blocked column gives the system process id (spid) of a blocking process, if any exist.

select count(*)
from sysprocesses
where blocked > 0

Executing the preceding statement will give you a very quick and simple list of how many processes are blocked. To find out how many processes are blocking requires a little deductive logic:

select count(*)
from sysprocesses
where blocked = 0
and spid in (select blocked
             from sysprocesses
             where blocked > 0 )

Finally, to show which processes are blocked and the corresponding blocker, do the following:

select spid 'Process Id Blocked',
       blocked 'Process Id Blocking'
from sysprocesses
where blocked > 0

These simple queries can be expanded to include additional system tables and stored procedures to pull together more useful information. I've created a stored procedure that's installed on all of our MS SQL servers. It has become a critical and useful tool. as SP_CULPRIT.SQL. In short, it does the following:

determines whether blocking is occurring;
determines how many spids are at the beginning of a blocking chain;
creates a cursor with blocking spids; and
returns sp_who, sp_lock, dbcc inputbuffer, dbcc outputbuffer, and dbcc pss for blocking spids.

The spinning globe -- simulated blocking
. BLOCKER.SQL and BLOCKED.SQL can be used to set up a blocking situation.

Open an isqlw window and run BLOCKER.SQL. Open a second query window and run BLOCKED.SQL. In the second window, you should see the spinning globe until you commit the transaction in the first window. You can run sp_culprit and perhaps open Enterprise Manager's current activity window to see what type of information is presented. Try experimenting with these scripts to create different types of blocking situations. For example, modify BLOCKED.SQL to include "begin tran" and a HOLDLOCK optimizer hint (with combinations of other optimizer hints like TABLOCKX):

begin tran select * from ##stu (HOLDLOCK TABLOCKX)

Blocking Monitor
The Blocking Monitor is composed of these processes:

monitor blocking levels (number of blocked processes)
capture blocking information in a history table
execute alerts based on blocking "thresholds"
capture verbose process information for "culprit" blockers

The process is very simple. A SQL Executive task is set up to run a stored procedure (sp_culprit_notify). If the number of blocked processes exceeds the input parameter passed to sp_culprit_notify, it will execute a raiserror statement -- and send e-mail if you'd like. Set up an alert to recognize the raiserror statement. The alert will then trigger another SQL Executive task that will capture the blocking information (sp_culprit) in a log file.

Monitor requirements
The processes require the following:

a batch file directory (c:\mssql\batch for the example)
a log directory (c:\mssql\batch\log for the example)
sqlmail configured and running
CAPTURE.BAT (located in c:\mssql\batch for the example)
CULPRIT.SQL (located in c:\mssql\batch for the example)
sp_culprit stored procedure
sp_culprit_notify stored procedure

Monitor setup
Setup consists of the following steps:

1. Install sp_culprit and sp_culprit_notify stored procedures. . You should modify the "Send Mail" section of sp_culprit_notify to include a valid xp_sendmail statement with your own e-mail addresses.

2. Create directories c:\mssql\batch and c:\mssql\batch\log.

3. Copy CAPTURE.BAT to c:\mssql\batch. Note that the default is to use trusted security, which requires that the sql executive service be set to start up with an account that's in the local administrators group. You can also modify this file to use standard security and input parameters if you'd like (sample command lines are provided). A system administrator login is required to get results from the dbcc statements.

4. Copy CULPRIT.SQL to c:\mssql\batch.

5. Create SQL Executive Task -- Blocking Monitor (see Figure 1). This is the main task that will keep "the pulse." In the command line, the <alert threshold #> is the blocking threshold (the default is one). Once the number of spids in the blocking chain exceeds this threshold, the alert task will be set into motion.

Type: TSQL Database: Master Command: sp_culprit_notify <alert threshold #> Schedule: Every Minute

6. Create SQL Executive Task -- Blocking Info (see Figure 2). This task will "capture" blocking information and will be triggered by an alert. You can modify CAPTURE.BAT to use standard security and input parameters for server and password.

Type: CmdExec Retry attempts: 5 (under options) Command: c:\mssql\batch\capture.bat <target server> <sa password> Schedule: On Demand

7. Create SQL Alert -- Blocking Alert (see Figure 3). This alert will detect the "Blocking Threshold" message in SQL Log generated by sp_culprit_notify.

Error Number: 50000 Error Message Contains String: Blocking Threshold Task To Execute: Blocking Info

You can test this by setting the Blocking Monitor threshold parameter to one (the default) and creating some blocking (see the previous simulated blocking).

Conclusion
Although this article and its code aren't a panacea to MS SQL blocking, I think you'll find the sp_culprit stored procedure and its logic helpful in identifying blocking processes and situations. You can customize the simple and effective logic of the monitor to address a variety of issues. Finally, don't forget the Microsoft Knowledge Base -- it's worth the price of admission.

Stuart McKee is the director of business and database systems at Infoseek Corporation. Infoseek and the Walt Disney Company operate the GO Network (www.go.com). The current MSSQL environment contains 100+ MSSQL Servers, 1,000+ databases, and 35 various replication implementations with more than 10 million replicated transactions per day. He joined Infoseek in a merger with Starwave Corporation last November. Prior to Starwave, he worked for Microsoft as a senior database administrator. stuart.mckee@infoseek.com.

Resources

Microsoft Knowledge Browser Support Online (http://support.microsoft.com/support/search/c.asp?) -- If you're not familiar with it yet, you should get to know the Microsoft Knowledge Browser. A fantastic article with details on MS SQL blocking that provides links to other articles is "Article ID: Q162361 -- Understanding and Resolving SQL Server Blocking Problems." [You can also request any "Q" article by sending a message to mshelp@microsoft.com with the article number (or numbers, separated by commas) in the Subject line -- for instance, Subject: Q162361. -- Ed.]
SQL Books Online -- To save you some time with BOL's query tool, look for the following:
  --  "Database Developers Companion -- Understanding Locking"
  --  "Database Developers Companion -- Customizing Locking with Select"
  --  "Transact-SQL Reference -- syslocks (master database only)"
  --  "Transact-SQL Reference -- sp_lock system stored procedure"