Capturing SQL Server Blocking Information
Stuart McKee
An unavoidable characteristic
of Microsoft SQL's transaction-based relational database engine is contention.
It's very easy to mistake contention for "poor performance." With
a simple understanding and a few tools, you might be able to turn a problematic
situation into a responsive system. This article outlines how to identify
blocking situations, a simple method for monitoring blocking conditions,
and a list of resources that can give you more information.
If you support production MS SQL databases and pack a pager, this article
is for you. Too many times, I've responded to a frantic call that "the
server is down" or "the server is slow" and found that processes
were simply stumbling over each other. I won't discuss the details of MS
SQL locking behavior -- instead, I'll steer you toward some additional resources
-- but will endeavor to provide some pragmatic tips that will let you quickly
resolve blocking problems. Another interesting discussion can be found in
Paul Munkenbeck's article, "Trace, Trap, and Analyze Blocks and Deadlocks"
in the May 1997 issue of SQL Server Professional.
Troubleshooting 101
From my experience supporting production
systems and responding to my attached pager, I've boiled down "first-tier"
MS SQL support to a very simple list:
1. Can you connect to NT server? (ping, net view, dir)
2. Can you connect to MS SQL Server? (isql, isqlw, odbcping)
3. Can you run a system stored procedure utilizing tempdb? (sp_who)
4. Is there any blocking?
5. Can you select from a database table?
6. Is the SQL errorlog clean?
7. Are the NT system and application event logs clean?
Although each of these steps has many possible outcomes, very often the
issue is resolved in Step 4 by answering, "Is there any blocking?"
What's a blocker?
Simply, one process (transaction) holds
a resource that another needs. Until the first process lets go, the second
process will be "blocked." In a bit more complex situation, two
processes might block each other (process 1 has table A and needs table
B, process 2 has table B and needs table A). This is sometimes called a
"deadly embrace" and will result in a deadlock.
Let's say user 1 (Rick) makes a simple update to table A. Rick is very careful
with his ad hoc query and has wrapped it in a "begin tran" statement.
His technique is on target -- begin a transaction, do the update, verify
the results, then commit the transaction. However, after doing the update
and before committing the transaction, Rick heads out for a cup of coffee.
User 2 (Pete) has his boss on the phone and is trying to answer a simple
question (the answer, of course, is in table A). Pete tries to run a select
against table A, but he only gets a spinning globe. Pete is blocked by Rick
and gets no results until Rick returns with his coffee and commits the transaction.
This simple example can get very complex when you drastically (and realistically)
increase the number of users and objects. The complexity continues to compound
with three types of locks (shared, update, and exclusive) that can exist
on four types of items (page, extent, table, and intent).
Although this example sounds silly, I know some of you aren't laughing.
Unfortunately, blocking situations aren't usually this simple to remedy
(solution: make sure Rick has coffee before he does an update). There are
many other items that can expose blocking problems -- long-running queries,
slow networks, poor design, missing or fragmented indexes, maintenance processes,
bugs (ugh), and so on.
Finding the culprit
Most DBAs will be familiar with the "blk"
column of sp_who and/or the bright red highlighting of Enterprise Manager's
current activity window. Both of these methods can indicate blocking within
a database, but it might take you a significant amount of time to filter
through to the underlying "culprit." The good news is that the
information is primarily gathered from the master..sysprocesses system
table. The blocked column gives the system process id (spid) of a blocking
process, if any exist.
select count(*)
from sysprocesses
where blocked > 0
|
Executing the preceding statement will give you a very quick and simple
list of how many processes are blocked. To find out how many processes are
blocking requires a little deductive logic:
select count(*)
from sysprocesses
where blocked = 0
and spid in (select blocked
from sysprocesses
where blocked > 0 )
|
Finally, to show which processes are blocked and the corresponding blocker,
do the following:
select spid 'Process Id Blocked',
blocked 'Process Id Blocking'
from sysprocesses
where blocked > 0
|
These simple queries can be expanded to include additional system tables
and stored procedures to pull together more useful information. I've created
a stored procedure that's installed on all of our MS SQL servers. It has
become a critical and useful tool. as SP_CULPRIT.SQL.
In short, it does the following:
- determines whether blocking is occurring;
- determines how many spids are at the beginning
of a blocking chain;
- creates a cursor with blocking spids; and
- returns sp_who, sp_lock, dbcc inputbuffer, dbcc
outputbuffer, and dbcc pss for blocking spids.
The spinning globe -- simulated blocking
. BLOCKER.SQL and BLOCKED.SQL can be used
to set up a blocking situation.
Open an isqlw window and run BLOCKER.SQL. Open a second query window and
run BLOCKED.SQL. In the second window, you should see the spinning globe
until you commit the transaction in the first window. You can run sp_culprit
and perhaps open Enterprise Manager's current activity window to see what
type of information is presented. Try experimenting with these scripts to
create different types of blocking situations. For example, modify BLOCKED.SQL
to include "begin tran" and a HOLDLOCK optimizer hint (with combinations
of other optimizer hints like TABLOCKX):
begin tran
select * from ##stu (HOLDLOCK TABLOCKX)
|
Blocking Monitor
The Blocking Monitor is composed of these
processes:
- monitor blocking levels (number of blocked processes)
- capture blocking information in a history table
- execute alerts based on blocking "thresholds"
- capture verbose process information for "culprit"
blockers
The process is very simple. A SQL Executive task
is set up to run a stored procedure (sp_culprit_notify). If the number of
blocked processes exceeds the input parameter passed to sp_culprit_notify,
it will execute a raiserror statement -- and send e-mail if you'd like.
Set up an alert to recognize the raiserror statement. The alert will then
trigger another SQL Executive task that will capture the blocking information
(sp_culprit) in a log file.
Monitor requirements
The processes require the following:
- a batch file directory (c:\mssql\batch for the
example)
- a log directory (c:\mssql\batch\log for the example)
- sqlmail configured and running
- CAPTURE.BAT (located in c:\mssql\batch for the
example)
- CULPRIT.SQL (located in c:\mssql\batch for the
example)
- sp_culprit stored procedure
- sp_culprit_notify stored procedure
Monitor setup
Setup consists of the following steps:
1. Install sp_culprit and sp_culprit_notify stored procedures. . You should modify the "Send Mail" section of
sp_culprit_notify to include a valid xp_sendmail statement with your own
e-mail addresses.
2. Create directories c:\mssql\batch and c:\mssql\batch\log.
3. Copy CAPTURE.BAT to c:\mssql\batch. Note that the default is to use trusted
security, which requires that the sql executive service be set to start
up with an account that's in the local administrators group. You can also
modify this file to use standard security and input parameters if you'd
like (sample command lines are provided). A system administrator login is
required to get results from the dbcc statements.
4. Copy CULPRIT.SQL to c:\mssql\batch.
5. Create SQL Executive Task -- Blocking Monitor (see Figure
1). This is the main task that will keep "the
pulse." In the command line, the <alert threshold #> is the blocking
threshold (the default is one). Once the number of spids in the blocking
chain exceeds this threshold, the alert task will be set into motion.
Type: TSQL
Database: Master
Command: sp_culprit_notify <alert threshold #>
Schedule: Every Minute
|
6. Create SQL Executive Task -- Blocking Info (see Figure
2). This task will "capture" blocking
information and will be triggered by an alert. You can modify CAPTURE.BAT
to use standard security and input parameters for server and password.
Type: CmdExec
Retry attempts: 5 (under options)
Command: c:\mssql\batch\capture.bat
<target server> <sa password>
Schedule: On Demand
|
7. Create SQL Alert -- Blocking Alert (see Figure
3). This alert will detect the "Blocking
Threshold" message in SQL Log generated by sp_culprit_notify.
Error Number: 50000
Error Message Contains String: Blocking Threshold
Task To Execute: Blocking Info
|
You can test this by setting the Blocking Monitor threshold parameter to
one (the default) and creating some blocking (see the previous simulated
blocking).
Conclusion
Although this article and its code aren't
a panacea to MS SQL blocking, I think you'll find the sp_culprit stored
procedure and its logic helpful in identifying blocking processes and situations.
You can customize the simple and effective logic of the monitor to address
a variety of issues. Finally, don't forget the Microsoft Knowledge Base
-- it's worth the price of admission.
Stuart McKee is the director
of business and database systems at Infoseek Corporation. Infoseek and the
Walt Disney Company operate the GO Network (www.go.com).
The current MSSQL environment contains 100+ MSSQL Servers, 1,000+ databases,
and 35 various replication implementations with more than 10 million replicated
transactions per day. He joined Infoseek in a merger with Starwave Corporation
last November. Prior to Starwave, he worked for Microsoft as a senior database
administrator. stuart.mckee@infoseek.com.
Resources
- Microsoft Knowledge Browser Support Online (http://support.microsoft.com/support/search/c.asp?) -- If you're not familiar with it yet, you should get
to know the Microsoft Knowledge Browser. A fantastic article with details
on MS SQL blocking that provides links to other articles is "Article
ID: Q162361 -- Understanding and Resolving SQL Server Blocking Problems."
[You can also request any "Q" article by sending a message
to mshelp@microsoft.com with the article number (or numbers, separated
by commas) in the Subject line -- for instance, Subject: Q162361. -- Ed.]
- SQL Books Online -- To save you some time
with BOL's query tool, look for the following:
-- "Database Developers Companion -- Understanding
Locking"
-- "Database Developers Companion -- Customizing
Locking with Select"
-- "Transact-SQL Reference -- syslocks (master
database only)"
-- "Transact-SQL Reference -- sp_lock system
stored procedure"