The information in this article applies to:
SUMMARY
Microsoft SQL Server is a transaction-oriented relational database
management system (RDBMS) that is often used in highly concurrent
environments with many simultaneous users. It maintains transactional
integrity and database consistency by using locks at the table and page
levels (and at the row level with SQL Server version 6.5) for INSERTs.
MORE INFORMATIONOVERVIEWTo achieve consistent results during concurrent update activity, any database must impose isolation between transactions. Without isolation, inconsistent results may occur under concurrent use. Most database products use locks to impose isolation. There are isolation techniques other than locks, such as versioning (also called time domain addressing), but each technique has its own cost and overhead. For details, see "Transaction Processing: Concepts and Techniques," by Jim Gray and Andreas Reuter, ISBN 1-55860-190-2. There is no "free lunch" in implementing isolation. Locking is well understood and highly refined in current products, so this article focuses on locking. All parts of the transaction must be protected with locks, or else a ROLLBACK would not be reliable. Some of the actions needing locking protection may not be obvious. These include locks on system catalog tables, indexes, and allocation structures such as extents.Locks are automatically acquired by the database server in response to certain types of queries. Locks are not typically acquired under manual programmatic control (although the current product allows this through optimizer hints). Locks are not generally intended for use under manual control to implement pessimistic concurrency. For example, one connection should not manually acquire a lock in order to indicate to other connections that the page or row is "in use." Rather, the application should use an "in use" column as a flag to other connections. Alternatively, the application could use a cursor under optimistic concurrency control, which would then signal the application if another connection changed the data during the interval the first connection was browsing the data. For more details on implementing cursors, see the SQL Server 6.5 documentation and the following article in the Microsoft Knowledge Base: Q156489 : Overview of SQL Server, ODBC, and DB-Library Cursors For best scalability, performance, and concurrency, application and query design should emphasize keeping the transaction path length short and holding locks as briefly as possible. The foundation of most concurrency problems is laid when the application and database are designed. For this reason, it is critical that these issues be well understood at design time. Otherwise, a hidden performance limitation may be unintentionally engineered into the application, and this may not appear until full-scale stress testing. Stress testing should always be done at the projected full user load, to ensure that concurrency at this load factor meets your requirements. Failure to do stress testing may result in concurrency, blocking, or performance problems appearing late in the design cycle (or worse, after application deployment). Problems discovered at this stage may be very costly to correct. Different RDBMSs may handle locking and concurrency differently. If the application is being ported from one RDBMS to another, you cannot expect the two databases to behave identically. The application may require adjustments, especially if the application takes advantage of a feature or characteristic unique to one RDBMS. This is so even if the application uses only standard ANSI SQL, because locking and concurrency control issues are implementation-specific. The RDBMS designers must balance a tradeoff between locking granularity and overhead. Fine-grain locks at the row or column level can allow more concurrency, but they also entail greater overhead. This relationship between locking granularity and overhead has been well understood for many years in the academic community. See "Effects of Locking Granularity in a Database Management System", by Daniel Ries and Michael Stonebraker from ACM Transactions on Database Systems, September 1977, and "Locking Granularity Revisited", same authors and publication, June 1979. Microsoft SQL Server locks at the table, page, or (with version 6.5) at the row level for INSERTs. Future versions of SQL Server will have more extensive row level locking, but this will not prevent blocking problems. No matter how fine-grained the locks taken, if the application does not exercise good discipline regarding transaction path length and query management, blocking problems may occur. This is similar to a slow moving automobile causing a backup by driving in the left lane of a freeway. The driver may think that sufficient lanes should be made available for him or her to drive any way he or she wants. However, no feasible number of highway lanes will allow drivers to disregard good lane discipline without causing a traffic slowdown. Likewise, no RDBMS isolation technique can allow applications to disregard the impact they have on concurrency and scalability. INVESTIGATING A BLOCKING PROBLEM
Q125770 : Locking Behavior of Updates and Deletes in SQL Server IDENTIFYING AND RESOLVING COMMON BLOCKING SCENARIOSI. Blocking Caused by a Normally Running Query with a Long Execution TimeA long-running query can block other queries. For example, a bulk DELETE or UPDATE can acquire many locks that (whether they escalate to a table lock or not) block other queries. For this reason, you generally do not want to intermix long-running decision support queries and OLTP queries on the same database. You can identify this situation by observing the blocking SPID. The INPUTBUFFER may point to a query known to have a long execution time. This will often cause a steady I/O consumption, visible by running the following query several times in succession:
The PHYSICAL_IO value does not always reflect all the work done by a SPID, because work done on its behalf by other processes (like the read ahead manager) is not charged back to the SPID. However, it is often an approximate indication of I/O activity. If the PHYSICAL_IO value continuously increases, inspect the query running, and decide whether it should be running at the time. Run it in isolation on a quiescent computer, and monitor the I/O activity with SET STATISTICS IO ON. If the query consumes a large amount of I/O resources, it may cause blocking when run on a busy system. The solution is to look for ways to optimize the query, by changing indexes, breaking a large, complex query into simpler queries, or running the query during off hours or on a separate computer. This class of blocking problem may just be a performance problem, and may require you to pursue it as such. For more information see the following article in the Microsoft Knowledge Base: Q110352 : Optimizing Microsoft SQL Server Performance One reason queries can be long-running and hence cause blocking is if they inappropriately use cursors. Cursors can be a convenient method for navigating through a result set, but using them may be slower than set- oriented queries. For more details, see "Microsoft SQL Server 6.5 Unleashed", by David Solomon, Ray Rankins, et al, ISBN 0-672-30956-4. II. Blocking Caused by a Sleeping SPID That Has Lost Track of the Transaction Nesting Level This type of blocking can often be identified by a SPID that is sleeping or awaiting a command, yet whose @@TRANCOUNT is greater than zero. This can happen if the application issues the DB-Library call dbcancel() or the ODBC call sqlcancel() without also issuing the required number of ROLLBACK and COMMIT statements. Issuing these calls cancels the query and the batch, but does not automatically rollback or commit the transaction. This can be seen by issuing a simple query from ISQL/w, such as BEGIN TRAN SELECT * FROM MASTER..SYSMESSAGES and clicking the red Cancel button. After the query is canceled, SELECT @@TRANCOUNT indicates that the transaction nesting level is one. Had this been a DELETE or an UPDATE query, or had HOLDLOCK been used on the SELECT, all the locks acquired would still be held. Applications must properly manage transaction nesting levels, or they may cause a blocking problem following the cancellation of the query. See the following article in the Microsoft Knowledge Base for more information: Q117143 : When and How to Use dbcancel() or sqlcancel() NOTE: The transaction nesting level of the SPID can be observed by using DBCC PSS. III. Blocking Caused by a SPID Whose Corresponding Client Application Did Not Fetch All Result Rows to Completion This problem is caused by poor application design. After sending a query to the server, all applications must immediately fetch all result rows to completion. If an application does not fetch all result rows, locks may can be left on the tables, blocking other users. If you are using an application that transparently submits SQL statements to the server, the application must fetch all result rows. If it does not (and if it cannot be configured to do so), you may be unable to resolve the blocking problem. To avoid the problem, you can restrict poorly-behaved applications to a reporting or a decision-support database. You can often identify this problem by the following behavior:
Unlike a conventional deadlock, a distributed deadlock is not detectable using the RDBMS lock manager. The blocking SPID will often appear sleeping, with a waittype of 0x800 (waiting on a network I/0). A SPID in this state cannot be KILLed, as it is waiting on a return from a Windows NT Server API call. A distributed client/server deadlock may occur if the application opens more than one connection to the RDBMS and submits a query asynchronously. The following are two examples of how this can happen, and possible ways the application can avoid it. Example A: Client/Server Distributed Deadlock with a Single Client ThreadIf the client has multiple open connections (dbprocs in DB-Library terms), and a single thread of execution, the following distributed deadlock may occur. For brevity, the term dbproc refers to the client connection structure. In ODBC API terms, the closest analogy is an hdbc.NOTE: Used a fixed font for the information below to display correctly.
In the case shown above, a single client application thread has two open connections. It asynchronously submits a SQL operation on dbproc1. This means it does not wait on the call to return before proceeding. The DB- Library asynchronous call is dbsqlsend(). ODBC applications select asynchronous mode with SQLSetStmtOption() and use the SQL_ASYNC_ENABLE parameter. The application then submits another SQL operation on dbproc2, and awaits the results to start processing the returned data. When data starts coming back, (whichever dbproc first responds), it processes to completion all the data returned on that dbproc. Assume this is dbproc1. It fetches results from dbproc1 until SPID1 gets blocked on a lock held by SPID2 (because the two queries are running asynchronously on the server). At this point, dbproc1 will wait indefinitely for more data. SPID2 is not blocked on a lock, but tries to send data to its client, dbproc2. However, dbproc2 is effectively blocked on dbproc1, awaiting the single thread of execution to run. Example B: Client/Server Distributed Deadlock with a Thread per ConnectionEven if a separate thread exists for each connection on the client, a variation of this distributed deadlock may still occur, as shown by the following:NOTE: Use a fixed font for the information below to display correctly.
This case is similar to Example A, except dbproc2 and SPID2 are running a SELECT with the intention of performing row-at-a-time processing and handing each row through a buffer to dbproc1 for an INSERT in the same table. Eventually, SPID1 becomes blocked on a lock held by SPID2. SPID2 then writes a result row to the client dbproc2. Dbproc2 then tries to pass the row in a buffer to dbproc1, but finds dbproc1 has not yet fetched the last row from the buffer (because it is blocked waiting on SPID1, which is blocked on SPID2). Both examples A and B are fundamental issues that application developers must be aware of. They must code applications to handle these cases appropriately. Two reliable solutions are to use either a query timeout or bound connections. Query Timeout: When a query timeout has been provided, if the distributed deadlock occurs, it will be broken when then timeout happens. See the DB- Library or ODBC documentation for more information on using a query timeout. Bound Connections: This is a feature new to SQL Server 6.5. It allows a client having multiple connections to bind them into a single transaction space, so the connections don't block each other. See the SQL Server 6.5 documentation under "bound connections" for more information. V. Blocking Caused by a SPID That Is in a "Golden," or Rollback, State A data modification query that is KILLed, or canceled outside of a user- defined transaction will be rolled back. This can also occur as a side effect of the client computer restarting and its network session disconnecting. Likewise, a query selected as the deadlock victim will be rolled back. A data modification query often cannot be rolled back any faster than the changes were initially applied. In the case of a DELETE, INSERT, or UPDATE that had been running for an hour, it could take at least an hour to roll back. If the server is shut down in the midst of this operation, the database will be in recovery mode upon restarting, and it will be inaccessible until all open transactions are processed. Startup recovery takes essentially the same amount of time per transaction as run- time recovery, and the database is inaccessible during this period. Thus, forcing the server down to fix a SPID in a rollback state will often be counterproductive. This is expected behavior, because the changes made must be completely rolled back, or transactional and physical integrity in the database would be compromised. Because this must happen, SQL Server marks the SPID in a "golden" or rollback state (which means it cannot be KILLed or selected as a deadlock victim). This can often be identified by observing the sp_who output, which may indicate the ROLLBACK command. On version 6.5 Service Pack 2 or later, a ROLLBACK status has been added to sysprocesses.status, which will also appear in sp_who output or the SQL Enterprise Manager "current activity" screen. However, the most reliable way to get this information is to inspect the DBCC PSS of the blocking SPID in question, and observing the pstat value. For example, it may be something like the following:
Meaning of PSTAT bits:
The pstat value above would be a typical situation if a long-running data modification was canceled (for example, by clicking the Cancel Query button on a GUI application), and then the SPID was found to block users and yet be unkillable. This situation is normal; the transaction must be backed out. It can be identified by the bits, as noted above. In addition to the pstat field of the PSS, the sysprocesses.waittype field can also give information about why the SPID may be waiting. The following are some common values:
VI. Blocking Caused by SQL Server 6.5 Atomic SELECT INTO Behavior By definition, SQL Server treats each statement as a separate transaction. Beginning with SQL Server version 6.5, SELECT INTO was made consistent with this standard by including the table creation and data insert phases in a single atomic operation. A side effect of this is that locks on system catalog tables are maintained for the duration of a SELECT INTO statement. This is more frequently seen in tempdb, because applications often do SELECT INTO temporary tables. Blocking caused by this action can be identified by examining the locks held by the blocking SPID. The atomic SELECT INTO behavior can be disabled with trace flag 5302. For more information, see the following article in the Microsoft Knowledge Base: Q153441 : SELECT INTO Locking Behavior VII. Blocking Caused by an Orphaned Connection If the client application traps or the client workstation is restarted, the network session to the server may not be immediately canceled under some conditions. From the server's perspective, the client still appears to be present, and any locks acquired may still be retained. For more information, see the following article in the Microsoft Knowledge Base: Q137983 : How to Troubleshoot Orphaned Connections in SQL Server APPLICATION INVOLVEMENT IN BLOCKING PROBLEMSSQL Server is essentially a puppet of the client application. The client application has almost total control over (and responsibility for) the locks acquired on the server. While the SQL Server lock manager automatically uses locks to protect transactions, this is directly instigated by the query type sent from the client application, and the way the results are processed. Therefore, resolution of most blocking problems necessitates inspecting the client application.Often, turnkey client applications are used against SQL Server, in addition to higher-level application development tools. These may encapsulate the DB- Library or ODBC API calls to the database in a higher abstraction level. However, from the perspective of SQL Server, there is essentially no difference between one of these higher level applications, a call-level DB- Library application, and a call-level ODBC application. SQL Server only perceives a stream of Transact-SQL queries and certain control tokens sent by each client API call. The same basic issues will cause blocking problems whether the client application is a call-level application written in C or a higher level application that encapsulates the database calls. Likewise, the solution to these problems is generally the same. This means that no matter what level of visibility the application exposes regarding the database calls being made, a blocking problem nonetheless frequently requires both the inspection of the exact SQL statements submitted by the application and the application's exact behavior regarding query cancellation, connection management, fetching all result rows, and so on. If the development tool does not allow explicit control over connection management, query cancellation, query timeout, result fetching, and so on, blocking problems may not be resolvable. This potential should be closely examined before selecting an application development tool for SQL Server, especially for business-critical OLTP environments. There may be a tendency to focus on server-side tuning and platform issues when facing a blocking problem. However, this does not usually lead to a resolution, and can absorb time and energy better directed at examining the client application and the queries it submits. Several tracing utilities can be used to examine exactly what the client application is sending to the server. Each client application may have its own tracing feature, so examine the documentation for the application. In addition, server-side tracing can be done using the -T4032 trace flag and the SQL Trace utility. You can usually resolve a blocking problem without using these utilities, but they are available if needed. APPLICATION DESIGN TECHNIQUES TO AVOID A BLOCKING PROBLEMDO NOT:
It is vital that great care be exercised during the design and construction phase of the database and application. In particular, the resource consumption, isolation level, and transaction path length should be evaluated for each query. Each query and transaction should be as lightweight as possible. Good connection management discipline must be exercised. If this is not done, it is possible that the application may appear to have acceptable performance at low numbers of users, but the performance may degrade significantly as the number of users scales upward. With proper application and query design, Microsoft SQL Server is capable of supporting many thousands of simultaneous users on a single server, with little blocking. The successful sites that reach this level typically use the techniques described in this article. Additional query words: sqlfaqtop
Keywords : kbprg kbusage SSrvGen |
Last Reviewed: April 8, 1999 © 2000 Microsoft Corporation. All rights reserved. Terms of Use. |