BUG: DTC Commit/Abort Operation may Leave Suspended Worker Thread with Waittype 0x0201

ID: Q247871


The information in this article applies to:
  • Microsoft SQL Server version 7.0

BUG #: 57127 (SQLBUG_70)

SYMPTOMS

In the process of completing a Microsoft Distributed Transaction Coordinator (DTC) transaction abort or commit operation the SQL Server may experience a race condition between 2 or more server process ids (SPIDs) enlisted in the same DTC transaction.

The race condition occurs after the proper abort or commit of the DTC transaction.

The database maintains the proper and consistent state but one or more of the SPIDs involved in the original DTC transaction can maintain the waittype of 0x0201 (XCB).


CAUSE

SQL Server is not properly protecting the waiting status flag in the shared transaction control block (XCB) between multiple worker threads running on separate processors. This leads to a race condition where the wait flag may not appear to be set by waiting SPIDs.

The race condition is such that the SPID completing the DTC abort or commit operation successfully handles transactional duties but does not signal the waiters.


WORKAROUND

Use a simple T-SQL script to check sysprocesses for SPIDs with a waittype of 0x0201 and excessive wait times, for example > 30 seconds.


select * from sysprocesses
	where waittime > 30000 and waittype = 0x201 
When located a simple KILL statement terminates the SPID and returns the worker thread for use by other SPIDs.

NOTE: KILL with the abort or commit option is not necessary when dealing with this issue.


STATUS

Microsoft has confirmed this to be a problem in SQL Server version 7.0.


MORE INFORMATION

The condition requires 2 or more SPIDs to be enlisted in the same DTC transaction. It is extremely timing specific, requiring the worker threads associated with the DTC SPIDs, to run on separate CPUs at the same time. Furthermore, due to timing constraints, it is highly difficult to encounter the problem if the number of SPIDs enlisted in the same DTC transaction exceed maximum CPUs in use by SQL Server.

The SPIDs maintain the waittype of 0x0201 and the wait time will continue to climb. The SPID is placed in a wait, 'sleeping', state by SQL Server and the worker thread is never woken.

When the SPID is in the wait state it should not hold locks or block other SPIDs on the SQL Server. It only monopolizes the assigned worker thread but takes no CPU cycles.

Additional query words:

Keywords : kbSQLServ700bug
Version : winnt:7.0
Platform : winnt
Issue type : kbbug


Last Reviewed: December 13, 1999
© 2000 Microsoft Corporation. All rights reserved. Terms of Use.