Locks

Locks are semaphores that are customized to participate in the commit and abort protocols. They are used to serialize access to critical regions, such as the objects in the model or their images in a database.

Phantom Locks

In a database, locking is usually performed on a range of items. Locking an individual item is a bit more difficult. Consider a typical query:

Select * from Customer where city = "Acton" and state = "MA";

One way to protect the data retrieved by this select statement is to lock a set of rows. What happens if we just lock the individual rows that meet the search criteria? This is dangerous, because it can create a phantom. If another thread updates the database with an entry that meets these same criteria (e.g. a new customer is added from Acton, MA), that entry is a phantom: it is added to the database even though you intended to lock such rows out. Locks that don't keep out phantoms are called phantom locks.
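The problem can be seen in a few lines of code. This is a minimal sketch, using a hypothetical in-memory list as the "table": locking only the rows that currently match the query does nothing to stop a concurrent insert from creating a new matching row.

```python
# Hypothetical in-memory "Customer" table.
customers = [
    {"name": "Alice", "city": "Acton", "state": "MA"},
    {"name": "Bob", "city": "Boston", "state": "MA"},
]

def matches(row):
    return row["city"] == "Acton" and row["state"] == "MA"

# Transaction 1 "locks" the individual rows that match today.
locked_rows = {id(row) for row in customers if matches(row)}

# Transaction 2 inserts a new customer from Acton, MA.
# Nothing blocks this insert -- no lock covers a row that does not exist yet.
customers.append({"name": "Carol", "city": "Acton", "state": "MA"})

# Re-running the query now returns a row that was never locked: a phantom.
result = [row for row in customers if matches(row)]
phantoms = [row for row in result if id(row) not in locked_rows]
print([row["name"] for row in phantoms])  # ['Carol']
```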

Predicate, Precision and Granular Locks

To prevent the creation of phantoms, you must avoid locking individual rows. The alternative, however, locking the whole table, will have an unacceptable impact on performance.

Another alternative is predicate locking. Predicate locking specifies a predicate (a function that returns true or false) that is used as a filter on rows. A predicate lock will lock every record that satisfies the predicate; that is, for which the predicate returns true. Thus, in the above example, when you tried to add another customer from Acton MA, that customer would meet the predicate

city == "Acton" and state == "MA"

and thus would find the database locked; the addition of the record would be blocked until the lock was released. Predicate locks are expensive, however, and their performance is poor, so they do not tend to be used in real systems.
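A rough sketch of the idea follows; the class and function names are illustrative, not from any real lock manager. Each held lock stores a predicate, and an insert is blocked if the new row satisfies any held predicate — which also hints at the cost, since every access must be checked against every held predicate.

```python
# Illustrative predicate lock: the lock is a predicate over rows.
class PredicateLock:
    def __init__(self, predicate):
        self.predicate = predicate

# A lock held by some other transaction on customers from Acton, MA.
held_locks = [
    PredicateLock(lambda row: row["city"] == "Acton" and row["state"] == "MA")
]

def try_insert(table, row):
    # Block the insert if the new row satisfies any held predicate.
    if any(lock.predicate(row) for lock in held_locks):
        return False  # blocked until the lock is released
    table.append(row)
    return True

customers = []
print(try_insert(customers, {"name": "Carol", "city": "Acton", "state": "MA"}))  # False
print(try_insert(customers, {"name": "Dave", "city": "Salem", "state": "NH"}))   # True
```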

Yet another alternative is precision locking. Precision locking grants locks without computing predicates; instead, conflicts are found when transactions attempt to read or write records. Precision locks are less expensive than predicate locks, but they tend to produce more deadlocks than other locking schemes. They, too, are not generally used.
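A simplified sketch of this scheme — the class and method names are assumptions for illustration — remembers each transaction's read predicates and written records, and checks for a conflict only when a record is actually touched, never comparing predicates against each other:

```python
# Simplified precision-lock manager (illustrative, not a real API).
class PrecisionLockManager:
    def __init__(self):
        self.read_predicates = {}   # txn -> list of predicates it has read with
        self.written_records = {}   # txn -> list of records it has written

    def read(self, txn, predicate):
        # Conflict if another transaction already wrote a record this query covers.
        for other, records in self.written_records.items():
            if other != txn and any(predicate(r) for r in records):
                return False
        self.read_predicates.setdefault(txn, []).append(predicate)
        return True

    def write(self, txn, record):
        # Conflict if the record satisfies another transaction's read predicate.
        for other, preds in self.read_predicates.items():
            if other != txn and any(p(record) for p in preds):
                return False
        self.written_records.setdefault(txn, []).append(record)
        return True

mgr = PrecisionLockManager()
mgr.read("T1", lambda r: r["city"] == "Acton")
print(mgr.write("T2", {"name": "Carol", "city": "Acton"}))  # False: conflict with T1
print(mgr.write("T2", {"name": "Dave", "city": "Salem"}))   # True: no predicate covers it
```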

Up until now, we have been talking about locks that are set at a uniform granularity. The granularity, or size of a lock, is the size of the locked item: a database table, a row, a field of a row. While queries tend to access a whole table, updates often touch a single row or just one field of a row. There would be advantages in a locking scheme that allows transactions to lock at multiple granularities.

Granular locking restricts transactions to a small set of predefined predicates. These predicates form a tree of granularities (database, table, record, field). At the top of the tree is a predicate that returns the whole database. In a distributed database, the next level might be an individual site. The level below that might be a table; below that would be records, and finally an individual field.

This type of structure has some useful mathematical properties. For example, if a predicate subsumes another predicate (that is, if all records that satisfy the second predicate also satisfy the first), then it is above that predicate in the tree. A lock at a higher level of the tree locks all of the records that locks at the lower levels do.

A problem would occur if one transaction locked an individual record and another transaction then locked the database at a higher level that included that record. The second transaction would be blocked, lowering performance. Because of this, when locking is allowed at multiple granularities, a more complicated set of lock modes is used. In addition to shared and exclusive locks, a new type of lock called an intention lock is defined. The idea is that you set an intention lock at one level to declare that you intend to set locks at a lower level. Thus, you can set an intention lock at the table level to declare that you intend to lock records.

The protocol is to declare an intention lock at every level higher than the one you will use. Thus, to obtain a SHARE lock on a record, the transaction would first declare intention locks at the database, site, and table levels. Only if all of these are granted is the SHARE lock requested.
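The protocol can be sketched as follows. The mode names and the (much reduced) compatibility table are illustrative assumptions; real systems define several more intention modes and a full compatibility matrix.

```python
# Illustrative lock modes; real systems also define IX, SIX, etc.
SHARE, EXCLUSIVE, INTENTION_SHARE = "S", "X", "IS"

# Reduced compatibility table: pairs not listed conflict.
COMPATIBLE = {
    (INTENTION_SHARE, INTENTION_SHARE): True,
    (INTENTION_SHARE, SHARE): True,
    (SHARE, INTENTION_SHARE): True,
    (SHARE, SHARE): True,
}

locks = {}  # node name -> list of modes currently granted on that node

def grant(node, mode):
    # Grant the lock only if it is compatible with every mode already held.
    if all(COMPATIBLE.get((mode, held), False) for held in locks.get(node, [])):
        locks.setdefault(node, []).append(mode)
        return True
    return False

def lock_record(path, mode):
    # path lists the tree top-down, e.g. ["db", "site1", "customer", "row42"].
    # Take intention locks on every ancestor, then the real lock on the record.
    *ancestors, record = path
    for node in ancestors:
        if not grant(node, INTENTION_SHARE):
            return False
    return grant(record, mode)

print(lock_record(["db", "site1", "customer", "row42"], SHARE))  # True
# An exclusive lock on the whole table now conflicts with the IS lock held there.
print(grant("customer", EXCLUSIVE))  # False
```

The intention locks are what let the table-level request above fail cheaply: the second transaction discovers the conflict at the table node without scanning every record-level lock below it.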

Locking Granularity, Lock Escalation and Hotspots

Locking often starts out at a high level, such as a table-level lock. Some systems make page locking available instead. A popular combination is to make both page and record locking available. Coarse-grained locking, such as page locking, is easy to use, but may be plagued by specific items that many transactions try to access (known as hot spots).

An example of a hot spot is a directory listing in a repository. The directory listings are small and many fit on one page; this page could become a hot spot if the directory is frequently searched. When hot spots are a serious problem, a lower-level protocol, such as record-level locking, is required.

Dealing with Deadlocks

As we saw earlier, two threads may be unable to make progress because each is waiting for a lock that the other holds. This is known as deadlock or a deadly embrace. The only way out of a deadlock is for one of the threads to give up a lock. More generally, deadlocks are broken by aborting one or more transactions.

Deadlocks can either be prevented, or allowed to occur and subsequently dealt with. It turns out that the standard locking protocols lead to a small percentage of deadlocks. Also, it is hard to prevent deadlocks in transactional systems without overly constraining the concurrency allowed. Therefore, deadlocks are usually allowed to occur, but a deadlock detection mechanism is required.

Deadlocks are especially problematic in distributed systems, as the locks are held at individual nodes. There is no easy way for one node on the network to know which locks are held by another node.

There are two techniques in use for deadlock detection. The first is simply to set a timer for the transaction. Any deadlock will cause the transaction to time out and be aborted. This may free up other transactions that were waiting on the locks the aborted transaction was holding. This method is easy to implement, but has the drawback that it may abort transactions that were merely slow, not deadlocked.
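The timeout technique is almost trivial to implement. A minimal sketch using Python's standard `threading.Lock`, whose `acquire` method accepts a timeout:

```python
import threading

lock = threading.Lock()
lock.acquire()  # held by some other transaction, perhaps forever

def run_transaction():
    # Wait briefly for the lock; treat a timeout as a possible deadlock.
    if lock.acquire(timeout=0.1):
        try:
            return "committed"
        finally:
            lock.release()
    return "aborted"  # timed out -- but we cannot tell deadlock from slowness

print(run_transaction())  # aborted
```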

The other method is based on creating a data structure called a waits-for graph. The nodes of the graph represent transactions. An edge (a directed arrow from one node to another) from transaction 1 to transaction 2 means that transaction 1 is waiting on a lock held by transaction 2, or that transaction 1 is behind transaction 2 on the list of transactions waiting on a data item. Deadlocks are represented by cycles in this graph. Because deadlocks are sparse and most paths in the graph are short, an incremental algorithm is used to search for deadlocks. This algorithm also uses a data structure that lists all of the locks that each transaction is waiting on.

The list of locks is itself protected by semaphores. This kind of deadlock detector is usually run on a periodic basis, which is cheaper than running it each time a lock is added, removed or changed. The algorithm works by visiting each transaction and making a list of all the transactions that it waits on. This process will eventually find a deadlock when the list of transactions that transaction 1 is waiting on includes a transaction whose own such list includes transaction 1.
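The core of such a detector — finding a cycle in the waits-for graph — can be sketched with a depth-first search. This is an assumed, simplified version of the periodic detector described above, not a production incremental algorithm:

```python
def find_deadlock(waits_for):
    """waits_for maps each transaction to the transactions it waits on.
    Returns a list of transactions forming a cycle, or None."""
    def dfs(txn, path, on_path, visited):
        visited.add(txn)
        on_path.add(txn)
        path.append(txn)
        for other in waits_for.get(txn, ()):
            if other in on_path:                  # back edge: a cycle (deadlock)
                return path[path.index(other):]
            if other not in visited:
                cycle = dfs(other, path, on_path, visited)
                if cycle:
                    return cycle
        on_path.discard(txn)                      # backtrack
        path.pop()
        return None

    visited = set()
    for txn in waits_for:
        if txn not in visited:
            cycle = dfs(txn, [], set(), visited)
            if cycle:
                return cycle
    return None

# T1 waits on T2, T2 waits on T3, T3 waits on T1: a deadlock.
print(find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # ['T1', 'T2', 'T3']
print(find_deadlock({"T1": ["T2"], "T2": []}))                    # None
```

Breaking the deadlock then amounts to picking a victim from the returned cycle and aborting it, which releases its locks and lets the remaining transactions proceed.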

© 1998 by Wrox Press. All rights reserved.