Synchronization
Windows NT requires the ability to atomically update memory cells.
One global hardware lock is "adequate" for this, since any number of software locks can be implemented with a single hardware lock.
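As an illustration of how many software locks can sit on top of one locked-reference primitive, here is a minimal sketch in C. It assumes C11 atomics as a stand-in for the hardware's locked memory reference; the names soft_lock, soft_lock_try_acquire, soft_lock_acquire, and soft_lock_release are hypothetical and are not taken from Windows NT.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical software lock: one memory cell, 0 = free, 1 = held. */
typedef struct { atomic_int held; } soft_lock;

/* One atomic exchange on the cell; returns nonzero if the lock was acquired. */
static int soft_lock_try_acquire(soft_lock *l)
{
    return atomic_exchange(&l->held, 1) == 0;
}

/* Spin until the exchange observes the cell as free. */
static void soft_lock_acquire(soft_lock *l)
{
    while (!soft_lock_try_acquire(l)) {
        /* busy-wait */
    }
}

/* The cell is cleared with an atomic store, never an ordinary write. */
static void soft_lock_release(soft_lock *l)
{
    assert(atomic_load(&l->held) == 1);   /* releasing a free lock is a bug */
    atomic_store(&l->held, 0);
}
```

Each soft_lock is just a cell in memory, so any number of them can exist no matter how many hardware locks back the atomic exchange.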
However, it is desirable to allow multiple processors to make parallel locked references to different cells. This can be done with a fully associative mechanism (each lock is assigned to a processor as needed), in which case only one lock per processor is needed. If it is done by hashing, the hash function should not include the low two bits of the address, since locks will always be aligned on a 4-byte boundary.
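The hashing rule can be sketched as follows. This is only an illustration: the lock count NUM_HW_LOCKS and the function hw_lock_index are assumptions, and a real implementation would be in hardware and might mix in more address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical number of hardware locks; a power of two so the index is a mask. */
#define NUM_HW_LOCKS 16u

/* Map the address of a lock cell to a hardware lock index.
   The low two bits are dropped before hashing because they are
   always zero for a cell aligned on a 4-byte boundary. */
static unsigned hw_lock_index(uintptr_t addr)
{
    assert((addr & 3u) == 0);   /* lock cells are 4-byte aligned */
    return (unsigned)((addr >> 2) & (NUM_HW_LOCKS - 1u));
}
```

With the shift, adjacent cells such as 0x1000, 0x1004, and 0x1008 map to distinct locks. Without it, the always-zero low bits would leave only every fourth index reachable, so only a quarter of the locks would ever be used.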
Any cell that is ever set atomically is always set atomically, although it may be read with an ordinary read.
No processor will ever assert more than one lock at a time.
Windows NT requires that user mode code be able to perform atomic operations.
Atomic operations are performed frequently, so locked memory references should not be hindered by long delays.
For systems that use general purpose processors on I/O devices, it would make sense to have the central processors and the I/O processors communicate via linked lists of requests in memory. For this to work, the I/O processor must participate in the locking protocol.
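A minimal sketch of such a shared request list, assuming C11 atomics stand in for the locked memory references that both kinds of processor would issue. The types io_request and request_list, the opcode field, and the functions request_post and request_take_all are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request record linked through memory. */
typedef struct io_request {
    struct io_request *next;
    int opcode;                 /* illustrative payload */
} io_request;

/* List head shared by the central and I/O processors. */
typedef struct { _Atomic(io_request *) head; } request_list;

/* Central processor side: push a request with a compare-and-exchange loop. */
static void request_post(request_list *q, io_request *r)
{
    assert(r != NULL);
    io_request *old = atomic_load(&q->head);
    do {
        r->next = old;          /* link in front of the current head */
    } while (!atomic_compare_exchange_weak(&q->head, &old, r));
}

/* I/O processor side: detach the whole list with one atomic exchange.
   Requests come back newest first. */
static io_request *request_take_all(request_list *q)
{
    return atomic_exchange(&q->head, (io_request *)NULL);
}
```

Both sides update the head only through atomic operations. If the I/O processor updated it with ordinary writes instead, a concurrent request_post could be lost, which is why the I/O processor has to take part in the locking protocol.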