An NT driver writer who minimizes the time that the driver holds spin locks can significantly improve the performance of both the driver and the system overall. For example, consider Figure 16.2, which shows how an interrupt spin lock protects device-specific data that must be shared between an ISR and the StartIo and DpcForIsr routines in an SMP machine.
Figure 16.2 Using an Interrupt Spin Lock
Until the ISR returns, thereby releasing the driver’s InterruptSpinLock, KeSynchronizeExecution spins on the second processor, preventing AccessDevice from touching SynchronizeContext. KeSynchronizeExecution also raises IRQL on the second processor to the SynchronizeIrql of the interrupt object(s), which prevents another device interrupt from occurring on that processor, so AccessDevice can run at DIRQL as soon as the ISR returns. However, higher-DIRQL interrupts for other devices, clock interrupts, and power-fail interrupts can still occur on either processor, as shown in Figure 16.1.
Note that when KeSynchronizeExecution acquires the spin lock and runs AccessDevice on behalf of the StartIo routine, the driver-supplied synchronization routine AccessDevice is given exclusive access to SynchronizeContext. Because AccessDevice runs at the SynchronizeIrql, the driver’s ISR cannot acquire the spin lock and access the same area until the spin lock is released, even if the device interrupts on another processor while AccessDevice is executing.
As Figure 16.2 shows, while a routine running on one processor holds a spin lock, every other routine trying to acquire that spin lock gets no work done. Each routine trying to acquire an already held spin lock simply spins on its current processor until the holder releases the spin lock. When a spin lock is released, one and only one routine can acquire it, so every other routine currently trying to acquire the same spin lock continues to spin.
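The spinning behavior described above can be illustrated with a minimal user-mode sketch. It is not the Kernel's spin lock implementation: it uses a C11 atomic_flag as a stand-in, and every name in it (DemoSpinLock, demo_spin_try_acquire, and so on) is hypothetical. It shows only that one caller at a time becomes the holder and that every other attempt fails until the holder releases the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode stand-in for a spin lock (not an NT type). */
typedef struct {
    atomic_flag held;
} DemoSpinLock;

void demo_spin_init(DemoSpinLock *lock)
{
    atomic_flag_clear(&lock->held);
}

/* One acquisition attempt: true only if this caller became the holder. */
bool demo_spin_try_acquire(DemoSpinLock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired; returns how many attempts failed.
 * Every failed attempt is processor time during which no work is done. */
unsigned long demo_spin_acquire(DemoSpinLock *lock)
{
    unsigned long failed_attempts = 0;
    while (!demo_spin_try_acquire(lock))
        failed_attempts++;
    return failed_attempts;
}

void demo_spin_release(DemoSpinLock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/* Single-threaded walk-through of the rule "one and only one holder". */
int demo_spin_selfcheck(void)
{
    DemoSpinLock lock;
    demo_spin_init(&lock);
    assert(demo_spin_try_acquire(&lock));   /* first caller wins        */
    assert(!demo_spin_try_acquire(&lock));  /* second caller would spin */
    demo_spin_release(&lock);
    assert(demo_spin_acquire(&lock) == 0);  /* free lock: no spinning   */
    demo_spin_release(&lock);
    return 1;
}
```

In this model, the value returned by demo_spin_acquire grows with the time the holder keeps the lock, which is the cost that a short hold time reduces.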
As explained in Section 16.2.3, the holder of any spin lock runs at raised IRQL: either at DISPATCH_LEVEL for an executive spin lock, or at a DIRQL for an interrupt spin lock. Callers of KeAcquireSpinLock run at DISPATCH_LEVEL until they call KeReleaseSpinLock. Callers of KeSynchronizeExecution automatically raise IRQL on the current processor to the SynchronizeIrql of the interrupt object(s) until the caller-supplied SynchCritSection routine exits and KeSynchronizeExecution returns control.
No code that runs at a lower IRQL can get any work done on the set of processors occupied by a spin-lock holder and by other routines trying to acquire the same spin lock.
Consequently, NT driver writers who minimize the time their drivers hold spin locks get significantly better performance from their drivers. They also contribute significantly to better overall system performance.
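The effect of raised IRQL on lower-priority code can be sketched with a toy model of a single processor. This is an illustrative assumption, not Kernel code: the types and functions (IrqlModelCpu, irql_model_acquire, irql_model_is_deferred) are hypothetical, the numeric levels are chosen only for the example, and the model mimics just the pattern in which acquiring an executive spin lock saves the caller's IRQL, raises to DISPATCH_LEVEL, and restores the saved value on release.

```c
#include <assert.h>

/* Illustrative levels only; real DIRQL values are platform-specific. */
enum {
    IRQL_MODEL_PASSIVE  = 0,
    IRQL_MODEL_DISPATCH = 2
};

/* Hypothetical model of one processor and one executive spin lock. */
typedef struct {
    unsigned char current_irql;
} IrqlModelCpu;

typedef struct {
    int held;
} IrqlModelLock;

/* Models the acquire pattern: save the old IRQL, raise, take the lock. */
void irql_model_acquire(IrqlModelCpu *cpu, IrqlModelLock *lock,
                        unsigned char *old_irql)
{
    assert(cpu->current_irql <= IRQL_MODEL_DISPATCH);
    assert(!lock->held);  /* re-acquiring on one processor would deadlock */
    *old_irql = cpu->current_irql;
    cpu->current_irql = IRQL_MODEL_DISPATCH;
    lock->held = 1;
}

/* Models the release pattern: drop the lock, restore the saved IRQL. */
void irql_model_release(IrqlModelCpu *cpu, IrqlModelLock *lock,
                        unsigned char old_irql)
{
    lock->held = 0;
    cpu->current_irql = old_irql;
}

/* Work requested at `level` cannot run on this processor while the
 * processor is at the same or a higher IRQL. */
int irql_model_is_deferred(const IrqlModelCpu *cpu, unsigned char level)
{
    return level <= cpu->current_irql;
}
```

While the modeled lock is held, irql_model_is_deferred reports that PASSIVE_LEVEL and DISPATCH_LEVEL work on that processor must wait, whereas work at a higher level (a device interrupt, for instance) is not deferred.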
As Figure 16.2 shows, the Kernel interrupt handler executes routines running at the same IRQL in a multiprocessor machine on a first-come, first-served basis, and the Kernel also does the following:
Note that a given driver’s interrupt-driven I/O operations would tend to be serialized in a uniprocessor machine, but that the same operations can be truly asynchronous in an SMP machine. As Figure 16.2 shows, a driver’s ISR could run on CPU4 in an SMP machine before its DpcForIsr begins processing an IRP for which the ISR has already handled a device interrupt on CPU1.
In other words, NT device driver writers should not assume that an interrupt spin lock protects operation-specific data saved by the ISR on one processor from being overwritten by the ISR when a device interrupt occurs on another processor before the DpcForIsr or CustomDpc routine executes.
Although a device driver writer could try to serialize all interrupt-driven I/O operations in order to preserve data collected by the ISR, such a driver would not run much faster in an SMP machine than it did in a uniprocessor machine. To get the best possible driver performance while remaining portable across Windows NT uniprocessor and multiprocessor platforms, NT drivers should use some other technique to save operation-specific data obtained by the ISR for subsequent processing by the DpcForIsr.
For example, an ISR can save operation-specific data in the IRP it passes to the DpcForIsr. A refinement of this technique is to implement a DpcForIsr that consults an ISR-augmented count, processes the count’s number of IRPs using ISR-supplied data, and resets the count to zero before returning. Of course, such a count must be protected by the driver’s interrupt spin lock because the driver’s ISR and a SynchCritSection routine would both update its value dynamically.
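The count-based refinement can be sketched as a single-threaded user-mode model. Everything in it is an assumption made for illustration: the names (DemoIsrShared, demo_isr, demo_drain, demo_dpc), the fixed capacity, and the use of plain integers in place of IRPs. The function demo_synchronize stands in for KeSynchronizeExecution and simply calls the routine; in a real driver the callback would run at the SynchronizeIrql with the interrupt spin lock held.

```c
#include <assert.h>

#define DEMO_MAX_PENDING 8   /* hypothetical capacity */

/* Data shared between the modeled ISR and the critical section. */
typedef struct {
    unsigned pending;              /* ISR-augmented count       */
    int data[DEMO_MAX_PENDING];    /* one entry per interrupt   */
} DemoIsrShared;

/* Private copy that the modeled DPC works on outside the lock. */
typedef struct {
    unsigned count;
    int data[DEMO_MAX_PENDING];
} DemoSnapshot;

typedef struct {
    DemoIsrShared *shared;
    DemoSnapshot *snapshot;
} DemoDrainContext;

typedef int (*DemoCritSection)(void *context);

/* Stand-in for KeSynchronizeExecution: here it only calls the routine. */
int demo_synchronize(DemoCritSection routine, void *context)
{
    return routine(context);
}

/* Modeled ISR: assumed to run with the interrupt spin lock held. */
int demo_isr(DemoIsrShared *shared, int device_value)
{
    if (shared->pending == DEMO_MAX_PENDING)
        return 0;                  /* no room: caller must handle this */
    shared->data[shared->pending++] = device_value;
    return 1;
}

/* Modeled SynchCritSection: copy the data and reset the count to zero.
 * It does nothing else, so the lock would be held only briefly. */
int demo_drain(void *context)
{
    DemoDrainContext *ctx = context;
    unsigned i;

    assert(ctx->shared->pending <= DEMO_MAX_PENDING);
    ctx->snapshot->count = ctx->shared->pending;
    for (i = 0; i < ctx->snapshot->count; i++)
        ctx->snapshot->data[i] = ctx->shared->data[i];
    ctx->shared->pending = 0;
    return 1;
}

/* Modeled DpcForIsr: drain under the lock, then process the private
 * copy. Returns the number of operations processed. */
unsigned demo_dpc(DemoIsrShared *shared, long *total)
{
    DemoSnapshot snapshot;
    DemoDrainContext ctx = { shared, &snapshot };
    unsigned i;

    demo_synchronize(demo_drain, &ctx);
    for (i = 0; i < snapshot.count; i++)
        *total += snapshot.data[i];    /* placeholder for IRP processing */
    return snapshot.count;
}
```

One design choice in this sketch is that the critical section only copies and resets, and the processing happens on the private snapshot afterward. That keeps the modeled hold time short, in line with the advice earlier in this section, but a driver could equally process in place if its data layout requires it.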