Microsoft Corporation
June 29, 1995
Among the many types of common applications for which computers are used, the most familiar are personal productivity applications or various business applications. There are also high- performance, embedded applications that are often called upon to manage time-critical responses. These are among the most demanding of applications. These applications not only need to respond correctly; they also need to respond within certain specified time parameters, or in "real time." Examples of these applications include:
There are many ways to define a real-time system. Perhaps one of the most comprehensive definitions is: "A real-time system is one in which the correctness of the computations not only depends upon the logical correctness of the computation, but also upon the time at which the result is produced. If the timing constraints of the system are not met, system failure is said to have occurred." (The source for this definition is the real time FAQ in the Internet's comp.realtime news group.)
As an example, consider the bottling equipment in a beverage plant. It's not sufficient if the equipment attempts to cap bottles at regular time intervals. If it's off by just a fraction of a second, then the bottle may not be capped properly and the beer will be spoiled. That is, the time at which the machine caps the bottle is a critical part of the process.
Practically speaking, this means that real-time applications fall into two primary groups—those that can respond in "hard" real-time, and other "soft" real-time applications with less severe requirements. Simply stated, a hard real-time operating system must, without fail, provide a response to some kind of event within a specified time window. This response must be predictable and independent of other activities undertaken by the operating system. Providing this response implies that system calls will have a specified, measured latency period. Hard real-time systems often employ specific hardware devices with special device drivers.
Under this definition, Microsoft® Windows NT™ Workstation is not a hard real-time operating system. Rather, it is a general-purpose operating system that has the capability to provide very fast response times, but is not as deterministic as a hard real-time system.
In contrast, a soft real-time operating system is one that has reduced constraints on "lateness," but still must operate quickly within fairly consistent time constraints. That is, it must be good enough to service events so that the response should be satisfied, on average. Windows NT meets this definition.
In summary, there are very few applications that require a true real-time operating system. Most applications only need an operating system that is "good enough" to meet certain requirements. In most situations, given all the features these applications require, Windows NT is probably the best system available for most of these tasks.
The common usage, loose-terminology definition of "real time" is used to refer to any system that modifies an industrial process based on measurements from that process. Many industrial processes such as job shop flow (controlling the flow of work through the factory) and inventory control are often lumped together into the "real time" category.
Feature | Benefit |
Real-time priority class | Application priority can be defined for high levels of responsiveness. |
Virtual page locking into RAM | Avoids paging while doing real-time work to improve predictability of response time. |
Deferred procedure call (DPC) | Limits work done at the interrupt level and results in improved interrupt latency. The interrupt schedules a DPC, but does not process it immediately. This permits other interrupts to proceed. |
Interrupt masks | Disables all interrupts at equal or lower priorities to ensure that the latency of interrupts at higher priorities are not interfered with. |
Sometimes, the term "real time" refers to a 4kb operating system that creates and schedules primitive tasks by absolute priority, responds immediately to hardware interrupts, and does very little else. This is a useful definition for simple applications that gather instrumentation signals or control industrial processes.
This description is still often appropriate. However, a personal computer can be equipped with sophisticated adapter boards with CPU and memory, instead of just data registers. Incoming data can be buffered on the physical adapter, which enables general-purpose operating systems to do what the special purpose real-time systems did previously. Several features make Windows NT an extremely good general-purpose operating system for real-time applications.
Within the Microsoft Windows® family, two initiatives focus on real-time activities. Within the financial services market, where financial analysts and traders often need to know about pricing and market movements almost instantaneously, Microsoft has put together an initiative referred to as OLE for Real-Time Market Data. This initiative is designed to provide a standard way for financial professionals to get access to real-time financial data.
For scientific and manufacturing users, Microsoft has been working with a number of applications and device manufacturers to define a set of Windows-based interfaces that data acquisition equipment can tie into easily. This effort is referred to as Windows for Science, Engineering and Manufacturing, or WINSEM.
Windows NT has been designed from the ground up to be a highly responsive, general-purpose operating system. To the real-time developer, this implies that there are some areas where Windows NT will not be suitable for real-time applications as a result of basic design choices made in its architecture. Topics of interest in real-time systems include:
Paradoxically, many design choices made within Windows NT actually result in a high level of responsiveness. This article will discuss these important real-time topics in terms of the capabilities offered by Windows NT. For reference, a section later in this article will provide a high-level discussion of the Windows NT architecture.
Real-time applications are designed to respond to external events within a specified time interval. Windows NT offers strong capabilities in the areas of both interrupt management and input/output (I/O) management.
Real-time applications use interrupts to ensure that external events are noticed by the operating system. It is critical that interrupts be handled promptly, according to their relative priority.
Within Windows NT, the kernel and the hardware abstraction layer (HAL) are tuned to optimize interrupt delivery and event dispatching. The kernel provides interrupt dispatching to the rest of the system. The kernel can operate at one of 32 possible interrupt levels, as shown in the table below. These levels help prioritize the tasks that must be accomplished before other, less time-critical work. The kernel reserves eight interrupt levels for its own use. The remaining 24 interrupt levels are mapped onto hardware interrupts with the HAL.
Interrupt Level(s) | Definition |
31 | Hardware error interrupt |
30 | Powerfail interrupt |
29 | Interprocessor interrupt |
28 | Clock interrupt |
12–27 | These levels map to the traditional interrupt levels 0–15 used in PCs |
4–11 | These levels are not generally used |
3 | Software debugger interrupt |
0-2 | Reserved for software-only interrupts to prioritize work within device drivers and executive components |
Windows NT handles interrupts on a preemptive basis. When an interrupt occurs, all execution at lower interrupt levels is suspended and execution begins immediately on the highest-level request. Processing continues until the highest-level process has been completed. This places a responsibility on device drivers in that system responsiveness is directly related to how quickly a device driver exits its interrupt routine.
In other words, Windows NT offers applications a multilevel interrupt mask. Higher priority interrupts can occur when the interrupt mask allows them to occur. Changing the interrupt mask raises the level so that lower-level interrupts cannot use system resources until the handling routing for the higher-level interrupt has been completed.
Windows NT is designed for multiprocessor systems. When an interrupt is dispatched, the kernel dispatches the interrupt to just one of the processors in the system. All other processors continue executing uninterrupted. Interrupts can be handled on any of the processors in a computer; this allows interrupts to be handled by idle processors, rather than concentrating the load on a single processor. For these reasons, multiprocessor systems can offer significant benefits for real-time applications.
Asynchronous I/O is a very powerful mechanism for user-level, real-time applications; the application can queue I/O and continue processing without having to either wait or respond immediately to some end-of-I/O event. Additionally, there are completion mechanisms in the I/O system (completion port I/O) that efficiently use the kernel synchronization and executive scheduling capabilities to distribute I/O completion processing to the most recently busy thread. This assures that cache is not invalidated and that the system makes efficient use of the processing power available. Asynchronous I/O not only results in enormous dividends for multiprocessor systems—it does not require any appreciable overhead on single-processor systems.
In many cases (such as in a Win32® application), asynchronous I/O may not be important and the application will wait for the I/O to complete before returning. However, when the user (or kernel component) wants to work while the asynchronous I/O is completing, that user can specify that he or she doesn't want to wait for the request to complete, and can continue working in the rest of the application. When the asynchronous I/O eventually completes, an event or some other notification mechanism will fire. The application can check for this completion event at a convenient time in the future.
Device drivers are very important to real-time users of Windows NT. In particular, processing in a device driver will proceed to completion without any interruptions, which is a feature that many real-time applications want. In order to get this kind of performance, however, the device driver code must be extremely solid. Windows NT device drivers run entirely within the system process and have access to all hardware through the HAL. A typical device driver will have several components, as described in the following table.
Component | Description |
Initialization routine | This routine initializes hardware and sets up data structures used by the driver at startup time. |
Interrupt service routine (ISR) | This routine handles an interrupt on the device that the device driver controls. |
Deferred processing call (DPC) | One or more DPCs handle non–time-critical processing for the driver. |
System thread | Some, but not all, drivers will have a system thread for very low- priority work. |
When a device driver starts, the initialization routine will typically make the driver known to the system, register some entry points, and register an ISR. The device driver will wait, consuming only memory resources, until an interrupt occurs that meets the criteria of the driver's ISR; the driver's ISR is then entered. The driver will not be interrupted until the end of its interrupt service routine, unless a higher level-interrupt occurs. Unlike other operating systems, an ISR on Windows NT can be interrupted by another ISR with higher priority; this is one reason why interrupt latency is hard to define for Windows NT.
When a driver is in its interrupt service routine, it should perform the minimum processing necessary to handle the interrupt, save the state necessary for processing the interrupt, queue a DPC routine for later processing that is not time-critical, and return. The DPC will occur at some later time—although it may occur immediately after leaving the interrupt service routine, if the system is not very busy. A DPC will run to the exclusion of all other processing (other than ISRs) until it exits. Most device driver processing is accomplished in this deferred processing routine or in even lower-priority routines queued by the DPC. Several important rules apply to DPCs. The most important rule is that a DPC cannot wait or lock up the system. Another important rule is that the DPC must have all the memory it accesses locked down in physical memory, so that it cannot incur page faults.
It should be possible, using the support routines and driver model provided by Windows NT, to write device drivers that handle even the most complex and high-speed data acquisition hardware.
Real-time applications, by definition, have a time component associated with their behavior. In this context, it is important to understand how Windows NT assigns priorities to applications and schedules their execution. This section will also discuss several other elements of the operating system and how they can affect real-time applications.
Within Windows NT, user applications are defined as processes. Windows NT is a preemptive, operating system that allows multiple processes (applications) to run at the same time. A process has a number of properties associated with it. For real-time applications, one of the most important properties is the priority class (such as real_time) that defines the basic priority at which the application will run. The priority model within Windows NT includes 32 priority levels, of which 16 are reserved for the operating system and real-time processes. Note that priority levels are different from the dispatch interrupt levels discussed in the kernel section. User applications almost always run at interrupt level 0, regardless of the priority level they are set to.
Each process maintains a private address space to ensure that it will not interfere with other processes, and each process has a base priority class. As shown in Figure 1 below, real-time applications can run with a base priority class of 31 (highest priority), 24, and 16. Typically, real-time processes will run at priority 24. Other applications (dynamic classes) have a base priority class of 15, 13, 9 (normal foreground process), 7, 4, 1, and 0.
Figure 1. Windows NT priority classes
Each process also has one or more threads associated with it, within the same address space, where each thread represents an independent portion of that process. The number of threads is limited only by the available memory and resources. The properties associated with each process, including the priority level, are inherited by these threads.
Each thread has a current priority derived from the process' priority class. By using an application programming interface (API) call that varies up or down from the process' base priority, you can vary the current priority up or down within defined limits. For example, a process running at real_time class 24 can have threads that run anywhere between classes 26-22, depending on their own independent priority. These threads will always stay within the real_time priority class.
Threads are independently scheduled by the executive. Associated with the process is a quantum (the maximum amount of time a thread can execute before the system checks to see if other threads with the same priority want to execute). In general, real-time processes will have priority over almost all other activities or system events. However, for processes in the spectrum of dynamic classes that are running at lower priority levels, a number of events within the system, such as I/O completion, could cause a temporary priority boost for a thread, giving it priority within a process.
Finally, there is a single system process, within which there can be multiple system threads running. This system process runs all device drivers, the kernel, the executive, and device drivers. All these components share a single address space, known as the system space. A device driver, executive component, or the kernel can create a new system thread at any time—these threads can be used to do work in the context of the system process. The technique of running a thread within the context of the system, where it has direct access through the HAL to device hardware, might be of interest to real-time engineers.
Memory management is another area in which many real-time engineers are interested. Windows NT is built around a virtual memory system. For real-time applications, Windows NT solves many of the problems that face real-time developers using more traditional virtual memory systems. First, paging I/O occurs at a lower priority level than the real-time priority process levels. Paging within the real-time process is still free to occur, but this really ensures that background virtual memory management won't interfere with processing at real-time priorities.
Second, Windows NT permits an application to lock itself into memory so that it is not affected by paging within its own process. This allows even very large processes (such as raster image processing, where some processes are over 100MB) to lock all their memory down into physical memory and avoid the overhead of paging, while allowing the rest of the system to function normally.
Finally, Windows NT memory management allows for memory mapping, which permits multiple processes—even device drivers and user applications—to share the same physical memory. This results in very fast data transfers between cooperating processes or between a driver and an application. Memory mapping can be used to dramatically enhance real-time performance.
Cache management is one of the drawbacks of using a general-purpose operating system such as Windows NT for real-time applications. Memory caching is a technique that uses a small amount of high-speed memory to hold the most recently used code or data. If the next instruction or piece of data is not in the cache, the CPU retrieves it from the slower main memory. Using a cache results in the best average system performance for an operating system, but it does introduce an element of unpredictable timing in real-time environments.
One of the most difficult tasks of real-time systems is ensuring that different threads and processes stay synchronized. That is, within a real-time application, the timing at which different activities occur is important. For example, if one part of the application completes before a second part gets the most current data, then the process that the application is monitoring may become unstable. Synchronization results from ensuring that application components are prioritized properly.
Most of the work in the kernel is performed at the highest software interrupt level (known as dispatch_level) or above. The kernel's job primarily consists of synchronizing execution on multiple processors, dispatching, and system database maintenance; it does very little work that is not a direct consequence of a request by a user or subsystem.
The kernel also has a rich set of dispatch objects. These objects synchronize execution within device drivers and Windows NT executive components. Included in this set of dispatch objects are various timers, events, mutexes, and semaphores. These objects can be used in a number of ways to synchronize execution as necessary within the Windows NT executive and kernel. They are also used by subsystems to implement the synchronization primitives exported to user applications.
With general-purpose operating systems that use virtual memory and caching algorithms, it is often difficult to ensure that events will take place within specified periods of time. Windows NT offers several timers that can be used to obtain more deterministic time intervals for managing events in real-time environments. These timers generate software interrupts from the kernel.
Windows NT Workstation version 3.5 makes it possible for applications to use the basic system timer with the GetTickCount( ) API. The resolution of this timer is 10 milliseconds. Several CPUs support a high-resolution counter that can be used to get very granular resolution. The Win32 QueryPerformanceCounter( ) API returns the resolution of a high-resolution performance counter. For Intel®-based CPUs, the resolution is about 0.8 microseconds. For MIPS®-based CPUs, the resolution is about twice the clock speed of the processor. You need to call QueryPerformanceFrequency( ) to get the frequency of the high-resolution performance counter.
Another method that ensures proper synchronization is a spinlock. A spinlock is a locking mechanism associated with a global data structure; this mechanism ensures that only one thread can get access to the data structure at any given time. When the first thread is done, it releases the spinlock so that other threads can get access to the data structure. Within Windows NT, spinlocks are often used by device drivers to ensure that device registers or other data structures can be accessed by only one device driver at a time. Real-time applications can use spinlocks to synchronize timing events during an interrupt response or another similar activity.
With real-time systems, it is important to understand how quickly the operating system can respond to external events. The more deterministic the operating system, the more suitable that system will be for real-time applications.
To process an interrupt, three steps are generally taken:
Frequently, in real-time environments, latency refers to the total time that it takes for these steps to occur—that is, the amount of time that it takes for the CPU to acknowledge and handle an interrupt.
In Design of a TCP/IP Router Using Windows NT, a paper delivered at the 1995 Digital Communications Design Conference, the ability for Windows NT to handle real-time activities was measured. (The paper's author, Brian Catlin, is a principal at Catlin & Associates, a systems analysis and programming firm in Redondo Beach, CA.)
The measurements described in Catlin's paper were taken to analyze the appropriateness of using Windows NT as a platform for a transmission control protocol/Internet protocol (TCP/IP) router. The system being measured was a Hewlett-Packard® XU 5/90 personal computer with one 90 MHz Pentium CPU, 256kb synchronous cache, 16 MB memory and 540 MB of disk space. The measurement test equipment included various Hewlett-Packard systems.
The paper concluded that Windows NT was appropriate for use as a real-time system. The basic measurementsreported in the paper are listed in the table below. The primary discrepancy in the overall duration of the event was attributed to the effects of virtual memory and, in particular, the cache manager.
Measurement | Duration |
Hardware interrupt latency | 1.8 – 2.9 microseconds |
Interrupt dispatching | 4.6 – 10.5 microseconds |
Interrupt service routine length | 10.3 – 16.7 microseconds |
Total elapsed time | 16.7 – 30.1 microseconds |
The Windows NT operating system is based on a design principle often referred to as a microkernel architecture. To ensure higher levels of performance, the design time for Windows NT modified the microkernel design by eliminating the clearly defined break between the kernel code and other system components. In most cases, this distinction is not significant and users get the benefit of a microkernel design with better performance.
The basic architecture of Windows NT uses a layered design composed of a number of components that are working in concert to provide the basic services that a user or application needs to accomplish its tasks. Working from the inside out, these components include the kernel, hardware abstraction layer, executive, device drivers, and environment subsystems. Figure 2 (below) displays an architectural diagram of Windows NT.
Figure 2. Windows NT system architecture
The kernel is the core of the Windows NT operating system. It is a dispatcher, not a scheduler, and is responsible for handling events within the system. Note that this process is very complex and it is difficult to characterize specific performance numbers for interrupt delivery time or dispatch timing. The Windows NT kernel is fairly lightweight—it performs a limited set of functions efficiently, thereby freeing CPU cycles for other processing. These functions would be of little interest to a traditional application program; for instance, the kernel neither knows nor cares about file I/O. Still, the primitives offered by the kernel are the foundation of the operating system services used by applications.
Windows NT is designed to run with a variety of CPUs and hardware platforms. This places a heavy burden on the kernel to efficiently reconcile kernel processing with the need to handle interrupts, cache management, memory handling, register assignments, and other hardware-specific implementations. To solve this portability issue, Windows NT isolates the kernel code from the hardware by using an isolation layer referred to as the hardware abstraction layer, or HAL. The HAL presents a "virtual machine" to the kernel, executive, and device drivers. It maps this virtual machine onto the underlying hardware in an efficient manner and eliminates the need for Windows NT to provide hardware specific kernels, which improves overall efficiency. The HAL presents a single mechanism for device driver and kernel components to use on any hardware platform.
As an example, in Intel®-based systems, device drivers use the INP and OUTP instructions to access the I/O port registers directly. Other CPU architectures use different mechanisms for accessing device I/O registers. Windows NT resolves this difference with macros (such as read_port_uchar and write_port_uchar) that take the appropriate action for each supported CPU.
The Windows NT executive provides services that allow the operating system to perform operations that may span various systems or managers and kernel operations. It provides services to user applications, subsystems, and some device drivers. The executive includes many systems and managers, each of which fulfills a particular function. In traditional mini-computer operating systems, these services are known as system services. In Windows NT, major systems include the cache manager, security monitor, object manager, network access, and other functions. The I/O management and process management routines are probably of most interest to real-time engineers.
The I/O system has a rich set of I/O operations that are exported to other components. These operations range all the way from traditional, synchronous buffered read-and-write operations up to high-speed, buffer-mapped asynchronous I/O operations. Except for creation of the I/O channel itself, all internal I/O operations in Windows NT are asynchronous—that is, they are not expected to be complete upon return from the I/O operation request. This is a major break from mainstream operating systems, and is a major reason for the responsiveness of Windows NT to user inputs and interrupts.
To summarize several important concepts mentioned earlier in this article (in the "Process priority" section), Windows NT lets multiple processes (user applications) run within the system at the same time. Each process is associated with one or more threads, where each thread represents an independent portion of that process. Threads are independently scheduled by the executive based on their priority. A single system process runs all device drivers, the kernel, the executive, and device drivers—components that share a single address, or "system" space. A thread can be run within the context of the system process, where it has direct access through the HAL to device hardware.
Device drivers in Windows NT are kernel-level code that extend the capabilities of the kernel and executive to control new devices or, in some cases such as file systems, to provide new services to user-level components. Generally, these services are very constrained and consist of nothing more than the ability to use a new device of an already supported kind. However, device drivers for totally new types of devices can be written and integrated into the system. Windows NT also has a rich set of direct memory access (DMA) handling support routines. These routines support conventional DMA, as well as scatter-gather DMA.
Subsystems in Window NT are used to provide user environments for applications. Examples of Windows NT subsystems are the Windows 32 subsystem and the POSIX subsystem. Each of these subsystems provides a set of primitives specific to its user environment. The primitives are built upon the primitives provided by the Windows NT executive and kernel.
Q. What kind of interrupt latency does Windows NT provide?
A. Interrupt latency is a function of the drivers that are running and what they do at interrupt time. The Windows NT design philosophy encourages drivers to do as little work as possible. The kernel architecture generally lets a driver do a very small amount of work at interrupt time and use a deferred procedure call (DPC) for driver-related processing at a lower interrupt request level (IRQL). This design allows for very short interrupt service routines and results in reduced interrupt latency.
Q. Can the kernel respond to an interrupt while actually executing kernel code?
A. Yes, although there are some spots where the IRQL level is raised or where interrupts are disabled. In general, the kernel is not preemptible, but interrupts can and do happen during execution of the kernel. The rules for interrupts are very similar to other systems. Windows NT has a multilevel interrupt mask. Higher-priority interrupts can occur when the mask allows them to occur. Changing the mask raises the level where lower-level interrupts cannot occur.
Q. Does Windows NT have a fixed priority scheduler?
A. Windows NT supports 32 priority levels that can be assigned to particular threads. Sixteen of these priority levels are variable; that is, the scheduler adjusts your priority up and down as events occur. Sixteen levels are fixed priority, or "real-time," which means that the operating system cannot adjust the priority level. The Win32 API allows access to both the variable and fixed priority classes and to most of the 16 priorities in each level.
Q. Does Windows NT support asynchronous I/O?
A. Yes. Windows NT provides three different completion notification mechanisms, which allows for a very flexible approach to using asynchronous I/O.
Q. Does Windows NT wait for a clock tick before it responds to an interrupt?
A. Assuming that interrupts are not disabled or masked at that level, the system handles the interrupt immediately.
Q. Does Windows NT support a real-time or contiguous file system?
A. With file systems, the bulk of the performance cost occurs when a file needs to grow. With NTFS, Windows NT can preallocate space to files, although it is not strictly contiguous. The RAW file system provides unstructured storage on an unformatted disk volume, so that the application itself can control the contents of the file system.
Q. Does Windows NT support memory locking?
A. Windows NT allows an application to control the size of its working set. An application can lock memory into its working set.
Q. Does Windows NT support direct I/O and memory access?
A. In general, Windows NT does not allow an application direct access to I/O ports. If this is needed, a driver needs to be written.
Catlin, Brian. Design of a TCP/IP Router Using Windows NT. Presented at the 1995 Digital Communications Design Conference. Redondo Beach, CA: Catlin & Associates.
Custer, Helen. Inside Windows NT. Redmond, WA: Microsoft Press, 1993.