An Introduction to Windows NT™ Memory Management Fundamentals

Paul Yao

Paul Yao is president of International Systems Design, a firm specializing in product development and programmer training for Windows. He is the creator of ISD's Windows Power Programming Workshop.

The Microsoft® Windows NT™ operating system looks just like its 16-bit cousin, the Windows™ operating system version 3.1. Almost every part of the 16-bit API has a counterpart in the 32-bit API of the Win32™ subsystem. Microsoft has worked hard to eliminate differences between the (skinny) 16-bit API and the (hefty) 32-bit API. But this hasn’t prevented Microsoft from building capabilities into Windows NT that far surpass those in Windows1 3.x. Most of these capabilities are served up to the developer via a large number of new APIs only available in Win32, especially in the area of memory management. These include functions for memory-mapped files, better heap support, virtual address management, and thread local storage (see Figure 1).

Figure 1 Windows NT Memory Management Components

Improved performance is one effect of the new memory features. For example, everything read from disk into RAM is automatically cached by the Virtual Memory Manager (VMM). The clearest evidence of this occurs when you run a program twice. When first run, a program’s executable image must be loaded from disk. The second time, the VMM finds the image already present and uses it without hitting the disk. Of course, this assumes the executable image hasn’t been squeezed out of memory by a memory-hungry application. This built-in caching automatically benefits every disk access, without the need for a dedicated disk caching mechanism. Since this caching is integrated into the Windows NT memory manager, it automatically relinquishes memory when the available page pool is below tolerable limits.

Because memory support relies on the capabilities of the processor, I’ll start by discussing two of the processors that Windows NT currently runs on: the 32-bit Intel® x86 processor family and the MIPS R4000 processor. Although both processors have built-in support for paged virtual memory, the support in each is quite different. I’ll briefly review the way the two chips work before introducing the Windows NT Executive’s VMM.

Intel x86 Virtual Memory Support

If you’re an experienced PC programmer, you’re familiar with the segmented addressing of the Intel x86 family of processors. When paging support is turned on, segmented addressing is still used, though in a somewhat stunted form. The chief benefit of mixing a segmented and paged addressing scheme is compatibility. Specifically, this feature simplifies supporting MS-DOS® and Win16 applications in Windows NT.

As you probably know, segmented addressing means that memory locations are referenced with a 16-bit segment identifier and a 16-bit offset. The real mode address translation mechanism shifts the segment identifier left by four bits to produce a 20-bit base address. A location is referenced by adding the 16-bit offset value to the 20-bit base address, which yields a 20-bit memory address.
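
To make this arithmetic concrete, here is a tiny C sketch (purely illustrative; the helper function is made up for this article) that performs the real mode translation:

    /* Real mode translation: shift the 16-bit segment left four bits
       and add the 16-bit offset to form a 20-bit physical address. */
    unsigned long RealModeAddress(unsigned short segment, unsigned short offset)
    {
        return ((unsigned long)segment << 4) + (unsigned long)offset;
    }

    /* For example, RealModeAddress(0xB800, 0x0010) yields 0xB8010. */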

This approach has a serious limitation: since applications know about physical addresses, operating systems have difficulty managing system memory. Applications expect that object addresses won’t spuriously change. But an operating system must move objects—and therefore change their addresses—to manage system memory. In single-tasking MS-DOS, applications manage system memory. In the multitasking world of Windows, applications can’t be counted on to do this.

To work around the otherwise unmanageable situation created by real mode, Microsoft built a handle-based scheme into Windows for managing memory. Handles are issued when memory is allocated. To access memory, a lock is applied to produce a pointer. When applications aren’t using memory, they remove all locks to let Windows move memory. With Windows 3.1, real mode disappears, and so does the need to keep memory unlocked when it isn’t being used. Win16 and Win32 allocation routines still reflect a handle-based memory scheme.
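
As a refresher, here is a minimal sketch of the handle-based pattern using the familiar GlobalAlloc family. The function name and buffer size are made-up examples, and error handling is kept to the bare minimum:

    #include <windows.h>

    /* Allocate a movable block, lock it to get a pointer, use the
       memory, then unlock it so the memory manager is free to move
       the block again. */
    void UseMovableBlock(void)
    {
        HGLOBAL hMem = GlobalAlloc(GMEM_MOVEABLE, 1024);
        if (hMem != NULL)
        {
            char *p = (char *)GlobalLock(hMem);   /* handle becomes a pointer */
            if (p != NULL)
            {
                lstrcpy(p, "data in a movable block");
                GlobalUnlock(hMem);               /* drop the lock when done  */
            }
            GlobalFree(hMem);
        }
    }

The same calls compile unchanged as Win32 source, which is why the 32-bit allocation routines still carry the flavor of the handle-based scheme.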

The protected mode addressing mechanism of Intel 80286 and higher processors hides the physical address of segments using descriptor tables. Instead of identifying a segment’s physical location, a protected mode segment identifier (also known as a segment selector) contains an index into one of two memory lookup tables: the local descriptor table (LDT) or the global descriptor table (GDT). These tables give operating system builders flexibility in the implementation of memory systems. For example, an operating system might give each process a private LDT and use the GDT for shared memory. Or a single LDT could be used to create a shared global heap, which is how Windows 3.x manages its memory.

Each descriptor table contains several segment attributes, including the segment’s physical address, its size, whether it’s present in the system, and a segment protection value. Each segment is assigned a privilege level (0 to 3) so that an operating system can protect itself from applications. Another part of the protection mechanism prevents access to undefined segments and access beyond the end of segments.

In protected mode, 386 and higher CPUs can support paged virtual addressing. What’s interesting about this feature is that both a segmented and a paged addressing scheme can be operating simultaneously. How is this possible? The segmented address scheme describes the application’s view of memory. Applications access memory using a segment selector (16-bit) and an offset (16-bit or 32-bit). The CPU’s address translation hardware retrieves the descriptor table data and creates a 32-bit linear address. The resulting 32-bit address does not reference physical memory, but instead is used by the paging hardware as a 32-bit virtual memory address.

Although segmentation cannot be disabled for a Windows NT process, an application written to the Win32 API can ignore the segment registers altogether, much as a program compiled with the small memory model ignores segments and simply uses 16-bit offsets; a Win32-based application instead addresses memory using 32-bit linear addresses. It’s possible to define segment selectors capable of addressing up to 4GB (see Figure 2), and Windows NT defines two such selectors to reference the entire address space. At least two are required, since the Intel processors allow selectors to reference either code or data but not both. When programming to the Win32 API, you work with a flat address space, free from the worries of near and far pointers.

Figure 2 386/486 Flat-model Addressing

Figure 3 depicts the virtual-to-physical address translation of 386 and higher CPUs. These processors use a two-level paging structure to support virtual addressing. The CPU views any 32-bit virtual address as comprising three parts: a ten-bit index into a page directory, a ten-bit index into a page table, and a twelve-bit offset to the desired code or data. In Windows NT, each process has its own page directory and page table(s).

Figure 3 386/486 Linear to Physical Address Translation

A special register in the CPU points to the current process’s page directory page. When addressing memory, the top ten bits of the 32-bit address serve as an offset into the page directory. Using this value, the addressing hardware retrieves a 32-bit value from the page directory. This value identifies the location of a page table. The next ten bits (the middle of the 32-bit address) serve as an offset into this page table. This location contains the 32-bit address of a physical page of memory. The virtual address’ low twelve bits describe an offset within this page to the specific byte(s) of interest. If this sounds overly complex, rest assured that the processor caches recently used page directory and page table entries. This lowers the overhead required to retrieve intermediate addressing information.
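
If you’d like to see the split for yourself, the following C fragment (illustrative only; the paging hardware does this work for you) pulls the three fields out of a 32-bit address:

    #include <stdio.h>

    /* Break a 32-bit linear address into the fields used by 386/486
       paging: a 10-bit page directory index, a 10-bit page table
       index, and a 12-bit offset within the 4KB page. */
    void ShowAddressFields(unsigned long linear)
    {
        unsigned long dirIndex   = (linear >> 22) & 0x3FF;   /* top 10 bits    */
        unsigned long tableIndex = (linear >> 12) & 0x3FF;   /* middle 10 bits */
        unsigned long byteOffset = linear & 0xFFF;           /* low 12 bits    */

        printf("directory=%lu  table=%lu  offset=%lu\n",
               dirIndex, tableIndex, byteOffset);
    }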

The paging architecture of the Intel x86 processors has some nice features. Page directories and page tables each fit nicely into a 4KB page, with room for 1024 entries in each case. This means that a single page table can reference 1024 × 4096 bytes, or 4MB, of data. A single page directory, in turn, can reference 1024 × 4MB, or 4GB, of data. Both page tables and page directories can themselves be paged to disk to free up physical memory when it’s needed to allow other processes to run.

The design of the paging support in Windows NT is modeled after the built-in paging of the x86 processors for a number of reasons. It simplified the implementation of Windows NT on these processors and it’s similar to that used by quite a few other 32-bit microprocessors, including the Sun® SPARC™, Motorola® 68K, DEC® VAX®, and Intel i860. Since Microsoft is porting Windows NT to other processors, the Windows NT design team tried to avoid major incompatibilities with the more common processors. Also, this paging scheme is quite efficient in terms of lookup time, memory use, and the amount of code required to handle the tables.

The MIPS processor takes a different approach to providing virtual memory support, which is worth learning about to gain a broader perspective on some of the messiness that a portable API—like Win32—hides from you.

MIPS R4000 Virtual Memory

The R4000 is a RISC chip created by MIPS Computer Systems. The R4000 offers high performance and considerable flexibility to operating system developers.

Some examples of the R4000’s flexibility include its memory access modes and its configurable byte-ordering logic. The R4000 understands both byte-ordering schemes commonly used by today’s hardware, nicknamed “Little-Endian” and “Big-Endian” ordering; the names refer to whether the least significant or the most significant byte of a multibyte value comes first. As you may know, Intel processors and the DEC VAX put the least significant byte first (Little-Endian). Motorola processors and the IBM® 370 put the most significant byte first (Big-Endian). The R4000 can do either, and is used by Windows NT in Little-Endian mode.

Although the R4000 is natively a 64-bit processor, it can be set to run as a 32-bit processor. Windows NT uses the 32-bit mode, but still benefits from improved performance due to the processor’s internal 64-bit data paths. Both modes access physical RAM with a 36-bit address. This means the R4000 can access up to 64GB of physical RAM.

In both operating modes, the R4000 lets an operating system choose the size of a memory page. This contrasts sharply with the Intel x86 processors, which use a hardwired 4KB page. Possible page sizes in the R4000 include: 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, and 16MB. Figure 4 shows how the R4000 processor divides a 32-bit virtual address for some of these different page sizes.

Figure 4 The R4000 divides 32-bit virtual addresses for different page sizes.

When running on an R4000, Windows NT sets the page size to 4KB. While there is no hardware support for the page-directory and page-table structures of the Intel x86 processors, the Windows NT VMM takes the 20-bit virtual page number value and parses it into two 10-bit values. It then addresses physical pages using the same hierarchical page table structure as in the Intel implementation.

Fortunately, you won’t have to spend too much time worrying about these low-level details. They’re all handled for you by the operating system software. Your 32-bit Windows-based applications should work equally well on both.

Setting aside differences in implementation, the role of paging hardware can be simply stated: it gives software access to memory. It provides the low-level connection between a virtual address and a physical address. Giving a virtual address physical significance is strictly the role of the operating system.

The VMM

The software responsible for managing the paging process resides in the Windows NT Executive—the VMM. The primary goal of the VMM is to make physical memory available to processes. When the demand for physical memory exceeds the supply, the VMM writes one or more pages to disk. When these pages are needed again, the memory management hardware notices that a desired page is missing and issues a page fault.

The VMM can respond to a page fault in several ways. Pages get read in from a swap file on disk or from a memory-mapped file. A page fault may also result in the allocation of zero-filled pages, or in RAM-resident standby or modified pages getting returned to their original owner. As you can see, the response to a page fault can be more involved than simply fetching pages from a swap file. In a moment, I’ll describe some of these other responses. First I need to discuss a key memory management entity: the process.

In Windows NT, a process is the unit of ownership. The most important thing that a process owns is a private address space. From the Executive’s point of view, a process also owns objects like file objects, thread objects, security objects, timer objects, and so on. To the Win32 subsystem, a process owns GUI objects like windows, menus, and bitmaps. When a process terminates, the Executive and the Win32 subsystem purge the process’s leftover—that is, undeleted—objects.

Every Windows NT process gets a private address space. In hardware memory management terms, that means that each process has its own page directory, its own page table(s), and its own set of pages filled with code or data. 32-bit addresses allow each Windows NT process access to a 4GB address space. Figure 5 shows how the lower 2GB is reserved for applications, while the upper 2GB portion is reserved for the Executive. In addition, two 64KB regions are hardwired as “no man’s land” at the top and bottom of the application address space to help catch stray null and out-of-bounds pointers.

Figure 5 Process's Virtual Address Space

As you may know, Windows NT will be available in multiprocessor configurations. For example, during his Spring 1992 Windows World keynote speech, Bill Gates demonstrated a multiprocessor system from NCR in which eight 50MHz 486 processors were simultaneously used by Windows NT. On such a system, a process can have a different thread running on each processor to allow work to be performed concurrently. From a memory management point of view, such systems require that all memory be visible to each individual processor. Also, there has to be a mechanism for synchronizing the page caches in different processors.

An operating system’s design goals are sometimes referred to as its policy. The Windows NT memory manager’s policy is to spread out CPU utilization over time. One way this is achieved is by allocating physical pages only when they are accessed, not when they are requested. The result is that the performance penalty of allocating 5 bytes of memory isn’t very different from that of allocating 5MB. Asynchronous threads work during system idle time to amortize the performance cost of freeing, swapping, and paging memory. Each of these is part of the Executive’s paging policy.

Paging Policy

There are two aspects to the Windows NT paging policy, which I think of as “creative procrastination” and “saving for a rainy day.” The procrastination aspect has to do with allocation. Windows NT uses demand-driven paging to put off work until just before needed. Saving for a rainy day sort of describes how background threads manage the systemwide page pool, attempting to achieve a reasonable balance between free and used memory pages.

The allocation process can be described as creative procrastination since the VMM postpones much of the work until a page of memory is actually accessed. This contrasts sharply with the way MS-DOS and Windows allocate memory. In these environments, you are either provided with physical memory or you aren’t. Even Windows enhanced mode, with its virtual memory support, doesn’t play any games but rather gives you the memory you ask for at the moment you ask for it.

Windows NT, on the other hand, doesn’t respond to an allocation request by allocating pages. Instead, it keeps track of allocation requests in a set of data structures called virtual address descriptors (VADs). The system allocates a fairly small data structure, a VAD of 8 DWORDs (32 bytes), for each allocation request, which can be as small as 4KB (1 page) or as large as 4GB (1024K pages). When memory is actually accessed, physical pages get assigned and page table entries are created. The VMM only allocates physical pages for the actual address that caused the page fault, plus (since most memory accesses are grouped quite closely together) a few pages for neighboring virtual page addresses. This allocation scheme results in fast allocation, even when an application allocates large, sparsely accessed data objects.

The VMM saves for a rainy day by stealing pages away from processes. It does this slowly, in a way that minimizes the impact of the stolen pages on currently executing processes. Stolen pages provide a reserve in case a new process starts up. It also ensures that a sudden demand for pages—a run on the bank—doesn’t cause a systemwide performance hit. Also, the memory requirements for most processes are greatest during startup. The page-stealing mechanism lets the system take this behavior into account. The portion of the VMM that steals pages is called the working set manager.

Another job of the working set manager is to ensure that physical pages are distributed fairly in the system. It does this by enforcing paging on a per-process basis. In other words, if a process forces pages to be swapped, the VMM will swap pages from that process itself rather than from another process. Other operating systems enforce paging against the entire system, but such an approach risks giving pages to greedy processes and starving other processes. Per-process paging helps ensure, for example, that when a user switches between programs A and B, perhaps by clicking the mouse in window B, the pages required for program B will already be present in physical memory, so no access to the swap file is necessary.

There is a risk, of course, that paging on a per-process basis will allow dormant processes to occupy physical pages that would otherwise be available to more industrious processes. There are two solutions: smart users and a smart memory manager. When performance is an issue, smart users close down unneeded applications. But not all users are smart (and not all processes can be shut down by the user), so the working set manager tries to be smart about balancing the working set in each process.

The working set of each process is slowly adjusted based on the processor time used by the process. Hard-working processes get rewarded with more access to physical memory. Less active processes are given vacation time in the swap file. Adjustments to the working set don’t occur quickly, however, but appear over a matter of hours.

Running on a background thread, the working set manager maintains two types of lists: a working set list for each process and lists describing the available memory in the system. These lists help the memory manager balance the requirements of individual processes against the usage patterns of particular users.

Each process has a working set list, which is constantly updated to identify the least recently used pages. These pages get swapped out first. Physical pages are allocated to each process on a quota system. When a process asks for more pages, a check is made against the quota. If the process exceeds the quota, the new pages are swapped for old pages. As I mentioned earlier, however, industrious processes get larger quotas over time while less active processes get less physical memory.

The working set manager also maintains four physical page frame lists to spread out the cost of allocating and freeing physical page frames (see Figure 6). The lists contain pages that are modified, on standby, free, or zeroed.

Figure 6 Page Frame Lists

List Name Description

Modified "Stolen" pages that are still in RAM and have not been written to the swap file. The corresponding entry in the page table is set to invalid.
Standby "Stolen" pages that have been written to the swap file, but are still in RAM.
Free Unowned pages whose previous contents have not yet been cleared.
Zeroed Unowned pages that have been filled with zeros.

Over time, the working set manager steals the least recently used physical pages from processes and adds them to the modified list. The corresponding page table entry is marked as invalid. If a process accesses such a page, a page fault occurs. The memory manager removes the page from the modified list and restores it to the process’s page table. Since the page is already present in memory, no swap file access is required. These page faults are handled quite quickly.

The working set manager has a helper, the modified page writer. As the modified list grows, the modified page writer starts copying pages to the swap file. It transfers these pages from the modified list to the standby list. Like pages on the modified list, pages on the standby list are returned to the owning process if a thread in the process accesses the page.

When a process frees memory, or when the system reclaims the physical pages it is using, the associated pages are added to the free list. This represents memory that is available to be reused, but whose contents haven’t been erased. To prevent illegal access to data, memory is cleared before it can be released for use by another process. Adding pages to the free list is fast. By postponing the initialization process, the system avoids a performance hit when a process frees memory. Pages on the free list are eventually cleared with zeros and placed on the zeroed list.

When a process needs physical memory, the VMM first takes pages off the zeroed list. Once this supply is exhausted, it takes pages from the free list, after initializing the memory to zeros. When both the zeroed list and free list are empty, the memory manager takes pages from the standby list. Only as a last resort does the VMM take pages from the modified list. Since the modified list contains unwritten process data, the contents must first be written to the swap file, then the memory zero-initialized. Only then can it be passed on to the requester.

In a low memory situation, the working set manager pays less attention to processor use and more attention to maintaining a minimum set of available pages. For example, when the zeroed, free, and standby lists together have fewer than 20 pages, the working set manager steps in to grab least recently used pages from processes to refill these lists. Or if the modified list has more than 30 or so pages, the modified page writer is called to save pages to the swap file and fill up the standby list.

You can see that the VMM in the Windows NT Executive balances application demands with the need to maintain a reserve of system resources. It procrastinates in the allocation process to avoid unnecessary work, but then borrows back physical pages in a slow, reversible manner to minimize the performance hit that might otherwise occur because of its procrastination.

Although the Executive underlies the entire operating system, application programs don’t call the Executive directly. Instead, they call subsystem processes. The division of the system into a privileged Executive and nonprivileged subsystems allows a single operating system to support multiple programming interfaces. It also makes the overall system more stable, because the API subsystems are not given privileged access but rather run as user-mode processes.

Memory and the Win32 Subsystem

The Win32 subsystem is the portion of Windows NT that directly supports the 32-bit Windows API and indirectly supports the 16-bit Windows API of Windows 3.x. Architecturally, Windows NT subsystems run as separate processes, each with its own address space. In fact, part of the Win32 subsystem support is provided by DLLs that reside in the application’s address space. To provide a complete picture of the memory management that goes on in Windows NT, I’m going to delve a bit into the way that a subsystem interacts with a client process.

The 32-bit Windows API is full of old friends. Since it’s a superset of the Win16 API, programmers who’ve worked with Windows 3.x will find familiar function names, function parameters, data structures, and symbolic constants. Although 16-bit parameters get widened to 32 bits, this difference is hidden by include file definitions.

The most significant difference between the two APIs is the new set of functions that provide functionality never included in Windows 3.x. Quite a few of these new capabilities, such as support for multithreaded programming, memory-mapped file I/O, semaphores, and security objects, are supplied by the Executive. Other new APIs reflect enhancements to the 32-bit versions of USER and GDI.

Figure 7 shows the relationship between the address space of a Win32-based application and the Win32 subsystem. The application program and the subsystem run as separate processes. This means that each has its own address space. The subsystem is protected from a misbehaved application in the same way that the Executive is protected from a misbehaved subsystem.

Figure 7 Client/server Relationship of Win32 DLLs

When a Windows NT process is created, an executable file is mapped into its address space. The memory-mapped file mechanism in the Executive ties the address space of the executable image into the VMM. When a memory-mapped file is opened, a virtual address descriptor is created but no bytes are actually read from disk into RAM. As with dynamically allocated memory, physical pages are allocated only when an attempt is made to access a particular location. This causes the hardware to issue a page fault. The VMM responds by reading in a few pages of the memory-mapped file—the page containing the offending bytes and a few neighboring pages.

In Figure 7, the Win32-based application’s executable file is named APP.EXE. The Win32 subsystem (at least at the time of writing) lives in the file CSRSS.EXE. This program’s primary role is to start the Win32-based process, then to pass control to the three DLLs that control the subsystem.

In a Windows NT process, DLLs are mapped into the upper half of the process’s address space. On the application or client side, there are three key DLLs: BASE.DLL, GDI.DLL, and USER.DLL. These represent core operating system services, graphics output, and the user interface, respectively. On the subsystem or server side, three DLLs receive and process requests sent from the client: BASESRV.DLL, GDISRV.DLL, and USERSRV.DLL.

These three DLLs correspond quite closely to the three core DLLs of Windows 3.x. An important difference is that the Executive, running with supervisory privileges, provides the services associated with the Windows 3.x kernel. The two DLLs in the client and server processes, BASE.DLL and BASESRV.DLL (which correspond to KERNEL.DLL), in many cases provide only a thin layer between the client application and the Executive.

The client/server model depicted in Figure 7 suggests a very clean division of labor: the Win32-based application asks for services, which the Win32-based subsystem supplies, perhaps with a little help from the Executive. In fact, the implementation details are quite a bit more complex. The reason? Performance.

As I have mentioned, one benefit of placing a subsystem in a separate process is security. But it takes time to perform a context switch between two processes. Taking this time for each system service call would make the system too slow. Instead, the interface has been fine-tuned to eliminate context switches.

In some cases, API calls don’t pass control to the Win32 subsystem but instead run in the application’s context. This is the case when the Executive provides a service such as opening a file, starting a thread, or setting a semaphore.

In other cases, calls get queued by the client DLLs and executed in a batch. These calls are stored in a 64KB memory area called a client-server stack. One client-server stack is allocated for each client thread that makes USER or GDI calls. On the server side, a dedicated thread is allocated for each stack to manage the server side of the queue. One type of call that gets queued is the kind that changes the state of a display context. Since such changes don’t appear until a drawing call is made, changes to DC attributes are always queued. At the time of a GDI drawing call, only the specific drawing attributes required by the call are sent to the server.

A key issue for programmers is the allocation and cleanup of GDI and USER objects. As you may know, in Windows 3.x, quite a few GDI and USER objects are allocated from the GDI or USER local heaps. In Windows 3.x, the limited size of these heaps is often the factor that prevents additional programs from running. So how is this problem addressed in Windows NT?

Windows NT does a much better job than Windows 3.x of using available memory. To a great extent, this simply reflects the fact that local heaps in Windows NT can be larger than 64KB. After all, GDI and USER allocate heap objects with the same routine that normal applications do: LocalAlloc.

In Windows NT, GDI and USER are also better at cleaning up objects that exiting applications leave behind. Whenever USER objects like windows are created, the associated data is tagged with the process ID. The same is true for GDI objects like bitmaps and pens, although GDI objects themselves are kept in a separate list from USER objects.

Even though a subsystem is just another process, it does have a few privileges. One involves the right to be told when an associated process gets terminated. At such times, the USER and GDI portions of the Win32 subsystem walk their object lists and free every single object owned by recently deceased processes.

One big difference in the Win32 API is that the GMEM_DDESHARE flag has been removed from the memory allocation routines. A problem with this flag is that a given object is supposed to be mapped to the same address for every process. This is easily done when all applications share a common address space, as in Windows 3.x. But when each application has a private address space, there is no way to guarantee that a GMEM_DDESHARE block can be mapped at the same address in every process in the system. This limitation caused the Windows NT development team to ditch the flag. In its place, applications have other mechanisms for sharing a single block of memory between two processes, including named and unnamed shared memory and memory-mapped files.
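
As an illustration, a named, pagefile-backed file mapping can take over the role once played by GMEM_DDESHARE. The sketch below is a minimal example; the mapping name and the 4KB size are arbitrary choices made up for this article:

    #include <windows.h>

    /* Create (or open) a 4KB block of shared memory that other
       processes can map by name. The block is backed by the paging
       file rather than a file on disk. */
    LPVOID CreateSharedBlock(void)
    {
        HANDLE hMap = CreateFileMapping((HANDLE)0xFFFFFFFF,  /* use the paging file */
                                        NULL,                /* default security    */
                                        PAGE_READWRITE,
                                        0, 4096,             /* 4KB maximum size    */
                                        "SHARED_EXAMPLE");
        if (hMap == NULL)
            return NULL;

        /* Each process maps its own view. The views may land at
           different virtual addresses, so don't store absolute
           pointers inside the shared block. */
        return MapViewOfFile(hMap, FILE_MAP_WRITE, 0, 0, 0);
    }

A second process gains access by calling OpenFileMapping (or the same CreateFileMapping call) with the same name and then mapping its own view.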

Win32 Application Memory

Let’s consider the memory management issues a Win32-based application must face. Figure 8 compares the memory available in the Win16 and the Win32 APIs. In a Win32-based program, the stack and static data correspond to local and global variables. There isn’t anything special in either Win32 or Win16 to handle these variables, since the compiler and linker work together to make them available. Any discussion of an application’s available memory would be incomplete without giving at least a passing reference to these two types of memory.

Figure 8 Types of Memory in Win16 and Win32

Type of Memory        Win16 API              Win32 API

Stack                 Automatic variables    Automatic variables
Static                Static variables       Static variables
Local heap            LocalAlloc             LocalAlloc
Global heap           GlobalAlloc            GlobalAlloc
Private heaps         (none)                 HeapCreate / HeapAlloc
Virtual memory        (none)                 VirtualAlloc
Window extra bytes    GetWindowWord          GetWindowWord
Class extra bytes     GetClassWord           GetClassWord
Memory-mapped files   (none)                 CreateFileMapping / MapViewOfFile

Both Win16 and Win32 supply local and global heap allocation routines: LocalAlloc plus related functions and GlobalAlloc plus related functions. To write code that can compile for either environment, you should stick to these routines.
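
A minimal example of that portable style, using nothing beyond LocalAlloc and LocalFree, might look like the following; the function name and buffer size are made up for illustration:

    #include <windows.h>

    /* A fixed, zero-filled allocation from the local heap. The same
       source compiles as either a Win16 or a Win32 program. */
    void PortableAllocation(void)
    {
        char *p = (char *)LocalAlloc(LMEM_FIXED | LMEM_ZEROINIT, 256);
        if (p != NULL)
        {
            lstrcpy(p, "allocated from the local heap");
            LocalFree((HLOCAL)p);
        }
    }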

Win32 introduces the ability to create multiple private heaps. In Win16, without some fancy footwork, you’re limited to one heap per application or DLL. With Win32, you create as many heaps as you need.
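
Here is a brief sketch of a private heap in action; the sizes are arbitrary example values and error handling is trimmed:

    #include <windows.h>

    /* Create a private, growable heap, allocate a block from it, and
       finally destroy the whole heap, which frees every block in a
       single step. */
    void PrivateHeapExample(void)
    {
        HANDLE hHeap = HeapCreate(0,      /* no special options        */
                                  4096,   /* initial size: one page    */
                                  0);     /* maximum size 0 = growable */
        if (hHeap != NULL)
        {
            char *p = (char *)HeapAlloc(hHeap, 0, 100);
            if (p != NULL)
            {
                /* ... use the 100-byte block ... */
                HeapFree(hHeap, 0, p);
            }
            HeapDestroy(hHeap);
        }
    }

One pleasant side effect of a private heap is that HeapDestroy discards everything allocated from it, so cleaning up a related group of allocations is a one-call affair.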

Another new feature in the Win32 API is direct control of virtual memory allocation. By calling VirtualAlloc and its related functions, you get control over each step of the allocation process. For example, this function makes a distinction between reserving a range of virtual addresses (with the MEM_RESERVE flag) and committing space in the swap file (with the MEM_COMMIT flag).

When you reserve a range of virtual addresses by calling VirtualAlloc with the MEM_RESERVE flag, a VAD (virtual address descriptor) is created. As you may recall, this data structure describes an allocation request to the VMM. No corresponding pages, page table entries, or swap file space is allocated. Instead, it reserves a contiguous set of addresses. If you attempt to access a location in a range of memory that has been reserved but not committed, an exception error results.

After a range of addresses has been reserved, you can commit as many or as few pages as you want by calling VirtualAlloc again. This time, however, the MEM_COMMIT flag tells the function that you wish to make memory accessible. This call forces the VMM to update the state of the virtual address descriptors to allow memory in the specified range to be accessible. It also makes sure that there is enough room in the swap file to handle the pages you’ve requested. In keeping with the procrastinating style of the VMM, no physical pages are actually allocated until the specified memory is accessed.
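
The following sketch shows the two-step reserve-then-commit sequence; the 1MB reservation and 16KB commitment are arbitrary example sizes:

    #include <windows.h>

    /* Reserve a contiguous 1MB range of addresses, then commit only
       the first 16KB. Physical pages still aren't allocated until the
       committed memory is actually touched. */
    void ReserveThenCommit(void)
    {
        char *pBase = (char *)VirtualAlloc(NULL, 1024 * 1024,
                                           MEM_RESERVE, PAGE_NOACCESS);
        if (pBase == NULL)
            return;

        if (VirtualAlloc(pBase, 16 * 1024, MEM_COMMIT, PAGE_READWRITE) != NULL)
        {
            pBase[0] = 1;   /* the first access causes the page fault
                               that finally brings in a physical page */
        }

        VirtualFree(pBase, 0, MEM_RELEASE);   /* give back the whole range */
    }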

The Win32 API supports the same types of window and class extra bytes as in Win16. One word of caution: to write portable code, avoid hardcoding the sizes of data types when you allocate window and class extra bytes. Instead, use your C compiler’s sizeof operator to adjust for the changes between the 16- and 32-bit APIs.
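
For example, a window class that stores one 32-bit value per window might be registered as shown below. The class name and the use of extra-byte offset zero are made-up details for this sketch:

    #include <windows.h>

    /* Let sizeof size the window extra bytes rather than hardcoding
       a 2 or a 4. */
    BOOL RegisterExampleClass(HINSTANCE hInstance, WNDPROC WndProc)
    {
        WNDCLASS wc = { 0 };

        wc.lpfnWndProc   = WndProc;
        wc.hInstance     = hInstance;
        wc.lpszClassName = TEXT("ExampleClass");
        wc.cbWndExtra    = sizeof(LONG);   /* room for one per-window value */

        return RegisterClass(&wc) != 0;
        /* Per window, later: SetWindowLong(hwnd, 0, someValue); */
    }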

The last type of memory described in Figure 8 is memory-mapped files. This type of memory combines a file access method and virtual memory management. Memory-mapped files allow an application to associate a range of addresses with a file. No physical pages of memory are allocated until access is attempted in the specified address range. Then the VMM rushes to the file system to bring into memory the specific page and some neighboring pages needed to complete the access. As described earlier, memory-mapped files map executable file images into RAM.
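
To close the loop, here is a minimal sketch that maps an existing file for reading. The function is made up for this article, error handling is trimmed, and the caller is responsible for unmapping the view and closing both handles:

    #include <windows.h>

    /* Map an entire existing file into the address space for read
       access. Nothing is read from disk here; pages come in on
       demand as the returned view is touched. */
    LPVOID MapWholeFile(LPCTSTR pszFile, HANDLE *phFile, HANDLE *phMapping)
    {
        LPVOID pView = NULL;

        *phFile = CreateFile(pszFile, GENERIC_READ, FILE_SHARE_READ,
                             NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (*phFile == INVALID_HANDLE_VALUE)
            return NULL;

        *phMapping = CreateFileMapping(*phFile, NULL, PAGE_READONLY,
                                       0, 0,     /* 0,0 means the size of the file */
                                       NULL);    /* no name needed here            */
        if (*phMapping != NULL)
            pView = MapViewOfFile(*phMapping, FILE_MAP_READ, 0, 0, 0);

        return pView;
    }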

Conclusion

There is much in Windows NT that is familiar but much that is quite new. The Windows NT Executive is entirely new. It’s a multitasking system that may incorporate multiple processors and multiple programming interfaces. The Executive’s VMM goes to great lengths to avoid doing work that doesn’t need to be done, while at the same time keeping itself ready to respond to a need for memory with a minimum of overhead.

The Win32 subsystem offers a new, more secure way to access an API that is wider but mostly familiar. New functions tap the more sophisticated features of the NT Executive and broaden the user-interface and graphic abilities of Win32-based applications.

References and further reading:

Kane, Gerry and Heinrich, Joe. MIPS RISC Architecture, Prentice Hall, New Jersey, 1992.

Intel Corporation. 386 DX Programmer's Reference Manual, 1989.

1For ease of reading, “Windows” refers to the Microsoft Windows operating systems. “Windows” is a trademark that refers only to this Microsoft product.