Bob Chiverton
Bob Chiverton is president of Chiverton Graphics, Inc., which provides Windows and PM consulting and contract programming services. He can be reached via his CompuServe address: 72510,1655.
It’s a dangerous cave--dark, damp, and unmarked, and I’m in trouble. Propelled by curiosity, I’ve trapped myself in a long confining crawlway nicknamed the Gunbarrel. And to make matters worse, the carbonite in my miner’s helmet lamp is rapidly running out of methane. In minutes, I’ll be without light. I am already without my partner, who was unable to squeeze into this narrow tunnel. So here I am, alone, without guidance.
That feeling sometimes surfaces when I explore the uncharted caves in Windows™--the undocumented data structures and functions used by the Microsoft® Windows graphical environment, but unavailable to application programmers.
There’s been a long-standing debate among Windows programmers over these internals. Programmers want to know everything, even if they can’t use everything.
Personally, I’d like to see Microsoft document all internal data structures and function calls, and let us worry about the dangers incurred from using them. Why keep us in the dark? It’s not as if we’re asking for the Windows source code (maybe next year). But Microsoft has been reluctant to open up Windows that far. Doing so might lock them into supporting features they never intended to be permanent. At least, that’s how I understand it.
For as long as I’ve been a Windows programmer, I’ve wanted a tool to monitor my application’s local heap and see how much stack was really used. The HeapWalker utility supplied with the Microsoft Windows Software Development Kit (SDK) wasn’t much help. Sure, it can display any data segment in a pop-up window, but only in hex format. How easy is it to glean stack and heap information from that? Not very, especially since the data structures were undocumented.
So I wrote HeapPeep, a Windows utility that graphically displays the default data segment of any Windows application (see Figure 1). Twice a second, HeapPeep walks the default data segment of the active window (the window whose caption bar or dialog frame is highlighted) and displays a graphic depicting the data segment’s different parts. This makes it easy to monitor an application’s local heap and stack, and see how much space is used for static data.
Figure 1 HeapPeep
It’s important to make your stack as small as possible--but no smaller. Too small and it overflows, crashing your program. Too large and it wastes precious space in your default data segment.
Before HeapPeep existed, finding the optimum stack size meant attending the "crash and burn" school of design. That’s where you keep lowering the STACKSIZE value in the module definition (DEF) file until the stack overflows, crashing your app. Silly way to design, I know, but what alternative was there?
Well, now you have one. HeapPeep shows you how large your stack is and how much has actually been used. Monitoring your local heap is useful too. Once I wrote a program that consistently ran for 30 minutes and then crashed. HeapPeep revealed the problem immediately. Turns out I wasn’t matching LocalFrees with LocalAllocs, so after half an hour my heap’s free memory pool was exhausted.
Over the years, HeapPeep has proved extremely useful in both design and debugging. But there was a problem. Although HeapPeep was my baby, it was an illegitimate child. It relied on undocumented Windows internals, so I couldn’t guarantee its accuracy. Another problem was keeping up with new Windows releases. Microsoft has always cautioned programmers about using undocumented features, warning that what works now may not work in future releases. So I worried that a new Windows release would break HeapPeep beyond repair.
That’s why I was happy to discover the ToolHelp API. Part of the Windows 3.1 beta III release, ToolHelp is a collection of functions and data structures that make it easier to get inside Windows (see Figure 2). It’s implemented as TOOLHELP.DLL. This DLL will be included in the retail release of Windows 3.1 in 1992. Developer support for using ToolHelp (TOOLHELP.H, and so on) will be available in 1992 as part of the Windows 3.1 SDK.
Using ToolHelp functions, HeapPeep went from renegade to respectable. Today, there are two versions of HeapPeep--one that uses ToolHelp functions and one that doesn’t. If you have the Windows 3.1 beta III release, use the version of HeapPeep that utilizes TOOLHELP.DLL. Otherwise use the non-ToolHelp version with Windows 3.0. I’ll discuss ToolHelp briefly below. (Look for more information on ToolHelp in future issues of MSJ--Ed.) But first, a review of default data segments is in order, since that’s what HeapPeep peeps at.
Figure 2 ToolHelp Structures and Functions
Functions |
ClassFirst | Gets information about the first class in the Windows class list. |
ClassNext | Used to continue a walk through the Windows class list started by ClassFirst. |
GlobalEntryHandle | Used to begin a global heap walk at a specific memory block. An application can continue to walk through the global heap with the GlobalNext function. |
GlobalEntryModule | Used to begin a global heap walk at a specific module. An application can continue to walk through the global heap with the GlobalNext function. |
GlobalFirst | Used to begin a global heap walk. An application can examine subsequent blocks in the global heap by using the GlobalNext function. |
GlobalInfo | Used to determine how much memory to allocate for a global heap walk. |
GlobalNext | Used to continue a global heap walk started by GlobalFirst, GlobalEntryHandle, or GlobalEntryModule. |
InterruptRegister | Installs a callback function to handle system interrupts. |
InterruptUnRegister | Uninstalls a callback function for system interrupts. |
LocalFirst | Used to begin a local heap walk. An application can examine subsequent blocks in the local heap by using the LocalNext function. |
LocalInfo | Used to determine how much memory to allocate for a local heap walk. |
LocalNext | Used to continue a local heap walk started by the LocalFirst function. |
LockInput | Locks and unlocks input to tasks. |
MemManInfo | Gets status and performance information about the memory manager. |
MemoryRead and MemoryWrite | These functions were designed to enable an application to set breakpoints, disassemble code, or otherwise modify memory in a task. |
ModuleFindHandle | Used to begin a walk through the list of all currently loaded modules. An application can examine subsequent items in the module list by using the ModuleNext function. |
ModuleFindName | Used to begin a walk through the list of all currently loaded modules. An application can examine subsequent items in the module list by using the ModuleNext function. |
ModuleFirst | Used to begin a walk through the list of all currently loaded modules. An application can examine subsequent items in the module list by using the ModuleNext function. |
ModuleNext | Used to continue a walk through the list of all currently loaded modules. The walk must have been started by ModuleFirst, ModuleFindName, or ModuleFindHandle. |
NotifyRegister | Installs a notification callback function for a task. |
NotifyUnRegister | Uninstalls the notification callback function. |
QuerySendMessage | Determines if a message sent by SendMessage originated from within the current task. |
StackTraceCSIPFirst | Used to begin a stack trace of the current task. An application must use StackTraceFirst to begin a stack trace of any other task. The application can continue to trace through the stack with the StackTraceNext function. |
StackTraceFirst | Used to begin a stack trace of any task except the current task. The StackTraceCSIPFirst function must be used to begin a stack trace of the current task. An application can continue to trace through the stack with the StackTraceNext function. |
StackTraceNext | Used to continue a stack trace started by StackTraceFirst or StackTraceCSIPFirst. |
SystemHeapInfo | Gets information that describes the system heaps. |
TaskFindHandle | Used to begin a walk through the task queue. An application can examine subsequent entries in the task queue by using the TaskNext function. |
TaskFirst | Used to begin a walk through the task queue. An application can examine subsequent entries in the task queue by using the TaskNext function. |
TaskGetCSIP | Returns the next CS:IP value of a task. This function is useful for applications that need to know where a sleeping task will begin execution upon awakening. |
TaskNext | Used to continue a walk through the task queue. The walk must have been started by TaskFirst or TaskFindHandle. |
TaskSetCSIP | Sets the CS:IP of the sleeping task. When the task is yielded to, it will begin execution at the address specified. |
TaskSwitch | Switches to the task specified. The task begins execution at the specified address. |
TerminateApp | Cleans up the interrupt and notification callback functions and then terminates the application as if hTask had produced a UAE. Only debugger applications should use this function. |
TimerCount | Gets information about the execution times of the current task and VM (Virtual Machine). |
Structures |
CLASSENTRY | Contains the name of a Windows class and a near pointer to the next class in the Windows class list. |
GLOBALENTRY | Contains information about a block of memory on the global heap. |
GLOBALINFO | Contains information about the global heap. |
LOCALENTRY | Contains information about a block of memory on the local heap. |
LOCALINFO | Contains information about the local heap. |
MEMMANINFO | Contains information about the status and performance of the virtual memory manager. |
MODULEENTRY | Contains information about one module in the module list. |
NFYSTARTDLL | Contains information about the dynamic link library being loaded when the kernel sends a load DLL notification. |
NFYLOADSEG | Contains information about the segment being loaded when the kernel sends a load segment notification. |
NFYRIP | Contains information about the system when a RIP occurs. |
STACKTRACEENTRY | Contains information about one stack frame. This enables an application to trace back through a task’s stack. |
SYSHEAPINFO | Contains information about the system heaps. |
TASKENTRY | Contains information about one task. |
TIMERINFO | Contains the elapsed time since the current task awakened and since the VM (Virtual Machine) started execution. |
Every Windows program, regardless of memory model, gets a default data segment (also called an automatic data segment). The four major parts are the task header, static data section, stack, and local heap (see Figure 3).
The first 16 bytes of the default data segment are called the task header (also called the NULL segment). Windows keeps near pointers to the stack, local atom table, and local heap here.
The three stack pointers, pStackBottom, pStackMin, and pStackTop, are initialized by the Windows startup routine __astart. This is the true entry point to a Windows application, and is called before WinMain. (It’s informative to step through __astart in the Microsoft CodeView® for Windows debugger and see the task header get initialized. To do this, open a memory window and a source window. Set the source window display mode to Mixed Source & Assembly. Then single-step through your app and watch the first 16 bytes of the default data segment change in the memory window.)
The pStackTop and pStackBottom values determine the stack size and its location within the default data segment. They don’t change once initialized since the stack size is fixed and there’s no way to resize it at run time. The other stack pointer, pStackMin, is the stack’s high-water mark. Unlike the other two stack pointers, it changes during run time, indicating how much of the stack is actually used.
The local heap pointer, pLocalHeap, also gets initialized at this time, but the atom table pointer, pAtomTable, does not. That occurs when your application explicitly or implicitly calls the InitAtomTable function.
The startup routine has other responsibilities, too. It calls the Windows C run-time library startup code, which in turn initializes its own public and private variables, which are stored--surprise--in the static data area of the default data segment.
In one sense, the static data area is like a private club with two types of members, your variables and the Windows C run-time library’s variables. Now let’s take the club tour.
All the variables in your Windows program declared static or extern, explicitly or implicitly, reside in the static data area of the default data segment. For example, in the following code fragment,
static int i;
int j;
void foo ( )
{
static int k;
int z;
.
.
.
}
variables i, j (extern implied), and k each occupy 2 bytes of space in the static data area. In contrast, z is a stack variable (auto implied) and occupies 2 bytes on the stack only when function foo is executing.
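The difference between static and auto storage can be seen in a small portable sketch. The function name count_calls is hypothetical and is not part of the article's sample code; on a modern compiler an int is usually 4 bytes rather than the 2 bytes quoted above, but the storage rules are the same.

```c
#include <assert.h>

static int i;   /* static storage: exists for the whole run of the program */
int j;          /* extern implied: static storage as well                   */

/* Returns the number of times it has been called so far. */
int count_calls(void)
{
    static int k;   /* static storage: zero-initialized once, keeps its value */
    int z = 0;      /* auto storage: a fresh stack copy on every call         */

    k++;
    z++;
    assert(z == 1); /* z never remembers anything from earlier calls */

    i = k;          /* file-scope variables outlive the call too */
    j = z;
    return k;
}
```

Calling count_calls three times returns 1, 2, and 3, because k lives in the static data area, while z starts over each time because it lives on the stack only for the duration of the call.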
The name "static data area" is a bit misleading because it implies only static data goes there. Since extern data and constants are stored there too, the name "static, extern, and constant data area" would be more accurate.
Actually, not all static or extern variables reside there. You can put them in another data segment with a statement like this:
int _far m;
The _far keyword directs the Microsoft C compiler version 6.0 to put m in a separate data segment. You probably don’t want to do that because it forces your application into the large memory model, undesirable for a Windows application. The _huge and _based keywords can also do this, so be careful. Of course, far pointers are OK.
int _far * m;
Now m is a far pointer and resides in the default data segment, not an int residing in a far data segment.
String constants (not to be confused with string resources) are stored in the default data segment, too. In this fragment,
char _far * lpstr = "I am stored in the default data segment";
char * npstr = "I am stored there, too";
void foo2 ( )
{
char * npstr2 = "I am stored there, too";
.
.
.
}
all three "I am. . ." strings are stored in the default data segment. Some C compilers, like Borland® C++ version 3.0, give you the option of conserving space by eliminating duplicate string constants. This means npstr and npstr2 point to the same address in the data segment, to a single copy of "I am stored there, too." Unfortunately, this feature is lacking in C 6.0, so twice as much space is taken up and npstr and npstr2 point to separate copies.
By the way, did red flags start waving when you saw how lpstr was initialized? For years, we Windows programmers have had it pounded into us that initializing far pointers to near data (that is, data in the default data segment) is a sin. That’s because the data segment could move any time, invalidating the segment portion of the segment:offset far address. But if your application runs only in protected mode, you can stop worrying about that. Far addresses in protected mode use a selector:offset convention; selectors don’t change when segments move.
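To see why a selector survives a segment move while a real-mode segment value does not, consider this simplified, portable model. The names demo_base_table, demo_set_base, and the two address functions are invented for illustration, and the table is only a stand-in for a descriptor table: a real selector also carries table-indicator and privilege bits, which are ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode: the segment value itself encodes the physical location. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Protected mode (simplified): the selector picks a table entry that
   holds the segment's current base address. */
#define DEMO_TABLE_ENTRIES 8
static uint32_t demo_base_table[DEMO_TABLE_ENTRIES];

/* Models the memory manager recording where a segment now lives. */
void demo_set_base(uint16_t selector, uint32_t base)
{
    assert(selector < DEMO_TABLE_ENTRIES);
    demo_base_table[selector] = base;
}

uint32_t protected_mode_linear(uint16_t selector, uint16_t offset)
{
    assert(selector < DEMO_TABLE_ENTRIES);
    return demo_base_table[selector] + offset;
}
```

If the segment behind selector 3 moves, only the table entry changes. A far pointer that stores 3:0010 keeps working, whereas a real-mode pointer that stored the old segment value would now point at stale memory.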
Once, I tried to eliminate the static data area by eliminating all my global and static variables and string constants. But that didn’t work because I forgot to consider the other members of the club--the C run-time variables.
Almost all Windows programs written in C are linked with a C run-time library. You can see some of the variables the C run-time library places in your default data segment by looking in the MAP file produced by the linker. Not all the C run-time variables are listed, just the public ones, like these three:
Address         Publics by Value
xxxx:yyyy       __argc
xxxx:yyyy       __argv
.
.
.
xxxx:yyyy       _environ
Recognize them? They’re three of the better-known C run-time variables. The first two, __argc and __argv, are used to access your program’s command-line arguments. They get initialized by the _setargv routine, which also copies the command-line arguments (if any) to a reserved location in the static data area, just below the stack. The _setargv function is supplied by the C run-time library and called by __astart.
Now, if your application doesn’t use command-line arguments, you can save some code space in your executable by writing your own do-nothing version of _setargv. It must reside in the same file as WinMain and look something like this:
void _setargv (void) {}
Substituting this dummy routine cuts down on your application’s code size but doesn’t remove the __argc or __argv variables from the static data area. And although it prevents command-line arguments from being copied to the data segment, it won’t reduce the size of the data segment. Space for these arguments was already allocated by the C run-time library at (the library’s) compile time.
The third variable, _environ, is a pointer to the environment table. The environment table resides in your default data segment and is a copy of the MS-DOS operating system environment strings (like the PATH string). This table is accessed via _environ, which is initialized by the _setenvp routine. And just like _setargv, _setenvp is supplied by the C run-time library and called by __astart.
If your program doesn’t access environment strings, you can reduce your code size a bit more by substituting a do-nothing routine for _setenvp:
void _setenvp (void) {}
Again, this will not remove the _environ variable from the static data area. However, it will reduce the size of your default data segment, because the environment table is not stored in the static data area like the command-line arguments. It is stored in the local heap at run time. By using a dummy _setenvp, you prevent the real _setenvp routine from setting up an environment table in your local heap.
You don’t use any C run-time library functions? Then evict all the C run-time library variables. This reduces the size of your static data area, as long as a few conditions are true. First, your application doesn’t explicitly call any C run-time library routines. Second, your application doesn’t use the __argc or __argv command-line variables or the _environ variable. Finally, your application doesn’t implicitly call any C run-time library routines. This means you can’t use stack probes or perform long division.
You eliminate the code and data overhead of the C run-time library from your application by replacing it with the xnocrt.lib (x denotes memory model) supplied with the Windows 3.x SDK. Your link line would then look something like this:
link test.obj, test.exe /align:16,,libw snocrt.lib, test.def
Finally, before leaving the static data area, consider a little-known Windows feature--the private handle table.
The private handle table is a mystery to many Windows programmers. Although it was available in Windows 2.x, it wasn’t documented in the SDK until the 3.0 release.
Chances are, if you haven’t used one by now, you never will, because it will probably be dropped in Windows 3.1. Well, not exactly. The DefineHandleTable function you use to register a private handle table will still be around, it just won’t do anything. Private handle tables were useful only in real mode, which will not be supported in Windows 3.1.
Essentially, the private handle table is a static array of words, stored in the static data area of your default data segment. The array holds global memory segment addresses that (real mode) Windows will update whenever a segment moves. A private handle table improves performance in real mode by avoiding the time spent locking and unlocking global memory handles. Instead of having to lock a handle to get a far address, you retrieve the address directly from the table.
Anyway, something downright perverse occurs if you’ve set the second element in the handle table incorrectly--Windows trashes your default data segment. Here’s why.
Periodically (about 4.5 times per second), Windows zeros-out part of the table. The second element informs Windows how much to zero-out. If it’s mistakenly assigned an amount larger than your table, Windows obediently zeros-out words beyond the end of the table.
This sort of resembles a stack overflow in reverse--overflowing the handle table towards the end of the default data segment. Stack overflows go the other way, towards the beginning of the segment. Either one is catastrophic, but as you’ll see, there’s a protection mechanism for stack overflows.
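The overrun can be modeled in portable C. This is only a sketch under stated assumptions: the static data area is treated as an array of 16-bit words, the table is assumed to start with a count word followed by the "how much to zero" word and then the entries, and demo_periodic_clear is a hypothetical stand-in for whatever Windows does internally. The area_words bound exists only to keep the demonstration inside its own array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero as many table entries as the table's second word requests.
   area        : model of the static data area, as 16-bit words
   area_words  : size of that model (demo safety bound only)
   table_index : word index where the handle table begins */
void demo_periodic_clear(uint16_t *area, size_t area_words, size_t table_index)
{
    size_t first, count, n;

    assert(table_index + 2 <= area_words);
    first = table_index + 2;           /* entries follow the two header words */
    count = area[table_index + 1];     /* second element: how many to zero    */

    for (n = 0; n < count && first + n < area_words; n++)
        area[first + n] = 0;           /* nothing checks the table's real size */
}
```

With a four-entry table followed by two unrelated variables, a second element of 4 clears only the table. Set it to 6 and the same call also wipes the two variables that happen to sit after the table, which is the corruption described above.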
Earlier you saw three stack pointers Windows uses to define the stack top, stack bottom, and stack high-water mark. They delimit the stack’s maximum and dynamic sizes. The top and bottom pointers, represented by pStackTop and pStackBottom, are near pointers, which means they are offsets relative to the beginning of the default data segment. Once initialized, their values don’t change. Since the stack grows "down" (higher to lower address), you compute the maximum stack size like this:
maximum stack size (in bytes) = pStackBottom - pStackTop
The maximum stack size corresponds to (but doesn’t necessarily equal) the value you specified in the DEF file with the STACKSIZE statement. Windows enforces a 5KB minimum. If you omit the STACKSIZE statement, the resource compiler warns you but gives you a 5KB stack anyway.
The other pointer, pStackMin, is the stack high-water mark. It equals the lowest address used in the stack, and corresponds to the largest dynamic stack size attained by your program:
largest dynamic stack size (in bytes) = pStackBottom - pStackMin
Windows updates pStackMin when the application yields (typically when GetMessage is called). But that’s like photographing a cliff diver in mid-dive. It’s unlikely the photo will show the diver hitting water, just as it’s unlikely pStackMin will be at its lowest point ever when a task yields. Quite frankly, I don’t know why Windows bothers to update pStackMin at this time. The proper way is with stack probes.
Your stack overflows when its dynamic size exceeds its maximum size. A stack probe is a C run-time function that checks for stack overflows and accurately updates pStackMin.
Many Windows programmers are familiar with stack probes but have never seen one explicitly called. That’s because the C compiler hides them in your function prolog code. To see how the C 6.0 compiler embeds them, I compiled the following function with the /Fc option.
long FAR PASCAL WndProc (HWND hwnd, WORD wMsg,
WORD wParam, LONG lParam)
{
return DefWindowProc (hwnd, wMsg, wParam, lParam);
}
This option generates a mixed C and assembly-language code listing. I compiled WndProc twice, omitting and including stack probes (see Figures 4 and 5).
Figure 4 WndProc Compiled without Stack Probes
PUBLIC WNDPROC
WNDPROC PROC FAR
push ds
pop ax
nop
inc bp
push bp
mov bp,sp
push ds
mov ds,ax
sub sp,0
push di
push si
.
.
.
Figure 5 WndProc Compiled with Stack Probes
PUBLIC WNDPROC
WNDPROC PROC FAR
push ds
pop ax
nop
inc bp
push bp
mov bp,sp
push ds
mov ds,ax
mov ax,0
call __aNchkstk
push di
push si
.
.
.
The compiler embedded a stack probe named __aNchkstk in the prolog code. This is a good place to check for a stack overflow, because __aNchkstk "knows" how many bytes are needed for the current stack frame and how much space is available. If there’s not enough room, __aNchkstk writes a stack overflow error message to the AUX device and notifies Windows, which displays a sysmodal message box allowing you to close the offending application gracefully.
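The check a stack probe performs can be sketched in portable C. This is not the source of __aNchkstk, which isn't shown here; DemoStack and demo_stack_probe are hypothetical names, and the sketch assumes only what the article states: the probe knows the frame size, compares it with the space left above pStackTop, and keeps pStackMin current.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t top;     /* lowest legal stack offset   (pStackTop)    */
    uint16_t min;     /* lowest offset used so far   (pStackMin)    */
    uint16_t bottom;  /* highest stack offset        (pStackBottom) */
} DemoStack;

/* Returns 1 and updates the high-water mark if `needed` bytes fit below
   `sp`; returns 0 on overflow and leaves the mark untouched. */
int demo_stack_probe(DemoStack *s, uint16_t sp, uint16_t needed)
{
    uint16_t new_sp;

    assert(s->top <= sp && sp <= s->bottom);

    if ((uint16_t)(sp - s->top) < needed)
        return 0;                      /* frame would cross pStackTop */

    new_sp = (uint16_t)(sp - needed);  /* the stack grows downward */
    if (new_sp < s->min)
        s->min = new_sp;               /* deeper than ever before */
    return 1;
}
```

Because the check runs on every function entry, the recorded minimum follows the deepest frame actually reached, rather than whatever depth the stack happens to be at when the task yields.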
Removing stack probes won’t make your default data segment any smaller. But it will make your code smaller and your program faster. You must weigh those benefits against what could happen when stack probes aren’t present.
Stack overflows are disastrous because they overwrite your static data area, and possibly your task header. The insidious thing is that your program doesn’t have to fail at the point the overflow occurs. The crash occurs when an overwritten variable is used. This can make stack overflows hard to pin down. Stack probes solve this by catching the overflow at the point it occurs, so I use them in all phases of program development. I usually remove them in final production code, though.
The local heap is a linked list data structure maintained at the tail end of your application’s default data segment. Unlike the global heap, the local heap is not shared with other tasks. However, that doesn’t mean your application has exclusive use of it. Windows uses it too for your application’s edit controls, local atom table, and the environment table.
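Walking a linked list of heap blocks and totaling them by kind is straightforward in principle. The sketch below uses an invented block layout (DemoBlock) purely for illustration; it is not the actual local heap arena format, which this part of the article does not describe. The three totals correspond to the fixed, moveable, and free figures HeapPeep reports.

```c
#include <assert.h>
#include <stddef.h>

enum DemoKind { DEMO_FREE, DEMO_FIXED, DEMO_MOVEABLE };

/* Hypothetical block header: NOT the real Windows arena layout. */
typedef struct DemoBlock {
    enum DemoKind kind;
    unsigned size;                   /* size of the block in bytes */
    const struct DemoBlock *next;    /* NULL marks the end of the list */
} DemoBlock;

typedef struct {
    unsigned long fixed, moveable, free_bytes;
} DemoTally;

/* Follow the list from head to tail, adding each block to its total. */
DemoTally demo_tally(const DemoBlock *head)
{
    DemoTally t = { 0, 0, 0 };

    for (; head != NULL; head = head->next) {
        switch (head->kind) {
        case DEMO_FIXED:    t.fixed      += head->size; break;
        case DEMO_MOVEABLE: t.moveable   += head->size; break;
        case DEMO_FREE:     t.free_bytes += head->size; break;
        default:            assert(!"unknown block kind");
        }
    }
    return t;
}
```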
Actually, an application isn’t required to have a local heap. Just be sure you don’t use edit controls, local atoms, or the environment table, else your application will crash when Windows can’t find a local heap.
Two ways to prevent the environment table from being loaded were outlined above, using a do-nothing _setenvp function and omitting the C run-time library entirely. But neither approach helps with edit controls or the atom table.
By default, edit controls that appear in dialog boxes use global heap memory for storage. You can override that behavior by specifying the DS_LOCALEDIT dialog box style. This tells the Dialog Manager to create edit controls that use the local heap instead of the global heap. If you don’t want your application to have a local heap, don’t use this style. Avoid edit controls created explicitly with CreateWindow or CreateWindowEx as well, since they use the local heap too.
The local atom table is similar to the global atom table, except it’s exclusively for your application’s use. You explicitly initialize the table by calling InitAtomTable. This establishes an empty table in the local heap. If you don’t call InitAtomTable, it will be called implicitly with your first AddAtom call. Either way, your application will fail if a local heap isn’t present.
To omit the local heap, either omit the HEAPSIZE statement in the DEF file or use a HEAPSIZE statement with a value of zero. If you want a local heap, specify a nonzero value.
What’s a good size to specify? One strategy is to specify the smallest amount possible and let Windows enlarge it for you. When a LocalAlloc can’t be satisfied because the heap isn’t big enough, Windows has a variety of ways to satisfy the request, including enlarging the default data segment with a GlobalRealloc. The segment can grow to 64KB, the maximum size of a default data segment under Windows 3.1. Of course, the LocalAlloc will fail if the GlobalRealloc fails.
Another strategy is to make the heap as big or bigger than it needs to be (assuming you know how big that is). Then, if Windows successfully loads the application, you’re guaranteed that LocalAllocs won’t fail (not immediately, anyway) because the local heap was too small.
Microsoft recommends a minimum HEAPSIZE of 250 bytes, although I’ve successfully used less. The maximum value depends on how much space you’ve got left in the default data segment after subtracting the space needed by the task header, static data area, and stack.
Max heap size = 64KB - task header size - static data area size - stack size
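As a worked example of this subtraction, here is a small portable helper. The name demo_max_heap and the sample figures are hypothetical; the 64KB limit and the 16-byte task header come from the text.

```c
#include <assert.h>

#define DEMO_SEGMENT_LIMIT 65536L   /* 64KB default data segment limit */
#define DEMO_TASK_HEADER   16L      /* first 16 bytes of the segment   */

/* Returns the largest possible local heap in bytes, or 0 if the static
   data and stack already fill the segment. */
long demo_max_heap(long static_data, long stack)
{
    long left;

    assert(static_data >= 0 && stack >= 0);
    left = DEMO_SEGMENT_LIMIT - DEMO_TASK_HEADER - static_data - stack;
    return left > 0 ? left : 0;
}
```

With 4096 bytes of static data and a 5KB (5120-byte) stack, demo_max_heap(4096, 5120) gives 65536 - 16 - 4096 - 5120 = 56304 bytes.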
Another way to look at this is as the amount of space that your application can have for static variables. Sometimes you need more than you can fit in the default data segment. If so, you have several choices. You can use a custom resource, or read in the data at run time, storing it in a global data segment via GlobalAlloc. But I don’t like those choices. I prefer using a little-known trick you can do with based variables.
I find the 64KB limit on the default data segment annoying when it means there’s not enough room for all my application’s static data. Then I’m forced to move some data out of there. Fortunately, the C 6.0 compiler offers an outstanding feature to help lighten the default data segment’s load--based variables. Consider this code (in a file called COOLTRIK.C):
int _based(_segname("_CODE")) iBigArray[] =
{1,2,...,25000}; // read only
int do_something_big (int i)
{
return iBigArray [i];
}
The _based(_segname("_CODE")) construct puts iBigArray into a code segment (COOLTRIK_TEXT). All references to iBigArray use the segment/selector in the CS register, rather than the DS register. This means do_something_big will always be able to access the array, since it resides in the same code segment, and the CS register will have the proper segment/selector value for the array.
Putting iBigArray into a code segment instead of a second data segment avoids having Windows treat the application as large-model. But this technique only works for read-only data. You cannot update iBigArray because you aren’t allowed to write into a code segment. Furthermore, in real mode, if you pass the address of iBigArray to another function in another code segment, you must lock down COOLTRIK’s segment first to prevent Windows from moving it and invalidating the segment.
The first version of HeapPeep doesn’t use ToolHelp functions. Instead, it relies on undocumented Windows features and runs under Windows 3.0, in both real and protected mode (the ToolHelp-based version runs only in Windows 3.1, as mentioned).
The source code is in two files, HP1.C and HP2.C, providing an important logical separation as well as a physical one. Everything HeapPeep knows about default data segments is encapsulated in HP2.C in four functions, dds_create, dds_destroy, dds_walk, and dds_paint, and one DEFAULTDATASEGMENT structure, DDS.
The way HeapPeep works is simple. Twice a second, HeapPeep’s client window procedure receives a WM_TIMER message and calls dds_walk. This function analyzes the default data segment of the active window, stores the results in DDS, then calls dds_paint to redraw HeapPeep’s client window.
The heart and soul of HeapPeep is the dds_walk function, which gets the size (in bytes) of various parts of a default data segment and stores the results in DDS.
typedef struct tagDEFAULTDATASEGMENT
{
HANDLE hinstActive; // instance handle of active app
HWND hwndActive, // window handle of active app
hwndClient; // window we draw bar graph in.
WORD wSize, // size (bytes) of Data Segment.
wStaticData, // size (bytes) of static data area
wStackMax, // size (bytes) of stack size
// defined in .DEF
wStackUsed, // size (bytes) of stack actually
// used.
wHeapMoveable, // size (bytes) of heap
// allocation (moveable).
wHeapFixed, // size (bytes) of heap
// allocation (fixed).
wHeapFree, // size (bytes) of free heap space
wOther, // size (bytes) of remaining
// allocated space in DS.
wUnused; // size (bytes) of heap unused.
} DEFAULTDATASEGMENT;
static DEFAULTDATASEGMENT DDS;
The nine WORD members represent sizes of different parts of the data segment. The first thing dds_walk does is initialize them to zero. Then dds_walk gets the window handle of the active window, hwndActive, and uses it in GetWindowWord to extract the instance handle hinstActive.
DDS.hwndActive = GetActiveWindow();
if (DDS.hwndActive != OldDDS.hwndActive)
{
DDS.hinstActive = (HANDLE)
GetWindowWord(DDS.hwndActive, GWW_HINSTANCE);
if (!DDS.hinstActive) return;
}
The instance handle is the handle to the default data segment. Windows developers often wonder if there’s a difference between task handles and instance handles, probably because a Windows task is synonymous with a Windows instance. But instance handles and task handles are two different animals, and Windows creates both for each instance of an application.
The task handle is a handle to a global data segment called the task database (TDB). The TDB contains the instance’s message queue, among other things. The instance handle is a handle to the instance’s default data segment.
Usually every window belonging to a task is associated with the same instance handle. For example, a task that has one top-level window and three child windows gets the same hInstance,
hInstance = GetWindowWord (hwnd, GWW_HINSTANCE);
regardless of which of the four window handles is used as the hwnd parameter. That’s because the same hInstance was used to create each window:
CreateWindow (..., hInstance, ...) ;
But message boxes are different. They’re associated with the USER DLL’s default data segment. MessageBox is a USER function, and it calls CreateWindow, passing in USER’s hInstance, not the task’s hInstance.
So when the active window is a message box, this version of HeapPeep displays USER’s default data segment, not the default data segment of the task that put up the message box. You can confirm this by running HeapWalker and comparing the handle of USER’s data segment with hInstance displayed by HeapPeep.
The ToolHelp-based version of HeapPeep behaves differently. It only uses hInstances of tasks, because it gets the hInstance associated with a task (via a task handle). Since DLLs aren’t tasks, they don’t have task handles, only instance handles. So the ToolHelp-based version will not display default data segments of DLLs. But I’m getting ahead of myself.
HeapPeep gets a far pointer, lpInstance, to the beginning of the default data segment using GlobalLock:
if ( !(lpInstance = GlobalLock(DDS.hinstActive)))
return;
At this point, lpInstance equals DS:0000, where DS is the value of the DS register when the active window’s task is running. The stack pointers in the task header are defined relative to lpInstance:
#define PSTACKBOTTOM (*(WORD FAR*)(lpInstance+14))
#define PSTACKMIN (*(WORD FAR*)(lpInstance+12))
#define PSTACKTOP (*(WORD FAR*)(lpInstance+10))
(The (WORD FAR*) cast is needed because the stack pointers are words, while lpInstance is a far pointer to a char.) This makes it easy to compute the stack sizes:
DDS.wStackMax = PSTACKBOTTOM - PSTACKTOP ;
DDS.wStackUsed = PSTACKBOTTOM - PSTACKMIN ;
The static data area occupies the region from the end of the task header to the beginning of the stack. However, HeapPeep includes the 16-byte task header when computing its size:
DDS.wStaticData = PSTACKTOP ;
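To make the arithmetic concrete, here is a minimal portable sketch that works on a plain byte buffer standing in for a snapshot of a default data segment. The offsets 10, 12, and 14 come from the macros above; the names read_word, stack_sizes, and StackSizes are hypothetical and are not part of HeapPeep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header offsets taken from the PSTACKTOP, PSTACKMIN, and PSTACKBOTTOM macros. */
enum { OFF_STACK_TOP = 10, OFF_STACK_MIN = 12, OFF_STACK_BOTTOM = 14 };

typedef struct {
    uint16_t stack_max;    /* pStackBottom - pStackTop */
    uint16_t stack_used;   /* pStackBottom - pStackMin */
    uint16_t static_data;  /* pStackTop, so the 16-byte header is included */
} StackSizes;

/* Hypothetical helper: read a little-endian 16-bit word from a byte buffer. */
uint16_t read_word(const uint8_t *seg, size_t off)
{
    return (uint16_t)(seg[off] | (seg[off + 1] << 8));
}

/* Compute the three stack figures from the first 16 bytes of a segment copy. */
StackSizes stack_sizes(const uint8_t *seg)
{
    StackSizes s;
    uint16_t top, min, bottom;

    assert(seg != NULL);
    top    = read_word(seg, OFF_STACK_TOP);
    min    = read_word(seg, OFF_STACK_MIN);
    bottom = read_word(seg, OFF_STACK_BOTTOM);

    s.stack_max   = (uint16_t)(bottom - top);
    s.stack_used  = (uint16_t)(bottom - min);
    s.static_data = top;
    return s;
}
```

With a header whose three words are 0x1000, 0x1800, and 0x2000, this reports a 4096-byte stack of which 2048 bytes have been used, plus 4096 bytes of static data.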
Getting the heap information takes a bit more effort. Better grab some extra carbonite for your helmet lamp . . . here’s where it gets really dark.
Walking the local heap lets you quantify its size and its composition. The local heap is implemented as a doubly linked list (see Figure 6). Each entry or record in this list is generated by a LocalAlloc function call. The first two words in each heap record are pointers: a backward pointer and a forward pointer. This means the overhead per LocalAlloc is at least 4 bytes (2 bytes per pointer). Windows encodes the record type in the two lower bits of the backward pointer (see Figure 7).
Figure 6 The Local Heap Is Implemented as a Doubly Linked List
Figure 7 Encoding of Local Heap Entry Types
Lower 2 bits of backward pointer | Local heap entry type |
00 | Free (belongs to free memory pool) |
01 | Fixed (corresponds to LMEM_FIXED flag) |
1x | Moveable (corresponds to LMEM_MOVEABLE flag) |
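The encoding in Figure 7 can be sketched as a small decoder. This is an illustration only: the record is modeled as a byte array whose first two bytes hold the backward pointer in little-endian order, and the names EntryType and entry_type are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ENTRY_FREE, ENTRY_FIXED, ENTRY_MOVEABLE } EntryType;

/* Classify a heap record from the two low bits of its backward pointer. */
EntryType entry_type(const uint8_t *record)
{
    uint16_t back;

    assert(record != NULL);
    back = (uint16_t)(record[0] | (record[1] << 8));

    switch (back & 0x0003u) {
    case 0:  return ENTRY_FREE;      /* 00 */
    case 1:  return ENTRY_FIXED;     /* 01 */
    default: return ENTRY_MOVEABLE;  /* 10 or 11 */
    }
}
```

The default branch covers both 2 and 3, which is how the "1x" row of the table reads.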
Walking the local heap is simply a matter of traversing this doubly linked list. My dds_walk function (see Figure 8) advances through the list using forward pointers and uses backward pointers to distinguish the record types. To make the code easier to read, I defined two macros for the backward and forward pointers. The pointers are relative to lpHeapRecord, which is a far pointer to the first byte in a record.
#define PREV_POINTER (*(WORD FAR*) lpHeapRecord)
// Backward "pointer"
#define NEXT_POINTER (*(WORD FAR*)(lpHeapRecord+2))
// Forward "pointer"
Another macro is used for pLocalHeap, the local heap pointer residing in the task header.
#define PLOCALHEAP (*(WORD FAR*)(lpInstance + 6))
PLOCALHEAP in dds_walk defines the offset (in bytes) from DS:0000 (of the active window) to the start of the local heap. Actually, PLOCALHEAP is pointing 4 bytes into the first heap record. So dds_walk initializes the far pointer, lpHeapRecord, to the start of the list by subtracting 4 bytes.
lpHeapRecord = lpInstance + PLOCALHEAP - 4 ;
Then it traverses the list.
DDS.wSize = GlobalSize (DDS.hinstActive);
while ((WORD)lpHeapRecord < DDS.wSize)
{
lpNextHeapRecord = (lpInstance + NEXT_POINTER);
if (lpNextHeapRecord == lpHeapRecord) break;
wRecordSize = lpNextHeapRecord - lpHeapRecord;
// includes ptr overhead
wStatus = (PREV_POINTER & 0x0003);
switch (wStatus)
{
case 0: DDS.wHeapFree += wRecordSize; break;
case 1: DDS.wHeapFixed += wRecordSize; break;
case 2:
case 3: DDS.wHeapMoveable += wRecordSize; break;
}
lpHeapRecord = lpNextHeapRecord;
}
The traversal ends when the forward pointer points to itself. To make sure it doesn’t go beyond the end of the data segment, the heap record offset is compared to the data segment size. The (WORD) cast, like the FP_OFF macro, strips the segment (selector) portion off lpHeapRecord, leaving just the offset.
Finally, note that the 4 byte pointer overhead is included in the record size. This explains a discrepancy between the two versions of HeapPeep.
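The same traversal can be tried outside Windows against a byte buffer that imitates the record layout. The sketch below is not the HeapPeep code: walk_heap, HeapTotals, and read_word are hypothetical names, the bounds test is stricter than the one in dds_walk, and treating any forward pointer that fails to advance as the end of the list is an added safeguard (dds_walk stops only when the pointer equals the current record).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t free_bytes;
    uint16_t fixed_bytes;
    uint16_t moveable_bytes;
} HeapTotals;

/* Hypothetical helper: read a little-endian 16-bit word from the buffer. */
uint16_t read_word(const uint8_t *seg, size_t off)
{
    return (uint16_t)(seg[off] | (seg[off + 1] << 8));
}

/* Walk the list starting at offset 'first' (PLOCALHEAP - 4 in the article).
   Each record begins with a backward word, then a forward word holding the
   offset of the next record from the start of the segment. */
HeapTotals walk_heap(const uint8_t *seg, size_t seg_size, uint16_t first)
{
    HeapTotals t = { 0, 0, 0 };
    uint16_t rec = first;

    assert(seg != NULL);
    while ((size_t)rec + 4 <= seg_size) {
        uint16_t back = read_word(seg, rec);
        uint16_t next = read_word(seg, (size_t)rec + 2);
        uint16_t size;

        if (next <= rec)   /* self-pointer ends the list; also stops loops */
            break;
        size = (uint16_t)(next - rec);   /* includes the 4 pointer bytes */

        switch (back & 0x0003u) {
        case 0:  t.free_bytes     = (uint16_t)(t.free_bytes + size);     break;
        case 1:  t.fixed_bytes    = (uint16_t)(t.fixed_bytes + size);    break;
        default: t.moveable_bytes = (uint16_t)(t.moveable_bytes + size); break;
        }
        rec = next;
    }
    return t;
}
```

Because each size is the distance between consecutive records, the 4 bytes of pointer overhead are counted in every total, as the text notes.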
At this point, dds_walk has determined the size of everything known in the default data segment. But it’s possible there’s unknown stuff in there (it is undocumented, after all). So a catch-all variable, DDS.wOther, is used.
DDS.wOther = DDS.wSize - DDS.wStaticData
- DDS.wStackMax
- DDS.wHeapFixed
- DDS.wHeapFree
- DDS.wHeapMoveable ;
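One property of this subtraction is worth seeing in isolation: the members are 16-bit unsigned WORDs, so if the components ever add up to more than the segment size, the result wraps around instead of going negative. The sketch below reproduces the calculation with uint16_t; SegmentSizes and other_bytes are hypothetical names and the figures are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t size, static_data, stack_max;
    uint16_t heap_fixed, heap_free, heap_moveable;
} SegmentSizes;

/* Same subtraction as DDS.wOther, truncated to 16 bits like a WORD. */
uint16_t other_bytes(const SegmentSizes *s)
{
    assert(s != NULL);
    return (uint16_t)(s->size - s->static_data - s->stack_max
                      - s->heap_fixed - s->heap_free - s->heap_moveable);
}
```

For a 20000-byte segment with 4000 bytes of static data, a 5120-byte stack, and heap totals of 1000, 2000, and 3000 bytes, the remainder is 4880. If the moveable total were misread as 10000, the components would exceed the segment by 2120 bytes and the result would wrap to 63416.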
Finally, dds_walk updates the client window if anything has changed since last time.
if ( DDS.hwndActive != OldDDS.hwndActive ||
DDS.wHeapFree != OldDDS.wHeapFree ||
DDS.wHeapFixed != OldDDS.wHeapFixed ||
DDS.wHeapMoveable != OldDDS.wHeapMoveable ||
DDS.wOther != OldDDS.wOther ||
DDS.wSize != OldDDS.wSize ||
DDS.wStackUsed != OldDDS.wStackUsed)
{
InvalidateRect (DDS.hwndClient, NULL, TRUE);
UpdateWindow (DDS.hwndClient);
OldDDS = DDS;
}
Figure 8 The dds_walk Function, Written without Using ToolHelp
void dds_walk ()
{
static DEFAULTDATASEGMENT OldDDS;
WORD wRecordSize, // size in bytes of heap record.
wStatus; // type of heap record.
LPSTR lpInstance, // far pointer to Default Data Segment.
lpHeapRecord, // far pointer to heap record.
lpNextHeapRecord; // far pointer to next heap record.
#define PREV_POINTER (*(WORD FAR*) lpHeapRecord) // Backward "pointer"
#define NEXT_POINTER (*(WORD FAR*)(lpHeapRecord+2)) // Forward "pointer"
#define PSTACKBOTTOM (*(WORD FAR*)(lpInstance+14))
#define PSTACKMIN (*(WORD FAR*)(lpInstance+12))
#define PSTACKTOP (*(WORD FAR*)(lpInstance+10))
#define PLOCALHEAP (*(WORD FAR*)(lpInstance+ 6))
// First, initialize the data segment values.
//
//
DDS.wSize = 0;
DDS.wStaticData = 0;
DDS.wStackMax = 0;
DDS.wStackUsed = 0;
DDS.wHeapMoveable = 0;
DDS.wHeapFixed = 0;
DDS.wHeapFree = 0;
DDS.wOther = 0;
DDS.wUnused = 0;
// Now, get the active window.
//
//
DDS.hwndActive = GetActiveWindow ();
// Is it a valid window?
//
//
if ( !IsWindow (DDS.hwndActive) ) return;
// If this is a different window than before,
// get a new instance handle.
//
//
if (DDS.hwndActive != OldDDS.hwndActive)
{
DDS.hinstActive = (HANDLE) GetWindowWord (DDS.hwndActive,
GWW_HINSTANCE);
if (!DDS.hinstActive) return;
}
// Lock down the Data Segment
//
//
if ( !(lpInstance = GlobalLock (DDS.hinstActive))) return;
/*
* The Data Segment is a global memory object - created by WINDOWS
* with a GlobalAlloc. It's comprised of 4 components: header,
* Static, stack, and local heap. All 4 components are offset
* into the segment, with the header at DS:0000.
*
*
* The header occupies the first 16 bytes of a Default Data Segment.
* Within the Header area are 3 pointers to the stack:
*
* pStackBottom - (highest physical address) beginning of stack.
* pStackMin - High-Water mark of actual stack use.
* pStackTop - (lowest physical address) end of stack.
*
* Remember, the stack grows "down" (higher to lower address), so
* to compute the stack sizes, we use these equations:
*
* wStackMax = pStackBottom - pStackTop ;
* wStackUsed = pStackBottom - pStackMin ;
*
*
*/
DDS.wStackMax = PSTACKBOTTOM - PSTACKTOP ;
DDS.wStackUsed = PSTACKBOTTOM - PSTACKMIN ;
DDS.wStaticData = PSTACKTOP ;
/*
* First test for a heap. (It's possible there isn't one.)
*
*/
if (PLOCALHEAP == 0)
{
GlobalUnlock (DDS.hinstActive);
return;
}
/*
* The heap begins where the
* stack ends. The offset that represents the
* beginning of the heap is stored in the header area, 6 bytes from
* DS:0000. Actually, the heap begins 4 bytes before this offset.
*
* Now we'll get a far pointer (lpHeapRecord) to the 1st record in the heap.
*
*/
lpHeapRecord = lpInstance + PLOCALHEAP - 4;
/*
* Traverse the local heap. The heap is implemented as a doubly-linked
* list. The 1st WORD is a backward "pointer" (ie, offset) to the
* previous record. The 2nd WORD is the forward pointer to the next record.
* When the forward pointer points to itself we are done.
*
*/
DDS.wSize = GlobalSize (DDS.hinstActive);
while ((WORD)lpHeapRecord < DDS.wSize)
{
lpNextHeapRecord = (lpInstance + NEXT_POINTER);
if (lpNextHeapRecord == lpHeapRecord) break;
wRecordSize = lpNextHeapRecord - lpHeapRecord; // includes ptr
// overhead
wStatus = (PREV_POINTER & 0x0003);
switch (wStatus)
{
case 0: DDS.wHeapFree += wRecordSize; break;
case 1: DDS.wHeapFixed += wRecordSize; break;
case 2:
case 3: DDS.wHeapMoveable += wRecordSize; break;
}
lpHeapRecord = lpNextHeapRecord;
}
/*
* At this point, heap traversal is done.
* However, the heap can grow until the size of DS is 64K (0xFFFF).
* Determine how many additional bytes the heap can grow.
*
*/
DDS.wUnused = 0xFFFF - DDS.wSize;
/*
* Anything else we didn't account for?
*
*/
DDS.wOther = DDS.wSize - DDS.wStaticData
- DDS.wStackMax
- DDS.wHeapFixed
- DDS.wHeapFree
- DDS.wHeapMoveable ;
GlobalUnlock (DDS.hinstActive);
// If anything has changed since last walk, update client window.
//
if (DDS.hwndActive != OldDDS.hwndActive ||
DDS.wHeapFree != OldDDS.wHeapFree ||
DDS.wHeapFixed != OldDDS.wHeapFixed ||
DDS.wHeapMoveable != OldDDS.wHeapMoveable ||
DDS.wOther != OldDDS.wOther ||
DDS.wSize != OldDDS.wSize ||
DDS.wStackUsed != OldDDS.wStackUsed)
{
InvalidateRect (DDS.hwndClient, NULL, TRUE);
UpdateWindow (DDS.hwndClient);
OldDDS = DDS;
}
}
The dds_paint function handles the drawing of a graphic image of the default data segment in HeapPeep’s client window. The technique dds_paint uses is simple: it draws the 64KB default data segment as a horizontal bar 1000 logical units in length. The bar is divided into several parts: the static area, stack, and heap, with logical lengths proportional to their respective sizes in the DDS structure.
The ANISOTROPIC mapping mode is ideal for this application. Although it’s important to display the bar and all the labels, the horizontal-to-vertical aspect ratio is unimportant. Only the bar’s horizontal length matters because the different sections must be drawn in correct proportion to the bar’s total length. The ANISOTROPIC mapping mode enforces this, while scaling the horizontal and vertical dimensions independently to fit the image entirely within the client area.
Notice the order of the bar, from the beginning of the data segment to the end: static data area (includes the 16-byte task header), stack, local heap. Although the left-to-right order of the bar echoes an actual default data segment, HeapPeep’s representation of the local heap is idealized. Here, the order is always FIXED, MOVEABLE, FREE, although actual heaps become fragmented.
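The proportional layout comes down to one scaling step. The sketch below assumes the full bar corresponds to 0xFFFF bytes, the limit dds_walk uses when it computes wUnused; bar_units is a hypothetical name, and the actual dds_paint code may scale differently.

```c
#include <assert.h>
#include <stdint.h>

enum { BAR_UNITS = 1000 };   /* logical length of the whole bar */

/* Map a byte count within a 64 KB segment to a length in logical units. */
int bar_units(uint32_t bytes)
{
    assert(bytes <= 0xFFFFu);
    return (int)((bytes * (uint32_t)BAR_UNITS) / 0xFFFFu);
}
```

A section's left edge is the scaled sum of everything before it. For example, the stack would start at bar_units(wStaticData) and end at bar_units(wStaticData + wStackMax), computed in a type wide enough to hold the sum.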
If the display supports color, dds_paint uses color brushes to draw the bar. Otherwise, monochrome patterned brushes are used. The brushes are created by dds_create when HeapPeep is initialized and destroyed by dds_destroy when HeapPeep terminates.
The HP1.C file contains just three functions: WinMain, WndProc, and AboutDlgProc. WinMain has the usual job of registering the application’s window class and creating a client window. But it also prevents multiple instances.
There’d be no benefit to having multiple instances of HeapPeep, since they’d all display the same thing. That’s because HeapPeep displays the default data segment of the active window, and only one is active at any time. So WinMain enforces a single instance, using FindWindow to locate a top-level window with a HeapPeep window class:
if (hwnd = FindWindow ("HeapPeep", NULL))
{
BringWindowToTop (hwnd);
return (FALSE);
}
If such a window is found, BringWindowToTop activates it and repositions it above the other windows on the display. Then WinMain exits.
Microsoft recommends this FindWindow technique over the more familiar check of hPrevInstance:
if (hPrevInstance) return 0;
To quote from a Microsoft OnLine tech note: "The first method is independent of any memory architecture. The second method depends on a memory architecture of the operating system that allows different tasks to access other application’s data segments. In future versions of Windows, this may not be a valid assumption if applications do not share the same local descriptor table (LDT)."
Of course, the dds_walk function will break too when applications stop sharing an LDT, because dds_walk will not be able to get a far pointer to another application’s default data segment by GlobalLocking an hInstance handle.
Next, WinMain checks for the Windows real-mode large frame EMS configuration.
if (GetWinFlags() & WF_LARGEFRAME)
{
MessageBox (GetFocus(),
"HeapPeep cannot run with Large Frame \
EMS memory. (Try c:> win /r/n).",
"HeapPeep", MB_OK | MB_SYSTEMMODAL);
return (FALSE);
}
HeapPeep crashes in this configuration because Windows moves the default data segments of all tasks except HeapPeep (when HeapPeep is running) above the EMS bank line. This presents a problem when HeapPeep tries to GlobalLock another application’s default data segment. The GlobalLock returns a far pointer to the data segment, but it’s invalid because the data segment is banked out! So if large frame EMS is detected, WinMain notifies the user with a message box and terminates.
After registering the window class, creating the client window, and calling ShowWindow, WinMain creates a timer. This is HeapPeep’s heartbeat. It’s needed to force HeapPeep to walk the active window’s default data segment about every half second. Then WinMain drops into the message loop.
WndProc, HeapPeep’s client window procedure, is almost trivial. Most of the work is done by the dds_xxx functions:
WndProc (HWND hwnd, unsigned message, WORD wParam,
LONG lParam)
{
switch (message)
{
case WM_CREATE:
{
. . .
dds_create (hwnd);
. . .
}
return 0L;
case WM_TIMER:
dds_walk ();
return 0L;
. . .
case WM_PAINT:
dds_paint ();
return 0L;
case WM_DESTROY:
dds_destroy ();
. . .
return 0L;
}
}
The HeapPeep version written without using ToolHelp has several problems, all related to how it collects default data segment information. It makes assumptions about the stack and local heap that might be incorrect in later versions of Windows. And it accesses default data segments via a far pointer obtained by GlobalLocking an hInstance handle--a technique sure to break in future versions of Windows, when each task gets its own LDT.
Using ToolHelp solves these problems, providing functions and data structures for getting stack and local heap information without having to GlobalLock instance handles. Updating HeapPeep was easy (see Figure 9). I simply included TOOLHELP.H, which contains all the external function prototypes and data structures that support the ToolHelp API, and changed about a dozen lines of code in the dds_walk function. Here’s what changed.
First, dds_walk changed the way it finds the instance handle DDS.hinstActive. Instead of using GetWindowWord, it walks the Windows task list to find the instance handle. This guarantees that HeapPeep displays only the default data segment of the task associated with the active window. For message boxes, HeapPeep now shows the task’s data segment instead of USER’s.
There’s another benefit to walking the task list. It’s a way to get stack information for a given task. To walk the task list, dds_walk uses two ToolHelp functions and one data structure: TaskFirst, TaskNext, and TASKENTRY, respectively. Since walking the task list is relatively time-consuming, it’s only done when the active window has changed.
DDS.hwndActive = GetActiveWindow ();
. . .
if (DDS.hwndActive != OldDDS.hwndActive)
{
// Loop through the task list
DDS.htaskActive = GetWindowTask (DDS.hwndActive);
taskentry.dwSize = sizeof (TASKENTRY);
if ( TaskFirst (&taskentry) )
do
{
if (DDS.htaskActive == taskentry.hTask)
{
DDS.hinstActive = taskentry.hInst ;
break;
}
}
while (TaskNext(&taskentry));
}
The traversal terminates when taskentry corresponds to the task of the active window.
The TASKENTRY structure contains a wealth of information for one task.
typedef struct tagTASKENTRY
{
DWORD dwSize;
HANDLE hTask;
HANDLE hTaskParent;
HANDLE hInst;
HANDLE hModule;
WORD wSS;
WORD wSP;
WORD wStackTop;
WORD wStackMinimum;
WORD wStackBottom;
WORD wcEvents;
HANDLE hQueue;
char szModule[MAX_MODULE_NAME + 1];
WORD wPSPOffset;
HANDLE hNext;
} TASKENTRY;
Three members--wStackTop, wStackMinimum, and wStackBottom--are used to compute the stack and static data sizes, replacing the PSTACKXXX macros.
TASKENTRY taskentry;
. . .
taskentry.dwSize = sizeof (TASKENTRY);
TaskFindHandle (&taskentry, DDS.htaskActive);
DDS.wStackMax = taskentry.wStackBottom -
taskentry.wStackTop ;
DDS.wStackUsed = taskentry.wStackBottom -
taskentry.wStackMinimum ;
DDS.wStaticData = taskentry.wStackTop ;
It’s strange how taskentry.dwSize must be set to the size of a TASKENTRY structure before it’s used by TaskFindHandle (or TaskFirst)--you’d think TaskFindHandle (or TaskFirst) already knows how big a TASKENTRY structure is. It turns out you have to set sizes on all ToolHelp structures so that future versions of ToolHelp will be able to add or delete structure members and still work with old code.
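To see why a caller-supplied size helps, consider the sketch below. It shows one way a library could use such a field and is not ToolHelp's actual implementation. InfoV1, InfoV2, fill_info, and the values written are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical layouts: what old callers compiled against, and what a
   newer library knows about. The first two members are identical. */
typedef struct { uint32_t dwSize; uint16_t wStackTop; } InfoV1;
typedef struct { uint32_t dwSize; uint16_t wStackTop; uint32_t dwNewField; } InfoV2;

/* Hypothetical library routine built against InfoV2. It fills only the
   members that the caller's dwSize says there is room for.
   Returns 1 on success, 0 if dwSize is too small to be valid. */
int fill_info(void *buffer)
{
    uint32_t size;
    uint16_t stack_top = 0x1000;   /* placeholder data */
    uint32_t new_field = 42;       /* placeholder data */

    assert(buffer != NULL);
    memcpy(&size, buffer, sizeof size);
    if (size < sizeof(InfoV1))
        return 0;                  /* caller forgot to set dwSize */

    memcpy((char *)buffer + offsetof(InfoV2, wStackTop),
           &stack_top, sizeof stack_top);
    if (size >= sizeof(InfoV2))    /* only newer callers have this member */
        memcpy((char *)buffer + offsetof(InfoV2, dwNewField),
               &new_field, sizeof new_field);
    return 1;
}
```

An old caller that passes sizeof(InfoV1) gets its familiar members filled and nothing written past the end of its structure, while a caller that leaves dwSize at zero is rejected outright.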
The second parameter in TaskFindHandle is DDS.htaskActive--the task handle associated with the active window. The DEFAULTDATASEGMENT definition was expanded to accommodate this new member.
The instance handle is still needed to walk the local heap, but now dds_walk replaces the pointer technique with three functions, LocalInfo, LocalFirst, and LocalNext, and two data structures, LOCALINFO and LOCALENTRY.
localinfo.dwSize = sizeof (LOCALINFO);
LocalInfo (&localinfo, DDS.hinstActive);
localentry.dwSize = sizeof (LOCALENTRY);
if (LocalFirst (&localentry, DDS.hinstActive))
{
do
{
if (localentry.wFlags & LF_FREE)
DDS.wHeapFree += localentry.wSize;
else if (localentry.wFlags & LF_FIXED)
DDS.wHeapFixed += localentry.wSize;
else if (localentry.wFlags & LF_MOVEABLE)
DDS.wHeapMoveable += localentry.wSize;
}
while (LocalNext (&localentry));
}
You prepare the walk by first identifying the default data segment containing the local heap. This is accomplished by calling LocalInfo, passing in the address of an initialized LOCALINFO structure and the DDS.hinstActive instance handle.
Then the first local heap record is retrieved by calling LocalFirst, followed by subsequent calls to LocalNext. The traversal is finished when LocalNext returns 0.
Finally, since the ToolHelp DLL is a protected-mode-only DLL, I used the -t flag on the resource compiler, which prevents HeapPeep from running in real mode. This allowed me to simplify WinMain by removing the check for EMS memory.
I had fun with the About dialog. For a little pizazz, it displays my company logo in a vivid color. The logo was scanned in as a monochrome bitmap and saved as a BMP file, then referenced in the resource file, HP.RC, like this:
logo BITMAP logo.bmp
The dialog box function AboutDlgProc loads this resource when it processes the WM_INITDIALOG message:
case WM_INITDIALOG:
{
. . .
hbm = LoadBitmap (GetWindowWord (hDlg,
GWW_HINSTANCE), "logo");
The tricky part was drawing the bitmap. Normally, a dialog box function doesn’t process the WM_PAINT message because each child control receives its own WM_PAINT message and paints itself. But since the bitmap is not a child control, I had to draw it myself.
The bitmap is selected into a memory device context, colored (who says monochrome means black and white?), then rendered. Color is added with SetTextColor:
case WM_PAINT:
{
. . .
// Create a memory DC
// and select the
// bitmap into it
SetTextColor (ps.hdc, RGB (255, 0, 255));
Then the bitmap’s upper-left corner is aligned with an invisible control (IDD_LOGO) used exclusively for positioning the bitmap. The x,y coordinates are relative to the client window origin of the dialog box:
hwndLogo = GetDlgItem (hDlg, IDD_LOGO);
GetWindowRect (hwndLogo, &rcLogo);
GetWindowRect (hDlg, &rcDialog );
x = rcLogo.left - rcDialog.left;
y = rcLogo.top - rcDialog.top ;
The bitmap is rendered with BitBlt, using 200 pixels for the bitmap’s width and height:
BitBlt (ps.hdc, x, y, 200, 200, hdcMem,
0, 0, SRCCOPY);
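The coloring set up earlier with SetTextColor takes effect during this blit. As the SDK documentation describes it, when a monochrome bitmap is copied to a color device context, 0 bits take the destination's text color and 1 bits take its background color. The sketch below imitates that rule for one row of pixels. It is a simulation, not GDI code: Color, MAKE_RGB, and expand_mono_row are hypothetical stand-ins, and the white background is an assumption based on the default background color.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Color;   /* stand-in for COLORREF (0x00BBGGRR) */
#define MAKE_RGB(r, g, b) \
    ((Color)(r) | ((Color)(g) << 8) | ((Color)(b) << 16))

/* Expand one row of a 1-bit-per-pixel bitmap into colors. The leftmost
   pixel is stored in the high bit of each byte. */
void expand_mono_row(const uint8_t *bits, int width,
                     Color text, Color background, Color *out)
{
    int x;

    assert(bits != NULL && out != NULL && width >= 0);
    for (x = 0; x < width; x++) {
        int bit = (bits[x / 8] >> (7 - x % 8)) & 1;
        out[x] = bit ? background : text;   /* 1 = background, 0 = text */
    }
}
```

With a text color of RGB (255, 0, 255), the black pixels of the scanned logo come out magenta while the white pixels stay in the background color.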
Finally, like any GDI resource that you create, the bitmap must be destroyed.
case WM_DESTROY:
DeleteObject (hbm);
return FALSE;
I was eager to compare both versions of HeapPeep side by side. The comparison would reveal how accurate the non-ToolHelp version is, and validate or invalidate the assumptions it makes about default data structures.
I am pleased to report there are just a few differences between them. They both report identical static data area and stack sizes. The differences are in the heap sizes. It seems the ToolHelp version always reports slightly smaller heap sizes. Further testing revealed why--it’s because the localentry.wSize value doesn’t include the 4 bytes used for each heap entry’s forward and backward pointers.
This helps explain why the ToolHelp version’s "other" size is typically larger than that of the non-ToolHelp version. The 4 bytes per heap entry unaccounted for in the heap sizes wind up in this category.
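The size of the gap is easy to predict from the number of heap entries. A minimal sketch, with the hypothetical names heap_total_gap and size_with_overhead:

```c
#include <assert.h>
#include <stdint.h>

enum { POINTER_OVERHEAD = 4 };   /* backward + forward pointer, 2 bytes each */

/* Bytes by which the pointer-walking totals exceed the ToolHelp totals
   for a heap holding entry_count entries. */
uint32_t heap_total_gap(uint32_t entry_count)
{
    assert(entry_count <= UINT32_MAX / POINTER_OVERHEAD);
    return entry_count * (uint32_t)POINTER_OVERHEAD;
}

/* Convert a ToolHelp-style entry size to the size the pointer walk reports. */
uint16_t size_with_overhead(uint16_t toolhelp_size)
{
    return (uint16_t)(toolhelp_size + POINTER_OVERHEAD);
}
```

A heap with 25 entries would show a combined difference of 100 bytes, and those 100 bytes would appear in the ToolHelp version's "other" figure instead.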
Both versions have trouble with Windows applications that omit a local heap but don’t zero the pLocalHeap pointer in the task header. HeapPeep is fooled into thinking there’s a local heap when in fact there’s none. This yields crazy results--often causing the bar to extend until clipped by the client area. It’s interesting that the ToolHelp version is fooled, too.
In a future version of HeapPeep, I’ll add TrueType™ support, and anything else that makes HeapPeep useful and fun. Never underestimate the element of fun.
Which reminds me. Let me tell you what happened in the cave. Sure enough, my helmet lamp went out and I was plunged into utter darkness.
As I lay there, unable to move, I heard voices nearby. Turns out there was a spelunking party of five returning from the cave’s other end. Imagine their surprise, finding me stuck in that tunnel. When they pulled me free, they were amazed that I had brought so few tools. I learned one thing that day. When you go into dark caves, bring plenty of tools. And if that dark cave is in Windows, bring HeapPeep.
1 For ease of reading, "Windows" refers to the Microsoft Windows graphical environment. Windows is a trademark that refers only to this Microsoft product and is not intended to refer to such products generally.