How Working Set Tuning Can Help You

The counter Process: Working Set shows the number of pages in memory for the process, as we discussed at some length in Chapter 5. The working set includes both shared and private data. The shared data includes the pages containing any instructions your application executes, including those in your own DLLs and those of the system. Sharing these pages between processes is efficient, because the working sets of those processes then overlap to whatever extent sharing is possible. Still, the shared code can amount to a whole slew of pages.

The Windows NT Working Set Tuner reduces the number of code pages that have to be in RAM for your program to execute. It reduces them by helping the linker put your executable together in a way that minimizes the number of pages you use.

Normally your executable image is put together in the order in which address references are resolved. This order has nothing at all to do with which routines need to reside together in memory, because lots of functions are called only under error conditions or in other unusual situations. The references to these routines are, of course, often right next to references to routines that are used all the time. Consider the following example.

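status = DoSomethingFirst(...);              /* used only during initialization */
if (status == WONDERFUL) {
    while (status == WONDERFUL)
        status = ProcessNormally(...);       /* used all the time */
} else {
    PressThePanicButton(...);                /* used only when the sky is falling */
}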

Assuming this is the first time the linker has seen these symbols, it would put DoSomethingFirst in the .EXE, followed by ProcessNormally, and then PressThePanicButton. But DoSomethingFirst is used only during initialization, and PressThePanicButton is called only when the sky is falling. It would be better if ProcessNormally were placed in the .EXE with other routines that are used frequently, DoSomethingFirst were placed with the routines that initialize the application, and PressThePanicButton were placed somewhere else (we really don't care where). If the linker sets them up that way, a page brought in when ProcessNormally is first executed would likely contain only routines that are used frequently. The page containing DoSomethingFirst could be discarded after initialization, because it would likely be packed with other initialization routines. And best of all, the page containing PressThePanicButton would come into memory only if the error condition arose.

The Working Set Tuner accomplishes precisely these objectives. It provides a packing list to the linker so the linker can place functions into the executable image in the order that most reduces paging. It does this by determining which functions are used together in time. The functions which are used most often are placed together in the .EXE image. This continues in order of usage until the never-referenced functions are reached. It places these at the end of the .EXE in "don't care" order.

In order to determine which functions are used together in time, the Working Set Tuner starts with a measurement of your application. For this utility to do a good job, you must define your scenario to include all the commonly used functions in your application. Your scenario should spend the most time on the most commonly used functions, tapering off to those used less frequently. For example, when we performed working set tuning on Performance Monitor, our scenario included the following:

Logging all objects at 3-second intervals.
Charting Processor, Memory, and System counters at 5-second intervals.
Reporting on the same objects at 10-second intervals.
Alerting on the same objects at 15-second intervals.

The tuned working set that resulted from this set of tasks was smallest for logging. That is what we wanted, because logging is the most serious use of the tool, the one where we most want Heisenberg locked in the trunk. For charting, which is also quite common, we let Heisenberg sit in the back seat. And so on.

How good a job does the Working Set Tuner do? The results for Performance Monitor executing the scenario described above are in the next table.

Table 11.1 Code Working Set Tuning of Performance Monitor

Executable image pages	Code pages in RAM before tuning	Code pages in RAM after tuning
41	30	11

That's a pretty dramatic saving: the code working set dropped from 30 pages to 11, a reduction of about 63%. Typically we see savings of between 25% and 50% in the code space used, and you can normally expect about a 30% reduction in code space for the scenario that you measure. But the operational results of your efforts depend almost entirely on how good a job you did at devising your scenario. A quick "let's just run something" without preparation won't help, so think carefully about that test scenario.