Under the Hood


March 1996

Matt Pietrek is the author of Windows 95 System Programming Secrets (IDG Books, 1995). He works at NuMega Technologies Inc., and can be reached at 71774.362@compuserve.com.


In my article "An Exclusive Tour of the New TOOLHELP32 Functions for Windows® 95" (MSJ, September 1995), I described how Windows 95-based programs can easily access system-level information. The functionality that TOOLHELP32 provides includes process lists, thread lists, module lists, heap lists, and heap walking. Unfortunately, the TOOLHELP32 API isn't part of Win32®, so if you're coding for Microsoft® Windows NT™, these functions simply aren't available.

In place of TOOLHELP32 functions, Windows NT has what's known as performance information. "Performance" is somewhat of a misnomer, since much of the information in the Windows NT performance data has nothing to do with how fast something executes. Through the performance information, you can access the running process list, the amount of free space on a given disk, and many other things unrelated to performance.

In this and a future column, I'm going to describe how to access the Windows NT performance information. This month, I'll describe the basic components that need to be set up prior to using Windows NT performance data. Next month I'll describe the performance data structures and how you can access them. Because the performance data structures are of variable length, I've chosen to encapsulate some of the complexity using C++® classes.

The Windows NT performance information is the basis for at least two utilities that should be familiar to you. PERFMON, which comes with Windows NT, is really just a front-end viewer for Windows NT performance information. The code for PERFMON can be found in the Win32 SDK samples. Alas, the PERFMON program has lots of code unrelated to accessing the performance information, so it's not the best example if you want to quickly learn how to access the performance data. The other program that uses the performance information is PVIEW (AKA PVIEWER). PVIEW focuses on the process and thread lists found in the performance information. The code for PVIEW also comes with the Win32 SDK, and is simpler than PERFMON's. Still, it can be confusing if you just want to learn about performance data.

All Windows NT performance information is accessed through the registry. There's even a special predefined registry key called HKEY_PERFORMANCE_DATA that's used to read the information. In my September 1995 Under the Hood column, I described how Windows 95 keeps performance information in its registry. Unfortunately, the registry keys and the data structures are completely different in Windows NT and Windows 95. In fact, this difference is so great that Windows 95 calls its performance key HKEY_DYN_DATA, rather than HKEY_PERFORMANCE_DATA. While the Windows 95 performance information is relatively easy to access, the level of information that Windows 95 provides via this mechanism is just a drop in the bucket compared to the amount of information found in the Windows NT registry.

Unlike regular registry keys such as HKEY_LOCAL_MACHINE, the data retrieved from HKEY_PERFORMANCE_DATA isn't file-based, nor is it persistent. Instead, when you query the registry for performance information, the registry code reads the information from system internal data structures. What this means is that the performance data will be different each time you query the registry. By the time the calling program gets around to using the information, the system may already have changed its state. More on this later.

The Title Database

Before diving into the specifics of the performance data, it's important to first understand how the performance information gives titles (or names) to the various components of the performance data. The titles of the performance data components are kept separate from the data itself. Since the naming mechanism can be described independently of the actual performance data, I'm going to first describe how performance titles are managed. Later, I'll connect this topic to the actual performance data.

One of the apparent design goals of the Windows NT performance information is that it should be self-describing. By that, I mean that the information accessible via the performance data can be arbitrarily extended. For example, Windows NT device drivers can make their own performance data accessible. The key point is that any code that uses performance data shouldn't need to change if new information is made available in the performance data. Thus, there aren't any magic #defines in a header file somewhere that say "this represents a thread ID," or "that is the current page file size." Instead, the names of objects and counters are maintained in a block of data that's also read in via the registry.

Next month, I'll give a more precise definition of performance objects and counters. For now, it's sufficient to know that an object is something like a process list, memory details, or the paging file. A counter is a specific piece of information about an object instance. For example, the process ID for a particular process is represented as a counter. The important thing to remember is that these titles are dynamic. They could theoretically change from boot to boot or machine to machine.

All of the object and counter name strings are collected together in one place for easy access. They're kept in what I call a title database for lack of a better term. You read the title database in one big chunk from the registry. A title database consists of a series of null-terminated strings, one after the other in memory. To make things more interesting, only half the strings in the title database are actually an object or counter title. The remaining strings are string representations of decimal numbers; for instance, the string "126." These number strings are each paired up with an actual name string. For instance, in my machine's title database, I have the following sequence of strings:

 "86", "Cache"
"88", "Data Maps/sec"
"90", "Sync Data Maps/sec"

As you navigate through the performance data, you'll come across data structures that describe a type of object (such as the cache) or a counter (such as "Data Maps/sec"). In these data structures, the title of an object or counter is stored as a binary value corresponding to a string number in the title database. For instance, using the above sequence of strings, the description for a cache object would include the binary value 86. Likewise, the counter data structure that describes "Data Maps/sec" would contain 88 for the title.

While you could look up a given string in the title database using a brute force approach of comparing strings, there's obviously a better way. If you know how many strings are in the array, you can create an array of pointers to the name strings within the raw title data. Looking up a name becomes very easy at that point. For a given title index, simply use the corresponding array entry from your array of string pointers. Beware, though. Not every index value has a string associated with it. In fact, in the latter part of the title database, you'll come across large gaps in the indexes of consecutive title strings.

To make the title database easy to use, I wrote the CPerfTitleDatabase class (Figure 1). The two most important member functions of this class are GetTitleStringFromIndex and GetIndexFromTitleString. The GetTitleStringFromIndex method simply returns whatever value is in the specified slot in the string array, while GetIndexFromTitleString performs a linear search of the string array, comparing the input string to each nonzero string pointer in the array.
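The two lookups described above are simple enough to sketch in a few lines. The following is only an illustration and is not the code from Figure 1: the TitleTable type and the function names are hypothetical, the comparison is case-sensitive by assumption, and plain char strings are used instead of the ANSI/Unicode-neutral types.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the class's array of string pointers:
// slot i holds the title for index i, or nullptr where there is a gap.
using TitleTable = std::vector<const char*>;

// Direct lookup: return whatever is in the slot, or nullptr if the
// index is out of range.
const char* TitleFromIndex(const TitleTable& table, std::size_t index)
{
    return index < table.size() ? table[index] : nullptr;
}

// Reverse lookup: linear search over the non-null slots.
// Returns 0 when nothing matches (assumes 0 is never a real title index).
std::size_t IndexFromTitle(const TitleTable& table, const char* title)
{
    for (std::size_t i = 0; i < table.size(); ++i)
    {
        if (table[i] != nullptr && std::strcmp(table[i], title) == 0)
            return i;
    }
    return 0;
}
```

The direct lookup is constant time, while the reverse lookup walks the whole table. That is usually acceptable because a program typically resolves only a handful of names (such as "Process") before taking a snapshot.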

The most interesting method in the CPerfTitleDatabase class is its constructor. There you'll find the code that reads in the raw title strings from the registry and builds the array of string pointers. The code begins by determining how many elements the array of title string pointers needs. The code does this by reading the "Last Counter" value from the HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\PerfLib key in the registry. The value returned is not the actual number of title strings, however. Rather, it's the maximum index value that a given title string can have. Using this value, the constructor allocates enough memory for the array of string pointers.

After allocating the string pointer array, the constructor reads in the raw title strings as one giant blob. The function reads the title strings from the "Counter 009" value of the HKEY_PERFORMANCE_DATA key. According to the documentation, the "009" portion of the string indicates the English language. It's not clear from the documentation what you'd use in place of "009" for non-English versions of Windows NT.

After reading in the title string data, the code processes the strings in pairs, first the array index string, then the title string. For each title string that's encountered, the appropriate slot in the string pointer array is set to point at the title string in the raw data.
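The pairing step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the constructor from Figure 1: BuildTitleTable is a hypothetical name, the raw data is assumed to be a run of null-terminated char strings that ends with an empty string, lastIndex stands in for the "Last Counter" value, and std::strtoul is used where the column's code uses _ttoi.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Build a table of pointers into the raw title data. The data is assumed
// to look like "86\0Cache\0" "88\0Data Maps/sec\0" ... followed by an
// empty string. No strings are copied; the table points into 'raw'.
std::vector<const char*> BuildTitleTable(const char* raw, std::size_t lastIndex)
{
    // One slot per possible index; slots with no title stay nullptr.
    std::vector<const char*> table(lastIndex + 1, nullptr);

    const char* p = raw;
    while (*p != '\0')
    {
        const char* indexString = p;          // e.g. "86"
        p += std::strlen(p) + 1;
        if (*p == '\0')
            break;                            // index with no title: stop
        const char* titleString = p;          // e.g. "Cache"
        p += std::strlen(p) + 1;

        unsigned long index = std::strtoul(indexString, nullptr, 10);
        if (index <= lastIndex)               // ignore out-of-range indexes
            table[index] = titleString;
    }
    return table;
}
```

Because the table only points into the raw data, the raw buffer has to stay alive for as long as the table is in use.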

To convert the array index string into its binary representation, I use the _ttoi function. What's _ttoi? It's essentially the atoi (ASCII to Integer) function, but it can be used when compiling for either ANSI or Unicode. I implemented this class using code that's ANSI/Unicode compatible. That's why you'll see a lot of TEXT macros and potentially unfamiliar RTL functions sprinkled throughout the class code.

In addition to the strings that describe the objects and counters, there's an entirely separate registry key that also contains string pairs in the same format as the object and counter titles. This second set of strings is descriptive help text for the object and counter titles. For each title string in the counter database, there's a corresponding description string. In some (but not all) cases, the index of the description string for a counter title is the counter string's index plus one. For example, index 2 in the counter strings is "System," while index 3 in the help string database is: "The System object type includes those counters that apply to all processors on the computer collectively. These counters represent the activity of all processors on the computer."

The maximum number of possible entries in the description title database is found by reading the "Last Help" value from the same key where "Last Counter" is located. The raw help description strings are obtained by reading the "Explain 009" value of the HKEY_PERFORMANCE_DATA key. Since the counter and help title databases are so similar, I made the CPerfTitleDatabase class work with both. The CPerfTitleDatabase constructor takes an enum parameter which indicates whether the counter or help titles should be used. The rest of my code doesn't use the help strings, but you can easily extend the classes to do so.

Since I won't have space this month to describe the actual formats of the objects and counters in the performance data, I wrote a trivial program (SHOWTITL) that demonstrates the CPerfTitleDatabase class in action. This way, you'll at least be able to see what sort of goodies can be found in the performance database. The SHOWTITL code is in Figure 2. The code declares two CPerfTitleDatabase class instances (one with counter strings, the other with description strings), and then enumerates through each string in both databases, printing the strings out as it encounters them. On my system, the first part of the SHOWTITL output is:

1	1847
2	System
4	Memory
6	% Processor Time
10	File Read Operations/sec
12	File Write Operations/sec

That's it for my quick tour of the title database. Let's now start acquiring actual performance data.

Performance Snapshots

When using the Windows 95 TOOLHELP32 functions, you create what's called a snapshot by calling CreateToolhelp32Snapshot. A TOOLHELP32 snapshot records all of the desired information into a single data block that you then pass to other functions to retrieve the various bits and pieces of information. Using the Windows NT performance information is somewhat similar, except that there isn't a set of API functions to parse the data block. You have to parse the data structures yourself. This parsing isn't trivial, and is the main reason I wrote C++ classes to encapsulate the dirty work. Another difference is that the Windows NT performance data isn't referred to as a snapshot by the Microsoft documentation. In spite of this, I've decided to call the raw performance data a snapshot because the documentation doesn't offer any better alternative name. Calling it a snapshot also highlights the fact that the performance data is similar in some ways to Windows 95 TOOLHELP32 snapshots.

To take a performance snapshot, you read yet another value from the HKEY_PERFORMANCE_DATA key in the registry. What's unique about taking a snapshot is that the value parameter passed to RegQueryValueEx isn't a predefined string. Instead, you pass RegQueryValueEx a value name that's composed of zero or more tokens. Based on the value name that you pass, RegQueryValueEx creates an appropriate snapshot containing the requested data. Taking a performance snapshot is somewhat like ordering from a menu. You can say "I'd like a thread list and a list of logical disks. That's all, thanks." Interestingly, when you order certain dishes, you may automatically get side orders. For example, when you request thread information, the snapshot data also contains a process list. That's because a thread cannot be completely described without mentioning the process that owns it. Likewise, a complete logical disk description requires information about the physical disk that the logical disk resides on. It's up to the code that parses the snapshot data to realize that the snapshot may contain more data than was actually requested.

Assuming you only want selected parts of the total available performance information, you first create a string with one or more tokens representing which information you'd like. The tokens are separated by spaces, and each token is a string representation of a counter index. For example, on my system, the process list object corresponds to counter index 230, and the logical disk list has counter index 236. To collect information on just those two items, I'd read the HKEY_PERFORMANCE_DATA key, and pass "230 236" as the value name parameter.
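Building the value name is just a matter of joining the index numbers with spaces. A minimal sketch, with MakeSnapshotValueName as a hypothetical helper name:

```cpp
#include <string>
#include <vector>

// Join counter indexes into the space-separated value name that is
// passed to RegQueryValueEx, for example "230 236".
std::string MakeSnapshotValueName(const std::vector<unsigned>& objectIndexes)
{
    std::string valueName;
    for (unsigned index : objectIndexes)
    {
        if (!valueName.empty())
            valueName += ' ';                 // tokens are space-separated
        valueName += std::to_string(index);
    }
    return valueName;
}
```

Since the indexes can differ from machine to machine, real code would look them up in the title database by name rather than hard-coding values such as 230 and 236.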

If you'd like to sample nearly everything on the performance menu, you can pass the string "Global" to RegQueryValueEx. The returned snapshot will contain all the performance data that isn't expensive (timewise) to collect. On my system, passing "Global" returns information about these items:

system
processor (list)
memory
cache
physicalDisk (list)
logicalDisk (list)
process (list)
thread (list)
objects
redirector
server
paging file
browser

Another predefined string value that can be passed to RegQueryValueEx is "Costly." The Costly performance data contains information that takes the system a relatively long time to acquire. On my system, reading the Costly data takes 4 or 5 seconds, even with just a few programs running. The Costly information on my system is the following:

process address space (list)
image (list)
thread details (list)

Yet another predefined string value that can be used when reading performance data is "Foreign <computer name>." This causes the registry to take a performance snapshot of a remote machine over a network. Since I don't have a Windows NT-based network, I wasn't able to try this feature out and see how long it takes to acquire the data.

One messy part of taking a performance snapshot is that you don't know ahead of time how big a buffer you'll need for the data. Normally if you don't know how big a buffer you'll need when reading data from the registry, you can call RegQueryValueEx, and tell it that the buffer size is zero bytes. The registry can then indicate to you how big the buffer needs to be; you can then allocate a buffer of the correct size and try again. The problem is that when reading performance data, the required buffer size for a snapshot can vary from invocation to invocation of RegQueryValueEx. The result is that you might ask the registry how big a buffer you need. After allocating a buffer of that size, you could query the registry again and still have the call fail because the buffer was too small. One way to handle this problem is to pass in a very large buffer and hope for the best. A better approach is to ask for the performance data in a loop. If RegQueryValueEx returns ERROR_MORE_DATA, allocate a bigger buffer and go through the loop again. This is what the Microsoft sample code does, and this is what my C++ class code does, as you'll see shortly.

After you successfully take a performance snapshot, what exactly do you have? Take a look in the WINPERF.H header file that comes with your compiler, and you'll find several structure definitions. I'll go over most of them next month. What's of interest right now is the PERF_DATA_BLOCK structure. All snapshots begin with this structure. Immediately following PERF_DATA_BLOCK is the data for the object types that you requested.

While there are over a dozen fields in the PERF_DATA_BLOCK structure, the most important fields tell you how many object types follow the PERF_DATA_BLOCK (for example, "System"), and where the first object description can be found in the snapshot data. Additional information in the PERF_DATA_BLOCK includes the name of the system that the snapshot was taken for (as a Unicode string), and various timing related fields. The WINPERF.H file describes each field, so I won't waste space with a description here.
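To make the header check concrete, here is a portable sketch that pulls the two key fields out of a raw snapshot buffer. Treat it as an illustration only: SnapshotHeader and ReadSnapshotHeader are hypothetical names, and the byte offsets are assumptions drawn from the WINPERF.H declaration (a four-character Unicode signature followed by 32-bit fields, with HeaderLength and NumObjectTypes as the fifth and sixth of those). It also assumes a little-endian host. Real Windows code should include WINPERF.H and use PERF_DATA_BLOCK directly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical view of the two PERF_DATA_BLOCK fields discussed above.
struct SnapshotHeader
{
    std::uint32_t headerLength;    // bytes from buffer start to first object
    std::uint32_t numObjectTypes;  // number of object types that follow
};

// Returns true and fills in 'out' if the buffer begins with the Unicode
// "PERF" signature and is large enough; returns false otherwise.
bool ReadSnapshotHeader(const unsigned char* buffer, std::size_t size,
                        SnapshotHeader& out)
{
    // "PERF" as four 16-bit characters, low byte first.
    static const unsigned char signature[8] =
        { 'P', 0, 'E', 0, 'R', 0, 'F', 0 };

    if (size < 32 || std::memcmp(buffer, signature, sizeof(signature)) != 0)
        return false;

    // Assumed offsets: HeaderLength at byte 24, NumObjectTypes at byte 28.
    std::memcpy(&out.headerLength,   buffer + 24, sizeof(std::uint32_t));
    std::memcpy(&out.numObjectTypes, buffer + 28, sizeof(std::uint32_t));

    // The first object description must lie inside the buffer.
    return out.headerLength <= size;
}
```

With a valid header in hand, the first object description would be found at buffer + headerLength.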

To encapsulate the complexities of taking and managing snapshot data, I wrote the CPerfSnapshot class in Figure 3. The constructor for CPerfSnapshot takes a CPerfTitleDatabase pointer as an argument and stores it in a private data member for later use. A snapshot isn't actually taken in the constructor code. The destructor for the CPerfSnapshot class just deletes any memory allocated by a previously taken snapshot.

The most important member function for the CPerfSnapshot class is TakeSnapshot. This function takes one parameter, a string indicating what sort of data you'd like in the snapshot. To make the class easier to use, the function does any necessary conversions on the string prior to passing it as the value parameter to RegQueryValueEx. For instance, instead of passing "230," you can pass "Process." The private member function ConvertSnapshotItemName handles the messy work of looking up the counter titles and creating the final string.

After TakeSnapshot has a string that's suitable to pass to RegQueryValueEx, the code enters into a while loop. The first time through the loop, the input buffer size is 0, so RegQueryValueEx fails with a code of ERROR_MORE_DATA. However, in doing so, the function fills in the cbPerfInfo variable with the required buffer size for the snapshot. My code then adds 4KB to this size, allocates a memory block of that size, and then loops back to the RegQueryValueEx call.

The second time through the loop, the buffer should be big enough, and RegQueryValueEx should return ERROR_SUCCESS. If this happens, the TakeSnapshot code checks for the "PERF" signature that's at the start of a valid PERF_DATA_BLOCK structure, and returns TRUE if everything's OK. If the second call to RegQueryValueEx fails, the code bumps up the buffer size by at least another 4KB and loops again. The loop continues in the same way until RegQueryValueEx returns either a success code or some other error code besides ERROR_MORE_DATA.
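The retry loop can be sketched without any Windows headers by hiding the registry call behind a callback. This is not the TakeSnapshot code from Figure 3: QueryFn, TakeSnapshotWithRetry, and the constant names are hypothetical. On Windows NT the callback would wrap RegQueryValueEx on HKEY_PERFORMANCE_DATA, and the two status values below mirror ERROR_SUCCESS and ERROR_MORE_DATA.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Stand-ins for the Win32 status codes.
const long kSuccess  = 0;     // ERROR_SUCCESS
const long kMoreData = 234;   // ERROR_MORE_DATA

// Hypothetical query callback: fills 'buffer' (capacity *size), stores
// the byte count written or required in *size, and returns a status.
using QueryFn = std::function<long(unsigned char* buffer, unsigned long* size)>;

// Retry with a larger buffer until the query stops reporting "more data".
// Returns the final status; 'snapshot' holds the data on success.
long TakeSnapshotWithRetry(const QueryFn& query,
                           std::vector<unsigned char>& snapshot)
{
    const unsigned long kPadding = 4096;    // 4KB of slack, as in the text
    unsigned long capacity = 0;             // first pass asks with no buffer

    for (;;)
    {
        snapshot.resize(capacity);
        unsigned long size = capacity;
        long status = query(snapshot.data(), &size);

        if (status == kSuccess)
        {
            snapshot.resize(size);          // trim to what was written
            return status;
        }
        if (status != kMoreData)
        {
            snapshot.clear();
            return status;                  // some other failure: give up
        }

        // Grow at least 4KB past both the old capacity and the reported need.
        capacity = (size > capacity ? size : capacity) + kPadding;
    }
}
```

Taking the larger of the old capacity and the reported size guarantees that the buffer grows on every pass. Later Microsoft documentation describes the size reported alongside ERROR_MORE_DATA for this key as undefined, so growing by a fixed step may be the safer choice in new code.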

The GetNumObjectTypes and GetSystemName member functions of the CPerfSnapshot class are relatively self-explanatory. They both retrieve the desired data out of the PERF_DATA_BLOCK structure. One slight twist to the GetSystemName method is that it has code to convert the Unicode system name string to ANSI if you're compiling in ANSI mode. The last member function of the CPerfSnapshot class to describe is GetPostHeaderPointer. This function returns a pointer to the first byte of the snapshot buffer following the PERF_DATA_BLOCK structure. You'll see this function in action next month, when I get to the classes that demonstrate how to enumerate through the performance objects and their associated counters.

Have a question about programming in Windows? You can mail it directly to Under The Hood, Microsoft Systems Journal, 825 Eighth Avenue, 18th Floor, New York, New York 10019, or send it to MSJ (re: Under The Hood) via:

Internet:

Matt Pietrek
71774.362@compuserve.com

Eric Maffei
ericm@microsoft.com

From the March 1996 issue of Microsoft Systems Journal.