August 1998
Custom Performance Monitoring for Your Windows NT Applications
Download Aug98Perf.exe (62KB)
Jeffrey Richter wrote Advanced Windows, Third Edition (Microsoft Press, 1997) and Windows 95: A Developer's Guide (M&T Books, 1995). Jeff is a consultant and teaches Win32 programming courses (www.solsem.com). He can be reached at www.JeffreyRichter.com.
Monitoring the health of a computer system is incredibly important. That's why Microsoft built performance monitoring into the very first version of Windows NT®. Unfortunately, Microsoft has not emphasized the importance of performance monitoring to the outside world, so very few applications take advantage of it.
Microsoft also didn't make it easy for an application to expose performance data. I looked into adding performance information to my own applications about two years ago. I remember being flabbergasted to see how complex it was and deciding to postpone adding this feature. Around last September, I vowed to bite the bullet and go for it. I decided to create a C++ class that encapsulates the process of exposing performance data to the operating system. This would allow me to easily add performance data to any application.
Before I present my C++ class, I want to explain the basic performance-monitoring facilities that Windows NT offers. You can examine performance monitoring from several different perspectives, and I'd like to touch on all of them. I'll begin by examining performance monitoring from a user's perspective. I'll explain how the system organizes performance information and how administrators, users, and developers can use the PerfMon tool to see how healthy a computer system is. I will then discuss some of the more common reasons for using performance monitoring when designing an application or Windows NT service. Finally, I will get more technical and discuss the Windows NT performance monitoring architecture from a system's and a programmer's perspective.
Performance Monitoring: A User's Perspective
Figure 1 The Windows NT Performance Monitor
Figure 2 Add to Chart
Figure 3 Objects, Instances, and Counters
Performance Monitoring From a Designer's Perspective
The Architecture of Performance Objects and Counters
Figure 4 Perflib Registry Subkeys
To add your own performance objects and counters to the system, you must append your objects, counters, and help string pairs to these two registry values. Referring back to Figure 4, you see two values under the Perflib subkey: Last Counter and Last Help. These two values tell you the highest number that has been used in the Counter and Help values. In my registry, Last Counter has a value of 1860, so any new objects or counters that I add to the system should start at 1862. Similarly, my new help text should start at 1863. Microsoft reserves a certain set of numbers for Windows NT itself. The Base Index value (1847) indicates the highest number that Microsoft has reserved for the system's own objects and counters. Objects and counters that you add must be above this number.
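The numbering rule above can be sketched as a small helper. This is only an illustration: `PerfIndexRange` and `AllocatePerfIndexes` are hypothetical names, and the sketch assumes that name strings and help strings each advance in steps of two (as the 1860 to 1862 example suggests) and that Last Help is 1861 on the machine described, which the text implies but does not state.

```cpp
#include <cstdint>

// Hypothetical holder for the index range one product would claim.
struct PerfIndexRange {
    std::uint32_t firstCounter;  // index of the first new name string
    std::uint32_t lastCounter;   // index of the last new name string
    std::uint32_t firstHelp;     // index of the first new help string
    std::uint32_t lastHelp;      // index of the last new help string
};

// lastCounter / lastHelp: the current "Last Counter" and "Last Help" values.
// names: how many name strings (objects plus counters) are being added;
//        must be at least 1. Each name has exactly one help string.
// Assumption: consecutive entries are two apart in both sequences.
inline PerfIndexRange AllocatePerfIndexes(std::uint32_t lastCounter,
                                          std::uint32_t lastHelp,
                                          std::uint32_t names) {
    PerfIndexRange r;
    r.firstCounter = lastCounter + 2;
    r.lastCounter  = r.firstCounter + 2 * (names - 1);
    r.firstHelp    = lastHelp + 2;
    r.lastHelp     = r.firstHelp + 2 * (names - 1);
    return r;
}
```

With Last Counter at 1860 and an assumed Last Help of 1861, adding one object with two counters (three names) would claim 1862 through 1866 for names and 1863 through 1867 for help text.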
You'll also notice in Figure 4 that there are three subkey values that may not exist on your system. You can find documentation for these values by looking in Chapter 10 of the Windows NT Workstation 4.0 Resource Kit. Now, let's turn to the other part of the registry. To expose your own performance objects, instances, and counters, you must create a DLL responsible for returning your performance information. Once the DLL is created, you must tell the system about it by making some more modifications to the registry. These modifications are made in a different part of the registry, however: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\JeffreyObject\Performance (see Figure 7). The JeffreyObject portion of this subkey is the part that uniquely identifies my performance data. You will, of course, replace this portion of the subkey with the name that uniquely identifies your performance data.
Figure 7 Adding Objects to the Registry
Within this subkey, there are several data values. The most important value, Library, specifies the path name of the DLL that knows how to return your performance information. This DLL must export three functions: an Open function, a Collect function, and a Close function. You can choose any names for these functions that you want, but the names must be specified using the Open, Collect, and Close values in the registry.
When performance information is being requested, the system will load your DLL and immediately call its Open function. Your Open function must look like this:
This gives your DLL the opportunity to initialize itself. Once initialized, the system will periodically make calls to your Collect function. This function is responsible for initializing a memory buffer that contains all of the performance information you wish to return. I will discuss the prototype of the Collect function and go into much more detail of its implementation shortly.
When the system decides to unload your DLL, it calls your Close function, giving you the opportunity to perform any necessary cleanup. Your Close function must look like this:
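A minimal sketch of the Open and Close entry points follows, using the signatures documented for Win32 performance extension DLLs. The function names `SampleOpen` and `SampleClose` are hypothetical (the real names are whatever the Open and Close registry values specify), and the open count is an added precaution rather than something the text requires. The `#else` branch only supplies stand-in types so the sketch also compiles outside Windows.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <cstdint>
// Stand-ins for the Win32 definitions, for non-Windows compilers only.
typedef std::uint32_t DWORD;
typedef wchar_t*      LPWSTR;
#define WINAPI
#define ERROR_SUCCESS 0u
#endif

static long g_openCount   = 0;      // outstanding Open calls
static bool g_initialized = false;  // true while one-time setup is in effect

// Called by the system right after the DLL is loaded.
extern "C" DWORD WINAPI SampleOpen(LPWSTR /*lpDeviceNames*/) {
    if (g_openCount++ == 0) {
        // One-time setup would go here, for example reading the
        // First Counter and First Help values saved in the registry.
        g_initialized = true;
    }
    return ERROR_SUCCESS;
}

// Called by the system before the DLL is unloaded.
extern "C" DWORD WINAPI SampleClose(void) {
    if (g_openCount > 0 && --g_openCount == 0) {
        // Release whatever SampleOpen acquired.
        g_initialized = false;
    }
    return ERROR_SUCCESS;
}
```

In a real DLL these functions would also need to be exported, for example through a .def file.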
The four registry values that I have discussed so far are the only ones required by the system. However, it is frequently useful to have some additional registry values as shown in Figure 7. As mentioned earlier, when you add new performance objects and counters to the system, you must append your performance object and counter strings to the registry. These new strings must begin with unique numbers. It is absolutely essential that your performance DLL remember what object and counter numbers it has been assigned. The easiest place to remember this information is in the Performance registry subkey. Figure 7 shows that I have added the First Counter, Last Counter, First Help, and Last Help values to the registry for just this purpose. When my DLL loads, I open the registry, extract these values, and save them for reference by my Collect function. It is necessary to know your object and counter numbers because the system passes them to your Collect function to identify which performance information you should return.
You're probably wondering which process your DLL gets loaded into. There are two possible scenarios. The first scenario is when your performance information is being queried locally. For example, when PerfMon is used to query performance information from the local machine, the system loads your DLL into PerfMon's address space. Because you're running in another process's address space, it is critically important that you test your code thoroughly and make sure that it is very robust. If your code contains any infinite loops, calls ExitProcess, or deadlocks waiting for some thread synchronization object, you'll be adversely affecting PerfMon. Over the years, Microsoft has made PerfMon more robust to withstand this kind of passive-aggressive behavior. For example, PerfMon now spawns a separate thread when calling into a DLL's Collect function. If that thread doesn't return in a fixed amount of time, PerfMon can kill that thread and continue running.
PerfMon does not directly call the Open, Collect, or Close functions. Instead, it calls RegQueryValueEx with HKEY_PERFORMANCE_DATA. This call into ADVAPI32.DLL gets routed to a statically linked library, PERFLIB, that will call the Open, Collect, or Close functions. By the way, PERFLIB is the source name PerfMon uses when posting errors to the event log.
A slightly different scenario exists when your performance information is being queried by a remote computer. Here, the remote computer is actually talking to an RPC server contained inside WinLogon.exe. When WinLogon detects a request for performance information, it will load your DLL into its address space. The information that you return from the Collect function is then remoted to the machine making the request. All of this happens transparently to you. Be aware that being loaded into WinLogon's address space is much more severe than being loaded into PerfMon's address space. If your DLL calls ExitProcess, you will terminate WinLogon, which will, in turn, crash the entire operating system. Again, I cannot stress enough how important it is to test the code residing in your performance DLL.
Collecting Performance Data
To repeatedly request performance information, just keep calling RegQueryValueEx. The "Global" in the call to RegQueryValueEx tells the system that you want performance information returned for all components in the system. This can be a lot of information; if you want a small subset of the available information, simply replace "Global" with a set of object numbers. For example, if you want the performance information for the System and Memory objects, you would pass a string of "2 4" to RegQueryValueEx. You always pass object numbers to RegQueryValueEx, and the corresponding DLL always returns all the counter and instance information associated with the specified object.
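As an illustration of the value-name string, the helper below builds the text passed to RegQueryValueEx from a list of object numbers. `BuildObjectList` is a hypothetical name, and falling back to "Global" for an empty list is a choice made for this sketch only.

```cpp
#include <string>
#include <vector>

// Joins object numbers with single spaces, e.g. {2, 4} becomes L"2 4".
// An empty list falls back to L"Global" (a choice made for this sketch).
std::wstring BuildObjectList(const std::vector<unsigned long>& objectIndexes) {
    if (objectIndexes.empty()) {
        return L"Global";
    }
    std::wstring result;
    for (unsigned long index : objectIndexes) {
        if (!result.empty()) {
            result += L' ';
        }
        result += std::to_wstring(index);
    }
    return result;
}
```

For the System and Memory objects mentioned above, `BuildObjectList({2, 4})` yields the string "2 4".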
If an application wants to request performance information from a remote computer, all it has to do is call RegConnectRegistry before entering the while loop.
If you're not going to make any more requests for performance information, you should close the registry with the following call:
Make this call even though you never call RegOpenKeyEx to explicitly open this key. And don't close the key after every call to RegQueryValueEx because it'll degrade performance severely. It is best to just close it once, usually just before your application terminates.
When PerfMon (or any application) calls RegQueryValueEx to request performance information, the system looks in the target machine's HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services subkey. For each entry that contains a Performance subkey, the system loads the specified DLL. If this is the first time the DLL is loaded, the DLL's Open function is called. Once loaded and initialized, the system calls each DLL's Collect function. This gives the DLL the chance to load the memory buffer with its performance information. The Collect function must look like this:
The first parameter, lpValueName, is a Unicode string that is the same value that was passed to RegQueryValueEx. If this string is "Global," then the Collect function must return all of the performance information that it is responsible for. If the string contains a set of space-delimited numbers, then the DLL must determine whether any of the objects that it offers information about were requested and, if so, return only that information.
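The check described above might look like the following sketch. `IsObjectRequested` is a hypothetical helper, not part of any Windows API; it treats "Global" as a request for everything and otherwise scans the space-delimited numbers for one object index.

```cpp
#include <sstream>
#include <string>

// Returns true if valueName asks for the object with the given index.
// "Global" matches every object; otherwise the string is read as a list
// of space-delimited numbers. Any other text yields false.
bool IsObjectRequested(const std::wstring& valueName, unsigned long objectIndex) {
    if (valueName == L"Global") {
        return true;
    }
    std::wistringstream in(valueName);
    unsigned long requested = 0;
    while (in >> requested) {
        if (requested == objectIndex) {
            return true;
        }
    }
    return false;
}
```

A DLL that offers several objects would run this check once per object and return data only for those that match.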
The lppData parameter is a pointer to a memory address. On input to this function, *lppData points to the memory buffer where the DLL should write its information. Before returning from this function, *lppData should be updated so that it points to the memory address immediately following the new data that was placed into the buffer. The next performance DLL will append its data starting at this new address.
The lpcbTotalBytes parameter is a pointer to a DWORD that indicates the size of the buffer. On input, *lpcbTotalBytes indicates the number of bytes that are available in the buffer for your DLL to add its information. After placing your information in the buffer, set *lpcbTotalBytes to the total number of bytes that your DLL added to the buffer. When Collect returns, the system will subtract this value from the remaining buffer size and pass this new number to the next performance DLL.
The lpNumObjectTypes parameter also points to a DWORD, but this DWORD means nothing on input to the Collect function. A single Collect function can return performance information for several objects. Before Collect returns, *lpNumObjectTypes should be set to the number of objects whose performance information was added to the data buffer.
If, for example, your Collect function is called and the lpValueName parameter doesn't indicate any of the objects your DLL is responsible for, your Collect function should leave *lppData unchanged, set *lpcbTotalBytes and *lpNumObjectTypes to 0, and return ERROR_SUCCESS. If your Collect function is called and the lpValueName parameter indicates objects that your DLL is responsible for, but you determine that the data buffer is too small for the amount of data you need to return, you should leave *lppData unchanged, set *lpcbTotalBytes and *lpNumObjectTypes to 0, and return ERROR_MORE_DATA. If your Collect function does successfully append information to the buffer, then it adds the number of bytes appended to *lppData, sets *lpcbTotalBytes to the number of bytes appended, sets *lpNumObjectTypes to the number of objects added, and returns ERROR_SUCCESS.
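The three outcomes can be sketched as a skeleton Collect function. Everything named here is hypothetical: `SampleCollect`, the object index `kMyObjectIndex`, and the prebuilt `g_block` that stands in for real performance structures. The `#else` branch supplies stand-in types so the sketch compiles outside Windows; a real DLL would use only the `<windows.h>` branch.

```cpp
#include <cstring>
#include <cwchar>
#ifdef _WIN32
#include <windows.h>
#else
#include <cstdint>
typedef std::uint32_t DWORD;
typedef DWORD*        LPDWORD;
typedef void*         LPVOID;
typedef wchar_t*      LPWSTR;
#define WINAPI
#define ERROR_SUCCESS   0u
#define ERROR_MORE_DATA 234u
#endif

static const unsigned long kMyObjectIndex = 1862;  // hypothetical object number
static unsigned char g_block[24] = {0};            // placeholder for the real data
static const DWORD kBlockSize = sizeof(g_block);

// True if valueName is "Global" or lists kMyObjectIndex among its numbers.
static bool WantsMyObject(const wchar_t* valueName) {
    if (std::wcscmp(valueName, L"Global") == 0) return true;
    const wchar_t* p = valueName;
    for (;;) {
        wchar_t* end = nullptr;
        unsigned long n = std::wcstoul(p, &end, 10);
        if (end == p) return false;          // no further number found
        if (n == kMyObjectIndex) return true;
        p = end;
    }
}

extern "C" DWORD WINAPI SampleCollect(LPWSTR lpValueName, LPVOID* lppData,
                                      LPDWORD lpcbTotalBytes,
                                      LPDWORD lpNumObjectTypes) {
    // Outcome 1: none of this DLL's objects was requested.
    if (!WantsMyObject(lpValueName)) {
        *lpcbTotalBytes = 0;
        *lpNumObjectTypes = 0;
        return ERROR_SUCCESS;
    }
    // Outcome 2: the remaining buffer is too small; leave *lppData alone.
    if (*lpcbTotalBytes < kBlockSize) {
        *lpcbTotalBytes = 0;
        *lpNumObjectTypes = 0;
        return ERROR_MORE_DATA;
    }
    // Outcome 3: append the data and advance the buffer pointer past it.
    std::memcpy(*lppData, g_block, kBlockSize);
    *lppData = static_cast<unsigned char*>(*lppData) + kBlockSize;
    *lpcbTotalBytes = kBlockSize;
    *lpNumObjectTypes = 1;
    return ERROR_SUCCESS;
}
```

The early returns keep the buffer pointer untouched in both non-appending cases, which is what lets the next DLL in the chain write at the correct address.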
Performance Information Data Structures
Immediately following the PERF_OBJECT_TYPE structure are one or more PERF_COUNTER_DEFINITION structures (see Figure 10), one structure for each counter offered by the object. Each PERF_COUNTER_DEFINITION structure defines characteristics of a single counter, but the actual value of the counter is not returned inside one of these structures. After the PERF_OBJECT_TYPE structure and all the PERF_COUNTER_DEFINITION structures, there is a single variable-length PERF_COUNTER_BLOCK structure. The first member, ByteLength, indicates how many bytes are in the structure (this number must include the four bytes for the ByteLength member itself). The counter values immediately follow the ByteLength member. It is critically important that each new value is aligned on a 32-bit boundary or alignment violations will occur on RISC platforms. Inside the PERF_COUNTER_DEFINITION structure, the CounterSize member indicates how many bytes are used by the counter value, and the CounterOffset member indicates the offset of the counter value inside the PERF_COUNTER_BLOCK structure.
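To make the offset arithmetic concrete, the sketch below assigns a 32-bit-aligned offset to each counter value and computes the resulting ByteLength. `CounterSlot`, `AlignUp4`, and `LayOutCounterBlock` are hypothetical names; the `size` and `offset` fields merely mirror the roles of CounterSize and CounterOffset and are not the Win32 structures themselves.

```cpp
#include <cstdint>
#include <vector>

// Mirrors the roles of CounterSize and CounterOffset for one counter.
struct CounterSlot {
    std::uint32_t size;    // bytes used by the counter value (4 or 8)
    std::uint32_t offset;  // filled in: offset from the start of the block
};

// Rounds n up to the next multiple of 4 (a 32-bit boundary).
inline std::uint32_t AlignUp4(std::uint32_t n) {
    return (n + 3u) & ~3u;
}

// Assigns an aligned offset to every counter and returns the block's
// total ByteLength, which includes the 4-byte ByteLength member itself.
std::uint32_t LayOutCounterBlock(std::vector<CounterSlot>& counters) {
    std::uint32_t offset = sizeof(std::uint32_t);  // values start after ByteLength
    for (CounterSlot& c : counters) {
        offset = AlignUp4(offset);
        c.offset = offset;
        offset += c.size;
    }
    return AlignUp4(offset);
}
```

For counters of 4, 8, and 4 bytes, this produces offsets 4, 8, and 16 and a ByteLength of 20.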
The beginning of this memory block is laid out just like the previous example. It begins with a PERF_OBJECT_TYPE structure followed by one PERF_COUNTER_DEFINITION for every counter offered by the object. This is where the similarity ends. Since this object supports instances, the PERF_COUNTER_DEFINITION structures are followed by one PERF_INSTANCE_DEFINITION structure for each instance currently existing for the object. The members of the PERF_INSTANCE_DEFINITION structure are described briefly in Figure 12. Immediately following the fixed portion of the PERF_INSTANCE_DEFINITION structure is the variable-length portion, which contains the Unicode string name for the instance terminated with a 0 character. If the name contains an odd number of characters (including the 0 character), you must add another two bytes of padding so that the following PERF_COUNTER_BLOCK structure begins on a 32-bit boundary. If the instance name is an even number of characters, no padding is required.
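The padding rule reduces to rounding the name's byte count up to a multiple of four. The helper below is a sketch with a hypothetical name, `PaddedNameBytes`; it counts two bytes per character explicitly, since that is the size of a Unicode character on Windows NT, rather than relying on the compiler's `wchar_t` size.

```cpp
#include <cstdint>
#include <string>

// Bytes the instance name occupies after the fixed part of the instance
// definition: the characters, the terminating 0, and any padding needed
// to end on a 32-bit boundary.
std::uint32_t PaddedNameBytes(const std::wstring& name) {
    const std::uint32_t kBytesPerChar = 2;  // Unicode characters on Windows NT
    std::uint32_t chars = static_cast<std::uint32_t>(name.size()) + 1;  // + terminator
    std::uint32_t bytes = chars * kBytesPerChar;
    return (bytes + 3u) & ~3u;  // round up to a multiple of 4
}
```

A three-letter name is four characters with its terminator, so it needs 8 bytes and no padding; a four-letter name is five characters, or 10 bytes, and is padded to 12.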
The CPrfData C++ Class
Notice that the performance data map is placed in a source code file all by itself. This is necessary because both the application and the DLL will have to include this file in their projects. Now, let's look at some code that demonstrates how to use the CPrfData class's public member functions: the WinMain function inside HWInputMon.cpp (see Figure 16). For the application to expose performance information, the registry must be configured as described previously. CPrfData has a static member function called Install.
This function must be passed the full path name of the DLL that exposes your performance information. In WinMain, you can see that my application calls this function if -Install is passed as a command-line switch. In this case, I get the full path name of the executable file, replace the EXE file name with the DLL file name, and then call this function. This, of course, assumes that the EXE and the DLL are in the same directory. If you add performance information to your own application, you may want to have your setup program run the application with the -Install switch.
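The path substitution can be sketched with plain string handling. `SiblingPath` is a hypothetical helper, and the DLL file name used in the example below is made up for illustration; on Windows the executable's full path would typically come from GetModuleFileName.

```cpp
#include <string>

// Replaces the file name at the end of exePath with dllName, keeping the
// directory. If exePath has no directory part, dllName is returned as is.
std::wstring SiblingPath(const std::wstring& exePath, const std::wstring& dllName) {
    std::wstring::size_type slash = exePath.find_last_of(L"\\/");
    if (slash == std::wstring::npos) {
        return dllName;
    }
    return exePath.substr(0, slash + 1) + dllName;
}
```

For example, `SiblingPath(L"C:\\Apps\\HWInputMon.exe", L"PerfData.dll")` gives `C:\Apps\PerfData.dll`, which is only correct under the same-directory assumption mentioned above.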
If the user wants to remove the application from their system, call CPrfData's Uninstall function:
In WinMain, I call this function if -Uninstall is passed on the command line. This causes all of the registry information to be deleted. When you're debugging counters, you may want to have your application install the registry information on startup and uninstall it during shutdown. This way, if you decide to add, delete, or move any of the entries in the performance data map, the registry will not get out of sync.
If the application is just going to run as normal, then the CPrfData class's Activate function must be called:
This function allocates the shared memory block and initializes it with the information contained in the performance data map. An application must call this function only after the performance information has been installed.
Once the performance data has been activated, WinMain adds instances to one of its objects. Instances are added simply by calling the AddInstance function:
The first parameter is the programmatic symbol used to identify the object that is to get the new instance; the second parameter is the Unicode string name of the instance. The last two parameters allow you to indicate that this instance is the child of some other object's instance. Most instances do not have parent instances, so you will usually pass just the first two parameters to this function. If the function is successful, it returns an INSTID. This is my own data type that is simply a handle to the newly created instance. If the function fails, -1 is returned.
The CPrfData class has another version of AddInstance that's identical to the first except that it allows you to identify the instance using a unique ID instead of a Unicode string.
Instances can come and go as your application executes, so you should feel free to add new instances at any time. You can also remove instances by calling RemoveInstance:
Now I can get to the fun stuff: changing a counter's value. Two functions exist that allow you to alter a counter's value.
If the counter value you want to change is 32 bits wide, call GetCtr32; if the counter value is 64 bits wide, call GetCtr64. In debug builds, my source code will raise an assertion if you accidentally call a function that doesn't match the counter value's width. To both of these functions, you pass the programmatic symbol for a counter that you've defined. If this counter is inside an object that doesn't support instances, leave off the second parameter. If this counter's object supports instances, then you must pass the INSTID (returned from AddInstance) as the second parameter.
These functions return either a LONG reference or an __int64 reference that identifies the counter's value inside the shared memory block. With this reference, it is trivial to change the value of a counter. Here is an example:
Just sprinkle lines like these through your application's source code whenever you want to change the value of a counter. These lines of code execute very quickly and should not hurt the performance of your application significantly.
Synchronizing Access to the Counter Values
Most applications will not use these functions. However, I do call these functions inside the implementation of my CPrfData class itself. For example, my implementation of the Collect function always locks and unlocks the counter information when called. I felt that this was necessary because the Collect function has a lot of work to do and the additional overhead is insignificant compared to the other instructions that will execute when collecting the data.
In addition, I lock the shared memory block when I add or remove object instances. This prevents the data structures from becoming corrupt and my code from crashing. The HWInputMon.cpp module also demonstrates these functions from inside the WinMain function. One important note: since threads running in different processes access the shared memory block, the easy way to synchronize these threads would be to use a mutex kernel object. But waiting on a kernel object requires about 600 CPU instructions. I'd get much better performance if I could use a critical section. Unfortunately, critical sections can only be used to synchronize threads that are running in a single process. Since I want very high-speed, mutually exclusive access to the shared memory buffer from threads in multiple processes, I decided to synchronize access to the shared memory buffer using my COptex class, described in my January 1998 Win32 Q&A column.
Performing at Your Very Best
From the August 1998 issue of Microsoft Systems Journal.