The Call Attributed Profiler, or CAP, details the internal function calls within an application. You use it to see how much time is spent in each function and in the functions called by that function. You can also use it to see how much time is spent in the function itself, ignoring the functions that it calls. This gives a complete picture of how time is spent throughout the application.
Many older-generation application profilers were sampling profilers. They would interrupt the processor at high frequency and take a snapshot of the instruction pointer. The areas of the program most heavily hit during sampling were the program's "hot spots," and tuning the application consisted of recoding those hot spots. This approach is very successful in computationally intensive programs. Modern applications, however, tend to be highly structured, with thousands of functions, none of which is computationally intensive. Such programs yield a flat sampled profile and thus do not lend themselves to tuning with sampling profilers.
A different approach was needed to resolve this issue, and this led to the evolution of CAP. In CAP, each function call is timed. When a function (suppose it's named "foo") is called, the start time is stored in a data structure allocated to foo and attached to the structure of the calling function (call this one "sweet"). A count is incremented so we know how many times sweet called foo, and when foo returns, the elapsed time is added to foo's total. When another function, say "bar," is first called, a new data structure is allocated. (Now you know what we do with all that memory.) The result is a dynamic call tree showing the sequence sweet->foo->bar, with counts and times at each level. This permits the entire structure, and not just the individual functions, to be tuned.
Initial proposals for call attributed profiling on Windows NT involved using the debugging APIs to intercept the function calls. This design would have had the advantage of not requiring you to recompile the application to measure it with the profiler. While older systems with simple debuggers could probably get away with this, the extra protection and security features of Windows NT made the debugging APIs, well, rich. Using them would have filled the processor's cache and thus greatly distorted the execution times of the functions. Cousin Heisenberg again. Initial estimates indicated this would severely degrade the accuracy of the results. So instead, the module is recompiled with the -Gh compiler option, which causes the compiler to insert a special call at the start of every function. This call invokes a measurement module, CAP.DLL, which takes the measurement. The scheme still interferes with the processor's caches, but nowhere near as much as using the debugging APIs would have.
CAP uses an elapsed time clock to measure time in functions. This has both benefits and liabilities. The benefit is that you see where time is spent during disk or LAN activity. The flip side is that if your thread gets switched out while the national debt is being computed by another application, it will appear as though the preempted function used all that time. So it is important to control the environment when using CAP. (Actually the principle is not unique to CAP: it applies equally to WAP.)
CAP can be used to measure the functions within one or more executable programs and/or dynamic-link libraries. The activity in each thread of each process is tracked in a separate call tree. It can also monitor the calls from one such module to another, just as WAP does. Unlike WAP, however, CAP is not restricted to measuring only the calls to system DLLs. Calls to any DLL can be monitored with CAP, whether the DLL belongs to the application or to Windows NT.
By default, CAP collects data only on functions written in Microsoft C or C++ or a compatible product from another vendor. Data is collected from assembly language procedures only if you provide some special support in those routines.