We want to measure the functions inside the Solitaire program, because we want to know where all those calls to GetPixel and SetPixel are coming from during the cascade operation. So we'll recompile Solitaire specifying the -Gh and -Zd compiler options, and we'll link specifying the CAP.LIB library and using the following linker options: -debugtype:coff -debug:full.
Note that you must make the following call in the initialization code for each thread before you recompile, or batching will confuse the data.
GdiSetBatchLimit(1)
Our CAP.INI file looks like this:
[EXES]
sol.exe
[PATCH IMPORTS]
sol.exe
[PATCH CALLERS]
This will profile SOL.EXE and measure any calls it makes to functions in other modules, including the system DLLs. In our case, there is nothing after the [PATCH CALLERS] header. You should use that section to profile particular DLLs that you've developed after you have listed in the [EXES] section the applications that call your DLLs.
First we start CapDump, and then we start the recompiled version of Solitaire. We'll tell CapDump to clear the counters because we are not concerned at present about the performance while Solitaire starts. Once CapDump clears the counters, we press shift+alt+2 to activate the cascade. After the cascade is completed, we shift to CapDump to dump the data. Data files with the default .CAP extension appear in the directory where each measured application resides.
A center section of the results from the SOL.CAP file produced by this run are shown following. This first output shows function times including called functions:
T h r e a d #1: (pid|tid=0x6f|0x40 Client:pid|tid=0x0|0x0)
----- Rtn + Callees -----
Depth Routine Calls Tot Time Time/Call
7 SOL.EXE: _DrawCardPt 4267 89561181 20989
8 SOL.EXE: _cdtDrawExt 4267 89359010 20941
9 SOL.EXE: _HbmFromCd 4267 166394 38
10 USER32.DLL: _LoadBitmapA 4 43754 10937
10 GDI32.DLL: _CreateCompatibleDC 8 3940 492
10 GDI32.DLL: _CreateCompatibleBitmap 4 4959 1239
10 GDI32.DLL: _SelectObject 16 6485 404
10 GDI32.DLL: _BitBlt 4 31462 7865
10 GDI32.DLL: _DeleteObject 4 1563 390
10 GDI32.DLL: _DeleteDC 8 3728 466
9 GDI32.DLL: _CreateCompatibleDC 4267 2849629 667
9 GDI32.DLL: _SelectObject 8534 3495921 409
9 SOL.EXE: _SaveCorners 4267 29082632 6815
10 GDI32.DLL: _GetPixel 51204 25928990 506
9 GDI32.DLL: _BitBlt 4267 19896612 4662
9 SOL.EXE: _RestoreCorners 4267 23112842 5416
10 GDI32.DLL: _SetPixel 51204 20224631 394
9 GDI32.DLL: _DeleteDC 4267 2892566 677
9 GDI32.DLL: _PatBlt 5776 2863906 495
9 GDI32.DLL: _SetPixel 5776 2273112 393
7 SOL.EXE: _FAbort 4267 2632621 616
This second output shows function calls excluding called functions.
T h r e a d #1: (pid|tid=0x6f|0x40 Client:pid|tid=0x0|0x0)
----- Rtn - Callees -----
Depth Routine Tot Time Time/Call
7 SOL.EXE: _DrawCardPt 202170 46
8 SOL.EXE: _cdtDrawExt 2725391 637
9 SOL.EXE: _HbmFromCd 70500 15
10 USER32.DLL: _LoadBitmapA 43754 10937
10 GDI32.DLL: _CreateCompatibleDC 3940 492
10 GDI32.DLL: _CreateCompatibleBitmap 4959 1239
10 GDI32.DLL: _SelectObject 6485 404
10 GDI32.DLL: _BitBlt 31462 7865
10 GDI32.DLL: _DeleteObject 1563 390
10 GDI32.DLL: _DeleteDC 3728 466
9 GDI32.DLL: _CreateCompatibleDC 2849629 667
9 GDI32.DLL: _SelectObject 3495921 409
9 SOL.EXE: _SaveCorners 3153642 738
10 GDI32.DLL: _GetPixel 25928990 506
9 GDI32.DLL: _BitBlt 19896612 4662
9 SOL.EXE: _RestoreCorners 2888211 676
10 GDI32.DLL: _SetPixel 20224631 394
9 GDI32.DLL: _DeleteDC 2892566 677
9 GDI32.DLL: _PatBlt 2863906 495
9 GDI32.DLL: _SetPixel 2273112 393
7 SOL.EXE: _FAbort 347749 81
The leftmost column is the function call nesting depth. This starts at zero with the first function call that CAP encounters. If a function is called but has not returned when the data is dumped, there will be an asterisk to the left of this number. The next column shows the module name followed by the function name. If the function name is not known because the COFF symbols with the function names in them are not contained in the module, ???: ??? is shown instead of the module and function names.
This section of the results, where SOL.EXE executed for 89 seconds, starts at call level 7 with DrawCardPt, which was called 4267 times. It called cdtDrawExt each time it was called, and that called HbmFromCd. The program didn't spend much time here: only 1.6 out of the 89 seconds. HbmFromCd called the LoadBitMapA function four times. This is a Win32 API function located in USER32.DLL. The final "A" in this function name means this is the ASCII form of the call, indicating that Solitaire is not a Unicode application (or else the function would have a final "W"). The HbmFromCd call uses some graphics primitives in GDI32.DLL, but only a few times, even though it was called 4267 times. So it is not the central cause of the poor performance.
The call level returns to 9, which means cdtDrawExt at level 8 is back in control. We see that even though we spent 89 seconds in this routine and the routines it called, we spent only 2.7 seconds in the routine itself. The other 86-plus seconds were in the functions it called.
The cdtDrawExt function calls CreateCompatibleDC each time it is called, and then SelectObject twice on every call. We have more than 6 seconds between them. Then there is a call to SaveCorners, which takes 29 seconds. We see only 3.1 seconds in SaveCorners itself, so the rest must be in GetPixel. A similar story applies to RestoreCorners and _SetPixel right after the call to BitBlt.
We went to the developer porting Solitaire at this stage in our testing and asked what SaveCorners and RestoreCorners were doing. We discovered that they modified three pixels on each corner of the bitmap of a card to make the corner look a little more rounded. However, because each call forced a client-server transition, we decided to remove the calls altogether during a cascade. This was a very small change, just a few lines of code, but it made the Solitaire cascade twice as fast.
This concludes the Solitaire tuning example.