Using CAP on Solitaire

We want to measure the functions inside the Solitaire program, because we want to know where all those calls to GetPixel and SetPixel are coming from during the cascade operation. So we'll recompile Solitaire specifying the -Gh and -Zd compiler options, and we'll link specifying the CAP.LIB library and using the following linker options: -debugtype:coff -debug:full.

Note that you must make the following call in the initialization code for each thread before you recompile, or batching will confuse the data.

GdiSetBatchLimit(1) 
 

Our CAP.INI file looks like this:

[EXES]
sol.exe

[PATCH IMPORTS]
sol.exe

[PATCH CALLERS]
 

This will profile SOL.EXE and measure any calls it makes to functions in other modules, including the system DLLs. In our case, there is nothing after the [PATCH CALLERS] header. You should use that section to profile particular DLLs that you've developed after you have listed in the [EXES] section the applications that call your DLLs.

First we start CapDump, and then we start the recompiled version of Solitaire. We'll tell CapDump to clear the counters because we are not concerned at present about the performance while Solitaire starts. Once CapDump clears the counters, we press shift+alt+2 to activate the cascade. After the cascade is completed, we shift to CapDump to dump the data. Data files with the default .CAP extension appear in the directory where each measured application resides.

A center section of the results from the SOL.CAP file produced by this run are shown following. This first output shows function times including called functions:

T h r e a d  #1:   (pid|tid=0x6f|0x40   Client:pid|tid=0x0|0x0)
                                         ----- Rtn + Callees -----
Depth Routine                            Calls   Tot Time   Time/Call
  7   SOL.EXE: _DrawCardPt                4267   89561181   20989
  8   SOL.EXE: _cdtDrawExt                4267   89359010   20941
  9   SOL.EXE: _HbmFromCd                 4267     166394      38
 10   USER32.DLL: _LoadBitmapA               4      43754   10937
 10   GDI32.DLL: _CreateCompatibleDC         8       3940     492
 10   GDI32.DLL: _CreateCompatibleBitmap     4       4959    1239
 10   GDI32.DLL: _SelectObject              16       6485     404
 10   GDI32.DLL: _BitBlt                     4      31462    7865
 10   GDI32.DLL: _DeleteObject               4       1563     390
 10   GDI32.DLL: _DeleteDC                   8       3728     466
  9   GDI32.DLL: _CreateCompatibleDC      4267    2849629     667
  9   GDI32.DLL: _SelectObject            8534    3495921     409
  9   SOL.EXE: _SaveCorners               4267   29082632    6815
 10   GDI32.DLL: _GetPixel               51204   25928990     506
  9   GDI32.DLL: _BitBlt                  4267   19896612    4662
  9   SOL.EXE: _RestoreCorners            4267   23112842    5416
 10   GDI32.DLL: _SetPixel               51204   20224631     394
  9   GDI32.DLL: _DeleteDC                4267    2892566     677
  9   GDI32.DLL: _PatBlt                  5776    2863906     495
  9   GDI32.DLL: _SetPixel                5776    2273112     393
  7   SOL.EXE: _FAbort                    4267    2632621     616
 

This second output shows function calls excluding called functions.

T h r e a d  #1:   (pid|tid=0x6f|0x40   Client:pid|tid=0x0|0x0)
                                         ----- Rtn - Callees -----
Depth Routine                             Tot Time   Time/Call
  7   SOL.EXE: _DrawCardPt                  202170          46
  8   SOL.EXE: _cdtDrawExt                 2725391         637
  9   SOL.EXE: _HbmFromCd                    70500          15
 10   USER32.DLL: _LoadBitmapA               43754       10937
 10   GDI32.DLL: _CreateCompatibleDC          3940         492
 10   GDI32.DLL: _CreateCompatibleBitmap      4959        1239
 10   GDI32.DLL: _SelectObject                6485         404
 10   GDI32.DLL: _BitBlt                     31462        7865
 10   GDI32.DLL: _DeleteObject                1563         390
 10   GDI32.DLL: _DeleteDC                    3728         466
  9   GDI32.DLL: _CreateCompatibleDC       2849629         667
  9   GDI32.DLL: _SelectObject             3495921         409
  9   SOL.EXE: _SaveCorners                3153642         738
 10   GDI32.DLL: _GetPixel                25928990         506
  9   GDI32.DLL: _BitBlt                  19896612        4662
  9   SOL.EXE: _RestoreCorners             2888211         676
 10   GDI32.DLL: _SetPixel                20224631         394
  9   GDI32.DLL: _DeleteDC                 2892566         677
  9   GDI32.DLL: _PatBlt                   2863906         495
  9   GDI32.DLL: _SetPixel                 2273112         393
  7   SOL.EXE: _FAbort                      347749          81
 

The leftmost column is the function call nesting depth. This starts at zero with the first function call that CAP encounters. If a function is called but has not returned when the data is dumped, there will be an asterisk to the left of this number. The next column shows the module name followed by the function name. If the function name is not known because the COFF symbols with the function names in them are not contained in the module, ???: ??? is shown instead of the module and function names.

This section of the results, where SOL.EXE executed for 89 seconds, starts at call level 7 with DrawCardPt, which was called 4267 times. It called cdtDrawExt each time it was called, and that called HbmFromCd. The program didn't spend much time here: only 1.6 out of the 89 seconds. HbmFromCd called the LoadBitMapA function four times. This is a Win32 API function located in USER32.DLL. The final "A" in this function name means this is the ASCII form of the call, indicating that Solitaire is not a Unicode application (or else the function would have a final "W"). The HbmFromCd call uses some graphics primitives in GDI32.DLL, but only a few times, even though it was called 4267 times. So it is not the central cause of the poor performance.

The call level returns to 9, which means cdtDrawExt at level 8 is back in control. We see that even though we spent 89 seconds in this routine and the routines it called, we spent only 2.7 seconds in the routine itself. The other 86-plus seconds were in the functions it called.

The cdtDrawExt function calls CreateCompatibleDC each time it is called, and then SelectObject twice on every call. We have more than 6 seconds between them. Then there is a call to SaveCorners, which takes 29 seconds. We see only 3.1 seconds in SaveCorners itself, so the rest must be in GetPixel. A similar story applies to RestoreCorners and _SetPixel right after the call to BitBlt.

We went to the developer porting Solitaire at this stage in our testing and asked what SaveCorners and RestoreCorners were doing. We discovered that they modified three pixels on each corner of the bitmap of a card to make the corner look a little more rounded. However, because each call forced a client-server transition, we decided to remove the calls altogether during a cascade. This was a very small change, just a few lines of code, but it made the Solitaire cascade twice as fast.

This concludes the Solitaire tuning example.