Bugslayer, MSJ, June 1998

Download Jun98Bugslayercode.exe (2KB)

John Robbins is a software engineer at NuMega Technologies Inc. who specializes in debuggers. He can be reached at john@jprobbins.com.

This month let's talk about one of the most important tools in your bugslaying arsenal: your debugger. Many engineers just use their debuggers in a passive manner—to set a breakpoint or two and look at a couple of local variables. With a small investment of time, you can really spelunk through your bugs and learn how your application is interacting with the operating system. Since you probably spend more time debugging your code than writing it, it's very important to be totally comfortable with the debugger.
      To ensure that everyone is on the same sheet of music, I will start with a couple of basic issues and work up to some advanced debugger tricks. All of the samples are part of this month's code, so you can work along if you download them. I encourage you to follow any tangents that you might go off on with the sample code, since I can cover only a few tricks in this column and there are enough out there to write a whole book!
      For this column, I will be using the Visual C++® debugger for several reasons. First, almost everyone has it. Second, when your program crashes, no matter if it's written in C, C++, Visual Basic®, or even Java, you need to debug it at the native code level. Even if C or C++ is not your main development language, you need to be familiar with the one debugger that works for all. If you are using another vendor's debugger, you should not have any trouble following along.
      While most of the techniques mentioned in this column work on any Win32®-based operating system, several of them work only on Windows NT®. When debugging on Windows® 95 and Windows 98, I always use a kernel debugger like WDEB386. In my opinion, GUI debuggers are too limiting because Windows 95 and Windows 98 do not support copy-on-write page updating. Any address above 2GB, where most of the operating system resides, is shared across all processes. If a debugger sets a breakpoint there in one process, the breakpoint is seen by all processes, which crashes the operating system. Windows NT gets around this by copying the page where a breakpoint is set and giving the debuggee process its own view of that page.
      If you are debugging problems inside your application's logic, GUI debuggers work fine. Unfortunately, half the problems you encounter in your application are bugs that come from your app's interaction with the operating system. By using a kernel debugger on Windows 95 and Windows 98, you get the advantage of being able to step into the operating system, just as you can under Windows NT.
      Before you jump right into my debugging tips, I strongly recommend reading Matt Pietrek's "Under The Hood" columns in this issue and the February 1998 issue. In these columns Matt gives an excellent introduction to the Intel assembler instructions. Since only a few people in Redmond have the source code to something like KERNEL32.DLL on their system, the rest of us need to look at disassembly. In this column, I will assume that you have at least an understanding of the assembler which Matt covered. While it would be really nice if all crashes occurred in places where you have full source code, 94.6 percent of the time your application crashes in the middle of a ton of assembler code.
Endians, Calling Conventions, and Symbols
      Now I'll cover a couple of things that everyone should know before they look at a debugger. First, there are two ways computers store bytes in memory: Big Endian and Little Endian, which derive from "Big End In" and "Little End In." Intel CPUs are Little Endian, which means that the little end of a multibyte value is stored first. For example, the value 0x1234 is stored in memory as 0x34 0x12. Win32 only supports Little Endian processors, or those that can be switched to Little Endian like the Alpha. It is important that you keep the Little Endian storage in mind when you are looking at memory in the debugger. You'll need to convert it in your head so you are looking at the correct values. If you use the memory window to look at one of your linked list nodes and the next pointer is 0x12345678, it will be displayed in byte format as 0x78 0x56 0x34 0x12.
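      The conversion you do in your head can be written down in a few lines. This is only a sketch; the function name dword_from_memory_bytes is made up for illustration and is not part of this month's code.

```cpp
#include <cstdint>

// Rebuild a 32-bit value from four bytes in the order the Memory window
// shows them in byte format (lowest address first) on a Little Endian CPU.
// dword_from_memory_bytes is a hypothetical helper name.
std::uint32_t dword_from_memory_bytes(const unsigned char bytes[4])
{
    return  static_cast<std::uint32_t>(bytes[0])          // little end, stored first
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);  // big end, stored last
}
```

Feeding it the bytes 0x78 0x56 0x34 0x12 gives back the next pointer 0x12345678 from the linked list example.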
      Second, you need to know the different calling conventions at the assembler level that dictate how parameters are passed to individual functions and how the stack is cleaned up when a function returns. Understanding calling conventions makes it a snap to figure out where the parameters are so you can use the debugger memory window to view them. There are three common calling conventions: standard call (__stdcall), C declaration (__cdecl), and fast call (__fastcall). Another calling convention, thiscall, is the default for C++ member functions, though it cannot be specified directly like the first three.
      If you have been around Win32-based development for a while, you might have heard of the naked calling convention. While you can use the naked calling convention, it is primarily used in VxD programming. Naked means that the compiler does not generate the prolog and epilog code that normally sets up and tears down the stack for the function. All of the calling conventions are the same in that they push the parameters that go on the stack in right to left order (__fastcall first places up to two DWORD-or-smaller parameters in ECX and EDX and pushes the rest). The differences are in who cleans up the stack: the caller or the callee.
      Figure 1 compares the different calling conventions, while Figure 2 shows the mixed C code and assembler examples for each type of call. With the descriptions from Figure 1, look at the actual code in Figure 2 to see exactly how the calls are made. Notice in Figure 2 just how the different calling conventions clean up the stack. In the __cdecl function, CDeclFunction, the actual call to the function occurs at address 0x00401012 and the next instruction after the call is ADD ESP,0Ch. This is where the caller is cleaning up the stack. In the __stdcall function, StdCallFunction, the actual call at 0x00401027 is not followed by any stack adjustment. If you look at the StdCallFunction return at address 0x00401075, you will see the callee stack clean up, RET 0Ch.
      If this is your first exposure to the different calling conventions, you might wonder why the different types exist. The __fastcall convention pertains only to Intel CPU computers, so it is never used by the operating system because it is not portable. The differences between __cdecl and __stdcall are subtler. In a __stdcall call, the callee cleans up the stack, so it must know exactly how many parameters it expects. This means that a __stdcall function cannot be a variable argument function. Since __cdecl functions have the caller cleaning up the stack, variable argument functions are just fine. Win32 uses __stdcall instead of __cdecl because callee cleanup takes fewer instructions: the stack adjustment is done once, in the function's RET, instead of being repeated after every call, so the callers are smaller and slightly faster. A good Windows trivia question to ask when interviewing a candidate is how many functions in the core Windows API are __cdecl functions? The answer is at the end of the article.
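      In source code the two conventions look like the sketch below. The function names and the BS_ macros are made up for illustration; the macros are there only so the listing also compiles where the Microsoft keywords do not exist, in which case the conventions themselves do not apply.

```cpp
#include <cstdarg>

// __cdecl and __stdcall are Microsoft keywords that only matter on 32-bit
// x86, so this sketch compiles them away everywhere else (an assumption
// made purely to keep the example portable).
#if defined(_MSC_VER) && defined(_M_IX86)
#define BS_CDECL   __cdecl
#define BS_STDCALL __stdcall
#else
#define BS_CDECL
#define BS_STDCALL
#endif

// Callee cleans up: the function has a fixed argument size, so on x86 it
// can end in RET 0Ch (three DWORD parameters).
int BS_STDCALL StdCallSum(int a, int b, int c)
{
    return a + b + c;
}

// Caller cleans up: only the caller knows how many arguments it pushed,
// which is what makes a variable argument list possible.
int BS_CDECL CDeclSum(int count, ...)
{
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

Build it with Visual C++ for x86, step into both calls in the Disassembly window, and compare what follows each CALL instruction in the caller.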
      The last thing to do before starting the debugger is to get the Windows NT symbols installed so the Visual C++ debugger can use them. While the symbols are primarily functions and globals and not full symbols, they can help immensely when you are in the middle of some random system call tracking down a crash. Installing the symbols is as easy as going to the Visual C++ program group and clicking on the NT System Symbols Setup icon. This will install the main .DBG files for the core system DLLs into the WinNT\Symbols\Dll directory.
      If you are doing something out of the ordinary, like WinSock or Control Panel programming, those symbols are supplied as well. The .DBG files for all of the Windows NT DLLs, EXEs, CPLs, and device drivers are in the individual service pack symbol updates that you download, or are on the original Windows NT distribution CD. You just pull the .DBG file for the system file that you want symbols for and put it in the appropriate subdirectory in your WinNT\Symbols directory.
      While the symbols really help when you are in the debugger, there is one major gotcha that you need to watch out for: half the time, the supplied symbols might not match the corresponding binary file. The symbols are out of date because it seems that many of the Microsoft applications come with updated core system files that are newer than the ones from the latest Windows NT build or service pack. While Microsoft has taken steps to stop this, certain files like COMCTL32.DLL or OLE32.DLL are updated each time you install a new application or application service pack. Fortunately, the Visual C++ debugger does not load mismatched symbols; it only reports "no matching symbolic information found." While this is better than loading them, you need to manually look for each DLL through the text in the Debug Output tab.
      To check your symbol files against the binary files manually, use DUMPBIN.EXE. The PE COFF header, which is used by both the binary files and the .DBG files, has a field for the time-date stamp when the binary was built. Using DUMPBIN.EXE with the /headers command-line option will allow you to view the headers for the two files. For example, I found that my COMCTL32.DLL and COMCTL32.DBG files do not match by running the following commands and viewing the two redirected files looking for the time-date stamp field in the initial header.

dumpbin /headers COMCTL32.DLL > b
dumpbin /headers COMCTL32.DBG > d

My COMCTL32.DLL was built at 20:57:56 Tuesday, November 18, 1997, while COMCTL32.DBG was built at 15:33:18 Friday, April 25, 1997.
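      If you would rather pull the stamp out yourself, the field is easy to find in a PE image. The sketch below works on a file you have already read into memory and handles only the image (DLL or EXE) side; a .DBG file starts with a different header that this code does not parse. The function names are made up for illustration. The stamp is a count of seconds since midnight, January 1, 1970 UTC.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian DWORD out of a byte buffer.
static std::uint32_t ReadLE32(const std::vector<std::uint8_t>& data, std::size_t offset)
{
    return  static_cast<std::uint32_t>(data[offset])
         | (static_cast<std::uint32_t>(data[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(data[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(data[offset + 3]) << 24);
}

// Hypothetical helper: fetch the TimeDateStamp from a PE image held in memory.
// Returns false if the buffer does not look like a PE image.
bool GetPeTimeDateStamp(const std::vector<std::uint8_t>& image, std::uint32_t& stamp)
{
    // DOS header: "MZ" at offset 0, offset of the PE header at 0x3C.
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return false;
    const std::size_t peOffset = ReadLE32(image, 0x3C);

    // Need "PE\0\0" (4) + Machine (2) + NumberOfSections (2) + TimeDateStamp (4).
    if (peOffset > image.size() || image.size() - peOffset < 12)
        return false;
    if (image[peOffset] != 'P' || image[peOffset + 1] != 'E' ||
        image[peOffset + 2] != 0 || image[peOffset + 3] != 0)
        return false;

    stamp = ReadLE32(image, peOffset + 8);
    return true;
}
```

Compare the value against the time date stamp line that DUMPBIN prints for the matching symbol file.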
Setting Breakpoints Inside a System DLL
      Now that the basics are out of the way, it's time to get on with the advanced stuff! When you are debugging, it really helps to get control at a known point. This is especially important when you are working with ActiveX® controls and other things that the operating system calls in your code. For example, if an OLE container cannot get your control instantiated and the error message leaves something to be desired, you can always set a breakpoint on the KERNEL32.DLL LoadLibraryA to gain control right before your OCX is loaded. Unfortunately, the Visual C++ debugger does not make this as easy as you would expect.
      The "Advanced Breakpoint Syntax" help file on the MSDN™ CD states that you can set a breakpoint on a module and function with a context operator. The format is {[function],[source],[exe]}xxx, where xxx is either a location, a variable name, or an expression. When I saw this, I figured that all I needed to do was put the expression {,,kernel32.dll}CreateProcessA in the Breakpoints dialog to break when CreateProcess was called. Alas, it never went off. When I set the same breakpoint before starting my app under the debugger, I got the dreaded message box, "One or more breakpoints cannot be set and have been disabled. Execution will stop at the beginning of the program."
      It eventually dawned on me what the problem was. The Visual C++ debugger does not synthesize export symbols for a DLL if there is no proper symbol table present. Other debuggers, like the Platform SDK's WinDBG, will use the export names as symbols. While it would be nice if a DLL's exports would get treated as symbols (another thing to wish for in Visual Studio™ 98), you can still set breakpoints on system DLL functions—it just takes slightly more work. It's important to install the system .DBG files because that is what you need to set the breakpoint. The exported functions are given proper symbol names and you just need to calculate the symbol name manually.
      To build the symbol for the CreateProcessA example, prefix the name with an underbar and append an @ followed by the number of bytes, in decimal, in the argument list. Therefore, CreateProcessA is really _CreateProcessA@40, according to the debugger. Since all symbols are unique, all you need to specify in the Breakpoint dialog is "_CreateProcessA@40" and the Visual C++ debugger will stop at the beginning of CreateProcessA.
      While all functions will have the underbar prefix, not all functions will follow this same suffix pattern. CreateProcessA is a __stdcall function, so the @40 needs to be appended. Other functions like wsprintfA, which is exported from USER32.DLL as a __cdecl call, do not have anything appended, so the proper symbol is _wsprintfA. For the most part, when calculating the suffix string, just count the number of parameters and multiply by four, the size of a DWORD. This works even when a parameter is smaller than a DWORD: a WORD or BYTE parameter still takes a full four-byte slot on the stack, so it counts as four. Only a parameter wider than a DWORD that is passed by value adds more.
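      The naming rule is mechanical enough to write down. The helpers below are a sketch with made-up names; they cover only the common case described above, where every parameter takes one DWORD slot.

```cpp
#include <string>

// Symbol name for a __stdcall export whose parameters each take one
// DWORD on the stack (hypothetical helper).
std::string StdCallSymbol(const std::string& exportName, unsigned dwordParamCount)
{
    return "_" + exportName + "@" + std::to_string(dwordParamCount * 4);
}

// Symbol name for a __cdecl export: underbar prefix, no suffix
// (hypothetical helper).
std::string CDeclSymbol(const std::string& exportName)
{
    return "_" + exportName;
}
```

StdCallSymbol("CreateProcessA", 10) produces the _CreateProcessA@40 used above, and CDeclSymbol("wsprintfA") produces _wsprintfA.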

Looking Up Things on the Stack
      Now that you can set breakpoints on system calls, I want to cover looking up information on the stack as well as looking at memory in general. While the debugger can tell you where you crashed and the individual memory access that caused the crash, looking at information on the stack can tell you why your application crashed. I'll show you a real-world example of how to look up this information instead of a contrived, simple example.
      When Visual Basic 5.0 first came out, I wanted to see the native compilation. I saw that C2.EXE and LINK.EXE were installed in the Visual Basic directory. Since these two programs were also part of Visual C++, I just needed to look at how they were spawned from Visual Basic and I would have a good idea how the compilation worked. As you can tell from the name, LINK.EXE links the object files together and produces the actual executable binary. C2.EXE is a little more obscure. If you are familiar with Visual C++, you know C2.EXE is the code generator that produces the actual machine code. I wanted to see if the Visual Basic C2.EXE was used as it was with Visual C++.
      First, from the debugger, open VB5.EXE. Set a breakpoint on _CreateProcessA@40 as I discussed in the previous section. Start Visual Basic running in the debugger. When Visual Basic starts, create a simple project, set the project properties to create native code, and select File | Make from the Visual Basic IDE. The breakpoint on _CreateProcessA@40 will give the debugger control when starting either C2.EXE or LINK.EXE.
      On Windows NT 4.0 SP3, the debugger will stop at address 0x77F17A45 where the instruction about to be executed is MOV EAX,FS:[00000000]. This is the first instruction in CreateProcessA. When you see the start of a function accessing memory from the FS register, the routine is generally setting up the Structured Exception Handling (SEH) frame. Since the breakpoint is on the first instruction of CreateProcess, nothing has been placed on the stack except the parameters and the return address. Open the Memory window by selecting the View|Debug Windows|Memory menu. Enter "ESP" in the address field, which is the stack pointer register, to see what is on the stack.
      The default format for the Memory window shows everything in byte format, which gets a little tedious when you're looking for values that are multiple bytes because you have to do all the Endian conversions in your head. One day I right-clicked in the window and, lo and behold, the memory window showed different formats: byte, short hex (two byte or WORD), and long hex (four byte or DWORD). One note of caution about changing the memory display format: the display will jump around (regardless of the address you asked to view) depending on where you right-clicked, and the address at the top of the view will change. If you position the address you want to view at the top of the window and you switch formats, you can end up looking at the wrong memory.
       Figure 3 shows the debugger memory window at the start of the CreateProcess breakpoint. The first value is the return address for the instruction 0xFAC1F03; the next 10 are the parameters to CreateProcess (see Figure 4). There are 40 bytes of parameters to CreateProcess; each parameter is four bytes long. The stack grows from high memory to low memory and the parameters are pushed in right to left order so the addresses match up to the parameters.
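      The arithmetic for matching stack addresses to parameters at the first instruction of a function can be sketched as follows. The parameter names come from the documented CreateProcess prototype; the helper names are made up for illustration, and the ESP value in the usage note is a placeholder rather than a value read from Figure 3.

```cpp
#include <cstdint>

// Parameter names from the documented CreateProcess prototype, in
// declaration order (left to right).
static const char* const kCreateProcessParams[10] = {
    "lpApplicationName", "lpCommandLine", "lpProcessAttributes",
    "lpThreadAttributes", "bInheritHandles", "dwCreationFlags",
    "lpEnvironment", "lpCurrentDirectory", "lpStartupInfo",
    "lpProcessInformation"
};

// Name of the zero-based parameter, or a null pointer if out of range.
const char* CreateProcessParamName(unsigned index)
{
    return index < 10 ? kCreateProcessParams[index] : 0;
}

// At the first instruction of the function, [ESP] holds the return
// address and the DWORD parameters follow it, first parameter lowest.
std::uint32_t ParamAddressAtEntry(std::uint32_t esp, unsigned index)
{
    return esp + 4 + 4 * index;
}
```

With ESP at a hypothetical 0x0012EA00, the first parameter sits at 0x0012EA04 and dwCreationFlags, the sixth, sits at 0x0012EA18.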
      When I switched the Memory window to byte format and viewed the application name and command-line parameters, I saw the following values:

0x0012EAB4 g:\VB\C2.EXE
0x0012ED20 C2 -il "E:\TEMP\VB732095" -f "c:\junk\temp\Form1.frm" -W 3 -Gy -G5 -Gs4096 -dos -Zl -Fo"C:\Junk\temp\Form1.OBJ" -QIfdiv -ML -basic

I do not know what some of the flags to C2.EXE mean, but it doesn't really matter, since it's not something that you can use directly. The value 0x08000000 for dwCreationFlags is an undocumented flag defined in WINBASE.H as CREATE_NO_WINDOW. This seems to be the way to spawn another application, in this case a console application, without having any window show up at all. I will leave it as an exercise for you to look up the parameters to LINK.EXE.

Figure 3 Debugger Memory Window

      There are a couple of things that you need to keep in mind when looking at items on the stack. All local variables for a function are created on the stack. If you see an instruction like SUB ESP,54h, this is what is happening. For the most part, the reservation of local variables occurs at the start of a scope. As you are looking at the stack, make sure to account for these local variables.
      In conjunction with the stack pointer, the base pointer, EBP, is the register used to access both local variables and parameters to functions. EBP is referred to as the frame pointer. If the memory access is a positive offset from EBP, like

MOV EAX, DWORD PTR [ebp+008h]

then the code is accessing a parameter. Negative offsets from EBP, like

MOV [EBP-004h],EAX

are accessing local variables. If the code has been highly optimized, then EBP can be used as a general register. If the start of the function has a PUSH EBP followed by a MOV EBP,ESP instruction, then frame pointers are being used and it is much easier to find the data.
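      When a function does use a standard frame (PUSH EBP followed by MOV EBP,ESP), the offsets sort out as in this sketch. The enum and function names are made up for illustration, and the layout assumes that standard prolog; it says nothing about functions where EBP is used as a general register.

```cpp
// What an [EBP+offset] access refers to in a function that starts with
// PUSH EBP / MOV EBP,ESP. The names here are hypothetical.
enum FrameSlot { kLocalVariable, kSavedEbp, kReturnAddress, kParameter };

FrameSlot ClassifyEbpOffset(int offset)
{
    if (offset < 0) return kLocalVariable;   // [EBP-4], [EBP-8], ...
    if (offset < 4) return kSavedEbp;        // [EBP+0]: the caller's EBP
    if (offset < 8) return kReturnAddress;   // [EBP+4]: pushed by CALL
    return kParameter;                       // [EBP+8] is the first parameter
}
```

So [EBP+008h] is the first parameter and [EBP-004h] is the first DWORD of local storage.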
      When looking at values in the stack, it helps to know what is data and what are addresses. As I mentioned in my April 1998 column, it is very important to know where your DLLs load into memory. If you know the starting addresses for your DLLs, then you can quickly find various return addresses in the data. Also, you might want to start getting familiar with the load addresses of important system DLLs like KERNEL32.DLL, NTDLL.DLL, and OLE32.DLL so that you can look them up at a glance. I keep a simple text file around with all the load addresses for the DLLs so I can look them up when I am debugging. To find the load address for a DLL, run

DUMPBIN /HEADERS <DLL>

and look for the image base field. After a little work in the debugger and looking at your cheat file, you will become about 70 percent accurate at guessing if a value on the stack is an address or data.
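      The cheat file lookup amounts to a range check. Here is a sketch with made-up names; the base and size values in the usage note are placeholders, so take the real ones from the image base and size of image fields that DUMPBIN reports on your system.

```cpp
#include <cstddef>
#include <cstdint>

// One line of the cheat file: where a module loads and how big it is.
// ModuleRange and FindModule are hypothetical names.
struct ModuleRange {
    const char*   name;
    std::uint32_t base;   // image base
    std::uint32_t size;   // size of image
};

// Return the name of the module that contains the value, or a null
// pointer if the value is not inside any listed module (so it is
// probably data, or an address on the stack or heap).
const char* FindModule(std::uint32_t value, const ModuleRange* modules, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        if (value >= modules[i].base && value - modules[i].base < modules[i].size)
            return modules[i].name;
    }
    return 0;
}
```

With a table entry for KERNEL32.DLL at a base of 0x77F00000, the CreateProcessA address 0x77F17A45 from earlier resolves to that DLL, while a small value such as 0x00000028 resolves to nothing.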

Skipping and Changing Code
      Changing the program execution or instruction pointer (EIP) on the fly is a little difficult, but becomes a powerful addition to your bugslaying arsenal. It can allow you to do things like reexecuting problem functions and skipping whole functions altogether on-the-fly, especially those that will cause a crash. However, changing the program flow with the debugger can make your program crash if you are not extra careful.
      There are several ways to control the program flow by changing what executes next. But before you start changing at random, here are a couple of tips. First, only execute the changes from the debugger's Disassembly window. If you use the "Set Next Statement" from the popup menu in a Source window, each source line can translate into many assembler instructions and you'll miss the granularity needed to control your changes properly. Second, watch what items are pushed and popped from the stack.
      For example, if you want to reexecute a function without crashing immediately, make sure to change the execution so everything stays lined up. Here, I want to execute the call to the function at 0x00401005 twice.

00401032 55             push ebp
00401033 8B EC          mov ebp,esp
00401035 68 10 44 40 00 push 404410h
0040103A E8 C6 FF FF FF call 00401005
0040103F 83 C4 04       add esp,4
00401042 5D             pop ebp
00401043 C3             ret

As I step through the disassembly twice, I need to make sure that I let the ADD instruction at address 0x0040103F execute to keep the stack aligned. As my earlier discussion of the different calling conventions indicated, the assembler snippet shows a call to a __cdecl function because of the ADD instruction right after the call. To reexecute the function, I would set the instruction pointer to 0x00401035 to ensure that the PUSH occurs properly.
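      If you are wondering how the code bytes E8 C6 FF FF FF turn into call 00401005, the E8 form of CALL stores a 32-bit displacement relative to the address of the next instruction. A sketch of the arithmetic, with a made-up function name:

```cpp
#include <cstdint>

// Target of a five-byte near CALL (opcode E8 followed by a little-endian
// 32-bit displacement). NearCallTarget is a hypothetical helper and
// assumes bytes[0] is 0xE8.
std::uint32_t NearCallTarget(std::uint32_t callAddress, const unsigned char bytes[5])
{
    const std::uint32_t displacement =
          static_cast<std::uint32_t>(bytes[1])
        | (static_cast<std::uint32_t>(bytes[2]) << 8)
        | (static_cast<std::uint32_t>(bytes[3]) << 16)
        | (static_cast<std::uint32_t>(bytes[4]) << 24);

    // The displacement is relative to the instruction after the CALL.
    // Unsigned wraparound takes care of negative displacements.
    return callAddress + 5 + displacement;
}
```

For the call at 0x0040103A, the next instruction is at 0x0040103F and the displacement 0xFFFFFFC6 is -58, which lands on 0x00401005.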
      The simplest way to change the instruction pointer is to right-click in the Disassembly window on the instruction you next want to execute and select Set Next Statement from the popup menu. The other way is to show the Register window, click on the address next to EIP, and type in the address. If you want to return to a different address from a call, you can open the Memory window, view the memory at ESP/EBP, and change the return address directly. Since swapping around the instruction pointer can lead to crashes, you might want to practice on a simple program to see the effects. (You can use the Calling program that is part of this month's code distribution.)
      While it is useful to change the instruction pointer, it can get tedious to set a breakpoint and change EIP each time you want to avoid a function. In these cases I actually change the code to skip the function altogether. Fortunately, Intel has an instruction that is perfect as a placeholder: NOP. The NOP instruction means exactly what the name implies (no operation), and it will not change anything in your program.
      To change the code at debug time, you need to become a miniassembler. Since you cannot assemble directly into memory with the Visual C++ debugger, you need to poke in the opcode for the instruction into memory yourself. In the case of a NOP instruction, the opcode is 0x90. If you know other opcodes, you can poke those in as well. If you are curious, the Intel CPU manuals list all the opcodes for each instruction. The steps are pretty simple and I will walk through them using the previous code snippet.

1. After starting your program in the debugger, open the Disassembly window.
2. Right-click in the Disassembly window and select Code Bytes to show the opcodes. The snippet is a cut and paste from the Disassembly window.
3. Open a Memory window and enter the address for the instructions that you want to change into the Address field. For the snippet, I want to NOP-out the call at address 0x0040103A.
4. Make sure the Memory window is showing the memory in byte format.
5. The call instruction is five bytes long, E8 C6 FF FF FF, so put the cursor on the E8 in the Memory window and type 90 five times.
6. As you are typing in the NOP opcodes, notice that the Disassembly window changes automatically to show the new instructions.
Keep in mind that changing the instruction stream in this manner works only in the current instance of the running program. If you start your program under the debugger again, you will need to NOP-out the call all over again.
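      Notice that I only needed to remove the call instruction because it is a __cdecl call: the PUSH before the call and the ADD ESP,4 after it still balance each other on the stack. If the call were a __stdcall, then I also would have had to NOP-out the preceding PUSH to keep the stack straight, since the callee that normally pops the parameter never runs.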
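      What you type into the Memory window can be pictured as the following edit to a copy of the code bytes. This is a sketch on an ordinary buffer, not a way to patch a live process, and the names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint8_t kNopOpcode = 0x90;

// Overwrite `length` bytes starting at `offset` with NOP opcodes.
// Returns false, changing nothing, if the range runs past the buffer.
// NopOut is a hypothetical helper.
bool NopOut(std::vector<std::uint8_t>& code, std::size_t offset, std::size_t length)
{
    if (offset > code.size() || length > code.size() - offset)
        return false;
    for (std::size_t i = 0; i < length; ++i)
        code[offset + i] = kNopOpcode;
    return true;
}
```

For the snippet, which starts at 0x00401032, the call at 0x0040103A is at offset 8 and is five bytes long, so NopOut(code, 8, 5) leaves the PUSH before it and the ADD ESP,4 after it untouched.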

The Debugger is a Tool
      I want to stress that the debugger is just a tool like any other in your toolbox. The hallmark of a good bugslayer is that you can use tools to solve problems other than those the tool was obviously designed for. While a debugger can single step through code and let you look at variables, it also makes one heck of a code coverage tool. (For those of you who don't know, code coverage means that you find out which lines have—and, most importantly, have not—executed in your program.)
      As all good bugslayers know, if code does not get executed there is no way of knowing how many bugs are in that code. In Steve McConnell's excellent book, Code Complete (Microsoft Press, 1993), a study claims that without a code coverage tool, the average coverage of a program is only 55 percent. That means that nearly half the code in your program has never been executed! I don't know about you, but that would definitely keep me awake at night! In my opinion, maximum code coverage during unit tests is one of the most important responsibilities of the developer.
      I use the debugger to ensure that I get as much coverage as possible. My technique is to finish the code and get through the unit tests. Once I am at the point where everything is running pretty well, I load the program into the debugger and, for each source file in turn, I set a breakpoint on each executable line. I run the program under the debugger, removing each breakpoint as it is hit, until all breakpoints in the file are cleared. Sometimes to get a line to execute, I need to change return values or parameters and the debugger makes that quite easy. When all the executable line breakpoints in each file are cleared, I have complete code coverage for that file. Once I finish with one source file, I move on to the next. If I fix bugs that I uncover, I retest those functions for code coverage to ensure that I do not introduce more bugs with the fix.
      As you can see, this gets a little tedious sometimes, but you and I are paid to write good code that does not crash. The only way to guarantee that the code won't crash is to execute the code. When planning my development, I always include two to five days for coverage testing. Managers have never complained when I explained what I was going to do during those days!
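      The bookkeeping behind the breakpoint technique is simple: every executable line starts with a breakpoint, a breakpoint is removed the first time it is hit, and whatever is left at the end has never run. The class below is only a model of that bookkeeping with made-up names; the debugger does the real work.

```cpp
#include <cstddef>
#include <set>

// Models the "clear each breakpoint as it is hit" technique for one
// source file. LineCoverage is a hypothetical name.
class LineCoverage {
public:
    // Set a breakpoint on an executable line.
    void SetBreakpoint(int line) { pending_.insert(line); }

    // The line executed. Returns true if this cleared a breakpoint,
    // false if the line was already covered or never had one.
    bool Hit(int line) { return pending_.erase(line) > 0; }

    // Lines that have not executed yet.
    std::size_t Remaining() const { return pending_.size(); }

    // True once every breakpoint in the file has been cleared.
    bool Complete() const { return pending_.empty(); }

private:
    std::set<int> pending_;
};
```

When Complete() would be true for a file, every executable line in it has run at least once.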

Wrap-up and Update
      I have covered a lot of ground and I hope I showed you some advanced techniques for getting the most out of a debugger. As I mentioned at the beginning, if you have not really explored what can and cannot be done in a debugger, you should take a couple of hours and just poke at a few executables. Once you see how some of the techniques I presented work, you can easily start developing your own.
      In my April 1998 column I presented CrashFinder, an application that will take an address and tell you which function, source file, and line it corresponds to. Regarding my discussion of how to generate .PDB files for your builds, reader Andy Barnhart pointed out that you need to make sure that each .PDB file for each Visual C++ configuration is built to the output directory of the executable. If you are not careful, you can get into situations where the .PDB file is overwritten with each build. I have also found a problem with the IMAGEHLP.DLL symbol engine: it does not support static functions. I hope that someone at Microsoft can get this fixed in the next release.
      As I promised, the answer to the interview question is two: wsprintfA and wsprintfW from USER32.DLL.

Da Tips!
      In honor of spring, you must send your debugging tips to me at john@jprobbins.com so you can share them with all of your fellow developers.
      Tip 9 The Visual C++ debugger has a nasty habit of forgetting any breakpoints you set in a dynamically loaded DLL. Hopefully, the next version of the debugger will have deferred breakpoints like those WinDBG has been offering since the first version of the Win32 SDK shipped with Windows NT 3.1. For now, to keep your breakpoints in dynamically loaded DLLs, you must find the feature buried in the Project Settings Dialog. Open your project and select Project|Settings. In the Debug tab, click in the Category dropdown and select Additional DLLs. In the Modules listbox, add the complete path and name of the DLL.
      Tip 10 The Visual C++ debugger has a very cool and seemingly undocumented AutoExpand feature. The AutoExpand allows the Variables window to automatically show the pertinent data for the different types in their commonly used format. For example, a CRect variable is automatically expanded to show each of the member fields. The file that controls autoexpansion is AUTOEXP.DAT, located in the MSDEV\Shared IDE\BIN directory. The comments at the top of the file list all of the rules for adding new types to the file. The format specifier section shows the items that can be used directly in the Variables window. Some of the more useful items are: s to show strings, su to expand Unicode strings, and c to show character values. To use a format specifier, type in the variable or value followed by a comma, then the format specifier. For example, m_strData,su would view the value as a Unicode string. You can also use C/C++ cast operations in the Variables window to view different items. To view the value at address 0x00404410 as a character string, type in (char*)0x404410.

Have a tricky issue dealing with bugs? Send your questions or bug slaying tips via email to John Robbins: john@jprobbins.com

From the June 1998 issue of Microsoft Systems Journal.