October 1998
Download Oct98BugSlayer.exe (155KB)
John Robbins is a software engineer at NuMega Technologies Inc. who specializes in debuggers. He can be reached at john@jprobbins.com.
The other day my good friend Jim Austin and I were swapping debugging war stories, trying to outdo each other with our debugging prowess. When Jim started bragging about finding and fixing a nasty bug in my code, I quickly changed the topic: What are the worst problems to debug? It only took a microsecond for us to agree that multithreaded deadlocks were the absolute worst. Even if you think you have planned for every contingency, your application stops dead when you least expect it.
Jim and I started comparing notes on what techniques have worked for us in the past, and we laughed when we realized that luck was number one. As Jim pointed out, luck does not ship good code on time. We had a few more scientific techniques for the design phase, but during the debugging phase, when the code is actually deadlocking, we only had luck to go on. Jim then threw down the gauntlet and issued this challenge: "You're the Bugslayer guy, figure something out." This month I'll discuss a few tricks that have worked for me when doing multithreaded programming. I developed a utility, DeadlockDetection, that tells you where the deadlock occurred in your code. I designed DeadlockDetection with a great deal of extensibility in mind because I want to do more with it in a future column. Also, the techniques I used can be applied to different bugslaying utilities, so don't be surprised if I use some of them in the future. After the tips and tricks, I will discuss the design requirements I had for DeadlockDetection, followed by the key implementation details.
Multithreading Tips and Tricks
DeadlockDetection Requirements
High-level Design Issues
Using DeadlockDetection
As you can see from some of the INI settings, DeadlockDetection can initialize just by having LoadLibrary called on it. A good bugslayer idea would be to create a backdoor in your application initialization that calls LoadLibrary on the specified DLL name if it sees a special registry key or environment variable. This way you would not need conditional compilation, and you have a means of getting other DLLs into your address space cleanly. Of course, all of this assumes that the DLLs you are loading are smart enough to take care of their initialization in their DllMains.
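A minimal sketch of such a backdoor follows, kept portable so the decision logic can be exercised off Windows. The environment variable name `MYAPP_DEBUG_DLL`, the `DllLoader` callback type, and the `LoadDebugDllIfRequested` helper are all assumptions made for illustration; none of them come from the DeadlockDetection source.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical environment variable name; the article does not specify one.
constexpr const char* kDebugDllEnvVar = "MYAPP_DEBUG_DLL";

// Loader callback. On Windows this would wrap LoadLibraryA and report
// whether a non-NULL module handle came back.
using DllLoader = std::function<bool(const std::string&)>;

// Returns true only if a DLL name was requested and the loader succeeded.
// requestedName is the raw value looked up from the environment (may be null).
bool LoadDebugDllIfRequested(const char* requestedName, const DllLoader& loader) {
    assert(loader && "a loader callback is required");
    if (requestedName == nullptr || *requestedName == '\0') {
        return false;  // Backdoor not enabled: normal startup, nothing loaded.
    }
    return loader(requestedName);
}
```

In application startup you would pass `std::getenv(kDebugDllEnvVar)` as the first argument and a lambda that calls `LoadLibraryA(name.c_str()) != nullptr` as the second. Keeping the loader behind a callback is only a convenience for testing; a direct LoadLibrary call works just as well.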
If you want to initialize DeadlockDetection yourself, all you need to do is call OpenDeadlockDetection when appropriate. OpenDeadlockDetection takes a single parameter, the initial reporting options. Figure 2 lists all of the DDOPT_ flags. Of course, you will want to call OpenDeadlockDetection before your application starts creating threads so all the key information about the synchronization objects can be recorded. At any point, you can change the reporting options by calling SetDeadlockDetectionOptions. This takes the same ORed set of flags as the OpenDeadlockDetection function. To see what the current options are, call GetDeadlockDetectionOptions. You can change the current options as many times as you like during your program's execution. If you want to suspend and resume logging, call the SuspendDeadlockDetection and ResumeDeadlockDetection functions.
Along with this month's source code, I wrote one DeadDetExt DLL, TextFileDDExt.dll. This is a relatively simple extension that records all the information to a text file. When you run DeadlockDetection with TextFileDDExt.dll, it creates a text file in the same directory as the executable program. The text file will use the name of the executable with a .dd extension. For example, if you run SimpTest.exe, the resulting file will be SimpTest.dd. Some sample output from TextFileDDExt.dll is shown in Figure 3. The output shows the information in this order:
A word of caution: if you turn on full logging of all functions, you can generate some extremely large files in no time. One of the test samples, MTGDI, can generate an 11MB file in a minute or two if you create a couple of threads.
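The log-file naming rule mentioned above (executable name with a .dd extension, so SimpTest.exe yields SimpTest.dd) can be sketched as a small string helper. `DeadlockLogPath` is a hypothetical name, and this is not how TextFileDDExt.dll is known to compute the path; it only illustrates the rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: derive the .dd log path from the executable path.
// Only a dot that appears after the last path separator counts as the
// start of an extension.
std::string DeadlockLogPath(const std::string& exePath) {
    assert(!exePath.empty());
    const std::size_t sep = exePath.find_last_of("\\/");
    const std::size_t dot = exePath.find_last_of('.');
    const bool hasExt = dot != std::string::npos &&
                        (sep == std::string::npos || dot > sep);
    return (hasExt ? exePath.substr(0, dot) : exePath) + ".dd";
}
```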
Implementation Highlights
The entire output for the single function that appears in Figure 3 was constructed with the information in the DDEVENTINFO structure. While most of the fields in DDEVENTINFO are self-explanatory, the dwParams field needs a special mention. This is really a pointer to the parameters as they appear in memory. DeadlockDetection is only intercepting __stdcall functions. If you read my June 1998 column, you should see that dwParams is each of the parameters in left-to-right order. I applied a little creative casting to make it easy to convert dwParams.
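To make the dwParams layout concrete, here is a portable sketch that treats the parameters of WaitForSingleObjectEx as three consecutive 32-bit stack slots in left-to-right order. The struct and function names are hypothetical stand-ins, not the typedefs from DeadlockDetection.h, and the sketch copies with `memcpy` rather than casting the pointer so that it stays well-defined on any compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a typed parameter block. On 32-bit Windows each
// __stdcall argument occupies one 4-byte stack slot, first argument lowest.
struct WaitForSingleObjectExParams {
    std::uint32_t hHandle;         // HANDLE (32 bits wide on x86)
    std::uint32_t dwMilliseconds;  // DWORD
    std::uint32_t bAlertable;      // BOOL
};
static_assert(sizeof(WaitForSingleObjectExParams) == 12,
              "three 4-byte slots, no padding");

// dwParams points at the first parameter as it sits in memory.
WaitForSingleObjectExParams DecodeWaitParams(const std::uint32_t* dwParams) {
    assert(dwParams != nullptr);
    WaitForSingleObjectExParams params;
    std::memcpy(&params, dwParams, sizeof params);
    return params;
}
```

An extension DLL would pick the struct to decode into based on which function was intercepted.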
In DeadlockDetection.h, I provide typedefs that describe each of the intercepted function parameters. For example, if eFunc was eWaitForSingleObjectEx, then you would cast dwParams to a LPWAITFORSINGLEOBJECTEX_PARAM to get the parameters. To see all of this in action, check out the TextFileDDExt.dll code in this month's source code.
While output processing is relatively easy, the hard part is gathering the information. I wanted DeadlockDetection to hook the synchronization functions in Figure 1, but to appear exactly as if the real function had been called. I also wanted to get the parameters and the return value and to write the hook functions in C/C++ easily. It took quite a while with the debugger and the disassembler before I got it right.
My initial step was to write all the hook functions so they were just pass-through functions and called the real function directly. This worked great. My next step was to get the parameters and the return value for the functions into local variables. While getting the value returned from the real function was simple, I realized that I did not have a clean way to get the return address with my C/C++ hook function. I needed the DWORD right before the current stack pointer. Unfortunately, in straight C/C++ the function prolog had already done its magic by the time I could get control, so the stack pointer had already moved away from where it needed to be. You might think that the stack pointer is just offset by the number of local variables, but that is not always the case. The Visual C++® compiler does a pretty good job of optimizing, so the stack pointer does not end up in the same place under different optimization flags. While you might declare a variable as a local, the compiler can optimize it down to a register so it does not even appear on the stack. I needed a guaranteed way to get at the stack no matter what optimizations were set.
At this point, I started thinking naked (no, not me without clothes, but declaring the hook functions as __declspec(naked) and creating my own prolog and epilog code). With this approach, I would have complete control over ESP no matter what optimizations were used. Additionally, getting the return address and parameters is a snap because they are at ESP + 04h and ESP + 08h, respectively. Keep in mind that I am not doing anything out of the ordinary with the prolog and epilog code, so I still do the usual PUSH EBP and MOV EBP, ESP for prolog and MOV ESP, EBP and POP EBP for epilog.
One downside to doing my own prolog and epilog is that I need to use inline assembler. Since I do not know Alpha assembler as well as I know Intel assembler, I cannot provide a version of DeadlockDetection that works on the Compaq (née Digital) Alpha. If you can port DeadlockDetection to the Alpha and send me the code, I will report about it in a future column and post the code on my Web site.
Since each of the hook functions was going to be declared as __declspec(naked), I made a couple of macros to handle the prolog and epilog: HOOKFN_PROLOG and HOOKFN_EPILOG. I also went ahead and declared some common local variables that all hook functions would need in HOOKFN_PROLOG. These included the last error value, dwLastError, and the event information structure to pass to the DeadDetExt DLL, stEvtInfo.
The dwLastError is yet another thing that I needed to watch when intercepting functions. The Windows API can return a special error code through SetLastError to provide more information if a function fails. This error code can be a real boon because it tells you why an API failed. For example, if GetLastError returns 122, then you know that the buffer parameter was too small. WINERROR.h contains all the error codes the operating system returns. The problem for hook functions is that they can reset the last error as part of their processing.
This can wreak havoc if your application relies on the last error value. For example, if you call CreateEvent and want to know whether the returned handle was truly created or just opened, you check the last error: CreateEvent sets it to ERROR_ALREADY_EXISTS if it just opened an existing handle. Since the cardinal rule of intercepting functions is that you cannot change the behavior of the application, I needed to call GetLastError immediately after the real function call so my hook function could properly set the last error code that the real function returned. The rule in the hook function is that you need to call GetLastError right after you call the real function, and then call SetLastError as the last thing before leaving the hook function.
At this point, I thought I was pretty much done except for the testing. The first thing I found was a bug: I wasn't preserving ESI and EDI across the hook call even though the documentation on using the inline assembler explicitly stated that they must be preserved. After I fixed this, everything seemed to work fine. When I started doing register comparisons on before, during, and after cases, I noticed that I was not returning the values the real functions left in the EBX, ECX, EDX, and, worse yet, the flags registers. While I did not see any problems and the documentation said that those registers did not need to be preserved, I was concerned that there might be something I had not tested. I declared the REGSTATE structure to hold the register values after the real function call so I could put them in that state when my hook function returned. This meant that I needed to create two additional macros, REAL_FUNC_PRE_CALL and REAL_FUNC_POST_CALL, that must be used around the real call the hook function makes.
After a little more testing, I found another problem: in release builds with full optimizations, I would have a crash every once in a while. I finally tracked that down to the effect of the optimizer on some of my hook functions. The optimizer was doing the right thing, but not when I needed it to.
I was very careful about the register usage in my hook functions and only used EAX or stack memory directly. I found the _DEBUG build code sequence was
[assembly listing not preserved]
and the optimizer was turning it into the following:
[assembly listing not preserved]
It is easy to see in the second snippet that the POP into EBX was trashing the register. To avoid the optimizer doing things like this behind my back, I turned it off for all hook function files by placing a #pragma optimize("", off) at the top of each file. This also made debugging easier because very similar code is generated for both release and debug builds.
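The last-error rule for hook functions (read the error right after the real call, restore it as the final step) is easy to lose among the register details, so here is a portable sketch of just that rule. Everything in the `sim` namespace is a simulated stand-in: the thread-local variable plays the role of the per-thread last error, and `RealCreateEvent`, `LogEvent`, and `HookCreateEvent` are hypothetical functions, not code from DeadlockDetection.

```cpp
#include <cassert>
#include <cstdint>

namespace sim {

// Simulated per-thread last-error slot and accessors.
thread_local std::uint32_t lastError = 0;
inline void SetLastError(std::uint32_t code) { lastError = code; }
inline std::uint32_t GetLastError() { return lastError; }

constexpr std::uint32_t kErrorAlreadyExists = 183;  // ERROR_ALREADY_EXISTS

// Stand-in for the real API: pretends the named object already existed.
int RealCreateEvent(const char* /*name*/) {
    SetLastError(kErrorAlreadyExists);
    return 42;  // fake handle value
}

// Stand-in for logging work that clobbers the last error as a side effect.
void LogEvent(const char* /*name*/) { SetLastError(0); }

int HookCreateEvent(const char* name) {
    assert(name != nullptr);
    const int handle = RealCreateEvent(name);
    const std::uint32_t dwLastError = GetLastError();  // right after the real call
    LogEvent(name);                                    // may reset the last error
    SetLastError(dwLastError);                         // last thing before returning
    return handle;
}

}  // namespace sim
```

Without the final `SetLastError`, a caller checking for the already-exists case after `HookCreateEvent` would see 0 and conclude the event was newly created.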
Figure 4 shows the final version of DD_Funcs.h, which is the internal header file where all of the special hook function macros are declared. The comment at the top of the file has a sample hook function where all the macros need to be called. I strongly encourage you to step through the SimpTest example that is part of the source code. Make sure that you watch an entire function call at the assembler level because that is the only way you will see the whole thing work.
There are a couple of other minor points that I want to make about the implementation. First, I made sure that DeadlockDetection.dll and its supporting DLLs, BugslayerUtil.dll and TextFileDDExt.dll, did not use any runtime library resources from the user's application. Since I need to do some memory allocations, if I were to use the user's heap, I would be affecting the application a little too much and the program could do an overrun into DeadlockDetection's data. All of the DLLs that I use link to their own static versions of the RTL and, if need be, use a separate private heap to handle some memory allocations.
Second, DeadlockDetection is always called by your application even if it is suspended. Instead of hooking and unhooking on the fly, I leave the functions hooked and look at some internal flags to determine how the hook should behave. This makes it easy to toggle different function-logging on the fly, but it does add a bit to the overhead of your application. It seemed error-prone to allow hooking and unhooking on the fly.
Finally, DeadlockDetection hooks the functions out of a DLL when brought into your program through LoadLibrary. However, it can only gain control after that DLL's DllMain has executed, so if there are any synchronization objects created or used during DllMain, DeadlockDetection can miss them.
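The "leave the hooks installed and consult internal flags" design can be sketched as follows. The option bits, globals, and function names here are hypothetical (the real DDOPT_ flags are listed in Figure 2 and are not reproduced), and the log vector is a single-threaded stand-in for the extension DLL's output.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical option bits, standing in for the DDOPT_ flags.
constexpr std::uint32_t kLogWaits = 0x1;
constexpr std::uint32_t kLogCritSecs = 0x2;
constexpr std::uint32_t kAllOptions = kLogWaits | kLogCritSecs;

std::atomic<std::uint32_t> g_options{0};
std::atomic<bool> g_suspended{false};
std::vector<std::string> g_log;  // stand-in for extension output; not thread-safe

void SetOptions(std::uint32_t options) {
    assert((options & ~kAllOptions) == 0);  // reject unknown bits
    g_options.store(options);
}
void Suspend() { g_suspended.store(true); }
void Resume() { g_suspended.store(false); }

// Each hook asks this on every call instead of being installed or removed.
bool ShouldLog(std::uint32_t category) {
    return !g_suspended.load() && (g_options.load() & category) != 0;
}

// The hook always forwards to the real function; only the logging is optional.
int HookWait(int handle) {
    const int result = handle;  // stand-in for calling the real wait function
    if (ShouldLog(kLogWaits)) {
        g_log.push_back("Wait");
    }
    return result;
}
```

The cost of this design is one flag check per intercepted call even while suspended, which matches the overhead trade-off described above.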
What's Next For DeadlockDetection?
Tips
Have a tricky issue dealing with bugs? Send your questions or bug slaying tips via email to John Robbins: john@jprobbins.com
From the October 1998 issue of Microsoft Systems Journal.