Bugslayer, MSJ, October 1999

Code for this article: Oct99BugSlayer.exe (16KB)

John Robbins is a freelance consultant and writer working on a book for Microsoft Press tentatively titled Debugging Microsoft Windows. He was formerly an engineer at NuMega Technologies. Reach John at www.jprobbins.com.

Q I have been trying, without success, to implement an SEH translator with _set_se_translator like you discussed in the August 1998 Bugslayer column. I am finding that it works in debug builds, but not in release builds. Do you know what could be happening?
Mark Walsen
A Mark also sent along some code snippets that allowed me to duplicate the problem quickly in a standard MFC program. Briefly exploring with the debugger, I saw that it was consistently crashing in the MFC TranslateMessage processing. Since MFC goes through a few gyrations (to say the least) with the message translation, I started looking for the problem from that side.
      As I was poking at the problem, I got another piece of mail from Mark—he figured out that the problem was independent of MFC and he thought it was related to code generation. Turning off global optimizations for the function that catches the SEH wrapper class generates the proper code.
      Yikes! I have been recommending that everyone use SEH translators and here is a case where they do not work. While Mark had found a workaround that got him going, I dug a little bit more and found out that while I thought I had a solid understanding of C++ exception handling, I had not kept up with the new rules. Visual C++® 6.0 changed the default exception handling from an asynchronous to a synchronous exception model.
      In Visual C++ 5.0, asynchronous exception handling meant the compiler had to assume that each instruction in the program could generate an exception. This meant that the compiler had to track the lifetime of unwindable objects, so it had to generate code that might never be needed. By switching to the synchronous model, the compiler only assumes that exceptions happen at the throw statement or at a function call, so it generates less code.
      The /GX compiler switch (enable exception handling) now maps to /EHsc (synchronous exception handling), which assumes that extern C functions never throw an exception. If you do have extern C functions that can throw exceptions and want to use synchronous exceptions, use /EHs. To turn on asynchronous exceptions, the default for Visual C++ 5.0, use /EHa.
      In this month's code distribution, I included a project, SEHProblem, that has a couple of different release builds defined so you can see how the different types of exception handling affect the program. (The code in Figure 1 comes from this project.) One thing that had me a little confused was that I never had a problem with synchronous exception crash handlers in a Visual C++ 6.0-based project. I always used a single class for all catastrophic errors that I could throw. Like the GrungyFunc function in Figure 1, if INSTREAMTHROW is defined, I would throw an error if a parameter was corrupt or there was some other fatal error like the system being out of disk space. This was masking the synchronous exception handling change, so I had never seen it before. If you compile SEHProblem using the Release IN STREAM THROW configuration, you will see that synchronous exception handling works just fine, as long as you throw the crash handler exception somewhere in the code stream.
      According to the documentation, you can mix and match synchronous and asynchronous exception handling all you want. That way you can get the benefit of using exception translators and still keep the size of your executable down. However, if you are using projects generated with Visual C++, you cannot set the exception handling type on a per-file basis, so you must choose one or the other. If you are using C++ exception handling throughout your application and you want to translate structured exceptions into C++ exceptions, you should use asynchronous exceptions (/EHa). In addition, if you are moving code that was developed and fully tested in Visual C++ 5.0, you should consider the implications of synchronous and asynchronous exception handling carefully.
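      As a rough illustration of the translator setup discussed above, here is a minimal sketch. It is not the code from Figure 1. The class name SehException and the helper InstallSehTranslator are hypothetical. _set_se_translator is the real C run-time function declared in eh.h, and the Windows-only portion is guarded so that the wrapper class compiles with any compiler. The translation is only dependable when the catching code is built with /EHa.

```cpp
#include <stdexcept>

#ifdef _MSC_VER
#include <windows.h>
#include <eh.h>
#endif

// Hypothetical wrapper class (not from the article) that carries the
// structured exception code into C++ exception handling.
class SehException : public std::runtime_error
{
public:
    explicit SehException(unsigned int code)
        : std::runtime_error("structured exception"), m_code(code)
    {
    }

    // The original SEH code, for example 0xC0000005 for an access violation.
    unsigned int Code() const { return m_code; }

private:
    unsigned int m_code;
};

#ifdef _MSC_VER
// Invoked by the C run-time when a structured exception is raised on a
// thread that has installed this translator. Build with /EHa; under the
// synchronous model the optimizer may drop the catch block around code
// that contains no explicit throw.
static void __cdecl TranslateSeh(unsigned int code, EXCEPTION_POINTERS*)
{
    throw SehException(code);
}

// The translator is set per thread, so call this on each thread that
// needs the translation.
void InstallSehTranslator()
{
    _set_se_translator(TranslateSeh);
}
#endif
```

      A caller would invoke InstallSehTranslator once per thread and then write an ordinary catch ( const SehException & ) handler around the code it wants to protect.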
Q I am trying to debug a Visual Basic® 6.0-based executable within the Visual C++ debugger. After reading your June 1998 column, I installed the MSVBVM60.DBG file into the symbols\dll directory, and compiled the executable with debug symbols. When I fire up the debugger, it insists that MSVBVM60.DLL was loaded without symbols because it always reports "Loaded exports for 'E:\WINNT\system32\MSVBVM60.DLL'" instead of "Loaded symbols...". I checked the timestamps of both the DBG and the DLL to validate that they are the same. How do you get symbols loaded for MSVBVM60.DLL?
David Whitney
A I did not realize that Microsoft was providing MSVBVM60.DBG. While it is not the same as having full source, the DBG is very helpful in that it can at least give you function addresses, labels, and global variables in the debugger's disassembly window. If you are curious, MSVBVM60.DBG is on the Visual Studio® Service Pack CD in the same directory as your particular language installation. However, as David indicated, it does not help you at all because the Visual C++ debugger will not load it.
      At first, it stumped me as to why the debugger was choking on MSVBVM60.DBG. After I whipped out my trusty friend, DUMPBIN.EXE, and started looking at the file, I saw what looks like the problem. If you do a
DUMPBIN /SYMBOLS MSVBVM60.DBG
the output shows only the normal COFF (Common Object File Format) and the undocumented OMAP debugging information in the file. Strangely enough, the Visual C++ debugger does not load COFF symbols. Even though the Debug options tab in the Project Options dialog has a setting called Load COFF & Exports, the Visual C++ team has pulled COFF support out of the debugger as Microsoft is moving away from the COFF format. The good news is that WinDBG (the Platform SDK debugger) still supports COFF, so you can load MSVBVM60.DBG through it and get the benefit of the DBG file. You can download the latest WinDBG from http://msdn.microsoft.com/developer/sdk/windbg.asp.
      The undocumented OMAP information is interesting because it appears to have something to do with basic block relocations. (Fellow MSJ colleague Matt Pietrek briefly discussed this in the May 1997 Under the Hood column.) My guess is that Microsoft has some sort of internal tool that packs the binary so that the most common code is pushed up to the front and the rest is put in the rear so that the working set is much smaller. Consequently, this binary rearrangement makes the program faster because it will not have to page in as much of the program.
      This is similar to the Working Set Tuner (WST) tool that ships with the Platform SDK. However, WST only stops at the function level (also called the block level), while it looks like the internal Microsoft® tool goes into the function and will break it apart so that the function itself is scattered across the binary. In the following code snippet, the portion inside the if statement is a basic block, and can be broken out and moved to a different location at the end of the module as it will not be called much:
void foo ( char * p )
{
    // Do some work...
    BOOL bError = DoSomething ( ) ;
    if ( TRUE == bError )
    {   // <---- Basic block starts here.
        ShowTheUserTheError ( ) ;
        // Do some work to account for the error.
        // ...
    }   // <---- Basic block ends here.
    // Continue on....
}
Q I am about to embark on a big project that will have some interesting requirements. The main requirement is that it will have both apartment threading and freethreading. To protect against using an interface pointer in the inappropriate threading apartment, I would like to have a function like DebugCoQueryApartmentID. I can then use this function in an ASSERT to make sure that the interface pointers are being used with the proper threading model.
      From what I understand, the OXID is stored in thread local storage (TLS), and I thought about using your HookImportedFunctionByName to hook the TLS functions so I could grab the COM TLS values and get the value that way. I have tried querying the OXID in the MEOW packet of the interface (after marshaling it to a stream), but that is not exactly what I am looking for. Do you think hooking the TLS functions will get me the information I need?
Robert Shearer
A This is an excellent question and a great example of proactive debugging! First, I needed to look at CoInitializeEx to see what it did. Since you are only supposed to have 64 TLS slots, I had a sneaky feeling that while COM used TLS, it probably had its own backdoor way of doing so. When embarking on a little reverse-engineering, it is a good idea to sit down and think for a bit on how you would implement it yourself.
      I started by looking at what happens when you call CoInitializeEx. As I suspected, COM has its own function for TLS, COleTls::TLSAllocData. The COM TLS structure is stored at offset 0xF80 in the Thread Environment Block (TEB). This means the TLS functions are never called, so hooking them would do no good. At least I found where the COM TLS structure is stored, so I could then start groveling through this structure to find the offset where the threading model is stored. Please keep in mind that the offset I found is undocumented and is subject to change at any time.
      I will save you the extremely exciting blow-by-blow discussion of single-stepping through CoInitializeEx with the different COINIT flags. What it came down to was this: the threading model is stored in the DWORD at offset 0xC in the COM TLS structure. It is set with a mask of 0x80 for apartment threaded, and 0x140 for freethreaded. As with the offset, these masks are undocumented, so they can change in future versions of the operating systems.
       Figure 2 shows the DebugCoGetThreadingModel function. You can also get this function from this month's source code distribution, along with a test harness program.
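      Independently of the Figure 2 listing, the mask test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names ComThreadingModel and ClassifyComTlsFlags are hypothetical, the caller is assumed to have already read the DWORD at offset 0xC of the COM TLS structure, and requiring every bit of 0x140 to be set is my reading of the text rather than something the column spells out. The mask values themselves are the undocumented ones quoted above.

```cpp
#include <cstdint>

// Hypothetical names. The mask values are the undocumented ones quoted
// in the text and may differ between operating system versions.
enum class ComThreadingModel { Unknown, Apartment, FreeThreaded };

const std::uint32_t kComTlsApartmentMask    = 0x80;
const std::uint32_t kComTlsFreeThreadedMask = 0x140;

// flags is the DWORD read from offset 0xC of the COM TLS structure.
ComThreadingModel ClassifyComTlsFlags(std::uint32_t flags)
{
    if ((flags & kComTlsApartmentMask) != 0)
    {
        return ComThreadingModel::Apartment;
    }
    // Assumption: all bits of the freethreaded mask must be set.
    if ((flags & kComTlsFreeThreadedMask) == kComTlsFreeThreadedMask)
    {
        return ComThreadingModel::FreeThreaded;
    }
    // COM not initialized on this thread, or a layout this sketch
    // does not understand.
    return ComThreadingModel::Unknown;
}
```

      Keeping the classification separate from the code that reads the TEB means only one small function has to change if the offsets or masks move in a later release.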
      Robert also asked for a debugging function that would take an interface pointer and tell you which threading model it supports. I poked around a little bit, but I did not see a way to get that information. Obviously, the information has to be there because you can get RPC_E_WRONG_THREAD errors when you have the threading model screwed up. If any readers out there have solved this, let me know so I can share it with the world and earn you a small slice of immortality!
Q Your CrashFinder program (MSJ, April 1998) was great, but why do you recommend that everyone still use MAP files? How do you read a MAP file anyway?
Many Readers
A It is time for everyone to gather around the virtual campfire and let me tell a very scary story about the old days of MS-DOS®. That is, "DOS" as in disk operating system, not the Spanish number two for you young whippersnappers. Back then, programming was only the realm of real men and women. Yep, back in those days there was no such thing as these newfangled IDEs with their built-in debuggers. If you wanted to type something in, you used edlin, one line at a time! Also, there were no such things as these namby-pamby wizards that help you out and do your job for you.
      Back then you had to be tough, I tell ya! If we wanted a window on the screen, we had to go out and diddle the hardware directly in the code, and we liked it! Who cares if you had to type until your fingers bled and your wrists locked up from carpal tunnel syndrome just to program the keyboard interrupt to get a single keypress from the user?
      Anyway, back then—and even today in some parts of the embedded systems world—there were some compilers and linkers that did not emit debugging information. (Gasp!) If you want a taste of the rough-and-tumble days, just try to debug through a Windows NT® system call with the Visual C++ debugger. If you were using MASM (the Microsoft Assembler), there was generally a one-to-one correspondence between a line of assembler and a line of binary code. You could debug it by simply looking at a printout of your source code.
      With higher-level languages, there is no one-to-one correspondence. To help you out, the linkers that did not produce symbols instead produced a textual listing of where your program maps out in memory. Hence the name: MAP file. Fortunately, even linkers that produced symbol information also produced MAP files. Generally, the MAP file contains the locations of all the public functions, the sizes of the functions, the names of the functions, and sometimes the addresses to which each line maps. If you know how to read a MAP file, you can get the source, line, and address of a crash.
      Whenever I do a release build of commercial software, I still produce a MAP file. The main reason is that it provides another form of symbols for my application. If you have been reading the Bugslayer column for a while, you know that I do a great deal with the IMAGEHLP.DLL symbol engine. If everything is working correctly with it, you can use CrashFinder to find your crash addresses very quickly. If everything is not working correctly, you are completely out of luck.
      Based on the huge number of emails that I got when people upgraded from Visual C++ 5.0 to 6.0, many people lost their ability to use CrashFinder. The PDB format changed between versions 5.0 and 6.0, and the IMAGEHLP.DLL symbol engine did not know how to load the new version 6.0 PDB reader. Consequently, folks were unable to get their crash addresses resolved to source and line information. If you had the MAP file, then you just had to work harder, but you could get the same information.
      There's another reason for having the MAP file. I can confidently predict, with 87.345 percent accuracy, that you will get a call in five years from your best customer, who still has marketing people on staff (who are always the last to upgrade) using the version released in 1999, and who is livid that your application crashes on Windows® 2005 beta 3. On your development machine you will be up to Visual Studio 11 SP6, and it will no longer read Visual Studio 6.0 PDB files. How are you going to find the problem then? With a MAP file, of course!
      Please note that only Visual C++ produces MAP files. There is no way to produce them with Visual Basic. Since the symbol information contains the same information that is in a MAP file, it would be possible to write a program that could generate a MAP file from a compiled binary. If there were sufficient interest (meaning enough readers email me), I would certainly consider writing it.
      The first step is to tell the linker to produce a MAP file with full information. Going into the project settings and checking "Generate mapfile" in the Link Page/General Category is not enough. That only produces the function information. To get the full information, including source and line, go to the Project Options edit control and add the following parameters:
/MAPINFO:EXPORTS /MAPINFO:LINES
If you also want to see the relocation fixups, you can add /MAPINFO:FIXUPS, but there is not much you can do with it. When producing the MAP file, it is a good idea to have the linker place it in the same directory as the resulting binary so you can find it later. It goes without saying that when you ship your application you should also save your MAP files just like they were PDB files.
      Now that you can generate MAP files properly, it is time to look at one so I can show you how to read it. Figure 3 shows the source to a simple DLL included with this month's code distribution. It's appropriately named MapDLL.cpp. Figure 4 shows the resulting MAP file. In Figure 4, the top part of the MAP file contains the module name, the timestamp from the binary header indicating when the binary was built, and the preferred load address. I would be remiss in my Bugslayer duties if I did not remind everyone that it is vitally important to rebase your DLLs so that they load at unique addresses. See my April 1998 column for a complete discussion on rebasing DLLs.
      After the header comes the section information. This shows the section distillation that the linker went through with the OBJ and LIB files to produce the final binary.
      After the section information you get to the good stuff, the public function information. Notice the "public" part. If you have static-declared C functions, they will not show up in the MAP file. This is the same behavior that occurs in the IMAGEHLP.DLL symbol engine, but for the IMAGEHLP.DLL symbol engine this is a bug. At least the line numbers will show the static functions in the MAP file so that you can still resolve addresses.
      The important parts of the public function information are the function names and the information in the Rva+Base column, which is the starting address of the function. The line information follows the public function section. The lines are shown as:
10 0001:00000030
The first number represents the line number, while the second is the offset from the beginning of the first code section where this line occurred. Yes, that sounds confusing, but later, in the algorithm step, I will show you how to calculate this information.
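      If you would rather let a program read these entries, the format described above is easy to pick apart. The following is a minimal sketch, assuming each entry has exactly the shape shown (a decimal line number, then a hexadecimal section and offset separated by a colon). The names MapLineEntry and ParseMapLineEntry are hypothetical.

```cpp
#include <cstdio>
#include <string>

// One line-number entry from a MAP file, for example "10 0001:00000030".
struct MapLineEntry
{
    unsigned int  line;     // Source line number (decimal in the MAP file).
    unsigned int  section;  // Section number (hexadecimal in the MAP file).
    unsigned long offset;   // Offset from the start of that section (hex).
};

// Parses a single entry. Returns false if the text does not match the
// expected "line section:offset" shape.
bool ParseMapLineEntry(const std::string& text, MapLineEntry& out)
{
    unsigned int  line = 0;
    unsigned int  section = 0;
    unsigned long offset = 0;
    if (std::sscanf(text.c_str(), "%u %x:%lx", &line, &section, &offset) != 3)
    {
        return false;
    }
    out.line = line;
    out.section = section;
    out.offset = offset;
    return true;
}
```

      For the entry shown above, this yields line 10, section 1, and offset 0x30.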
      The final section of the MAP file is the exports, which is the same information that you can get by running
DUMPBIN /EXPORTS MAPDLL.DLL
Obviously, only DLLs will have this section.
      The algorithm for getting the function, source, and line number from a MAP file is straightforward, but you will need to do a few mental hexadecimal calculations. As an example, a crash in MAPDLL.DLL occurs at address 0x03901099. As I show you the steps, I will use that as an example with Figure 4.
      First, find the MAP file that contains the crash address. Look at the preferred load address and the last address in the public section. If the crash address is between those values, you are looking at the correct MAP file.
      To find the function—or closest function if there are static C functions involved—scan down the Rva+Base column until you find the first address above the crash address. The previous address is the function that had the crash. Keep in mind that you are dealing with hexadecimal numbers, so make sure you account for A to F. With the example of 0x03901099, the first function address above this is 0x039010f6, so the function that crashed is ?MapDLLHappyFunc@@YAPADPAD@Z. Any name that starts with a question mark is a C++ decorated name. To translate the name, pass it as a command-line parameter to the Platform SDK program UNDNAME.EXE. In the example, this function is MapDLLHappyFunc.
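      The scan down the Rva+Base column can also be expressed in code. This is a minimal sketch, assuming the public entries have already been read into a vector sorted by ascending address. The names MapPublic and FindContainingPublic are hypothetical. It finds the first address above the crash address and steps back one entry, just as in the manual procedure.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One entry from the public function section of a MAP file.
struct MapPublic
{
    unsigned long address;  // Value from the Rva+Base column.
    std::string   name;     // Public (possibly decorated) name.
};

// publics must be sorted by ascending address. Returns the entry whose
// address is the closest one at or below crashAddress, or nullptr if the
// crash address lies below the first public entry.
const MapPublic* FindContainingPublic(const std::vector<MapPublic>& publics,
                                      unsigned long crashAddress)
{
    // First entry whose address is strictly above the crash address.
    auto firstAbove = std::upper_bound(
        publics.begin(), publics.end(), crashAddress,
        [](unsigned long address, const MapPublic& entry)
        {
            return address < entry.address;
        });
    if (firstAbove == publics.begin())
    {
        return nullptr;
    }
    // The previous entry is the function that had the crash.
    return &*(firstAbove - 1);
}
```

      As with the manual scan, the result is only the closest public symbol. If static C functions are involved, the crash may really be in a static function that follows it.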
      To find the line number, you get to do a little hexadecimal subtraction using the following formula:
(crash address) - (preferred load address) - 0x1000
Remember that the lines are shown as offsets from the beginning of the first code section, so the formula does that conversion. While you can probably guess why you subtract the preferred load address, you can get extra credit if you guess why you still have to subtract 0x1000. The address is from the beginning of the code section, but that is not the first address of the binary. The first part of the binary is the PE (Portable Executable) header.
      I am not sure why the linker still generates MAP files that require this odd calculation. The linker team put in the Rva+Base column a little while ago, so I do not see why they didn't just fix up the line number at the same time. My guess is that it had something to do with legacy code.
      Once you have the calculated offset, look through the line information until you find the closest number that is not over the calculated value. Keep in mind that the code-generation optimizer can jiggle the code around so that the lines are not in ascending order. With my crash example, the formula is:
0x03901099 - 0x03900000 - 0x1000 = 0x99
If you look through the MAP file, the closest line that is not over is 38 0001:00000096 (Line 38) in MapDLL.cpp.
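      The subtraction and the line search are mechanical enough to sketch in code. The following is only an illustration: the names LineOffset, CrashOffsetInCode, and FindClosestLine are hypothetical, and the 0x1000 constant assumes, as in this example, that the first code section starts 0x1000 bytes into the binary. Because the optimizer can leave the line entries out of order, the search looks at every entry instead of stopping at the first one past the target.

```cpp
#include <vector>

// A line number paired with its offset from the first code section.
struct LineOffset
{
    unsigned int  line;
    unsigned long offset;
};

// (crash address) - (preferred load address) - 0x1000
unsigned long CrashOffsetInCode(unsigned long crashAddress,
                                unsigned long preferredLoadAddress)
{
    return crashAddress - preferredLoadAddress - 0x1000;
}

// Returns the line whose offset is the largest one not over target.
// The entries do not have to be sorted. Returns 0 if no entry qualifies.
unsigned int FindClosestLine(const std::vector<LineOffset>& lines,
                             unsigned long target)
{
    bool          found = false;
    unsigned long bestOffset = 0;
    unsigned int  bestLine = 0;
    for (const LineOffset& entry : lines)
    {
        if (entry.offset <= target && (!found || entry.offset > bestOffset))
        {
            found = true;
            bestOffset = entry.offset;
            bestLine = entry.line;
        }
    }
    return bestLine;
}
```

      With the numbers from the example, CrashOffsetInCode ( 0x03901099 , 0x03900000 ) gives 0x99, and a table that contains line 38 at offset 0x96 as its closest entry not over 0x99 resolves to line 38.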

Wrap-up!
      I would like to take the time to thank all my readers, and especially those who have emailed with questions and suggestions. As the Bugslayer column enters its third year, it has definitely helped me to get all this constructive criticism. If you have any questions, ideas you think I should explore, or debugging tips, I would love to hear from you.

Tips!
      At TechEd 1999, Jay Bazuzi, a developer on the Microsoft Visual C++ debugger team, gave a fantastic talk about debugging and had a couple of undocumented tricks that you can use in the debugger. This month's tips come from Jay's talk.
Tip 25 You can do rudimentary profiling in the debugger watch window with the undocumented @clk pseudo register. While the times returned include debugger overhead and such, it can be very helpful to get an idea of how long an operation took. The trick is to add two watches to the watch window. The first is @clk, and the second is @clk=0. Each time the watch window is reevaluated, the first @clk will show you the time in microseconds it took to execute. If you want to see the time in milliseconds, you can always enter the first watch as @clk/1000,d.
Tip 26 If a pointer to an array only expands to a single item, the undocumented numerical format specifier will force the expansion. For example, if you have an LPDWORD pointer, pVals, that points to an array of 10 items, you can display all 10 by entering pVals,10 in the watch window. The number after the comma tells the watch window how many items to display.

Have a tricky issue dealing with bugs? Send your questions or bug slaying tips via email to John Robbins: john@jprobbins.com.

From the October 1999 issue of Microsoft Systems Journal.