April 1998
Matt Pietrek is the author of Windows 95 System Programming Secrets (IDG Books, 1995). He works at NuMega Technologies Inc., and can be reached at mpietrek@tiac.com or at http://www.tiac.com/users/mpietrek.
Here's a problem nearly every C++ programmer has encountered. In your code, you've made a call to a function in some DLL and the linker complains that it can't find the symbol. It usually doesn't take too long to figure out that you need to add another library (.LIB) file to the linker's command line. The only problem is, which .LIB file?
From day one, certain formats have remained relatively constant in the Microsoft® Win32® development tools, and many tools have sprung up around them. For example, the Microsoft DUMPBIN utility can be used to display the contents of both Portable Executable files and COFF (Common Object File Format) .OBJ files. (For users of Visual Basic® 5.0, the command line LINK /DUMP is functionally the same as DUMPBIN.) However, when it comes to libraries, there seems to be a real dearth of tools that can intelligently tell you the contents of a COFF format .LIB file. (All 32-bit Microsoft tools use COFF.)
Perhaps you need to know whether a function is imported by name or by ordinal. DUMPBIN isn't much help here. Sure, DUMPBIN has a few obscure command options for .LIB files (/ARCHIVEMEMBERS and /LINKERMEMBER, for example), but they just provide raw output of portions of the .LIB file. A few gurus can cast the runes of DUMPBIN's output to figure out what they're after. However, to really see what's in a .LIB file, you need either a good understanding of .LIB file structures or a tool that displays the .LIB contents in a meaningful manner. In this column, I'll provide some relief on both counts.

While mucking about inside .LIB files might appear forbidding, they're really not complicated. Essentially, a .LIB file is just a collection of COFF format .OBJ files strung together sequentially. A table of contents at the beginning tells the linker where things are. (Actually, there are two tables of contents, but this detail isn't important for the ensuing discussion.)

In my July 1997 column, I described the basic principles of how a linker works. The important factoid for this column is that a linker is responsible for resolving symbols between compilation units. For example, if MyFile1.CPP calls function FooBar in another source file, the linker has to locate the .OBJ file containing FooBar's binary code and include it in the finished executable.

From the linker's perspective, a .LIB file is just a collection of .OBJ files. The table of contents in a .LIB file is a list of all the symbols from all the .OBJs contained in the library. For each symbol, the table of contents also indicates which .OBJ file the symbol came from. This mapping of a symbol name to an .OBJ file allows the linker to quickly bring in just the .OBJs it needs from the .LIB file, while ignoring the rest of the library.

You might be thinking, "What about import libraries? Aren't they special?" Under the Win32 COFF format, the answer is no.
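The symbol-to-.OBJ lookup described above, which import libraries go through exactly like any other library, can be modeled in a few lines. This is a minimal sketch, not how any real linker is written: LibTableOfContents and MembersToLink are hypothetical names, and the table is assumed to already be loaded into a std::map from symbol name to archive member offset.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory model of a .LIB table of contents: each public
// symbol maps to the file offset of the archive member (.OBJ) defining it.
using LibTableOfContents = std::map<std::string, std::uint32_t>;

// Returns the offsets of the archive members needed to resolve the given
// undefined symbols. Symbols the library doesn't define are appended to
// `unresolved` so the caller can search another library.
std::set<std::uint32_t> MembersToLink(const LibTableOfContents& toc,
                                      const std::vector<std::string>& undefinedSymbols,
                                      std::vector<std::string>& unresolved) {
    std::set<std::uint32_t> members;  // a set, so each .OBJ is pulled in only once
    for (const std::string& sym : undefinedSymbols) {
        auto it = toc.find(sym);
        if (it != toc.end())
            members.insert(it->second);
        else
            unresolved.push_back(sym);
    }
    return members;
}
```

A real linker also has to repeat the search, because each .OBJ it pulls in can introduce new undefined symbols; this sketch does a single pass to keep the idea visible.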
The linker resolves calls to DLL functions the same way it resolves calls to internal (static) functions. The only real difference is that when you call a DLL function, the .OBJ file in the import library provides data for the executable's import table rather than code for the actual function.

The data that an import library provides for an imported API is kept in several sections whose names all begin with .idata (for instance, .idata$4, .idata$5, and .idata$6). The .idata$5 section contains a single DWORD that, once the executable is loaded, holds the address of the imported function. The .idata$6 section (if present) contains the name of the imported function. When loading the executable into memory, the Win32 loader effectively uses this string to call GetProcAddress on the imported function.

As I described in the July 1997 column, the linker lumps together sections that have the same name up to, but not including, the $. The portion after the $ is used to order the sections. Thus, all the .idata$4 sections are put in the executable contiguously, followed by all the .idata$5 sections, and finishing with all the .idata$6 sections. The linker's combining and sorting of sections is what builds the import address table (IAT) and other parts of the imports table in a finished executable. Not surprisingly, an executable's imports table is usually in a section named .idata.

If you've used OLE, COM, or ActiveX®, you probably remember that there are also .LIB files that are used for predefined class IDs (CLSIDs) and interface IDs (IIDs). Both CLSIDs and IIDs are forms of GUIDs, which are 16-byte unique values. If you poke around in one of these libraries (for instance, UUID.LIB), you'll see that the GUID values are stored in a section called .rdata. The linker takes all the referenced .rdata sections in the .LIB file and combines them into the executable's .rdata section.
Put differently, every GUID that you reference in your program reserves 16 bytes in the final executable.
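To make the $-grouping rule concrete, here is a sketch that takes section contributions from several .OBJs and produces the combined output sections. SectionContribution and GroupSections are hypothetical names, and the model assumes the suffix after the $ is compared as a plain string; real linkers also deal with alignment, padding, and section attributes, all of which this ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One section contributed by one .OBJ, e.g. ".idata$5" from "a.obj".
struct SectionContribution {
    std::string name;
    std::string fromObj;
};

// Groups contributions by the name up to (not including) the '$', and orders
// each group by the full name, so .idata$4 pieces precede .idata$5 pieces,
// which precede .idata$6 pieces.
std::map<std::string, std::vector<SectionContribution>>
GroupSections(std::vector<SectionContribution> contribs) {
    // stable_sort keeps contributions with identical names in input order.
    std::stable_sort(contribs.begin(), contribs.end(),
                     [](const SectionContribution& a, const SectionContribution& b) {
                         return a.name < b.name;
                     });
    std::map<std::string, std::vector<SectionContribution>> output;
    for (const SectionContribution& c : contribs) {
        // substr(0, npos) keeps the whole name when there is no '$'.
        std::string outName = c.name.substr(0, c.name.find('$'));
        output[outName].push_back(c);
    }
    return output;
}
```

Feeding it .idata$6, .idata$5, and .idata$4 pieces from two .OBJs yields a single .idata group ordered $4, $5, $6, which is the contiguous layout that becomes the imports table.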
The COFF .LIB File Structure
Inside LibDump
Converting the four bytes into a DWORD and accounting for the little Endian nature of Intel CPUs, you get a value of 0x80000010. Removing the high bit (which means the symbol is imported by ordinal only) gives an ordinal of 0x10, or 16 decimal. What a major pain! This is where the LibDump program jumps in and does all the hard work of interpreting .LIB files for you.
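The arithmetic just described can be written out in code. This is a minimal sketch, assuming the four bytes appear in file order as 10 00 00 80 (the little Endian layout of 0x80000010). The function names are hypothetical; the 0x80000000 mask is the value WINNT.H defines as IMAGE_ORDINAL_FLAG32.

```cpp
#include <cassert>
#include <cstdint>

// Assembles four bytes least significant first (little Endian). Building the
// value with shifts gives the same answer regardless of the host's byte order.
std::uint32_t ReadLittleEndianDword(const std::uint8_t* p) {
    return  std::uint32_t(p[0])
         | (std::uint32_t(p[1]) << 8)
         | (std::uint32_t(p[2]) << 16)
         | (std::uint32_t(p[3]) << 24);
}

// High bit set means "import by ordinal" (IMAGE_ORDINAL_FLAG32 in WINNT.H).
const std::uint32_t kOrdinalFlag = 0x80000000u;

bool IsImportByOrdinal(std::uint32_t thunk) { return (thunk & kOrdinalFlag) != 0; }

// For an ordinal import, the low WORD holds the ordinal value.
std::uint16_t OrdinalOf(std::uint32_t thunk) { return std::uint16_t(thunk & 0xFFFFu); }
```

With the bytes 10 00 00 80, ReadLittleEndianDword produces 0x80000010, IsImportByOrdinal reports true, and OrdinalOf returns 16.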
LibDump's mission statement is simple: for each symbol name in the .LIB file's first names member, LibDump tries to figure out what type of symbol it is and prints out all the relevant data. For example, LibDump determines if a symbol is imported by ordinal or by name. If it's by ordinal, LibDump shows you the ordinal value. If by name, LibDump shows the actual symbol name and the name that appears in the imports table (for instance, the symbol _CreateProcessA@40 is imported as CreateProcessA). If the symbol appears to be a GUID, LibDump displays the 16 bytes of GUID data as you'd expect to see it. If all else fails, LibDump just displays the symbol name. This would be the case for static functions and variables.

LibDump is a console-mode program that takes one command-line argument, the name of the .LIB file to work with. Function main in LibDump.CPP (see Figure 3) opens a memory-mapped file using the command-line argument as the file name. I've wrapped all memory-mapped file code in a C++ class called MEMORY_MAPPED_FILE, which is implemented in MemoryMappedFile.H and MemoryMappedFile.CPP (see Figure 4). I'm simply reusing the MEMORY_MAPPED_FILE class from a previous column, so I won't describe the class methods here.

Once function main has mapped the .LIB file into memory successfully, it verifies that the file begins with the expected string that starts an archive. Following that string is the first archive member, which (as I described earlier) contains the names of the symbols in the .LIB, along with the offset of the matching .OBJ file. The second half of function main simply locates and iterates through both the member offset array and the matching string table. For nearly every symbol, function main passes the name and archive member offset to the DisplayLibInfoForSymbol function. (I'll explain why a few symbols are excluded later.) The DisplayLibInfoForSymbol function is where all the action occurs for figuring out what type of symbol it is.
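Here's a rough, portable sketch of the walk through the first archive member that function main performs. It is not LibDump's code: ReadFirstLinkerMember and the constant names are hypothetical, and it assumes the layout given in Microsoft's PE/COFF specification: an 8-byte "!<arch>\n" signature, a 60-byte member header, a big Endian symbol count, an array of big Endian member offsets, and then the null-terminated symbol names in the same order. To stay short, it bounds reads by the whole buffer rather than by the size field in the member header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// WINNT.H calls these IMAGE_ARCHIVE_START, IMAGE_ARCHIVE_START_SIZE,
// and IMAGE_SIZEOF_ARCHIVE_MEMBER_HDR.
const char        kArchiveSignature[]   = "!<arch>\n";
const std::size_t kArchiveSignatureSize = 8;
const std::size_t kMemberHeaderSize     = 60;

// Big Endian DWORD: the most significant byte is at the lowest address.
std::uint32_t ReadBigEndianDword(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

using SymbolEntry = std::pair<std::string, std::uint32_t>;  // name, member offset

// Returns (symbol name, archive member offset) pairs from the first linker
// member, or an empty vector if the buffer doesn't look like an archive.
std::vector<SymbolEntry> ReadFirstLinkerMember(const std::uint8_t* data, std::size_t size) {
    std::vector<SymbolEntry> symbols;
    if (size < kArchiveSignatureSize + kMemberHeaderSize + 4 ||
        std::memcmp(data, kArchiveSignature, kArchiveSignatureSize) != 0)
        return symbols;

    // Skip the signature and the member header (name "/", date, size, ...).
    const std::uint8_t* p   = data + kArchiveSignatureSize + kMemberHeaderSize;
    const std::uint8_t* end = data + size;
    std::uint32_t count = ReadBigEndianDword(p);
    p += 4;
    if (static_cast<std::size_t>(end - p) / 4 < count)  // offset array must fit
        return symbols;

    const std::uint8_t* offsets = p;
    const char* name    = reinterpret_cast<const char*>(p + 4 * std::size_t(count));
    const char* nameEnd = reinterpret_cast<const char*>(end);
    for (std::uint32_t i = 0; i < count; ++i) {
        // Each name is null terminated; stop if the string table is truncated.
        const void* nul = std::memchr(name, '\0', static_cast<std::size_t>(nameEnd - name));
        if (nul == nullptr)
            break;
        const char* stop = static_cast<const char*>(nul);
        symbols.emplace_back(std::string(name, stop), ReadBigEndianDword(offsets + 4 * i));
        name = stop + 1;
    }
    return symbols;
}
```

Each resulting pair is the kind of input main hands to DisplayLibInfoForSymbol: a symbol name plus the file offset of the archive member that defines it.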
The DisplayLibInfoForSymbol code doesn't bother with the IMAGE_ARCHIVE_MEMBER_HEADER at all. Instead, it immediately skips to the IMAGE_FILE_HEADER that begins the .OBJ. For LibDump's purposes, the important thing in the IMAGE_FILE_HEADER is how many IMAGE_SECTION_HEADERs follow it. LibDump uses this number to loop through each entry in the array of IMAGE_SECTION_HEADERs.

As the code loops through the IMAGE_SECTION_HEADERs, it looks for sections with specific names. The presence or absence of a particular section is a good indication of what type of symbol LibDump is working with.

Any .OBJ that contains .idata$5 or .idata$6 sections is probably for an imported API. In my experience, you'll always see an .idata$5 section for an imported API. If the import is by ordinal, the DWORD value in the .idata$5 section has the high bit set, and the low WORD is the ordinal value. Otherwise, the .idata$5 DWORD is zero. If the symbol is imported by name, you'll also find an .idata$6 section in the .OBJ. The first WORD of this section is the "hint" ordinal, which is essentially useless these days. Immediately following the hint is the null-terminated ASCII string with the name of the API as it will appear in the executable's import table.

If the .OBJ file doesn't have any .idata$ sections but does have an .rdata section, then the symbol may or may not be a GUID symbol (for example, _IID_IDispatch). In the LibDump code, I took the cheesy approach of checking the size of the .rdata section. If it's exactly 0x10 bytes long, and if there's only one .rdata section in the .OBJ, LibDump assumes it's a GUID symbol and displays it as such. This trick for GUID symbol detection works great for some .LIBs, including the all-important UUID.LIB. However, many COM import libraries lump multiple GUIDs into a single .rdata section. LibDump isn't smart enough to catch these cases. Making it do so would require LibDump to read the COFF symbol table at the end of the .OBJ file.
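The section-name heuristics above fit in one small function. This is an illustration under stated assumptions, not DisplayLibInfoForSymbol itself: ObjSection, ClassifySymbol, and FormatGuid are hypothetical, and the section headers are assumed to have already been read into a list of names and raw data. The GUID formatting follows the standard layout: a little Endian DWORD, two little Endian WORDs, then eight bytes printed as stored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified view of one section in an archive member's .OBJ:
// the name from its IMAGE_SECTION_HEADER plus the section's raw data.
struct ObjSection {
    std::string name;
    std::vector<std::uint8_t> data;
};

// Little Endian DWORD, as stored in .OBJ section data on Intel.
std::uint32_t LeDword(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Formats 16 GUID bytes as {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.
std::string FormatGuid(const std::uint8_t* g) {
    char buf[39];  // 38 characters plus the terminating null
    std::snprintf(buf, sizeof(buf),
                  "{%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  static_cast<unsigned>(LeDword(g)),
                  static_cast<unsigned>(g[4] | (g[5] << 8)),
                  static_cast<unsigned>(g[6] | (g[7] << 8)),
                  unsigned(g[8]), unsigned(g[9]), unsigned(g[10]), unsigned(g[11]),
                  unsigned(g[12]), unsigned(g[13]), unsigned(g[14]), unsigned(g[15]));
    return buf;
}

// Applies the section-name heuristics and returns a short description.
std::string ClassifySymbol(const std::vector<ObjSection>& sections) {
    const ObjSection* idata5 = nullptr;
    const ObjSection* idata6 = nullptr;
    const ObjSection* rdata = nullptr;
    int rdataCount = 0;
    for (const ObjSection& s : sections) {
        if (s.name == ".idata$5") idata5 = &s;
        else if (s.name == ".idata$6") idata6 = &s;
        else if (s.name == ".rdata") { rdata = &s; ++rdataCount; }
    }

    if (idata5 != nullptr && idata5->data.size() >= 4) {
        std::uint32_t thunk = LeDword(idata5->data.data());
        if (thunk & 0x80000000u)  // high bit: import by ordinal
            return "ordinal " + std::to_string(thunk & 0xFFFFu);
        if (idata6 != nullptr && idata6->data.size() > 2) {
            // Skip the 2-byte hint; the API name follows, null terminated.
            const char* base  = reinterpret_cast<const char*>(idata6->data.data());
            const char* first = base + 2;
            const char* last  = base + idata6->data.size();
            return "name " + std::string(first, std::find(first, last, '\0'));
        }
        return "import (name not found)";
    }

    // No import data: a lone 16-byte .rdata section is assumed to be a GUID.
    if (rdataCount == 1 && rdata->data.size() == 16)
        return "GUID " + FormatGuid(rdata->data.data());
    return "other";
}
```

The GUID test carries the same weakness as LibDump's 0x10-byte check: an .rdata section holding several GUIDs falls through to "other", and catching those would mean walking the .OBJ's COFF symbol table.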
While it could be done, it would add quite a bit to the LibDump code and detract from the "small is beautiful" approach I took.

If you've noticed the ConvertBigEndian function, good catch! It turns out that in COFF format .LIBs, certain fields (such as the symbol count and member offsets in the first archive member) are stored in big Endian format. Intel CPUs use the opposite format, known as little Endian. In a big Endian number, the most significant bytes are at lower addresses. In my conversion function, I could have used the Intel BSWAP instruction to convert from big Endian to little Endian format, but then the code wouldn't run on DEC Alpha-based systems.

The IsRegularLibSymbol function is just a convenient way to filter the output so that certain symbols don't appear in it. The brief synopsis is that the .LIB file contains some symbols that are used by the linker but have no connection to the user's code. For example, the symbol __IMPORT_DESCRIPTOR_COMCTL32 appears in COMCTL32.LIB. Rather than cluttering up the LibDump output with these symbols, the IsRegularLibSymbol function looks for certain patterns in the symbol names and returns FALSE if it looks like the symbol didn't originate from user code.
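Both helpers mentioned above can be sketched portably. SwapDwordBytes and LooksLikeUserSymbol are hypothetical stand-ins for ConvertBigEndian and IsRegularLibSymbol. The swap uses shifts and masks, so it compiles for any CPU and, on little Endian hosts such as Intel and Alpha, turns a big Endian DWORD read from the file into its native value. The filter patterns are assumptions: __IMPORT_DESCRIPTOR_ comes from the text, while __NULL_IMPORT_DESCRIPTOR and names that begin with the 0x7F byte (the NULL_THUNK_DATA symbols) are other linker-only symbols commonly found in Microsoft import libraries. LibDump's actual list may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Reverses the byte order of a DWORD with shifts and masks instead of the
// x86-only BSWAP instruction. Assumes the value was loaded in the host's
// (little Endian) order.
std::uint32_t SwapDwordBytes(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Hypothetical filter in the spirit of IsRegularLibSymbol: returns false for
// symbols that only the linker cares about. The patterns are assumptions.
bool LooksLikeUserSymbol(const std::string& name) {
    if (name.compare(0, 20, "__IMPORT_DESCRIPTOR_") == 0) return false;
    if (name == "__NULL_IMPORT_DESCRIPTOR") return false;
    // e.g. "\x7f" "COMCTL32_NULL_THUNK_DATA"
    if (!name.empty() && name[0] == '\x7f') return false;
    return true;
}
```

Because the swap is its own inverse, the same routine also converts a native value back to big Endian order.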
Running LibDump
Have a question about programming in Windows? Send it to Matt at mpietrek@tiac.com.

From the April 1998 issue of Microsoft Systems Journal.