March 1999
Improve Your Debugging by Generating Symbols from COM Type Libraries
Having worked extensively with both type libraries and debug information, I have discovered that the information they contain is very similar. However, typelibs and debug information are completely different in their formats and completely different in how you access the information.
This article assumes you're familiar with COM, Visual Basic, and C++.
Code for this article: TLBDBG.exe (40KB)
Matt Pietrek does advanced research for the NuMega Labs of Compuware Corporation and is the author of several books. His Web site has a FAQ page and information on previous columns and articles.
One advantage to writing MSJ's "Under the Hood" column is that your subject matter isn't limited to one specific area. Even so, it's not often that I get to cover something that encroaches equally on Don Box's COM column and John Robbins' Bugslayer column. Let's face it. The typical resident of the House of COM probably isn't writing grungy low-level system code, and vice versa. So, what if you could somehow use COM type libraries (typelibs) to generate debug information for your debugger? After all, typelibs contain essentially the same information as debug information generated by the compiler and linker. The only thing typelibs lack is the addresses of the methods they describe. In this article, I'll show you how to remedy this situation. In the end, you'll have a program that can run against many COM-based components (typically third-party ActiveX® controls) to get a symbol file for your debugger. Pretty cool, eh?
If you've worked with ActiveX controls in environments like Visual Basic®, you know that using them is very similar to using the built-in intrinsic controls like CommandButtons; that is, ActiveX controls have methods, events, properties, enum values, and so on. How does an environment such as Visual Basic know about all of these control-specific things? The answer is type libraries. Whenever you add a new OCX or reference to your project (for example, the ActiveX Data Objects), you're really telling the environment to read a type library and incorporate its contents into the project.
Now switch gears for a moment and consider a typical debugger, such as the Visual Studio® IDE debugger. Out of the box, it has no knowledge of your program's functions, methods, variables, structures, and enums. Where does it get this information? All of these details come from the debug information generated for your executable by the compiler and linker. Step back and consider these last two points.
Both type libraries and debug information convey information such as function names, the number of parameters, and the type of each parameter. Likewise, there's the need in both cases to describe user-defined types (structs in C++). Having worked extensively with both type libraries and debug information, I've discovered that the information they contain is very similar. However, typelibs and debug information are completely different in their formats and how you access the information. Although type libraries and debug information seem to live in completely separate universes, their similarities can be exploited to your advantage. With a little bit of grungy code, I'll show how to do this. First, though, it's important to understand typelibs beyond just knowing when to feed them to your programming environment to keep it healthy, shiny, and happy. Check out this month's Under the Hood column for continued coverage of COM type library code.
What's Your Type?
If you've used the Object Browser in Visual Basic, you've seen class objects, methods, events, and properties, as well as enum constants. The Visual Basic Object Browser is really just a fancy type library viewer with a Visual Basic language orientation. If you haven't noodled around with the Visual Basic Object Browser, I highly recommend it. You don't need a specific goal in mind; just explore a component and see how it interacts with the host language (in this case, Visual Basic). To view typelibs in a more natural habitat, use OLEVIEW from the Platform SDK, selecting View TypeLib from the File menu. Figure 1 shows a typical typelib display in OLEVIEW. The OLEVIEW user interface isn't as nice as the Visual Basic Object Browser, but its display of typelibs is much closer to their underlying representation.
Figure 1 TypeLib Display in OLEVIEW
Where do typelibs come from? Anybody who's endured the agony of using Interface Definition Language (IDL) probably knows the answer. The Microsoft® IDL (MIDL) compiler provides all sorts of facilities to completely describe classes, interfaces, methods, constants, and so on. The input IDL has to be precise enough for the Remote Procedure Call (RPC) and DCOM code in Windows® to marshal method calls and their data across process and machine boundaries. For a good overview of IDL, see the Platform SDK documentation, as well as Bill Hludzinski's article, "Understanding Interface Definition Language: A Developer's Survival Guide," in the August 1998 issue of MSJ.
An MIDL-generated typelib is really just a binary form of the IDL. In fact, you can work backwards from a typelib to get IDL. As the right-hand side of Figure 1 shows, OLEVIEW recreates the IDL (minus things like comments) for any typelib it displays.
In your everyday work it's possible to avoid using typelibs explicitly, but they're definitely around on your system. Files with a .TLB or .OLB (Object Library) extension are probably typelibs and can be viewed with OLEVIEW. Typelibs can also be embedded into executables. Prime examples are the OCXs that seem to be taking over your hard drive. When embedded in an executable, the typelib is stored as a resource whose type is named TYPELIB.
What's really funkadelic about typelibs is that there's a relatively simple mechanism to grovel around in them at any level of detail. Sure, you can examine them with the Object Browser or OLEVIEW. However, real programmers know that it's only fun if you can do the same sort of thing with your own code. As you might expect, the mechanism to access typelibs is a group of COM interfaces. The binary format of typelib files isn't documented, but in my experience the COM interfaces have always been sufficient to extract all the information they contain.
For typelib work, the two COM interfaces to stamp on your forehead are ITypeInfo and ITypeLib. There are various articles out there describing why you might need a type library or what to do with it, but there isn't much written about ITypeInfo or ITypeLib other than the interface references in the Platform SDK. Languages such as Visual Basic use these interfaces to grok the nature of external COM objects. I'm going to use these same interfaces for a different purpose: to create symbol tables for use in a debugger.
ITypeLib, ITypeInfo, and Friends
As a first-time user of the typelib interfaces, where do you start? Take a look at the LoadTypeLib API exported by OLEAUT32.DLL. It takes the name of a typelib file (or a file containing a typelib resource) and returns an ITypeLib interface instance. The ITypeLib interface has various bits of information that can be obtained via its methods. More importantly, the ITypeLib interface is a container of sorts for ITypeInfo interface instances (see Figure 2). As you'll see in more detail later, an ITypeInfo is how you get at the names of interface methods, parameters, enums, structs, and so on.
Figure 2 ITypeInfo Instances
Each typelib (and its associated ITypeLib interface instance) has a unique GUID assigned to it. This GUID is typically found in the registry under the HKEY_CLASSES_ROOT\TypeLib key. The ITypeLib::GetLibAttr method allocates and returns a pointer to a TLIBATTR structure that contains the typelib's GUID and other assorted information. When you're done with the TLIBATTR, you'll need to call ITypeLib::ReleaseTLibAttr to free its memory.
The ITypeLib::GetDocumentation method retrieves things like the typelib name, a description string, a help file lookup string, and the name of the typelib's associated help file. These are the very same strings that OLEVIEW uses when it generates IDL from a typelib. Passing -1 as the first parameter to GetDocumentation gets information specific to the typelib rather than for one of the ITypeInfo instances that it encapsulates. In true COM fashion, the strings that GetDocumentation returns are allocated internally, and you're responsible for freeing them by calling SysFreeString. This is a good time to point out that all strings used with typelibs are Unicode.
There's a variety of other ITypeLib methods that you can explore on your own. However, let's look at the single most important ITypeLib method, GetTypeInfo. Each ITypeLib acts as a dispenser for ITypeInfo interface instances. To get an ITypeInfo instance, simply call GetTypeInfo, passing in the index of the desired ITypeInfo. What's this index thing? Each ITypeInfo instance in an ITypeLib has an associated index value. The index values are 0 through n-1, where n is the number of ITypeInfos. The GetTypeInfoCount method returns the number of ITypeInfos in the typelib. As you'll see shortly, calling GetTypeInfo repeatedly with a monotonically increasing ITypeInfo index is all that's necessary to pick apart a typelib.
After getting your hands on an ITypeInfo instance, the real fun begins. Each ITypeInfo represents something such as an interface (TKIND_INTERFACE), a structure (TKIND_RECORD), or a creatable object (TKIND_COCLASS). The complete list of items an ITypeInfo can represent is given by the TKIND_XXX values of the TYPEKIND enum in OAIDL.H.
Like ITypeLib, the ITypeInfo interface has a GetDocumentation method that returns the name of the interface, a description string, a help file lookup string, and the name of the associated help file. Passing -1 as the memberid parameter returns information for the interface, structure, enum, or whatever the ITypeInfo represents. Passing other values for the memberid parameter retrieves the corresponding strings for the desired interface's methods, an enum's named constants, and so on. If you've ever wondered how Visual Basic can call up context-sensitive online help for third-party controls, here's your answer.
Much more information about an ITypeInfo is obtained via the GetTypeAttr method, which returns a pointer to a TYPEATTR structure containing all sorts of goodies concerning the ITypeInfo. Included in this structure are a GUID and a TYPEKIND (TKIND_XXX) indicating what the ITypeInfo describes. If the ITypeInfo is of type TKIND_INTERFACE or TKIND_DISPATCH, the GUID is the IID for the interface. This is the same IID you'll see under the registry's HKEY_CLASSES_ROOT\Interface key. Likewise, for TKIND_COCLASS ITypeInfos, the GUID is the CLSID. Most CLSIDs are found in the registry under the HKEY_CLASSES_ROOT\CLSID key. Note that TKIND_COCLASS ITypeInfos are essentially what you think of as creatable objects (for instance, an MSGraph.Chart object).
Some TYPEATTR fields are meaningful only for specific TYPEKIND values. For example, in a TKIND_DISPATCH the TYPEATTR.cFuncs element indicates how many methods are associated with this interface, while a nonzero cImplTypes element indicates that the interface is derived from another interface. For a TKIND_COCLASS ITypeInfo, however, the cImplTypes field indicates how many interfaces the creatable object exposes directly. Typically, a TKIND_COCLASS exposes two interfaces. The first is the incoming interface that you think of as the object's properties and methods. The second is the outgoing interface that you associate with the object firing events. (Note to COM geeks: think IConnectionPoint.) For example, the Graph1_DblClick event handler subroutine you'd write in Visual Basic is called through the outgoing interface.
Some other notable fields in the TYPEATTR are cVars and cbSizeInstance. For a TKIND_ENUM, cVars indicates the number of enum values. In a TKIND_RECORD, the cVars field indicates how many structure elements are present, and cbSizeInstance tells you the size of the structure.
Just the Facts, Please
While I could go on for quite a while about the ins and outs of ITypeInfos and TYPEATTRs, let's narrow the focus to what I'll use in creating symbol tables from typelibs. In particular, I need the names of the interface methods exposed by a TKIND_COCLASS object. By matching these names with their executable file addresses, I have the essential information to create a symbol table.
How do you obtain the method names for interfaces described by a typelib? Interfaces are described by TKIND_INTERFACE or TKIND_DISPATCH ITypeInfos. To get information for a particular method, use ITypeInfo::GetFuncDesc, passing the index of the desired method. The index values range from 0 to cFuncs-1, where cFuncs is the value from the TYPEATTR structure. The GetFuncDesc method returns a pointer to a FUNCDESC structure. (If there's one thing typelibs aren't lacking, it's structures with oddly truncated names.) When you're done with the FUNCDESC, don't forget to release its memory by calling ITypeInfo::ReleaseFuncDesc.
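Because every "get" in this family (GetTypeAttr, GetFuncDesc, GetVarDesc, and the BSTRs from GetDocumentation) has a matching release, C++ code can lean on a small scope guard so an early return never leaks. The following is a minimal portable sketch; ScopedRelease is a hypothetical helper, not part of OLEAUT32, and the test substitutes a counting lambda for the real release call.

```cpp
#include <cassert>
#include <utility>

// Hypothetical RAII helper: holds a pointer returned by a typelib "Get"
// call and invokes the matching "Release" call when it goes out of scope.
template <typename T, typename Release>
class ScopedRelease {
public:
    ScopedRelease(T* p, Release release) : p_(p), release_(std::move(release)) {}
    ~ScopedRelease() {
        if (p_) release_(p_);  // e.g. pTypeInfo->ReleaseFuncDesc(p_)
    }
    // Copying would release the same pointer twice, so forbid it.
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

    T* get() const { return p_; }
    T* operator->() const { return p_; }

private:
    T* p_;
    Release release_;
};
```

On Win32 you might write something like `auto rel = [pTypeInfo](FUNCDESC* p) { pTypeInfo->ReleaseFuncDesc(p); };` followed by `ScopedRelease<FUNCDESC, decltype(rel)> fd(pFuncDesc, rel);` right after a successful GetFuncDesc call.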
A FUNCDESC is the root node of a whole bunch of information that completely describes a COM method. Although it's called a FUNCDESC, in most typelibs a FUNCDESC describes COM methods rather than normal C-callable functions. However, in the relatively few cases where you see an ITypeInfo of type TKIND_MODULE, the FUNCDESC describes a regular function.
Figure 3 shows the elements in a FUNCDESC. The first element is a MEMBERID, which, for the purpose of my description, can be thought of as an Automation dispatch ID (DISPID). The memid value can be passed to ITypeInfo::GetNames to get the name for this method, and optionally the names of the method's parameters (if supplied in the original IDL file).
The lprgelemdescParam element in a FUNCDESC points at an array of ELEMDESC structures. Each ELEMDESC represents one of the method's parameters. The vital component of an ELEMDESC is its TYPEDESC value. To make a long story short, a TYPEDESC corresponds to a language data type. For example, there are TYPEDESCs for two-byte integers, four-byte integers, BSTRs, pointers to BSTRs, user-defined structures, and just about anything else you can imagine. The fundamental types a TYPEDESC can describe are the VT_XXX values from WTYPES.H.
If you've worked with type information in symbol tables, you'll notice that TYPEDESCs are quite similar. In particular, there are provisions for extending the primitive types with arrays, pointers, and user-defined types. I won't delve further into this area since it's off my main topic. Nonetheless, it can be quite enlightening to write code that parses TYPEDESCs. Hint: think recursion.
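To make the recursion hint concrete, here's a sketch of a TYPEDESC-to-text formatter. To stay portable it uses a simplified mirror of TYPEDESC (SimpleTypeDesc, a hypothetical type) rather than the real OAIDL.H union, and it assumes VT_USERDEFINED names were already resolved (with the real API you'd pass the hreftype to ITypeInfo::GetRefTypeInfo and ask the result for its name). The VT_XXX values match WTYPES.H, but only a handful are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// A few VT_XXX values, copied from WTYPES.H.
enum : std::uint16_t {
    VT_I2 = 2, VT_I4 = 3, VT_BSTR = 8, VT_DISPATCH = 9, VT_BOOL = 11,
    VT_VARIANT = 12, VT_UNKNOWN = 13, VT_VOID = 24, VT_HRESULT = 25,
    VT_PTR = 26, VT_SAFEARRAY = 27, VT_CARRAY = 28, VT_USERDEFINED = 29
};

// Hypothetical, simplified stand-in for TYPEDESC. The real structure holds
// lptdesc (VT_PTR, VT_SAFEARRAY), lpadesc (VT_CARRAY), or hreftype
// (VT_USERDEFINED) in a union alongside vt.
struct SimpleTypeDesc {
    std::uint16_t vt;
    std::shared_ptr<SimpleTypeDesc> inner;  // must be set for PTR/SAFEARRAY/CARRAY
    std::vector<unsigned> dims;             // element counts for VT_CARRAY
    std::string udtName;                    // pre-resolved VT_USERDEFINED name
};

// Recursively render a type description as C-like text.
std::string TypeDescToString(const SimpleTypeDesc& td) {
    switch (td.vt) {
    case VT_I2:          return "short";
    case VT_I4:          return "long";
    case VT_BSTR:        return "BSTR";
    case VT_DISPATCH:    return "IDispatch*";
    case VT_BOOL:        return "VARIANT_BOOL";
    case VT_VARIANT:     return "VARIANT";
    case VT_UNKNOWN:     return "IUnknown*";
    case VT_VOID:        return "void";
    case VT_HRESULT:     return "HRESULT";
    case VT_USERDEFINED: return td.udtName;
    case VT_PTR:         return TypeDescToString(*td.inner) + "*";
    case VT_SAFEARRAY:   return "SAFEARRAY(" + TypeDescToString(*td.inner) + ")";
    case VT_CARRAY: {
        std::string s = TypeDescToString(*td.inner);
        for (unsigned n : td.dims) s += "[" + std::to_string(n) + "]";
        return s;
    }
    default:             return "VT_" + std::to_string(td.vt);
    }
}
```

The recursion bottoms out at the primitive VT_XXX cases; pointers, safe arrays, and C arrays each wrap the text of their inner type.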
Moving along in the FUNCDESC structure, you come to the INVOKEKIND element that indicates if this is a regular method call, a property put, or a property get. Experienced COM Automation programmers know that properties are really just special types of method calls with an implied effect (that is, setting or retrieving a property value). The cParams element specifies how many parameters are in the method. Any optional parameters are included in this count. The number of optional parameters is given by the cParamsOpt element. Using the cParams and lprgelemdescParam elements, you can enumerate through the ELEMDESC array to find out everything about each of the method's parameters. Again, this is off my main theme, so I'll leave it to the ambitious reader for further exploration.
The oVft element of a FUNCDESC contains the vtable offset of the method's pointer. If vtable offsets don't make sense, it's time to hit the remedial COM books. The oVft value is critical to the symbol table generation code that I'll get to later.
The elemdescFunc field indicates the return type of the method. This ELEMDESC is treated just like the ELEMDESCs that describe the parameters. The final FUNCDESC element, wFuncFlags, is a set of flags indicating primarily if and how the method should be exposed. For example, FUNCFLAG_FRESTRICTED means that this method shouldn't be exposed by macro languages like VBScript. FUNCFLAG_FHIDDEN means that this method shouldn't be shown in programs like the Visual Basic Object Browser. (Of course, if you write your own typelib code, you're free to ignore these flags.)
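A dump program has to render invkind and wFuncFlags readably; that's what lets you tell a property get from a property put that share the same name. Here's a small sketch. The constant values are those in OAIDL.H, but InvokeKindToString and DescribeMethod are hypothetical helpers, and only two of the FUNCFLAGS bits are decoded.

```cpp
#include <cassert>
#include <string>

// Values from the INVOKEKIND and FUNCFLAGS enums in OAIDL.H.
enum InvokeKindValue {
    INVOKE_FUNC = 1, INVOKE_PROPERTYGET = 2,
    INVOKE_PROPERTYPUT = 4, INVOKE_PROPERTYPUTREF = 8
};
const unsigned short FUNCFLAG_FRESTRICTED = 0x1;
const unsigned short FUNCFLAG_FHIDDEN = 0x40;

// Map a FUNCDESC.invkind value to its enum name.
std::string InvokeKindToString(int invkind) {
    switch (invkind) {
    case INVOKE_FUNC:           return "INVOKE_FUNC";
    case INVOKE_PROPERTYGET:    return "INVOKE_PROPERTYGET";
    case INVOKE_PROPERTYPUT:    return "INVOKE_PROPERTYPUT";
    case INVOKE_PROPERTYPUTREF: return "INVOKE_PROPERTYPUTREF";
    default:                    return "INVOKE_UNKNOWN";
    }
}

// One-line description: method name, invoke kind, and selected flags.
std::string DescribeMethod(const std::string& name, int invkind,
                           unsigned short wFuncFlags) {
    std::string s = name + " (" + InvokeKindToString(invkind) + ")";
    if (wFuncFlags & FUNCFLAG_FRESTRICTED) s += " [restricted]";
    if (wFuncFlags & FUNCFLAG_FHIDDEN) s += " [hidden]";
    return s;
}
```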
The COMTypeLibDump Program
Before plunging ahead to the symbol table generation, let's take a look at a basic program that uses the interfaces I've described. The code isn't anywhere near as fancy as OLEVIEW or the Visual Basic Object Browser. However, I think you'll be quite surprised by just how much of a typelib can be cracked open with a relatively tiny amount of code.
Figure 4 contains the code for COMTypeLibDump, a console program that does a rudimentary display of type library information in a file. The file to display is passed as the command-line argument and can be a .TLB file, an .OLB file, or any DLL containing a type library. The first thing to note about the COMTypeLibDump code is that it's a Unicode-enabled program. There's no escaping Unicode when you work with the type library interfaces.
Other than necessary housekeeping (such as calling CoInitialize), the important thing that function _tmain does is pass the file name to DisplayTypeLib. DisplayTypeLib calls the LoadTypeLib API, which returns an ITypeLib instance if a valid typelib file name was specified. After passing the ITypeLib instance to the EnumTypeLib function, the code releases the ITypeLib instance.
Inside EnumTypeLib is where things get interesting. The function first calls ITypeLib::GetTypeInfoCount to determine how many ITypeInfo instances can be obtained from the typelib. Next, EnumTypeLib iterates through each instance with a for loop. For each instance (starting with index 0), the code calls ITypeLib::GetTypeInfo, which returns an ITypeInfo instance. After each ITypeInfo is passed to DisplayTypeInfo, the loop code releases the ITypeInfo instance. Each ITypeInfo created here represents an interface, an enum, a CoClass, or one of the other TKIND_XXX types.
The DisplayTypeInfo function begins by calling the GetDocumentation method to retrieve the name of the interface, enum, CoClass, or whatever. Next, the code uses GetTypeAttr to retrieve the TYPEATTR for the ITypeInfo. This TYPEATTR is passed to EnumTypeInfoMembers, which enumerates through each method and variable. After the call, the code releases the TYPEATTR and the BSTR allocated by GetDocumentation.
Finally, the EnumTypeInfoMembers function uses the cFuncs and cVars elements of the TYPEATTR to enumerate through all the methods and variables in the ITypeInfo. For each method, GetFuncDesc retrieves a FUNCDESC structure. The equivalent for variables, GetVarDesc, retrieves a VARDESC structure. In either case, the GetDocumentation method obtains the name of the method or variable.
Figure 5 shows an abbreviated version of the results from running COMTypeLibDump on the Visual Basic runtime DLL (MSVBVM60.DLL). Notice that the TKIND_ENUM ITypeInfos describe Visual Basic constants (for instance, vbNull). Visual Basic runtime library functions such as ChDir are in a TKIND_MODULE, in this case named FileSystem. Finally, note the interface definition for _ErrObject (a TKIND_INTERFACE), and the corresponding CoClass, ErrObject (a TKIND_COCLASS). Within the _ErrObject interface, observe that certain methods such as Source appear twice. Upon closer examination, you'll see that one of these is an INVOKE_PROPERTYGET, while the other is an INVOKE_PROPERTYPUT. This is proof that properties are implemented as special-purpose method calls.
All in all, the COMTypeLibDump program doesn't hold a candle to more sophisticated viewers such as OLEVIEW. For example, it doesn't display method parameters or the types of variables. On the other hand, COMTypeLibDump shows the basics of accessing typelib information in a small amount of code. More importantly, this code acts as a starting point for the ultimate task at hand: matching up method names with addresses to create a symbol table.
Getting Dirty
Now that you can extract method names from a type library, the next hurdle is to find the method addresses. Unlike a symbol table, a typelib contains no addresses, so it's of no direct help. However, the oVft field in a FUNCDESC provides a big hint of what you can do instead. The oVft field contains the offset into a vtable where you can find the method's address.
I won't give a full account of vtables and COM interfaces since it's covered in every COM fundamentals text. The important thing here is that a vtable contains addresses of interface methods. To be strictly accurate, the preceding statement is true only for COM interfaces within the same apartment. For out-of-process servers, the vtable points to proxies that marshal the data across apartment, process, or machine boundaries. However, OCXs and most other executables for which you want to create symbol tables will run in-process.
The $1.28 question now is, "How do I find a vtable for a given interface?" Your knowledge of COM basics comes to the rescue here. By definition, the first thing an interface pointer points to is a pointer to the vtable for the interface. Thus, if you have an interface instance pointer, you can treat the first pointer-sized slot (a DWORD in Win32®) as a pointer to the vtable.
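Here's that double indirection in code. This sketch models an interface pointer portably as a struct whose first member is the vtable pointer, which is the layout a COM interface pointer has; FakeInterface and MethodFromVtable are hypothetical names. Because oVft is a byte offset, it must be divided by the pointer size to index the vtable. On Win32, for example, a dual interface's first custom method follows the three IUnknown and four IDispatch methods, so its oVft is 28 (0x1C), which selects slot 7.

```cpp
#include <cassert>
#include <cstddef>

using MethodPtr = void (*)();

// Hypothetical model of an interface instance: the first pointer-sized
// slot of the object is the pointer to its vtable.
struct FakeInterface {
    const MethodPtr* lpVtbl;
};

// Given an interface pointer and a FUNCDESC-style oVft (a byte offset
// into the vtable), return the method's address.
MethodPtr MethodFromVtable(const void* pInterface, std::size_t oVft) {
    // First indirection: read the vtable pointer stored at the object's start.
    const MethodPtr* vtable = *static_cast<const MethodPtr* const*>(pInterface);
    // Second indirection: convert the byte offset to a slot index and read it.
    return vtable[oVft / sizeof(MethodPtr)];
}
```

With a real interface from CoCreateInstance, you would pass the returned pointer in place of the FakeInterface address.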
Rephrasing the problem, how can you get an interface instance from which you can then extract a vtable pointer? As fate would have it, the CoCreateInstance API creates interface instances for you. Just tell CoCreateInstance the CLSID of the object and the IID of the interface you'd like created. As I've shown, type libraries are chock full of CLSIDs and IIDs. Recall that TKIND_COCLASSs are thought of as creatable objects. This is where all that type library grunginess becomes useful.
At a high level, the heart of my typelib-to-symbol-table program is pretty simple: just skim through a type library looking for TKIND_COCLASSs. For each TKIND_COCLASS, use the CLSID and IIDs as parameters to CoCreateInstance. The result is an interface instance. Next, enumerate through the interface methods to get their vtable offsets. The value at the appropriate offset in the vtable is a pointer to the method in memory. I'll describe the code that does this, but there's a loose end to deal with first.
All addresses in a vtable are virtual addresses. That is, they're actual addresses at which the CPU executes code. In a symbol table, you need a logical address. Logical addresses are values relative to where the containing executable loaded. For example, say a DLL loads at virtual address 0x10000000. Within the loaded DLL is a method, Foo::Bar, at virtual address 0x10002034. The logical address would be 0x2034. The importance of logical addresses is that they don't change, regardless of where the operating system loads the DLL.
There are actually two forms of logical address. What I just described is known as a Relative Virtual Address (RVA), and is used in COFF-format symbol tables. The second form of a logical address uses two components: the executable file section number containing the address and the offset within the section.
Returning to the previous DLL example, suppose its first section is a code section that begins 0x1000 bytes past the load address (that is, at RVA 0x1000) and contains Foo::Bar at RVA 0x2034. The section:offset form of that logical address is 1:0x00001034; that is, section 1, offset 0x00001034 (0x2034 - 0x1000). The section:offset form of logical addressing is used in .MAP files and CodeView® debug information. This digression into logical addresses is necessary because you can't just slap virtual addresses from a vtable into the generated symbol table.
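The two conversions can be written as a pair of small functions. This is a sketch under simplifying assumptions: SectionInfo is a hypothetical stand-in for the RVA and size you'd read from each IMAGE_SECTION_HEADER in the executable, and sections are numbered from 1 as in .MAP files. The test reuses the Foo::Bar numbers from the text with made-up section sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of one section-table entry: where the section
// starts (as an RVA) and how many bytes it spans in memory.
struct SectionInfo {
    std::uint32_t rva;
    std::uint32_t size;
};

struct LogicalAddress {
    unsigned section;      // 1-based section number
    std::uint32_t offset;  // offset from the start of that section
};

// Virtual address -> RVA: subtract the address the module loaded at.
std::uint32_t VirtualToRva(std::uintptr_t va, std::uintptr_t loadBase) {
    return static_cast<std::uint32_t>(va - loadBase);
}

// RVA -> section:offset. Returns false if no section contains the RVA
// (for example, an address inside the PE header).
bool RvaToSectionOffset(std::uint32_t rva, const std::vector<SectionInfo>& sections,
                        LogicalAddress& out) {
    for (std::size_t i = 0; i < sections.size(); ++i) {
        const SectionInfo& s = sections[i];
        if (rva >= s.rva && rva - s.rva < s.size) {
            out.section = static_cast<unsigned>(i + 1);
            out.offset = rva - s.rva;
            return true;
        }
    }
    return false;
}
```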
What Shall You Write?
You've come a long way, and now there's a tough choice necessary before you can proceed: what symbol table format do you want? The easiest thing to do is to emit a .MAP file, which has a simple, well-defined format. An added bonus is that .MAP files are human-readable text files. The downside to .MAP files is that no debugger that I'm aware of reads them directly. Thus, you'd need some way to translate the .MAP file into something usable by a debugger such as WinDBG or the Visual Studio IDE debugger.
Recent excavations by archaeologists have shown that in ancient times primitive programmers ran a program called MAPSYM that converted a .MAP file into a .SYM file. Alas, there's not much support for .SYM files in most 32-bit programming tools. I dug around in an old SDK and found some IMAGEHLP.DLL source code that ostensibly converted .SYM files into CodeView-format information. This code wasn't salvageable for my purposes since it assumed 16-bit addresses, whereas .MAP files can contain 32-bit addresses.
Knowing that I'd need to go beyond simple .MAP files, I started looking closely for the simplest symbol table format supported by Microsoft debuggers. At first I thought that the COFF symbol format would be it. After all, COFF symbols are somewhat documented in WINNT.H and have a reasonably simple format. However, an obscure Knowledge Base article, as well as the IMAGEHLP sources, showed me that the Visual Studio debugger and WinDBG don't work with COFF symbols directly.
For my symbol-table-generation program, I didn't want to generate symbols that would need to be converted to another format. I didn't see any good solution short of generating CodeView information. The CodeView format is an industry-standard debug format and is partially documented in MSDN. In fact, the .PDB (Program Database) files that are used as symbol tables by Microsoft compilers are somewhat based upon CodeView data structures.
If you want to know more about .PDB files, let me tell you what I know. The .PDB file format is not publicly documented and changes from version to version of the Visual Studio tools. The APIs that Microsoft uses to read and write .PDB files are private, and I can't answer questions about them. The best suggestion I can offer is to use the most recent version of IMAGEHLP.DLL since it knows how to use the private .PDB file APIs.
Knowing that I'd need to produce CodeView format symbols, the next question was where to put the symbol information. I could either attach the symbols to the appropriate executable or put them in a separate .DBG file. Since I'm generally leery of altering a working executable, using a .DBG file seemed the better choice, but it entails more work. It's necessary to create the trappings of a .DBG file that encapsulate the CodeView symbols in addition to creating the actual CodeView symbols.
Looking at the code required to create both CodeView symbols and the surrounding .DBG file framework, I decided that there was simply too much material for a single article. Thus, I'm going to describe the CodeView symbol table and .DBG file generation in this month's Under the Hood column. However, the code to create a .MAP file is much simpler, so I included it as part of this article's code.
The CoClassSyms Program
The result of everything I've described so far is a program called CoClassSyms. "CoClass" refers to the TKIND_COCLASS entries in the typelib from which the symbols are created. CoClassSyms is a command-line program that operates on executable files containing a type library. This can be an .OCX or some other DLL such as MSHTML.DLL (which is a core component of Microsoft Internet Explorer).
The output from CoClassSyms is either a .MAP or .DBG file. The code included with this article only supports .MAP file generation. However, if you drop in the DLL from this month's Under the Hood column, CoClassSyms generates a .DBG file instead. In either case, the output file has the same root file name as the input executable. Thus, running CoClassSyms on MSHTML.DLL creates MSHTML.MAP or MSHTML.DBG.
Regardless of whether you make a .MAP or .DBG file, you'll no doubt want to get the debugger to recognize and load the symbol information. If you generate a .DBG file, make sure it is in the same directory as the associated executable. In my experience, the Visual Studio 6.0 debugger automatically loads the .DBG file as needed. Using WinDBG, I had to explicitly load the .DBG file in the command window. I wasn't able to get Visual Studio 5.0 to load the .DBG file, and I couldn't determine the cause of the problem.
If everything goes well and the debugger loads your generated .DBG file, you should be able to set breakpoints by name on the methods. (Hint: you may want to first generate a .MAP file to get an idea of the available method names.) Of course, since you likely don't have source code for the executable, you'll be in the assembly language view when the breakpoints hit. You should also see method names in the call stack.
The CoClassSymsCallout API
To allow for both .MAP file and .DBG file generation, the main executable, CoClassSyms.EXE, doesn't write the output files. Instead, I defined a set of three APIs that the CoClassSyms.EXE code calls at known points. By implementing and exporting these APIs from DLLs, I achieved modularity of code and allowed enterprising readers to add support for other symbol tables with a minimum amount of work.
The three APIs implemented by both the .MAP and .DBG file DLLs are defined in CoClassSymsCallouts.H (see Figure 6). The first API, CoClassSymsBeginSymbolCallouts, is called once near the beginning of the symbol table generation. Its single argument is the name of the executable. Both .MAP files and CodeView information need to include information such as the location and size of the code and data sections in the executable. The implementation of CoClassSymsBeginSymbolCallouts can use the executable name to open the executable and read its header to get the information you want.
The second API, CoClassSymsAddSymbol, is invoked for each symbol that is matched up with a logical address. For this program, a symbol is just the name of a COM method. Symbol names are of the format Interface::MethodName (for example, IDispatch::Invoke). Note that the only information passed to CoClassSymsAddSymbol is the symbol name and address (in section:offset format). While I could theoretically extract and pass along information about the parameter names and types, it would make the code much more complex. Consider this an exercise for the ambitious reader.
The final callout API is CoClassSymsSymbolsFinished, which is invoked after all symbols have been processed. This is the place where the symbol table generation DLL can do any additional cleanup work, and presumably close the file handles of the output file. For a .MAP file, this includes appending the executable's entry point. In the case of the .DBG generation DLL, my implementation writes all of the data structures that can't be written until the number and size of the CodeView public symbols are known.
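The three-phase contract can be illustrated with an in-memory .MAP-style writer. This is a hypothetical sketch: the member functions mirror the roles of CoClassSymsBeginSymbolCallouts, CoClassSymsAddSymbol, and CoClassSymsSymbolsFinished but not their actual signatures in Figure 6, the entry point is passed in rather than read from the executable's header, and the output only loosely follows the linker's .MAP layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol-writer DLL that receives the callouts.
class MapWriterSketch {
public:
    // Phase 1: called once, before any symbols arrive.
    void BeginSymbolCallouts(const std::string& exeName) {
        exeName_ = exeName;
        symbols_.clear();
    }
    // Phase 2: called once per method matched with a section:offset address.
    void AddSymbol(const std::string& name, unsigned section, std::uint32_t offset) {
        symbols_.push_back({section, offset, name});
    }
    // Phase 3: everything is known now, so sort by address ("Publics by
    // Value" order) and append the entry point.
    std::string SymbolsFinished(unsigned entrySection, std::uint32_t entryOffset) {
        std::sort(symbols_.begin(), symbols_.end(), [](const Sym& a, const Sym& b) {
            return a.section != b.section ? a.section < b.section : a.offset < b.offset;
        });
        std::string out = " " + exeName_ + "\n\n  Address         Publics by Value\n\n";
        for (const Sym& s : symbols_)
            out += " " + FormatAddress(s.section, s.offset) + "       " + s.name + "\n";
        out += "\n entry point at        " + FormatAddress(entrySection, entryOffset) + "\n";
        return out;
    }

private:
    struct Sym { unsigned section; std::uint32_t offset; std::string name; };

    // Format as SSSS:OOOOOOOO in hex, as .MAP files do.
    static std::string FormatAddress(unsigned section, std::uint32_t offset) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "%04X:%08X", section, static_cast<unsigned>(offset));
        return buf;
    }

    std::string exeName_;
    std::vector<Sym> symbols_;
};
```

Buffering until the final callout matters because COM methods arrive in typelib order, not address order; the same deferral is what lets a .DBG writer emit headers whose sizes depend on the total symbol count.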
The CoClassSyms Code
Figure 7 contains excerpts from the code for CoClassSyms.CPP. If you examine it closely, you'll see that its structure is much the same as the earlier COMTypeLibDump program. The main difference between the programs is that while COMTypeLibDump displays a little about everything in a typelib, CoClassSyms skims off just the interesting information, does a little calculation, and ships the results off to the CoClassSymsCallout DLL.
The first indication that CoClassSyms isn't just a typelib dumping program is in the ProcessTypeInfo function. This function ignores every ITypeInfo that isn't of type TKIND_COCLASS. Remember, a TKIND_COCLASS corresponds roughly to a creatable COM object. The key thing about TKIND_COCLASSs is that they usually just contain references for the two primary interfaces that implement the object. One is the incoming interface and the other is the outgoing or event sink interface. The referenced interfaces that implement an object are described by other ITypeInfos found elsewhere in the typelib. I apologize if what I've said sounds a bit muddy, but that's the terminology used to describe typelibs.
To create a COM object instance from which you can get a vtable, you need the IID of a referenced interface. This involves two steps. First, you call ITypeInfo::GetRefTypeOfImplType, which returns an HREFTYPE (Handle to REFerenced TYPE). You'll see this in the ProcessTypeInfo code. Second, you call ITypeInfo::GetRefTypeInfo, which takes the HREFTYPE as input and returns the ITypeInfo for the referenced type as output. In my code this occurs in the ProcessReferencedTypeInfo function. The returned ITypeInfo will be either a TKIND_DISPATCH or a TKIND_INTERFACE.
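The coclass filtering and the two-step lookup can be modeled without any COM at all. This sketch uses a hypothetical in-memory typelib (TypeLibModel, TypeInfoModel, HRefType). Its RefTypeOfImplType and RefTypeInfo members only imitate ITypeInfo::GetRefTypeOfImplType and ITypeInfo::GetRefTypeInfo. CollectCreateRequests gathers the CLSID/IID pairs that would then be passed to CoCreateInstance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class Kind { Interface, Dispatch, CoClass, Enum };  // a subset of TYPEKIND

using HRefType = std::uint32_t;  // stand-in for HREFTYPE

struct TypeInfoModel {
    Kind kind;
    std::string name;
    std::string guid;                 // CLSID for a coclass, IID for an interface
    std::vector<HRefType> implTypes;  // only meaningful for coclasses
};

struct TypeLibModel {
    std::vector<TypeInfoModel> infos;
    std::map<HRefType, std::size_t> refs;  // handle -> index into infos

    // Step 1, like GetRefTypeOfImplType: ask the coclass for a handle.
    HRefType RefTypeOfImplType(const TypeInfoModel& ti, std::size_t index) const {
        return ti.implTypes.at(index);
    }
    // Step 2, like GetRefTypeInfo: turn the handle into a type description.
    const TypeInfoModel& RefTypeInfo(HRefType h) const { return infos.at(refs.at(h)); }
};

struct CreateRequest { std::string clsid; std::string iid; std::string iface; };

// Skip everything except coclasses, then resolve each implemented interface.
std::vector<CreateRequest> CollectCreateRequests(const TypeLibModel& lib) {
    std::vector<CreateRequest> out;
    for (const TypeInfoModel& ti : lib.infos) {
        if (ti.kind != Kind::CoClass) continue;
        for (std::size_t i = 0; i < ti.implTypes.size(); ++i) {
            const TypeInfoModel& ref = lib.RefTypeInfo(lib.RefTypeOfImplType(ti, i));
            if (ref.kind == Kind::Interface || ref.kind == Kind::Dispatch)
                out.push_back({ti.guid, ref.guid, ref.name});
        }
    }
    return out;
}
```

The model doesn't distinguish incoming interfaces from event-source interfaces. Real code could check ITypeInfo::GetImplTypeFlags for IMPLTYPEFLAG_FSOURCE to tell them apart.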
At this point in the code, I have two ITypeInfos: one for the TKIND_COCLASS, the other for the interface that implements the object. Finally, the magic can happen. The code calls the CoCreateInstance API to attempt to make an instance of the desired interface for the specified TKIND_COCLASS object. For the CLSID parameter, I use the GUID of the TKIND_COCLASS. For the IID parameter, I use the IID retrieved from the TYPEATTR of the implementing interface. The vtable addresses of interface instances that aren't in-process point to marshaling proxies rather than to the component's own code, so they're not of much practical use. Therefore, I made the dwClsContext parameter specify only in-process interface instances.
If a COM interface instance is created successfully, the code in EnumTypeInfoMembers performs the grungy work of creating symbol names, matching them up with a logical address, and shipping them off to the symbol table writing APIs. The first time the function is called, it invokes the CoClassSymsBeginSymbolCallouts API.
At this point, EnumTypeInfoMembers creates a pointer to the vtable by dereferencing the interface pointer obtained via CoCreateInstance. It then enters a loop where it uses the ITypeInfo to enumerate the methods of the designated interface. For each method, the code constructs a symbol name such as IFoobar::MyMethod. It also reaches into the vtable, grabs the virtual address for the method, and converts it into a logical (section:offset) address. Finally, the loop's code sends the symbol name and address off to the CoClassSymsAddSymbol API.
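Dereferencing an interface pointer to reach its vtable works because every COM interface instance begins with a pointer to an array of function pointers. This sketch shows that step with a hypothetical C-layout interface. IFoo, kFooVtbl, the Foo_* functions, MethodInfo, RawSymbol, and WalkVtable are all made up for illustration. Each method's slot number is passed in explicitly. FUNCDESC, which ITypeInfo::GetFuncDesc returns, carries a vtable offset (oVft) that a real implementation could use for this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct IFoo;                     // hypothetical interface
using Method = long (*)(IFoo*);  // each slot takes "this" explicitly, C-style

// C layout of a COM interface instance: the first member is the vtable pointer.
struct IFoo { const Method* lpVtbl; };

long Foo_AddRef(IFoo*)  { return 2; }
long Foo_Release(IFoo*) { return 1; }
long Foo_Frob(IFoo*)    { return 42; }

// The vtable itself: function pointers in slot order.
const Method kFooVtbl[] = {Foo_AddRef, Foo_Release, Foo_Frob};

struct MethodInfo { std::string name; std::size_t slot; };  // slot = vtable index
struct RawSymbol { std::string name; std::uintptr_t address; };

std::vector<RawSymbol> WalkVtable(const void* pInterface, const std::string& iface,
                                  const std::vector<MethodInfo>& methods) {
    // Dereference the interface pointer: its first pointer-sized field is the vtable.
    const Method* vtbl = *static_cast<const Method* const*>(pInterface);
    std::vector<RawSymbol> out;
    for (const MethodInfo& m : methods) {
        // Nothing here can check m.slot against the real vtable length,
        // so the type information has to be trusted.
        out.push_back({iface + "::" + m.name,
                       reinterpret_cast<std::uintptr_t>(vtbl[m.slot])});
    }
    return out;
}
```

The raw addresses collected here would still need to go through a section:offset conversion before being handed to CoClassSymsAddSymbol.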
The CoClassSymsMapFile DLL
The code for generating .MAP files is isolated in CoClassSymsMapFile.DLL (see Figure 8). Earlier, I described the CoClassSymsCallout API, which consists of three functions. CoClassSymsMapFile.DLL is a very simple implementation of these functions, partly because its output is simple text. The other reason it's so simple is that the information passed to the DLL's exported APIs arrives in the same order as the various sections of a .MAP file.
The CoClassSymsBeginSymbolCallouts function writes the section information that begins a .MAP file. This information includes the section number, its length, its name, and whether the section is code or data. To get these details, the function uses MapAndLoad from IMAGEHLP.DLL. MapAndLoad fills in a LOADED_IMAGE structure that includes a pointer to the Portable Executable section table. The final action of CoClassSymsBeginSymbolCallouts is to write the Publics by Value line that marks the end of the section information and the beginning of the symbol information.
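Here is a rough sketch of how that opening section table might be written. The Section struct is a hypothetical, simplified copy of the IMAGE_SECTION_HEADER fields that matter here (Name, VirtualSize, Characteristics). The column layout only approximates the linker's .MAP format and isn't taken from CoClassSymsMapFile.DLL. In this sketch, a section counts as CODE when the IMAGE_SCN_CNT_CODE flag (0x00000020) is set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

constexpr std::uint32_t kScnCntCode = 0x00000020;  // value of IMAGE_SCN_CNT_CODE

// Hypothetical, trimmed-down section header.
struct Section {
    char name[8];                   // not necessarily NUL-terminated in a PE file
    std::uint32_t virtualSize;
    std::uint32_t characteristics;
};

std::string MapSectionLines(const std::vector<Section>& sections) {
    std::string out = " Start         Length     Name                   Class\n";
    for (std::size_t i = 0; i < sections.size(); ++i) {
        const Section& s = sections[i];
        std::size_t len = 0;  // names fill at most 8 bytes, maybe without a NUL
        while (len < sizeof s.name && s.name[len] != '\0') ++len;
        const std::string name(s.name, len);
        const char* cls = (s.characteristics & kScnCntCode) ? "CODE" : "DATA";
        char line[80];
        // Section numbers are 1-based; each section starts at its own offset 0.
        std::snprintf(line, sizeof line, " %04X:00000000 %08XH %-23s %s\n",
                      static_cast<unsigned>(i + 1),
                      static_cast<unsigned>(s.virtualSize), name.c_str(), cls);
        out += line;
    }
    // The line that separates section information from symbol information.
    out += "\n  Address         Publics by Value\n\n";
    return out;
}
```

Section names in a PE file occupy exactly eight bytes and need not end in a NUL, which is why the name is measured with an explicit bound instead of strlen.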
The second exported API, CoClassSymsAddSymbol, is called once for each symbol. The implementation here simply appends each symbol's information to the end of the file using fprintf. Technically, I should have sorted the symbols by address before writing them out to the file. However, I haven't noticed any adverse effects from unsorted symbols. Caveat emptor!
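If you'd rather not rely on unsorted symbols being harmless, sorting before writing takes only a few lines. This is a sketch, not the article's code. MapSymbol, SortByAddress, and FormatPublics are hypothetical names, and the spacing of each output line is approximate. Symbols are ordered by section first and then by offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <tuple>
#include <vector>

struct MapSymbol {
    std::uint16_t section;  // 1-based section number
    std::uint32_t offset;   // offset within the section
    std::string name;       // e.g. "IFoo::Bar"
};

// Order by (section, offset), the order that "Publics by Value" implies.
void SortByAddress(std::vector<MapSymbol>& syms) {
    std::stable_sort(syms.begin(), syms.end(),
                     [](const MapSymbol& a, const MapSymbol& b) {
                         return std::tie(a.section, a.offset) <
                                std::tie(b.section, b.offset);
                     });
}

// Sort a copy of the symbols, then emit one "ssss:oooooooo   name" line each.
std::string FormatPublics(std::vector<MapSymbol> syms) {
    SortByAddress(syms);
    std::string out;
    for (const MapSymbol& s : syms) {
        char addr[32];
        std::snprintf(addr, sizeof addr, " %04X:%08X       ",
                      static_cast<unsigned>(s.section), static_cast<unsigned>(s.offset));
        out += addr;
        out += s.name;  // appended separately, so long names are never truncated
        out += '\n';
    }
    return out;
}
```

std::stable_sort keeps symbols that share an address in the order they arrived.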
The final API, CoClassSymsSymbolsFinished, could get away with just closing the file handle used for writing the .MAP file and calling UnMapAndLoad. However, to keep MAPSYM happy, I added a small bit of code that retrieves the executable's entry point, converts it to a logical address, and appends it to the .MAP file. Without the entry point line, MAPSYM complains about "no entry point" and announces that it's assuming 0000:0100. Why that address? Would you believe that it's the predefined entry point for .COM files, circa 1981?
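The entry-point step can be sketched the same way. AddressOfEntryPoint in the PE optional header is a relative virtual address (RVA), so converting it means finding the section that contains it and subtracting that section's start. SectionRange and EntryPointLine are hypothetical, and the spacing of the "entry point at" line is only an approximation of the linker's output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical subset of a section header: where the section starts and how big it is.
struct SectionRange { std::uint32_t virtualAddress; std::uint32_t virtualSize; };

// Turn the entry point RVA into a .MAP-style "entry point at ssss:oooooooo" line.
std::string EntryPointLine(std::uint32_t entryRva, const std::vector<SectionRange>& sections) {
    for (std::size_t i = 0; i < sections.size(); ++i) {
        const SectionRange& s = sections[i];
        if (entryRva >= s.virtualAddress && entryRva - s.virtualAddress < s.virtualSize) {
            char buf[64];
            std::snprintf(buf, sizeof buf, " entry point at        %04X:%08X\n",
                          static_cast<unsigned>(i + 1),
                          static_cast<unsigned>(entryRva - s.virtualAddress));
            return buf;
        }
    }
    // No containing section, e.g. a DLL whose entry point RVA is 0.
    return "";
}
```

Returning an empty string when no section matches leaves the decision to the caller. Without an entry point line, MAPSYM falls back to 0000:0100.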
Wrap-up
While I think CoClassSyms is a decent implementation of a pretty cool concept, it has serious limitations. For starters, it only works with typelibs that are embedded in an executable such as an .OCX. There are many cases where typelibs are in separate files. A prime example of this is the Visual Basic runtime DLL. The typelib for the standard Visual Basic controls is called VB6.OLB, but the implementation of these interfaces is in MSVBVM60.DLL. CoClassSyms requires the typelib to be in the executable so that it can easily match the typelib to the implementing executable. However, it wouldn't be hard to extend my code to let you specify the typelib and executable files separately.
Another shortcoming of CoClassSyms is that it only sees the top-level interfaces for a COM object; that is, interfaces that are referenced in a TKIND_COCLASS. Put another way, if you can't use CoCreateInstance to make an interface of the desired IID, then CoClassSyms won't know how to get a vtable pointer. This is an especially acute limitation where a typelib describes a global or application object with methods that return instance pointers for other interfaces. In this scenario, CoClassSyms might generate a symbol for a method such as global::DataSheet, but it surely won't generate symbols for the much more interesting DataSheet interface methods.
CoClassSyms doesn't pick up on the event sink interfaces of COM objects. To make this work, you'd have to muck about with connection point code to make an instance of the event sink interface. To keep the code small and understandable, I didn't do this.
While CoClassSyms is far from perfect, the ability to get any sort of symbol table where previously there was nothing should be a big boost to your debugging capabilities. How many times has some third-party control or library faulted, leaving you with no idea of what call was at fault? The symbols from CoClassSyms might be enough to give you a fighting chance. Also, don't forget to check out this month's Under the Hood column, where I describe creating .DBG files and CodeView information.
For related information see: Enhanced Debugging with Edit and Continue in Microsoft Visual C++ 6.0 at http://premium.microsoft.com/msdn/library/techart/msdn_vc6ed_cont.htm. Also check http://msdn.microsoft.com for daily updates on developer programs, resources and events.
From the March 1999 issue of Microsoft Systems Journal