December 1998
Download dec98hood.exe (94KB)
Matt Pietrek does advanced research for the NuMega Labs of Compuware Corporation, and is the author of several books. His Web site at http://www.tiac.net/users/mpietrek has a FAQ page and information on previous columns and articles.
One of the coolest new features in Visual C++® 6.0 is the /DELAYLOAD linker option. Executables that use the /DELAYLOAD option don't implicitly link to the DLLs that you specify with /DELAYLOAD. Instead, the DLL isn't loaded until the first call to one of its functions. While you can achieve a similar effect by using LoadLibrary and GetProcAddress, the new /DELAYLOAD capability is much more seamless. No source code changes are required; just make a few changes to the linker options in your makefile or project settings and you're finished.
What's the advantage of waiting to load a DLL until the first time it's used? Two reasons immediately come to mind. First, your program may need to run on multiple Win32®-based platforms, and you use a DLL that's not available on one of the platforms. It's not enough to check which platform you're running on and avoid calling the APIs that aren't available. If you implicitly linked with that DLL's import library, normally your program wouldn't run in environments that don't have that DLL. For instance, consider PSAPI.DLL, which I've written about in previous columns. PSAPI.DLL doesn't exist on Windows® 9x, so even if I didn't call its APIs while running on Windows 9x, the Windows 9x loader would still refuse to load my program, since it can't resolve the implicit links to the PSAPI APIs. The /DELAYLOAD option is perfect for this scenario. If you don't call into the DLL, it won't be loaded.
The second obvious benefit of using /DELAYLOAD is reduced load time and initialization. Let's say your program wants to call APIs in some DLL, but only in relatively rare circumstances. If you defer loading those DLLs unless they're actually called, you'll cut down on the loader's work. Likewise, the initialization code in those DLLs won't execute unless the DLL is actually used. In fact, the Microsoft® linker (LINK.EXE) now makes use of /DELAYLOAD for just this purpose. A little-known feature of the linker is that it can disassemble OBJs and executables (for example, link -dump /disasm foo.exe). LINK.EXE relies on a separate DLL (for instance, MSDIS100.DLL) to do the actual disassembly. The vast majority of the time LINK.EXE isn't used for disassembly. You'll find that LINK.EXE from Visual C++ 6.0 uses the /DELAYLOAD capability to load the disassembly DLL only as needed.
In my September 1998 column, I briefly touched upon /DELAYLOAD and said that it was a feature of Windows NT 5.0. I was wrong (not the first time that's happened). Using /DELAYLOAD requires no operating system support.
In fact, I'm surprised that Microsoft or another compiler vendor didn't come up with this technique long ago. What confused me was that a new PE file DataDirectory #define (IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT) was added to WINNT.H. The operating system itself doesn't use this DataDirectory entry. However, this entry is the only way that tools such as DUMPBIN or my PEDUMP program could locate the delay load information reliably.
When I first experimented with executables built using /DELAYLOAD, I was struck by two things. First, to my knowledge this is the first time that the Microsoft linker has actively generated code. As I described in my July 1997 column, the linker typically combines raw data from OBJ and LIB files to produce the sections of the final executable. When using /DELAYLOAD, the linker actually generates small stubs of code and inserts them alongside the compiler-generated code.
The second thing that intrigued me about /DELAYLOAD executables is the similarity to Visual Basic®. When you call an API from Visual Basic, the executable doesn't implicitly link to the target DLL. Rather, the Visual Basic code generator creates data describing the API call name and DLL, then makes a jump to a small stub. The first time the stub is called, it uses the data to call LoadLibrary and GetProcAddress. The address returned by GetProcAddress is stored in the stub so that future calls go directly (almost) to the API's code. The /DELAYLOAD mechanism is conceptually similar to what Visual Basic does in that it causes a small bit of data and a code stub to be generated. Similarly, the stub calls LoadLibrary and GetProcAddress. However, the actual implementation of the stub is quite different, and subsequent calls to the API are even faster than the Visual Basic method. (Under Visual Basic, the call to an API still goes through a JMP instruction to get to the API's code.)
Having gushed long enough about /DELAYLOAD, let's see how you'd use it in your own projects.
To implement /DELAYLOAD, two changes are necessary to your linker options. First, you'll need to add the DELAYIMP.LIB file to your list of libraries. DELAYIMP.LIB contains the code that calls LoadLibrary and GetProcAddress. DELAYIMP.LIB can be found in the Visual C++ 6.0 \LIB directory. The second change to the linker command line is to add /DELAYLOAD:<DLLNAME>, where <DLLNAME> is the name of the DLL that you want to be loaded, such as:
If you need to delay load multiple DLLs, it's fine to use multiple /DELAYLOAD fragments. Just because you're delay-loading the DLL doesn't mean that you can exclude the DLL's LIB file though. The linker needs the information in the DLL's LIB file to resolve the fixups properly and generate the delay-loading stubs. If you want to verify that you did everything correctly, DUMPBIN.EXE now has a /DEPENDENTS option that tells you which DLLs are implicitly linked against and which are delay-loaded.
/DELAYLOAD Nuts and Bolts
Figure 2 Using Pseudo Import Address Table
The dliNotify parameter is one of the dliXXX enums specified in DELAYIMP.H. The pdli parameter is a pointer to a DelayLoadInfo structure that is also declared in DELAYIMP.H. The DelayLoadInfo structure contains everything you need to know about this particular API, including the API and DLL names. The DelayLoadInfo structure is constructed on the fly in the __delayLoadHelper function.
Beyond normal notification hooks, the delay-load code can also call a second function in the event of an error (for example, if the DLL wasn't found). This error hook function takes exactly the same arguments as the regular notification hook function. The only difference is that the dliXXX enums indicate failure states (dliFailLoadLib or dliFailGetProc). As with the notification hook functions, there is no default implementation for the failure hooks. To provide one, write your own failure function, then declare a global variable named __pfnDliFailureHook that points to your failure function. The linker will use your __pfnDliFailureHook variable rather than the one in DELAYIMP.LIB, which contains a NULL pointer. By making your failure hook function return appropriate values, you can recover from the failure gracefully. For instance, if you receive a dliFailLoadLib failure notification, you might prompt the user for the location of the DLL and call LoadLibrary yourself. The resultant HMODULE would then be used as the failure hook return value.
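The recovery path described above can be sketched with a small portable model. This is only an illustration of the control flow, not the real thing: the types in the model namespace, the loadLibrary stand-in, and the DLL names are all hypothetical, and nothing here uses the actual declarations from DELAYIMP.H. The failureHook variable plays the role that __pfnDliFailureHook plays in a real program.

```cpp
#include <cassert>
#include <cstring>

// Portable model only: these types mimic the shapes described in the text.
// They are NOT the real declarations from DELAYIMP.H.
namespace model {

enum Notify { dliFailLoadLib, dliFailGetProc };  // failure states named in the text

struct DelayLoadInfo {      // hypothetical, trimmed-down stand-in
    const char* dllName;
    const char* procName;
};

using Module = int;         // stand-in for HMODULE; 0 means "not loaded"
using Hook = Module (*)(Notify, const DelayLoadInfo*);

Hook failureHook = nullptr; // plays the role of __pfnDliFailureHook (null by default)

// Hypothetical loader: only "present.dll" can be found.
Module loadLibrary(const char* name) {
    return std::strcmp(name, "present.dll") == 0 ? 1 : 0;
}

// Sketch of the load step of a delay-load helper: try to load the DLL and,
// on failure, give the hook a chance to supply a module handle instead.
Module loadWithRecovery(const DelayLoadInfo& info) {
    Module m = loadLibrary(info.dllName);
    if (m == 0 && failureHook != nullptr) {
        m = failureHook(dliFailLoadLib, &info);
    }
    return m;
}

} // namespace model

// User-written hook: on a load failure, load the DLL from a fallback name
// and return the resulting handle. Other failures are not recovered.
model::Module myFailureHook(model::Notify reason, const model::DelayLoadInfo*) {
    if (reason == model::dliFailLoadLib) {
        return model::loadLibrary("present.dll");
    }
    return 0;
}
```

With no hook installed, loadWithRecovery returns 0 for a missing DLL. Once failureHook is set to myFailureHook, the same call returns the handle the hook supplied, which mirrors using the hook's return value as the HMODULE.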
The Linker Gets into the Act
This small snippet of stub code is worth scrutinizing carefully. The first two instructions (PUSH ECX and PUSH EDX) preserve the values of the ECX and EDX registers on the stack. The next instruction (PUSH __imp__GetDesktopWindow@0) pushes the address of the pseudo IAT entry for the GetDesktopWindow function. When I described the __delayLoadHelper code earlier, I mentioned that it patched the pseudo IAT entry before returning. This PUSH instruction is where __delayLoadHelper gets the location to patch. Before a delay load API is called for the first time, the pseudo IAT entry points to this per-API stub. This is how control gets to this stub rather than to the target API.
The final instruction of a per-API stub points to the second type of linker-generated stub ("per-DLL"). Thus, no matter how many functions you delay load from USER32.DLL and COMCTL32.DLL, there will still be only two per-DLL stubs: one for USER32.DLL and the other for COMCTL32.DLL. The linker names these stubs __tailMerge_XXX, where XXX is the name of the DLL. For example, the stub for USER32 looks like this:
The first instruction of the per-DLL stub pushes the address of a data structure that the linker has included elsewhere in the executable. This struct is of type ImgDelayDescr, defined in DELAYIMP.H. The ImgDelayDescr struct contains pointers to the DLL name, pointers to the DLL's pseudo IAT and INT, and various other items needed by the __delayLoadHelper function. This is the data structure pointed at by the IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT slot in the executable's DataDirectory.
Here's an important side note to people writing PE file utilities. All the pointer values in an ImgDelayDescr are virtual addresses (that is, normal linear addresses that can be used as pointers). The use of virtual addresses is in sharp contrast to the relative virtual addresses (RVAs) used by the IMAGE_IMPORT_DESCRIPTOR structure for regular imports. This use of virtual addresses rather than RVAs is unprecedented. I assume that this is because the notification hooks are passed a pointer to the ImgDelayDescr, so it wouldn't do to have hook implementors using RVAs.
The next instruction of the per-DLL stub makes the actual call to __delayLoadHelper, which returns the address of the target API in the EAX register. The next two stub instructions restore the ECX and EDX registers that were put on the stack by the per-API stub. The last instruction JMPs to the address in the EAX register (that is, the value returned by the __delayLoadHelper function). Since __delayLoadHelper patched the pseudo IAT entry, the per-API stub only executes once. Subsequent calls go through the pseudo IAT directly to the target API. Likewise, the per-DLL stub only executes once for each delay-loaded API.
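To make the patch-on-first-call behavior concrete, here is a minimal portable model of it. Plain function pointers stand in for the pseudo IAT slot, the per-API stub, and __delayLoadHelper. All names in the model namespace are hypothetical, no DLL is actually loaded, and the register saving done by the real stubs is left out.

```cpp
#include <cassert>

// Portable model only: function pointers stand in for the pseudo IAT,
// the linker-generated stub, and __delayLoadHelper. No real DLL is loaded.
namespace model {

using Api = int (*)(int);

int stubCalls = 0;                    // how many times the stub has run

int realApi(int x) { return x * 2; }  // hypothetical target "API"

int perApiStub(int x);                // defined below

// The pseudo IAT slot starts out pointing at the stub, not at the API.
Api pseudoIatSlot = perApiStub;

// Stand-in for the helper: "resolve" the API, patch the slot, and return
// the resolved address to the caller.
Api delayLoadHelper(Api* slotToPatch) {
    Api target = realApi;  // real code would call LoadLibrary/GetProcAddress here
    *slotToPatch = target;
    return target;
}

// Stand-in for the stub: hand the slot's address to the helper, then
// forward the original call to the address the helper returned.
int perApiStub(int x) {
    ++stubCalls;
    Api target = delayLoadHelper(&pseudoIatSlot);
    return target(x);
}

// Every call site goes through the slot, as compiled code does with an IAT.
int callThroughSlot(int x) { return pseudoIatSlot(x); }

} // namespace model
```

The first call to callThroughSlot lands in the stub, which patches the slot. Every later call goes straight to realApi, so stubCalls never rises above 1.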
Figure 3 DelayLoadDemo Stubs
The DelayLoadDemo Program
Warnings and Caveats
Have a suggestion for Under the Hood? Send it to Matt at mpietrek@tiac.com or http://www.tiac.com/users/mpietrek. From the December 1998 issue of Microsoft Systems Journal.