One of the coolest new features in Visual C++® 6.0 is the /DELAYLOAD linker option. Executables that use the /DELAYLOAD option don't implicitly link to the DLLs that you specify with /DELAYLOAD. Instead, the DLL isn't loaded until the first call to one of its functions. While you can achieve a similar effect by using LoadLibrary and GetProcAddress, the new /DELAYLOAD capability is much more seamless. No source code changes are required; just make a few changes to the linker options in your makefile or project settings and you're finished.
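For comparison, here is a minimal sketch of the manual LoadLibrary and GetProcAddress approach that /DELAYLOAD saves you from writing. The function name CallOptionalApi and its parameters are hypothetical, chosen only for illustration, and it assumes the target export takes no arguments and returns a 32-bit value (the shape of GetTickCount in KERNEL32.DLL). The non-Windows branch exists only so the sketch compiles elsewhere.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: loads dllName on demand, looks up funcName, calls it,
// and stores the return value in *result. Returns false if either the DLL or
// the export can't be found, so the caller can degrade gracefully.
bool CallOptionalApi(const char* dllName, const char* funcName,
                     std::uint32_t* result)
{
#ifdef _WIN32
    HMODULE module = LoadLibraryA(dllName);
    if (module == nullptr)
        return false;                       // DLL not present on this platform

    // Assumed signature: no arguments, returns a DWORD.
    using Fn = DWORD (WINAPI *)(void);
    Fn fn = reinterpret_cast<Fn>(GetProcAddress(module, funcName));
    if (fn == nullptr) {
        FreeLibrary(module);
        return false;                       // DLL present, export missing
    }

    *result = fn();
    FreeLibrary(module);
    return true;
#else
    // Not Windows: there is nothing to load, so report failure.
    (void)dllName; (void)funcName; (void)result;
    return false;
#endif
}
```

A call such as CallOptionalApi("kernel32.dll", "GetTickCount", &ticks) would exercise the success path on Windows. Every API you want to call this way needs its own function pointer type and error handling, which is exactly the bookkeeping that /DELAYLOAD moves into the linker.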
What's the advantage of waiting to load a DLL until the first time it's used? Two reasons immediately come to mind. First, your program may need to run on multiple Win32®-based platforms, and you use a DLL that's not available on one of the platforms. It's not enough to check which platform you're running on and avoid calling the APIs that aren't available. If you implicitly linked with that DLL's import library, normally your program wouldn't run in environments that don't have that DLL.
For instance, consider PSAPI.DLL, which I've written about in previous columns. PSAPI.DLL doesn't exist on Windows® 9x, so even if I didn't call its APIs while running on Windows 9x, the Windows 9x loader would still refuse to load my program since it can't resolve the implicit links to the PSAPI APIs. The /DELAYLOAD option is perfect for this scenario. If you don't call into the DLL, it won't be loaded.
The second obvious benefit of using /DELAYLOAD is reduced load time and initialization. Let's say your program wants to call APIs in some DLL, but only in relatively rare circumstances. If you defer loading those DLLs unless they're actually called, you'll cut down on the loader's work. Likewise, the initialization code in those DLLs won't execute unless the DLL is actually used.
In fact, the Microsoft® linker (LINK.EXE) now makes use of /DELAYLOAD for just this purpose. A little-known feature of the linker is that it can disassemble OBJs and executables (for example, link -dump /disasm foo.exe). LINK.EXE relies on a separate DLL (for instance, MSDIS100.DLL) to do the actual disassembly. The vast majority of the time LINK.EXE isn't used for disassembly. You'll find that LINK.EXE from Visual C++ 6.0 uses the /DELAYLOAD capability to load the disassembly DLL only as needed.
In my September 1998 column, I briefly touched upon /DELAYLOAD and said that it was a feature of Windows NT 5.0. I was wrong (not the first time that's happened). Using /DELAYLOAD requires no operating system support. In fact, I'm surprised that Microsoft or another compiler vendor didn't come up with this technique long ago. What confused me was that a new PE file DataDirectory #define (IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT) was added to WINNT.H. The operating system itself doesn't use this DataDirectory entry. However, this entry is the only way that tools such as DUMPBIN or my PEDUMP program could locate the delay load information reliably.
When I first experimented with executables built using /DELAYLOAD, I was struck by two things. First, to my knowledge this is the first time that the Microsoft linker has actively generated code. As I described in my July 1997 column, the linker typically combines raw data from OBJ and LIB files to produce the sections of the final executable. When using /DELAYLOAD, the linker actually generates small stubs of code and inserts them alongside the compiler-generated code.
The second thing that intrigued me about /DELAYLOAD executables is the similarity to Visual Basic®. When you call an API from Visual Basic, the executable doesn't implicitly link to the target DLL. Rather, the Visual Basic code generator creates data describing the API call name and DLL, then makes a jump to a small stub. The first time the stub is called, it uses the data to call LoadLibrary and GetProcAddress. The address returned by GetProcAddress is stored in the stub so that future calls go directly (almost) to the API's code.
The /DELAYLOAD mechanism is conceptually similar to what Visual Basic does in that it causes a small bit of data and a code stub to be generated. Similarly, the stub calls LoadLibrary and GetProcAddress. However, the actual implementation of the stub is quite different, and subsequent calls to the API are even faster than the Visual Basic method. (Under Visual Basic, the call to an API still goes through a JMP instruction to get to the API's code.)
Having gushed long enough about /DELAYLOAD, let's see how you'd use it in your own projects. To implement /DELAYLOAD, two changes are necessary to your linker options. First, you'll need to add the DELAYIMP.LIB file to your list of libraries. DELAYIMP.LIB contains the code that calls LoadLibrary and GetProcAddress. DELAYIMP.LIB can be found in the Visual C++ 6.0 \LIB directory.
The second change to the linker command line is to add /DELAYLOAD:<DLLNAME>, where <DLLNAME> is the name of the DLL that you want to be loaded, such as:
/DELAYLOAD:COMCTL32.DLL
If you need to delay load multiple DLLs, it's fine to use multiple /DELAYLOAD fragments. Just because you're delay-loading the DLL doesn't mean that you can exclude the DLL's LIB file though. The linker needs the information in the DLL's LIB file to resolve the fixups properly and generate the delay-loading stubs. If you want to verify that you did everything correctly, DUMPBIN.EXE now has a /DEPENDENTS option that tells you which DLLs are implicitly linked against and which are delay-loaded.
/DELAYLOAD Nuts and Bolts
In a nice bit of design and implementation, Visual C++ not only makes the /DELAYLOAD mechanism extensible (and even user-replaceable), it also provides the source for the default implementation of the /DELAYLOAD code. The Visual C++ 6.0 \INCLUDE directory contains two files: DELAYHLP.CPP and DELAYIMP.H. In DELAYHLP.CPP, the key function to look at (which takes up about two-thirds of the file) is __delayLoadHelper. The __delayLoadHelper code can look a little intimidating at first glance. A lot of its complexity has to do with error checking and notifications, which I'll explain momentarily.
To help you understand what __delayLoadHelper does, I've distilled its primary operations down to the pseudocode that appears in Figure 1. The pseudocode assumes everything goes off without a hitch: no errors and no notification callouts (that is, hook functions).
In studying __delayLoadHelper, it's essential to understand that the linker creates a pseudo Import Address Table (IAT) and Import Name Table (INT) for each delay-loaded DLL. There's one IAT and INT per referenced DLL. Each IAT and INT entry represents one imported function in that DLL. Note that the IAT and INT data structures also happen to be used by regular DLL imports. In the case of /DELAYLOAD, the operating system doesn't know (or care) that the executable has additional pseudo import tables. Rather, the choice to store the delay import tables in the same format as normal imports is solely a matter of convenience for the linker and __delayLoadHelper function.
In the pseudocode, beyond the calls to LoadLibrary and GetProcAddress, note that the rest of the code manipulates the pseudo IAT and INT. The purpose is to optimize subsequent calls to the API. The first time an API is called, the IAT entry for the API points to linker-generated code. After the stub successfully completes, the IAT entry points directly at the target function (see Figure 2). Another way to picture it is like this: within your executable, a delay-loaded DLL has a pseudo IAT with entries that are patched with the target address on demand.
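To make the patching idea concrete, here is a small portable simulation. It is not the DELAYHLP.CPP code and calls no real Win32 APIs: FakeLoadLibrary, FakeGetProcAddress, DelayLoadHelper, StubDouble, and the one-entry tables are all hypothetical stand-ins. Like the pseudocode, it assumes the happy path with no errors and no hooks.

```cpp
#include <cstring>

// All names below are hypothetical stand-ins for illustration only.
using ApiFn = int (*)(int);

// --- The simulated DLL: one export plus counters so we can observe behavior.
int  g_loadCount   = 0;      // times the fake DLL was "loaded"
int  g_lookupCount = 0;      // times an export lookup was performed
bool g_dllLoaded   = false;

int RealDouble(int x) { return 2 * x; }   // the DLL's exported function

void FakeLoadLibrary()
{
    if (!g_dllLoaded) { g_dllLoaded = true; ++g_loadCount; }
}

ApiFn FakeGetProcAddress(const char* name)
{
    ++g_lookupCount;
    return std::strcmp(name, "Double") == 0 ? &RealDouble : nullptr;
}

// --- What the linker would emit: a pseudo INT (names), a pseudo IAT
// (addresses), and a stub. Each IAT slot initially points at its stub.
int StubDouble(int x);
const char* g_pseudoINT[] = { "Double" };
ApiFn       g_pseudoIAT[] = { &StubDouble };

// Simplified helper: load the DLL, resolve the name from the INT,
// patch the matching IAT slot, and return the real address.
ApiFn DelayLoadHelper(unsigned index)
{
    FakeLoadLibrary();
    ApiFn target = FakeGetProcAddress(g_pseudoINT[index]);
    g_pseudoIAT[index] = target;   // later calls bypass the stub entirely
    return target;
}

// The stub runs only on the first call, then forwards to the real function.
int StubDouble(int x)
{
    ApiFn target = DelayLoadHelper(0);
    return target(x);
}

// A call site: always an indirect call through the pseudo IAT slot.
int CallDouble(int x) { return g_pseudoIAT[0](x); }
```

The first call to CallDouble lands in the stub, which triggers the load and the lookup. Every later call goes straight through the patched slot to RealDouble, so the load and lookup counters stay at one.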