September 1996
Jeffrey Richter wrote Advanced Windows (Microsoft Press, 1995) and Windows 95: A Developer's Guide (M&T Books, 1995). Jeff is a consultant and teaches Win32-based programming seminars. He can be reached at v-jeffrr@microsoft.com.

I'd like to start off this column with an apology. There is a bug in the OPTEX code in my July column. Unfortunately, I noticed the bug after the article was sent to the printer, but I was able to correct the code before it was posted online. You can download the correct code from any of the places mentioned on page 5. I apologize for any inconvenience this may have caused.

The bug was in my original implementation of OPTEX_Enter. Specifically, the code only worked if you specified a timeout value of INFINITE. The call to InterlockedIncrement was not countered with a call to InterlockedDecrement in the case where the wait timed out. Unfortunately, I couldn't fix this without always jumping to kernel mode, so I removed the ability to specify a timeout value when calling OPTEX_Enter. However, I did add a new OPTEX_TryEnter function that is similar to the new TryEnterCriticalSection function available in Windows NT® 4.0. My OPTEX_TryEnter function works with Windows NT 4.0 only because it takes advantage of the new InterlockedCompareExchange API.

Q I'm designing an application that may or may not need to call functions contained in a DLL. Since I want to load the DLL only if I need to call a function in it, I obviously need to use explicit linking rather than implicit linking. However, I'd like to design my application so that it is smart. Figure 1 shows what I mean. In this stripped-down sample, I declare a global variable, g_pfnMessageBeep, which holds the address of the MessageBeep function. I initialize this variable to NULL so I will get an access violation when I try to call this function.
However, since the call to this function is inside a structured exception handling (SEH) frame, my exception filter function will be called. Inside the exception filter, I call LoadLibrary to load USER32.DLL (the module that contains the MessageBeep function) into my process's address space dynamically. Next, I call GetProcAddress to get the address of the MessageBeep function and save the address in the global g_pfnMessageBeep variable. Finally, the filter returns EXCEPTION_CONTINUE_EXECUTION so the thread will re-execute the call to the function. The call should succeed this time.

When I build and test the code shown, it does not work correctly. In fact, I get an infinite loop! It appears that even after I change g_pfnMessageBeep and return EXCEPTION_CONTINUE_EXECUTION, the thread still raises access violations. These access violations cause my exception filter to be called, which again returns EXCEPTION_CONTINUE_EXECUTION, and so on and so on. Can you explain what I am doing wrong? Is there a way to accomplish what I'm trying to do?

Jeremy Y.Y. Lai
Via the Internet

A First, let's go into what's happening. From your question, it appears that you have a good understanding of how SEH works. As you pointed out, when an exception filter returns EXCEPTION_CONTINUE_EXECUTION, the thread re-executes the failed CPU instruction. However, let's take a closer look at this CPU instruction.

Inside your __try block, you attempt to call a function using g_pfnMessageBeep. Since you initialize this variable to NULL, the thread tries to call a function at address 0x00000000. This means the thread's program counter (PC) is set to 0x00000000. After the PC is set, the CPU tries to read the instruction at address 0x00000000. This read is what causes the access violation, not setting the PC to 0x00000000. When the access violation occurs, the system changes your thread's PC to your exception filter and the global g_pfnMessageBeep variable is changed correctly. Your filter returns EXCEPTION_CONTINUE_EXECUTION, which causes the thread's PC to be set back to the same address, 0x00000000.
Again, the CPU is unable to read an instruction at this address and another access violation is raised. This explains why the function is never called and why the thread is in an infinite loop.

Think of it this way: if the PC were a salesman hawking giant hair dryers, and the person at address 0x00000000 were Jean-Luc Picard, the exception is raised when Picard answers the door, not when the salesman arrives at the doorstep. If, after Picard slams the door in the salesman's face (the exception), the salesman does not go to the correct door (Marge Simpson), the exception will be raised again when Picard answers the door a second time.

Let's fix this. Inside the exception filter, tell the system that you want the thread to continue execution from the MessageBeep function instead of address 0x00000000 and then return EXCEPTION_CONTINUE_EXECUTION. You do not have to continue execution from the same CPU instruction that raised the violation. To set the PC yourself, you must modify your exception filter so that you pass it the result of calling GetExceptionInformation.

I wrote some code that fixes your problem and adds some features (see Figure 2). Notice that the WinMain function (shown at the end of the listing) calls GetExceptionInformation when calling the DemandLoadDll_ExceptionFilter function. GetExceptionInformation returns a pointer to an EXCEPTION_POINTERS structure. The ContextRecord member points to a CONTEXT structure that contains a member for each register on your CPU. When a thread raises an exception, the system takes a snapshot of the CPU's registers and saves them in this structure. This way, when a filter returns EXCEPTION_CONTINUE_EXECUTION, the system can restore the CPU registers to their state when the exception was raised. This is necessary so the thread can continue executing successfully.
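The difference between the filter in the question and the corrected filter can be illustrated with a toy model in portable C. This is only a sketch, not the Figure 2 code: the "thread" is reduced to a saved program counter, and ToyContext, run_call, the two filter functions, and the address constants are all hypothetical names invented for the illustration.

```c
/* Toy model only: address 0 stands for the unreadable page at 0x00000000,
   address 1 for the (hypothetical) entry point of MessageBeep. */
enum { ADDR_NULL = 0, ADDR_MESSAGE_BEEP = 1 };
enum { CONTINUE_EXECUTION = -1 };

typedef struct { int pc; } ToyContext;            /* stand-in for CONTEXT */
typedef int (*ToyFilter)(ToyContext *ctx, int *fn_ptr);

/* Filter as in the question: repairs the pointer, leaves the saved PC alone. */
static int filter_pointer_only(ToyContext *ctx, int *fn_ptr)
{
    (void)ctx;
    *fn_ptr = ADDR_MESSAGE_BEEP;
    return CONTINUE_EXECUTION;
}

/* Filter as in the answer: repairs the pointer and redirects the saved PC. */
static int filter_pointer_and_pc(ToyContext *ctx, int *fn_ptr)
{
    *fn_ptr = ADDR_MESSAGE_BEEP;
    ctx->pc = *fn_ptr;
    return CONTINUE_EXECUTION;
}

/* Simulates a call through a NULL function pointer.
   Returns the number of faults taken before the PC reaches a valid address,
   or -1 if max_faults is reached (the "infinite loop" case, cut short). */
static int run_call(ToyFilter filter, int max_faults)
{
    int fn_ptr = ADDR_NULL;
    int faults = 0;
    ToyContext ctx;

    ctx.pc = fn_ptr;                 /* the call has already loaded the PC */
    while (ctx.pc == ADDR_NULL) {    /* fetching the instruction at 0 faults */
        if (faults == max_faults)
            return -1;
        faults++;
        if (filter(&ctx, &fn_ptr) != CONTINUE_EXECUTION)
            return -1;
        /* the system now restores the registers from ctx and resumes */
    }
    return faults;
}
```

In this model, run_call(filter_pointer_only, 1000) gives up after 1000 faults, while run_call(filter_pointer_and_pc, 1000) reaches the target after a single fault, which mirrors why the saved PC has to be changed and not just the global variable.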
A filter can examine this CONTEXT structure to see the exact register values when the violation was raised. Avoid referring to the members of this structure directly if possible, because CPU registers have different names on different CPU platforms. If you reference a member in this structure, you're writing CPU-dependent code that will require modifications if you build your application on other CPU platforms. However, if you don't care about CPU independence, not only can you examine the members in this structure but you can also change them. If you change a member, the thread will restart when the filter returns EXCEPTION_CONTINUE_EXECUTION, but the registers will have the modified values.

So, to solve your problem, you need to change the program counter member in the structure to g_pfnMessageBeep before returning EXCEPTION_CONTINUE_EXECUTION from the filter. When you do this, the thread will continue its execution from the MessageBeep function rather than from address 0x00000000.

In Figure 2, you'll see the solution to CPU dependence. All I do is create a PROGCTR macro that abstracts the program counter register on the different CPU platforms supported by Windows NT. When Windows NT is ported to another CPU platform, the source code will only require a tiny change to make it work.

Q I am writing an ISAPI DLL that creates several worker threads when my DllMain function receives a DLL_PROCESS_ATTACH notification. These worker threads run in the background as long as my ISAPI DLL is loaded. When my ISAPI DLL is unloaded, I need to terminate my threads gracefully or the code executed by these threads will just disappear and access violations will be raised. I use a manual-reset event kernel object to signal the worker threads to terminate. Currently, I call SetEvent when my DllMain function receives a DLL_PROCESS_DETACH notification, then I call WaitForMultipleObjects, passing in all my worker thread handles.
However, the call to WaitForMultipleObjects never returns, which seems strange to me because my worker threads do seem to see the signaled event and terminate. What is causing this deadlock in my DLL and how can I prevent it?

Lucy Gooding
Via the Internet

A First, let's just remind everybody that ISAPI is part of the Microsoft® Internet Information Server (IIS). This question is a variation of the problem I discussed in my December 1994 column. In that column I discussed how the system serializes all calls to DllMain functions in a process. This means that, when your DllMain receives the DLL_PROCESS_DETACH notification, no other threads can execute code in any other DllMain functions, including yours. So, when you call SetEvent, the worker threads are trying to terminate but they can't completely terminate until every DLL's DllMain function receives a DLL_THREAD_DETACH notification. Since the worker threads can't terminate, your call to WaitForMultipleObjects never returns and you have deadlocked the threads.

The essence of your problem is that you need to terminate the worker threads just before your DLL gets a DLL_PROCESS_DETACH notification. Here is what I propose: create your DLL as usual but modify your DLL_PROCESS_ATTACH processing so it increments the usage count of your DLL (see Figure 3). I do this by calling the IncrementLibraryUsageCount function (implemented inside my DllWork.c file). Because the usage count has been incremented, the DLL won't be unloaded when IIS calls FreeLibrary. This stops the problem of your DLL code going away while your threads keep running. (It introduces the problem that your DLL never gets unloaded, but I'll solve that problem in a moment.)

Export an additional function, called ShutdownLibrary, from the DLL. This function will simply call SetEvent to signal your event object to terminate the worker threads and then return. Just before your thread functions return, place a call to FreeLibraryAndExitThread.
This counters your call to IncrementLibraryUsageCount. This way, as your worker threads terminate, each one will decrement the usage count on the DLL. Eventually, one of the worker threads will decrement the usage count to 0 and the DLL will be unloaded from the IIS address space. Since FreeLibraryAndExitThread also terminates the thread, you don't have to worry about the thread continuing to run after the DLL's code has been unloaded. Finally, remove any calls to WaitForMultipleObjects from your DllMain's DLL_PROCESS_DETACH processing so this code executes only when all of the worker threads have terminated and the DLL is really being unloaded.

Now that I moved the shutdown code from the DLL_PROCESS_DETACH processing to the ShutdownLibrary function, I'm sure you're wondering how ShutdownLibrary will be called since IIS doesn't know anything about it. The answer lies in another DLL; you must create a very small stub DLL like the one shown in Figure 4. This stub DLL needs only a DllMain function that processes DLL_PROCESS_DETACH notifications. When it receives this notification, you'll want to call the ShutdownLibrary function contained in your main DLL.

You make this work by telling IIS that the stub DLL, not your main DLL, is your ISAPI DLL. When IIS calls LoadLibrary to load the stub DLL, the OS loader automatically loads your worker DLL because the stub DLL implicitly links to it by calling ShutdownLibrary. When IIS calls FreeLibrary, passing the handle of the stub DLL, the stub DLL will get a DLL_PROCESS_DETACH notification and call the worker DLL's ShutdownLibrary function to set the event. At this point, the worker threads will begin terminating. However, they won't be able to enter any DllMain functions until the thread in the stub's DllMain returns. This is OK. In fact, the stub DLL will probably get unloaded from the process's address space almost immediately, but the worker DLL will stay in memory until all of the worker threads have terminated completely.
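The usage-count bookkeeping described above can be sketched as a single-threaded toy model in portable C. It is not the Figure 3 code and makes no Win32 calls. ToyDll, toy_load, toy_free, and toy_host_load are hypothetical names, and the model assumes one extra reference per worker thread, which is one reading of how each thread's FreeLibraryAndExitThread call balances IncrementLibraryUsageCount.

```c
#include <stdbool.h>

/* Toy model of a DLL's usage count; not the loader's real bookkeeping. */
typedef struct {
    int  usage_count;   /* raised by "LoadLibrary", lowered by "FreeLibrary" */
    bool loaded;        /* becomes false when the count drops to 0 */
} ToyDll;

/* Models LoadLibrary (or IncrementLibraryUsageCount): add one reference. */
static void toy_load(ToyDll *dll)
{
    dll->usage_count++;
    dll->loaded = true;
}

/* Models FreeLibrary (or FreeLibraryAndExitThread): drop one reference and
   unload the module when the last reference goes away. */
static void toy_free(ToyDll *dll)
{
    if (dll->loaded && --dll->usage_count == 0)
        dll->loaded = false;
}

/* Models the host loading the DLL, followed by the attach-time processing
   that takes one extra reference per worker thread (an assumption). */
static void toy_host_load(ToyDll *dll, int workers)
{
    int i;

    toy_load(dll);                   /* the host's own reference */
    for (i = 0; i < workers; i++)
        toy_load(dll);               /* one reference per worker */
}
```

With three workers, the host's toy_free leaves the module loaded, and only the third worker's toy_free brings the count to 0 and unloads it, which is the ordering the scheme relies on.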
There is one last problem. Since IIS loads the stub DLL instead of the worker DLL, it will try to call functions that are in the stub DLL. At first the solution seemed obvious: put stub functions in the stub DLL and let them call the real functions in the worker DLL. I hated this solution because it meant more work, but then I remembered a little feature about linking: function forwarders.

A function forwarder is an entry in a DLL's export table that redirects a function call to another function in another DLL. For example, if you run the Visual C++® DumpBin utility on the Windows NT Kernel32.dll, you'll see a part of the output that looks like this:

This output shows four forwarded functions. Whenever your application calls HeapAlloc, HeapFree, HeapReAlloc, or HeapSize, your executable is dynamically linked with Kernel32.dll. When you invoke your executable, the loader loads Kernel32.dll and sees that there are forwarded functions that are actually contained inside NTDLL.dll, so the loader also loads the NTDLL.dll module. When your executable calls HeapAlloc, it is actually calling the RtlAllocateHeap function inside NTDLL.dll. The HeapAlloc function does not actually exist anywhere in the system! If you call GetProcAddress(GetModuleHandle("Kernel32"), "HeapAlloc"), GetProcAddress looks in Kernel32's export table, sees that HeapAlloc is a forwarded function, and calls GetProcAddress recursively looking for RtlAllocateHeap inside NTDLL.dll's export table.

Because of the way function forwarders work, all I have to do is place function forwarders inside my stub DLL. The easiest way to do this is using a pragma directive as shown at the top of Figure 4. See Figure 5 for a diagram showing how forwarding works. This pragma tells the linker that the stub DLL should export a function called SomeFunc, but that the actual implementation for the function is in a function SomeFunc contained in the DllWork.dll. You'll have to have one pragma line for each function exported by your worker DLL for IIS to call your functions correctly.
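The recursive lookup through a forwarder can be sketched with a toy export table in portable C. This models the idea only and is not how the loader or GetProcAddress is actually implemented. ToyExport, ToyModule, toy_get_proc_address, and the NTDLL addresses are hypothetical, and module names are matched exactly here for simplicity.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;        /* exported function name */
    const char *forward_to;  /* "MODULE.Function" for a forwarder, else NULL */
    int         address;     /* toy address, used only when not forwarded */
} ToyExport;

typedef struct {
    const char      *module;
    const ToyExport *exports;
    size_t           count;
} ToyModule;

static const ToyExport ntdll_exports[] = {
    { "RtlAllocateHeap", NULL, 0x1000 },   /* hypothetical addresses */
    { "RtlFreeHeap",     NULL, 0x2000 },
};

static const ToyExport kernel32_exports[] = {
    { "HeapAlloc",  "NTDLL.RtlAllocateHeap", 0 },
    { "HeapCreate", NULL,                    0x126EF },
    { "HeapFree",   "NTDLL.RtlFreeHeap",     0 },
};

static const ToyModule toy_modules[] = {
    { "NTDLL",    ntdll_exports,    2 },
    { "KERNEL32", kernel32_exports, 3 },
};

/* Returns the toy address of module!name, following forwarders recursively.
   Returns 0 if the module or the function cannot be found. */
static int toy_get_proc_address(const char *module, const char *name)
{
    size_t m, e;

    for (m = 0; m < sizeof toy_modules / sizeof toy_modules[0]; m++) {
        if (strcmp(toy_modules[m].module, module) != 0)
            continue;
        for (e = 0; e < toy_modules[m].count; e++) {
            const ToyExport *x = &toy_modules[m].exports[e];
            const char *dot;
            char target[64];
            size_t len;

            if (strcmp(x->name, name) != 0)
                continue;
            if (x->forward_to == NULL)
                return x->address;          /* ordinary export */

            /* Forwarder: split "MODULE.Function" and look it up again. */
            dot = strchr(x->forward_to, '.');
            if (dot == NULL)
                return 0;
            len = (size_t)(dot - x->forward_to);
            if (len >= sizeof target)
                return 0;
            memcpy(target, x->forward_to, len);
            target[len] = '\0';
            return toy_get_proc_address(target, dot + 1);
        }
    }
    return 0;
}
```

In this model, looking up HeapAlloc in KERNEL32 yields the toy address of RtlAllocateHeap in NTDLL, while HeapCreate resolves directly from KERNEL32's own table.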
Figure 5 Function Forwarding in an ISAPI DLL

Have a question about programming in Win32? Send it to Jeffrey Richter at v-jeffrr@microsoft.com.
typedef struct _EXCEPTION_POINTERS {
    PEXCEPTION_RECORD ExceptionRecord;
    PCONTEXT ContextRecord;
} EXCEPTION_POINTERS;
C:\winnt\system32>DumpBin -Exports Kernel32.dll
(some output omitted)
    360  167  HeapAlloc (forwarded to NTDLL.RtlAllocateHeap)
    361  168  HeapCompact (000128D9)
    362  169  HeapCreate (000126EF)
    363  16A  HeapCreateTagsW (0001279E)
    364  16B  HeapDestroy (00012750)
    365  16C  HeapExtend (00012773)
    366  16D  HeapFree (forwarded to NTDLL.RtlFreeHeap)
    367  16E  HeapLock (000128ED)
    368  16F  HeapQueryTagW (000127B8)
    369  170  HeapReAlloc (forwarded to NTDLL.RtlReAllocateHeap)
    370  171  HeapSize (forwarded to NTDLL.RtlSizeHeap)
(remainder of output omitted)
GetProcAddress(GetModuleHandle("Kernel32"),
"HeapAlloc");
// Function forwarders to functions in DllWork
#pragma comment(linker, "/export:SomeFunc=DllWork.SomeFunc")