Win32 Q & A

July 1996

Jeffrey Richter wrote Advanced Windows (Microsoft Press, 1995) and Windows 95: A Developer's Guide (M&T Books, 1995). Jeff is a consultant and teaches Win32-based programming seminars. He can be reached at v-jeffrr@microsoft.com.

Q In your March 1996 column, you wrote about the problems associated with using the TerminateThread function to kill a thread. You specifically mentioned the ill effects this could have during a call to malloc because of a critical section malloc uses to serialize access to the heap.

I noticed that critical sections were not implemented properly in Windows® 95. Specifically, the abandonment of a critical section under Windows 95 does not cause all other threads contending for the object to block indefinitely. This behavior is not implied by the Win32® documentation, and Windows NT® certainly doesn't behave this way.

This could have profound effects on the malloc case that you described. Imagine that a thread is killed when malloc has serialized access to the heap by grabbing the aforementioned critical section object. There is no telling what weird state the heap manager might be in. However, under Windows 95, future calls to malloc and other heap-manipulating functions will inherit the manager in this weird state and chaos may ensue.

The worst part is that, unlike a mutex, the next receiver of the critical section has no way of testing the object for abandonment. I tried digging around in the _RTL_CRITICAL_SECTION structure at run time for some clues about this behavior. I thought about writing a thin wrapper for EnterCriticalSection that watches and validates the owner member of this structure before actually trying to grab the critical section object. The effect would be similar, I assumed, to the new TryEnterCriticalSection function to be introduced with the next version of Windows NT. Unfortunately, the structure seems to be used actively by the OS under Windows NT but not under Windows 95.

Please let me know how Microsoft intends to deal with this problem in the future and how I might work around it until a fix is available. Critical sections are great for speed when there is no contention, but I'm not sure I feel safe using them under Windows 95.

I'm using a mutex until I hear from you.

Kevin Hazzard

Via the Internet

A When I first received this mail, I just couldn't believe that the implementation of critical sections in Windows 95 had this bug. Then I ran into Kevin at the Microsoft Professional Developer's Conference, where he convinced me to delve deeper into the situation and gave me the sample program shown in Figure 1. After compiling and testing Kevin's sample on both Windows 95 and Windows NT, I saw that the two operating systems did in fact behave differently.

At that point, I was sure there was a bug in the implementation of critical sections in Windows 95 because I felt (as Kevin did) that an abandoned critical section should stay abandoned, preventing the potentially corrupted data (guarded by the critical section) from becoming even more corrupted. In fact, I asked the Microsoft® Windows 95 team about this bug. They told me that this "feature" was intended to be in the operating system. Specifically, code in VWIN32 unblocks any thread waiting for a critical section owned by a terminating thread. The Windows 95 team considers this a feature because, "as with many design decisions in Windows 95, it was deemed more important for users to be able to save their work instead of having an application hang." Since it is a feature, the Windows 95 team has no plans to alter this behavior.

In light of this news, a mutex does seem to be the best solution: your application's thread can get notifications of abandonment and react accordingly. However, mutexes are not as lightweight as critical sections, which brings us to a comparison of mutexes and critical sections.

As you should know, critical sections and mutexes behave almost identically. However, mutexes have a few advantages over critical sections: mutexes can synchronize threads across process boundaries, you can wait on a mutex by specifying a timeout value, and mutexes notify a thread when they are abandoned. This is a nice list of mutex features that critical sections don't share. Why use a critical section instead of a mutex? There is only one answer: critical sections are faster. Mutex objects are kernel objects and as such the functions that manipulate them (WaitForSingleObject and ReleaseMutex) require the transition from user mode to kernel mode. This transition is on the order of 600 CPU instructions (on x86 processors).

Critical sections are not kernel objects and the implementations of EnterCriticalSection and LeaveCriticalSection exist almost entirely in user mode so the CPU does not transition to kernel mode. Calling these functions executes approximately 9 CPU instructions (on x86 processors). For threads making repeated calls to malloc and free, the performance hit from using kernel objects (like mutexes) versus critical sections can be quite noticeable and is certainly not desirable.

To be fair, critical sections do not always execute entirely in user mode. As long as no thread attempts to acquire the critical section while another thread owns it, EnterCriticalSection and LeaveCriticalSection execute entirely in user mode, as I mentioned. If a thread does attempt to enter the critical section while another thread owns it, the critical section degrades to a kernel object and the thread pays the roughly 600-instruction transition to kernel mode. In most applications, though, it is rare for two (or more) threads to contend for a critical section simultaneously, so critical sections remain very useful.

OptEx.h and OptEx.c show my OPTEX (optimized mutex) API library (see Figure 2). This library shows how critical sections could be implemented in Win32. After understanding this code you should be able to see why critical sections are faster than mutexes.

The library consists of a single data structure called OPTEX and four functions, all prefixed with "OPTEX_". My library works exactly like the CRITICAL_SECTION data structure and the four functions that operate on it. In fact, if you want to use my library functions, you should be able to replace the critical section functions with calls to my functions by performing a global search and replace throughout your existing code.

To use the library, you'll have to replace your CRITICAL_SECTION data structure with the OPTEX structure:

typedef struct {
   LONG   lLockCount;    // # times OPTEX entered
   DWORD  dwThreadId;    // unique ID of thread owning OPTEX
   LONG   lRecurseCount; // # times OPTEX owned by thread
   HANDLE hEvent;        // handle to event kernel object
} OPTEX, *POPTEX;

This structure contains four members that your application should consider to be opaque or "off-limits" just like the members inside the CRITICAL_SECTION data structure.

After creating an OPTEX structure, you'll want to initialize it by calling

 BOOL OPTEX_Initialize (POPTEX poptex);

This function works just like the InitializeCriticalSection function in that it initializes the members of the OPTEX structure. However, OPTEX_Initialize returns a Boolean value, indicating failure if the event kernel object cannot be created. (By the way, the Win32 InitializeCriticalSection function can also fail, but since it is prototyped as returning VOID, an application cannot detect when the function fails.)

When you know that no threads are entering or leaving the OPTEX, you should delete it by calling

 VOID OPTEX_Delete (POPTEX poptex);

This function works just like its DeleteCriticalSection counterpart. Notice that the function does not check to see if the OPTEX is currently owned by a thread. It's up to you to call this function at the correct time.

To enter an OPTEX, your code calls its equivalent of EnterCriticalSection:

 DWORD OPTEX_Enter (POPTEX poptex, DWORD dwTimeout);

You should notice some big differences between EnterCriticalSection and OPTEX_Enter. First, OPTEX_Enter has a second parameter, dwTimeout. This parameter gives you an advantage over critical sections: the ability to time out if the OPTEX is owned by another thread. For this parameter you can pass zero (no timeout period, so the call does not wait), a time in milliseconds, or INFINITE. The second difference is that OPTEX_Enter returns a DWORD indicating why the calling thread is allowed to continue execution. The possible return values are listed in the accompanying figure. Unfortunately, it is not possible for a thread to know when an OPTEX is abandoned, because only kernel-mode code can detect that a thread has terminated and signal a kernel object. Since an OPTEX is not a kernel object, there is no way to detect abandonment and return WAIT_ABANDONED to a waiting thread.

Finally, to leave an OPTEX, you call

 VOID OPTEX_Leave (POPTEX poptex);

Like LeaveCriticalSection, this function decrements the calling thread's ownership count on the OPTEX; if the thread no longer owns the OPTEX, a thread that is waiting for it can become its new owner. Like OPTEX_Delete, this function performs no checking of its own: it does not verify that the calling thread actually owns the OPTEX before decrementing its ownership count.
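The bookkeeping behind entering and leaving can be sketched in portable C. This is not the Figure 2 source; it is a minimal illustration that assumes the lock count tracks the owner's entries plus any waiting threads. C11 atomic_fetch_add and atomic_fetch_sub stand in for an interlocked increment and decrement, thread IDs are passed in as parameters so the logic can be exercised from one thread, and the event object is left out: the functions only report when a real implementation would wait on it or signal it. All names (SketchLock, sketch_enter, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Portable stand-in for the OPTEX members (event handle omitted). */
typedef struct {
    atomic_long  lock_count;    /* owner's entries plus waiting threads */
    atomic_ulong owner_id;      /* 0 means "not owned" */
    long         recurse_count; /* nested entries; touched by owner only */
} SketchLock;

typedef enum { ENTER_FAST, ENTER_RECURSIVE, ENTER_MUST_WAIT } EnterResult;

void sketch_init(SketchLock *lock)
{
    atomic_init(&lock->lock_count, 0);
    atomic_init(&lock->owner_id, 0);
    lock->recurse_count = 0;
}

EnterResult sketch_enter(SketchLock *lock, unsigned long thread_id)
{
    assert(thread_id != 0);
    /* atomic_fetch_add returns the old value; 0 means nobody held it. */
    if (atomic_fetch_add(&lock->lock_count, 1) == 0) {
        atomic_store(&lock->owner_id, thread_id);
        lock->recurse_count = 1;
        return ENTER_FAST;          /* no kernel transition needed */
    }
    if (atomic_load(&lock->owner_id) == thread_id) {
        lock->recurse_count++;
        return ENTER_RECURSIVE;     /* owner entering again */
    }
    /* Contended: a real implementation would now block on the event. */
    return ENTER_MUST_WAIT;
}

/* Called by a waiter after the event has released it. */
void sketch_take_ownership(SketchLock *lock, unsigned long thread_id)
{
    atomic_store(&lock->owner_id, thread_id);
    lock->recurse_count = 1;
}

/* Returns 1 when a waiting thread must be woken (the event would be set). */
int sketch_leave(SketchLock *lock)
{
    if (--lock->recurse_count > 0) {
        atomic_fetch_sub(&lock->lock_count, 1);
        return 0;                   /* still owned by the same thread */
    }
    atomic_store(&lock->owner_id, 0);
    /* Old value > 1 means at least one other thread is counted as waiting. */
    return atomic_fetch_sub(&lock->lock_count, 1) > 1;
}
```

The uncontended path touches only memory, which is the source of the speed advantage over a mutex. Only ENTER_MUST_WAIT and a nonzero return from sketch_leave would involve the kernel.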

There is one additional feature that could be added to the OPTEX library, but I left it out of this first version: the ability for threads in different processes to synchronize with each other using the OPTEX. It wouldn't be too difficult to add this feature. You'd have to separate the OPTEX into two parts: a shared part (which contains the thread ID and the two count members) and a private part (which contains a process-relative handle to the event kernel object and a pointer to the shared part). You would use a memory-mapped file for the shared part, so you'd also have to keep the file mapping's process-relative handle inside the private part. However, I thought that adding this support would add too much confusion to this example. If I get enough responses from people who really want to share an OPTEX across process boundaries, I will add this feature in a future column.

Q I noticed that some applications change the screen resolution under Windows 95. For example, the Hover game that ships on the Windows 95 CD-ROM has a full-screen mode that switches the display to 640 × 480. When you switch to another application, the screen resolution switches back to the user's default resolution automatically. How can I add this support to my own applications?

Vivian Yuan

Via the Internet

A The Win32 API has some new functions that allow you to work with screen resolutions. The first function is

 BOOL EnumDisplaySettings(LPCTSTR lpszDeviceName, 
                         DWORD iModeNum, 
                         LPDEVMODE lpDevMode);

This function enumerates all of the possible display settings for a given display. The first parameter, lpszDeviceName, indicates the display for which you want to enumerate settings. For now, you must pass NULL, but Microsoft is hard at work adding multiple-display support to Windows. In the future you'll be able to pass a string like "\\.\DisplayX", where X can have the values 1, 2, or 3.

Each display has a collection of settings that it supports. The iModeNum parameter indicates the collection entry that you want to obtain (the first setting is index 0). EnumDisplaySettings returns TRUE unless you pass an index in iModeNum that is outside the collection, in which case it returns FALSE. The display's setting information is returned in the DEVMODE structure pointed to by the lpDevMode parameter. DEVMODE has many members, but only five of them have anything to do with display settings (see Figure 4).
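The index-until-failure contract lends itself to a simple loop. The sketch below shows that pattern in portable C, under the assumption that you want the matching mode with the highest refresh rate. To keep it compilable without windows.h, the enumerator is passed in as a function pointer and the Mode structure is a hypothetical stand-in for the display-related DEVMODE members. On Windows, the enumerator would wrap a call such as EnumDisplaySettings(NULL, index, &dm) and copy the fields across. The mode table is made up for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the display-related DEVMODE members. */
typedef struct {
    unsigned width;   /* plays the role of dmPelsWidth */
    unsigned height;  /* dmPelsHeight */
    unsigned bpp;     /* dmBitsPerPel */
    unsigned hz;      /* dmDisplayFrequency */
} Mode;

/* Same contract as described above: nonzero while the index is valid. */
typedef int (*EnumModeFn)(unsigned index, Mode *out);

/* Walks indexes 0, 1, 2, ... until the enumerator reports failure.
   Returns the index of the matching mode with the highest refresh
   rate, or -1 if no mode has the requested width, height, and depth. */
int find_mode(EnumModeFn enum_mode, unsigned width, unsigned height,
              unsigned bpp, Mode *found)
{
    int best = -1;
    Mode m;

    assert(enum_mode != NULL && found != NULL);
    for (unsigned i = 0; enum_mode(i, &m); i++) {
        if (m.width != width || m.height != height || m.bpp != bpp)
            continue;
        if (best < 0 || m.hz > found->hz) {
            *found = m;
            best = (int)i;
        }
    }
    return best;
}

/* Made-up mode table so the sketch runs anywhere. */
static const Mode demo_modes[] = {
    { 640, 480, 8, 60 },  { 640, 480, 8, 75 },
    { 800, 600, 8, 60 },  { 1024, 768, 16, 70 },
};

int demo_enum(unsigned index, Mode *out)
{
    if (index >= sizeof demo_modes / sizeof demo_modes[0])
        return 0;             /* past the end of the collection */
    *out = demo_modes[index];
    return 1;
}
```

Calling find_mode(demo_enum, 640, 480, 8, &m) walks all four entries and settles on the 75 Hz one.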

OK, so that's how you get the settings supported by your display. To change a display's settings, you'll need to create a DEVMODE structure, initialize the members that pertain to the display, and call ChangeDisplaySettings.

 LONG ChangeDisplaySettings(LPDEVMODE lpDevMode, 
                           DWORD dwflags);

The first parameter is the address of the initialized DEVMODE structure. The second parameter is one of the flags shown in Figure 5. Possible return values for ChangeDisplaySettings are shown in Figure 6. If the function returns DISP_CHANGE_SUCCESSFUL, a WM_DISPLAYCHANGE message is broadcast to all top-level windows indicating the new bits-per-pixel, width, and height of the display. Finally, to get the current display settings, you'll use a Win32 function that's been around for years and years: GetDeviceCaps.

The following example shows how to get the current display settings:

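DEVMODE dvmdOrig;
HDC hdc = GetDC(NULL);  // Screen DC used to get current display settings

// Zero the structure and fill in dmSize and dmFields so that it can
// be passed to ChangeDisplaySettings later to restore these values.
ZeroMemory(&dvmdOrig, sizeof(dvmdOrig));
dvmdOrig.dmSize   = sizeof(dvmdOrig);
dvmdOrig.dmFields = DM_PELSWIDTH | DM_PELSHEIGHT |
                    DM_BITSPERPEL | DM_DISPLAYFREQUENCY;

dvmdOrig.dmPelsWidth        = GetDeviceCaps(hdc, HORZRES);
dvmdOrig.dmPelsHeight       = GetDeviceCaps(hdc, VERTRES);
dvmdOrig.dmBitsPerPel       = GetDeviceCaps(hdc, BITSPIXEL);
dvmdOrig.dmDisplayFrequency = GetDeviceCaps(hdc, VREFRESH);
ReleaseDC(NULL, hdc);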
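A window that cares about the WM_DISPLAYCHANGE broadcast can recover the same three values from the message parameters: wParam carries the new bits-per-pixel, and lParam packs the width in its low word and the height in its high word. The portable sketch below shows that unpacking with plain integer types standing in for WPARAM and LPARAM. The type and function names are hypothetical, and pack_size exists only to build a test value the way MAKELPARAM would.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned bits_per_pixel;
    unsigned width;
    unsigned height;
} DisplayChange;

/* wparam and lparam mirror the WPARAM and LPARAM of WM_DISPLAYCHANGE. */
DisplayChange unpack_display_change(uintptr_t wparam, intptr_t lparam)
{
    DisplayChange dc;
    dc.bits_per_pixel = (unsigned)wparam;
    dc.width  = (unsigned)((uintptr_t)lparam & 0xFFFFu);         /* LOWORD */
    dc.height = (unsigned)(((uintptr_t)lparam >> 16) & 0xFFFFu); /* HIWORD */
    return dc;
}

/* Packs a width and height into one value: low word, then high word. */
intptr_t pack_size(unsigned width, unsigned height)
{
    assert(width <= 0xFFFFu && height <= 0xFFFFu);
    return (intptr_t)(((uintptr_t)height << 16) | (uintptr_t)width);
}
```

In a real window procedure the same arithmetic is normally written with the LOWORD and HIWORD macros.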

To demonstrate the ChangeDisplaySettings function, I wrote the ChgResAndRun application (see Figure 7). This is a small, useful utility that changes the display's settings and spawns another application. It waits for the child process to terminate, then changes the resolution back to its original settings. I use this application myself all the time when playing games. For example, I usually run my machine in 1024 × 768 resolution, but when I want to play "You Don't Know Jack" I switch my display to 640 × 480 mode. When I'm finished, I want the display settings to reset to 1024 × 768.

To switch resolution and run my game, I created a shortcut with the following command line:

 "C:\Program Files\ChgResAndRun.exe" 640 480 0 0
                                     =C:\YDKJ\YDKJ32.EXE

ChgResAndRun requires five command line arguments. The first two ("640" and "480") indicate the requested width and height of the display. The third argument indicates the bits-per-pixel, and the fourth argument indicates the refresh-frequency rate. If you pass a zero for any of the arguments, that particular setting is not changed. In the command line shown above, I pass zero for both the bits-per-pixel and the refresh frequency so these settings will not be affected.

After the four display setting arguments, you must have an equal sign followed by the command line that you want to execute. In my example, YDKJ32.EXE is invoked after the display settings are changed. While I'm playing, ChgResAndRun lingers in the background. When I quit the game, ChgResAndRun changes the display settings back to the original values and terminates.
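The argument format is easy to parse by hand. The sketch below is not the Figure 7 source. It is one possible way to read the four numbers and the command that follows the equal sign, written in portable C with hypothetical names (ChgResArgs, parse_chgres_args, pick_setting). It assumes the arguments arrive as a single string with the program name already removed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned long width, height, bpp, hz; /* 0 = leave setting unchanged */
    const char *child_cmd;                /* points into the input string */
} ChgResArgs;

static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Parses "<width> <height> <bpp> <hz> =<command line>".
   Returns 1 on success, 0 if the text does not have that shape. */
int parse_chgres_args(const char *args, ChgResArgs *out)
{
    unsigned long *fields[4];
    const char *p = args;
    char *end;
    int i;

    assert(args != NULL && out != NULL);
    fields[0] = &out->width;
    fields[1] = &out->height;
    fields[2] = &out->bpp;
    fields[3] = &out->hz;

    for (i = 0; i < 4; i++) {
        p = skip_blanks(p);
        if (*p < '0' || *p > '9')
            return 0;         /* missing number, sign, or early '=' */
        *fields[i] = strtoul(p, &end, 10);
        p = end;
    }
    p = skip_blanks(p);
    if (*p != '=' || p[1] == '\0')
        return 0;             /* no "=command" part */
    out->child_cmd = p + 1;
    return 1;
}

/* Zero means "keep the current value", per the rule described above. */
unsigned long pick_setting(unsigned long requested, unsigned long current)
{
    return requested != 0 ? requested : current;
}
```

With the shortcut shown earlier, the parser yields 640, 480, 0, and 0, and pick_setting falls back to the current bits-per-pixel and refresh frequency for the two zeros.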

Have a question about programming in Win32? You can mail it directly to Win32 Q&A, Microsoft Systems Journal, 825 Eighth Avenue, 18th Floor, New York, New York 10019, or send it to MSJ (re: Win32 Q&A) via:

Internet:

Jeffrey Richter
v-jeffrr@microsoft.com

From the July 1996 issue of Microsoft Systems Journal.