Traps

The following are all real problems that were encountered by virtual device developers:

Trap: Forgetting to free memory associated with a thread data slot when a thread terminates. When a thread terminates, don't forget to free the contents of the thread data slot while you still can; otherwise, the memory will be leaked.

Trap: Scheduling too many events. The VMM has only a limited amount of space to record events. The space does not grow until the next time the VMM processes events with the critical section free. Scheduling thousands of events in rapid succession will crash the machine. Coalesce duplicate events to reduce the demand on the event heap.

Trap: Allocating too many list elements. As with events, the VMM has a limited amount of space for allocating list elements (unless the list uses the heap, in which case list elements come from the heap and this remark does not apply). Also, as with events, the space for list elements grows only the next time the VMM processes events with the critical section free. Allocating thousands of list elements in rapid succession will crash the machine. If you need thousands of list elements, you should reconsider your data structure.

Trap: Forgetting to pass a priority value in the EAX register to Call_Restricted_Event or Call_Priority_VM_Event. The result is that the thread or virtual machine gets boosted by a random amount, and the system seems to freeze for extended periods of time. (The debugging version of Windows 95 reports this problem.)

Trap: Calling the registry at unrestricted event time. Unless you set the PEF_WAIT_NOT_NEST_EXEC restriction when scheduling the event, it is possible that the event may be dispatched on a thread that is already in the registry. This deadlocks the system. (The debugging version of Windows 95 reports this problem.)

Trap: Leaving events outstanding when you unload. Dynamically-loaded device drivers must remove all hooks and cancel all events, timeouts, and callbacks before they unload. Otherwise, the hook, event, timeout, or callback causes a jump to an invalid location when the appropriate condition is met. If you need to leave a hook or callback in place (for example, because the unhook failed, or there is no way to cancel the callback), put the hook or callback in a static code segment so that it remains loaded even after your VxD unloads. Of course, the stub in the static code segment shouldn't do anything unless the rest of the VxD is already loaded. (The debugging version of Windows 95 reports many cases of this problem, but not all of them.)

Trap: Looping without blocking. Remember that going into a Resume_Exec loop does not actually block the current thread; it merely processes events. If the current thread is the highest-priority thread, nothing happens until the loop ends. For additional discussion of this problem, see Events.

Trap: Assuming registers don't change across a call. This typically happens when calling a service whose name begins with an underscore (which in most cases indicates that the service uses the C calling and register convention) and assuming that the EAX, ECX, or EDX registers are be preserved across the call. (The debugging version of Windows 95 often intentionally modifies those registers in an attempt to catch code that relies on this non-feature.)

Trap: Assuming stack parameters don't change across a call. The C calling convention permits the called function to modify the input parameters, so don't rely on their values being preserved across a call. The following example is wrong:


push    0              ; flags
push    nPages         ; number of pages to lock
push    page           ; first page to lock
VMMCall _LinPageLock   ; Lock them
; do stuff 
VMMCall _LinPageUnlock ; Unlock them
add     esp, 12

The LinPageLock service is allowed to damage the top three double-words on the stack, resulting in garbage being passed to LinPageUnlock. (The debugging version of Windows 95 often intentionally modifies input parameters, in an attempt to catch code that relies on this non-feature.)

Trap: Confusing VMSTAT_PM_Exec with VMSTAT_PM_Use32Mask. When deciding whether to use 16-bit or 32-bit offsets from the client, the rule is that you should use a 32-bit offset if the VMSTAT_PM_Exec bit is set, and if one of the VMSTAT_PM_Use32Mask bits is set. If a VxD checks only the VMSTAT_PM_Use32Mask bits, it may end up using 32-bit offsets even though the virtual machine is not in protected mode.

Trap: Leaving files open. If a virtual device opens files, it should close them before giving control back to the virtual machine. Since open files are tracked on a per-VM and per-PSP basis, the application on which you opened the file might create another file or might exit, causing the PSP to change, at which point the file handle becomes invalid. Also, having files open causes the virtual IFS Manager device to have problems when it attempts to transition to the protected-mode file system.

Trap: Changing the DS or ES selectors. The CS, DS, ES, and SS selectors must remain as flat selectors at all times. Do not load any other values into those registers, even temporarily. Doing so will crash the system.

Trap: Setting the direction flag. The direction flag must remain clear (up) whenever control passes to the VMM or another virtual device. You can set the direction flag to the 'down' state, but you must clear the flag before yielding control. (The debugging version of Windows 95 attempts to catch this error.)

Trap: Cancelling events the wrong way. If an event is scheduled as a VM Event, it must be cancelled with the Cancel_VM_Event service. Similarly, global events must be cancelled with Cancel_Global_Event, thread events must be cancelled with Cancel_Thread_Event, and restricted events must be cancelled with Cancel_Restricted_Events. Failing to observe these rules results in corrupted memory. (The debugging version of Windows 95 attempts to catch this error.)

Trap: Race conditions between timeouts, events, and hardware interrupts. VxD's often initiate an operation and then schedule a timeout that cancels the operation if it takes too long. A race condition can occur if the timeout is dispatched just as the operation completes. This is particularly true if the user is running high-speed communications software at the same time, because communication interrupts will be streaming in, causing the VxD to schedule and cancel many timeouts. If the VxD cancels a timeout after it has been dispatched, and the system has already recycled the handle and given it to another VxD, . the first VxD would end up cancelling the other VxD's timeout by mistake.

Trap: Assuming that MapPhysToLinear(N, 1, 0) = MapPhysToLinear(0, 1, 0) + N. This was never guaranteed, but it happened to be true under Windows 3.1 by accident. This is not true under Windows 95.

Trap: Passing the wrong number of arguments to a service. This happens most often with the HeapFree service because the VxD writer forgets that there is a second parameter of flags (which must be zero). In general, VxD writers tend to forget to pass flags to services that accept a bitmask of flags as the last parameter. This results in the service picking up stack garbage as the flags parameter. (The debugging version of Windows 95 attempts to catch this error.)

Trap: Assuming that the high word is always zero. Many VxDs that interface to DLLs at ring 3 pass a function code in the AX register to the VxD's protected-mode entry point, but the VxD looks at the entire EAX register to parse the function code. This code tended to work by accident under Windows 3.1 because the high words of extended registers were usually zero. This is no longer true under Windows 95.

Trap: Forgetting to disable interrupts when hooking an asynchronous service. Remember that asynchronous services can be called at hardware interrupt time. If a hardware interrupt occurs after a hook is installed, but before the pointer is saved in a place where the hook procedure can get to it, the interrupt can result in a call to the service before the hook is ready.

Trap: Using Begin_Nest_Exec inappropriately. Any protected-mode procedure called from a nested execution block must be in a nondiscardable segment, must access only data in nondiscardable segments, and may not switch to a 32-bit stack. (This means, in particular, that only interrupt-safe Windows functions can be called, because anything else might thunk to Win32.) Although these restrictions existed in Windows 3.1 as well, Windows 3.1 almost never spent any time on a 32-bit stack, unless a WINMEM32 or Win32s application was running. But now that Windows 95 has Win32 support built-in, the system spends a large percentage of its time on a 32-bit stack, so the window of opportunity for error is much greater. If you need to call functions that do not respect these restrictions, use an application time event, coupled with the _SHELL_CallDll service, passing lpszDll = 0 and lpszProcName equal to the 16:16 address.