The information in this article applies to:
SUMMARY

When OS/2 experiences certain exceptions, it displays a screen like this:
This article explains how you can use this information to find the source
of the problem.
MORE INFORMATION

The OS/2 trap screen displays the values of the registers at the time of failure. Specifically, CS:IP gives the selector:offset address of the instruction that was executing when the trap occurred. Unfortunately, because OS/2 uses Intel's protected-mode selectors rather than physical memory addresses, it is sometimes hard to determine which component's code CS:IP points to.

Determining the Ring Level

The two least significant bits of CS indicate which of Intel's ring levels the error occurred in. This narrows down which component failed and helps determine how to find the appropriate component later. To determine the ring, evaluate CS in binary and examine the least significant two bits. If they are both on (ones), the failure occurred at ring 3. If they are both off (zeros), the failure occurred at ring 0.

Ring 3 encompasses application-level code. This includes all PM applications and their DLLs, all LAN Manager services, and anything started with the RUN= line in CONFIG.SYS. Examples include the Netlogon service, print manager, printer drivers, print queue drivers, any DLLs, and Presentation Manager itself.

Ring 0 encompasses kernel-level code. This includes the OS/2 kernel, all device drivers, and installable file systems. Examples include OS2KRNL (the OS/2 kernel), HPFS386.IFS, IBMTOK.OS2, TCPDRV.OS2, and NETWKSTA.SYS.

Usually a ring 3 trap allows you to kill the offending application and let the operating system continue running. Ring 0 traps require that you reboot the machine.

Determining the Failing Component

The value of CSLIM in the trap screen represents the limit of the current code segment. The length of an application's code segment(s) does not change (with an exception to be discussed shortly). It is also very unlikely that two components would have code segments with the same lengths.
Therefore, if we can determine which component has a code segment that matches CSLIM we will have determined the failing application or driver.

Using the fact that you are in ring 0 or ring 3, as well as any other evidence that narrows the focus of your search, start searching on the candidates most likely to have failed. Remember to search any DLLs an application might be linked with (PSTAT.EXE will tell you which DLLs an application uses if you run PSTAT while your application is running).

Determining the Lengths of an Application's Code Segment(s)
"Lim=A06A" tells us that the size of the segment associated with
this selector is A06A and thus matches our CSLIM value in the trap.
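
The ring check and the CSLIM comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a tool that ships with OS/2: the function names, the component names, and every segment limit other than A06A are hypothetical, and the limits would in practice be collected by hand from EXEHDR or the kernel debugger.

```python
def ring_from_cs(cs: int) -> int:
    """Return the ring level encoded in the two least significant bits of CS."""
    return cs & 0b11


def matching_components(cslim: int, segment_limits: dict, allow_shrunk: bool = False) -> list:
    """List components that have a code segment consistent with CSLIM.

    segment_limits maps a component name to the segment limits collected
    for it. With allow_shrunk=False only an exact match counts. With
    allow_shrunk=True any reported limit at or above CSLIM counts, which
    covers ring 0 components whose segments may shrink after loading.
    """
    hits = []
    for name, limits in segment_limits.items():
        for limit in limits:
            if limit == cslim or (allow_shrunk and limit > cslim):
                hits.append(name)
                break
    return hits


if __name__ == "__main__":
    # Hypothetical values: only A06A comes from the article's example.
    cs = int("005F", 16)
    cslim = int("A06A", 16)
    collected = {
        "EXAMPLE.EXE (hypothetical)": [0xA06A, 0x1234],
        "OTHER.DLL (hypothetical)": [0x0FFF],
    }
    print("ring:", ring_from_cs(cs))
    print("candidates:", matching_components(cslim, collected))
```

For a ring 0 trap, calling matching_components with allow_shrunk=True widens the search to every component whose reported limit is at least CSLIM, and the remaining candidates still have to be confirmed in the kernel debugger.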
As mentioned earlier, some ring 0 components can significantly change the size of their memory segments from what EXEHDR reports. This change can only make the segment smaller. Therefore, if the CSLIM in your trap dump is 912E, you need only use the kernel debugger to test components for which EXEHDR reports segments larger than 912E.

Determining the Failing Function

Once you are in the kernel debugger, have found the failing component, have loaded the component's symbols, and have found the appropriate code segment, simply unassemble at the segment's selector and the offset specified by IP in the trap dump. In our example we might enter the following command:
Now use the "LN" command to list the nearest symbols. This will most
likely give you the name of the failing function. In our case we get:
This tells us that we are 12 bytes past the _strncpy symbol and 30
bytes before the _strncmp symbol. This means we are in the middle of
the code for strncpy.
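
The reasoning behind a nearest-symbol lookup can be reproduced by hand from any symbol listing. The Python sketch below is an illustration only and is not how the kernel debugger implements "LN": the function name is hypothetical, and the offsets are made up so that the distances match the 12 and 30 bytes in the example above.

```python
import bisect


def nearest_symbols(ip: int, symbols: list):
    """Find the symbols immediately at-or-before and after an offset.

    symbols is a list of (offset, name) pairs for one code segment.
    Returns two values, each either (name, distance_in_bytes) or None
    when there is no symbol on that side of ip.
    """
    ordered = sorted(symbols)
    offsets = [offset for offset, _ in ordered]
    # Index of the first symbol whose offset is strictly greater than ip.
    i = bisect.bisect_right(offsets, ip)
    before = (ordered[i - 1][1], ip - ordered[i - 1][0]) if i > 0 else None
    after = (ordered[i][1], ordered[i][0] - ip) if i < len(ordered) else None
    return before, after


if __name__ == "__main__":
    # Hypothetical offsets chosen only to reproduce the distances above.
    table = [(0x012A, "_strncmp"), (0x0100, "_strncpy")]
    before, after = nearest_symbols(0x010C, table)
    print("past:", before)
    print("before:", after)
```

The preceding symbol is normally the function that contains the failing instruction, provided the symbol file lists every function in the segment.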
At this point you should be able to find the source code for the function, and you may get more information on why the failure occurred by analyzing the values in the other registers. You can also use breakpoints and other debugging techniques to determine the reason for failure. Seeing why and how the function is used may also reveal ways to avoid calling it, which can often help keep production environments from hanging.

Note that this is only one way to understand problems better and should not be considered all the information needed to resolve a bug. Ultimately, a developer will most likely need a reproduction scenario to actually fix the bug. If this is not possible, a network trace may also be needed.

For more information on traps, Intel architecture, and machine code, see any of Intel's Microprocessor Programmer's Reference Manuals. For information on the OS/2 kernel debugger, see Chapter 11 of the Microsoft OS/2 Device Driver Reference.

Additional query words: 2.00 2.0 2.10 2.1 2.10a 2.1a 2.20 2.2 exception protection
Last Reviewed: November 9, 1999 © 2000 Microsoft Corporation. All rights reserved.