Handling Exceptions in C and C++, Part 4

Robert Schmidt

July 1, 1999

In Part 4 of this series on exception handling, Robert Schmidt dissects the EH viscera of a few small Visual C++ code examples.

Introduction

Until now, I've stayed safely in the realm of C and C++, but in this column I risk spelunking a tiny way into assembly language. My goal: exposing the rudiments of Visual C++'s implementation for simple exception handling (EH) throws and catches. This treatise is not meant to be exhaustive; after all, my principal focus is still the language itself. However, even a brief exposure to EH implementation will help you understand and trust EH in your designs.

The Only Thing We Have to Fear

As it unwinds the stack in the wake of a throw, EH tracks which local objects require destruction, schedules the requisite destructor calls, and routes control to the proper exception handler. To perform this EH bookkeeping and management, the compiler implicitly injects data, instructions, and library references into your generated code.
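To make that bookkeeping visible from the source side, here is a minimal sketch in Standard C++. The Tracer class, the log string, and the function names are my own inventions for illustration; nothing here is specific to Visual C++. The log should show that the locals are destroyed in reverse order of construction before control reaches the handler.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends its name to a shared log when destroyed.
struct Tracer
   {
   Tracer(std::string &log, char const *name) : log_(log), name_(name) {}
   ~Tracer() { log_ += name_; log_ += ' '; }
   std::string &log_;
   char const *name_;
   };

void inner(std::string &log)
   {
   Tracer b(log, "~b");
   throw 42;               // unwinding starts here
   }

std::string unwind_demo()
   {
   std::string log;
   try
      {
      Tracer a(log, "~a");
      inner(log);          // never returns normally
      }
   catch (int)
      {
      log += "catch(int)"; // runs only after b and a are gone
      }
   return log;
   }
```

Calling unwind_demo() returns the string "~b ~a catch(int)".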

Unfortunately, many programmers (and their managers) fret that such injection introduces extravagant code bloat. They hold a fear, bordering on paranoia, that EH will cripple their programs beyond practical use. In those cases, I think EH touches on people's primal fear of the unknown: Since their source code does not obviously betray EH's inner workings, they convince themselves of the worst.

To help defuse some of this fear, let's dissect the EH viscera of a few small Visual C++ examples.

Example #1: Baseline

Create a new C++ source file EH.cpp that contains the following:

class C
   {
public:
   C()
      {
      }
   ~C()
      {
      }
   };

void f1()
   {
   C x1;
   }

int main()
   {
   f1();
   return 0;
   }

Next, create a new Visual C++ console application project, and include EH.cpp as the only source file. Within the IDE's C/C++ project settings, turn on generation of mixed source/assembly .asm files; otherwise leave all default project settings intact. Build the project's Debug version. On my system, the resulting EH.exe file is 23,040 bytes long.

If you crack open EH.asm, you'll find that f1 does pretty much what you'd expect: set up the stack frame, call x1's constructor and destructor, and then tear the stack frame back down. In particular, you'll notice a distinct lack of any obvious EH plumbing or bookkeeping. That's not surprising, since the program neither throws nor handles any exceptions.

Example #2: Single Handler

Now change f1 to the following:

void f1()
   {
   C x1;
   try
      {
      }
   catch(char)
      {
      }
   }

Rebuild EH.exe, and then note its file size. On my system, the size has increased from 23,040 bytes to 29,696 bytes. Such news might induce fibrillation, as you reckon that EH caused a horrifying 29 percent file-size increase. But if you look at the absolute increase, you'll see the change is only 6,656 bytes—and most of that comes from fixed-size library overhead. Relatively little is extra code or data implicitly injected into EH.obj.

Within EH.asm you'll find the symbol __$EHRec$ defined as a constant value. This symbol represents an offset into the local stack frame. For each function that references __$EHRec$ in its generated code, the compiler has implicitly defined a hidden local "EH record" bookkeeping object.

EH records are transitory: Rather than requiring a permanent static footprint in your code, they live on the stack, come into being when a function is entered, and disappear when that function exits. The compiler adds an EH record (and local code to manage it) if and only if a function might need early destruction of local objects.

By implication, then, some functions don't require EH records. To see this, add a second function

void f2()
   {
   }

that traffics in neither objects nor exceptions. Then rebuild. EH.asm shows f1's stack frame containing an EH record as before, but no such record for f2. However, if you change the code to

void f2()
   {
   C x2;
   f1();
   }

f2 now defines a local EH record—even though f2 itself has no try block. Why? Because f2 calls f1, which may throw an exception terminating f2 and thereby require early destruction of x2.
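You can watch the language-level effect without reading any assembly. The sketch below is a variation on the article's code, with two assumptions of mine: f1 really does throw, and a hypothetical live_objects counter tracks how many C objects exist. By the time the handler runs, both x1 and x2 should be gone, even though f2 contains no try block.

```cpp
#include <cassert>

static int live_objects = 0;   // hypothetical counter, for illustration only

class C
   {
public:
   C()  { ++live_objects; }
   ~C() { --live_objects; }
   };

void f1()
   {
   C x1;
   throw 'x';      // unlike the article's f1, this one really throws
   }

void f2()          // no try block, yet x2 must still be destroyed
   {
   C x2;
   f1();
   }

int objects_alive_in_handler()
   {
   int seen = -1;
   try
      {
      f2();
      }
   catch (char)
      {
      seen = live_objects;   // sampled after unwinding through f1 and f2
      }
   return seen;
   }
```

objects_alive_in_handler() returns 0: the exception passed through f2, and x2 was destroyed along the way.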

Moral: If a function with local objects does not explicitly handle exceptions, yet can pass on exceptions thrown by others, that function still requires an EH record and associated management code.

Should this worry you, just short-circuit the exception chain. In our example, change the definition of f1 to

void f1() throw()
   {
   C x1;
   try
      {
      }
   catch(char)
      {
      }
   }

f1 now promises to throw no exceptions. As a result, f2 cannot leak exceptions from f1, and thus requires no EH record. You can verify this by building the project, inspecting EH.asm, and finding that f2's code no longer mentions __$EHRec$.
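If your compiler is newer than the one I used here, the same promise is spelled noexcept. The sketch below assumes a C++11 or later compiler; the throw() spelling was later deprecated and removed from the language. The function g and the helper f1_is_nothrow are hypothetical names added for contrast. The noexcept operator lets you check the promise at compile time instead of hunting through EH.asm, although it says nothing about what code a particular compiler generates.

```cpp
#include <cassert>

class C
   {
public:
   C() {}
   ~C() {}
   };

void f1() noexcept   // modern spelling of the article's throw()
   {
   C x1;
   try
      {
      }
   catch (char)
      {
      }
   }

void g()             // no promise: may throw, as far as callers can tell
   {
   }

// The noexcept operator reports, at compile time, whether a call
// expression is declared not to throw.
static_assert(noexcept(f1()), "f1 promises not to throw");
static_assert(!noexcept(g()), "g makes no such promise");

bool f1_is_nothrow()
   {
   return noexcept(f1());
   }
```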

Example #3: Multiple Handlers

EH records and their support code are not the only bookkeeping the compiler introduces. For each handler within a given try block, the compiler also creates a dispatch table entry. To see this more clearly, save your current EH.asm under another name, and extend f1 to

void f1() throw()
   {
   C x1;
   try
      {
      }
   catch(char)
      {
      }
   catch(int)
      {
      }
   catch(long)
      {
      }
   catch(unsigned)
      {
      }
   }

Rebuild, and then compare the two versions of EH.asm.

(Danger, Will Robinson: In the EH.asm code listings below, I've either omitted irrelevant pieces, or replaced them with ellipses. Also, the precise label names Visual C++ generates on your system may vary from what I show here; they will definitely vary as you modify the code. Finally, I make no claims to .asm wizardry, so please don't take what you're about to read as an assembly language analysis clinic.)

For each exception type that a handler catches, the compiler generates a uniquely named descriptor within the data segment (_DATA in the listing below). Each descriptor encodes the mangled type name of the corresponding handler's exception type. (These are the same mangled type names the compiler generates for overloaded functions.)

In my EH.asm, the relevant names, descriptors, and comments are:

PUBLIC ??_R0D@8 ; char `RTTI Type Descriptor'
PUBLIC ??_R0H@8 ; int `RTTI Type Descriptor'
PUBLIC ??_R0J@8 ; long `RTTI Type Descriptor'
PUBLIC ??_R0I@8 ; unsigned int `RTTI Type Descriptor'

_DATA SEGMENT
??_R0D@8 DD FLAT:??_7type_info@@6B@ ; char `RTTI Type Descriptor'
         DD ...
         DB '.D', ...
_DATA ENDS

_DATA SEGMENT
??_R0H@8 DD FLAT:??_7type_info@@6B@ ; int `RTTI Type Descriptor'
         DD ...
         DB '.H', ...
_DATA ENDS

_DATA SEGMENT
??_R0J@8 DD FLAT:??_7type_info@@6B@ ; long `RTTI Type Descriptor'
         DD ...
         DB '.J', ...
_DATA ENDS

_DATA SEGMENT
??_R0I@8 DD FLAT:??_7type_info@@6B@ ; unsigned int `RTTI Type Descriptor'
         DD ...
         DB '.I', ...
_DATA ENDS

(The commented references to "RTTI Type Descriptor" and "type_info" suggest to me that Visual C++ uses the same type-name descriptors for EH as it does for RTTI.)

The compiler also generates references to these type descriptors in the xdata$x segment. Each type is paired with the address of the handler catching that type. The resulting set of descriptor/handler pairs forms a dispatch table used by the EH library code for routing exceptions. Again, from my EH.asm (with comments/diagrams added):

xdata$x SEGMENT

$T214 DD ...
      DD ...
      DD FLAT:$T217 ;---+
      DD ...        ;   |
      DD FLAT:$T218 ;---|---+
      DD 2 DUP(...) ;   |   |
      ORG $+4       ;   |   |
                    ;   |   |
$T217 DD ...        ;<--+   |
      DD ...        ;       |
      DD ...        ;       |
      DD ...        ;       |
                    ;       |
$T218 DD ...        ;<------+
      DD ...
      DD ...
      DD 04H        ; # of handlers
      DD FLAT:$T219 ;---+
      ORG $+4       ;   |
                    ;   |
$T219 DD ...        ;<--+
      DD FLAT:??_R0D@8 ; char RTTI Type Descriptor
      DD ...
      DD FLAT:$L206    ; catch(char) address

      DD ...
      DD FLAT:??_R0H@8 ; int RTTI Type Descriptor
      DD ...
      DD FLAT:$L207    ; catch(int) address

      DD ...
      DD FLAT:??_R0J@8 ; long RTTI Type Descriptor
      DD ...
      DD FLAT:$L208    ; catch(long) address

      DD ...
      DD FLAT:??_R0I@8 ; unsigned int RTTI Type Descriptor
      DD ...
      DD FLAT:$L209    ; catch(unsigned int) address

xdata$x ENDS

The dispatch table preamble (that is, the data associated with labels $T214, $T217, and $T218) is specific to the function f1 and is shared by all handlers in f1. However, each entry in the $T219 dispatch table is specific to a particular handler in f1.

More generally, the compiler generates one table preamble per function that contains a try block, plus one table entry per handler in that try block. Fortunately, the type descriptors are shared among all dispatch tables in the program. (For example, all catch(long) handlers in a program reference the same ??_R0J@8 type descriptor.)
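As a rough mental model only, you can picture the dispatch table as an array of descriptor/handler pairs that the library walks in order. The sketch below is my own toy version in Standard C++: HandlerEntry, table, dispatch, and the on_* functions are all hypothetical, and the real Visual C++ entries carry more fields than I show. It also matches types exactly, whereas real handler matching allows certain conversions, such as derived class to base class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <typeinfo>

// Hypothetical model of one dispatch-table entry: a type descriptor
// paired with a handler. Not the actual Visual C++ layout.
struct HandlerEntry
   {
   std::type_info const *type;
   char const *(*handler)();
   };

char const *on_char()     { return "catch(char)"; }
char const *on_int()      { return "catch(int)"; }
char const *on_long()     { return "catch(long)"; }
char const *on_unsigned() { return "catch(unsigned)"; }

// One entry per handler, in source order. The type_info objects are
// shared program-wide, much like the shared type descriptors.
HandlerEntry const table[] =
   {
   { &typeid(char),     on_char     },
   { &typeid(int),      on_int      },
   { &typeid(long),     on_long     },
   { &typeid(unsigned), on_unsigned }
   };

// Walk the table in order and run the first handler whose type matches.
std::string dispatch(std::type_info const &thrown)
   {
   for (std::size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
      if (*table[i].type == thrown)
         return table[i].handler();
   return "unhandled";
   }
```

For example, dispatch(typeid(123)) selects the catch(int) entry, while dispatch(typeid(1.5)) finds no match.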

Moral: To reduce EH space overhead, you should minimize the number of functions catching exceptions, the number of handlers within those functions, and the number of types caught by those handlers.

Example #4: Thrown Exception

We pull all of this together by actually throwing an exception. Change the try block of f1 to

try
   {
   throw 123; // type 'int' exception
   }

Rebuild as usual, open EH.asm, and note the new data (summarized, with my comments/diagrams as before):

; in these exported names, 'H' is the RTTI Type Descriptor
;   code for 'int' -- which matches the data type of
;   the thrown exception value 123
PUBLIC __TI1H
PUBLIC __CTA1H
PUBLIC __CT??_R0H@84

; EH library routine that actually throws exceptions
EXTRN __CxxThrowException@8:NEAR

; new static data blocks used by library
;   when throwing 'int' exception
xdata$x SEGMENT

__CT??_R0H@84 DD ...                ;<------+
              DD FLAT:??_R0H@8      ;       |   ??_R0H@8 is RTTI 'int'
                                    ;       |    Type Descriptor
              DD ...                ;       |
              DD ...                ;       |
              ORG $+4               ;       |
              DD ...                ;       |
              DD ...                ;       |
                                    ;       |
__CTA1H       DD ...                ;<--+   |
              DD FLAT:__CT??_R0H@84 ;---|---+
                                    ;   |
__TI1H        DD ...                ;   |  __TI1H is argument passed to
              DD ...                ;   |   __CxxThrowException@8
              DD ...                ;   |
              DD FLAT:__CTA1H       ;---+

xdata$x ENDS

As with the type descriptors, these new data blocks are shared throughout a program; for example, all code that throws an int will reference __TI1H. Also note that the same type descriptors referenced for handlers are used for throw statements as well.

Next cruise down to f1. The relevant parts:

;void f1() throw()
;   {
;   try
;      {

       ...
       push $L224 ; Address of code to adjust stack frame via handler
                  ;   dispatch table.  Invoked by __CxxThrowException@8.
       ...

;      throw 123;

       push OFFSET FLAT:__TI1H       ; Address of data area diagramed
                                     ;   above
       mov DWORD PTR $T213[ebp], 123 ; 123 is the exception's value
       lea eax, DWORD PTR $T213[ebp]
       push eax
       call __CxxThrowException@8    ; Call into EH library, which in
                                     ;   turn eventually calls $L224
                                     ;   and $L216 a.k.a. 'catch(int)'
;      }
;   // ...
;   catch(int)

    $L216:

;      {

       mov eax, $L182 ; Return to EH library, which jumps to $L182
       ret 0

;      }
;   // ...

    $L182:

;   // Call local-object destructors, clean up stack, return
;   }

$L224:                         ; This label referenced by 'try' code.
    mov eax, OFFSET FLAT:$T223 ; $T223 is handler dispatch table, what
                               ;   had previously been label $T214
                               ;   before we added 'throw 123'
    jmp ___CxxFrameHandler     ; internal library routine

When the program is run, the __CxxThrowException@8 EH library function calls $L216, the address of the catch(int) handler. Once the handler finishes, program execution flow meanders around the EH library, transfers to $L224, meanders around the library some more, and then eventually jumps to $L182. That label is the address of f1's termination and cleanup code; among other things, this code calls x1's destructor. You can verify all of this by stepping through the code in a debugger.
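If you'd rather not single-step through the library, a source-level sketch shows the same ordering. It assumes a hypothetical Tracer class and a log string, neither of which appears in EH.cpp, and drops the throw() specification for brevity. Because x1 lives outside the try block, the log should show the catch(int) handler running first and x1's destructor running afterward, in f1's cleanup code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for class C that records its destruction.
struct Tracer
   {
   Tracer(std::string &log, char const *name) : log_(log), name_(name) {}
   ~Tracer() { log_ += name_; }
   std::string &log_;
   char const *name_;
   };

void f1(std::string &log)
   {
   Tracer x1(log, "~x1");   // declared outside the try block
   try
      {
      throw 123;            // type 'int' exception
      }
   catch (char)     { log += "catch(char) "; }
   catch (int)      { log += "catch(int) "; }
   catch (long)     { log += "catch(long) "; }
   catch (unsigned) { log += "catch(unsigned) "; }
   }                        // x1 destroyed here, after the handler

std::string run_f1()
   {
   std::string log;
   f1(log);
   return log;
   }
```

run_f1() returns "catch(int) ~x1": only the int handler fires, and x1 outlives it.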

Coda

All exception handling mechanisms induce overhead. Unless you are willing to run your code without any exception-handling safety net, you simply must be willing to pay some speed/space cost. EH has the virtue of being a language-level feature, meaning the compiler has intimate knowledge of how EH is implemented and can make optimizations based on that knowledge.

Beyond the compiler's optimizations, you can make many of your own. As we go along in this column series, I'll show you specific ways to minimize the cost of EH. Some of those ways will be generic to Standard C++, while others will make specific assumptions about the Visual C++ implementation.

Robert Schmidt is a technical writer for MSDN. His other major writing distraction is the C/C++ Users Journal (http://www.cuj.com/), for which he is a contributing editor and columnist. In previous career incarnations he's been a radio DJ, wild-animal curator, astronomer, pool-hall operator, private investigator, newspaper carrier, and college tutor.
