Object Mapping in C++

Tom Germond

Tom Germond is a consulting software engineer and writer with over 15 years of experience in the computer industry. He has worked on the Microsoft documentation teams for both SQL Server and the Chinese version of the Microsoft® Windows™ Software Development Kit (SDK). He seems to live somewhere between Seattle and Silicon Valley; he commutes back and forth so much it’s rather difficult to tell.

Created: March 20, 1992

ABSTRACT

This article describes the algorithms and data structures that Microsoft® C/C++ version 7.0 uses to map objects into memory. It concentrates heavily on virtual functions because they are so powerful and inexpensive to use. Although the information in this article describes our implementation, all C++ compilers must solve the same basic problems.

INTRODUCTION

Microsoft® C/C++ version 7.0 provides a rich set of features for defining and manipulating objects. These new capabilities let you work at a much more abstract level than does C, but at a price. The language is more complex than C, and some of its features require a significant amount of support code to execute at run time. This is especially true if you use late binding, which abstract classes and virtual functions provide. To make the best design decisions, you must understand how the compiler implements these more complex features.

This article assumes that you have a general knowledge of C++. If you do not understand the following language features, see the documentation listed below:

Virtual functions: Chapter 7 of the C++ Tutorial and Chapter 9 of the C++ Language Reference.

Multiple inheritance: Chapter 7 of the C++ Tutorial and Chapter 9 of the C++ Language Reference.

Virtual bases: Chapter 9 of the C++ Language Reference.

OBJECT MAPPING DESIGN GOALS

We designed the object mapping to achieve two goals:

1. Correctness—It must be able to represent all legal C++ version 2.1 constructs.

2. Efficiency—It must use machine resources as efficiently as possible.

We measure efficiency in the following ways:

Instance data size—It must be as small as possible.

Execution speed—It must be as fast as possible.

Most of the design decisions were made to improve efficiency. Sometimes we had to forgo an optimization because it made it impossible to represent some legal constructs correctly.

Our chief (self-imposed) limitation was that the design must be implemented by a simple, multipass compiler. No “heroic” measures, such as a type database, were allowed.

LAYOUT OF CLASS INSTANCES WITH VIRTUAL FUNCTIONS

For objects of classes that have no base classes and no virtual functions, the layout of data members is identical to the layout of the corresponding C structures. The placement of data members within the object does not depend on each member’s protection (public, protected, or private).

For classes containing one or more virtual functions, the compiler creates a virtual function table, called the vftable. This table comprises an array of pointers to each of the virtual functions declared in the class. Each instance of the class contains a pointer, called the vfptr, which points to the virtual function table associated with that class. In the current implementation, the vfptr is always located at the address point of the instance.
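The following sketch shows the cost of the vfptr in a way you can check on any compiler. It assumes two hypothetical classes, Plain and WithVirtual, that differ only in whether they declare a virtual function. Exact sizes and offsets are implementation details, so treat the numbers as illustrative rather than as a description of C/C++ 7.0 specifically.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes: identical data members, but only WithVirtual
// declares a virtual function, so only it carries a vfptr.
struct Plain {
    int x;
    int y;
};

struct WithVirtual {
    int x;
    int y;
    virtual int sum() { return x + y; }
};

// Byte offset of member x from the start of the object. Computed from
// addresses because offsetof is only guaranteed for standard-layout types,
// and a class with virtual functions is not standard-layout.
std::ptrdiff_t offset_of_x(const WithVirtual& w) {
    return reinterpret_cast<const char*>(&w.x) -
           reinterpret_cast<const char*>(&w);
}
```

On compilers that place the vfptr at the address point, as described above, sizeof(WithVirtual) exceeds sizeof(Plain) by at least one pointer, and offset_of_x returns a value no smaller than sizeof(void*).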

Simple Class with Virtual Functions

The following source code for a simple class A contains member data, nonvirtual functions, and virtual functions.

// Simple class definition.
class A {
public:
    int d_A_1;              // data member 1
    int d_A_2;              // data member 2
    int f_A_1();            // nonvirtual function 1
    virtual int vf_A_1();   // virtual function 1
    virtual int vf_A_2();   // virtual function 2
};

Figure 1 illustrates the memory layout of class A. Note that the this pointer points to the address point of the instance. The vfptr resides at the address point, immediately followed by the instance data members. Notice also that the vftable contains pointers only to the virtual functions declared in the class. Nonvirtual functions are called directly, like ordinary global functions that receive the this pointer as a hidden argument, so they do not require table entries.
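To make the mechanism concrete, the following sketch emulates class A by hand: an explicit vfptr field, a statically allocated vftable, and functions that take the this pointer as an ordinary parameter. The names A_obj, A_vftable, A_table, and call_vf_A_2 are hypothetical; this approximates what the compiler does for you, not the code it actually generates.

```cpp
#include <cassert>

struct A_obj;                              // the emulated instance
using A_slot = int (*)(A_obj* self);       // every slot takes 'this' explicitly

struct A_vftable {                         // one table, shared by all instances
    A_slot vf_A_1;
    A_slot vf_A_2;
};

struct A_obj {
    const A_vftable* vfptr;                // at the address point
    int d_A_1;                             // data members follow the vfptr
    int d_A_2;
};

// Bodies of the "virtual functions".
int A_vf_A_1(A_obj* self) { return self->d_A_1; }
int A_vf_A_2(A_obj* self) { return self->d_A_1 + self->d_A_2; }

const A_vftable A_table = {A_vf_A_1, A_vf_A_2};

// The nonvirtual function: an ordinary function, called directly.
int A_f_A_1(A_obj* self) { return self->d_A_2; }

// A virtual call: load the vfptr, select the slot, call indirectly.
int call_vf_A_2(A_obj* obj) {
    assert(obj != nullptr && obj->vfptr != nullptr);
    return obj->vfptr->vf_A_2(obj);
}
```

For example, given A_obj a{&A_table, 2, 5}, call_vf_A_2(&a) returns 7 by way of the table, while A_f_A_1(&a) is a direct call that returns 5.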

Figure 1. Object Mapping for a Simple Class

Derived Class Using Single Inheritance

If a class is derived through single inheritance, the vftable of the base class and the entries for the derived class’s new virtual functions are merged into a single table for the entire derived class. Each instance has a single vfptr.

The following example shows a class D, which is derived through single inheritance from base class B. In this example, the base class has member data, a nonvirtual function, and two virtual functions. The derived class D contains a virtual function vf_B_1 that overrides the corresponding function in the base class. In addition, class D introduces two new virtual functions, vf_D_1 and vf_D_2.

// Base class definition.
class B {
public:
    int d_B_1;              // data member 1
    int d_B_2;              // data member 2
    int f_B_1();            // nonvirtual function 1
    virtual int vf_B_1();   // virtual function 1
    virtual int vf_B_2();   // virtual function 2
};

// Derived class definition.
class D : public B {
public:
    int d_D_1;              // new data member 1
    int d_D_2;              // new data member 2
    virtual int vf_B_1();   // overrides B virtual function 1
    virtual int vf_D_1();   // new virtual function 1
    virtual int vf_D_2();   // new virtual function 2
};

Figure 2 illustrates the memory layout of an instance of derived class D. Note that because class D is derived through single inheritance, the object contains only one vfptr, which points to a single vftable. Notice also that the data members for base class B are located immediately after the vfptr; they are followed by the data members for derived class D.

Next, look at the vftable. The pointers to the virtual functions occur in the order in which they were declared in their respective classes. Again, the pointers to functions declared in the base class come before those for the derived class. Finally, notice that the pointer to the function B::vf_B_1 has been replaced by a pointer to D::vf_B_1. This is the underlying mechanism that lets derived classes override the virtual functions of base classes.
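The following hand-built sketch mimics Figure 2 for classes B and D. It is a hypothetical emulation, not compiler output: D’s table keeps B’s slots as a prefix, stores D::vf_B_1 in the slot that held B::vf_B_1, and appends the slots for vf_D_1 and vf_D_2. Because code compiled against B sees only the prefix, it dispatches to the override without knowing that D exists. For brevity, a single struct Obj stands in for instances of both classes.

```cpp
#include <cassert>

struct Obj;                                // emulated instance of B or D
using Slot = int (*)(Obj* self);           // every slot takes 'this' explicitly

struct B_vftable {                         // B's slots, in declaration order
    Slot vf_B_1;
    Slot vf_B_2;
};

struct D_vftable {                         // D's single merged table:
    B_vftable b_part;                      //   B's slots form the prefix
    Slot vf_D_1;                           //   D's new slots follow
    Slot vf_D_2;
};

struct Obj {
    const B_vftable* vfptr;                // the only vfptr, at the address point
    int d_B_1, d_B_2;                      // B's data members first
    int d_D_1, d_D_2;                      // then D's (ignored for a plain B)
};

int B_vf_B_1(Obj*) { return 1; }
int B_vf_B_2(Obj*) { return 2; }
int D_vf_B_1(Obj*) { return 10; }          // D's override of vf_B_1
int D_vf_D_1(Obj*) { return 11; }
int D_vf_D_2(Obj*) { return 12; }

const B_vftable B_table = {B_vf_B_1, B_vf_B_2};
// B's entries with slot 0 replaced by the override, then D's new slots.
const D_vftable D_table = {{D_vf_B_1, B_vf_B_2}, D_vf_D_1, D_vf_D_2};

// Code compiled against B sees only the prefix of whatever table it finds.
int call_vf_B_1(Obj* o) {
    assert(o != nullptr && o->vfptr != nullptr);
    return o->vfptr->vf_B_1(o);
}

// Code that knows the object is a D may use the whole table. The cast is
// valid because b_part is the first member of a standard-layout struct.
int call_vf_D_1(Obj* o) {
    auto* table = reinterpret_cast<const D_vftable*>(o->vfptr);
    return table->vf_D_1(o);
}
```

Calling call_vf_B_1 on an object whose vfptr is &B_table returns 1; on one whose vfptr is &D_table.b_part it returns 10, the value from the override.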

Figure 2. Derived Class—Single Inheritance

Derived Class Using Multiple Inheritance

If a class is derived through multiple inheritance, a separate vfptr must exist for each base class. The following example illustrates a class derived through multiple inheritance. Classes B1 and B2 are the base classes; each contains data members and virtual functions. Class D is the derived class, which contains data members, a virtual function D::vf_B1_1 that overrides B1::vf_B1_1, a virtual function D::vf_B2_1 that overrides B2::vf_B2_1, and a new virtual function D::vf_D_1.

// First base class definition.
class B1 {
public:
    int d_B1_1;              // data member 1
    int d_B1_2;              // data member 2
    int d_B1_3;              // data member 3
    virtual int vf_B1_1();   // virtual function 1
    virtual int vf_B1_2();   // virtual function 2
};

// Second base class definition.
class B2 {
public:
    int d_B2_1;              // data member 1
    int d_B2_2;              // data member 2
    virtual int vf_B2_1();   // virtual function 1
    virtual int vf_B2_2();   // virtual function 2
};

// Derived class definition.
class D : public B1, public B2 {
public:
    int d_D_1;               // new data member 1
    int d_D_2;               // new data member 2
    virtual int vf_B1_1();   // overrides B1 virtual function 1
    virtual int vf_B2_1();   // overrides B2 virtual function 1
    virtual int vf_D_1();    // new virtual function 1
};

Figure 3 illustrates the memory layout of an instance of derived class D. Again, the data members for each class are mapped after the respective vfptr. Because this class is derived through multiple inheritance, it contains a vfptr and a vftable for each base class. As before, pointers to overriding functions replace their corresponding entry from the base class. Pointers to new virtual functions introduced in the derived class are placed at the end of the first vftable.
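The following sketch gives the classes from this example hypothetical bodies that return distinct numbers, so you can observe the layout from Figure 3 on a real compiler. subobject_offset is a hypothetical helper that measures how far a derived-to-base conversion moves the pointer. The offsets are implementation details; on layouts like the one described here, the B1 portion starts at the address point and the B2 portion does not.

```cpp
#include <cassert>
#include <cstddef>

class B1 {
public:
    int d_B1_1 = 0, d_B1_2 = 0, d_B1_3 = 0;
    virtual int vf_B1_1() { return 11; }
    virtual int vf_B1_2() { return 12; }
};

class B2 {
public:
    int d_B2_1 = 0, d_B2_2 = 0;
    virtual int vf_B2_1() { return 21; }
    virtual int vf_B2_2() { return 22; }
};

class D : public B1, public B2 {
public:
    int d_D_1 = 0, d_D_2 = 0;
    int vf_B1_1() override { return 111; }
    int vf_B2_1() override { return 121; }
    virtual int vf_D_1() { return 1; }
};

// Distance in bytes from the start of d to one of its base-class portions.
template <class Base>
std::ptrdiff_t subobject_offset(D& d) {
    Base* base = &d;   // the derived-to-base conversion adjusts the pointer
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(&d);
}
```

Calling vf_B2_1 through a B2* that points into a D still reaches D::vf_B2_1, even though that pointer does not hold the address of the D object.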

Figure 3. Derived Class—Multiple Inheritance

DISPATCHING TO VIRTUAL FUNCTIONS

When a C++ program calls a virtual function, the run time passes the address of the object on which the function is called to the function as a hidden argument. This address is known as the this pointer. For objects derived through multiple inheritance, the run time must adjust the this pointer so that it references the correct instance data. In C/C++ version 7.0, the run time performs this adjustment within the callee. We chose this implementation because it usually reduces the number of instructions needed to make the call and because the calculation can often be folded into other calculations within the callee.

The Introducing Class

In the C/C++ version 7.0 object mapping, member functions always expect the this pointer to reference the introducing class, that is, the class in which the member function is first declared. For virtual member functions that override functions in a base class, the this pointer must be adjusted after entering the function. This operation uses the “register indirect plus displacement” addressing mode of the 8086; it incurs no additional cost when dereferencing the this pointer.

Using the classes defined in the previous example, let’s examine the following code fragment:

D myD;           // an instance of class D
myD.vf_D_1();    // call the virtual function introduced in D
myD.vf_B1_2();   // call the function inherited from B1
myD.vf_B1_1();   // call the function that D overrides from B1

The call to D::vf_D_1 passes the this pointer to the introducing class, D. This is the address point of the instance myD. Within the vf_D_1 function, the this pointer correctly points to the address point of class D.

The function B1::vf_B1_2 expects the this pointer to reference an instance of class B1, its introducing class. Thus, before dispatching to myD.vf_B1_2, the run time must cast the this pointer to B1 by setting it to the address of the B1 portion of myD. Because B1 is the first base class, that portion begins at the address point of myD, so this cast requires no adjustment. B1::vf_B1_2 can then operate on the portion of myD that “looks like” an object of class B1.

The function D::vf_B1_1 overrides the original function B1::vf_B1_1. To reduce the amount of code in the caller, we again cast the this pointer to the introducing class, B1. The compiler recognizes that the pointer coming into the function has been cast to the introducing class; it logically adjusts the pointer back to D in the function prolog. This operation is performed using the 8086 “register indirect plus displacement” addressing mode, so it incurs no performance penalty.
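The following sketch illustrates why callee-side adjustment is cheap when the introducing class does not sit at the address point. read_d_D_1_from_B2 is a hypothetical stand-in for an override whose introducing class is B2: it receives a B2* and converts it back to D*. Because the displacement is a compile-time constant, a compiler can typically fold it into the member offset and emit a single register-plus-displacement load. The classes are trimmed versions of B1, B2, and D from the earlier example.

```cpp
#include <cassert>

// Trimmed versions of B1, B2, and D from the example above.
struct B1 {
    int d_B1_1 = 1;
    virtual int vf_B1_1() { return d_B1_1; }
};

struct B2 {
    int d_B2_1 = 2;
    virtual int vf_B2_1() { return d_B2_1; }
};

struct D : B1, B2 {
    int d_D_1 = 30;
};

// Hypothetical stand-in for an override whose introducing class is B2:
// 'self' points at the B2 portion and is converted back to the whole D.
int read_d_D_1_from_B2(B2* self) {
    assert(self != nullptr);
    D* whole = static_cast<D*>(self);   // adjusts by a compile-time constant
    return whole->d_D_1;
}
```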

Adjustor Thunks

In some cases, the same member function can be called with different address points within a given instance. When this is true, the compiler emits an adjustor thunk to dynamically adjust the this pointer. An adjustor thunk is a code sequence that is called instead of a virtual function. It adjusts the this pointer to the value expected by the target function and then jumps to the original callee.
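The following sketch emulates an adjustor thunk by hand. It models a SortColl object from the example below as a standard-layout struct whose second member plays the role of the Coll portion, so the displacement can be computed with offsetof. All names are hypothetical. A real thunk adjusts the pointer and jumps to the target; here an ordinary call stands in for the jump.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the two base portions of a SortColl object.
struct SortPart { int s = 0; };
struct CollPart { int c = 0; };

struct SortCollObj {
    SortPart sort;   // at the address point (offset 0)
    CollPart coll;   // at a fixed, nonzero offset
};

// The override expects 'this' at the SortColl address point.
std::string SortColl_isA(SortCollObj* self) {
    (void)self;      // a real override would use the object's members
    return "I'm a SortColl";
}

// The adjustment the thunk performs: subtract the fixed offset of the
// Coll portion to recover the address point.
SortCollObj* adjust_from_coll(CollPart* self) {
    char* raw = reinterpret_cast<char*>(self) - offsetof(SortCollObj, coll);
    return reinterpret_cast<SortCollObj*>(raw);
}

// The thunk: in this emulation, the function whose address would occupy
// the isA slot of the vftable for the Coll portion.
std::string SortColl_isA_thunk(CollPart* self) {
    return SortColl_isA(adjust_from_coll(self));
}
```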

The following example shows a class hierarchy that requires an adjustor thunk. Each class in the hierarchy implements its own unique version of the isA function, which prints the name of the class on cout.

#include <iostream>
using std::cout;

// First base class definition.
class Sort {
public:
    virtual void isA() { cout << "I'm a Sort\n"; }
};

// Second base class definition.
class Coll {
public:
    virtual void isA() { cout << "I'm a Coll\n"; }
};

// Derived class definition.
class SortColl : public Sort, public Coll {
public:
    virtual void isA() { cout << "I'm a SortColl\n"; }
};

The SortColl class has a version of the isA function that overrides the versions from both of its base classes. Figure 4 illustrates the memory layout of an instance of the SortColl class.

Figure 4. Adjustor Thunk Object Mapping

Consider the following code fragment, which is based on the classes defined in the previous example and illustrated in Figure 4:

SortColl mySC;         // define the object
Sort *aSort = &mySC;   // take pointers to it
Coll *aColl = &mySC;
aSort->isA();          // call the function
aColl->isA();          // call the function

The call to aSort->isA goes through the vfptr of the Sort portion, passing a this pointer that already points to the address point of the SortColl object, because Sort is the first base class. On the other hand, the call to aColl->isA initially has the this pointer referencing the Coll portion of the object, the introducing class for that vftable, but the member function is overridden in SortColl. Before executing this function, the this pointer must be adjusted to point to the address point of the SortColl object. The adjustor thunk performs this adjustment and then jumps to the function.
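The following variant of the Sort/Coll/SortColl example returns the class name as a std::string instead of printing it, a change made only so the result is easy to check. It exposes the two facts the thunk reconciles: aColl does not hold the address of mySC, yet a call through it still reaches SortColl::isA.

```cpp
#include <cassert>
#include <string>

class Sort {
public:
    virtual std::string isA() { return "I'm a Sort"; }
};

class Coll {
public:
    virtual std::string isA() { return "I'm a Coll"; }
};

class SortColl : public Sort, public Coll {
public:
    // Overrides isA from both base classes.
    std::string isA() override { return "I'm a SortColl"; }
};
```

With SortColl mySC, Sort *aSort = &mySC, and Coll *aColl = &mySC, both aSort->isA() and aColl->isA() return "I'm a SortColl", while aColl and &mySC compare unequal when both are converted to void*.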

Classes with Virtual Bases

It is illegal in C++ for a derived class to directly inherit more than once from the same base class. However, frequently in a complex class hierarchy, a derived class indirectly inherits a base class more than once. In this case, the instance variables of the base class would occur as many times as the class is indirectly inherited. Because this duplication wastes space and makes references to the base class’s members ambiguous, C++ lets you specify that a base class is virtual. This ensures that only one copy of its instance variables will be present in the resulting derived object. In the Microsoft C/C++ object mapping, virtual bases are mapped after all nonvirtual bases. The position of the virtual base varies in different inherited classes, depending on the order of inheritance.
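The following sketch, using hypothetical struct names, contrasts a nonvirtual diamond, in which D_nv contains two B portions, with a virtual one, in which D contains exactly one. It compares the addresses of the B portion reached through each intermediate class.

```cpp
#include <cassert>

struct B { int d_B_1 = 0; };

// Nonvirtual diamond: D_nv holds two separate B portions.
struct I1_nv : B {};
struct I2_nv : B {};
struct D_nv : I1_nv, I2_nv {};

// Virtual diamond: I1 and I2 share a single B portion inside D.
struct I1 : virtual B {};
struct I2 : virtual B {};
struct D : I1, I2 {};

// Compare the B portion reached through each intermediate class.
bool nonvirtual_copies_differ(const D_nv& d) {
    const B* through_I1 = static_cast<const I1_nv*>(&d);
    const B* through_I2 = static_cast<const I2_nv*>(&d);
    return through_I1 != through_I2;
}

bool virtual_copies_same(const D& d) {
    const B* through_I1 = static_cast<const I1*>(&d);
    const B* through_I2 = static_cast<const I2*>(&d);
    return through_I1 == through_I2;
}
```

Because D has only one B portion, a name such as d_B_1 is unambiguous in D, whereas in D_nv it must be qualified with the path to the intended copy.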

To locate the members of a virtual base, the compiler creates a virtual base table, called the vbtable, for each class that inherits virtually. Each instance contains a pointer to the vbtable, called the vbptr. The first entry in the vbtable is the displacement from the vbptr back to the address point of the instance; this entry allows for backtracking to the address of the instance. Each remaining entry is the displacement from the vbptr to the address point of a virtual base within the instance.

When a class with a virtual base is inherited further, its virtual bases are typically mapped into different positions relative to the nonvirtual part. In this case, different vbtables must be defined to reflect the new offsets.
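The following sketch demonstrates this with trimmed, hypothetical versions of the B, I1, I2, and D classes in the next example. I1_to_B is a hypothetical helper that measures the distance from an I1 portion to the B it shares. Because that distance depends on the type of the complete object, the compiler must look it up at run time, which on Microsoft’s compiler means going through the vbptr and vbtable.

```cpp
#include <cassert>
#include <cstddef>

struct B  { int d_B_1 = 0, d_B_2 = 0; virtual void vf_B_1() {} };
struct I1 : virtual B { int d_I1_1 = 0, d_I1_2 = 0; };
struct I2 : virtual B { int d_I2_1 = 0, d_I2_2 = 0; };
struct D  : I1, I2 { int d_D_1 = 0, d_D_2 = 0; };

// Byte distance from an I1 portion to its virtual base B. The conversion
// is resolved at run time because B's position depends on the complete
// object's type.
std::ptrdiff_t I1_to_B(I1& i1) {
    B& b = i1;
    return reinterpret_cast<char*>(&b) - reinterpret_cast<char*>(&i1);
}
```

For a standalone I1, B follows I1’s own data; inside a D, the data of I2 and D sit between them, so the two calls return different distances.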

The following example shows a simple hierarchy containing a virtual base class. In this case, the base class B is a virtual base for the intermediate classes I1 and I2. In the derived class D, only one copy of class B’s instance variables exists.

// Base class definition.
class B {
public:
    int d_B_1;               // data member 1
    int d_B_2;               // data member 2
    virtual void vf_B_1();   // virtual function 1
};

// First intermediate class definition.
class I1 : public virtual B {
public:
    int d_I1_1;              // new data member 1
    int d_I1_2;              // new data member 2
    virtual void vf_B_1();   // overriding virtual function
    virtual int vf_I1_1();   // new virtual function 1
};

// Second intermediate class definition.
class I2 : public virtual B {
public:
    int d_I2_1;              // new data member 1
    int d_I2_2;              // new data member 2
    virtual void vf_B_1();   // overriding virtual function
    virtual int vf_I2_1();   // new virtual function 1
};

// Derived class definition.
class D : public I1, public I2 {
public:
    int d_D_1;               // new data member 1
    int d_D_2;               // new data member 2
    virtual void vf_B_1();   // required: I1 and I2 both override B::vf_B_1
    virtual int vf_I1_1();   // override of I1 virtual function
    virtual int vf_I2_1();   // override of I2 virtual function
};

Figure 5 illustrates the memory layout for an object belonging to class I1, which has virtual base class B.

Figure 5. Memory Layout of Class with Virtual Base

In this case, the vbtable contains only two entries: The first entry is the offset from the vbptr back to the address point of I1; the second entry is the delta from the vbptr to the instance variables of B.

The following code fragment calls a member function in the virtual base:

I2 *pI2;         // assume pI2 points to a valid I2 object
pI2->vf_B_1();

Before calling the member function from the virtual base B, the run time must adjust the this pointer to the introducing class. A cast from I2 to B uses the vbtable as follows:

this + {offset to vbptr} + {entry in vbtable for delta to B}

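I2::vf_B_1 receives this value, which points to the B portion of the object, because B is the function’s introducing class. As with the overriding functions described earlier, it adjusts the pointer back to I2 in its prolog and then operates on the object as if it were a normal I2 instance.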
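The cast formula above can be sketched by hand. The following hypothetical emulation lays out an I2 object as a standard-layout struct, assuming a vfptr at the address point, the vbptr after it, I2’s data, and then the B portion, and builds a two-entry vbtable like the one described for Figure 5. to_virtual_base evaluates this + {offset to vbptr} + {entry in vbtable for delta to B}; back_to_address_point uses the first entry. The struct and function names are assumptions, and a real compiler computes these offsets itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical image of an I2 instance whose virtual base B is mapped
// after the nonvirtual part.
struct B_part { int d_B_1; int d_B_2; };

struct I2_obj {
    const void* vfptr;             // placeholder for I2's vfptr; unused here
    const std::ptrdiff_t* vbptr;   // points to the vbtable for this layout
    int d_I2_1;
    int d_I2_2;
    B_part b;                      // the virtual base B
};

// vbtable for a standalone I2:
//   entry 0: displacement from the vbptr back to the address point
//   entry 1: displacement from the vbptr to B
const std::ptrdiff_t I2_vbtable[2] = {
    -static_cast<std::ptrdiff_t>(offsetof(I2_obj, vbptr)),
    static_cast<std::ptrdiff_t>(offsetof(I2_obj, b)) -
        static_cast<std::ptrdiff_t>(offsetof(I2_obj, vbptr)),
};

// this + {offset to vbptr} + {entry in vbtable for delta to B}
B_part* to_virtual_base(I2_obj* self) {
    char* vbptr_addr = reinterpret_cast<char*>(self) + offsetof(I2_obj, vbptr);
    return reinterpret_cast<B_part*>(vbptr_addr + self->vbptr[1]);
}

// Entry 0 lets code holding only the vbptr's address find the address point.
I2_obj* back_to_address_point(const std::ptrdiff_t** vbptr_field) {
    char* raw = reinterpret_cast<char*>(vbptr_field) + (*vbptr_field)[0];
    return reinterpret_cast<I2_obj*>(raw);
}
```

When I2 is embedded in a larger object such as D, only the vbtable contents change; the lookup code in to_virtual_base stays the same, which is what lets one compiled function work for every layout.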

CONCLUSION

Understanding the internal mechanisms of Microsoft’s C++ object mapping will help you make better software design decisions.

Some of the features of C++, particularly virtual functions and virtual bases, require run-time overhead that will affect the performance of your applications. However, because of the efficiency of our implementation, the cost of these features is minimal: a virtual function call adds a load of the vfptr, a load of the table entry, and an indirect call, and each instance grows by one pointer for each vfptr or vbptr it contains. Classes with virtual functions in C++ are every bit as efficient as arrays of pointers to functions in C. Furthermore, they are much more elegant and readable.

Explore the advanced features of C++. You’ll find that they improve the quality of your programs without forcing you to pay a heavy performance penalty.