Dr. GUI Does Templates

Dr. GUI Online
September 7, 1998

Click to copy the ATL Templates Sample discussed in this article.

Where We've Been; Where We're Going

Last time, we finished up the COM series—you could call it "Dr. GUI's COM Basics." Next time, we're going to discuss (finally) writing simple COM objects using the Active Template Library (ATL). As you've noticed, templates are so central to ATL that, literally, "template" is its middle name. So before we press onward toward COM objects in ATL, let's take a moment to make sure everyone is up to speed on a relatively new C++ feature: templates.

Something New

Have you ever read a tutorial column such as this one and not really known whether you understood the topic exactly? If so, rejoice: this column also introduces a new feature called "Give It a Shot!" that contains an exercise for you to complete. If you can complete the exercise, you'll know for sure that you understand the topic. And there's no better way to learn than by doing. So when you get done reading about templates, "Give It a Shot!" and try them out! Dr. GUI's designed the exercise so it should take most programmers an hour or so.

Templates

Why Templates Are Needed

Consider, if you will, the problem of writing a class to implement a collection—let's say an array—of objects. It's fairly easy to write a class to implement an array of any specific type. What's hard is to write a general array class that could be used as an array of any type. The closest you can come is to write an array of some union (for instance, a VARIANT) or an array of void pointers. Neither of those are typesafe, and both require use of casts to get data in and especially out.

Another option is to use the class we just described as a base class. The derived class implements wrappers for all the functions that take void pointers. The wrappers do the appropriate casting. This is typesafe, but it's a pain to write all the mindless wrapper classes. (But it's your only typesafe choice in Java.)

Wouldn't it be cool to write a class where the type of the data in the collection was a parameter to the class? In that way, you could easily create the exact class you wanted just by specifying the correct parameters. Being able to use types as parameters when creating classes is exactly what templates do for you.

Life without parameters

If you, like Dr. GUI, used to program in old versions of BASIC, you might well remember your frustration with the primitive function calling mechanism in those old versions—the amazingly powerful (NOT!) GOSUB statement.

Sure, GOSUB allowed you to jump to the subroutine and return, but since early versions of BASIC had only global variables, the subroutine could only operate on the variables for which it was written. You might recall copying values into the subroutine's variables, calling the subroutine, and copying the variables back out. What fun!

So it was a great advance to have parameterized variables such that you could call the subroutine with an argument list and pass any variables or expressions you wanted. The language implementation was responsible for making sure the subroutine operated on the right data—you were freed from writing the code to copy values in and out.

Templates represent a similar advance by allowing you to write patterns for classes and functions that can take parameters. When you write the template, no code is created. When you instantiate a template, the class or function that's appropriate for the parameters you passed is created. Note that template parameters can be any type—not just classes and types. We'll talk about how to define and instantiate templates later.

Macros? No, thanks

You may well be thinking that you can do this now by using the preprocessor macro facility. But if you've ever tried using macros for a substantial class, it'll make you crazy—since macros do simple text substitution for parameters, it's easy to run into problems such as expressions being evaluated more than once, hidden precedence problems, and so forth. In addition, the macro facility in C++ is intended for short macros, so it's limited to one logical line. It wouldn't be at all impossible to come up with a container class that exceeds the character limit for macro expansion. Finally, and most importantly, since the compiler, not the preprocessor, evaluates templates, templates use the compiler's type system to help you avoid type matching problems.

What Are Templates For?

Collection classes are the most obvious use of templates—but there are other uses, too. For instance, Visual C++ and ATL both provide template classes for smart COM pointers. When you instantiate one of these templates, you get a smart pointer that's customized for the specific type you want it to point to. But templates have many more uses than just those two: any time you can generalize a class's code to work with different types or different parameters, you can write a template to represent that generalization.

Template libraries

ATL, which stands for Active Template Library, uses templates so heavily that "template" is its middle name. Visual C++ also ships with the Standard Template Library (STL), which has an incredible variety of templates ranging from common algorithms to stings and containers. Check out both of these in the Visual C++ documentation.

Many of the ATL templates are templates of COM interface implementations. In addition, there are many classes and template classes representing many things—from windows and dialog boxes to threading models. Interestingly, ATL does not have any collection templates.

We'll talk more about using ATL to create COM objects in an upcoming article.

Advantages over function and class libraries

Since template libraries are distributed as source code, they offer a huge advantage over function libraries: the compiler can inline functions and optimize them heavily, including removing dead code. With a good optimizer, the code generated from a template library is as good as carefully hand-written code—but with much less effort. It's this incredible efficiency combined with the very high-level abstractions that make ATL components reasonably easy to write while being very, very fast.

So with a template library, you get exactly the code you need. If you've looked at the MFC source code much lately, you'll have noticed many conditional blocks using if statements to work differently depending on whether the code executing is an application, a DLL, an ActiveX control, an in-place editing server, and so on. Since there's only one MFC library, code for all of the cases has to be included in the library, making it considerably larger—and making anything you statically link with MFC considerably larger. If MFC was a template library, then only the code for the case you're using will be included in your executable. In ATL, the optimizations are incredible. For instance, if you're building a single-threaded object, the built-in increment and decrement operators are used for reference counting. But if you're building an apartment model object, reference counting uses the thread-safe Win32 APIs InterlockedIncrement and InterlockedDecrement. All you have to do is include a macro so ATL knows which case to select as it instantiates the templates and you'll get the leanest, meanest possible code.

Writing and Using Templates

So now you know why templates are cool—how do you write them?

There are two types of templates: function templates and class templates. Stand-alone function templates are not nearly as common as class templates—but note that the member functions in your template classes will often themselves be templates.

Function Templates

A function template allows you to define an entire family of overloaded functions in one quick definition.

Defining function templates

For instance, we could define a min function template as follows:

//function template for min
template <typename T> inline T min(T a, T b) { return a < b ? a : b; }

For any given type T, min(T, T) is a function that takes two parameters of type T and returns a value of type T. min calls whatever operator < that's defined for type T using the parameters as the operands. If operator < returns zero (meaning false), min returns the second parameter. If operator < returns a non-zero value (meaning true), min returns the first parameter. The template will fail with a compiler error if there's no operator < for the type. It will fail silently if the class T defines operator < to mean something other than "less than."

Older versions of C++ don't have the newer "typename" keyword—they use the keyword "class" instead. The good doctor prefers to use "typename" because it's clearer, but that's up to you for your code. Note that if you need to compile code that contains "typename" on a older compiler, just use "#define typename class"—Dr. GUI's example uses this always since it won't hurt anything.

Here's what the template would look like if we used the older keyword:

template <class T> inline T min(T a, T b) { return a < b ? a : b; }

Note the key differences between the template and a possible macro implementation, such as:

#define min(a, b) ((a) < (b) ? (a) : (b))

The template insures that the types of the parameters and the return type are all the same type. In the macro, a and b can be different types, which is a real mess when dealing with classes of different types. And since the error shows up in the source code for the macro "call," it's doubly hard to find.
The template takes care of several nasty text-expansion problems. Specifically, it doesn't require parens around each reference to a parameter. For instance, you might pass an expression that messes up precedence, such as if we passed "x == y" as the first parameter. Without the parenthesis, the binding in the macro case would be "(x == (y < b)" rather than (x == y) < b. The parentheses around the whole expression are necessary in case you embed the macro call in another expression. None of these parentheses are necessary for the template because it's instantiated by the compiler, which knows about C++ semantics, rather than by the preprocessor, which doesn't.
The template, like all functions, is guaranteed to evaluate each of its operands exactly once. The macro will evaluate one of the two parameters twice—which could be nasty if you pass an expression that has a side effect, such as "a++", as one of the parameters.

Instantiating function templates

Function templates are trivial to instantiate: Just call the function!

   // test min template
   printf("min(3, 5) is %d\n", min(3, 5)); // int
   printf("min(3.7, 8.3) is %f\n", min(3.7, 8.3)); // double

There is one gotcha, though: when the compiler instantiates a template, it will not perform any type conversion of the operands. For instance, "min(5, 3.5);" would give you a compiler error because the two parameters are not the same type.

   // min(3.7, 8) is an error because conversions not allowed
   // for template instantiations

You can, of course, use a typecast to make both of the parameters the same type. Or you can specify which template instantiate to use when you call the function:

   printf("min(3.7, 8) is %f\n", min<double>(3.7, 8));

A warning is in order here: any parameter difference is enough to cause a new instantiation to be generated. For instance, if you use min on longs and on ints, you'll get two separate instantiations even though the data format is the same (under Win32, at least) and even though the code is identical. (A smart compiler and/or linker could eliminate the code bloat caused by identical code, though.)

A problem that no smarts in the compiler or linker could fix is what happens if conversions between types are possible. For instance, you could do min for all numeric types by converting both parameters to long double and returning the min as a long double, which would be converted by the caller to the desired type. It would be inefficient, but you could do it.

The template specification is conservative on this issue: if there's any difference in parameters, a new instantiation is generated. Usually that's the safer (even if not the best) thing to do. If generating all these functions is a problem, you can provide specializations of your templates as described next.

Explicit specialization

We can also override the template for specific types. In this case, for instance, we might override if there is no operator < for a class we want to use the template on—or if the operator has the wrong semantics. For instance, if we wanted min called with two char pointers to check the strings to which the pointers point rather than the addresses, we could write an "explicit specialization" of the template:

// explicit specialization
char * min(char *a, char *b) {
   return (strcmp(a, b) < 0) ? a : b;
}

This specialization will be called if both parameters are char pointers:

// test explicit specialization
printf("min(\"abc\", \"def\") is %s\n", min("abc", "def"));

We can also explicitly specify that we want certain instantiations used. For instance, we could have said:

   // can call a specific funtion, force conversions
                        // 8 converts to double
   printf("min<double>(3.7, 8) is %f\n", min<double>(3.7, 8));

Class Templates

Function templates are dandy, but far more useful are templates of entire classes. These allow you to instantiate entire classes on the fly by specifying the parameters to the template you need.

Defining class templates

We could define a simple array template as follows:

// Very simple and fast array template
// functions both declared inline and outside of class
template <typename T, int N> class FastArray {
  public:
   inline FastArray();
   ~FastArray() { delete pArray; }
   inline void Set(int i, T val);
   T Get(int i) { return pArray[i]; }
  protected:
   T *pArray;
};

Here we have a very simple array class template which dynamically allocates memory for the array (it had better, since the destructor deletes the memory). The class also provides Get and Set functions.

Note that half of the functions are defined as inline because the body of the code is actually in the class declaration; the other half are explicitly defined as inline, even though their bodies won't appear until later. So, implicitly or explicitly, they're all inline.

Note that this template has two parameters: the type of the elements of the array, and the size of the array. You might be surprised to see a nontype parameter such as "int N" in this template, but these can be quite handy. And parameters can be any type, not just int. The one limitation is that the value of all the parameters has to be known at compile time so the template can be instantiated; in other words, the value must be a constant.

An aside: One handy use of int parameters is to allow the template user to select how a general-purpose template is to be instantiated. For instance, you could have an int parameter called "which" that you could use as follows:

template <typename T, int which> class Foo {
   void f() {
      switch (which) {
         case 1: DoOne(); break;
         case 2: DoTwo(); break;
         // etc.
         default: DoDefault(); break;
      }
      // etc.

You might think that this leads to code bloat, but it won't. Since which is a constant, the switch statement will be evaluated at compile time and the optimizer (at least when doing a release build) will remove all of the unused code, as well as the switch statement evaluation code. In other words, you'll end up with one function call from this switch statement. Nothing more.

This is one of the coolest things about templates: they allow you to handle all of the cases in your code, but only the case that is actually used generates code. This trick is obviously handy for debugging code that you want to disable in release builds. We'll discuss other ways of achieving this same end later on.

Definitions outside of the class

If you don't define your functions inside the template class, you must define them later. For regular member functions, the syntax looks more-or-less like a regular function template:

template <typename T, int N> void FastArray<T, N>::Set(int i, T val) {
   pArray[i] = val;
}

The big difference here is that the name of the class must be specified fully (as for all member functions). Note that the class name must include the parameters—in this case, "FastArray<T, N>".

The implication here is that, for a template called FastArray, FastArray<Param1, Param2> names a class. In fact, you can use the fully-specified template name anywhere you would use a class name: in front of the scope resolution operator, as here; when declaring an object, in derivation lists, and even as template parameters! (If you were using macros, you'd need a separate macro for the class name and for the class implementation—a definite pain.)

The naming rules for constructors and destructors defined outside the class might surprise you some:

template <typename T, int N> FastArray<T, N>::FastArray() {
   pArray = new T[N];
}

Note that the second instance of the class name is merely "FastArray", not "FastArray<T, N>". There's no good reason to repeat the template parameters since the name of the constructor and destructor is the same for all instantiations of the template. Also, allowing them to be repeated would only lead to errors if they're not repeated correctly.

Instantiating class templates

Class templates, unlike function templates, aren't instantiated automatically. You must supply the parameters each time you reference a class that's instantiated from a template. And if the parameters change, the compiler instantiates a new template. For instance, FastArray<int, 5> and FastArray<int, 6> are two different array types. (We will fix this problem in a later array class by passing the size to the constructor rather than to the template.)

We can instantiate and use our array template as follows:

// test arrays
const int nSize = 5; // size for all
printf("Fastarray--no error checking\n");
FastArray<char, nSize> fachar;
for (int i = 0; i < nSize; i++) // OK
   fachar.Set(i, 'A' + i);
for (i = 0; i <= nSize; i++) // one too far, but no exception
   putchar(fachar.Get(i));

Note that the second loop overindexes the array by one but doesn't catch the error. Our next template will provide bounds checking.

Referring to an instantiated class does not necessarily instantiate it. The class is only instantiated when you create an object or a reference to an object. In addition, member functions are not instantiated unless they're called. This is normally good—unless you're producing a library for others to call. In that case, you can explicitly instantiate an entire class template simply by declaring it at file scope:

template class FastArray<double, 30>; // instantiate all member functions

You can force instantiation of template functions—both file-scope and individual member functions—by declaring them at file scope, as above.

Microsoft C++ has an extension that allows you to use the "extern" keyword on a template declaration like above without instantiating a class or function. This allows you to forward declare a template class instantiation without instantiating it.

Inheriting from class templates

When the good doctor said that the expression "class_template_name<parameters>" is a class name and could be used anywhere any other class name could be used, he meant it. "Anywhere" includes a class derivation list—so let's add error checking to our array class by deriving a new template class from it:

template <typename T, int N> class RobustArray : FastArray<T, N> {
  public:
   RobustArray() { if (pArray == NULL) throw "Not enough memory"; }
   void Set(int i, T val) {
      BoundsOK(i);
      FastArray<T, N>::Set(i, val);
   };
   T Get(int i) {
      BoundsOK(i);
      return FastArray<T, N>::Get(i);
   }
 protected:
   void BoundsOK(int i) { // throws exception if bad
      if (i >= N || i < 0) throw "Array index out of bounds";
   }
};

Note that since the base class requires a parameter for the size, the derived class must also have a size parameter so it's available for use in the inheritance list. We're going to fix the problem of generating a different array class when only the size of the array changes, but not in this class.

Our strategy for error checking is to do an error checking routine before or after calling the base class's member function. In the cases of Get and Set, we check the bounds before we call the base class's member function to do the work. (Note the syntax for referring to the base class—another example of using the template expression anywhere.)

We don't define a destructor in this class—the base class's destructor will be called automatically and does exactly what's needed, so there's nothing for us to add.

The way the constructor works is a little tricky. Remember that the base class constructor automatically will be called before the derived class constructor. That's why we don't call new in the derived class constructor. All that's left to do is to make sure that new didn't return NULL.

The code to instantiate and test this template is similar to the code for the base class except that we include a try/catch block to deal with the exception that will be thrown.

   try {
      printf("\n\nRobustArray has error checking\n");
      RobustArray<char, nSize> rachar;
      for (i = 0; i < nSize; i++) // OK
         rachar.Set(i, 'A' + i);
      for (i = 0; i <= nSize; i++)  // one too far, throws exception
         putchar(rachar.Get(i));
      printf("\n\n");
   }
   catch (char * str) { // have to catch if we want to go on
      printf("\n\a%s exception (RobustArray)\n\n", str);
   }

Templates to the max

As the good doctor said earlier, we can use template class instantiations anywhere we can use a regular class—including as a template parameter:

RobustArray<RobustArray<char, 5>, 10> five_by_ten;

The template declares a RobustArray of RobustArrays.

Be careful with the angle brackets: you'll need a space between them if there are two in a row—or else the compiler will think they're the >> operator. And use parentheses if you need to pass an expression containing angle brackets to a template.

Foo<Goo<String,(a > b)> >; // note parens and space necessary

Using templates to generate customized code

Wouldn't it be nice, though, to have one array class that would do or not do error checking depending on a parameter we passed to the template? Even better yet, what if we could customize the error handling by writing the functions we wanted to be called when an error occurs?

It would be perfectly possible to pass a function (actually, a function pointer) or two to the template, then to call the function(s) from the members of the template class.

But the good doctor is going to do something a little more powerful: he'll write a class whose members are the necessary error-checking functions. For instance, a class that throws an exception if there's an error would look like this—with a static member function for each error check:

// Helper class for flexible array template--exception on error
class ErrorThrowsException {
  public:
   static void BoundsOK(int i, int N) {
      if (i >= N || i < 0) throw "Array index out of bounds";
   }
   static void CheckNullPointer(void *p) {
      if (!p) throw "Not enough memory";
   }
};

We can eliminate error checking by substituting another class:

// Helper class for flexible array template--no error checking
// The compiler will optimize these inline functions away to 
// nothing in release builds.
class NoChecks {
  public:
   static void BoundsOK(int i, int N) { }

   static void CheckNullPointer(void *p) { }
};

Now all we do is write a template for the array class that will use one of these classes to do error checking:

// Fast or robust array, using one of the error classes
// Note the default parameter for the error class.
template <typename T, typename E = ErrorThrowsException>
class FlexArray {
  public:
   FlexArray(int N_) {
      N = N_;
      pArray = new T[N];
      E::CheckNullPointer(pArray);
   }
   ~FlexArray() { delete pArray; }
   void Set(int i, T val) {
      E::BoundsOK(i, N); // throws exception if bad
      pArray[i] = val;
   };
   T Get(int i) {
      E::BoundsOK(i, N); // throws exception if bad
      return pArray[i];
   }
 protected:
   T *pArray;
   int N;
};

This template takes advantage of a feature we've not discussed: template parameters can have default argument values. In this case, we've specified that the class that throws an exception if there's an error as the default.

Note also that there's no template parameter for the size of the array. That means that while there will be one FlexArray class for each combination of element type and error checking class, there will be only one class for all sizes, not one per distinct size. That's a considerable savings in code! Rather than passing the size to the template, the size is passed to the constructor and stored in a member variable.

The two error-checking classes look polymorphic, but they're not. Remember that polymorphism means that we select at run time which function to call based on the dynamic type of the object. In this case, we determine at compile time which function to call based only on the template parameter. Dr. GUI jokingly calls it "compile time polymorphism" because it looks polymorphic, but it's not.

Note that the compiler can't check to see that your classes implement the correct functions as it could if inheritance was involved. But that doesn't much matter: if the function calls in the array class work, they work, even if the parameter types are different.

We can test our array template with the following code:

   printf("FlexArray, no checks\n");
   FlexArray<double, NoChecks> fadouble1(nSize);
   for (i = 0; i < nSize; i++)
      fadouble1.Set(i, i);
   for (i = 0; i <= nSize; i++) // overindexed, not checked
      printf("%f ", fadouble1.Get(i));
   printf("\n\n");
   try {
      printf("FlexArray with error checking\n");
      FlexArray<double> fadouble2(nSize);
      for (i = 0; i < nSize; i++)
         fadouble2.Set(i, i);
      for (i = 0; i <= nSize; i++)
         printf("%f ", fadouble2.Get(i)); // overindex throws exception
      printf("\n\n");
   }
   catch (char * str) {
      printf("\n\a%s exception (FlexArray)\n\n", str);
   }

In this code, we instantiate two array classes, both with double as the array element type. One uses the class that throws an exception, one uses the error checking class that does nothing. Note that the size of the array has changed from a compile-time template parameter to a run-time constructor parameter.

Looking at the assembly code in release mode shows the magic of templates and inline functions. For instance, the statement inside the set loop compiles to:

; C++ code: fadouble1.Set(i, i);
fild        dword ptr [i]
inc         dword ptr [i]
fstp        qword ptr [ecx]
add         ecx,8

Actually, these four instructions are a bit misleading, because they include two instructions (inc and add) that don't have anything to do with the Set function—they implement the "i++" part of the enclosing for loop. (One increments i, one increments the pointer to the array of doubles by eight bytes.)

The real work of the Set function is the fild instruction, which converts the int operand to double and stores the result at the top of the floating-point stack, and the fstp instruction, which stores the number at the top of the floating-point stack to memory and pops the stack. Those two instructions are about as lean and mean as it gets! The call to Set is gone, replaced by two instructions. The call to BoundsOK is gone, since the function didn't do anything. You can see why Dr. GUI recommends smart use of templates for his patients who care about performance, both size and speed.

What if you're not happy with this error checking? What if, for instance, you wanted error checking when DEBUG was defined but not otherwise? It's easy to write an error-checking class that tests the DEBUG symbol. And since DEBUG is a constant, the compiler will eliminate any dead code, leaving you with no extra overhead. Just write the class, and pass it when you instantiate arrays.

Explicit specialization

Just as you can with template functions, you can create specializations of template classes. You can do this for efficiency, or just for fun, as in the code below:

// Expression template uses recursion, specialization
// Instantiations evaluated completely at compile time
template <int N> class Factorial {
public:
   enum {value = N * Factorial<N-1>::value};
};
// specialization for 1, ends recursive evaluation
class Factorial<1> {
public:
   enum {value = 1};
};

This does what it looks like it does: The compiler recursively evaluates the template for values of N that go down by one each time. When the value is one, the compiler uses the specialized template, thus ending the recursion.

You can ask for a factorial by simply writing:

int f12 = Factorial<12>::value;

The compiler evaluates the factorial and generates a single MOV instruction that moves the constant the compiler calculated to the variable. It just doesn't get faster than that. These templates are sometimes called "expression templates."

Give It a Shot

Okay . . . now you've seen some templates, so it's time for Dr. GUI to prescribe some exercise(s). So Give It a Shot!

Write a function template that implements Swap. What are the limitations of your implementation?
Write a collection template of your choice. Keep it simple!
Write an additional error-checking class for the FlexArray template that honors the DEBUG flag. Test your code. Modify your collection to reuse these error-checking classes, including your new one.
Write an expression template that evaluates an expression such as the nth Fibonacci number. The nth Fibonacci is the sum of the n-1th and n-2^nd Fibonacci numbers, except for n = 1 and 2, where the Fibonacci number is 1. (Hint: this takes three classes.)

Dr. GUI's solution code will be available with the next column.

Where We've Been; Where We're Going

This time, we discussed templates, including some fairly advanced uses of them. You know you understand 'em because you've used 'em (see the "Give It a Shot!" section, above). Next time, we'll finally get to ATL—we're going to build a very simple ATL control.