IDispatch Performance and Restrictions

With all the argument handling, type coercion, localization support, and exceptions that are involved with a function such as IDispatch::Invoke, you've probably recognized at least once that accessing the functionality and content—that is, the methods and properties—of an object through a dispinterface is not exactly fast. Hence, the topic of performance.

If you compare Figure 14-3 with Figure 14-1, you'll notice that when a client accesses the functionality in, say, CBeeper::Beep through a dispinterface, both client and object are executing a lot more code than if the client were accessing it through a custom interface such as IBeeper. The client has more overhead because instead of calling the function directly, it has to push a fair number of parameters on the stack before calling IDispatch::Invoke. If the function itself took additional parameters (Beep takes none itself), the client would also have to create the appropriate DISPPARAMS and VARIANTARG structures first, then push pointers to those parameters on the stack, and then call Invoke. In contrast, making a direct call to a vtable interface function allows the client to push values on the stack instead of first stuffing them into other structures. In general, the client will always execute more instructions to set up the call, resulting in lower performance. While this may or may not be significant with respect to the execution time of the called function, it is the trade-off of late binding.

The object itself also incurs additional overhead because it receives its parameters through DISPPARAMS and VARIANTARG structures. It must not only unpack those parameters from the structures but also perform type checking on each parameter. When function calls are early-bound, the compiler does type checking. In late binding, the object has to enforce types at run time. So not only is there an extra function call to get from IDispatch::Invoke to, in this case, CBeeper::Beep, there is more overhead to manage the parameters and the return value.
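To make the two sources of overhead concrete, here is a minimal sketch in portable C++. It does not use the real OLE headers: MiniVariant and MiniDispParams are hypothetical, much-reduced stand-ins for VARIANTARG and DISPPARAMS, and the SetSound method and its dispID are assumptions made for illustration only. The sketch shows the client packing an argument before the call and the object checking and unpacking it afterward, neither of which the direct call requires.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-ins for VARIANTARG and DISPPARAMS.
enum class MiniType { Long, Double };

struct MiniVariant {
    MiniType type;
    union { long lVal; double dblVal; };
};

struct MiniDispParams {
    std::vector<MiniVariant> args;
};

class Beeper {
public:
    enum { kDispIdSound = 0 };  // assumed dispID, for illustration only

    // Early-bound path: the compiler has already checked the argument type.
    void SetSound(long sound) { sound_ = sound; }
    long Sound() const { return sound_; }

    // Late-bound path: the object must validate the dispID, the argument
    // count, and the argument type at run time before doing the real work.
    void Invoke(int dispID, const MiniDispParams& params) {
        if (dispID != kDispIdSound)
            throw std::invalid_argument("unknown dispID");
        if (params.args.size() != 1)
            throw std::invalid_argument("bad parameter count");
        const MiniVariant& arg = params.args[0];
        if (arg.type != MiniType::Long)
            throw std::invalid_argument("type mismatch");
        SetSound(arg.lVal);  // the extra call from Invoke to the method
    }

private:
    long sound_ = 0;
};

// Client side of a late-bound call: the value is first stuffed into the
// structures, and only then is Invoke called.
void SetSoundLateBound(Beeper& beeper, long sound) {
    MiniVariant v;
    v.type = MiniType::Long;
    v.lVal = sound;

    MiniDispParams params;
    params.args.push_back(v);

    beeper.Invoke(Beeper::kDispIdSound, params);
}
```

A direct call such as beeper.SetSound(16) compiles down to pushing one value and making one call. SetSoundLateBound reaches the same code only after building two structures on the client side and passing three run-time checks on the object side.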

In general, you will find that accessing a dispinterface that is implemented on an in-process object is anywhere from 5 percent to more than 90 percent slower, depending on the object, than accessing a custom vtable interface that implements the same features. When the object exists in a local server and marshaling is involved, a custom interface is still faster, but by a smaller margin, because the marshaling overhead accounts for much of the cost of each call. Still, is there any way to improve performance, especially for in-process objects?

The Dual Interface

The way to improve performance for a dispinterface is somehow to cut down the per-call overhead for each property and method while still preserving the ability to perform late binding. In truth, however, you can't do much to speed up calls to IDispatch::Invoke: you still have to create the necessary structures for Invoke on the client side, and you still have to perform type checking on the object side.

What is called a dual interface isn't so much an improvement on late binding to a dispinterface as it is a technique for combining a dispinterface, which has the IDispatch vtable, with the vtable for the equivalent custom interface, where the two share a common implementation of the IUnknown members. A controller aware of dual interfaces can then choose to access the methods and properties either through IDispatch::Invoke or through direct calls to vtable entries, as illustrated in Figure 14-5. With a dual interface, a controller can choose early binding or late binding as it deems necessary to improve performance, especially with in-process objects.

Figure 14-5.

Dual interfaces combine the IDispatch vtable with a custom interface, allowing access to methods and properties through either route.
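The arrangement can be modeled in a few lines of portable C++. This is only a sketch under stated assumptions: MiniDispatch is a hypothetical stand-in for IDispatch with a drastically simplified Invoke signature, the IUnknown members are omitted entirely, and the DualBeeper interface with its two dispIDs is invented for illustration. What it shows is one vtable that begins with the dispatch entry point and continues with the custom members, both routes ending in the same implementation.

```cpp
#include <stdexcept>

// Hypothetical stand-in for IDispatch: one generic entry point keyed by dispID.
struct MiniDispatch {
    virtual ~MiniDispatch() = default;
    virtual long Invoke(int dispID, const long* arg) = 0;
};

// Dual-style interface: the dispatch entry comes first, followed by the
// custom members that a controller can call directly.
struct DualBeeper : MiniDispatch {
    virtual long GetSound() = 0;
    virtual void PutSound(long sound) = 0;
};

// Assumed dispIDs, for illustration only.
enum { kIdSoundGet = 0, kIdSoundPut = 1 };

class BeeperImpl : public DualBeeper {
public:
    // Early-bound route: direct vtable calls.
    long GetSound() override { return sound_; }
    void PutSound(long sound) override { sound_ = sound; }

    // Late-bound route: map the dispID onto the same member functions.
    long Invoke(int dispID, const long* arg) override {
        switch (dispID) {
        case kIdSoundGet:
            return GetSound();
        case kIdSoundPut:
            if (arg == nullptr)
                throw std::invalid_argument("missing argument");
            PutSound(*arg);
            return 0;
        default:
            throw std::invalid_argument("unknown dispID");
        }
    }

private:
    long sound_ = 0;
};
```

A controller that knows only about the dispatch portion holds a MiniDispatch pointer and calls Invoke. A controller aware of the dual interface holds a DualBeeper pointer to the same object and calls PutSound or GetSound directly, skipping the dispID lookup.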

One benefit of implementing a set of functions as a dual interface for local objects is that OLE provides automatic marshaling of all the functions in the custom vtable portion of the interface. This can occur because OLE will have access to the type information for this interface, as it does for any other dispinterface. Tests show, however, that performance through the vtable portion of a dual interface is about the same as through the IDispatch portion. The reason is that generic marshaling code, which has to examine type details to determine how to marshal each parameter, takes a considerable amount of time. In fact, when making cross-process IDispatch::Invoke calls, most of the time is spent in the marshaling code, and OLE employs this same code for all parts of a dual interface.

Remember, a straightforward custom interface is still faster than a dispinterface across process boundaries. If performance in such cases is of prime importance to you, a dual interface is not necessarily the best option. The argument and return value types that you can use within a cross-process dispinterface and a dual interface also affect your decision here. These types are limited to what are called automation-compatible types, which means you cannot pass arbitrary data structures by reference through these interfaces because OLE's generic marshaling service can handle only a basic set of types. OLE doesn't handle pointers to structures that may contain pointers to other structures, ad infinitum. If you want to pass data structures in this fashion, you should use either custom marshaling or standard marshaling with a custom interface.

Automation-Compatible Interfaces and Types

When a client packages parameters in VARIANT[ARG] structures to pass to methods and properties through IDispatch::Invoke, some proxy and stub must marshal those parameters as well as the return value. Because OLE itself implements the proxy and stub for the IDispatch interface, OLE provides the generic marshaling code to handle any VARIANT structure. However, you can store only a limited number of data types in a VARIANT, and this limits the types you can employ through a dispinterface or a dual interface. These types are int, short, long, boolean, char, wchar_t, float, double, IUnknown *, IDispatch *, CY (currency), DATE, BSTR, VARIANT (containing a compatible type), NULL, SCODE, and a Safe Array of any of these types. You can also use a custom typedef enum <type>, which is treated the same as an int; like an int, its size is system dependent.

When you limit methods and properties to these types, you have an automation-compatible interface. As we saw in Chapter 3, you can use the oleautomation attribute in an ODL file to enforce the restrictions listed here.
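The idea of checking a signature against a fixed list of types can also be modeled in C++ at compile time. The following is a sketch only, not anything OLE provides: the trait names are invented, FakeUnknown and FakeDispatch are hypothetical stand-ins for IUnknown and IDispatch, and the OLE-specific types (CY, DATE, BSTR, VARIANT, SCODE, and Safe Arrays) are left out because they require the real headers.

```cpp
#include <type_traits>

// Hypothetical stand-ins so the sketch builds without the OLE headers.
struct FakeUnknown {};
struct FakeDispatch {};

// A type is rejected unless it is explicitly listed below.
template <typename T>
struct IsAutomationCompatible : std::false_type {};

template <> struct IsAutomationCompatible<int>           : std::true_type {};
template <> struct IsAutomationCompatible<short>         : std::true_type {};
template <> struct IsAutomationCompatible<long>          : std::true_type {};
template <> struct IsAutomationCompatible<bool>          : std::true_type {};
template <> struct IsAutomationCompatible<char>          : std::true_type {};
template <> struct IsAutomationCompatible<wchar_t>       : std::true_type {};
template <> struct IsAutomationCompatible<float>         : std::true_type {};
template <> struct IsAutomationCompatible<double>        : std::true_type {};
template <> struct IsAutomationCompatible<FakeUnknown*>  : std::true_type {};
template <> struct IsAutomationCompatible<FakeDispatch*> : std::true_type {};

// True only when every type in a method's signature is on the list.
template <typename... Args>
struct AllCompatible;

template <>
struct AllCompatible<> : std::true_type {};

template <typename First, typename... Rest>
struct AllCompatible<First, Rest...>
    : std::integral_constant<bool,
          IsAutomationCompatible<First>::value &&
          AllCompatible<Rest...>::value> {};

// An arbitrary structure, which the generic marshaler cannot handle.
struct ParsnipStruct { int Lengthcm; int Radiuscm; };

static_assert(AllCompatible<long, double, FakeDispatch*>::value,
              "signature uses only listed types");
static_assert(!AllCompatible<long, ParsnipStruct*>::value,
              "a pointer to an arbitrary structure is not on the list");
```

The second static_assert is the case that matters for the rest of this section: a pointer to an arbitrary structure fails the check, so a structure such as this one has to travel by some other means.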

Passing Data Structures

Certainly you can pass a lot of useful information when you follow the restrictions for an automation-compatible interface. However, you might want to pass some sort of data structure between processes through a dispinterface or a dual interface without having to provide your own custom interface.

You can accomplish the exchange of structured data through two methods. The first and most common method is to implement another object with IDispatch that wraps the data structure and turns each field into a property. For example, suppose I had a structure as follows:


typedef struct
    {
    int      Lengthcm;    //Length in centimeters
    int      Radiuscm;    //Top radius in centimeters
    int      Weightg;     //Weight in grams
    COLORREF Shade;       //Exact color shade
    } PARSNIP;

I could then create a dispinterface with each field described as a property:


dispinterface DIParsnip
    {
    properties:
        [id(0)] int  Lengthcm;
        [id(1)] int  Radiuscm;
        [id(2)] int  Weightg;
        [id(3)] long Shade;
    };

Wherever I want to have a PARSNIP * argument to a method in my interface (or as a return value), I can use an IDispatch * instead. Whenever I need to access a field in the data structure, I can call IDispatch::Invoke with the dispID of the property associated with that field.
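A wrapper of this kind might look like the following sketch, written in portable C++ rather than against the real IDispatch. The ParsnipObject class and its GetProperty and PutProperty members are hypothetical simplifications of what Invoke would do for a property get and a property put; ColorRef is a stand-in for COLORREF. The dispIDs match those in the DIParsnip listing.

```cpp
#include <stdexcept>

typedef unsigned long ColorRef;  // stand-in for COLORREF

typedef struct
    {
    int      Lengthcm;    //Length in centimeters
    int      Radiuscm;    //Top radius in centimeters
    int      Weightg;     //Weight in grams
    ColorRef Shade;       //Exact color shade
    } PARSNIP;

// dispIDs as assigned in the DIParsnip dispinterface.
enum ParsnipDispId { kLengthcm = 0, kRadiuscm = 1, kWeightg = 2, kShade = 3 };

// Hypothetical wrapper object: each field is reachable only by dispID.
class ParsnipObject {
public:
    explicit ParsnipObject(const PARSNIP& p) : data_(p) {}

    // Models a property get through Invoke.
    long GetProperty(int dispID) const {
        switch (dispID) {
        case kLengthcm: return data_.Lengthcm;
        case kRadiuscm: return data_.Radiuscm;
        case kWeightg:  return data_.Weightg;
        case kShade:    return static_cast<long>(data_.Shade);
        default:        throw std::out_of_range("unknown dispID");
        }
    }

    // Models a property put through Invoke.
    void PutProperty(int dispID, long value) {
        switch (dispID) {
        case kLengthcm: data_.Lengthcm = static_cast<int>(value); break;
        case kRadiuscm: data_.Radiuscm = static_cast<int>(value); break;
        case kWeightg:  data_.Weightg  = static_cast<int>(value); break;
        case kShade:    data_.Shade    = static_cast<ColorRef>(value); break;
        default:        throw std::out_of_range("unknown dispID");
        }
    }

private:
    PARSNIP data_;
};
```

Reading the weight then becomes parsnip.GetProperty(kWeightg) instead of a field access. Note that every field read or written is a separate call on the object.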

One obvious drawback to passing data structures in this manner—especially across process boundaries—is that it is slow. It potentially requires a cross-process call for each field in a data structure. For small structures, this generally isn't a problem, but for larger structures or when performance really matters, the faster alternative is to have your original automation object return structures as IUnknown pointers so that each pointer refers to an object that also implements IDataObject (which can be the automation object itself). Through IDataObject, you can provide data structures in their entirety with one call. You might also find it useful to implement a separate data structure object that has both an IDataObject implementation for fast access to data and an IDispatch implementation for flexible late-bound access to each field. To be really flexible, you could make your IDispatch implementation part of a dual interface for in-process objects or also provide a custom interface for local objects.
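The difference in call counts is easy to see in a small model. The sketch below is an assumption-laden illustration, not the IDataObject interface itself: ParsnipSource is a hypothetical object that counts how many times it is called, GetField plays the role of a per-field Invoke, and GetAll plays the role of a single whole-structure request by handing back a byte copy of the structure. In real OLE the bytes would travel in a storage medium rather than a std::vector.

```cpp
#include <cstring>
#include <stdexcept>
#include <vector>

struct ParsnipData {
    int           Lengthcm;
    int           Radiuscm;
    int           Weightg;
    unsigned long Shade;     // stand-in for COLORREF
};

// Hypothetical object that counts each call made to it, standing in for
// the number of cross-process calls a client would incur.
class ParsnipSource {
public:
    explicit ParsnipSource(const ParsnipData& d) : data_(d), calls_(0) {}

    // Per-field access: one call per field, as with one Invoke per property.
    long GetField(int dispID) {
        ++calls_;
        switch (dispID) {
        case 0:  return data_.Lengthcm;
        case 1:  return data_.Radiuscm;
        case 2:  return data_.Weightg;
        case 3:  return static_cast<long>(data_.Shade);
        default: throw std::out_of_range("unknown dispID");
        }
    }

    // Whole-structure access: one call returns a copy of every field.
    std::vector<unsigned char> GetAll() {
        ++calls_;
        std::vector<unsigned char> bytes(sizeof(ParsnipData));
        std::memcpy(bytes.data(), &data_, sizeof(ParsnipData));
        return bytes;
    }

    int Calls() const { return calls_; }

private:
    ParsnipData data_;
    int         calls_;
};

// Client reading the structure one property at a time.
ParsnipData ReadPerField(ParsnipSource& src) {
    ParsnipData out;
    out.Lengthcm = static_cast<int>(src.GetField(0));
    out.Radiuscm = static_cast<int>(src.GetField(1));
    out.Weightg  = static_cast<int>(src.GetField(2));
    out.Shade    = static_cast<unsigned long>(src.GetField(3));
    return out;
}

// Client reading the structure in its entirety with one call.
ParsnipData ReadWhole(ParsnipSource& src) {
    std::vector<unsigned char> bytes = src.GetAll();
    ParsnipData out;
    std::memcpy(&out, bytes.data(), sizeof(out));
    return out;
}
```

For this four-field structure, ReadPerField makes four calls and ReadWhole makes one. The gap widens with every field added, which is why the whole-structure route pays off for larger structures.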