When I was in sixth grade, my parents bought a TRS-80 Model I computer. My biggest frustration with that early-model PC had little to do with its hardware limitations. Instead, I was stymied by the scaled-down interpreter, which had no concept of data structures and supported only two string variables, A$ and B$. With no way to implement collections (queues, lists, vectors, arrays, and so on) it was nearly impossible to develop applications with any degree of complexity.
Along with the astounding advances in hardware capabilities over the past twenty years, there's been a corresponding increase in the sophistication of the programming languages available for the PC. But while most of these languages have long supported the use of collections, only in the last few years have any of them actually included collections as part of their standard libraries.
Take C++. One of the things that makes C++ preferable to C is its ability to innately associate a data structure with a set of related functions. For example, I can model a last-in first-out collection of numbers by developing a Stack class and limiting access to the underlying data structure to a few well-defined, aptly named methods like Push and Pop (see Figure 1).
The Stack class code in Figure 1 is an improvement over simply passing around a chunk of memory and an index pointer because the tasks of construction, initialization, and destruction are taken care of automatically. Since the data is only accessible indirectly via public methods, I can change the underlying implementation of the class without requiring users to do anything more than recompile the code that accesses it.
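To make the encapsulation argument concrete, here is a minimal sketch of such a class. It is not the Figure 1 listing: the fixed capacity, the exception types, and the IsEmpty and Size helpers are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-capacity last-in first-out stack of ints.
// Callers never touch the array or the counter directly.
class Stack {
public:
    Stack() : count_(0) {}

    // Places a value on top of the stack; throws if the stack is full.
    void Push(int value) {
        if (count_ == kCapacity) throw std::overflow_error("stack is full");
        items_[count_++] = value;
    }

    // Removes and returns the top value; throws if the stack is empty.
    int Pop() {
        if (count_ == 0) throw std::underflow_error("stack is empty");
        return items_[--count_];
    }

    bool IsEmpty() const { return count_ == 0; }
    std::size_t Size() const { return count_; }

private:
    static constexpr std::size_t kCapacity = 16;  // assumed limit
    int items_[kCapacity];
    std::size_t count_;
};

// Usage sketch: values come back in the reverse order of insertion.
inline void StackDemo() {
    Stack s;
    s.Push(1);
    s.Push(2);
    assert(s.Pop() == 2);
    assert(s.Pop() == 1);
    assert(s.IsEmpty());
}
```

Because the array and counter are private, the storage could later be swapped for a linked list or a growable buffer without changing any calling code.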
Before the days of the Standard Template Library (STL), every development team that needed a stack class ended up having to either plunk down a few hundred dollars for a commercial class library or roll its own. The result was a plethora of homegrown stack classes (to say nothing of the queue, array, list, and vector classes), each with slightly different features, semantics, and nomenclature. The widespread adoption of the STL will reduce duplication of effort and help avoid bugs introduced by badly written or poorly documented C++ collection classes.
It is important to note that in the genesis of the STL, type safety was deemed preferable to implementation inheritance. In other words, the STL architects did not attempt to provide a general-purpose, weakly typed stack class that could be linked to your application via a runtime library. Rather, the STL provides a strongly typed template that generates type-safe code at compile time using an essentially automated copy-and-paste mechanism.
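As a rough illustration of that compile-time type safety, the sketch below uses the standard std::stack template. The SumAndDrain helper and the demo function are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <stack>
#include <string>

// Hypothetical helper: pops every value off a stack of ints and
// returns their sum. It accepts only std::stack<int>.
inline int SumAndDrain(std::stack<int>& numbers) {
    int total = 0;
    while (!numbers.empty()) {
        total += numbers.top();
        numbers.pop();
    }
    return total;
}

inline void TypedStackDemo() {
    std::stack<int> numbers;          // code generated for int
    numbers.push(1);
    numbers.push(2);
    numbers.push(3);
    assert(SumAndDrain(numbers) == 6);
    assert(numbers.empty());

    std::stack<std::string> words;    // a separate, unrelated type
    words.push("aardvark");
    assert(words.top() == "aardvark");

    // SumAndDrain(words);    // would not compile: wrong element type
    // numbers.push("text");  // would not compile: not convertible to int
}
```

The two commented-out lines are the point: the mismatch is rejected by the compiler rather than discovered at run time.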
Enter COM
Even though C++ now has a well-defined, templatized mechanism for representing arrays and such, it doesn't solve the increasingly significant problem of sharing collections across process and language boundaries. Which brings me to COM. Two of the key benefits of COM are its language independence and location transparency. If you are developing a component that needs to expose a collection of objects across process or language boundaries, COM is ideal. As with C++, you can use the flexibility of COM to compose whatever custom-access interfaces you need by defining the appropriate methods to give indirect access to your collection data.
Similar to how the STL defines a standard for handling collections, there is a de facto standard mechanism for handling collection classes in COM via an IDispatch-derived interface, which supports the methods described in Figures 2 and 3. Implementing these properties and methods allows Visual Basic® clients to iterate through the collection using the familiar For Each...Next construct, like this:
For Each Animal in Animals
MsgBox Animal.Name
Next Animal
As you'd expect from a COM interface, the properties and methods shown in Figure 2 do not dictate the underlying implementation of the collection. Thus, COM collection objects can be developed in any COM-supported language and can use any internal storage mechanisms supported by the language in which they are written.
Enumerators
Note that the collection object must support two forms of indexing: random access via the Item property, and sequential, thread-safe access through an iterator interface (known in COM parlance as an enumerator) via the _NewEnum property. An enumerator is a COM interface that allows you to walk through a list of objects of the same type. Enumerators have a well-defined set of methods (see Figure 4), and follow the IEnumObjType naming convention, where ObjType is the type of object in the list. Popular enumerator incarnations include IEnumMoniker, IEnumString, and IEnumUnknown, but you can (and should) create type-specific enumerators for your own objects.
Although it's possible to define a single weakly typed enumerator interface such as IEnumVoidPtr for all data types and require interface implementers to cast all data to void* as is done with the malloc routine, it is semantically preferable to create a unique, strongly typed interface for each type. In theory, at least, if you want to let clients iterate over a list of Animal objects, you should create an IEnumAnimal interface for the best results.
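The difference can be sketched in portable C++ without any COM headers. The classes below are simplified stand-ins, not real COM interfaces: they omit IUnknown, HRESULT return values, and the Skip and Clone methods, and the Animal struct and CollectNames helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Animal { std::string name; };

// Weakly typed pattern: every caller must cast the result and hope
// the cast matches what the enumerator actually holds.
class IEnumVoidPtr {
public:
    virtual ~IEnumVoidPtr() {}
    virtual bool Next(void** item) = 0;
    virtual void Reset() = 0;
};

// Strongly typed pattern: the element type is part of the contract.
class IEnumAnimal {
public:
    virtual ~IEnumAnimal() {}
    virtual bool Next(const Animal** item) = 0;  // false when exhausted
    virtual void Reset() = 0;
};

// Enumerates a private snapshot of the animals it was given.
class AnimalEnumerator : public IEnumAnimal {
public:
    explicit AnimalEnumerator(const std::vector<Animal>& animals)
        : animals_(animals), index_(0) {}

    bool Next(const Animal** item) override {
        if (index_ >= animals_.size()) return false;
        *item = &animals_[index_++];
        return true;
    }

    void Reset() override { index_ = 0; }

private:
    std::vector<Animal> animals_;
    std::size_t index_;
};

// Client code needs no casts when the enumerator is strongly typed.
inline std::vector<std::string> CollectNames(IEnumAnimal& animals) {
    std::vector<std::string> names;
    const Animal* current = nullptr;
    animals.Reset();
    while (animals.Next(&current)) names.push_back(current->name);
    return names;
}
```

A client of IEnumVoidPtr would have to write something like static_cast<Animal*>(item) at every call site, and nothing would stop it from casting to the wrong type.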
However, in real-world COM when a client calls _NewEnum (see Figure 2), the collection object returns a generic, catch-all IEnumVariant interface (essentially the same as the IEnumVoidPtr approach described previously) instead of a strongly typed enumerator. That design choice, including the use of the dispatch-style properties and methods (Count, Item, Add, Remove) instead of a standard interface, essentially dumbs down the implementation of COM collection objects so they can be used in languages like VBScript that value flexibility over type safety.
This lack of type safety is exacerbated by the fact that object developers are encouraged to overload the functionality of the Item and Remove methods such that items in the collection can be indexed using a VARIANT rather than a specific data type. For example, the Item method in Figure 5 allows clients to index into the Animals collection using either a long or a BSTR. It's an enticing technique (some might even call it robust), but it encourages semantic innuendo. You can pass just about anything in, and who knows what you'll get back!
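The ambiguity is easier to see in code. The following is a portable sketch, not the Figure 5 listing: the Index struct is a hypothetical stand-in for a VARIANT that holds either a long or a string, and the zero-based positions and null return for a miss are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a VARIANT holding a long or a string.
struct Index {
    enum Kind { kPosition, kName };
    Kind kind;
    long position;
    std::string name;

    static Index ByPosition(long p) { return Index{kPosition, p, ""}; }
    static Index ByName(const std::string& n) { return Index{kName, 0, n}; }
};

class Animals {
public:
    void Add(const std::string& name) { names_.push_back(name); }

    // One entry point, two meanings: the caller's intent is only
    // discovered at run time by inspecting the index kind.
    // Returns nullptr when nothing matches.
    const std::string* Item(const Index& index) const {
        if (index.kind == Index::kPosition) {
            if (index.position < 0) return nullptr;
            std::size_t i = static_cast<std::size_t>(index.position);
            return i < names_.size() ? &names_[i] : nullptr;
        }
        for (std::size_t i = 0; i < names_.size(); ++i) {
            if (names_[i] == index.name) return &names_[i];
        }
        return nullptr;
    }

private:
    std::vector<std::string> names_;
};
```

Even in this small sketch, the failure cases (a negative position, an out-of-range position, an unknown name) all collapse into the same null result, and a real VARIANT admits many more types than these two.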
The lack of type safety offers flexibility, but it comes at a price. If the notion of a COM collection had instead been defined using one or more standard interfaces, clients could easily find out if an object exposed a collection by calling QueryInterface. Furthermore, supporting a collection interface would be an all-or-nothing proposition. With the lowest-common-denominator dispatch interface approach commonly used today, it's possible for an object to only partially implement the expected properties and methods. There's nothing to prevent me from not implementing (or worse, incorrectly implementing) one or more of the Count, Item, and _NewEnum methods. The only way to know if the collection object supports the optional Add and Remove dispatch methods (aside from digging into its type library) is to fire dispatch calls and see what happens. If these methods were defined using a custom interface, you could call QueryInterface and know immediately.
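The idea of asking an object up front whether it supports editing can be sketched in portable C++, where dynamic_cast plays a role loosely analogous to QueryInterface. This is an analogy only, not COM code, and every interface and class name below is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Read-only contract: counting and indexed access.
class IAnimalReader {
public:
    virtual ~IAnimalReader() {}
    virtual std::size_t Count() const = 0;
    virtual const std::string& Item(std::size_t index) const = 0;
};

// Optional contract: an object either supports all of it or none of it.
class IAnimalEditor {
public:
    virtual ~IAnimalEditor() {}
    virtual void Add(const std::string& name) = 0;
    virtual void Remove(std::size_t index) = 0;
};

class ReadOnlyAnimals : public IAnimalReader {
public:
    explicit ReadOnlyAnimals(const std::vector<std::string>& names)
        : names_(names) {}
    std::size_t Count() const override { return names_.size(); }
    const std::string& Item(std::size_t index) const override {
        return names_.at(index);
    }
private:
    std::vector<std::string> names_;
};

class EditableAnimals : public IAnimalReader, public IAnimalEditor {
public:
    std::size_t Count() const override { return names_.size(); }
    const std::string& Item(std::size_t index) const override {
        return names_.at(index);
    }
    void Add(const std::string& name) override { names_.push_back(name); }
    void Remove(std::size_t index) override {
        if (index < names_.size()) {
            names_.erase(names_.begin() + static_cast<std::ptrdiff_t>(index));
        }
    }
private:
    std::vector<std::string> names_;
};

// Asks for the editing contract first instead of calling Add blindly.
inline bool TryAdd(IAnimalReader& collection, const std::string& name) {
    IAnimalEditor* editor = dynamic_cast<IAnimalEditor*>(&collection);
    if (editor == nullptr) return false;  // editing is not supported
    editor->Add(name);
    return true;
}
```

The client learns whether Add and Remove exist from a single query, before making any call that might fail.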
One way to partially solve this dilemma is to implement the collection methods using a dual interface. Script-based clients must still rely on IDispatch to interrogate the collection object. More intelligent clients can use QueryInterface instead. That doesn't solve the problem of type safety, but it at least gives intelligent clients a way to verify a COM object's adherence to a semantic (albeit weakly typed) contract. And Visual Basic-based clients can still use the powerful For Each...Next construct.
Perhaps the best solution is to implement one or more strongly typed custom interfaces in addition to the expected dispatch methods and properties. Figure 6 shows one possibility I've experimented with. I essentially grouped the related methods found in the dispatch approach as two separate custom interfaces. You'll also notice that I used the type-safe IEnumXxxx interface pattern rather than using the IEnumVariant panacea. I've included an ATL-based sample implementation of those interfaces along with a couple of test clients in the code that accompanies this column.
I hope I've encouraged you to give some thought to one aspect of software construction, the weakly typed versus strongly typed tradeoff, in the context of COM collection classes. Although each approach has its advantages, the designers of C++ and the STL clearly prefer the strongly typed approach. In my opinion, effective COM interface design has the same affinity.