Wicked Code, MSJ May 1999


Jeff Prosise is the author of Programming Windows 95 with MFC (Microsoft Press, 1996). He also teaches Visual C++, MFC and COM programming seminars. For more information, visit http://www.solsem.com.

At some point along the path to COM enlightenment, just about every COM programmer discovers that the secret to being efficient on the wire is to know Interface Definition Language (IDL) cold. IDL is the language in which COM interfaces are defined, and it's the language you feed to MIDL to build type libraries and proxy/stub DLLs.

Unfortunately, many COM programmers' knowledge of IDL begins and ends with [in] and [out]. IDL looks pretty easy when you see trivial methods like this one defined:



   HRESULT Add ([in] long a, [in] long b,
                [out] long* pResult);

In the real world, many COM methods really are this simple. But others are not. How do you define in IDL a method that returns an arbitrary amount of data to the caller? How do you pass arrays of strings? Is it possible to pass linked lists as parameters to COM methods? If so, and if the linked list is being returned to the caller rather than passed to the callee, who's responsible for allocating memory to hold the items in the linked list, the caller or the callee?

These are the kinds of questions that you encounter when you take IDL from the classroom to the workplace. And they're the kind of questions for which answers—not to mention good sample code—are difficult to find. Aside from some excellent columns by fellow columnist Don Box, and Bill Hludzinski's "Understanding Interface Definition Language: A Developer's Survival Guide" (MSJ, August 1998), very little has been written about IDL specifically for COM programmers. That's why I've compiled a list of some of the most frequently asked IDL questions and documented their solutions. If you have some IDL stumpers of your own that you'd like to see solved, email them to me and I'll address them at some future date.

Returning Arbitrary Amounts of Data

It's easy to return arrays of data from a COM method. In the following example, the caller allocates a buffer and passes its address in a method call. The callee returns data to the caller by copying data to that address:



 // IDL
 HRESULT GetData ([in] int nMax,
     [out, size_is (nMax)] unsigned char* pBuffer);
 
 // Caller
 unsigned char cData[256];
 pInterface->GetData (sizeof (cData), cData);

If the possibility exists that the callee might return fewer than the 256 bytes requested, GetData can be made more efficient by adding a parameter that specifies the number of bytes actually copied to the buffer, and by tagging the buffer with a [length_is] attribute that refers to that count:



 // IDL
 HRESULT GetData ([in] int nMax, [out] int* pCount,
     [out, size_is (nMax), length_is (*pCount)]
     unsigned char* pBuffer);
 
 // Caller
 int nCount;
 unsigned char cData[256];
 pInterface->GetData (sizeof (cData), &nCount, cData);
 // nCount holds number of bytes copied

Using [length_is] in this manner prevents the stub from having to copy the entire buffer back to the caller if the callee returns fewer than nMax bytes.
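To make the difference between [size_is] and [length_is] concrete, here is a portable C++ sketch that involves no COM at all. GetDataImpl is a hypothetical callee that copies at most nMax bytes and reports how many it wrote. BytesOnWire is a toy model, an assumption rather than MIDL's actual marshaling code, of how many array bytes the stub sends back with and without [length_is].

```cpp
#include <cassert>
#include <cstring>

// Hypothetical callee: copies up to nMax bytes of its data into pBuffer
// and reports the number actually copied through *pCount.
int GetDataImpl (int nMax, int* pCount, unsigned char* pBuffer)
{
    assert (nMax >= 0 && pCount != nullptr && pBuffer != nullptr);
    static const unsigned char data[] = { 'W', 'i', 'c', 'k', 'e', 'd' };
    int n = static_cast<int> (sizeof (data));
    if (n > nMax)
        n = nMax;
    std::memcpy (pBuffer, data, static_cast<std::size_t> (n));
    *pCount = n;
    return 0; // stands in for S_OK
}

// Toy model of the array bytes returned to the caller:
// [size_is (nMax)] alone ships the whole nMax-byte buffer, while
// [length_is (*pCount)] ships only the bytes the callee filled in.
int BytesOnWire (int nMax, int nCount, bool hasLengthIs)
{
    return hasLengthIs ? nCount : nMax;
}
```

With a 256-byte buffer and 6 bytes of real data, the toy model shows [length_is] shrinking the returned array payload from 256 bytes to 6.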

The problem with both of these techniques is that the caller must anticipate the maximum amount of data the callee might return and allocate memory for it ahead of time. That's inefficient, because if the caller allocates 10,000 bytes of memory, but the callee returns only 1,000, then 90 percent of the allocated memory goes to waste. It's also limiting. What if the callee wants to return 10,001 bytes, but the caller allocated only 10,000? You could allow the caller to obtain the required buffer size in a separate call, but then each fetch would require two round-trips instead of one. And round-trips are expensive, especially when they're performed over a network.

The solution is to let the callee allocate the buffer in the caller's address space. Sound impossible? It's not, thanks to COM's CoTaskMemAlloc and CoTaskMemFree functions and a little sleight-of-hand performed by the proxy/stub.

Figure 1 shows how you would rewrite GetData to allow the callee to return an arbitrary amount of data to the caller. Here's how it works. The caller passes in a pointer to a pointer. The callee allocates the buffer and copies the buffer's address to the caller's pointer. The caller frees the memory allocated by the callee. The interface proxy and stub foster the illusion that memory is being allocated in one process and freed in another by freeing the callee-side buffer when the method call returns and creating an identical buffer on the caller's side. And the caller knows exactly how much data the callee returned because the callee copies a count to a caller-specified address—in this case, the variable named nCount.
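Figure 1 isn't reproduced in this text, so here is a portable sketch of the ownership rules it describes. It is plain C++ that runs in a single process, so no proxy or stub is involved. TaskMemAlloc and TaskMemFree are hypothetical stand-ins that wrap malloc and free where real code would call CoTaskMemAlloc and CoTaskMemFree. The parameter order of GetData is an assumption reconstructed from the surrounding text.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for CoTaskMemAlloc/CoTaskMemFree.
void* TaskMemAlloc (std::size_t cb) { return std::malloc (cb); }
void TaskMemFree (void* pv) { std::free (pv); }

// Callee side, assuming an IDL declaration along the lines of
//   HRESULT GetData ([out] int* pCount,
//       [out, size_is (,*pCount)] unsigned char** ppBuffer);
// The callee decides how much data to return and allocates the buffer.
int GetData (int* pCount, unsigned char** ppBuffer)
{
    assert (pCount != nullptr && ppBuffer != nullptr);
    const int nBytes = 10001; // more than the caller could have guessed
    unsigned char* p = static_cast<unsigned char*> (TaskMemAlloc (nBytes));
    if (p == nullptr) {
        *pCount = 0;
        *ppBuffer = nullptr;
        return -1; // stands in for E_OUTOFMEMORY
    }
    std::memset (p, 0xAB, nBytes); // placeholder data
    *pCount = nBytes;
    *ppBuffer = p; // ownership passes to the caller
    return 0;      // stands in for S_OK
}
```

The caller, not the callee, makes the final TaskMemFree call. In real COM code, skipping the corresponding CoTaskMemFree leaks the buffer.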

One aspect of this code that bears further explanation is the [size_is] attribute in the IDL method definition. Notice the comma preceding *pCount:



 size_is (,*pCount)

[size_is] supports this syntax to resolve ambiguities involving pointers to pointers. The blank first parameter to [size_is] indicates that ppBuffer is a pointer to just one pointer (as opposed to an array of pointers). The second parameter indicates that the pointer at *ppBuffer points to an array of *pCount unsigned chars.

Variable-length Data in SAFEARRAYs

Returning variable-length data using indirect pointers is all fine and good for clients written in C++, but clients written in scripting languages such as VBScript require special handling. Today's scripting languages don't deal with C-style arrays very well; in fact, they don't deal with them at all. They expect arrays of data to be packaged in Automation-compatible SAFEARRAYs, which are great for script writers but a pain in the neck for programmers using C++. Fortunately, returning arbitrary amounts of information in a SAFEARRAY isn't terribly difficult.

Assume that a scripting client calls an Automation method named GetData, and that GetData returns a callee-determined amount of data in a SAFEARRAY. Here's the caller's code:



 ' Caller
 Dim arrVals
 arrVals = MyObject.GetData
 For i = LBound(arrVals) To UBound(arrVals)
     ' Process arrVals(i)
 Next

Here's the method definition in IDL:



 // IDL
 HRESULT GetData ([out, retval] VARIANT* pVariant);

And here's a method implementation that returns an array of 10 integers:



 // Callee
 VariantInit (pVariant);
 
 SAFEARRAY* psa;
 // 10 elements numbered 0-9
 SAFEARRAYBOUND bound = { 10, 0 };
 psa = SafeArrayCreate (VT_I4, 1, &bound);
 if (psa == NULL)
     return E_OUTOFMEMORY;
 
 for (long i=0; i<10; i++) {
     HRESULT hr = SafeArrayPutElement (psa, &i, &i);
     if (FAILED (hr)) {
         SafeArrayDestroy (psa);
         return hr;
     }
 }
 
 // Set vt only after the array exists, so a failed
 // call leaves the VARIANT empty (VT_EMPTY)
 pVariant->vt = VT_ARRAY | VT_I4; // Array of integers
 pVariant->parray = psa;
 return S_OK;

SafeArrayCreate is an Automation API function, exported by OleAut32.dll, that creates a SAFEARRAY. You pass a SAFEARRAY in a VARIANT by setting pVariant->parray equal to the SAFEARRAY's address and setting pVariant->vt to the VT_ARRAY flag ORed with the VT_ constant that identifies the type of data stored in the array (here, VT_I4 for 4-byte integers). The scripting client does the dirty work on the other end.
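VT_ARRAY is a bit flag, whereas VT_I4 is a type code stored in the low 12 bits of vt, so the receiving side can take vt apart with two masks. The constants in this sketch redeclare, under hypothetical k-prefixed names, the values that VARENUM defines in the Windows SDK header wtypes.h, so the snippet compiles without Windows headers. IsArray and ElementType are likewise illustrative names, not SDK functions.

```cpp
#include <cassert>
#include <cstdint>

// Values mirror VARENUM in the Windows SDK (wtypes.h).
const std::uint16_t kVT_I4       = 3;      // 4-byte signed integer
const std::uint16_t kVT_BSTR     = 8;      // BSTR
const std::uint16_t kVT_ARRAY    = 0x2000; // flag: value is a SAFEARRAY
const std::uint16_t kVT_TYPEMASK = 0x0FFF; // low bits: element/value type

// True if vt describes a SAFEARRAY rather than a single value.
bool IsArray (std::uint16_t vt) { return (vt & kVT_ARRAY) != 0; }

// Strips the flags, leaving the type of each element.
std::uint16_t ElementType (std::uint16_t vt)
{
    return static_cast<std::uint16_t> (vt & kVT_TYPEMASK);
}
```

For the GetData example above, vt is VT_ARRAY | VT_I4, so IsArray returns true and ElementType returns VT_I4.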

Returning Strings of Arbitrary Length

Ah, you say. Now I know how to return arrays of any length to conventional COM clients and to scripting clients. But can I also return strings of arbitrary length?

You bet. Once again, the solution involves using CoTaskMemAlloc to allocate memory for the string on the server side and CoTaskMemFree to free it on the client side. But this time the IDL is dramatically simpler because you can use IDL's [string] attribute to qualify the pointer (see Figure 2). This is much better than forcing the caller to allocate memory to receive the string and to guess at the amount of memory required. Just be sure you make it clear in the method's documentation that the caller is responsible for freeing the memory in which the string is stored. Failure to call CoTaskMemFree will result in memory leaks.
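Figure 2 isn't reproduced in this text. As a rough, portable illustration of the same contract, the sketch below assumes a method declared as HRESULT GetString ([out, string] wchar_t** ppsz). Both the name GetString and that exact declaration are assumptions. TaskMemAlloc and TaskMemFree are hypothetical malloc-based stand-ins for CoTaskMemAlloc and CoTaskMemFree.

```cpp
#include <cassert>
#include <cstdlib>
#include <cwchar>

// Hypothetical stand-ins for CoTaskMemAlloc/CoTaskMemFree.
void* TaskMemAlloc (std::size_t cb) { return std::malloc (cb); }
void TaskMemFree (void* pv) { std::free (pv); }

// Callee side: sizes the buffer to fit the string, terminator included,
// so the caller never has to guess how much memory to provide.
int GetString (wchar_t** ppsz)
{
    assert (ppsz != nullptr);
    const wchar_t* text = L"Hello, world";
    std::size_t cch = std::wcslen (text) + 1; // characters incl. terminator
    wchar_t* p = static_cast<wchar_t*> (TaskMemAlloc (cch * sizeof (wchar_t)));
    if (p == nullptr) {
        *ppsz = nullptr;
        return -1; // stands in for E_OUTOFMEMORY
    }
    std::wmemcpy (p, text, cch);
    *ppsz = p; // the caller now owns the string
    return 0;  // stands in for S_OK
}
```

A caller passes the address of a wchar_t* and hands the result to TaskMemFree when finished, mirroring the CoTaskMemFree obligation described above.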

And what about scripting clients? Believe it or not, returning variable-length strings to scripting clients is easier than returning them to C++ clients because Automation methods use counted strings called BSTRs. If an Automation method is defined and implemented like this



 // IDL
 HRESULT GetText ([out, retval] BSTR* pBstr);



 // Callee
 *pBstr = SysAllocString (L"Hello, world");
 return (*pBstr != NULL) ? S_OK : E_OUTOFMEMORY;

then a scripting client can retrieve a string like this:



 ' Caller
 Text = MyObj.GetText

Once more, it's the client's responsibility to free the memory allocated for the string. A scripting client will do this automatically, but if a C++ client calls this version of GetText, it must free the BSTR explicitly with SysFreeString.
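What makes a BSTR a counted string is a 4-byte prefix, holding the string's length in bytes, that sits immediately before the first character the BSTR points to. A null terminator follows the characters. The toy allocator below imitates that layout to show how SysStringLen can return a length without scanning for a terminator, which is also why a BSTR can carry embedded null characters. ToyAllocString, ToyStringLen, and ToyFreeString are hypothetical names, and char16_t stands in for the 16-bit OLECHAR. Real BSTRs must come from SysAllocString and go back through SysFreeString, never through a homemade allocator like this one.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Toy imitation of the BSTR layout:
//   [4-byte byte count][characters ...][16-bit null terminator]
//                      ^ the returned pointer points here
char16_t* ToyAllocString (const char16_t* psz)
{
    std::size_t cch = 0;
    while (psz[cch] != u'\0')
        ++cch;
    std::uint32_t cb = static_cast<std::uint32_t> (cch * sizeof (char16_t));
    char* block = static_cast<char*> (
        std::malloc (sizeof (cb) + cb + sizeof (char16_t)));
    if (block == nullptr)
        return nullptr;
    std::memcpy (block, &cb, sizeof (cb));                        // prefix
    std::memcpy (block + sizeof (cb), psz, cb);                   // characters
    std::memset (block + sizeof (cb) + cb, 0, sizeof (char16_t)); // terminator
    return reinterpret_cast<char16_t*> (block + sizeof (cb));
}

// Length in characters, read from the prefix rather than by scanning.
std::size_t ToyStringLen (const char16_t* bstr)
{
    if (bstr == nullptr)
        return 0;
    std::uint32_t cb;
    std::memcpy (&cb, reinterpret_cast<const char*> (bstr) - sizeof (cb),
                 sizeof (cb));
    return cb / sizeof (char16_t);
}

// Frees the whole block, starting at the hidden prefix.
void ToyFreeString (char16_t* bstr)
{
    if (bstr != nullptr)
        std::free (reinterpret_cast<char*> (bstr) - sizeof (std::uint32_t));
}
```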

Passing Arrays of Strings

Most of the really sticky IDL questions have to do with transmitting data from the server to the client, but one of my favorites involves data input rather than output: namely, how do you pass an array of strings in a COM method call? It's not hard if you understand IDL's pointer-to-pointer syntax.

Remember that in IDL, applying the tag



 [size_is (,10)]

to a method parameter indicates that the parameter is a pointer that points to a pointer, which in turn points to an array of 10 somethings. Similarly, the tag



 [size_is (10,)]

indicates that a pointer points to an array of 10 pointers, each of which points to just one something.

Now consider the following IDL method definition:



 // IDL
 HRESULT InputStrings ([in] int nCount,
     [in, size_is (nCount,), string]
     wchar_t** ppStrings);

In this example, ppStrings is a pointer to an array of nCount pointers, and each pointer in the array points to one something—a string. Given this definition, a caller could pass an array of four strings to the object that implements the method like this:



 // Caller
 wchar_t* szText[] = {
     L"One",
     L"Two",
     L"Three",
     L"Four",
 };
 
 pInterface->InputStrings (4, szText);

On the other end, the object can simply iterate through the array, dereferencing the string pointers one by one to get at the strings created by the caller.
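Here's a minimal sketch of how an InputStrings implementation might walk that array, compiled as ordinary C++ with no proxy or stub involved. TotalLength is a hypothetical helper, and totaling the characters is only a placeholder for whatever real work the object performs.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical helper for an InputStrings implementation:
// ppStrings points to nCount pointers ([size_is (nCount,)]), and each
// of those points to one null-terminated string ([string]).
std::size_t TotalLength (int nCount, wchar_t** ppStrings)
{
    assert (nCount == 0 || ppStrings != nullptr);
    std::size_t total = 0;
    for (int i = 0; i < nCount; i++)
        total += std::wcslen (ppStrings[i]); // dereference each string pointer
    return total;
}
```

The usage below builds the strings in writable arrays because the method's wchar_t** parameter is non-const, and a conforming modern C++ compiler rejects initializing a wchar_t* directly from a string literal.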

Returning Structures with Embedded Pointers

One of the coolest things about IDL is that it supports structures just like C. Cooler still is the fact that structures can contain pointers. Suppose you define two structures—POINT and LINE—and include a pair of POINT pointers in LINE, like this:



 typedef struct tagPOINT {
     int x;
     int y;
 } POINT;
 
 typedef struct tagLINE {
     POINT* pFrom;
     POINT* pTo;
 } LINE;

If you define a method like this



 // IDL
 HRESULT Draw ([in] LINE* pLine);

then you can call it like this:



 // Caller
 POINT from = { 0, 0 };
 POINT to = { 50, 100 };
 LINE line = { &from, &to };
 pInterface->Draw (&line);

Because IDL supports both top-level pointers (the LINE* in the method's parameter list) and embedded pointers (the POINT* members of the LINE structure), you can be sure that the LINE structure and the POINT structures that it references will be transmitted in the method call. And it's clear in this example that it's the caller's responsibility to allocate memory for the LINE structure and the POINT structures.

Now suppose you turn things around and write a method that passes a LINE structure from the callee to the caller:



 // IDL  
 HRESULT GetLine ([out] LINE* pLine);

To call this method, the caller allocates memory for a LINE structure and passes the address in the method call. But who allocates memory for the POINT structures, the caller or the callee? This is a common source of confusion in IDL, but it shouldn't be, because the rule is clear: when the caller passes a top-level pointer in a method call for the purpose of retrieving data from the callee, and when the item referenced by that pointer contains embedded pointers, it's the callee's job to allocate the memory referenced by the embedded pointers. That means (you guessed it) you need CoTaskMemAlloc and CoTaskMemFree.

Figure 3 shows the proper way to implement and call GetLine. By allocating the memory for the POINT structures itself, the callee makes life easy for the caller. Just remember that if you're the caller, you must free the POINT structures when you're through with them or they're liable to hang around for a very long time.
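Figure 3 isn't included in this text, so the following is only a portable sketch of the same ownership rule. TaskMemAlloc and TaskMemFree are hypothetical malloc-based stand-ins for CoTaskMemAlloc and CoTaskMemFree, POINT and LINE are redeclared as plain C++ structs, and FreeLine is a hypothetical caller-side helper.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for CoTaskMemAlloc/CoTaskMemFree.
void* TaskMemAlloc (std::size_t cb) { return std::malloc (cb); }
void TaskMemFree (void* pv) { std::free (pv); }

struct POINT { int x; int y; };
struct LINE { POINT* pFrom; POINT* pTo; };

// Callee side of HRESULT GetLine ([out] LINE* pLine): the caller supplies
// the LINE, and the callee allocates the POINTs its embedded pointers use.
int GetLine (LINE* pLine)
{
    assert (pLine != nullptr);
    POINT* pFrom = static_cast<POINT*> (TaskMemAlloc (sizeof (POINT)));
    POINT* pTo = static_cast<POINT*> (TaskMemAlloc (sizeof (POINT)));
    if (pFrom == nullptr || pTo == nullptr) {
        TaskMemFree (pFrom); // freeing a null pointer is harmless
        TaskMemFree (pTo);
        pLine->pFrom = pLine->pTo = nullptr;
        return -1; // stands in for E_OUTOFMEMORY
    }
    pFrom->x = 0;  pFrom->y = 0;
    pTo->x = 50;   pTo->y = 100;
    pLine->pFrom = pFrom;
    pLine->pTo = pTo;
    return 0; // stands in for S_OK
}

// Caller side: free each embedded POINT when finished with the LINE.
void FreeLine (LINE* pLine)
{
    TaskMemFree (pLine->pFrom);
    TaskMemFree (pLine->pTo);
    pLine->pFrom = pLine->pTo = nullptr;
}
```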

Linked Lists

This brings me to the king of all IDL questions: is it possible to pass linked lists in COM method calls, and if so, how? Actually, it's not difficult at all. How you do it varies slightly depending on the direction in which the list is passed. If the list is being input (that is, passed from caller to callee), the caller allocates the memory. If the list is being output, the callee allocates the memory. Let's look at examples demonstrating both.

In the first example, assume the caller would like to input a singly linked list of ITEM structures to a COM object via a method call. The structure and method are defined as:



 // IDL
 typedef struct tagITEM {
     int nVal;
     struct tagITEM* pNext;
 } ITEM;
 
 HRESULT SetList ([in] ITEM* pList);

The caller then builds a list and passes it to the object:



 // Caller
 ITEM* pItem1 = new ITEM;
 ITEM* pItem2 = new ITEM;
 ITEM* pItem3 = new ITEM;
 
 pItem1->nVal = 1;
 pItem1->pNext = pItem2;
 pItem2->nVal = 2;
 pItem2->pNext = pItem3;
 pItem3->nVal = 3;
 pItem3->pNext = NULL;
 
 pInterface->SetList (pItem1);
 
 // The caller built the list, so the caller frees it
 delete pItem3;
 delete pItem2;
 delete pItem1;

Because the interface proxy and stub chase the chain of pointers contained in the ITEM structures and copy everything referenced by those pointers to the callee's address space, the object can walk the list as if it had built the list itself:



 // Callee
 ITEM* pItem = pList;
 while (pItem != NULL) {
     // Process pItem->nVal
     pItem = pItem->pNext;
 }

Passing a linked list in a method call is slightly more work if the list is being returned to the caller rather than input to the callee. Now the callee must build the list in memory allocated with CoTaskMemAlloc, and the caller must remember to free that memory with CoTaskMemFree. Figure 4 shows how it looks in code. Once more, much of the magic comes from the fact that the proxy and stub chase embedded pointers to ensure that all the data referenced in the method's parameter list is transmitted intact to the other side.
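Figure 4 isn't reproduced in this text. The portable sketch below assumes an output method of the form HRESULT GetList ([out] ITEM** ppList); that name and signature are assumptions. TaskMemAlloc and TaskMemFree are hypothetical malloc-based stand-ins for CoTaskMemAlloc and CoTaskMemFree, and FreeList is a hypothetical caller-side helper. The callee allocates every node, and the caller frees every node.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for CoTaskMemAlloc/CoTaskMemFree.
void* TaskMemAlloc (std::size_t cb) { return std::malloc (cb); }
void TaskMemFree (void* pv) { std::free (pv); }

struct ITEM { int nVal; ITEM* pNext; };

// Caller-side cleanup: every node came from the task allocator.
void FreeList (ITEM* pItem)
{
    while (pItem != nullptr) {
        ITEM* pNext = pItem->pNext;
        TaskMemFree (pItem);
        pItem = pNext;
    }
}

// Callee side: builds a three-node list (values 1, 2, 3) in task memory.
int GetList (ITEM** ppList)
{
    assert (ppList != nullptr);
    *ppList = nullptr;
    ITEM** ppTail = ppList; // where the next node's address is stored
    for (int i = 1; i <= 3; i++) {
        ITEM* pItem = static_cast<ITEM*> (TaskMemAlloc (sizeof (ITEM)));
        if (pItem == nullptr) {
            FreeList (*ppList); // don't leak a partially built list
            *ppList = nullptr;
            return -1; // stands in for E_OUTOFMEMORY
        }
        pItem->nVal = i;
        pItem->pNext = nullptr;
        *ppTail = pItem;
        ppTail = &pItem->pNext;
    }
    return 0; // stands in for S_OK
}
```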

You can pass doubly linked lists in method calls with the same ease with which you pass singly linked lists, but there's one issue unique to doubly linked lists that you mustn't forget. IDL supports three types of pointers: reference pointers, unique pointers, and full pointers. The corresponding IDL attributes are [ref], [unique], and [ptr]. When a COM method call contains top-level or embedded pointers that point to the same address in memory, you must use full pointers; otherwise, the memory won't be aliased properly on the other side. In other words, if pointer A and pointer B refer to the same location in memory, but aren't explicitly labeled as full pointers, the MIDL-generated code that marshals the pointers will create not one but two blocks of memory at the destination. It will point A to one of them and B to the other.

What it boils down to is simply this: if you redefine ITEM to support doubly linked lists, you also need to declare the embedded pointers and the pointer passed in the method's parameter list explicitly to be full pointers, as shown here:



 // IDL
 typedef struct tagITEM {
     int nVal;
     [ptr] struct tagITEM* pNext;
     [ptr] struct tagITEM* pPrev;
 } ITEM;
 
 HRESULT SetList ([in, ptr] ITEM* pList);

With this simple change, the proxy and the stub will handle the aliased pointers properly and transfer a doubly linked list intact.
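To see why aliasing needs special treatment, consider the toy copier below. It is not MIDL-generated code, only a sketch of the bookkeeping that full pointers imply: a table that maps each source address to its copy and is consulted before anything is copied. CopyNode, FreeCopies, and PointerTable are hypothetical names. Without the table, a copier would make a separate copy each time a pointer led it to a node, and on a doubly linked list, whose pPrev pointers lead back to nodes already visited, a naive recursive copier would never finish.

```cpp
#include <cassert>
#include <map>

struct ITEM {
    int nVal;
    ITEM* pNext;
    ITEM* pPrev;
};

// Maps each source node's address to the copy made for it.
using PointerTable = std::map<const ITEM*, ITEM*>;

// Copies the node at pSrc and everything reachable from it. A node reached
// through several pointers (one node's pNext, another's pPrev) is copied
// exactly once, so the copies alias each other the way the originals did.
ITEM* CopyNode (const ITEM* pSrc, PointerTable& table)
{
    if (pSrc == nullptr)
        return nullptr;
    auto it = table.find (pSrc);
    if (it != table.end ())
        return it->second;           // already copied: reuse the copy
    ITEM* pCopy = new ITEM { pSrc->nVal, nullptr, nullptr };
    table[pSrc] = pCopy;             // record before recursing
    pCopy->pNext = CopyNode (pSrc->pNext, table);
    pCopy->pPrev = CopyNode (pSrc->pPrev, table);
    return pCopy;
}

// Frees every copy recorded in the table.
void FreeCopies (PointerTable& table)
{
    for (auto& entry : table)
        delete entry.second;
    table.clear ();
}
```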

Drop Me a Line

Are there tough Win32®, MFC, COM, or Microsoft® Transaction Server programming questions you'd like answered? If so, drop me a note at jeffpro@msn.com. Please include "Wicked Code" in the message title. I regret that time doesn't permit me to respond to individual questions, but rest assured that each and every suggestion will be considered for inclusion in a future column.

From the May 1999 issue of Microsoft Systems Journal.