Strings Inside Out

The C string format, known in API jargon as LPSTR, is a sequence of characters terminated by the null character (ASCII value 0), as shown in Figure 2-6. The LPWSTR format is the same except that it uses 16-bit Unicode characters. Notice that the length of the string isn’t stored. C programmers must either keep track of the length themselves or call a function that calculates the length by looping through each character until it finds a terminating null.
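To make the cost of the missing length concrete, here is a minimal sketch of the kind of loop a length function has to run. The names cstr_length and wstr_length are hypothetical, chosen for illustration; the standard C library function strlen does the same job for 8-bit strings. The 16-bit case is modeled with uint16_t rather than a Windows type so that the sketch stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the characters in an LPSTR-style string by walking it until
   the terminating null. The null itself is not counted.
   (cstr_length is a hypothetical name; strlen is the standard one.) */
static size_t cstr_length(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* The same walk for an LPWSTR-style string of 16-bit characters. */
static size_t wstr_length(const uint16_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

Either function takes time proportional to the length of the string, and neither can see past the first null, so a string in this format can't contain an embedded null character.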

Supposedly, the implementation of Basic strings isn’t documented because it might change in a later version of Basic. In fact, the format of Basic strings is well known and changed little from QuickBasic to Visual Basic 3. These strings were at least partially documented in the VBX custom control documentation, where they went by the name HLSTR (high-level string). Figure 2-6 illustrates the format.

However, if you ignored Microsoft’s advice and wrote C DLLs that took advantage of your knowledge of Basic strings, Visual Basic version 4 sent you officially up the creek, and version 5 leaves you there. Basic now uses the BSTR format, described in the COM Automation documentation. Figure 2-6 points out the difference. BSTRs are better than HLSTRs for two reasons. First, they have one fewer pointer. Second, they are already null terminated, so Basic doesn’t have to null-terminate them before passing them on to C.
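The following is a simplified, portable model of a BSTR-style layout, not the real implementation. It assumes the layout described in the COM Automation documentation: a 4-byte length prefix holding the byte count, then the 16-bit characters, then a terminating null, with the string pointer aimed at the first character rather than at the prefix. The names model_bstr_alloc, model_bstr_len, and model_bstr_free are hypothetical; real code would call the COM Automation functions such as SysAllocStringLen, SysStringLen, and SysFreeString instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t wchar16;   /* hypothetical name for a 16-bit character */

/* Allocate [4-byte byte count][characters][null] and return a pointer
   to the first character, the way a BSTR pointer is assumed to work.
   The sketch does not guard against overflow for enormous nchars. */
static wchar16 *model_bstr_alloc(const wchar16 *src, uint32_t nchars)
{
    uint32_t nbytes = nchars * (uint32_t)sizeof(wchar16);
    unsigned char *block;
    wchar16 *chars;

    assert(src != NULL || nchars == 0);
    block = malloc(sizeof(uint32_t) + nbytes + sizeof(wchar16));
    if (block == NULL) {
        return NULL;
    }
    memcpy(block, &nbytes, sizeof nbytes);        /* length prefix */
    chars = (wchar16 *)(block + sizeof(uint32_t));
    if (nchars > 0) {
        memcpy(chars, src, nbytes);               /* character data */
    }
    chars[nchars] = 0;                            /* terminating null */
    return chars;
}

/* Read the length from the prefix just before the characters.
   No loop over the characters is needed. */
static uint32_t model_bstr_len(const wchar16 *s)
{
    uint32_t nbytes;
    assert(s != NULL);
    memcpy(&nbytes, (const unsigned char *)s - sizeof(uint32_t), sizeof nbytes);
    return nbytes / (uint32_t)sizeof(wchar16);
}

/* Free the whole block, starting from the hidden prefix. */
static void model_bstr_free(wchar16 *s)
{
    if (s != NULL) {
        free((unsigned char *)s - sizeof(uint32_t));
    }
}
```

Because the pointer lands on the first character and the data ends with a null, a C function expecting an LPWSTR can read the same memory without any conversion. Because the length comes from the prefix, the model can also hold embedded nulls, which a plain C string cannot.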

Figure 2-6. Four kinds of strings.

As a Basic programmer, you must make an unnatural distinction when passing strings to the Windows API: you have to separate input strings from output strings and handle each case in a completely different fashion. This takes some getting used to.