The General Property Set Layout

At the highest level, a property set is composed of three major pieces, illustrated in Figure 16-5: a property set header, a list of FMTID/Offset pairs identifying the sections (each offset gives the location of a section, measured from the beginning of the stream), and the serialized sections themselves. When a property set has only a single format, as Summary Information does, it has only one FMTID/Offset pair and only one section.

Figure 16-5. The top-level property set layout.

A property set structure is thus defined generically as follows. (This is not an actual structure but simply a model of the property set layout.)

struct
    {
    PROPHEADER        ph;
    FORMATIDOFFSET    rgFIDO[cSections];
    PROPERTYSECTION   rgSections[cSections];
    };

You can see that a property set is a header followed by an array of FMTID/Offset pairs followed by an array of sections. Both arrays have the same number of elements. The PROPHEADER structure itself is defined as follows (see note 3):

typedef struct
    {
    WORD    wByteOrder ; // Always 0xFFFE
    WORD    wFormat ;    // Always 0
    DWORD   dwOSVer ;    // System version
    CLSID   clsID ;      // Source CLSID
    DWORD   cSections ;  // Number of sections (must be at least 1)
    } PROPHEADER;

The wByteOrder field indicates whether the property set is stored in big-endian or little-endian byte order. The OLE specifications state that all property sets must be written in little-endian (Intel) byte order, so the value 0xFFFE always appears in the stream as the byte 0xFE followed by the byte 0xFF, regardless of which operating system created the set. That originating system is recorded in dwOSVer, which currently can be 0 for 16-bit Windows, 1 for the Macintosh, or 2 for 32-bit Windows. (Other system codes will be added as necessary.)

As the structure indicates, wFormat is always 0 for the property set layout defined here. If someone defines a new general property set layout that differs from this one, wFormat will contain a different value, so code that reads a property set must always check this field before trying to read the rest of the stream. The clsID field either identifies the code that knows how to read and display the information in this property set or, if the set contains only one format, holds the FMTID of the property set itself. For example, because Summary Information is defined to be generic across many applications, its FMTID always appears here.
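Because these fields are little-endian no matter which system wrote them, a reader should not simply overlay a C struct on the raw bytes. The following is a minimal sketch of validating the header, assuming the whole stream has already been read into a byte buffer. The function and macro names are hypothetical; the 28-byte header size is just the sum of the PROPHEADER fields shown above, and the 20-byte pair size is a 16-byte FMTID plus a 4-byte offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian fields byte by byte, so the result is the same
   whatever the byte order of the machine running this code. */
static uint16_t get_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define PROPHEADER_SIZE     28  /* 2 + 2 + 4 + 16 + 4 bytes, no padding */
#define FORMATIDOFFSET_SIZE 20  /* 16-byte FMTID + 4-byte offset */

/* Validates the PROPHEADER at the start of buf. Returns cSections, or 0 if
   the header is not one this reader understands (0 is never a valid count). */
static uint32_t check_prop_header(const uint8_t *buf, size_t len)
{
    uint32_t cSections;

    if (len < PROPHEADER_SIZE)
        return 0;
    if (get_u16le(buf) != 0xFFFE)    /* wByteOrder: bytes FE FF in the stream */
        return 0;
    if (get_u16le(buf + 2) != 0)     /* wFormat: only layout version 0 is known */
        return 0;
    /* buf + 4 holds dwOSVer and buf + 8 the 16-byte clsID; not needed here. */
    cSections = get_u32le(buf + 24);
    if (cSections == 0 ||
        cSections > (len - PROPHEADER_SIZE) / FORMATIDOFFSET_SIZE)
        return 0;                    /* the FMTID/Offset list must fit in buf */
    return cSections;
}
```

Returning 0 for "unreadable" works because a valid header always declares at least one section.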

Finally, cSections gives the number of FMTID/Offset pairs that follow the header, which is also the number of sections in the property set. Each pair is a structure of the following sort:

typedef struct
    {
    GUID        formatID;
    DWORD       dwOffset;
    } FORMATIDOFFSET;

Again, the offset in each pair is a seek offset from the beginning of the entire property set stream to the beginning of a section. This layout is why code that reads a property set can safely ignore sections it doesn't understand, such as new extensions to an existing property set. If the reader doesn't recognize the FMTID in a given pair, it simply moves on to the next pair. Whenever it finds a format it does understand, it can seek to that section's offset and read its contents.
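That scan can be sketched as follows, assuming the stream is in memory and that the caller supplies the FMTID as the 16 bytes that appear in the stream (a serialized GUID stores its first three fields little-endian, so this is not necessarily the in-memory GUID layout on every machine). find_section and its helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define PROPHEADER_SIZE     28  /* header precedes the FMTID/Offset pairs */
#define FORMATIDOFFSET_SIZE 20  /* 16-byte FMTID + 4-byte offset */

/* Walks the FMTID/Offset pairs and returns the stream offset of the section
   whose FMTID matches fmtid. Pairs with unrecognized FMTIDs are skipped.
   Returns 0 if no pair matches (a section can never start at offset 0). */
static uint32_t find_section(const uint8_t *buf, size_t len,
                             const uint8_t fmtid[16])
{
    uint32_t cSections, i;

    if (len < PROPHEADER_SIZE)
        return 0;
    cSections = get_u32le(buf + 24);
    for (i = 0; i < cSections; i++) {
        size_t pos = PROPHEADER_SIZE + (size_t)i * FORMATIDOFFSET_SIZE;

        if (pos + FORMATIDOFFSET_SIZE > len)
            break;                              /* truncated pair list */
        if (memcmp(buf + pos, fmtid, 16) == 0) {
            uint32_t off = get_u32le(buf + pos + 16);
            return off < len ? off : 0;         /* offset must lie in buf */
        }
    }
    return 0;
}
```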

The Section Layout

At any given section offset, you'll find a section header followed by PID/Offset pairs, each of which points to a property in the bytes that follow. Each property, in turn, is a type/value pair that holds the actual data. As Figure 16-6 shows, this structure mirrors the layout of the property set itself.

Figure 16-6. The section layout nested within a property set.

You can look at a section through a generic structure, as we did for the entire property set:

struct
    {
    SECTIONHEADER     sh;
    PROPERTYIDOFFSET  rgPIDO[cProperties];
    PROPERTY          rgProperties[cProperties];
    };

The SECTIONHEADER structure has only two fields. The first gives the size of the section as a whole, in bytes, and the second gives the number of properties it contains, which is also the number of entries in the PID/Offset list that follows the header in the stream:

typedef struct
    {
    DWORD       cbSection ;
    DWORD       cProperties ;
    } SECTIONHEADER;

Having the size of the section first allows anyone either to skip over this section entirely or to copy it wholesale from the stream without having to know anything more about the contents of the section. When someone is interested in the internals of the section, he or she can look at the properties it contains by using the list of PROPERTYIDOFFSET structures:

typedef struct
    {
    DWORD       propertyID;
    DWORD       dwOffset;
    } PROPERTYIDOFFSET;

Just as code reading a property set can skip unrecognized FMTIDs in the property set structure, it can skip unrecognized PIDs in this list in the section structure. If the reading code doesn't recognize a PID, it simply moves on to the next one. The code attempts to extract a property only when it understands that PID in the context of the FMTID. It then seeks to the offset stored with the PID, which is measured from the beginning of the section, and finds there a type/value pair it can use. Note that all properties must begin on a 32-bit boundary within the stream; zero bytes pad out any leftover space.
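Looking up a single property can be sketched like this. It assumes, as the OLE property set specification defines it, that each dwOffset in the PID/Offset list is measured from the start of the section; find_property is a hypothetical name, and it returns an absolute offset into the in-memory stream.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define SECTIONHEADER_SIZE    8  /* cbSection + cProperties */
#define PROPERTYIDOFFSET_SIZE 8  /* propertyID + dwOffset */

/* Returns the absolute buffer offset of property pid in the section that
   starts at sec, or 0 if the PID is absent or its entry is malformed. */
static size_t find_property(const uint8_t *buf, size_t len, size_t sec,
                            uint32_t pid)
{
    uint32_t cbSection, cProperties, i;

    if (sec > len || len - sec < SECTIONHEADER_SIZE)
        return 0;
    cbSection = get_u32le(buf + sec);
    cProperties = get_u32le(buf + sec + 4);
    if (cbSection < SECTIONHEADER_SIZE || cbSection > len - sec)
        return 0;                               /* section must fit in buf */

    for (i = 0; i < cProperties; i++) {
        size_t pos = SECTIONHEADER_SIZE + (size_t)i * PROPERTYIDOFFSET_SIZE;
        uint32_t off;

        if (pos + PROPERTYIDOFFSET_SIZE > cbSection)
            break;                              /* list runs past the section */
        if (get_u32le(buf + sec + pos) != pid)
            continue;                           /* unknown PID: skip it */
        off = get_u32le(buf + sec + pos + 4);   /* relative to section start */
        if (off > cbSection - 4 || (sec + off) % 4 != 0)
            return 0;                           /* out of range or misaligned */
        return sec + off;
    }
    return 0;
}
```

Because cbSection is checked first, the same header also lets a caller skip or copy the whole section without looking at any PID.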

The Property Type/Value Layout

When you get to the bottom of a property set, you're faced with a variable-length structure that contains the data type of the property and the bytes that make up its value:

struct
    {
    DWORD      dwType;     //From VARTYPE
    BYTE       rgbValue[cbValue];
    };

The type is always one of the VT_* values from the VARTYPE enumeration that we saw in Chapter 14, and it is followed by the number of bytes appropriate for that type. If dwType is VT_I2, the next 2 bytes contain the value. If the type is VT_BSTR, the bytes contain a DWORD byte count followed by the characters. Other types, such as VT_BLOB and VT_STORED_OBJECT, also have a specific format for the bytes that follow. Any type combined with VT_VECTOR specifies that an array of that type follows: the first DWORD gives the number of elements, and the bytes after it contain the elements themselves. (As the VT_VECTOR entry in Table 16-1 notes, 2-byte elements such as VT_I2 are packed with no padding between them; only the end of the property is padded to a 32-bit boundary.) The list of available types and their data formats, taken from the OLE Programmer's Reference, is given in Table 16-1.
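Reading a value therefore means checking dwType and then interpreting the bytes after it. This minimal sketch handles only VT_I2 and VT_LPSTR (type codes 2 and 30 in Table 16-1); the function names are hypothetical, and p is assumed to point at a property's type indicator with avail bytes remaining in the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define VT_I2    2
#define VT_LPSTR 30

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reads a VT_I2 property: DWORD type, 2-byte value, 2 bytes of zero
   padding. Returns 1 on success, 0 on a type mismatch or short buffer. */
static int read_i2(const uint8_t *p, size_t avail, int16_t *out)
{
    unsigned u;

    if (avail < 8 || get_u32le(p) != VT_I2)
        return 0;
    u = (unsigned)p[4] | ((unsigned)p[5] << 8);
    /* Map the 16-bit two's complement pattern onto a signed value. */
    *out = (int16_t)(u >= 0x8000u ? (long)u - 0x10000L : (long)u);
    return 1;
}

/* Reads a VT_LPSTR property: DWORD type, DWORD byte count (including the
   null terminator), then the characters. Returns a pointer into the
   buffer and stores the byte count in *cb, or returns NULL. */
static const char *read_lpstr(const uint8_t *p, size_t avail, uint32_t *cb)
{
    uint32_t n;

    if (avail < 8 || get_u32le(p) != VT_LPSTR)
        return NULL;
    n = get_u32le(p + 4);
    if (n == 0 || n > avail - 8 || p[8 + n - 1] != '\0')
        return NULL;                /* truncated or missing terminator */
    *cb = n;
    return (const char *)(p + 8);
}
```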

For various reasons, there is no provision for adding new VT_* values to define new types—that is, new data structures. Say you want to store a structure such as PARSNIP, as shown in the following:

typedef struct
    {
    short      Lengthcm;   //Length in centimeters
    short      Radiuscm;   //Top radius in centimeters
    short      Weightg;    //Weight in grams
    COLORREF   Shade;      //Exact color shade
    } PARSNIP;

To do this, you would need to write each field as a separate variant (a type indicator followed by its value) and describe the entire structure as VT_VARIANT | VT_VECTOR. In the stream, you would have the following:

DWORD = VT_VARIANT | VT_VECTOR (0x100C)
DWORD = 4 (Count of elements)
DWORD = VT_I2  (Element 1)
WORD  = Lengthcm value
WORD  = 0 (padding to DWORD boundary)
DWORD = VT_I2  (Element 2)
WORD  = Radiuscm value
WORD  = 0 (padding to DWORD boundary)
DWORD = VT_I2  (Element 3)
WORD  = Weightg value
WORD  = 0 (padding to DWORD boundary)
DWORD = VT_I4  (Element 4)
DWORD = Shade value (32-bit COLORREF)
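A writer for PARSNIP could look like the following sketch. It follows Table 16-1, where each VT_VARIANT element is a DWORD type indicator plus its value and a VT_I2 value is zero-padded to a 32-bit boundary, so every 2-byte field takes 4 bytes after its type. COLORREF is redefined here as a 32-bit integer so the code compiles without <windows.h>; the helper names and PARSNIP_BYTES are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the Windows COLORREF type (a 32-bit DWORD). */
typedef uint32_t COLORREF;

typedef struct {
    short    Lengthcm;   /* Length in centimeters */
    short    Radiuscm;   /* Top radius in centimeters */
    short    Weightg;    /* Weight in grams */
    COLORREF Shade;      /* Exact color shade */
} PARSNIP;

#define VT_I2         2
#define VT_I4         3
#define VT_VARIANT    12
#define VT_VECTOR     0x1000
#define PARSNIP_BYTES 40  /* 8 (type + count) + 3 * 8 (VT_I2) + 8 (VT_I4) */

static uint8_t *put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
    return p + 4;
}

/* One VT_I2 element: DWORD type, 2-byte value, 2 zero bytes of padding. */
static uint8_t *put_i2(uint8_t *p, short v)
{
    p = put_u32le(p, VT_I2);
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)((uint16_t)v >> 8);
    p[2] = 0;
    p[3] = 0;
    return p + 4;
}

/* Serializes *ps as VT_VARIANT | VT_VECTOR into out, which must hold at
   least PARSNIP_BYTES bytes. Returns the number of bytes written. */
static size_t write_parsnip(uint8_t *out, const PARSNIP *ps)
{
    uint8_t *p = out;

    p = put_u32le(p, VT_VARIANT | VT_VECTOR);
    p = put_u32le(p, 4);                  /* element count */
    p = put_i2(p, ps->Lengthcm);
    p = put_i2(p, ps->Radiuscm);
    p = put_i2(p, ps->Weightg);
    p = put_u32le(p, VT_I4);
    p = put_u32le(p, ps->Shade);
    return (size_t)(p - out);
}
```

The result is 40 bytes, which is already a multiple of 4, so no padding is needed after the final element.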

Two special properties always have the same PID regardless of the property set. PID 0 is defined as a dictionary that maps other PID values to user-readable strings, as described in the next section. PID 1 is a code page indicator (VT_I2) that identifies the character set in which the section's strings are stored, which depends on the originating operating system (see note 4). Programs that don't understand the code page should not attempt to read any string-oriented data in the section. If you modify the code page property, you must also convert all other string properties in the section to the new code page. If no code page property is present, a program must assume the system default code page.
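Honoring the code page rule starts with finding PID 1. The sketch below assumes property offsets are relative to the section start (as in the OLE property set specification) and leaves the choice of default to the caller; on Windows, a caller might pass the value returned by GetACP. section_code_page is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define PID_CODEPAGE 1
#define VT_I2        2

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the code page stored in PID 1 of the section starting at sec,
   or default_cp if the section has no usable code page property. */
static unsigned section_code_page(const uint8_t *buf, size_t len, size_t sec,
                                  unsigned default_cp)
{
    uint32_t cProperties, i;

    if (sec > len || len - sec < 8)
        return default_cp;
    cProperties = get_u32le(buf + sec + 4);
    for (i = 0; i < cProperties; i++) {
        size_t pos = sec + 8 + (size_t)i * 8;     /* i-th PID/Offset pair */
        uint32_t off;

        if (pos > len || len - pos < 8)
            break;
        if (get_u32le(buf + pos) != PID_CODEPAGE)
            continue;
        off = get_u32le(buf + pos + 4);           /* relative to the section */
        if (off > len - sec - 8 || get_u32le(buf + sec + off) != VT_I2)
            break;                                /* malformed: use default */
        return (unsigned)buf[sec + off + 4] |
               ((unsigned)buf[sec + off + 5] << 8);
    }
    return default_cp;
}
```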

Type	Value	Description
VT_EMPTY	0	None. A property with a type indicator of VT_EMPTY has no data associated with it. The size of the value is 0.
VT_NULL	1	None. This is like a pointer to NULL.
VT_I2	2	2-byte signed integer value zero-padded to a 32-bit boundary.
VT_I4	3	4-byte signed integer value.
VT_R4	4	4-byte 32-bit IEEE floating-point value.
VT_R8	5	8-byte 64-bit IEEE floating-point value.
VT_CY	6	8-byte two's complement integer (scaled by 10,000) as commonly used for currency amounts.
VT_DATE	7	64-bit time format: a floating-point number of days (not seconds) since midnight, December 30, 1899, with the time of day as the fractional part. This is stored in the same representation as VT_R8.
VT_BSTR	8	Counted, zero-terminated binary string; represented as a DWORD byte count (including the terminating null character) followed by the bytes of data.
VT_BOOL	11	2 bytes representing a Boolean (WORD) value containing 0 (FALSE) or -1 (TRUE), zero-padded to a 32-bit boundary.
VT_VARIANT (only with VT_VECTOR)	12	DWORD type indicator followed by the corresponding value.
VT_I8	20	8-byte signed integer.
VT_LPSTR	30	Same as VT_BSTR; used for most strings.
VT_LPWSTR	31	A counted and zero-terminated Unicode string; a DWORD character count (in which the count includes the terminating null character) followed by the same number of Unicode (16-bit) characters. The count is not a byte count but a WORD count.
VT_FILETIME	64	64-bit FILETIME structure as defined in Win32.
VT_BLOB	65	DWORD count of bytes, followed by the same number of bytes of data. The byte count does not include the 4 bytes for the length of the count itself; an empty BLOB would have a count of 0, followed by 0 bytes. This is similar to VT_BSTR, but it does not guarantee a null byte at the end of the data.
VT_STREAM	66	A VT_LPSTR (DWORD count of bytes followed by a zero-terminated string of that many bytes) that names a stream containing the actual property value. The named stream is a sibling of the stream holding this type indicator, and the stream holding the property set must be named "CONTENTS".
VT_STORAGE	67	A VT_LPSTR (DWORD count of bytes followed by a zero-terminated string of that many bytes) that names a substorage containing the real property value. The substorage is a sibling of the stream containing this type indicator, and that stream must be named "CONTENTS".
VT_STREAMED_OBJECT	68	Same as VT_STREAM (same requirements) but indicates that the named stream contains a serialized object, which is a CLSID followed by initialization data for the class. The object can be instantiated with OleLoadFromStream.
VT_STORED_OBJECT	69	Same as VT_STORAGE (same requirements) but indicates that the named substorage contains an object that can be loaded through OleLoad or ReadClassStg and IPersistStorage::Load.
VT_BLOB_OBJECT	70	An array of bytes containing a serialized object in the same representation as would appear in a VT_STREAMED_OBJECT (VT_LPSTR). The only significant difference between this type and VT_STREAMED_OBJECT is that VT_BLOB_OBJECT does not have the system-level storage overhead that VT_STREAMED_OBJECT has. VT_BLOB_OBJECT is more suitable for scenarios involving numerous small objects.
VT_CF	71	An array of bytes containing a clipboard format identifier followed by the data in that format. In other words, following the VT_CF type indicator is data in the format of a VT_BLOB: a DWORD count of bytes, followed by a LONG clipboard format identifier, followed by the data in that format. A property whose value is plain text should use VT_LPSTR, not VT_CF, to represent the text. Also, an application should choose a single clipboard format for a property's value when using VT_CF.
VT_CLSID	72	A CLSID, which is a DWORD, two WORDs, and 8 bytes.
VT_VECTOR	0x1000	When this bit is combined with one of the previous type indicators, the value is a DWORD count of elements followed by that many values of the indicated type. When VT_VECTOR is combined with VT_VARIANT (VT_VARIANT must be combined with VT_VECTOR), the value contains a DWORD element count, a DWORD type indicator, the first value, a DWORD type indicator, the second value, and so on. Examples: VT_LPSTR | VT_VECTOR has a DWORD element count, a DWORD byte count, the first string data, a DWORD byte count, the second string data, and so on. VT_I2 | VT_VECTOR has a DWORD element count followed by a sequence of 2-byte integers, with no padding between them.

Table 16-1. OLE property types that can appear in property sets.
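Because a property's size follows from its type, Table 16-1 is enough to compute how far a non-vector value extends, for example to copy or validate a property without fully understanding it. In the sketch below, the enumeration restates the table's values (the Windows headers define the same names, so it is not meant to be compiled alongside them), value_size is a hypothetical helper, and every size is rounded up to a 32-bit boundary as the alignment rule for properties requires.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    VT_EMPTY = 0, VT_NULL = 1, VT_I2 = 2, VT_I4 = 3, VT_R4 = 4, VT_R8 = 5,
    VT_CY = 6, VT_DATE = 7, VT_BSTR = 8, VT_BOOL = 11, VT_I8 = 20,
    VT_LPSTR = 30, VT_LPWSTR = 31, VT_FILETIME = 64, VT_BLOB = 65,
    VT_STREAM = 66, VT_STORAGE = 67, VT_STREAMED_OBJECT = 68,
    VT_STORED_OBJECT = 69, VT_BLOB_OBJECT = 70, VT_CF = 71, VT_CLSID = 72
};

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static size_t round4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Stores in *size the bytes a value occupies after its DWORD type
   indicator, including padding. val points just past the type indicator.
   Returns 0 for unhandled types (including any VT_VECTOR) or truncation. */
static int value_size(uint32_t type, const uint8_t *val, size_t avail,
                      size_t *size)
{
    uint32_t n;

    switch (type) {
    case VT_EMPTY: case VT_NULL:
        *size = 0; return 1;
    case VT_I2: case VT_BOOL: case VT_I4: case VT_R4:
        *size = 4; return 1;                    /* 2-byte types are padded */
    case VT_R8: case VT_CY: case VT_DATE: case VT_I8: case VT_FILETIME:
        *size = 8; return 1;
    case VT_CLSID:
        *size = 16; return 1;
    case VT_BSTR: case VT_LPSTR: case VT_BLOB: case VT_STREAM:
    case VT_STORAGE: case VT_STREAMED_OBJECT: case VT_STORED_OBJECT:
    case VT_BLOB_OBJECT: case VT_CF:
        if (avail < 4) return 0;
        n = get_u32le(val);                     /* DWORD byte count */
        if ((size_t)n > avail - 4) return 0;
        *size = round4(4 + (size_t)n);
        return 1;
    case VT_LPWSTR:
        if (avail < 4) return 0;
        n = get_u32le(val);                     /* count of 16-bit chars */
        if ((size_t)n > (avail - 4) / 2) return 0;
        *size = round4(4 + 2 * (size_t)n);
        return 1;
    default:
        return 0;
    }
}
```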

Dictionaries

As mentioned earlier, PID 0 is reserved for an optional dictionary, which is generically defined as the following structure:

struct
    {
    DWORD    cEntries;
    ENTRY    rgEntry[cEntries];
    };

Each entry is a structure containing the PID along with what is essentially a BSTR:

struct
    {
    DWORD    propertyID;
    DWORD    cbName;          //Includes the null terminator
    char     szName[cbName];
    };

A dictionary is the one exception to the usual type/value layout of a property: its entry count sits where the type indicator would normally be. In addition, if a dictionary exists, it must contain at least one entry, for PID 0, that gives the name of the property set itself. Beyond that, a dictionary can contain whatever entries it wants and need not name every property in the set, because some PIDs are implicitly understood by any code that knows how to read the set in the first place.
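A dictionary lookup might be sketched as follows, assuming the layout shown above with entries packed one after another (a propertyID, a cbName, then cbName bytes of name including the null) and dict pointing at the cEntries field. dict_lookup is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Looks up pid in the dictionary (the value of PID 0) at dict, which has
   avail bytes available. Returns a pointer to the null-terminated name,
   or NULL if the PID is not listed or the dictionary is malformed. */
static const char *dict_lookup(const uint8_t *dict, size_t avail, uint32_t pid)
{
    uint32_t cEntries, i;
    size_t pos = 4;

    if (avail < 4)
        return NULL;
    cEntries = get_u32le(dict);         /* sits where a type would be */
    for (i = 0; i < cEntries; i++) {
        uint32_t id, cb;

        if (avail - pos < 8)
            return NULL;
        id = get_u32le(dict + pos);
        cb = get_u32le(dict + pos + 4); /* includes the null terminator */
        pos += 8;
        if (cb == 0 || cb > avail - pos || dict[pos + cb - 1] != '\0')
            return NULL;
        if (id == pid)
            return (const char *)(dict + pos);
        pos += cb;                      /* next entry follows directly */
    }
    return NULL;
}
```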

3 This structure and those that follow are (unless noted otherwise) taken from the source code for the OLE Control Development Kit (CDK) shipped with Visual C++ 2.0. You won't find this structure and the others in any standard OLE header file, only in places that implement a property set. The CDK happens to be one of those places; it also includes some convenient classes for working with property sets.

4 See the Win32 GetACP function for legal values for Windows-originated properties. For the Macintosh, see Inside Macintosh Volume VI §14-111, Addison-Wesley, 1991.