Persistence Through Property Sets

At the beginning of this chapter, I described a property set as a "sparse, flexible, and extensible stream format in which you can serialize almost any information you want." By stream format, I mean a format for a series of bytes stored in a stream object and accessed through the IStream interface, although the same property set format can be used anywhere that byte streams are in use. In fact, the format also accommodates property sets spanning multiple streams that are all contained within a single storage object. We will see how this can be done shortly. First, however, what do I mean by sparse, flexible, and extensible?

In one sentence, a general property set is a serial collection of property values in which each property is tagged with a type (a VT_* value) and a 32-bit property identifier, or PID. This information is stored in the order PID, type, value so that any code reading the property set can first check the identifier to see whether the property is of interest, and if so, it can read the type. That type then describes the format and length of the value that follows in the stream. This sort of structure means that the semantics of the type are easily determined, although the meaning of the value is left to the program to decide. The property set writer can also include a dictionary that maps binary PIDs to human-readable strings for display in the user interface.
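To make the PID, type, value ordering concrete, here is a minimal sketch in portable C. It assumes a few things not in the text. A fixed-size byte buffer (the hypothetical Buffer type) stands in for an IStream. Only two types are handled, and their tags use the VARTYPE numbering from the OLE headers (VT_I4 = 3, VT_LPSTR = 30). Triplets are laid out one after another. The real OLE format also places a table of PID/offset pairs at the start of each section so that a reader can seek directly to a value. This sketch omits that table and scans sequentially. The point it illustrates is that a reader can skip any property it is not interested in, because the type tag alone tells it how long the value is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Type tags use the VARTYPE numbering from the OLE headers. */
enum { VT_I4 = 3, VT_LPSTR = 30 };

/* Hypothetical fixed-size buffer standing in for an IStream. */
typedef struct {
    uint8_t data[256];
    size_t len;
} Buffer;

/* Round a byte count up to the next 32-bit boundary. */
static size_t pad4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Read a little-endian 32-bit value. */
static uint32_t get_u32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Append raw bytes; returns 0 if the buffer would overflow. */
static int put_bytes(Buffer *b, const void *src, size_t n) {
    if (b->len + n > sizeof b->data) return 0;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}

/* Append a 32-bit value in little-endian byte order. */
static int put_u32(Buffer *b, uint32_t v) {
    uint8_t le[4] = { (uint8_t)v, (uint8_t)(v >> 8),
                      (uint8_t)(v >> 16), (uint8_t)(v >> 24) };
    return put_bytes(b, le, 4);
}

/* Write a PID, type, value triplet holding a 32-bit integer. */
static int write_i4(Buffer *b, uint32_t pid, int32_t value) {
    return put_u32(b, pid) && put_u32(b, VT_I4) && put_u32(b, (uint32_t)value);
}

/* Write a PID, type, value triplet holding a string: a byte count that
   includes the terminating NUL, the characters, then zero padding. */
static int write_lpstr(Buffer *b, uint32_t pid, const char *s) {
    static const uint8_t zeros[4] = {0};
    size_t n = strlen(s) + 1;
    return put_u32(b, pid) && put_u32(b, VT_LPSTR) && put_u32(b, (uint32_t)n) &&
           put_bytes(b, s, n) && put_bytes(b, zeros, pad4(n) - n);
}

/* Scan for pid. The type of every property is read, because the reader
   needs it to skip uninteresting values. Returns a pointer to the value
   and stores its type, or NULL if the pid is absent or an unknown type
   (whose length cannot be determined) blocks the scan. */
static const uint8_t *find_property(const Buffer *b, uint32_t pid, uint32_t *type) {
    size_t pos = 0;
    while (pos + 8 <= b->len) {
        uint32_t this_pid = get_u32(b->data + pos);
        uint32_t this_type = get_u32(b->data + pos + 4);
        const uint8_t *value = b->data + pos + 8;
        size_t size;
        if (this_type == VT_I4) {
            size = 4;
        } else if (this_type == VT_LPSTR && pos + 12 <= b->len) {
            size = 4 + pad4(get_u32(value));
        } else {
            return NULL;
        }
        if (pos + 8 + size > b->len) return NULL;  /* truncated value */
        if (this_pid == pid) { *type = this_type; return value; }
        pos += 8 + size;
    }
    return NULL;
}
```

A reader that wants only PID 15, for example, passes over an earlier string property by reading its byte count and jumping past the padded characters, without ever interpreting the text.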

Individual properties are grouped into a format, or section. Each section is tagged with a format identifier, or FMTID, and represents a particular set of properties. As with individual properties, code reading a section attempts to read only those properties that actually exist in it.
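The grouping into sections can be sketched as an in-memory model. This is a simplification: properties are already decoded and hold only integers. The Fmtid struct mirrors the field layout of the Windows GUID structure, and the names Section, Property, and get_property are hypothetical. A lookup is always qualified by an FMTID. A section is sparse, so a property the writer never stored is simply absent, and the reader must be prepared for that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the Windows GUID structure; an FMTID is a GUID. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2, Data3;
    uint8_t  Data4[8];
} Fmtid;

/* One property already decoded from a section (integers only, for brevity). */
typedef struct {
    uint32_t pid;
    int32_t  value;
} Property;

/* A section: an FMTID plus whatever properties the writer chose to store. */
typedef struct {
    Fmtid fmtid;
    const Property *props;
    size_t count;
} Section;

static int fmtid_equal(const Fmtid *a, const Fmtid *b) {
    return a->Data1 == b->Data1 && a->Data2 == b->Data2 &&
           a->Data3 == b->Data3 && memcmp(a->Data4, b->Data4, 8) == 0;
}

/* Look up pid only inside the section tagged with fmtid (assumes at most
   one section per FMTID). Returns 1 and stores the value if present, or 0
   if either the section or the property is absent. */
static int get_property(const Section *sections, size_t nsections,
                        const Fmtid *fmtid, uint32_t pid, int32_t *out) {
    for (size_t i = 0; i < nsections; i++) {
        if (!fmtid_equal(&sections[i].fmtid, fmtid)) continue;
        for (size_t j = 0; j < sections[i].count; j++) {
            if (sections[i].props[j].pid == pid) {
                *out = sections[i].props[j].value;
                return 1;
            }
        }
        return 0;  /* right section, but the writer omitted this property */
    }
    return 0;      /* no section with this FMTID */
}
```

Because the FMTID is part of every lookup, two sections can both contain a PID 2 without any ambiguity about which one a caller means.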

Because a property set is intended to be used as a data exchange format, you'd think that some standards must exist for property identifiers and format identifiers, yes? Well, yes and no. Take a quick look through the standard OLE header files on your machine for any value that starts with PID_. (Include a space before the P so that you don't pick up DISPID_ values.) You didn't find any, did you? Now look for anything containing FMTID. Didn't find any for that either. So do any standards exist? What use is this format specification if there are no standard property or format identifiers?

The fact of the matter is that the plural term "property sets" describes a general format in which you can store any information, whereas a singular "property set" describes a named collection of specific information. The specification for that single property set is responsible for defining which identifiers are used for which formats and properties, giving each property a name that is meaningful in the context of that property set. The format identifiers themselves are GUIDs, and the designer of the specific property set is responsible for assigning a GUID as the FMTID. For example, I mentioned the Summary Information property set at the beginning of this chapter. This set is assigned the FMTID F29F85E0-4FF9-1068-AB91-08002B27B3D9 and is defined to contain strings for a document title, a subject, an author, keywords, and comments; integers for page count, word count, and character count; time stamps for creation, modification, and print times; and so on. In the Summary Information set, the symbol PID_TITLE, for example, is assigned the value 2 and has the type VT_LPSTR. In another property set, PID 2 might mean something completely different, but because the second set has its own FMTID, no one will attempt to interpret that property as if it belonged to a different format. So while the specific contents of any two sets differ, the means of storing individual values as PID/type/value triplets is the same. In other words, a VT_LPSTR always has the same layout in the stream regardless of the specific property set.
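The following sketch shows how a reader might scope PID names to a single FMTID. The constant SUMMARY_INFO_FMTID and the functions fmtid_to_string and property_name are hypothetical names used only for illustration. The FMTID value and PID_TITLE = 2 come from the text. PIDs 3 through 6 follow the Summary Information definitions for subject, author, keywords, and comments. A PID from any other FMTID gets no name at all rather than a guessed one.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Same field layout as the Windows GUID structure. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2, Data3;
    uint8_t  Data4[8];
} Fmtid;

/* F29F85E0-4FF9-1068-AB91-08002B27B3D9: the Summary Information FMTID. */
static const Fmtid SUMMARY_INFO_FMTID = {
    0xF29F85E0, 0x4FF9, 0x1068,
    {0xAB, 0x91, 0x08, 0x00, 0x2B, 0x27, 0xB3, 0xD9}
};

static int fmtid_equal(const Fmtid *a, const Fmtid *b) {
    return a->Data1 == b->Data1 && a->Data2 == b->Data2 &&
           a->Data3 == b->Data3 && memcmp(a->Data4, b->Data4, 8) == 0;
}

/* Render a GUID in 8-4-4-4-12 hex form; buf must hold 37 bytes. */
static void fmtid_to_string(const Fmtid *g, char buf[37]) {
    snprintf(buf, 37, "%08" PRIX32 "-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
             g->Data1, (unsigned)g->Data2, (unsigned)g->Data3,
             (unsigned)g->Data4[0], (unsigned)g->Data4[1],
             (unsigned)g->Data4[2], (unsigned)g->Data4[3],
             (unsigned)g->Data4[4], (unsigned)g->Data4[5],
             (unsigned)g->Data4[6], (unsigned)g->Data4[7]);
}

/* A PID has a name only in the context of its FMTID. */
static const char *property_name(const Fmtid *fmtid, uint32_t pid) {
    if (fmtid_equal(fmtid, &SUMMARY_INFO_FMTID)) {
        switch (pid) {
        case 2: return "Title";      /* PID_TITLE, stored as VT_LPSTR */
        case 3: return "Subject";
        case 4: return "Author";
        case 5: return "Keywords";
        case 6: return "Comments";
        }
    }
    return NULL;  /* unknown FMTID or PID: do not guess what it means */
}
```

Because property_name keys on the FMTID before the PID, a PID 2 that arrives in some other section is never reported as a title.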