MFC TN002--Persistent Data Format

Created: April 15, 1992

ABSTRACT

This technical note describes the MFC routines for supporting persistent C++ objects and the format of those objects in a persistent store.

The Microsoft Foundation Class (MFC) library provides a full-featured set of C++ object classes for the Microsoft® Windows™ graphical environment. It includes classes that directly support application development for Windows as well as general-purpose classes for collections, files, persistent storage, exceptions, diagnostics, memory management, strings, and time. Each MFC technical note describes a feature of MFC using code fragments and examples.

THE PROBLEM

The MFC implementation for persistent data relies on a compact binary format for saving data on disk. This format is distinct from the format used for diagnostic output of class objects for two reasons:

Diagnostic output is human readable.

Maximum space efficiency is desired when saving to a persistent store (usually a disk).

For these reasons, MFC does not provide a polymorphic interface for storing objects, as is common in “pure” object-oriented languages such as Smalltalk-80™.

MFC uses the CArchive class, which provides a context for persistence. This context lasts from the time the archive is created until the CArchive::Close member function is called, either explicitly by the programmer or implicitly by the destructor.

This technical note describes the implementation of the CArchive protected members WriteObject and ReadObject. Users never call these functions directly; the type-safe insertion and extraction operators (supplied by DECLARE_SERIAL) are used instead. Similarly, the user rarely calls the CObject::Serialize virtual member function directly, unless the object being stored is embedded in another class object, in which case the exact type of the object is known.

Note: This technical note describes code in the MFC ARCHIVE.CPP source file.

SAVING OBJECTS TO THE STORE (CARCHIVE::WRITEOBJECT)

The protected member function:

void CArchive::WriteObject(const CObject*)

writes out enough data so that the object can be correctly reconstructed. This data consists of two parts: the type of the object and the state of the object. This member function also maintains the identity of the object being written out so that only a single copy is saved, regardless of the number of pointers to that object (including circular pointers).

The saving (inserting) and restoring (extracting) of objects rely on several manifest constants. These are values that are stored in binary form and that provide important information to the archive (note that the w prefix indicates 16-bit quantities).

wNullTag      // used for NULL object pointers (0)

wNewClassTag  // indicates class description that follows is new to this archive context (-1)

wOldClassTag  // indicates class of the object being read has been seen in this context (0x8000)
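The tag values above can be sketched in standalone C++ as follows. This is an illustration under stated assumptions, not the MFC source: the constant names mirror the note, while TagKind, ClassifyTag, MakeOldClassTag, and ClassPidFromTag are hypothetical helpers introduced here. The sketch assumes that -1 is stored as the 16-bit value 0xFFFF and that any other word with the high bit set is an old-class tag.

```cpp
#include <cassert>
#include <cstdint>

// Tag values as listed in the note (standalone sketch, not the MFC source).
constexpr std::uint16_t wNullTag     = 0x0000;  // NULL object pointer
constexpr std::uint16_t wNewClassTag = 0xFFFF;  // -1 as a 16-bit value
constexpr std::uint16_t wOldClassTag = 0x8000;  // high bit marks a known class

enum class TagKind { Null, NewClass, OldClass, ObjectPid };  // hypothetical

// Classify one 16-bit word taken from the stream.
TagKind ClassifyTag(std::uint16_t tag) {
    if (tag == wNullTag)     return TagKind::Null;
    if (tag == wNewClassTag) return TagKind::NewClass;  // test before the high-bit check
    if (tag & wOldClassTag)  return TagKind::OldClass;
    return TagKind::ObjectPid;                          // plain PID of a saved object
}

// Build the tag for a class already seen: wOldClassTag OR-ed with its PID.
std::uint16_t MakeOldClassTag(std::uint16_t classPid) {
    assert(classPid >= 1 && classPid <= 0x7FFE);        // 0x7FFF would yield wNewClassTag
    return static_cast<std::uint16_t>(wOldClassTag | classPid);
}

// Recover the class PID from an old-class tag.
std::uint16_t ClassPidFromTag(std::uint16_t tag) {
    return static_cast<std::uint16_t>(tag & 0x7FFF);
}
```

The order of the tests in ClassifyTag matters: 0xFFFF also has its high bit set, so the new-class check has to come first.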

When storing objects, the archive maintains a CMapPtrToWord (the m_pStoreMap), which is a mapping from a stored object to a 16-bit persistent identifier (PID). A PID is assigned to every unique object and to every unique class name that is saved in the context of the archive. These PIDs are handed out sequentially starting at 1 and have no significance outside the scope of the archive. In particular, they should not be confused with the “record number” or other identity concepts.

Thus, when a request is made to save an object to an archive (usually through the global insertion operator), the following operations take place:

First, WriteObject checks for a NULL CObject pointer; if the pointer is NULL, it inserts wNullTag into the archive stream.

If the pointer refers to a real object that can be serialized (a DECLARE_SERIAL class), WriteObject then checks the m_pStoreMap to see if the object has already been saved. If it has, WriteObject inserts the 16-bit PID associated with that object.

If the object has not been saved before, there are two cases: either both the object and its exact type (that is, its class) are new to this archive context, or the object is of an exact type already seen. To determine whether the type has been seen, WriteObject queries the m_pStoreMap for a CRuntimeClass object (formally, CRuntimeClass is a structure to avoid problems associated with meta-classes) that matches the CRuntimeClass object associated with the object being saved.

If the class has been seen before, WriteObject inserts a 16-bit tag that is the bit-wise OR of wOldClassTag and the index (PID) of that class. This operation imposes a hard limit of 32,766 indexes per archive context: PIDs occupy the low 15 bits, start at 1, and cannot use 0x7FFF, which OR-ed with wOldClassTag would equal wNewClassTag. This number is the maximum number of unique objects and classes that can be saved in a single archive, but a single disk file can have an unlimited number of archive contexts.

If the CRuntimeClass is new to this archive context, WriteObject assigns a new PID to that class and inserts it into the archive, preceded by the wNewClassTag value. The descriptor for this class is then inserted into the archive using the CRuntimeClass member function Store. CRuntimeClass::Store inserts the schema number of the class (see below) and the ASCII text name of the class. The use of the ASCII text name does not guarantee uniqueness of the archive across applications; thus, it is advisable to tag your data files to prevent corruption (imagine two distinct applications that both define the CWordStack class, for example).

After the class information is inserted, the archive places the object into the m_pStoreMap and calls the Serialize member function to insert class-specific data into the archive. Placing the object into the m_pStoreMap before calling Serialize prevents multiple copies of the object from being saved to the store.
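The save sequence described above can be sketched in standalone C++. This is a simplified model, not the code in ARCHIVE.CPP: SketchArchive, ClassInfo, and Node are hypothetical stand-ins for CArchive, CRuntimeClass, and a serializable CObject, and the store is modeled as a vector of 16-bit words. The sketch assumes that a new class is written as wNewClassTag followed by its descriptor (schema, name length, name characters), with the class PID left implicit in the order of writing, and it asserts on PID overflow where a real implementation would report an error.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

constexpr std::uint16_t wNullTag = 0x0000, wNewClassTag = 0xFFFF, wOldClassTag = 0x8000;

struct ClassInfo { std::string name; std::uint16_t schema; };  // stand-in for CRuntimeClass

struct Node {                        // stand-in for a serializable CObject
    const ClassInfo* cls;
    std::uint16_t value;
    const Node* next;                // may be null or point back (a cycle)
};

struct SketchArchive {
    std::vector<std::uint16_t> out;                  // the "store", as 16-bit words
    std::map<const void*, std::uint16_t> storeMap;   // plays the role of m_pStoreMap
    std::uint16_t nextPid = 1;                       // PIDs are handed out from 1

    std::uint16_t NewPid() {
        assert(nextPid <= 0x7FFE);                   // 32,766 PIDs per context
        return nextPid++;
    }

    void WriteObject(const Node* obj) {
        if (obj == nullptr) { out.push_back(wNullTag); return; }

        auto seen = storeMap.find(obj);
        if (seen != storeMap.end()) {                // object already saved: PID only
            out.push_back(seen->second);
            return;
        }

        auto cls = storeMap.find(obj->cls);
        if (cls != storeMap.end()) {                 // class seen before
            out.push_back(static_cast<std::uint16_t>(wOldClassTag | cls->second));
        } else {                                     // new class: tag, then descriptor
            storeMap[obj->cls] = NewPid();
            out.push_back(wNewClassTag);
            out.push_back(obj->cls->schema);
            out.push_back(static_cast<std::uint16_t>(obj->cls->name.size()));
            for (char c : obj->cls->name) out.push_back(static_cast<unsigned char>(c));
        }

        storeMap[obj] = NewPid();                    // register BEFORE serializing
        Serialize(obj);
    }

    void Serialize(const Node* obj) {                // class-specific state
        out.push_back(obj->value);
        WriteObject(obj->next);                      // a back-pointer becomes a plain PID
    }
};
```

For two nodes that point at each other, the second visit to the first node finds it already in the map and emits only its PID, so the recursion terminates and the object is stored once.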

When returning to the initial caller (usually the root of the network of objects), close the archive. If other CFile operations are pending, the Flush archive member function must be called. Failure to do so results in a corrupt archive.

LOADING OBJECTS FROM THE STORE (CARCHIVE::READOBJECT)

Loading (extracting) is the converse of the WriteObject operation. As with WriteObject, user code does not call ReadObject directly; the type-safe extraction operator (declared by the DECLARE_SERIAL macro) is used instead. This extraction operator ensures the type integrity of the extract operation.

Because the WriteObject implementation assigned increasing PIDs starting with 1 (0 is predefined as the NULL object), the ReadObject implementation can use an array to maintain the state of the archive context. When a PID is read from the store, ReadObject knows that a “new” object (or class description) follows if the PID is greater than the current upper bound of the m_pLoadArray.
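The role of the load array can be sketched in standalone C++. This is an illustration under assumptions, not the MFC implementation: SketchLoader, Slot, and Node are hypothetical names, and the input is assumed to be a stream of 16-bit words in which a new class appears as wNewClassTag followed by a schema word, a name length, and the name characters, and each object's state is one value word followed by a nested object reference. Because PIDs were assigned in increasing order on save, appending to the array in the order items are read reproduces the same numbering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

constexpr std::uint16_t wNullTag = 0x0000, wNewClassTag = 0xFFFF, wOldClassTag = 0x8000;

struct Node { std::string className; std::uint16_t value = 0; Node* next = nullptr; };

struct Slot { std::string className; Node* object = nullptr; };  // a class or an object

struct SketchLoader {
    const std::vector<std::uint16_t>& in;
    std::size_t pos = 0;
    std::vector<Slot> loadArray{Slot{}};             // index 0 is reserved for NULL
    std::vector<std::unique_ptr<Node>> owned;        // keeps loaded objects alive

    explicit SketchLoader(const std::vector<std::uint16_t>& words) : in(words) {}

    std::uint16_t Next() { return in.at(pos++); }    // at() throws on a truncated stream

    Node* ReadObject() {
        std::uint16_t tag = Next();
        if (tag == wNullTag) return nullptr;

        std::string className;
        if (tag == wNewClassTag) {                   // class descriptor follows
            Next();                                  // schema number (ignored in this sketch)
            std::uint16_t len = Next();
            for (std::uint16_t i = 0; i < len; ++i) className += static_cast<char>(Next());
            loadArray.push_back(Slot{className, nullptr});  // class takes the next PID
        } else if (tag & wOldClassTag) {             // class seen earlier
            className = loadArray.at(tag & 0x7FFF).className;
        } else {                                     // plain PID: object already loaded
            Node* known = loadArray.at(tag).object;
            assert(known != nullptr);                // a plain PID must name an object
            return known;
        }

        owned.push_back(std::make_unique<Node>());
        Node* obj = owned.back().get();
        obj->className = className;
        loadArray.push_back(Slot{className, obj});   // register BEFORE reading state
        obj->value = Next();
        obj->next = ReadObject();                    // back-pointers resolve via the array
        return obj;
    }
};
```

Registering the object in the array before reading its state mirrors the save side: a pointer back to an object that is still being loaded resolves to the same instance instead of creating a copy.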

SCHEMA NUMBERS

A useful feature of the archive mechanism is schema numbers. The schema number, which is assigned to the class when IMPLEMENT_SERIAL is encountered, is the “version” of the class implementation. The schema refers to the implementation of the class, not to the number of times a given object has been made persistent; the latter is properly referred to as the version of the object. If you intend to maintain several implementations of the same class over time, incrementing the schema number at release time lets you write code that can extract older versions of the implementation. The CArchive::ReadObject member function throws an exception (of type CArchiveException) when it encounters a schema number in the persistent store that differs from the schema number of the class description in memory. If your implementation of Serialize for a class with multiple schemas catches this exception, you can continue the extraction operation, taking into account the differences in the implementation of the Serialize member function.
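The catch-and-continue pattern can be sketched in standalone C++. Everything here is hypothetical and chosen only for illustration: SchemaMismatch stands in for CArchiveException, kCurrentSchema for the schema number compiled into the class, and Point3 for a class whose schema 1 stored two fields and whose schema 2 stores three. It shows the general shape of version-tolerant loading, not MFC's actual control flow.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct SchemaMismatch : std::runtime_error {        // stand-in for CArchiveException
    std::uint16_t stored;
    explicit SchemaMismatch(std::uint16_t s)
        : std::runtime_error("schema mismatch"), stored(s) {}
};

constexpr std::uint16_t kCurrentSchema = 2;         // hypothetical in-memory schema

// Throws if the schema found in the store differs from the one compiled in.
void CheckSchema(std::uint16_t stored) {
    if (stored != kCurrentSchema) throw SchemaMismatch(stored);
}

struct Point3 { int x = 0, y = 0, z = 0; };

// Hypothetical layouts: schema 1 stored x,y; schema 2 stores x,y,z.
// The first element of `in` is the stored schema number.
Point3 LoadPoint(const std::vector<int>& in) {
    Point3 p;
    auto stored = static_cast<std::uint16_t>(in.at(0));
    try {
        CheckSchema(stored);
        p.x = in.at(1); p.y = in.at(2); p.z = in.at(3);
    } catch (const SchemaMismatch& e) {
        if (e.stored != 1) throw;                   // unknown version: give up
        p.x = in.at(1); p.y = in.at(2);             // old layout; z keeps its default
    }
    return p;
}
```

Data written under schema 1 loads with z left at its default, while an unrecognized schema number still propagates the exception to the caller.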

CRUNTIMECLASS

The persistence mechanism uses the CRuntimeClass data structure. MFC associates one structure of this type with each dynamic and/or serializable class in the application. These structures are initialized at application startup time using a special static object (CClassInit). You need not concern yourself with the implementation of this information, as it is likely to change between revisions of MFC.

The current implementation of CRuntimeClass does not support multiple inheritance (MI). This does not mean that you cannot use MI in your MFC application, but it does mean that you will have certain responsibilities. The CObject::IsKindOf member function will not correctly determine the type of an object if the object has multiple base classes. Therefore, you cannot use CObject as a virtual base class, and all calls to Serialize, operator new, and any other CObject member functions will need scope qualifiers so C++ can disambiguate the function call. If you do find the need to use MI within MFC, be sure to make the class containing the CObject base class the leftmost class in the list of base classes. For advice on the uses and abuses of MI, see Advanced C++ Programming Styles and Idioms by James O. Coplien (Addison-Wesley, 1991).
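The need for scope qualifiers can be illustrated with a standalone C++ sketch that does not use MFC. Object, Document, Observer, and ObservedDocument are hypothetical names; the sketch assumes, for illustration only, that both base classes derive non-virtually from the same root, so the derived class contains two copies of it and an unqualified call to an inherited member is ambiguous.

```cpp
#include <cassert>
#include <string>

struct Object {                                     // stand-in for CObject
    virtual ~Object() = default;
    virtual void Serialize(std::string& out) const { out += "Object;"; }
};

struct Document : Object {
    void Serialize(std::string& out) const override {
        Object::Serialize(out);                     // chain to the base explicitly
        out += "Document;";
    }
};

struct Observer : Object {
    void Serialize(std::string& out) const override { out += "Observer;"; }
};

// The base whose persistence behavior is wanted is listed first (leftmost).
struct ObservedDocument : Document, Observer {
    void Save(std::string& out) const {
        // Serialize(out);                          // would not compile: ambiguous
        Document::Serialize(out);                   // scope qualifier picks one base
    }
};
```

With the qualifier in place, Save appends the text produced by the Document branch only; dropping the qualifier leaves the compiler unable to choose between Document::Serialize and Observer::Serialize.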