A C++ Persistence Interface

A number of issues arise when you translate the theory of persistence into actual C++ code. For example, you will probably choose a collection framework (e.g. the STL) to simplify your development. Rather than implementing all of the collection types yourself, you will rely on the library's well-tested code to handle your collection requirements. While this can save months of work, it comes at a price. The library will require you to write some supporting code for its collections, and that code must be encapsulated well enough that a later change of collection mechanism does not break the entire application.

Swizzling References

When you recreate an object from the information stored in the database, you need to recreate its associations with other objects as well. If the associated object is not yet in memory, you cannot initialize a pointer to it, and you certainly cannot create a reference (a C++ reference must always be bound to an existing object). Instead, the OID represents the object until it is loaded. An OID is unique, and it typically corresponds to a primary key in a database table, so pulling the referenced object out of the database should be fast.

Swizzling is the process of turning a pointer into an OID and writing it to persistent storage, or of using an OID to retrieve an object from persistent storage and turning it back into a pointer.

Once an object is in memory, the OID can be swizzled into a pointer. Although a regular C++ pointer can be used here, there are some subtle complications. Some memory allocation schemes allow objects to be shared. That is, if two parts of an application run a query that results in the same object being retrieved, then it will be retrieved once and shared. This can easily happen, since the result of most queries is a collection class, and an object can be in more than one of these returned collections.
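The idea of holding an OID until the object is needed, and only then swizzling it into a pointer, can be sketched as follows. This is a minimal illustration, not the framework's actual interface: the names OID (as a plain integer), Customer, fakeTable, loadCustomer and CustomerRef are all assumptions made for the example, and the "database" is just an in-memory map. Each CustomerRef owns its own copy of the object, so this sketch sidesteps the sharing question raised above.

```cpp
#include <cassert>
#include <map>
#include <string>

// All names below are hypothetical; none come from a real persistence library.
using OID = long;

struct Customer {
    OID oid;
    std::string name;
};

// Stand-in for a database table keyed by OID.
inline std::map<OID, Customer>& fakeTable() {
    static std::map<OID, Customer> table = { {42, Customer{42, "Ada"}} };
    return table;
}

// Counts how often the stand-in database is queried.
inline int& loadCount() {
    static int n = 0;
    return n;
}

// Returns a heap copy of the stored row, or nullptr if the OID is unknown.
inline Customer* loadCustomer(OID oid) {
    ++loadCount();
    auto it = fakeTable().find(oid);
    return it == fakeTable().end() ? nullptr : new Customer(it->second);
}

// Holds an OID and swizzles it into a pointer on first use.
class CustomerRef {
public:
    explicit CustomerRef(OID oid) : oid_(oid) {}
    ~CustomerRef() { delete ptr_; }
    CustomerRef(const CustomerRef&) = delete;             // unshared: one owner per object
    CustomerRef& operator=(const CustomerRef&) = delete;

    OID oid() const { return oid_; }
    bool isLoaded() const { return ptr_ != nullptr; }

    // The first successful call loads the object; later calls reuse the pointer.
    Customer* get() {
        if (!ptr_) ptr_ = loadCustomer(oid_);
        assert(!ptr_ || ptr_->oid == oid_);  // the pointer must name the same object as the OID
        return ptr_;
    }

private:
    OID oid_;
    Customer* ptr_ = nullptr;
};
```

A caller would construct CustomerRef ref(42) while rebuilding the owning object and call ref.get() only when the association is actually followed, so the database is touched at most once per successfully loaded reference.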

This raises the problem of how you know when it is safe to delete an object. Since the persistence framework generates these references and the application programmer uses them, neither party necessarily knows the lifetime of a given reference. The application programmer doesn't know whether anyone else is pointing to the object, and so can never safely delete it. The persistence framework cannot know when the application is finished with the object, and so it cannot delete it either. The result is that objects are never deleted, and memory leaks.

There are two solutions to this problem. You can either disallow sharing of objects, or you can implement reference counting. With reference counting, each time an object is shared its reference count is incremented. Each time a holder releases the object, the count is decremented. When the reference count returns to zero, the object is destroyed.
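One way to get reference counting without writing it by hand is the standard library's std::shared_ptr, which was not available when many persistence frameworks were first written. The sketch below assumes that substitution. AccountCache and Account are hypothetical names, and the database load is faked. The cache keeps only a std::weak_ptr per OID, so it can hand out the same instance to several callers without keeping the object alive itself.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

using OID = long;  // hypothetical OID representation

struct Account {
    OID oid;
    std::string owner;
};

// Hypothetical cache that returns one shared instance per OID.
class AccountCache {
public:
    std::shared_ptr<Account> get(OID oid) {
        auto it = live_.find(oid);
        if (it != live_.end()) {
            // lock() yields a shared_ptr if some caller still holds the object.
            if (auto existing = it->second.lock()) {
                assert(existing->oid == oid);
                return existing;
            }
        }
        // Stand-in for loading the row from the database.
        auto fresh = std::make_shared<Account>(
            Account{oid, "owner-" + std::to_string(oid)});
        live_[oid] = fresh;  // weak reference: does not add to the count
        return fresh;
    }

private:
    std::map<OID, std::weak_ptr<Account>> live_;
};
```

Two calls to get() with the same OID return pointers to the same Account, and the object is destroyed when the last of those shared_ptr values goes away. Neither the application nor the cache has to decide explicitly when deletion is safe.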

Queries

In a typical persistence framework, all objects are accessed from the database through queries. This is the way that the application obtains any objects that have been previously persisted.

Typically, there are two types of queries and two types of return values. In the first type of query, you provide an OID and the database then returns the object that corresponds to that OID. This turns into a simple SQL query, whose WHERE clause is just an EQUALS match on the OID. This should run very fast.

In the second type of query, you provide an SQL WHERE clause, and the system returns the object, collection or web of objects that satisfies the SQL query. In this case the WHERE clause can be as complex as the application programmer desires. Of course, a complex WHERE clause may require a significant amount of database processing.
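The SQL text behind the two kinds of query might look like the output of the helpers below. This is only a sketch: the function names, the table name passed in, and the column name oid are assumptions, and the OID is taken to be a plain integer. Production code would normally bind the OID as a statement parameter rather than concatenate it into the string.

```cpp
#include <cassert>
#include <string>

using OID = long;  // hypothetical OID representation

// First kind of query: an equality match on the OID column.
// The column name "oid" is an assumption for this example.
std::string selectByOid(const std::string& table, OID oid) {
    assert(!table.empty());
    return "SELECT * FROM " + table + " WHERE oid = " + std::to_string(oid);
}

// Second kind of query: the caller supplies the whole WHERE clause.
std::string selectWhere(const std::string& table, const std::string& where) {
    assert(!table.empty() && !where.empty());
    return "SELECT * FROM " + table + " WHERE " + where;
}
```

The first form can be answered from the primary key index, while the cost of the second depends entirely on the clause the application programmer writes.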

Query Signatures

Each of the above types of query is codified by one static member function on the class that is the focus of the query. The OID-based queries have the following (approximate) format:

PersistentObject* Get(const OID &);

There is one of these Get() functions for each persistent class. They each take an OID as an input parameter and return a pointer to a persistent object. A pointer is used as the return value instead of a reference, so that NULL can be returned if the object is not found.
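A per-class Get() along these lines might be sketched as follows. The Employee class, its name() accessor, the integer OID and the in-memory map that stands in for the database are all assumptions made for illustration. Here the returned pointer refers to an object owned by the stand-in store, so the caller must not delete it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using OID = long;  // hypothetical OID representation

class PersistentObject {
public:
    virtual ~PersistentObject() = default;
};

// Hypothetical persistent class with its own static Get().
class Employee : public PersistentObject {
public:
    Employee(OID oid, std::string name) : oid_(oid), name_(std::move(name)) {}

    // Returns the employee with the given OID, or nullptr if none exists.
    static Employee* Get(const OID& oid);

    OID oid() const { return oid_; }
    const std::string& name() const { return name_; }

private:
    OID oid_;
    std::string name_;
};

Employee* Employee::Get(const OID& oid) {
    // Stand-in for the database: one row with OID 7.
    static std::map<OID, Employee> rows = { {7, Employee(7, "Grace")} };
    auto it = rows.find(oid);
    if (it == rows.end()) return nullptr;  // not found: signalled by a null pointer
    assert(it->second.oid() == oid);
    return &it->second;
}
```

A caller checks the result before use, for example: if (Employee* e = Employee::Get(7)) { /* use e->name() */ }. A reference return type could not express the "not found" case this way.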