Optimizations

As with all development, optimizations should be deferred until the system is up and running and debugged. As you approach your ship date, you will want to profile the system and determine where the performance bottlenecks are. At that time (and not before), you should tweak your system to make it run faster.

There are several places to consider for potential optimization; the sections that follow walk through the most common ones.

Classically, when you build a database-centric system, you build it first, and then you spend a good deal of time tweaking the database to make it run faster. The database definitions, the SQL embedded in the application code, the database utilities — they are all suspects when it comes to finding performance bottlenecks.

Lazy Evaluation

When an object is created from the database it may have member variables which are references to other objects. As we saw earlier, you normally swizzle these objects into memory by examining the OID stored with your object and retrieving the referred-to object. These objects in turn may need to swizzle their member objects into memory. Creating an object may cascade through a number of creations and can therefore be a very expensive process.

One answer to this problem is to defer swizzling the member objects until they are needed. This lazy evaluation of member references can dramatically improve performance at run-time.

For example, when an object is created in memory as the result of a query, the object's associated objects would not initially be retrieved from the database. Under lazy evaluation, the new object would just hold the OIDs of its member objects, rather than proper references to them.

In other words, only the object that is explicitly the result of the query would be brought into memory. The first time any of the object's associations are followed or referenced, the object represented by the OID is then recreated in memory from its disk image, and the OID is swizzled into a reference to the newly recreated object.
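
As a concrete illustration, here is a minimal C++ sketch of a lazy reference. The Database class and its load() call are hypothetical stand-ins for whatever your persistence layer provides; the point is only that the OID is not swizzled into a pointer until the first access.

    #include <cstdint>
    #include <memory>
    #include <string>

    using OID = std::uint64_t;

    struct Employee { std::string name; };   // stand-in persistent class

    struct Database {
        // Hypothetical: fetch the disk image for 'oid' and rebuild the object.
        std::unique_ptr<Employee> load(OID oid) {
            return std::make_unique<Employee>();   // stub for illustration
        }
    };

    // Holds either an unswizzled OID or a swizzled in-memory pointer.
    class LazyRef {
    public:
        LazyRef(Database& db, OID oid) : db_(db), oid_(oid) {}

        Employee* get() {
            if (!obj_) {                 // not yet swizzled
                obj_ = db_.load(oid_);   // one trip to the database
            }
            return obj_.get();
        }

    private:
        Database&                 db_;
        OID                       oid_;
        std::unique_ptr<Employee> obj_;  // empty until the first access
    };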

You can also use lazy evaluation to create only the base part of an object without creating the specific subtype. If the return type of the query is wider than the actual returned type, then only the OIDs that are in the sliced portion of the object are turned into memory references. That is, if the return type of a query is a pointer to a base object, but the actual returned object is a sub-type, you can instantiate only the base type to save time.

Let's look at an example to illustrate this. Say your code is written to call a query and assign the result to a pointer to an Animal. In fact, the query returns a pointer to a Dog (Dog is a subclass of Animal). Rather than evaluating the entire Dog object, you can just swizzle the Animal portion into memory, and keep an OID to refer to the Dog part. Later, if you refer to a method of Dog, or you need data from the Dog part, you can then use that OID to swizzle the Dog part into memory.
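
Here is a sketch of that split, reusing the OID alias and the hypothetical Database from the fragment above; loadDogPart() is another hypothetical persistence call. Only the Animal part is rebuilt up front, while the Dog-specific part stays behind as an OID until something actually asks for it.

    struct AnimalPart { std::string name; };    // base-class state
    struct DogPart    { std::string breed; };   // subtype state

    class Dog {
    public:
        Dog(Database& db, AnimalPart base, OID dogOid)
            : db_(db), base_(std::move(base)), dogOid_(dogOid) {}

        // Base-class access: already in memory, no swizzling needed.
        const std::string& name() const { return base_.name; }

        // The first Dog-specific access pays for one extra database trip.
        const std::string& breed() {
            if (!dogPart_) {
                dogPart_ = db_.loadDogPart(dogOid_);   // hypothetical call
            }
            return dogPart_->breed;
        }

    private:
        Database&                db_;
        AnimalPart               base_;     // swizzled eagerly
        OID                      dogOid_;   // refers to the deferred Dog part
        std::unique_ptr<DogPart> dogPart_;  // empty until first use
    };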

Since the client code may only be using the functionality provided by the base class, it might never need to swizzle any of the remaining pointers. The point, of course, is not just saving the time it takes to turn the OID into a memory reference, but the time spent recreating the object from its disk image. There is, however, a tradeoff. If the database is on a remote machine, then more trips across the network will be required to obtain the non-base-class (Dog-specific) information. The time to set up the necessary messaging connections might more than make up for any savings that lazy evaluation otherwise provides.

Another possible use of lazy evaluation is for queries that return collections of objects. The idea here would be to return a collection of OIDs, and only turn them into objects the first time they are accessed. This will not usually be of any value, as collections of objects that are returned from queries are usually iterated through in full. The iterator causes the database to be hit for each object, one at a time. That's N+1 trips across the interface: 1 for the collection of OIDs and 1 for each of the N objects, as opposed to not doing lazy evaluation and getting all of the objects in 1 trip across the interface.
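
In database terms, the difference looks roughly like this (table and column names are purely illustrative):

    -- Lazy collection: one query for the OIDs, then one more per element
    SELECT oid FROM Account WHERE customer_oid = 42;
    SELECT * FROM Account WHERE oid = 1001;   -- repeated for each of the N OIDs

    -- Eager fetch: all N objects in a single trip across the interface
    SELECT * FROM Account WHERE customer_oid = 42;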

Gets and Sets

It is best to write the Get() and Set() functions for attributes yourself, even if you are using a CASE tool that would generate them automatically.

A Get() function for a data member's value needs to check whether or not the value of the data member is an OID. If it is, the OID must be swizzled into a pointer (and the object loaded) before the Get() can return. A Set() of a data member should mark the object as dirty, so that the saving of objects can be optimized. That is, as we walk through the web of objects we are saving, if we come to an object that has not been modified (the "dirty bit" is not set), we do not have to write it to the database.
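
A minimal sketch of such hand-written accessors, again assuming the hypothetical Database and OID from the earlier fragments, plus a persistent Account class and a stand-in loadAccount() call:

    class Customer {
    public:
        Customer(Database& db, OID accountOid)
            : db_(db), accountOid_(accountOid) {}

        // Get(): swizzle on demand, then return the real pointer.
        Account* GetAccount() {
            if (!account_) {                              // still just an OID
                account_ = db_.loadAccount(accountOid_);  // hypothetical call
            }
            return account_.get();
        }

        // Set(): record the new value and mark this object dirty so the
        // save pass knows it must be written back to the database.
        void SetAccount(OID oid) {
            accountOid_ = oid;
            account_.reset();   // old pointer is stale; reload on next Get()
            dirty_ = true;
        }

        bool IsDirty() const { return dirty_; }

    private:
        Database&                db_;
        OID                      accountOid_;
        std::unique_ptr<Account> account_;        // empty until first Get()
        bool                     dirty_ = false;  // the "dirty bit"
    };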

If you really want to optimize database access, you can have dirty bits at the data member level as well as at the class level. If you segregate the data members defined in the base class from those defined in the derived class, and put this information into different rows in the database, then you can choose which parts of the object to write to the database when you save the object. If you changed the state of the sub-class but not of the base class, you may be able to minimize the amount of information you write into the database for updates.
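
With the base and derived state split into separate rows, an update that only touched the subtype can rewrite only the subtype's row (schema illustrative):

    -- Only the Dog-specific state changed, so only the Dog row is rewritten.
    UPDATE DogPart SET breed = 'Collie' WHERE oid = 42;
    -- The Animal row for oid 42 is left untouched.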

Secondary Indexes

The easiest way to speed up the database is to make it faster to find a particular row. Finding the row with a given key is much faster if the keyed fields are indexed. An index built on the fields that make up such a key is known as a secondary index: a table that maps key values to row locations in the regular data table. This secondary index table is usually very small and very fast to search.

Adding a secondary index makes creating, updating and deleting rows in the table slightly slower. This is because the secondary index has to be updated. This cost is low, and the increase in speed is usually noticeable; it is almost always a good idea to build this index.
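
In SQL, building such an index is a single statement (names illustrative):

    -- Secondary index on the field used to look rows up
    CREATE INDEX idx_customer_ssn ON Customer (ssn);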

Rewriting SQL

A good starting point for optimizing your use of a database is to tweak the SQL. Examine the WHERE clause. There are many examples of systems where rewriting or just changing the order of the sub-clauses in the WHERE clause has dramatically changed the overall responsiveness of the system.

Some of the ways that you can change the WHERE clause do not require any changes in the underlying DDL. These are the easiest changes to make.

Look for opportunities to eliminate joins; they tend to be quite expensive. Try to structure your complicated searches so that the first search eliminates most rows, and subsequent searches run against a very small data set. Also structure your searches so that the first, larger part of the search is against indexed columns.
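
Here is a sketch of the idea, with an illustrative schema. Whether the textual order of the sub-clauses matters depends on your database's optimizer, but the structural advice holds either way:

    SELECT *
    FROM   Orders
    WHERE  customer_oid = 42        -- indexed: eliminates most rows first
      AND  status = 'OPEN'          -- cheap test on the survivors
      AND  notes LIKE '%RUSH%';     -- expensive scan, applied last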

Sharing Tables

One way to speed up a join is to eliminate it. If there is, for example, a one-to-one correspondence between two objects, then the OID of the second one can be the value of a column in the first one. Resolving this "join" means a keyed lookup on the primary key (the OID) of the second object, and following it can slow performance. Because this is a one-to-one relationship, the two tables can easily be combined: the columns that hold the second table's data can simply be concatenated onto the end of the table, after the columns that hold the first table's data.
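
Sketched in SQL, with an illustrative one-to-one pair:

    -- Before: one-to-one relationship reached through an OID column
    SELECT p.name, d.license_no
    FROM   Person p, DriverInfo d
    WHERE  d.oid = p.driver_info_oid;

    -- After: the two tables combined, so there is no join at all
    SELECT name, license_no FROM Person;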

Denormalizing the Database Tables

Normalization attempts to reduce duplication of information in the database. When two tables hold the same data, it is possible for them to get out of synch with one another. For example, suppose you keep your customer's social security number in both the Customer object and in the Account object. This might speed up some of the processing, as you could get the customer's social security number without having to touch an Account object. However, if you were implementing a use case that required you to update the social security number, it would be easy to forget to update both the Customer object and any Account objects for that customer, and the data would then get out of synch.
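
In SQL terms (names illustrative):

    -- Normalized: the SSN lives only on Customer, so reads need a join
    SELECT c.ssn
    FROM   Customer c, Account a
    WHERE  c.oid = a.customer_oid AND a.oid = 42;

    -- Denormalized: the SSN is duplicated onto Account; reads are cheap,
    -- but every SSN update must now touch both tables
    SELECT ssn FROM Account WHERE oid = 42;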

Typically, you'll want to start with a normalized database and only denormalize as an optimization, accepting the risk of corrupting data in exchange for speed.

© 1998 by Wrox Press. All rights reserved.