Markus Egger
Database design is one of the most important tasks in the application development cycle. Unfortunately, this is a particularly complex task, too. But applying patterns that are proven to work promises a solution. In this article, Markus gives an introduction to data model patterns to help guide you through the complex process of data modeling.
One of the new buzzwords in the computer business, and especially in the FoxPro world, is "pattern." Patterns are everywhere. They help you create your object-oriented applications, they help you code your algorithms, and more. But very few realize that patterns are also helpful when it comes to data modeling. Of course, OO patterns are more obvious, because there are so many different data scenarios. But nevertheless, most of the data models describe similar entities. We store addresses, items, contracts, documents, and so forth.
As a consultant, I deal with many different companies -- software companies that plan to sell their products shrink-wrapped, as well as end-users who plan to use their product in-house. And they all have one thing in common: Their data model is very special and totally different from everyone else's. But is it really? I don't think so! This isn't to say that all companies' models are the same; quite the opposite. As a matter of fact, I was never able to totally reuse a model that I'd used before. But I was able see a number of patterns (sometimes with the help of others, I have to admit). After all, most models vary in the details -- different field names, some additional entities, some different business rules that describe how the data gets into the tables, and so forth.
Keeping track of people and organizationsPeople seem to be important to almost every application. That's not surprising, since quite a lot of companies have people working for them. What is surprising, though, is that most applications don't handle this basic requirement very well. People usually have more than one telephone number; many have more than one address. People have relationships with other people or companies; one person can work for more than one company, sometimes even simultaneously.
But again, most applications can't handle these scenarios. Usually, we only find one table in the database that simply has a few fields for name, address, telephone, and other attributes.
Introducing partiesPeople and organizations share many attributes and many relationships, and very often we don't even have to make a distinction. For example, we can send invoices to an individual (the guy who inspected our roof, and for whom we wrote an application) or a corporation (the bank that loaned us the money for the house, and for which we wrote another application). After all, a corporation is just a legal person. For this reason, I like to use the term "party."
To assign attributes to this entity, you'd need to interview your customer, since these attributes differ greatly in different applications. But you can also make some educated guesses. Most parties have a name. If the party is a person, this would be the last name. A person would also have a first name and a middle initial. If the party is a company or other organization, these fields will remain blank. Similarly, there are some attributes that are specific to organizations, like the purpose.
I can imagine more attributes that apply to both persons and organizations -- among them would be the address. Which exact attributes the address requires can, again, vary, depending on several factors. In the age of the Internet, it's very important to be able to handle different address formats. As a European, it always drives me crazy when I see an address block that only allows you to enter street, city, state, and ZIP code. In Europe, addresses are simply different. But you can take care of these problems in the interface (which is really convenient right now, since this article doesn't deal with interfaces). If you want to know more about solutions to this problem, you should consult Steven Black's article about strategies ("OOP Design Patterns: Add Design Flexibility with a Strategy Pattern," FoxTalk, January 1997).
After considering all these issues, I came to the conclusion that it's possible to identify a set of standard address attributes. Figure 1 shows our party entity.
Figure 1: The party entity contains attributes for all parties, as well as specific attributes depending on the type of party.
"But Markus," one might say, "how is that different from other applications? We still can't handle multiple addresses." And that's true. So let's see what can be done about it. The obvious solution would be to add a related table that holds one or more addresses; the pattern would then look like that shown in Figure 2.
Figure 2: The party entity with a related table to handle multiple addresses.
This might do the trick in simple scenarios. After all, you don't want to make the model unnecesarily complex. Please keep in mind that all the patterns I introduce here are simply suggestions, not rules. Data models are never wrong; they're just more or less useful for the given scenario. So if this model already fulfills your needs, then go ahead and use it.
But obviously, this model doesn't yet reflect the real world. Not only can one party have more than one address, but addresses can also be shared. For example, there are a couple of different companies in my office building. Also, each of these companies has several persons (sub-parties) working for them.
The next complexity is that addresses have an effective date. You can't just delete an old address whenever somebody moves -- this would result in dire consequences for existing data. For example, suppose you send an invoice to a customer. Two years later, you look at the invoice again, and all of a sudden it doesn't have an address associated with it because you deleted it when the customer moved. But, on the other hand, you can't just leave the address in the database because it's simply not valid anymore. You can resolve this problem by adding a placement entity. Figure 3 shows how this works.
Figure 3: The placement entity adds a temporal dimension to the address entity.
I replaced the entity address with the entity site. The reason is that "address" isn't a very well-defined term. In daily conversations, an address is usually associated with a single party ("What's John's address?"). On other occasions, an address can be associated with multiple parties, or parties might even share addresses, as already discussed. The term site should clarify the situation -- it's more clearly a physical location.
I also moved the purpose from the organization into the placement table. This might look strange to data pattern insiders, since the solution that's usually suggested is to move the purpose into the site. But I don't agree with that. A site might serve different purposes for different parties; what's the office for one company might be the support center for another. An individual who works at home would have multiple purposes for the same site. The purpose might also change over time -- the support center might become the sales management building, for instance. For this reason, I moved the purpose into the placement table.
By now, we already have a quite flexible model that's able to deal with most address scenarios. This can get a lot more complex, of course -- for example, we could add geographic locations, sub-areas, overlapping areas, and so on. But this would be too much for this article. If you're interested in more information about these kinds of things, I suggest the book Data Model Patterns -- Conventions of Thought by David C. Hay (ISBN: 0-932633-29-3).
Handling phone numbers and e-mail addressesAnother issue of interest when it comes to parties is phone numbers and the like. I refer to this data as "communication information." This isn't entirely accurate, because the address is basically communication information as well, but for now I'll leave it at that.
Communication information could include phone numbers, fax numbers, e-mail addresses, and Web pages; they're quite difficult to handle. A phone number could belong to a certain placement or site, but, at the same time, it could be a mobile phone number that's taken along wherever its owner goes. Even after moving or changing companies, the number would still be valid. Thus, the number needs to be attached to the party itself, independent of a site. The situation with e-mail addresses is similar -- some are related to a company, perhaps to a corporate server. So if one moves on to a different job at another company, that person's e-mail address would change. Other e-mail addresses might be more generic, such as yourname@compuserve.com or yourname@hotmail.com. In this case, the person probably wouldn't change it very often.
For these reasons, I usually attach the communication information to two different entities, as shown in Figure 4.
Figure 4: Communication data such as phone numbers and e-mail addresses are stored in yet another entity that's related to both party and site.
In this scenario, communication information can be attached to a party or to a site. Of course, this is a 1:n relation, so you can attach as many phone numbers or e-mail addresses as you want. Also note that singular communication information can be attached simultaneously to both a site and a person. Why is that important? Well, here's an example: I own a couple of cell phones, which would be attached directly to me. But every now and then I give one of them to one of my temporary employees; thus, for a while, the same communication information is attached to two different parties. But it wouldn't be attached directly -- rather, it would be attached over the site, since the phone really belongs to me/my company.
A variation would be to attach the communication information to the placement, rather than the site. This way, the communication information automatically gets an effective date when attached to the placement, but not when attached to a party. This makes a lot of sense, because communication information usually expires as soon as a person or party moves away from a placement. Phone numbers that are directly attached to the person (like a cellular phone) won't expire. This scenario is beautifully handled by this pattern.
Another way to accomplish this would be to add date fields to the communication information. But very few people would appreciate the hassle of having to attach dates to every phone number (unless it were an important requirement). After all, flexibility is powerful only if you can also provide a simple interface.
The complicated part: RelationshipsIn real life, relationships are complicated. The bad news: Data models are supposed to reflect the real world. The good news: Well, er . . . hmmmm.
Actually, life isn't that bad. Let's take a look at internal relations first. When I say "internal relations," I'm referring to relations within a corporation, organization, or other party -- in other words, a party can be a composition of other parties. A corporation has several departments in which there are people or other sub-parties. Unlike in private life, these are all 1:n relationships (well, at least you're not supposed to have 1:n relations in your private life <g>).
But it's not that simple. Not only can a department have multiple persons working for it, one person can also work in multiple departments. Over time, the complexity evolves. People move on, get promoted, and so on. And it really adds spice to the situation when a person might remain in the same department but work in a different position.
The pattern shown in Figure 5 allows you to handle the aforementioned scenario. A party, which would repesent the employee, can be linked to any position (actually to several different positions) using the position assignment entity. You can assign multiple positions to any party (this would be the employer or the department), and you can assign any number of parties (persons) to any position. Also, this allows you to keep track of historically correct data using the start and end dates in the position assignment entity.
Figure 5: A party can be linked to multiple positions through yet anotehr entity.
This should do the trick for most of us. If you're interested in patterns that handle more complex scenarios, have a look at David C. Hay's book.
Of course, there are many different kinds of relationships among parties -- far too many to discuss in this article. Figure 6 shows a simplified but very generic relationship pattern -- it can be used to link any kind of party to any number of other parties. This pattern isn't a special relationship pattern, but rather a very simple cross link pattern.
Figure 6: Relationships between parties might be handled by a generic relationship pattern.
I've found that this pattern serves most of my needs. All I usually have to do is create some business rules and some new fields, but the overall pattern remains the same. In other words, I try to keep my relationships simple (<g>).
ConclusionIn this article, I've shown you how to create a powerful and flexible data model to store party, address, and communication information. I use this model in almost all my applications that need to store information about parties. Whenever I interview a new customer, this part of the application is already in place and ready to go before I even enter the customer's office. For most purposes, you can simply reuse this model, add a couple of fields and perhaps a couple of business rules, and voila!
And of course, I also have a set of reusable interface classes that provide a simple interface to all this power. By now, you might think that some of these models seem nice in theory but are too complex for the user. And I agree. To enter data as it's stored in the model is too complex for the user and too hard to understand. The trick is to provide an interface that hides this complexity. A very good example of a powerful interface that's simple to use is the contact manager of Microsoft Outlook. At first, it feels like using a flat file. But once it's needed, it offers quite a lot of flexibility. Interface design, however, is a totally different issue and should be addressed another time.
As you've probably already realized, patterns can be found in almost all parts of data models; this article is just a brief introduction. I'll address more of these patterns in future issues of FoxTalk. I hope that, after reading this article, all of this seems simple and obvious to you. Chances are that you've either already stored your data in a flexible way (most likely differently from the way I've described), or you simply think that all this makes a lot of sense, prompting you start to wondering why most applications don't do a very good job handling the entities discussed here. In that case, this article will have served its purpose. If you have questions, feel free to e-mail me or visit my Web page (http://www.eps-software.com) for addenda to this article.
Markus Egger is the owner of EPS software, a software development firm located in Salzburg, Austria, that specializes in object-oriented development, training, and consulting. He's presented sessions at numerous Visual FoxPro conferences, including FoxTeach, the European VFP conference, the Dutch Lowlands conference in Amsterdam, and GLGDW in Milwaukee. He's written for Fuchs, FoxTalk, and Microsoft Office and Databaser Journal and is a Microsoft MVP. Max@foxgang.net, www.foxgang.net/eps.