Data Modeling for Access Developers



Markus Bosshard

For many Access developers, data modeling is something that's only used for large database systems -- far too technical to be of practical use. Well, that's not really true. The fact that you're using Access and not Excel shows that you want your data stored in a structured way in a database. This article shows you how to go about structuring your data using data modeling.

When I was still studying computer science at university, I'd sit bored and disinterested in the lectures on "Database Fundamentals." I kept thinking, "Who on earth needs to know all this stuff about entities, attributes, tuples, one-to-many, and so on and so forth?" It seemed to me to be unnecessarily theoretical. Besides, I preferred the visual aspect of programming, which meant that I preferred to write games. With games, you could see something happening on the screen.

In the real world, when I had to earn my own meals, I learned that just about every application needs to store data in some persistent form. Even games. Some applications, like the one I used to write this article, have a fairly simple data structure. In its simplest form, a Word document's data consists of characters and various formatting properties. The information is saved in a file, and that's the end of the problem.

Other applications consist of lots of different data objects that are somehow related to one another. In these applications, when the values of part of the data change, it has an effect on some other data, which in turn affects yet other data. The problem becomes progressively more complicated as you add more objects to the project. This is where data modeling comes into play. If you want to build complicated applications, you're going to do some data modeling. In fact, you already are.

I'm not going to go into a lot of low-level detail about how to create entity-relationship diagrams or use specialized tools. In fact, I'm going to suggest that Access is the only tool you need. Nor will I go into material on normalization and many-to-many relationships that's been done to death elsewhere. I'm going to suggest how you can organize what you're already doing to make yourself more effective.

Data modeling
Data modeling is a means for you, as a developer, to conceptualize and understand the structure of the data used by your application. It helps you to group the data into smaller logical groups that are easier for you to grasp. Using data modeling, you can start documenting and printing your understanding of the data organization early in the development cycle of the application.

That documentation matters. Two months after completion of a project, if you're like me, you'll have forgotten why you designed your data tables the way you did. If you don't have any documentation, you'll be using the source code to work out what your thoughts were when you did the design. Poor you, if you need to make changes to someone else's code. Especially if that someone has left the company and isn't available for consultation.

If data modeling is so important, how should you go about modeling the data your application needs? The answer is to think about how you tackled the task of defining the data tables for the last application you wrote. Maybe you had a clearly specified requirements document, or maybe you just made a few notes during a meeting with the customer. Did you then, armed with that material, "click-start" Access and begin creating tables, defining the fields that came to mind, looking at your notes, and changing the fields again? You probably went on in that way for a while, then opened the tables and entered some dummy data. If you used the report wizard, you could have had your first report of the data within a few minutes of creating the database. Isn't Access great? Day one and you already have results.

At this point, you're probably expecting me to tell you that this is a bad process. But it's not. The first step to data modeling is to work out what your data is. At that point in the cycle, you're trying to group the pieces of information that you know you need to store into logical units. In data modeling, these units are called entities, and the pieces of information are called attributes. If you intend to do your data modeling using Access, then you can represent the entities with tables. Most entities do become tables, though that doesn't have to be the case. And most attributes become fields in the tables. As an Access developer, you're probably used to the terms table and field, so I'll continue to use those in this discussion.

One of the things that I'm trying to get across is that you're already doing data modeling. It's really just a question of how well you're going to do it and whether you're going to do it in a manner that gives you the best results for your effort.

So how do you decide which fields you should put into which tables? I sometimes have problems with this part, because I tend to jump from the highest level of design to the lowest level and back again. At the highest level, I think about the interface to the user. In other words, I'm wondering, "What will the dialog box or form look like when the user enters the data?" Accordingly, I create a field in a table for every control in the form. At the lowest level, I'm trying to work out how I should optimally store the data. This is a case of do what I say, not what I do. You should avoid my tendency to jump back and forth, because it can confuse and irritate you. It's best to use a three-level approach when modeling your data.

Three layers
What you should do is look at your data from three points of view: the functional, the logical, and the physical view of the data. In Access, the functional view is the way the user of your application sees the data. It's basically the information entered and displayed by your forms and reports. Most of the functional layer comes from the requirements of the application. If you don't get a fairly detailed requirements document from your customer, you'll have to do that first. Obviously, you must have a fairly clear understanding of what the application is supposed to do. (For some good advice on requirements, see Mike Gunderloy's article, "The ABCs of Requirements Management for Access," in the November 1997 issue of Smart Access.)

At this stage, you can start to define the entities you'll require to represent the information that the user of your application wants to see, as well as the information he or she wants to enter into your application. Think of the entities as objects that have a life cycle. Typically, objects are created, then read and updated, and eventually deleted. The question is what business functions cause the object to enter a particular state. For example, in a business application, you might have a table called "Bills." This might be the states that the bill entity goes through:

1. When you send a customer some goods, you include a bill stating that he should pay within 10 days. So you create a new "bill" record in the Bills table. The bill is in the "new" state.

2. After 10 days, you'll read the record to check whether the bill was paid. If not, you'll send the customer a reminder. The bill enters the state "reminder has been sent."

3. Some time later, you receive payment (hopefully). The bill enters the state "paid."

4. After a while you'll delete the bill, after updating the turnover for the particular customer.

Thinking this way helps you to identify your entities and the attributes they'll require over their life. At this stage, you don't worry too much about the tables and fields in the database. When you have a fair understanding of the entities you'll require and how their life cycle works, you can move on to the logical view or layer. I say "fair" because you don't have to be 100 percent clear on how your information is structured at this stage. The whole modeling process is recursive, which means you'll constantly be going back to the beginning to check whether the requirements were fulfilled and refine your model. Each level in the process will help you understand your data better, which will require you to go back and modify your original design.

Implementing tables and fields
At the logical layer, you implement your entities as tables and your attributes as fields with particular data types. In the logical layer you don't worry too much about performance, as you'll do that when you start working at the physical layer. At this point, however, you should start setting your field names (see the sidebar "Data Domains").

Also at the logical level, you'll start implementing the relationships between the tables. In the preceding example of the Bills entity, there's an implied relationship between the Customers table and the Bills table. One of the fields in the Bills table is the ID of the customer that received the bill from you. I didn't realize how important defining these relationships was until I converted my first application to a SQL database.

I'd been working with BTRIEVE for almost 10 years. BTRIEVE isn't a proper database system -- it's more like a record manager. BTRIEVE can retrieve your data very quickly, but you can't define any relationships between the tables. So, if I wanted to print the addresses of the customers who were to receive bills today, I had to read all the records in the Bills table one by one. In order to get the Customer address, for each bill I would have to find the record in the Customer table that was related to the customer ID in the Bills record I'd read.

One day my client came to me and said, "If we wanted all our BTRIEVE applications to use a SQL database such as SQL Server, what would that mean?" "Well," I thought, "that's not a problem. SQL Server is a database system for the big boys, and its functionality is a superset of BTRIEVE's. It must be good." I promised, in a few days, to prepare a demo to show all the interested parties how SQL Server worked.

I began by implementing all the BTRIEVE files as SQL tables. Then I took the applications' source code and replaced all the BTRIEVE calls with SQL queries. I actually was ready for the demo, which consisted of printing a single report, in the two or three days that I'd promised. I was very anxious and went through some sleepless nights, concerned about the performance of the new app. With a stop watch in hand, I set up the first report and hit the Enter key. All I saw was the hourglass. The first thought that came to mind was that the application had died, but after a while the report was finally printed. I could have used a calendar to measure the response time, as the report was about six times slower than before. I realized I was in serious trouble, because the demo had already been arranged and I had to present the application.

To cover up the fact that my solution was so slow (remember this trick!), I replaced the standard Windows wait cursor with an animated cursor of a little dog running. Most attendants at the demo were so fascinated by the little dog that they didn't realize how long it took to generate the report.

It was this experience that made me realize I needed to define the tables in such a way that the database could do the work of joining them. Right now, I was doing all the work in the application. This is why relationships are important: When you know how your tables are tied together at the logical level, you know where your indexes should be at the physical level. In this example, the fact that the customer ID in the Bills table tied that table to the Customer table meant that I should put an index on the customer ID field in the Bills table.

The logical level is also when you should normalize your data (see the two Working SQL columns, "The Trouble with Normal" and "Getting Normal," in the January and February 1998 issues of Smart Access). Again, avoid the temptation to worry about performance at this level. If you need to de-normalize and optimize your table design to improve performance, you can do that in the physical layer.

Modeling tools
Up to now, I've suggested that you can do this design work just using Access as your modeling tool. There are, of course, specialized modeling tools on the market, but they can be fairly expensive. These tools allow you to define your entities and attributes, as well as the relationships between the entities. When you're done, you can print an entity relationship diagram to see the structure of the data visually. These tools will also, at the press of a button, talk to your database management system to generate your database for you.

Some tools in this category let you make changes to an existing database or reverse engineer a database back into the modeling tool. Each of these tools will urge you to learn a special diagramming technique to use it. These tools are great (I worked with Erwin from Logic Works for a while), but I personally still prefer to create and modify the database for my applications using Access itself. In the end, the tool you use isn't as important as working through the three layers of the design and understanding what you're doing at each level.

What I did miss badly in Access was a nice diagram representing the relationships between the tables. Access does have its Relationships window, but try to figure out which line goes where if you have 20 tables in your design. For that reason, I wrote an application called "Relation," which reads the table definitions in an Access database and creates an entity relationship diagram for you (see Figure 1). If you make changes to the Access database, Relation reflects the changes when you re-synchronize the diagram. The tool can be found on the Internet at http://ourworld.compuserve.com/Homepages/Markus_Bosshard.

Rounding off the database
Once your tables and fields are defined, you can move on to implementing the physical layer. In a very large database system, the physical layer is kept strictly separated from the logical layer, so that changes at the physical level don't require changes in the application. The only way to do this in Access is to create queries and have your application use those queries and not the tables themselves. While this is an excellent way to shield your application from changes in the database design, I don't always do it, myself. Those additional queries do increase the application maintenance overhead for the database. As a result, the logical and physical layers in Access often intersect.

The next step is to enter some test data, so you can create some queries to see whether you're getting the data the way you need it. Of course, you can also create prototypes of the forms you'll need and use them to enter the data. I recommend that, until you're sure what the data model will look like, you create only some "quick and dirty" forms. Test as many queries as your application will need, using what you learn to go back and modify your functional and logical designs.

As a final step, look at your table to see if you need to split tables with many fields into smaller tables, joined using a one-to-one relationship. In a network environment, when different users access the same table, splitting these tables can reduce locking conflicts. Also, some of the fields that are very rarely changed might be worth moving into another table.

Conclusion
It's worth investing some time in the data design. If you need to make changes to the database when your application is nearly finished, it can be very time-consuming. To summarize the data modeling process, remember the following points:


Once the data model of an application has been clearly defined, the remaining work on the application will be greatly reduced.



Sidebar: Data Domains
As you define your attributes, you should start thinking about the attributes' domains. An attribute's domain describes what values are permissible for the attribute. This can vary from a list of specific values (for example, all the states in the union) to a rule that describes the valid values (for instance, all positive integers between 0 and 500). In Access, the decisions you make about the attributes' domains will not only help you understand your data better, they can also translate directly into entries in the ValidationRule properties for the field and table.

Understanding the field's domain also helps when setting your field names. You'll have to decide whether you want the same field in different tables to have same name. What's meant by the "same field in different tables" is that these fields share the same domain. The question is whether you want to use unique names for these fields. As an example, do you call the description field in the bills table and the description field in the customer table both "Description"?

I give unique names to these tables, but I follow a naming convention that allows me to see that the fields share a domain. I do that by giving fields that share the same domain the same name and then adding a prefix to the field name based on the table the field is in. As a result, the customer description field will be called CustomerDescription and the field in the Bills table will be called BillDescription. This method saves quite a bit of typing when creating queries and helps to reduce confusion.

Markus Bosshard graduated in 1986 from Wits University, South Africa, with a bachelors degree in computer science. He owns the company MSB Software Engineering, which develops Windows software under contract. Currently, he's re-engineering a suite of business applications originally written for BTRIEVE to use a client/server architecture with a SQL database at the backend. Markus_Bosshard@compuserve.com.