Determining Fact and Dimension Tables

It is important to determine correctly what existing tables and data in the operational systems should comprise the fact and dimension tables. If these are not correctly identified, then the data warehouse can suffer from poor performance, or may have to be redesigned at a later stage. Redesigning an operational data warehouse, possibly containing large amounts of data, can be a prohibitively expensive task.

Although the process of determining the structure and composition of fact and dimension tables can be difficult, especially when there are multiple (heterogeneous) operational systems to consider, the two most important steps to follow are identifying the:

Identify Fundamental Business Transactions

The first step involves identifying the transactions that describe the basic operations of the business which the data warehouse will be used to analyze. For example, using the sample order entry system described earlier, the underlying business transaction is a sale of a product. Identifying the fundamental business transactions yields the information that will be represented in the fact tables. The information needed to describe the sale of a product is largely found in the Order_Details table.

When analyzing existing operational systems for potential fact tables, always look for the underlying business processes involved. Many operational systems are designed based on necessity rather than an accurate business model. For example, a school database may record only the grade per student for all subjects for a year because it does not have enough disk space to store the data at a lower level of detail.

In this instance, a data warehouse used to store student data from all schools in a region should be designed to capture this summarized data as well as a lower level of detail when the schools are able to provide the information in the future. For example, the fact table might store details regarding the grades for each subject, per student, per school, per region, per date period.

Identify the Dimension Tables

The next step involves identifying the entities that describe how the fact data will be analyzed. For example, given that the order entry system fundamental transaction is the sale of a product, dimension data from the operational schema could include payment method, product name, date of sale, or shipping method. However, the dimension data chosen should represent the focus of the business analysis. As an example, the business analysis performed on the order entry data warehouse will include variations of:

Therefore, the dimension tables will include product data, region data, and time period data. In this example, payment or shipping methods were not required because the business will not use the data warehouse to analyze that data.

From the original order entry OLTP schema, all the fact and dimension data for the data warehouse can be found in the Customers, Orders, Products, and Order_Details tables.

  


(c) 1988-98 Microsoft Corporation. All Rights Reserved.