Designing and Building the Database

Designing and building the database is a critical part of building a successful data warehouse. This step is often performed by experienced database designers because it can involve taking data from multiple (sometimes heterogeneous) sources and combining it into a single, logical model.

Unlike OLTP systems that store data in a highly normalized fashion, the data in the data warehouse is stored in a very denormalized manner to improve query performance. Data warehouses often use star and snowflake schemas to provide the fastest possible response times to complex queries, and the basis for aggregations managed by OLAP tools.

The components of schema design are dimensions, keys, and fact and dimension tables.

Fact tables
Contain data that describes a specific event within a business, such as a bank transaction or product sale. Alternatively, fact tables can contain data aggregations, such as sales per month per region. Except in cases such as product or territory realignments, existing data within a fact table is not updated; new data is simply added.

Because fact tables contain the vast majority of the data stored in a data warehouse, it is important that the table structure be correct before data is loaded. Expensive table restructuring can be necessary if data required by decision support queries is missing or incorrect.

The characteristics of fact tables are:

Dimension tables
Contain data used to reference the data stored in the fact table, such as product descriptions, customer names and addresses, and suppliers. Separating this verbose (typically character) information from specific events, such as the value of a sale at one point in time, makes it possible to optimize queries against the database by reducing the amount of data to be scanned in the fact table.

Dimension tables do not contain as many rows as fact tables, and dimensional data is subject to change, as when a customer’s address or telephone number changes. Dimension tables are structured to permit change.

The characteristics of dimension tables are:

Dimensions
Are categories of information that organize the warehouse data, such as time, geography, organization, and so on. Dimensions are usually hierarchical in that one member may be a child of another member. For example, a geography dimension may contain data by country, state, and city. A city member is a child to a state member, which is in itself a child to a country member. Thus, the dimension is comprised of three hierarchical levels: all countries, all states, and all cities in the dimension table. To support this, the dimension table should include the relationship of each member to the higher levels in the hierarchy.
Dimensional keys
Are unique identifiers used to query data stored in the central fact table. The dimensional key, like a primary key, links a row in the fact table with one dimension table. This structure makes it easy to construct complex queries and support drill-down analysis in decision support applications. An optimal data warehouse database contains long, narrow fact tables and small, wide dimension tables.
Star Schema

The most popular design technique used to implement a data warehouse is the star schema. The star schema structure takes advantage of typical decision support queries by using one central fact table for the subject area, and many dimension tables containing denormalized descriptions of the facts. After the fact table is created, OLAP tools can be used to preaggregate commonly accessed information.

The star schema design helps to increase query performance by reducing the volume of data that is read from disk. Queries analyze data in the smaller dimension tables to obtain the dimension keys that index into the central fact table, reducing the number of rows to be scanned.

Snowflake Schema

The snowflake schema is a variation of the star schema where dimension tables are stored in a more normalized form. This can help improve performance for queries by reducing the number of disk reads.

  


(c) 1988-98 Microsoft Corporation. All Rights Reserved.