Data Warehouse Characteristics

A data warehouse can assist decision support and online analytical processing (OLAP) applications because it provides data that is:

Consolidated and Consistent Data

A data warehouse consolidates operational data from a variety of sources with consistent naming conventions, measurements, physical attributes, and semantics.

For example, in many organizations, applications can often use similar data in different formats: dates can be stored in Julian or Gregorian format; true/false data can be represented as one/zero, on/off, true/false, or positive/negative. Different applications can also use different terms to describe the same type of data. One application can use the term “balance” instead of “total amount” to represent the amount of money in a bank account.

Data should be stored in the data warehouse in a single, acceptable format agreed to by business analysts, despite variations in the external operational sources. This allows data from across the organization, such as legacy data on mainframes, data in spreadsheets, or even data from the Internet, to be consolidated in the data warehouse, and effectively cross-referenced, giving the analysts a better understanding of the business.

Subject-oriented Data

Operational data sources across an organization tend to hold a large amount of data about a variety of business-related functions, such as customer records, product information, and so on. However, most of this information is also interspersed with data that has no relevance to business or executive reporting, and is organized in a way that makes querying the data awkward. The data warehouse organizes only the key business information from operational sources so that it is available for business analysis.

Historical Data

Data in OLTP systems correctly represents the current value at any moment in time. For example, an order-entry application always shows the current value of stock inventory; it does not show the inventory at some time in the past. Querying the stock inventory a moment later may return a different response. However, data stored in a data warehouse is accurate as of some past point in time because the data stored represents historical information.

The data stored in a data warehouse typically represents data over a long period of time; perhaps up to ten years or more. OLTP systems often contain only current data, because maintaining large volumes of data used to represent ten years of information in an OLTP system can affect performance. In effect, the data warehouse stores snapshots of a business’s operational data generated over a long period of time. It is accurate for a specific moment in time and cannot change. This contrasts with an OLTP system where data is always accurate and can be updated when necessary.

Read-only Data

After data has been moved to the data warehouse successfully, it typically does not change unless the data was incorrect in the first place. Because the data stored in a data warehouse represents a point in time, it must never be updated. Deletes, inserts, and updates (other than those involved in the data loading process) are not applicable in a data warehouse. The only operations that occur in a data warehouse, when it has been set up, are loading and querying data.

  


(c) 1988-98 Microsoft Corporation. All Rights Reserved.