Data Granularity

A significant difference between an OLTP or operational system and a data warehouse is the granularity of the data stored. An operational system typically stores data at the lowest level of granularity: the maximum level of detail. However, because the data warehouse contains data representing a long period in time, simply storing all detail data from an operational system can result in an overworked system that takes too long to query.

A data warehouse typically stores data in different levels of granularity or summarization, depending on the data requirements of the business. If an enterprise needs data to assist strategic planning, then only highly summarized data is required. The lower the level of granularity of data required by the enterprise, the higher the number of resources (specifically data storage) required to build the data warehouse. The different levels of summarization in order of increasing granularity are:

Current and historical operational data are taken, unmodified, directly from operational systems. Historical data is operational level data no longer queried on a regular basis, and is often archived onto secondary storage.

Aggregated, or summary, data is a filtered version of the current operational data. The design of the data warehouse affects how the current data is aggregated. Considerations for generating summary data include the period of time used to aggregate the data (for example, weekly, monthly, and so on), and the parts of the operational data to be summarized. For example, an organization can choose to aggregate at the part level the quantity of parts sold per sales representative per week.

There may be several levels of summary data. It may be necessary to create summary level data based on an aggregated version of existing summary data. This can give an organization an even higher level view of the business. For example, an organization can choose to aggregate summary level data further by generating the quantity of parts sold per month.

Metadata does not contain any operational data, but is used to document the way the data warehouse is constructed. Metadata can describe the structure of the data warehouse, source of the data, rules used to summarize the data at each level, and any transformations of the data from the operational systems.

  


(c) 1988-98 Microsoft Corporation. All Rights Reserved.