A data warehouse is often used as the basis for a decision-support system (also referred to from an analytical perspective as a business intelligence system). It is designed to overcome some of the problems encountered when an organization attempts to perform strategic analysis using the same database that is used to perform online transaction processing (OLTP).
A typical OLTP system is characterized by having large numbers of concurrent users actively adding and modifying data. The database represents the state of a particular business function at a specific point in time, such as an airline reservation system. However, the large volume of data maintained in many OLTP systems can overwhelm an organization. As databases grow larger with more complex data, response time can deteriorate quickly due to competition for available resources. A typical OLTP system has many users adding new data to the database while fewer users generate reports from the database. As the volume of data increases, reports take longer to generate.
As organizations collect increasing volumes of data by using OLTP database systems, the need to analyze data becomes more acute. Typically, OLTP systems are designed specifically to manage transaction processing and minimize disk storage requirements by a series of related, normalized tables. However, when users need to analyze their data, a myriad of problems often prohibits the data from being used:
By copying an OLTP system to a reporting server on a regularly scheduled basis, an organization can improve response time for reports and queries. Yet a schema optimized for OLTP is often not flexible enough for decision support applications, largely due to the volume of data involved and the complexity of normalized relational tables.
For example, each regional sales manager in a company may wish to produce a monthly summary of the sales per region. Because the reporting server contains data at the same level of detail as the OLTP system, the entire month’s data is summarized each time the report is generated. The result is longer-running queries that lower user satisfaction.
Additionally, many organizations store data in multiple heterogeneous database systems. Reporting is more difficult because data is not only stored in different places, but in different formats.
Data warehousing and online analytical processing (OLAP) provide solutions to these problems. Data warehousing is an approach to storing data in which heterogeneous data sources (typically from multiple OLTP databases) are migrated to a separate homogenous data store. Data warehouses provide these benefits to analytical users:
Sometimes organizations maintain smaller, more topic-oriented data stores called data marts. In contrast to a data warehouse which typically encapsulates all of an enterprise’s analytical data, a data mart is typically a subset of the enterprise data targeted at a smaller set of users or business functions.
Whereas a data warehouse or data mart are the data stores for analytical data, OLAP is the technology that enables client applications to efficiently access the data. OLAP provides these benefits to analytical users:
The terms data warehousing and OLAP are sometimes used interchangeably. However, it is important to understand their differences because each represents a unique set of technologies, administrative issues, and user implications.