From the information technology perspective, data warehousing is aimed at the timely delivery of the right information to the right individuals in an organization. This is an ongoing process, not a one-time solution, and requires an approach different from that required in the development of transaction-oriented systems.
A data warehouse is a collection of data in support of management’s decision-making process that is subject-oriented, integrated, time-variant, and nonvolatile. The data warehouse is focused on the concept (for example, sales) rather than the process (for example, issuing invoices). It contains all the relevant information on a concept gathered from multiple processing systems. This information is collected and stored at regular intervals and is relatively stable.
A data warehouse integrates operational data by using consistent naming conventions, measurements, physical attributes, and semantics. The first steps in the physical design of the data warehouse are determining which subject areas should be included and developing a set of agreed-upon definitions. This requires interviewing end users, analysts, and executives to understand and document the scope of the information requirements. The issues must be thoroughly understood before the logical process can be translated into a physical data warehouse.
Following the physical design, operational systems are put in place to populate the data warehouse. Because the operational systems and the data warehouse contain different representations of the data, populating the data warehouse requires transformations of the data: summarizing, translating, decoding, eliminating invalid data, and so on. These processes need to be automated so that they can be performed on an ongoing basis: extracting, transforming, and moving the source data as often as needed to meet the business requirements of the data warehouse.
In the operational system, data is current-valued and accurate as of the moment of access. For example, an order entry application always shows the current value of inventory on hand for each product. This value could differ between two queries issued only moments apart. In the data warehouse, data represents information gathered over a long period, and is accurate as of a particular point in time. In effect, the data warehouse contains a long series of snapshots about the key subject areas of the business.
Finally, information is made available for browsing, analyzing, and reporting. Many tools assist in analysis, from simple report writers to advanced data miners. Ultimately, analysis drives the final iterations of the data warehousing process, causing revisions in the design of the data warehouse to accommodate new information, improve system performance, or allow new types of analysis. With these changes, the process restarts and continues throughout the life of the data warehouse.