Data is usually added periodically to the data warehouse to include more recent information about the organization’s business activities. Changes to data already in the data warehouse are less frequent and usually made only to incorporate corrections to errors discovered in the source from which the data was extracted, or to restructure data due to organizational changes. Structural changes to the data warehouse design typically are the least common.
Referential integrity must be maintained when data warehouse data is added, changed, or deleted. Loss of referential integrity can cause errors during cube processing, fact table records to be bypassed, or result in inaccurate OLAP information.
It is common to add new data to the data warehouse. Cube information available online to clients can be affected when data is added to the data warehouse due to interaction between the data and cube partitions. You can manage the effects of adding data to the data warehouse by carefully defining partition filters, and by designing a strategy to synchronize OLAP and data warehouse data. For more information, see Partition Storage and Synchronizing OLAP and Data Warehouse Data.
Microsoft® SQL Server™ OLAP Services provides an incremental update processing method that efficiently incorporates data additions into OLAP cubes and dimensions.
Changes to correct errors in a data warehouse can be minimized by applying care during the data transformation, validation, and scrubbing operations. Other changes to existing data warehouse data can arise from changes in the structure of an organization or its products. For example, reorganizing products into different categories can require significant changes to data in the data warehouse, as well as to reports derived from the data warehouse. In some cases, such changes can require the complete redesign of cubes. In other cases, the redesign of dimensions and the processing of all cubes that use those dimensions may be all that is required.
Changes to correct errors in basic data should be incorporated in the source database, usually the OLTP business database, and then migrated to the data warehouse in a controlled manner. Many business OLTP database designs require changes to be made by a transaction that offsets the incorrect data and applies new correct data. It is often easier to manage the impact of such correction transactions on OLAP data. The incremental update processing method for cubes can incorporate new data transactions that correct value errors, such as an incorrect sale value. However, transactions that move a fact from one dimension member to another, such as a sale posted to the wrong customer, can affect the results of aggregate functions such as Avg. This is true for non-OLAP databases as well; if an original sale order is zeroed out but the record remains in the database, it will be included in the count of sales records and affect the calculation.
Changes to data in the fact table can affect the accuracy of queries to a cube until the cube is processed. In this case, the refresh data processing method can be used to reload the cube’s data and recalculate the aggregations. Because aggregation design remains the same, the refresh data processing method is faster than the complete process processing method.
Dimension hierarchies can be affected by changes to data in the data warehouse dimension tables even though the table schema remains the same. The dimension hierarchy is based on relationships between members in a dimension table. When these relationships are changed, for example when cities are reorganized into different sales regions, the dimension structure must be rebuilt using the rebuild the dimension structure dimension processing method.
The structure of OLAP cubes and dimensions can be affected by changes to the design of the data warehouse such as the addition, deletion, or alteration of tables, or relationships between tables. When the structure changes, you must modify the design of affected cubes and dimensions, redefine partitions and aggregations, and completely process the modified cubes and dimensions.
When a cube that is currently online is processed by any one of the three processing methods (process, incremental update, or refresh data), the cube remains online until the processing has been completed, at which time the online cube is replaced by the new cube version. When a cube is processed using the full process method, online clients will be disconnected from the cube when the switch is made to the new version of the cube, and the clients must individually reconnect to access the new version. When a cube is processed using either the incremental update or the refresh data method, online clients will not be disconnected from the cube when the processing completes. The new version of the cube will be immediately visible with no break in service.
The processing of a shared dimension can affect cubes that incorporate the dimension in their design. If a shared dimension is processed using the rebuild the dimension structure method, all cubes that use the dimension will immediately become unavailable to clients and must be processed before they can be used again. If a shared dimension is processed using the incremental update method, cubes that use the dimension remain available to clients and any new members added to the dimension automatically become available to clients when the dimension processing is complete. Any such new members will not have fact data associated with them until the cube is updated with new related facts.
Note Empty cells introduced by new dimension members can affect the results of some aggregate function computations in which empty cells are counted.
Caution If a shared dimension’s structure is updated and saved but not processed, it will automatically be processed when any cube incorporating the dimension is processed. At that time, any other cubes that incorporate the dimension immediately become unavailable to client applications and must be processed before they can be used again.