Synchronizing OLAP and Data Warehouse Data

Valid cubes are online and available to client applications at all times when the OLAP server is running. Because of the interaction of OLAP cube partitions with data in the data warehouse, the design of the data warehouse should include a synchronization strategy to enable the addition of data without causing cubes to provide incorrect answers to queries in cubes available to online clients.

A Data Synchronization Strategy

One strategy for managing additions to data warehouse and OLAP data is to design a batch update system. In this strategy, all data in the data warehouse fact table includes a batch number in each record. When you design a cube, add an expression to the filter for each of the cube’s partitions to specify the largest batch number applicable, for example, “… AND DWBatch <= 33 ...” When additions to the fact table need to be made, include a new, higher batch number in the new records. Cubes are unaffected by these added records because the cube partitions are restricted to reading data from previous batches only.

Data added to a dimension table does not affect existing cube private or shared dimensions until the dimensions are processed. A batch number in dimension table records is not necessary, but can be useful in ensuring continued referential integrity.

When a batch of data has been added to the fact table and dimension tables, processing of dimensions and cubes or partitions to incorporate the changes can proceed in an orderly manner. Shared dimensions should be processed before the cubes that use them. To add new members to a dimension that do not affect the dimension’s structure, use the incremental update method. To add new members and rebuild the dimension’s structure, use the rebuild the dimension structure method. Note that when a shared dimension is processed with the rebuild the dimension structure method, all cubes that incorporate that dimension will immediately become unavailable to clients and must be processed before they can be used again. However, when a shared dimension is processed using the incremental update method, a cube that uses the shared dimension will display the new members but the cells associated with those members will remain empty until the cube is updated with new data from the fact table that relates to the new members.

To incorporate a new data batch in a cube, update the filter expression in each of the cube’s partitions to include the new batch number, and then process or incrementally update the cube. If a cube’s data is divided among multiple partitions, you can use one of the partitions to accumulate new data batches and process that partition only. The cube’s other partitions must have filters that exclude new data so that data will be added only to the accumulation partition.