Aggregations

Aggregations are precalculated summaries of data that improve query response time by having the answers ready before the questions are asked. For example, a query requesting the weekly sales totals for a particular product line from a data warehouse fact table that contains hundreds of thousands of rows of transactions can take a long time to answer if the fact table has to be scanned to compute the answer. The response can be almost immediate if the summarization data to answer this query has been precalculated. Precalculation of summary data is the foundation for the rapid response times of OLAP technology.

Cubes are the way OLAP technology organizes summary data into multidimensional structures. Dimensions and their hierarchical levels reflect the queries that can be asked of the cube. Aggregations are stored in the multidimensional structure in cells at coordinates specified by the dimensions. For example, the question “What were the sales of product X in 1998 for the Northwest region?” involves three dimensions (product, time, and geography) and one measure (sales). The value in the sales cell within the cube at the coordinates (product X, 1998, Northwest) is the answer, a single numerical value.

Other questions may return multiple values, such as “What were the sales of hardware products by quarter by region for 1998?” Such queries return sets of cells from the coordinates that satisfy the specified conditions. The number of cells returned by the query depends on the number of items in the hardware level of the product dimension, the four quarters in 1998, and the number of regions in the geography dimension. If all summary data has been precalculated into aggregations, the response time of queries like this will depend only on the time needed to extract the specified cells. No calculation or reading of data from the fact table is necessary.

Precalculation of all possible aggregations in a cube results in the fastest possible response time for all queries. However, the storage and processing time required for the aggregations can be substantial. Storage requirements depend not only on the number of dimensions and measures, but also on the number of levels in the dimensions and the number of members of each level.

There is a tradeoff between storage requirements and the percentage of possible aggregations that are precalculated. If no aggregations are precalculated (0%), little storage space is required beyond that necessary to store the base data. In this case, however, query response time will vary and may be quite slow because all answers will have to be calculated from the base data for each query. Returning the single number that answers the first query above (“What were the sales of product X in 1998 for the Northwest region”) might require reading thousands of rows of data, extracting the sale value from each, and calculating the sum.

Microsoft® SQL Server™ OLAP Services incorporates a sophisticated algorithm to select aggregations for precalculation so that other aggregations can be quickly computed from precalculated values. For example, if the aggregations are precalculated for the month level of a time dimension, the calculation for a quarter requires only the summarization of three numbers, which can be quickly computed on the fly. This technique saves storage with little effect on query response time.

The Storage Design wizard provides options for you to specify storage and percentage constraints to the algorithm to achieve a satisfactory tradeoff between query response time and storage requirements. The Usage-Based Optimization wizard enables you to adjust the aggregation design for a cube by analyzing the queries that have been submitted by client applications. You can tune a cube’s aggregation design to provide rapid response to frequent queries and less rapid response to infrequent queries without substantially affecting the storage needed for the cube.