The primary goal when designing fact tables is to minimize the size without compromising the data requirements. Fact tables are the largest tables in the database because they contain detail level data representing the underlying business transactions. However, the costs of storing, and maintaining these large tables should be considered. For example, larger tables require more online, and potentially offline, storage; take longer to backup and restore in the event of a system failure; and take longer to query when building OLAP aggregations.
The easiest ways to reduce the size of fact tables include:
Remove any columns that are not required to analyze the operations of the business. For example, if the data does not represent a business transaction, or if the data can be derived using aggregates, remove the data from the fact table. Although aggregated columns often improve query performance, the size of a typical fact table can prohibit using them. For example, if the Order_Details fact table contains one billion rows, and a column, Total_Price, is added representing Quantity multiplied by UnitPrice, one billion new values now exist permanently in the table.
Important If a column is moved from a fact table to another table, and is referenced frequently in queries involving data from the fact table, large join operations may be required. These joins can affect query performance. Therefore, the trade-off between reducing storage costs and affecting query performance should be determined.
Although Order_Details forms the basis of the fact table, the OrderID column is not required in the final fact table because OrderDetailID is the unique identifier for the business transaction: a sale or a product. In fact, OrderID does not represent a single business transaction; it represents the sale of one or many products to a single customer, and so cannot be used.
Because fact tables tend to have a large number of rows, even one redundant byte per row can add up to a large amount of wasted database space. For example, a fact table containing one billion rows, with one unused byte in one of the columns, represents almost 1 GB of unused database. To reduce column widths:
If data within fact tables is rarely used, such as sales data from several years ago, it may be useful to archive the data. This approach reduces the volume of data in the fact table, hence increasing the performance of queries. Exceptional queries, on older data, can be run against multiple fact tables without affecting the majority of users querying the fact tables containing recent data. When Microsoft® SQL Server™ OLAP Services is used in conjunction with multiple fact tables, the OLAP Services engine manages queries against multiple back end fact tables. This simplifies the management and use of multiple fact tables containing historical data.