OLAP Services: Performance Implications of the Architecture

Alexander Berger and Ashvini Sharma
Microsoft Corporation

Created June 1, 1999

Processing
Query
Agent
Conclusion
Finding More information

Summary: Discusses the internal implementation of the OLAP Services component. (7 printed pages)

Introduction

Microsoft^® SQL Server™ OLAP Services is a high-performance tool for performing data analysis. Targeted for advanced users, this paper outlines the internal implementation of the OLAP Services component and relates this implementation to the performance experienced by users.

Although we will touch on DSO- and MDX-specific optimizations, discussing them in detail is not an objective of this paper. Please refer to other sources for more information on these types of optimizations.

For easier absorption, this paper is divided into three functional stages: Processing, Querying and Write-back.

Processing

Processing is the term used to describe the population of OLAP Services with metadata about your data warehouse, structural information like members, and optionally, population of OLAP Services with some or all of the warehouse data.

This section on processing performance is separated into three parts: Dimensions, Cubes and Partitions.

Dimensions

Dimensions are a very important structure for a cube. The way a dimension is designed has a significant impact on all parts of cube usage: processing performance, querying performance, and cube size.

The most important characteristic of a dimension is the number of levels and members contained within. The time it takes for processing a dimension or a cube with one or more dimensions may grow considerably with the number of members per dimension level. In addition to the memory and disk space utilized by the dimension on the server, member information is also stored in the index files of every partition. Depending on the dimension, this member information can account for significant memory usage on the clients as well. When you design a dimension, you can decrease processing time and memory usage by excluding members from the dimension that are not really needed. For example, include the year 2010 only when the data for it is available.

MemberKey

MemberKey is the most important property of a member. This value is used for identification of members during processing. When the server processes a dimension, it checks to see if every new member already exists in its tree. If the member does not exist, the server puts it into the tree by the order of MemberKey.

During cube processing, the server refers to the member using the Member Key name. Depending on the cardinality of a dimension, processing the dimension with a large number of members under the same parent might be slower than processing dimensions with a smaller number of members under a parent. This holds for processing cubes as well. Inefficiency in the balance of members forces additional resource allocation in the index file of partitions and in the memory of server and client. For large levels, member information is sent to the client only in groups, like children of a parent. If a lot of these members must be sent to the client, query performance degrades.

To minimize processing time, keep the Member Key value small. A 32-bit integer is a good choice, whereas keeping it as a string is not efficient—especially if the MemberKey is different from the MemberName. To avoid data transformations between the source and the server, it is better to keep the data type of the member key the same as the data type in the source table. If you must use a string as the MemberKey, keep it as short as you can. Since the server needs to perform comparison operations on these strings during processing of the dimension, using strings near the maximum length of a string for the Member Key and Member Name columns (which is 255 characters) can reduce processing performance. Also, these strings are also used when sending member information to a client and to identify members in the formula engine. For string Member keys, changing the compare modes of the server, which is case sensitive, case insensitive, or binary (best), can hamper processing performance. Since this mode applies to all string comparisons in the server, changing the compare mode should be considered very carefully.

Unique Members

The Unique Members property of a dimension has a minimal impact during the processing of dimensions, but has an important impact on the processing time of MOLAP and HOLAP partitions and also on the querying time for ROLAP and HOLAP partitions. There are two ways that this property impacts processing. First, the SQL query sent to the relational database will by much simpler (less columns, less joins). Second, the analysis and loading of data for a dimension is much faster when the Unique Members value is set properly. With a MOLAP partition, member uniqueness is important only on the last level of a dimension, but for ROLAP cubes this property is important for all levels of the dimension. This optimization is discussed further in the Performance section of this paper.

Member properties

Member properties also impact the performance of dimension processing. They are loaded on the client on demand, but consume memory on the server when the dimension is initially used. A good strategy to enhance performance is to store minimum properties for the dimension at the highest level possible and thereby decrease the instances where member properties are used during processing or querying. For example, in a Geography dimension made of Country, State and City, a Member Property "Region" (which might be NE, SW, and so on) can be set on either State or City. Setting this Member Property on State will conserve disk space and increase performance.

Cubes

Efficient cube design also dramatically affects processing performance and cube size. When you design your cube, make certain you are clear about which dimensions you need to include in your cube. Performance decreases and the potential increase in the size of the cube and the space needed for processing the cube and the dimensions is significant given a large number of dimensions.

Measure types

Information about the measures makes up the major part of a cube file. Keep this information as small as possible to generate a positive impact on file sizes and performance. The server has two groups of measures: 4 bytes, which include Integer and Single, and 8 bytes, which are Big Integer, Double, Currency and Date. Use the smallest data type if you can—but keep in mind the server will not recognize overflow in your data aggregation.

Record size

The size of a record affects performance at all stages of cube usage. Record size impacts file sizes, the time of loading data on the server, the memory allocated in server caches, the time to send the data to the client, and finally, the memory allocation for the data on the client.

You can estimate the size of a record by using the following calculation:

      Size = (2*levels*C_path)+(4*measures₄*C_data4)+(8*measures₈*C_data8)

Where

Levels = for fact data, this is total number of levels in your cube (without the All level). For an aggregation, it is the number of levels used in the aggregation, excluding the All level.
C_path = coefficient of path compression for the cube. It can range from 0.11 – 1, but is usually close to 0.25.
Measures₄ = number of 4 byte measures in the cube.
C_data4 = coefficient of compression for 4 byte data. Usually 0.6– 0.8.
Measures₈ = number of 8 bytes measures in the cube.
C_data8 = coefficient of compression for 8 byte data. Usually 0.4 – 0.5.

As seen above, the size of a record is affected by the compression achieved on our path (internal representation of a member) and the measures. Path compression is better when the dimension is small and well balanced. Data compression is better for data with natural ranges and stable precision.

Partitions

Partitioning is a very powerful and flexible mechanism that provides a wide range of methods for performance control. Partitioning allows the division of the data into logical parts. This allows you to place your data on different locations and to use different storage modes for each partition.

Data slices

When designing partitions, you need to determine how many partitions your cube should have and how to divide the data between the partitions. The most important criterion in this decision comes from the logic of your data model. The best way to create efficient partitions is to separate data that will not be queried as a whole or combined to answer a query. For example, if you rarely analyze the data across years, you can create a partition for each separate year. It is very important to define the right data slice for a partition—this allows the server to avoid querying those partitions that contain irrelevant data. The server also uses slice information during processing and excludes level information for this dimension on the map of segments, thereby creating smaller and faster map files. Also, since the number of active levels (see the formula above) is less if you select the right data slice, the size of the data file is smaller as well.

Storage modes

The next important decision you need to make is the storage mode to use for this partition: MOLAP, ROLAP, or HOLAP. ROLAP is usually recommended when either the resources are not available for MOLAP or the users depend on some functionality of the underlying relational system, which also helps them manage their OLAP implementation. Consider how frequently your users require access to the fact table data and balance this against how much time they are willing to wait for a query at this level. If query time is a deciding factor, use MOLAP. If not, use HOLAP, which can affect both processing times and decrease the disk space needed for your partition.

Aggregations

Now we come to the interesting issue of creating aggregations. Aggregation design is perhaps the single most important factor that impacts the times required for processing and querying your cube. Creating the right set of aggregations is a very complex problem and OLAP Services estimates usage statistically. We suggest creating a small number of aggregations when initially designing your cube. For example, design aggregations that answer 30% of the queries, with the assumption that the remaining 70% of queries must be answered from the fact table. This translates to a 30% optimization benefit in the Aggregation Design wizard. Then, log the queries sent to the server during its operation in this initial usage phase. Once you have a measurable quantity of average queries, use the Usage-Based Optimization wizard to design the optimal set of aggregations derived from the logged usage pattern of your users.

Additional performance suggestions are presented in the Performance section later in this paper.

Internal Implementation

Processing members and facts is a double-buffered scheme and is implemented using the following threads:

The Reader thread populates the Read Ahead buffer with records read from the data source.
The Processing/Aggregation thread processes this data, saves the data in a segment, and optionally creates any requested aggregations.
These two threads work in parallel and therefore a maximum of two processors can be utilized while processing a partition. The server's ability to process multiple cubes in parallel provides a big boost to processing performance. Currently, the OLAP Manager does not offer this implementation through the user interface. However, DSO does provide a means of implementation.
To reduce the working set needed during processing, the fact table is divided into a group of 64K (or multiples of 64K) records. These groups are called segments and one segment at a time is processed.

Reader

Data is read using OLEDB. OLEDB performs all data type conversions. After reading a set of records (usually 1000), this thread asks the Processor/Aggregator thread to wake up and process the records read in so far. It then continues on to read the next set, if there is enough space left in the Read Ahead buffer.

An important optimization parameter here is the ReadAheadBufferSize registry setting which, as the name implies, is the size of the buffer used for storing the source data. Increasing this buffer allows more data to be read. However, since the Reader thread is usually not the bottleneck (empirically, calculating aggregations is the slowest part of processing), the optimization benefit is limited if your partition has a lot of aggregations.

The ReadAheadBufferSize can also be changed on the Properties dialog box for an OLAP server in the OLAP Manager.

Processor/Aggregator

This thread wakes up every time the Reader thread reads in a chunk of records from the source. As soon as enough records for a segment have been read (and processed in memory), the records are saved to disk, in the ".data" file.

During saving, the data is compressed on a segment-by-segment basis, if beneficial.

Note Currently, there is no way to turn off compression.

After reading in a segment's worth of records, and writing this segment to the disk, we move on to processing the aggregations. Due to the potentially large amount of data generated during this stage, aggregations can consume all virtual memory available in our process. To avoid this, temporary aggregations are written, on a segment-by-segment basis, to a couple of temporary files.

The location of these files is specified in the Temporary Directory property on the Properties dialog box for an OLAP server in the OLAP Manager. If there is more than one segment's worth of data to process, aggregations for the first segment are written to the first temporary file. For the next segment, the temporary aggregations are merged with the aggregations calculated for this segment, and are stored in the second temporary file. Each segment goes through this procedure and is toggled between these two temporary files until aggregation of the last segment is finished. At this point all aggregations are compressed and stored at the end of the data file.

An important optimization property is the Process Buffer Size, also found on the Properties dialog box for an OLAP server in the OLAP Manager. This registry setting directly affects the number of records in a segment, which must be multiples of 64K. Increasing this limit means more memory is consumed on a segment-by-segment basis. If your fact table has lot of duplicate records, more available memory can mean collapsing more records in memory instead of on disk. This implies the files written to disk might be substantially smaller. Therefore it might not be necessary to write to the temporary files as often. Since the latter is fairly disk bound and therefore a very slow process, increasing this parameter means you'll probably trade memory for more processing throughput. If this setting is larger than the physical memory available, however, memory will page to disk and thrashing will occur.

While creating aggregations, the server uses as much memory as available on the server machine, starting with a minimum of memory required for a segment. If more memory is available, aggregations can be derived from each other—a more efficient alternative to simply creating straightforward aggregations from the fact table. When creating the aggregations, we recommend having available memory that is 2–4 times the memory required for a segment. Both the Temporary Directory and Process Buffer Size setting can be modified on the Properties dialog box for an OLAP server in the OLAP Manager.

Resources Utilized

CPU

As mentioned above, OLAP Services theoretically does not use more than two processors while processing a partition. Empirically, one of the threads usually does more work than the other. Depending on the environment and the design of a partition, only one processor might be used completely. If your source database server is on the same machine, or if you are processing in parallel, CPU utilization might increase.

Disk

To enable querying during processing, and to ensure atomicity of the processing operation, a shadow directory (or file if processing a shared dimension) is created. This means the disk space utilization during processing can be double the disk space needed during a non-processing stage.

The following formula (an approximation to the one discussed above) can be used to estimate the raw data storage for MOLAP (in bytes) needed to store data on disk, when we finally write in our files (assuming 0 aggregations):

(((2 * total number of levels) + (4 * number of measures)) * number of records) / 3

In addition to this space for raw data, we also use temporary files for calculating aggregations. A very rough estimate of the temporary file size amount can be obtained if you examine the file size that the Design Storage wizard in the OLAP Manager reports. However, note that these estimates are based on compressed data, while temporary aggregation files are not compressed.

Memory

Memory consumed during processing includes both the memory used for processing dimensions, and memory used for processing the facts on a segment-by-segment basis.

Memory for processing dimensions + Memory for processing facts

When dimensions are processed, the intermediate structures are also kept in memory. This means the memory requirement is potentially doubled. This can make a difference when processing large dimensions. Currently, we estimate taking about 100 bytes per member. This memory requirement can increase tremendously depending on whether member properties are used and what their sizes are.

When the fact table is processed, memory is consumed on a segment-by-segment basis—therefore the maximum memory used would be for storing a segment's worth of data in memory. As discussed above, changing the ReadAheadBufferSize and ProcessReadSegmentSize will affect the memory requirements.

Files created

The following files are created while processing (depending on the storage mode for the partition, some files might not be created):

PRT: minimal information about a partition
MAP: map for our data. Internal indexing structures.
INDEX: index for our data. Internal indexing structures.
DATA: the actual data, including any aggregations.

Performance

Optimize the feed into OLAP Services

Normally, the relational database is laid out in a star or snowflake schema. Extracting data from this structure involves performing joins on these tables.

Joins are complicated for relational databases, which have traditionally been optimized for a high volume of insertions and updates involving small numbers of records. Performance degrades appreciably as the number of joins increase. OLAP, however, has traditionally been optimized for querying through a large number of records and can handle "joins" very well.

Given this background, the following performance hints might make a big difference in processing performance.

Omit needless joins
Internally, members are stored according to their member keys, while users always see the corresponding member name. This isolation gives you the ability to optimize the internal representation while keeping the external view meaningful to your users.
If your data exhibits the following characteristics, a join is not needed while processing the partition:
- The member key column of the lowest level member of a dimension is unique.
- There is a one-to one- relationship between the member key column and the member name column for the lowest level of the dimension.
- There is only 1 join between the fact table and the dimension table, which is the member key column of the lowest level.
- The dimension is shared.
If these conditions hold, nothing new is established by performing the join and thus, the join is not needed for this particular dimension. You can use this optimization in the Cube editor of the OLAP Manager. Choose Optimize Schema on the Tools menu.

For example, in the sample FoodMart database, consider the Warehouse cube. The dimensions Store, Product and Customer each exhibit the 3 conditions above. Start the OLAP Manager, edit this cube, and then select Optimize Schema. The following message appears:
```
The following MemberKeyColumn properties have been updated:
Level: Store Name: "store"."store_id" -> "inventory_fact_1997"."store_id"
Level: Product Name: "product"."product_id"->"inventory_fact_1997"."product_id"
Level: Warehouse Name: "warehouse"."warehouse_id"-> "inventory_fact_1997"."customer_id"
```
As you can see above, the original member key column is replaced with the corresponding join column in the fact table. The initial SQL Statement to process the first partition in this cube would have been:
```
SELECT […] WHERE 
("inventory_fact_1997"."store_id"="store"."store_id") AND ("inventory_fact_1997"."time_id"="time_by_day"."time_id") AND ("inventory_fact_1997"."product_id"="product"."product_id") AND ("inventory_fact_1997"."warehouse_id"="warehouse"."warehouse_id")
```
Using the optimization outlined above, the statement now becomes:
```
SELECT […] WHERE 
 ("inventory_fact_1997"."time_id"="time_by_day"."time_id")
```
This statement contains three fewer join clauses. This optimization is especially useful if you're running into a limitation in the inability of the underlying relational database to support more than n joins clauses in a SQL statement.

In one of our tests, before performing this optimization, our underlying database was still evaluating a complex join after about three hours. After this optimization, we got the first record back in less than 30 seconds!
Use indices:
Make sure you have indices on the columns that will participate in a join. Microsoft SQL Server, for instance, provides the Index Tuning Wizard that can be very useful in indexing your data. Also make sure the statistics about Tables and indices in your relational database are up to date.
Relational server on the same machine.
There are times when having the relational database server on the same machine as the OLAP Services server can help the throughput of the processing stage as well. One of the common reasons is a slow network connection. There is an obvious tradeoff with other resources here.
Use native OLE DB drivers:
Since your goal is to obtain the fastest interface to your data, use a native OLE DB driver to access your data source, if one is available. This avoids the cost of data transformation between layers (from the ODBC driver, for example).
Configure OLE DB driver:
If you do not plan to write to your data source, set it as read-only. Also, since the OLAP server performs its own internal buffering of source data, another buffering scheme inside your driver might be counterproductive.

Help OLAP Services process faster

Aggregation design.
Computation of aggregations is usually one of the slowest parts of processing, so design them judiciously. Disable the levels that you know should never participate in this design, use Usage Based Optimizations, and/or start out with small benefit (~30%) to see the performance (query vs. processing) tradeoffs. If performing ROLAP aggregations, pre-sizing the ROLAP database can lead to better throughput. Microsoft SQL Server™, for instance, allows a user to set the growth rate to be used when the database gets full. It can waste a lot of time reallocating the new space and initializing it. Avoid this if possible.

If possible, avoid ROLAP aggregations. Empirically, we've seen relational servers grind to a halt while creating records for some of our aggregation tables.
Partitions.
Use partitions to intelligently organize your data. Not only does this give you the advantage of optimally choosing the storage strategy for each partition, it can also result in very good performance during querying, and the option to process the data in parallel.
Unique members.
Use unique members if possible. On the server side, we do a lot of work to make sure a member is uniquely identified. One of the ways to do this is for each member to find and remember the fully qualified name. This adds processing which might be unnecessary if members are unique (like SKUs). If the member is unique, set the Unique Members flag to "Yes." If repetitive data is noticed, the processing of this dimension will fail with an error.

This flag is important for the last dimension levels for MOLAP partition and is important for every level in a ROLAP partition.
Virtual dimensions.
You can potentially save processing time by using the virtual dimensions, since these dimensions do not store data in a partition (besides the member names). However, there is an explicit tradeoff in using virtual dimensions, as queries to these will now probably take longer. Understand this tradeoff and intelligently design dimensions (virtual or normal) after you develop a feel for how users are using them. (This is a great tool for those "just in case I need them" dimensions.)
Hardware setup.
Use the appropriate RAID configuration (or software stripe set) especially on the device that stores the temporary directory used for processing the aggregations.

As discussed above, multiple processors can also be used during the processing stage.

Similar to any other database product, make sure you have enough memory on the server to avoid thrashing.

Due to the potentially large amount of data moved between the OLAP Service server and the relational database server, use a fast network (100Mbps Ethernet or others) and make sure your network is reliable.
Parallel processing.
Process cubes (of the same database) in parallel, using your own DSO-based program.

One disadvantage of this approach is that currently DSO does not enable sharing the same transaction between different server objects. This means you might not have atomicity during processing of multiple partitions in parallel.

Also be aware of that there may be performance implications for the underlying source database when returning large recordsets due to multiple, potentially complex requests.
Server settings.
If you have enough memory, increase the Process Buffer Size and Read Ahead Buffer Size settings. You'll find an easy way to change these on the Properties dialog box for an OLAP server in the OLAP. As discussed above, the server can cut back on the number of disk I/Os if more records are in memory.

Also note from the discussion above that in this version, creating aggregations is a single-threaded process. On a multiprocessor system, if you notice 100% CPU utilization for one processor while computing aggregations, there's nothing much more to tune, except selecting fewer aggregations, getting a faster processor, and/or processing partitions in parallel.
Data type choice
If possible, use LONG as the data type of choice for keys and integers for measures. These are very good for compression and for mathematical operations in general.
Processing options
Use the option that causes the least work. Incremental updates, for example, are very useful while refreshing your OLAP store.

Query

OLAP Services was designed to be a highly responsive query engine. Query processing occurs both on the client and the server. The following section discusses the likely breakout between what gets done and where it is performed, followed by a discussion on how to suggest where axis resolution occurs.

Client

OLAP Services' client code is responsible for exposing an OLE DB for OLAP interface to the consumers. This includes all interfaces and properties (both mandatory and provider-specific). This layer translates the OLE DB for OLAP calls into what we understand internally.

The client component consists of two threads:

Ping thread
Query thread

Ping

This background thread is responsible for synchronizing both the data and metadata between the client and the server. This thread wakes up every 10 seconds by default, or as specified in the Auto Synch Period connection string property. If the server reports that data has changed, the client flushes its cache, thereby causing future queries to obtain fresh data.

Query

Most queries in OLAP Services are specified in the form of MDX statements. This language specifies very rich, OLAP-oriented keywords that allow the succinct description of a multidimensional query. The client code is usually responsible for parsing this MDX statement, and translating it into requests for data cells from the server.

OLAP users usually exhibit a pattern of activity when performing analysis, much like "hovering." Starting from their top view, users usually hover around a subspace of the cube, drilling down, coming back up, and moving to adjacent cells in a search for answers to their queries. The client-caching mechanism is tuned to this kind of activity.

Code-named "Sonar" (due to the effect it simulates), when asked for a specific cell value, if the server is asked for more data on subsequent queries, OLAP Services doesn't have to return to the server as often. The algorithm that determines what cells are cached is proprietary to Microsoft Corporation.

The cache size to be used on the client can also be specified with the Client Cache Size connection string property on the Properties dialog box for an OLAP server in the OLAP Manager. Client Cache Size settings can be described in the following manner:

0: no limit on the cache memory used on the client.
1-99: percentile of physical memory to be used for caching.
>=100: cached memory, in kilobytes.

The current default is 25% of the total physical memory available on the client machine.

Cache cleaning occurs only when new memory is needed and not enough memory is available. Cleaning includes deleting everything in the cache for a cube, starting with the cubes not currently in use.

Note The cache cleaning process is different from the cleaning process that occurs on the server, as described in the following Server section of this paper.

Server

The server is a multi-threaded query engine. To answer a request for data, the server utilizes the following threads:

Listener thread
Pool of worker threads
Pool of processing threads
Logger thread
Cleaner thread

Listener

The listener thread waits for requests for new connections and is responsible for creating connections as needed. Currently, we ask TCP to limit the number of simultaneous requests for connections to the server to five (5). If the connection request times out, the client keeps trying until it either gets a valid error or the Connect Timeout connection string parameter expires.

Worker and processor

The listener thread dispatches the worker threads using I/O completion ports. If the server cache can answer the request, the worker thread returns this data to the client. If not, it instructs several multiple processing threads to query files using asynchronous lookups in parallel. Specifying a data slice on a partition is very beneficial here. Since we query partitions in parallel, we can skip over those partitions that do not contain the data we're interested in—based on the data slice.

The server defaults to four (4) worker threads per processor, up to a maximum of 30 per processor. The listener thread wakes up every so often (configured by the BackgroundInterval registry setting) and checks to see how many threads are idle. If only one thread is idle, more threads are created. If too many threads are idle, idle threads are killed every 30 minutes.

Logger

The Logger thread logs query requests to a database. This query log can be used at a later stage to find patterns in user behavior. You can also use this log and the Usage Based Aggregations Wizard in the OLAP Manager to design better aggregations. Every tenth query is logged by default. You can reset this query log rate by changing the Sample Frequency property. This property can be changed on the Properties dialog box for an OLAP server in the OLAP Manager.

Cleaner

Query results are kept in the cache. Cache cleaning happens on a background thread (which usually has a below-normal priority) and wakes up based on the BackgroundInterval setting (the default is 30 seconds). Cache cleaning occurs if the server's cache is more than half way between the Minimum allocated memory and Memory conservation threshold parameters. While the cache utilization is between these limits and if cleaning is required, we clean iteratively using a scheme similar to LRU—least recently used. This iterative cleaning continues until the memory utilized is once again established halfway between the Minimum allocated memory and Memory conservation threshold. Cleaning occurs for all cubes in all databases.

The server places a read-lock on the database while cleaning all of its cubes. This may affect concurrency. If the cache utilization is too high, everything is cleaned and no partial cleaning occurs. Most of this happens on a below-normal thread priority. Under high stress conditions, the cleaner thread can get starved and the priority is bumped to Normal when the Memory conservation threshold is crossed.

If this process still does not free enough memory, cache cleaning gets really aggressive. If the difference between the system's commit limit and the system's current commit bytes level is less than a threshold (currently set at 4MB), the priority of the cleaner thread is boosted to above normal and cache cleaning starts again.

Note The Minimum allocated memory and Memory conservation threshold settings include memory consumed by the dimension trees, internal control structures, and records cached.

Resources Utilized

CPU

All available processors used.

Disk

Besides the virtual memory, which might be paged to disk by NT, OLAP Services does not need any additional disk space during the query phase.

Memory

Memory for Metadata + Memory for Dimensions + Memory for Cache

The server loads up metadata for all databases as soon as the server is started. Both time and memory for this operation is usually negligible.

The first connection to a cube will trigger the server to load dimensions relevant to this cube and keep them in memory for the lifetime of the server.

The total memory consumed by the server is controlled by the Minimum allocated memory and Memory conservation threshold settings.

Performance

Aggregations.
Materializing aggregations usually leads to a faster query response since we probably need to do less work to answer a request for cell values. The amount of work done depends on which aggregations are actually calculated and stored during the processing stage. Too many aggregations lead to data explosion and, therefore, bad processing performance. As with processing, it is recommended that you start with a low number of aggregations and understand the tradeoffs as you increase aggregations.
Partitions.
Partitions give you the ability to choose different storage strategies to optimize the tradeoff between processing and querying performance. For example, a partition based on less queried data might have fewer aggregations than the one used more often.
Data slices on partitions.
Setting a data slice is an efficient way to avoid querying irrelevant partitions.

If you know one of your partitions stores data about a particular member, this information can be used during querying. For example, one of your partitions might only store information for the "Budget" scenario. This information is used when the query is answered and the "Budget" partition won't be examined if the query refers to "Actual" data instead.

Another handy trick is to create a dummy dimension. For example, if you have Historical and Budget data, create a "Scenario" dimension, and two cubes (one for each scenario). Set the member key column for the cubes to "Historical" and "Budget," respectively. And then set the data slice to these as well. Queries to a virtual cube based on these two cubes, will automatically be routed to the appropriate underlying cube.
Use SMP machines.
Due to the large degree of parallelism during querying, all CPUs can be used. Consider upgrading the number of processors if CPU utilization is a bottleneck.
Usage-Based Optimization.
When asked to initially design aggregations, OLAP Services assumes any query on the database will have the same probability of occurring as any other query. Usage-Based Analysis logs what portion of your database was actually touched—by whom, how often and for how long.

Given this feedback, OLAP Services can do a much better job at choosing what to aggregate and what not to aggregate. This may be the best way to design aggregations for a "complex" database. OLAP Services will usually select the aggregation that is smaller if there is a choice between two aggregations to store, no query optimization has been used (no query usage is logged), and all other things are equal. But if user queries can really be answered efficiently by sending queries to the larger aggregation, OLAP Services can detect this optimization by examining usage patterns in the query log. OLAP Services may decide to create a larger aggregation due to specific frequent query usage.
Client/MDX optimizations.
- Use fully qualified member names.
  When using user-defined functions (UDFs), use fully qualified function names, especially if you have many DLLs or DLLs that contain many functions in the libraries.
- Use the Large Level Threshold connection property to limit the amount of metadata brought down to the client. For more information about this property, refer to OLAP Services: Working with Large Levels.
  Setting the Auto Synch Period property to a very low value can negatively affect performance. A lot of time may be spent querying the server to find out if cell values have changed—setting Auto Synch Period to a very high number might not synchronize the data frequently enough. Understand the user behavior and adjust this parameter as appropriate.
Note No updates are performed if you set Auto Synch Period to a value of 0.
Virtual dimensions.
Using virtual dimensions can degrade query performance. Refer to the discussion under Performance in the Processing section at the beginning of this article for more information.
Use a "cluster" of servers.
Since an OLAP Services cube can itself be a source to another OLAP Services partition, you can create a cluster of servers to distribute query resolution.
Enhanced security in NT.
During initial connections and queries, the OLAP Services server uses security functions in NT to restrict access. Make sure the both the users who have access to this cube and the server itself are in a domain suitable for fast response for information about user credentials.

With the security enhancements put into NT4 Service pack 4, this is a crucially important performance tip.
Server Settings.
Increase the Memory conservation threshold and the Minimum allocated memory properties on the Properties dialog box for an OLAP server in the OLAP Manager to allow caching more information on the server.
Hardware.
- Use the appropriate RAID level or software stripe sets.
- Use a fast network (100 Mbps Ethernet or others) and make sure your network is properly configured.
- Use plenty of memory to allow increased caching.

Agent

OLAP Services supports the ability to move axis resolution in queries to the server. This results in huge benefits by preventing unnecessary data movement across the network when you're working with functions such as FILTER, which may operate on a large level. An "agent" is the client component inside the same process as the server and connects and disconnects on the fly as needed.

The default cache size of the agent is 10% of the physical memory available on the server.

Agents are stateless and are created and destroyed as needed. Agents open and close connection to the server every time they're used. Since connecting is a fairly expensive operation, the benefit of using agents should be justifiable.

To indicate whether the query resolution occurs on the client or on the server, the following Execution Location connection property should be used. Currently, the values of this property are:

Default (see Smart).
Smart (this is internally analogous to the default (Execution Location = 1) option.)
Client
Server
Some functions are not safe to be remoted. If the query contains such functions, it will not be remoted—even if the user explicitly requests remoting.

Another connection property, Default Isolation Mode, helps decide whether the Agent should answer queries in addition to resolving the axes. This property might be important for clients working over slow network connections.

For more information, refer to OLAP Services: Working with Large Levels.

Local Cub Files

Local cube files can be created on the client machine through the PivotTable Service. Currently, PivotTable Service is supported on Windows 95, Windows 98, and Windows NT. Compared with the server, local cube files have the following restrictions:

No aggregations can be created.
PivotTable Services has its own syntax for creating and updating cube files.
A file created with PivotTable Services can contain multiple databases, each of which can only contain one cube and one partition.
Write-back is not supported.
HOLAP is not supported.
Virtual cubes are not supported.
Dimension and cube names must be less than 24 characters.

Performance

Reuse the cached rowset by using the DIRECTLYFROMCACHEDROWSET keyword if retrieving a rowset to populate the local cube is an expensive operation.
Use the PASSTHROUGH keyword for complex queries or when using optimizations or keywords specific to your relational database.
Keep the size of local cube files as small as possible, since these files do not support partitions or aggregations.
Omit joins when populating the local cube.
Note There is no user interface to optimize the schema using PivotTable Service, as there is when working with the OLAP Services server.

Write-back

To allow write-back to cell values, a new relational table is created (one for each write-enabled cube) in the underlying relational engine. To enable user-defined policies of allocating data from consolidated levels down to the lowest level, only the lowest level data is stored in this table.

Using the transaction support provided in OLE DB, all updates are cached on the client until the transaction is committed. Flushing the cache is transparent to the client applications and no special behavior is required.

At a later time, this table can be converted into a partition with the appropriate storage strategy.

Performance

Aggregations.
The write-back tables do not have any aggregations until they have been converted into a partition and therefore might adversely affect query performance. Frequent conversion is recommended if faster query response is desired.
Client synchronization.
Updating the cell values flushes the cache system on both the server and other client sessions. Therefore, we strongly recommend that you accumulate multiple changes in a single transaction rather than commit each cell change separately.
Concurrent updates.
Updates to a database are effectively serialized since each write must be transacted.
Relational database performance.
Obviously an important aspect of performance is the speed with which the underlying database supports record insertion.

Conclusion

Microsoft SQL Server OLAP Services is a high performance tool for performing analytical queries on multidimensional data. For most users, the information presented in this article will not be necessary. For other users, information about the internals of this tool will help utilize their hardware to its full potential.

Finding More information

See Microsoft's OLAP Services Web site at www.microsoft.com/sql/70/gen/olap.htm

For more information on information referred to in this paper, refer to the following whitepaper also available on MSDN Online:

OLAP Services: Working with Large Levels by Cristian Petculescu.

OLAP Services: Performance Implications of the Architecture

Contents

Introduction

Processing

Dimensions

MemberKey

Unique Members

Member properties

Cubes

Measure types

Record size

Partitions

Data slices

Storage modes

Aggregations

Internal Implementation

Reader

Processor/Aggregator

Resources Utilized

CPU

Disk

Memory

Files created

Performance

Optimize the feed into OLAP Services

Help OLAP Services process faster

Query

Client

Ping

Query

Server

Listener

Worker and processor

Logger

Cleaner

Resources Utilized

CPU

Disk

Memory

Performance

Agent

Local Cub Files

Performance

Write-back

Performance

Conclusion

Finding More information