This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.


August 1999

Microsoft Systems Journal Homepage

Manipulate and Query OLAP Data Using ADOMD and Multidimensional Expressions

Carl Nolan

Online Analytical Processing Services provide an architecture for programmatic access to information from a data ware-house. OLAP lets data from the warehouse be extracted, summarized, organized, and stored in multidimensional structures to enable rapid response to user queries.

This article assumes you're familiar with ASP and ADO

Code for this article: olap.exe (10KB)

Carl Nolan works for Microsoft Consulting Services in northern California (http://www.microsoft.com/norcal). He is an MCSD and MCP + Site Builder who specializes in SQL Server, including OLAP and Windows DNA-based applications. He can be reached at carlnol@microsoft.com.

When a company serves up browser-based data from data marts or data warehouses, it's usually in the form of static reports. Enhancing these reports has usually required lengthy development time or the purchase of specialized software. With the advent of Microsoft® SQL Server OLAP (online analytical processing) Services, this is no longer necessary. By using a few enabling technologies, it has become relatively easy to present multidimensional data to a Web client from an ASP application.

    OLAP Services provide an architecture for programmatic access to information from a data warehouse. OLAP lets data from the warehouse be extracted, summarized, organized, and stored in multidimensional structures (known as OLAP server cubes) to enable rapid response to user queries. The OLAP and PivotTable® Services work together to provide client access to OLAP data. The OLAP Server manages the data, while PivotTable Services work with the server to provide client access to the data using a connectivity tool called ActiveX® Data Objects Multidimensional (ADOMD).

    Just as you need to learn the basics of SQL before developing with ADO, you need to learn the language for querying and manipulating OLAP data before using ADOMD. This language is called Multidimensional Expressions (MDX). MDX provides a rich and powerful syntax for querying and manipulating the multidimensional data stored in OLAP server cubes.

    In this first of two articles I'll present the basics of both MDX and ADOMD. Both of these topics are very detailed, so I'll offer a general introduction to each. I'll assume that you're familiar with at least some multidimensional, data warehousing, and OLAP terms. If you're not, spend some time reading about these topics. One good place is on MSDN; the SQL in OLAP Services site provides some good coverage of OLAP terms and concepts at http://msdn.microsoft.com/library/psdk/olap/prsql.htm. Another helpful resource dedicated to OLAP Services is the public newsgroup microsoft.public.sqlserver.olap hosted on the news server msnews.microsoft.com.

Cube Concepts

    Cubes are key elements in OLAP, designed to enable fast access to data within a data warehouse. They are subsets of data from the data warehouse, organized and summarized into multidimensional structures. These data summaries provide the mechanism for rapid and uniform response times to complex queries.

    Cube definition is the first of three steps in cube creation. After definition, you need to specify how the data is summarized by designing aggregations, then load the cube by processing it. Cubes are defined in terms of dimensions and measures. Dimensions provide the categorical descriptions by which the measures are separated for analysis. Measures identify the numerical values that are summarized for analysis such as price, cost, or quantity sold. The collection of measures forms a special type of dimension called Measures.

    Each cube dimension can contain a hierarchy of levels that specifies the categorical breakdown available to users. For example, a Store dimension might include the level hierarchy: Country, State, City, and Name. Each level in a dimension is of finer granularity than its parent. Dimensional hierarchies are a powerful data modeling tool because they can be expanded to reveal more detail, allowing high-level questions to be asked about data. This common OLAP operation is known as drill-down.

    A dimension can be created for use in an individual data cube or across multiple cubes. When a dimension is created for an individual cube, it's called a private dimension; dimensions that can be used by multiple cubes are called shared dimensions. Shared dimensions let designers standardize business metrics across cubes. For example, standardized shared dimensions for time and geographical location ensure that data analyzed from different cubes will be similarly organized.

    A member is simply an item in a dimension or measure. For example, if you've defined a Gender dimension, its members might be M, F, and All Gender.

    A calculated member is a dimensional member whose value is calculated at runtime using a specified expression. Calculated members can also be defined as measures. Only the definitions for calculated members are stored; values are calculated in memory when they're needed to answer a query. Calculated members let you add members and measures to a cube without increasing its size. Although calculated members must be based on a cube's existing data, you can create complex expressions by combining this data with arithmetic operators, numbers, and a variety of functions.

    Although the term "cube" suggests three dimensions, an OLAP cube can have up to 64 dimensions, including the Measures dimension. In my sample code, I will only retrieve two dimensions of data: rows and columns. What I'm doing in this case is analogous to a data matrix.

    To make sure everyone can access the data and samples in this article, I'll use the Sales cube in the FoodMart database, which is installed as a sample database with Microsoft OLAP Services. If you are not familiar with this application, take the time to review this database first. This cube has been designed for the analysis of a chain of grocery stores and their promotions, customers, and products. Figures 1 and 2 outline the dimensions and measures associated with the FoodMart sales cube.

Multidimensional Expressions

    MDX is one of the key technologies you have to understand when working with OLAP. MDX can be used to describe multidimensional queries, define cube structures, and change data (in some cases). I'll focus on the query capabilities.

    To test freeform MDX statements, Microsoft OLAP Services comes with a program called the MDX Sample Application. This application is currently the only interface that supports freeform MDX, so it's the place you can go to test sample statements. Let's start by outlining the basic form of an MDX statement:

SELECT {member selection} ON COLUMNS
 FROM [cube name]
Knowing this syntax, the simplest way to learn MDX is by viewing some examples. To select a list of information based on store types, the MDX statement would read:
SELECT {[Store Type].MEMBERS} ON COLUMNS
 FROM [Sales]
      When you run this example, you'll see a column named All. The All member is generated by default, and it becomes the default member. In this example, I've selected all the members for the Store Type dimension. The square brackets are optional, unless you use identifiers with embedded spaces. The curly braces are used to denote sets. For my purposes, sets will be the axis definition. The use of curly brackets is actually only needed when enumerating sets; for consistency curly brackets will always be used here in the examples. In many cases, a combination of members from different dimensions will be enclosed in brackets. This is known as a tuple. Tuples allow multiple dimensions to be displayed on a single axis. If you use a single member tuple, the brackets can be omitted.

    This example shows what you can do with OLAP, but it's not very meaningful for data analysis. Often it is necessary (as will be the case in most of the examples) to compare information from one dimension to that of another. The basic form of a two-dimensional query is:

SELECT {member selection} ON COLUMNS,
 {member selection} ON ROWS
 FROM [cube name]
      Expanding on the previous example, to view information based on store types compared with the promotion media used for the sale, the MDX statement would read:
SELECT {[Store Type].MEMBERS} ON COLUMNS,
 {[Promotion Media].MEMBERS} ON ROWS
 FROM [Sales]
      In both examples, the column set has been defined as [Store].MEMBERS. This statement is asking for items at every level of the Store Type dimension. What do you do if members at one level are required? You ask for CHILDREN instead of MEMBERS. In the previous example, this would look like:
SELECT {[Store Type].[All].CHILDREN} ON COLUMNS,
 {[Store].[All Stores].CHILDREN} ON ROWS
 FROM [Sales]
MEMBERS will return the members for the specified dimension or dimension level; CHILDREN will return the child members for a particular member within the dimension. In this example the explicit use of the All member may not be required because being the default member for the dimension, it's value—if absent—will be assumed. Nonetheless, its inclusion is good practice. Expanding on the CHILDREN concept, to view the sales information based on store types compared with the store location for those stores in the USA, the MDX statement would read:
SELECT {[Store Type].[All].CHILDREN} ON COLUMNS,
 {[Store].[Store Country].[USA].CHILDREN} ON ROWS
 FROM [Sales]
When running this statement, it is interesting to note that the rowset could be expressed by either of the following two statements:
[Store Country].[USA].CHILDREN
 [Store].[USA].CHILDREN
Here, the fully qualified names have been used. Fully qualified member names include dimension and the parent member of the given member at all levels. When member names are uniquely identifiable, fully qualified members names are not required, but they're a good practice to adopt.

    In this example, it may be beneficial to include the total sales for the USA member on the rows. This is easily achieved if you remember that a set, or axis definition, can be constructed as a list of members. The revised MDX statement would read:

SELECT {[Store Type].[All].CHILDREN} ON COLUMNS,
 {[Store].[Store Country].[USA], 
     [Store].[Store Country].[USA].CHILDREN} ON ROWS
 FROM [Sales]
      The CHILDREN function is very useful, but does not help if you need to drill down to a lower level within the hierarchy. This is where the function DESCENDANTS comes into play. This function allows you to go an arbitrary number of levels deep in the cube. The form of this statement is:
DESCENDANTS(<member>, <level> [, <desc_flags>])
By default, only members at the specified level will be included. By changing a flag value, you can include or exclude descendants or children before and after the specified level. Using the DESCENDANTS function, it becomes easy to display the USA store information along with all the corresponding USA city information:
SELECT {[Store Type].[All].CHILDREN} ON COLUMNS,
 {[Store].[Store Country].[USA],DESCENDANTS([Store].[Store Country].[USA], 
     [Store City])} ON ROWS
 FROM [Sales]
      There's an obvious question that comes to mind about the given examples: what measure is being returned? The simple answer is the default. This is not very helpful, so a concept called slicing comes into play. To analyze the average sales of stores based on their types and city the MDX statement would read:
SELECT {[Store Type].MEMBERS} ON COLUMNS,
 {[Store].[Store City].MEMBERS} ON ROWS
 FROM [Sales]
 WHERE (MEASURES.[Sales Average])
The WHERE clause in this statement defines the slice of the cube that will be viewed. As the cube's measures are just another dimension, selecting the desired measure is achieved by selecting the appropriate slice of the cube. This can be, and often is, extended to other dimensions. This example can be refined to only view the sales average for the sales year 1997 by using the following MDX statement:
 SELECT {[Store Type].MEMBERS} ON COLUMNS,
 {[Store].[Store City].MEMBERS} ON ROWS
 FROM [Sales]
 WHERE (MEASURES.[Sales Average], [Time].[1997])
      Slicing is not the same as filtering. Slicing does not affect the selection of the axis members, but rather the values that go into them. This is different than filtering, which will be discussed later. Filtering actually reduces the number of axis members.

Advanced MDX Statements

    Although the basics of MDX are enough to provide a simple ASP application, many features of the MDX implementation are useful to know, as they allow more useful analysis of the cube data.

    An important concept you need to be comfortable with is calculated members. Within an MDX statement, the syntax for a calculated member is to put the following construction in front of a SELECT statement:

WITH MEMBER parent.name AS 'expression'
In this statement, parent refers to the parent of the new calculated member name. Since dimensions are organized as a hierarchy, when you add new members to this tree you must specify its position inside the hierarchy. Calculated members are extremely useful for the definition of new measures. Take, for example, the requirement to display the percentage profit for sales. The following calculated member could be used for this:
WITH MEMBER MEASURES.ProfitPercent AS 
'([Measures].[Store Sales]-[Measures].[Store Cost])/
([Measures].[Store Cost])', FORMAT_STRING = '#.00%'
The statement introduces the FORMAT_STRING property. FORMAT_STRING informs the MDX statement of the display format to use for the new calculated member. The format expression works just like the Format function in Visual Basic®. The use of the % symbol states that the calculation returns a percentage and should be treated as such, including multiplying it by a factor of 100. Putting this all together, to query the percentage profit for states broken down by 1997 quarters and six month intervals, the MDX statement would read:
WITH MEMBER MEASURES.ProfitPercent AS
     '([Measures].[Store Sales]-[Measures].[Store Cost])/([Measures].[Store Cost])',
 FORMAT_STRING = '#.00%', SOLVE_ORDER = 1
 MEMBER [Time].[First Half 97] AS  '[Time].[1997].[Q1] + [Time].[1997].[Q2]'
 MEMBER [Time].[Second Half 97] AS '[Time].[1997].[Q3] + [Time].[1997].[Q4]'
 SELECT {[Time].[First Half 97],
     [Time].[Second Half 97],
     [Time].[1997].CHILDREN} ON COLUMNS,
 {[Store].[Store Country].[USA].CHILDREN} ON ROWS
 FROM [Sales]
 WHERE (MEASURES.ProfitPercent)
      This example outlines how dimensions as well as measures can be calculated. This statement also introduces the SOLVE_ORDER property. SOLVE_ORDER helps to decide the order in which the calculated members are to be evaluated. The member with the highest solve order value will be the first to be calculated. The default value for the solve order is zero.

    In constructing MDX statements, it is often necessary to relate a current member value to others in the cube hierarchy. MDX has many methods that can be applied to a member to traverse this hierarchy. The most commonly used ones are PREVMEMBER, CURRENTMEMBER, and PARENT.

    Using these methods, it is easy to derive an MDX statement that calculates the percentage of sales within a city as a percentage of the state's sales:

WITH MEMBER MEASURES.PercentageSales  
AS '([Store].CURRENTMEMBER, 
MEASURES.[Unit Sales]) / ([Store].CURRENTMEMBER.PARENT, 
MEASURES.[Unit Sales])', 
FORMAT_STRING = '#.00%'
SELECT {MEASURES.[Unit Sales],
     MEASURES.PercentageSales} ON COLUMNS,
NON EMPTY {[Store].[Store City].MEMBERS} ON ROWS
FROM [Sales]
In this example, all empty rows have been eliminated using the NON EMPTY statement. This statement is a form of filtering, which will be discussed later. In addition, if you wanted to display sales profit and the incremental change to the previous time dimension, the MDX statement would read:
WITH MEMBER MEASURES.ProfitGrowth AS
'(MEASURES.[Profit]) - (MEASURES.[Profit], Time.PrevMember)', 
FORMAT_STRING = '###,###.00'
SELECT {MEASURES.[Profit], 
     MEASURES.ProfitGrowth} ON COLUMNS,
     {GENERATE([Time].[1997].CHILDREN,
     {[Time].CURRENTMEMBER, 
     [Time].CURRENTMEMBER.CHILDREN})} ON ROWS
 FROM [Sales]
      This example introduces an interesting function: GENERATE. GENERATE iterates through all the members of the first set, using the second set as a template for the resultset. Figure 3 shows the output from this query. As you can see, using GENERATE is easier than specifying a list of members consisting of all the quarters and their associated CHILDREN.

    In an earlier example, a statement was constructed to display all the USA store information along with the corresponding city details. To extend this to all countries, you could use the GENERATE function:

SELECT {[Store Type].[All].CHILDREN} ON COLUMNS,
 {GENERATE([Store].[Store Country].MEMBERS, 
 {[Store].CURRENTMEMBER, 
     DESCENDANTS([Store].CURRENTMEMBER, 
     [Store City])})} ON ROWS
 FROM [Sales]
      When working with MDX, you will come across many functions that operate on sets of data. Although these functions are too numerous to list, one does deserve a quick mention: the CROSSJOIN function. This function produces all combinations of two sets. This is useful for queries such as viewing all store metrics broken down by state and the quarter in which the sale was made. Figure 4 shows a section of the output derived from the MDX statement:
SELECT {MEASURES.MEMBERS} ON COLUMNS,
 NON EMPTY {CROSSJOIN([Store].[Store State].MEMBERS,
     [Time].[Quarter].MEMBERS)} ON ROWS
 FROM [Sales]
      When performing data analysis, certain metrics, like top counts and year-to-date calculations, are nearly always calculated. As you might expect, MDX has special functions for these calculations, among many others.

    A common process in data analysis is retrieving a top count filter, such as the top dozen best-selling stores. The general form for such calculations is:

TopCount(<set>, <count>, <expression>)
      To display the top dozen best-selling customers based on unit sales, the MDX statement would read:
SELECT {MEASURES.MEMBERS} ON COLUMNS,
 {TOPCOUNT([Customers].[Name].MEMBERS, 12,
     [Measures].[Unit Sales])} ON ROWS
 FROM [Sales]
      The previous example is very simple, but it doesn't have to be. A simple MDX expression can be calculated that displays the top half-dozen customers, based on unit sales, and how much all the other customers have sold. The following example also shows the use of the SUM function and calculated sets in addition to calculated members:
WITH SET Top6Customers AS 'TOPCOUNT([Customers].[Name].MEMBERS, 6,
     [Unit Sales])' MEMBER [Customers].
     Others AS '([Customers].[All Customers], 
     [Unit Sales]) - SUM(Top6Customers, [Unit Sales])'
 SELECT {MEASURES.MEMBERS} ON COLUMNS,
     {Top6Customers, [Customers].Others} ON ROWS
 FROM [Sales]
      Other functions for top filter processing are available to OLAP users. They are TOPPERCENT, which returns the top elements whose cumulative total is at least a specified percentage, and TOPSUM, which returns the top elements whose cumulative total is at least a specified value. There's also a series of BOTTOM functions, returning the bottom items in the list.

    Another common task in data analysis is the calculation and display of year-to-date numbers. Such a request would be something like "show me the year-to-date profit for each product by department." For this request, the MDX statement would read:

WITH MEMBER Measures.YTDProfit AS 'SUM(YTD(),
 Measures.[Profit])', FORMAT_STRING = '#.00' 
 SELECT {Generate([Time].[1997].CHILDREN,
     [Time].CurrentMember,
        [Time].CurrentMember.Children})} ON COLUMNS,
 {[Product].[Product Department].MEMBERS}ON ROWS
 FROM [Sales]
 WHERE (Measures.YTDProfit)
      As mentioned earlier, filtering is very different than slicing. The NON EMPTY statement performs a simple form of filtering by reducing the number of axis members, based on empty cells. The general form for a filter operation is:
FILTER(<set>, <search_condition>)
      In an earlier example, the sales average measure was queried for stores, based on their type and city of location for 1997. This query returns many empty rows that can be eliminated by using the NON EMPTY statement:
SELECT {[Store Type].MEMBERS} ON COLUMNS,
 NON EMPTY {[Store].[Store City].MEMBERS} ON ROWS
 FROM [Sales]
 WHERE (MEASURES.[Sales Average], [Time].[1997])
      Suppose you wanted to filter the store city axis instead, and only output those rows where the city is in the USA. To achieve this, the FILTER function can be used:
SELECT {[Store Type].MEMBERS} ON COLUMNS,
 {FILTER([Store].[Store City].MEMBERS AS StoreCities,
     StoreCities.CURRENT.PARENT.PARENT=USA)} ON ROWS
 FROM [Sales]
 WHERE (MEASURES.[Sales Average], [Time].[1997])
ADO Multidimensional

    Now that you've taken a whirlwind tour of MDX, let's shift gears and look at another enabling technology: ADOMD. ADOMD allows you to query both the cube schema and data from a simple programming interface. Figure 5 outlines the main components of the ADOMD object model. The cube schema is queried through the Catalog and the cube data through the Cellset object.

    When accessing a cube, the first object that's constructed is a standard ADO connection. The provider in this case will be MSOLAP. Once a connection has been made to the OLAP data source, a catalog can be constructed. This catalog will contain the multidimensional schema information (that is, cubes and underlying dimensions, hierarchies, levels, and members):

Dim conFoodMart, catFoodMart
 ' Create a connection to the FoodMart service
 Set conFoodMart = _   
     Server.CreateObject("ADODB.Connection")
 conFoodMart.Open _
     "Data Source=MyServer;Provider=MSOLAP;"
 conFoodMart.DefaultDatabase="FoodMart"
 ' Access the FoodMart cube catalog
 Set catFoodMart = Server.CreateObject("ADOMD.Catalog")
 Set catFoodMart.ActiveConnection = conFoodMart
      Next, access to the underlying cube schema is obtained via the CubeDefs collection. This collection has an item for each cube within the OLAP data source. Each item is represented by a CubeDef object, which can be accessed using the cube name or an ordinal number:
Dim cubFoodMart
 ' Obtain a reference to the Sales Cube
 Set cubFoodMart = catFoodMart.CubeDefs("Sales")
      In navigating the schema of a cube, the relevant items are a set of related dimensions, hierarchies, levels, and members. Each of these are represented by their corresponding ADOMD objects: Dimension, Hierarchy, Level, and Member.

    From the cube schema, you can then navigate down through the hierarchy of dimensions, hierarchies, levels, and members. The concept of dimensions should already be familiar. To simply list the dimensions the code would read:

Dim dimCube
 For Each dimCube In cubFoodMart.Dimensions
     Response.Write dimCube.Name
 Next
      The next object type is the hierarchy, which is the path of aggregation of a dimension. A dimension can have more than one hierarchy, but for my purposes, I will only consider a cube consisting of a single hierarchy. The levels collection provides the means for obtaining the items (levels) within the dimension. Finally, the members collection represents the data items within the dimension.

    Most dimensions have a special member, usually prefixed with the word All, whose children are the members of the first level in the hierarchy. The actual value of this member can be obtained via the ALL_MEMBER property of the hierarchy. The code in Figure 6 puts this into practice. Figure 7 contains the output.

    When reviewing the Measures dimension in Figure 6, the dimension MEMBERS would be required to allow an MDX statement to be constructed dynamically. This is achieved by iterating through the members of the first and only level. For the other dimensions, the easiest way to retrieve all the members of the first level is to take the CHILDREN of the All level. If an All level is not available, then you only need the members of the first level. The significance of this output will become apparent when outlining the ASP application.

    This is a very basic introduction on how to obtain schema information about a cube. The relationship to MDX statements has also been emphasized, as the cube structure will often be examined to enable the construction of MDX queries.

Querying the Data

    In querying the actual cube data, the key ADOMD objects are Cellset, Cell, and Axis. The relationship is simple. A Cellset is constructed using an MDX statement. The data is then reviewed using the Axis and Cell objects.

    The Axis is a collection of members from one or more dimensions. The Axis object will have a positions collection, which represents a point along an axis. For an axis consisting of a single dimension, these positions are a subset of the dimension members. Multiple dimensions can exist on an axis, such as when using a CROSSJOIN function.

    So far, the objects discussed only return a list of members. To actually get at the data you need a reference to a Cell object. Each cell has multiple pieces of information associated with it, including the data itself, a formatted string value (the displayable form of cell data), and the cell ordinal value (its unique value within the cellset). Once you have an MDX statement, a Cellset can be constructed with the following code:

Set cellFoodMart = Server.CreateObject("ADOMD.Cellset")
 cellFoodMart.Open strMDX, _
     "Data Source=nolancarl;Provider=MSOLAP;Initial  
     Catalog=FoodMart;"
In this snippet, strMDX represents the required MDX expression. The connection string should be replaced with an ADO Connection object. The same result could also be achieved using the Cellset Source and ActiveConnection properties.
Set cellFoodMart = Server.CreateObject("ADOMD.Cellset")
 cellFoodMart.Source = strMDX
 cellFoodMart.ActiveConnection = conFoodMart
 cellFoodMart.Open
      Once the Cellset has been opened, it is easy to obtain the values of the axis members. In this case, the column axis member names will be displayed:
Set colFoodMart = cellFoodMart.Axes(0)
 For Each posFoodMart In colFoodMart.Positions
     Response.Write posFoodMart.Members(0).Caption
 Next
The member caption property is used because it's a display name; the member's actual name is represented by the Name property. Another important property is called ChildCount. This represents the number of members at the next level down in the hierarchy. This property will be important in deciding if you can drill down into the cube data.

    As I mentioned earlier, to access the data within a cellset you need to access the Cell object. This can be achieved for an MDX statement that only has a COLUMNS axis by referencing the Cell object from the Cellset using the ordinal numbers. The value to be displayed is obtained from the FormattedValue property (rather than the Value property):

For iCount = 1 To cellFoodMart.Axes(0).Positions.Count
     strValue = cellFoodMart(iCount-1).FormattedValue
 Next
      If you have an MDX statement returning both COLUMNS and ROWS, the ordinal position of the cell can be calculated by iterating through the positions on both axes, and using the position ordinal number as the cell reference:
Set colFoodMart = cellFoodMart.Axes(0)
 Set rowFoodMart = cellFoodMart.Axes(1)
 For Each posFMRow In rowFoodMart.Positions
     For Each posFMCol In colFoodMart.Positions
         strValue = cellFoodMart(posFMCol.Ordinal,   
             posFMRow.Ordinal).FormattedValue
     Next
 Next
      The online documentation provided with OLAP services discusses the object model in detail, and should be studied if you're going to develop with ADOMD. This discussion of ADOMD only scratches the surface of a very flexible and comprehensive object model. My purpose here has been to give a brief overview of ADOMD to enable a simple application to be constructed. Such an application will be the focus of my next article.


For related information see:
Building OLAP Services Applications at http://msdn.microsoft.com/library/psdk/olap/progguideintro.htm.
Also check http://msdn.microsoft.com for daily updates on developer programs, resources and events.

From the August 1999 issue of Microsoft Systems Journal.