Data Shaping: Handling Non-Relational Data

Mike Gunderloy

Not all data can be force-fit into relational formats. Microsoft has introduced data shaping as an extension to SQL to handle hierarchical recordsets. Mike Gunderloy explains the basics of data shaping and digs into the syntax of the SHAPE statement.

Microsoft supports industry standards when it suits the company’s aims, but Microsoft isn’t afraid to go off in its own direction as well. ADO’s SHAPE command is a good example. Although most of what you can do with SQL statements through ADO to standard providers bears at least a close resemblance to ANSI SQL, SHAPE was invented out of whole cloth to support the idea of data shaping. In this article, I’ll review the basics of data shaping and then show how this extension to SQL can help you keep track of hierarchies of information.

What is data shaping?

Data shaping is the process of defining a shaped recordset. Okay, so what’s a shaped recordset? Well, a shaped recordset is one that can contain more than just data. In particular, the columns of a shaped recordset can contain the following:

• Data (just like the columns in any other recordset)

• Pointers to another recordset

• Values derived from calculations on the current row of the recordset

• Values derived from calculations over all values in a particular column of the recordset

• Empty, fabricated columns that aren’t part of the original data source. Empty columns are useful in cases where you need to track some temporary information related to rows while you work with a recordset within an application. I won’t be using empty columns in this article.

As you can see, shaped recordsets are a bit more flexible than regular recordsets. But what is this flexibility good for?

Hierarchical recordsets

One of the most useful features of a shaped recordset is that it can contain pointers to other recordsets. By allowing a recordset to contain a pointer to another recordset, a shaped recordset is an ideal way to represent a hierarchy of information. In traditional SQL, you handle a hierarchy by joining tables (see "Managing Hierarchical Data" by Don Franke in this month’s issue). Joining tables results in redundant information in the resulting recordset.

For example, suppose you’re interested in information regarding customers and orders. With traditional SQL, you’d create a recordset using a SQL statement like this one:

SELECT Customers. CompanyName, Orders.OrderDate
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID

When you execute this statement, you’d get a recordset that repeated customer names, once for each order the customer has placed.

By contrast, with data shaping, you could create a recordset using a SQL statement like this one:

SHAPE 
  {SELECT CustomerID, CompanyName FROM Customers}
APPEND 
  ({SELECT * FROM Orders}
RELATE CustomerID to CustomerID)

If you use a shaped recordset like this to hold the information, you’d get a recordset where each customer name occurs only once. This means less data to pass around and, ultimately, better performance.

When you retrieve the value of a field that points to another recordset, you actually get that entire recordset, filtered to include only the related records. This filtered subset of the child recordset is called a chapter. In the preceding example, there are three fields in the recordset created by the SHAPE command: CustomerID, CompanyName, and Chapter1 (the default name of a chapter if no name is supplied using an AS clause). Chapter1 is the recordset of Order information, filtered to include only orders for the first customer at the time that you open the recordset.

Hierarchical recordsets can be nested. That is, a child recordset can contain a pointer to yet another child recordset, a grandchild of the original recordset.

For a quick example of using a SHAPE command to retrieve a hierarchical recordset, take a look at frmGrid in the Shape1.mdb sample database included in the accompanying Download file. This form, shown in Figure 1, simply opens a hierarchical recordset and binds it to a Hierarchical FlexGrid control. The code behind the form is simple:

Option Compare Database
Option Explicit

Private mcnn As New ADODB.Connection
Private mrst As New ADODB.Recordset

Private Sub Form_Load()

    mcnn.Open "Provider=MSDataShape;" & _
     "Data Provider=SQLOLEDB.1;" & _
     "Server=(local);User ID=sa;" & _
     "Initial Catalog=Northwind"
    
    mrst.Open "SHAPE {" & _
     "SELECT CustomerID, CompanyName " & _
     "FROM Customers} APPEND ({SELECT * " & _
     "FROM Orders} RELATE CustomerID " & _
     "TO CustomerID)", mcnn
     
    Set hfgMain.DataSource = mrst
End Sub

That’s all it takes to open a hierarchical recordset and display it in a control (at least in a control that’s designed for hierarchical data). Now that you’ve seen the simple part of data shaping, it’s time to dig in a bit further.

The Data Shaping service

Data shaping is an ADO Service Provider (see the sidebar, "Foundation Concepts: Consumers, Providers, and Service Providers"). The Data Shaping service creates a hierarchical recordset from any data supplied to it by a data provider. When using data shaping, you must specify both the Data Shaping service and another OLE DB provider that will act as the data provider. For example, a connection string to provide shaped data from a SQL Server database might look like this:

Provider=MSDataShape;
Data Provider=SQLOLEDB.1;
Server=(local);
User ID=sa;
Initial Catalog=Northwind

You can create a connection string for data shaping out of any existing connection string just by making two changes:

1. Change the Provider keyword in the existing connection string to Data Provider.

2. Add Provider=MSDataShape to the connection string.

In effect, the data is provided to ADO by the MSDataShape Service provider. MSDataShape is supplied its data by the SQLOLEDB.1 data provider (see Figure 2).

For data shaping to work properly, you must be using client-side cursors. Setting the provider to be the Data Shaping service will automatically set the cursor for any recordset using that connection to client-side.

Types of hierarchical recordsets

The SHAPE statement is able to produce three different types of hierarchical recordsets:

• A relation hierarchy–A set of parent records and associated child records.

• A parameterized hierarchy–Also parent records and associated child records, but fetches child records on demand.

• A grouping hierarchy–A parent recordset composed of aggregate functions and the related detail records as the children.

Relation hierarchies

A relation hierarchy represents a set of parent records and associated child records. This recordset is similar to the recordset you can create with a SQL JOIN statement, but it doesn’t have the redundancy of a recordset based on a JOIN.

With a relation hierarchy, all of the records involved are read into the local cache before the SHAPE statement is processed. This can result in substantial overhead if your recordset includes a large number of records. However, once the original recordset has been constructed, subsequent fetches are quick because all of the data is already cached locally. You can continue to work with records in a relation hierarchy even after closing the connection that the recordset is based on.

Parameterized hierarchies

A parameterized hierarchy also represents parent records and associated child records, but it fetches child records on demand. Just like a relation hierarchy, it contains the same information as a recordset based on a JOIN, but without the redundant rows.

When you open a parameterized hierarchy, all of the records in the parent recordset are retrieved. However, child records aren’t retrieved until you explicitly open a recordset based on a chapter field. This means that opening a parameterized hierarchy can be much quicker than opening the corresponding relation hierarchy. However, each time you open a child recordset in a parameterized hierarchy, ADO must go back to the data source for more records, so moving through the recordset might be slower than with a relation hierarchy. You must also remain connected to the data source for as long as you want to work with records in a parameterized hierarchy.

Grouping hierarchies

A grouping hierarchy is a parent recordset composed of aggregate functions with the detail for the aggregate data as the child records. This is equivalent to joining an aggregate SQL statement (one using the Group By clause) with a detail SQL statement based on the same columns. Because summary and calculated columns might consist of values accumulated from more than one record, a grouping hierarchy’s parent records are automatically non-updateable.

Like relation hierarchies, all records that a grouping hierarchy are based on are read as soon as you open a recordset on the SHAPE statement.

The SHAPE statement

As you’ve seen, hierarchical recordsets are generated by the SHAPE statement. You don’t necessarily have to write SHAPE statements by hand. If you have a copy of Visual Basic 6.0, or the Microsoft Office Developer Edition for Office 2000, you can use the Data Environment Designer to generate SHAPE commands visually. You might find that you need to make some modifications to the generated syntax to make it more readable, but, in general, using these tools can save you time.

For example, to generate a SHAPE statement relating Customers, Orders, and Order Details, follow these steps:

1. Add a Data Environment to your Visual Basic project.

2. Use the Data Link Properties dialog box to connect this Data Environment to the SQL Server Northwind database.

3. Add a command named Customers, based on the Customers table, to the default connection.

4. Add a child command named Orders, based on the Orders table, to the Customers command. Relate this command to the Customers command using the CustomerID field in both parent and child.

5. Add a child command named OrderDetails, based on the Order Details table, to the Orders command. Relate this command to the Orders command using the OrderID field in both parent and child.

6. Right-click the Customers command and choose Hierarchy Info.

If you follow these particular steps, you’ll have created this SHAPE statement (indenting added):

SHAPE 
 {SELECT * FROM "dbo"."Customers"}  
   AS Customers 
APPEND 
  ((SHAPE 
     {SELECT * FROM "dbo"."Orders"}  
      AS Orders
    APPEND 
      ({SELECT * FROM "dbo"."Order Details"}
          AS OrderDetails
        RELATE 'OrderID' TO 'OrderID') 
          AS OrderDetails)
      AS Orders 
    RELATE 'CustomerID' TO 'CustomerID') 
      AS Orders

This statement includes some qualifiers (such as the dbo to indicate table ownership and the aliasing of SELECTs to command names) that aren’t strictly necessary, but they don’t hurt.

Syntax of SHAPE

There are actually two different varieties of the SHAPE statement. SHAPE…APPEND is used to create relation and parameterized hierarchies, while SHAPE…COMPUTE is used to create group hierarchies. Either one of these statements can contain aggregate functions or calculated expressions. In this article, I’ll concentrate on the SHAPE…APPEND command.

The general syntax of the SHAPE…APPEND statement is as follows:

SHAPE 
  {parent_command} AS parent_alias
APPEND 
  ({child_command} AS child_alias
   RELATE parent_column TO child_column…) 
     AS chapter_alias …

Here, parent_command and child_command are usually two SQL statements that return the recordsets that will be built up into the hierarchical recordset. You can assign alias names to each of these recordsets, but you’re not required to do so. You can also assign an alias to the chapter created from the child recordset; this alias becomes the column name in the main recordset that points to the child recordset.

There are, however, other choices for both parent_command and child_command. They can be:

• the name of a previously shaped recordset.

• another, nested SHAPE statement.

• the TABLE keyword, followed by the name of a table from the data source.

Because you can nest SHAPE statements, you can create hierarchies of any desired depth. The curly braces and the parentheses are required parts of the statement.

You can relate the parent and child commands by specifying a pair of fields, or by specifying a group of pairs of fields. For example, this would be a valid RELATE clause:

RELATE OrderID TO OrderID, 
  OrderDate TO OrderDetailDate

A SHAPE…APPEND statement can also contain multiple APPEND clauses. Each APPEND creates another chapter column, each of which refers to a subsidiary recordset.

The syntax for a parameterized hierarchy is slightly different in that it includes information on which column to treat as a parameter. It’s easiest to show an example of this variation:

SHAPE 
 {SELECT CustomerID, CompanyName FROM Customers}
APPEND 
 ({SELECT * FROM Orders 
     WHERE CustomerID = ?}
   RELATE CustomerID to PARAMETER 0)

Here, the question mark within the definition of the child command indicates the column that will be parameterized at runtime. The clause PARAMETER 0 in the RELATE clause shows how the parameter relates to the parent command. The effect is that each time the parent command moves to a new row and you request the column requesting the chapter Recordset, the child SQL is issued. When the child SQL is issued, the question mark in the Where clause is replaced by the current value of CustomerID from the parent command.

Data shaping is a powerful technique when you have data that has hierarchical relationships. A shaped recordset lets you work with the data that has parent-child relationships without having to write code to implement the relationships. For instance, if you delete a parent record from a shaped recordset, data shaping will delete the child records, even if the underlying database doesn’t support cascading deletes. Data shaping allows you to build into the structure of the recordset the relationships that exist in your data.

(Excerpted from Mike Gunderloy’s Visual Basic Developer’s Guide to ADO by permission of SYBEX, Inc., ISBN 0-7821-2556-5. Copyright © 1999, SYBEX, Inc. , 1151 Marina Village Parkway, Alameda, CA 94501. All rights reserved. For further information please contact info@sybex.com or 1-800-227-2346. Additional information may be obtained on the Sybex Web Site [http://www.sybex.com]. No part of this article may be reproduced in any way without the prior written agreement and permission of SYBEX, Inc.)

Download SHAPE.exe

Mike Gunderloy, MCSE, MCSD, is a senior consultant with MCW Technologies, a Microsoft Solution Provider, and the author of numerous books and articles about Access, Office, Visual Basic, and VBA. He recently published Visual Basic Developer’s Guide to ADO and is now at work on Mastering SQL Server 7.5, both from Sybex, and may be reached at MikeG1@mcwtech.com when he’s not out feeding the horse, llamas, chickens, or guinea fowl on his farm.

Sidebar: Foundation Concepts: Consumers, Providers, and Service Providers

OLE DB, on which ADO rests, provides a set of objects that allow you to work with data. OLE DB divides these objects into three groups: consumers, providers, and service providers. Consumers are objects or applications that use data provided by the other two kinds of objects. A provider is an object that can extract data from a data source and return it in the standard ADO format. A service provider can’t extract data, but, like a consumer, it can accept data from a provider. A service provider, unlike a straight consumer, also manipulates the data and can pass it on to a consumer or another service provider. Since a service provider can work with any ADO provider, service providers can be very flexible. Microsoft’s search engine, Index Server, has a data provider that you can use to extract data from the search engine’s catalog (for example, "Find me all of the documents containing the words ‘Smart Access’"). Since the Index Server data provider returns data in the standard ADO format, you can use the Data Shaping server to manipulate that data.