Performing the Analysis

The cmdAnalyze button starts the process rolling. We simply empty out the txtAnalysis text box and call three routines. We have already examined the first routine, setupID3. This deletes all records in the ID3 table that might have been left over from a previous analysis and resets the country information. So now it's time to look at the code for the analysis.

Try It Out - Performing the Analysis

1.  In the Click event of the cmdAnalyze button, please add the following code. Notice that we are clearing out the txtAnalysis box and then calling three routines, one after the other. We already wrote the setupID3 routine. As you recall, that routine added the region and language to our ID3 table. So that is called each time the user wishes to analyze another product. So now we need to write the next two routines, the buildID3 and determineEntropy routines.

Private Sub cmdAnalyze_Click()

txtAnalysis = ""
Call setupID3
Call buildID3
Call determineEntropy

End Sub

2.  Please add another sub to your frmID3 form with the name buildID3. This is rather long, but it actually populates the rest of our ID3 table. Essentially, we are gong to use some monster SQL statements to retrieve information from the existing tables and make some deductions about the contents. This is the aggregation we were discussing earlier. Don't worry if this looks a bit scary. We will go over it piece by piece. But for now, please enter the code into the buildID3 sub.

Private Sub buildID3()

Dim sSql As String
Dim sCriteria As String
Dim sCurrentCountry As String
Dim sCurrentLanguage As String
Dim sCurrentRegion As String
Dim adoRequestedData As ADODB.Recordset
Dim adoID3 As ADODB.Recordset

sbStatus.Panels.Item(1).Text = "Building Query."
sbStatus.Panels.Item(2).Text = "Working..."

'-------------------------------------------------------------
'-- This SQL will retrieve the Category Name, Total, 
'-- Order Date, Country, and Product Name for the categories
'-- requested. 
'-------------------------------------------------------------
sSql = "SELECT DISTINCTROW Categories.CategoryName, "
sSql = sSql & " SUM([Order Details].Quantity) AS Total,"
sSql = sSql & " Orders.OrderDate, Customers.Country,"
sSQL = sSQL & " Products.ProductName "
sSql = sSql & " FROM (Customers INNER JOIN Orders ON "
sSql = sSql & " Customers.CustomerID = Orders.CustomerID)"
sSql = sSql & " INNER JOIN ((Categories INNER JOIN Products"
sSql = sSql & " ON Categories.CategoryID ="
sSql = sSql & " Products.CategoryID) INNER JOIN "
sSql = sSql & " [Order Details] ON Products.ProductID = "
sSql = sSql & " [Order Details].ProductID) ON "
sSql = sSql & " Orders.OrderID = [Order Details].OrderID "

'----------------------------
'-- Now check the criteria --
'----------------------------
sCriteria = ""
sCriteria = sCriteria & " Categories.CategoryID = " & _
    lstCategory.ItemData(lstCategory.ListIndex)
sCriteria = sCriteria & " AND "
sCriteria = sCriteria & " Products.ProductID = " & _
    lstProduct.ItemData(lstProduct.ListIndex)
  
sSql = sSql & " WHERE " & sCriteria

sSql = sSql & " GROUP BY "
sSql = sSql & " Categories.CategoryName, "
sSql = sSql & "[Order Details].Quantity, Products.ProductName,"
sSql = sSql & " Orders.OrderDate, Customers.Country "
sSql = sSql & " ORDER BY Categories.CategoryName "

'MsgBox sSql

'----------------------------------------------------
'-- Open a recordset with the results of the query --
'----------------------------------------------------
Set adoRequestedData = New ADODB.Recordset
adoRequestedData.CursorLocation = adUseClient
adoRequestedData.CursorType = adOpenDynamic
adoRequestedData.Open sSql, adoConnection

'--------------------------------------------
'-- If there were no records, exit the sub --
'--------------------------------------------
With adoRequestedData
   If (.RecordCount < 1) Then
     MsgBox "No records"
     Exit Sub
   Else
     ProgressBar1.Max = .RecordCount
     .MoveFirst
   End If
End With

sbStatus.Panels.Item(1).Text = "Determining Sales Information"

'------------------------------------------------------
'-- Now, create the ID3 table and prepare to refresh --
'------------------------------------------------------
Set adoID3 = New ADODB.Recordset
sSql = "SELECT * FROM ID3"
adoID3.Open sSql, adoConnection, adOpenDynamic, _
   adLockOptimistic, adCmdText
'------------------------------------------------------------
'-- Loop through all of the records in the recordset
'-- returned by the query and update the ID3 table based
'-- on the results.                         
'------------------------------------------------------------
With adoRequestedData
 While Not .EOF
  
   ProgressBar1.Value = .AbsolutePosition
   DoEvents
   sCurrentCountry = !Country
   adoID3.Filter = "Country = '" & sCurrentCountry & "'"
   If ((Not adoID3.BOF) And (Not adoID3.EOF)) Then
     adoID3!Category = !CategoryName
     adoID3!Product = !ProductName
     If (Year(!OrderDate) = "1995") Then
             adoID3!OldQuantity = adoID3!OldQuantity + !Total
     ElseIf (Year(!OrderDate) = "1996") Then
             adoID3!NewQuantity = adoID3!NewQuantity + !Total
     End If
     If (adoID3!OldQuantity < adoID3!NewQuantity) Then
             adoID3!SalesUp = True
     Else
             adoID3!SalesUp = False
     End If
     If (adoID3!NewQuantity > 0) And (adoID3!OldQuantity > 0) _
          Then
        adoID3!UpByHowMuch = _
CSng(Format((adoID3!NewQuantity/adoID3!OldQuantity), "##.###"))
     End If
     '-- There were only sales in the current year --
     If (adoID3!NewQuantity > 0) And (adoID3!OldQuantity < 1) _
          Then
        adoID3!UpByHowMuch = CSng(Format(1, "##.###"))
     End If
     '-- There were only sales in the previous year --
     If (adoID3!NewQuantity < 1) And (adoID3!OldQuantity > 0) _
          Then
        adoID3!UpByHowMuch = CSng(Format(-1, "##.###"))
     End If
     adoID3.Update
   End If
   .MoveNext
 Wend
End With

adoConnection.Execute _
("DELETE * FROM ID3 WHERE ((OldQuantity = 0)" _
 & " AND (NewQuantity = 0))")
sbStatus.Panels.Item(1).Text = ""
sbStatus.Panels.Item(2).Text = "Ready."
ProgressBar1.Value = 0

Exit Sub
myError:
MsgBox (Err.Description)

End Sub

How It Works

While this looks crazy, we are really just doing something rather simple. First, we are retrieving the details on all of the orders placed for the product requested to be analyzed. We then loop through the recordset and add all of the sales for 1993 into the old quantity field for a specific country and all of the 1994 sales into the new quantity field. We then determine if the sales have gone up or down from year to year, and by how much. So we will have all of this data for each country for the specific product in our ID3 table. See, we are aggregating the sales from several years and storing the data in our ID3 table. Let's walk through the code and discuss how we are doing this. But in a nutshell, here is what this routine is doing:

Let's look at the code and examine how each of these steps works.

The Big Kahuna of SQL Statements

First, we need to retrieve fields from the tables where the records meet the criteria the user selected to be analyzed. We want to build an SQL statement to get the raw data we need to update our ID3 table. Since the statement is a bit longer that what we have previously built, the easiest way is to build the statement in three sections.

The first part of the SQL statement explains the fields that we want to retrieve. We ask for the CategoryName, next we create a derived field that provides the total quantity of the orders for that category, then the date, the country, and finally the product name.

We must then provide SQL join operations so the relations between the tables can be exploited. We are retrieving data from the Categories, Order Details, Orders, Customers and Products tables in a single recordset.

sSql = "SELECT DISTINCTROW Categories.CategoryName, "
sSql = sSql & " SUM([Order Details].Quantity) AS Total,"
sSql = sSql & " Orders.OrderDate, Customers.Country,"
sSQL = sSQL & " Products.ProductName "
sSql = sSql & " FROM (Customers INNER JOIN Orders ON "
sSql = sSql & " Customers.CustomerID = Orders.CustomerID)"
sSql = sSql & " INNER JOIN ((Categories INNER JOIN Products"
sSql = sSql & " ON Categories.CategoryID ="
sSql = sSql & " Products.CategoryID) INNER JOIN "
sSql = sSql & " [Order Details] ON Products.ProductID = "
sSql = sSql & " OrderDetails.ProductID) ON "
sSql = sSql & " Orders.OrderID = [Order Details].OrderID "

Since the SQL statement is really a single string, we keep concatenating the string sSql that will provide the query. To give you a visual sense of how we are going to retrieve the fields, if you have Microsoft Access, you can see the relationships. As a personal opinion, if you don't have Access, this feature alone is well worth the price. As I mentioned before, you can experiment until you get your query right in Access, then cut/paste the SQL statement it generates and tweak it.

Once we have defined the fields required, we establish the criteria by using the WHERE clause. Notice that we are retrieving records where the CategoryID is equal to the ItemData value of the selected field. Remember when we stuffed this value in there when we populated the list box. Now we take that ID value that is associated with the user selection. Likewise, we also want the ProductID to be equal to the ItemData value of the lstProduct list box. Once we build that string, we append it to the first part of the SQL statement and preface it with the WHERE clause.

sCriteria = ""
sCriteria = sCriteria & " Categories.CategoryID = " & _
    lstCategory.ItemData(lstCategory.ListIndex)
sCriteria = sCriteria & " AND "
sCriteria = sCriteria & " Products.ProductID = " & _
    lstProduct.ItemData(lstProduct.ListIndex)

sSql = sSql & " WHERE " & sCriteria

Finally, we want the data returned to us in a specific order. By using the SQL GROUP BY clause, we can get the records returned grouped by CategoryName, then Quantity, then ProductName, then OrderDate, and finally Country. The SQL ORDER BY clause sorts items in CategoryName sequence.

sSql = sSql & " GROUP BY "
sSql = sSql & " Categories.CategoryName, "
sSql = sSql & "[Order Details].Quantity, Products.ProductName,"
sSql = sSql & " Orders.OrderDate, Customers.Country "
sSql = sSql & " ORDER BY Categories.CategoryName "

When you are testing out the SQL statement, you might want to add a message box right after the statement is built. You could just add:

MsgBox sSql

This line will show you the fully built SQL before you try to open a recordset with it. You can check that the values are set. Notice in the WHERE clause the Category ID = 1 and the Product ID = 1. This ensures that the correct values are being embedded in the statement.

However, your eyesight must be pretty good to read this thing! Remember when we discussed re aggregating data from several tables? You can see that is exactly what we are doing here. The individual tables are nicely normalized. But since we are trying to draw some conclusions from data that resides in each of them, we round up all of the fields needed from each of the tables and place the results in a recordset adoRequestedData. Now we are ready to execute the query and create the recordset.

We set properties on the adoRequestedData recordset and pass the Open method the sSql statement. Notice that on this recordset, we are setting properties before we open the recordset. We want the client-side cursor so we can get the RecordCount property. We are setting the CursorType property instead of passing it as a parameter to the Open method of the recordset just to show you that it can be done this way.

Set adoRequestedData = New ADODB.Recordset
adoRequestedData.CursorLocation = adUseClient
adoRequestedData.CursorType = adOpenDynamic
adoRequestedData.Open sSql, adoConnection

The adoRequestedData recordset holds all of the relevant information on the specific product the product manager wants to analyze. If there are records returned from our request in the adoRequestedData table, we are ready to go to work on our adoID3 recordset. Again, the adoID3 recordset will hold the results of our analysis on the data returned in the adoRequestedData recordset.

We set a reference to our adoID3 recordset and then open it, selecting all records to be returned. Remember that we will have as many records in our ID3 table as we had unique countries. However, there are still a few fields in each record that we now need to fill out, such as the sales numbers - are the current sales greater than previous sales, etc.

Set adoID3 = New ADODB.Recordset
sSql = "SELECT * FROM ID3"
adoID3.Open sSql, adoConnection, adOpenDynamic, _
   adLockOptimistic, adCmdText

We are now ready to build the rest of our ID3 table. We can now attempt to deduce any sales patterns by summing the sales information from the adoRequestedData recordset and placing it in the adoID3 recordset.

Aggregating Data in Our ID3 Table

The adoRequestedData recordset will be walked through one record at a time. As we loop through the While Not .EOF loop, we first update the progress bar. Users love this and they feel important as it progresses. Just watch them puff out their chests as the blue blocks progress.

Next we take the country out of the current record of the adoRequestedData recordset. Remember that we returned the records on the product being analyzed, and the sales information includes multiple sales from multiple countries. However, if we had 100 sales of our new beer to France, each of the sales figures will be distilled to a single record for France in our ID3 table.

We take the first record in the adoRequestedData recordset and assign its country, say it's France, to the variable sCurrentCountry. Remember that there might be 100 records of sales of our product to France in the adoRequestedData recordset.

We want to find the adoID3 record for that country, France. There are two ways to do this. First, we could execute an SQL statement against the adoID3 table to retrieve all records "WHERE Country = '" & sCurrentCountry & "'". But you already know how to do this, Ss we are using a new ADO 2.0 feature - the Filter property.

With adoRequestedData
 While Not .EOF
  
   ProgressBar1.Value = .AbsolutePosition
   DoEvents
   sCurrentCountry = !Country
   adoID3.Filter = "Country = '" & sCurrentCountry & "'"

The Filter property can be used to select specific records in a recordset object when using client-side cursors. The filtered recordset becomes the current cursor. By setting the property, the current record becomes the first record that satisfied the values for the filter. Remember that adoID3 was opened with all records in the table. By setting the filter, we are isolating only those records that satisfy a criterion. In this case, we only want those that equal the country in the current record of adoRequestedData. Of course, there will only be one record returned from ID3, because each ID3 record has a unique country.

It would probably be just a tad faster to create a SQL statement against the ID3 table to get the specific record. However, this is a good place to illustrate this new property. It works just fine for our purposes.

Let's assume that sCurrentRecord = "France". If a record was returned from the filtered ID3 - and we know it will - we add the CategoryName and ProductName to the record. Remember that we can only analyze a single product at a time, so this will be the same for each record in the ID3 table.

If ((Not adoID3.BOF) And (Not adoID3.EOF)) Then
         adoID3!Category = !CategoryName
         adoID3!Product = !ProductName

The year is plucked from the date of the order from the current adoRequestedData record. If it is 1995, we add it to the OldQuantity, else if the order was placed in 1996, we add it to the NewQuantity. So if there are 100 orders for this product from France, all of the orders are summed in either of the two fields. For example, if there were 50 orders placed in 1995 and 50 placed in 1996, the totals for each year will be in the respective fields in the single adoID3 record. This will permit us to compare the growth rates of the two years.

Now we can simply update our binary field, SalesUp, in the ID3 table. If the quantity sold of the product being analyzed in 1995 was less than the quantity sold in 1996, then we set the SalesUp to True. This simply tells us that there is an upward trend. Otherwise, we set SalesUp to False. We are simply deducing a new piece of data that is not in our existing tables anywhere. This is clearly a helpful piece of information for the product manager.

If (adoID3!OldQuantity < adoID3!NewQuantity) Then
    adoID3!SalesUp = True
Else
    adoID3!SalesUp = False
End If

We are now in a position to determine the relative growth of the product over the two years. If both of the quantity fields have a value, then we calculate the UpByHowMuch field. A value of 1.0 means that sales were steady. Any value greater than 1 tells us how much growth. A value less than one, say 0.80 tells us that sales were down from one year to the next. Quite simply, we divide the NewQuantity (1996 sales) by the OldQuantity (1995 sales). To ensure that we have the values stored in a standardized way in the ID3 table, we first Format the field. Finally, we convert (cast) the result to a single using CSng. This value is then placed in the UpByHowMuch field in the adoID3 recordset.

If (adoID3!NewQuantity > 0) And (adoID3!OldQuantity > 0) _
          Then
        adoID3!UpByHowMuch = _
CSng(Format((adoID3!NewQuantity/adoID3!OldQuantity), "##.###"))
End If

Thinking About Exceptions

It could be that we are analyzing a new product and there are only sales in 1996 but not 1995. Or, the product could have been discontinued in 1995, so there are no 1996 numbers. If we only had sales in one of the two years, we simply hard code in a 1 or -1 in the field. This will tell us if we went up or down. So if there were no sales in 1995 but some in 1996, we place a 1 telling us that there was some growth, but we don't have a relative measure yet because there is no history. However, if there were no sales in 1996 but some in 1995, we add a –1. Finally, we call the Update method of the recordset and write the results to the database.

'-- There were only sales in the current year --
  If (adoID3!NewQuantity > 0) And (adoID3!OldQuantity < 1) _
          Then
      adoID3!UpByHowMuch = CSng(Format(1, "##.###"))
  End If
'-- There were only sales in the previous year --
  If (adoID3!NewQuantity < 1) And (adoID3!OldQuantity > 0) _
          Then
      adoID3!UpByHowMuch = CSng(Format(-1, "##.###"))
  End If
  adoID3.Update

There are some records in the database where sales for both years are 0. Why? Remember when we initialized the ID3 table we created a record for each country. Well, if the product we are analyzing did not have any sales to a specific country, we want to delete that record from the ID3 table. Otherwise, we would not get an accurate picture of what is really happening with sales. We just 'clean' the data to eliminate those records that could skew our results. Data mining technologists call this cleaning, or scrubbing the data prior to analysis.

adoConnection.Execute _
("DELETE * FROM ID3 WHERE ((OldQuantity = 0)" _
 & " AND (NewQuantity = 0))")

So if we ran a query on, say the Carnarvon Tigers product in the Seafood category, the ID3 table might look like this when the buildID3 routine is complete. Notice the sales to France. There were a total of 98 orders placed from France in 1995 but only 36 placed for 1996. So the SalesUp check box is not checked, so we know sales were down. We calculated sales to be only 36 percent of the previous year! We also see examples of no sale in 1995 for Italy, but a total of 8 for 1996. This is a case where we place a 1 in the UpByHowMuch field. Likewise, there were 8 products sold to Norway in 1995 but none in 1996, so a –1 is placed here.

Now that we have aggregated our data in the ID3 table, let's see if we can make any deductions from the results. Everything we have done up to now has been to build this ID3 table. This table represents the re-aggregation of data we have been discussing.

© 1998 by Wrox Press. All rights reserved.