The cmdAnalyze
button starts the process rolling. We simply empty out the txtAnalysis
text box and call three routines. We have already examined the first routine, setupID3
. This deletes all records in the ID3
table that might have been left over from a previous analysis and resets the country information. So now it's time to look at the code for the analysis.
1. In the Click
event of the cmdAnalyze
button, please add the following code. Notice that we are clearing out the txtAnalysis
box and then calling three routines, one after the other. We already wrote the setupID3
routine. As you recall, that routine added the region and language to our ID3
table. So that is called each time the user wishes to analyze another product. So now we need to write the next two routines, the buildID3
and determineEntropy
routines.
Private Sub cmdAnalyze_Click()
txtAnalysis = ""
Call setupID3
Call buildID3
Call determineEntropy
End Sub
2. Please add another sub to your frmID3
form with the name buildID3
. This is rather long, but it actually populates the rest of our ID3
table. Essentially, we are gong to use some monster SQL statements to retrieve information from the existing tables and make some deductions about the contents. This is the aggregation we were discussing earlier. Don't worry if this looks a bit scary. We will go over it piece by piece. But for now, please enter the code into the buildID3
sub.
Private Sub buildID3()
Dim sSql As String
Dim sCriteria As String
Dim sCurrentCountry As String
Dim sCurrentLanguage As String
Dim sCurrentRegion As String
Dim adoRequestedData As ADODB.Recordset
Dim adoID3 As ADODB.Recordset
sbStatus.Panels.Item(1).Text = "Building Query."
sbStatus.Panels.Item(2).Text = "Working..."
'-------------------------------------------------------------
'-- This SQL will retrieve the Category Name, Total,
'-- Order Date, Country, and Product Name for the categories
'-- requested.
'-------------------------------------------------------------
sSql = "SELECT DISTINCTROW Categories.CategoryName, "
sSql = sSql & " SUM([Order Details].Quantity) AS Total,"
sSql = sSql & " Orders.OrderDate, Customers.Country,"
sSQL = sSQL & " Products.ProductName "
sSql = sSql & " FROM (Customers INNER JOIN Orders ON "
sSql = sSql & " Customers.CustomerID = Orders.CustomerID)"
sSql = sSql & " INNER JOIN ((Categories INNER JOIN Products"
sSql = sSql & " ON Categories.CategoryID ="
sSql = sSql & " Products.CategoryID) INNER JOIN "
sSql = sSql & " [Order Details] ON Products.ProductID = "
sSql = sSql & " [Order Details].ProductID) ON "
sSql = sSql & " Orders.OrderID = [Order Details].OrderID "
'----------------------------
'-- Now check the criteria --
'----------------------------
sCriteria = ""
sCriteria = sCriteria & " Categories.CategoryID = " & _
lstCategory.ItemData(lstCategory.ListIndex)
sCriteria = sCriteria & " AND "
sCriteria = sCriteria & " Products.ProductID = " & _
lstProduct.ItemData(lstProduct.ListIndex)
sSql = sSql & " WHERE " & sCriteria
sSql = sSql & " GROUP BY "
sSql = sSql & " Categories.CategoryName, "
sSql = sSql & "[Order Details].Quantity, Products.ProductName,"
sSql = sSql & " Orders.OrderDate, Customers.Country "
sSql = sSql & " ORDER BY Categories.CategoryName "
'MsgBox sSql
'----------------------------------------------------
'-- Open a recordset with the results of the query --
'----------------------------------------------------
Set adoRequestedData = New ADODB.Recordset
adoRequestedData.CursorLocation = adUseClient
adoRequestedData.CursorType = adOpenDynamic
adoRequestedData.Open sSql, adoConnection
'--------------------------------------------
'-- If there were no records, exit the sub --
'--------------------------------------------
With adoRequestedData
If (.RecordCount < 1) Then
MsgBox "No records"
Exit Sub
Else
ProgressBar1.Max = .RecordCount
.MoveFirst
End If
End With
sbStatus.Panels.Item(1).Text = "Determining Sales Information"
'------------------------------------------------------
'-- Now, create the ID3 table and prepare to refresh --
'------------------------------------------------------
Set adoID3 = New ADODB.Recordset
sSql = "SELECT * FROM ID3"
adoID3.Open sSql, adoConnection, adOpenDynamic, _
adLockOptimistic, adCmdText
'------------------------------------------------------------
'-- Loop through all of the records in the recordset
'-- returned by the query and update the ID3 table based
'-- on the results.
'------------------------------------------------------------
With adoRequestedData
While Not .EOF
ProgressBar1.Value = .AbsolutePosition
DoEvents
sCurrentCountry = !Country
adoID3.Filter = "Country = '" & sCurrentCountry & "'"
If ((Not adoID3.BOF) And (Not adoID3.EOF)) Then
adoID3!Category = !CategoryName
adoID3!Product = !ProductName
If (Year(!OrderDate) = "1995") Then
adoID3!OldQuantity = adoID3!OldQuantity + !Total
ElseIf (Year(!OrderDate) = "1996") Then
adoID3!NewQuantity = adoID3!NewQuantity + !Total
End If
If (adoID3!OldQuantity < adoID3!NewQuantity) Then
adoID3!SalesUp = True
Else
adoID3!SalesUp = False
End If
If (adoID3!NewQuantity > 0) And (adoID3!OldQuantity > 0) _
Then
adoID3!UpByHowMuch = _
CSng(Format((adoID3!NewQuantity/adoID3!OldQuantity), "##.###"))
End If
'-- There were only sales in the current year --
If (adoID3!NewQuantity > 0) And (adoID3!OldQuantity < 1) _
Then
adoID3!UpByHowMuch = CSng(Format(1, "##.###"))
End If
'-- There were only sales in the previous year --
If (adoID3!NewQuantity < 1) And (adoID3!OldQuantity > 0) _
Then
adoID3!UpByHowMuch = CSng(Format(-1, "##.###"))
End If
adoID3.Update
End If
.MoveNext
Wend
End With
adoConnection.Execute _
("DELETE * FROM ID3 WHERE ((OldQuantity = 0)" _
& " AND (NewQuantity = 0))")
sbStatus.Panels.Item(1).Text = ""
sbStatus.Panels.Item(2).Text = "Ready."
ProgressBar1.Value = 0
Exit Sub
myError:
MsgBox (Err.Description)
End Sub
While this looks crazy, we are really just doing something rather simple. First, we are retrieving the details on all of the orders placed for the product requested to be analyzed. We then loop through the recordset and add all of the sales for 1993 into the old quantity field for a specific country and all of the 1994 sales into the new quantity field. We then determine if the sales have gone up or down from year to year, and by how much. So we will have all of this data for each country for the specific product in our ID3
table. See, we are aggregating the sales from several years and storing the data in our ID3
table. Let's walk through the code and discuss how we are doing this. But in a nutshell, here is what this routine is doing:
adoRequestedData
.adoID3
recordset. Remember now there is a record for each country, but the rest of the fields are empty.adoRequestedData
records.adoID3
table. All sales for 1995 are summed in the oldQuantity
field and 1996 sales are summed in the NewQuantity
field for that country's record.adoRequestedData
for each country have been reviewed, we determine if the sales of the product in 1996 was greater than 1995. We then determine the growth between years. The fields SalesUp
and UpByHowMuch
are then calculated for that country's record in the adoID3
recordset.Let's look at the code and examine how each of these steps works.
First, we need to retrieve fields from the tables where the records meet the criteria the user selected to be analyzed. We want to build an SQL statement to get the raw data we need to update our ID3
table. Since the statement is a bit longer that what we have previously built, the easiest way is to build the statement in three sections.
The first part of the SQL statement explains the fields that we want to retrieve. We ask for the CategoryName
, next we create a derived field that provides the total quantity of the orders for that category, then the date, the country, and finally the product name.
We must then provide SQL join operations so the relations between the tables can be exploited. We are retrieving data from the Categories
, Order Details
, Orders
, Customers
and Products
tables in a single recordset.
sSql = "SELECT DISTINCTROW Categories.CategoryName, "
sSql = sSql & " SUM([Order Details].Quantity) AS Total,"
sSql = sSql & " Orders.OrderDate, Customers.Country,"
sSQL = sSQL & " Products.ProductName "
sSql = sSql & " FROM (Customers INNER JOIN Orders ON "
sSql = sSql & " Customers.CustomerID = Orders.CustomerID)"
sSql = sSql & " INNER JOIN ((Categories INNER JOIN Products"
sSql = sSql & " ON Categories.CategoryID ="
sSql = sSql & " Products.CategoryID) INNER JOIN "
sSql = sSql & " [Order Details] ON Products.ProductID = "
sSql = sSql & " OrderDetails.ProductID) ON "
sSql = sSql & " Orders.OrderID = [Order Details].OrderID "
Since the SQL statement is really a single string, we keep concatenating the string sSql
that will provide the query. To give you a visual sense of how we are going to retrieve the fields, if you have Microsoft Access, you can see the relationships. As a personal opinion, if you don't have Access, this feature alone is well worth the price. As I mentioned before, you can experiment until you get your query right in Access, then cut/paste the SQL statement it generates and tweak it.
Once we have defined the fields required, we establish the criteria by using the WHERE
clause. Notice that we are retrieving records where the CategoryID
is equal to the ItemData
value of the selected field. Remember when we stuffed this value in there when we populated the list box. Now we take that ID value that is associated with the user selection. Likewise, we also want the ProductID
to be equal to the ItemData
value of the lstProduct
list box. Once we build that string, we append it to the first part of the SQL statement and preface it with the WHERE
clause.
sCriteria = ""
sCriteria = sCriteria & " Categories.CategoryID = " & _
lstCategory.ItemData(lstCategory.ListIndex)
sCriteria = sCriteria & " AND "
sCriteria = sCriteria & " Products.ProductID = " & _
lstProduct.ItemData(lstProduct.ListIndex)
sSql = sSql & " WHERE " & sCriteria
Finally, we want the data returned to us in a specific order. By using the SQL GROUP BY
clause, we can get the records returned grouped by CategoryName
, then Quantity
, then ProductName
, then OrderDate
, and finally Country
. The SQL ORDER BY
clause sorts items in CategoryName
sequence.
sSql = sSql & " GROUP BY "
sSql = sSql & " Categories.CategoryName, "
sSql = sSql & "[Order Details].Quantity, Products.ProductName,"
sSql = sSql & " Orders.OrderDate, Customers.Country "
sSql = sSql & " ORDER BY Categories.CategoryName "
When you are testing out the SQL statement, you might want to add a message box right after the statement is built. You could just add:
MsgBox sSql
This line will show you the fully built SQL before you try to open a recordset with it. You can check that the values are set. Notice in the WHERE
clause the Category ID = 1
and the Product ID = 1
. This ensures that the correct values are being embedded in the statement.
However, your eyesight must be pretty good to read this thing! Remember when we discussed re aggregating data from several tables? You can see that is exactly what we are doing here. The individual tables are nicely normalized. But since we are trying to draw some conclusions from data that resides in each of them, we round up all of the fields needed from each of the tables and place the results in a recordset adoRequestedData
. Now we are ready to execute the query and create the recordset.
We set properties on the adoRequestedData
recordset and pass the Open
method the sSql
statement. Notice that on this recordset, we are setting properties before we open the recordset. We want the client-side cursor so we can get the RecordCount
property. We are setting the CursorType
property instead of passing it as a parameter to the Open
method of the recordset just to show you that it can be done this way.
Set adoRequestedData = New ADODB.Recordset
adoRequestedData.CursorLocation = adUseClient
adoRequestedData.CursorType = adOpenDynamic
adoRequestedData.Open sSql, adoConnection
The adoRequestedData
recordset holds all of the relevant information on the specific product the product manager wants to analyze. If there are records returned from our request in the adoRequestedData
table, we are ready to go to work on our adoID3
recordset. Again, the adoID3
recordset will hold the results of our analysis on the data returned in the adoRequestedData
recordset.
We set a reference to our adoID3
recordset and then open it, selecting all records to be returned. Remember that we will have as many records in our ID3
table as we had unique countries. However, there are still a few fields in each record that we now need to fill out, such as the sales numbers - are the current sales greater than previous sales, etc.
Set adoID3 = New ADODB.Recordset
sSql = "SELECT * FROM ID3"
adoID3.Open sSql, adoConnection, adOpenDynamic, _
adLockOptimistic, adCmdText
We are now ready to build the rest of our ID3
table. We can now attempt to deduce any sales patterns by summing the sales information from the adoRequestedData
recordset and placing it in the adoID3
recordset.
The adoRequestedData
recordset will be walked through one record at a time. As we loop through the While Not .EOF
loop, we first update the progress bar. Users love this and they feel important as it progresses. Just watch them puff out their chests as the blue blocks progress.
Next we take the country out of the current record of the adoRequestedData
recordset. Remember that we returned the records on the product being analyzed, and the sales information includes multiple sales from multiple countries. However, if we had 100 sales of our new beer to France, each of the sales figures will be distilled to a single record for France in our ID3
table.
We take the first record in the adoRequestedData
recordset and assign its country, say it's France, to the variable sCurrentCountry
. Remember that there might be 100 records of sales of our product to France in the adoRequestedData
recordset.
We want to find the adoID3
record for that country, France. There are two ways to do this. First, we could execute an SQL statement against the adoID3
table to retrieve all records "WHERE Country = '" & sCurrentCountry & "'"
. But you already know how to do this, Ss we are using a new ADO 2.0 feature - the Filter
property.
With adoRequestedData
While Not .EOF
ProgressBar1.Value = .AbsolutePosition
DoEvents
sCurrentCountry = !Country
adoID3.Filter = "Country = '" & sCurrentCountry & "'"
The Filter
property can be used to select specific records in a recordset object when using client-side cursors. The filtered recordset becomes the current cursor. By setting the property, the current record becomes the first record that satisfied the values for the filter. Remember that adoID3
was opened with all records in the table. By setting the filter, we are isolating only those records that satisfy a criterion. In this case, we only want those that equal the country in the current record of adoRequestedData
. Of course, there will only be one record returned from ID3
, because each ID3
record has a unique country.
It would probably be just a tad faster to create a SQL statement against the ID3 table to get the specific record. However, this is a good place to illustrate this new property. It works just fine for our purposes.
Let's assume that sCurrentRecord
= "France". If a record was returned from the filtered ID3
- and we know it will - we add the CategoryName
and ProductName
to the record. Remember that we can only analyze a single product at a time, so this will be the same for each record in the ID3
table.
If ((Not adoID3.BOF) And (Not adoID3.EOF)) Then
adoID3!Category = !CategoryName
adoID3!Product = !ProductName
The year is plucked from the date of the order from the current adoRequestedData
record. If it is 1995, we add it to the OldQuantity
, else if the order was placed in 1996, we add it to the NewQuantity
. So if there are 100 orders for this product from France, all of the orders are summed in either of the two fields. For example, if there were 50 orders placed in 1995 and 50 placed in 1996, the totals for each year will be in the respective fields in the single adoID3
record. This will permit us to compare the growth rates of the two years.
Now we can simply update our binary field, SalesUp
, in the ID3
table. If the quantity sold of the product being analyzed in 1995 was less than the quantity sold in 1996, then we set the SalesUp
to True
. This simply tells us that there is an upward trend. Otherwise, we set SalesUp
to False
. We are simply deducing a new piece of data that is not in our existing tables anywhere. This is clearly a helpful piece of information for the product manager.
If (adoID3!OldQuantity < adoID3!NewQuantity) Then
adoID3!SalesUp = True
Else
adoID3!SalesUp = False
End If
We are now in a position to determine the relative growth of the product over the two years. If both of the quantity fields have a value, then we calculate the UpByHowMuch
field. A value of 1.0 means that sales were steady. Any value greater than 1 tells us how much growth. A value less than one, say 0.80 tells us that sales were down from one year to the next. Quite simply, we divide the NewQuantity
(1996 sales) by the OldQuantity
(1995 sales). To ensure that we have the values stored in a standardized way in the ID3
table, we first Format
the field. Finally, we convert (cast) the result to a single using CSng
. This value is then placed in the UpByHowMuch
field in the adoID3
recordset.
If (adoID3!NewQuantity > 0) And (adoID3!OldQuantity > 0) _
Then
adoID3!UpByHowMuch = _
CSng(Format((adoID3!NewQuantity/adoID3!OldQuantity), "##.###"))
End If
It could be that we are analyzing a new product and there are only sales in 1996 but not 1995. Or, the product could have been discontinued in 1995, so there are no 1996 numbers. If we only had sales in one of the two years, we simply hard code in a 1 or -1 in the field. This will tell us if we went up or down. So if there were no sales in 1995 but some in 1996, we place a 1 telling us that there was some growth, but we don't have a relative measure yet because there is no history. However, if there were no sales in 1996 but some in 1995, we add a –1. Finally, we call the Update
method of the recordset and write the results to the database.
'-- There were only sales in the current year --
If (adoID3!NewQuantity > 0) And (adoID3!OldQuantity < 1) _
Then
adoID3!UpByHowMuch = CSng(Format(1, "##.###"))
End If
'-- There were only sales in the previous year --
If (adoID3!NewQuantity < 1) And (adoID3!OldQuantity > 0) _
Then
adoID3!UpByHowMuch = CSng(Format(-1, "##.###"))
End If
adoID3.Update
There are some records in the database where sales for both years are 0. Why? Remember when we initialized the ID3
table we created a record for each country. Well, if the product we are analyzing did not have any sales to a specific country, we want to delete that record from the ID3
table. Otherwise, we would not get an accurate picture of what is really happening with sales. We just 'clean' the data to eliminate those records that could skew our results. Data mining technologists call this cleaning, or scrubbing the data prior to analysis.
adoConnection.Execute _
("DELETE * FROM ID3 WHERE ((OldQuantity = 0)" _
& " AND (NewQuantity = 0))")
So if we ran a query on, say the Carnarvon Tigers product in the Seafood category, the ID3
table might look like this when the buildID3
routine is complete. Notice the sales to France. There were a total of 98 orders placed from France in 1995 but only 36 placed for 1996. So the SalesUp check box is not checked, so we know sales were down. We calculated sales to be only 36 percent of the previous year! We also see examples of no sale in 1995 for Italy, but a total of 8 for 1996. This is a case where we place a 1 in the UpByHowMuch field. Likewise, there were 8 products sold to Norway in 1995 but none in 1996, so a –1 is placed here.
Now that we have aggregated our data in the ID3
table, let's see if we can make any deductions from the results. Everything we have done up to now has been to build this ID3
table. This table represents the re-aggregation of data we have been discussing.