This is the table that we will use for generating the results of the ID3 algorithm. Here we are going to take liberty with our data. After doing a bit of research on our markets, we have decided that each country has its own language and each belongs to a region of the globe. We can use our ID3 algorithm to decide if the country, region, or language is important to new sales for each individual product.
While our database has all of the details of each sale, the products purchased, etc., we want to take a broader look. So instead of just looking at each order, we might get the results that price is important to the quantity of products sold. While this is helpful, this type of 'literal' information can be quickly retrieved with an SQL statement. We want to work a bit smarter, through. So we will add some meta-information (i.e. information about information) about the sale of our product. We might wish to know what language our purchasers speak. Or what region of the globe they live. This will provide our ID3 algorithm with more to work with.
We will update our table by looking at each market our customers are in and associating each with a region and language. So we are adding these additional parameters to our table. We have no way of knowing if they will be important - that is for the algorithm to tell us. As mentioned, we could have just run the ID3 algorithm on the data as-is. But this simply illustrates how you might be able to add simple meta-data to provide additional insight on the underlying data.
Take a look at the following diagram. This is how the core of our application will operate. A lot of this won't make much sense to you right now, but keep this chart in mind as you work through the rest of this chapter.
Andy - can you put the 'flowchart' that starts "Select Product from list box, User pressed cmdAnalyze..." in here. Thank you
1. Please add the following sub to the frmID3
form. This is a routine that will simply update our ID3
table with the region of the globe and the language spoken in each of the countries where we have customers. We will simply look at each country where we have a customer and add the region of the globe where the country is located and the language spoken by its inhabitants.
Public Sub setupID3()
Dim adoTempCountry As ADODB.Recordset
'holds each unique country
Dim adoTempid3 As ADODB.Recordset
'holds a recordset for ID3
Dim sCountry As String
Dim sSql As String
'-----------------------------------------------------
'-- Get the unique countries and place in temp --
'-----------------------------------------------------
Set adoTempCountry = New ADODB.Recordset
sSql = "SELECT DISTINCT Country FROM Customers"
Set adoTempCountry = adoConnection.Execute(sSql)
'-----------------------------------------------
'-----------------------------------------------------
'-- Clean out all of the data from the ID3 table --
'-----------------------------------------------------
adoConnection.Execute ("DELETE * FROM ID3")
'----------------------------
'-- Now set up the ID3 --
'---------------------------
Set adoTempid3 = New ADODB.Recordset
adoTempid3.CursorType = adOpenKeyset
adoTempid3.LockType = adLockOptimistic
adoTempid3.Open "ID3", adoConnection
'-------------------------------------------------------
'-- Iterate through the country table and add each --
'-- unique entry to the Country field of ID3. --
'-------------------------------------------------------
While Not adoTempCountry.EOF
sCountry = adoTempCountry!country
With adoTempid3
.AddNew
!Country = sCountry
Select Case UCase(sCountry)
Case "ARGENTINA"
!CountryRegion = "South America"
!CountryLanguage = "Spanish"
Case "AUSTRIA"
!CountryRegion = "Europe"
!CountryLanguage = "German"
Case "BELGIUM"
!CountryRegion = "Europe"
!CountryLanguage = "French"
Case "BRAZIL"
!CountryRegion = "South America"
!CountryLanguage = "Portuguese"
Case "CANADA"
!CountryRegion = "North America"
!CountryLanguage = "English"
Case "DENMARK"
!CountryRegion = "Scandinavia"
!CountryLanguage = "Danish"
Case "FINLAND"
!CountryRegion = "Scandinavia"
!CountryLanguage = "Finnish"
Case "FRANCE"
!CountryRegion = "Europe"
!CountryLanguage = "French"
Case "GERMANY"
!CountryRegion = "Europe"
!CountryLanguage = "German"
Case "IRELAND"
!CountryRegion = "British Isles"
!CountryLanguage = "English"
Case "ITALY"
!CountryRegion = "Europe"
!CountryLanguage = "Italian"
Case "MEXICO"
!CountryRegion = "North America"
!CountryLanguage = "Spanish"
Case "NORWAY"
!CountryRegion = "Scandinavia"
!CountryLanguage = "Norwegian"
Case "POLAND"
!CountryRegion = "Europe"
!CountryLanguage = "Polish"
Case "PORTUGAL"
!CountryRegion = "Mediterranean"
!CountryLanguage = "Portuguese"
Case "SPAIN"
!CountryRegion = "Mediterranean"
!CountryLanguage = "Spanish"
Case "SWEDEN"
!CountryRegion = "Scandinavia"
!CountryLanguage = "Swedish"
Case "SWITZERLAND"
!CountryRegion = "Europe"
!CountryLanguage = "German"
Case "UK"
!CountryRegion = "British Isles"
!CountryLanguage = "English"
Case "USA"
!CountryRegion = "North America"
!CountryLanguage = "English"
Case "VENEZUELA"
!CountryRegion = "South America"
!CountryLanguage = "Spanish"
Case Else
!CountryRegion = "Unknown"
!CountryLanguage = "Unknown"
End Select
.Update
End With
adoTempCountry.MoveNext
Wend
adoTempCountry.Close
adoTempid3.Close
Call gridID3
End Sub
We will open two locally scoped recordsets in this routine, adoTempCountry
and adoTempid3
. First, we wish to find out all of the individual countries where our clients reside. So by selecting a group of records that have a distinct country from the customers table, we know that there will only be one entry per country. Even though we might have 100 clients in Spain, by using the DISTINCT
keyword we are assured we will only get one record for Spain. In this example, we are using the Execute
method of the connection object to return our recordset and assign it to adoTempCountry
.
sSql = "SELECT DISTINCT Country FROM Customers"
Set adoTempCountry = adoConnection.Execute(sSql)
The product manager will probably be doing analysis on many products. So each time a new analysis is done, we will delete all previous entries from the ID3
table so we can start fresh.
adoConnection.Execute ("DELETE * FROM ID3")
Now we are ready to open a recordset for the currently empty ID3
table. Again, we are still using the single form-level adoConnection
object here. So we set a reference to our adoTempid3
table, set two properties, and open the table. We can then update the adoTempid3
table with information that will populate the underlying ID3
table.
Set adoTempid3 = New ADODB.Recordset
adoTempid3.CursorType = adOpenKeyset
adoTempid3.LockType = adLockOptimistic
adoTempid3.Open "ID3", adoConnection
Now we can add our data to the ID3
table. We take the first record of the adoTempCountry
table that contains a record for each individual country where we have customers. Since we know the ID3
table is empty, we add a record with the .AddNew
method and assign the new country to the Country
field. Now here is where we add our new perspective on the country. Depending on the country, we add both a CountryRegion
and a CountryLanguage
to that record. So now each country belongs to a region and language. Why is this important? Well, our ID3 algorithm might determine that all Spanish-speaking people are just craving our new beer. Or that the South American region is not buying as much of our Boysenberry, or that the Mediterranean region is starting to take off. This could provide the product manager with valuable information on how to spend those advertising dollars.
While Not adoTempCountry.EOF
sCountry = adoTempCountry!Country
With adoTempid3
.AddNew
!Country = sCountry
Select Case UCase(sCountry)
Case "ARGENTINA"
!CountryRegion = "South America"
!CountryLanguage = "Spanish"
We can see that a new record is added to the ID3
table for each record in the adoTempCountry
recordset. We know that adoTempCountry
will contain a unique record for each and every country. So we walk through the adoTempCountry
recordset. For each record in the adoTempCountry
recordset we add a new record to the adoTempid3
recordset with our additional information. Once we reach the end of the adoTempCountry
recordset, we close both of them and then call gridID3
, which will actually display the results in a flex grid.
adoTempCountry.Close
adoTempid3.Close
Call gridID3