The ID3 Algorithm

There are many schools of thought on how to extract knowledge from data. Esoteric areas such as genetic algorithms and neural networks are pushing the frontiers of this field on a daily basis. However, there is also a relatively simple technique that does the job. In the late 1970s, Dr. J. Ross Quinlan developed a technique that has evolved into the most commonly used method in expert systems that employ induction to generate rules; that is, programs that try to learn general rules from specific examples. Essentially, the program looks at a ton of data and distills rules from it. The best part is that the program does not need to know anything about the data in advance. It simply looks at the pieces and determines which ones matter most to whatever outcome you are interested in. That is what the ID3 algorithm does: it looks at a stack of data and determines which pieces are more important than the others.
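To make "which pieces are more important" a little more concrete, here is a minimal sketch in Python of the measure ID3 uses to rank attributes, known as information gain: the drop in entropy of the outcome after the data is split on one attribute. The function names, the column names, and the four sample rows are hypothetical and chosen only for illustration. They are not taken from Nwind.mdb, and this is a sketch of the ranking step only, not a full tree builder.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of outcome labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of `target` before the split minus the weighted entropy after
    splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def most_important(rows, attributes, target):
    """Return the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data (not from Nwind.mdb): did the customer reorder?
ORDERS = [
    {"region": "East", "discount": "yes", "reordered": "yes"},
    {"region": "West", "discount": "yes", "reordered": "yes"},
    {"region": "East", "discount": "no", "reordered": "no"},
    {"region": "West", "discount": "no", "reordered": "no"},
]

if __name__ == "__main__":
    for name in ("region", "discount"):
        print(name, information_gain(ORDERS, name, "reordered"))
    print("most important:", most_important(ORDERS, ["region", "discount"], "reordered"))
```

In this toy data, knowing the discount tells you the outcome exactly, while knowing the region tells you nothing, so the sketch ranks discount first. ID3 repeats this ranking on each subset of the data to grow a decision tree.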

Let's take a look at a reasonable example. In the Nwind.mdb file that comes with VB 6.0, there are several tables that simulate a small company's business: orders, products, customers, and so on. Assume that you are the head of IS for the Northwind company. The product manager comes to you and asks whether you could set up a query form in VB so that she can easily retrieve information. She is working on a project to determine how best to spend next year's marketing budget. She says that if she only had SQL capability, she could retrieve the information that she thinks might be important. She already has in mind what she intuitively thinks is important, and she wants to go ahead and put together a set of known relationships.

During this discussion, you start explaining this great ID3 algorithm that you have been reading about. It can find important relationships in mountains of data. She looks at you as if you were the answer to her dreams! "Can you build me one of those?" she challenges. "Yes indeed," you reply.

© 1998 by Wrox Press. All rights reserved.