How Much Data are We Collecting?

The first step is to consider just how much data we are accumulating in our log files. We're concentrating on ODBC logging in this book, because we want to be able to manipulate it using SQL Server and ADO. So, for each hit on our server, our IISLog table will store anything between 100 and 300 bytes of information. Our server gets around 350,000 hits per week, so each week we are adding around 75 MB to the log file database. We're going to completely fill the server's disk within a few months unless we regularly backup and drop outdated entries.

In the Sessions table, things are not as bad. In an average week, we only get around 8,000 session starts. So, over a month, we only add around 32,000 entries of maybe 200 bytes each. However, this is still another 6 MB we are adding to the database every month.

The Number of Database Records

Even though we could manage to store the data for several months with no problem, the other consideration is what happens when we come to use it. With over a million records per month being added to the IISLog table, any query which sorts and summarizes the data is going to get progressively slower as time goes on.

More to the point, do we actually need all this data? As far as the IISLog table goes, much of it is relatively useless. We probably aren't interested in the number of hits on each navigation bar button image, for example. We also have one record per hit, so if 100 visitors load the Home page in one hour, we get around 2,500 records in IISLog (including graphics and other items). Maybe we only want to know how many hits the page itself gets per hour, per day, or even per week.

How Can We Summarize and Archive It?

So it makes sense to summarize the data on a regular basis, and archive the old data that we no longer need. How you do this depends on what information you want to keep. The big question is how do you know what to keep? It would certainly be unfortunate if the first question that your finance director asks, regarding traffic on the Web site, can't be answered because you dropped the data required that would provide the result.

After a great deal of consideration, we chose a staged summary and archive process for our site, and implemented it using a set of separate tables and stored procedures in SQL Server. We've provided the entire database, containing all these tables and stored procedures, with the samples available for this book on our Web site. You can download them and set up the complete process on your server from http://webdev.wrox.co.uk/books/1797.

Bear in mind that this is only one possible solution. You may have different requirements, and the sample code and database structure can easily be adapted to your specific situation.

© 1998 by Wrox Press. All rights reserved.