The Wrox Log File Summary Process

To make storing and querying the data from our IISLog and Sessions tables easier and more efficient, we chose a staged summary process. It processes the two 'live' tables, IISLog and Sessions, on a weekly basis, transferring the summarized data into a series of other tables that can be queried as required. This gives us a number of benefits:

This screenshot shows the complete database displayed in Visual InterDev—you can see the list of all the tables and stored procedures. Some are not concerned directly with logging, in fact you can see the NavErrors and Opinions tables that we met in earlier chapters. Visual InterDev is great for working with databases in SQL Server or other relational databases. As long as you have an ODBC driver for your data store, you can connect to it and view the structure, the tables, and even the data content. Alternatively, you can use a desktop database system like Microsoft Access in much the same way. We described these techniques when we looked at remote administration of your site back in Chapter 6.

Looking at the Tables

The following diagrams are also screenshots from Visual InterDev, showing the tables relevant to our logging process laid out as a Database Diagram. We've annotated the diagrams to indicate which tables are part of each section of the process. The first section is the file access log (hit logging) tables.

The File Access Information Summary Tables

The IISLog table is the 'live' table that IIS is updating with each hit on our server. From it, we summarize the information in four ways on a weekly or monthly basis:

The three main 'summary' tables (in the central column) are:

The DaySummary table, which contains a record for each file that was accessed during each day, indicating the total number of hits and the total volume of data delivered (in KB, rather than bytes) within that day. On current measurements, it contains around 3 % of the number of records that the IISLog table would for a similar period.

Similarly, the MonthSummary table contains the same organization of data, but summarized at monthly intervals. Each record contains the total number of hits and the total volume of data delivered for each file for each month. This table allows us to archive older records, and only contains around one thousandth of the number of records that the IISLog table would for a similar period.

The area we haven’t covered so far is the hourly traffic flow. In this case, we aren't interested in the performance of any particular file. The HourSummary table holds one record per hour (168 records per week for each site) summarizing the total number of hits and the total volume for all files for that hour.

In each of these three summary tables, we also store the IP address of the site that received the hits. Our server hosts three sites, so we can break the data down by site using this value later.

The fourth table, WeekSummary, is the 'odd one out' in that it actually contains a combination of information from the IISLog and Sessions tables. The most frequent inquiry each week is for a brief summary of total traffic patterns for the previous week, compared to other weeks. For this, we created a separate table that holds just one record for each week. This record is the result of a 'cross-tab' operation, in that it contains three totals for each site the total number of hits, total volume in KB, and total number of sessions started. The prefix characters WD, CD and WA refer, respectively, to the Web-Developer, COMDeveloper, and World of ATL sites.

The Session Summary Tables

The session-specific tables are used to hold different sets of summarized information that are extracted from the Sessions table on a weekly basis:

The four main Summary tables are similar to each other in structure, and summarize the values from Sessions in different ways. One reason for this is that some items of information, such as the HTTP-USER-AGENT and HTTP-ACCEPT-LANGUAGE (language/country code) headers, are not sent by every browser or user agent. By summarizing them separately, we can not only create more compact and focussed tables, but we can also omit entries where values are not supplied. 
The RefererSummary table contains one record per week for each individual referrer URL that links to our site. The code that creates the summary for this table automatically removes any query string from the URL (i.e. any part following a ? character).

The UserAgentSummary table contains one record per week for each individual user agent string. These often contain 'via' information if a proxy server passed on the request. For example, it may have something like 'via NetCache version NetApp Release 3.2R4D2' added to the conventional browser UA string. This considerably increases the number of records in the table, but there is no simple formula for removing it automatically.

The CountrySummary table contains one record per week for each individual standard language/country code combination that the browser sends to our server as an indication of which languages it can accept. In almost all cases, this is just one of the language codes defined by W3C. For example, en-ca for English (Canadian). 

The SessionTargetSummary table contains one record per week for each individual URL on our site that the user requests as they first enter the site. In many cases, this is our Home page, but may not be if they enter the site indirectly by requesting another page within the site. It allows us to see which pages are most featured in links on other sites.
The other two tables in this section are used for secondary purposes. The Referers table just holds details of the sites that we got a referral from in the previous week, and is used to construct the 'Links To Us' page in the Resources section of our site. You'll see the process described in detail at the end of this chapter. The other is a lookup table containing all the standard language codes exposed by the browser in the HTTP-ACCEPT-LANGUAGE header, with the matching text description of the language. It is used when we build reports that analyze the data in the CountrySummary table.

The Summary Process in Outline

The following schematic outlines the process of summarizing the data from the IISLog and Sessions tables, using three stored procedures marked A, B and C: 

Procedures A (usp_WeeklyUpdateSummaryHits) and B (usp_UpdateWeeklyHitTotals) both run weekly, while procedure C (usp_MonthlyUpdateSummaryHits) runs on the first day of each month. Together, the three procedures create and maintain a summarized record of the previous week's activity, updated every Sunday (or whatever is the first day of the week depending on your system's internationalization settings). This provides:

Later in the chapter we'll look at how these procedures can be run automatically on a weekly basis. However, we'll see how they work first, by examining each in turn.

The Summary Process in Detail

To be able to summarize the large volumes of data successfully on a regular basis, we need to create routines that can be run without user intervention. As it's just a matter of grouping the records and then adding values together, we might be tempted to use ASP and ADO. We could fetch the recordset, pre-sorted in whatever order was appropriate, and then work through it inserting values into a summary table as we go.

In principle, there's nothing wrong with this, especially if the task is highly complex. However, it's by no means the best use of server resources. A far better way is to use stored procedures inside our database—and SQL Server is ideal for this task as we can write them using Transact SQL. This is a simple programming language that provides fine control over the process, yet combines robustness, speed and efficiency. A stored procedure is compiled automatically by SQL Server, and runs entirely within the confines of the database management system.

The Logging Summary Stored Procedures

To summarize the data from our IISLog and Session tables into the other tables, we use three stored procedures: 
Procedure usp_WeeklyUpdateSummaryHits (A in the earlier diagram) pulls out the values for the previous week from the IISLog table and summarizes them into the DaySummary and HourSummary tables. It then deletes the previous week's records from the IISLog table to minimize the size of the database.
Procedure usp_MonthlyUpdateSummaryHits (C in the earlier diagram) takes the DaySummary table and further summarizes any of the contents that are more than three months old so that there is just one record per month for each target URL, for each site. It then removes any records that are more than three months old from the DaySummary and HourSummary tables. We only keep hourly and daily traffic data for three months. This procedure also sweeps through the four Summary tables from the Sessions part of the process, removing any records that are more than six months old.
Procedure usp_UpdateWeeklyHitTotals (B in the earlier diagram) has two different sets of tasks to perform, and is by far the most complex of the procedures—running to four full pages of text when listed. The first section of the procedure creates the separate weekly summary of hits, traffic volume and sessions for the WeekSummary table, using data from the DaySummary and Sessions tables. The second part uses the data from the Sessions table to update the CountrySummary, RefererSummary, UserAgentSummary and SessionTargetSummary tables. When complete, it deletes the previous week's records from the Sessions table to minimize the size of the database.

We've listed the procedures in full here, with comments added to help you understand how they work. We also provide a brief description of each one with the listing. The procedures are in the database that you can create from the SQL scripts provided with the rest of the samples for this book.

If you want to learn more about Transact SQL (TSQL), check out the Wrox Press book Professional SQL Server 6.5 Admin (ISBN 1-874416-49-4) or refer to SQL Server Books Online—an optional item when you install SQL Server or the SQL Server Client Utilities.

The Weekly Update Summary Hits Procedure

This procedure combines three SQL statements. The first two are aggregate statements, which use the GROUP BY clause. This allows them to summarize the number of hits and total the number of bytes for the period defined in the WHERE clause of the statement. For example:
...
SELECT TSumDate=MAX(logtime),                -- have to use an aggregate function
       TSiteIP=MAX(serverip),                -- on all fields when using a GROUP BY
       TTarget=MAX(target),                  -- clause 
       TKBytes=SUM(bytessent) / 1024,        -- SUM and convert to kbytes
       THitCount=COUNT(target)               -- count number of entries, one per hit
FROM IISLog 
WHERE (DATEPART(year, logtime) = @thisyear   -- up to previous week only
      AND DATEPART(week, logtime) < @thisweek) 
      OR (DATEPART(year, logtime) < @thisyear)   
GROUP BY serverip,                           -- IP address of Web site on server
         target,                             -- URL of target file
         CONVERT(char(12), logtime, 102)     -- yyyy.mm.dd format
...
The result of this statement is a recordset sorted and summarized in the correct format for our DaySummary table. We insert the values into this table by adding the INSERT clause to the beginning of the statement:
...
INSERT INTO DaySummary                       -- update DaySummary table 
SELECT TSumDate=MAX(logtime),                -- statement as above
       TSiteIP=MAX(serverip),
       TTarget=MAX(target), 
       TKBytes=SUM(bytessent) / 1024,
       THitCount=COUNT(target) 
FROM IISLog 
WHERE etc...
...
The third SQL statement is a simple DELETE statement that removes the records we've summarized from the IISLog table:
DELETE                                          -- delete last weeks records
FROM IISLog 
WHERE (DATEPART(year, logtime) = @thisyear      -- records for current year
      AND DATEPART(week, logtime) < @thisweek)  -- where week < current week
      OR (DATEPART(year, logtime) < @thisyear)  -- or records for previous year(s) 

The remainder of the code is the definition of the variables we'll use in the procedure, the transaction begin and end statements that allow us to abort all the updates if there's an error, and the SELECT and RETURN statements that create the return value and status. The return value is only visible when the procedure is run interactively, i.e. from a SQL Server Query window, or when using ADO:

 

...
SELECT 'Error - archive aborted.'            -- something went wrong
RETURN(1)
...

Here's the complete procedure listing:

/* Object: Stored Procedure dbo.usp_weeklyupdatesummaryhits    */
/* Runs weekly, before the procedure usp_updateweeklyhittotals */
CREATE PROCEDURE usp_weeklyupdatesummaryhits AS

DECLARE @thisweek int
DECLARE @thisyear int
DECLARE @dayrowsupdated int
DECLARE @hourrowsupdated int
SELECT @thisweek = DATEPART(week, GETDATE())
SELECT @thisyear = DATEPART(year, GETDATE())
SELECT @dayrowsupdated = 0
SELECT @hourrowsupdated = 0

BEGIN TRANSACTION

INSERT INTO DaySummary                       -- update DaySummary table 
SELECT TSumDate=MAX(logtime), 
       TSiteIP=MAX(serverip),
       TTarget=MAX(target), 
       TKBytes=SUM(bytessent) / 1024,        -- convert to kbytes
       THitCount=COUNT(target) 
FROM IISLog 
WHERE (DATEPART(year, logtime) = @thisyear   -- up to previous week only
      AND DATEPART(week, logtime) < @thisweek) 
      OR (DATEPART(year, logtime) < @thisyear)   
GROUP BY serverip, 
         target, 
         CONVERT(char(12), logtime, 102)     -- yyyy.mm.dd format

SELECT @dayrowsupdated = @@ROWCOUNT          -- built-in variable
IF (@dayrowsupdated = 0) GOTO on_norows      -- no records inserted
IF (@@ERROR <> 0) GOTO on_error              -- some other kind of error

INSERT INTO HourSummary                      -- update HourSummary table
SELECT TSumDate=MAX(logtime), 
       TSumHour=MAX(DATEPART(hour, logtime)),
       TSiteIP=MAX(serverip),
       TKBytes=SUM(bytessent) / 1024,        -- convert to kbytes
       THitCount=COUNT(target) 
FROM IISLog 
WHERE (DATEPART(year, logtime) = @thisyear   -- up to previous week only
      AND DATEPART(week, logtime) < @thisweek) 
      OR (DATEPART(year, logtime) < @thisyear)   
GROUP BY serverip, 
         CONVERT(char(12), logtime, 102),
         DATEPART(hour, logtime)

SELECT @hourrowsupdated = @@ROWCOUNT         -- built-in variable
IF (@hourrowsupdated = 0) GOTO on_norows     -- no records inserted
IF (@@ERROR <> 0) GOTO on_error              -- some other kind of error

DELETE                                       -- delete last weeks records
FROM IISLog 
WHERE (DATEPART(year, logtime) = @thisyear 
      AND DATEPART(week, logtime) < @thisweek) 
      OR (DATEPART(year, logtime) < @thisyear)   

IF (@@ERROR <> 0) GOTO on_error              -- some kind of error

/* create return appropriate message for display when run interactively */
SELECT 'Added ' + CONVERT(varchar(8), @dayrowsupdated) + ' DaySummary records and ' 
       + CONVERT(varchar(8), @hourrowsupdated) + ' HourSummary records.'

COMMIT TRANSACTION                           -- all OK so commit all updates
RETURN (0)

on_error:
SELECT 'Error - archive aborted.'            -- something went wrong
ROLLBACK TRANSACTION                         -- so undo all updates
RETURN(1)

on_norows:                                   -- no rows were updated in a table
SELECT 'Added ' + CONVERT(varchar(8), @dayrowsupdated) + ' DaySummary records and ' 
       + CONVERT(varchar(8), @hourrowsupdated) + ' HourSummary records.'
ROLLBACK TRANSACTION                         -- so undo all other updates
RETURN(1) 

The Monthly Update Summary Hits Procedure

This procedure first summarizes the values in the DaySummary table by month for records older than three months, and inserts them into the MonthSummary table using an INSERT statement like the ones we saw in the previous section. Then it deletes any records more than three months old from the DaySummary and HourSummary tables, using simple DELETE statements:
...
DELETE                                          -- delete old records
FROM DaySummary                                 -- from DaySummary table
WHERE DATEDIFF(month, TSumDate, GETDATE()) > 3
...
The next part of the procedure uses four very similar DELETE statements to delete records that are more than six months old from the four session summary tables. The WHERE clause is made up of two sections, separated by OR:

 

...
DELETE 
FROM SessionTargetSummary                       -- from SessionTargetSummary table
WHERE (TYearNumber = DATEPART(year, GETDATE())          -- records from this year  
  AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))   -- more than 25 weeks old
   OR (TYearNumber < DATEPART(year, GETDATE())          -- or records from last year
  AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))   -- more than 25 weeks old
...
The first section of the WHERE clause specifies any records from the current year that have a week number that is less than this week minus 25. This will only produce useful results when the current week number is greater than 26—prior to this it produces zero or negative values which will have no effect. However, it will work when the current week is in the second half of the year, producing values from 1 to 27 (or 28 in a 53 week year):
WHERE (TYearNumber = DATEPART(year, GETDATE())          -- records from this year  
  AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))   -- more than 25 weeks old

The second part of the clause will encompass any records from the first half of earlier years, i.e. those that have week numbers from 27 upwards. This part of the clause comes into effect when the current week is in the range from 1 up to 26 (or 27 in a 53 week year):

   OR (TYearNumber < DATEPART(year, GETDATE())          -- or records from last year
  AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))   -- more than 25 weeks old

So, between them, the two clauses will sweep records older than 6 months from the table. Here's the complete procedure listing, again the remainder of the code is variable definitions, error handling, and the transaction begin and end statements:

/* Object: Stored Procedure dbo.usp_monthlyupdatesummaryhits */
/* Runs monthly on first day of the month                    */
CREATE PROCEDURE usp_monthlyupdatesummaryhits AS

DECLARE @rowsupdated int

BEGIN TRANSACTION

INSERT INTO MonthSummary                        -- update the MonthSummary table
SELECT TSumDate=MAX(TSumDate), 
       TSiteIP=MAX(TSiteIP),
       TTarget=MAX(TTarget), 
       TKBytes=SUM(TKBytes), 
       THitCount=SUM(THitCount) 
FROM DaySummary 
WHERE DATEDIFF(month, TSumDate, GETDATE()) > 3  -- records more than 3 months old
GROUP BY TSiteIP, 
         TTarget,          
         DATEPART(month, TSumDate)

SELECT @rowsupdated = @@ROWCOUNT                -- built-in variable
IF (@rowsupdated = 0) GOTO on_norows            -- no rows updated
IF (@@ERROR <> 0) GOTO on_error                 -- some other error

DELETE                                          -- delete old records 
FROM HourSummary                                -- from HourSummary table  
WHERE DATEDIFF(month, TSumDate, GETDATE()) > 3

IF (@@ERROR <> 0) GOTO on_error                 -- error during delete

DELETE                                          -- delete old records
FROM DaySummary                                 -- from DaySummary table
WHERE DATEDIFF(month, TSumDate, GETDATE()) > 3

IF (@@ERROR <> 0) GOTO on_error                 -- error during delete

/* Now delete all records more than a year old from Session Summary tables */
DELETE 
FROM CountrySummary                             -- from CountrySummary table
WHERE (TYearNumber = DATEPART(year, GETDATE()) 
  AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))
   OR (TYearNumber < DATEPART(year, GETDATE()) 
  AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))

IF (@@ERROR <> 0) GOTO on_error                 -- error during delete

DELETE 
FROM RefererSummary                             -- from RefererSummary table
WHERE (TYearNumber = DATEPART(year, GETDATE()) 
  AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))
   OR (TYearNumber < DATEPART(year, GETDATE()) 
  AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))

IF (@@ERROR <> 0) GOTO on_error                 -- error during delete

DELETE 
FROM SessionTargetSummary                       -- from SessionTargetSummary table
WHERE (TYearNumber = DATEPART(year, GETDATE()) 
  AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))
   OR (TYearNumber < DATEPART(year, GETDATE()) 
  AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))

IF (@@ERROR <> 0) GOTO on_error                 -- error during delete

DELETE 
FROM UserAgentSummary                           -- from UserAgentSummary table
WHERE (TYearNumber = DATEPART(year, GETDATE()) 
  AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))
   OR (TYearNumber < DATEPART(year, GETDATE()) 
  AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))

IF (@@ERROR <> 0) GOTO on_error                 -- error during delete

SELECT 'Archived ' + CONVERT(varchar(8), @rowsupdated) + ' records.'

COMMIT TRANSACTION                              -- all OK so commit all updates
RETURN (0)

on_error:
SELECT 'Error - archive aborted.'               -- something went wrong
ROLLBACK TRANSACTION                            -- so abort all updates
RETURN (1)

on_norows:SELECT 'No records archived.'         -- no records to archive
ROLLBACK TRANSACTION                            -- so abort all updates
RETURN (1)

The Update Weekly Hit Totals Procedure

Overall, the third procedure is the most complex, but uses mainly the same kinds of SQL statements we've seen in the previous procedures. Part of this complexity is because it has to work properly even if not run every week, in case there is a server or database problem.

When we use an INSERT statement that contains a SELECT statement, as we've done in earlier examples, we can group the data using a GROUP BY clause, and know that all the records in the recordset created by the SELECT part will be inserted into the target table. This won’t work in the first part of the following procedure, because we are manually extracting values from other tables into variables, then creating the new single weekly record for the WeekSummary table from these values.

This means that we need to use a loop that starts with the week after the last existing week's record in the table, and carries on until it gets to the week previous to the current one. If we run the procedure every week, the loop will only be executed once. However, if we miss a week (or more than one week), it will catch up the next time we do run it. The code that creates the loop looks like this:

...
SELECT @ToYear = DATEPART(year, GETDATE()),    -- current week and year
       @ToWeek = DATEPART(week, GETDATE())     -- used in WHILE condition

SELECT @ThisYear = MAX(TYearNumber),           -- last week and year already in 
       @ThisWeek = MAX(TWeekNumber)            -- the WeekSummary table and 
FROM WeekSummary                               -- used to select the records

SELECT @ThisWeek = @ThisWeek + 1               -- go to week after the last
IF (@ThisWeek = 54)                            -- one already in the table
  BEGIN
    SELECT @ThisWeek = 1
    SELECT @ThisYear = @ThisYear + 1
  END

WHILE (@ThisYear = @ToYear AND @ThisWeek < @ToWeek) OR (@ThisYear < @ToYear)
BEGIN                                          -- start of actions in WHILE loop

...
... statements to collect value and update WeekSummary table here
...

SELECT @ThisWeek = @ThisWeek + 1               -- now move on to next week, we have
IF (@ThisWeek = 54)                            -- to do each week in turn because
  BEGIN                                        -- we're not creating a recordset to
    SELECT @ThisWeek = 1                       -- INSERT, just individual values 
    SELECT @ThisYear = @ThisYear + 1           -- for one week at a time
  END

END   -- end of WHILE loop, go back and start loop again until condition met
...
Because the entire procedure listing is so long, we've split it into two halves. Here's the first half, which carries out the task of updating the WeekSummary table:
/* Object: Stored Procedure dbo.usp_updateweeklyhittotals  */
/* Runs weekly, after the procedure usp_weeklyupdatesummaryhits */

CREATE PROCEDURE usp_updateweeklyhittotals AS

DECLARE @ToYear int
DECLARE @ToWeek int
DECLARE @ThisYear int
DECLARE @ThisWeek int
DECLARE @WDHits int
DECLARE @WDKBytes int
DECLARE @CDHits int
DECLARE @CDKBytes int
DECLARE @WAHits int
DECLARE @WAKBytes int
DECLARE @WDSessions int
DECLARE @CDSessions int
DECLARE @WASessions int

BEGIN TRANSACTION

SELECT @ToYear = DATEPART(year, GETDATE()),    -- current week and year
       @ToWeek = DATEPART(week, GETDATE())     -- used in WHILE condition

SELECT @ThisYear = MAX(TYearNumber),           -- last week and year already in 
       @ThisWeek = MAX(TWeekNumber)            -- the WeekSummary table and 
FROM WeekSummary                               -- used to select the records

SELECT @ThisWeek = @ThisWeek + 1               -- go to week after the last
IF (@ThisWeek = 54)                            -- one already in the table
  BEGIN
    SELECT @ThisWeek = 1
    SELECT @ThisYear = @ThisYear + 1
  END

/* Now we loop for each week up to and including the previous week */
WHILE (@ThisYear = @ToYear AND @ThisWeek < @ToWeek) OR (@ThisYear < @ToYear)
BEGIN                                          -- start of actions in WHILE loop

SELECT @WDHits = SUM(THitCount),               -- get total hits and volume for 
       @WDKBytes = SUM(TKBytes)                -- Web-Developer site
FROM DaySummary
WHERE DATEPART(year, TSumDate) = @ThisYear 
      AND DATEPART(week, TSumDate) = @ThisWeek
      AND TSiteIP = '194.73.51.228'            -- Web Developer site IP address

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

SELECT @WDSessions = COUNT(EventDateTime)      -- get total number sessions for
FROM Sessions                                  -- Web-Developer site
WHERE DATEPART(year, EventDateTime) = @ThisYear
      AND DATEPART(week, EventDateTime) = @ThisWeek
      AND EventType = 'New Session'
      AND HostIP = '194.73.51.228'             -- Web Developer site IP address

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

SELECT @CDHits = SUM(THitCount),               -- get total hits and volume for 
       @CDKBytes = SUM(TKBytes)                -- COMDeveloper site
FROM DaySummary
WHERE DATEPART(year, TSumDate) = @ThisYear
      AND DATEPART(week, TSumDate) = @ThisWeek
      AND TSiteIP = '194.73.51.229'            -- COMDeveloper site IP address

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

SELECT @CDSessions = COUNT(EventDateTime)      -- get total number sessions for
FROM Sessions                                  -- COMDeveloper site
WHERE DATEPART(year, EventDateTime) = @ThisYear
      AND DATEPART(week, EventDateTime) = @ThisWeek
      AND EventType = 'New Session'
      AND HostIP = '194.73.51.229'             -- COMDeveloper site IP address

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

SELECT @WAHits = SUM(THitCount),               -- get total hits and volume for
       @WAKBytes = SUM(TKBytes)                -- World Of ATL site
FROM DaySummary
WHERE DATEPART(year, TSumDate) = @ThisYear
      AND DATEPART(week, TSumDate) = @ThisWeek
      AND TSiteIP = '194.73.51.230'            -- World Of ATL site IP address

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

SELECT @WASessions = COUNT(EventDateTime)      -- get total number sessions for
FROM Sessions                                  -- World Of ATL site
WHERE DATEPART(year, EventDateTime) = @ThisYear
      AND DATEPART(week, EventDateTime) = @ThisWeek
      AND EventType = 'New Session'
      AND HostIP = '194.73.51.230'             -- World Of ATL site IP address

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

INSERT INTO WeekSummary                        -- now add record to WeekSummary
SELECT @ThisYear,                              -- table using values collected
       @ThisWeek,                              -- for this week
       ISNULL(@WDHits, 0),
       ISNULL(@WDKBytes, 0),                   -- if value is NULL insert zero
       ISNULL(@CDHits, 0),                     -- for numeric fields instead
       ISNULL(@CDKBytes, 0),
       ISNULL(@WAHits, 0),
       ISNULL(@WAKBytes, 0),
       ISNULL(@WDSessions, 0),
       ISNULL(@CDSessions, 0),
       ISNULL(@WASessions, 0)

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

SELECT @ThisWeek = @ThisWeek + 1               -- now move on to next week, we have
IF (@ThisWeek = 54)                            -- to do each week in turn because
  BEGIN                                        -- we're not creating a recordset to
    SELECT @ThisWeek = 1                       -- INSERT, just individual values 
    SELECT @ThisYear = @ThisYear + 1           -- for one week at a time
  END
END   -- end of WHILE loop, go back and start loop again until condition met
...
Having updated the WeekSummary table, our procedure now has to carry out the other weekly task—summarizing the information in the Sessions table for the UserAgentSummary, RefererSummary, CountrySummary and SessionTargetSummary tables. Each table is updated with an INSERT statement like the ones we've used in the previous procedures. 
However, there are a couple of wrinkles we need to handle. The referrer values collected in Sessions will include any query string that was appended to the URL when the visitor first hit our site, so we need to remove this before grouping the records; otherwise, we'll get multiple entries for the same site where the query string is different. Secondly, the language/country value sent by the browser can contain several entries in a comma-delimited list. We can’t cope with this and so we'll remove all except the first one in this case.
TSQL allows us to use a CASE statement to carry out any one of two or more actions, based on the result of a test expression. In the example of the referrer string, we just look for a ? character in the URL, then remove it and any characters after the ? if we find one. If we don’t find one, we use the original value of the field:
       ...
       /* Use a CASE statement to remove any query string from the URL */
       RefURL=MAX(
         CASE WHEN CHARINDEX('?', Referer) > 5 
         THEN SUBSTRING(Referer, 1, CHARINDEX('?', Referer) - 1) 
         ELSE Referer 
         END),
         ...
The only other difficulty is that we have to repeat this CASE statement in the GROUP BY clause to get the correct grouping. If we didn't, we'd get repeated records for any URLs that differed only in the query string. Here's the remaining half of the procedure:
...
/* Part 2, now update the for Sessions Summary tables for last week */
INSERT INTO UserAgentSummary                   -- update UserAgentSummary table
SELECT UAYear = MAX(DATEPART(year, EventDateTime)),
       UAWeek = MAX(DATEPART(week, EventDateTime)),
       UAText = MAX(UserAgent),
       UACount = COUNT(UserAgent),
       UAHostIP = MAX(HostIP) 
FROM Sessions 
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0 
      AND UserAgent IS NOT NULL                -- ignore any records with no 
      AND UserAgent <> ''                      -- value for UserAgent 
GROUP BY UserAgent,
         DATEPART(year, EventDateTime),
         DATEPART(week, EventDateTime),
         HostIP
 
IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

INSERT INTO RefererSummary                     -- update RefererSummary table
SELECT RefYear = MAX(DATEPART(year, EventDateTime)),
       RefWeek = MAX(DATEPART(week, EventDateTime)),
       /* Use a CASE statement to remove any query string from the URL */
       RefURL=MAX(
         CASE WHEN CHARINDEX('?', Referer) > 5 
         THEN SUBSTRING(Referer, 1, CHARINDEX('?', Referer) - 1) 
         ELSE Referer 
         END),
       RefCount=COUNT(Referer), 
       RefHostIP = MAX(HostIP) 
FROM Sessions 
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0 
      AND Referer IS NOT NULL                  -- ignore any records with no 
      AND Referer <> ''                        -- value for Referrer
      AND CHARINDEX('.wrox.co', Referer) = 0   -- or from all our own servers
      AND CHARINDEX('file:', Referer) = 0      -- or from a user's disk file
GROUP BY CASE WHEN CHARINDEX('?', Referer) > 5 
         THEN SUBSTRING(Referer, 1, CHARINDEX('?', Referer) - 1) 
         ELSE Referer                          -- must repeat CASE statement here
         END,                                  -- as well for correct grouping
         DATEPART(year, EventDateTime),
         DATEPART(week, EventDateTime),
         HostIP

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

INSERT INTO SessionTargetSummary               -- update SessionTargetSummary table
SELECT TargYear = MAX(DATEPART(year, EventDateTime)),
       TargWeek = MAX(DATEPART(week, EventDateTime)),
       TargText = MAX(URL),
       TargCount = COUNT(URL),
       TargHostIP = MAX(HostIP) 
FROM Sessions 
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0 
      AND URL IS NOT NULL                      -- ignore any records with no 
      AND URL <> ''                            -- value for Target URL
GROUP BY URL,
         DATEPART(year, EventDateTime),
         DATEPART(week, EventDateTime),
         HostIP

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

INSERT INTO CountrySummary                     -- update CountrySummary table
SELECT CoYear = MAX(DATEPART(year, EventDateTime)),
       CoWeek = MAX(DATEPART(week, EventDateTime)),
       /* Use a CASE statement to remove any secondary codes after a comma */
       CoText = MAX(
         CASE WHEN CHARINDEX(',', UALanguage) > 1
         THEN SUBSTRING(UALanguage, 1, CHARINDEX(',', UALanguage) - 1) 
         ELSE UALanguage 
         END),
       CoCount = COUNT(UALanguage),
       CoHostIP = MAX(HostIP) 
FROM Sessions 
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0 
      AND UALanguage IS NOT NULL               -- ignore any records with no
      AND UALanguage <> ''                     -- value for language
GROUP BY CASE WHEN CHARINDEX(',', UALanguage) > 1
         THEN SUBSTRING(UALanguage, 1, CHARINDEX(',', UALanguage) - 1) 
         ELSE UALanguage                       -- must repeat CASE statement here
         END,                                  -- as well for correct grouping
         DATEPART(year, EventDateTime),
         DATEPART(week, EventDateTime),
         HostIP

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

DELETE                                         -- now delete last weeks records
FROM Sessions                                  -- from Sessions table
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0

IF (@@ERROR <> 0) GOTO on_error                -- something went wrong

COMMIT TRANSACTION                             -- all OK, so commit updates
SELECT 'Weekly Hits and Sessions summary succeeded.'
RETURN (0)

on_error:                                      -- something went wrong
ROLLBACK TRANSACTION                           -- so abort all updates
SELECT 'Error - Weekly Hits and Sessions summary aborted.'
RETURN (1)

Automating the Summarization Process

Having created the table and procedures that summarize our logging data, we can consider how we will execute the procedures on a regular basis. We could create ASP pages that call them, and open these pages in a browser to run the procedures. This could be a regular task that we performed, and we just have to hope that our stand-in remembers to do it while we're away from the office—perhaps lying on that foreign beach we spend the rest of the year dreaming about.

However, there are other ways to run a SQL Server stored procedure. We can do it directly from a Windows Command prompt, using the DOS-based ISQL application that is included with the SQL Server Client Utilities. For example, this command would (given appropriate server and login details) execute the usp_weeklyupdatesummaryhits procedure and then exit from ISQL:

C:\>isql -Sserver -Uusername -Ppassword -Q"iislogs.dbo.usp_weeklyupdatesummaryhits"

This would allow us to automate the process by setting up Windows Schedule service to execute this command for us on a weekly (or monthly) basis. It's not the way that we do it, but we'll see how it could be used.

Using the Windows NT Schedule Service

Before you can use the Schedule service, you might need to start it using the Services icon in Control Panel. If you decide to adopt this technique, change the Startup mode to Automatic:

The Schedule service itself uses a command line interface, although there is an add-in in the Windows NT Resource Kit that provides a graphical interface and there are plenty of other third-party utilities that do the same. We'll be using the command line here.

We can create an ordinary DOS batch file containing the commands to run our two weekly procedures via ISQL, and drop it into the Mssql folder on drive C:
isql -Smyserver -Uanonymous -P -Q"iislogs.dbo.usp_weeklyupdatesummaryhits"
isql -Smyserver -Uanonymous -P -Q"iislogs.dbo.usp_updateweeklyhittotals"

Then instruct the Schedule service to run our batch file every week, at 2:00 AM on Sunday, by typing the following at the Command prompt:

C:\>at 02:00 /every:Sunday "c:\mssql\updatelogs.bat"

It is possible to create several batch files, and have them run at different times. We can also provide the command to run ISQL (or any other process) as the last parameter of the at command. But, while all this works fine, there is a better way. Instead, we'll use the SQL Server Executive.

Using the SQL Server Executive

The option we chose for automating our log summary system is the built-in scheduling capabilities of SQL Server—via the SQL Server Executive program that is installed with SQL Server. This has a graphical interface that we can use from within the SQL Server Enterprise Manager, making it easy to view the procedures as we set up scheduled tasks for them. The following screenshot shows the entry for our usp_weeklyupdatesummaryhits procedure selected, and we can edit the task properties by clicking the Edit Task toolbar button:

This opens a dialog that we can use to create a new task, or edit an existing one. You can see the task type is TSQL, the database we want to work with is iislogs, and the command is exec usp_weeklyupdatesummaryhits. By clicking the Change button, we can specify the schedule details for when it will be run:

This dialog also provides a button to display the task history:

Another alternative would be to create a script that used the new Windows NT Scripting Host to automate the process. For more information, check out http://www.microsoft.com/scripting/windowshost/default.htm.

© 1998 by Wrox Press. All rights reserved.