To make storing and querying the data from our IISLog and Sessions tables easier and more efficient, we chose a staged summary process. It processes the two 'live' tables, IISLog and Sessions, on a weekly basis, transferring the summarized data into a series of other tables that can be queried as required. This gives us a number of benefits:
This screenshot shows the complete database displayed in Visual InterDev—you can see the list of all the tables and stored procedures. Some are not concerned directly with logging, in fact you can see the NavErrors and Opinions tables that we met in earlier chapters. Visual InterDev is great for working with databases in SQL Server or other relational databases. As long as you have an ODBC driver for your data store, you can connect to it and view the structure, the tables, and even the data content. Alternatively, you can use a desktop database system like Microsoft Access in much the same way. We described these techniques when we looked at remote administration of your site back in Chapter 6.
The following diagrams are also screenshots from Visual InterDev, showing the tables relevant to our logging process laid out as a Database Diagram. We've annotated the diagrams to indicate which tables are part of each section of the process. The first section is the file access log (hit logging) tables.
The IISLog table is the 'live' table that IIS is updating with each hit on our server. From it, we summarize the information in four ways on a weekly or monthly basis:
The three main 'summary' tables (in the central column) are:
The DaySummary table, which contains a record for each file that was accessed during each day, indicating the total number of hits and the total volume of data delivered (in KB, rather than bytes) within that day. On current measurements, it contains around 3 % of the number of records that the IISLog table would for a similar period. Similarly, the MonthSummary table contains the same organization of data, but summarized at monthly intervals. Each record contains the total number of hits and the total volume of data delivered for each file for each month. This table allows us to archive older records, and only contains around one thousandth of the number of records that the IISLog table would for a similar period. The area we haven’t covered so far is the hourly traffic flow. In this case, we aren't interested in the performance of any particular file. The HourSummary table holds one record per hour (168 records per week for each site) summarizing the total number of hits and the total volume for all files for that hour.In each of these three summary tables, we also store the IP address of the site that received the hits. Our server hosts three sites, so we can break the data down by site using this value later.
The fourth table, WeekSummary, is the 'odd one out' in that it actually contains a combination of information from the IISLog and Sessions tables. The most frequent inquiry each week is for a brief summary of total traffic patterns for the previous week, compared to other weeks. For this, we created a separate table that holds just one record for each week. This record is the result of a 'cross-tab' operation, in that it contains three totals for each site the total number of hits, total volume in KB, and total number of sessions started. The prefix characters WD,
CD and
WA refer, respectively, to the
Web-Developer,
COMDeveloper, and
World
of
ATL sites.
The session-specific tables are used to hold different sets of summarized information that are extracted from the Sessions table on a weekly basis:
The four main Summary tables are similar to each other in structure, and summarize the values from Sessions in different ways. One reason for this is that some items of information, such as the HTTP-USER-AGENT and HTTP-ACCEPT-LANGUAGE (language/country code) headers, are not sent by every browser or user agent. By summarizing them separately, we can not only create more compact and focussed tables, but we can also omit entries where values are not supplied.
The RefererSummary table contains one record per week for each individual referrer URL that links to our site. The code that creates the summary for this table automatically removes any query string from the URL (i.e. any part following a ? character). The UserAgentSummary table contains one record per week for each individual user agent string. These often contain 'via' information if a proxy server passed on the request. For example, it may have something like '
viaNetCache
version
NetApp
Release
3.2R4D2'
added to the conventional browser UA string. This considerably increases the number of records in the table, but there is no simple formula for removing it automatically.
The CountrySummary table contains one record per week for each individual standard language/country code combination that the browser sends to our server as an indication of which languages it can accept. In almost all cases, this is just one of the language codes defined by W3C. For example, en-ca for English (Canadian).
The SessionTargetSummary table contains one record per week for each individual URL on our site that the user requests as they first enter the site. In many cases, this is our
Homepage, but may not be if they enter the site indirectly by requesting another page within the site. It allows us to see which pages are most featured in links on other sites.
The other two tables in this section are used for secondary purposes. The Referers table just holds details of the sites that we got a referral from in the previous week, and is used to construct the '
LinksTo
Us
' page in the
Resourcessection of our site. You'll see the process described in detail at the end of this chapter. The other is a lookup table containing all the standard language codes exposed by the browser in the HTTP-ACCEPT-LANGUAGE header, with the matching text description of the language. It is used when we build reports that analyze the data in the CountrySummary table.
The following schematic outlines the process of summarizing the data from the IISLog and Sessions tables, using three stored procedures marked
A,
Band
C:
Procedures
A(usp_WeeklyUpdateSummaryHits) and
B(usp_UpdateWeeklyHitTotals) both run weekly, while procedure
C(usp_MonthlyUpdateSummaryHits) runs on the first day of each month. Together, the three procedures create and maintain a summarized record of the previous week's activity, updated every Sunday (or whatever is the first day of the week depending on your system's internationalization settings). This provides:
Later in the chapter we'll look at how these procedures can be run automatically on a weekly basis. However, we'll see how they work first, by examining each in turn.
To be able to summarize the large volumes of data successfully on a regular basis, we need to create routines that can be run without user intervention. As it's just a matter of grouping the records and then adding values together, we might be tempted to use ASP and ADO. We could fetch the recordset, pre-sorted in whatever order was appropriate, and then work through it inserting values into a summary table as we go.
In principle, there's nothing wrong with this, especially if the task is highly complex. However, it's by no means the best use of server resources. A far better way is to use stored procedures inside our database—and SQL Server is ideal for this task as we can write them using Transact SQL. This is a simple programming language that provides fine control over the process, yet combines robustness, speed and efficiency. A stored procedure is compiled automatically by SQL Server, and runs entirely within the confines of the database management system.
To summarize the data from our IISLog and Session tables into the other tables, we use three stored procedures:
Procedure usp_WeeklyUpdateSummaryHits (
Ain the earlier diagram) pulls out the values for the previous week from the IISLog table and summarizes them into the DaySummary and HourSummary tables. It then deletes the previous week's records from the IISLog table to minimize the size of the database.
Procedure usp_MonthlyUpdateSummaryHits (
Cin the earlier diagram) takes the DaySummary table and further summarizes any of the contents that are more than three months old so that there is just one record per month for each target URL, for each site. It then removes any records that are more than three months old from the DaySummary and HourSummary tables. We only keep hourly and daily traffic data for three months. This procedure also sweeps through the four Summary tables from the Sessions part of the process, removing any records that are more than six months old.
Procedure usp_UpdateWeeklyHitTotals (
Bin the earlier diagram) has two different sets of tasks to perform, and is by far the most complex of the procedures—running to four full pages of text when listed. The first section of the procedure creates the separate weekly summary of hits, traffic volume and sessions for the WeekSummary table, using data from the DaySummary and Sessions tables. The second part uses the data from the Sessions table to update the CountrySummary, RefererSummary, UserAgentSummary and SessionTargetSummary tables. When complete, it deletes the previous week's records from the Sessions table to minimize the size of the database.
We've listed the procedures in full here, with comments added to help you understand how they work. We also provide a brief description of each one with the listing. The procedures are in the database that you can create from the SQL scripts provided with the rest of the samples for this book.
If you want to learn more about Transact SQL (TSQL), check out the Wrox Press book Professional SQL Server 6.5 Admin (ISBN 1-874416-49-4) or refer to SQL Server Books Online—an optional item when you install SQL Server or the SQL Server Client Utilities.
This procedure combines three SQL statements. The first two are aggregate statements, which use the GROUP BY clause. This allows them to summarize the number of hits and total the number of bytes for the period defined in the WHERE clause of the statement. For example:
...
SELECT TSumDate=MAX(logtime), -- have to use an aggregate function
TSiteIP=MAX(serverip), -- on all fields when using a GROUP BY
TTarget=MAX(target), -- clause
TKBytes=SUM(bytessent) / 1024, -- SUM and convert to kbytes
THitCount=COUNT(target) -- count number of entries, one per hit
FROM IISLog
WHERE (DATEPART(year, logtime) = @thisyear -- up to previous week only
AND DATEPART(week, logtime) < @thisweek)
OR (DATEPART(year, logtime) < @thisyear)
GROUP BY serverip, -- IP address of Web site on server
target, -- URL of target file
CONVERT(char(12), logtime, 102) -- yyyy.mm.dd format
...
The result of this statement is a recordset sorted and summarized in the correct format for our DaySummary table. We insert the values into this table by adding the INSERT clause to the beginning of the statement:
...
INSERT INTO DaySummary -- update DaySummary table
SELECT TSumDate=MAX(logtime), -- statement as above
TSiteIP=MAX(serverip),
TTarget=MAX(target),
TKBytes=SUM(bytessent) / 1024,
THitCount=COUNT(target)
FROM IISLog
WHERE etc...
...
The third SQL statement is a simple DELETE statement that removes the records we've summarized from the IISLog table:
DELETE -- delete last weeks records FROM IISLog WHERE (DATEPART(year, logtime) = @thisyear -- records for current year AND DATEPART(week, logtime) < @thisweek) -- where week < current week OR (DATEPART(year, logtime) < @thisyear) -- or records for previous year(s) The remainder of the code is the definition of the variables we'll use in the procedure, the transaction begin and end statements that allow us to abort all the updates if there's an error, and the SELECT and RETURN statements that create the return value and status. The return value is only visible when the procedure is run interactively, i.e. from a SQL Server
Querywindow, or when using ADO:
...
SELECT 'Error - archive aborted.' -- something went wrong
RETURN(1)
...
Here's the complete procedure listing:
/* Object: Stored Procedure dbo.usp_weeklyupdatesummaryhits */
/* Runs weekly, before the procedure usp_updateweeklyhittotals */
CREATE PROCEDURE usp_weeklyupdatesummaryhits AS
DECLARE @thisweek int
DECLARE @thisyear int
DECLARE @dayrowsupdated int
DECLARE @hourrowsupdated int
SELECT @thisweek = DATEPART(week, GETDATE())
SELECT @thisyear = DATEPART(year, GETDATE())
SELECT @dayrowsupdated = 0
SELECT @hourrowsupdated = 0
BEGIN TRANSACTION
INSERT INTO DaySummary -- update DaySummary table
SELECT TSumDate=MAX(logtime),
TSiteIP=MAX(serverip),
TTarget=MAX(target),
TKBytes=SUM(bytessent) / 1024, -- convert to kbytes
THitCount=COUNT(target)
FROM IISLog
WHERE (DATEPART(year, logtime) = @thisyear -- up to previous week only
AND DATEPART(week, logtime) < @thisweek)
OR (DATEPART(year, logtime) < @thisyear)
GROUP BY serverip,
target,
CONVERT(char(12), logtime, 102) -- yyyy.mm.dd format
SELECT @dayrowsupdated = @@ROWCOUNT -- built-in variable
IF (@dayrowsupdated = 0) GOTO on_norows -- no records inserted
IF (@@ERROR <> 0) GOTO on_error -- some other kind of error
INSERT INTO HourSummary -- update HourSummary table
SELECT TSumDate=MAX(logtime),
TSumHour=MAX(DATEPART(hour, logtime)),
TSiteIP=MAX(serverip),
TKBytes=SUM(bytessent) / 1024, -- convert to kbytes
THitCount=COUNT(target)
FROM IISLog
WHERE (DATEPART(year, logtime) = @thisyear -- up to previous week only
AND DATEPART(week, logtime) < @thisweek)
OR (DATEPART(year, logtime) < @thisyear)
GROUP BY serverip,
CONVERT(char(12), logtime, 102),
DATEPART(hour, logtime)
SELECT @hourrowsupdated = @@ROWCOUNT -- built-in variable
IF (@hourrowsupdated = 0) GOTO on_norows -- no records inserted
IF (@@ERROR <> 0) GOTO on_error -- some other kind of error
DELETE -- delete last weeks records
FROM IISLog
WHERE (DATEPART(year, logtime) = @thisyear
AND DATEPART(week, logtime) < @thisweek)
OR (DATEPART(year, logtime) < @thisyear)
IF (@@ERROR <> 0) GOTO on_error -- some kind of error
/* create return appropriate message for display when run interactively */
SELECT 'Added ' + CONVERT(varchar(8), @dayrowsupdated) + ' DaySummary records and '
+ CONVERT(varchar(8), @hourrowsupdated) + ' HourSummary records.'
COMMIT TRANSACTION -- all OK so commit all updates
RETURN (0)
on_error:
SELECT 'Error - archive aborted.' -- something went wrong
ROLLBACK TRANSACTION -- so undo all updates
RETURN(1)
on_norows: -- no rows were updated in a table
SELECT 'Added ' + CONVERT(varchar(8), @dayrowsupdated) + ' DaySummary records and '
+ CONVERT(varchar(8), @hourrowsupdated) + ' HourSummary records.'
ROLLBACK TRANSACTION -- so undo all other updates
RETURN(1)
This procedure first summarizes the values in the DaySummary table by month for records older than three months, and inserts them into the MonthSummary table using an INSERT statement like the ones we saw in the previous section. Then it deletes any records more than three months old from the DaySummary and HourSummary tables, using simple DELETE statements:
...
DELETE -- delete old records
FROM DaySummary -- from DaySummary table
WHERE DATEDIFF(month, TSumDate, GETDATE()) > 3
...
The next part of the procedure uses four very similar DELETE statements to delete records that are more than six months old from the four session summary tables. The WHERE clause is made up of two sections, separated by OR:
...
DELETE
FROM SessionTargetSummary -- from SessionTargetSummary table
WHERE (TYearNumber = DATEPART(year, GETDATE()) -- records from this year
AND TWeekNumber < (DATEPART(week, GETDATE()) - 25)) -- more than 25 weeks old
OR (TYearNumber < DATEPART(year, GETDATE()) -- or records from last year
AND TWeekNumber < (DATEPART(week, GETDATE()) + 26)) -- more than 25 weeks old
...
The first section of the WHERE clause specifies any records from the current year that have a week number that is less than this week minus 25. This will only produce useful results when the current week number is greater than 26—prior to this it produces zero or negative values which will have no effect. However, it will work when the current week is in the second half of the year, producing values from 1 to 27 (or 28 in a 53 week year):
WHERE (TYearNumber = DATEPART(year, GETDATE()) -- records from this year
AND TWeekNumber < (DATEPART(week, GETDATE()) - 25)) -- more than 25 weeks old
The second part of the clause will encompass any records from the first half of earlier years, i.e. those that have week numbers from 27 upwards. This part of the clause comes into effect when the current week is in the range from 1 up to 26 (or 27 in a 53 week year):
OR (TYearNumber < DATEPART(year, GETDATE()) -- or records from last year
AND TWeekNumber < (DATEPART(week, GETDATE()) + 26)) -- more than 25 weeks old
So, between them, the two clauses will sweep records older than 6 months from the table. Here's the complete procedure listing, again the remainder of the code is variable definitions, error handling, and the transaction begin and end statements:
/* Object: Stored Procedure dbo.usp_monthlyupdatesummaryhits */
/* Runs monthly on first day of the month */
CREATE PROCEDURE usp_monthlyupdatesummaryhits AS
DECLARE @rowsupdated int
BEGIN TRANSACTION
INSERT INTO MonthSummary -- update the MonthSummary table
SELECT TSumDate=MAX(TSumDate),
TSiteIP=MAX(TSiteIP),
TTarget=MAX(TTarget),
TKBytes=SUM(TKBytes),
THitCount=SUM(THitCount)
FROM DaySummary
WHERE DATEDIFF(month, TSumDate, GETDATE()) > 3 -- records more than 3 months old
GROUP BY TSiteIP,
TTarget,
DATEPART(month, TSumDate)
SELECT @rowsupdated = @@ROWCOUNT -- built-in variable
IF (@rowsupdated = 0) GOTO on_norows -- no rows updated
IF (@@ERROR <> 0) GOTO on_error -- some other error
DELETE -- delete old records
FROM HourSummary -- from HourSummary table
WHERE DATEDIFF(month, TSumDate, GETDATE()) > 3
IF (@@ERROR <> 0) GOTO on_error -- error during delete
DELETE -- delete old records
FROM DaySummary -- from DaySummary table
WHERE DATEDIFF(month, TSumDate, GETDATE()) > 3
IF (@@ERROR <> 0) GOTO on_error -- error during delete
/* Now delete all records more than a year old from Session Summary tables */
DELETE
FROM CountrySummary -- from CountrySummary table
WHERE (TYearNumber = DATEPART(year, GETDATE())
AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))
OR (TYearNumber < DATEPART(year, GETDATE())
AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))
IF (@@ERROR <> 0) GOTO on_error -- error during delete
DELETE
FROM RefererSummary -- from RefererSummary table
WHERE (TYearNumber = DATEPART(year, GETDATE())
AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))
OR (TYearNumber < DATEPART(year, GETDATE())
AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))
IF (@@ERROR <> 0) GOTO on_error -- error during delete
DELETE
FROM SessionTargetSummary -- from SessionTargetSummary table
WHERE (TYearNumber = DATEPART(year, GETDATE())
AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))
OR (TYearNumber < DATEPART(year, GETDATE())
AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))
IF (@@ERROR <> 0) GOTO on_error -- error during delete
DELETE
FROM UserAgentSummary -- from UserAgentSummary table
WHERE (TYearNumber = DATEPART(year, GETDATE())
AND TWeekNumber < (DATEPART(week, GETDATE()) - 25))
OR (TYearNumber < DATEPART(year, GETDATE())
AND TWeekNumber < (DATEPART(week, GETDATE()) + 26))
IF (@@ERROR <> 0) GOTO on_error -- error during delete
SELECT 'Archived ' + CONVERT(varchar(8), @rowsupdated) + ' records.'
COMMIT TRANSACTION -- all OK so commit all updates
RETURN (0)
on_error:
SELECT 'Error - archive aborted.' -- something went wrong
ROLLBACK TRANSACTION -- so abort all updates
RETURN (1)
on_norows:SELECT 'No records archived.' -- no records to archive
ROLLBACK TRANSACTION -- so abort all updates
RETURN (1)
Overall, the third procedure is the most complex, but uses mainly the same kinds of SQL statements we've seen in the previous procedures. Part of this complexity is because it has to work properly even if not run every week, in case there is a server or database problem.
When we use an INSERT statement that contains a SELECT statement, as we've done in earlier examples, we can group the data using a GROUP BY clause, and know that all the records in the recordset created by the SELECT part will be inserted into the target table. This won’t work in the first part of the following procedure, because we are manually extracting values from other tables into variables, then creating the new single weekly record for the WeekSummary table from these values.
This means that we need to use a loop that starts with the week after the last existing week's record in the table, and carries on until it gets to the week previous to the current one. If we run the procedure every week, the loop will only be executed once. However, if we miss a week (or more than one week), it will catch up the next time we do run it. The code that creates the loop looks like this:
...
SELECT @ToYear = DATEPART(year, GETDATE()), -- current week and year
@ToWeek = DATEPART(week, GETDATE()) -- used in WHILE condition
SELECT @ThisYear = MAX(TYearNumber), -- last week and year already in
@ThisWeek = MAX(TWeekNumber) -- the WeekSummary table and
FROM WeekSummary -- used to select the records
SELECT @ThisWeek = @ThisWeek + 1 -- go to week after the last
IF (@ThisWeek = 54) -- one already in the table
BEGIN
SELECT @ThisWeek = 1
SELECT @ThisYear = @ThisYear + 1
END
WHILE (@ThisYear = @ToYear AND @ThisWeek < @ToWeek) OR (@ThisYear < @ToYear)
BEGIN -- start of actions in WHILE loop
...
... statements to collect value and update WeekSummary table here
...
SELECT @ThisWeek = @ThisWeek + 1 -- now move on to next week, we have
IF (@ThisWeek = 54) -- to do each week in turn because
BEGIN -- we're not creating a recordset to
SELECT @ThisWeek = 1 -- INSERT, just individual values
SELECT @ThisYear = @ThisYear + 1 -- for one week at a time
END
END -- end of WHILE loop, go back and start loop again until condition met
...
Because the entire procedure listing is so long, we've split it into two halves. Here's the first half, which carries out the task of updating the WeekSummary table:
/* Object: Stored Procedure dbo.usp_updateweeklyhittotals */
/* Runs weekly, after the procedure usp_weeklyupdatesummaryhits */
CREATE PROCEDURE usp_updateweeklyhittotals AS
DECLARE @ToYear int
DECLARE @ToWeek int
DECLARE @ThisYear int
DECLARE @ThisWeek int
DECLARE @WDHits int
DECLARE @WDKBytes int
DECLARE @CDHits int
DECLARE @CDKBytes int
DECLARE @WAHits int
DECLARE @WAKBytes int
DECLARE @WDSessions int
DECLARE @CDSessions int
DECLARE @WASessions int
BEGIN TRANSACTION
SELECT @ToYear = DATEPART(year, GETDATE()), -- current week and year
@ToWeek = DATEPART(week, GETDATE()) -- used in WHILE condition
SELECT @ThisYear = MAX(TYearNumber), -- last week and year already in
@ThisWeek = MAX(TWeekNumber) -- the WeekSummary table and
FROM WeekSummary -- used to select the records
SELECT @ThisWeek = @ThisWeek + 1 -- go to week after the last
IF (@ThisWeek = 54) -- one already in the table
BEGIN
SELECT @ThisWeek = 1
SELECT @ThisYear = @ThisYear + 1
END
/* Now we loop for each week up to and including the previous week */
WHILE (@ThisYear = @ToYear AND @ThisWeek < @ToWeek) OR (@ThisYear < @ToYear)
BEGIN -- start of actions in WHILE loop
SELECT @WDHits = SUM(THitCount), -- get total hits and volume for
@WDKBytes = SUM(TKBytes) -- Web-Developer site
FROM DaySummary
WHERE DATEPART(year, TSumDate) = @ThisYear
AND DATEPART(week, TSumDate) = @ThisWeek
AND TSiteIP = '194.73.51.228' -- Web Developer site IP address
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
SELECT @WDSessions = COUNT(EventDateTime) -- get total number sessions for
FROM Sessions -- Web-Developer site
WHERE DATEPART(year, EventDateTime) = @ThisYear
AND DATEPART(week, EventDateTime) = @ThisWeek
AND EventType = 'New Session'
AND HostIP = '194.73.51.228' -- Web Developer site IP address
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
SELECT @CDHits = SUM(THitCount), -- get total hits and volume for
@CDKBytes = SUM(TKBytes) -- COMDeveloper site
FROM DaySummary
WHERE DATEPART(year, TSumDate) = @ThisYear
AND DATEPART(week, TSumDate) = @ThisWeek
AND TSiteIP = '194.73.51.229' -- COMDeveloper site IP address
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
SELECT @CDSessions = COUNT(EventDateTime) -- get total number sessions for
FROM Sessions -- COMDeveloper site
WHERE DATEPART(year, EventDateTime) = @ThisYear
AND DATEPART(week, EventDateTime) = @ThisWeek
AND EventType = 'New Session'
AND HostIP = '194.73.51.229' -- COMDeveloper site IP address
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
SELECT @WAHits = SUM(THitCount), -- get total hits and volume for
@WAKBytes = SUM(TKBytes) -- World Of ATL site
FROM DaySummary
WHERE DATEPART(year, TSumDate) = @ThisYear
AND DATEPART(week, TSumDate) = @ThisWeek
AND TSiteIP = '194.73.51.230' -- World Of ATL site IP address
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
SELECT @WASessions = COUNT(EventDateTime) -- get total number sessions for
FROM Sessions -- World Of ATL site
WHERE DATEPART(year, EventDateTime) = @ThisYear
AND DATEPART(week, EventDateTime) = @ThisWeek
AND EventType = 'New Session'
AND HostIP = '194.73.51.230' -- World Of ATL site IP address
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
INSERT INTO WeekSummary -- now add record to WeekSummary
SELECT @ThisYear, -- table using values collected
@ThisWeek, -- for this week
ISNULL(@WDHits, 0),
ISNULL(@WDKBytes, 0), -- if value is NULL insert zero
ISNULL(@CDHits, 0), -- for numeric fields instead
ISNULL(@CDKBytes, 0),
ISNULL(@WAHits, 0),
ISNULL(@WAKBytes, 0),
ISNULL(@WDSessions, 0),
ISNULL(@CDSessions, 0),
ISNULL(@WASessions, 0)
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
SELECT @ThisWeek = @ThisWeek + 1 -- now move on to next week, we have
IF (@ThisWeek = 54) -- to do each week in turn because
BEGIN -- we're not creating a recordset to
SELECT @ThisWeek = 1 -- INSERT, just individual values
SELECT @ThisYear = @ThisYear + 1 -- for one week at a time
END
END -- end of WHILE loop, go back and start loop again until condition met
...
Having updated the WeekSummary table, our procedure now has to carry out the other weekly task—summarizing the information in the Sessions table for the UserAgentSummary, RefererSummary, CountrySummary and SessionTargetSummary tables. Each table is updated with an INSERT statement like the ones we've used in the previous procedures.
However, there are a couple of wrinkles we need to handle. The referrer values collected in Sessions will include any query string that was appended to the URL when the visitor first hit our site, so we need to remove this before grouping the records; otherwise, we'll get multiple entries for the same site where the query string is different. Secondly, the language/country value sent by the browser can contain several entries in a comma-delimited list. We can’t cope with this and so we'll remove all except the first one in this case.
TSQL allows us to use a CASE statement to carry out any one of two or more actions, based on the result of a test expression. In the example of the referrer string, we just look for a ? character in the URL, then remove it and any characters after the ? if we find one. If we don’t find one, we use the original value of the field:
...
/* Use a CASE statement to remove any query string from the URL */
RefURL=MAX(
CASE WHEN CHARINDEX('?', Referer) > 5
THEN SUBSTRING(Referer, 1, CHARINDEX('?', Referer) - 1)
ELSE Referer
END),
...
The only other difficulty is that we have to repeat this CASE statement in the GROUP BY clause to get the correct grouping. If we didn't, we'd get repeated records for any URLs that differed only in the query string. Here's the remaining half of the procedure:
...
/* Part 2, now update the for Sessions Summary tables for last week */
INSERT INTO UserAgentSummary -- update UserAgentSummary table
SELECT UAYear = MAX(DATEPART(year, EventDateTime)),
UAWeek = MAX(DATEPART(week, EventDateTime)),
UAText = MAX(UserAgent),
UACount = COUNT(UserAgent),
UAHostIP = MAX(HostIP)
FROM Sessions
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0
AND UserAgent IS NOT NULL -- ignore any records with no
AND UserAgent <> '' -- value for UserAgent
GROUP BY UserAgent,
DATEPART(year, EventDateTime),
DATEPART(week, EventDateTime),
HostIP
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
INSERT INTO RefererSummary -- update RefererSummary table
SELECT RefYear = MAX(DATEPART(year, EventDateTime)),
RefWeek = MAX(DATEPART(week, EventDateTime)),
/* Use a CASE statement to remove any query string from the URL */
RefURL=MAX(
CASE WHEN CHARINDEX('?', Referer) > 5
THEN SUBSTRING(Referer, 1, CHARINDEX('?', Referer) - 1)
ELSE Referer
END),
RefCount=COUNT(Referer),
RefHostIP = MAX(HostIP)
FROM Sessions
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0
AND Referer IS NOT NULL -- ignore any records with no
AND Referer <> '' -- value for Referrer
AND CHARINDEX('.wrox.co', Referer) = 0 -- or from all our own servers
AND CHARINDEX('file:', Referer) = 0 -- or from a user's disk file
GROUP BY CASE WHEN CHARINDEX('?', Referer) > 5
THEN SUBSTRING(Referer, 1, CHARINDEX('?', Referer) - 1)
ELSE Referer -- must repeat CASE statement here
END, -- as well for correct grouping
DATEPART(year, EventDateTime),
DATEPART(week, EventDateTime),
HostIP
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
INSERT INTO SessionTargetSummary -- update SessionTargetSummary table
SELECT TargYear = MAX(DATEPART(year, EventDateTime)),
TargWeek = MAX(DATEPART(week, EventDateTime)),
TargText = MAX(URL),
TargCount = COUNT(URL),
TargHostIP = MAX(HostIP)
FROM Sessions
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0
AND URL IS NOT NULL -- ignore any records with no
AND URL <> '' -- value for Target URL
GROUP BY URL,
DATEPART(year, EventDateTime),
DATEPART(week, EventDateTime),
HostIP
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
INSERT INTO CountrySummary -- update CountrySummary table
SELECT CoYear = MAX(DATEPART(year, EventDateTime)),
CoWeek = MAX(DATEPART(week, EventDateTime)),
/* Use a CASE statement to remove any secondary codes after a comma */
CoText = MAX(
CASE WHEN CHARINDEX(',', UALanguage) > 1
THEN SUBSTRING(UALanguage, 1, CHARINDEX(',', UALanguage) - 1)
ELSE UALanguage
END),
CoCount = COUNT(UALanguage),
CoHostIP = MAX(HostIP)
FROM Sessions
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0
AND UALanguage IS NOT NULL -- ignore any records with no
AND UALanguage <> '' -- value for language
GROUP BY CASE WHEN CHARINDEX(',', UALanguage) > 1
THEN SUBSTRING(UALanguage, 1, CHARINDEX(',', UALanguage) - 1)
ELSE UALanguage -- must repeat CASE statement here
END, -- as well for correct grouping
DATEPART(year, EventDateTime),
DATEPART(week, EventDateTime),
HostIP
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
DELETE -- now delete last weeks records
FROM Sessions -- from Sessions table
WHERE DATEDIFF(week, EventDateTime, GETDATE()) > 0
IF (@@ERROR <> 0) GOTO on_error -- something went wrong
COMMIT TRANSACTION -- all OK, so commit updates
SELECT 'Weekly Hits and Sessions summary succeeded.'
RETURN (0)
on_error: -- something went wrong
ROLLBACK TRANSACTION -- so abort all updates
SELECT 'Error - Weekly Hits and Sessions summary aborted.'
RETURN (1)
Having created the table and procedures that summarize our logging data, we can consider how we will execute the procedures on a regular basis. We could create ASP pages that call them, and open these pages in a browser to run the procedures. This could be a regular task that we performed, and we just have to hope that our stand-in remembers to do it while we're away from the office—perhaps lying on that foreign beach we spend the rest of the year dreaming about.
However, there are other ways to run a SQL Server stored procedure. We can do it directly from a Windows Command prompt, using the DOS-based ISQL application that is included with the SQL Server Client Utilities. For example, this command would (given appropriate server and login details) execute the usp_weeklyupdatesummaryhits procedure and then exit from ISQL:
C:\>isql -Sserver -Uusername -Ppassword -Q"iislogs.dbo.usp_weeklyupdatesummaryhits"
This would allow us to automate the process by setting up Windows Schedule service to execute this command for us on a weekly (or monthly) basis. It's not the way that we do it, but we'll see how it could be used.
Before you can use the Schedule service, you might need to start it using the Services icon in Control Panel. If you decide to adopt this technique, change the Startup mode to Automatic:
The Schedule service itself uses a command line interface, although there is an add-in in the Windows NT Resource Kit that provides a graphical interface and there are plenty of other third-party utilities that do the same. We'll be using the command line here.
We can create an ordinary DOS batch file containing the commands to run our two weekly procedures via
ISQL, and drop it into the Mssql folder on drive C:
isql -Smyserver -Uanonymous -P -Q"iislogs.dbo.usp_weeklyupdatesummaryhits"
isql -Smyserver -Uanonymous -P -Q"iislogs.dbo.usp_updateweeklyhittotals"
Then instruct the Schedule service to run our batch file every week, at 2:00 AM on Sunday, by typing the following at the Command prompt:
C:\>at 02:00 /every:Sunday "c:\mssql\updatelogs.bat"
It is possible to create several batch files, and have them run at different times. We can also provide the command to run ISQL (or any other process) as the last parameter of the at command. But, while all this works fine, there is a better way. Instead, we'll use the SQL Server Executive.
The option we chose for automating our log summary system is the built-in scheduling capabilities of SQL Server—via the SQL Server Executive program that is installed with SQL Server. This has a graphical interface that we can use from within the SQL Server Enterprise Manager, making it easy to view the procedures as we set up scheduled tasks for them. The following screenshot shows the entry for our usp_weeklyupdatesummaryhits procedure selected, and we can edit the task properties by clicking the Edit Task toolbar button:
This opens a dialog that we can use to create a new task, or edit an existing one. You can see the task type is TSQL, the database we want to work with is iislogs, and the command is exec usp_weeklyupdatesummaryhits. By clicking the Change button, we can specify the schedule details for when it will be run:
This dialog also provides a button to display the task history:
Another alternative would be to create a script that used the new Windows NT Scripting Host to automate the process. For more information, check out http://www.microsoft.com/scripting/windowshost/default.htm.