This example demonstrates several of the techniques we've seen in use throughout the book so far, including manipulating data in ADO, passing information between pages, creating text files on the server, and using include files to reduce server loading.
Earlier in this chapter, we came across the Referers table. This isn’t part of the log file summary process, but is used instead to hold a list of sites that provide links to our site. We update it each week, and make the list available on our Web site for visitors to browse and find associated sites:
You can open this page from our Web site by going to http://webdev.wrox.co.uk/resources/ and selecting 'Referrers'.
Looking at the Referers table, you can see that it contains five fields:
You've seen how we can collect and store details of each site that provides a referral to us in the earlier parts of this chapter. So you can guess where the base values for our Referers table come from. We just need to pull them out of the RefererSummary table each week, after the stored procedures have worked their magic. The trick is that we want to know more about the sites in the list. Just displaying the URL (which is all we collected in the RefererSummary table) doesn’t look very attractive.
In the previous chapter, you saw how we can use an Active Server Component to fetch a Web page as a string, and then parse the string to see what the page contains. We use this same technique with the URLs in the RefererSummary table, parsing out the contents of the <TITLE> element from each page in turn and dropping it into the Referers table together with the URL and the number of referrals. This means that only sites we know actually exist, and which we can verify content for, are included in the list.
You might think that the sites must exist, because we got a referral from them. If you look in the RefererSummary table, however, you find that this is not always the case. Despite our diligence in removing from the list empty entries, entries from our own sites, and entries that are disk files rather than Web pages, we can still get URLs that we probably don't want to include in our pages. This includes sites that require a password or other verification to enter, sites that have moved or died since last week, and sites that are 'sensitive' i.e. contain information we wouldn't want to provide a link to. This last category might include sites belonging to direct competitors, sites that libeled us or our products, and sites containing content that is not suitable for our younger or less broad-minded visitors.
Yes, it does happen. We've had referrals from sites that contain explicit material. Remember, anyone can put a link to your site on one of their pages, click it a few times to create referrals, and then remove the link again. You must take precautions when building lists of links to other sites.
So, while many of the techniques you'll see in this example are ones we've introduced in other parts of the book, you will probably find the way we monitor referrals interesting—if not invaluable.
The plan is to collect the referrals from the RefererSummary table for the previous week, and build up a list of them in the separate Referers table. As we do so, we'll fetch the contents of the referring page from each site and examine the content using ASP code. We can then warn about any possibly doubtful content (and record this fact in the RefDelete field of the Referers table), while at the same time extracting the text from the page's <TITLE> tag. If we can’t load the page within a preset time, or if we get an error, we will skip the page altogether.
The initial part of the processing is done by the page createreferrerstable.asp, a version of which is included with the samples for this book. At the top of the page, we use SSI to insert two text files. The first is the access control script that we saw in Chapter 5. It parses the query string sent to this page for a username and password that entitle us to run the page. It redirects the browser back to the admin login page if it's not supplied. You can refer back to Chapter 5 to see how it works.
The second SSI file is the usual database connection information that we've used with ADO in most of our previous examples. It provides a connection string that has update permission for the IISLogs database tables. We also increase the ASP timeout for the page, because we can guess that this baby is going to run for some time. Finally, we define the timeout we'll allow the component for seeking each individual page. Defining it here makes it easier to change the value in the future if required:
<%@ LANGUAGE=VBSCRIPT %>
<!-- #include virtual="/common/login.inc" -->
<!-- #include virtual="/common/connect.inc" -->
<% Server.ScriptTimeOut = 2400 %>
<% seekTitleTimeout = 45 %>
...
Next comes some ordinary HTML that creates the introductory text in the page (omitted here), followed by the ASP code that starts the process off. We create a recordset containing all the entries in the RefererSummary table for the previous week, and an array to hold the results of our page parsing exercise. We generally only get around 70 distinct referrers that generate more that one referral per week, so we're going to limit the number of entries to the top 100. Including many more would probably make the page too long for most visitors to find useful anyway. Then we're ready to start looping through the recordset:
...
<% '--get list of referrers from RefererSummary table--
On Error Resume Next
Set oConn = Server.CreateObject("ADODB.Connection")
oConn.Open strConnect 'from include file at top of page
strSQL = "SELECT RefCount=SUM(ItemCount),RefURL=MAX(ItemText) FROM RefererSummary " _
& "WHERE (TWeekNumber = DATEPART(week,GetDate()) - 1 " _
& "AND TYearNumber = DATEPART(year,GetDate())) " _
& "OR (TYearNumber < DATEPART(year,GetDate())) " _
& "GROUP BY ItemText ORDER BY Sum(ItemCount) DESC"
Set oRs = oConn.Execute(strSQL)
If (oRs.EOF) Or (Err.Number > 0) Then
Response.Write "<FONT FACE=" & QUOT & "Arial" & QUOT & " SIZE=3>" _
& "<B>Sorry, database cannot be accessed at present.</B></FONT></BODY></HTML>"
Response.End
End If
Dim arrTitles(100, 3) 'array to hold page titles
Dim intPages 'number of pages found
intNumPages = 0
oRs.MoveFirst
Do While (Not oRs.EOF) And (intNumPages < 100)
...
For each entry in our recordset, we write some progress information into the page then examine the URL to see if we can do anything with it. If it contains the text "cgi", "wrox.co" (which will catch any wrox.co.uk and wrox.com URLs) or "file://" we ignore it because it's probably a local or server-side executable page. And if it doesn't start with "http://", we ignore it also:
...
'--process each entry--
Response.Write "Processing URL: <B>" & oRs("RefURL") _
& "</B> Hits: " & oRs("RefCount") & "<BR>"
If (InStr(LCase(oRs("RefURL")), "cgi") > 0) _
Or (InStr(LCase(oRs("RefURL")), ".wrox.co") > 0) _
Or (InStr(LCase(oRs("RefURL")), "file://") > 0) Then
Response.Write "Local or executable script page ignored.<BR>"
ElseIf (InStr(oRs("RefURL"), " ") > 0) _
Or (InStr(LCase(oRs("RefURL")), "http://") = 0) Then
Response.Write "Illegal URL format ignored.<BR>"
Else
...
If the URL seems OK, we'll go and get the page title. We do this using an ASP HTTP component—the one you saw in the previous chapter. We give the component 45 seconds to retrieve the page, then check if the value returned indicated that it contained an error or an invalid result:
...
'-- get page title --
strTitle = ""
strResult = ""
strURL = oRs("RefURL")
Set oHTTP = Server.CreateObject("ASP.HTTP") 'create the component instance
oHTTP.Url = strURL 'set the URL to fetch
oHTTP.TimeOut = seekTitleTimeout 'set the timeout
strResult = oHTTP.GetURL 'and get the page
Set oHTTP = Nothing 'then destroy component instance
If Len(strResult) = 0 Then
Response.Write "No reply from server in " _
& seekTitleTimeout & "seconds.<BR>"
Else
If Instr(LCase(strResult), "http error") > 0 _
Or Instr(LCase(strResult), "object moved") > 0 _
Or Instr(LCase(strResult), " invalid ") > 0 Then
Response.Write "Request returned an error.<BR>"
...
This is not a perfect solution, because it could reject acceptable pages that contained the words "http error" or "invalid". However, it seems to work well in general, and you can easily change the behavior by editing the code yourself.
Note that we are also destroying and re-instantiating the component each time we use it in our page. This is not the most efficient technique, but we found that it sometimes returns invalid results when it hits a server that doesn’t respond correctly. We are using an older version of the component that is already present on our system for use with other applications, and more recent versions should solve this problem. We looked in detail at the ASP HTTP component that is available from Stephen Genusa in the previous chapter. See http://www.serverobjects.com/, or the samples available for this book, for more details.
Now that we've got a page, we can parse out the title and put it into the variable strTitle, then look for 'doubtful' content. You can modify this code to suit your own requirements, of course. If we do find any doubtful pages, we set the intDelete variable to 1:
...
Else
strTitle = "" 'now we can parse out the contents of any <TITLE> tag
intStart = Instr(UCase(strResult), "<TITLE>") + 7
intFinish = Instr(UCase(strResult), "</TITLE>")
If (intStart > 0) And (intFinish > intStart) Then
strTitle = Trim(Mid(strResult, intStart, intFinish - intStart))
intApostPos = InStr(strTitle, "'")
Do While intApostPos > 0 'we have to remove any apostrophes from the
If intApostPos = 1 Then 'title otherwise our SQL statement will fail
strTitle = Mid(strTitle, 2)
ElseIf intApostPos = Len(strTitle) then
strTitle = Left(strTitle, intApostPos - 1)
Else
strTitle = Left(strTitle, intApostPos - 1) _
& Mid(strTitle, intApostPos + 1)
End If
intApostPos = InStr(strTitle, "'")
Loop
End If
If Len(strTitle) = 0 Then strTitle = "Untitled page at:"
strResult = LCase(strResult)
If InStr(strResult, " sex ") Or InStr(strResult, " adult ") Or _
InStr(strResult, " porn ") Or InStr(strResult, " xxx ") Or _
InStr(strResult, " nude ") Or InStr(strResult, " sexy ") Then
intDelete = 1 'we'll assume this page will be deleted
Response.Write "<B>Content Warning!</B> "
Else
intDelete = 0
End If
...
Now we just put a message into the page to show the title (or "
Untitled page" if there was no title), and add the values to our arrTitles array. By storing them in an array while the slow process of building up the list takes place, we avoid any chance of the database being left in a partially updated state. This might occur if user stops the processing of the page, or loses the Web connection to it:
...
Response.Write "Page title is: <B>" & strTitle & "</B><BR>"
intNumPages = intNumPages + 1
arrTitles(intNumPages, 0) = Trim(oRs("RefURL")) 'store the page details
arrTitles(intNumPages, 1) = strTitle 'in our array ready
arrTitles(intNumPages, 2) = oRs("RefCount") 'to add to the table
arrTitles(intNumPages, 3) = intDelete 'afterwards
strTitle = ""
strResult = ""
End If
End If
End If
If oRs("RefCount") < 2 Then Exit Do 'we'll stop when we get to pages that
oRs.MoveNext 'provided less than two referrals.
Loop 'then repeat for the next record
Set oRs = Nothing %>
...
So, the browser will display a list of the sites and the result of its parsing of the pages as it occurs. In the next screenshot, we're part way through the process:
Now it's just a matter of deleting the existing content from the Referers table and adding the new records:
...
<% '--delete existing table contents--
strSQL="DELETE FROM Referers"
oConn.Execute(strSQL)
If Err.Number > 0 Then
Response.Write "<FONT FACE=" & QUOT & "Arial" & QUOT & " SIZE=3>" _
& "<B>Sorry, Referers table cannot be accessed.</B></FONT></BODY></HTML>"
Response.End
End If
%>
<P><B>Deleted existing contents of Referers table.</B></P>
<% '--add new list to Referers table, looping through the array--
For intRefCount = 1 To intNumPages
strURL = arrTitles(intRefCount, 0)
strTitle = arrTitles(intRefCount, 1)
lngCount = arrTitles(intRefCount, 2)
lngDelete = arrTitles(intRefCount, 3)
strSQL = "INSERT INTO Referers (RefURL, RefTitle, RefCount, RefDelete) " _
& "VALUES ('" & strURL & "', '" & strTitle & "', " & lngCount & ", " _
& lngDelete & ")"
oConn.Execute(strSQL)
Response.Write "Added: <B>" & strTitle & "</B> at " & strURL & ".<BR>"
Next
Set oConn = Nothing %>
<P><B>Added <% = intNumPages %> pages to the Referers table.</B></P>
<FORM ACTION="updatereferrersinclude.asp">
<INPUT TYPE="SUBMIT" NAME="Submit" VALUE="Next >">
</FORM>
...
Here's a compound view of the result. The page lists the entries as it adds them to the table, then provides a button that will load the ASP page updatereferrersinclude.asp that carries out the next stage in the process:
Once we've got the list of pages into the Referers table, we then load another page that displays them with the URL as a hyperlink. We also place a checkbox next to each one, and check this automatically if we found any sensitive comment in the page during the previous stage of the process. The administrator can then look over the list, view any pages they are unsure about, and delete any that they don’t want to appear. In fact, they can go round the loop of displaying and deleting pages as many times as they like.
This is what the page updatereferrersinclude.asp looks like:
We want to be able to carry out the three activities in one page: listing the links in the Referers table, deleting unwanted links from the table, and then recreating the text file for the links that will be included in the
Links To Uspage. We use parameters added to the ASP QueryString and Form collections to control what happens as the page loads each time. To achieve all this, we create the page along the following lines. As you can see, if we open the page with no values in the Form and QueryString collections, we'll just get a listing of the referrers:
...
<%
'***********************************************
'-- open database connection
'***********************************************
On Error Resume Next
Set oConn = Server.CreateObject("ADODB.Connection")
oConn.Open strConnect 'from include file at top of page
QUOT = Chr(34)
CRLF = Chr(13) & Chr(10)
If Request.Form("DeleteLinks") = "Yes" Then
'************************************************
'-- delete ticked items from table
'************************************************
...
End If
If Request.QueryString("UpdateInclude") = "Yes" Then
'***********************************************
' update main referrers include text file
'***********************************************
...
Else
'************************************************
'-- display list of referrers to delete from
'************************************************
...
End If
%>
...
To make it easier to see how the process works, we'll look at each of the sections in the order that they are normally executed, rather than in the order they appear in the page. Just keep the previous section of code in mind, and remember that, in the real page, they appear in a different order.
This is the easiest step in the process, and we use the same old technique of opening a recordset from the table and looping through the records. The only difference this time is that we include a checkbox-type <INPUT> control for each entry, giving it a NAME attribute that is the record ID (created by SQL Server automatically when we inserted the record into the table) prefixed with the character 'X' so that we can identify it later in our code. We also look at the value of the RefDelete field for this link, and if it's not zero we set the checkbox by adding the CHECKED attribute to it as well:
'************************************************
'-- display list of referrers to delete from
'************************************************
...
'--get list of referrers from table--
strSQL = "SELECT * FROM Referers ORDER BY RefCount DESC"
Set oRs = oConn.Execute(strSQL)
If (oRs.EOF) Or (Err.Number > 0) Then
Response.Write "<FONT FACE=" & QUOT & "Arial" & QUOT & " SIZE=3>" _
& "<B>Sorry, database cannot be accessed at present.</B></FONT></BODY></HTML>"
Response.End
End If
Dim intNumDeletes 'number of content warning (deletes) found
Dim intNumPages 'number of pages found
intNumDeletes = 0
intNumPages = 0
oRs.MoveFirst 'start at first record
intWinNum = 1 'a number to use to create a new window for the <A> tags
%>
<FORM ACTION="<%=Request.ServerVariables("SCRIPT_NAME")%>" METHOD="POST">
<TABLE>
<TR>
<TH>Hits </TH><TH>Delete? </TH><TH>Page Title and URL</TH>
</TR>
<% '--now loop through the records processing each entry--
Do While (Not oRs.EOF) And (intNumPages < 100) %>
<TR>
<TD ALIGN="CENTER" VALIGN="MIDDLE" NOWRAP>
<% = CStr(oRs("RefCount")) 'show number of referrals %>
</TD>
<TD ALIGN="CENTER" VALIGN="MIDDLE" NOWRAP>
<!--checkbox with X and record id (key) as NAME attribute-->
<INPUT TYPE="CHECKBOX" NAME="X<% = CStr(oRs("RefKey")) %>"
<!--if there was a content warning add the CHECKED attribute-->
<% If CLng(oRs("RefDelete")) > 0 Then
Response.Write " CHECKED"
intNumDeletes = intNumDeletes + 1 'increment number of warnings
End If %>
> <!--end of INPUT tag-->
</TD>
<TD ALIGN="LEFT" NOWRAP>
<!--insert the page title and URL. Page will open in a new window -->
<!--named RefNewWinx so that current page is not lost from view. -->
<B><% = oRs("RefTitle") %></B><BR>
<A HREF="<% = oRs("RefURL") %>" TARGET="RefNewWin<% = intWinNum %>">
<% = oRs("RefURL") %></A>
</TD>
</TR>
<% intWinNum = intWinNum + 1
oRs.MoveNext 'and go to the next link record
Loop
Set oRs = Nothing
%>
</TABLE>
<% If intNumDeletes > 0 Then 'add warning text if any doubtful pages found %>
<P><B>There were <% = intNumDeletes %> content warnings.</B><P>
<% End If %>
<INPUT TYPE="HIDDEN" NAME="DeleteLinks" VALUE="Yes"> <!--pass on 'delete' value-->
<INPUT TYPE="SUBMIT" VALUE="Delete Ticked Links"><P> <!--submit button for form-->
</FORM>
<HR>
<A HREF="<%=Request.ServerVariables("SCRIPT_NAME")%>?UpdateInclude=Yes">
Update the 'Links To Us' page</A>
<A HREF="accessmenu.asp">Cancel</A>
'***********************************************
Once we've placed all the links and checkboxes in the page, the bottom section of the code above adds a 'content' warning if we found any doubtful entries, and then a button to start the delete process. This will submit the form containing all the links and checkboxes back to the same page again. The form also contains a HIDDEN control named DeleteLinks that has the value Yes.
Under the Delete Ticked Links button are the two ordinary text links. The first one is used to update the include file once we're happy with the list, and the other cancels the process and goes back to a main menu. Here's the bottom part of the page in the browser when we're listing the links:
When the user clicks the
DeleteTicked
Links
button, the <FORM> on the page is posted back to the same ASP page. The value from the hidden control name DeleteLinks will appear in the Form collection this time, with the value Yes. This means that the section of code that deletes any ticked links will be executed this time.
To delete the ticked links, we just need to iterate through the controls on the form looking for any that have a name starting with the character 'X'. Due to the way the Form collection works in ASP, only controls that were checked (ticked) on the page will produce an entry in the collection. For example, a checkbox named X217 will create the entry
X217=onin the Form collection when ticked, but no entry if not ticked.
We loop through the collection using a For..Each construct. Where we find a control whose name starts with 'X', we extract the record ID that is the remainder of the name, use a SQL DELETE statement to remove it from the Referers table, and write a message to the page:
'************************************************
'-- delete ticked items from table if appropriate
'************************************************
intDeleteCount = 0
For Each chkBox In Request.Form
If Left(chkBox, 1) = "X" Then
strSQL = "DELETE FROM Referers WHERE RefKey=" & Mid(chkBox,2)
oConn.Execute(strSQL)
intDeleteCount = intDeleteCount + 1
End If
Next
Response.Write "<P><B>Deleted " & intDeleteCount & " Entries.</B></P>"
'************************************************
Once the delete process is complete, execution carries on to the part of the page we looked at previously, which again lists the contents of the table complete with checkboxes—but without the record(s) we've just deleted. Here's the outline control structure of the page again, so that you can see where we've got to:
...
If Request.Form("DeleteLinks") = "Yes" Then
'************************************************
'-- delete ticked items from table
'************************************************
...
End If
If Request.QueryString("UpdateInclude") = "Yes" Then
'***********************************************
' update main referrers include text file
'***********************************************
...
Else
'************************************************
'-- display list of referrers to delete from
'************************************************
...
End If
...
Once our user is happy with the list of links, they can click on the 'Update' link at the bottom of the listing:
...
<A HREF="<%=Request.ServerVariables("SCRIPT_NAME")%>?UpdateInclude=Yes">
Update the 'Links To Us' page</A>
...
This places the value UpdateInclude=Yes in the QueryString by appending it to the page URL, so when the page loads this time it will execute the section of ASP code that updates the include text file. This code simply reads the contents of the Referers table using ADO, and from it creates an HTML-formatted text file containing the list of links. Here's how we build up a string, strInclude, containing the complete list of all the links:
'***********************************************
' update main referrers include text file
'***********************************************
'--create new list as a text file--
strSQL="SELECT * FROM Referers"
Set oRs = oConn.Execute(strSQL)
If (oRs.EOF) Or (Err.Number > 0) Then
Response.Write "<FONT FACE=" & QUOT & "Arial" & QUOT & " SIZE=3>" _
& "<B>Sorry, the Referers table cannot be accessed.</B></FONT></BODY></HTML>"
Response.End
End If
intNumPages = 0 'number of pages we've added to the table
oRs.MoveFirst 'start at first record
strInclude = "" 'the string we'll write to the text file
Do While (Not oRs.EOF) And (intNumPages < 100)
strURL = oRs("RefURL") 'the page URL
strTitle = oRs("RefTitle") 'the page title
strCount = CStr(oRs("RefCount")) 'the number of referrals
intNumPages = intNumPages + 1 'increment the number of pages in the list
strLine = "<FONT FACE=" & QUOT & "Arial,sans-serif" & QUOT & " SIZE=2>" _
& "<IMG SRC=" & QUOT & "/images/ball_red.gif" & QUOT _
& " WIDTH=25 HEIGHT=18 ALIGN=MIDDLE HSPACE=5><B>" _
& strTitle & "</B> Thanks for " & strCount & " referral"
'add 's' if there's more than one - looks nicer than using 'referrals(s)'
If oRs.Fields("RefCount") > 1 Then strLine = strLine & "s"
strLine = strLine & ".<BR>" & CRLF & " at <A HREF=" & QUOT & strURL & QUOT _
& " TARGET=" & QUOT & "RefNewWin" & intNumPages & QUOT & ">" _
& strURL & "</A></FONT><P>" & CRLF
strInclude = strInclude & strLine
Response.Write "Added: <B>" & strTitle & "</B> at " & strURL & ".<BR>"
oRs.MoveNext
Loop
Set oRs = Nothing
Set oConn = Nothing
...
Now we just need to write the string strInclude into a text file on disk. You've seen this done before in the book, and here we use a very similar technique. We've created a function named WriteNewFile that takes the file name and the string to be written into the file. We call it from our code now:
...
If WriteNewFile(strIncludeFile, strInclude) Then
%>
<P><B>Added <% = intNumPages %> pages to the Referers include file.</B></P>
Operation Complete.<P>
Go to the <A HREF="/resources/links_to_us.asp">Sites That Link To Us</A> page.
Go to the <A HREF="accessmenu.asp">Web Admin menu</A> page.
<% Else %>
<P><B>Failed - cannot create Referers include file.</B></P>
<% End If
'************************************************
The WriteNewFile function is declared elsewhere within our page, which also defines a variable strIncludeFile that contains the full physical path and name of the include file. This makes it easier to change it if we reorganize the site at a later date.
The function uses the CreateTextFile method of the VBScript Scripting.FileSystemObject object to create the text file, with the second parameter set to True so that any existing copy of the file will be overwritten. Then it can write the content into the file and exit:
<%
strIncludeFile = "G:\inetpub\wwwroot\resources\referers.inc"
Function WriteNewFile(strFileName, strContent)
On Error Resume Next
Set objFSO = CreateObject("Scripting.FileSystemObject")
If Err.Number = 0 Then Set objFile = objFSO.CreateTextFile(strFileName, True)
If Err.Number = 0 Then objFile.WriteLine strContent
If Err.Number = 0 Then objFile.Close
If Err.Number = 0 Then WriteNewFile = True
End Function
%>
And here's the result. The page contains links that allow the administrator to go and check the Links To Us page afterwards, or go back to the main menu:
Finally, within the '
LinksTo
Us
' page (the page you saw at the start of this section of the chapter), we just have to insert the text file using an ASP #include statement. Each time the text file is updated, users will see the new version and the new links when they load the '
LinksTo
Us
' page:
...
<!-- #include file="referers.inc" -->
...
Here, again, is what it looks like when complete: