George Young
Development Lead
Microsoft Corporation
March 30, 1999
The following article originally appeared in the MSDN Online Voices "Code Corner" column.
Welcome to Code Corner, a new column in MSDN Online Voices. Code Corner will focus on sample code that addresses common issues, feature requests developers typically receive, or something I just find especially cool. I'll base some of the columns, like this one, on my recent experiences as lead developer on this new MSDN Online site.
Your feedback is welcome! Please feel free to drop me a line via the address in the column footers with your comments, questions, topic ideas, or links to your own variations on topics the column covers. Really cool links will likely be published in future columns. (Please, though, don't expect an individual reply or send me support questions. My dayjob boss will kill me.)
In this inaugural column, let's walk through developing and deploying a custom 404 error page for Microsoft Internet Information Services (IIS) 4.0. Custom error messages -- which allow you to designate a customized page to be served up when a given server error occurs -- are a feature new to IIS 4.0. A custom 404 handles the all-too-common File Not Found error triggered when a visitor requests a file that doesn't exist on the server. Gracefully handling a bad URL request is a Very Good Thing in all circumstances, and is especially valuable if you are moving a bunch of content around or switching servers or clusters (as we did with the recent merge of Site Builder Network (SBN) and MSDN Online).
It's very straightforward to create a plain HTML page with a friendly error message and some branding to give your visitors a smoother experience. This is certainly better than the server-issue 404 Error message. However, with a bit more work, namely resource mapping and scripting, you can actually map expired file and directory paths to valid new ones, routing your users straight to the file they seek, or at least to helpful resources, such as your table of contents, site map, or search page.
With the merge of the SBN and MSDN Online, some 7,000 files on the former's site have moved from the main microsoft.com cluster to a cluster called msdn.microsoft.com. Had all our files kept exactly the same pathname for the move, while just changing server names, redirects would be pretty straightforward -- map one server name to another, and voilà.
However, as with most site redesigns, our mapping is far from simple. Some areas, such as the Web Workshop (formerly SBN Workshop), have kept their same pathnames. Others, like Voices (an expansion of the former SBN Magazine and where you find yourself at this very moment), have kept the same filenames, but are now hosted in a different directory. A very few areas have disappeared altogether, and in others a few files, but not all, have changed names.
Here is a table which outlines our five main 404 scenarios:
Scenario | Example old path | Example new path |
Single file maps to new file | /sitebuilder/whatsnew.asp | /siteguide/recent.asp |
Whole area goes away | /gallery/stylesheets/ | -- |
Path changes | /sitebuilder/siteinfo/glossary/ | /workshop/essentials/glossary/ |
Old directory handled by single file | /sbnmember/wms/ | /osig/wm/default.asp |
Direct mapping | /xml/ | /xml/ |
With such a variety of possible error conditions, and so many visitors still requesting files on the old server, handling those requests gracefully is hugely valuable. One approach would have been to place redirect pages -- either .asp pages with Response.Redirect() calls, or .htm pages using the META Refresh header -- in every directory on the old server. While requiring less code, this approach relies on the presence of a physical file in every directory, which makes maintenance and deployment an exercise in keeping track of a gazillion separate files.
A second option was to register a custom 404 page on the server, and handle with script and mapping all redirects in that single file. While this would require more up-front work determining all the mapping variations, we chose this approach; it seemed an interesting challenge, and would allow us to keep all our redirect information in one file in one location, a real bonus on a large Web site like MSDN Online.
So, we committed to mapping a bunch of paths and filenames using script, either on the server or on the client. Because we have had good experiences mapping paths using JScript arrays on the server, such as to build our NavPath links in Web Workshop, we initially opted for that approach. Doing this on the server also allows you to use the latest scripting-engine features -- without having to worry about browser capabilities (and minor version bugs) -- like Regular Expressions, something we've become especially fond of.
Unfortunately, some legacy frameset code we weren't able to eliminate for launch kept us from putting the mapping code on the server. Our old frameset code uses a "#" to delimit the content document for a frameset (for old offline viewing reasons), and this "location.hash" information is not passed to the server. This means that any request containing our frameset could not be parsed on the server. The best we could do would be to send the user to the home page for the entire site -- hardly a valuable redirect.
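To make the limitation concrete, here is a minimal sketch of how such a URL divides into the part a browser sends in its request and the fragment it keeps to itself. The helper name splitAtHash and the sample URL are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: split a URL into the part sent to the server
// and the fragment ("hash") that never leaves the browser.
function splitAtHash(sUrl) {
    var iHash = sUrl.indexOf("#");
    if (-1 == iHash) {
        return { sent: sUrl, fragment: "" };
    }
    return {
        sent: sUrl.substring(0, iHash),
        fragment: sUrl.substring(iHash + 1)
    };
}

// Illustrative frameset-style request (the document after "#" is made up)
var oParts = splitAtHash("/sitebuilder/c-frame.htm#/sitebuilder/siteinfo/glossary/default.asp");
// oParts.sent     is "/sitebuilder/c-frame.htm"
// oParts.fragment is "/sitebuilder/siteinfo/glossary/default.asp"
```

Only the value in oParts.sent would reach server-side script, which is why the content document named after the "#" can be mapped only on the client.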
So we hunkered down to build a client-side, scripted solution which would give users of our target browsers a really good experience while not causing errors in JScript-enabled browsers. In the process, we had to use "lowest common denominator" script in order to effectively handle as many browsers as possible, and even had to bypass a browser in a couple of cases.
We essentially needed to do three things with our single redirect page:
We opted for an array of pipe-delimited strings to store the old-to-new path-fragment-mapping information. A JScript associative array might have been an even cleaner approach, but we haven't had the greatest performance with these, and we wanted to keep our structure fast in case we moved the code to the server. We also could have used JScript objects, where each of the mapping elements is a property.
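For comparison, the object-based alternative mentioned above might look roughly like the sketch below. This is not the code we shipped; the names oRedir, findRedir, newPath, and id are hypothetical.

```javascript
// Sketch of the object-based alternative: each old path is a property name,
// and its value holds the new path and the path identifier.
var oRedir = {
    "/sitebuilder/whatsnew.asp": { newPath: "/siteguide/recent.asp", id: "gde" },
    "/sbnmember/promote/": { newPath: "", id: "mbr" },
    "/sitebuilder/siteinfo/": { newPath: "/siteguide/", id: "gde" }
};

// Hypothetical lookup: return the first entry whose old path occurs in sUrl,
// or null when nothing matches.
function findRedir(sUrl) {
    for (var sOldPath in oRedir) {
        if (-1 != sUrl.indexOf(sOldPath)) {
            return oRedir[sOldPath];
        }
    }
    return null;
}
```

One trade-off worth noting: the order in which for...in visits properties has historically been implementation-dependent, so a structure like this would need extra care wherever the order of the entries matters.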
We divided the mappings into three types: old path, new path, and path identifier. We then ordered the array elements so that the elements containing the most specific information were processed first. A URL would "filter" down the array until its most specific match was found (or no match at all). As you can see in the abbreviated paths array below, the specific information about the newtosite.asp page in the old /siteinfo/ directory (mapping to /siteguide/using.asp) is processed before the more general information about all (remaining) files in /siteinfo/ (mapping to /siteguide/).
// old path | new path | path id
var aRedir = new Array(
    // File mapping - specific filenames
    "/sitebuilder/whatsnew.asp|/siteguide/recent.asp|gde",
    "/sitebuilder/siteinfo/newtosite.asp|/siteguide/using.asp|gde",
    // Ex areas - areas which no longer exist
    "/sbnmember/promote/||mbr",
    "/sitebuilder/tour/||tur",
    // Directory mapping - the most general; entire directories map
    "/sitebuilder/siteinfo/glossary/|/workshop/essentials/glossary/|gls",
    "/sitebuilder/siteinfo/|/siteguide/|gde"
);
Once the array structure was set up, we wrote a couple of functions to grab the URL, find its mapping (or not) in the aRedir array, and replace any relevant old path information with new path information. The first function, GetRedirIndex(), just runs through aRedir until it matches the old path and returns the index of the match in the array. This function could have been incorporated into our second function, GetNewUrl(), but we isolated it to keep the code cleaner and to make modification easier in case we decided to use the index in more than one place. GetNewUrl() takes the array element that matched and parses it into the three mapping elements we need to write the appropriate link and text.
function GetRedirIndex(sUrl)
{
    // Loop through aRedir until we match sUrl
    for (var i = 0; i < aRedir.length; i++)
    {
        var sRedir = aRedir[i];
        // If the old path (everything before the first "|") occurs in sUrl,
        // return the array index
        if (-1 != sUrl.indexOf(sRedir.substring(0, sRedir.indexOf("|"))))
        {
            return i;
        }
    }
    // Otherwise, if we don't match, return -1 (false)
    return -1;
}

function GetNewUrl(sUrl)
{
    // Get the matching array element's index
    var iIndex = GetRedirIndex(sUrl);
    if (-1 != iIndex)
    {
        // We've got a valid index, so let's parse out sOldPath and sNewPath
        var sRedir = aRedir[iIndex];
        var sOldPath = sRedir.substring(0, sRedir.indexOf("|"));
        var sNewPath = sRedir.substring(sOldPath.length + 1);
        sNewPath = sNewPath.substring(0, sNewPath.indexOf("|"));
        // If sNewPath ends in a slash, then swap it for sOldPath
        if ("/" == sNewPath.substring(sNewPath.length - 1))
        {
            return (sUrl.substring(0, sUrl.indexOf(sOldPath)) + sNewPath
                + sUrl.substring((sUrl.indexOf(sOldPath) + sOldPath.length), sUrl.length));
        }
        // Otherwise, if sNewPath is not empty, it must be a page,
        // so it replaces sOldPath and everything after it
        else if ("" != sNewPath)
        {
            return (sUrl.substring(0, sUrl.indexOf(sOldPath)) + sNewPath);
        }
        // Otherwise, sNewPath is blank, so we have an "Ex-Area":
        // remember its path id and report no new URL
        else
        {
            window.sExArea = sRedir.substring((sOldPath.length + 1) + (sNewPath.length + 1));
            return false;
        }
    }
    // We don't have a valid index (no match), so let's bail and return false
    else return false;
}
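To see how the ordering of the array affects the result, the sketch below condenses the two functions into a single hypothetical helper (mapPath, run against a shortened array named aDemo) and tries it on a few sample requests. It is a simplified restatement for illustration: it returns false for an ex-area instead of recording the path id, and the requested filenames are made up.

```javascript
// Condensed, hypothetical restatement of the lookup, for illustration only.
// Entries are "old path|new path|path id", most specific first.
var aDemo = [
    "/sitebuilder/siteinfo/newtosite.asp|/siteguide/using.asp|gde",
    "/sitebuilder/tour/||tur",
    "/sitebuilder/siteinfo/glossary/|/workshop/essentials/glossary/|gls",
    "/sitebuilder/siteinfo/|/siteguide/|gde"
];

function mapPath(aMap, sUrl) {
    for (var i = 0; i < aMap.length; i++) {
        var sOld = aMap[i].substring(0, aMap[i].indexOf("|"));
        var iAt = sUrl.indexOf(sOld);
        if (-1 == iAt) continue;                      // this entry doesn't match
        var sRest = aMap[i].substring(sOld.length + 1);
        var sNew = sRest.substring(0, sRest.indexOf("|"));
        if ("" == sNew) return false;                 // ex-area: nothing to map to
        if ("/" == sNew.charAt(sNew.length - 1)) {    // directory: swap the fragment
            return sUrl.substring(0, iAt) + sNew + sUrl.substring(iAt + sOld.length);
        }
        return sUrl.substring(0, iAt) + sNew;         // single page: drop the old filename
    }
    return false;                                     // no match at all
}

var sSpecific = mapPath(aDemo, "/sitebuilder/siteinfo/newtosite.asp");
// "/siteguide/using.asp"
var sGlossary = mapPath(aDemo, "/sitebuilder/siteinfo/glossary/a.asp");
// "/workshop/essentials/glossary/a.asp"
var sGeneral = mapPath(aDemo, "/sitebuilder/siteinfo/faq.asp");
// "/siteguide/faq.asp"
var sGone = mapPath(aDemo, "/sitebuilder/tour/default.asp");
// false
```

Had the general /sitebuilder/siteinfo/ entry come first, the newtosite.asp and glossary requests would both have matched it and landed in /siteguide/ instead of at their more specific destinations.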
GetNewUrl() is called inline from the code below which does a few things. Using location.href, which we parse to get the equivalent of location.pathname and location.hash, it figures out if the request was for a frameset (really relevant only to our legacy code). Then, and this is one example of where client-side scripting differences came into play, we bypass Navigator 2, which doesn't support the Array object, and special case Internet Explorer 3, which doesn't expose location.hash to the scripting engine.
// bIsNN2 and bIsIE3 are browser-detection flags set elsewhere on the page (not shown)
var sReqUrl = location.href;
sReqUrl = sReqUrl.substring(5);                      // CHOP '404;'
sReqUrl = sReqUrl.substring(7);                      // CHOP 'http://'
sReqUrl = sReqUrl.substring(sReqUrl.indexOf("/"));   // CHOP servername

var bIsFramed = false;
if (-1 != sReqUrl.indexOf("c-frame.htm")) bIsFramed = true;

// Set the base sNewUrl and initialize variables to false
var sNewUrl = "http://msdn.microsoft.com/";
var sNewPath = false;
var bMatched = false;

// If sReqUrl is a frameset, get the correct sNewPath (special case IE3, avoid NN2)
if (!bIsNN2)
{
    if (bIsFramed)
    {
        if (bIsIE3)
        {
            sNewPath = sReqUrl.substring(0, sReqUrl.substring(1).indexOf("/") + 2);
        }
        else
        {
            sNewPath = GetNewUrl(sReqUrl.substring(sReqUrl.indexOf("#") + 1));
        }
    }
    // Otherwise get non-frameset sNewPath
    else sNewPath = GetNewUrl(sReqUrl);

    // If we matched, set bMatched to true
    if (sNewPath)
    {
        bMatched = true;
        sNewUrl += sNewPath;
    }
}
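The fixed-offset chopping above depends on the exact shape of the incoming string. As a rough alternative, the sketch below looks for the delimiters instead of counting characters. It assumes, as the comments in the listing suggest, that the requested address arrives in the form 404;http://server/path; the helper name getRequestedPath and the sample URL are hypothetical.

```javascript
// Hypothetical helper: pull the server-relative path out of a string
// of the assumed form "404;http://server/path".
function getRequestedPath(sRaw) {
    var iSemi = sRaw.indexOf(";");
    var sUrl = (-1 == iSemi) ? sRaw : sRaw.substring(iSemi + 1);   // drop "404;"
    var iScheme = sUrl.indexOf("://");
    if (-1 != iScheme) {
        sUrl = sUrl.substring(iScheme + 3);                        // drop "http://"
    }
    var iSlash = sUrl.indexOf("/");
    return (-1 == iSlash) ? "/" : sUrl.substring(iSlash);          // drop server name
}

// Illustrative input only
var sPath = getRequestedPath("404;http://www.microsoft.com/sitebuilder/whatsnew.asp");
// sPath is "/sitebuilder/whatsnew.asp"
```

The result could then be handed to GetNewUrl() in place of the chopped sReqUrl.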
Finally, with the appropriate link identified for our match (or lack thereof), we write the information into the browser window using the document.write() method. I won't include the code here because it's pretty implementation-specific, but you can take a look at it by requesting any bad URL on the old Site Builder Network site, starting Tuesday afternoon, March 30. Try, for example, the old URL of the former SBN Magazine (now MSDN Online Voices).
That's our Custom 404. We had to jump through a few hoops to get the script working on all version 3 and greater browsers, a definite reminder of the advantages of coding with ASP on the server, but it seems to be a pretty stable solution.
Let's wrap up with a look at a couple of things we would do differently on the server than on the client.
The logic and mapping arrays wouldn't change much had we opted for server-side script, except that we might store them in application scope variables, using the LookupTable Object. In fact, you could take this code as is, just changing the document.write() statements to ASP's Response.Write(), and pop it on your IIS 4.0 box. But there are things we would have done a bit differently on the server.
Instead of declaring and assigning to an object variable, and then evaluating it in a separate statement (required on the client by a Navigator 4.03 script bug), we would assign and evaluate in one statement:
sNewPath = GetNewUrl(sTopicUrl);
if (sNewPath)
{
    bMatched = true;
    sNewUrl += sNewPath;
}
would become
if (sNewPath = GetNewUrl(sTopicUrl))
{
    bMatched = true;
    sNewUrl += sNewPath;
}
The replace() method is a real keystroke-saver compared to using indexOf() and substring() to build strings, and makes for more readable code. We'd use this extensively, as in the following example where a new path fragment replaces an old one, as matched in our array of paths.
return (sUrl.substring(0,sUrl.indexOf(sOldPath)) + sNewPath + sUrl.substring((sUrl.indexOf(sOldPath) + sOldPath.length),sUrl.length));
would become:
return sUrl.replace(sOldPath,sNewPath);
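As a quick check that the two forms agree, here is a small sketch that runs both on the same input. The requested filename default.asp is made up for the example.

```javascript
var sUrl = "/sitebuilder/siteinfo/glossary/default.asp";
var sOldPath = "/sitebuilder/siteinfo/glossary/";
var sNewPath = "/workshop/essentials/glossary/";

// Long form, using indexOf() and substring()
var sLong = sUrl.substring(0, sUrl.indexOf(sOldPath)) + sNewPath +
    sUrl.substring(sUrl.indexOf(sOldPath) + sOldPath.length, sUrl.length);

// Short form, using replace() with a plain string as the pattern
var sShort = sUrl.replace(sOldPath, sNewPath);

// Both yield "/workshop/essentials/glossary/default.asp"
```

When the first argument is a plain string rather than a regular expression, replace() swaps only the first occurrence, which matches what the long form does.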
Using the String object's split() method, which we couldn't do on the client, would also have made for slightly cleaner code. For example,
var sRedir = aRedir[iIndex];
var sOldPath = sRedir.substring(0,sRedir.indexOf("|"));
var sNewPath = sRedir.substring(sOldPath.length + 1);
sNewPath = sNewPath.substring(0,sNewPath.indexOf("|"));
window.sExArea = sRedir.substring((sOldPath.length + 1) + (sNewPath.length + 1));
would become:
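// Split on the pipe delimiter; with no argument, split() returns the whole string as one element
var aRedirData = aRedir[iIndex].split("|");
var sOldPath = aRedirData[0];
var sNewPath = aRedirData[1];
window.sExArea = aRedirData[2];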
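A short sketch of what split() produces for two of our mapping strings, including an ex-area entry whose new path is empty (the variable names aFields and aEx are just for this example):

```javascript
// A directory mapping: three non-empty fields
var aFields = "/sitebuilder/siteinfo/|/siteguide/|gde".split("|");
// aFields[0] is "/sitebuilder/siteinfo/"
// aFields[1] is "/siteguide/"
// aFields[2] is "gde"

// An ex-area: the empty new path survives as an empty string
var aEx = "/sitebuilder/tour/||tur".split("|");
// aEx[0] is "/sitebuilder/tour/"
// aEx[1] is ""
// aEx[2] is "tur"
```

Because the empty field is preserved, the same test for a blank sNewPath still identifies an ex-area.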
We used an intermediate page on the client for two reasons. First, we wanted to give visitors accustomed to Site Builder Network some information about the site merge before redirecting them to MSDN Online. Second, we wanted to avoid the dreaded back-button loop from which many client-side redirect scripts suffer. If location.href = sNewUrl is used, visitors essentially lose use of the back button, because both the redirect and the destination page go into browser history. When the user clicks the back button, he hits the redirect page, which immediately sends him forward again. The location.replace(sNewUrl) method overcomes this by only keeping the destination page in history, but is only supported in Internet Explorer and Navigator 4 and higher.
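If you do want an automatic client-side redirect, one common pattern is to use location.replace() where it exists and fall back to assigning location.href otherwise. The sketch below is a minimal illustration of that pattern, not code from our page; the helper name goTo is hypothetical, and it takes the location object as a parameter so the logic can be tried outside a browser.

```javascript
// Hypothetical helper: prefer location.replace() where the browser has it,
// and fall back to assigning location.href otherwise.
function goTo(oLocation, sNewUrl) {
    if (oLocation.replace) {
        // The destination takes the redirect page's place in history
        oLocation.replace(sNewUrl);
    } else {
        // Adds a new history entry, so the back button returns to the redirect page
        oLocation.href = sNewUrl;
    }
}

// In a browser, the call would be: goTo(window.location, sNewUrl);
```

Browsers that take the fallback branch still suffer the back-button loop, which is one more reason an intermediate page with a plain link was the safer choice for us.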
Response.Redirect(sNewUrl) on the server would eliminate the back-button problem. We'd probably still want the intermediate "splash" page for the first few weeks of the new site, but would likely move to a transparent Response.Redirect() solution afterwards.
George Young is the development lead on Microsoft's MSDN Online site, and previously worked on the Site Builder Network site. In his spare time, he makes penalty kicks for Flamengo (in his dreams), listens to Mexican radio stations over Windows Media Player, and commutes to Redmond, Washington from New Orleans in his Caddy.