

MIND


This article assumes you're familiar with ASP, Visual Basic, VBScript, Visual C++, and SQL

Maintaining Session State on Your Web Farm
Marco Tabini

IIS and ASP provide several methods to track a user's session on the server. But when you have several servers running concurrently, you have to modify your approach.
There's no worse time to find out that you made a mistake when planning your Web site than when you need a quick fix to a seemingly simple problem. Off you walk after cheerfully replying "no problem" to your boss's request for a rapid resolution—with just enough time to catch the next flight to a nonextradition country.
      The problem with planning mistakes is, of course, that they are difficult to predict. Careful thought in the design phase doesn't always help you, especially when the scenarios in which Web problems arise seem far-fetched and the technology that you are using appears to be foolproof. As a result, you are forced to rewrite entire portions of code for what looked like a trivial matter in the design phase, and your "no problem" turns into a nightmare from which you just can't seem to wake up.
      For the sake of discussion, let's say your Web site has become so successful that the IT department has been stuffing your only Web server with hardware to the point that it requires a separate power plant just to run the cooling system. Sadly, you can only fit so many processors on a motherboard. You are faced with adding another computer and placing a router in front of the pool whose only task is to dispatch clients to one machine or the other, depending on each machine's workload. That's where the unthinkable happens, and you grab your phone book looking for your travel agent so you can buy that one-way ticket.

A One-way Trip to Hell
      If you use Microsoft® Internet Information Server (IIS) version 3.0 or higher, you have probably grown accustomed to the convenience and power of ASP scripting. ASP is a truly powerful and flexible platform for writing and running scripts that makes accomplishing complex tasks a breeze. Furthermore, since it's entirely based on the COM model, it can be expanded easily by either writing new components from scratch or by installing one of many third-party packages already available on the market.
      When you write an ASP script, you can take advantage of a number of built-in objects. Arguably, the most interesting of all is the Session object, which is available to a script as part of its context.
      Session solves a problem that stems from what you could call a difference of opinion between human beings and the HTTP protocol. People tend to consider all the pages in a Web site as a single application; as such, it makes sense to maintain a certain set of information throughout the entire site while a specific user is visiting it.
      Unfortunately, the HTTP protocol considers the scope of the application to be a single page. Therefore, when a user requests two pages, the Web server has completely "forgotten" about the first request by the time the second one takes place. This means that, under normal circumstances, HTTP is a stateless protocol; that is, it is unable to maintain the set of data required to run the whole site as an application.
      In order to make HTTP connections state-aware, IIS implements an internal database of Session objects, one for each visit currently under way on the site, and uses a cookie to link the individual browser to its own session. All the cookie really contains is a Globally Unique Identifier (GUID), a 128-bit number that is generated with an algorithm that virtually guarantees its uniqueness. GUIDs are normally displayed in the following well-known format:


 {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
Cookies can only be used to store strings, which means that you will have to use at least 32 characters (two for each byte) to display a GUID. On a large scale, this is an awful waste of space because, while each character in the string can be used to save seven bits, only four are used. To reduce the amount of bandwidth wasted, IIS converts the GUID to base 36 (using digits from zero to nine and characters from uppercase A to uppercase Z), which results in a maximum of 25 characters being used.
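      The arithmetic above can be checked with a short sketch. It is written in Python purely for illustration and is not how IIS performs the conversion; to_base36 is a hypothetical helper name.

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # 0-9 followed by A-Z: 36 symbols

def to_base36(value):
    """Encode a non-negative integer using the digits 0-9 and A-Z."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    chars = []
    while value:
        value, remainder = divmod(value, 36)
        chars.append(DIGITS[remainder])
    return "".join(reversed(chars))

# The largest 128-bit value needs 32 hexadecimal digits but only 25 in base 36.
largest = 2**128 - 1
hex_length = len(format(largest, "x"))
base36_length = len(to_base36(largest))
```

Here hex_length is 32 and base36_length is 25, which matches the figures quoted above.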
      The well-known problem with IIS sessions is that because they are entirely based on cookies, if the user turns off this feature the server will not be able to maintain the session state anymore. The real trouble comes because the server doesn't know that the cookies it sends out are being refused. As a result, the Session object is still available to the ASP scripts, but the values stored in it only have page-wide scope. Whatever is stored in the Session object in one script will be lost by the time the next one is loaded.
      A typical solution consists of two basic steps. The first one determines whether cookies are being refused, and the second either advises the user that they must change the browser's settings or provides some alternative means of maintaining the session state. Finding out whether the cookies that you send out are being refused is relatively easy. All you have to do is make the first page of your site a simple redirection:

 <%
     Session ("Cookies") = "Yes"

     Response.Redirect ("Default2.asp")
 %>
If the "Cookies" value in the Session object is still available in the Default2.asp script, then you know for sure that cookies are turned on at the other end. Be careful not to fall into the trap of setting the value in the global.asa file and then attempting to read it in default.asp (assuming that's the entry page of your site). Even though they are two different files, the server will execute both of them as part of the connection that retrieves the first page. As a result, the values set when global.asa is executed will still be available in the main page because the connection between the browser and the server has not been lost.
      Now, for a more challenging thought, let's consider a lesser-known problem with sessions. (It's not well-known because until a short while ago it was extremely rare.) Because the internal database of session states is maintained in memory by the server, sessions are not only dependent on the availability of cookies on the target machine, but their scope is also limited to the individual server machine.
      If an increasingly common capacity problem forces you to create a server farm to run your site, sessions become unsuitable for your environment and can get you in real trouble if you've heavily based your scripts on them.

Alas, the Return Ticket
      There isn't really a cure for this problem out there yet. Third-party software that provides this functionality is not widely available, and Microsoft doesn't plan to support scalable sessions in IIS 5.0. Your only options at this point are either to buy a special kind of router that supports "sticky" IPs (that is, one that always sends a specific user to the same server) or to cluster the Web servers together. Microsoft Commerce Server (part of Microsoft Site Server 3.0, Commerce Edition) also offers a server-independent storage system, but it's a little slow at this point.
      There is, however, a relatively easy way to build a session state system that has the following characteristics:

  • COM-based
  • Can be easily used from within ASP scripts
  • Works with and without cookies
  • Works across a Web farm
  • Provides data caching for optimal performance
  • Can be expanded to support data persistency (the values stored in the session will be permanently available even when the user returns in the future)
  • Easy to maintain
      The need to make this system COM-compatible is obvious if you want it to work well with ASP. As you'll see, once completed and installed, my Session object will work almost exactly as the current one does.
      When you're developing COM components, the big question is which language you should use. Whenever possible, I use Visual Basic® for prototyping because it's much easier to change your mind about the way the components work than when you're using Visual C++®. Visual C++ is my language of choice for a production environment because of its stability and performance, plus its support for a wider variety of threading models.
      Since my goal here was not to provide a product, but to pass along a solution to a problem, most of the code in this article has been written using Visual Basic 6.0. You can convert my components to Visual C++ if you decide that the performance improvement will be worth the additional development time.
      My Web server is a Pentium II-based, 350 MHz machine with 128MB of RAM, running Windows NT 4.0 Service Pack 4 and SQL Server™ 6.5. As configured, this system was able to handle well over a hundred concurrent users without showing any signs of slowing down significantly. If you need to use a server farm you will probably have (or at least plan for) a far greater number of connections, and moving to Visual C++ might be a good idea.

Your Friend, the Database
      The first big consideration when designing a Web farm environment is how to make the session data available to multiple servers. One possible solution would be to keep the data in memory and pass it along to other servers through some mechanism like a DCOM session. This approach would probably end up being slow and pretty messy, with sessions floating around your internal network and being bounced around from server to server. In addition, the performance hit would be significant because you'd be forced to move the data around even if no changes to the Session object took place. Finally, the server that holds the data at any given point becomes a single point of failure for the farm; if it goes down, all the sessions stored in it will be lost.
      If you rule out memory storage as an option, the next best solution is to store all the session data in a database. There are a number of compelling reasons for doing so, including the fact that a database server has been designed and optimized to organize data, it can be made available to all the servers in the Web farm, and it can be replicated to ensure error recovery.
      In addition, the database provides a permanent data storage medium. This is good because even though you'll want your sessions to expire after a while, being able to store the data indefinitely can be an added bonus for user profiling. My system does not support this possibility, but as you'll see later on it can be easily modified to do so.
      All the data you need is stored in a single database table that I call Sessions (see Figure 1). Each record consists of a Session ID (SID) used to uniquely identify users, a data field, and a date-time field used to determine when a session has expired and when its corresponding record should be deleted. Without this functionality, your table would rapidly become cluttered with expired sessions and the performance of the system would suffer significantly.
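      Figure 1 is not reproduced here, so the following is only a rough sketch of what such a table might look like. It uses Python with SQLite as a stand-in for SQL Server. SID and LASTCHANGE are names used in this article; the DATA column name and all column types are assumptions.

```python
import sqlite3

def create_sessions_table(conn):
    """Create a three-column table along the lines described in the text."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS Sessions (
            SID        TEXT PRIMARY KEY,         -- session ID that identifies the user
            DATA       TEXT NOT NULL DEFAULT '', -- all session values in one string (assumed name)
            LASTCHANGE REAL NOT NULL             -- timestamp used to expire the session
        )
        """
    )

conn = sqlite3.connect(":memory:")
create_sessions_table(conn)
# Read the column names back to confirm the layout.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Sessions)")]
```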
      I decided not to use the SQL Server built-in IDENTITY feature, which would have automatically incremented the SID field every time a new record was added to the database. I chose not to for a number of reasons, primarily because it's specific to the Microsoft DBMS. Instead, I have opted for an alternate solution that uses GUIDs.
      If you are using SQL Server 7.0, you can take advantage of the new built-in uniqueidentifier data type, which can be used to generate and store GUIDs. For all other systems (including mine), I have written a simple COM component using C++ that generates a new GUID and converts it into a text representation of its hexadecimal form. As you can see from Figure 2, the function that generates the GUID is extremely simple: it calls the CoCreateGuid function of the Win32® API and then uses the Format method of CString to convert the resulting structure to its text representation.
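      Figure 2 is not reproduced here. As a loose analogue of the idea (not the C++ component itself), the Python sketch below uses the standard uuid module in place of CoCreateGuid. The function name new_session_id and the choice of 32 uppercase hexadecimal characters are assumptions.

```python
import uuid

def new_session_id():
    """Return a freshly generated GUID as 32 uppercase hexadecimal characters."""
    return uuid.uuid4().hex.upper()

sid = new_session_id()
```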

Storing Data
      There is an important limitation to keep in mind when moving the concept of session state from a single computer to a server farm. Representing arrays can be difficult, and the objects that are available on one machine might not be available on the others. In addition, before you can transfer an object, it must be possible to persist it; that is, to save its current state so that it can be used at a later time to recreate an exact copy of the object. This is usually done through persistence mechanisms that must be supported by the object itself.
      After a little research on my previous projects, I noticed that objects are seldom stored as part of the session data, and decided to simply ignore the problem because I could only have found a partial solution anyway. (What happens if an object is not persistable? What if it is not installed on all the servers in the farm?)
      As a result, objects cannot be stored in the system described here. The same applies to arrays, although, as you'll see later on, it's easy enough to support that. The only acceptable data types are those that can be converted to and stored as strings, including numeric values.
      When the session data has to be persisted to the database, my system simply goes through the entire list of values and stores them in a single string, separated by a special character value. Because each element is composed of a value and a name (which is used to retrieve the value), there will always be an even number of substrings in a saved session.

System Architecture
      My system is composed of four objects (see Figure 3). SessionElement is an extremely simple object. All it provides is a storage system for the values that are part of a user's session and their names.
      ElementCollection, which was created using the Visual Basic Class Wizard, represents an individual user session, and is a collection of SessionElement objects. The only significant difference is that it provides a more flexible way of retrieving and adding information through the Member method, which is also marked as the object's default member. Because of this, it's possible to add and retrieve values by simply specifying their names:


 msSession ("Test") = "This is a test"
 
 Response.Write (msSession ("Test"))    
 ' will return "This is a test"
ElementCollection also offers the LastModified property, which can be used to synchronize its data with the master data stored in the database.
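      To make the idea of a default member concrete, here is a minimal sketch in Python rather than Visual Basic. It is not the author's class: indexing stands in for the Member method, last_modified stands in for LastModified, and returning an empty string for unknown names is an assumption.

```python
from datetime import datetime, timezone

class ElementCollection:
    """One user's session: named string values plus a last-modified timestamp."""

    def __init__(self):
        self._values = {}
        self.last_modified = datetime.now(timezone.utc)

    def __getitem__(self, name):
        # Assumption: an unknown name yields an empty string instead of an error.
        return self._values.get(name, "")

    def __setitem__(self, name, value):
        # Only string-convertible values are supported, so store the string form.
        self._values[name] = str(value)
        self.last_modified = datetime.now(timezone.utc)

session = ElementCollection()
session["Test"] = "This is a test"
```

Reading session["Test"] then gives back the stored string, much like the msSession ("Test") call in the ASP snippet.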
      SessionObject offers the richest functionality of all the classes and performs most of the work. At its core, it maintains a collection of ElementCollection objects, which together constitute a cache of all the sessions currently available on the system. This way, if a user keeps returning to the same machine, the system will not have to reload all the data from the Sessions table every time it needs it.
      SessionObject also retrieves sessions from the database and saves them back if needed. The SaveSession method iterates through an ElementCollection object and builds a single string that contains all the data:

 ' Append each name/value pair, terminating both parts with Chr(1)
 For Each eElement In cUserSession.mCol
     sSessionData = sSessionData & eElement.Name & _
         Chr(1) & eElement.Value & Chr(1)
 Next
      The LoadSession method retrieves the single string from the database, separates the individual values using the Split function provided by Visual Basic, and stores them back into an instance of ElementCollection.
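      The save and load round trip can be sketched as follows. This is a Python illustration under stated assumptions, not the Visual Basic source: save_session and load_session are hypothetical names, a plain dictionary stands in for ElementCollection, and rejecting values that contain the separator is an added safeguard.

```python
SEPARATOR = chr(1)  # same delimiter as Chr(1) in the Visual Basic loop

def save_session(values):
    """Flatten a dict of name/value pairs into one delimited string."""
    parts = []
    for name, value in values.items():
        name, value = str(name), str(value)
        if SEPARATOR in name or SEPARATOR in value:
            raise ValueError("names and values must not contain the separator")
        parts.append(name + SEPARATOR + value + SEPARATOR)
    return "".join(parts)

def load_session(data):
    """Rebuild the dict from a string produced by save_session."""
    fields = data.split(SEPARATOR)
    # Every pair ends with a separator, so the final field is always empty.
    fields.pop()
    if len(fields) % 2:
        raise ValueError("corrupt session data")
    # Even positions hold names, odd positions hold the matching values.
    return dict(zip(fields[0::2], fields[1::2]))

saved = save_session({"Test": "This is a test", "Count": 3})
restored = load_session(saved)
```

Because each pair ends with the separator, splitting leaves one empty trailing field; the sketch discards it before pairing names with values.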
      When an ASP page requests a session, SessionObject creates a new instance of Session and uses it as a wrapper around the appropriate instance of ElementCollection. Session simply provides passthrough methods to its instance of ElementCollection, adding some functionality of its own only when it is destroyed (which will usually happen at the end of the page). The Class_Terminate method, in fact, forces SessionObject to save the session back to the database (while still keeping a copy in memory for caching purposes), ensuring the appropriate data synchronization.

Synchronizing Data
      Speaking of databases, if you go through the Visual Basic code described here, you will notice two unusual things. First, I make consistent use of the BeginTrans and CommitTrans methods of the ADODB.Connection object whenever database access has to take place. I use them because they force SQL Server to purge the system log (which is used to keep track of the transactions that are being performed) whenever a database operation is performed. This allows you to keep a single connection open as long as you want, instead of having to create a new one every time you need to access the database.
      Second, there are no SQL queries in the calls to Connection.Execute. This is a result of all database operations being handled by stored procedures. Stored procedures run much faster than individual statements, as I explained in my article, "Using Stored Procedures in SQL Server" (MIND, March 1999).
      Therefore, all data synchronization and maintenance is taken care of by three procedures: CreateRetrieveSession, PurgeOldSessions, and UpdateSession (see Figure 4). A fourth procedure, GetLastModifiedDate, is used to retrieve the LASTCHANGE field of a session record, which is then compared to the data in the cache when deciding if a session should be reloaded from the database.
      CreateRetrieveSession is used to retrieve an existing user session or to create a new one. As you can see, the system automatically updates the LASTCHANGE field whenever a session is retrieved from the Sessions table. This ensures that when PurgeOldSessions is called it will only delete those sessions that are older than 20 minutes (naturally, you can change this value as you please).
      PurgeOldSessions also returns a recordset containing a list of the sessions that were deleted, which enables the PurgeOldSessions method in the SessionObject component to update its internal database of sessions accordingly.
      Finally, UpdateSession is used to update the LASTCHANGE field of a session record whenever the session is accessed by a script.
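      Figure 4 is not reproduced here, so the sketch below only imitates the behavior described above. It uses Python functions over an in-memory SQLite table instead of SQL Server stored procedures. The DATA column name, the use of seconds for LASTCHANGE, and the assumption that updating a session also writes its data are all illustrative choices.

```python
import sqlite3

TIMEOUT_SECONDS = 20 * 60  # the 20-minute expiry mentioned in the text

def open_store():
    """Create an in-memory stand-in for the Sessions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sessions (SID TEXT PRIMARY KEY, DATA TEXT, LASTCHANGE REAL)"
    )
    return conn

def create_retrieve_session(conn, sid, now):
    """Return the data stored for sid, creating an empty record if none exists."""
    row = conn.execute("SELECT DATA FROM Sessions WHERE SID = ?", (sid,)).fetchone()
    if row is None:
        conn.execute("INSERT INTO Sessions VALUES (?, '', ?)", (sid, now))
        return ""
    # Retrieving a session counts as activity, so LASTCHANGE is refreshed.
    conn.execute("UPDATE Sessions SET LASTCHANGE = ? WHERE SID = ?", (now, sid))
    return row[0]

def update_session(conn, sid, data, now):
    """Store new data (an assumption) and refresh LASTCHANGE."""
    conn.execute(
        "UPDATE Sessions SET DATA = ?, LASTCHANGE = ? WHERE SID = ?", (data, now, sid)
    )

def purge_old_sessions(conn, now):
    """Delete expired sessions and return their IDs so a cache can drop them too."""
    cutoff = now - TIMEOUT_SECONDS
    expired = [
        row[0]
        for row in conn.execute(
            "SELECT SID FROM Sessions WHERE LASTCHANGE < ? ORDER BY SID", (cutoff,)
        )
    ]
    conn.execute("DELETE FROM Sessions WHERE LASTCHANGE < ?", (cutoff,))
    return expired
```

Returning the deleted IDs mirrors the recordset that lets the component remove the same sessions from its in-memory cache.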

Deploying the Solution
      So how do you make the solution work? Once you've created all the necessary table structures, stored procedures, and COM components, you will be able to start using the system immediately in your ASP application.
      The first step is creating a global instance of SessionObject and giving it application-wide scope. This is done by using the <OBJECT> tag as part of the application initialization. Figure 5 shows a sample global.asa file that creates and initializes an instance of SessionObject.
      Each individual ASP script in your application will subsequently be responsible for the retrieval of each user's session data. This not only includes calling the Session property of SessionObject, but also tracking the user ID (for example, through a cookie).
      For your convenience, I have written a simple include file, iSession.asp, that takes care of the entire process (see Figure 6). Depending on the value of Application ("bUseCookies"), which can be set in the Application_OnStart method of global.asa, this script is able to store the user ID either as a cookie or as part of the HTTP query string that is used to create links. Regardless of whether or not you decide to use cookies, you should make sure to use the SessionURL function to create all of your links. This way, switching between cookies and query strings will be very easy.
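      Figure 6 is not reproduced here. To show what a link-building helper of this kind might do, here is a hedged Python sketch; it is not the SessionURL function from iSession.asp. The name session_url, its parameters, and the query string key SID are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def session_url(url, sid, use_cookies):
    """Return url as-is when cookies carry the ID, else add the ID to the query string."""
    if use_cookies:
        return url
    parts = urlsplit(url)
    # Drop any stale SID parameter, keep the rest, then append the current ID.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "SID"
    ]
    query.append(("SID", sid))
    return urlunsplit(parts._replace(query=urlencode(query)))

link = session_url("cart.asp?item=5", "ABC123", use_cookies=False)
```

Routing every link through one helper is what makes the switch between cookies and query strings a one-line configuration change.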

Possible Improvements
      While discussing the session state system, I've mentioned a few improvements that you might want to implement on your own. Let's go over them in detail.
      As it is currently implemented, the system provides garbage collection, but not a scheduled task to execute it. You could write one using a simple Windows® Scripting Host script that opens a page in your application whose only task is to call SessionObject's PurgeOldSessions method.
      At the same time, the caching system has its own flaws. First of all, there is the risk that you will end up with several orphan sessions that have been deleted from the database, but not from the cache. This will happen if you call the PurgeOldSessions method on one machine and it deletes one or more sessions that are cached on another server. A possible solution here is to change PurgeOldSessions so that it crawls its internal cache (that is, the cUserSessions collection) and deletes obsolete entries.
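      One way to picture such a cache crawl is the Python sketch below. It is illustrative only: purge_cache is a hypothetical name, and a dictionary that maps each session ID to its last-modified time in seconds stands in for the cUserSessions collection.

```python
TIMEOUT_SECONDS = 20 * 60  # same 20-minute window used for the database purge

def purge_cache(cache, now, timeout=TIMEOUT_SECONDS):
    """Remove cached sessions older than timeout and return the removed IDs.

    cache maps a session ID to the time (in seconds) it was last modified.
    """
    obsolete = [
        sid for sid, last_modified in cache.items() if now - last_modified > timeout
    ]
    for sid in obsolete:
        del cache[sid]
    return obsolete

cache = {"A": 0, "B": 1000}
removed = purge_cache(cache, now=1300)
```

Because each server applies the same age test locally, a session purged from the database by another machine eventually disappears from this cache as well.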
      Another way to increase system performance would be to separate the concepts of last modified date and last accessed date. You need the last modified date to determine whether the copy of a session that you have in your cache is obsolete and should be reloaded from the database, while the last accessed date is used by the garbage collection process when it has to determine whether a session has expired and should be deleted. In its current implementation, the system doesn't make any distinction between the two, resulting in sessions being reloaded from the database even if no change has taken place between two accesses.
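      The distinction can be sketched in a few lines of Python. This is an illustration of the proposed improvement, not part of the system as implemented; SessionRecord, cache_is_stale, and is_expired are hypothetical names, and times are plain seconds.

```python
from dataclasses import dataclass

@dataclass
class SessionRecord:
    data: str
    last_modified: float  # changes only when the data is written
    last_accessed: float  # changes on every read or write

    def read(self, now):
        self.last_accessed = now
        return self.data

    def write(self, data, now):
        self.data = data
        self.last_modified = now
        self.last_accessed = now

def cache_is_stale(cached_last_modified, record):
    """A cached copy needs reloading only if the record was written since."""
    return record.last_modified > cached_last_modified

def is_expired(record, now, timeout=20 * 60):
    """Garbage collection looks at the last access, not the last write."""
    return now - record.last_accessed > timeout

record = SessionRecord(data="", last_modified=0.0, last_accessed=0.0)
record.write("a", now=10)
cached_at = record.last_modified
record.read(now=500)
```

After the read at 500 seconds the cached copy is still current, so no reload is needed, yet the session's expiry clock has been reset.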
      Similarly, you could modify the garbage collection to ignore certain database entries that have been specially marked as permanent. This way, you'll be able to maintain information about a specific user indefinitely, as long as you are using cookies to track them. You will also have to change the iSession.asp include file shown in Figure 6 so that the expiration date of the cookies is set further in the future than one day.
      Finally, the entire system can be modified to store arrays. In this case, your biggest challenge will be to turn an array into a continuous string. First, determine whether the variable you're trying to convert is, in fact, an array by means of the Visual Basic IsArray function. Then determine how many elements the array has using UBound and LBound, and build a single string by crawling the entire array with a For…Next loop. Be careful to choose a delimiter for the array entries that differs from the one that separates names and values, or the LoadSession function will not work properly.
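      A rough Python sketch of this idea follows. It is an assumption-laden illustration, not tested Visual Basic: Chr(2) is an arbitrary choice of second delimiter, the element count is stored first so that an empty array can be told apart from an array holding one empty string, and the function names are hypothetical.

```python
PAIR_SEPARATOR = chr(1)     # already used between names and values
ELEMENT_SEPARATOR = chr(2)  # assumed delimiter for array entries

def array_to_string(items):
    """Flatten a list into 'count, item, item, ...' joined by ELEMENT_SEPARATOR."""
    texts = [str(item) for item in items]
    for text in texts:
        if PAIR_SEPARATOR in text or ELEMENT_SEPARATOR in text:
            raise ValueError("array entries must not contain a delimiter")
    return ELEMENT_SEPARATOR.join([str(len(texts))] + texts)

def string_to_array(text):
    """Rebuild the list and check it against the stored element count."""
    fields = text.split(ELEMENT_SEPARATOR)
    count, items = int(fields[0]), fields[1:]
    if count != len(items):
        raise ValueError("corrupt array data")
    return items

flat = array_to_string(["red", "green", "blue"])
```

Since the flattened array never contains Chr(1), it can be stored as an ordinary session value without confusing the name/value split.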

MSDN
http://msdn.microsoft.com/workshop/server/nextgen/sessiondata.asp
and
http://msdn.microsoft.com/library/devprods/vs6/vbasic/vbcon98/vbconstoringstateinobjects.htm


From the October 1999 issue of Microsoft Internet Developer.