This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.
Ted Pattison
Working on a Web Farm
First, let's review a couple of key points about scalability and IIS. The IIS/ASP framework implements a thread-pooling scheme and maintains a request queue to accommodate peak traffic times. After the default installation, IIS is configured to allocate 10 threads per processor to service incoming ASP requests. If you need to maintain state on a per-user basis across requests, you can accomplish this in an IIS application by using an ASP Session variable.

Visual Basic® objects are apartment threaded. If you create instances of Visual Basic objects from ASP scripts, a basic rule of thumb is to make sure you release these objects at the end of every request. If you assign a Visual Basic object to an ASP Session variable, you will pin each user to a specific thread. This practice significantly decreases the scalability of an IIS application and should be avoided.

While holding onto Visual Basic objects across requests is a no-no, it is far more acceptable to store non-object-based state information inside ASP Session variables. For instance, you can use ASP Session variables to stash variants, arrays, or the contents of a property bag when you're creating an application that needs to maintain state across requests. This approach allows you to build state-based applications such as those that use shopping carts.

The one problem with using ASP Session variables is that you must assume that every incoming HTTP request for a particular user is processed by the same IIS computer, an assumption you can't make when running a Web farm. In this month's column, I'll look at the issues that come into play when you can't make that assumption.

Increasing Scalability through Load Balancing
A Web site is like an aspiring Hollywood actor. In the beginning of its career, before a Web site has been discovered by the masses, it lives in obscurity. Its biggest problem is the paranoia that it will live in obscurity for its entire existence. However, once a Web site becomes famous, it has a new set of challenges. The volume of incoming requests increases dramatically. Moreover, the fans of a famous Web site have lofty expectations for the site's performance, and are highly critical when these expectations are not met. Unfortunately, these new challenges are sometimes more than a Web site can handle. Some Web sites respond to fame by going up in smoke, and they never recover. Other sites are able to handle the pressures of fame more gracefully. They go on to become the places that thousands of users return to again and again.
Figure 1: Windows 2000 Load Balancing
If you can't wait for Windows 2000 to ship, you can produce similar results by creating a custom broker object on the IIS server. For instance, an ASP script could create and call a local broker object that supports a method like CreateUserManagerObject. The implementation of this method could call the Visual Basic 6.0 function CreateObject and pass the name of the MTS server on which to activate the UserManager object. Your algorithm could rotate activation requests across a set of servers. This approach makes it possible to distribute the load across a set of MTS-enabled computers.

It turns out that this style of load balancing works pretty well inside a LAN environment with remote COM clients, but isn't that good for a Web-based application. The problem with this approach is that all incoming HTTP requests and all ASP processing are handled by a single computer. This IIS computer becomes both a performance bottleneck and a single point of failure for your site. For a Web-based application, it's better to address load balancing at the point at which the HTTP request arrives at the site. This means that you need a technique to distribute incoming HTTP requests across a set of IIS computers. A site that uses this approach is known as a Web farm.

Web Farming
Web designers have devised quite a few techniques to distribute HTTP requests across a set of servers. One simple approach is to design a Web site with a dedicated routing server, as shown in Figure 2.
Figure 2: Using a Dedicated Routing Server
The routing server usually has a well-known Domain Name System (DNS) name (such as MyServer.com) and a dedicated IP address. The other servers in the farm have their own dedicated IP addresses and can optionally have DNS names as well. When a user's initial request reaches the routing server, the request is redirected to one of the other servers in the farm. You can redirect your users from the routing server using the Session_OnStart event in the global.asa file. Say you want to redirect each new user to one of three different servers in a farm:
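A minimal sketch of that redirection logic follows, written in Python rather than ASP script so it can be run on its own. The host names and the default page are assumptions made for illustration; in global.asa the same choice would be made inside Session_OnStart and handed to Response.Redirect.

```python
import random

# Hypothetical host names for the three servers in the farm.
FARM_SERVERS = ["MyServer1", "MyServer2", "MyServer3"]


def pick_redirect_url(path="/default.asp", rng=random):
    """Return the URL a new user's first request is redirected to.

    A server is chosen uniformly at random. `rng` can be replaced with a
    seeded random.Random instance to make the choice repeatable.
    """
    server = rng.choice(FARM_SERVERS)
    return "http://" + server + path


if __name__ == "__main__":
    # Each new session lands on one of the three servers.
    print(pick_redirect_url())
```

Because the choice is made only once, when the session starts, every later request from that user goes to the server picked here.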
|
Once a user is redirected to a particular server, a session is created and the user sends all future requests to the same server. For this reason, you can think of this as a session-based load balancing technique. Note that if you use this technique, you must make sure that a user's initial request is for a page with an .asp extension. The Session_OnStart event will not be fired when the request is for a file with an .htm or .html extension, since IIS doesn't expect them to contain ASP code.

The code shown previously uses a random algorithm for redirection, but you could design a more elaborate load balancing mechanism. For instance, each server in the farm could send performance data back to the routing server. If each server periodically transmits a count of active sessions to the router, the load balancing code could redirect each new user to the server with the fewest current users.

Round-robin DNS is another common session-based load balancing technique. With round-robin DNS, each logical DNS name (such as MyServer.com) maps to several IP addresses. When a browser attempts to resolve the DNS name, the DNS server sends back one of the addresses from the list. The DNS server rotates the addresses in order to distribute a set of users across a set of servers. Once a browser resolves a DNS name into an IP address, it caches the IP address for the duration of the user's session. Round-robin DNS is slightly faster than the redirection technique shown previously, yet it produces the same results. Different users are given different IP addresses to balance the load.

If you are going to set up a Web farm with one of these session-based load balancing techniques, you should write your pages and code in terms of relative URLs. For instance, you should use URLs such as /MyOtherPage.asp instead of absolute URLs like http://myserver/MyOtherPage.asp, which contain a server's DNS name or IP address.
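To see why relative links matter, here is a small sketch that resolves links the way a browser does, using `urljoin` from Python's standard `urllib.parse` module. The page names and the hosts MyServer1 and MyServer2 are assumptions chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical pages delivered by two different servers in the farm.
page_on_server1 = "http://MyServer1/Cart.asp"
page_on_server2 = "http://MyServer2/Cart.asp"

# A relative link resolves against whichever server delivered the page,
# so the user stays on the server where the session started.
relative_link = "/MyOtherPage.asp"
print(urljoin(page_on_server1, relative_link))
print(urljoin(page_on_server2, relative_link))

# An absolute link names one host and sends every user there,
# regardless of which server delivered the page.
absolute_link = "http://myserver/MyOtherPage.asp"
print(urljoin(page_on_server2, absolute_link))
```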
Relative URLs ensure that each user continues to send requests to the same server once a session has been started.

While both of these forms of session-based load balancing are easy to set up, they have a few notable limitations. First, load balancing is only performed once for each client at the beginning of a session. Second, it's possible for the load balancing scheme to get a little skewed. For instance, all the users that have been sent to MyServer1 may go to lunch while all the users who have been sent to MyServer2 continue to send requests. In this case, one server could become overloaded while another server sits idle.

A more significant problem with session-based load balancing is that it exposes the IP addresses of the servers in the farm to the client-side browser. What happens when a server crashes or is taken offline? Your balancing algorithm needs to account for this as soon as possible, but doing so can be problematic. If you're passing out bad IP addresses, your users will start to receive "server not available" errors. In a round-robin DNS system, it can still take as long as 48 hours to fix the problem once you've discovered that one of your servers has crashed. This is because the changes to your IP address mappings need to be propagated to DNS servers throughout the Internet.

To make things worse, users often add pages to their Favorites list. A user might attempt to reach a page that they put in their Favorites list last week on a server that crashed this morning. If you're using the redirection technique, an attempt to locate a Favorites page can result in a "server not available" error.

The load balancing techniques I've shown so far can compromise the fault tolerance of your site. There are more sophisticated approaches to load balancing that can significantly improve your system's availability. The solution lies in exposing a single IP address to every user.

Designing a Better Web Farm
As you have seen, exposing multiple IP addresses for a single Web site can compromise both availability and load distribution. It's better to expose a single IP address that maps to several physical servers. As it turns out, this is a very difficult problem to solve because it requires low-level networking code to reroute incoming IP packets across a set of servers. Most companies decide to buy a solution rather than roll their own.
Figure 3: LocalDirector Routing
The WLBS provides a software-based solution for request-based load balancing. Don't confuse the WLBS with either the COM+ load balancing service or the Microsoft Cluster Server (known in its beta incarnation by its codename, Wolfpack). The WLBS is based on Convoy Cluster software purchased by Microsoft from Valence Research Inc. It can be used by many types of applications that rely on IP traffic, but in this column I'll focus on using the WLBS to create a Web farm. You can download the WLBS installation files from the Microsoft Web site (http://www.microsoft.com/ntserver/ntserverenterprise/exec/feature/wlbs/default.asp) and install it on any computer running Windows NT Server 4.0 with an Enterprise Edition license.
Figure 4: WLBS Routing
Unlike LocalDirector, the WLBS doesn't require a proprietary piece of hardware. Instead, the WLBS is installed as a Windows device driver on each machine in the farm, as shown in Figure 4. The WLBS can accommodate a Web farm of up to 32 servers. The WLBS has an advantage over LocalDirector in that there isn't one piece of hardware that represents a single point of failure. If you're using LocalDirector you can buy two hardware units instead of one to improve fault tolerance, but that starts to get expensive. The argument in favor of a hardware-based solution is that it's a one-time cost. With a software-based solution, you usually have to pay additional licensing fees every time you add another server to your Web farm.

When you set up a Web farm using the WLBS, each server is in constant communication with all the others. They exchange performance statistics and divide the responsibilities of handling incoming requests. You should note that every incoming request is seen by every server in the farm, and the WLBS has a proprietary algorithm to determine which server will handle each request. A full discussion of how the WLBS distributes the load between the servers is beyond the scope of this column.

The low-level plumbing details of the WLBS and LocalDirector are very different, but from a high-level perspective they produce the same results. Each user makes a request using a virtual IP address, and the request is routed to one of many servers in a Web farm. Both the WLBS and LocalDirector provide effective request-based load balancing that has many advantages over session-based solutions. Request-based load balancing is more granular because the load balancing algorithm is run far more frequently. This results in a more even distribution of requests across servers.

Request-based load balancing also provides higher levels of fault tolerance. With either LocalDirector or the WLBS, an administrator can issue a command to take a server offline.
This allows an administrator to perform maintenance or to upgrade each server in the farm without an interruption in service. Both LocalDirector and the WLBS can also detect when a server has crashed and avoid directing future requests to an unavailable IP address.

Now that I've examined the advantages of request-based load balancing, it's time to discuss one of the main disadvantages. Managing state in a Web farm with request-based load balancing becomes more complicated because you cannot assume that a user's requests will all be serviced by the same IIS computer. The main consequence is that you should not attempt to maintain state using ASP Session or Application variables. Instead, you must design a more sophisticated approach to formalize the same concept of a user session.

Maintaining State in a Web Farm
If you're designing a Web application where you don't need to maintain state across requests, you don't have as many concerns. This might be the case if your users are simply browsing for information. However, if you're creating a Web application that involves something like a shopping cart, you must carefully consider how you want to build up and store state for each user across requests.

Passing State
Whether you're maintaining lots of user-specific state data on the client or just a primary key, you need to know how to pass data back and forth between the browser and IIS. Let's look at three common techniques to accomplish this.
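The first technique is the HTTP cookie. As a rough illustration of what travels between server and browser, the sketch below uses Python's standard `http.cookies` module instead of the ASP Response and Request objects. The cookie names and values are assumptions made up for the example.

```python
from http.cookies import SimpleCookie


def build_set_cookie_headers(values):
    """Return one Set-Cookie response header per name/value pair."""
    cookie = SimpleCookie()
    for name, value in values.items():
        cookie[name] = value
    return [morsel.output() for morsel in cookie.values()]


def read_cookie_header(header_value):
    """Parse the Cookie request header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    # Server to browser: write two hypothetical values.
    for header in build_set_cookie_headers({"UserID": "TedP", "Theme": "Blue"}):
        print(header)
    # Browser to server: the same values return with the next request.
    print(read_cookie_header("UserID=TedP; Theme=Blue"))
```

Since no expiration is set, a browser treats these as session cookies and discards them when it closes.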
|
These cookie values will continue to flow back and forth with each request/response pair. Cookies are just as easy to read using the ASP Request object. The previous example creates cookie values that will only live for the duration of the browser's session. You can also persist your cookie values to a user's hard drive. If you do this, these cookie values will live across sessions of the browser. This means that your site can remember all sorts of information and maintain user-specific preferences across visits. Users seem to appreciate sites that do this. To persist a cookie value to your user's hard drive, simply add an expiration date as follows:
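As a rough illustration in Python (not the ASP syntax, where the cookie's Expires property plays this role), the sketch below adds an expiration to a cookie with the standard `http.cookies` module. The cookie name, value, and 30-day lifetime are assumptions.

```python
from http.cookies import SimpleCookie

SECONDS_PER_DAY = 24 * 60 * 60


def build_persistent_cookie(name, value, days):
    """Return a Set-Cookie header whose cookie outlives the browser session."""
    cookie = SimpleCookie()
    cookie[name] = value
    # An integer "expires" is interpreted as seconds from now and is
    # rendered as an absolute GMT date in the header.
    cookie[name]["expires"] = days * SECONDS_PER_DAY
    return cookie[name].output()


if __name__ == "__main__":
    # Hypothetical preference remembered for 30 days.
    print(build_persistent_cookie("UserID", "TedP", days=30))
```

The browser stores the cookie on disk until the date in the header passes, which is what lets a site recognize a returning user.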
|
If you have a shopping cart application, you can build up a complex data structure across successive hits. Note that you can also write multiple values to a cookie if you want to store line items. The upper limit for what you can store in a cookie varies from browser to browser, but most browsers support cookies up to 4,096 bytes.

Cookies work great until you encounter users who have disabled cookie passing in their browsers. If you have a requirement to accommodate users such as these, you will not be able to store data on the client that will live across sessions of the browser. You must also come up with another technique to pass state data back and forth.

One technique you can use is to append named values to your URLs. For instance, instead of setting your HTML form's action to /MyPage.asp, you could change it to /MyPage.asp?UserID=TedP. The extra data is known as a query string. Appending named values into a query string requires additional effort. You can dynamically embed these named values onto the tail of every URL in your pages. However, Visual Basic 6.0 and the WebClass framework can transparently perform this task for you with far less effort. Each WebClass has a URLData property. If you assign a value to this property and then use either the URLFor method or the WriteTemplate method, the WebClass framework will append a named value to the end of your URLs (such as /MyPage.asp?WCU=TedP). You can easily query the URLData property while processing a request to determine whether it was assigned a value during an earlier request.

The use of query strings and the URLData property has two limitations. First, you are slightly more limited in size (2KB) than you are with cookies. Second, you must use the POST method in your forms. Your app will not work correctly if your HTML forms are using the GET method.

There is one last client-side state management technique that you should have up your sleeve.
This technique involves the use of a hidden field in an HTML form. If a user will not accept cookies and you want to store more information than can fit in the URLData property, hidden fields may be the solution you're looking for. Note that this technique requires the use of an HTML form and a Submit button. Here's an example of what your form might look like:
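One possible shape is sketched below. A small Python helper generates the form so that the round trip can be shown end to end; the field names, the values, and the /MyPage.asp action are assumptions, and an ASP page would read the posted fields from the Request object instead.

```python
from html import escape
from urllib.parse import parse_qs


def render_form(action, hidden_values):
    """Return an HTML form that carries state in hidden fields."""
    lines = ['<form action="%s" method="post">' % escape(action, quote=True)]
    for name, value in hidden_values.items():
        lines.append('  <input type="hidden" name="%s" value="%s">'
                     % (escape(name, quote=True), escape(value, quote=True)))
    lines.append('  <input type="submit" value="Continue">')
    lines.append("</form>")
    return "\n".join(lines)


def read_posted_fields(body):
    """Decode the URL-encoded body the browser posts back."""
    return {name: values[0] for name, values in parse_qs(body).items()}


if __name__ == "__main__":
    # Hypothetical state: a user key and one cart line item.
    print(render_form("/MyPage.asp", {"UserID": "TedP", "Cart": "widget=2"}))
    print(read_posted_fields("UserID=TedP&Cart=widget%3D2"))
```

The state never touches a server between requests, so it does not matter which machine in the farm handles the next post.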
|
A Few More Words on Optimization
Both the WLBS and LocalDirector allow you to turn off request-based load balancing and revert to session-based load balancing. The WLBS has an affinity setting, which lets you instruct the service to route a user to the same physical server once the first request has gone through. Likewise, LocalDirector has a sticky command that provides the same effect. The good news is that these features allow you to use ASP Session variables to store state in the middle tier.
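To make the idea of affinity concrete, here is a hedged sketch in Python. It is not the routing algorithm of the WLBS or LocalDirector, both of which are proprietary; it only shows one simple way a router could keep sending a given client address to the same server. The host names and the hashing scheme are assumptions.

```python
import hashlib

# Hypothetical host names; a real farm would list its own servers.
FARM_SERVERS = ["MyServer1", "MyServer2", "MyServer3"]


def route_with_affinity(client_ip, servers=FARM_SERVERS):
    """Return the same server for every request from one client address."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    # Reduce the first four bytes of the hash to an index into the farm.
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    # Repeated requests from 10.0.0.7 always reach the same server.
    for ip in ["10.0.0.7", "10.0.0.8", "10.0.0.7"]:
        print(ip, "->", route_with_affinity(ip))
```

With a mapping like this, per-server session state works again, at the price of giving up the per-request distribution described earlier.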
|
I hope this month's column makes you think long and hard about scalability in your application's initial design phase. There are many designers and developers who deeply regret the dependencies their applications have on ASP Session variables. They were not prepared for fame and, consequently, their applications will probably never make it in Tinseltown. Will they ever re-architect their approach to state management and rewrite their applications for a Web farm environment? Maybe, maybe not. However, those who can make this leap will find it painful at best. The Internet, like Hollywood, is continually littered by thousands who can't handle the pressure. Are you prepared for fame? After all, it could always happen to you.
From the June 1999 issue of Microsoft Internet Developer.