Building High-Scalability Server Farms

September 1999

Microsoft Corporation

Introduction

A well-designed server farm can be expanded (scaled) to accommodate increased site traffic while maintaining site performance, in a cost-effective manner. You can use Site Server 3.0 Commerce Edition (SSCE 3.0) to design and build an affordable, highly scalable server farm. This document describes how you can scale a server farm to support 100,000 or more concurrent users while still requiring as few servers as possible.

A high-performance, highly scalable server farm uses fewer, consolidated servers, and consolidated Web servers are easier to manage than an unconsolidated server farm. In addition, you can use modern management software, such as HP OpenView or CA Unicenter TNG, to significantly reduce the complexity and cost of managing server farms.

This document discusses several techniques for achieving a highly scalable site:

Scaling hardware vertically: Increase capacity by upgrading hardware specifications while maintaining the physical footprint and number of servers in the server farm. Scaling hardware vertically simplifies site management, but at a higher hardware cost than scaling horizontally or improving software architecture. In addition, once you reach maximum capacity on existing hardware, you must begin to scale horizontally.
Scaling hardware horizontally: Increase capacity by adding servers. Scaling hardware horizontally enables you to increase hardware capacity at a lower cost, but once site management becomes sufficiently complex, you must begin scaling vertically.
Improving architecture: Improve server efficiency by identifying operations with similar workload factors and dedicating servers to each type of operation. A significant capacity improvement is possible when you dedicate servers to operations with similar workload factors, rather than to a mixed-operation workload, and optimize the performance of each. Architectural improvements should be designed early in the project life cycle. An efficient architecture enables you to build and operate a site at a lower cost.

The result of using these scaling techniques is a highly scalable server farm that you can grow well beyond its original design limitations.

Scaling Hardware Vertically

It is common to find servers with large amounts of memory that can cache nearly all of a site's content. The NTFS file cache, combined with faster disk subsystems, results in much-improved I/O throughput. In addition, n-way Symmetric Multiprocessing (SMP) hardware is readily available in the marketplace. Thus, abundant resources are available to scale SSCE 3.0 vertically.

Based on performance and capacity planning benchmarks performed by Microsoft, SSCE 3.0 has been found to be characteristically CPU-bound. In other words, its primary bottleneck is CPU capacity, consumed largely by ASP page processing, which is highly CPU-intensive. Benchmarks performed on the Volcano Coffee sample site (included with SSCE 3.0) show that a single 200 MHz Pentium Pro CPU can serve 400 concurrent users, configured as specified in the Microsoft Site Server 3.0 Commerce Server Capacity and Performance Analysis white paper.

One method of vertically scaling SSCE 3.0 is to increase processing power. You can do this by using a higher-class processor (such as the Pentium II, Pentium III, and their Xeon derivatives with large level II caches). In addition, you can run SSCE 3.0 on a Compaq/Digital Alpha class hardware platform or 64-CPU servers from Data General or Unisys, which are comparable to any Unix hardware platform. The net effect is that a single server can accommodate more user site traffic.

Another way you can vertically scale SSCE 3.0 is to run Windows NT 4.0 Server on 4-way SMP servers. However, be aware that running SSCE 3.0 on hardware with more than two processors yields diminishing returns per processor. The aggregate throughput is higher, but it comes at the cost of diminishing returns on investment. (In other words, per-processor throughput is less on 4-way SMP hardware than on 2-way SMP hardware. You achieve higher aggregate throughput at a disproportionate increase in cost.)

Based on these results, we recommend favoring one- and two-processor servers for the best price/performance, moving to larger SMP configurations or higher-class processors only when you need more capacity per server.

The following diagram illustrates scaling from a single-CPU Pentium II class server to a dual-CPU Pentium III Xeon class server, and then to a 14-CPU server with even higher class Alpha 21264 processors.

Figure 1. Scaling hardware vertically

Scaling Hardware Horizontally

Scaling hardware horizontally increases capacity by adding servers (running SSCE 3.0) to the server farm. You can run or distribute SSCE 3.0 components across multiple servers, thereby increasing capacity.

When you begin horizontally scaling your server farm, you add the complexity of having to distribute the load evenly across multiple servers. You must address this by using load-balancing techniques, such as Windows Load Balancing Services (WLBS, OS software), DNS round robin (network software), and/or Cisco LocalDirector (network/router hardware). The benefits of load balancing include providing redundancy of services and presenting higher aggregated capacity by directing the load to multiple servers.

By default, the SSCE 3.0 Site Builder wizard generates load-balancing-friendly sites by employing URL or cookie-based shopper IDs. The wizard also generates sites with session/state management located centrally on the commerce store database, separate from the ASP servers. This enables the load balancer to direct user requests to any available server at any given time without losing the user’s session or state. The advantage is that without session variables, no resources are used to manage user sessions.
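For example, because the shopper ID travels with each request and all state lives in the central Commerce Store database, any ASP server can service any hit. The following sketch illustrates the pattern; the cookie name, connection-string location, and table are illustrative assumptions, not the wizard's actual output:

    <%
    ' Recover the shopper ID from a cookie (or fall back to the URL), then
    ' load state from the central Commerce Store database -- no per-server
    ' session state is required, so any load-balanced server can respond.
    Dim shopperID, conn, rs
    shopperID = Request.Cookies("MSCSShopperID")      ' assumed cookie name
    If shopperID = "" Then shopperID = Request.QueryString("shopper_id")

    Set conn = Server.CreateObject("ADODB.Connection")
    conn.Open Application("ConnectionString")         ' assumed app setting
    Set rs = conn.Execute("SELECT sku, qty FROM Basket WHERE shopper_id = '" & _
                          Replace(shopperID, "'", "''") & "'")
    ' ... render the basket from rs ...
    rs.Close
    conn.Close
    %>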

Note   To effectively scale hardware horizontally, you must not use IIS session variables and you must disable IIS session management. The alternative is to use the Dynamic Data features of the SSCE 3.0 LDAP service.

In cases where an application is coded with IIS session management (makes use of session variables), you can use Cisco LocalDirector hardware to achieve load balancing. You can configure Cisco LocalDirector to direct traffic based on the client/source IP address. This sends a client back to the same destination server for each request, thus providing the desired load-balancing effect. Session variables are local to each server and the client will then see the correct set of variables and state.

CAUTION   There are serious issues with this technique for some proxy server farm architectures. Refer to the section on Disabling IIS Session Management and Removing Session Variables for more information.

You can horizontally scale the following components of an SSCE 3.0 server farm:

HTML/ASP servers: Add more servers to function as ASP servers. Externally, you expose the servers using a common domain name with a single virtual IP address mapping to a load-balancing system. The load-balancing system directs the traffic to multiple servers. Typically, load balancing directs a TCP connection (such as an HTTP request) to a specific server and keeps it directed to the same server until the TCP connection session ends.
LDAP servers: Add more servers to function as LDAP servers. Externally, you expose the servers using a common domain name with a single virtual IP address mapping to a load-balancing system. The ASP server Personalization & Membership instances point to the common domain name. The load-balancing system directs traffic to multiple servers. Typically, load balancing directs a TCP connection (such as an HTTP request) to a specific server and keeps it directed to the same server until the TCP connection session ends.
Membership Directory: Partition the Membership database and host each database partition on dedicated SQL servers. You must partition the Membership database before populating it with data (such as member objects). Initially, you can run the partitioned databases on a single machine and place them on dedicated SQL servers at a later time. If you populate the Membership database prior to partitioning it, you must write a custom migration tool to repopulate the partitioned Membership database with the current Membership database content.
Commerce Store: The DBStorage database stores the shopping basket, orders, and receipts. You cannot partition the DBStorage database out of the box. However, with minimal code consideration (outlined in the Improving Architecture section), you can partition and horizontally scale this database.

The following diagram illustrates scaling hardware horizontally, using multiple IIS/ASP servers, multiple LDAP servers, partitioned Membership Database SQL servers, and a Commerce Store Database SQL server.

Figure 2. Scaling hardware horizontally

Scaling hardware horizontally helps the server farm expand to higher capacity. Further scaling requires architectural improvements.

Improving Architecture

To improve the system’s architecture, you must decide how to build and deploy the application to improve the efficiency of the server farm. The basic goal is to separate workload based on load factors, to dedicate servers to each type of workload, and to optimize the performance of each. This enables the servers to execute lightweight operations with higher capacity, thus serving a higher number of concurrent users per server. You can then direct operations with heavier load factors but smaller capacity requirements to a smaller number of servers.

Dedicated servers serve heavyweight content, such as ASP pages, MTS components, and Commerce Pipeline execution, so that the entire bandwidth of each server is used efficiently, without interfering with the servicing of static HTML/GIF content requests.

IIS serves up static HTML/GIF content requests many times faster than processing an ASP page request. For example, an IIS server dedicated to serving up HTML/GIF content may be able to handle 10,000 concurrent user requests while an IIS server dedicated to serving up ASP/Commerce Pipeline may only be able to handle up to 1,000 concurrent user requests.

A recent Intel study shows that all Web-based electronic commerce sites service customer requests that fall into one of the following five categories:

Browse               80%
Search                9%
User registration     2%
Add item              5%
Buy                   4%

This study shows that users perform browsing, registration, and searching operations nine times more often than adding items to their shopping basket and checking out. Based on this information, in a population of 100,000 users, there are approximately 10,000 users adding items to their shopping basket or buying, while 90,000 users browse, search, or register.

Thus roughly 80% of the traffic (browse operations) can be serviced by servers handling static content, while the remaining 20% (search, user registration, add item, and buy operations) can be serviced by servers handling dynamic, heavier-duty content. Because the heavier operations account for fewer concurrent users, fewer servers are needed to handle them, even though each of those servers supports a smaller number of users.

There are many situations to which you can apply this scheme, such as static content (HTML/GIF), dynamic content (ASP/Commerce Pipeline), business rules (MTS components), disk I/O (cache most active files), and so forth. You can apply additional architectural improvements, such as the following, to achieve even higher performance, with better scalability:

Disabling IIS Session Management and Removing Session Variables

You must ensure that the application code disables IIS session management and does not use IIS session variables. IIS session management consumes memory for each user, and consumption grows both with the number of values the application stores in session variables and with the number of concurrent users. If there are few session variable values, the impact on performance might not be significant. On the other hand, a large number of session variable values, such as an object model, can impact performance significantly.

For example, if the session variable for each user consumes 1MB of memory, 1,000 concurrent users consume approximately 1GB of memory. Based on this example, using session variables severely limits scalability in a case where the computer has 1.5GB of available memory. Without this resource consumption, it’s possible to service a larger number of concurrent users, up to the limits of the CPU.

Another disadvantage of using session variables is that they reside only on the local server. In other words, the application requires an affinity between the client and the server on which the session started, because the session variables reside only on that one server. To accomplish this, you must ensure session stickiness. This eliminates on-the-fly redundancy (user sessions are destroyed if a server goes down or must be taken down) and dynamic load balancing (users pinned to a busy server experience a very slow site while others see no performance problems).

You can mitigate this affinity between the client and server by using Cisco LocalDirector, a load-balancing router product. This router uses the source IP address to “stick” the request to a destination server. However, the Cisco LocalDirector router cannot maintain session stickiness if the client uses a load-balanced proxy server to access the Internet (which presents a different IP address externally from each proxy server). This issue is particularly problematic for America Online (AOL) users. Since AOL represents approximately 25% of the traffic on Internet sites, site administrators usually give this issue special consideration. A common workaround is to dedicate a server specifically for AOL traffic. Although this solution can work in the short term, a single dedicated server might be overwhelmed in the future, if AOL continues to grow.

If you need to maintain state for a user session, a scalable alternative is the dynamic data feature of SSCE 3.0's LDAP service. (See Using the Membership Directory and Active User Object (AUO) for Session State Data for more information.)
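For example, you can disable session management on a per-page basis with the ASP processing directive (or for the whole application in the Internet Service Manager), and keep per-user state in the Membership Directory through the AUO. A minimal sketch, assuming the site uses Personalization & Membership with an authenticated user, and assuming a hypothetical attribute name (your directory schema defines the actual attributes):

    <%@ Language=VBScript EnableSessionState=False %>
    <%
    ' No Session object is created for this page, so no per-user memory is
    ' consumed and no server affinity is required.

    ' Keep per-user state in the Membership Directory via the Active User
    ' Object (AUO) instead of a session variable:
    Dim objAUO
    Set objAUO = Server.CreateObject("Membership.UserObjects")
    objAUO.Put "lastVisit", CStr(Now)   ' "lastVisit" is an assumed attribute
    objAUO.SetInfo                      ' write the change to the directory
    %>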

Separating Static Content from Other Types of Content

Using the information from the Intel study, the following tables compare two methods of serving 100,000 concurrent users.

Method 1: Non-consolidated server farm

Operations                               Type of content               % of users   Web servers   Users per server   Total users
Browse, search, registration, add, buy   All (static, dynamic, ASP)    100%         100           1,000              100,000
Totals                                   All                           100%         100           n/a                100,000

Method 2: Consolidated server farm

Operations                               Type of content               % of users   Web servers   Users per server   Total users
Browse                                   Static                        90%          9             10,000             90,000
Add item, buy, search, registration      Other (dynamic, ASP)          10%          10            1,000              10,000
Totals                                   All                           100%         19            n/a                100,000

Based on the information in the tables, the total number of servers drops from 100 front-end Web servers to 19 front-end Web servers, if you separate the static content from other types of content.

Internet Information Server (IIS) processes static HTML/GIF content very efficiently, but processing ASP pages requires a significant amount of CPU time, resulting in reduced performance. To most efficiently use the servers, combine operations that have similar load-factor characteristics or capacity requirements and separate those that differ. Based on current benchmarks, a single IIS server is capable of concurrently serving different types of content to the following number of users:

Static HTML/GIF content requests 10,000 users
Search ASP page requests  1,500 users
User registration ASP requests  1,500 users
Add item to basket ASP requests  1,200 users
Checkout purchase ASP requests  1,200 users

These numbers suggest dedicating three groups of servers: one to static HTML/GIF content requests, one to search and user registration ASP requests, and one to add-item and checkout ASP requests. For example, using the benchmark numbers above and the percentages from the Intel study, serving 100,000 concurrent users requires a topology like the following:

Static HTML/GIF content servers:    100,000 * 80% = 80,000 users / 10,000 per server = 8 servers
Search + user reg. ASP servers:     100,000 * 11% = 11,000 users /  1,500 per server = 8 servers (rounded up)
Add item + checkout ASP servers:    100,000 *  9% =  9,000 users /  1,200 per server = 8 servers (rounded up)
Total front-end servers required:   24 servers

(Note that the numbers in this table assume that ASP product browse pages are separated out and rendered as static HTML, thus optimizing their performance, and that server counts are rounded up to whole servers. In general, however, sites are not as simple as the one illustrated in this table.)
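If you want to experiment with other traffic mixes or benchmark figures, the computation is easy to script. A minimal VBScript sketch (run under Windows Script Host) that reproduces the table above, rounding up to whole servers:

    ' Servers needed for one class of operations, rounded up to whole servers.
    Function ServersNeeded(totalUsers, percent, usersPerServer)
        Dim users
        users = totalUsers * percent / 100
        ServersNeeded = Int(users / usersPerServer)
        If users > ServersNeeded * usersPerServer Then
            ServersNeeded = ServersNeeded + 1   ' partial server rounds up
        End If
    End Function

    WScript.Echo ServersNeeded(100000, 80, 10000)   ' static HTML/GIF:     8
    WScript.Echo ServersNeeded(100000, 11, 1500)    ' search + reg.:       8
    WScript.Echo ServersNeeded(100000, 9, 1200)     ' add item + checkout: 8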

Caching Static Content

ASP pages render many types of data to HTML that are neither highly dynamic nor truly static. Examples of such data include product attributes (information, descriptions, prices, and so forth), site announcements, and sale announcements.

You can use an offline process to render these types of information to static HTML pages and serve them as static HTML/GIF content. This provides much higher throughput and reduces overhead by avoiding ASP processing and SQL Server data fetches.
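One way to implement such a process is a script, run nightly or whenever content changes, that requests each ASP page once and writes the rendered HTML to a file that IIS then serves statically. A minimal Windows Script Host sketch; the URL, SKU, and output path are assumptions:

    ' Render a dynamic product page to a static HTML file.
    Dim http, fso, ts
    Set http = CreateObject("Microsoft.XMLHTTP")
    Set fso  = CreateObject("Scripting.FileSystemObject")

    http.Open "GET", "http://localhost/shop/product.asp?sku=1001", False
    http.Send
    Set ts = fso.CreateTextFile("D:\InetPub\wwwroot\static\product1001.htm", True)
    ts.Write http.responseText
    ts.Close

In practice, the script would loop over the product table and regenerate only the pages whose underlying data has changed.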

If your information is relatively static but some content is driven by a database look-up (such as pricing by zip code), you can combine this technique with HTML frames, placing the static product information in a separate frame from the dynamically priced content. Another solution involves an ISAPI filter that reads the HTML and performs a look-up against an in-memory database, similar to the way early database integration was accomplished using .idc and .htx files. This method avoids full ASP processing and retains high-speed HTML servicing.

Caching Static Look-up Data

If your data requires dynamic look-ups that hit a database (such as product price based on zip code or customer ID), you can use an in-memory database to cache the look-up table. This reduces the overhead of performing a data fetch across the network. You can refresh the in-memory database with a nightly process (or as necessary) to ensure that the dynamic data stays up-to-date.

In many cases, you can replace real-time inventory status, such as the number of items available, with an in-stock "flag" that is cached in memory and updated nightly (or as necessary). This reduces the overhead of data fetches against the SQL Server database. The result of caching data in memory is faster look-up times, lower latency (previously caused by data fetches across the network), and increased performance.

When using this technique, smaller look-up tables are ideal. However, you can increase hardware memory capacity to help accommodate larger tables, if needed. By analyzing the IIS and SQL server logs, you can determine which look-up tables are accessed most frequently and would benefit most from caching.

In many sites, a page contains an HTML list box/combo box (such as product categories or product compartments) rendered from a look-up table.  It is much more efficient to render these records once and cache the HTML fragment globally in the ASP application object than to fetch them from the look-up table each time they are needed. You can cache data using the Commerce Dictionary (provided with SSCE) or MSDN™ Dictionary object (provided by the MSDN developer resource).
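A sketch of this pattern, assuming a hypothetical Categories look-up table and a connection string stored in the Application object:

    <%
    If IsEmpty(Application("CategoryListHtml")) Then
        ' First request: build the <select> fragment from the look-up table.
        Dim conn, rs, html
        Set conn = Server.CreateObject("ADODB.Connection")
        conn.Open Application("ConnectionString")     ' assumed app setting
        Set rs = conn.Execute("SELECT CategoryID, Name FROM Categories")
        html = "<select name=""category"">"
        Do While Not rs.EOF
            html = html & "<option value=""" & rs("CategoryID") & """>" & _
                   Server.HTMLEncode(rs("Name")) & "</option>"
            rs.MoveNext
        Loop
        html = html & "</select>"
        rs.Close
        conn.Close

        Application.Lock                  ' serialize concurrent writers
        Application("CategoryListHtml") = html
        Application.Unlock
    End If
    Response.Write Application("CategoryListHtml")
    %>

Because the fragment is built once per application lifetime, the look-up table is hit once instead of on every page view.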

Consolidating Business Logic on Dedicated Servers

Since Site Server is CPU-bound, reducing CPU utilization improves ASP/Commerce Pipeline processing performance. You can reduce CPU utilization by identifying complex, processor-intensive business rules, implementing them as MTS components, and moving them to dedicated servers.

There is a trade-off in performance between in-process execution of components and out-of-process execution of components marshaled via DCOM. To determine the exact trade-off, you must benchmark both methods and determine which method works best for your site.

If the business rule is processor-intensive and its processing cost is greater than the cost of marshalling via DCOM, you can develop it as an MTS component. Dedicating a separate server to MTS components frees capacity on the ASP/Commerce Pipeline servers, thereby increasing the performance of ASP/Commerce Pipeline processing.

If the business rule consists of only a few lines of non-processor-intensive code, it probably is not worth having a dedicated server to execute it. In this case, either leave it as an ASP function snippet (saving object activation/invocation costs) or, if the code is complicated ASP code, code it as an ASP COM component using Microsoft® Visual C++®/ATL and activate it locally (using the ATL wizard’s “Both” threading model).

Using MSMQ or E-mail to Update Systems

You can use Microsoft Message Queue Server (MSMQ) or e-mail to update fulfillment, data warehouse, reporting, and other systems, rather than using a database transaction. MSMQ and e-mail leverage asynchronous communication to achieve a high rate of "fire and forget" operations, avoiding the latency imposed by database operations such as data fetches or extended computation.

For example, if a business unit (or an entirely different company) performs the actual order fulfillment at a different geographical location than the business unit that receives the order (drop shipping), the two locations must frequently communicate new orders and shipping status. Instead of using a database operation, such as a periodic batch database extract whose results are sent to the remote site, the business can use MSMQ services or e-mail to send notifications (such as new orders) to the remote site and accept status information from it.

The front-end servers accept the request and quickly hand off the information to MSMQ or an e-mail server, which then sends the information to the remote location. This results in a higher rate of processing and faster front-end server response time, updating the remote sites more quickly than using periodic batch database extracts.
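For example, a checkout page can hand the new order to MSMQ and return immediately, while MSMQ delivers the message to the fulfillment location asynchronously. A sketch using the MSMQ COM objects; the queue path and message contents are assumptions:

    <%
    Const MQ_SEND_ACCESS = 2
    Const MQ_DENY_NONE   = 0

    Dim qinfo, q, msg
    Set qinfo = Server.CreateObject("MSMQ.MSMQQueueInfo")
    qinfo.PathName = "fulfillsrv\neworders"         ' assumed queue path
    Set q = qinfo.Open(MQ_SEND_ACCESS, MQ_DENY_NONE)

    Set msg = Server.CreateObject("MSMQ.MSMQMessage")
    msg.Label = "New order " & orderID              ' orderID set earlier on the page
    msg.Body  = orderSummaryText                    ' e.g., a serialized order form
    msg.Send q                                      ' fire and forget
    q.Close
    %>

The Send call returns as soon as the message is queued locally; MSMQ handles store-and-forward delivery even if the remote site is temporarily unreachable.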

In general, most sites require reporting (on orders, transactions, usage, and so forth) at some level. Site administrators usually generate these reports from online logs or database records copied from the production database. You can use MSMQ to provide near-real-time copies of this data, which you can then send to a server dedicated to receiving the reporting records. Performing reporting operations in this manner eliminates the need to perform time-consuming record-copy database operations. You can process/aggregate the shipped records immediately at the remote location to provide near-real-time report updates. In addition, you can reduce storage costs, if you discard the shipped record immediately after processing/aggregating it and executing the necessary reports.

You can also record Commerce Server orders and receipts asynchronously, rather than using an in-line database transaction. Doing this enables the ASP page to avoid transaction latency and continue processing. The disadvantage is that the customer does not see an immediate order confirmation number and must wait for a confirmation e-mail, or wait until you process and record the orders and receipts in the database. Asynchronously recording orders and receipts works best at sites with periodic load peaks.

Processing Requests in Batches

You can defer credit card, tax, or other processing, or you can perform the processing in batch mode on a dedicated server.  Deferring processing enables the front-end servers to process requests at a high rate of speed and respond to requests quickly. You can send failure and exception reports to the shopper via e-mail. In many cases, legacy systems that perform batch processing operations already exist.

Optimizing ASP Code

Most people develop Internet sites in Rapid Application Development mode; that is, their site development cycles are very short but frequent. Development done this way often results in poorly designed code that isn't efficient, clear, or reusable. Any of these factors can cause less-than-optimal ASP code processing.

To help optimize your ASP code, you can create a simple utility that inserts instrumentation code into your ASP source files to record the execution time for each line of code. Then, when you execute the ASP code, it generates a source-execution profile that you can use to identify code that needs to be optimized further.
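A minimal version of such instrumentation is a pair of helper routines timed with the VBScript Timer function; the utility inserts a marker call after each block (or line) of interest, and the profile is emitted as HTML comments. The helper and marker names here are hypothetical:

    <%
    ' Hypothetical profiling helpers the instrumentation utility would inject.
    Dim profLast
    Sub ProfileStart()
        profLast = Timer          ' seconds since midnight, ~10 ms resolution
    End Sub
    Sub ProfileMark(markerName)
        Response.Write "<!-- " & markerName & ": " & _
                       FormatNumber((Timer - profLast) * 1000, 1) & " ms -->" & vbCrLf
        profLast = Timer
    End Sub

    ProfileStart
    ' ... original ASP code block 1 ...
    ProfileMark "block1"
    ' ... original ASP code block 2 ...
    ProfileMark "block2"
    %>

Viewing the page source (or scraping the comments with a script) then shows where the page spends its time.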

Many groups prototype business rules as Commerce Pipeline Scriptor components and never convert them to compiled code. For each Scriptor component, Commerce Server invokes a scripting engine, incurring activation and invocation overhead. You can significantly increase performance by converting these business rules into compiled Visual C++/ATL Commerce Pipeline components.

Optimizing Commerce Pipeline

You can enhance Commerce Pipeline performance by removing unnecessary Commerce Pipeline stages and dividing the Commerce Pipeline for separate execution where possible or necessary. This reduces the activation/invocation overhead and results in a higher rate of processing.

You can also bypass the Commerce Pipeline entirely and use custom MTS components throughout. However, in general, we don’t recommend doing this. The disadvantage of bypassing the Commerce Pipeline is that you lose access to third-party ISV pipeline components (such as components that calculate tax, shipping, or perform credit card authorization), thus possibly raising development costs significantly.

Optimizing the Database Schema

You can optimize the database schema to bypass the Commerce Store completely and write baskets, orders, and receipts directly to a custom SQL Server database. This technique involves developing custom code to replace the DBStorage functionality, bypassing the marshalling inefficiencies associated with DBStorage. However, for each additional field you require, you must modify both the database schema and the read/write code. Although this appears simple to implement, you must carefully design and benchmark the database schema to ensure that you achieve the maximum performance increase.

The disadvantage of using this technique is the loss of flexibility provided by the Commerce Dictionary Object and the need to modify the database schema each time you add fields to the shopping cart (orderform object).
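A sketch of the direct-write approach, using ADO against a hypothetical custom orders table; the table, columns, and variables are assumptions, and every orderform field you use needs a corresponding column and parameter:

    <%
    Const adVarChar = 200, adCurrency = 6, adParamInput = 1

    ' Write the order directly to a custom table instead of through DBStorage.
    Dim conn, cmd
    Set conn = Server.CreateObject("ADODB.Connection")
    conn.Open Application("ConnectionString")       ' assumed app setting
    Set cmd = Server.CreateObject("ADODB.Command")
    Set cmd.ActiveConnection = conn
    cmd.CommandText = "INSERT INTO CustomOrders (order_id, shopper_id, total) " & _
                      "VALUES (?, ?, ?)"
    cmd.Parameters.Append cmd.CreateParameter("order_id", adVarChar, adParamInput, 32, orderID)
    cmd.Parameters.Append cmd.CreateParameter("shopper_id", adVarChar, adParamInput, 32, shopperID)
    cmd.Parameters.Append cmd.CreateParameter("total", adCurrency, adParamInput, , orderTotal)
    cmd.Execute
    conn.Close
    %>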

Optimizing Catalog Build/Search Services

You can optimize Catalog Build/Search services by separating static content requests from dynamic content requests and servicing each type of request on dedicated servers. SSCE 3.0 provides the following services:

Catalog Builder: Builds index catalogs; you can run this service on a dedicated server.
Search Server: Searches the built catalogs; you can run this service on a dedicated server.

Each instance of the Catalog Builder service can propagate catalogs to up to 32 Search servers, providing an efficient way to create and search catalogs. You can further increase search capacity by dedicating additional Search servers to the site.

Optimizing SQL Server Databases

Microsoft® SQL Server™ 7.0 provides many improvements over previous versions, including self-tuning, higher performance, and online/live backups. You can scale the individual Commerce Store, Ad Server, and Product databases used by SSCE 3.0 by placing them on vertically scaled, dedicated, high-end servers.

If your site experiences extremely high traffic, you can also horizontally scale the SSCE 3.0 Commerce Store databases by partitioning them by shopper ID hash. This technique uses a hash of the shopper ID to direct Commerce Store database operations (shopping basket, orders, receipts, and so forth) to a dedicated Commerce Store database on a dedicated high-end server. For example, you can map the last four digits of the shopper ID to an individual SQL Server from a set of SQL Servers, each hosting a Commerce Store database.
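A sketch of the routing logic; the partition count, the Application-object storage of connection strings, and the treatment of the ID's trailing characters as hexadecimal are all assumptions:

    <%
    ' Map a shopper ID to one of four partitioned Commerce Store databases.
    Function StoreConnection(shopperID)
        Dim partition
        ' Assume hexadecimal shopper IDs and interpret the last four
        ' characters as a hex number. "And 3" keeps the low two bits, so the
        ' result is always a partition number 0-3 (a power-of-two partition
        ' count keeps the arithmetic simple, even for negative values).
        partition = CLng("&H" & Right(shopperID, 4)) And 3
        StoreConnection = Application("StoreDSN" & partition)  ' assumed settings
    End Function

    Dim conn
    Set conn = Server.CreateObject("ADODB.Connection")
    conn.Open StoreConnection(shopperID)   ' shopperID recovered earlier on the page
    ' ... basket, order, and receipt operations for this shopper go to
    ' this partition's Commerce Store database ...
    %>

Because the mapping is deterministic, every request for a given shopper lands on the same partition, with no cross-partition coordination required.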

The following diagram illustrates architectural improvements, including dedicated static HTML/IIS servers, dedicated Search/User Registration ASP servers, dedicated Basket/Checkout servers, multiple LDAP servers, multiple Membership SQL servers, and partitioned Commerce Store database on multiple SQL servers.

Figure 3. Architectural Improvements

Arguments Against Heavily Consolidating Web Servers

Although you can consolidate your servers to achieve a low server count, there are disadvantages to doing so. An electronic commerce Web site typically must operate at high capacity, high traffic, and high availability. Employing fewer front-end Web servers means that each server is responsible for a higher percentage of capacity, traffic, and availability, so the failure or removal of a single server has a proportionally larger impact on the site.

You can mitigate the disadvantages by using redundant servers, disks, power supplies, and so forth. However, as you add hardware, the site quickly begins to look like an unconsolidated Web farm.

Conclusion

You can build a highly scalable site with as few servers as possible by scaling vertically, scaling horizontally, and implementing architectural improvements. When setting up servers for the Internet, we recommend that you scale them horizontally. Scaling horizontally provides redundancy, predictable incremental capacity increases, and graceful degradation when a server fails.

Consolidating operations with similar workload factors to dedicated servers enables you to efficiently scale your server farm with the least number of servers.

Using the techniques described in this document, you can theoretically scale SSCE 3.0 to serve 100,000 concurrent users with as few as 88 front-end servers (based on early results obtained during a customer-site performance audit). You can also scale back-end servers as needed, adding to the total server count. However, most sites do not require that back-end servers be horizontally scaled; you can usually achieve the same results by vertically scaling the server.

In the future, you can use Windows 2000, IIS 5 with Web clusters, and other technologies (such as 1 GHz Alpha, Merced chips, Windows 2000 64-bit) to scale and increase the performance of your servers to achieve even higher capacity.

Information in this document, including URL and other Internet web site references, is subject to change without notice.  The entire risk of the use or the results of the use of this resource kit remains with the user.  This resource kit is not supported and is provided as is without warranty of any kind, either express or implied.  The example companies, organizations, products, people and events depicted herein are fictitious.  No association with any real company, organization, product, person or event is intended or should be inferred.  Complying with all applicable copyright laws is the responsibility of the user.  Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document.  Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 1999-2000 Microsoft Corporation.  All rights reserved.

Microsoft, Windows, Windows NT, MSDN, and Visual C++ are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries/regions.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.