A Site Server 3.0 Commerce Edition Scalability Case Study

August 1999

Microsoft Corporation

Executive Summary

We conducted this study to test concepts for building a highly scalable Electronic Commerce site with as few servers as possible.

A standard Site Server 3.0 Commerce Edition (SSCE) installation was used, with an application code base from the customer’s production environment. The study consisted of the following phases:

Phase 1: Baseline and Optimization Measured the original baseline code, then measured the same code after optimization.
Phase 2: Platform Upgrades Measured a series of platform upgrades (Windows NT 4.0 Service Pack 4, SSCE Service Pack 2, MDAC 2.1 SP1, ADSI 2.5, Visual Basic Scripting Edition 5.0) to determine the performance delta and benefit of each type of upgrade.
Phase 3: Architectural Optimization Implemented architectural improvements and measured the throughput of partitioned operations.

The study shows that we can improve performance and increase total site capacity through code optimization, platform upgrades, and architectural changes.

There was an overall increase of 100% in capacity, from 300 shoppers to 600 shoppers, resulting from conversion of ASP pages to HTML pages and site platform software upgrades. Upgrading the platform to a quad-CPU server resulted in a 167% increase in capacity, from 300 shoppers to 800 shoppers.

The study also shows that even more substantial performance gains can be realized by modifying the overall site architecture to serve frequently visited ASP pages as static HTML pages. We used this technique in combination with an ISAPI DLL that renders the dynamic portions of the now-static HTML pages, and then partitioned HTML and ASP requests with different workload characteristics (costs) onto different, dedicated servers. This reduces the server count because vertical scaling can be applied to each partition, in effect increasing the CPU resources available to process a greater number of requests.

In short, the study shows how 100,000 concurrent users theoretically can be supported by as few as 88 Web servers when optimization and vertical scaling techniques are applied to operations that are high consumers of CPU capacity. The improvements in code and architecture enable the site to increase capacity by 283%, from 300 shoppers per server to 1,149 shoppers per server, at a savings of approximately $3,225,000.

These results contrast dramatically with traditional practices of scaling applications vertically (using larger, faster single machines) or horizontally (using hundreds of servers executing the entire base set of application code), which would have required hundreds of machines (333 servers) to support the same workload.

In terms of cost per shopper, the cost of 88 servers is approximately $1,770,000 (Compaq Proliant 5500 dual-CPU Xeon Class: $15,000 x 43 = $645,000, plus Compaq AlphaServer 1200 dual-CPU Alpha Class: $25,000 x 45 = $1,125,000). This is a cost of $17.70 per shopper for 100,000 shoppers, in stark contrast to the cost of a traditional, horizontally scaled server farm of 333 servers, which would have cost $15,000 x 333 = $4,995,000 (or $49.95 per shopper for 100,000 shoppers).

Note that this capacity is at the very extreme end. With assumptions of a 40:1 browse-to-buy ratio, 11-minute average session length, $250 average per checkout, and with a transaction rate of 0.1 checkouts/user/11 minutes, this configuration supports 57 million shoppers/day, 1.7 million checkouts/day, and can generate revenue of $158 billion annually.

In terms of cost per transaction, the checkout transaction server (Compaq Proliant 5500 dual-CPU Xeon Class) has an observed capacity of 216,000 checkouts/day/server, or a checkout transaction cost of approximately $0.07/transaction ($15,000 / 216,000). Amortized over a year, this amounts to approximately $0.0002/transaction ($15,000 / (216,000 * 365)).

With Compaq AlphaServer 1200 dual-CPU hardware, the theoretical capacity is 345,600 checkouts/day/server, or a transaction cost of approximately $0.07/transaction ($25,000 / 345,600). Amortized over a year, this amounts to approximately $0.0002/transaction ($25,000 / (345,600 * 365)).

Note that transaction capacity in this architecture can be finely tuned by increasing the number of checkout-transaction servers. You can also add capacity for other operations by adding dedicated servers. In this way, capacity can be increased with fewer servers than when using traditional scaling techniques. The benefit of this architecture, then, is high capacity with minimal management complexity and cost.

Customer Capacity Goals

The study customer asked for 10x capacity by Christmas 1999, followed by 100x capacity for Christmas 2000, but had no figures for actual site traffic. The study showed that there were approximately 30 concurrent visitors to the customer’s site, and capacity could be estimated as shown in the following table:

Time Concurrent users Checkouts per day Server capacity required (1) Projected annual revenue
Current (1x) 30 521 0.10 $47 million
Christmas 1999 (10x) 300 5,210 1 $475 million
Christmas 2000 (100x) 3,000 52,100 10 $4.7 billion

(1) The customer currently has four servers. Although capacity increases for Christmas 2000 initially indicated a requirement for 6 additional servers, the performance optimizations resulting from this study (platform upgrades, early rendering of product browse pages, an additional CPU upgrade) reduced the server requirement to 3,000/800 = 3.75 servers. No additional servers are required to meet this goal.

We also discovered that the customer’s actual goal was only 4,000 transactions per day for Christmas 2000 (equivalent to 240 concurrent users). This translates to the following requirements for the next three years, using the same 10x, 100x growth rates:

Time Concurrent users Checkouts per day Server capacity required (2) Projected annual revenue
Christmas 2000 240 4,168 0.80 $380 million
Christmas 2001 2,400 41,684 8 $3.8 billion
Christmas 2002 24,000 416,840 65 $38 billion

(2) The customer already has four servers. On this growth schedule, the customer would need four additional servers to accommodate growth planned for Christmas 2001 (for a total of eight servers) and an additional 57 servers to accommodate growth planned for Christmas 2002 (for a total of 65 servers). However, using the performance optimizations recommended in this study, the server requirement can be reduced to 2,400/800 = 3 for Christmas 2001 (no additional servers required) and approximately 30 servers (26 additional servers) to handle projected capacity for Christmas 2002.

During this study, a Compaq Proliant 7000 quad-CPU Pentium II 400 MHz Xeon Class server provided sufficient capacity for 800 concurrent users. Our measurement stopped short of the actual maximum, due to insufficient resources available for testing. However, using this number, the required server count for Christmas 2002 becomes 24,000/800 = 30 servers. Note also that server counts are based on current Transaction Cost Analysis (TCA) costs on Pentium II-class servers. Using Alpha Class servers would further reduce the required number of servers.

If additional capacity is required in the immediate future, the partitioned operations architecture used in this study enables the customer to scale the site horizontally by operation, scale it vertically by installing Compaq Alpha Class servers, or, if necessary, further optimize operations that consume a high amount of CPU capacity. At this revenue level, high-performance servers from Compaq (8-CPU systems), Data General (64-CPU systems), or Unisys (64-CPU systems) can be brought online to further increase capacity and reduce server count.

Standardized Capacity Projections

A much more realistic look at capacity projections can be found in the Forrester Report entitled Retail’s Growth Spiral (11/98), which reports that the growth of online commerce sites averages 70% (1.7x) per year.

This figure can be compared with the customer’s projection of a 10x growth rate, which translates to revenue growing tenfold each year. This rate of growth, and perhaps more, might be achievable in the first year after launching the site, when growth is typically high as customers discover the site with the help of advertising campaigns. However, the site’s rate of growth will probably track industry averages more closely in the following years.

This study projects that, with 40 servers, the site could theoretically generate $19 billion in annual revenue by Christmas 1999 (404x current revenue levels). With 400 servers, the site could theoretically generate $190 billion in annual revenue by Christmas 2000 (4,042x current revenue levels).

With optimization, these projected revenue levels begin to look astronomical.  With 40 servers, the site could theoretically eventually generate $38 billion in annual revenue (808x current revenue levels).  With 400 servers, the site could theoretically eventually generate $507.1 billion in annual revenue (10,789x current revenue levels).

A much more realistic capacity growth projection might look as follows:

Time Concurrent users Checkouts per day Server capacity required Projected annual revenue
Current (1x) 30 521 0.10 $47 million
Christmas 1999 (10x) 300 (3) 5,210 1 $475 million
Christmas 2000 (17x) 510 8,857 1.7 $808 million

(3) This accounts for initial growth following site launch.

Reason for the Study

We have recently received requests for advice on how to scale customer sites for peak shopping periods, such as Christmas and Back-To-School. Some customers are also being pressured by our competitors, who suggest that they might face as much as ten times normal traffic during these peak periods, followed by ten times more traffic during next year’s peak periods.

For example, if a customer has four web servers handling the current user load, they would need 40 web servers to handle peak shopping periods this year, and 400 web servers to handle peak shopping periods next year.

However, these figures are probably unrealistic, since only a handful of very successful sites actually experience this type of increase in traffic volume. Nevertheless, all customers want to become successful electronic commerce sites and want to be prepared for the resulting increase in volume.

Traditionally, scaling a Web site or electronic commerce site is done by scaling the site either vertically or horizontally. Scaling vertically is done by adding processors, memory, and disk space to an existing server or by purchasing new machines with larger internal capacity. Scaling horizontally is done by adding more servers. See the white paper on Building High-Scalability Server Farms for more information on the various methods of scaling sites.

At first glance, these methods appear to provide easy solutions to the problem of increasing a site’s capacity. However, some customers object to these methods, either because they have already maximized their current hardware (vertically scaled their sites), or are not enthusiastic about expanding their server farms (scaling horizontally), due to the increased costs of managing large Web site server farms. In effect, these customers do not want to scale vertically or horizontally beyond a certain point because of the cost of the related increase in management complexity.

To address customers’ requirements for supporting their sites with a smaller group of machines, we decided to improve the architecture of their existing deployments. This approach uses the machines more efficiently by grouping requests with similar workload characteristics (similar CPU cost) and assigning each group of operations to dedicated servers, rather than having all of the servers handle all types of requests (requests with different CPU costs). We found that this approach can improve site capacity more efficiently and more cost-effectively than traditional horizontal and vertical scaling methods. See the white paper on Building High-Scalability Server Farms for more details on how to improve a site’s architecture.

Before we could redesign a site’s architecture, however, we first needed to understand the current performance of the site and where the bottlenecks were (if any). We then needed to understand how improving the architecture could improve the site’s scalability, while reducing complexity and total operating cost more than either of the other two methods.

We chose a live, representative customer site for our study. We audited the site using a measurement technique called Transaction Cost Analysis (TCA). TCA measures the cost of each shopper operation in terms of CPU cycles. (See the white paper Using Transaction Cost Analysis for Site Capacity Planning for more information on TCA.)

TCA is a more accurate measure than ASP throughput, which is not directly related to the shopper capacity of a site. For example, increasing ASP throughput by 50% does not necessarily increase shopper capacity by 50%, because a site serves a mix of request types and the gain can be overshadowed by a bottleneck in other ASP pages.

Measuring the CPU cost of a shopper operation, however, provides an accurate measure of the cost of that operation. CPU cost of a shopper operation can then be translated to shopper capacity, simply by dividing CPU capacity by the CPU cost of the shopper operation.
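
As a minimal, purely illustrative sketch of this calculation, the following VBScript function divides total CPU capacity (number of CPUs multiplied by clock speed, in Mcycles) by a per-shopper cost. The 1.5 Mcycles/shopper figure is a hypothetical value, not a measurement from this study.

' Illustrative TCA capacity calculation (hypothetical cost figure).
Function ShopperCapacity(numCPUs, mhzPerCPU, costPerShopperMcycles)
    ' Total CPU budget in Mcycles, divided by the cost of one shopper.
    ShopperCapacity = Int((numCPUs * mhzPerCPU) / costPerShopperMcycles)
End Function

' A dual-CPU 400 MHz server with an assumed cost of 1.5 Mcycles per
' concurrent shopper supports roughly 533 concurrent shoppers.
WScript.Echo ShopperCapacity(2, 400, 1.5)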

The Study

This section describes the platform on which we conducted the study, and describes what we did in each phase. The Conclusion section provides a detailed description of our results.

Hardware and Software

This section describes the hardware and software used in this study. The hardware and software match the configuration of the customer site’s data center as closely as possible.

Web Servers

The Web servers used in the study were configured as follows:

Hardware
  • 4 Compaq Proliant 5500s, Pentium II Xeon 400MHz Dual Proc, 1 GB Memory

  • 1 9GB Hard drive for base software

  • 3 6GB Hard Drives for Web Server Data (Web pages) Raid 5

  • Cisco LocalDirector for load balancing
Software
  • Microsoft® Windows NT® 4 Server with SP3

  • Microsoft® Windows NT® Option Pack (IIS)

  • Microsoft® Site Server Commerce Edition 3.0 with SP1

  • LDAP hotfix for membership

  • Microsoft® Internet Explorer 4.01 with SP1

SQL Servers

The SQL servers used in the study were configured as follows:

Hardware
  • 2 Compaq Proliant 5500s, Pentium II Xeon 400 MHz Quad Proc, 2GB Memory

  • 2 9GB Hard drives for base software

  • 4 18GB Hard drives for Product Catalog and Membership (Raid 5)
Software
  • Microsoft® Windows NT® 4 Server with SP3

  • Microsoft® SQL Server™ 6.5 with SP4

  • Microsoft® Windows NT® Option Pack with MTS

  • Microsoft® Internet Explorer 4.01 with SP1

  • Ad Server Database
Network bandwidth 100 mbps

Clients

Clients used in the study were configured as follows:

Hardware 5 Pentium II 233mhz with 128MB Memory
Software
  • Microsoft® Windows NT® 4.0 Server with SP4

  • Microsoft® Internet Explorer 4.01 with SP1

  • InetMonitor
Client Usage
  • 4 clients to drive load

  • 1 client for response time measurements

Router (Cisco LocalDirector 5200)

The router was configured with four real Web servers mapped to one virtual Web server. Sticky session was enabled, since the customer site makes use of ASP Session variables.

Cisco LocalDirector requires that client machines be configured on a different network card interface from real Web servers, to enable virtual IP addressing of the real Web servers.

Phase 1: Baseline & Optimization

Baseline

Baseline code consisted of the customer’s existing production site application code, installed and configured as it is running in the customer data center. The test site duplicated customer IIS/ASP registry entries, Microsoft® SQL Server™ 6.5 configuration parameters (including database device/sizing and database location), and Windows NT page file configuration.

Once the system was set up and configured, we performed a TCA verification, which resulted in the data shown in the following table:

# of Shoppers CPU utilization Context switches per second Average operation latency ASP Req. per sec. ASP Req. Queued # of CPUs CPU Capacity (Mcycles) Cost (Mcycles)
0 0.000% 0.000 0.000 0.000 0.000 2 400 0.000
100 14.706% 1050.279 421.024 2.659 0.000 2 400 117.648
200 31.960% 3177.189 439.151 5.391 0.000 2 400 255.680
300 57.939% 14637.905 647.937 8.105 0.141 2 400 463.512
400 90.464% 47551.680 15770.377 8.299 78.074 2 400 723.712
500 94.616% 49054.379 25561.366 9.583 158.789 2 400 756.928

Note   The 300 concurrent shoppers for this study site translate to 300 * 4 = 1,200 concurrent shoppers for a live production site, since the transaction size of the study site is four times the size of the live production site.

The following chart plots Load vs. Latency.

Optimization

The optimized site consisted of the baseline code plus changes that we made to optimize performance.

We had two strategies for optimizing the site:

  • Optimize the existing ASP code, page by page.

  • Sidestep ASP execution altogether by rendering frequently requested ASP pages as static HTML.

The second optimization effort turned out to be a much easier way to optimize the performance of an existing site than reviewing and modifying code page by page. The following table lists the changes we made as part of the first strategy.

Change Result
Use Set objSession = Session at the top of the ASP page. Avoids multiple Microsoft® Visual Basic® Scripting Edition (VBScript) name look-ups of the Session variable. You can also apply the same change to the Application, Request, and Response objects.
Assign curSubTotal = Session("subtotal") and use the local variable, rather than referencing Session("subtotal") directly each time. Reduces VBScript name look-ups.
Copy session variables to local variables if used multiple times (such as within a loop). Reduces VBScript name look-ups.
Use CreateObject instead of Server.CreateObject unless MTS transaction flow to the ASP page is necessary. Reduces activation time, since no MTS context wrapper is created for the object when the CreateObject method is used directly.
Use cached data for list/combo boxes, rather than rendering the data every time within the ASP page. You can cache rendered data in the application variable and reuse it within the ASP page. Reduces database fetches.
Use a typelib declaration for constants. For example, either of the following declares constants taken from the ADO 2.1 type library:

<!--METADATA TYPE="TypeLib"
UUID="{00000205-0000-0010-8000-00AA006D2EA4}"
NAME="ADO21"-->

or

<!--METADATA TYPE="TypeLib"
FILE="C:\Program Files\Common Files\system\ado\msado15.dll"-->

Reduces name look-ups for constants.

(The ADO type library UUID is {00000205-0000-0010-8000-00AA006D2EA4}.)

Do not create the Connection object just for use by the Command object. Use the Connection object directly, or provide the connection string to the Command object directly. Reduces object-creation time.
Avoid interspersing HTML content with many small ASP fragments. For example, the following code snippet can be optimized greatly from:

<% While Not qtyRS.EOF %>
<% If qtyRS("qty_high") >= 9999 Then %>
<% = qtyRS("qty_low") & "+" %> @ <% = FormatCurrency(qtyRS("price")) %><br>
<% Else %>
<% = qtyRS("qty_low") %> - <% = qtyRS("qty_high") %> @ <% = FormatCurrency(qtyRS("price")) %><br>
<% End If %>
<% qtyRS.MoveNext %>
<% Wend %>

To the following:

<%
  Set fldQtyHigh = qtyRS("qty_high")
  Set fldQtyLow  = qtyRS("qty_low")
  Set fldPrice   = qtyRS("price")

  Do While Not qtyRS.EOF
    If fldQtyHigh >= 9999 Then
      Response.Write fldQtyLow & "+ @ " & FormatCurrency(fldPrice) & "<br>"
    Else
      Response.Write fldQtyLow & " - " & fldQtyHigh & " @ " & FormatCurrency(fldPrice) & "<br>"
    End If
    qtyRS.MoveNext
  Loop
%>
Optimizes program execution by reducing the number of transitions between HTML and script and by caching the recordset Field objects.
Use stored procedures in place of SQL statements. Stored procedures are stored in native compiled form within SQL Server, but SQL statements must be parsed and processed by the Query processor prior to execution. Reduces SQL Server database execution and CPU utilization.
Open a database connection and submit execution as late as possible, then close the recordset/connection as early as possible. Increases efficiency and results in higher scalability, since more resources are made available for use at any given time.
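
The following ASP fragment is a minimal sketch (not taken from the customer’s code base) that combines several of the changes listed above: caching the Session object and a session value in local variables, creating objects with CreateObject rather than Server.CreateObject, opening the database connection as late as possible, calling a stored procedure rather than an inline SQL statement, and closing the recordset and connection as early as possible. The DSN, stored procedure, and column names are hypothetical.

<%
  ' Cache intrinsic objects and session values in local variables to
  ' avoid repeated VBScript name look-ups.
  Dim objSession, curSubTotal, objConn, objRS
  Set objSession = Session
  curSubTotal = objSession("subtotal")

  ' CreateObject avoids the MTS context wrapper that Server.CreateObject
  ' would add (no transaction flow is needed on this page).
  Set objConn = CreateObject("ADODB.Connection")

  ' Open the connection as late as possible (hypothetical DSN and
  ' stored procedure).
  objConn.Open "DSN=Commerce;UID=siteuser;PWD=password"
  Set objRS = objConn.Execute("sp_GetCartTotal '" & objSession("shopperID") & "'")

  If Not objRS.EOF Then curSubTotal = objRS("subtotal")

  ' Close the recordset and connection as early as possible.
  objRS.Close
  objConn.Close
  Set objRS = Nothing
  Set objConn = Nothing

  Response.Write "Subtotal: " & FormatCurrency(curSubTotal)
%>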

The following table lists the changes we made to reduce ASP CPU utilization overhead by sidestepping the execution of ASP pages.

Change Result
Render ASP pages as HTML pages and use static HTML pages instead of ASP pages. IIS serves up HTML pages extremely efficiently.

To do this, we used XBuilder, a page-rendering tool that converts ASP pages to static HTML by crawling HTTP URL links and rendering the pages it finds. XBuilder can be given a top-level URL (such as http://www.mysite.com), a directory URL (such as http://www.mysite.com/infodir), or a page URL (such as http://www.mysite.com/infodir/about.asp).

Serves a much higher concurrent number of users browsing static HTML pages.

The following is the data we obtained from running transaction cost analysis verification on the optimized site.

Shoppers CPU Util. Context Switches per second Avg. Operation Latency ASP Req. per sec. ASP Req. Queued # of CPUs CPU Capacity (Mcycles) Cost (Mcycles)
0 0.000% 0.000 0.000 0.000 0.000 2 400 0.000
100 (4) 28.060% 19260.459 509.217 3.751 0.000 2 400 224.480
200 20.130% 2174.367 264.313 4.225 0.000 2 400 161.040
300 34.016% 5823.119 194.653 6.363 0.000 2 400 272.128
400 72.053% 35961.031 4213.775 8.289 11.787 2 400 576.424
500 93.375% 56404.863 10550.026 9.067 50.538 2 400 747.000
600 93.105% 57115.512 28825.262 8.326 160.429 2 400 744.840

(4) Data for this row appears to be anomalous.

The following chart plots Load vs. Latency.

The data shows that, following optimization, site capacity improved by 100 shoppers (400 vs. 300 shoppers with the baseline site), which is a 33% improvement.

Although the data shows that CPU utilization had not peaked at a load level of 500 shoppers, ASP request throughput no longer scaled with the load. That, coupled with much higher latency and a long request queue, would realistically make such a load level unacceptable to users.

Phase 2: Platform Upgrades

After optimizing the site, we tested it on a series of upgraded platforms. This section describes the results of those upgrades on site performance.

Windows NT Service Pack 4

The first platform upgrade was Windows NT Service Pack 4. The following table shows the results of upgrading to Windows NT Service Pack 4.

Shoppers CPU Util. Context Switches per second Avg. Operation Latency ASP Req. per sec ASP Req. Queued # of CPUs CPU Cap. (Mcycles) Cost (Mcycles)
100 9.086% 857.596 340.383 2.123 0.000 2 400 72.688
200 20.352% 2207.050 307.151 4.346 0.000 2 400 162.816
300 32.203% 4857.834 206.434 6.477 0.000 2 400 257.624
400 50.048% 13150.839 625.089 8.506 0.000 2 400 400.384
500 92.995% 58136.125 12444.548 9.094 64.595 2 400 743.960
600 92.876% 57653.461 22288.374 9.293 140.155 2 400 743.008

Capacity did not significantly increase with this upgrade.

Site Server 3.0 Service Pack 2

Next, we added Site Server 3.0 Service Pack 2 to Windows NT Service Pack 4 and again measured the results.

Shoppers CPU Util. Context Switches per second Avg. Operation Latency ASP Req. per second ASP Req. Queued # of CPUs CPU Cap. (Mcycles) Cost (Mcycles)
400 42.721% 9421.509 551.356 8.526 0.175 2 400 341.768
500 64.840% 26913.301 2626.537 10.144 6.479 2 400 518.720
550 87.379% 52156.246 6228.486 10.543 29.900 2 400 699.032
600 93.406% 58294.496 17243.443 10.363 112.800 2 400 747.248

This time, capacity increased by 100 shoppers, to a maximum of 500 shoppers (the upper limit for load level before the average operational latency increased). This is a 25% improvement over the previous platform, and a 66.7% improvement over the baseline site.

MDAC 2.1 Service Pack 1, and ADSI 2.5

Next, we added MDAC 2.1 Service Pack 1 and ADSI 2.5 (required with MDAC 2.x) to the optimized site code platform and previous upgrades, and again measured the results.

Shoppers CPU Util. Context Switches per second Avg. Operation Latency ASP Req. per sec. ASP Req. Queued # of CPUs CPU Cap (Mcycles) Cost (Mcycles)
400 40.612% 2722.688 294.543 8.697 0.000 2 400 324.896
500 53.534% 4496.735 315.334 10.847 0.045 2 400 428.272
550 63.882% 7257.642 543.354 12.005 0.386 2 400 511.056
600 69.591% 9775.615 1007.789 12.721 1.776 2 400 556.728
700 98.259% 30045.004 10279.983 13.156 76.436 2 400 786.072

The following chart plots Load vs. Latency.

This time, capacity increased to 600 shoppers, an increase of 100 shoppers over the previous platform. This represents a 20% improvement over the previous platform and a 100% improvement over the baseline site.

The data also shows that at a load level of 700 shoppers, the CPU is fully utilized. At this load level, ASP Requests Queued jumped sharply, from 1.776 at the 600-shopper load level to 76.436, with operation latency at an unacceptable 10.3 seconds.

An interesting number to note is the number of Context Switches per second. Prior to reaching maximum shopper capacity, it dropped significantly from the previous platform.

Visual Basic Scripting Edition 5

Finally, we added Visual Basic Scripting Edition 5.0 (VBScript) to the previous platform and measured once again.

Shoppers CPU Util. Context Switches per second Avg. Operation Latency ASP Req. per sec. ASP Req. Queued # of CPUs CPU Cap. (Mcycles) Cost (Mcycles)
600 84.146% 22608.238 2153.399 12.461 7.644 2 400 673.168
700 98.187% 29073.221 13558.450 12.288 102.641 2 400 785.496
800 98.455% 26812.770 23504.685 12.108 186.071 2 400 787.640

VBScript did not increase capacity. At a load level of 700 shoppers, CPU utilization increased, but there was lower ASP Request throughput and significantly higher latency, compared to a load level of 600 shoppers. Thus, 600 is the maximum capacity for the platform configured in this way.

It is also interesting to note that ASP Request throughput at the same load level dropped to 12.461 ASP Requests/Second, compared with the previous platform throughput of 12.721 Requests/Second. Note also that Context Switching per second increased by approximately 131.28% over the previous configuration, suggesting a performance issue with VBScript 5 in this configuration. (This issue has since been addressed with Windows NT4 Service Pack 5.)

Quad-CPU Configuration

We previously measured performance and capacity on dual-CPU hardware. Next, we measured performance on quad-CPU hardware, with optimizations and all platform upgrades applied. The results are as follows.

Shoppers CPU Util. Context Switches per second Avg. Operation Latency ASP Req. per sec. ASP Req. Queued # of CPUs CPU Cap. (Mcycles) Cost (Mcycles)
600 34.570% 7872.792 286.806 12.922 0.000 4 400 553.120
700 49.040% 15491.617 200.265 14.176 0.000 4 400 784.640
800 57.174% 24420.549 2296.634 15.916 12.515 4 400 914.784
900 89.914% 53591.469 17019.767 16.133 158.458 4 400 1438.624

The following chart plots Load vs. Latency.

The quad-CPU platform increased capacity by 200 shoppers over the dual-CPU configuration, to a maximum shopper load level of 800 shoppers. At this load level, context switching was excessive and latency significantly higher, although CPU utilization had not reached 100%. If threading or VBScript were causing bottlenecks, this might not be the true maximum capacity. It might still be possible to reduce the IIS thread count, thereby reducing context switching and achieving more ASP Request throughput, to take capacity to 900 shoppers or beyond. However, we did not test those possibilities during this study.

Phase 3: Architectural Optimization

This section describes how we optimized the test site’s architecture.

Partitioning HTML vs. ASP

Optimizing a site’s architecture is primarily a process of separating (partitioning) operations with significantly different workload (cost) characteristics. For the baseline site, we separated static HTML product browse requests from checkout operations. To completely optimize a site’s architecture, every operation identified in a user profile analysis needs to be partitioned. This type of partitioning is a more efficient use of server capacity than a mixed-request operations scenario.

For this study, we dedicated one server to static HTML product browse pages and another server to processing checkout operations. We did this on the assumptions that HTML requests cost very little, and that a dedicated checkout server can serve as many as 1,000 concurrent users. We then measured throughput performance and validated it to see whether our original assumptions were correct.

HTML Partitioning

Since product browse operations were already rendered as static HTML pages, this partition was relatively simple. There are three types of product browse pages in the customer site: department, skuset, and product information.

Product information pages provide dynamic product pricing based on a customer’s zip code. For simplicity of the study and in the interest of development time, we generated product information pages with an assumed zip code. One way to implement dynamic product pricing on product information pages with static HTML pages is to use an ISAPI filter to interpret meta tags and retrieve product pricing, based on a zip code stored as a cookie or provided as a parameter to the product information page. In this way, product browse pages can be rendered as static HTML pages, but still provide dynamic product pricing based on zip code.
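
The ISAPI filter itself would be written in C or C++ and is not shown here. Purely for illustration, the following ASP/VBScript fragment sketches the equivalent look-up such a filter would perform: read the zip code from a cookie (falling back to an assumed default), fetch the regional price, and emit it. The cookie name, DSN, table, and column names are all hypothetical.

<%
  ' Hypothetical zip-code-based price look-up (illustration only).
  Dim strZip, objConn, objRS, curPrice
  strZip = Request.Cookies("ZipCode")
  If strZip = "" Then strZip = "98052"   ' assumed default zip code

  Set objConn = CreateObject("ADODB.Connection")
  objConn.Open "DSN=Commerce;UID=siteuser;PWD=password"
  Set objRS = objConn.Execute( _
      "SELECT price FROM regional_price WHERE sku = '" & _
      Request("SKUSetID") & "' AND zip = '" & strZip & "'")

  If Not objRS.EOF Then curPrice = objRS("price") Else curPrice = 0

  objRS.Close
  objConn.Close

  Response.Write FormatCurrency(curPrice)
%>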

The following table shows HTML Partitioning Data.

Shoppers CPU Utilization Number of CPUs CPU Capacity (Mcycles) Cost (Mcycles)
5000 2.464% 2 400 19.712
10000 4.260% 2 400 34.080

Since we were measuring HTML throughput, we didn’t collect any data for ASP performance counters (they were all 0); only CPU utilization is relevant. The evidence of load is the number of current connections on the IIS server, which showed 5,000 and 10,000 connections, respectively, for each of the verifications.

The data shows that IIS can process a very high number of static HTML requests. The study simulated, at most, 10,000 shoppers, thereby validating the assumptions behind HTML partitioning. It is clear that a dual-CPU server can absorb the load of 10,000 concurrent shoppers requesting static HTML product browse pages.

The data also suggests that many more than 10,000 shoppers can be accommodated (perhaps as many as 50,000 shoppers, with CPU utilization approaching 50%). The study stopped at 10,000 shoppers, however, because not enough machines were available to generate a larger load (five client machines were used, each generating 2,000 shopper sessions).

Current benchmarks (Microsoft’s 100 million hits per day scalability demo site) result in 1,200 hits per second using a 1-CPU Pentium Pro server without reaching capacity. Current Pentium II Xeon Class multiprocessor servers will likely attain 4,000 hits per second quite easily.

Checkout Partitioning

Our objective was to measure the maximum number of checkout transactions. We did this with a script used during the TCA measurement process, which we modified by inserting a sleep time so that it performed a checkout once every minute. Increasing the number of users increases the load level. For example, ten users generate ten checkout transactions every minute.

The following table shows Checkout Partitioning Data.

Shoppers CPU Util. Context Switches per second Avg. Operation Latency ASP Req. per sec. ASP Req. Queued # of CPUs CPU Cap. (Mcycles) Cost (Mcycles)
100 29.759% 4176.904 414.139 3.294 0.000 2 400 238.072
120 44.047% 7060.038 451.977 3.967 0.000 2 400 352.376
150 54.970% 15668.520 945.319 4.914 0.156 2 400 439.760
200 95.732% 44520.559 25882.737 4.979 84.111 2 400 765.856

The data shows that the maximum capacity is 150 shoppers. The latency and ASP Requests Queued at a load level of 200 shoppers are not realistically acceptable. This translates to a measured checkout capacity of 150 checkouts per minute (the verification script executes a checkout transaction per minute per user).

Taking the customer site user profile of approximately 0.01 checkout transactions per user per minute (0.1 checkouts per 11-minute session), the data shows that a single server can absorb 150 / 0.01 = 15,000 concurrent shoppers for this user profile.

Conclusion

The customer’s requirement for $1.7 billion in annual revenue is supported very easily with existing site capacity. Assuming an average checkout of $250, this translates to approximately 18,669 orders/day.

The existing site, as measured by the TCA process, has a capacity of 300 concurrent shoppers/server or live capacity of 1,200 concurrent shoppers for the site every 11 minutes, with a 40:1 browse-to-buy ratio, transaction rate of 0.1 checkouts/shopper/11 minutes, and an average checkout amount of $250. This means that the existing site should be able to support annual revenues of $1.9 billion.

The existing site supports revenues of $1.7 billion annually with 1,075 concurrent shoppers, well below the site’s maximum capacity of 1,200 concurrent shoppers. Note that the customer specified a requirement that is 37 times the existing volume of transactions (well beyond ten times the current volume).

The study shows that the site performs and scales well beyond the customer’s requirements.

A site supporting 100,000 concurrent shoppers is theoretically possible with as few as 88 front-end Web servers.

This study validates the concept of scaling a site architecturally.

Shopper operations with similar workload characteristics (cost) can be grouped together, and their respective ASP pages are then served by dedicated groups of servers.

Calculations based on TCA measurements show that it is theoretically possible to support 100,000 concurrent shoppers every 11 minutes (with a 40:1 browse-to-buy ratio and 0.1 checkouts/user/11 minutes), given targeted optimizations, with as few as 88 servers, as follows:

In addition to the front-end Web servers, there may be a requirement for three or more additional back-end SQL Servers (membership database servers, product database servers, and ad-server database servers).

Although this study shows that a single dedicated checkout server can support up to 15,000 shoppers, we don’t recommend using a single server for checkout transactions, because it becomes a single point of failure. Instead, we recommend using at least two checkout transaction servers, to accommodate periodic peak spikes and to ensure high availability.

The TCA results provide a clear understanding of the scalability issues facing the test site. Transactions such as checking out or adding items are expensive in terms of workload. Database fetches such as product browses aren’t as expensive, but if they are executed frequently enough, they consume a significant amount of CPU time.

Problems with the customer site as it is currently designed are as follows:

Available solutions include the following:

Tools

This section describes the tools used in this study:

Automation Scripts

The scripts provided in this section are the batch files we used to automate the testing process.

TCA.CMD

TCA.CMD is the root batch file. It takes 3 parameters:

  • The target Web server name and port (for example, www.myhost.com:80)

  • The run time per script, in seconds

  • The remote Web server host name (for example, \\mywebserver)

For example: tca www.myhost.com:80 1800 \\mywebserver.

TCA launches Performance Monitor and InetMonitor. At the end of the specified test period, TCA kills Performance Monitor and InetMonitor and goes on with the next iteration. The iterations are specified by a collection of InetMonitor parameter files named users*.txt.

@echo off
if "%1" == "" goto error
if "%2" == "" goto error
if "%3" == "" goto error

rem sleep 6 hours
sleep 21600

net use * %3\c$ /u:administrator mypassword

for %%i in (users*.txt) do (call dostart.cmd %%~ni && start iload.exe /s %1 /f %%i && sleep %2 && call dorestart.cmd %3)

attrib +r *.* /s

goto end

:error
echo don't forget to use server name:port on command line.
echo also the second parameter should be run time per script in seconds.
echo also the third parameter should be the remote web server hostname (e.g. \\csamsiss30).
:end

DOStart.CMD

@echo off
start perfmon %1.pmw
sleep 30

DORestart.CMD

@echo off
kill iload.exe
kill perfmon.exe
rcmd %1 "net stop w3svc /y"
rcmd %1 "net start w3svc"

Sleep.Exe

This is a utility from the Windows NT Resource Kit that is used to pause/sleep within a batch file.

Kill.Exe

This is also a utility from the Windows NT Resource Kit. It is used to terminate an InetMonitor load simulation run and the associated Performance Monitor log.

Remote Command Service

Remote Command Service also comes from the Windows NT Resource Kit. This tool is used to execute character/console commands on a remote server. It is used to recycle Web servers at the end of a test run.

InetMonitor

InetMonitor is a load simulation tool found in the Microsoft® BackOffice® Resource Kit. It is used to simulate load from client machines on the target Web server.

A component of InetMonitor called InetLoad (iload.exe) runs the client process that generates the load. InetLoad’s parameter file (load.inp) can be found in the directory in which InetMonitor runs. This file is a text file that can be customized to run iload.exe from the command line (as shown in the TCA.CMD batch file).

The command syntax to run iload.exe is as follows:

 iload.exe /s www.myhost.com:80 /f parameterfile.txt

InetMonitor supports script commands to execute HTTP requests, control requests, and load distribution commands. You can distribute load by specifying %NN SKIP X (skip the next X commands NN% of the time). For example, you can generate an average of 2.1 operations by executing 2 operations and adding a third operation that is skipped 90% of the time (that is, executed 10% of the time).

In order to distribute the load and obtain good data, you need to distribute users across the average session length. For example, if the session length is 10 minutes, you can distribute 600 users by specifying an InetMonitor client ramp-up delay of 10 min * 60 sec / 600 users = 1 second. InetMonitor will then pace client ramp-up by separating user start times by 1 second each, and a full load will be reached after 10 minutes. It is a good idea to run the test for at least several times the average session length. For example, if it takes 10 minutes to ramp up, allow another 10 minutes to create an average load for measurement, then an additional 10 minutes to ramp down.
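
The ramp-up arithmetic can be expressed as a small VBScript function (illustrative only; in practice, the delay is simply entered as an InetMonitor setting):

' Illustrative ramp-up calculation: spread user start-ups evenly
' across the average session length.
Function RampUpDelaySeconds(sessionLengthMinutes, totalUsers)
    RampUpDelaySeconds = (sessionLengthMinutes * 60) / totalUsers
End Function

' A 10-minute session and 600 users give a 1-second ramp-up delay.
WScript.Echo RampUpDelaySeconds(10, 600)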

Performance Monitor

Performance Monitor can automatically start a log if you specify the log name and save the workspace as a file (.pmw extension). You can provide this file to Performance Monitor as its start-up settings, which will autostart logging.

To see the ASP session load, it’s best to terminate a session at the end of a script so that the Performance Monitor ASP Session counter does not continue to climb. You can do this with the Session.Abandon ASP statement. The TCA scripts use an ASP page named quitsession.asp to do this.
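
A minimal sketch of such a page follows (the exact contents of the study’s quitsession.asp were not published; this is an assumed equivalent):

<%
  ' quitsession.asp (assumed equivalent): requested at the end of each
  ' test script so the Performance Monitor ASP Session counter does not
  ' continue to climb.
  Session.Abandon
%>
<html><body>Session ended.</body></html>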

XBuilder

One of the best ways to optimize Site Server performance is to avoid the CPU overhead required to execute ASP pages and database operations, if data is relatively static. You can do this by changing the ASP pages to HTML pages. IIS processes HTML pages very efficiently, thus allowing the site to serve a much higher concurrent number of users.

XBuilder is a tool that you can use to render static HTML pages from dynamic ASP pages. It crawls a site’s HTTP URL links and renders the pages it finds as static HTML.

XBuilder can work with any of the following types of URLs:

  • A top-level URL (such as http://www.mysite.com)

  • A directory URL (such as http://www.mysite.com/infodir)

  • A page URL (such as http://www.mysite.com/infodir/about.asp)

From this information, XBuilder renders the Web site tree and automatically transforms dynamic links to static links.

One very useful feature of XBuilder is that you can embed a header tag within the ASP page and XBuilder will name the rendered page with the text provided in that header. For example, the following code generates a page named product1234.htm (if the SKUSetID has a value of 1234):

<% Response.AddHeader "XBuilder-FileName","product" & Request("SKUSetID") & ".htm"%> 

You can scope XBuilder to include only a narrow set of pages, or allow it to crawl and render all pages except excluded directories or pages. When you scope XBuilder this way, it does not transform links to pages outside its crawl path into static links. In this way, links to dynamic ASP pages (such as checkout and view cart) remain active.

We used the file naming and scoping features of XBuilder to render only product pages, thus enabling the static rendering of product pages, while maintaining all other links to dynamic pages (view cart, checkout, search, and so on).

Generating product pages requires an intermediate root page to serve as the starting point for XBuilder. The root page simply generates URL links pointing to a child page that renders the dynamic content, using the passed product ID (SKU) as the parameter. The root page shows 1,000 links at a time, with the last link pointing back to itself with a parameter indicating where to start next. XBuilder then follows the links on the root page, generates the product pages, and follows the last link back to the root page, from which it continues with the next set of 1,000 links until all of the product links have been exhausted.
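
A minimal sketch of such a root page follows. The file, DSN, table, and column names are hypothetical, and numeric SKUs are assumed; the study itself used modified dir.asp/row.asp pages, as described below.

<%
  ' Hypothetical root page for XBuilder: emit links to the product page
  ' for the next 1,000 SKUs, then a link back to this page so the crawl
  ' continues with the following batch.
  Dim lngStart, lngCount, objConn, objRS
  lngStart = CLng("0" & Request("start"))   ' 0 on the first pass

  Set objConn = CreateObject("ADODB.Connection")
  objConn.Open "DSN=Commerce;UID=siteuser;PWD=password"
  Set objRS = objConn.Execute( _
      "SELECT sku FROM product WHERE sku > " & lngStart & " ORDER BY sku")

  lngCount = 0
  Do While Not objRS.EOF And lngCount < 1000
      Response.Write "<a href=""product.asp?SKUSetID=" & objRS("sku") & _
          """>" & objRS("sku") & "</a><br>"
      lngStart = objRS("sku")
      lngCount = lngCount + 1
      objRS.MoveNext
  Loop

  ' If a full batch was emitted, point back to this page so XBuilder
  ' follows the link and renders the next 1,000 product pages.
  If lngCount = 1000 Then
      Response.Write "<a href=""dir.asp?start=" & lngStart & """>Next</a>"
  End If

  objRS.Close
  objConn.Close
%>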

An easy way to generate the root and child pages for XBuilder is to take the dir.asp and row.asp pages generated by the Site Server Search Database Catalog Wizard. These pages have URL links ready to crawl. We modified the dir.asp page to point to the real product browse ASP pages while passing the product ID (SKU) as the parameter. We then copied dir.asp and row.asp (the modified product.asp page) to the site directory and pointed XBuilder to crawl, starting at http://www.mysite.com/st/dir.asp, with scoping set to include only the http://www.mysite.com/st/dir.asp and http://www.mysite.com/st/product.asp pages.

TCA Measurement Spreadsheet

The following spreadsheet shows test results for each shopper operation.

 

Sample Test Scripts

There were two types of InetMonitor scripts used in this study:

Transaction Cost Analysis (TCA) Scripts

We used TCA scripts to exercise load for measuring costs. See Using Transaction Cost Analysis for Site Capacity Planning for an explanation of TCA and a complete set of sample TCA scripts.

Verification Scripts

We used verification scripts to exercise load for TCA verification. See Using Transaction Cost Analysis for Site Capacity Planning for an explanation of TCA and a complete set of sample verification scripts.

Team

The Microsoft team conducting the case study consisted of the following people:

Ron Bokleman (Microsoft Consulting Services)

Philip Carmichael (IIS Product Group)

Michael Glass (Commerce Product Group)

David Guimbellot (Commerce Product Group)

Ken Mallit (Commerce Product Group)

Doug Martin (Commerce Product Group)

Michael Nappi (Commerce Product Group)

Scott Pierce (Commerce Product Group)

Caesar M. Samsi (Microsoft Consulting Services)

© 1999-2000 Microsoft Corporation. All rights reserved.

Information in this document, including URL and other Internet web site references, is subject to change without notice. The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Microsoft, BackOffice, MS-DOS, Outlook, PivotTable, PowerPoint, Microsoft Press, Visual Basic, Windows, Windows NT, and the Office logo are either registered trademarks or trademarks of Microsoft in the United States and/or other countries/regions.

Macintosh is a registered trademark of Apple Computer, Inc.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.