Capacity Planning

Finding Potential Bottlenecks

Find out what is likely to break first. Unless your site is extremely small, you’ll need a test lab to determine that. (There are suggestions for building and using such a lab in the following list.)

To determine potential bottlenecks

Draw a block diagram showing all paths into the site. Include, for example, links to FTP download sites as well as other URLs.
Determine what machine hosts each functional component (database, mail, FTP, and so on).
Draw a network model of the site and the connections to its environment. Define the topology throughout. Identify slow links.
For each page, create a user profile that answers the following questions:
- How long does the user stay on the page?
- What data gets passed to (or by) the page?
- How much database activity (or other activity) does the page generate?
- What objects live on each page? How system-friendly are they? (That is, how heavily do they load the system’s resources? If they fail, do they do so without crashing other objects or applications?)
- What is the threading model of each object? (The “agile” model, in which objects are specified as both-threaded, is typically preferable, and is the only appropriate choice for application scope.)
Define which objects are server-side and which are client-side.
Build a lab. You’ll need at least two computers, because if you run all the pieces of WCAT (the Web Capacity Analysis Tool) on one computer, your results will be skewed by WCAT’s own use of system resources. Monitor the performance counters at 1-second intervals. When the ASP service fails, it does so abruptly, and an interval of 10 or 15 seconds is likely to be too long: you’ll miss the crucial action. Relevant counters include CPU utilization, Pool Nonpaged Bytes, and connections/sec. (For more information about counters, see Monitoring and Tuning Your Server in this book.)
Throw traffic at each object, or at a simple page that calls the object, until the object or the server fails. Look for:
- Memory leaks (a steady increase in Pool Nonpaged Bytes and Pool Paged Bytes that does not level off while the load stays constant).
- Stop errors and Dr. Watson errors.
- Inetinfo failures and failures recorded in the Windows® Event Log.
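A steady trend in a counter is easier to spot numerically than by eye. The following is a minimal sketch, assuming you have exported one counter's samples (for example, Pool Nonpaged Bytes at 1-second intervals) as a plain list of numbers. The function names and the 10 percent threshold are illustrative assumptions, not part of any monitoring tool.

```python
from statistics import mean


def fitted_change(samples):
    """Total change over the run along a least-squares line through the samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den  # counter units per sample interval
    return slope * (n - 1)


def has_steady_trend(samples, min_fraction=0.10):
    """True if the fitted change exceeds min_fraction of the first sample.

    min_fraction is an assumed threshold; tune it to your own baseline.
    """
    return abs(fitted_change(samples)) > min_fraction * abs(samples[0])
```

Fitting a line rather than comparing the first and last samples keeps a single spike from being mistaken for a trend. A flagged counter is only a prompt to look closer; confirm against the raw data before concluding that an object leaks.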
Increase the load until you observe a failure. Document both the failure itself and the maximum number of connections per second you achieve before the system fails.
Go back to the logical block diagram, and under each block fill in the amount of time and resources each object uses. This tells you which object is most likely to limit your site, presuming you know how heavily each one will actually be used by clients. Change the limiting object to make it more efficient if you can, unless it is seldom used and well off the main path.
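One way to compare the blocks is to divide each object's expected demand by the highest rate it sustained in the lab. The sketch below assumes you have both numbers for every object. The class name, field names, and the idea of ranking by this ratio are illustrative assumptions rather than a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class ObjectLoad:
    name: str
    max_calls_per_sec: float       # highest rate sustained in the lab before failure
    expected_calls_per_sec: float  # estimated demand from real clients

    @property
    def utilization(self) -> float:
        """Fraction of measured capacity that expected demand would consume."""
        return self.expected_calls_per_sec / self.max_calls_per_sec


def rank_by_utilization(objects):
    """Return objects ordered from most to least likely to limit the site."""
    return sorted(objects, key=lambda o: o.utilization, reverse=True)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    measured = [
        ObjectLoad("catalog_lookup", 200, 50),
        ObjectLoad("order_submit", 40, 30),
        ObjectLoad("static_banner", 500, 100),
    ]
    for obj in rank_by_utilization(measured):
        print(f"{obj.name}: {obj.utilization:.0%} of measured capacity")
```

In this made-up data, the object with the lowest raw capacity is also the one closest to its limit, but that need not be the case: a slow object that is rarely called can rank below a fast one on the main path.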
Next, traceroute among all points on the network. Clearly, you can’t traceroute the entire Internet, but you can certainly examine a reasonable number of paths between your clients and your server(s). If you are operating only on an intranet, traceroute from your desk to the server. This gives you a ballpark estimate of the routing latencies, which add to the resolution time of each page. Based on this information, you can set your ASP queue and Open Database Connectivity (ODBC) timeouts. (For more information about ASP queuing, see Monitoring and Tuning Your Server in this book.)
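The arithmetic for turning latency measurements into a timeout can be sketched as follows. This assumes you have collected round-trip times for a sample of paths and measured the server-side processing time of your slowest page. The function name, the use of the worst observed path, and the safety factor of 2 are assumptions made for illustration; choose values that suit your own site.

```python
import math


def suggest_timeout_seconds(path_rtts_ms, server_time_ms, safety_factor=2.0):
    """Ballpark timeout: (worst round trip + server time) * margin, rounded up."""
    if not path_rtts_ms:
        raise ValueError("need at least one measured path")
    worst_rtt_ms = max(path_rtts_ms)
    budget_ms = (worst_rtt_ms + server_time_ms) * safety_factor
    return math.ceil(budget_ms / 1000)
```

For example, with measured round trips of 40, 120, and 310 ms and a slowest page that takes 900 ms on the server, `suggest_timeout_seconds([40, 120, 310], 900)` returns 3. Treat the result as a starting point for the timeout settings, not a final value.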

Note If the first seven steps appear to bear some resemblance to an inverted version of the Open Systems Interconnection (OSI) “layer cake” model, there’s a reason. The OSI model is a highly useful lens through which to examine server behavior.