FAQ
Paul Enfield

Developing a Scalable Two-tier Site with IIS
Demanding release schedules and restrictive budgets often require concessions in Web site development. Such concessions often involve the development of a two-tier solution instead of the n-tier ideal. With scalability as a common business requirement, it is important to understand how to design a scalable two-tier solution. Let's examine how a scalable two-tier site, http://shop.microsoft.com (also known as Shop), was designed and how it was successfully released.
      Let me first define exactly what I mean by two-tier. The two tiers in shop.microsoft.com consist of the Microsoft® Internet Information Server (IIS) and ASP front-end code, and the Microsoft SQL Server™ back-end database. Shop utilizes a farm of front-end servers running IIS and a single high-performance SQL Server-based server.

Using Technology as Designed

      Without a business logic layer in the site design, you must decide where to place the logic necessary to build the pages. It is easy to fall into the trap of performing most operations in the IIS front-end code. When determining where code should be run, follow this simple guideline: use technologies for their designed purposes.
      SQL Server and database back ends in general were designed for information retrieval, sorting, and manipulation. This means data sets provided by your back end should be in a useful format requiring minimal manipulation on the front end. Also be sure to fully exploit your tools. With a database, this means you should build your data retrieval and manipulation logic into stored procedures when possible.
      IIS and ASP are the glue that holds together all the components necessary to build the site. In a two-tier system this includes the ActiveX® Data Objects (ADO), the ASP intrinsic objects, and whatever other COM objects you have at your disposal. ASP takes these components and produces the HTML code to display the page. Operations outside the scope of display logic should be looked at more carefully. This might involve decisions on what information to display, or when to display it. Analyze this logic and determine whether it is better handled by the database back end or by the ASP code.

Don't Monopolize Resources

      The key to a scalable two-tier site is understanding that resources will be in demand by many browser clients at the same time. This means that finite resources such as database connections must be used concisely to promote their availability. This lets your site take advantage of the performance enhancements granted by ODBC connection pooling or OLE DB session pooling. The specific methods your site uses depend on which data access layer you use: ODBC or OLE DB. For more details on pooling see this month's Inside Knowledge column and http://msdn.microsoft.com/library/techart/pooling2.htm.
      The easiest way to do this is to obtain the data you need through ADO in a concise manner. Shop 1.0 uses the Recordset GetRows method to transport data from the stored procedure wrapper function to the display code as arrays. Alternatively, you can use ADO disconnected recordsets. With either method you can close the connection immediately after retrieval. Using connections in this concise manner allows Microsoft Transaction Server (MTS) to manage your resources more efficiently. Figure 1 is an example of how Shop 1.0 uses arrays (minus error handling).
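      A minimal sketch of such a wrapper might look like the following. This is not the actual Shop 1.0 code: the stored procedure name sp_GetInfo, its parameter, and the Application("strConn") connection string are hypothetical stand-ins.

 <%
 ' A sketch only -- not the actual Shop 1.0 code. The stored procedure
 ' name, its parameter, and Application("strConn") are hypothetical.
 ' Error handling is omitted, as in Figure 1.
 Const adCmdStoredProc = 4
 Const adInteger = 3
 Const adParamInput = 1

 Function GetInfo(intID)
     Dim cnn, cmd, rs

     Set cnn = Server.CreateObject("ADODB.Connection")
     cnn.Open Application("strConn")

     ' Define the command and its parameters explicitly.
     Set cmd = Server.CreateObject("ADODB.Command")
     Set cmd.ActiveConnection = cnn
     cmd.CommandText = "sp_GetInfo"
     cmd.CommandType = adCmdStoredProc
     cmd.Parameters.Append cmd.CreateParameter("@ID", adInteger, adParamInput, , intID)

     ' Copy the results into a two-dimensional array, then release
     ' the connection immediately so it can return to the pool.
     Set rs = cmd.Execute
     GetInfo = rs.GetRows()

     rs.Close
     cnn.Close
     Set rs = Nothing
     Set cmd = Nothing
     Set cnn = Nothing
 End Function
 %>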
      Referencing the data in the array can get confusing when you use only numeric indexes. Defining constants with friendly names makes debugging easier.



 <%
 ' Friendly-named constants for the columns returned by GetInfo
 Const INFO_ID   = 0
 Const INFO_NAME = 1
 Const INFO_DATA = 2

 Dim aInfo, intLoop

 ' GetRows returns a two-dimensional array: (column, row)
 aInfo = GetInfo(1)
 For intLoop = 0 To UBound(aInfo, 2)
     Response.Write aInfo(INFO_NAME, intLoop) & ": " & aInfo(INFO_DATA, intLoop)
 Next
 %>
      The Recordset.GetRows method has a limitation in its handling of TEXT fields in the recordset. If there is more than one TEXT field in your recordset, a different approach must be used: you must iterate through all the records in the recordset, moving them into the array. In Figure 2, an initial size is set for the array that holds the recordset, and the array is grown as necessary. This method was chosen because of the special handling necessary for TEXT columns, and for consistency and maintainability in the 1.0 version. Since then, further performance analysis has shown that the redimensioning and looping used in this function are not optimal. In hindsight, a preferable method might have been to use a disconnected recordset. Better yet would be to take advantage of SQL Server 7.0's longer character fields and avoid TEXT columns wherever possible. If TEXT columns are required, you might rewrite this function using disconnected recordsets (see Figure 3).
      There are a few advantages to using disconnected recordsets. Like passing arrays as the return value from the function, disconnected recordsets accomplish the goal of concise connection use. In testing we found that disconnected recordsets performed virtually identically to arrays. With the added benefit of referencing column data explicitly by column name, disconnected recordsets lend themselves to more manageable code. There's no need to define constants representing column indexes.
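      A sketch of the same hypothetical wrapper rewritten to return a disconnected recordset follows; the column names used by display code (for example, rs("Name")) would depend on what the stored procedure actually returns.

 <%
 ' A sketch of the same hypothetical wrapper returning a disconnected
 ' recordset instead of an array. Error handling is again omitted.
 Const adCmdStoredProc = 4
 Const adInteger = 3
 Const adParamInput = 1
 Const adUseClient = 3
 Const adOpenStatic = 3
 Const adLockBatchOptimistic = 4

 Function GetInfoRS(intID)
     Dim cnn, cmd, rs

     Set cnn = Server.CreateObject("ADODB.Connection")
     cnn.Open Application("strConn")

     Set cmd = Server.CreateObject("ADODB.Command")
     Set cmd.ActiveConnection = cnn
     cmd.CommandText = "sp_GetInfo"
     cmd.CommandType = adCmdStoredProc
     cmd.Parameters.Append cmd.CreateParameter("@ID", adInteger, adParamInput, , intID)

     ' A client-side cursor is required for a disconnected recordset.
     Set rs = Server.CreateObject("ADODB.Recordset")
     rs.CursorLocation = adUseClient
     rs.Open cmd, , adOpenStatic, adLockBatchOptimistic

     ' Detach the recordset, then close the connection right away.
     Set rs.ActiveConnection = Nothing
     cnn.Close
     Set cnn = Nothing
     Set cmd = Nothing

     Set GetInfoRS = rs
 End Function
 %>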
      These functions demonstrate how to dispose of connections on a timely basis. The added benefit of this approach is that it helps work around potential problems with connection reuse. Opening a connection at the beginning of a page and using it for the entire page is not only an inefficient use of resources, it can also lead to those connection reuse problems. Also notice that parameters are explicitly defined using the CreateParameter method of the Command object. This and other performance tips can be found at http://www.microsoft.com/data/impperf.htm.

Use Available Data Structures

      The data structures available to you have their own intrinsic benefits and should be used appropriately. Two good examples are arrays and the Dictionary object. Arrays are ideal when information needs to be stored in a group and accessed sequentially or directly. They are not ideal for storing information that must be searched. Dictionaries, however, are well suited to that task. If you store information in a dictionary so that the information you want to find has an appropriate key, you can reference it randomly by key name alone.
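      A minimal sketch of keyed lookup with the Scripting.Dictionary object (the part numbers and names are hypothetical; at page scope the threading caveats discussed later don't apply):

 <%
 ' Keyed lookup with Scripting.Dictionary (sample data is hypothetical).
 Dim dctParts
 Set dctParts = Server.CreateObject("Scripting.Dictionary")

 dctParts.Add "A1001", "10/100 Network Adapter"
 dctParts.Add "A1002", "56K Modem"

 ' Direct access by key -- no loop through an array required.
 If dctParts.Exists("A1002") Then
     Response.Write dctParts("A1002")
 End If
 %>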
      If you are using data obtained from the database back end to build lists or display tables, arrays are ideal. Arrays are also good in any situation where you can calculate the exact index you need. For example, if you hold four tip-of-the-day quotes for each day of the week, you can calculate the index of a given day's first tip by subtracting one from the day of the week (for a zero-based array) and multiplying by four, as sketched below. For the Shop homepage, exactly four products for each category are stored in arrays for easy access to the data and to reduce the number of hits to the database (see Figure 4).

Figure 4: The shop.microsoft.com Homepage
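      A minimal sketch of that index arithmetic, with hypothetical contents:

 <%
 ' A sketch of the index arithmetic: seven days times four tips,
 ' stored in a single zero-based array (contents are hypothetical).
 Dim aTips(27), intBase, intTip

 ' Index of the current day's first tip.
 intBase = (Weekday(Date) - 1) * 4

 For intTip = 0 To 3
     Response.Write aTips(intBase + intTip) & "<BR>"
 Next
 %>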

      If you need to locate a value in an array, consider using the Dictionary object instead. Looping through an array simply to locate an item is time-consuming and can greatly hinder scalability, especially when done repeatedly. In this case, consider storing the dictionary at application scope for later use, thus avoiding the cost of repopulating it.
      The VBScript Scripting.Dictionary object is not a truly both-threaded object capable of being stored at application scope. Used this way, it can lead to thread blocking problems. To avoid these issues, shop.microsoft.com uses the Dictionary objects provided as part of Microsoft Commerce Server 3.0. The IIS development team has also released a LookupTable object that can be used as a read-only dictionary. For details, see http://msdn.microsoft.com/workshop/management/planning/MSDNchronicles2.asp.
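      As a sketch of the pattern only: assuming the Commerce Server Dictionary component described above (the Commerce.Dictionary ProgID is an assumption here), the dictionary might be populated once at startup in global.asa.

 <%
 ' A sketch of populating an application-scope dictionary once at
 ' startup; this code would live in Application_OnStart in global.asa.
 ' The Commerce.Dictionary ProgID is an assumption for the Commerce
 ' Server 3.0 component described above. A plain Scripting.Dictionary
 ' should not be stored at application scope.
 Dim dctSKU
 Set dctSKU = Server.CreateObject("Commerce.Dictionary")
 dctSKU("A1001") = "10/100 Network Adapter"   ' sample data
 dctSKU("A1002") = "56K Modem"
 Set Application("dctSKU") = dctSKU
 %>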

Avoid Costly Resources

      Accessing data from the database back end is frequently one of the most costly operations on a Web site. This expensive operation can sometimes be avoided by using other data structures local to the machine running IIS. Shop.microsoft.com was designed around application-scope dictionaries and arrays so that no database connection is needed until a user has navigated several pages into the site.
      A site driven independently of the database back end is a nice idea, but it's frequently not realistic. Careful analysis of a site will help you determine areas that require database hits that might be circumvented by the use of a data structure such as a dictionary or application-level array. For example, you might have a site that allows users to look up parts using a particular SKU number. This information is relatively lightweight and can be effectively stored at application scope in a dictionary object for use by all clients. Because the part names are not likely to change often, introducing this independence from the database is not very risky as long as the dictionaries are updated incrementally. Another alternative is to use include files instead of live data from the back end. You can build an HTML table on a scheduled basis and save this to a file that is included in your ASP files.
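      A minimal sketch of the include-file approach, using the hypothetical GetInfoRS wrapper sketched earlier; the file name and column names are also hypothetical. A page like this would be requested on a schedule to refresh the file:

 <%
 ' A sketch of the include-file approach: a page run on a schedule
 ' builds an HTML table from the database and writes it to a file.
 ' GetInfoRS is the hypothetical wrapper sketched earlier; the file
 ' name and column names are also hypothetical.
 Dim fso, fil, rs, strHTML

 Set rs = GetInfoRS(1)

 strHTML = "<TABLE>"
 Do While Not rs.EOF
     strHTML = strHTML & "<TR><TD>" & rs("Name") & "</TD><TD>" & _
         rs("Data") & "</TD></TR>"
     rs.MoveNext
 Loop
 strHTML = strHTML & "</TABLE>"

 Set fso = Server.CreateObject("Scripting.FileSystemObject")
 Set fil = fso.CreateTextFile(Server.MapPath("parts_table.inc"), True)
 fil.Write strHTML
 fil.Close
 %>

      Other pages then pull the table in with <!-- #include file="parts_table.inc" -->.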

Avoid Costly Page Construction Code

      Some coding conventions and structures are prone to eating a lot of processor time. A common culprit is the large looping structure, such as a loop that constructs a very large table or HTML <SELECT> box. These structures are even more costly when nested. Pay close attention to looping structures with more than 25 iterations and evaluate possible alternative approaches.
      In cases where the results of these large loops might be reused on subsequent pages, consider storing the resulting HTML in an application variable. This is a workable solution in situations where the data used to build the listboxes is not volatile. If you are building extremely large listboxes or tables, consider reviewing your design to see if an alternate approach might lend itself to smaller display structures.
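      A minimal sketch of this caching pattern, with hypothetical names and sample data:

 <%
 ' A sketch of caching expensive HTML in an application variable so
 ' the loop runs only when the cache is empty (names and data are
 ' hypothetical).
 Dim strList, aCountries, intLoop

 strList = Application("htmlCountryList")
 If Len(strList) = 0 Then
     aCountries = Array("Australia", "Canada", "United States")

     strList = "<SELECT NAME=""Country"">"
     For intLoop = 0 To UBound(aCountries)
         strList = strList & "<OPTION>" & aCountries(intLoop) & "</OPTION>"
     Next
     strList = strList & "</SELECT>"

     Application.Lock
     Application("htmlCountryList") = strList
     Application.Unlock
 End If

 Response.Write strList
 %>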

Analyze Performance Incrementally

      Probably one of the biggest mistakes you can make in building a scalable site is to delay performance testing until the end of the development cycle. Performance testing should be carried out throughout the development cycle, especially when considering an approach or technology that does not have a proven history. Make time to build test pages containing minimal amounts of code to compare algorithms.
      Two viewpoints should be considered when performance testing. Tools such as Microsoft's WebCAT (http://www.asptoday.com/articles/19990630.htm) test page throughput, while RadView's WebLoad measures performance from the client's viewpoint, such as how long it takes for a complete page to load.
      Page load times are largely dependent on the bandwidth available to the client and the total size of the content being delivered. Graphics and page size contribute to content size and can be controlled to some degree by the Web developer. Usually, however, page content is determined by those doing the page layout and design, and is largely out of an ASP developer's control. Therefore, client load times are not as important to an ASP developer as server-side page throughput.
      Server-side page throughput is a measure of how quickly IIS can process a page. The higher the throughput, the more clients a single box can service. This performance is greatly influenced by ASP page design: your choice of code, how you use it, and when you use it determine the performance of the page. Use a load tool such as WebCAT or Homer (http://homer.rte.microsoft.com) to find potential bottlenecks on a page-by-page basis. Establish a performance baseline and tune your pages to meet it where possible.

Conclusion

      Two-tier sites are a reality of the current Web development world. With proper foresight and design, however, they can be built to perform almost as well as a full n-tier design. The design principles I've covered here led to the successful release of a scalable two-tier site, shop.microsoft.com.

Do you have questions for FAQ? Send mail to faq@microsoft.com.

From the October 1999 issue of Microsoft Internet Developer.