Nancy Winnick Cluts
Developer Technology Engineer
Microsoft Corporation
April 30, 1998
Contents
Introduction
What Server Caching Is and Why to Use It
What to Cache and How To Cache
Output Caching
Input Caching
Summary
Editor's Note: This article is one of three on technologies used by the Internet Start site. You'll also want to read Getting Content from the Web to the Client and The Internet Start Site: Now It's Personal.
This article delves into the subject of server caching, one of the technologies used by the Internet Start site. This is the third in a series of articles about things that you can do to add personalization to your Web site and to enhance performance. In this article, you will learn what server caching is, why you should use it, and when and where to use it. You will also see the benefits of server caching in action. By the time you finish this article, you will see how easy it is to cache data on the server and, given the improved performance, why you cannot afford to overlook the advantages of server caching.
When we use the word "cache," we are talking about data that is saved in memory. When we talk about server caching, we are talking about saving specific data in memory on the server. This is different from data that is stored in a database on your server's hard disk, because the machine must still use its processing power to retrieve or sort information from that database. Once you have retrieved the data and saved it, you can access it again much faster than if you had to rerun your query.
Now that you know why to cache data on the server, it's time to look at what data should be cached. Just because you can cache data doesn't mean you should. If you have data that is constantly changing, caching the data isn't going to buy you anything. In fact, you would just be wasting memory. However, if you have data that you will access over and over and that does not change often (in this case, often means every few seconds or minutes), this data is a good candidate for caching.
There are two basic methods of data caching: output caching and input caching. When deciding what data to cache, ask yourself, "Is the data dynamic or static?" Static data can be hard-coded in your script while dynamic data can be retrieved via ASP and cached. But you might also have what can be referred to as semi-dynamic data. Semi-dynamic data is data that you don't want to hard-code into your script (that is, it may be a large amount of data) but you want to leverage the data you already have. For example, you may store all of your data in a database. The data may be fairly static (that is, it doesn't change much); however, you don't want to hard-code it into every HTML file. Remember that you have to maintain all of those files, so if you make a change in one, you must propagate those changes to every file. Instead, you may want to cache the data that you will ultimately present to the user.
Output caching is the term for saving data that is in its final presentation state (that is, saving the HTML tags along with the data). This type of caching is otherwise known as caching transformed output. With transformed output, you take the input and transform it into what you are going to present, then cache the transformed data -- you save the data as you would display it. This type of caching gives you the most bang for the performance buck because you only retrieve the data and format it once. It saves work (translated into time) on every later request, because nothing has to be reprocessed. Output caching works especially well for a Web page that will be shown to many users in the same way.
Download sports.mdb to run the sample code (zipped, 6.38K).
The following script (in SPORTS.ASP, which uses the database in SPORTS.MDB) is an example of output caching. To run this sample, follow the steps below:
1. Save SPORTS.ASP to a directory that has script access (you can run .asp files from it).
2. Save SPORTS.MDB to a directory that the IUSR_machinename account can access. Note: This should not be a directory that is published on the Web unless you don't mind people downloading the file.
3. Create a System data source name (DSN) to provide access to SPORTS.MDB.
4. Request SPORTS.ASP using your browser and the HTTP protocol (not FILE) so the ASP file will run.
Note: No Global.asa is needed for this first example.
SPORTS.ASP
<%@ LANGUAGE=JavaScript %>
<html>
<body>
<form method=post>
What is your favorite sport? <%= getSportsListBox() %>
<p>
<input type=submit>
</form>
</body>
</html>
<%
function getSportsListBox()
{
  SportsListBox = Application("SportsListBox")
  if (SportsListBox != null) return SportsListBox;
  crlf = String.fromCharCode(13, 10)
  SportsListBox = "<select name=Sports>" + crlf;
  SQL = "SELECT SportName FROM Sports ORDER BY SportName";
  cnnSports = Server.CreateObject("ADODB.Connection");
  cnnSports.Open("Sports", "WebUser", "WebPassword");
  rstSports = cnnSports.Execute(SQL);
  fldSportName = rstSports("SportName");
  while (!rstSports.EOF){
    SportsListBox = SportsListBox + "  <option>" + fldSportName + "</option>" + crlf;
    rstSports.MoveNext();
  }
  SportsListBox = SportsListBox + "</select>"
  Application("SportsListBox") = SportsListBox
  return SportsListBox;
}
%>
In the script above, notice the second line in the function getSportsListBox. This line checks to see if the application-level variable, SportsListBox, is NULL. If the variable is NULL, this is the first time that this function has run and the function continues processing the data into its final form. If this variable is non-NULL, it has already been processed and is returned. This data is saved in its transformed state for display on the Web page.
For a concrete example of the improvement you will see by using output caching, try commenting out the script that saves the data (the second to last line in the getSportsListBox function) as follows:
// Application("SportsListBox") = SportsListBox
Next, run the Web Capacity Analysis Tool (WCAT) on the .ASP file. On my laptop, I get about 10 requests processed per second. That's not too bad. Now, take the commenting syntax out (remove the "//" characters) and run WCAT again. On my laptop, I get 60 requests processed per second. That's a six-fold performance gain by using just one line of script! Ask yourself, "Can I afford to not do this when I can so easily improve the performance on my site?"
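The gain comes from skipping the query and the string building on every request after the first. The sketch below is a rough illustration of that effect in plain JavaScript that runs outside IIS. It is not the article's sample: the `applicationState` map stands in for the ASP Application object, `querySportNames` stands in for the database query, and the sport names are made up.

```javascript
// Hypothetical stand-in for ASP Application state: a plain in-memory map.
const applicationState = new Map();

// Hypothetical stand-in for the database query; counts how often it really runs.
let queryCount = 0;
function querySportNames() {
  queryCount += 1;
  return ["Baseball", "Basketball", "Soccer"]; // made-up rows
}

// Same shape as getSportsListBox: return the cached HTML if it exists,
// otherwise build it once and store it in its final presentation state.
function getSportsListBox() {
  const cached = applicationState.get("SportsListBox");
  if (cached !== undefined) return cached;

  const options = querySportNames()
    .map((name) => "  <option>" + name + "</option>")
    .join("\r\n");
  const html = "<select name=Sports>\r\n" + options + "\r\n</select>";
  applicationState.set("SportsListBox", html);
  return html;
}

// Simulate 1,000 page requests against the cached output.
for (let i = 0; i < 1000; i++) {
  getSportsListBox();
}
console.log("requests: 1000, queries run:", queryCount);
```

After the loop, `queryCount` is 1: only the first request pays for the query and the formatting. Removing the `applicationState.set` call, like commenting out the line in SPORTS.ASP, makes every request run the query again.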
But what if you have data that is used by many but presented in different ways to provide personalization? You cannot use output caching in this situation; however, you can use input caching. With input caching, you save the data but not the presentation. This works well for a site that is personalized, where you cannot cache the transformed output because everyone would get the same view. Instead, cache the data using a Visual Basic® Scripting Edition (VBScript)-provided Dictionary object or an ActiveX® Data Objects (ADO) Recordset object.
The example below uses an ADO Recordset to cache data. Using a Recordset allows you to have multiple properties for each entry (row). The VBScript Dictionary object provides for the storage of one value per entry (key). To see just how easy it is to cache input, try the example below.
GLOBAL.ASA
<!--METADATA TYPE="TypeLib" FILE="C:\Program Files\Common Files\system\ado\msado15.dll"-->
<SCRIPT LANGUAGE=VBScript RUNAT="Server">
Sub Application_OnStart
  SQL = "SELECT CompanyName, City FROM Customers"
  cnnAdvWorks = "DSN=AdvWorks"
  Set rsCustomers = Server.CreateObject("ADODB.Recordset")
  ' A client-side cursor is usable disconnected
  rsCustomers.CursorLocation = adUseClient
  rsCustomers.Open SQL, cnnAdvWorks, adOpenStatic, adLockReadOnly
  ' Disconnect the Recordset (an object reference is assigned, so use Set)
  Set rsCustomers.ActiveConnection = Nothing
  ' Cache the disconnected Recordset in Application state
  Set Application("rsCustomers") = rsCustomers
End Sub
</SCRIPT>
Customers.ASP
<%
' Get a private cursor over the shared, cached Recordset
Set myCustomers = Application("rsCustomers").Clone
Set CompanyName = myCustomers("CompanyName")
Set City = myCustomers("City")
Do Until myCustomers.EOF
%>
<b><%= CompanyName %></b> is located in <b><%= City %></b>.<p>
<%
  myCustomers.MoveNext
Loop
%>
In this example, when the Application is started, the rsCustomers Recordset is saved into Application state. Then, anytime the Customers.asp page is called, the same Recordset is used to provide the data. The Recordset Clone method is used to get a private cursor to the shared rsCustomers Recordset. Using the Clone method, the ASP page will share the data while being able to navigate the data separately from other concurrent page requests.
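The difference between the two containers, and the idea of caching the input while varying the presentation, can be sketched in plain JavaScript. This is only an analogy, not ADO or the Dictionary object: a `Map` plays the role of a one-value-per-key Dictionary, an array of row objects plays the role of a Recordset with several properties per row, and the company names, cities, and render functions are invented for illustration.

```javascript
// Dictionary style: one value per key (made-up data).
const cityByCompany = new Map([
  ["Contoso", "Seattle"],
  ["Fabrikam", "Portland"],
]);

// Recordset style: several properties per row (made-up data).
// Frozen so that no request can change the shared, cached input.
const customers = Object.freeze([
  Object.freeze({ companyName: "Contoso", city: "Seattle" }),
  Object.freeze({ companyName: "Fabrikam", city: "Portland" }),
]);

// Presentation 1 (hypothetical): sentences, as in Customers.ASP.
function renderSentences(rows) {
  return rows
    .map((r) => "<b>" + r.companyName + "</b> is located in <b>" + r.city + "</b>.<p>")
    .join("\n");
}

// Presentation 2 (hypothetical): a bulleted list ordered city first.
function renderList(rows) {
  const items = rows.map((r) => "<li>" + r.city + ": " + r.companyName + "</li>");
  return "<ul>" + items.join("") + "</ul>";
}

// The same cached input feeds both views; only the formatting runs per request.
console.log(renderSentences(customers));
console.log(renderList(customers));
console.log(cityByCompany.get("Contoso"));
```

Only the data is cached here, so each request still pays for formatting. That is the trade-off against output caching: less saved work, but each user can get a different view.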
As shown in the example above, you can save information in the GLOBAL.ASA file. Then you can get the results of a query and step through the recordset via the MoveNext, MovePrevious, MoveFirst, and MoveLast commands. This works great in that the information is saved out and the database isn't queried over and over again. But what happens when the cached recordset is shared among users? UserA may have gone in and stepped through a few records. Then UserB comes in and does a MoveFirst, effectively resetting the cursor to the beginning of the recordset. UserA loses its place, and neither user can rely on where the cursor is. The problem is summed up as follows: Multiple users cannot navigate the same recordset at the same time.
Now that I've shown you the problem, here's the good news: ADO version 1.5 has its own cursor engine. What this means is that you can use a client-based cursor when stepping through your recordset. A client-based cursor is just what the name implies: a cursor that is for use on the client machine only. Now, look again at the use of the Clone method in the example above. The private cursor available to the Web page is a client-based cursor.
To try this: Initialize, make the connection, and save out the cursor in your GLOBAL.ASA file. Then release your connection so that others can access the file for writing (if necessary). Then, in your script, use the Clone method (as demonstrated in the above example). This clones the cursor -- NOT the data. This is important, because if you have a very large database, you don't want to cache the entire database for each view when you only need a few records for each view. The data is shared -- the cursor is cloned.
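The "data is shared, cursor is cloned" idea can be illustrated with a small JavaScript sketch. It does not use ADO; `cloneCursor` is a hypothetical helper that only mimics the behavior described above, and the row values are placeholders.

```javascript
// Shared, read-only rows: cached once (placeholder data).
const sharedRows = Object.freeze(["Row 1", "Row 2", "Row 3"]);

// Hypothetical helper: returns a cursor with its own position
// over the same underlying rows. The rows are never copied.
function cloneCursor(rows) {
  let position = 0;
  return {
    rows, // same array reference for every clone
    get eof() { return position >= rows.length; },
    get current() { return rows[position]; },
    moveNext() { position += 1; },
    moveFirst() { position = 0; },
  };
}

const userA = cloneCursor(sharedRows);
const userB = cloneCursor(sharedRows);

userA.moveNext();
userA.moveNext();  // UserA is now on the third row
userB.moveFirst(); // UserB resets only its own cursor

console.log(userA.current, userB.current, userA.rows === userB.rows);
```

UserA stays on the third row after UserB calls `moveFirst`, and both cursors point at the same array, so the memory cost per request is one position rather than one copy of the data.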
For Recordsets in Application state to scale well, ADO must be marked "Both" threaded. If the threading model is "Both," ADO is assumed to be thread-safe and is instantiated on the calling thread. Any method call on this object will translate directly to a method call on the underlying COM object on the calling thread. A detailed discussion about ADO and threads and how to specify the threading model can be found in the article "Improving the Performance of Data Access Components with IIS 4.0" in the section entitled "Threading Support" (http://www.microsoft.com/workshop/server/components/daciisperf.asp#topic2).
Another innovation in ADO 1.5 is the ability to apply a filter to data. This means that you can grab just a few records from a Recordset based on some specified criteria. Be aware of the cost, though: if you have a large database and need only a few records, the filter would hypothetically go through all of your records (perhaps thousands) just to create a view. Multiply that by several users, and it can severely affect the performance of your Web page.
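To make that cost concrete, here is a plain JavaScript sketch of filtering a cached set of rows without an index. It is an analogy only: it does not call ADO, and it assumes a linear scan, which matches the article's "hypothetically" rather than any documented detail of the ADO cursor engine. The row data and the `filterByCity` helper are invented.

```javascript
// Hypothetical cached rows: 10,000 customers, only a handful in "Boise".
const cachedRows = [];
for (let i = 0; i < 10000; i++) {
  cachedRows.push({
    companyName: "Company " + i,
    city: i % 1000 === 0 ? "Boise" : "Seattle",
  });
}

// Counts every row the filter has to look at.
let rowsExamined = 0;

// Hypothetical helper: an unindexed filter must test each row in turn.
function filterByCity(rows, city) {
  return rows.filter((row) => {
    rowsExamined += 1;
    return row.city === city;
  });
}

const view = filterByCity(cachedRows, "Boise");
console.log("matches:", view.length, "rows examined:", rowsExamined);
```

The view holds 10 rows, but all 10,000 rows were examined to build it. If several users request a filtered view at once, that scan is repeated for each of them.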
In the next version of ADO (version 2.0), slated to ship with Visual Studio 98, you will be able to search on your client-based cursor and index the data you get. This should really speed things up. But this is a topic in and of itself, so I will cover how you do this in a future article.
Given the information in this article, you can look at your own script and see if the opportunity exists to cache data on the server. Mix and match so that you use the appropriate type of caching for the data you are trying to save. You will get the greatest benefit by trying to incorporate output caching. But if there isn't a good place for you to do that, improve performance by finding areas where you can cache input. More information about ASP and IIS can be found in our Server area at http://www.microsoft.com/workshop/server/default.asp.