Tom Moran, Microsoft Developer Support
Dustin Hubbard, Premier Internet Developer Support
Microsoft Corporation
June 12, 1998
Download the sample code files (zipped, 2.8K).
Contents
Introduction
Technologies Used: The Big Picture
Site Architecture
Designing Scalable Web Applications
Performance: The Need for Speed
Browser Compatibility
Building a Subscription-Based Site
An In-Depth Look at Portfolio Manager
Testing: Making Sure It All Works
Publishing
Thank You and Goodnight
Additional Resources
Note Highlights of this article were recently published in MSDN Online Voices. Through June 30, readers of this article can receive a Microsoft Investor subscription free for one month, plus a continuing 30 percent discount (US$6.95 monthly, instead of US$9.95) thereafter. Remember, the first month is free. To obtain the discount, you must use this link to Microsoft Investor (new subscriptions only, offer valid through June 1998 only).
Barron's magazine called it the "best all-around site for investors on the Web today." Individual Investor wrote, "One of the most elegant, drop-dead gorgeous interfaces on the Net."
I'm talking about Microsoft Investor (http://investor.msn.com). Every day, 300,000 customers call up over 3.5 million pages with a response time that borders on the immediate. That kind of performance makes Microsoft Investor one of the hottest sites on the Web.
So I figured I would ask if I could find out a little bit more about the Microsoft Investor (MSI) site and how its creators do things. Being the skilled negotiator I am (I went to a seminar last year), I decided initially to ask for everything. I was a little shocked when Eric Zinda, the group Program Manager, agreed to let me publish all the group's secrets. There were so many of them -- and it was such good information for you -- that I asked Dustin Hubbard, one of the star engineers from Microsoft's Premier Developer Support ASP Team, to assist me.
The main purpose of this article is to address some of the most significant decisions that come up when trying to build a robust, scalable Web site and architecture. We will examine the most crucial aspects of designing a large-scale Web site, tell you exactly how MSI designed, built, and tested their commercial Web site, and answer such questions as: How do I manage a subscription? How do I get great performance? What technologies should I use to achieve scalability?
In this article, Dustin and I will take you through several major areas of the site -- architecture, performance, publishing, testing, building the site so it's scalable, subscriptions, downleveling (browser compatibility), and so on. As we walk you through the site and these features, we will also point out various decision points the team came across as they built the site.
This article assumes intermediate familiarity with server software and hardware, including Microsoft Internet Information Server (IIS).
Some background: MSI is an online resource for investors who want to track their stocks, maintain their portfolios, keep up-to-date on financial news, and get investing advice from experts. MSI has tools for seasoned investors as well as ways to help beginning investors understand the market through features like the strategy lab. There are daily articles, quote and news feeds, market summaries updated several times per day, analyst recommendations, and so forth. MSI is much like other investing sites on the Internet -- however, there are a few significant exceptions.
First, MSI is broken up into only two sections, subscriber and non-subscriber. Second, MSI provides a way for individuals to easily track all of their investment accounts. Even non-subscribers are provided this powerful functionality. Third, MSI has some of the most powerful investment-research tools available. Finally, MSI has the ability to dynamically chart multiple stocks at once, which provides a powerful way to view the performance trends of your favorite holdings.
Here's a table that summarizes the various technologies used on the site:
Technology | Use/Notes
Internet Information Server (IIS) 3.0 | As this article was being written, the MSI team was working to upgrade their servers to IIS 4.0 |
Windows NT® Server 4.0, SP3 | Used as the platform for all servers |
ISAPI Filters | Used to add information to the IIS logs |
Active Server Pages (ASP) | Used throughout the site to generate all Web pages |
Cascading Style Sheets (CSS) | Used to easily maintain a consistent look and feel, and to help performance, because CSS is smaller than adding font tags to every cell |
Microsoft Word | Used to write all articles |
Microsoft Access | Keeps track of articles and generates article archives |
Visual Basic® for Applications | A macro that goes through a Word article, generates HTML syntax for each heading and style, and does lookups in the Microsoft Access database for company information |
SQL Server 6.5 | Used to store the user database |
SSL 2.0 | Used for secure communication when transmitting credit-card information |
Content Replication System (CRS) | Used to deploy new or updated content onto the production servers (a feature of Site Server 3.0)
ActiveX® Data Objects (ADO) | Used for all interaction with SQL Server; all ADO code lives directly in the ASP pages, and there is no transaction processing
Active Template Library (ATL) | Used to write ISAPI filters, server-side objects, and specific ActiveX controls that are downloaded to the client browser
ActiveX Controls | Portfolio Manager, Investment Finder, Charting, Ticker |
VBScript | Used on server |
JavaScript | Used on client, primarily for cross-browser compatibility |
Web Capacity Analysis Tool | Used for testing load of ASP pages on the Web servers |
Cookies | Holds your username, and, if you select the option, your password; used to determine whether your session has timed out; and stores miscellaneous site settings |
Internet Mail Server (IMS) | Used to send alerts |
Sendmail object | Used to send your forgotten password |
There are no frames, although there once were. In fact, getting rid of frames was one way the MSI team got major performance improvements between versions 4 and 5 of their site, which just debuted. There is no Java and no per-user state. You won't even find HTML pages. MSI's primary goal has been performance and a great customer experience, and it definitely shows.
MSI has successfully met all the criteria of a great Web site -- performance, stability, broad reach, and scalability. There are many interesting areas of the site; now we'll focus on each in turn.
Because the architecture serves as the foundation for everything else, we'll start there.
One of the most interesting things about the architecture is that there are two different stock quote feeds: a satellite dish and a leased line (similar to ISDN). Although the recent Galaxy 4 satellite outage took out the stock quote feed for some of MSI's competitors, MSI stayed up without a burp because of the land-based line. It might sound a little strange, but thinking through issues like this is a real part of creating a site that is both redundant and fail-safe. (In a redundant system, if you lose part of the system, you can continue to operate. For example, if you have two power supplies and one takes over if the other one dies, then that is a form of redundancy.)
As we mentioned in the introduction, the entire site is built on Windows NT 4.0 and IIS 3.0. At the time this article was written, the MSI team was in the middle of upgrading to IIS 4.0, primarily to take advantage of its performance and stability improvements. When asked whether they were currently having problems with stability, they mentioned that some of the Windows NT boxes running their quote servers had been up for more than six months without ever going down.
Whether you maintain your own connection to the Internet or rely on an ISP to provide it for you, infrastructure is a major component of Web site design. Don't assume your ISP has the bandwidth your application requires to give a good end-user experience.
After the quotes from S&P are picked up, they are broadcast via the small private LAN to a bank of three quote servers. These servers hold large memory-mapped databases of the approximately 250,000 symbols. If you are asking yourself why SQL Server isn't used, the reason is that when this site was created, SQL version 4.3 was current and supported only page locking. As you can imagine, it might cause a bit of a performance problem if you had to lock the page every time a new quote price came in from the stock exchanges. With the current version of SQL Server, this is not a problem. The servers also manage information that is sent via CRS (from Microsoft Commercial Internet System [MCIS]) from a server that collects data from several data providers, including Hoovers, Media General, MSNBC, Zacks, and MorningStar. Most of these other data feeds are actually sent by FTP from the data provider's servers to MSI for processing.
As was pointed out previously, there are several server-side objects (SSOs) used on the Web servers. One SSO communicates with the News servers, and a separate SSO communicates with the Quote Servers -- we'll talk about the specifics a little further down.
The News Servers also hold large memory-mapped files of the last 30 days of news, provided by MSNBC. This news is processed into HTML format and replicated to the utility servers. Because it gets to the utility servers in HTML, it is very quickly served to a customer.
Note that there is a mail server that uses IMS (from MCIS). This is used to generate the alert e-mail that subscribers sign up for. The team also uses SQL Server and SMTP to generate the mail you get when you request a lost password. This was the simplest and quickest way to get e-mail to someone who is trying to log on but has lost his or her password. The alert e-mail service runs on its own server to minimize interdependency. Customers want an alert as soon as possible when an event happens, and isolating the server that handles that mail helps accomplish this.
Chat servers and bulletin-board system (BBS) servers are shared with MSN. Actually, this is an important part of the architecture. If you can find someone else who is doing something successfully, and it meets your needs, there is nothing wrong with sharing the responsibility. It can be a lot cheaper and easier to use someone else's service than to develop your own.
One of the most difficult challenges is making sure that all your servers are set up exactly the way you need them to be. This means that all of the correct software is there, with all of the patches. It also means that all of the data source names (DSNs) are set up, the SSOs and ISAPI filters are installed, and so forth. MSI has an internal document that details every step. If you were to follow this exactly, everything would be set up exactly as it needs to be. It is important to leave no room for error, so when a version of software changes, the team goes through the setup instructions again to make sure that the dialog boxes are the same, and that specific instructions like "Click Add" are still relevant.
So what happens if something goes wrong? Well, MSI's goal is to detect and resolve the problem before someone calls in and says they're having a problem. A complete system monitors the Windows NT event log and Performance Monitor (a.k.a. "perfmon," a feature of Windows NT). The status is available round-the-clock via Web pages, so anybody on the team can keep an eye on what is happening. Several members of the team also carry pagers, and are beeped automatically if there is a problem. The Investor team uses standard IIS perfmon counters to monitor application health, as well as custom counters that tell them things like how many price tables were delivered, how many charts, the number of file downloads, and so forth. All of this is accomplished via ASP pages that talk to various internal controls monitoring the health of the entire system.
Decision point: How are you planning to monitor your servers? Check out Tom's article on INetMonitor for ideas.
The Investor team wrote their own round-robin, load-balancing software in their SSOs instead of using standard DNS. This ensures that quotes and newsfeeds are delivered from the server that can best handle them. In effect, it does custom load balancing to meet the needs of the Investor architecture. Doing something like this is not for the faint of heart, and not generally recommended, but in MSI's case was necessary.
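The Investor team's actual load balancing lives inside their C++/ATL SSOs and isn't published, but the round-robin idea is simple enough to sketch. In the snippet below, the server list, the healthy flag, and the quote-server names are all our own illustrative assumptions, not MSI's implementation:

```javascript
// Minimal round-robin server picker (a sketch, not the Investor SSO code).
// Walks the list at most once per call, skipping servers marked unhealthy.
function makeRoundRobin(servers) {
  let next = 0; // index of the server to try first on the next call
  return function pick() {
    for (let i = 0; i < servers.length; i++) {
      const server = servers[next];
      next = (next + 1) % servers.length;
      if (server.healthy) return server.name;
    }
    throw new Error("no healthy servers available");
  };
}

const pick = makeRoundRobin([
  { name: "quote1", healthy: true },
  { name: "quote2", healthy: true },
  { name: "quote3", healthy: false }, // e.g. down for an upgrade
]);
```

Successive calls to pick() alternate between quote1 and quote2, transparently skipping the server that is down, which is exactly the property that lets a farm lose a machine without customers noticing.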
A major challenge the MSI team faced was to build a site that would be available to investors all over the world, all day, every day. This is critical because of the global need for up-to-date, accurate investment information. So what happens when the MSI team needs to reboot their server to upgrade a piece of software or when the infamous Windows NT blue screen appears? Do thousands of individuals suddenly lose their connection to MSI?
This is what fault tolerance is all about: basically, expecting the unexpected and preparing for it. Any serious Web architecture must be capable of losing a server without affecting the customers who are accessing that site. This is accomplished by having multiple servers running the same software, all the time. Say, for instance, you have two IIS 4.0 Web servers running through a router. If you need to restart one of the servers to upgrade some software, you can do so without taking your site off the Internet.
One example of fault tolerance is MSI's quote architecture. They have two different feeds that receive data from S&P Com Stock, which go to three different quote servers. All this is done to increase the certainty that quote feeds will always be available to MSI customers.
Decision Point: Determine the acceptable amount of availability for your Web site. Is it 70, 80, 90, or maybe 99.9 percent? Your architecture should be designed to meet this fault-tolerance percentage. If you use an ISP, ask them what level of fault tolerance they are promising.
We wrote this article to help you design and build a Web site that will meet your organization's needs today and, more importantly, will meet them tomorrow. This is where we find the majority of mistakes get made. In a push to get something out fast, many questions go unanswered -- or even more often -- unconsidered. Thus, six months after the site goes live, a major redesign is necessary because the site isn't fast enough or traffic to the site is significantly higher than anticipated.
Scalability is simply ensuring that as traffic to your site grows, so does the Web server's ability to effectively handle it. For instance, if freeways were built to be scalable, the lanes of traffic would increase as the number of commuters on a particular freeway increased. When more users begin going to a site that is not scalable, the result is the same as what happens to popular freeways. Traffic comes to a crawl and in some cases you get road rage (or in the case of the Internet, it might be called "Web rage").
MSI is an excellent example of a site architecture that scales as its popularity increases. It's a good thing too, because the site's traffic has increased over 60% in the past six months. Making a site scalable doesn't have to be particularly complicated; you just have to plan for it from the beginning.
To start with, estimate the amount of traffic to your site per month and triple it! Just about everyone underestimates the amount of traffic that their site will generate. If your architecture handles only your initial estimate, you will find that your site quickly will not be able to handle the number of requests it receives. The technique of forecasting Web traffic to your site is commonly known as capacity planning. It involves determining the server's requirements, designing an application to meet those requirements, and monitoring your site to ensure those requirements are being met. The Internet Information Server Resource Kit by Microsoft Press has an entire chapter dedicated to this subject.
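To make the "estimate and triple it" advice concrete, here is a back-of-the-envelope sketch. All figures are hypothetical (the 5x busy-hour factor is a rule of thumb, not a measurement from the Investor team):

```javascript
// Rough capacity estimate: monthly page views -> peak requests per second.
// Tripling the estimate follows the article's advice; peakFactor models
// how much busier the busiest hour is than the monthly average.
function peakRequestsPerSecond(monthlyPageViews, peakFactor) {
  const padded = monthlyPageViews * 3;            // triple the estimate
  const perSecond = padded / (30 * 24 * 60 * 60); // average over a 30-day month
  return perSecond * peakFactor;                   // busy-hour multiplier
}

// e.g. 3.5 million pages/day is roughly 105 million/month
const peak = peakRequestsPerSecond(105e6, 5);
```

At MSI-scale traffic this lands at roughly 600 requests per second at peak, which is the kind of number you need in hand before you size servers or sign an ISP contract.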
One way MSI ensured a scalable architecture was by building a stateless Web site. The term stateless means that information does not persist after a request is served. A simpler way to think of it is that the Web site "remembers nothing." When designing your site, don't keep information in the Web server's memory. If information needs to be retained, write it to a database or a text file. This frees up memory on the Web server, which lets it handle a greater number of requests.
Many companies that have built their Web sites using the Microsoft ASP architecture have made heavy use of the Session object. The ASP Session object retains information for a user for the amount of time the user is at the site plus 20 minutes (by default). Think about the ramifications of doing this. Let's say you ask every visitor to your page to enter his or her name and you store it in a Session object. Assume you have 100,000 visitors a day. That means that you would likely have over 4,000 names in memory at a time. That is vital RAM being used to fulfill a pretty insignificant purpose. Even worse (because RAM is really pretty cheap) is the time it would take to look all of that information up. Also, it is not a simple or desirable task to try to get Session objects working in any sort of cluster or Web farm. If you need that type of functionality on a large scale, you really should look at Site Server's membership services. There are times, however, when using the Session object may make sense, such as on an intranet site where Web traffic is fairly light.
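A quick sanity check shows where a figure like 4,000 comes from. The business-hours concentration and the average visit length below are our assumptions, but the 20-minute timeout is the ASP Session default:

```javascript
// Estimate how many ASP Session objects sit in memory at once.
// visitorsPerDay arrive spread over activeHours; each session lives for
// roughly visitMinutes of activity plus the 20-minute ASP timeout.
function concurrentSessions(visitorsPerDay, activeHours, visitMinutes) {
  const timeoutMinutes = 20; // ASP Session default
  const arrivalsPerMinute = visitorsPerDay / (activeHours * 60);
  return arrivalsPerMinute * (visitMinutes + timeoutMinutes);
}

// 100,000 visitors concentrated in 10 busy hours, ~5 minutes on site each
const inMemory = concurrentSessions(100000, 10, 5); // ~4,167 sessions
```

With traffic concentrated into a business day rather than spread evenly around the clock, you arrive at north of 4,000 live sessions, each holding its data in precious server RAM.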
Decision Point: List all the information that you plan to maintain on a per-visitor basis. If you are not planning to store it in a database, decide whether it is critical enough to justify the scalability hit. You basically have four choices: cookies, session objects, a custom database, or something like Site Server's membership services. If your site already uses ASP, you might want to check your pages for Session objects and try to modify your ASP to not use them.
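If you take the cookie route, per-visitor settings can ride in the cookie string itself rather than in server memory. Below is a minimal encode/decode pair; the key/value format is our own invention for illustration, not MSI's actual cookie layout:

```javascript
// Serialize a small settings object into a cookie-safe string and back.
// Keep cookies small: browsers of this era cap each one at about 4K.
function encodeSettings(settings) {
  return Object.keys(settings)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(settings[k]))
    .join("&");
}

function decodeSettings(cookieValue) {
  const settings = {};
  for (const pair of cookieValue.split("&")) {
    const [k, v] = pair.split("=");
    settings[decodeURIComponent(k)] = decodeURIComponent(v);
  }
  return settings;
}
```

The trade-off versus a Session object is clear: the state travels with every request instead of occupying server RAM, so a Web farm can serve any request from any machine.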
Have you ever been in a car accident and had to have the hood or door replaced because it was damaged in the accident? All the repair shop has to do is take that component (let's say the car's hood) off and put on a new one. Can you imagine the cost if you had a dent in your hood but the only way to replace it was to replace the entire exterior of the car? Not only would it cost significantly more money, but it would take longer to do, and if the following month you got a dent in your door, you would have to go through the entire process all over again. As absurd as that idea seems in the realm of cars, every day people build Web sites that are designed exactly that way.
To componentize your site and avoid this, think through your Web site's design before you build it. This will enable you to break your site up into reusable components, which usually means into server-side objects (SSOs). An SSO is essentially a dynamic-link library (DLL) that sits on the server and contains some type of business logic. That is exactly what MSI did to ensure that their site was scalable -- not to mention easier to maintain.
MSI has various SSOs that they wrote using C++ and ATL. Each of these objects fills a very specific purpose, and yours should also.
Two of the most common ways to build an Internet SSO are to use Visual Basic® 5.0 or Visual C++® and ATL. Note that a component written in ATL, in most cases, will be much faster and scale better than one written in Visual Basic. However, a component written in Visual Basic may be much quicker and easier to develop, so there is a tradeoff. You need to decide whether the added scalability provided by a component written in C++ is worth the possible delay in implementation.
Componentizing a Web site provides scalability that otherwise would be missing because a component can be used by thousands of users simultaneously and can be spread across multiple Web servers. This is part of load balancing, which we touched on earlier.
Decision Point: Determine your business needs and pick a technology to build your components with that matches those needs.
Performance is essentially speed. Simply put, it is how fast a Web site runs and what causes it to slow down. Have you ever tried hitting a popular site during lunch time? If you have, you have probably noticed that the site is slower than only a few hours earlier. This is usually because more people are accessing that same site and the Web server has a harder time responding to all the increased requests. Various studies have suggested that Web surfers give a site about 10 to 15 seconds to display before they hit the Stop button and go to a different site.
It is easy to build a site that is fast when only a handful of users are accessing it. However, if your site grinds to a crawl when a few hundred people access it, you may be in big trouble.
Decision Point: What level of end-user experience can you afford to provide? Is it important that all users have a premium experience, even under load? Will they go to a competitor if response time is not immediate? The Investor team wanted everyone to have a premium experience, and knew that for customers in a financial environment, speed was paramount. Keep in mind that speed is more than just how quickly a completed page is brought down from the server -- it is also how quickly someone can find the information they need.
Unlike mathematics and physics, performance is more an art than a science. In some respects, that is unfortunate, because there is no simple equation to determine how fast your site will be. In most cases it takes testing, fine-tuning, and time to get your site optimized. However, if you have a Web site that was not designed for speed, there is very little you can do to improve the performance of the site without redesigning it. Thus it is critical that performance is a major consideration in the design stage. Some of the major contributors to Web site performance are covered below.
MSI is an excellent example of a site built for speed; the team's goal was to ensure that their site is one of the fastest investor sites in the world. Listed below are some of the performance-enhancing things that they have done to make their site fly.
1. Web servers have 510 MB of RAM each. As with your own PC, the more memory you have, the better your performance.
2. Spare use of images. The more images you have on your page, the longer it will take your page to get transmitted to the client.
3. Use of server-side objects (SSOs). ATL objects not only scale well but also perform well under high load.
4. ActiveX Data Objects (ADO) and connection pooling. Connection pooling significantly reduces the amount of time that the Web server takes to establish a connection to a back-end database. A slight change for IIS 4.0 is that connection pooling is now on by default.
5. No use of frames. Frames actually increase the amount of time it takes to download to a client because more files get transmitted per client request. MSI cut their download traffic in half simply by removing frames from their site. What may look like frames on the pages are actually tables.
6. Keep all pages under 10K. Setting a limit on page size (including all images, and so forth) will help you keep your pages lean.
7. Careful use of objects. The MSI team is careful only to instantiate objects right before they use them. This ensures that objects don't end up hanging around in memory longer than needed.
The one thing that would increase the speed of the site even more would be, of course, to use HTML instead of ASP. Because ASP has to be interpreted and generated by the server, it is much slower than straight HTML. However, nearly every page on the MSI site has database queries. In this case, ASP is usually the best choice, because it enables customization on-the-fly. It comes down to flexibility versus performance.
Earlier in this article, we talked extensively about what scalability is and how to ensure you design a scalable Web site. If your site does not scale well, all other performance enhancements you have made will likely be useless under high load or stress.
If your code is efficient and well written, it will perform much better than poorly written code. A good example of this is a problem the MSI team found when testing one of their ASP pages. They were retrieving arrays from one of their SSOs and iterating through each record in the array, of which there were 1,000. The problem was that in their loop they kept using the syntax someObject.arrayMethod(x). As it turns out, VBScript re-copies the returned value of a method every time it is called. Therefore, rather than creating 1,000 rows in memory once, they were creating 1,000 rows 1,000 times, which of course equals one million! The fix is simple: assign the array to a local variable and then iterate through the local variable; that way the 1,000 records get created only once.
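The same trap exists in any language where a method call produces a fresh copy of its result. The sketch below is a JavaScript analogue of the team's VBScript fix, with someObject and arrayMethod as stand-in names, using a counter to make the hidden copies visible:

```javascript
// Simulates an SSO whose arrayMethod() builds and returns a fresh copy
// of its rows on every call, much as VBScript re-copies method returns.
let copiesMade = 0;
const someObject = {
  rows: ["row1", "row2", "row3"],
  arrayMethod() {
    copiesMade++;             // count how many times the copy is produced
    return this.rows.slice(); // a fresh copy each call
  },
};

// Slow pattern: calling the method in the loop condition copies per pass.
function countSlow() {
  let n = 0;
  for (let i = 0; i < someObject.arrayMethod().length; i++) n++;
  return n;
}

// Fast pattern: assign the result to a local variable once, then iterate.
function countFast() {
  const local = someObject.arrayMethod();
  let n = 0;
  for (let i = 0; i < local.length; i++) n++;
  return n;
}
```

With three rows, the slow loop triggers four copies (one per condition check) while the fast loop triggers exactly one; scale the rows to 1,000 and the gap becomes the million-row problem the MSI team hit.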
Componentizing your application serves two purposes: it helps your Web site scale better and it makes your application fast. Breaking your code out of ASP and putting it into SSOs will generally improve performance. The reason is that ASP is written in an interpreted language such as VBScript or JScript, which means that each line has to be read and interpreted every time a client requests it. An SSO written in ATL is machine-compiled, which means it does not need to get interpreted every time a user requests it. If you write an SSO in Visual Basic you'll find that it is slightly faster than VBScript, but slower than ATL.
As I mentioned earlier, it is easy to unintentionally initialize objects that you end up not using in certain pages. One common place this can happen is within included files. You can avoid this by using the <OBJECT> syntax rather than the server.createobject() syntax. If you use the <OBJECT> tag, the object doesn't actually get created until you reference it in your code by calling a method or property on that object. In contrast, the server.createobject() syntax immediately creates the object.
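The deferred creation the &lt;OBJECT&gt; tag gives you can be mimicked in any language with a lazy wrapper. This is our own illustration of the principle, not the ASP runtime's mechanism, and the mailer object is a made-up example:

```javascript
// Wrap an expensive constructor so the object is built only on first use,
// mirroring how the <OBJECT> tag defers creation until a method is called.
function lazy(create) {
  let instance = null;
  return function get() {
    if (instance === null) instance = create(); // created on first access only
    return instance;
  };
}

let constructed = 0;
const getMailer = lazy(() => {
  constructed++; // track whether the object was ever built
  return { send: (to) => "sent to " + to };
});

// A page that includes this file but never calls getMailer()
// pays no construction cost at all.
```

This is exactly why the &lt;OBJECT&gt; syntax matters inside shared include files: pages that never touch the object never pay for it.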
In an effort to quickly build your Web application it is easy to overlook costly coding errors. Remember that just because your code runs doesn't mean it runs well. The MSI example I gave about creating one million records instead of 1,000 is an easy thing to overlook but can cause substantial performance problems and be difficult to track down. Be sure to spend time going back over your code after it is written to look for obvious problems, such as creating variables and objects that don't get used. Stepping through your code is a great way to really understand all the work it is doing. Each step takes up server resources, and with hundreds or thousands of simultaneous users, it can be very costly on your live site.
Just about anyone who has developed a Web site has faced the decision of what client browsers they will support. This usually boils down to whether you want your site to support both Microsoft Internet Explorer and Netscape Navigator, or just one browser. For the MSI team, the decision was easy. They would need to support both browsers to develop a Web site that reached the broadest consumer base.
Decision Point: Of course, when developing a Web site for both browsers you have some difficult decisions to make. Should you build two Web sites, one for Microsoft Internet Explorer and one for Netscape Navigator? Should you use only standard HTML in your page to guarantee browser compliance but forfeit the creativity that other Web sites will offer? What technologies can you safely implement across multiple browsers?
For the MSI team to build a site that supported many different browsers but also incorporated some of the latest technologies, they had to make some important decisions. First of all, each ASP page sniffs the browser and generates one of two possible versions of HTML: a Cascading Style Sheets (CSS) version for Internet Explorer versions 3, 4, and 5 and Netscape Navigator 4, or a basic HTML version for every other browser, from Navigator 3 to WebTV.
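Server-side sniffing of this kind typically keys off the User-Agent header. The exact strings MSI matches are not published, so the patterns below are assumptions, and real 1990s User-Agent strings are notoriously inconsistent; treat this as a sketch of the two-way split:

```javascript
// Classify a User-Agent string as "css" (IE 3+ or Navigator 4+) or
// "basic" (everything else). Patterns are illustrative, not exhaustive.
function pageVersion(userAgent) {
  // Internet Explorer announces itself as "MSIE <version>"
  const msie = userAgent.match(/MSIE (\d+)/);
  if (msie && Number(msie[1]) >= 3) return "css";
  // Navigator identifies as "Mozilla/<major>" without "compatible"
  const nav = userAgent.match(/^Mozilla\/(\d+)/);
  if (nav && !userAgent.includes("compatible") && Number(nav[1]) >= 4) {
    return "css";
  }
  return "basic";
}
```

The payoff of doing this once, early in each ASP page, is that every downstream block simply asks "css or basic?" instead of re-testing browser quirks.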
MSI decided to make heavy use of CSS for the most current browsers because it is a standard that is supported by both Microsoft's and Netscape's browsers, but offers much more design flexibility than standard HTML.
The team also makes use of ActiveX components for the more interactive portions of their site, such as their Historical Chart and Portfolio Manager sections. However, because ActiveX is not directly supported by Netscape Navigator, they also wrote a custom plug-in that would expand the functionality of Navigator and enable it to use ActiveX. In the cases where a customer is using a browser that is not Internet Explorer or Navigator, the Web page will detect this, generate a GIF image of the requested chart on the server, and send the browser the image.
For MSI, browser compatibility was a major goal. Spending the extra development time needed to build a Web site that captures approximately 99% of the world's browser market was well worth it for them.
With so many technologies available today for Internet development, it is important to know your audience. You must choose the technology that best meets your consumer base. If, for instance, you are building an intranet site and you know the only browser used on it is Internet Explorer 4.0, you can take advantage of things like the Internet Explorer 4.0 object model without worrying about browser compatibility. On the other hand, if you are building an Internet site where your audience will be wide and varied, as is the case for the MSI site, you'll need to pick your technologies more carefully. Research technologies before designing your site, so you know which ones are supported by different browsers. For instance, DHTML is supported by both Internet Explorer and Navigator, but their implementation of DHTML is different. Don't make assumptions about a browser's support of a technology. Sometimes two different versions of the same browser vary widely in what technologies they support.
A key part of most large sites is a subscription or membership service. Because the two are so tightly integrated, we'll talk about both subscription and access. MSI uses a fairly straightforward subscription model. It's a forms-based model. Your information, once entered into the MSI form, is picked up by an ISAPI filter, which uses ADO to check a SQL Server database. What is it checking for? Some pretty simple stuff, really. Is your username a duplicate? If you are signing up as part of a Microsoft Money six-month trial, do you have a valid product ID that hasn't been used before?
I've included a small flowchart so you can quickly get an idea of what is happening.
Figure 2. Subscription model flowchart
Let's talk about what happens when you go to enter your information. You are sent to the following page (https://investor.msn.com/secure/signup/account8.asp) to enter your personal information. All information is encrypted, as you can tell by the https: qualifier. The credit card is actually used to check a SQL Server database, where everything is encrypted using the Win32® CryptoAPI, to verify whether you have used that card before to get a free trial. Because the credit card numbers are all encrypted, not even the Investor team has access. One thing many people appreciate is that their credit card is not even charged during the trial subscription. They actually ask you to sign up again when your 30 days are up, rather than automatically charging your card.
To simplify things (primarily for the customer, but it also simplifies coding), the MSI team chose an "all-you-can-eat" model. With this model, you are either a subscriber or you're not. If you try to access a subscription page, your cookie is checked to verify that you recently and successfully logged on. If you haven't logged on, or it's been too long since your last activity, MSI presents you with a logon page and lets you know this is a subscription feature. Two files are used to do this. One is an include file, which is included in every subscription page and determines the validity of the user's logon. The second is the login.asp file, which collects and validates your information.
Login.inc -- This file is included on every subscription page in MSI and bounces the user to a logon page if he or she does not have a valid user ID. An invalid user ID could mean the user's cookie timed out, the user hasn't logged on before, or the credit card was denied. MSI uses the ASP Request object's ServerVariables collection to track where the user wants to go, so that after a successful logon the user is automatically sent to the desired page.
Login.asp -- Unless the user has a valid ID, this logon page appears and asks for the user ID and password. It then validates the ID against the SQL Server database and, if successful, sends the user on his merry way, updating his cookie so he doesn't have to log on again.
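The Login.inc / Login.asp division of labor described above can be sketched like this. The cookie fields, the timeout value, and the login URL are assumptions for illustration; in the real ASP code the originally requested URL would come from the Request.ServerVariables collection.

```javascript
const SESSION_TIMEOUT_MS = 20 * 60 * 1000;  // assumption: 20-minute activity window

// The Login.inc side: serve the page only if the cookie shows a recent,
// successful logon; otherwise bounce to Login.asp, preserving the target
// page so the user lands there after logging on.
function checkAccess(cookie, requestedUrl, now) {
  if (cookie && cookie.userId &&
      now - cookie.lastActivity < SESSION_TIMEOUT_MS) {
    // Valid logon: refresh the activity timestamp and serve the page.
    return { action: "serve", cookie: { ...cookie, lastActivity: now } };
  }
  return {
    action: "redirect",
    url: "/secure/login.asp?return=" + encodeURIComponent(requestedUrl),
  };
}
```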
It is a fairly simple model. But simple is good. MSI could have chosen to do something much more complicated, something that kept settings per user, charged by the quote or service, or even used Windows NT security to validate every user's access permissions. But this would have made it difficult for the user and more difficult to implement. The approach used also scales well and will work in the future when MSI goes from 50,000 paying subscribers to 500,000.
Decision Point: How many users do you expect, what kind of data do you need to keep about them, and in your wildest dreams, what kind of growth do you expect?
A main goal of this article is to examine closely one function of a major Web site so that our readers can understand exactly how it works. We chose MSI's Portfolio Manager because it is a popular feature and is available to everyone who visits the site.
Portfolio Manager's purpose is, as its name implies, to track and manage an individual's personal investment portfolios. Some of its powerful features include the ability to track all your accounts in one location, the option of encrypting your data for confidentiality, and interactive views that let you decide how you want the data presented.
Portfolio Manager is available from the MSI site at http://investor.msn.com . Click on the Portfolio Manager link and follow along as we walk you through the user experience and its design.
For Internet Explorer and Netscape Navigator, Portfolio Manager relies solely on ActiveX technology. The first time you use Portfolio Manager, it downloads and extracts a .cab file to your system, leaving an .ocx and two .dlls that Portfolio Manager uses. The .cab download is what installs the client-side ActiveX component on your local machine; once the component is installed, Portfolio Manager is available. Because Navigator does not support ActiveX natively, the MSI team shared a plug-in that CarPoint wrote, so Netscape users can also benefit from the added functionality ActiveX provides.
The first thing you'll want to do in Portfolio Manager is add your account information and investment transactions. When you do this, all that actually happens is the data you enter gets written to a text file on your local system. This information then gets read out of the text file and displayed in Portfolio Manager. The benefits of keeping the data locally on the user's computer rather than on the Web server include privacy -- your financial data never leaves your machine unless you choose to export it -- and less storage and processing load on MSI's servers.
One disadvantage of keeping the data stored locally is that if you go to another computer and run Portfolio Manager, your portfolio doesn't come with you. The MSI team provides an export function that will copy the text file onto a floppy disk, which you can import to a different computer.
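The export/import round trip described above boils down to serializing the local transactions to a text file and parsing them back. The real file format is not documented; this tab-separated sketch is invented for illustration.

```javascript
// Write the local portfolio out as one tab-separated line per transaction.
function exportPortfolio(transactions) {
  return transactions
    .map(t => [t.symbol, t.shares, t.price].join("\t"))
    .join("\n");
}

// Read the text back into transaction records on the other machine.
function importPortfolio(text) {
  if (text === "") return [];
  return text.split("\n").map(line => {
    const [symbol, shares, price] = line.split("\t");
    return { symbol, shares: Number(shares), price: Number(price) };
  });
}
```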
A few seconds after adding a transaction, more information about your portfolio will appear, such as your stocks' current prices, daily change, and your investments' market value. This is accomplished by an ISAPI filter the MSI team wrote, which communicates with the ActiveX control on your computer. The quote data for your stocks is marshaled across the Internet and copied onto your computer, where the portfolio calculations take place.
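The client-side calculation step can be sketched as a simple join: quote data arriving from the server is matched against the locally stored holdings to produce market value and daily gain. Field names are invented for illustration; this is not MSI's actual code.

```javascript
// holdings: [{ symbol, shares }], quotes: { SYMBOL: { last, change } }
function applyQuotes(holdings, quotes) {
  return holdings.map(h => {
    const q = quotes[h.symbol] || { last: 0, change: 0 };  // no quote yet
    return {
      symbol: h.symbol,
      marketValue: h.shares * q.last,   // current price times position size
      dayGain: h.shares * q.change,     // today's move times position size
    };
  });
}
```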
Even though the data in Portfolio Manager is all stored locally, some users may still want the text file encrypted, and the MSI team provides a way to do this from within Portfolio Manager. By using the CryptoAPI and building their own encryption scheme, they have ensured that your data is safe from unauthorized viewers. You must provide a password to decrypt the data before you can view it.
To make sure no Web surfers feel left out, even users who are not running Internet Explorer or Netscape Navigator can use Portfolio Manager. When a browser requests the page, ASP is used to check what type of browser it is. If it is not one of the supported ActiveX browsers, the customer's information is stored in a SQL Server database maintained by Microsoft. The interface is not as dynamic as the ActiveX version, but it still offers much of the functionality.
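The server-side dispatch described above can be sketched as a User-Agent check that picks the ActiveX client (data stays on the user's machine) or the server-backed HTML fallback. The detection rule here is deliberately simplified and illustrative, not MSI's actual logic.

```javascript
function portfolioMode(userAgent) {
  // IE identifies itself with "MSIE"; Navigator 3/4 reports "Mozilla/3" or
  // "Mozilla/4" (and can run the ActiveX plug-in mentioned earlier).
  if (/MSIE/.test(userAgent) || /Mozilla\/[34]/.test(userAgent)) {
    return "activex";       // local text file, client-side calculations
  }
  return "server-html";     // downlevel browsers: data kept in SQL Server
}
```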
Walter Kennamer is the Software Design Lead for all of the ActiveX controls on the MSI site. I spent some time with Walter going over the Portfolio Manager technologies and major design decisions that were made.
Walter told me that the first big design decision was in which technology to implement Portfolio Manager. He said they considered three choices:
According to Walter, the biggest reason they decided against static HTML was its inherent lack of flexibility. For instance, you cannot let the user dynamically change the view of their data, so the application becomes less interactive and loses the functionality that makes Portfolio Manager so powerful. In addition, every request would have to go back to the server, adding latency to every interaction.
The second technology the MSI team considered was Java applets. The nice thing about using a Java applet is that most browsers today ship with a Java Virtual Machine (JVM), so applets are widely supported. However, there were some major obstacles to using Java. If you recall, earlier I discussed that the data used by Portfolio Manager is stored locally on the user's hard drive. The issue with Java applets is that the security sandbox is so rigid that an applet cannot perform file I/O on the user's computer. This makes it virtually impossible to read and write the portfolio data in a local text file without going through complicated authentication routines.
The final choice, and the one the team used, was ActiveX. ActiveX offered many advantages: the data could be accessed locally, the interface could be flexible and interactive, and the code could be written very efficiently. The one major drawback was that, unlike Java, ActiveX is natively supported only by Internet Explorer. Given all the other positives, the team decided to work around this by writing a custom plug-in for Netscape Navigator so those users could benefit as well.
Once the technology was chosen, the next major design decision was determining the application requirements. The MSI site had one requirement above and beyond everything else: speed. They wanted the application to be small so that it would download fast. Their other requirement was that it could run on all Windows® platforms with no external dependencies. With traditional applications, developers can rely heavily on the use of class libraries, such as Microsoft Foundation Classes (MFC), which helps them write code faster but requires many files to be installed on a user's computer. The MSI team did not want to make the ActiveX file depend on the libraries being on the user's machine and thus increase the amount of download time for the end user. Given this, their only option was to not rely on any external libraries, which increased development time but ultimately provided a better solution for the customers.
The result of all this planning, decision-making, and development time was a lean .cab file containing just one .ocx and two .dlls, at only 432K. That includes not just Portfolio Manager, but also charting, a stock and fund finder, and a Find Symbol control. Now that's design and development at its finest!
As you probably know, one of the most critical tasks in producing a site like this is testing. The site changes several times daily: links need to be tested, functionality needs to be verified, the system needs to be fail-safe and redundant, and the user experience needs to hold up under heavy load and traffic spikes. We had the opportunity to meet with Bob Archer, the testing manager, to talk about the MSI team's testing process.
There are seven people testing the site, or about 60% of the number of developers (and they are hiring more). Four-and-a-half people are focused on testing the client, one-and-a-half people are focused on the server side, and there is one lead. Because an MSI site goal is to have as broad a reach as possible, it makes sense to have that client/server balance. They need to test all clients and all operating systems, which is a big job. An interesting fact that came up when we discussed this balance with Bob was that the quote servers, running Windows NT, have never gone down.
One of the key things they look at is performance: what is the raw number of bytes transferred, and what is the time to process? To measure this, they use Performance Monitor (perfmon) extensively, creating their own custom perfmon counters. They have also created several custom components that generate debug and performance information. They use a single daily drop machine to propagate code for the next version, subscribing to the tenet, "Get to a known state and stay there."
To test links, they have a link checker written in Visual Basic. You can imagine it gets run a lot on a site that is updated three or more times a day. Even so, basic link checking and the like is still about 50 percent automated and 50 percent manual effort.
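At its core, a link checker like the one described above has to pull every href out of a page so each target can be requested and its status verified. A sketch of that extraction step (the crawling and HTTP parts are omitted, and this is not the MSI team's Visual Basic code):

```javascript
// Collect every double-quoted href value from a chunk of HTML.
// A regex is good enough for well-formed pages like this sketch assumes;
// a production checker would use a real HTML parser.
function extractLinks(html) {
  const links = [];
  const re = /href\s*=\s*"([^"]+)"/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    links.push(m[1]);
  }
  return links;
}
```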
Webmon is another tool they use. Written in ASP, it presents a screen divided into multiple sections showing the status of all the servers and major processes. The team can tell at a glance what the status of each server is, and the application itself will send out alerts that buzz your pager if bad things start to happen. Webmon has hooks into several of the ISAPI filters, SSOs, and so forth: custom components talk to those parts of the production architecture and make that data available to ASP. Early on, these hooks were added after the fact -- when a component had a problem, the team added monitoring to figure out what had happened. In the last couple of site versions, however, they have designed the debugging and information hooks into each new component from the start.
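Webmon's decision logic, as described above, reduces to aggregating per-server status and deciding when to page someone. A sketch under invented field names and thresholds (any real monitor would also track partial degradation, latency, and so on):

```javascript
// statuses: [{ name, state }] where state is "up" or "down".
function summarize(statuses) {
  const down = statuses.filter(s => s.state === "down").map(s => s.name);
  return {
    ok: down.length === 0,
    page: down.length > 0,   // buzz the pager on any failure
    message: down.length > 0
      ? "DOWN: " + down.join(", ")
      : "all servers healthy",
  };
}
```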
Decision Point: What would something going wrong look like? Can you anticipate that and make sure that your component can tell you when there is a problem, rather than waiting until a user calls up when the problem has gotten out of hand?
There are some things the team wants to do differently in the next site redesign. One is to do a better job of testing under actual load conditions. It is difficult to come up with scenarios that really mean anything, and real-world server testing is tough. While you can use tools like WCAT or InetMonitor (available through the BackOffice® Resource Kit) to load up the servers, you need to make sure you test all the way to failure. You also need to ensure you are mimicking typical user behavior; if you don't, certain code sections may behave differently. For example, if the code that performs quote lookups is more server-intensive than expected, the site could behave in strange ways on a particularly heavy trading day and the user experience could suffer drastically -- even if the servers had been tested all the way to failure.
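The caution above -- load tests must mimic real user behavior -- usually means replaying requests in realistic proportions rather than uniformly. A sketch of expanding a weighted scenario into a request mix; the page names and weights are invented for illustration:

```javascript
// scenario: [{ url, weight }]. Produces `total` request URLs in proportion
// to the weights, e.g. heavy on quote lookups on a simulated trading day.
function buildRequestMix(scenario, total) {
  const weightSum = scenario.reduce((sum, p) => sum + p.weight, 0);
  const mix = [];
  for (const page of scenario) {
    const count = Math.round(total * page.weight / weightSum);
    for (let i = 0; i < count; i++) mix.push(page.url);
  }
  return mix;
}
```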
Downlevel testing also came up repeatedly as a difficult issue to master. There are many browsers out there, and a lot of operating systems, and they needed to test each of them. To do this, they have to be familiar with a lot of different behaviors. Which versions support tables, frames, scripting, and so forth?
Like most groups doing something like this, the MSI team didn't start out with people who had backgrounds in testing large-scale sites. Most of the people have applications backgrounds, with some systems and coding. This means that there is much less automation than one might expect, and that they adopt a milestone approach, much like producing any other product. It also means that when they have to get into the Windows NT debugger to find a potential problem, they can.
The publishing process is actually a little different than we expected. Because timing on a site like this is so critical, the publishing team went with a very straightforward model geared for speed, without a lot of bells and whistles. There are three separate content areas -- Insight, Market Summaries, and the Strategy Lab.
Insight. The content is written up in Microsoft Word, undergoes a quick content editing pass, and is sent to copyediting, where one of three producers manually copies it into a directory to be picked up for publishing. To deploy the content, the Microsoft Content Replication System is used. This handles deploying content simultaneously to all servers with no need for any down time. The MSI team uses a two-stage model of a staging server plus a deployment server, which basically means they always have a backup.
Market Summaries. Definitely geared for speed -- nobody wants to wait on these. These are rewritten and enhanced immediately from information wires, sent directly to copyediting, and posted on the site.
Strategy Lab. Basically follows the same process as Insight, but there is no content editing -- this content is completely in the voice of the strategist.
An interesting thing the MSI team has done to streamline things is to use a combination of Microsoft Word, Visual Basic for Applications (VBA), and Microsoft Access to massage much of the content into ASP pages. Either a member of the editorial staff or a freelance writer writes each article in Word. When the article is complete, a set of macros is run that basically dumps out ASP code. Microsoft Access is used primarily to hold a small repository of company information, such as the stock symbol and Internet link, so that all a writer has to do is refer to the company, and all of the links and current information are already taken care of. The macros are pretty fancy in that they take care of all of the downlevel compatibility issues as well. If there is a heading, for example, it will recognize that and produce something like the following code:
<%=Heading1%>This is a title<%=endHeading1%>
The heading variables are defined in an included file that sets them to be CSS tags or standard HTML tags, based on the browser type.
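The include file described above can be sketched as a lookup that expands the heading variables to CSS-styled tags for capable browsers and plain HTML tags downlevel. The class name and the detection flag are illustrative; the real include keys off the browser type detected by ASP.

```javascript
function headingTags(supportsCSS) {
  return supportsCSS
    ? { Heading1: '<h1 class="articleTitle">', endHeading1: "</h1>" }  // CSS path
    : { Heading1: "<h1>", endHeading1: "</h1>" };                      // downlevel path
}

// What the Word macros' <%=Heading1%>...<%=endHeading1%> output renders to.
function renderHeading(title, supportsCSS) {
  const t = headingTags(supportsCSS);
  return t.Heading1 + title + t.endHeading1;
}
```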
As you can see, even with several custom tools, there is still more to do manually than one might expect. But that is how it is with all large-scale Web sites I've seen.
After reading this article, you probably have more questions than answers. If so, this article did exactly what we intended. We wanted to address the decisions you will need to make when designing a large-scale Web application, while at the same time giving you an inside look at the MSI site for reference.
Building a fast, secure, and scalable Web site takes planning, time, and work. However, if you spend the extra time up front to address the critical questions we have outlined throughout this article, your site will run markedly faster and more smoothly than most.
Who knows -- maybe your site will be the next one that Barron's magazine calls "the best all-around site . . . on the Web today."
Server Performance Optimization on Microsoft's Web Site
Tuning Internet Information Server Performance