Web Workshop  |  Server Technologies

Getting Content from the Web to the Client


Nancy Winnick Cluts
Developer Technology Engineer
Microsoft Corporation

April 30, 1998

Contents
Introduction
Content Acquisition
Summary

Editor's Note: This article is one of three on technologies used by the Internet Start site. You'll also want to read The Internet Start Site: Now It's Personal and Got Any Cache?.

Introduction

Providing specialized content to the user is key to personalizing your Web site. The revamped Internet Start site (http://home.microsoft.com) offers personalization and the retrieval of content from various Web sites based on user preferences. The site enables you to view news headlines from sites such as MSNBC, the New York Times, Fox News, Time, and Slate. You can also specify content that includes the default search engine, video clips, stock quotes, local information, sports information, local weather, and family resources. Once you have specified the information you would like to see, the site developers need to get that content to you the next time you visit the Web site. This article describes how the Internet Start site implemented a method to retrieve content from the Web to provide personalization on your Start page (the Internet Start site's home page). Before reading this article, I recommend that you first read up on the mechanics behind the personalization of the Internet Start site (The Internet Start Site: Now It's Personal); otherwise, you may well end up scratching your head when I talk about the Dictionary objects.

Content Acquisition

To gather content from their content partners, the Internet Start team uses a Visual Basic® program they created called Newsfeed. Two copies of this program are launched: one instance handles news and weather content, and a second handles event and stock content, as described in the sections below.

The Newsfeed program uses a Microsoft Access database to store information about each content provider. A timer initiates content retrieval based on values stored in the database, so content on the site is refreshed at regular intervals (every 15 or 60 minutes, depending on the content type). In general, the program performs the following tasks each cycle:

  1. Channel Definition Format (CDF) files are retrieved from the content partners' Web sites.
  2. Using information supplied by the CDF file, the actual content is accessed and copied to an internal server.
  3. The content is assigned an identifier and packaged into a single, comma-delimited text file.
  4. The text file is copied to a staging server, from which it is propagated to the live Microsoft Web servers.
  5. An Active Server Pages (ASP) file is then hit on the live servers, causing the new content to be loaded into global Dictionary objects, which are retained in server memory.
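To make step 5 concrete, here is a minimal sketch of what such a refresh ASP page could look like. It is illustrative only, not the Start team's code: the file path, the comma-delimited field layout (id, headline, link), and the Application("NewsData") key are all assumptions.

<%
' refresh.asp -- illustrative sketch only.  The path, the field layout
' (id, headline, link), and the "NewsData" key are assumptions.
Dim fso, stream, textLine, fields, dict
Set fso = Server.CreateObject("Scripting.FileSystemObject")
Set stream = fso.OpenTextFile("D:\feeds\news.txt", 1)      ' 1 = ForReading

Set dict = Server.CreateObject("Scripting.Dictionary")
Do While Not stream.AtEndOfStream
    textLine = stream.ReadLine
    fields = Split(textLine, ",")              ' id, headline, link
    dict.Add fields(0), Array(fields(1), fields(2))
Loop
stream.Close

' Swap the new Dictionary in, discarding the previous cycle's content.
' (Scripting.Dictionary is apartment-threaded; a production server would
' prefer a free-threaded store, but the idea is the same.)
Application.Lock
Set Application("NewsData") = dict
Application.Unlock
%>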

News Content

To gather news content for the Internet Start site, a custom server component accesses each provider's CDF file across the Web via HTTP. Depending on the CDF file, content is extracted directly from the CDF file itself or from the location it points to. This information is then written to a text file and placed on a network share. Once all of the content has been gathered, the Newsfeed program packages it into a single text file, which is then copied to a propagation server.
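The custom component itself isn't shown in this article. As a rough sketch of the idea, the following VBScript-style fragment uses the MSXML parser to walk a hypothetical partner CDF file and write one comma-delimited record per item; the URL, the share path, and the record layout are made up for the example.

' Illustrative sketch only -- not the Start team's component.  The partner
' URL, the output path, and the record layout are assumptions.
Dim xml, items, entry, title, href, fso, outFile, i
Set xml = CreateObject("Microsoft.XMLDOM")
xml.async = False
xml.Load "http://partner.example.com/news.cdf"

Set fso = CreateObject("Scripting.FileSystemObject")
Set outFile = fso.CreateTextFile("\\feeds\share\partner_news.txt", True)

Set items = xml.documentElement.getElementsByTagName("ITEM")
For i = 0 To items.length - 1
    Set entry = items.item(i)
    href = entry.getAttribute("HREF")
    title = entry.selectSingleNode("TITLE").text
    ' One record per headline: id, headline, link.  Real code would also
    ' escape any commas embedded in the headline text.
    outFile.WriteLine i & "," & title & "," & href
Next
outFile.Close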

The Newsfeed program then waits for the file to be copied to all of the live Web servers. Once that has happened, the Newsfeed program accesses an ASP file on each server, causing the text file to be read into server memory, where the information is stored in Dictionary objects. During this process, the news from the previous hour is cleared from the Dictionary object. The news content is now available globally on the site to any ASP page.
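Because the Dictionary lives in server memory, any page can render headlines without touching the file system or the network. A minimal, hypothetical fragment, assuming the Application("NewsData") layout from the earlier sketch, might look like this:

<%
' Hypothetical page fragment; the key and field layout follow the earlier sketch.
Dim news, key, entry
If IsObject(Application("NewsData")) Then
    Set news = Application("NewsData")
    For Each key In news.Keys
        entry = news(key)                      ' Array(headline, link)
        Response.Write "<A HREF=""" & entry(1) & """>" & entry(0) & "</A><BR>"
    Next
End If
%>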

Weather Content

Each hour, the Weather Channel places four weather data files on its server. The Newsfeed program retrieves these files and propagates them to the live Web servers. Cities in the United States receive a five-day forecast, while international cities receive a three-day forecast. Once the files are propagated, the Newsfeed program refreshes the server memory by hitting another ASP file, which causes the new weather data to be read into a Dictionary object, replacing the data from the previous hour.
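As a hypothetical illustration of the lookup side, suppose the weather Dictionary is keyed by city code and each value is an array of forecast-day strings (five for U.S. cities, three for international ones). The "WeatherData" key, the city code, and the layout below are assumptions, not the site's actual schema.

<%
' Hypothetical lookup; "WeatherData", the city code, and the array layout
' are assumptions for the example.
Dim weather, cityCode, forecast, dayText
cityCode = "USWA0395"                          ' would come from the visitor's stored preferences
If IsObject(Application("WeatherData")) Then
    Set weather = Application("WeatherData")
    If weather.Exists(cityCode) Then
        forecast = weather(cityCode)           ' five entries for U.S. cities, three elsewhere
        For Each dayText In forecast
            Response.Write Server.HTMLEncode(dayText) & "<BR>"
        Next
    End If
End If
%>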

Event Content

Event content is gathered in exactly the same way as the weather content, via another instance of the Newsfeed program. The only differences are the refresh interval (every 15 minutes for event content) and the file in which the information is stored.

Stock Refresh

Stock prices are updated by means of a server-side object (the Microsoft Investor object), so there is no need to continually propagate content to the live servers. To take some of the load off the Microsoft Investor quote servers, quotes for the top 1,000 stocks are preloaded into a Dictionary object: every 15 minutes, the events Newsfeed program hits an ASP file that loads the current quote information into the stock Dictionary object.
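One way to picture how the preloaded quotes take pressure off the quote servers is a cache-then-fall-through lookup. The sketch below is hypothetical: the "StockData" key, the "Investor.Quotes" ProgID, and the GetQuote method are placeholders, not the real component's interface.

<%
' Hypothetical quote lookup.  "StockData", "Investor.Quotes", and GetQuote
' are placeholders -- the real Investor component's interface is not shown
' in this article.
Function QuoteFor(symbol)
    Dim quotes, investor
    If IsObject(Application("StockData")) Then
        Set quotes = Application("StockData")
        If quotes.Exists(symbol) Then
            QuoteFor = quotes(symbol)          ' top 1,000 symbols, refreshed every 15 minutes
            Exit Function
        End If
    End If
    Set investor = Server.CreateObject("Investor.Quotes")   ' placeholder ProgID
    QuoteFor = investor.GetQuote(symbol)                     ' everything else goes to the quote servers
End Function

Response.Write QuoteFor("MSFT")
%>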


Summary

Now you know how the Start team gets all of that content for each visitor to the site. This is not the only way to accomplish this work, just one way that worked well for the Start team in their personalization efforts. The next article in this series will concentrate on caching data on the server to maximize the performance of your Web site.




