Microsoft Corporation
June 1997
This document serves as a guide for evaluating Webcasting, which will be introduced to users, Web authors, and corporations with the release of Microsoft® Internet Explorer 4.0.
Webcasting is the automated delivery of personalized and up-to-date information.
Why introduce a new model of content delivery on the Internet? Because traditional Web-browsing poses challenging problems for users:
To address these problems, Webcasting provides to each user automatic delivery and offline access to the information and Web sites that he or she uses most often. This new "push" model of content delivery has begun a wave of new technology that will revolutionize how users receive information on the Internet. Leading this wave, Microsoft Internet Explorer 4.0 provides superior technology that makes it easy to author "push" content, delivers solutions for home and corporate users, and introduces an open standard that is already bringing order and interoperability to a currently fragmented market. This white paper explains the benefits of Internet Explorer 4.0 Webcasting for users, authors, and third-party vendors.
Webcasting is designed to meet the needs of two different types of customers: dial-up/laptop users who are often offline (for example, home and mobile users), and LAN-based corporate users who are usually online. Webcasting improves the dial-up experience through hands-free delivery of content for faster offline use, so information can be viewed without requiring that the computer be connected to the Internet and without the bandwidth limits of a modem. For LAN-based corporate users, Webcasting provides personalized updates to requested information by notifying each user when new content is available, without actually downloading the content.
Microsoft Internet Explorer 4.0 Webcasting is divided into three scalable tiers:
Microsoft Internet Explorer 4.0 Webcasting is a completely open, scalable solution that works with any HTTP server, any HTTP proxy, any HTML Web site, and any Web authoring tool, and scales up to multicast push solutions. Webcasting enables any existing Web site to be "pushed" without requiring any re-authoring or modifications to the site. This "push," or rather "smart pull," is accomplished by crawling the site on a scheduled basis. Content authors can then optimize and personalize this Webcasting experience by authoring a "channel," that is, by creating a single file that indexes the content on an existing site. This file uses an open standard file format, the Channel Definition Format (CDF), for indexing and "pushing" structured content on the Internet. Corporate customers who want higher bandwidth efficiency in a LAN environment can provide true push delivery via bandwidth-efficient multicast protocols. This scalable approach enables corporate customers to choose the right level of investment for their needs. Corporations can save money because Microsoft's Webcasting solution doesn't require any new server investments.
Webcasting is designed around open standards so that sites and corporations can author content once for a variety of clients. The Channel Definition Format promises to bring order to a fragmented market of third-party "push" products that do not interoperate. Most "push" products today depend on proprietary protocols and expensive server or proxy software. In this market, each client product is tied to a particular server product, and content authoring is tied to the "push" protocol. This fragmentation confuses users and frustrates content authors. Based on the open CDF file format, Microsoft's extensible Webcasting architecture provides a means to unify the various "push" services and promises interoperability of clients, servers, tools, and network protocols from different third-party vendors.
On the whole, Internet Explorer 4.0 provides a Webcasting solution that is easier for authors, faster for end-users, cheaper for corporations, and open for third-party products and services.
This article first explains Webcasting of existing Web sites and how users can have any conventional Web site "pushed" to them without any work by the site author. The article then explains how a Web site author can extend Webcasting via Channels and CDF. To address scalability within a large corporation with bandwidth limitations, the article then discusses Webcasting using Multicast or "true-push" delivery. Finally, the article details the broad third-party momentum behind CDF, explains the specific benefits of Webcasting, and provides a competitive analysis.
Microsoft Internet Explorer 4.0 automatically enables Webcasting of any existing Web site without requiring modifications to the site. In order to Webcast content from a conventional Web site, Internet Explorer 4.0 performs a scheduled "site crawl" of the site content, checking for updated content and optionally downloading information for offline use. A user can initiate this process by "subscribing" to a Web site using the Favorites menu in Internet Explorer. (Note: When a user "subscribes" to a site, there is no payment involved. The word "subscription" only denotes scheduled delivery of content.) Once the user subscribes to a site, Internet Explorer 4.0 periodically checks the site for updated content and passively notifies the user of changes.
The key features of Web-site subscriptions are:
The benefits of Web-site subscriptions are:
To subscribe to an ordinary Web site, a user can access the Internet Explorer Favorites menu and choose the Subscribe action, which presents the subscription information (see Figure 1) and allows the user to confirm or customize the default subscription. If the user chooses to customize the subscription, a wizard (illustrated in Figure 2) walks through the process of deciding various options. Most important of these is the choice between simply monitoring for content changes and downloading updated content for offline use. In addition, the subscription wizard allows the user to customize the site crawl, choose the schedule for visiting the site, and pick the method of notification about site changes. If the user wishes, Internet Explorer can send the user e-mail containing the HTML contents of the selected URL each time the site changes. (E-mail notifications would use MHTML to send HTML-in-e-mail with any standard HTML-enabled POP3 or SMTP e-mail application. In the absence of HTML e-mail, this MHTML will display as a text message containing an HTML attachment.)
Figure 1. Subscription summary information
Figure 2. Subscription Wizard
After the user subscribes to a Web site, Internet Explorer 4.0 will automatically visit the site according to the schedule the user selected. Internet Explorer will notify the user of content changes either by providing a red "gleam" on the corresponding item in the Favorites menu or by sending e-mail. If the user has requested the ability to work with the information offline, Internet Explorer 4.0 will download updated content for offline use. The user can then browse the site offline, as all of the site-crawled pages will be cached on the hard drive, available without the need for an active Internet connection. If a user who is offline clicks on a link to a page that is not cached, a dialog box will prompt the user offering the option of connecting to the Internet to view the unavailable page.
There are two flavors of Web-site subscription, depending on whether the user has chosen to download content for offline use or to monitor only for content changes.
If a user customizes a subscription to check only for updated content, Microsoft Internet Explorer 4.0 periodically visits the site, checks the subscribed page to see if content has changed, and notifies the user if new content is available. Note that in this case no content is downloaded. This process is illustrated in Figure 3.
Figure 3. Subscribing to a Web site—monitoring changes only
If a user wants to view subscribed content offline, then Internet Explorer 4.0 follows the process illustrated in Figure 4. Internet Explorer visits the site on a scheduled basis, crawls the pages (according to the user's preferences), downloads only the modified pages, and notifies the user that new content is available for offline browsing.
Figure 4. Subscribing to a Web site for offline use. Note only new or updated content is downloaded.
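The scheduled crawl described above can be sketched roughly as follows. This is a minimal illustration, not Internet Explorer's actual crawler: the `Page` type, the in-memory `site` dictionary standing in for a Web server, the `last_modified` stamp used to detect changes, and the `max_depth` limit are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a page served by a Web site."""
    last_modified: str                 # change stamp, e.g. a Last-Modified value
    body: str
    links: list = field(default_factory=list)


def crawl(site, start, max_depth, cache):
    """Breadth-first crawl from `start`, following links up to `max_depth` levels.

    `cache` maps URL -> (last_modified, body). A page body is stored again
    only when its stamp differs from the cached one. Returns the list of
    URLs that were downloaded (new or modified) during this crawl.
    """
    downloaded = []
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        url, depth = queue.popleft()
        page = site.get(url)
        if page is None:               # broken link: nothing to fetch
            continue
        cached = cache.get(url)
        if cached is None or cached[0] != page.last_modified:
            cache[url] = (page.last_modified, page.body)
            downloaded.append(url)
        if depth < max_depth:
            for link in page.links:
                if link not in visited:
                    visited.add(link)
                    queue.append((link, depth + 1))
    return downloaded
```

Running `crawl` twice against an unchanged site returns an empty list the second time, which corresponds to the note in Figure 4 that only new or updated content is downloaded.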
It is interesting to note that on its own, the simple periodic site crawl described above is completely competitive with many other "push" solutions, including the Netscape Netcaster product offering. In fact, the Netcaster concept of "channels" is nothing more than "sites that are site-crawled on a scheduled basis." Even without considering the significant advantages of managed push (described in the next section), a simple comparison of the entry-level site crawl features of both products makes it clear that Internet Explorer 4.0 provides superior functionality versus the limited crawler in Netcaster. For example, Internet Explorer will automatically authenticate the user when crawling sites that require a username and password. In addition, Internet Explorer uses a single cache that is shared between the Web browser and the site crawler, thus avoiding dual copies of the same HTML file.
It should also be clear that the site crawl mechanism of Webcasting is not true "push," but is rather "smart scheduled pull." Like almost all other "push" products on the market, Internet Explorer 4.0's ability to notify users of new content makes the "smart pull" feel like "push" content. Internet Explorer 4.0 depends on such "smart pull" for Webcasting existing sites on HTTP servers, but the Webcasting architecture can also scale up to provide for rich, multicast "true push."
Absolutely no re-authoring work is necessary to enable users to subscribe to an existing Web site. However, an author who expects his/her site to be subscribed to can do two things to improve the end-user Webcasting experience:
While Basic Webcasting has some key advantages, there are some fundamental limitations that make it necessary to go beyond the technology that it employs, namely:
To address these key limitations of site crawling, Microsoft has introduced the next level of Webcasting, "Managed" Webcasting.
The Channel Definition Format allows an author to optimize, personalize, and fully control how a site is Webcast. Authoring a CDF file is the only step required to convert any existing Web site into a "Channel."
While delivering a basic solution for Webcasting, a simple site crawl does not provide adequate functionality to create a useful "push" experience for many of today's Web sites. Some common customer and site concerns include:
Site structure is unknown: A site crawl uses the page-link tree structure of a Web site to determine what content to "push" (smart pull) because sites provide no additional information about the structured organization of content.
Hit-or-miss content usefulness: The HTML in today's Web sites provides no cues to help a site crawl decide which links point to useful content and which point to useless content. Because no such cues exist, most crawlers set a maximum number of levels and a disk-space limit and hope that the content crawled proves useful.
Site crawl schedules don't match content-update schedules: Existing Web sites do not advertise content-update schedules, and therefore a scheduled site crawl may check for updates too often or not often enough.
To address these issues, Microsoft worked with various industry leaders in creating and proposing the Channel Definition Format to the World Wide Web Consortium (W3C). This file format is based on the broadly supported Extensible Markup Language (XML) standard. The CDF specification provided by Microsoft is also available. CDF is an open and easily authored format for publishing a Channel, allowing Web publishers to personalize and streamline the delivery of information to their customers.
The immediate benefits that CDF offers include the following:
To understand the benefits that CDF offers, consider the following analogies. First, CDF offers users the ability to select the content they wish downloaded, rather than simply pulling large amounts of data from a Web site in the hopes that some subset of that information is what they need. In this case, CDF is like a restaurant menu: Users select the food (information) they want, and only the food they ordered is delivered to them. Browsing without CDF is like ordering everything on the menu and having to search through the many items on the table for the one dish you wished to eat.
CDF also offers the ability to manage the amount of information delivered to users. In this case, CDF could be compared to an automatic sprinkler system that manages the scheduled flow of water to different parts of a lawn. Just as you wouldn't flood your yard to water a small area, users and corporate IS managers want to control the "flood" of information to users. In this example, CDF is the brains of the sprinkler system (the Web site) and allows users and administrators to control the type and amount of water (information) delivered to the desktop on a scheduled basis.
So what exactly is a CDF file? A bare-bones CDF file contains nothing but a list of URLs pointing to content. This file is easy to create, and requires no changes to existing HTML pages. A more advanced CDF file, such as the one shown below in Figure 5, includes URLs pointing to content, but can also include a schedule for content updates, a hierarchical organization of the URLs describing the Web site structure, and Title/Abstract information describing individual content items.
This file format allows a Web publisher to offer automatic Webcasting of Channels from any Web server to any CDF-enabled client machine on the Internet.
Figure 5. A sample CDF file
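As a rough illustration of the kind of file described above, the following Python sketch assembles a small CDF-style document containing a schedule, titles, abstracts, and a list of content URLs. The element and attribute names (`CHANNEL`, `ITEM`, `TITLE`, `ABSTRACT`, `SCHEDULE`, `INTERVALTIME`, `HREF`, `DAY`) follow the CDF proposal as commonly documented, but they should be checked against the specification; the `build_cdf` helper and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cdf(home_url, title, items, interval_days=1):
    """Return CDF-style XML text for a channel.

    `items` is a list of (href, title, abstract) tuples pointing at
    existing pages; no change to those pages is needed.
    """
    channel = ET.Element("CHANNEL", HREF=home_url)
    ET.SubElement(channel, "TITLE").text = title

    # Update schedule: ask clients to refresh every `interval_days` days.
    schedule = ET.SubElement(channel, "SCHEDULE")
    ET.SubElement(schedule, "INTERVALTIME", DAY=str(interval_days))

    for href, item_title, abstract in items:
        item = ET.SubElement(channel, "ITEM", HREF=href)
        ET.SubElement(item, "TITLE").text = item_title
        ET.SubElement(item, "ABSTRACT").text = abstract

    return ET.tostring(channel, encoding="unicode")


if __name__ == "__main__":
    print(build_cdf(
        "http://example.com/",
        "Example News",
        [("http://example.com/news.html", "News", "Daily headlines")],
    ))
```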
The CDF mechanism makes Channel creation a simple two-step process that does not require authoring of new content, does not require re-authoring of existing content, and does not require any programming—either on the client or on the server. Creating a Channel with CDF is as simple as 1) writing a CDF file with a list of URLs to existing content, and 2) linking to this CDF file to make it discoverable.
The CDF file allows a site author to specify which content is automatically Webcast, solving the hit-or-miss problem with crawling. CDF also allows a Web site to provide a content update schedule, thus enabling server-side load balancing and ensuring that no bandwidth is wasted on polling stale information. CDF gives content authors the power to decide how a Web site should be Webcast—that is, what content should be Webcast and how often it should be updated.
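One way a published schedule can spread load on the server is for each client to pick a random moment inside an update window, so that subscribers do not all arrive at once. The sketch below assumes such a window is given as an earliest and a latest offset from the start of the update period; the `next_update` function and its parameters are illustrative assumptions, not a description of how Internet Explorer 4.0 is implemented.

```python
import random
from datetime import datetime, timedelta


def next_update(period_start, earliest, latest, rng):
    """Pick a random update time within the window
    [period_start + earliest, period_start + latest]."""
    window_seconds = (latest - earliest).total_seconds()
    offset = rng.uniform(0, window_seconds)
    return period_start + earliest + timedelta(seconds=offset)


if __name__ == "__main__":
    rng = random.Random()
    start = datetime(1997, 6, 1)
    # Each simulated client lands somewhere between 01:00 and 05:00.
    for _ in range(3):
        print(next_update(start, timedelta(hours=1), timedelta(hours=5), rng))
```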
CDF provides an index or map of a Web site that describes the type of information contained on the site. Specifically, CDF describes logical groupings of information (for example, sports news or financial information), providing hierarchical structure and category information about a site. Because this information is completely independent of content format, a CDF-based Channel can include any kind of Web content or applications built on HTML, JavaScript, Java™, and ActiveX™ technology.
Figure 6. CDF can separate content from structure.
The HTTP cookie standard provides a powerful mechanism for personalizing Web content. Sites that employ CDF with Internet Explorer 4.0's Webcasting gain valuable functionality from this standard. A Web site can use standard HTTP cookies to deliver personalized information to users by dynamically generating a custom CDF based on user preferences. CDF thus utilizes the existing cookie standard for HTML personalization on the Web and takes it a step further with personalized Channels.
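The cookie-driven personalization described above might look roughly like this on the server side. It is a sketch under stated assumptions: the `topics` cookie name, its dash-separated value, the `CATALOG` table, and the example.com URLs are all invented for illustration, and the CDF element names should be verified against the specification.

```python
from http.cookies import SimpleCookie
import xml.etree.ElementTree as ET

# Hypothetical catalog of channel sections a site could offer.
CATALOG = {
    "sports": ("http://example.com/sports.html", "Sports"),
    "finance": ("http://example.com/finance.html", "Finance"),
    "weather": ("http://example.com/weather.html", "Weather"),
}


def personalized_cdf(cookie_header):
    """Build a CDF-style document listing only the topics named in the
    user's `topics` cookie; fall back to every topic if the cookie is absent."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("topics")
    wanted = morsel.value.split("-") if morsel is not None else list(CATALOG)

    channel = ET.Element("CHANNEL", HREF="http://example.com/")
    for topic in wanted:
        if topic in CATALOG:           # ignore unknown topic names
            href, title = CATALOG[topic]
            item = ET.SubElement(channel, "ITEM", HREF=href)
            ET.SubElement(item, "TITLE").text = title
    return ET.tostring(channel, encoding="unicode")
```

A request carrying `Cookie: topics=sports-weather` would receive a Channel with only those two sections, while a request without the cookie would receive the full list.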
Microsoft worked with various industry leaders to create the CDF specification proposed to the World Wide Web Consortium. The CDF standard submission is an application of Extensible Markup Language (XML) work that the W3C now has in progress. XML is already widely supported in the industry and has received the voiced approval of many companies, including Microsoft, Netscape, Sun, and SoftQuad. Industry experts consider XML to be the next great Web revolution. This aspect makes the CDF approach to "push" even more appropriate, as it lays the groundwork for a whole new wave of XML innovation on the Internet. CDF is one of many applications of XML, which provides rich structured information on the Web without dependence on HTML layout.
Because it is based on XML, the CDF file format can be understood and put to limited use by various existing HTML parsers. Specifically, the CDF syntax can be understood even within the primitive site crawler in Netscape Netcaster, and therefore CDF files can be used within the limitations of Netcaster in order to specify resources that need to be site crawled.
Upon visiting a CDF-enabled Web site, a user can subscribe to the site's Channel in one of two ways: either by choosing the Subscribe action from the Favorites menu, or by clicking on a hyperlink to a .CDF file or a button advertised on the Web site. Both actions walk the user through the process for subscribing to a Channel, presenting the subscription summary (see Figure 7) and allowing custom options to be changed by walking through a wizard (illustrated in Figure 8). For the average user, subscribing to a Channel is just like subscribing to any Web site; behind the scenes, the difference is that the Channel is a Web site that includes a CDF file. Again, the most important choice in the Channel Subscription Wizard is the choice between only monitoring for Channel changes and downloading updated Channel content for offline use. In addition to this choice, the Channel Subscription Wizard allows the user to customize delivery of the Channel by choosing the update schedule and other preferences. If the user wishes, Internet Explorer 4.0 can send the user e-mail containing the top-level HTML page each time the Channel content changes. (E-mail notifications would use MHTML to send HTML-in-e-mail with any standard HTML-enabled POP3 or SMTP e-mail application. In the absence of HTML e-mail, this MHTML will display as a text message containing an HTML attachment.)
Figure 7. Channel subscription dialog
Figure 8. Channel Subscription Wizard
After the user subscribes to the Channel, Microsoft Internet Explorer 4.0 will automatically add the subscribed Channel logo to the Channel Pane in the browser and to the Channel Bar on the Active Desktop (see Figure 9).
Figure 9. The Channel Bar on the Active Desktop
Figure 10. The Channel Pane in the browser (shown here full screen)
The Channel Bar provides easy access to all subscribed Channels. The Channel Pane (Figure 10) notifies the user when content has been updated and allows the user to browse the hierarchical structure of a Channel (information contained in the CDF file). If the user chooses to enable offline browsing of this Channel, Microsoft Internet Explorer 4.0 will download new Channel content through periodic scheduled updates to ensure that all the latest content pointed to by the CDF file is available for offline use.
Behind the scenes, any Web site that provides a CDF file can be subscribed to as a Channel. (Any HTML page on the site can point to this CDF file, using either the <A> (anchor) tag or the <LINK> tag in HTML.) There are two flavors of Channel subscription, depending on whether the user chooses to view content offline or to only monitor for content changes.
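To illustrate how a client or tool might discover a Channel from an ordinary page, the sketch below scans HTML for `<A>` and `<LINK>` tags whose `HREF` ends in `.cdf`. Matching on the file extension is an assumption made for the example; the source does not say which attributes Internet Explorer 4.0 actually inspects, and the `CdfLinkFinder` class is hypothetical.

```python
from html.parser import HTMLParser


class CdfLinkFinder(HTMLParser):
    """Collect HREF values of <A> and <LINK> tags that point at .cdf files."""

    def __init__(self):
        super().__init__()
        self.cdf_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag in ("a", "link"):
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(".cdf"):
                self.cdf_urls.append(href)


def find_cdf_links(html):
    """Return the CDF URLs referenced by an HTML page, in document order."""
    finder = CdfLinkFinder()
    finder.feed(html)
    return finder.cdf_urls
```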
If a user customizes a subscription only to check for updated content, Internet Explorer 4.0 periodically visits the site, downloads only the CDF file, and updates the Channel hierarchy displayed inside the Channel Pane (illustrated in Figures 10 and 11). Note that the CDF provides rich information about new content, including headlines displayed in the Channel Pane. The CDF also provides categorized links to Channel topics; however, clicking on any of these topics will require additional HTML pages to be downloaded, because they have not yet been delivered for offline use.
Figure 11. Client downloads CDF on a scheduled basis to show new Channel headlines and hierarchy.
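The following sketch shows how a downloaded CDF file alone could be turned into the headline hierarchy described above, without fetching any of the pages it references. The sample document, its nested `CHANNEL` element for a sub-topic, and the `outline` helper are illustrative assumptions rather than output from a real Channel.

```python
import xml.etree.ElementTree as ET

# Hypothetical CDF content for demonstration only.
SAMPLE_CDF = """<CHANNEL HREF="http://example.com/">
  <TITLE>Example News</TITLE>
  <CHANNEL>
    <TITLE>Sports</TITLE>
    <ITEM HREF="http://example.com/sports/1.html"><TITLE>Cup final tonight</TITLE></ITEM>
  </CHANNEL>
  <ITEM HREF="http://example.com/top.html"><TITLE>Top story</TITLE></ITEM>
</CHANNEL>"""


def outline(node, depth=0):
    """Return indented title lines for a channel and everything nested in it."""
    lines = ["  " * depth + node.findtext("TITLE", default="(untitled)")]
    for child in node:
        if child.tag in ("CHANNEL", "ITEM"):
            lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(SAMPLE_CDF))))
```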
If a dial-up user subscribes to a Channel for offline use, then Internet Explorer 4.0 periodically visits the site, downloads the CDF and all associated Channel content referenced in the CDF, and updates the Channel hierarchy displayed inside the Channel Pane (as shown in Figure 12). The CDF file provides information about newly updated content, including headlines displayed in the Channel Pane. Furthermore, all the pages pointed to by links in the Channel Pane and all the Channel content are available for offline use.
Figure 12. Client downloads CDF and all accompanying pages from the Web site on a scheduled basis.
It is important to note that after downloading the Channel's CDF file, Internet Explorer 4.0 downloads only those specified pages that are new or have been updated since the last download. It also bears repeating that any content format may accompany a CDF file when updating a Channel, whether it is an HTML page or a more complex Java or ActiveX application.
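A CDF-driven update of this kind could be sketched as follows, assuming each item in the CDF carries a last-modified stamp that the client can compare with what it has already stored. The `LASTMOD` attribute name is taken from the CDF proposal as commonly documented and should be verified; the `items_to_fetch` helper and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def items_to_fetch(cdf_text, cache):
    """Return the HREFs that are new or whose LASTMOD stamp has changed.

    `cache` maps HREF -> last stamp seen and is updated in place, so a
    second call with the same CDF returns an empty list.
    """
    stale = []
    for item in ET.fromstring(cdf_text).iter("ITEM"):
        href = item.get("HREF")
        lastmod = item.get("LASTMOD")
        if href not in cache or cache[href] != lastmod:
            stale.append(href)
            cache[href] = lastmod
    return stale
```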
While traditional HTTP publishing is sufficient for most content delivery needs, there are scenarios where a different delivery mechanism is required. Microsoft provides an open, extensible information delivery architecture that makes it possible to integrate the market's existing "push" products with the Microsoft Internet Explorer 4.0 Webcasting client. Today, users face potential conflicts and added learning time when running multiple push software products on their PCs.
Microsoft Internet Explorer 4.0 can help reduce scheduling conflicts and user interface confusion by providing a standard method for users to schedule information delivery.
The Webcasting architecture in the Internet Explorer 4.0 client provides architectural hooks that allow third parties to provide value-added benefits to enrich the Webcasting experience. Specifically, the Webcasting architecture in Internet Explorer 4.0 allows for plugging in third-party client software that defines new URL transport protocols or provides an alternative delivery mechanism for Channels.
Microsoft utilizes this extensible architecture in order to support multicast, or "true push," in Internet Explorer 4.0. By taking advantage of special network hardware, multicast protocols provide bandwidth-efficient broadcasting of content throughout a corporate network. Because of Microsoft's extensible Webcasting architecture, the NetShow™ networked multimedia software component in Internet Explorer 4.0 can receive Channel content that is broadcast via such a protocol. Furthermore, in a recently announced relationship, NetShow will integrate with StarBurst Communications' reliable one-to-many Multicast File Transfer Protocol (MFTP) technology. With this technology available in Internet Explorer 4.0, organizations can now take advantage of the bandwidth efficiencies of IP multicast to reliably deliver content to their intranet- and Internet-based users.
Figure 13. Multicast protocols allow corporations to save bandwidth using Internet Explorer 4.0 Webcasting and CDF.
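The bandwidth argument for multicast can be made concrete with some simple arithmetic. The sketch below compares the bytes a server sends over a shared link when every client pulls its own copy with the bytes sent when one multicast transmission reaches all clients. It is an idealized model: it ignores protocol overhead and retransmissions, and the function names and figures are assumptions chosen for illustration.

```python
def unicast_bytes(clients, channel_bytes):
    """Bytes sent when each client downloads its own copy over HTTP."""
    return clients * channel_bytes


def multicast_bytes(clients, channel_bytes):
    """Bytes sent when a single transmission is shared by all clients."""
    return channel_bytes if clients > 0 else 0


def savings_ratio(clients, channel_bytes):
    """Fraction of traffic avoided by multicasting instead of unicasting."""
    sent_unicast = unicast_bytes(clients, channel_bytes)
    if sent_unicast == 0:
        return 0.0
    return 1 - multicast_bytes(clients, channel_bytes) / sent_unicast


if __name__ == "__main__":
    # Hypothetical example: 100 desktops receiving a 2 MB Channel update.
    print(unicast_bytes(100, 2_000_000), multicast_bytes(100, 2_000_000))
    print(savings_ratio(100, 2_000_000))
```

Under this model the savings grow with the number of subscribers, which is why the approach is most attractive on large corporate networks.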
To provide similar bandwidth savings to home users, Microsoft's recently announced Broadcast Architecture for Windows® initiative will allow PC users to receive CDF-authored Channel content over existing and future broadcast networks, including high-bandwidth direct broadcast satellites, as well as analog and cable TV channels. This means that without dialing up or otherwise using a two-way connection to the Internet, Channel content will be kept constantly fresh on users' PCs. In addition, Microsoft's relationship with AirMedia promises to make Channel content available to home users everywhere over the airwaves.
The multicast and broadcast solutions above are examples of how Microsoft's open Webcasting architecture based on CDF makes it possible to add value beyond "smart pull" over HTTP. Because of Microsoft's investment in open standards and the CDF file format, third parties can now write interoperable client, server, or authoring tool software for Webcasting.
CDF is being embraced by most of today's leading push software vendors, including PointCast, BackWeb, AirMedia, FirstFloor, Torso, UserLand Software, DataChannel, Lanacom, NetDelivery, NCompass, Diffusion, Wayfarer, and many others. (See the complete list at www.microsoft.com/corpinfo/press/1997/Mar97/Cdfrpr.htm.)
In a close relationship with Microsoft, PointCast has embraced CDF as the standard format for Channel content, making it possible for authors to create Channels that can be viewed by all Internet Explorer 4.0 users as well as throughout the PointCast Network of "push" clients. In addition to PointCast, BackWeb, AirMedia and FirstFloor will be using Internet Explorer 4.0 as their strategic platform for information delivery and are working with Microsoft to provide integrated solutions for customers. Numerous server-side tool vendors such as DataChannel, UserLand Software, and Torso are making it easy to author Channel content based on CDF. Finally, the Netscape Netcaster client software can support CDF files within the limitations of Netscape's simple site crawl technology.
In addition to the strong third-party support for CDF, numerous Microsoft server products and authoring tools will make it easier to "push" Channel content to Internet Explorer clients. For example, the next version of Microsoft FrontPage® Web authoring and management software will provide direct support for authoring CDF channels within the product. The Active Server Pages architecture in Internet Information Server (IIS) 3.0 makes it possible to dynamically generate CDF files using server-side scripts. The newly announced SiteServer 2.0 allows rich personalization of custom CDF Channel content integrated with other Web site services. As mentioned earlier, NetShow™ and the new Broadcast Architecture for Windows initiative will enable multicast and broadcast of CDF-based channels. Finally, the Microsoft Proxy Server, which already reduces network traffic with its active intelligent caching and makes Internet access more manageable and secure, will be enhanced in its next release to support highly scalable and distributed cached networks optimized around Internet Explorer 4.0 Channels and CDF. (By configuring Internet Explorer 4.0 clients to work with a shared caching proxy server, corporate administrators can mitigate the effects of channel load on network resources. This solution applies not only to Internet-based content, but also to large corporate intranets. Microsoft anticipates that some users of Microsoft Proxy Server 2.0 will be able to reduce network load for HTTP traffic by as much as 50 percent.)
To advance the end-user Web experience, Microsoft is working closely with the world's leading content providers to deliver a set of premium channels in Microsoft Internet Explorer 4.0. Using the latest Internet Explorer enhancements such as Dynamic HTML, these premium channels will provide personalized content that is informative, engaging, and designed for efficient interactivity, without requiring time-consuming round-trips to the Web server. And because this content will be Webcast using CDF, Internet Explorer 4.0 will be able to deliver the information automatically, so it will be available for users to view anytime—even offline.
The examples above and many more throughout the industry should make it clear that end-users, Web site authors, and corporations will all benefit from the broad industry-wide support of the CDF standard.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, this document should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.