Safe Web Surfing with the Internet Component Download Service

July 1996

Mary Kirtland

Mary Kirtland is a member of the Microsoft Premier Developer Support team, working on support planning for new technologies such as ActiveX.

Everywhere I turn, people are talking about the Internet and the World Wide Web. TV news anchors carefully read off URLs (usually having no clue what they're saying) where you can get more information about community services. Print and broadcast ads display company home page addresses. Retailers display their wares, hoping to attract your business. What makes the Web so attractive? Unlike print media, information on the Web is easy to keep current, but the Web isn't as transient as broadcast media. It's really the best of both worlds: a fixed location where you can find continually updated information on just about anything.

The Web also offers an opportunity that traditional information sources do not. Instead of providing a fixed view of information, content providers can make their pages interactive. You can find out what movies are currently playing in theaters near your home, shop in virtual stores, even get personalized news delivered to your electronic doorstep. And this is just the beginning. Active documents, that is, Web pages that incorporate Microsoft ActiveX controls, make revisiting a Web site compelling. You don't just visit a page once; you go back again and again because the information is always different and always under your control.

Active content doesn't come without a price. The code that activates documents needs to execute on some machine, and code always has the potential to do damage. When a user visits a new active document, how does he or she know it's safe to use? By the time he or she gets to the page, is it too late? This is, after all, an age when people write viruses that format hard drives for fun. As my mom might say, do you know where that code has been?

For many of the active documents you see today, the code resides on the Web server. This is usually safe, since the webmaster knows where the code came from, but performance can suffer. Thousands or even millions of people might try to shop in the same virtual store simultaneously (www.microsoft.com got 190 million hits in April), and when they submit orders to the store, the Web server has to process them all. In the meantime, users twiddle their thumbs, waiting for their orders to be processed, and their computers sit idle.

An alternative is to move more of the code onto the client machines. Active documents are more responsive, since the work is performed locally rather than across the net. But there are potential problems. Someone has to put the code on the client machines for it to run, and every page a user visits may need different code for the page to work properly. For the sake of argument, let's say getting the code downloaded to a user's machine isn't a problem. However, when a user visits a new page, can he know it's safe to actually run the code on his machine?

One approach to easing concerns about installing unknown code on a machine is to create an environment where the code can't possibly do any harm and can't access the disk drives, video memory, or anything else that could potentially damage the user's system. Unfortunately, this sandbox approach significantly limits the kinds of interesting things an active document can do.

Another approach is to give the user some indication that the code is safe. Think about your last trip to your favorite software store. Did you consider whether the products you were looking at would harm your machine? Probably not. Why? Because you were looking at shrink-wrapped boxes that identified the manufacturer in a store you trusted.

What you need is a way to bring this same level of trust to code you find on the Internet. Let's say there is an authority everyone trusts. If a software vendor could prove their trustworthiness to the authority, the authority could give them a certificate of approval that the vendor could attach to their code. If you trust the authority, you should be able to trust anyone with a certificate. You would also want proof that the code and certificate had not been tampered with.

Active content is a cornerstone of the Microsoft Internet strategy, so considerable thought has been devoted to these problems. While Microsoft feels that the sandbox approach has its place, there are significant applications that can't be implemented in such a restricted environment. Thus, Microsoft is devising mechanisms that permit code authors to attach digital signatures to their code and permit users to inspect signatures before installing downloaded code. Digital signatures contain information about the vendor that created a file, and their certificates of approval give you some reason to trust them (see Figure 1). Digital signatures also contain information that can be used to check that the file has not been tampered with since the signature was attached. Think of digital signatures as shrinkwrap for the Internet.

Figure 1 A digital signature certificate

Microsoft's mechanism for inspecting and validating signatures is known as the Windows Trust Verification service. This service looks at the certificates of approval within a digital signature and tells you if it comes from a trusted source. The certificates within a signature form a trust chain: once you find a point on the chain you trust, you can trust all the certificates below it. To keep track of the certificates you trust, a secure database of sources is maintained on your machine. Initially this database contains a limited set of authorities that everyone trusts. Over time, as you indicate that additional sources are trustworthy, a trust hierarchy forms. The main purpose of the Windows Trust Verification service is to compare the contents of a digital signature to the trust hierarchy and indicate the trustworthiness of the code author.

The Internet Component Download Service

The Internet Component Download service is a mechanism through which applications download, certify, and install ActiveX component code from the Internet. This service uses Windows Trust Verification services internally to perform certificate checking. While the Internet Component Download service can only download ActiveX components, the Windows Trust Verification services are a general-purpose mechanism that you can use to download other types of files.

In this article, I'll describe how Internet Explorer 3.0 uses the Internet Component Download service to safely download and install ActiveX components. I'll also show you how to package your own components for this service. As I write this article, Internet Explorer 3.0 and the ActiveX SDK are still in alpha release, so this material is subject to change. Check the ActiveX SDK documentation listed in Figure 2 for the most up-to-date information.

Accessing Active Documents with Internet Explorer 3.0

First, let's see what happens when a user jumps to an active document. Figure 3 shows a sample active document. The document uses three different ActiveX controls. If these controls aren't installed on the user's machine, they need to be downloaded from some code server, verified as safe to install, and installed on the user's machine before the document displays as the author intended.

Figure 3 An active Web page

In an active HTML document, the <OBJECT> tag indicates a control. In DWLDTEST.HTM (see Figure 4), which is the source file for the document shown in Figure 3, each <OBJECT> tag corresponds to one of the controls. The CLASSID attribute indicates the type of control and its value is a text representation of the CLSID associated with the control prefixed by "clsid:". The CODEBASE attribute indicates where the control code can be obtained.

Writing active HTML is outside the scope of this article, so I won't spend any more time discussing the <OBJECT> tag. If you want to learn more about this, the latest version of the specification is at:

 http://www.w3.org/pub/WWW/TR/WD-object.html

The important thing to consider is what happens when the browser spots an <OBJECT> tag and needs to create an instance of a control. The Internet Component Download service is exposed to the browser through a single function, CoGetClassObjectFromURL.

 STDAPI CoGetClassObjectFromURL(REFCLSID rclsid, 
                               LPCWSTR szCodeURL,
                               DWORD   dwFileVersionMS,  
                               DWORD   dwFileVersionLS,
                               LPCWSTR szContentTYPE,
                               LPBINDCTX pBindCtx,
                               DWORD   dwClsContext,
                               LPVOID  pvReserved,
                               REFIID  riid, 
                               VOID**  ppv);

Figure 5 shows the high-level architecture of the component download mechanism. When the browser detects an <OBJECT> tag, it parses out the CLSID, file URL, and version number. The browser passes these as parameters to CoGetClassObjectFromURL, which downloads, verifies, and installs the component code, if necessary. The function first checks whether the component is already installed by looking for the specified CLSID on the local system. If a version number is specified in the CODEBASE attribute, the version of the installed component is compared to the version specified. If the component is already installed and the version number checks out, the server is loaded and a class factory is created. Otherwise, the browser starts to download code.
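The parsing step described above can be sketched in a few lines of Python. This is only an illustration, not how Internet Explorer is implemented. The names ObjectTagParser and find_objects are hypothetical, and the "#Version=a,b,c,d" suffix on CODEBASE is an assumption: it is the convention later documented for Internet Explorer, but this article does not spell out the syntax.

```python
from html.parser import HTMLParser


class ObjectTagParser(HTMLParser):
    """Collect (clsid, code_url, version) for each <OBJECT> tag in a page."""

    def __init__(self):
        super().__init__()
        self.objects = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "object":
            return
        attr = dict(attrs)
        classid = attr.get("classid") or ""
        codebase = attr.get("codebase") or ""

        # CLASSID is the CLSID string prefixed by "clsid:" (any case).
        clsid = classid[6:] if classid.lower().startswith("clsid:") else None

        # Assumed convention: an optional "#Version=a,b,c,d" suffix.
        url, _, fragment = codebase.partition("#")
        version = None
        if fragment.lower().startswith("version="):
            version = tuple(int(n) for n in fragment[8:].split(","))

        self.objects.append((clsid, url or None, version))


def find_objects(html_text):
    """Return one (clsid, code_url, version) tuple per <OBJECT> tag."""
    parser = ObjectTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.objects
```

Each tuple corresponds roughly to the rclsid, szCodeURL, and file version arguments that the browser would hand to CoGetClassObjectFromURL.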

Figure 5 Internet Component Download Architecture

Locating and Downloading Code

URL monikers are used to download the required files asynchronously. The download service uses two pieces of information to locate the files: the szCodeURL parameter passed to CoGetClassObjectFromURL and the Internet search path, which is a list of object store servers specified in the registry key HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\CodeBaseSearchPath. The value for this key is a string in the following format:

 CodeBaseSearchPath = <URL1>;<URL2>;...<URLm>;CODEBASE;<URLm+1>;...<URLn>

Each URL is an absolute URL that points to an HTTP server acting as an object store. The download service first tries downloading files from URL1 through URLm, then from the location specified in the szCodeURL parameter, and then from URLm+1 through URLn.

The Internet search path mechanism gives the user or system administrator flexibility in determining how files download. By setting up URLs in the path before the CODEBASE keyword, you can install files from local intranet caches of common components. Removing the CODEBASE keyword from the path effectively disables component downloads from the Internet. By setting up URLs in the path after the CODEBASE keyword, you can install files from standard distribution points, even if the server specified in the active document is not available.
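To make the ordering concrete, here is a minimal Python sketch of how the search path and the szCodeURL value might combine into a list of locations to try. The function name download_locations and the example.com URLs used with it are hypothetical, and treating the CODEBASE keyword as case-insensitive is an assumption.

```python
def download_locations(search_path, code_url):
    """Return the URLs to try, in order, for one component download.

    search_path is the CodeBaseSearchPath string; code_url is the
    szCodeURL value taken from the CODEBASE attribute (or None).
    """
    locations = []
    for entry in search_path.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        if entry.upper() == "CODEBASE":
            # The keyword marks where the document's own URL is tried.
            if code_url:
                locations.append(code_url)
        else:
            locations.append(entry)
    return locations
```

With a path of "http://cache.example.com/objects;CODEBASE", the local cache is tried before the URL named in the document. With the CODEBASE keyword removed from the path, the document's URL never appears in the result, which matches the way an administrator would disable downloads from the Internet.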

If you specify version numbers, the download service tries to download and install the file only if the version number specified is more recent than any version currently installed on the user's system. If the version number specified is -1,-1,-1,-1, the download service will always get the latest version of the file. As the download service works through the Internet search path, each object store receives an HTTP POST request containing the CLSID or MIME type and, optionally, a version number. The object store parses this information, checks its internal database of available files, and either fails or redirects the HTTP request to the appropriate downloadable file. If the request specifies a version number, the object store fails if it doesn't have a version equal to or later than the requested version. When you use the szCodeURL location, the version resource in the file can be compared with the requested version to determine if the correct file is available.
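The version rule on the client side can be sketched as follows. This is a hedged illustration in Python rather than the download service's actual code. The names pack_version and needs_download are hypothetical, and the packing of a,b,c,d into two 32-bit values follows the usual Windows file-version layout, which is assumed here to be what dwFileVersionMS and dwFileVersionLS carry.

```python
ANY_LATEST = (-1, -1, -1, -1)  # "always get the latest version"


def pack_version(version):
    """Pack (a, b, c, d) into a (dwFileVersionMS, dwFileVersionLS) pair."""
    a, b, c, d = (part & 0xFFFF for part in version)
    return (a << 16) | b, (c << 16) | d


def needs_download(requested, installed):
    """Decide whether a file should be fetched.

    requested: version tuple from the page, or None if unspecified.
    installed: version tuple found on the machine, or None if absent.
    """
    if installed is None:
        return True              # nothing installed yet
    if requested is None:
        return False             # any installed version is acceptable
    if requested == ANY_LATEST:
        return True              # always fetch the latest
    # Comparing the packed pairs orders versions most significant first.
    return pack_version(requested) > pack_version(installed)
```

For example, needs_download((1, 0, 0, 2), (1, 0, 0, 1)) is true because the page asks for a newer build than the one installed, while an equal version leaves the installed file alone.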

The download service also adds HTTP Accept and Accept-Language headers to all HTTP requests. This lets HTTP servers redirect requests for code based on the target platform or language. The Internet Component Download specification defines MIME types passed in the Accept header that identify the target operating system and CPU (see the specification for more details).

Once the download service figures out where to obtain the file, it can actually download the bits to a safe location on the user's machine. The URL moniker takes care of the actual data transfer. Remember, this happens asynchronously, so at this point CoGetClassObjectFromURL returns. Further communication between the browser and the component download service occurs through the standard IBindStatusCallback mechanism for URL monikers and through ICodeInstall, a new interface the browser must implement. ICodeInstall lets the download service display a user interface when it's verifying and installing code. It's documented in the Internet Component Download specification.

Verifying Files

Once the required files have been downloaded, the Windows Trust Provider Service function WinVerifyTrust is called to ensure that the file is safe to install. WinVerifyTrust searches the file for a signature block (also known as a digital signature). The signature block contains information about the author of the file, a public key, and an encrypted digest of the file's contents.

If it finds a signature, WinVerifyTrust validates the certificate. The validation process uses the concept of a trust hierarchy; each certificate is inspected for a parent certificate until it reaches the root certificate. WinVerifyTrust looks for the root certificate in the system's list of trusted root certificates. If WinVerifyTrust finds the root certificate, it inspects each certificate in turn to make sure the certificate is trusted by its parent until the original file certificate is tested. If the certificate is invalid for any reason, a message displays indicating that there is reason to be concerned about the contents of the file. However, the user always has the option to install the file anyway.
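The walk up the trust hierarchy can be pictured with a small Python sketch. It models certificates as plain names and is not a description of WinVerifyTrust itself: the function name trust_chain and the certificate names in the usage note are hypothetical, and a real implementation would also check each certificate's signature against its parent's key rather than just following names.

```python
def trust_chain(cert, parent_of, trusted_roots):
    """Walk from a file's certificate up to its root certificate.

    parent_of maps each certificate name to the name of the certificate
    that vouches for it; a root has no entry. Returns the chain from the
    file certificate to the root if the root is trusted, else None.
    """
    chain = [cert]
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in chain:          # guard against a malformed loop
            return None
        chain.append(parent)
    return chain if chain[-1] in trusted_roots else None
```

Given parent_of = {"Mary Kirtland": "Test CA", "Test CA": "Test Root"} and a trusted set containing "Test Root", the function returns the full three-link chain. If the root is missing from the trusted set, it returns None, which corresponds to the case where the user sees a warning.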

If the certificate is valid, the trust verification service decrypts the digest with the public key and regenerates the digest on the downloaded file. If the two digests don't match, the file has been tampered with. Again, a warning message displays for the user. The service also displays a warning if it doesn't find any certificate at all.

Even if the certificate is valid, the user or system administrator may elect to display messages before installing files on a system. The Windows Trust Verification Service actually defines two types of certificates: one for commercial developers and one for individual developers. The main differences are in the types of documentation developers must provide to qualify for the certificate and the types of security provided for the developers' private keys. Users can choose to display warning messages for files from all commercial developers, all individual developers, commercial developers not previously encountered, individual developers not previously encountered, and so on. The warning message gives the name of the software, the identity of the publisher, and the issuing certificate authority so the user can make an informed decision about installing files. When the warning message displays, the user can also add the certificate holder to the trust hierarchy so future files from the same source install without warnings.

Installing the Component

If the files are safe to install or the user elects to install them anyway, they are installed into the component download cache. In the initial release, the cache is a permanent store at the hardcoded location \windows\system\occcache. However, the download service spec does not specify that the cache is permanent and hardcoded, and hence this is subject to change in future versions.

Once installed, the component must register itself. For DLLs, this means loading the DLL and calling DllRegisterServer. For EXEs, it means running the EXE and passing /RegServer on the command line. The download service adds information to the HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion registry key to keep track of what it has installed into the download cache. A new section, ModuleUsage, holds a list of owners and clients for each shared module. This differs from the shared DLL reference counting mechanism used by Windows because it identifies who is using a shared module. The ModuleUsage section currently determines whether an installed version of a file is out-of-date. In the future, the download service could identify files that can be removed from the cache using this information.

Creating the Object

After the component is completely registered, the server is loaded and a class factory is created and returned to the browser. Using the class factory, the browser creates an instance of the object and initializes it with any parameters specified in the <OBJECT> tag. The browser repeats this process for every <OBJECT> tag detected in the active document. You can see why it's important for the download and verification to happen asynchronously. There's a lot of work to be done and the actual downloading could take minutes! By running asynchronously, the component download service lets the browser display as much of the document as possible until all components are installed.

Note that this procedure applies to components specified in <OBJECT> tags only. (Microsoft plans to add support for document object components in a future version of the download service.) Browsers must use URL monikers and the trust verification services directly to download and install other types of files like code referenced by an <A HREF> tag, scripts, or fonts.

Making Components Trustworthy

If you are a component author, you'll want to make sure the user has the best possible experience when visiting active documents that use your components. You need to mark your components so the trust verification service recognizes them and then package your components for downloading from a Web site.

To mark your components you need to sign your code with a digital certificate. Signing code involves the following steps:

Running a one-way hash on the code and producing a fixed-length digest.
Encrypting the digest with your private key.
Combining the encrypted digest, your certificate, and your credentials into a signature block.
Embedding the signature block into your file.
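The first three steps, together with the matching check a verifier performs, can be sketched in Python. Everything here is a toy under stated assumptions: the RSA key is built from two small Mersenne primes and is far too weak for real use, SHA-256 stands in for whatever hash the SDK tools use, the signature block is a plain dictionary rather than the real on-disk format, and the names digest_of, sign, and verify are hypothetical. Embedding the block in a PE file (step four) is not shown. The modular inverse via pow requires Python 3.8 or later.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                                # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent


def digest_of(code, modulus):
    """Step 1: one-way hash of the code, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(code).digest(), "big") % modulus


def sign(code, certificate):
    """Steps 2 and 3: encrypt the digest, then build a signature block."""
    encrypted = pow(digest_of(code, N), D, N)
    return {
        "digest": encrypted,
        "certificate": certificate,
        "public_key": (E, N),
    }


def verify(code, block):
    """Decrypt the stored digest and compare it with a fresh one."""
    e, n = block["public_key"]
    return pow(block["digest"], e, n) == digest_of(code, n)
```

Signing some bytes and then verifying the same bytes succeeds, while verifying a modified copy fails because the regenerated digest no longer matches the decrypted one. That mismatch is the tamper check the trust verification service relies on.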

The ActiveX SDK contains tools that help you generate test certificates and signatures and embed signatures in a Win32 Portable Executable (PE) format OCX, DLL, or EXE. Figure 6 lists the tools provided with the SDK. At the time this article was written, documentation for these tools was in the file bin\signcode.txt.

To generate a test certificate, run MAKECERT, which generates a random public/private key pair and a certificate file. To generate my certificate, I used this command line:

 MAKECERT -u:MaryKir -k:marykir.pvk -n:CN=MaryKirtland -d:Mary-Kirtland marykir.cer

MAKECERT generated the certificate file, marykir.cer, and a file containing the private key, marykir.pvk. Next, I ran CERT2SPC to generate a signature block.

 CERT2SPC \inetsdk\bin\root.cer marykir.cer marykir.spc

The signature block is written into the file marykir.spc. The file root.cer is a fake root certificate provided in the SDK for testing.

To get a real certificate, you'll need to work with a certificate authority (CA), which is an organization like VeriSign that publishes policies for granting certificates and grants certificates to applicants who meet the stated criteria. Over time, Microsoft expects a hierarchy of CAs to develop. Microsoft also expects local registration agencies (LRAs) to emerge. They will verify evidence provided by applicants, but will rely on CAs to grant certificates.

When you apply for a code-signing certificate, you'll be required to provide credentials to the CA or LRA. You'll also need to generate a public/private key pair and include the public key with your credentials. Once your credentials are approved, the CA generates a software publisher certificate containing your public key and credentials. You can use this certificate to sign your code.

Once you have a certificate, run SIGNCODE to actually sign your code. SIGNCODE has a wizard-style interface that guides you through the process of selecting the file you want to sign, the file containing your certificate, and the file containing your private key. After signing the code, you can run the PESIGMGR and CHKTRUST tools to verify everything. Remember, if you change your file, you must re-sign it. CHKTRUST actually calls the WinVerifyTrust function, so you can see exactly what the user sees when your code is downloaded and verified.

Packaging Your Code

Once you have the files for your control in their final form, you need to package the code for download. There are three ways to do this: as a single PE file, as a cabinet (CAB) file, or as an INF file (see Figure 7).

If your control is totally self-contained, the PE format is the simplest way to package your control. The first <OBJECT> tag in DWLDTEST.HTM uses this approach.

 <OBJECT ID = "Image1" 
   CLASSID="clsid:bd11a280-2e73-11cf-b6cf-00aa00a74daf"
   CODEBASE="http://bigcompany/littleteam/eng/more/marykir/codebase/wimg.ocx"
   HEIGHT=234 WIDTH=312>
   <PARAM NAME="Image" VALUE="winnet24.bmp">
</OBJECT>

The <OBJECT> tag specifies that the browser should create an instance of the WebImage sample control from the ActiveX SDK. The CODEBASE attribute specifies where the control can be found if it isn't already installed on the user's system. Note that the URL specifies the OCX file for the control directly.

When you use the PE-format method, you don't do any packaging at all; you just put your control out on a server. This won't work if your control depends on other files that might not be installed on the user's system, and it doesn't offer any file compression, so it's not the solution for every situation. However, if your control is small and self-contained, the PE format will work for you.

An alternative is to package your control in a CAB file, which is an archive of files compressed via the Lempel-Ziv method. The second <OBJECT> tag in DWLDTEST.HTM specifies an instance of the Smile sample control from Visual C++ 4.x.

 <OBJECT  ID="Smile1"
   CLASSID="CLSID:175CB003-BEED-11CE-9611-00AA004A75CF"
   CODEBASE="http://bigcompany/littleteam/eng/more/marykir/codebase/smile.cab"
   HEIGHT=80 WIDTH=80>
</OBJECT>

Note that the CODEBASE attribute specifies a CAB file rather than the OCX.

Using a CAB file offers two advantages over the PE-format method. First, CAB files are compressed, which reduces the time required to download your control. Second, you can download multiple files; if your control uses any dependent files, you can put them in the CAB. Be careful to include only those files that must be downloaded in the cabinet or you'll waste the user's time downloading code that's already installed. For example, you might not want to include the MFC run-time DLLs since the odds are pretty good that some other application or component already installed them.

When you package your control in a CAB file, collect all the required files and write an INF file that provides further installation instructions. The INF file refers to files in the CAB and to files at other URLs (which is how you handle files that may be installed already). The INF file syntax understood by the Internet Component Download service is a subset of the standard Setup INF file syntax you may be familiar with (see Figure 8).

The value of the file key indicates where the file can be downloaded from. It can be a URL or the special value thiscab, which indicates that the file is located in the CAB file where the INF file came from. If no value is specified, component download fails if the file is not already installed on the user's machine.

FileVersion=a,b,c,d

The FileVersion key specifies the minimum required version of the file specified by the File key. If no value is specified, any version is acceptable.

File-%opersys%-%cpu%=[url | ignore]

%opersys% can currently be one of [win32 | mac]. %cpu% can be one of [x86 | ppc | mips | alpha]. The value is either a URL that indicates the correct file for the target operating system and CPU, or the special value "ignore", which indicates that the file is not required for the specified platform.

Clsid={nnnnnnnn-nnnn-nnnn-nnnn-nnnnnnnnnnnn}

The value of the clsid key is the string representation of the component CLSID, enclosed in {}.

DestDir=[10 | 11]

DestDir can be set to 10 to place the file into the \windows directory or to 11 to place the file into the \windows\system directory. If no value is specified, the file is placed in the cache directory.

If you look at SMILE.INF (see Figure 9), you'll see that two files are required. SMILE.OCX is included in the CAB file, so File=thiscab is specified. MFC40.DLL is also required, but may be installed on the user's system already. A URL is provided so MFC40.DLL can be downloaded, if necessary.
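As a rough illustration of how these keys might be read, the Python sketch below parses a small INF fragment and picks the source for one file on a given platform. The INF text is hypothetical: it is modeled on the description of SMILE.INF above, but the URL and version numbers are invented, and any other sections a real INF file needs are left out. The function name file_source is also hypothetical, and letting a platform-specific key take precedence over the plain File key is an assumption.

```python
import configparser

# Hypothetical INF text; the URL and version numbers are invented.
SAMPLE_INF = """
[smile.ocx]
File=thiscab
Clsid={175CB003-BEED-11CE-9611-00AA004A75CF}
FileVersion=1,0,0,1

[mfc40.dll]
File=http://example.com/codebase/mfc40.cab
File-mac-ppc=ignore
FileVersion=4,0,0,5
"""


def file_source(inf_text, filename, opersys="win32", cpu="x86"):
    """Return the source named for a file on one platform.

    The result is a URL, "thiscab", "ignore", or None when no File key
    applies (the file must then already be installed).
    """
    inf = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    inf.read_string(inf_text)
    section = inf[filename]
    # configparser lowercases key names, so look them up in lowercase.
    platform_key = f"file-{opersys}-{cpu}"
    return section.get(platform_key, section.get("file"))
```

Here file_source(SAMPLE_INF, "smile.ocx") yields "thiscab", while asking for mfc40.dll on the mac/ppc platform yields "ignore".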

Once you've got all your control's files and an INF file, a tool called DIANTZ.EXE, which is provided in the ActiveX SDK, creates the CAB file. To use DIANTZ, write a directive file (DDF) that specifies which files to combine into a cabinet. There doesn't appear to be any documentation about the format of a DDF file other than a brief example in the Internet Component Download specification. You should be able to use the same format for your own controls, just changing the value of CabinetNameTemplate and filling in your own list of files.

 ; DIAMOND directive file for My Component
.OPTION EXPLICIT ; generate errors on variable typos
.Set CabinetNameTemplate=myfile.CAB
; The files specified below are stored, compressed in 
; cabinet files
.Set Cabinet=on
.Set Compress=on
myfile1
myfile2
...

SMILE.DDF is the directive file used to create SMILE.CAB.

After you create the DDF file, run DIANTZ from the command line

 DIANTZ.EXE /f myfile.DDF

where "myfile" is the name of your DDF file.
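If you package several controls, you may prefer to generate the directive file rather than edit it by hand. The Python sketch below simply reproduces the example layout shown above with your own cabinet name and file list. The function name make_ddf is hypothetical, and the sketch assumes DIANTZ accepts exactly the directives used in that example.

```python
def make_ddf(cabinet_name, files, title="My Component"):
    """Build directive-file text for DIANTZ from a list of file names."""
    lines = [
        f"; DIAMOND directive file for {title}",
        ".OPTION EXPLICIT ; generate errors on variable typos",
        f".Set CabinetNameTemplate={cabinet_name}",
        ".Set Cabinet=on",
        ".Set Compress=on",
    ]
    # Each file to store in the cabinet goes on its own line.
    lines.extend(files)
    return "\n".join(lines) + "\n"
```

Writing the result of make_ddf("smile.CAB", ["smile.inf", "smile.ocx"]) to a .DDF file gives you something to pass to DIANTZ with the /f switch.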

The third packaging option is to specify an INF file that contains directions for downloading and installing your control. The third <OBJECT> tag in DWLDTEST.HTM tells the browser to create an instance of the Button sample control from Visual C++ 4.x.

 <OBJECT    ID="Button1"
   CLASSID="CLSID:4A8C998F-7713-101B-A5A1-04021C009402"
   CODEBASE="http://bigcompany/littleteam/eng/more/marykir/codebase/button.inf"
   HEIGHT=50 WIDTH=80>
</OBJECT>

In this case, the CODEBASE attribute specifies an INF file. The primary advantage of specifying an INF file in the <OBJECT> tag is platform-independence. Since the browser downloads the INF file first, you can specify files to download for different target platforms. When the INF file is interpreted, the appropriate set of files is downloaded.

INF files also offer the most flexibility for downloading the minimum amount of code required to get your control working. With this approach, you do not get the compression offered by the CAB method (although an INF file can point to CAB files, which means only the INF file is uncompressed). However, if your control is fairly small and relies on other files that are probably installed on the user's machine, or if platform-independence is important, the INF file method is your best packaging choice. Figure 10 shows an INF file that works for the x86, MIPS, and Macintosh platforms.

Conclusion

Active documents on the Web add a whole new dimension to the Internet experience. This power is not without a price, however. Active documents require code, and any time new code is installed on a machine there is the potential for disaster. Users need a way to know what code can be trusted to be virus-free and non-malicious.

The code-signing mechanisms introduced by the Windows Trust Provider Service let component authors identify themselves and verify that their code has not been tampered with. The Internet Component Download service provides a straightforward way for browsers to download and verify code signatures before installing new code on a user's machine. Together, these services will bring safe surfing to millions of Web users.

From the July 1996 issue of Microsoft Systems Journal.