Building ISAPI Filters and the CVTDOC Sample

Adam Blum
Microsoft Corporation

April 1996

Note   This document is an early release of the final specification. It is meant to specify and accompany software that is still in development. Some of the information in this documentation may be inaccurate or may not be an accurate representation of the functionality of the final specification or software. Microsoft assumes no responsibility for any damages that might occur either directly or indirectly from these inaccuracies. Microsoft may have trademarks, copyrights, patents or pending patent applications, or other intellectual property rights covering subject matter in this document. The furnishing of this document does not give you a license to these trademarks, copyrights, patents, or other intellectual property rights.

Contents

Abstract

Introduction

1. Introduction to ISAPI Filters

2. Building the CVTDOC ISAPI Filter
2.1. What CVTDOC Does
2.2. Building the Filter
2.2.1. Requirements
2.2.2. GetFilterVersion
2.2.3. HttpFilterProc
2.2.4. Building and Testing
2.3. Building the Conversion Programs
2.3.1. Word-to-HTML Conversion: DOC2HTM.EXE
2.3.2. Microsoft Excel-to-HTML Conversion: XL2HTM.EXE
2.3.3. Text-to-HTML Conversion: TXT2HTML.BAT
2.3.4. Creating Your Own Conversions

3. Using CVTDOC
3.1. Installation
3.2. Conversion Programs
3.3. Usage

4. References

Abstract

This article details the programming required to successfully build Internet server application programming interface (ISAPI) filters, a powerful technology for extending the functionality of ISAPI-compliant Web servers such as the Microsoft® Internet Information Server (IIS). After explaining the ISAPI filter specification (part of the Microsoft ActiveX™ server framework) in general, it describes an example ISAPI filter, CVTDOC, in detail. CVTDOC is an ISAPI filter that allows a Web server to perform automatic file publishing by converting files on the fly from their application's native format to HTML.

Introduction

The Web's growing popularity for information publishing and retrieval has made many a custom-developed application obsolete. End users and Webmasters can easily create Web content that approximates what was done with custom applications. As a developer, does this mean that if a Web-based approach is chosen for building a solution, you are out of the loop? No way! Microsoft® Internet Information Server (IIS) provides a host of capabilities in the Microsoft ActiveX™ server framework for using your Microsoft Visual C++® development magic to provide advanced capabilities for Web-based applications. In this article, I'll explain one of these technologies, ISAPI filters, that will allow you to add some particularly cool features to your Web site. To get you hooked, I'll give you a free sample. Use the CVTDOC filter to generate HTML files dynamically. A filtered Web server provides smooth HTML conversion every time. (Warning: ISAPI programming is definitely addictive.)

The ISAPI Filter specification (included in the Microsoft ActiveX SDK at http://www.microsoft.com/intdev/sdk/) provides the capability of registering a DLL to intercept specific server events and perform appropriate actions. Unlike ISAPI itself, which is an improvement over the Common Gateway Interface (CGI) that Web servers have used for years, ISAPI filters are an entirely new capability in the world of Web servers. In effect, ISAPI filters let you extend the capabilities of your Web server. The ISAPI filter you build says to the Web server, "Hey, when something like this happens, let me handle it." Your filter can then handle the event entirely; process the event and leave it available for the Web server and other filters to handle; or decide on the fly that it's not an event it needs to process at all. For example, you can create ISAPI filters to:

This last, extremely powerful capability is one that I'd like to explain in this article. First, I'll show you briefly what you need to do to write an ISAPI filter in general. Then we'll walk through the code of an example filter that enhances the capabilities of a Web server. The CVTDOC ISAPI filter sample allows your Web server to "automatically publish" files by dynamically converting them to HTML. Although CVTDOC is primarily meant as an example of ISAPI filters, you may find it useful in its own right. In the final section, I'll show the details of how to install and use the CVTDOC sample.

1. Introduction to ISAPI Filters

ISAPI filter authors must create two main functions for export: GetFilterVersion() and HttpFilterProc(). GetFilterVersion() is called just once by the Web server: On server startup when loading all filters. GetFilterVersion() should:

GetFilterVersion() takes just one argument: a pointer to a structure that will store this information (version info, priority, and event flags).

BOOL WINAPI GetFilterVersion( PHTTP_FILTER_VERSION pVer );

The HTTP_FILTER_VERSION structure looks like this:

typedef struct _HTTP_FILTER_VERSION {
    DWORD     dwServerFilterVersion;
    DWORD     dwFilterVersion;
    CHAR         lpszFilterDesc[SF_MAX_FILTER_DESC_LEN+1];
    DWORD     dwFlags;
} HTTP_FILTER_VERSION, *PHTTP_FILTER_VERSION;

The GetFilterVersion() function should fill in the dwFilterVersion, lpszFilterDesc, and dwFlags structure members. Most importantly, the filter must register for every event it is interested in by turning on the corresponding flag bit in dwFlags.

The events available to register are listed in Table 1.

Table 1. GetFilterVersion() Function—Events Available to Register

Event ID Description
SF_NOTIFY_READ_RAW_DATA Intercept data going to the server.
SF_NOTIFY_SEND_RAW_DATA Intercept data going from the server back to the client.
SF_NOTIFY_AUTHENTICATION Call your filter when authentication event occurs. Used to implement custom password schemes.
SF_NOTIFY_LOG Call your filter when the server is about to log a resource access or other event. Lets you implement your own custom logging schemes.
SF_NOTIFY_URL_MAP Call your filter when the server is mapping a logical path to a physical path. In effect, this is called every time a resource on your server is accessed.
SF_NOTIFY_PREPROC_HEADERS Called before server preprocesses headers coming from Web client.
SF_NOTIFY_END_OF_NET_SESSION Call your filter when the user's session is about to end.
SF_NOTIFY_SECURE_PORT Include with other flags if you want filter called when running over secure port (such as https://...).
SF_NOTIFY_NONSECURE_PORT Include with other flags when running over a normal HTTP connection (almost always included in your filter flags).

The SF_NOTIFY_READ_RAW_DATA and SF_NOTIFY_SEND_RAW_DATA flags allow the ISAPI filter dynamic-link library (DLL) to intercept data going from the client to the server (READ) or from the server back to the client (SEND), and store and manipulate the data for its own purposes. Intercepting the SF_NOTIFY_AUTHENTICATION event allows the filter to insert its own authentication scheme for use with the server. The SF_NOTIFY_LOG event allows the filter to supplement or replace the IIS logging mechanism with its own logging method. The SF_NOTIFY_URL_MAP event is a good one to intercept if you want to change how the server responds to a request for a URL resource. For example, we will intercept the SF_NOTIFY_URL_MAP event in the CVTDOC filter to create the file requested by the URL. The SF_NOTIFY_SECURE_PORT and SF_NOTIFY_NONSECURE_PORT flags can be ORed with the events requested to allow your filter to restrict its operation to situations where the HTTP server is running over a secure port or over a normal HTTP session.
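To make the flag mechanics concrete, here is a minimal, portable sketch of how event bits can be ORed together and later tested. The DEMO_ names and their bit values are placeholders invented for this illustration. A real filter takes the SF_NOTIFY_* constants from httpfilt.h and stores the combined value in dwFlags.

```c
#include <stdint.h>

/* Placeholder bit values for illustration only; a real filter uses the
   SF_NOTIFY_* constants from httpfilt.h instead of defining its own. */
enum {
    DEMO_NOTIFY_SECURE_PORT    = 1u << 0,
    DEMO_NOTIFY_NONSECURE_PORT = 1u << 1,
    DEMO_NOTIFY_READ_RAW_DATA  = 1u << 2,
    DEMO_NOTIFY_SEND_RAW_DATA  = 1u << 3,
    DEMO_NOTIFY_URL_MAP        = 1u << 4
};

/* Build the mask a filter would report: one bit per event of interest,
   plus the port bits saying which kinds of session it applies to. */
static uint32_t demo_register_events(void)
{
    return DEMO_NOTIFY_SECURE_PORT |
           DEMO_NOTIFY_NONSECURE_PORT |
           DEMO_NOTIFY_URL_MAP;
}

/* Nonzero when every bit in `event` is set in `flags`. */
static int demo_is_registered(uint32_t flags, uint32_t event)
{
    return (flags & event) == event;
}
```

With this mask, demo_is_registered() reports the URL map event as registered and the raw-data events as not registered, which is the combination CVTDOC asks for later in this article.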

The other exported function, HttpFilterProc(), is called by the HTTP server (for example, IIS) each time an event that the filter registered for occurs.

DWORD WINAPI HttpFilterProc(
    PHTTP_FILTER_CONTEXT pfc,
    DWORD NotificationType,
    LPVOID pvNotification
);

The first argument is an HTTP_FILTER_CONTEXT structure that has information about the server session, has function pointers available that can get more information about the server session, and can add headers or data to the response going back to the client. In the CVTDOC sample, this argument is not used; you won't always need to use this argument. The next argument indicates the event notification type. This determines what event triggered the call of your filter. It is almost always used because, as good form, you want to make sure that you are not processing events that do not interest you. Also, a single filter may be registered for multiple events, and HttpFilterProc() may have conditional logic based on the event that triggered its call. For example, a filter may be registered for the SF_NOTIFY_READ_RAW_DATA and SF_NOTIFY_SEND_RAW_DATA events, where it processes some of the data passing from the client to the server or from the server to the client. But the details of its actions will likely vary slightly depending on the direction, so it needs to know the triggering event. The third argument stores data associated with an event in a structure. Available structure types are listed in Table 2.

Table 2. HttpFilterProc() Function—Available Structure Types

Structure Type Description
HTTP_FILTER_RAW_DATA Points to the data passed back by a READ or SEND event.
HTTP_FILTER_PREPROC_HEADERS Accesses the client headers before the server processes them.
HTTP_FILTER_AUTHENT Provides user and password information from the server about to authenticate the client.
HTTP_FILTER_URL_MAP Provides the physical path resulting from the server mapping a logical path.
HTTP_FILTER_LOG Provides a variety of information about the client and its request that can be logged by the filter or changed to affect the native logging of IIS.

Once you have the information on the event type and the data associated with the event, your filter can do its work. Once the work is complete, the filter should return a valid return code. If you are not concerned with the event, you should immediately return SF_STATUS_REQ_NEXT_NOTIFICATION. If you handle an event and do not want any other filter or the server to handle it, return SF_STATUS_REQ_HANDLED_NOTIFICATION. If you handled an event, but it's all right for other filters and the server to deal with the event as well, return SF_STATUS_REQ_NEXT_NOTIFICATION. SF_STATUS_REQ_ERROR can be returned to indicate an error in the filter (reserve this for fairly serious problems). SF_STATUS_REQ_READ_NEXT can be returned to ask to see more of the data being passed back to the client or received by the server; the filter then expects to be called again with more data in the HTTP_FILTER_RAW_DATA structure.

Table 3. Valid Filter Return Codes

Return Code Use
SF_STATUS_REQ_NEXT_NOTIFICATION The filter is not concerned with the event.
SF_STATUS_REQ_HANDLED_NOTIFICATION The filter handled the event and will restrict other filters and the server from handling the event.
SF_STATUS_REQ_NEXT_NOTIFICATION Event is handled. It is all right for other filters and the server to handle the event now.
SF_STATUS_REQ_ERROR An error occurred in the filter. Reserve this return for fairly serious problems.
SF_STATUS_REQ_READ_NEXT Request to see more of the data being passed back to the client or received by the server. Expects to be called again.
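The dispatch-and-return pattern described above can be sketched in portable C as follows. This is only an illustration: the demo_ enumerations, their numeric values, and the handler are hypothetical stand-ins for the notification types, the SF_STATUS_REQ_* codes, and whatever work a real HttpFilterProc() would do.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the notification types in httpfilt.h. */
enum demo_event {
    DEMO_EVT_READ_RAW_DATA = 1,
    DEMO_EVT_SEND_RAW_DATA,
    DEMO_EVT_URL_MAP,
    DEMO_EVT_LOG
};

/* Hypothetical stand-ins for the SF_STATUS_REQ_* return codes. */
enum demo_status {
    DEMO_REQ_NEXT_NOTIFICATION = 100,
    DEMO_REQ_HANDLED_NOTIFICATION,
    DEMO_REQ_ERROR
};

/* Placeholder for real work: claims the event whenever the mapped
   path is non-empty. */
static int demo_on_url_map(const char *physical_path)
{
    return physical_path[0] != '\0';
}

/* Look at the notification type first and pass along anything the
   filter does not care about. */
static enum demo_status demo_filter_proc(enum demo_event type, const void *data)
{
    switch (type) {
    case DEMO_EVT_URL_MAP:
        if (data == NULL)
            return DEMO_REQ_ERROR;                /* serious problem */
        if (demo_on_url_map((const char *)data))
            return DEMO_REQ_HANDLED_NOTIFICATION; /* nobody else handles it */
        return DEMO_REQ_NEXT_NOTIFICATION;        /* let others handle it */
    default:
        return DEMO_REQ_NEXT_NOTIFICATION;        /* not our event */
    }
}
```

Checking the notification type before touching the data pointer matters because the pointer refers to a different structure for each event.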

Once your filter is built, you can install it on IIS by running REGEDT32.EXE and adding the full path of the DLL to the Filter DLLs value under the key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3Svc\Parameters. Ideally, this should be done by a SETUP program that accompanies your filter.

This is the essence of what's required to build and use an ISAPI filter. To give you a better sense of what's involved in building an ISAPI filter, I'll describe the construction of an ISAPI filter sample that ships with ActiveX server framework: CVTDOC. This should also give you a sense of the types of problems to which ISAPI filters can be applied.

2. Building the CVTDOC ISAPI Filter

First, I'll briefly describe the purpose of CVTDOC, then how I developed the ISAPI filter itself, and finally present how each of the supplied conversion programs was built.

2.1. What CVTDOC Does

CVTDOC is a simple ISAPI filter I wrote in response to a need from several clients for "automatic file publishing": Generating HTML on the fly for specific document types. CVTDOC uses the capability of an ISAPI filter to supplement server capabilities by registering itself as intercepting all URL map events and then checking to see whether the document type requested is one that it knows how to convert.

The following fragment from the CVTDOC documentation (CVTDOC.DOC when you download the sample) may explain this requirement better:

Web content creators and Webmasters often want to "publish" a document or data file on the Web. However, it can be very inconvenient to constantly run a conversion program to generate new HTML each time the document or data file is updated. Relying on the Webmaster to run the conversion program for data that is often updated is also prone to error. If you are positive that the user has the software to display the document in native form, no conversion is necessary, but this is dangerous to assume. It would be great to be able to leave the document in native form and have the Web server (or a Web server add-in such as CVTDOC) convert the document to HTML on the fly as needed.

CVTDOC is an Internet Services API (ISAPI) filter that dynamically converts documents to HTML if required when the HTML file is accessed. If the HTML document is out of date (older than the source document) or missing, it is automatically generated from the ISAPI filter, based on "conversion programs" registered for the source document type in the Registry. I provide sample conversion programs for Word documents, Microsoft Excel spreadsheets, and text files, but it's important to remember that this can be used for any document type. The primary purpose of CVTDOC is to demonstrate the powerful capabilities of ISAPI filters. Nevertheless, I think you will find it useful in its own right.

The following section describes in detail how the filter was constructed. It's relatively easy to lose sight of the forest for the trees here: A quick glance at Section 3 on installing and using the filter (both of which are really quite simple) may help avoid any disorientation as you plow through the minutiae of how this was built.

2.2. Building the Filter

Now that we know what's required, we can proceed to develop the filter. The basic steps are:

2.2.1. Requirements

The filter needs to be able to intercept URL requests ending with a reference in the format:

filename.extension.htm

...and convert the file filename.extension to filename.extension.htm if and only if the HTML file is missing or older than the source. For example, an HTML hyperlink reference such as:

<A HREF="specials.doc.htm">

...should result in CVTDOC conditionally converting the SPECIALS.DOC file to HTML. CVTDOC should first check whether the HTML for that document already exists. If not, or if the HTML file is older than the source data file, it is a candidate for automatic conversion to HTML. CVTDOC searches through a list of registered data file types and associated conversion programs stored in the Registry, looking for a conversion program for the given extension (such as .DOC, .XLS, or .TXT). If it finds a conversion program, that program is launched to generate the specified filename.extension.htm file (for example, SPECIALS.DOC.HTM).
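The lookup step in that description can be sketched without touching the Registry at all. The block below uses an in-memory table as a stand-in for the Conversions values. The table contents mirror the example paths used in this article, while the demo_ names and the choice to compare extensions without regard to case are assumptions made for the illustration, not something taken from the CVTDOC source.

```c
#include <ctype.h>
#include <stddef.h>

/* One registered conversion: a file extension and a command template. */
struct demo_conversion {
    const char *ext;
    const char *command;
};

/* In-memory stand-in for the values stored under the Conversions key. */
static const struct demo_conversion demo_conversions[] = {
    { ".DOC", "C:\\WWWROOT\\CGI-BIN\\DOC2HTM.EXE %s" },
    { ".XLS", "C:\\WWWROOT\\CGI-BIN\\XL2HTM.EXE %s" },
    { ".TXT", "C:\\WWWROOT\\CGI-BIN\\TXT2HTML.BAT %s" }
};

/* Case-insensitive string equality (assumed to be the desired matching). */
static int demo_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the command template registered for `ext`, or NULL if none. */
static const char *demo_find_conversion(const char *ext)
{
    size_t i;

    for (i = 0; i < sizeof demo_conversions / sizeof demo_conversions[0]; i++) {
        if (demo_equal_nocase(ext, demo_conversions[i].ext))
            return demo_conversions[i].command;
    }
    return NULL;
}
```

A NULL result corresponds to the case where no conversion program is registered for the extension, so the request is left alone.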

Why does the HTML author need to use the strange syntax (SPECIALS.DOC.HTM) instead of just embedding the file reference (SPECIALS.DOC) and somehow configuring CVTDOC to know to convert all .DOC files to HTML? First of all, you may still want to embed references to a .DOC file and have it launch Word or, in general, embed a reference to the native file format and have the reference launch a viewer for that format if it is present. Using the syntax presented, references to the native format are still possible. More fundamentally, the Web browser is always going to attempt to launch a helper application if the URL ends with an extension of the native file format and not .HTM or .HTML. The URL ending with .HTM makes the browser expect HTML back, which is what it gets.

What we need is a filter that intercepts every request for a file of a type for which our filter can perform a conversion. From looking at Table 1, it might seem that there is no explicit "file requested" event, but in fact there is. As long as the request is for a file on our local site, a URL mapping event (which can be intercepted with the SF_NOTIFY_URL_MAP flag) takes place. That is, if the URL reference is SPECIALS.DOC.HTM or any other URL that resolves to a local file, such as http://ourstore.com/specials.doc.htm, a URL mapping event will take place on the server to convert the logical URL to a physical file path. The filter should intercept each URL mapping event by setting the SF_NOTIFY_URL_MAP flag in the HTTP_FILTER_VERSION structure on the GetFilterVersion() call. The other flag set should be SF_NOTIFY_ORDER_HIGH in order to get the notification as early as possible and make the necessary conversion, before other filters that may need to use the resulting data try to access it.

As preparation for writing the HttpFilterProc call, the pseudo-code for doing actual filter processing is the following:

IF URL request is filename.ext.htm
    IF filename.ext EXISTS
        IF filename.ext.htm MISSING or OLDER than filename.ext
            LOOK FOR CONVERSION PROGRAM FOR ext
            IF FOUND
                CONVERT filename.ext TO filename.ext.htm

With this information about the desired functionality, we're now ready to write the ISAPI filter.
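Before looking at the real filter code, the first line of the pseudo-code (recognizing a filename.ext.htm request and recovering filename.ext) can be tried out in isolation. The following is a portable sketch under my own assumptions: the demo_ function names are hypothetical, and unlike the filter code it uses an explicit output buffer size and rejects a dot that belongs to a directory name.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive string equality. */
static int demo_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* If `path` ends in .htm or .html, copy the path without that suffix
   into `src` and return a pointer to the source extension inside `src`.
   Return NULL when the request is not a conversion candidate. */
static const char *demo_source_from_request(const char *path,
                                            char *src, size_t srcsize)
{
    const char *dot = strrchr(path, '.');
    size_t len;
    char *ext;

    if (dot == NULL ||
        !(demo_equal_nocase(dot, ".htm") || demo_equal_nocase(dot, ".html")))
        return NULL;

    len = (size_t)(dot - path);
    if (len == 0 || len >= srcsize)
        return NULL;                 /* nothing left, or does not fit */
    memcpy(src, path, len);
    src[len] = '\0';

    ext = strrchr(src, '.');
    if (ext == NULL || strpbrk(ext, "\\/") != NULL)
        return NULL;                 /* plain page such as index.htm */
    return ext;
}
```

For a request that maps to C:\wwwroot\specials.doc.htm, the function leaves C:\wwwroot\specials.doc in the buffer and returns a pointer to .doc; for index.htm it returns NULL.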

2.2.2. GetFilterVersion

The first function we need to write is GetFilterVersion(). This performs the three steps outlined earlier in the discussion of GetFilterVersion() responsibilities for all ISAPI filters: reporting version information, setting the notification priority, and registering for events.

The first step is done with:

pFilterVersion->dwFilterVersion = HTTP_FILTER_REVISION;
strcpy (pFilterVersion->lpszFilterDesc,
"CVTDOC - Converts document or data into HTML if HTML not present or older");

This first step provides the ISAPI filter revision number back to the server, as well as a text description of CVTDOC.

The remaining two steps, setting the priority and registering for events, are both accomplished by setting flags as shown:

pFilterVersion->dwFlags=(SF_NOTIFY_ORDER_HIGH | // be sure to intercept!
            SF_NOTIFY_SECURE_PORT |
            SF_NOTIFY_NONSECURE_PORT |
            SF_NOTIFY_URL_MAP  // tell us about all URL requests
            );

This sets the notification priority high, tells the server that we are interested in sessions over both secure and nonsecure ports, and registers the filter for all URL map events.

The entire GetFilterVersion() code is:

BOOL WINAPI GetFilterVersion (PHTTP_FILTER_VERSION pFilterVersion)
{
    // report the filter revision and a text description to the server
    pFilterVersion->dwFilterVersion = HTTP_FILTER_REVISION;
    strcpy (pFilterVersion->lpszFilterDesc,
        "CVTDOC - Converts document or data into HTML if HTML not present or older");
    // now register for events we're interested in
    pFilterVersion->dwFlags=(SF_NOTIFY_ORDER_HIGH | // be sure to intercept!
        SF_NOTIFY_SECURE_PORT |
        SF_NOTIFY_NONSECURE_PORT |
        SF_NOTIFY_URL_MAP  // tell us about all URL requests
        );
    hEvtLog=RegisterEventSource(NULL,"CVTDOC");// open up event log
    return TRUE;
}

2.2.3. HttpFilterProc

Now we just need to write the HttpFilterProc() procedure, and then we're almost done. We've already developed the pseudo-code for what it needs to do. Here is the essence of the implemented HttpFilterProc:

// Make a copy of the requested physical path so that we can
// derive the source filename from it
strcpy(szSrcFile, pURLMap->pszPhysicalPath);

// Check to see if there's an extension and then save a pointer to it
if ((pszExt = strrchr(szSrcFile, '.')) != NULL) { // check for extension

    // Is this a request for a .htm or .html file? Comparing four
    // characters matches both, but not other extensions starting ".ht"
    if (!strnicmp(pszExt, ".htm", 4)) { // is it HTML?

        // Zap the extension on the copy to get the source filename
        *pszExt = '\0';

        // access() returning zero indicates presence of the source file
        if (!access(szSrcFile, 0)) { // check for presence of file

            // FileDateCompare() checks whether the source file is newer
            // than the requested file, or the requested file is
            // just not present
            if (FileDateCompare(szSrcFile, pURLMap->pszPhysicalPath) > 0) {

                // CvtToHTML() looks for a conversion program to run
                // based on extension, then runs the conversion program
                if (CvtToHTML(szSrcFile, pURLMap->pszPhysicalPath) == TRUE)

                    // This indicates that the filter handled the request
                    // for the URL so no other filters process it
                    return SF_STATUS_REQ_HANDLED_NOTIFICATION;
            }
        } // end check for presence of file
    } // end is it HTML?
} // end check for extension
// .
// .
// If we didn't attempt conversion, control is passed to the next filter
// by returning SF_STATUS_REQ_NEXT_NOTIFICATION
return SF_STATUS_REQ_NEXT_NOTIFICATION;

First, we parse out the source file from the full HTML file path (pURLMap->pszPhysicalPath) by copying the physical file path into szSrcFile and stripping off the .HTM extension if it is there (if it's not, then this is not a candidate URL for automatic conversion). Then we check for existence of the source file (access() returning 0 indicates presence). If the source file is there, we check whether the source file is newer than the HTML file or the HTML file is missing (with the FileDateCompare() function that we write elsewhere). If so, we attempt to convert the source file into HTML using the CvtToHTML() function. This function checks for available conversions in a Registry subkey called Conversions, created just for CVTDOC, which contains extensions (such as .DOC, .XLS, and .TXT) and their associated conversion programs. The Conversions key is located under the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3SVC\Parameters key. Creating it and filling it with values (file extensions and corresponding conversion programs) is part of the installation process documented in Section 3. If the conversion is not attempted, control is passed to the next filter or to the server itself by returning SF_STATUS_REQ_NEXT_NOTIFICATION. If the conversion fails or no conversion program is found, the failure is reported to the Microsoft Windows NT® event log. In this case, the likely message to the Web user is a "404 Not found" error showing on their Web browser, unless the HTML file is already present on the server.
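The FileDateCompare() function itself is not listed in this article. As a rough idea of what such a helper could look like, here is a sketch built on the standard stat() call. The name demo_file_date_compare and the exact return convention (positive when conversion is needed, negative when the source cannot be examined) are my assumptions, chosen only to match the way the result is tested above.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Compare modification times of a source file and its HTML target.
   Returns  > 0  if the source is newer or the target is missing,
            0    if both have the same modification time,
            < 0  if the target is newer or the source cannot be examined. */
static int demo_file_date_compare(const char *source, const char *target)
{
    struct stat src_info;
    struct stat tgt_info;

    if (stat(source, &src_info) != 0)
        return -1;                  /* no source: nothing to convert */
    if (stat(target, &tgt_info) != 0)
        return 1;                   /* no HTML yet: conversion needed */

    if (src_info.st_mtime > tgt_info.st_mtime)
        return 1;                   /* HTML is out of date */
    if (src_info.st_mtime < tgt_info.st_mtime)
        return -1;                  /* HTML is current */
    return 0;
}
```

Treating a missing target as "source is newer" lets the caller use a single greater-than-zero test for both the missing and the out-of-date case.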

2.2.4. Building and Testing

This section provides some tips on building ISAPI filters, which may show you how simple the process really is. Make sure that your source file contains the following headers:

#include <httpext.h>
#include <httpfilt.h>

Make sure that your INCLUDE environment variable contains the ISAPI header directory (such as C:\INETSDK\INCLUDE) and that your LIB environment variable points to the ISAPI libraries (such as C:\INETSDK\LIB\I386). CVTDOC is supplied with a makefile on which you can model your own ISAPI filter makefile; it's worth a look to see how simple it is. ISAPI programs in general, and ISAPI filters in particular, are really very lightweight.

CC=cl -c
CVARS=-DWIN32 -DNDEBUG 
LINK=link
LINKOPT=/DLL
LIBS=wininet.lib user32.lib 
OBJS=cvtdoc.obj
LINKOUT=/OUT:cvtdoc.dll
DEFS=cvtdoc.def
.cpp.obj:
    $(CC) $(CFLAGS) $(CVARS) $*.cpp
cvtdoc.dll: $(OBJS) $(DEFS)
    $(LINK) $(LINKOPT) /DEF:$(DEFS) $(LINKOUT) $(LIBS) $(OBJS)

To do initial testing on the created filter, we wanted to see that a conversion program actually got called. Running REGEDT32.EXE, we created the Conversions subkey below the W3SVC\Parameters key in the Registry and added a value of .TXT with data of TXT2HTML.BAT %s. We created a batch file TXT2HTML.BAT with one line:

COPY %1 %1.htm

This batch file also shows the primary requirement of any conversion program that will be registered with CVTDOC. It needs to take the source file as its argument and create a destination HTML file that has the same name as the source file, with an .HTM appended to it. This is a characteristic of all the conversion programs supplied with CVTDOC and should be the convention followed by your own conversion programs that you register with CVTDOC. We do supply a text-to-HTML converter with the delivered CVTDOC filter. This "real" text-to-HTML conversion program is another TXT2HTML.BAT file that invokes a Perl script called TXT2HTML.PL.
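To illustrate how a registered value such as TXT2HTML.BAT %s might be turned into a command line, here is a small sketch that substitutes file names for the %s placeholders by hand. It is not taken from the CVTDOC source: the function name demo_build_command, the rule that a second %s receives the target HTML name, and the CONV.EXE program mentioned afterward are all assumptions for the example. Substituting by hand, rather than passing the registered text to a printf-style function, keeps a malformed Registry value from being interpreted as a format string.

```c
#include <stddef.h>
#include <string.h>

/* Expand a registered command template: the first "%s" becomes the
   source path and a second "%s", if present, the target HTML path.
   Returns 0 on success, -1 if the result does not fit in `out`. */
static int demo_build_command(const char *tmpl, const char *source,
                              const char *target, char *out, size_t outsize)
{
    size_t used = 0;
    int nth = 0;                      /* placeholders replaced so far */

    if (outsize == 0)
        return -1;

    while (*tmpl != '\0') {
        const char *piece = tmpl;     /* default: copy one literal char */
        size_t len = 1;

        if (tmpl[0] == '%' && tmpl[1] == 's' && nth < 2) {
            piece = (nth == 0) ? source : target;
            len = strlen(piece);
            nth++;
            tmpl += 2;
        } else {
            tmpl += 1;
        }

        if (len >= outsize - used)    /* keep room for the terminator */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Given the template TXT2HTML.BAT %s and the source test.txt, the result is TXT2HTML.BAT test.txt. A two-placeholder template for a hypothetical CONV.EXE would receive both the source and the target name.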

Now we need to install the filter as part of the running IIS Web server. Still running REGEDT32.EXE, add the full path for CVTDOC.DLL to the Filter DLLs parameter in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3Svc\Parameters.

Now create an HTML file with contents of:

<A HREF="test.txt.htm">Quick test</A>

Create a file called TEST.TXT with contents of:

<HTML><HEAD><TITLE>CVTDOC Test</TITLE></HEAD><BODY>Test data.</BODY></HTML>

Place the HTML file onto your IIS Web server, load it into your Web browser, and click the Quick test link. This should result in the TXT2HTML.BAT file being invoked, and you will see the contents of TEST.TXT on your Web browser. This means that the server is calling our ISAPI filter successfully. Of course, you don't need to do any of this testing for CVTDOC; that is already complete. But this should give you some idea of the testing process for your own ISAPI filters.

2.3. Building the Conversion Programs

Now that we have a working CVTDOC ISAPI filter, we need to supply some conversion programs for it. As you'll see in Section 3, CVTDOC ships with conversion programs for Word .DOC files, Microsoft Excel .XLS files, and text files (with a real text-to-HTML converter written in Perl script rather than the stub batch file shown above). This is an immediately useful set of conversions, and you could just use the supplied conversion programs. However, CVTDOC is primarily meant to be a tool with which to register any data file type and associated conversion program. If you are planning to create conversion programs for other file types, a discussion of how these types were created may be useful. Note that the code for these conversion programs is not included with the CVTDOC sample as shipped with the ActiveX SDK, which concentrates on the CVTDOC ISAPI filter code itself, not the code for conversion programs.

2.3.1. Word-to-HTML Conversion: DOC2HTM.EXE

DOC2HTM.EXE is invoked with the name of the source Word document as its argument. It will create an HTML file named with the source filename and an appended .HTM. To install it for use by CVTDOC, create a value under the W3Svc\Parameters\Conversions key with name of .DOC and data of—for example—C:\WWWROOT\CGI-BIN\DOC2HTM.EXE %s.

This was an easy conversion program to create. Microsoft Word combined with Microsoft Internet Assistant for Word allows you to load a Word document and convert it to HTML by selecting the Save As HTML option from the File menu. Creating the conversion program was just a matter of automating this with Microsoft Visual Basic® and the WordBasic OLE Automation interface. Here is the entire code for the Word-to-HTML converter supplied with CVTDOC.

Private Sub Form_Load()
Dim X As Object
Set X = CreateObject("Word.Basic")
    X.FileOpen Name:=Command
    NewFile = Command + ".htm"
    fmt = X.ConverterLookup("HTML")
    X.FileSaveAs Name:=NewFile, Format:=fmt
    Set X = Nothing
    Unload Me
End Sub

You must have Word 6.0 or later and Internet Assistant for Word installed on the IIS server machine for this code to work. Once a new conversion program is built, registering it with CVTDOC is as simple as adding a new value to the Conversions subkey of the W3Svc\Parameters key, with data of the full path to the conversion program, followed by "%s".

2.3.2. Microsoft Excel-to-HTML Conversion: XL2HTM.EXE

XL2HTM.EXE is invoked with the name of the source Microsoft Excel spreadsheet. It will create an HTML file named with the source filename and an appended .HTM. It will only take the data from a named range in your Microsoft Excel spreadsheet titled Export. If no Export range is available, it will just export A1 through H20. To install it for use by CVTDOC, create a value under the W3Svc\Parameters\Conversions key with name of .XLS and data of—for example—C:\WWWROOT\CGI-BIN\XL2HTM.EXE %s.

Unfortunately, the Internet Assistant for Microsoft Excel cannot be invoked via OLE Automation (you cannot save as HTML within Microsoft Excel as you can with Word). So I had to write this conversion program from scratch. The entire program handling character formatting and alignment is quite long and not that interesting for the purpose at hand (to show you how to build your own conversions). Below is a grossly oversimplified (but functional) version of the code that just shows how to get the data from the Export range into an HTML table.

Private Sub Form_Load()
    Dim X As Object
    Set X = CreateObject("Excel.Sheet")
    Dim App as Object
    Set App = X.Application
    App.Workbooks.Open Command
    Dim CurSheet As Object
    Set CurSheet = App.ActiveWorkbook.Worksheets("Sheet1")
    Result =CurSheet.Range("Export").Select
    If (Result <> True) Then Result = CurSheet.Range("A1:H20").Select

    Dim OutputFile As String
    OutputFile = Command + ".htm"
    Open OutputFile For Output As #1

    Header = "Data From " + Command
    Line = "<HTML><HEAD><TITLE>" & Header & "</TITLE></HEAD><BODY>"
    Print #1, Line
    Line = "<H1>" & Header & "</H1>"
    Print #1, Line
    Print #1, "<TABLE>"

    NoRows = App.Selection.Rows.Count
    NoCols = App.Selection.Columns.Count
    ' now loop through all rows and columns printing out contents
    For Row = 1 to NoRows
        Print #1, "<TR>"
        For Col = 1 to NoCols
            Print #1, "<TD>"
            Print #1, App.Selection.Cells(Row, Col).Text
        Next Col
    Next Row
    Print #1, "</TABLE></BODY></HTML>"

    Set X = Nothing
    Set App = Nothing
    Set CurSheet = Nothing
    Unload Me
End Sub

2.3.3. Text-to-HTML Conversion: TXT2HTML.BAT

TXT2HTML.BAT is invoked with the name of the source text file as its argument. It will create an HTML file named with the source filename and an appended .HTM. To install it for use by CVTDOC, create a value under the W3Svc\Parameters\Conversions key with name of .TXT and data of, for example, C:\WWWROOT\CGI-BIN\TXT2HTML.BAT %s. You will need to have Windows NT Perl installed on your IIS machine, with the executable in the PATH. You can find Windows NT Perl on the Windows NT 3.51 Resource Kit, and at http://www.perl.hip.com/.

This batch file is a wrapper around TXT2HTML.PL, a Perl script for text-to-HTML conversion written by Seth Golub of the University of Washington. The script is entirely too large to present here. The batch file is as follows:

perl txt2html.pl < %1 > %1.htm

The Perl script moves through the text file placing headers around logical breakpoints, generally attempting to convert the content to HTML. It won't be perfect, but the result is a bit more attractive than a plain text file displayed on a Web browser.

2.3.4. Creating Your Own Conversions

You can get a pretty good idea from the discussion above of how to create your own conversion program. It should take an argument of the source file. It should generate an HTML file named with the source filename and an appended .HTM. If the program in question exposes an OLE Automation interface, this usually makes writing a small Visual Basic application to do the work very easy. You should be able to use the Form_Load() subroutines presented as a model to build another Visual Basic-based conversion program.
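A conversion program does not have to be written in Visual Basic. As an illustration of the naming convention only, here is a minimal C sketch that wraps a text file in a PRE element and writes the result to the source name with .htm appended. The demo_ function names, the fixed page title, and the 1024-byte path limit are choices made for this example. It is far simpler than the Perl script supplied with CVTDOC.

```c
#include <stdio.h>
#include <string.h>

/* Copy text from `in` to `out`, escaping the characters that HTML
   treats specially. */
static void demo_escape_html(FILE *in, FILE *out)
{
    int c;

    while ((c = fgetc(in)) != EOF) {
        switch (c) {
        case '&': fputs("&amp;", out); break;
        case '<': fputs("&lt;", out);  break;
        case '>': fputs("&gt;", out);  break;
        default:  fputc(c, out);       break;
        }
    }
}

/* Convert `source` to "<source>.htm". Returns 0 on success, -1 on error. */
static int demo_convert_file(const char *source)
{
    char target[1024];
    FILE *in;
    FILE *out;
    int n;

    /* Follow the convention: target name = source name + ".htm". */
    n = snprintf(target, sizeof target, "%s.htm", source);
    if (n < 0 || (size_t)n >= sizeof target)
        return -1;

    in = fopen(source, "r");
    if (in == NULL)
        return -1;
    out = fopen(target, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    fputs("<HTML><HEAD><TITLE>Converted text</TITLE></HEAD><BODY><PRE>\n", out);
    demo_escape_html(in, out);
    fputs("</PRE></BODY></HTML>\n", out);

    fclose(in);
    return fclose(out) == 0 ? 0 : -1;
}
```

A main() that passes its first command-line argument to demo_convert_file() would turn this into a program that could be registered with a single %s.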

3. Using CVTDOC

As mentioned earlier in this article, the primary purpose of CVTDOC is to demonstrate the capabilities of ISAPI filters. Hopefully, presenting how this filter was built has made it clear how to create your own filters. If you don't need the specific functionality offered by CVTDOC, you can stop here, fire up Developer Studio and start hacking your own ISAPI filters.

However, based on what you now know about the functionality available in CVTDOC, it may have value to you in and of itself. Assuming you now would like to use CVTDOC on your own Web site, here are the instructions to do so. CVTDOC is included in the ActiveX SDK in the directory \INETSDK\SAMPLES\ISAPI\CVTDOC. We recommend downloading and installing the ActiveX SDK to ensure you have what you need for the sample. Currently, the ActiveX SDK can be downloaded from http://www.microsoft.com/intdev/sdk.

3.1. Installation

  1. Copy CVTDOC.DLL to an appropriate subdirectory, such as the CGI-BIN subdirectory of your Web content directory.

  2. Run REGEDT32.EXE to update the Filter DLLs parameter of IIS.

  3. Add the full path of CVTDOC.DLL to HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3SVC\Parameters\Filter DLLs (the DLLs are separated by commas).

  4. Create a Conversions subkey of the W3SVC\Parameters key.

  5. Add each extension (such as .XLS or .DOC) as a separate value.

  6. Enter the full path of the conversion program to run for each extension as the value's data.

  7. List each conversion program as taking two arguments of "%s %s", unless the conversion program automatically appends .HTM to the source filename to name its output file, in which case list a single "%s". For example, a value under the Conversions key might be named .DOC, with data of C:\WWWROOT\CGI-BIN\DOC2HTM.EXE %s.

  8. Place the conversion programs in the directories referenced. Sample conversion programs supplied are discussed below.
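
To make the role of the %s placeholders in step 7 concrete, here is a small C sketch of how a Conversions entry could be expanded into a command line. It is an illustration only and not necessarily how CVTDOC itself performs the substitution. The function name expand_conversion and its parameters are assumptions made for this example. The first %s is replaced with the source document, and the second, if present, with the HTML output file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand a Conversions entry such as
   "C:\\WWWROOT\\CGI-BIN\\DOC2HTM.EXE %s" into a command line.
   The first %s becomes src; the second, if present, becomes dst.
   Returns 0 on success, -1 if the result does not fit in buf. */
int expand_conversion(const char *tmpl, const char *src, const char *dst,
                      char *buf, size_t buflen)
{
    const char *args[2];
    size_t used = 0;
    int next = 0;

    if (buflen == 0)
        return -1;
    args[0] = src;
    args[1] = dst;

    while (*tmpl != '\0') {
        if (tmpl[0] == '%' && tmpl[1] == 's' && next < 2) {
            /* Substitute the next filename for this placeholder. */
            size_t len = strlen(args[next]);
            if (used + len >= buflen)
                return -1;
            memcpy(buf + used, args[next], len);
            used += len;
            next++;
            tmpl += 2;
        } else {
            /* Copy ordinary template characters through unchanged. */
            if (used + 1 >= buflen)
                return -1;
            buf[used++] = *tmpl++;
        }
    }
    buf[used] = '\0';
    return 0;
}
```

The substitution is done by hand instead of passing the registry data to a printf-style function as a format string, so that an unexpected % sequence in the entry is copied through literally.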

3.2. Conversion Programs

There are three conversion programs supplied with this sample:

  1. DOC2HTM.EXE converts Word documents (.DOC files) to HTML. It requires that Internet Assistant for Word (WordIA) be installed on the IIS server machine. It has only been tested with Word for Windows 95. Usage is: DOC2HTM.EXE <Word .DOC file>. An HTML file will be generated with the filename of the original Word document and an appended .HTM (for example, SPECIALS.DOC.HTM).

  2. XL2HTM.EXE converts Microsoft Excel spreadsheets (.XLS files) to HTML. It requires only Microsoft Excel 5.0 or later. Usage is: XL2HTM.EXE <Excel spreadsheet file>. Output is an HTML file with the extension .HTM (for example, SAMPLE.XLS.HTM). The area exported to the HTML file is the range named Export.

  3. TXT2HTML.BAT converts text files (.TXT files) to HTML, attempting to mark them up with HTML tags as well as possible. It invokes Seth Golub's TXT2HTML.PL Perl script to do this, which requires that Windows NT Perl be installed on the IIS server machine. Output is an HTML file with an .HTM extension.

These conversion programs are intended only as samples. You can use one installation of the CVTDOC filter to convert many different data file types. In fact, CVTDOC has the most value when it has conversions installed for uncommon file types that the user may not be able to handle. You should be able to find HTML conversion programs for almost any data format on the Internet and the World Wide Web. However, you may need to write "wrapper" batch files or programs that allow the conversion program to conform to the CVTDOC calling convention. This just means that the program must take two arguments, the first being the original document file and the second being the HTML output file. Alternatively, the program can take just one argument and generate HTML with the input filename and an appended .HTM (as do the three supplied conversion programs).

3.3. Usage

Embed a reference in your referring HTML page to the document or data filename with an appended .HTM. For example:

<HTML>
<HEAD><TITLE>Simple CVTDOC Example</TITLE></HEAD>
<BODY>
<H1>Welcome to the CyberStore</H1>
For maximum savings, please check out our 
<A HREF="specials.doc.htm">daily specials!</A>
</BODY>
</HTML>

The SPECIALS.DOC file will be converted automatically by the CVTDOC ISAPI filter if either the HTML file doesn't exist yet or it's older than the updated SPECIALS.DOC file. This allows the Webmaster to keep the HTML content current with very little intervention.

4. References

The following files and URLs will be useful references in your ISAPI filter development efforts. The first reference is the ActiveX SDK page on the Microsoft Web site. The remaining references are files and directories in the ActiveX SDK itself.

  1. ActiveX SDK: http://www.microsoft.com/intdev/sdk/

  2. ActiveX SDK ISAPI Filter Specification. After downloading and installing the SDK, the file location is \specs\isfilter.htm.

  3. ActiveX SDK CVTDOC ISAPI Filter Sample. After downloading and installing the SDK, the file location is \samples\isapi\cvtdoc.