Building an Internet Browser Using the Win32 Internet Functions

Dale Rogerson
Microsoft Developer Network Technology Group

April 1996


Abstract

This technical article discusses how to build an Internet browser using the Microsoft® Win32® Internet functions. The purpose of this article is to give the reader some idea of the use, power, and scope of the Win32 Internet functions, not to provide exhaustive documentation for these functions. The SurfBear sample application accompanying this article uses the Win32 Internet functions to read HTML files from an Internet server and to display them as raw, unformatted text.

Introduction

To paraphrase a friend of mine, you can't swing a squid without hitting the Internet. Computer magazines have devoted issues, and local newspapers have dedicated entire sections to the Internet. In fact, many newspapers are going online. Everyone seems to have a personal home page; even some homeless people have home pages. While much of the information printed about the Internet is hype, it's obvious that the Internet is becoming an integral part of computing.

Microsoft has introduced the Microsoft® Win32® Internet functions to assist developers in making the Internet an integral part of their applications. These new functions simplify accessing the Internet using FTP (File Transfer Protocol), Gopher, and HTTP (HyperText Transfer Protocol). Developers who use the Win32 Internet functions do not need to be familiar with TCP/IP or Windows® Sockets. For many common operations, developers need not know the details of the particular protocol they are using.

Eventually, the Win32 Internet functions will become part of the Win32 application programming interface (API) and ship with the various Windows-based platforms. Initially, the Win32 Internet functions will ship in a redistributable dynamic-link library called WININET.DLL (available from the Microsoft Internet SDK site at http://www.microsoft.com/intdev/sdk/).

This article explains how to use the Win32 Internet functions to build a simple Internet browser. The article does not discuss the functions in intimate detail, but it does give a preview of their use and operation. Please refer to the Microsoft Win32 Internet Functions site at http://www.microsoft.com/intdev/sdk/docs/wininet/ for complete details.

This article is accompanied by the SurfBear sample application that I wrote. SurfBear takes an HTTP address for an HTML file, connects to the server, downloads the HTML file, and displays the raw HTML file in an edit control. The article covers the Internet-specific portions of this process. It does not cover the display or manipulation of HTML files or the user interface issues involved in this process.

Note   This article is based on a very early version of WININET.DLL. It is very likely that the names of parameters, flags, and functions will change. However, the scope and intent of the functions should remain the same, as presented in this article.

The Internet Functions

The best way to approach the Win32 Internet functions is to jump right into the code. The code below is sample code, with error handling removed for readability.

HINTERNET hNet = ::InternetOpen("MSDN SurfBear",
                                PRE_CONFIG_INTERNET_ACCESS,
                                NULL,
                                INTERNET_INVALID_PORT_NUMBER,
                                0) ;

HINTERNET hUrlFile = ::InternetOpenUrl(hNet,
                                "http://www.microsoft.com",
                                NULL,
                                0,
                                INTERNET_FLAG_RELOAD,
                                0) ;

char buffer[10*1024] ;
DWORD dwBytesRead = 0;
BOOL bRead = ::InternetReadFile(hUrlFile,
                                buffer,
                                sizeof(buffer),
                                &dwBytesRead);

::InternetCloseHandle(hUrlFile) ;

::InternetCloseHandle(hNet) ;

The code listing above contains four Internet functions: InternetOpen, InternetOpenUrl, InternetReadFile, and InternetCloseHandle. Let's examine each of these functions in turn.

InternetOpen

InternetOpen initializes WININET.DLL. It is called before any other Win32 Internet function.

HINTERNET hNet = ::InternetOpen(
          "MSDN SurfBear",              // 1 LPCTSTR lpszCallerName
          PRE_CONFIG_INTERNET_ACCESS,   // 2 DWORD dwAccessType
          "",                           // 3 LPCTSTR lpszProxyName
          INTERNET_INVALID_PORT_NUMBER, // 4 INTERNET_PORT nProxyPort
          0                             // 5 DWORD dwFlags
) ;

InternetOpen returns a handle of type HINTERNET. Other Win32 Internet functions take this handle as a parameter. Currently, you cannot use an HINTERNET handle with other Win32 functions such as ReadFile. This may change in the future, as Internet support is moved into the Microsoft Windows® and Microsoft Windows NT® operating systems.

When you are finished using the Win32 Internet functions, you should call InternetCloseHandle to free the resources allocated by InternetOpen. Applications that use the Microsoft Foundation Class Library (MFC) will typically call InternetOpen from the document's constructor. Most applications will call InternetOpen once per process.
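One way to make sure every handle is closed, even on an early return, is to tie the close call to a C++ object's lifetime. The following is a minimal sketch of that idea, not code from SurfBear: ScopedHandle and its Closer parameter are hypothetical names, and the closer is assumed to forward to InternetCloseHandle when the wrapper is used with the Win32 Internet functions.

```cpp
#include <cassert>

// Hypothetical wrapper: owns one handle and closes it exactly once.
// Handle is any pointer-like handle type; Closer is a callable that releases
// it (with WININET.DLL it would forward to InternetCloseHandle).
template <typename Handle, typename Closer>
class ScopedHandle
{
public:
    ScopedHandle(Handle h, Closer close) : m_h(h), m_close(close) {}

    // Close the handle on scope exit; a null handle is left alone.
    ~ScopedHandle() { if (m_h) m_close(m_h); }

    Handle get() const { return m_h; }

    // One owner per handle, so copying is disabled.
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

private:
    Handle m_h;
    Closer m_close;
};
```

Because C++ destroys local objects in reverse order of declaration, a request handle declared after the session handle is closed before it.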

The first parameter to InternetOpen, lpszCallerName, identifies the application that is using the Internet functions. This name becomes the user agent when the HTTP protocol is used.

The second parameter, dwAccessType, specifies the access type. In the example above, the PRE_CONFIG_INTERNET_ACCESS access type instructs the Win32 Internet functions to use registry information to find a server. Using PRE_CONFIG_INTERNET_ACCESS requires the registry to be set up properly. I cheated here and let Internet Explorer set the registry up for me. If you don't want to cheat, you need to set up the registry as shown in Figure 1.

Figure 1. Setting up the registry

In the registry, setting AccessType to 1 means "go directly to the net." Setting AccessType to 2 means "use a gateway." Setting DisableServiceLocation to 1 causes it to use one of the named servers; otherwise, a server is found using the Registration and Name Resolution (RNR) APIs, which are part of Windows Sockets.

Additional access types include GATEWAY_INTERNET_ACCESS and CERN_PROXY_INTERNET_ACCESS. Both require the third parameter to InternetOpen: the server name (lpszProxyName). PRE_CONFIG_INTERNET_ACCESS doesn't require a server name because it looks in the registry for the server.

The nProxyPort parameter is used for CERN_PROXY_INTERNET_ACCESS and specifies the port number to use. Using INTERNET_INVALID_PORT_NUMBER is the same as supplying the default port number.

The last parameter, dwFlags, sets additional options. You can use the INTERNET_FLAG_ASYNC flag to indicate that future Internet functions using the returned handle will send status information to a callback function, which is set using InternetSetStatusCallback.

InternetOpenUrl

Once you initialize the Win32 Internet functions, you can use other Internet functions. The next Internet function to call is InternetOpenUrl. This function connects to an Internet server and prepares for reading data from the server. InternetOpenUrl can work with FTP, Gopher, or HTTP protocols. In this article, we are concerned only with the HTTP protocol.

HINTERNET hUrlFile = ::InternetOpenUrl(
          hNet,                       // 1 HINTERNET hInternetSession
          "http://www.microsoft.com", // 2 LPCTSTR lpszUrl
          NULL,                       // 3 LPCTSTR lpszHeaders
          0,                          // 4 DWORD dwHeadersLength
          INTERNET_FLAG_RELOAD,       // 5 DWORD dwFlags
          0                           // 6 DWORD dwContext
) ;

InternetOpenUrl also returns an HINTERNET, which is passed to functions operating on this URL. You should use InternetCloseHandle to close this handle.

The first parameter to InternetOpenUrl, hInternetSession, is the handle returned from InternetOpen. The second parameter, lpszUrl, is the URL of the resource that we want. In the example above, we would like to get Microsoft's Web page. The next two parameters, lpszHeaders and dwHeadersLength, are used to send additional information to the server. These parameters require knowledge of the particular protocol being used.

dwFlags is a flag that can modify the behavior of InternetOpenUrl in several ways, including turning off caching, enabling raw data, and using existing connections instead of opening new connections.

The last parameter, dwContext, is a DWORD context value that will be sent to the status callback function if one is specified. If this value is zero, information will not be sent to the status callback function.

InternetReadFile

You read a file after opening it, so it's only logical that the next function should be InternetReadFile:

char buffer[10*1024] ;
DWORD dwBytesRead = 0;
BOOL bRead = ::InternetReadFile(
     hUrlFile,                 // 1 HINTERNET hFile
     buffer,                   // 2 LPVOID lpBuffer
     sizeof(buffer) - 1,       // 3 DWORD dwNumberOfBytesToRead (room for the zero)
     &dwBytesRead              // 4 LPDWORD lpdwNumberOfBytesRead
);

buffer[dwBytesRead] = 0 ;
pEditCtrl->SetWindowText(buffer) ;

InternetReadFile takes the handle returned by InternetOpenUrl. It also works with the handles returned by other Win32 Internet functions, such as FtpOpenFile, GopherOpenFile, and HttpOpenRequest.

The remaining three parameters to InternetReadFile are straightforward. lpBuffer is a void pointer to a buffer that will hold the data, and dwNumberOfBytesToRead specifies the maximum number of bytes to read into that buffer. The final parameter, lpdwNumberOfBytesRead, is a pointer to a variable that will contain the number of bytes read into the buffer. If the return value is TRUE and lpdwNumberOfBytesRead points to a zero, the read has reached the end of the file. This behavior is identical to that of the Win32 ReadFile function. A real Web browser would loop on InternetReadFile, reading in blocks of data from the Internet.
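The loop that a real browser would run can be sketched as follows. This is an illustration under stated assumptions, not SurfBear's code: ReadAll and the ReadFn callable are hypothetical names, and ReadFn simply stands in for a call to InternetReadFile so that the loop logic can be shown without depending on WININET.DLL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: calls `read` until it reports success with zero bytes
// read (end of file), appending each block to `out`.
// `read(buf, size, &n)` is assumed to behave like InternetReadFile: it returns
// true on success and stores the number of bytes delivered in n.
template <typename ReadFn>
bool ReadAll(ReadFn read, std::string& out)
{
    char block[4096];
    for (;;)
    {
        unsigned long bytesRead = 0;
        if (!read(block, sizeof(block), &bytesRead))
            return false;               // read error: give up
        if (bytesRead == 0)
            return true;                // success with 0 bytes: end of file
        assert(bytesRead <= sizeof(block));
        out.append(block, bytesRead);   // a short read is fine; keep looping
    }
}
```

Because the result is collected in a std::string, the caller does not need to know the page size in advance, and the string's c_str() is already zero-terminated for display.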

To display the buffer, append a zero to the buffer and send it to an edit control.

Together, InternetOpen, InternetOpenUrl, and InternetReadFile build the foundation of an Internet browser. They make reading files off the Internet as easy as reading them off your local hard drive.

The HTTP Functions

In some instances, InternetOpenUrl is too generic, so you will need other Win32 Internet functions. InternetOpenUrl is a wrapper for various FTP, Gopher, and HTTP functions. When using HTTP, InternetOpenUrl calls InternetConnect, HttpOpenRequest, and HttpSendRequest. Let's say that we want to get the size of the HTML page before downloading it so that we can allocate a buffer of exactly the right size. HttpQueryInfo will get the size of the Web page.

A word of caution: Not all Web servers report the page size. (For example, www.toystory.com and www.movielink.com don't support this functionality.) Also, TCP/IP can deliver less data than requested in a single read. Therefore, your application should handle both of these cases and loop around InternetReadFile until the result is TRUE and *lpdwNumberOfBytesRead is 0.

The code to open the file http://www.microsoft.com/msdn/msdninfo/ using HttpOpenRequest, HttpSendRequest, and HttpQueryInfo is shown below. The error checking has been removed.

// Open Internet session.
HINTERNET hSession = ::InternetOpen("MSDN SurfBear",
                                    PRE_CONFIG_INTERNET_ACCESS,
                                    NULL, 
                                    INTERNET_INVALID_PORT_NUMBER,
                                    0) ;

// Connect to www.microsoft.com.
HINTERNET hConnect = ::InternetConnect(hSession,
                                    "www.microsoft.com",
                                    INTERNET_INVALID_PORT_NUMBER,
                                    "",
                                    "",
                                    INTERNET_SERVICE_HTTP,
                                    0,
                                    0) ;

// Request the file /MSDN/MSDNINFO/ from the server.
HINTERNET hHttpFile = ::HttpOpenRequest(hConnect,
                                     "GET",
                                     "/MSDN/MSDNINFO/",
                                     HTTP_VERSION,
                                     NULL,
                                     0,
                                     INTERNET_FLAG_DONT_CACHE,
                                     0) ;

// Send the request.
BOOL bSendRequest = ::HttpSendRequest(hHttpFile, NULL, 0, 0, 0);

// Get the length of the file.            
char bufQuery[32] ;
DWORD dwLengthBufQuery = sizeof(bufQuery);
BOOL bQuery = ::HttpQueryInfo(hHttpFile,
                              HTTP_QUERY_CONTENT_LENGTH, 
                              bufQuery, 
                              &dwLengthBufQuery) ;

// Convert length from ASCII string to a DWORD.
DWORD dwFileSize = (DWORD)atol(bufQuery) ;

// Allocate a buffer for the file, plus one byte for the terminating zero.
char* buffer = new char[dwFileSize+1] ;

// Read the file into the buffer, leaving room for the terminating zero.
DWORD dwBytesRead = 0 ;
BOOL bRead = ::InternetReadFile(hHttpFile,
                                buffer,
                                dwFileSize, 
                                &dwBytesRead);
// Put a zero on the end of the buffer.
buffer[dwBytesRead] = 0 ;

// Close all of the Internet handles.
::InternetCloseHandle(hHttpFile); 
::InternetCloseHandle(hConnect) ;
::InternetCloseHandle(hSession) ;

// Display the file in an edit control, then free the buffer.
pEditCtrl->SetWindowText(buffer) ;
delete [] buffer ;

InternetConnect

The InternetConnect function connects to an HTTP, FTP, or Gopher server:

HINTERNET hConnect = ::InternetConnect(
          hSession,                     //1 HINTERNET hInternetSession
          "www.microsoft.com",          //2 LPCTSTR lpszServerName
          INTERNET_INVALID_PORT_NUMBER, //3 INTERNET_PORT nServerPort
          "",                           //4 LPCTSTR lpszUsername
          "",                           //5 LPCTSTR lpszPassword
          INTERNET_SERVICE_HTTP,        //6 DWORD dwService
          0,                            //7 DWORD dwFlags
          0                             //8 DWORD dwContext
) ;

The sixth parameter, dwService, determines the service type (HTTP, FTP, or Gopher). In the example above, InternetConnect connects to an HTTP server because dwService is set to INTERNET_SERVICE_HTTP. The second parameter (set to www.microsoft.com) provides the address of the server. Notice that the HTTP address must be parsed for the server name; InternetOpenUrl parses it for us. The first parameter, hInternetSession, is the handle returned from InternetOpen. The fourth and fifth parameters supply a username and password. None of the flags controlled by the seventh parameter affect HTTP operations. The last parameter supplies contextual information to the status callback function.
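Splitting an HTTP address into the server name for InternetConnect and the object name for HttpOpenRequest might look like the sketch below. SplitHttpUrl is a hypothetical helper written for this illustration; it is not the parser used by InternetOpenUrl or SurfBear, and it deliberately ignores port numbers, user names, and schemes other than a lowercase "http://".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: splits "http://server/object" into the server name
// and the object name. Returns false if the address does not start with
// "http://" or has no server name.
bool SplitHttpUrl(const std::string& url, std::string& server, std::string& object)
{
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0)
        return false;                       // not an HTTP address

    const std::string::size_type start = scheme.size();
    const std::string::size_type slash = url.find('/', start);
    if (slash == std::string::npos)
    {
        server = url.substr(start);         // no path given: ask for the root
        object = "/";
    }
    else
    {
        server = url.substr(start, slash - start);
        object = url.substr(slash);
    }
    assert(object[0] == '/');               // object names always start at the root
    return !server.empty();
}
```

For the address used in this article, http://www.microsoft.com/msdn/msdninfo/, the helper yields the server name www.microsoft.com and the object name /msdn/msdninfo/.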

HttpOpenRequest

Once a connection has been established with a server, we open the desired file. The HttpOpenRequest and HttpSendRequest functions work together to open the file. HttpOpenRequest creates a request handle and stores the parameters in the handle. HttpSendRequest sends the request parameters to the HTTP service.

HINTERNET hHttpFile = ::HttpOpenRequest(
          hConnect,                 // 1 HINTERNET hHttpSession
          "GET",                    // 2 LPCTSTR lpszVerb
          "/MSDN/MSDNINFO/",        // 3 LPCTSTR lpszObjectName
          HTTP_VERSION,             // 4 LPCTSTR lpszVersion
          NULL,                     // 5 LPCTSTR lpszReferer
          0,                        // 6 LPCTSTR FAR * lplpszAcceptTypes
          INTERNET_FLAG_DONT_CACHE, // 7 DWORD dwFlags
          0                         // 8 DWORD dwContext
) ;

By now, many of the parameters to the Internet functions will look familiar. The first parameter to HttpOpenRequest is the HINTERNET returned by InternetConnect. The seventh and eighth parameters to HttpOpenRequest perform the same functions as the InternetConnect parameters that share these names.

The second parameter ("GET") specifies that we want to get the object named by the third parameter ("/MSDN/MSDNINFO/"). Because "GET" is the most popular verb type, HttpOpenRequest will also take a NULL pointer for the second parameter. The HTTP version is passed as the fourth parameter; currently, this must be HTTP_VERSION.

The fifth parameter, lpszReferer, is the address of the site where we found the URL we now want to see. In other words, if you're on www.home.com and you click a link that jumps to www.microsoft.com, the fifth parameter is "www.home.com" because it refers you to the target URL. This value can be NULL. The sixth parameter points to a list of content types that our program accepts. Passing NULL to HttpOpenRequest informs the server that only text documents are accepted.

HttpSendRequest

In addition to sending the request, HttpSendRequest allows you to send additional HTTP headers to the server. Information about HTTP headers can be found in the latest HTTP spec on http://www.w3.org/. In this example, HttpSendRequest is passed defaults for all the parameters:

BOOL bSendRequest = ::HttpSendRequest(
     hHttpFile, // 1 HINTERNET hHttpRequest
     NULL,      // 2 LPCTSTR lpszHeaders
     0,         // 3 DWORD dwHeadersLength
     0,         // 4 LPVOID lpOptional
     0          // 5 DWORD dwOptionalLength
);

HttpQueryInfo

To get information about the file, use the HttpQueryInfo function after calling HttpSendRequest:

BOOL bQuery = ::HttpQueryInfo(
     hHttpFile,                 // 1 HINTERNET hHttpRequest
     HTTP_QUERY_CONTENT_LENGTH, // 2 DWORD dwInfoLevel
     bufQuery,                  // 3 LPVOID lpvBuffer
     &dwLengthBufQuery          // 4 LPDWORD lpdwBufferLength
) ;

The results of the query are strings or lists of strings placed in lpvBuffer. The HTTP_QUERY_CONTENT_LENGTH query gets the length of the file. You can query for a broad range of information using HttpQueryInfo; see the Microsoft Win32 Internet Functions site at http://www.microsoft.com/intdev/sdk/docs/wininet/ for details.
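Because not every server reports a length, the string returned by the query deserves a check before it is used to size a buffer; the listing earlier passes bufQuery straight to atol with error handling removed. The following is a minimal sketch of such a check. ParseContentLength is a hypothetical helper, not part of the Win32 Internet functions, and it assumes the query result is a zero-terminated string of decimal digits.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: converts the text returned by a content-length query
// into a number. Returns false if the text is empty, is not a plain decimal
// number, or is out of range, so the caller can fall back to reading the
// page in fixed-size blocks instead.
bool ParseContentLength(const char* text, unsigned long& length)
{
    assert(text != 0);
    if (*text < '0' || *text > '9')
        return false;                   // empty, signed, or non-numeric
    errno = 0;
    char* end = 0;
    const unsigned long value = std::strtoul(text, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;                   // overflow or trailing characters
    length = value;
    return true;
}
```

A caller would use the parsed length only when both the query and the parse succeed, and otherwise loop on InternetReadFile until it reports the end of the file.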

SurfBear Sample Application

The SurfBear sample application uses the Win32 Internet functions to get files off the Internet and display the raw HTML in an edit control. SurfBear uses HttpOpenRequest and HttpSendRequest instead of InternetOpenUrl, purely for demonstration purposes.

Figure 2. SurfBear screen

SurfBear is an MFC version 4.0 dialog application. All of its Internet-related functionality is in the InternetThread.h and InternetThread.cpp files.

Reading files from the Internet can take a significant amount of time, so calling the Internet functions from a worker thread is a wise idea. This way, the application's window can be resized and moved while the system is waiting to get the data.

Figure 3 shows the flow of control for SurfBear.

Figure 3. Control flow in SurfBear

When the user presses the Goto button, CSurfBearDlg::OnBtnGoto calls CInternetThread::GetWebPage, passing the HTTP address of the desired Web page. GetWebPage parses the HTTP address into a server name and object name, which are stored in the member variables of CInternetThread. GetWebPage then calls AfxBeginThread, which creates a thread running the static member function GetWebPageWorkerThread. If the Internet functions have not been initialized, GetWebPageWorkerThread calls InternetOpen. It then attempts to read the desired Web page. When GetWebPageWorkerThread finishes, it posts a user-defined WM_READFILECOMPLETED message to the SurfBear dialog box. OnReadFileCompleted handles this message and copies the Web page into an edit control.

Conclusion

The Win32 Internet functions make reading information from FTP, Gopher, and HTTP servers as easy as reading files from your hard drive. Using only four functions—InternetOpen, InternetOpenUrl, InternetReadFile, and InternetCloseHandle—and very little knowledge of HTTP, you can write a simple Internet browser.

Turning this simple browser into an industrial-strength browser will take a lot of work, including knowledge of HTTP, displaying HTML files, and using multiple threads for progressive rendering of these files. The Win32 Internet functions isolate the developer from most of the grunge work involved in TCP/IP, Windows Sockets, and HTTP programming.