HOWTO: Prevent Cross-Site Scripting Security Issues in CGI or ISAPI

ID: Q253165


The information in this article applies to:
  • Microsoft Internet Explorer (Programming) versions 3.x, 4.x, 5, 5.01


SUMMARY

While dynamically generating Web pages, you might inadvertently introduce some security risks to clients that support scripting. Malicious script can be embedded within input submitted to Web servers. If the Web server returns this data to the client without modification, the client assumes that the script originated at the Web server. If the browser trusts the Web server, the script runs with that trust even though it did not originate at the Web server.

This problem is referred to as a cross-site scripting security issue. This article discusses cross-site scripting security issues, their ramifications, and their prevention in the context of ISAPI and CGI.


MORE INFORMATION

The Problem

Many Web servers dynamically generate HTML based on input that has not been checked for valid data. If input is not validated, then malicious script can be embedded within the data. If a server-side application such as a CGI script, ISAPI extension, or ISAPI filter returns HTML based on malicious input, the script runs in the browser as though the trusted site generated it. The following is one scenario:

  • User is asked to enter his or her name in a Web page (say, Joe)


  • Server application receives this name and then dynamically generates a Web page, similar to this:

    <html>Hello, Joe</html>

It is important to realize that instead of "Joe," the user could have entered malicious script, which would then be passed back to the browser by the server application without any validation.
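To make the scenario concrete, here is a minimal C sketch of the vulnerable pattern. BuildGreetingPage is a hypothetical name used only for illustration, not part of ISAPI or any library. Because the name is copied into the HTML unchanged, input such as <script>alert('x')</script> reaches the browser as live script instead of text.

```c
#include <stdio.h>

/* Hypothetical illustration of the vulnerable pattern: the name the user
   typed is copied into the HTML exactly as received. Returns the length
   of the generated page, as snprintf does. */
int BuildGreetingPage(const char *name, char *page, size_t pageSize)
{
    /* No validation or encoding happens here, so "<script>...</script>"
       in name becomes script that the browser will run. */
    return snprintf(page, pageSize, "<html>Hello, %s</html>", name);
}
```

With the input "Joe" the page is the harmless <html>Hello, Joe</html>; with a script as input, the same code emits <html>Hello, <script>alert('x')</script></html>.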

Ramifications

If input to your dynamic Web pages is not validated, you might encounter the following:
  • Data integrity can be compromised
  • Illegal objects can be installed and executed
  • Cookies can be set and read
  • User input can be intercepted
  • Malicious scripts can be executed by the client in the context of the trusted source
What server applications are at risk? The problem affects any dynamic page created from input that was not validated. In this article, "server application" means any .exe or .dll that the server executes and that generates output directly to the Web browser. The most typical applications are:
  • Search engines that return results pages based on user input


  • Login pages that store user accounts in databases, cookies, and so forth, and later write the user name out to the client


  • Web forms that process and return unvalidated information

Prevention

You need to evaluate your specific situation to determine which techniques work best for you.

NOTE: In all techniques, you are validating data that you receive from input, and not your trusted script. Essentially, prevention means that you follow good coding practice by running sanity checks on input to your Web application. The following general approaches for preventing cross-site scripting attacks are presented here:
  • Avoid sending non-trusted text as part of an output stream from Web applications


  • Filter input parameters for special characters


  • Filter output based on input parameters for special characters


  • Encode output based on input parameters
Avoid sending non-trusted text as part of an output stream

Data inserted into an output stream originating from a server appears to a client application as originating from that server. Consider hard-coding your output rather than dynamically generating output based on submitted data. For example, if you have a Web page that accepts an input parameter and writes the user's name to the page, such as "Hello Fred," you might consider writing out something more generic, such as "Hello user."
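When a page must still vary with its input, one way to apply this idea is to let the input only select among strings hard-coded in the application, so the submitted text itself is never written out. The sketch below assumes a hypothetical "lang" parameter and a hypothetical SelectGreeting function; the greeting table is illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* Trusted output: every string the page can emit is hard-coded here. */
struct Greeting { const char *lang; const char *html; };

static const struct Greeting kGreetings[] = {
    { "en", "<html>Hello user.</html>" },
    { "fr", "<html>Bonjour, user.</html>" },
};

/* Hypothetical helper: the request parameter is compared against known
   values but never copied into the page. Unknown input (including script)
   falls back to the first, generic greeting. */
const char *SelectGreeting(const char *lang)
{
    size_t i;

    for (i = 0; lang != NULL && i < sizeof kGreetings / sizeof kGreetings[0]; i++) {
        if (strcmp(lang, kGreetings[i].lang) == 0)
            return kGreetings[i].html;
    }
    return kGreetings[0].html;
}
```

Because the returned pointer always refers to one of the table entries, nothing the user submits can reach the output stream.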

Filter input parameters for special characters

To filter input, remove some or all "special" characters from your input. Special characters are characters that enable script to be generated within an HTML stream. Special characters include the following:

< > " ' % ; ) ( & + - 
Note that your individual situation may warrant the filtering of additional characters or strings beyond the special characters noted above. While filtering can be an effective technique, there are a few caveats:
  1. Filtering may not be appropriate for some input. For example, in scenarios where you are receiving <TEXT> input from an HTML form, you might instead choose a method such as encoding (see below).


  2. Some filtered characters might actually be required input to server-side script. If you use a filter, specify a character set (the "charset" parameter in HTTP) for your Web pages so that your filter checks for the appropriate special characters. Your filter should remove byte sequences that are considered special in that specific character set. A popular charset is ISO 8859-1, which was the default in early versions of HTML and HTTP.
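As a minimal sketch of input filtering, the following function removes every character from the special-character list above. FilterSpecialChars is a hypothetical name. It works byte by byte, so it assumes a single-byte character set such as ISO 8859-1; multibyte character sets would need a filter that understands their byte sequences.

```c
#include <string.h>

/* The special characters listed above. Extend this set if your
   situation calls for filtering more characters. */
static const char kSpecialChars[] = "<>\"'%;)(&+-";

/* Hypothetical helper: removes every special character from s in place
   and returns s (or NULL if s is NULL). Operates on single bytes, so it
   assumes a single-byte charset such as ISO 8859-1. */
char *FilterSpecialChars(char *s)
{
    char *src, *dst;

    if (s == NULL)
        return NULL;
    for (src = dst = s; *src != '\0'; src++) {
        if (strchr(kSpecialChars, *src) == NULL)
            *dst++ = *src;        /* keep ordinary characters */
    }
    *dst = '\0';
    return s;
}
```

For example, the input Joe<script> becomes Joescript, which the browser displays as plain text.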


Filter output based on input parameters for special characters

This technique is similar to filtering input, except that you filter characters that will be written out to the client. While this can be an effective technique, it might present a problem for Web pages that write out HTML elements. For example, on a page that writes out <TABLE> elements, a generic function that removes the special characters would strip the < and > characters, thus ruining the <TABLE> tag. Therefore, for this technique to be useful, apply the filter only to data passed in as input or to data that a user previously entered (for example, data stored in a database), not to the markup your application generates itself.
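The sketch below shows that distinction: the <TABLE> markup comes from a trusted format string and is written unchanged, while only the user-supplied cell value is filtered at output time. WriteUserCell is a hypothetical helper, and the 255-byte cell limit is an arbitrary assumption for the example.

```c
#include <stdio.h>
#include <string.h>

/* Same special characters as in the input-filtering discussion above. */
static const char kSpecial[] = "<>\"'%;)(&+-";

/* Hypothetical helper: writes a one-cell table into out. Only userValue,
   which came from input or from a database, is filtered; the <TABLE>
   markup in the format string is trusted and written unchanged.
   User data longer than 255 bytes is truncated. */
int WriteUserCell(const char *userValue, char *out, size_t outSize)
{
    char cell[256];
    size_t j = 0;
    const char *p;

    if (userValue == NULL)
        userValue = "";
    for (p = userValue; *p != '\0' && j < sizeof cell - 1; p++) {
        if (strchr(kSpecial, *p) == NULL)   /* keep only non-special bytes */
            cell[j++] = *p;
    }
    cell[j] = '\0';
    return snprintf(out, outSize, "<TABLE><TR><TD>%s</TD></TR></TABLE>", cell);
}
```

Given the stored value <b>Fred</b>, the page keeps its table tags but the cell contains only bFred/b.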

Encode output based on input parameters for special characters

Encode data received as input when you write it out as HTML. This technique is effective on data that, for some reason, was not validated during input. By using techniques such as HTML encoding and URL encoding, you can prevent malicious script from executing. HTML encoding replaces special characters such as < > & " with the strings &lt; &gt; &amp; &quot;, which the browser displays as text instead of interpreting as markup. URL encoding replaces characters that are not safe in a URL with a percent sign followed by their hexadecimal value, and replaces spaces with "+" in query strings. So "Hello, World!" might be sent as "Hello,+World%21". The following function demonstrates how to HTML-encode output data in C; it can be used in ISAPI extensions, ISAPI filters, or CGI programs that write user input back to the browser. Note that this function encodes only < > & and ". You may need to encode other characters as well.

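#include <windows.h>	// DWORD
#include <stdlib.h>	// malloc
#include <string.h>	// memcpy

// Encodes < > & and " in pStrIn as HTML character entities.
// On return, *ppStrOut points to a new buffer that the caller must free(),
// or is NULL if pStrIn is NULL or memory could not be allocated.
void HTMLEncode (const char *pStrIn, char** ppStrOut){
	char *pTmp;
	DWORD i, TotLen = 0;
	*ppStrOut = NULL;
	if (pStrIn == NULL)
		return;
	// First pass: compute the length of the encoded string.
	for (i=0;pStrIn[i];i++)    {
		switch (pStrIn[i])        {
		case '<':
		case '>':
			TotLen += 4;	// &lt; or &gt;
			break;
		case '&':
			TotLen += 5;	// &amp;
			break;
		case '\"':
			TotLen += 6;	// &quot;
			break;
		default:
			TotLen++;
			break;
		}
	}
	*ppStrOut = (char *) malloc (TotLen+1);
	if (*ppStrOut == NULL)
		return;
	// Second pass: copy the input, replacing special characters with entities.
	pTmp = *ppStrOut;
	for (i=0;pStrIn[i];i++)   {
		switch (pStrIn[i])        {
		case '<':
			memcpy ((void*) pTmp, "&lt;",4);
			pTmp+=4;
			break;
		case '>':
			memcpy ((void*) pTmp, "&gt;",4);
			pTmp+=4;
			break;
		case '&':
			memcpy ((void*) pTmp, "&amp;",5);
			pTmp+=5;
			break;
		case '\"':
			memcpy ((void*) pTmp, "&quot;",6);
			pTmp+=6;
			break;
		default:
			*pTmp=pStrIn[i];
			pTmp++;
		}
	}
	*pTmp=0;
}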
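The paragraph above also mentions URL encoding, which applies when user data is placed inside a URL, for example in a query string. The following is a minimal sketch of form-style URL encoding, not a Windows API; URLEncode is a hypothetical name. It keeps letters, digits, and "-._~", turns spaces into "+", and percent-encodes every other byte. Because it treats the comma as unsafe too, "Hello, World!" comes out as "Hello%2C+World%21".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: letters, digits and "-._~" are copied, a space
   becomes '+', and every other byte becomes %XX (uppercase hex).
   Returns the encoded length, or -1 if out is too small. */
int URLEncode(const char *in, char *out, size_t outSize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t j = 0;

    if (outSize == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isalnum(c) || strchr("-._~", c) != NULL) {
            if (j + 1 >= outSize) return -1;
            out[j++] = (char)c;
        } else if (c == ' ') {
            if (j + 1 >= outSize) return -1;
            out[j++] = '+';
        } else {
            if (j + 3 >= outSize) return -1;   /* room for %XX plus '\0' */
            out[j++] = '%';
            out[j++] = hex[c >> 4];
            out[j++] = hex[c & 0x0F];
        }
    }
    out[j] = '\0';
    return (int)j;
}
```

Unlike HTMLEncode, this version writes into a caller-supplied buffer and reports overflow instead of allocating, which keeps ownership simple in an ISAPI extension.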

Here is an example of ISAPI code that is susceptible to cross-site scripting:

DWORD WINAPI HttpExtensionProc (EXTENSION_CONTROL_BLOCK *lpEcb){
	char szTemp [2048];
	DWORD dwSize;
	…
	szTemp[0] = '\0';
	if (*lpEcb->lpszQueryString)
		// UNSAFE: the query string is written to the client exactly as received.
		wsprintf (szTemp,"Query_String: %s", lpEcb->lpszQueryString);
	dwSize = lstrlen(szTemp);
	lpEcb->WriteClient(lpEcb->ConnID, szTemp, &dwSize, 0);
	…
}
Here is the corrected code:

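DWORD WINAPI HttpExtensionProc (EXTENSION_CONTROL_BLOCK *lpEcb){
	char szTemp [2048], *szOut = NULL;
	DWORD dwSize;
	…
	szTemp[0] = '\0';
	if (*lpEcb->lpszQueryString) {
		// Encode the untrusted query string before it reaches the client.
		HTMLEncode (lpEcb->lpszQueryString, &szOut);
		if (szOut)
			// %.1000s bounds how much encoded text is copied into szTemp.
			wsprintf (szTemp,"Query_String: %.1000s", szOut);
	}
	dwSize = lstrlen(szTemp);
	lpEcb->WriteClient(lpEcb->ConnID, szTemp, &dwSize, 0);
	free (szOut);	// free(NULL) is a no-op
	…
}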
Please note that the issue equally affects MFC ISAPI extensions, MFC ISAPI filters, and CGI.

Possible Sources of Malicious Data

While the problem applies to any page that uses input to dynamically generate HTML, the following are possible sources of malicious data to help you locate potential security risks:
  • Query_String
  • Cookies
  • Posted data
  • URLs and pieces of URLs, such as PATH_INFO
  • Data retrieved from users that is persisted in some fashion such as in a database
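For a CGI program, the Web server passes QUERY_STRING and PATH_INFO to the program as environment variables, so both arrive untrusted. The sketch below encodes them before echoing them. HtmlEncodeToBuffer and WriteCgiEcho are hypothetical names; the encoder also turns the single quote into &#39; as an example of encoding an additional character, and the 1024-byte buffers are an arbitrary assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: HTML-encodes < > & " and ' from in into out.
   A NULL input encodes as "". Returns 0 on success, or -1 (with out
   set to "") if out is too small. */
int HtmlEncodeToBuffer(const char *in, char *out, size_t outSize)
{
    size_t j = 0;

    if (outSize == 0)
        return -1;
    for (; in != NULL && *in != '\0'; in++) {
        const char *rep = NULL;          /* entity to emit, if any */
        size_t len;

        switch (*in) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        default:                   break;
        }
        len = rep != NULL ? strlen(rep) : 1;
        if (j + len >= outSize) {        /* keep room for the terminator */
            out[0] = '\0';
            return -1;
        }
        if (rep != NULL)
            memcpy(out + j, rep, len);
        else
            out[j] = *in;
        j += len;
    }
    out[j] = '\0';
    return 0;
}

/* Sketch of a CGI response: both environment values are treated as
   untrusted and encoded before they are written to stdout. */
void WriteCgiEcho(void)
{
    char query[1024], path[1024];

    if (HtmlEncodeToBuffer(getenv("QUERY_STRING"), query, sizeof query) != 0)
        strcpy(query, "(too long)");
    if (HtmlEncodeToBuffer(getenv("PATH_INFO"), path, sizeof path) != 0)
        strcpy(path, "(too long)");

    printf("Content-Type: text/html\n\n");
    printf("<html><body>Query: %s<br>Path: %s</body></html>\n", query, path);
}
```

A CGI main function would simply call WriteCgiEcho(). Cookies and posted data reach a CGI program by other routes and need the same treatment.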

Conclusion

The following are key points to remember regarding the cross-site scripting security problem:
  • The problem affects dynamic page creation based on input that was not validated.


  • Omission of a sanity check on input data can have unexpected security implications. The problem is preventable through good development standards such as input validation.


  • You need to evaluate solutions on a per-page basis and use the technique that makes sense for each page.


REFERENCES

For additional information, please see:

CERT Advisory CA-2000-02

Additional query words: Cross-site; CERT Advisories; CERT; CA-2000-02; Security

Keywords : kbCGI kbIE kbISAPI kbIIS
Version : WINDOWS:3.x,4.x,5,5.01
Platform : WINDOWS
Issue type : kbhowto


Last Reviewed: February 2, 2000
© 2000 Microsoft Corporation. All rights reserved.