MIND


GEEK

Robert Hess

GEEK Your June 1999 column explained how spammers harvest email addresses off of Web sites. My 61-year-old mother was having the same problem recently. Her solution: display your email address in a GIF file and it won't be harvested.
GEEK That's a great solution, but unfortunately only to a point. In most situations, the email address isn't just displayed as text (or, in your mother's case, as a GIF) on the page; it is also encoded into the page's HTML to make it clickable:

 <a href="mailto:MyEmail@server.com">MyEmail@server.com</a>
Using a GIF to display the address would solve only part of the problem, because the address is still sitting in the mailto URL. And in many situations the page shows only a clickable link that launches an email message, like this:

 <a href="mailto:MyEmail@server.com">Send me some email</a>
Rendering this as a GIF just turns "Send me some email" into an image, which doesn't hide anything at all.

    I would expect most spiders to parse the HTML for the mailto value and capture that instead. This is a lot easier than searching the entire text of the page for an @ and trying to decide whether each match is an email address or something else.
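    You can check what a simple harvester would pick up from one of your own pages. Below is a minimal Python sketch that uses the standard library's html.parser. It reflects an assumption about how such a spider might work and is not code from any real one. harvest_mailto is a hypothetical name, and email.gif is a placeholder image. The parser reads only href values, so turning the visible link text into a GIF changes nothing it sees.

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Collects the address part of every <a href="mailto:..."> on a page.

    Hypothetical harvester sketch: it ignores visible text entirely and
    looks only at href attributes.
    """

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us lowercased tag and attribute names.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the "mailto:" scheme and any ?subject=... query part.
                address = value[len("mailto:"):].split("?", 1)[0]
                self.addresses.append(address)


def harvest_mailto(html):
    collector = MailtoCollector()
    collector.feed(html)
    collector.close()
    return collector.addresses


# The link text is an image, but the address is still in the href.
page = '<a href="mailto:MyEmail@server.com"><img src="email.gif"></a>'
print(harvest_mailto(page))
```

    The same function finds the address behind a link that reads "Send me some email", which is the point above: the mailto value is the easy target.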

    One of my own Web pages uses a redirected email address of the form info@mysite.org. This address doesn't really exist as a mailbox; any email sent to it is automatically forwarded to my real email address. I've had the site up for over a year now and haven't yet received any spam at that address from a harvester, so I'm not sure how pervasive these spiders actually are.

    If you think people are extracting email addresses from your pages programmatically, you might want to run a few tests to discover how best to defeat them. Create a few email addresses on some of the free Web-based email systems and don't use these addresses for anything else. Then hide them on your page like so:


 <a href="mailto:email1@freemail.com">
<font size=1 color="white">email1@freemail.com</font>
</a>
 <a href="mailto:email2@freemail.com">
<font size=1 color="white">info</font>
</a>
 <a href="mailto:email3@freemail.com">
<img src="clear.gif" height=1 width=1>
</a>
 </body></html>
Put this code at the very bottom of your page, using a font size and color that make it virtually invisible (white text assumes a white page background). A spider program would have no way of knowing that these weren't visible email addresses. You could then watch these accounts to see which ones ended up receiving email.
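    If you rotate the test accounts now and then, a small generator keeps the three hiding variants consistent. Each address then maps to exactly one technique. Below is a minimal Python sketch. trap_links is a hypothetical name and the addresses are placeholders. clear.gif is assumed to be a 1x1 transparent image on your server, as in the listing above.

```python
from html import escape


# Hypothetical helper: emits the three hidden trap links from the column,
# one test address per hiding technique.
def trap_links(text_addr, decoy_addr, image_addr):
    # Escape the addresses so they are safe inside attributes and text.
    a1, a2, a3 = (escape(a, quote=True) for a in (text_addr, decoy_addr, image_addr))
    lines = [
        # 1. Address in the href AND as tiny white visible text.
        f'<a href="mailto:{a1}">',
        f'<font size=1 color="white">{a1}</font>',
        "</a>",
        # 2. Address only in the href; the visible text is just "info".
        f'<a href="mailto:{a2}">',
        '<font size=1 color="white">info</font>',
        "</a>",
        # 3. Address only in the href; the link is a 1x1 clear image.
        f'<a href="mailto:{a3}">',
        '<img src="clear.gif" height=1 width=1>',
        "</a>",
    ]
    return "\n".join(lines)


snippet = trap_links("email1@freemail.com", "email2@freemail.com", "email3@freemail.com")
print(snippet)
```

    Only the first address appears as text, so spam arriving at the second or third account means the spider is reading mailto values. Spam at the first account alone suggests it is scanning the visible text.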

GEEK You made some good suggestions about making it more difficult for those nasty spiders to get at email addresses on a Web page. Another approach that you might consider is storing the image of the email address as a GIF file. Chances are that, for a while, Web crawlers won't be using OCR techniques to analyze Web pages.
GEEK Good idea, but somebody's 61-year-old mother already beat you to it.

GEEK I have an application that uses the File input type to upload files to the server. It works fine if the user selects the file manually with the Browse button, but I cannot set the element's value programmatically. I even tried setting the value attribute directly in the tag, but it didn't work:


 <INPUT type=file id=userfile name=userfile size=64 align=left value="C:\autoexec.bat">
    I overlaid a textbox on the control and set its value to c:\autoexec.bat to make it look as though it had worked, then programmatically set the defaultValue property of the file element. That seemed to work when the Submit button was pressed, but the actual file was not uploaded. Instead, the server received a placeholder file (the best description I can give) with the same name as the file I wanted, containing some hex code, along with the value of the textbox (the file name c:\autoexec.bat).
GEEK Nice idea. Lots of other people would love to do this as well. Unfortunately, some of them have less than pure motives. The problem is that you want to preload the FileUpload element with the name of a file the user actually wants to upload, but a page could just as easily preload it with any file name at all. I'm sure that on any given system there are files the user really doesn't want sent automatically to some random Web site.

    You might argue that the user would clearly see the file you were preparing to transfer, but unfortunately it would be far too easy to hide. The easiest way would be to embed the form in a very long page, with the Submit button and some ordinary entry fields fitting within a normal screen height. Below that you could put a lot of blank space stretching way off the bottom of the screen, with the FileUpload element at the very end. The user would probably never see it. Even simpler, add:


  style="display:none"
to the element, which would totally hide it from view.

    It is sad that such restrictions are needed to safeguard us from some sick individuals out there, but that's the way it is.
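    To make the two tricks concrete, below is a minimal Python sketch that scans a page's HTML for file inputs with a preset value attribute or an inline display:none style. audit_file_inputs is a hypothetical name. The sketch inspects only inline attributes, so stylesheet rules and script changes are invisible to it. It also changes nothing about the point above: the browser, not the page, decides which file gets sent.

```python
from html.parser import HTMLParser


class FileInputAudit(HTMLParser):
    """Flags <input type=file> tags that try to preset or hide themselves.

    Assumption: only inline attributes are checked; CSS in stylesheets and
    script-driven changes are not seen.
    """

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":  # tag names arrive lowercased
            return
        # Attribute names arrive lowercased; valueless attributes give None.
        attr = {name: value or "" for name, value in attrs}
        if attr.get("type", "").lower() != "file":
            return
        label = attr.get("name") or attr.get("id") or "(unnamed)"
        if attr.get("value"):
            self.findings.append((label, "preset value " + attr["value"]))
        style = attr.get("style", "").replace(" ", "").lower()
        if "display:none" in style:
            self.findings.append((label, "hidden with display:none"))


def audit_file_inputs(html):
    parser = FileInputAudit()
    parser.feed(html)
    parser.close()
    return parser.findings


# Raw string so the backslash in the Windows path is kept literally.
page = r'<INPUT type=file id=userfile name=userfile size=64 align=left value="C:\autoexec.bat">'
print(audit_file_inputs(page))
```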

From the August 1999 issue of Microsoft Internet Developer.