Tracking and Fighting Spam: A Primer for Postmasters -- MIND December 1999

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

This article assumes you're familiar with email

Tracking and Fighting Spam:
A Primer for Postmasters
R'ykandar Korra'ti

Unauthorized email floods are one of the biggest headaches faced by system administrators of Internet-connected networks. There are some effective strategies for reducing their drain on your resources.

Spammers are thieves. They can argue about this all they want. They're hijacking your system to deliver their unrequested, unwanted advertising. If you pay for your bandwidth consumption, then they're charging you for the privilege. If you don't, then they're charging your ISP for the privilege, and you can be sure that's included in your monthly fee, one way or another.
      The allegedly "legitimate" spammers have a bit more of a case; they don't hide where their mail is coming from, and they at least pretend to offer a way off their lists. But I'm not talking about that sort. I'm talking about the run-of-the-mill spammers. The pedestrian make-money-fast/mortgage fraud/gambling/pornmeisters who forge everything they can in the header, those who dump email on unsuspecting third parties to deliver for them—thus stealing from even more people than do "legitimate" spammers. These are the people who've forced site administrators to shut down relay services on their machines to stem the flow, thereby defeating a useful design function of the Internet.
      What does all this mean to you? If you want to track down spammers and get them shut down, you have the moral right, even, perhaps, the duty to do so. OK, maybe it's just a desire to see a spammer get hammered. Whatever. It's all right; they're the villains here.
      So where do you start? With a few specific tools available on the Web and a bit of analysis of the header, you'll be ready to fight back.

The Tools
      The important tools (or more properly, the Internet protocol services that various tools implement) are ping, NSLookup, and WhoIs. Of these, WhoIs is the most important. Traceroute, which I'll cover only briefly, is occasionally handy as well.
      The ping protocol is sufficiently well known that it doesn't need coverage here. Traceroute, on the other hand, isn't as familiar; it traces the route between you and a remote host, showing all the machines between you and the remote server. At each hop, it will provide the IP address of the machine that's routing its packets. This is particularly handy when you've got the real IP address of a spammer, and you want to know who is providing their upstream IP connectivity. A domain that is reticent about resolving spam issues may become more cooperative when faced with an angry upstream IP provider threatening to kick them off the net.
      NSLookup (or Name Service Lookup) is a protocol that allows users to find the name of a machine given its IP address, or the IP address of a machine given its name. It uses your local Domain Name Service (DNS) to get its information, and works off "live" data, updated more or less continually. If you don't have DNS running, you'll have to find someone running a copy that they'll let you access. This may be a problem if you have an ISP with extremely limited IP services or if you're on the private side of a restrictive firewall. In addition, NSLookup only works if the machine the spammer is using reports its name, and often it doesn't.
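If you'd rather script these lookups than use a GUI tool, the two directions NSLookup covers can be sketched in Python with the standard socket module. This is a minimal sketch, not a replacement for a real tool: the function names forward_lookup and reverse_lookup are my own, and the resolver parameter is an assumption added only so the lookup can be swapped out when testing.

```python
import socket


def forward_lookup(name, resolver=socket.gethostbyname):
    """Return the IPv4 address for a host name, or None if it doesn't resolve."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the primary host name registered for an IP address, or None."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # No name on record, or no DNS reachable from where you are.
        return None
```

Calling reverse_lookup with an IP address from a header asks whatever DNS your machine is configured to use, so the answer reflects live data and can change from day to day.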
      WhoIs—the most important protocol for spam tracing—doesn't necessarily require direct access to a DNS, if you know the numeric IP addresses of the WhoIs servers you're referencing. Generally, you'll need to be on the Internet side of any firewalls you've got running, which means you'll likely have DNS access anyway. WhoIs lets you find which domain owns a particular IP address or subnet (a subnet being a contiguous group of IP addresses). Typically, you'd use it to find the owner of a particular IP address. Unlike NSLookup, WhoIs works off of "authoritative" data—information provided to one of a set of official parties (such as the InterNIC) when a domain was originally created. This makes WhoIs more useful than NSLookup in most cases because spammers will often work from machines that have not been provided names, or at least not names that your nameserver knows.
      For the purposes of spam tracing, the important WhoIs servers are whois.internic.net, whois.apnic.net, whois.arin.net, and whois.ripe.net. These serve different IP ranges and sets of domains (and in some cases, geographical regions) on the net, and do not contain overlapping data. Often you'll have to try all four before getting the information you need. You can also get a large, updated list of WhoIs servers (most of which you'll never need to contact) by executing a lookup on whois-servers on the sipb.mit.edu server—a list maintained (of course) by MIT.
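The WhoIs protocol itself is simple: open a TCP connection to port 43, send the query followed by a line ending, and read until the server hangs up. Here is a rough Python sketch of working through the server list. The function names and the connect and lookup parameters are hypothetical (they exist so the network can be faked in a test), and the host names are the ones listed above, which may not all answer today.

```python
import socket

# Servers named in the article; each covers different address ranges or domains.
WHOIS_SERVERS = [
    "whois.internic.net",
    "whois.arin.net",
    "whois.ripe.net",
    "whois.apnic.net",
]


def whois_query(query, server, port=43, timeout=10.0,
                connect=socket.create_connection):
    """Send one WhoIs query and return the server's reply as text."""
    with connect((server, port), timeout) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def try_all_servers(query, servers=WHOIS_SERVERS, lookup=whois_query):
    """Query each server in turn, skipping any that can't be reached."""
    replies = {}
    for server in servers:
        try:
            replies[server] = lookup(query, server)
        except OSError:
            continue
    return replies
```

The sketch collects every reply rather than stopping at the first one, because a server that doesn't own the address will still answer, just not usefully; you'll have to read the replies to see which one actually has your data.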
      So why use NSLookup at all? Occasionally it works, it's a lot faster, and you only have to check against one server. With WhoIs, you may have to work through a list of authoritative servers to find your domain owner; you don't generally have to do that with NSLookup. And if you suspect the spammer was working from a large ISP, the DNS information will be just as valuable.
      Which tool you pick for these services isn't important, as long as it works. The screenshots in this article are from CyberKit 2.4a by Luc Neijens. It's available from several places, including http://tucows.holler.net/dns95.html. I'll be sending him a postcard in payment later this week.

The Header
      First, you need to know a few basic rules about spam mail and its header lines:

1. Forget the From and Reply-To lines. They're always forged. And if there is (by unlikely chance) somebody actually at either of those addresses, they're being mail-bombed by people who don't follow this cardinal rule. This happened to a friend of mine earlier this year, and it's not fun. So don't bother.

2. Any header line with an X- in front of it, such as X-Verified:Sender, can also be ignored. These are like the aluminum foil strips B-52s used to throw out behind them on bombing runs: just chaff to confuse your radar.

3. Many spammers now claim that you can reply to their From or Reply-To addresses with a removal demand to get removed from their list. Or sometimes, they give you a special address to use in the message body. This is almost always a lie. The address usually doesn't exist, or occasionally is some poor random schmo who gets mail-bombed by people demanding to be taken off spam lists (see Rule 1).

4. If the address does exist, the spammer will not remove you from their lists. They will use your reply as confirmation that the address is good, and build new, improved spam databases with it.

Got that? Great. There is a header line that actually does matter: the Received line.

Received lines contain all the data you need. Each one contains, in order, the name of the system that handed your system the email (as provided by the sender), the actual IP address of that machine (as provided by the receiver), the name of the receiving system, occasionally an intended recipient, and the time and date the handoff took place. Each time a mail system handles a piece of email—forwards it, receives it, and so on—a new Received line is added to the top of the list. This is a requirement of the RFCs governing how Internet mail works. You'll note that there are several Received lines in the typical piece of spam mail—usually more than would be found in legitimate mail, but not always.
      However many Received lines are present, they almost always contain enough data to find out at least where the spammer dropped off the mail for delivery. Usually it's enough to figure out the actual system of origin as well. Spammers realized this long ago, so they typically throw in extra, forged Received lines in an attempt to confuse the mail's recipient. But assuming your mail system is decently configured, you always have a starting point: the Received line added by your mail server. It's always the one at the very top of the list.
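If you have the raw message saved as text, pulling out the Received lines in order takes only a few lines of Python with the standard email package. This is a sketch under the assumption that the message arrives as one string with its header intact; the function name received_lines is my own.

```python
from email import message_from_string


def received_lines(raw_message):
    """Return the Received header values, newest (top of the message) first."""
    msg = message_from_string(raw_message)
    # get_all keeps the headers in the order they appear in the message,
    # so the first entry is the one added by your own mail server.
    values = msg.get_all("Received") or []
    # Long headers are folded across several lines; collapse the whitespace.
    return [" ".join(value.split()) for value in values]
```

The first item in the returned list is your starting point; everything after it gets less trustworthy the further down you go.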

Decoding Sample Spam
       Figure 1 shows the header from a piece of mail I got just recently, as displayed in Outlook® 97.

Figure 1: Spam Header in Outlook

This is actually a fairly restrained piece of spam. There are only a few pieces of outright goofiness in the header, most notably a From line which appears to be Aztec in origin—but as noted before, you can ignore that. Start at the first (top) Received line.


 Received: from euromar-travel.com (root@[194.65.2.129])
 by anvilite.murkworks.net (8.6.12/8.6.9) with ESMTP id VAA29070 for
 <kiki@murkworks.net>; Thu, 29 Jul 1999 21:11:13 -0700

This much I know is true: it was received by me (anvilite.murkworks.net). That part was placed by the SMTP handler on my machine. Where it's from is another matter completely. Fortunately, part of that information was also placed by my machine—specifically, the portion in parentheses. Broken down and printed in color—red meaning provided by someone else, green meaning provided by my machine, and black meaning uninteresting in this scenario—it looks like this:


 Received: from euromar-travel.com (root@[194.65.2.129])
 by anvilite.murkworks.net (8.6.12/8.6.9) with ESMTP id VAA29070 for 
 <kiki@murkworks.net>; Thu, 29 Jul 1999 21:11:13 -0700

Here's how it looks generalized into a template:


 Received: from <sending machine, as provided by that machine; untrustworthy>
 (<sending machine's name (sometimes) and IP address (always),
 as provided by receiving machine; trustworthy>)
 by <receiving machine's name, provided by receiving machine> for
 <some random user, sometimes accurate and sometimes not;
 provided by sending machine but uninteresting>; 
 <date and time, provided by receiving machine, sometimes interesting>
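The template maps fairly directly onto a regular expression. The following Python sketch splits a Received line into the fields named above. It assumes the sendmail-style layout shown in this article; other mail servers format the line differently, so treat the pattern and the field names (claimed_name, verified_name, and so on) as illustrative rather than general.

```python
import re

RECEIVED_RE = re.compile(
    r"from\s+(?P<claimed>\S+)\s+"    # name the sender gave for itself
    r"\((?P<verified>[^)]*)\)\s+"    # filled in by the receiving machine
    r"by\s+(?P<receiver>\S+)"        # the receiving machine's own name
    r".*;\s*(?P<date>.+)$",          # everything after the last semicolon
    re.DOTALL,
)
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_received(value):
    """Split one Received line into the template's fields, or return None."""
    text = " ".join(value.split())  # undo header folding
    if text.lower().startswith("received:"):
        text = text[len("received:"):].strip()
    match = RECEIVED_RE.search(text)
    if match is None:
        return None
    verified = match.group("verified")
    ip_match = IP_RE.search(verified)
    # Before the bracketed address there may be a "user@" prefix,
    # a host name found by the receiver, both, or nothing at all.
    before_ip = verified.split("[", 1)[0].strip()
    verified_name = before_ip.rsplit("@", 1)[-1] or None
    return {
        "claimed_name": match.group("claimed"),
        "verified_name": verified_name,
        "verified_ip": ip_match.group(1) if ip_match else None,
        "receiver": match.group("receiver"),
        "date": match.group("date").strip(),
    }
```

Run against the sample line above, this reports euromar-travel.com as the claimed name, 194.65.2.129 as the verified IP address, and no verified name at all.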

Note that in this case, the receiving machine (anvilite.murkworks.net) did not fill in the name of the sending machine, and instead left only the IP address, 194.65.2.129. This used to be standard; hosts would assume that other hosts would correctly identify themselves. The IP was provided for other reasons.
These days the sending host often lies, particularly where spammers are involved, and most administrators turn on a feature called reverse authentication. Reverse authentication causes the receiving mailer to look up the name belonging to the IP address of the machine handing it the mail (a reverse DNS lookup). It then puts this name in the header, next to the IP address, inside the parentheses rather than outside. This gives you a handy and quick check to see if a sending machine lied.
In this example the reverse authentication failed. This does not automatically mean that the sending host was lying about its name. It could simply be flaky, it could be running DHCP and therefore have a changing IP address, or it could be a machine within a domain that doesn't have a specific name and therefore doesn't have a specific DNS entry. Now is when you need the tools I mentioned earlier.
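The quick check described here comes down to a three-way answer, which can be sketched in Python as follows. The function name check_sender_name and the three result strings are my own invention; the point is only that a missing name means "go check by hand," not "guilty."

```python
def check_sender_name(claimed_name, verified_name):
    """Compare the name a sender claimed with the name the receiver looked up.

    Returns "unverified" when the receiver recorded no name (the lookup
    failed or was never done), "match" when the names agree, and
    "mismatch" otherwise.
    """
    if not verified_name:
        return "unverified"

    def normalize(name):
        # Host names are case-insensitive and may carry a trailing dot.
        return name.rstrip(".").lower()

    if normalize(claimed_name) == normalize(verified_name):
        return "match"
    return "mismatch"
```

For the sample line, check_sender_name("euromar-travel.com", None) comes back "unverified". A "mismatch" is suspicious rather than conclusive, since one host can legitimately have several names.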

Figure 2: NSLookup of IP Address

First, let's see what NSLookup has to say about 194.65.2.129 (see Figure 2). As you can see, name service knows about this host, and thinks that it's a machine called dns.madinfo.pt, which is not what the sending host told us. Suspicious, but not necessarily a problem. One host can have many names. Besides, almost no spammers hand you the mail directly. Let's ping euromar-travel.com and see whether they're alive (see Figure 3).

Figure 3: Pinging euromar-travel.com

Well, they aren't. But still, this isn't fatal; they might not be a host that is up full time. Let's use the third tool, WhoIs. WhoIs provides access to the various domain registration databases, such as those maintained by the InterNIC and other organizations. And as you can see in Figure 4, this yields better results. Euromar-travel.com is a registered domain with the InterNIC, and all its domain name servers are in the madinfo.pt domain. This means that euromar-travel.com is probably a single machine leasing an address from a larger organization. It also means that the host has legitimately identified itself; it gave a correct name.

Figure 4: Detailed Information from WhoIs

And what does this mean? The upshot is that this Received line is valid. So the host that handed this mail to me was almost certainly not the originating site of the spam. Spammers don't generally play that way. Plus, there's another, older Received line below this one, which means that euromar-travel.com was simply a victim—someone a spammer picked on to deliver the spam mail for free.
When you have a valid Received line, it's generally safe to assume that the previous Received line was filled in by an honest host. This isn't always the case, but it's a reasonable rule of thumb. It also doesn't mean that everything in that Received line is valid, any more than everything in the previous Received line was correct. Spammers lie to everyone. So let's look at the Received line filled in by the previous hop.


 Received: from tgnfg.nada.kth.se (zeus.host4u.net [216.71.64.21])
  by euromar-travel.com (8.8.8/8.8.8) with SMTP id FAA03578;
  Tue, 30 Mar 1999 05:08:56 GMT

This example shows that euromar-travel.com has reverse authentication turned on; instead of just the IP address in parentheses (216.71.64.21), there's a host name (zeus.host4u.net) as well. Validating this using either NSLookup (see Figure 5) or WhoIs (see Figure 6) tells you that euromar-travel.com did indeed find the correct name.

Figure 5: Validating the IP Address in NSLookup

Figure 6: IP Address Lookup in WhoIs

And, as the Received line makes clear, that correct name differs wildly from the name handed to them by the sending system. The use of the word "nada" was another warning flag. People do give hosts and domains names like that in real life, but more often it's a spammer making something up. So the spamming host was almost certainly host4u.net. But before sending mail, it's important to verify that tgnfg.nada.kth.se isn't real, and a quick scan against the major WhoIs servers shows that it's not. Neither is the listed subdomain, nada.kth.se. The parent domain, kth.se, does exist (see Figure 7), but that doesn't mean much at this point. I can be fairly sure that the spam came from some random user at host4u.net. Which specific user, I don't know and can't find out; but host4u.net's administrators can, with their log files.

Figure 7: Domain Name Lookup in WhoIs

      Often, there will be more Received lines following the first dishonest host. Ignore them. They would almost always have been placed by the spammer to confuse you—more aluminum foil in your radar.
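The whole walk down the header can be summarized as "start at the top and stop at the first hop that lied." Here is a small Python sketch of that rule. It assumes each hop has already been reduced to a dictionary with claimed_name and verified_name keys (names I've made up for illustration), and it only catches lies that the receiving machine recorded; a hop with no verified name still has to be checked by hand with NSLookup or WhoIs, as I did above.

```python
def first_dishonest_hop(hops):
    """Return (index, hop) for the first hop whose names disagree, else None.

    `hops` is a list of dicts, newest Received line first. Each dict has a
    "claimed_name" key and a "verified_name" key (which may be None).
    """
    for index, hop in enumerate(hops):
        verified = hop.get("verified_name")
        if verified and verified.lower() != hop["claimed_name"].lower():
            return index, hop
    return None


# The two Received lines from this article, newest first.
hops = [
    {"claimed_name": "euromar-travel.com", "verified_name": None},
    {"claimed_name": "tgnfg.nada.kth.se", "verified_name": "zeus.host4u.net"},
]
result = first_dishonest_hop(hops)
```

Here result points at the second line, whose verified name is zeus.host4u.net. Anything further down the list than the hop this returns is the part to ignore.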

Taking Action
      At this point, it's time to send mail. You need to send a note saying what happened, and you must include the spam itself, with its entire header intact. Without that header, the sending system can't track down the individual user who sent the spam, and your complaint will generally just be ignored.
      But where should you send the complaint? Not to the spammer, of course (see Rule 4). You should usually inform two addresses at the originating site: abuse and postmaster (see Figure 8). If you're running a mail server on the Internet and you're able to send or receive mail, you're required to have a postmaster address, and it's required that you monitor it. Postmaster is the default contact for any domain. This isn't necessarily strictly enforced—postmaster addresses have bounced before, mostly those in spam havens—but it's generally true. Having an abuse address is an informal standard that has cropped up among the larger ISPs. If they have an abuse user, they generally request that you send spam complaints to that address, and reserve postmaster for connectivity and delivery problems. I send to both in many cases, to postmaster alone if the originating domain isn't an ISP (this does happen), and to abuse alone in the cases of a few larger ISPs that I know monitor the abuse address regularly.
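Working out the addresses and assembling the complaint is easy to script. The Python sketch below is only an illustration: the function names, the subject line, and the separator text are all hypothetical, and the domain is derived by naively keeping the last two labels of the host name, which gives the wrong answer for domains such as example.co.uk.

```python
from email.message import EmailMessage


def complaint_addresses(hostname, is_isp=True):
    """Return the abuse and postmaster addresses for a host's domain."""
    # Naive assumption: the registered domain is the last two labels.
    domain = ".".join(hostname.rstrip(".").split(".")[-2:])
    addresses = ["postmaster@" + domain]
    if is_isp:
        # The abuse address is an informal convention among larger ISPs.
        addresses.insert(0, "abuse@" + domain)
    return addresses


def build_complaint(sender, recipients, note, raw_spam):
    """Build a complaint that carries the spam with its full header intact."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "Spam originating from your site"  # hypothetical wording
    msg.set_content(
        note
        + "\n\n----- Original message, full header intact -----\n"
        + raw_spam
    )
    return msg
```

For the host in this example, complaint_addresses("zeus.host4u.net") yields abuse@host4u.net and postmaster@host4u.net. The sketch only builds the message; sending it is left to whatever mail client or server you normally use.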
       Figure 8 shows some sample mail that gets the point across. Note the second paragraph about their system being used to relay spam. I'm able to trace things only so far with the information I have; but the site from which the spam appeared to originate will be able to check its mail and IP-connect logs to see whether it came from somewhere further back along the line. If it did, it'll be up to them to trace it. It'll also mean that the spammer probably hacked their system in ways that violate Federal anti-hacking statutes. In either case, they'll find out where it came from. If the mail didn't come from a hidden source, the sending ISP will be able to discover what specific user sent the spam, and terminate their service or take other punitive action, depending upon their policies.
      That takes care of the primary complaint. Next I'll tell euromar-travel.com that their systems have been used to relay spam mail. They won't be any happier to hear this than I was to receive the spam; after all, their systems were victimized much more severely than mine. They, too, will need a note explaining what happened (see Figure 9), a copy of the spam, and the full header, to verify for their own satisfaction that they were used as I describe.
      Finally it's useful to look for addresses in the spam itself (email addresses or Web sites), find out who is hosting those services, and let them know that spammers and spamming are involved. Generally, they'll shut things down pretty quickly, and that'll be one fewer spammer operating on the net. At least until they get another "100 Free Hours!" CD-ROM in the mail, anyway.
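Picking the addresses out of the spam body by eye gets tedious, so here is a rough Python sketch that collects the host names from any URLs and email addresses it finds. The regular expressions are deliberately loose approximations, not full URL or address parsers, and the function name hosts_in_body is my own; feed the resulting names to WhoIs to find out who is hosting them.

```python
import re
from urllib.parse import urlparse

# Loose patterns: good enough for spotting candidates, not for validation.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def hosts_in_body(body):
    """Return the sorted host names found in URLs and email addresses."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower())
    for address in EMAIL_RE.findall(body):
        hosts.add(address.rsplit("@", 1)[1].lower())
    return sorted(hosts)
```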


From the December 1999 issue of Microsoft Internet Developer.