TECHSPLOITATION I spend an inordinate amount of time wondering why my spam looks the way it does. Until quite recently, I received about 20,000 spam e-mails every day. The poor little Bayesian filter in my Thunderbird e-mail program couldn't keep up and would routinely barf when confronted with such huge piles of crap from "Nuclear R. Accomplishment" with the subject line "$subject" and a message body full of random quotes from Beowulf.
Before I finally fixed my spam problem — oh blissfully small inbox! — I developed a few vaguely paranoid theories. Briefly, I imagined spammers were spying on my inbox and culling sender names from it that matched those of my friends. In my saner moments, I would wonder why exactly spam evolved to look the way it does. Why do spammers keep sending me pictures of pink, bouncy letters that spell "mortgage," followed by text from a random Web site? And why, oh why, do they send me e-mails containing nothing but the cryptic line, "he said from the doorway, where she"? How can that be good business sense?
So I called expert Daniel Quinlan, who is an antispam architect at IronPort Systems as well as a contributor to the open-source antispam system SpamAssassin. He patiently listened to me rant about my e-mail problems (I think antispam experts are sort of like geek therapists), then explained why I receive spam from random dictionary words strung together into a name like Elephant Q. Thermodynamic. It's done to fool any spam filter that refuses to receive e-mail from somebody who has already sent you spam in the past. "They want to create a name that your spam filter has never seen before," Quinlan said. It turns out every weirdness in my spam is "probably there for a good reason," he said. In the arms race between spammers and antispammers, spammers try every trick they can to circumvent filtering software.
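To see why a never-before-seen name slips through, here is a minimal sketch in Python. It assumes the simplest possible filter, one that only remembers sender names it has already seen on spam. The class SenderBlocklist, the function random_sender_name and the tiny word list are all hypothetical, invented for illustration; they are not taken from SpamAssassin or any real product.

```python
import random

# Hypothetical word list; a real spammer would presumably draw on a full dictionary.
WORDS = ["Elephant", "Thermodynamic", "Nuclear", "Accomplishment", "Harbor", "Velvet"]


def random_sender_name(rng):
    """Build a name like 'Elephant Q. Thermodynamic' from two distinct words and an initial."""
    first, last = rng.sample(WORDS, 2)
    initial = rng.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    return f"{first} {initial}. {last}"


class SenderBlocklist:
    """Assumed filter: rejects mail only from sender names already marked as spam."""

    def __init__(self):
        self.seen = set()

    def mark_spam(self, name):
        # Store names case-insensitively so trivial case changes don't evade the list.
        self.seen.add(name.lower())

    def is_blocked(self, name):
        return name.lower() in self.seen


if __name__ == "__main__":
    blocklist = SenderBlocklist()
    blocklist.mark_spam("Nuclear R. Accomplishment")
    fresh = random_sender_name(random.Random())
    print(fresh, "blocked?", blocklist.is_blocked(fresh))
```

With even a modest dictionary, the number of possible word-initial-word combinations is large enough that a freshly generated name has almost certainly never been seen before, which is the whole point of the trick.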
Often, the spam you get is the result of months or years of this arms race. For example, spammers of yesteryear started sending images instead of text, so that spam filters looking for text like "viagra" would be fooled. The image would contain the word "viagra," but filters would see only an image and let it through. In response, antispam software began tossing e-mails that contained only an image, since legitimate e-mail containing an image typically has some text with it like "check out my pictures from Hawaii" or whatever. Rarely does a real person send just an image.
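The image-only rule is easy to picture in code. What follows is a rough sketch using Python's standard email library, not the logic of any actual antispam product; the function name is_image_only is made up for this example, and a real filter would weigh many more signals than this one.

```python
from email.message import EmailMessage


def is_image_only(msg):
    """Return True if the message carries at least one image and no non-empty text part."""
    has_image = False
    has_text = False
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        maintype = part.get_content_maintype()
        if maintype == "image":
            has_image = True
        elif maintype == "text" and part.get_content().strip():
            has_text = True
    return has_image and not has_text


if __name__ == "__main__":
    # Placeholder bytes stand in for real image data.
    spam = EmailMessage()
    spam.set_content(b"\x89PNG placeholder", maintype="image", subtype="png")

    friend = EmailMessage()
    friend.set_content("check out my pictures from Hawaii")
    friend.add_attachment(b"\x89PNG placeholder", maintype="image", subtype="png")

    print(is_image_only(spam), is_image_only(friend))
```

Note how little it takes to defeat the rule: any non-empty text part flips the answer, which is exactly the weakness spammers went on to exploit.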
Quinlan said spammers figured out their pictures were being chucked, so they started adding a few random words to their mail and got through the filters again. Then antispammers started chucking e-mails with images that also contained random words that didn't make sentences. And that's why, today, you get images with chunks of text taken from random books and Web sites. As long as the text fits into sentences and isn't random words strung together, spam filters have a harder time figuring out if the mail is spam or ham. Spammers also send slightly different images every time, so that spam filters can't identify the image itself as spam. And they fill the images with bouncy, pink letters advertising their crap because character recognition software can't read bouncy letters. So any spam filter that uses character recognition software to look at text in images to find spam will be fooled.
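The "slightly different images" trick is simplest to understand if you assume the filter recognizes known spam images by an exact fingerprint. That is an assumption made here for illustration; Quinlan didn't say how any particular filter identifies images. In this Python sketch, the names fingerprint and ImageBlocklist are hypothetical.

```python
import hashlib


def fingerprint(image_bytes):
    """Exact fingerprint of an image: the SHA-256 digest of its raw bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


class ImageBlocklist:
    """Assumed filter: flags an image only if its exact bytes were seen in earlier spam."""

    def __init__(self):
        self.known = set()

    def mark_spam(self, image_bytes):
        self.known.add(fingerprint(image_bytes))

    def is_known_spam(self, image_bytes):
        return fingerprint(image_bytes) in self.known


if __name__ == "__main__":
    original = bytes(range(256))   # stand-in for a spam image
    variant = bytearray(original)
    variant[10] ^= 1               # flip a single bit, like nudging one pixel

    blocklist = ImageBlocklist()
    blocklist.mark_spam(original)
    print(blocklist.is_known_spam(original))        # the exact copy is caught
    print(blocklist.is_known_spam(bytes(variant)))  # the tweaked copy is not
```

Changing a single bit produces a completely different digest, so under this assumption every tweaked image looks brand new to the filter.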
OK, so there is a reason behind the madness. But how could Quinlan explain the spam I get that contains no advertisement for anything, no links or images, and instead merely quotes some random passage from Dostoyevsky? Quinlan said there's no way to know for sure, but the reigning theory among antispam experts is that it's part of what's called a "directory harvest attack" in which the spammer tries to figure out if there's a real person behind a randomly chosen e-mail address.