Proceedings of the FREENIX Track: 2003 USENIX Annual Technical Conference
Total Page:16
File Type:pdf, Size:1020Kb
USENIX Association Proceedings of the FREENIX Track: 2003 USENIX Annual Technical Conference San Antonio, Texas, USA June 9-14, 2003 THE ADVANCED COMPUTING SYSTEMS ASSOCIATION © 2003 by The USENIX Association All Rights Reserved For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. ASK: Active Spam Killer Marco Paganini [email protected] www.paganini.net Abstract tion message is crafted in such a way that a simple re- ply keeping the “Subject” line intact will suffice. The We present Active Spam Killer (ASK), a program that confirmation message also contains a unique MD5 [16] attempts to validate unknown senders before allowing hash computed by combining the contents of the original delivery of their message. Validation occurs by means of email with a secret key known only to the recipient. This a challenge reply sent to senders who are not yet known prevents false confirmation returns as the code is based to the system. Messages are kept in a queue pending on the unique characteristics of the receiver. confirmation until the sender replies to the challenge. Further messages coming from confirmed senders are The message remains stored in the pending mail delivered immediately. In a sample of 1000 spam mails, queue until a confirmation return is received (a reply ASK was 99.7% effective at blocking spam, resulting in to the confirmation message with the MD5 hash in the only 3 spam messages being delivered. Other programs’ “Subject” header). When that happens, ASK checks the best ratios were 97.8% or as many as 22 spam messages pending mail queue for a message whose MD5 signa- delivered. ture matches the one in the confirmation return. If found, the original message is delivered from the pending mail 1 Introduction queue and the sender’s email address is automatically added to the whitelist. This sender has now been vali- Unsolicited Commercial Emails (UCEs), also com- dated and all future messages from this address will be monly called “spam” represent a serious problem to immediately delivered. most Internet users, who are constantly bombarded with all sorts of scams, promotions, and offensive material. The sender’s address can also be added to two other This situation has prompted the creation of many differ- lists: the ignorelist and the blacklist. The first causes ent tools to eliminate the problem, with varying degrees emails to be silently ignored while the second not only of success. ignores the email but also sends a message back to the The most common way to deal with spam is to parse originator explaining that future emails coming from this incoming mails and decide whether they should be deliv- address are blocked. ered or not based on their contents. Reasonable results This challenge-authentication scheme guarantees that have been obtained with this technique, but the complex- delivered emails always come from valid senders. Un- ity and diversity of human languages make it a difficult wanted (but technically valid) senders can be easily ig- task. Such content filtering tools propose to attack one nored as they offer a simple key to their detection: their of the weak spots in spam: the message content itself. own email addresses. The effectiveness of such tools can be drastically re- duced when the incoming mail employs an unknown Although an auto-responder could be used to defeat language or even an unknown character set. Also, clev- this method, this is unlikely as it would expose the erly crafted emails may never be detectable as the differ- sender to legal complications, account and service can- ence between those and perfectly valid emails is subtle. cellations, and a fairly large number of ”Invalid Ad- dress” response messages during normal operation. Active Spam Killer (ASK) proposes to attack a dif- ferent weak spot in the spam chain: the validity of the The rest of this paper is organized as follows: Sec- sender’s email address. When a message from an un- tion 2 provides background information on popular anti- known origin is received, a challenge (also known as spam techniques. Section 3 details ASK’s design and confirmation message) is sent back to the mail origi- operation. Section 4 compares the effectiveness of ASK nator. This message contains brief instructions to the against other anti-spam tools. Section 5 surveys other sender on how to get authenticated into the system and challenge-based solutions. Finally, Section 6 presents cause delivery of the original message. The confirma- our conclusions and future work. USENIX Association FREENIX Track: 2003 USENIX Annual Technical Conference 51 2 Background not) coming from spammers who use open relays to send Compared to the traditional method of mailing printed their messages. Even though some RBLs specifically list propaganda, spam costs almost nothing. No mailing in- dial-in lines, often it is not practical to use those as many frastructure is needed except for an Internet account and legitimate users send emails directly from their worksta- a personal computer. A one or two percent return on in- tions over dialup lines. vestment potential applied to millions of emails results Some RBL providers have complicated procedures in a number large enough to justify the approach to a to remove entries from their databases, leading to a number of people. substantial number of false positives (See Section 4.1). The first ingredient is a large number of email ad- Also, there are normally no whitelist mechanisms avail- dresses, usually obtained by harvesting addresses off the able to regular users, meaning that it is impossible to Internet. A study conducted by the Federal Trade Com- grant access to a certain known valid email or IP with- mission concluded that 86% of all addresses posted to out supervisory privileges. Web pages and to the newsgroups received spam [5]. 2.2 Content Filtering Tools Anything resembling an email address is collected and used. Those without the resources to mine the addresses Tools in this category try to detect spam by investigating themselves can purchase CDs containing ready-to-use the content of received emails. lists of addresses. Early attempts used simplistic keyword filters, nor- Next, a Mass Emailer like DynamicMailer [3] is used mally implemented with generic mail processing tools. to deliver the message to the list of addresses. These Specific strings like “Dear Sir” in the body of the email programs normally provide certain facilities like header would flag the message as spam. This produced inaccu- forging, sending emails directly from the user’s worksta- rate results and a fairly large number of false positives. tion (bypassing the need for an SMTP server) and others. A more sophisticated approach to the problem em- Spammers must always act as inconspicuously as pos- ploys scoring of certain keywords, sentences and char- sible to avoid account cancellation, revenge from angry acteristics pertinent to spam mail. A simple “Dear Sir” users, and legal problems. The sender’s email address might indicate that we are dealing with spam, but “Dear is often forged to an invalid address or to a throw-away Sir,”“Click Here,” and “Call now” in the same email account from a free web-mail provider. These accounts message are a very clear indication. SpamAssassin [20] are rapidly filled with bounce messages as a result of utilizes this technique. outdated and invalid emails in the list. Usually, a complex set of rules exists with a positive Instructions on how to obtain the product or service or negative numeric score assigned to each rule. Rules offered are normally contained in the body of the mail, that clearly indicate spam receive high values. Negative with URLs pointing to the Web page of the seller. values reduce the probability of the email being classi- Next, we outline the most common anti-spam tech- fied as spam (even in the presence of other clear indica- niques in use today. tions). A rule to match a string like “Dear Sir/Madam” will have a lower score than “Work from home” as the 2.1 Realtime Blackhole Lists latter is a clearer indication of the common home em- Misconfigured mail servers will relay any incoming ployment Internet scams. Expressions like “Usenix” message to any destination without control. These and “Algorithm” would receive a negative value as they servers can be used to deliver mail indiscriminately and strongly suggest that this is not a spam mail. The mail in many cases, anonymously. will be marked as spam if the sum of all matching scores Realtime Blackhole Lists (RBLs) are online databases exceeds a user-defined threshold. containing the IP address of such servers. These There are specific and distinct rules for the email databases are queried by Mail Transfer Agents (MTAs) headers and body. The rules applied to the headers will upon receipt of a new message. If the IP address origi- be equally effective, regardless of the language used to nating the connection is listed in the database, the con- send the email. Body analysis however is greatly af- nection will be terminated and the appropriate error code fected by language: a set of rules that perfectly detects will be sent. Some RBLs also keep dial-in IP addresses emails in the English language could completely miss in an attempt to detect messages sent directly from dial- emails written in Italian or Korean.