Understanding the Network-Level Behavior of Spammers

Anirudh Ramachandran and Nick Feamster
College of Computing, Georgia Tech
{avr, feamster}@cc.gatech.edu

ABSTRACT

This paper studies the network-level behavior of spammers, including: IP address ranges that send the most spam, common spamming modes (e.g., BGP route hijacking, bots), how persistent across time each spamming host is, and characteristics of spamming botnets. We try to answer these questions by analyzing a 17-month trace of over 10 million spam messages collected at an Internet “spam sinkhole”, and by correlating this data with the results of IP-based blacklist lookups, passive TCP fingerprinting information, routing information, and botnet “command and control” traces.

We find that most spam is being sent from a few regions of IP address space, and that spammers appear to be using transient “bots” that send only a few pieces of email over very short periods of time. Finally, a small, yet non-negligible, amount of spam is received from IP addresses that correspond to short-lived BGP routes, typically for hijacked prefixes. These trends suggest that developing algorithms to identify botnet membership, filtering email messages based on network-level properties (which are less variable than email content), and improving the security of the Internet routing infrastructure, may prove to be extremely effective for combating spam.

Categories and Subject Descriptors

C.2.0 [Computer Communication Networks]: Security and protection; C.2.3 [Computer Communication Networks]: Network operations – network management

General Terms

Design, Management, Reliability, Security

Keywords

spam, botnet, BGP, network management, security

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGCOMM’06, September 11-16, 2006, Pisa, Italy. Copyright 2006 ACM 1-59593-417-0/06/0009 ...$5.00.

1. Introduction

This paper presents a study of the network-level characteristics of unsolicited commercial email (“spam”). Much attention has been devoted to studying the content of spam, but comparatively little attention has been paid to spam’s network-level properties. Conventional wisdom often asserts that most of today’s spam comes from botnets, and that a large fraction of spam comes from Asia; a few studies have attempted to quantify some of these characteristics [5]. Unfortunately, little is known about how much spam comes from botnets versus other techniques (e.g., short-lived route announcements, open relays, etc.), the geographic and topological distribution of where most spam originates (in terms of Internet Service Providers, countries, and IP address space), the extent to which different spammers use the same network resources, the stationarity of these properties over time, and so forth. A primary goal of this paper is to shed some light on these relatively unstudied questions.

Beyond merely exposing spammers’ behavior, gathering information about the network-level behavior of spam could be a major asset for designing spam filters that are based on spammers’ network-level behavior (presuming that the network-level characteristics of spam are sufficiently different from those of legitimate mail, a question we explore further in Section 4). Whereas spammers have the flexibility to alter the content of emails, both per-recipient and over time as users update spam filters, they have far less flexibility when it comes to altering the network-level properties of the spam they send. It is far easier for a spammer to alter the content of email messages to evade spam filters than it is for that spammer to change the ISP, IP address space, or botnet from which spam is sent.

Towards the goal of developing techniques that will help in the design of more robust network-level spam filters, this paper characterizes the network-level behavior of spammers as observed at a large spam sinkhole domain, which stores complete logs of all spam received from August 2004 through December 2005. We perform a joint analysis of the data collected at this sinkhole with an archive of BGP route advertisements as heard from the receiving network, traces from the “command and control” of a Bobax botnet, and traces of legitimate email from the mail server logs of a large email service provider. Although many aspects of mail headers can be forged, we base our analysis strictly on properties of the sender that are difficult to forge (e.g., IP addresses that made connections to our mail servers, passive TCP fingerprints, corresponding route announcements, etc.).

We draw the following surprising conclusions from our study:

• The vast majority of received spam arrives from a few concentrated portions of IP address space (Section 4). Spam filtering techniques currently make no assumptions about the distribution of spam across IP address space. In a related area, many worm propagation models assume a uniform distribution of vulnerable hosts across IP address space (e.g., [29]). In contrast, we find that the vast majority of spamming hosts (and, perhaps not coincidentally, most Bobax-infected hosts) lie within a small number of IP address space regions. Unfortunately, with a few exceptions (e.g., 60.* – 70.*), most legitimate email comes from the same regions of IP address space, which suggests that, in general, effective filtering based on network-level properties may require determining second-order characteristics (e.g., botnet membership).

• Most received spam is sent from Windows hosts, each of which sends a relatively small volume of spam to our domain (Section 5). Most bots send a relatively small volume of spam to our sinkhole (i.e., fewer than 100 pieces of spam over 17 months), and about three-quarters of them are only active for a single time period of less than two minutes (65% of them send all spam in a “single shot”).

• A small set of spammers continually use short-lived route announcements to remain untraceable (Section 6). A small portion of spam is sent by sophisticated spammers, who briefly advertise IP prefixes, establish a connection to the victim’s mail relay, and withdraw the route to that IP address space after spam is sent. Anecdotal evidence has suggested that spammers might be exploiting the routing infrastructure to remain untraceable [1, 30]; this paper quantifies and documents this activity for the first time. To our surprise, we discovered a new class of attack, where spammers attempt to evade detection by hijacking large IP address blocks (e.g., /8s) and sending spam from widely dispersed “dark” (i.e., unused or unallocated) IP addresses within this space.

Beyond these findings, this paper’s joint analysis of several datasets provides a unique window into the network-level characteristics of spam. To our knowledge, this paper presents the first study that examines the interplay between spam, botnets, and the Internet routing infrastructure.

We acknowledge that our spam corpus represents only a single vantage point, and, as such, drawing general conclusions about Internet-wide spam is not possible. Our goal is not to present conclusive figures about Internet-wide characteristics of spam. Indeed, the data we have collected is a small, localized sample of all spam traffic, and our statistics may not be reflective of Internet-wide characteristics. However, the spam we have collected represents an interesting dataset as it reflects the complete set of spam emails received by a single Internet domain. This dataset exposes spamming as a typical network operator for some Internet domain might also witness it. This unique view can help us better understand whether the features of spam that any single network operator observes [...]

[...] untraceably. Based on our findings, Section 7 offers positive recommendations for designing more effective mitigation techniques. We conclude in Section 8.

2. Background and Related Work

This section provides an overview of techniques both for sending and for mitigating spam and discusses related work in these areas.

2.1 Spam: Methods and Mitigation

In this section, we offer background on the main techniques used by spammers to send email, as well as some of the more commonly used mitigation techniques.

2.1.1 Spamming methods

Spammers use various techniques to send large volumes of mail while attempting to remain untraceable. We describe several of these techniques, beginning with “conventional” methods and progressing to more intricate techniques.

Direct spamming. Spammers may purchase upstream connectivity from “spam-friendly ISPs”, which turn a blind eye to the activity. Occasionally, spammers buy connectivity and send spam from ISPs that do not condone this activity and are forced to change ISPs. Ordinarily, changing from one ISP to another would require a spammer to renumber the IP addresses of their mail relays. To remain untraceable and avoid renumbering headaches, spammers sometimes obtain a pool of dispensable dialup IP addresses, send outgoing traffic from a high-bandwidth connection with the source IP address spoofed to appear as if it came from the dialup connection, and proxy the reverse traffic through the dialup connection back to the spamming hosts [25].

Open relays and proxies. Open relays are mail servers that allow unauthenticated Internet hosts to connect and relay email through them. Originally intended for user convenience (e.g., to let users send mail from a particular relay while they are traveling or otherwise in a different network), open relays have been exploited by spammers due to the anonymity and amplification offered by the extra level of indirection.
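The difference between an open relay and a restrictive one can be made concrete with a small sketch. The code below is an illustration only, not taken from any mail server or from the measurements in this paper: the names `RelayRequest`, `is_third_party_relay`, `closed_relay_accepts`, and `open_relay_accepts`, as well as the networks and domains in the configuration, are hypothetical. It assumes a request counts as third-party relaying when the client is not authenticated, is not on a local network, and the recipient is not in a local domain.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical configuration, chosen only for illustration.
LOCAL_NETWORKS = [ip_network("192.0.2.0/24")]
LOCAL_DOMAINS = {"example.org"}


@dataclass
class RelayRequest:
    client_ip: str       # address of the connecting host
    authenticated: bool  # whether the client authenticated to the server
    recipient: str       # envelope recipient, e.g. "user@example.net"


def is_third_party_relay(req: RelayRequest) -> bool:
    """True if neither the client nor the recipient is local to this server."""
    client_is_local = any(
        ip_address(req.client_ip) in net for net in LOCAL_NETWORKS
    )
    rcpt_domain = req.recipient.rsplit("@", 1)[-1].lower()
    rcpt_is_local = rcpt_domain in LOCAL_DOMAINS
    return not (client_is_local or req.authenticated or rcpt_is_local)


def closed_relay_accepts(req: RelayRequest) -> bool:
    """A restrictive policy refuses third-party relaying."""
    return not is_third_party_relay(req)


def open_relay_accepts(req: RelayRequest) -> bool:
    """An open relay accepts every request, including third-party ones."""
    return True
```

Under these assumptions, an unauthenticated outside host sending to an outside recipient is refused by the restrictive policy but accepted by the open relay, which is the indirection that spammers exploit. A traveling user who authenticates is still served by the restrictive policy, so the original convenience does not require an open relay.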
