Understanding the World's Worst Spamming Botnet
Computer Sciences Department
Technical Report #1660, June 2009

Tatsuya Mori (NTT Laboratories), Holly Esquivel (UW - Madison), Aditya Akella (UW - Madison), Akihiro Shimoda (Waseda University), Shigeki Goto (Waseda University)

ABSTRACT

On November 11, 2008, McColo, the primary web hosting company for the command and control servers of the Srizbi botnet, was shut down by its upstream ISPs. Subsequent reports claimed that the volume of spam dropped significantly everywhere on that very same day. In this work, we aim to understand the world's worst spamming botnet, Srizbi, and to study the effectiveness of targeting the botnet's command and control servers, i.e., the McColo shutdown, from the viewpoint of Internet edge sites. We conduct an extensive measurement study that consists of e-mail delivery logs and packet traces collected at four vantage points. The total measurement period spans from July 2007 to April 2009, which includes the day of the McColo shutdown. We employ passive TCP fingerprinting on the collected packet traces to identify Srizbi bots and the spam messages sent from them.

The main contributions of this work are summarized as follows. We first estimate the global scale of the Srizbi botnet in a probabilistic way. Next, we quantify the volume of spam sent from Srizbi and the effectiveness of the McColo shutdown from an edge site perspective. Finally, we reveal several findings that are useful in understanding the growth and evolution of spamming botnets. We detail the rise and steady growth of the Srizbi botnet, as well as the version transition of Srizbi after the McColo shutdown.

1. INTRODUCTION

Over the past few years, the volume of e-mail spam has grown significantly, to the point where it is no longer just a nuisance. Some reports suggest that as much as 90–95% of all e-mail sent or received today is spam [1, 6]. E-mail spam has evolved along many dimensions in recent times, and is employed to conduct numerous subversive and illegal activities such as financial scams, phishing, and propagating malware.

Recently, botnets have been widely used as a scalable and elusive approach to disseminating spam messages. The bot on the infected host sends out spam messages triggered by instructions from spammers who purchased access to the botnet. Spammers send the instructions from the command and control (C&C) server via IRC channels. Recently, spamming botnets have made the transition from proxy-based spamming to template-based spamming. These new sophisticated user interfaces play a key role in the efficiency of dissemination mechanisms in spamming infrastructures [8]. These improvements have led to an exponential increase in spamming capabilities. For example, "Srizbi" is claimed to be capable of sending 60 billion spam messages per day, which is more than half of the total 100 billion spam messages sent per day on average [9]. According to a more recent report, today's newest spamming botnet, "Conficker", consists of more than 10 million infected hosts all over the world and could be capable of sending out 400 billion spam messages per day [2]. These large global spamming infrastructures have traditionally been hard to stifle.

In late 2008, a bold and drastic action was taken to contain the world's worst spamming botnet, Srizbi. On November 11, the web hosting service provider McColo was shut down by its two upstream ISPs. McColo is known as a so-called "bulletproof hosting" company because it allowed its customers to bypass laws regulating Internet content and services. McColo also allowed these customers to remain online regardless of complaints. The company hosted the C&C servers for major spamming botnets, including Srizbi [3]. Accordingly, as many operators and researchers expected, it was widely reported that the volume of spam dropped by 50 to 75 percent on the very same day [3, 14].

Although recent measurement studies report that spam volume has returned to pre-McColo shutdown levels [14], the temporary but great success of the shutdown indicates that this unprecedented and drastic move was effective. This action allows us to better understand the larger picture of spamming botnets and the way in which they can make transitions, which is crucial to building an effective and sustainable anti-spam solution. As a first step toward this goal, we aim to understand the world's worst spamming botnet, Srizbi, and to study the effectiveness of targeting the botnet's C&C servers (i.e., the McColo shutdown). We also look at the long-term trends of Srizbi to study how the botnet has grown and evolved.

We conduct an extensive analysis of e-mail delivery logs and packet traces collected at four different vantage points across two countries: the US and Japan. We also use publicly available packet traces published by MAWI [12]. The four locations consist of four different types of Internet edge sites, namely, an enterprise network, a campus network, a leaf site of a scientific research network, and an international backbone link used by several research organizations. The total data collection periods span from July 2007 to April 2009. The spam volume changes between the pre- and post-McColo time periods can be studied from our data sets.

Table 1: Total measurement periods of data sets.

Site    tcpdump                        SMTP log
UW      Feb 9, 2008 – Jul 11, 2008     Feb 1, 2008 – Apr 30, 2008
        Apr 7, 2008 – Jul 31, 2008     Apr 7, 2008 – Jul 31, 2008
CORP    Dec 26, 2008 – Apr 17, 2009    Jan 1, 2009 – Mar 31, 2009
GEM     –                              Aug 1, 2008 – Apr 30, 2009
MAWI    Jul 1, 2007 – Apr 27, 2009     –
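As a quick check of which measurement periods in Table 1 actually contain the day of the McColo shutdown, the periods can be compared against that date. The following is a minimal Python sketch, not part of the original study: the dictionary layout and the helper names `covers` and `spanning_sets` are illustrative assumptions, and the second row of Table 1, which carries no site label in the source, is keyed with a placeholder label rather than attributed to a site.

```python
from datetime import date

# Date of the McColo shutdown, as stated in the text.
MCCOLO_SHUTDOWN = date(2008, 11, 11)

# Measurement periods transcribed from Table 1 as (start, end) pairs.
# "(unlabeled row)" is a placeholder: the source table does not say
# which site the second row belongs to.
PERIODS = {
    ("UW", "tcpdump"): (date(2008, 2, 9), date(2008, 7, 11)),
    ("UW", "SMTP log"): (date(2008, 2, 1), date(2008, 4, 30)),
    ("(unlabeled row)", "tcpdump"): (date(2008, 4, 7), date(2008, 7, 31)),
    ("(unlabeled row)", "SMTP log"): (date(2008, 4, 7), date(2008, 7, 31)),
    ("CORP", "tcpdump"): (date(2008, 12, 26), date(2009, 4, 17)),
    ("CORP", "SMTP log"): (date(2009, 1, 1), date(2009, 3, 31)),
    ("GEM", "SMTP log"): (date(2008, 8, 1), date(2009, 4, 30)),
    ("MAWI", "tcpdump"): (date(2007, 7, 1), date(2009, 4, 27)),
}


def covers(period, day):
    """Return True if `day` lies inside the inclusive (start, end) period."""
    start, end = period
    return start <= day <= end


def spanning_sets(periods, day):
    """Return the sorted (site, data set) keys whose period contains `day`."""
    return sorted(key for key, period in periods.items() if covers(period, day))


if __name__ == "__main__":
    for site, kind in spanning_sets(PERIODS, MCCOLO_SHUTDOWN):
        print(f"{site} {kind} covers {MCCOLO_SHUTDOWN.isoformat()}")
```

Under this reading of Table 1, only the MAWI packet traces and the GEM SMTP logs contain the shutdown date itself; every other listed period lies entirely before or entirely after it.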
To detect the spam traffic from Srizbi bots, we leverage a TCP fingerprinting technique, which can identify the operating system of a host based on the TCP/IP stack of the system. As Stern et al. discovered [18], Srizbi uses a dedicated network driver that uses intrinsic TCP/IP parameter settings. Thus, we can extract hosts infected with the Srizbi trojan by tracking their TCP fingerprint signature. In addition to the three signatures presented by Stern et al. [18], we found that the Srizbi botnet has other variants of these signatures.

The primary contributions of this work are:

• We probabilistically estimate the size of the Srizbi botnet by correlating data sets that were independently sampled at Internet edge sites.

• We quantify the volume of spam sent from Srizbi and study the effectiveness of the McColo shutdown from the viewpoint of Internet edge sites.

• We reveal several findings that are useful in understanding the spread of spamming botnets; specifically, we note the steady growth of Srizbi and the version transition of Srizbi after the McColo shutdown.

We believe that the collection and analysis of long-term data sets is a promising approach to identifying upcoming spamming botnets, studying how they are mitigated by actions against them, and building a methodology to stop spamming botnets permanently.

The remainder of this paper is structured as follows: Section 2 presents a description of the data sets we use in this work. In Section 3, we present our findings on the characteristics and trends of the Srizbi botnet. In Section 4, we discuss some related studies and compare them to ours. Finally, Section 5 concludes our work.

2. DATA DESCRIPTION

We collected data from four vantage points located at different organizations and countries. The measurement period spans from July 2007 to April 2009.

[...] measurement tool used to collect the packet traces. The second set of data contains all e-mail delivery records for each vantage point for all respective e-mail servers. For future reference, we refer to this data set as SMTP log(s). Table 1 summarizes the measurement period of each data set.

In the following, we describe how each data set is collected and processed for our analysis.

Tcpdump. For UW and CORP, packet traces are collected on the incoming external links of the networks. For MAWI, we use packet traces which were collected on a trans-Pacific line (150-Mbps link) that connects the US and Japan and is used by several research organizations. Analyzing packet traces enables us to study all the incoming SMTP connections to the networks. To extract minimal information while excluding private information, we filter out all packets other than TCP packets with the SYN flag set that are destined to the SMTP port. This filtering allows us to employ TCP fingerprinting on packets from e-mail senders while discarding all other private information in the subsequent SMTP transmissions. The IP addresses of the MAWI tcpdump traces are anonymized to make the data publicly available. Since these traces have been collected since July 2007, we can study the long-term trends of the spamming botnets.

SMTP logs. For UW, CORP and GEM, SMTP logs were collected on commercial anti-spam appliances. UW and CORP operate greylisting mechanisms on top of their anti-spam appliances. Greylisting is a mechanism that temporarily rejects e-mail messages from a sender which has not previously been seen. Greylisting is effective because if an e-mail is rejected, a spammer will likely not retransmit it, since spammers cannot afford the time and resources to retry thousands of bounced messages. By analyzing greylisting logs, the SMTP connections which did not attempt retransmission can be extracted. In this work, we regard these connections as attempted spam messages sent to the e-mail servers. That is, if a connection is filtered by greylisting and is not retried