Studying Spamming Botnets Using Botlab

John P. John, Alexander Moshchuk, Steven D. Gribble, Arvind Krishnamurthy
Department of Computer Science & Engineering
University of Washington

Abstract

In this paper we present Botlab, a platform that continually monitors and analyzes the behavior of spam-oriented botnets. Botlab gathers multiple real-time streams of information about botnets taken from distinct perspectives. By combining and analyzing these streams, Botlab can produce accurate, timely, and comprehensive data about spam botnet behavior. Our prototype system integrates information about spam arriving at the University of Washington, outgoing spam generated by captive botnet nodes, and information gleaned from DNS about URLs found within these spam messages.

We describe the design and implementation of Botlab, including the challenges we had to overcome, such as preventing captive nodes from causing harm or thwarting virtual machine detection. Next, we present the results of a detailed measurement study of the behavior of the most active spam botnets. We find that six botnets are responsible for 79% of spam messages arriving at the UW campus. Finally, we present defensive tools that take advantage of the Botlab platform to improve spam filtering and protect users from harmful web sites advertised within botnet-generated spam.

1 Introduction

Spamming botnets are a blight on the Internet. By some estimates, they transmit approximately 85% of the 100+ billion spam messages sent per day [14, 21]. Botnet-generated spam is a nuisance to users, but worse, it can cause significant harm when used to propagate phishing campaigns that steal identities, or to distribute malware to compromise more hosts.

These concerns have prompted academia and industry to analyze spam and spamming botnets. Previous studies have examined spam received by sinkholes and popular web-based mail services to derive spam signatures, determine properties of spam campaigns, and characterize scam hosting infrastructure [1, 39, 40]. This analysis of “incoming” spam feeds provides valuable information on aggregate botnet behavior, but it does not separate activities of individual botnets or provide information on the spammers’ latest techniques. Other efforts reverse engineered and infiltrated individual spamming botnets, including Storm [20] and Rustock [5]. However, these techniques are specific to these botnets and their communication methods, and their analysis only considers characteristics of the “outgoing” spam these botnets generate. Passive honeynets [13, 27, 41] are becoming less applicable to this problem over time, as botnets are increasingly propagating via social engineering and web-based drive-by download attacks that honeynets will not observe. Overall, there is still opportunity to design defensive tools to filter botnet spam, identify and block botnet-hosted malicious sites, and pinpoint which hosts are currently participating in a spamming botnet.

In this paper we turn the tables on spam botnets by using the vast quantities of spam that they generate to monitor and analyze their behavior. To do this, we designed and implemented Botlab, a continuously operating botnet monitoring platform that provides real-time information regarding botnet activity. Botlab consumes a feed of all incoming spam arriving at the University of Washington, allowing it to find fresh botnet binaries propagated through spam links. It then executes multiple captive, sandboxed nodes from various botnets, allowing it to observe the precise outgoing spam feeds from these nodes. It scours the spam feeds for URLs, gathers information on scams, and identifies exploit links. Finally, it correlates the incoming and outgoing spam feeds to identify the most active botnets and the set of compromised hosts comprising each botnet.

A key insight behind Botlab is that the combination of both incoming and outgoing spam sources is essential for enabling a comprehensive, accurate, and timely analysis of botnet behavior. Incoming spam bootstraps the process of identifying spamming bots, outgoing spam enables us to track the ebbs and flows of botnets’ ongoing spam campaigns and establish the ground truth regarding spam templates, and correlation of the two feeds can classify incoming spam according to the botnet that is sourcing it, determine the number of hosts active within each botnet, and identify many of these botnet-infected hosts.

1.1 Contributions

Our work offers four novel contributions. First, we tackle many of the challenges involved in building a real-time botnet monitoring platform, including identifying and incorporating new bot variants, and preventing Botlab hosts from being blacklisted by botnet operators.

Second, we have designed network sandboxing mechanisms that prevent captive bot nodes from causing harm, while still enabling our research to be effective. As well, we discuss the long-term tension between effectiveness and safety in botnet research given botnets’ trends, and we present thought experiments that suggest that a determined adversary could make it extremely difficult to conduct future botnet research in a safe manner.

Third, we present interesting behavioral characteristics of spamming botnets derived from our multi-perspective analysis. For example, we show that just a handful of botnets are responsible for most spam received by UW, and attribute incoming spam to specific botnets. As well, we show that the bots we analyze use simple methods for locating their command and control (C&C) servers; if these servers were efficiently located and shut down, much of today’s spam flow would be disrupted. As another example, in contrast to earlier findings [40], we observe that some spam campaigns utilize multiple botnets.

Fourth, we have implemented several prototype defensive tools that take advantage of the real-time information provided by the Botlab platform. We have constructed a Firefox plugin that protects users from scam and phishing web sites propagated by spam botnets. The plugin blocked 40,270 malicious links emanating from one botnet monitored by Botlab; in contrast, two blacklist-based defenses failed to detect any of these links. As well, we have designed and implemented a Thunderbird plugin that filters botnet-generated spam. For one user, the plugin reduced the amount of spam that bypassed his SpamAssassin filters by 76%.

The rest of this paper is organized as follows. Section 2 provides background material on the botnet threat. Section 3 discusses the design and implementation of Botlab. We evaluate Botlab in Section 4 and describe applications we have built using it in Section 5. We discuss our thoughts on the long-term viability of safe botnet research in Section 6. We present related work in Section 7 and conclude in Section 8.

2 Background on the Botnet Threat

A botnet is a large-scale, coordinated network of computers, each of which executes specific bot software. Botnet operators recruit new nodes by commandeering victim hosts and surreptitiously installing bot code onto them; the resulting army of “zombie” computers is typically controlled by one or more command-and-control (C&C) servers. Botnet operators employ their botnets to send spam, scan for new victims, steal confidential informa-

Propagation: Malware authors are increasingly relying on social engineering to find and compromise victims, such as by spamming users with personal greeting card ads or false upgrade notices that entice them to install malware. As propagation techniques move up the protocol stacks, the weakest link in the botnet defense chain becomes the human user. As well, systems such as passive honeynets become less effective at detecting new botnet software, instead requiring active steps to gather and classify potential malware.

Customized C&C protocols: While many of the older botnet designs used IRC to communicate with C&C servers, newer botnets use encrypted and customized protocols for disseminating commands and directing bots [7, 9, 33, 36]. For example, some botnets communicate via HTTP requests and responses carrying encrypted C&C data. Manual reverse-engineering of bot behavior has thus become time-consuming if not impossible.

Rapid evolution: To evade detection from trackers and anti-malware software, some newer botnets morph rapidly. For instance, malware binaries are often packed using polymorphic packers that generate different-looking binaries even though the underlying code base has not changed [29]. Also, botnet operators are moving away from relying on a single web server to host their scams, and instead are using fast flux DNS [12]. In this scheme, attackers rapidly rebind the server DNS name to different botnet IP addresses, in order to defend against IP blacklisting or manual server take-down. Finally, botnets also make updates to their C&C protocols, by incorporating new forms of encryption and command distribution.

Moving forward, analysis and defense systems must contend with the increasing sophistication of botnets. Monitoring systems must be pro-active in collecting and executing botnet samples, as botnets and their behavior change rapidly. As well, botnet analysis systems will increasingly have to rely on external observations of botnet behavior, rather than necessarily being able to crack and reverse engineer botnet control traffic.

3 The Botlab Monitoring Platform

The Botlab platform produces fresh information about spam-oriented botnets, including their current cam-