BOTMAGNIFIER: Locating Spambots on the Internet

Gianluca Stringhini§, Thorsten Holz‡, Brett Stone-Gross§, Christopher Kruegel§, and Giovanni Vigna§
§University of California, Santa Barbara
‡Ruhr-University Bochum
{gianluca,bstone,chris,vigna}@cs.ucsb.edu [email protected]

Abstract

Unsolicited bulk email (spam) is used by cyber-criminals to lure users into scams and to spread malware infections. Most of these unwanted messages are sent by spam botnets, which are networks of compromised machines under the control of a single (malicious) entity. Often, these botnets are rented out to particular groups to carry out spam campaigns, in which similar mail messages are sent to a large group of Internet users in a short amount of time. Tracking the bot-infected hosts that participate in spam campaigns, and attributing these hosts to spam botnets that are active on the Internet, are challenging but important tasks. In particular, this information can improve blacklist-based spam defenses and guide botnet mitigation efforts.

In this paper, we present a novel technique to support the identification and tracking of bots that send spam. Our technique takes as input an initial set of IP addresses that are known to be associated with spam bots, and learns their spamming behavior. This initial set is then "magnified" by analyzing large-scale mail delivery logs to identify other hosts on the Internet whose behavior is similar to the behavior previously modeled. We implemented our technique in a tool, called BOTMAGNIFIER, and applied it to several data streams related to the delivery of email traffic. Our results show that it is possible to identify and track a substantial number of spam bots by using our magnification technique. We also perform attribution of the identified spam hosts and track the evolution and activity of well-known spamming botnets over time. Moreover, we show that our results can help to improve state-of-the-art spam blacklists.

1 Introduction

Email spam is one of the open problems in the area of IT security, and has attracted a significant amount of research over many years [11, 26, 28, 40, 42]. Unsolicited bulk email messages account for almost 90% of the world-wide email traffic [20], and a lucrative business has emerged around them [12]. The content of spam emails lures users into scams, promises to sell cheap goods and pharmaceutical products, and spreads malicious software by distributing links to websites that perform drive-by download attacks [24].

Recent studies indicate that, nowadays, about 85% of the overall spam traffic on the Internet is sent with the help of spamming botnets [20, 36]. Botnets are networks of compromised machines under the direction of a single entity, the so-called botmaster. While different botnets serve different, nefarious goals, one important purpose of botnets is the distribution of spam emails. The reason is that botnets provide two advantages for spammers. First, a botnet serves as a convenient infrastructure for sending out large quantities of messages; it is essentially a large, distributed computing system with massive bandwidth. A botmaster can send out tens of millions of emails within a few hours using thousands of infected machines. Second, a botnet allows an attacker to evade spam filtering techniques based on the sender IP addresses. The reason is that the IP addresses of some infected machines change frequently (e.g., due to the expiration of a DHCP lease, or to the change in network location in the case of an infected portable computer). Moreover, it is easy to infect machines and recruit them as new members into a botnet. This means that blacklists need to be updated constantly by tracking the IP addresses of spamming bots.

Tracking spambots is challenging. One approach to detect infected machines is to set up spam traps. These are fake email addresses (i.e., addresses not associated with real users) that are published throughout the Internet with the purpose of attracting and collecting spam messages. By extracting the sender IP addresses from the emails received by a spam trap, it is possible to obtain a list of bot-infected machines. However, this approach faces two main problems. First, it is likely that only a subset of the bots belonging to a certain botnet will send emails to the spam trap addresses. Therefore, the analysis of the messages collected by the spam trap can provide only a partial view of the activity of the botnet. Second, some botnets might only target users located in a specific country (e.g., due to the language used in the email), and thus a spam trap located in a different country would not observe those bots.

Other approaches to identify the hosts that are part of a spamming botnet are specific to particular botnets. For example, by taking control of the command & control (C&C) component of a botnet [21, 26], or by analyzing the communication protocol used by the bots to interact with other components of the infrastructure [6, 15, 32], it is possible to enumerate (a subset of) the IP addresses of the hosts that are part of a botnet. However, in these cases, the results are specific to the particular botnet that is being targeted (and, typically, the type of C&C used).

In this paper, we present a novel approach to identify and track spambot populations on the Internet. Our ambitious goal is to track the IP addresses of all active hosts that belong to every spamming botnet. By active hosts, we mean hosts that are online and that participate in spam campaigns. Comprehensive tracking of the IP addresses belonging to spamming botnets is useful for several reasons:

• Internet Service Providers can take countermeasures to prevent the bots whose IP addresses reside in their networks from sending out email messages.
• Organizations can clean up compromised machines in their networks.
• Existing blacklists and systems that analyze network-level features of emails can be improved by providing accurate information about machines that are currently sending out spam emails.
• By monitoring the number of bots that are part of different botnets, it is possible to guide and support mitigation efforts so that the C&C infrastructures of the largest, most aggressive, or fastest-growing botnets are targeted first.

Our approach to tracking spamming bots is based on the following insight: bots that belong to the same botnet share the same C&C infrastructure and the same code base. As a result, these bots will feature similar behavior when sending spam [9, 40, 41]. In contrast, bots belonging to different spamming botnets will typically use different parameters for sending spam mails (e.g., the size of the target email address list, the domains or countries that are targeted, the spam contents, or the timing of their [...] to be a set of email messages that share a substantial amount of content and structure (e.g., a spam campaign might involve the distribution of messages that promote a specific pharmaceutical scam).

Input datasets. At a high level, our approach takes two datasets as input. The first dataset contains the IP addresses of known spamming bots that are active during a certain time period (we call this time period the observation period). The IP addresses are grouped by spam campaign. That is, IP addresses in the same group sent the same type of messages. We refer to these groups of IP addresses as seed pools. The second dataset is a log of email transactions carried out on the Internet during the same time period. This log, called the transaction log, contains entries that specify that, at a certain time, IP address C attempted to send an email message to IP address S. The log does not need to be a complete log of every email transaction on the Internet (as it would be unfeasible to collect this information). However, as we will discuss later, our approach becomes more effective as this log becomes more comprehensive.

Approach. In the first step of our approach, we search the transaction log for entries in which the sender IP address is one of the IP addresses in the seed pools (i.e., the known spambots). Then, we analyze these entries and generate a number of behavioral profiles that capture the way in which the hosts in the seed pools sent emails during the observation period.

In the second step of the approach, the whole transaction log is searched for patterns of behavior that are similar to the spambot behavior previously learned from the seed pools. The hosts that behave in a similar manner are flagged as possible spamming bots, and their IP addresses are added to the corresponding magnified pool.

In the third and final step, heuristics are applied to reduce false positives and to assign spam campaigns (and the IP addresses of bots) to specific botnets (e.g., Rustock [5], Cutwail [35], or MegaD [4, 6]).

We implemented our approach in a tool, called BOTMAGNIFIER. In order to populate our seed pools, we used data from a large spam trap set up by an Internet Service Provider (ISP). Our transaction logs were constructed by running a mirror for Spamhaus, a popular DNS-based blacklist. Note that other sources of information can be used to either populate the seed pools or to build a transaction log. As we will show, BOTMAGNIFIER also works for transaction logs extracted from netflow data collected from a large ISP's backbone routers.
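
The first two steps of the approach (learning a profile from a seed pool, then searching the whole transaction log for similar hosts) can be illustrated with a minimal sketch. This is not the BOTMAGNIFIER implementation: the function names, the use of destination-set overlap as the similarity measure, and the thresholds min_overlap and min_servers are all assumptions made for illustration, and the third step (false-positive heuristics and botnet attribution) is not covered. Each log entry is assumed to be a tuple (time, sender IP C, destination server S).

```python
from collections import defaultdict


def destinations_by_sender(transaction_log):
    """Map every sender IP to the set of mail servers it contacted."""
    targets = defaultdict(set)
    for _timestamp, sender, server in transaction_log:
        targets[sender].add(server)
    return targets


def build_profile(seed_pool, targets):
    """Step 1 (simplified): the servers contacted by the known bots of one campaign."""
    profile = set()
    for ip in seed_pool:
        profile |= targets.get(ip, set())
    return profile


def magnify(seed_pool, transaction_log, min_overlap=0.8, min_servers=2):
    """Step 2 (simplified): flag hosts whose destinations mostly fall inside the profile.

    min_overlap and min_servers are hypothetical thresholds, not values
    taken from the paper.
    """
    targets = destinations_by_sender(transaction_log)
    profile = build_profile(seed_pool, targets)
    magnified = set()
    for ip, servers in targets.items():
        # Skip known bots and hosts with too few observations to judge.
        if ip in seed_pool or len(servers) < min_servers:
            continue
        overlap = len(servers & profile) / len(servers)
        if overlap >= min_overlap:
            magnified.add(ip)
    return magnified


# Toy data: addresses come from the documentation range 192.0.2.0/24.
SEED_POOL = {"192.0.2.1", "192.0.2.2"}
TRANSACTION_LOG = [
    (1, "192.0.2.1", "mx1.example"),
    (2, "192.0.2.1", "mx2.example"),
    (3, "192.0.2.2", "mx2.example"),
    (4, "192.0.2.2", "mx3.example"),
    (5, "192.0.2.9", "mx1.example"),  # behaves like the seed pool
    (6, "192.0.2.9", "mx3.example"),
    (7, "192.0.2.7", "mx8.example"),  # unrelated destinations
    (8, "192.0.2.7", "mx9.example"),
    (9, "192.0.2.5", "mx1.example"),  # too few observations
]

print(sorted(magnify(SEED_POOL, TRANSACTION_LOG)))  # prints ['192.0.2.9']
```

In this toy log only 192.0.2.9 is added to the magnified pool, because both servers it contacted were also targeted by the seed pool. A more comprehensive log gives each host more observed destinations, which is consistent with the remark that the approach becomes more effective as the transaction log becomes more complete.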