Understanding the Mirai Botnet Manos Antonakakis, Georgia Institute of Technology; Tim April, Akamai; Michael Bailey, University of Illinois, Urbana-Champaign; Matt Bernhard, University of Michigan, Ann Arbor; Elie Bursztein, Google; Jaime Cochran, Cloudflare; Zakir Durumeric and J. Alex Halderman, University of Michigan, Ann Arbor; Luca Invernizzi, Google; Michalis Kallitsis, Merit Network, Inc.; Deepak Kumar, University of Illinois, Urbana-Champaign; Chaz Lever, Georgia Institute of Technology; Zane Ma and Joshua Mason, University of Illinois, Urbana-Champaign; Damian Menscher, Google; Chad Seaman, Akamai; Nick Sullivan, Cloudflare; Kurt Thomas, Google; Yi Zhou, University of Illinois, Urbana-Champaign https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/antonakakis
This paper is included in the Proceedings of the 26th USENIX Security Symposium August 16–18, 2017 • Vancouver, BC, Canada ISBN 978-1-931971-40-9
Open access to the Proceedings of the 26th USENIX Security Symposium is sponsored by USENIX Understanding the Mirai Botnet
Manos Antonakakis Tim April‡ Michael Bailey† Matthew Bernhard/ Elie Bursztein◦ Jaime Cochran. Zakir Durumeric/ J. Alex Halderman/ Luca Invernizzi◦ Michalis Kallitsis§ Deepak Kumar† Chaz Lever Zane Ma†∗ Joshua Mason† Damian Menscher◦ Chad Seaman‡ Nick Sullivan. Kurt Thomas◦ Yi Zhou†
‡Akamai Technologies .Cloudflare Georgia Institute of Technology ◦Google §Merit Network †University of Illinois Urbana-Champaign /University of Michigan
Abstract of factors—efficient spreading based on Internet-wide The Mirai botnet, composed primarily of embedded scanning, rampant use of insecure default passwords in and IoT devices, took the Internet by storm in late 2016 IoT products, and the insight that keeping the botnet’s when it overwhelmed several high-profile targets with behavior simple would allow it to infect many hetero- massive distributed denial-of-service (DDoS) attacks. In geneous devices—all played a role. Indeed, Mirai has this paper, we provide a seven-month retrospective anal- spawned many variants that follow the same infection ysis of Mirai’s growth to a peak of 600k infections and strategy, leading to speculation that “IoT botnets are the a history of its DDoS victims. By combining a variety new normal of DDoS attacks” [64]. of measurement perspectives, we analyze how the bot- In this paper, we investigate the precipitous rise of Mi- net emerged, what classes of devices were affected, and rai and the fragile IoT ecosystem it has subverted. We how Mirai variants evolved and competed for vulnerable present longitudinal measurements of the botnet’s growth, hosts. Our measurements serve as a lens into the fragile composition, evolution, and DDoS activities from Au- ecosystem of IoT devices. We argue that Mirai may rep- gust 1, 2016 to February 28, 2017. We draw from a resent a sea change in the evolutionary development of diverse set of vantage points including network telescope botnets—the simplicity through which devices were in- probes, Internet-wide banner scans, IoT honeypots, C2 fected and its precipitous growth, demonstrate that novice milkers, DNS traces, and logs provided by attack vic- malicious techniques can compromise enough low-end tims. These unique datasets enable us to conduct the first devices to threaten even some of the best-defended targets. comprehensive analysis of Mirai and posit technical and To address this risk, we recommend technical and non- non-technical defenses that may stymie future attacks. technical interventions, as well as propose future research We track the outbreak of Mirai and find the botnet directions. infected nearly 65,000 IoT devices in its first 20 hours before reaching a steady state population of 200,000– 300,000 infections. These bots fell into a narrow band of 1 Introduction geographic regions and autonomous systems, with Brazil, Columbia, and Vietnam disproportionately accounting for Starting in September 2016, a spree of massive distributed 41.5% of infections. We confirm that Mirai targeted a denial-of-service (DDoS) attacks temporarily crippled variety of IoT and embedded devices ranging from DVRs, Krebs on Security [46], OVH [43], and Dyn [36]. The ini- IP cameras, routers, and printers, but find Mirai’s ultimate tial attack on Krebs exceeded 600 Gbps in volume [46]— device composition was strongly influenced by the market among the largest on record. Remarkably, this overwhelm- shares and design decisions of a handful of consumer ing traffic was sourced from hundreds of thousands of electronics manufacturers. some of the Internet’s least powerful hosts—Internet of By statically analyzing over 1,000 malware samples, Things (IoT) devices—under the control of a new botnet we document the evolution of Mirai into dozens of vari- named Mirai. ants propagated by multiple, competing botnet operators. While other IoT botnets such as BASHLITE [86] and These variants attempted to improve Mirai’s detection Carna [38] preceded Mirai, the latter was the first to avoidance techniques, add new IoT device targets, and in- emerge as a high-profile DDoS threat. What explains troduce additional DNS resilience. We find that Mirai har- Mirai’s sudden rise and massive scale? A combination nessed its evolving capabilities to launch over 15,000 at- ∗Denotes primary, lead, or “first” author tacks against not only high-profile targets (e.g., Krebs
USENIX Association 26th USENIX Security Symposium 1093 09/30/2016 Source code released 09/18/2016 11/26/2016 02/23/2017 OVH attacks 10/21/2016 Deutsche Telekom Deutsche Telekom begin Dyn attacks CWMP exploit attacker arrested
Sept Oct Nov Dec Jan Feb 08/01/2016 09/21/2016 10/31/2016 01/18/2017 Mirai surfaces Krebs on Security Liberia Lonestar Mirai author peak attack attacks begin identified
Figure 1: Mirai Timeline—Major attacks (red), exploits (yellow), and events (black) related to the Mirai botnet.
on Security, OVH, and Dyn), but also numerous game release of Mirai’s source code on hackforums.net [4]. We servers, telecoms, anti-DDoS providers, and other seem- rely on this code to develop our measurement method- ingly unrelated sites. While DDoS was Mirai’s flavor ology (Section3). Furthermore, as we detail later (Sec- of abuse, future strains of IoT malware could leverage tion5), this source code release led to the proliferation access to compromised routers for ad fraud, cameras for of Mirai variants with competing operators. One notable extortion, network attached storage for bitcoin mining, variant added support for a router exploit through CPE or any number of applications. Mirai’s reach extended WAN Management Protocol (CWMP), an HTTP-based across borders and legal jurisdictions, and it infected de- protocol that enables auto-configuration and remote man- vices with little infrastructure to effectively apply security agement of home routers, modems, and other customer- patches. This made defending against it a daunting task. premises equipment (CPE) [15]. This exploit led to an out- Finally, we look beyond Mirai to explore the security age at Deutsche Telekom late November 2016 [47], with posture of the IoT landscape. We find that the absence of the suspected attacker later arrested in February 2017 [13]. security best practices—established in response to desk- In this work, we track Mirai’s variants and examine how top worms and malware over the last two decades—has they influenced Mirai’s propagation. created an IoT substrate ripe for exploitation. However, this space also presents unique, nuanced challenges in the Botnet structure & propagation We provide a sum- realm of automatic updates, end-of-life, and consumer no- mary of Mirai’s operation in Figure2, as gleaned from tifications. Without improved defenses, IoT-based attacks the released source code. Mirai spread by first entering are likely to remain a potent adversarial technique as bot- a rapid scanning phase (x) where it asynchronously and net variants continue to evolve and discover new niches “statelessly” sent TCP SYN probes to pseudorandom IPv4 to infect. In light of this, Mirai seems aptly named—it is addresses, excluding those in a hard-coded IP blacklist, on Japanese for “the future.” Telnet TCP ports 23 and 2323 (hereafter denoted TCP/23 and TCP/2323). If Mirai identifies a potential victim, it en- tered into a brute-force login phase in which it attempted 2 The Mirai Botnet to establish a Telnet connection using 10 username and password pairs selected randomly from a pre-configured Mirai is a worm-like family of malware that infected list of 62 credentials. At the first successful login, Mirai IoT devices and corralled them into a DDoS botnet. We sent the victim IP and associated credentials to a hard- provide a brief timeline of Mirai’s emergence and discuss coded report server (y). its structure and propagation. A separate loader program (z) asynchronously in- Timeline of events Reports of Mirai appeared as fected these vulnerable devices by logging in, determining early as August 31, 2016 [89], though it was not until the underlying system environment, and finally, down- mid-September, 2016 that Mirai grabbed headlines with loading and executing architecture-specific malware ({). massive DDoS attacks targeting Krebs on Security [46] After a successful infection, Mirai attempted to conceal and OVH [74] (Figure1). Several additional high-profile its presence by deleting the downloaded binary and ob- attacks later targeted DNS provider Dyn [36] and fuscating its process name in a pseudorandom alphanu- Lonestar Cell, a Liberian telecom [45]. In early 2017, the meric string. As a consequence, Mirai infections did not actors surrounding Mirai came to light as the Mirai author persist across system reboots. In order to fortify itself, was identified [49]. Throughout our study, we corroborate the malware additionally killed other processes bound our measurement findings with these media reports and to TCP/22 or TCP/23, as well as processes associated expand on the public information surrounding Mirai. with competing infections, including other Mirai vari- Another significant event in this timeline is the public ants, .anime [25], and Qbot [72]. At this point, the bot
1094 26th USENIX Security Symposium USENIX Association 3.1 Network Telescope Attacker Mirai’s indiscriminate, rapid scanning strategy lends it- Send command self to tracking the botnet’s propagation to new hosts. We monitored all network requests to a network telescope [9] Command Report Dispatch Loader composed of 4.7 million IP address operated by Merit Infrastructure & Control Server Network over a seven month period from July 18, 2016 Relay Load to February 28, 2017. On average, the network telescope Report received 1.1 million packets from 269,000 IP addresses per minute during this period. To distinguish Mirai traffic
Devices Scan Victim from background radiation [94] and other scanning ac- Bots tivity, we uniquely fingerprinted Mirai probes based on an artifact of Mirai’s stateless scanning whereby every Attack probe has a TCP sequence number—normally a random 32-bit integer—equal to the destination IP address. The DDoS Target likelihood of this occurring incidentally is 1/232, and we would expect to see roughly 86 packets demonstrating this pattern in our entire dataset. In stark contrast, we Figure 2: Mirai Operation—Mirai bots scan the IPv4 address observed 116.2 billion Mirai probes from 55.4 million IP space for devices that run telnet or SSH, and attempt to log in us- addresses. Prior to the emergence of Mirai, we observed ing a hardcoded dictionary of IoT credentials. Once successful, only three IPs that perform scans with this fingerprint. the bot sends the victim IP address and associated credentials to Two of the IP addresses generated five packets; two on a report server, which asynchronously triggers a loader to infect TCP/80 and three on TCP/1002. The third IP address be- the device. Infected hosts scan for additional victims and accept DDoS commands from a command and control (C2) server. longs to Team Cymru [1], who conducts regular TCP/443 scans. We caution that the raw count of IP addresses seen listened for attack commands from the command and con- scanning over time is a poor metric of botnet size due to trol server (C2) while simultaneously scanning for new DHCP churn [87]. To account for this, we tracked the size victims. of the botnet by considering the number of hosts actively “scanning” at the start of every hour. We detected scans Malware phylogeny While not directly related to using the methodology presented by Durumeric et al. [23], our study, the Mirai family represents an evolution of in which we group packets from a single IP address in BASHLITE (otherwise known as LizardStresser, Torlus, a temporal window into logical scans. We specifically Gafgyt), a DDoS malware family that infected Linux identified scans that targeted the IPv4 address space at an devices by brute forcing default credentials [86]. BASH- estimated rate of at least five packets per second, expiring LITE relied on six generic usernames and 14 generic pass- inactive scans after 20 minutes. We geolocated IPs using words, while the released Mirai code used a dictionary Maxmind [61]. of 62 username/password pairs that largely subsumed BASHLITE’s set and added credentials specific to con- sumer routers and IoT devices. In contrast to BASHLITE, 3.2 Active Scanning Mirai additionally employed a fast, stateless scanning While Mirai is widely considered an IoT botnet, there module that allowed it to more efficiently identify vulner- has been little comprehensive analysis of infected devices able devices. over the botnet’s entire lifetime. In order to determine the manufacturer and model of devices infected with Mirai, we leveraged Censys [22], which actively scans the IPv4 3 Methodology space and aggregates application layer data about hosts on the Internet. We focused our analysis on scans of HTTPS, Our study of Mirai leverages a variety of network vantage FTP, SSH, Telnet, and CWMP between July 19, 2016 and points: a large, passive network telescope, Internet-wide February 28, 2017. scanning, active Telnet honeypots, logs of C2 attack A number of challenges make accurate device labeling commands, passive DNS traffic, and logs from DDoS difficult. First, Mirai immediately disables common out- attack targets. In this section, we discuss our data sources ward facing services (e.g., HTTP) upon infection, which and the role they play in our analysis. We provide a prevents infected devices from being scanned. Second, high-level summary in Table1. Censys scans often take more than 24 hours to complete,
USENIX Association 26th USENIX Security Symposium 1095 Role Data Source Collection Site Collection Period Data Volume Growth and size Network telescope Merit Network, Inc. 07/18/2016–02/28/2017 370B packets, avg. 269K IPs/min Device composition Active scanning Censys 07/19/2016–02/28/2017 136 IPv4 scans, 5 protocols Ownership & evolution Telnet honeypots AWS EC2 11/02/2016–02/28/2017 141 binaries Telnet honeypots Akamai 11/10/2016–02/13/2017 293 binaries Malware repository VirusTotal 05/24/2016–01/30/2017 594 binaries DNS—active Georgia Tech 08/01/2016–02/28/2017 290M RRs/day DNS—passive Large U.S. ISP 08/01/2016–02/28/2017 209M RRs/day Attack characterization C2 milkers Akamai 09/27/2016–02/28/2017 64.0K attack commands DDoS IP addresses Akamai 09/21/2016 12.3K IP addresses DDoS IP addresses Google Shield 09/25/2016 158.8K IP addresses DDoS IP addresses Dyn 10/21/2016 107.5K IP addresses
Table 1: Data Sources—We utilized a multitude of data perspectives to empirically analyze the Mirai botnet.
Protocol Banners Devices Identified probes to fingerprint devices [58]. Nmap successfully HTTPS 342,015 271,471 (79.4%) handled 98% of SSH banners and 81% of FTP banners, FTP 318,688 144,322 (45.1%) but matches only 7.8% of the Telnet banners. In order to Telnet 472,725 103,924 (22.0%) increase our coverage and also accommodate HTTPS and CWMP 505,977 35,163 (7.0%) CWMP (which Nmap lacks probes for), we constructed SSH 148,640 8,107 (5.5%) our own regular expressions to map banners to device Total 1,788,045 587,743 (31.5%) manufacturers and models. Unfortunately, we found that in many cases, there was not enough data to identify a Table 2: Devices Identified—We identified device type, model, model and manufacturer from FTP, Telnet, CWMP, and and/or vendor for 31.5% of active scan banners. Protocol ban- ners varied drastically in device identifiability, with HTTPS SSH banners and that Nmap fingerprints only provide certificates being most descriptive, and SSH prompts being the generic descriptions. In total, we identified device type least. and/or model and manufacturer for 31.5% of banners (Table2). We caution that this methodology is suscepti- ble to misattribution in instances where port-forwarding during which devices may churn to new IP addresses. Fi- and Universal Plug and Play (UPnP) are used to present nally, Censys executes scans for different protocols on multiple devices behind a single IP address, making the different days, making it difficult to increase label speci- distinction between middlebox and end-device difficult. ficity by combining banners from multiple services. We navigated these constraints by restricting our analysis 3.3 Telnet Honeypots to banners that were collected within twenty minutes of scanning activity (the time period after which we expire To track the evolution of Mirai’s capabilities, we collected a scan). This small window mitigates the risk of erro- binaries installed on a set of Telnet honeypots that mas- neously associating the banner data of uninfected devices queraded as vulnerable IoT devices. Mechanically, we with Mirai infections due to DHCP churn. presented a BusyBox shell [92] and IoT-consistent device Post-filtering, our dataset included 1.8 million banners banner. Our honeypots logged all incoming Telnet traf- associated with 1.2 million Mirai-infected IP addresses fic and downloaded any binaries that attackers attempted (Table2). We had the most samples for CWMP, and to install on the host via wget or tftp (the methods of the least for SSH. We caution that devices with open infection found in Mirai’s original source). In order to services that are not closed by Mirai (e.g., HTTPS and avoid collateral damage, we blocked all other outgoing FTP) can appear repeatedly in Censys banner scans during requests (e.g., scanning and DoS traffic). our measurement window (due to churn) and thus lead to We logged 80K connection attempts from 54K IP ad- over counting when compared across protocols. As such, dresses between November 2, 2016 and February 28, we intentionally explored protocols in isolation from one 2017, collecting a total 151 unique binaries. We filtered another and limited ourselves to measurements that only out executables unrelated to Mirai based on a YARA sig- consider relative proportions rather than absolute counts nature that matched any of the strings from the original of infected hosts. source code release, leaving us with 141 Mirai binaries. Finally, we processed each infected device’s banner to We supplemented this data with 293 binaries observed by identify the device manufacturer and model. We first ap- honeypots operated by Akamai, which served a similar plied the set of regular expressions used by Nmap service purpose to ours, but were hosted on a different public
1096 26th USENIX Security Symposium USENIX Association cloud provider. As a final source of samples, we included found in the binaries uploaded to their honeypots. The 594 unique binaries from VirusTotal [90] that we scanned service simulated a Mirai-infected device and communi- for using the YARA rules mentioned above. In total, we cated with the C2 server using a custom bot-to-C2 proto- collected 1,028 unique Mirai samples. col, which was reverse engineered from malware samples We analyzed the binaries for the three most common ar- prior to source code release. In total, Akamai observed chitectures—MIPS 32-bit, ARM 32-bit, and x86 32-bit— 64K attack commands issued by 484 unique C2 servers which account for 74% of our samples. We extracted (by IP address). We note that a naive analysis of attack the set of logins and passwords, IP blacklists, and C2 do- commands overestimates the volume of attacks and tar- mains from these binaries, identifying 67 C2 domains and gets: individual C2 servers often repeat the same attack 48 distinct username/password dictionaries (containing a command in rapid succession, and multiple distinct C2 total 371 unique passwords). servers frequently issued the same command. To account for this, we heuristically grouped attack commands along two dimensions: by shared C2 infrastructure and by tem- 3.4 Passive & Active DNS poral similarity. We collapsed matching commands (i.e., Following the public release of Mirai’s source code, com- tuples of attack type, duration, targets, and command op- peting Mirai botnet variants came into operation. We tions) that occur within 90 seconds of each other, which disambiguated ownership and estimate the relative size yielded 15,194 attacks from 146 unique IP clusters. Our of each Mirai strain by exploring passive and active DNS attack command coverage includes the Dyn attack [36] data for the 67 C2 domains that we found by reverse engi- and Liberia attacks [45]. We did not observe attack com- neering Mirai binaries. We also leveraged our DNS data mands for Krebs on Security and OVH, which occurred to map the IP addresses present in attack commands to prior to the milker’s operation. victim domain names. From a large U.S. ISP, we obtained passive DNS data 3.6 DDoS Attack Traces consisting of DNS queries generated by the ISP’s clients and their corresponding responses. More specifically, Our final data source consists of network traces and ag- we collected approximately 209 million resource records gregate statistics from Akamai and Google Shield (the (RRs)—queried domain name, and associated RDATA— providers for Krebs on Security) and Dyn. These attacks and their lookup volumes aggregated on a daily basis. cover two distinct periods in Mirai’s evolution. We used For our active DNS dataset, we obtained 290 million this data to corroborate the IP addresses observed in at- RRs per day from Thales, an active DNS monitoring tacks versus those found scanning our passive network system [44]. Both datasets cover the period of August 1, telescope, as well as to understand the volume of traf- 2016 to February 28, 2017. fic generated by Mirai1. From Akamai, we obtained an Using both passive and active DNS datasets, we per- aggregate history of all DDoS attacks targeting Krebs formed DNS expansion to identify shared DNS infrastruc- on Security from 2012–2016, as well as a small sam- ture by linking related historic domain names (RHDN) ple of 12.3K IPs related to a Mirai attack on September and related historic IPs (RHIPs) [5]. This procedure be- 21, 2016. For Google Shield, we shared a list of IP ad- gan with the seed set of C2 domains and IPs extracted dresses observed by our network telescope and in turn during reverse engineering of our honeypotted binaries. received aggregate statistics on what fraction matched For a given seed foo.com, we identified the IP addresses any of 158.8K IP addresses involved in a 1-minute Mirai that foo.com previously resolved to and added them to HTTP-flood attack on September 25, 2016. Finally, Dyn a growing set of domains and IPs. We additionally per- provided us with a set of 107.5K IP addresses associated formed the reverse analysis, starting from an IP and find- with a Mirai attack on October 21, 2016. ing any domain names that concurrently resolved it. Thus, even from a single domain name, we iteratively expanded 4 Tracking Mirai’s Spread the set of related domain names and IP addresses to con- struct a graph reflecting the shared infrastructure used As a first step towards understanding Mirai, we analyzed by Mirai variants. In total, we identified 33 unique DNS how the botnet bootstrapped its initial infections, what clusters that we explore in detail in Section5. types of devices it targeted, and how it eventually infected an estimated 600K hosts. To contextualize the properties 3.5 Attack Commands of Mirai, we compare it against prior botnets and worms. 1 To track the DDoS attack commands issued by Mirai We overlapped attack traces with every Mirai scanning IPs on our network telescope. The overlap may have been inflated by non-Mirai operators, Akamai ran a “milker” from September 27, attack IPs being assigned to Mirai devices over time through DHCP 2016–February 28, 2017 that connected to the C2 servers churn.
USENIX Association 26th USENIX Security Symposium 1097