Understanding the Mirai Botnet
Manos Antonakakis Tim April‡ Michael Bailey† Matthew Bernhard/ Elie Bursztein◦ Jaime Cochran. Zakir Durumeric/ J. Alex Halderman/ Luca Invernizzi◦ Michalis Kallitsis§ Deepak Kumar† Chaz Lever Zane Ma†∗ Joshua Mason† Damian Menscher◦ Chad Seaman‡ Nick Sullivan. Kurt Thomas◦ Yi Zhou†
‡Akamai Technologies .Cloudflare Georgia Institute of Technology ◦Google §Merit Network †University of Illinois Urbana-Champaign /University of Michigan
Abstract of factors—efficient spreading based on Internet-wide The Mirai botnet, composed primarily of embedded scanning, rampant use of insecure default passwords in and IoT devices, took the Internet by storm in late 2016 IoT products, and the insight that keeping the botnet’s when it overwhelmed several high-profile targets with behavior simple would allow it to infect many hetero- massive distributed denial-of-service (DDoS) attacks. In geneous devices—all played a role. Indeed, Mirai has this paper, we provide a seven-month retrospective anal- spawned many variants that follow the same infection ysis of Mirai’s growth to a peak of 600k infections and strategy, leading to speculation that “IoT botnets are the a history of its DDoS victims. By combining a variety new normal of DDoS attacks” [64]. of measurement perspectives, we analyze how the bot- In this paper, we investigate the precipitous rise of Mi- net emerged, what classes of devices were affected, and rai and the fragile IoT ecosystem it has subverted. We how Mirai variants evolved and competed for vulnerable present longitudinal measurements of the botnet’s growth, hosts. Our measurements serve as a lens into the fragile composition, evolution, and DDoS activities from Au- ecosystem of IoT devices. We argue that Mirai may rep- gust 1, 2016 to February 28, 2017. We draw from a resent a sea change in the evolutionary development of diverse set of vantage points including network telescope botnets—the simplicity through which devices were in- probes, Internet-wide banner scans, IoT honeypots, C2 fected and its precipitous growth, demonstrate that novice milkers, DNS traces, and logs provided by attack vic- malicious techniques can compromise enough low-end tims. These unique datasets enable us to conduct the first devices to threaten even some of the best-defended targets. comprehensive analysis of Mirai and posit technical and To address this risk, we recommend technical and non- non-technical defenses that may stymie future attacks. technical interventions, as well as propose future research We track the outbreak of Mirai and find the botnet directions. infected nearly 65,000 IoT devices in its first 20 hours before reaching a steady state population of 200,000– 300,000 infections. These bots fell into a narrow band of 1 Introduction geographic regions and autonomous systems, with Brazil, Columbia, and Vietnam disproportionately accounting for Starting in September 2016, a spree of massive distributed 41.5% of infections. We confirm that Mirai targeted a denial-of-service (DDoS) attacks temporarily crippled variety of IoT and embedded devices ranging from DVRs, Krebs on Security [46], OVH [43], and Dyn [36]. The ini- IP cameras, routers, and printers, but find Mirai’s ultimate tial attack on Krebs exceeded 600 Gbps in volume [46]— device composition was strongly influenced by the market among the largest on record. Remarkably, this overwhelm- shares and design decisions of a handful of consumer ing traffic was sourced from hundreds of thousands of electronics manufacturers. some of the Internet’s least powerful hosts—Internet of By statically analyzing over 1,000 malware samples, Things (IoT) devices—under the control of a new botnet we document the evolution of Mirai into dozens of vari- named Mirai. ants propagated by multiple, competing botnet operators. While other IoT botnets such as BASHLITE [86] and These variants attempted to improve Mirai’s detection Carna [38] preceded Mirai, the latter was the first to avoidance techniques, add new IoT device targets, and in- emerge as a high-profile DDoS threat. What explains troduce additional DNS resilience. We find that Mirai har- Mirai’s sudden rise and massive scale? A combination nessed its evolving capabilities to launch over 15,000 at- ∗Denotes primary, lead, or “first” author tacks against not only high-profile targets (e.g., Krebs 09/30/2016 Source code released 09/18/2016 11/26/2016 02/23/2017 OVH attacks 10/21/2016 Deutsche Telekom Deutsche Telekom begin Dyn attacks CWMP exploit attacker arrested
Sept Oct Nov Dec Jan Feb 08/01/2016 09/21/2016 10/31/2016 01/18/2017 Mirai surfaces Krebs on Security Liberia Lonestar Mirai author peak attack attacks begin identified
Figure 1: Mirai Timeline—Major attacks (red), exploits (yellow), and events (black) related to the Mirai botnet.
on Security, OVH, and Dyn), but also numerous game release of Mirai’s source code on hackforums.net [4]. We servers, telecoms, anti-DDoS providers, and other seem- rely on this code to develop our measurement method- ingly unrelated sites. While DDoS was Mirai’s flavor ology (Section3). Furthermore, as we detail later (Sec- of abuse, future strains of IoT malware could leverage tion5), this source code release led to the proliferation access to compromised routers for ad fraud, cameras for of Mirai variants with competing operators. One notable extortion, network attached storage for bitcoin mining, variant added support for a router exploit through CPE or any number of applications. Mirai’s reach extended WAN Management Protocol (CWMP), an HTTP-based across borders and legal jurisdictions, and it infected de- protocol that enables auto-configuration and remote man- vices with little infrastructure to effectively apply security agement of home routers, modems, and other customer- patches. This made defending against it a daunting task. premises equipment (CPE) [15]. This exploit led to an out- Finally, we look beyond Mirai to explore the security age at Deutsche Telekom late November 2016 [47], with posture of the IoT landscape. We find that the absence of the suspected attacker later arrested in February 2017 [13]. security best practices—established in response to desk- In this work, we track Mirai’s variants and examine how top worms and malware over the last two decades—has they influenced Mirai’s propagation. created an IoT substrate ripe for exploitation. However, this space also presents unique, nuanced challenges in the Botnet structure & propagation We provide a sum- realm of automatic updates, end-of-life, and consumer no- mary of Mirai’s operation in Figure2, as gleaned from tifications. Without improved defenses, IoT-based attacks the released source code. Mirai spread by first entering are likely to remain a potent adversarial technique as bot- a rapid scanning phase (x) where it asynchronously and net variants continue to evolve and discover new niches “statelessly” sent TCP SYN probes to pseudorandom IPv4 to infect. In light of this, Mirai seems aptly named—it is addresses, excluding those in a hard-coded IP blacklist, on Japanese for “the future.” Telnet TCP ports 23 and 2323 (hereafter denoted TCP/23 and TCP/2323). If Mirai identifies a potential victim, it en- tered into a brute-force login phase in which it attempted 2 The Mirai Botnet to establish a Telnet connection using 10 username and password pairs selected randomly from a pre-configured Mirai is a worm-like family of malware that infected list of 62 credentials. At the first successful login, Mirai IoT devices and corralled them into a DDoS botnet. We sent the victim IP and associated credentials to a hard- provide a brief timeline of Mirai’s emergence and discuss coded report server (y). its structure and propagation. A separate loader program (z) asynchronously in- Timeline of events Reports of Mirai appeared as fected these vulnerable devices by logging in, determining early as August 31, 2016 [89], though it was not until the underlying system environment, and finally, down- mid-September, 2016 that Mirai grabbed headlines with loading and executing architecture-specific malware ({). massive DDoS attacks targeting Krebs on Security [46] After a successful infection, Mirai attempted to conceal and OVH [74] (Figure1). Several additional high-profile its presence by deleting the downloaded binary and ob- attacks later targeted DNS provider Dyn [36] and fuscating its process name in a pseudorandom alphanu- Lonestar Cell, a Liberian telecom [45]. In early 2017, the meric string. As a consequence, Mirai infections did not actors surrounding Mirai came to light as the Mirai author persist across system reboots. In order to fortify itself, was identified [49]. Throughout our study, we corroborate the malware additionally killed other processes bound our measurement findings with these media reports and to TCP/22 or TCP/23, as well as processes associated expand on the public information surrounding Mirai. with competing infections, including other Mirai vari- Another significant event in this timeline is the public ants, .anime [25], and Qbot [72]. At this point, the bot 3.1 Network Telescope Attacker
Send command Mirai’s indiscriminate, rapid scanning strategy lends it- self to tracking the botnet’s propagation to new hosts. We Command Report Dispatch Loader monitored all network requests to a network telescope [9] Infrastructure & Control Server composed of 4.7 million IP address operated by Merit Relay Load Network over a seven month period from July 18, 2016 Report to February 28, 2017. On average, the network telescope received 1.1 million packets from 269,000 IP addresses
Devices Scan Victim per minute during this period. To distinguish Mirai traffic Bots from background radiation [94] and other scanning ac- tivity, we uniquely fingerprinted Mirai probes based on Attack an artifact of Mirai’s stateless scanning whereby every probe has a TCP sequence number—normally a random DDoS Target 32-bit integer—equal to the destination IP address. The likelihood of this occurring incidentally is 1/232, and we would expect to see roughly 86 packets demonstrating Figure 2: Mirai Operation—Mirai bots scan the IPv4 address this pattern in our entire dataset. In stark contrast, we space for devices that run telnet or SSH, and attempt to log in us- observed 116.2 billion Mirai probes from 55.4 million IP ing a hardcoded dictionary of IoT credentials. Once successful, addresses. Prior to the emergence of Mirai, we observed the bot sends the victim IP address and associated credentials to only three IPs that perform scans with this fingerprint. a report server, which asynchronously triggers a loader to infect Two of the IP addresses generated five packets; two on the device. Infected hosts scan for additional victims and accept TCP/80 and three on TCP/1002. The third IP address be- DDoS commands from a command and control (C2) server. longs to Team Cymru [1], who conducts regular TCP/443 scans. listened for attack commands from the command and con- We caution that the raw count of IP addresses seen trol server (C2) while simultaneously scanning for new scanning over time is a poor metric of botnet size due to victims. DHCP churn [87]. To account for this, we tracked the size of the botnet by considering the number of hosts actively Malware phylogeny While not directly related to “scanning” at the start of every hour. We detected scans our study, the Mirai family represents an evolution of using the methodology presented by Durumeric et al. [23], BASHLITE (otherwise known as LizardStresser, Torlus, in which we group packets from a single IP address in Gafgyt), a DDoS malware family that infected Linux a temporal window into logical scans. We specifically devices by brute forcing default credentials [86]. BASH- identified scans that targeted the IPv4 address space at an LITE relied on six generic usernames and 14 generic pass- estimated rate of at least five packets per second, expiring words, while the released Mirai code used a dictionary inactive scans after 20 minutes. We geolocated IPs using of 62 username/password pairs that largely subsumed Maxmind [61]. BASHLITE’s set and added credentials specific to con- sumer routers and IoT devices. In contrast to BASHLITE, Mirai additionally employed a fast, stateless scanning module that allowed it to more efficiently identify vulner- Protocol Banners Devices Identified able devices. HTTPS 342,015 271,471 (79.4%) FTP 318,688 144,322 (45.1%) Telnet 472,725 103,924 (22.0%) CWMP 505,977 35,163 (7.0%) 3 Methodology SSH 148,640 8,107 (5.5%) Total 1,788,045 587,743 (31.5%) Our study of Mirai leverages a variety of network vantage Table 2: Devices Identified—We identified device type, model, points: a large, passive network telescope, Internet-wide and/or vendor for 31.5% of active scan banners. Protocol ban- scanning, active Telnet honeypots, logs of C2 attack ners varied drastically in device identifiability, with HTTPS commands, passive DNS traffic, and logs from DDoS certificates being most descriptive, and SSH prompts being the attack targets. In this section, we discuss our data sources least. and the role they play in our analysis. We provide a high-level summary in Table1. Role Data Source Collection Site Collection Period Data Volume Growth and size Network telescope Merit Network, Inc. 07/18/2016–02/28/2017 370B packets, avg. 269K IPs/min Device composition Active scanning Censys 07/19/2016–02/28/2017 136 IPv4 scans, 5 protocols Ownership & evolution Telnet honeypots AWS EC2 11/02/2016–02/28/2017 141 binaries Telnet honeypots Akamai 11/10/2016–02/13/2017 293 binaries Malware repository VirusTotal 05/24/2016–01/30/2017 594 binaries DNS—active Georgia Tech 08/01/2016–02/28/2017 290M RRs/day DNS—passive Large U.S. ISP 08/01/2016–02/28/2017 209M RRs/day Attack characterization C2 milkers Akamai 09/27/2016–02/28/2017 64.0K attack commands DDoS IP addresses Akamai 09/21/2016 12.3K IP addresses DDoS IP addresses Google’s Project Shield 09/25/2016 158.8K IP addresses DDoS IP addresses Dyn 10/21/2016 107.5K IP addresses
Table 1: Data Sources—We utilized a multitude of data perspectives to empirically analyze the Mirai botnet.
3.2 Active Scanning identify the device manufacturer and model. We first ap- plied the set of regular expressions used by Nmap service While Mirai is widely considered an IoT botnet, there probes to fingerprint devices [58]. Nmap successfully has been little comprehensive analysis of infected devices handled 98% of SSH banners and 81% of FTP banners, over the botnet’s entire lifetime. In order to determine the but matches only 7.8% of the Telnet banners. In order to manufacturer and model of devices infected with Mirai, increase our coverage and also accommodate HTTPS and we leveraged Censys [22], which actively scans the IPv4 CWMP (which Nmap lacks probes for), we constructed space and aggregates application layer data about hosts on our own regular expressions to map banners to device the Internet. We focused our analysis on scans of HTTPS, manufacturers and models. Unfortunately, we found that FTP, SSH, Telnet, and CWMP between July 19, 2016 and in many cases, there was not enough data to identify a February 28, 2017. model and manufacturer from FTP, Telnet, CWMP, and A number of challenges make accurate device labeling SSH banners and that Nmap fingerprints only provide difficult. First, Mirai immediately disables common out- generic descriptions. In total, we identified device type ward facing services (e.g., HTTP) upon infection, which and/or model and manufacturer for 31.5% of banners prevents infected devices from being scanned. Second, (Table2). We caution that this methodology is suscepti- Censys scans often take more than 24 hours to complete, ble to misattribution in instances where port-forwarding during which devices may churn to new IP addresses. Fi- and Universal Plug and Play (UPnP) are used to present nally, Censys executes scans for different protocols on multiple devices behind a single IP address, making the different days, making it difficult to increase label speci- distinction between middlebox and end-device difficult. ficity by combining banners from multiple services. We navigated these constraints by restricting our analysis to banners that were collected within twenty minutes of 3.3 Telnet Honeypots scanning activity (the time period after which we expire To track the evolution of Mirai’s capabilities, we collected a scan). This small window mitigates the risk of erro- binaries installed on a set of Telnet honeypots that mas- neously associating the banner data of uninfected devices queraded as vulnerable IoT devices. Mechanically, we with Mirai infections due to DHCP churn. presented a BusyBox shell [92] and IoT-consistent device Post-filtering, our dataset included 1.8 million banners banner. Our honeypots logged all incoming Telnet traf- associated with 1.2 million Mirai-infected IP addresses fic and downloaded any binaries that attackers attempted (Table2). We had the most samples for CWMP, and to install on the host via wget or tftp (the methods of the least for SSH. We caution that devices with open infection found in Mirai’s original source). In order to services that are not closed by Mirai (e.g., HTTPS and avoid collateral damage, we blocked all other outgoing FTP) can appear repeatedly in Censys banner scans during requests (e.g., scanning and DoS traffic). our measurement window (due to churn) and thus lead to We logged 80K connection attempts from 54K IP ad- over counting when compared across protocols. As such, dresses between November 2, 2016 and February 28, we intentionally explored protocols in isolation from one 2017, collecting a total 151 unique binaries. We filtered another and limited ourselves to measurements that only out executables unrelated to Mirai based on a YARA sig- consider relative proportions rather than absolute counts nature that matched any of the strings from the original of infected hosts. source code release, leaving us with 141 Mirai binaries. Finally, we processed each infected device’s banner to We supplemented this data with 293 binaries observed by honeypots operated by Akamai, which served a similar 3.5 Attack Commands purpose to ours, but were hosted on a different public cloud provider. As a final source of samples, we included To track the DDoS attack commands issued by Mirai 594 unique binaries from VirusTotal [90] that we scanned operators, Akamai ran a “milker” from September 27, for using the YARA rules mentioned above. In total, we 2016–February 28, 2017 that connected to the C2 servers collected 1,028 unique Mirai samples. found in the binaries uploaded to their honeypots. The We analyzed the binaries for the three most common ar- service simulated a Mirai-infected device and communi- chitectures—MIPS 32-bit, ARM 32-bit, and x86 32-bit— cated with the C2 server using a custom bot-to-C2 proto- which account for 74% of our samples. We extracted col, which was reverse engineered from malware samples the set of logins and passwords, IP blacklists, and C2 do- prior to source code release. In total, Akamai observed mains from these binaries, identifying 67 C2 domains and 64K attack commands issued by 484 unique C2 servers 48 distinct username/password dictionaries (containing a (by IP address). We note that a naive analysis of attack total 371 unique passwords). commands overestimates the volume of attacks and tar- gets: individual C2 servers often repeat the same attack command in rapid succession, and multiple distinct C2 servers frequently issued the same command. To account for this, we heuristically grouped attack commands along 3.4 Passive & Active DNS two dimensions: by shared C2 infrastructure and by tem- poral similarity. We collapsed matching commands (i.e., Following the public release of Mirai’s source code, com- tuples of attack type, duration, targets, and command op- peting Mirai botnet variants came into operation. We tions) that occur within 90 seconds of each other, which disambiguated ownership and estimate the relative size yielded 15,194 attacks from 146 unique IP clusters. Our of each Mirai strain by exploring passive and active DNS attack command coverage includes the Dyn attack [36] data for the 67 C2 domains that we found by reverse engi- and Liberia attacks [45]. We did not observe attack com- neering Mirai binaries. We also leveraged our DNS data mands for Krebs on Security and OVH, which occurred to map the IP addresses present in attack commands to prior to the milker’s operation. victim domain names. From a large U.S. ISP, we obtained passive DNS data consisting of DNS queries generated by the ISP’s clients 3.6 DDoS Attack Traces and their corresponding responses. More specifically, we collected approximately 209 million resource records Our final data source consists of network traces and aggre- (RRs)—queried domain name, and associated RDATA— gate statistics from Akamai and Google’s Project Shield and their lookup volumes aggregated on a daily basis. (the providers for Krebs on Security) and Dyn. These at- For our active DNS dataset, we obtained 290 million tacks cover two distinct periods in Mirai’s evolution. We RRs per day from Thales, an active DNS monitoring used this data to corroborate the IP addresses observed system [44]. Both datasets cover the period of August 1, in attacks versus those found scanning our passive net- 2016 to February 28, 2017. work telescope, as well as to understand the volume of Using both passive and active DNS datasets, we per- traffic generated by Mirai1. From Akamai, we obtained formed DNS expansion to identify shared DNS infrastruc- an aggregate history of all DDoS attacks targeting Krebs ture by linking related historic domain names (RHDN) on Security from 2012–2016, as well as a small sam- and related historic IPs (RHIPs) [5]. This procedure be- ple of 12.3K IPs related to a Mirai attack on September gan with the seed set of C2 domains and IPs extracted 21, 2016. For Project Shield, we shared a list of IP ad- during reverse engineering of our honeypotted binaries. dresses observed by our network telescope and in turn For a given seed foo.com, we identified the IP addresses received aggregate statistics on what fraction matched that foo.com previously resolved to and added them to any of 158.8K IP addresses involved in a 1-minute Mirai a growing set of domains and IPs. We additionally per- HTTP-flood attack on September 25, 2016. Finally, Dyn formed the reverse analysis, starting from an IP and find- provided us with a set of 107.5K IP addresses associated ing any domain names that concurrently resolved it. Thus, with a Mirai attack on October 21, 2016. even from a single domain name, we iteratively expanded the set of related domain names and IP addresses to con- 1 struct a graph reflecting the shared infrastructure used We overlapped attack traces with every Mirai scanning IPs on our network telescope. The overlap may have been inflated by non-Mirai by Mirai variants. In total, we identified 33 unique DNS attack IPs being assigned to Mirai devices over time through DHCP clusters that we explore in detail in Section5. churn.