Identifying and Characterizing Bashlite and Mirai C&C Servers
Gabriel Bastos∗, Artur Marzano∗, Osvaldo Fonseca∗, Elverton Fazzion∗†, Cristine Hoepers‡, Klaus Steding-Jessen‡, Marcelo H. P. C. Chaves‡, Ítalo Cunha∗, Dorgival Guedes∗, Wagner Meira Jr.∗

∗Department of Computer Science – Universidade Federal de Minas Gerais (UFMG)
†Department of Computing – Universidade Federal de São João del-Rei (UFSJ)
‡CERT.br - Brazilian National Computer Emergency Response Team, NIC.br - Brazilian Network Information Center

Abstract—IoT devices are often a vector for assembling massive botnets, as a consequence of being broadly available, having limited security protections, and significant challenges in deploying software upgrades. Such botnets are usually controlled by centralized Command-and-Control (C&C) servers, which need to be identified and taken down to mitigate threats. In this paper we propose a framework to infer C&C server IP addresses using four heuristics. Our heuristics employ static and dynamic analysis to automatically extract information from malware binaries. We use active measurements to validate inferences, and demonstrate the efficacy of our framework by identifying and characterizing C&C servers for 62% of 1050 malware binaries collected using 47 honeypots.

I. INTRODUCTION

The Internet of Things (IoT) is the network of physical devices connected through the Internet, such as security cameras and vehicular systems. The use of IoT devices in different applications shows opportunities for economic and technological development in different sectors of society. However, the minimalist design of most of those devices, constrained due to market competition among vendors, compromises security and leads to vulnerabilities. This problem is aggravated by the nature of embedded software and the challenges in applying updates. Malicious agents exploit vulnerabilities in IoT devices to infect them to create botnets [1], [2]. Although the computational power of each infected device (bot) is small, an IoT botnet may coordinate thousands of bots to successfully perform malicious activities, such as massive distributed denial of service (DDoS) attacks.

DDoS attacks have increased in frequency and intensity, with services being attacked daily and some attacks generating traffic on the order of 1 Tbps [3]. The total losses for the attacked companies are in the order of billions of dollars, since those attacks exhaust resources such as processing and bandwidth, including well-provisioned services, causing unavailability [4]. The large number and topological distribution of infected devices allow IoT botnets to perform massive, difficult to mitigate, attacks.

Two families of IoT botnets, Bashlite and Mirai, have recently gained notoriety after being used to perform DDoS attacks of 400 Gbps and 1 Tbps, respectively [1], [3], with considerable impact on large services (e.g., DynDNS). The source code for both malwares is available on the Internet, which contributes to the rise of a large number of variants. Variants exploit other vulnerabilities, include new forms of attack, and use different mechanisms to circumvent existing forms of defense. The increasing number of variants of these malwares makes the analysis and reverse engineering process expensive, hindering the development of countermeasures and mitigations. To address the growing number of threats, security analysts and researchers need tools to automate the collection, analysis, and reverse engineering of malware.

The C&C server is a core component of botnets, responsible for coordinating its bots [5]. Considering that taking down a C&C makes the botnet innocuous, security analysts dedicate efforts to identify C&C servers, while malware developers implement mechanisms to complicate such identification. As an example of such mechanisms, malware developers add code to obfuscate the identity of C&Cs or avoid contacting them when executing on top of sandbox environments.

In this work we extend existing tools to perform automated analysis of Bashlite and Mirai IoT malware families to identify C&C servers. We improve Detux, a sandbox for malware evaluation with isolation mechanisms to prevent malware executions from interfering with each other or the Internet. We propose four different heuristics that employ static and dynamic analysis to infer C&C server candidates. We develop active clients that connect to the inferred C&C candidates and exchange messages using the Bashlite and Mirai protocols to validate inferences.

We also propose a technique to identify variants of each malware family. The challenge is in identifying similar binaries even though they contain no metadata. Our technique uses Radare2, a framework with tools for reverse engineering binary executables, to identify and extract functions from binaries. We then use a fuzzy hash to compare similarity among functions, present a metric to calculate the distance between binaries, and execute a hierarchical clustering algorithm to group similar malware variants.

Our study analyzes IoT malwares collected by 47 low-interactivity honeypots distributed across 15 Brazilian states. We present static analysis results for 25,183 binaries collected between Jan. 2017 and Dec. 2018, showing the evolution of anti-analysis mechanisms used by Bashlite and Mirai variants. We present dynamic analysis results for 1,050 IoT malwares, collected between 2018-11-27 and 2018-12-26, and between 2019-01-30 and 2019-02-28, showing that we are able to infer C&C candidates for 96% and validate inferences for 62% of the binaries. We show that our heuristics detect 29% more C&C servers when compared to a baseline approach based on static analysis alone. Finally, the proposed binary clustering technique identifies different Bashlite and Mirai variants, reducing by 47.8% the number of binaries that security analysts need to analyze in an effort to understand malware functionality.

The framework proposed in this article automates the identification of C&C servers, improving the mitigation of botnets and reducing their impact. Besides, the clustering technique reduces the number of binaries that security analysts need to analyze. We believe these contributions are an additional step in mitigating those threats.

II. MONITORING INFRASTRUCTURE AND DATASET

The malware dataset used in this work was gathered by a passive data collection infrastructure, depicted in Fig. 1, which monitors infection attempts. This infrastructure is composed by 47 low interactivity honeypots, emulating SSH and telnet services with known default credentials of IoT devices, commonly abused by Bashlite and Mirai malware.

Figure 1: Malware collection and analysis infrastructure.

The honeypots collect the authentication credentials used to log in, and the subsequent sequence of commands issued during the infection attempt. The commands are not executed by the honeypots; all replies from the honeypots are computed by interpreting the commands. This is possible because the infection process is automated, using a limited, known, set of programs commonly installed in IoT devices. The commands issued during

III. C&C SERVER IDENTIFICATION

To identify C&C servers, we start by performing a specialized static code analysis, which allows us to detect and avoid countermeasures employed by malwares to deceive dynamic analysis (§III-A). Then, we execute the malware in the Detux sandbox, which monitors its behavior in a controlled and realistic IoT environment (§III-B). Next, we use four heuristics based on the dynamic analysis of the network traffic generated by each binary to infer IP addresses that probably host C&C servers (§III-C). Finally, we execute tools to validate the inferred IP addresses and identify real C&C servers (§III-D).

A. Static analysis of malwares

The first part of the static analysis consists of ELF data extraction performed by Detux. Static analysis extracts data such as file and program headers, sections, debug symbols, and strings from a binary. We use the symbol table to identify binaries compiled with and without debugging information (stripped and non-stripped). In addition, we store strings that correspond to IP addresses contained in each binary. This information is used in the following steps of our analysis.

The second part of the static analysis aims to overcome mechanisms used by the attackers to deceive the dynamic analysis. We observed the absence of DNS requests in several executions of Mirai malwares, an atypical behavior for this malware family, which normally identifies its C&Cs by domain names. Through manual inspection of the source code, we identified the presence of a malware activation mechanism which verifies whether the name of the executable (argv[0]) matches a predefined value in the code (activation key). Binaries containing this activation mechanism behave as expected (by performing network scanning and by contacting the C&C server) only when the verification is successful. Otherwise, it runs an alternative routine that contacts a fake C&C server, preventing identification of the real C&C and potentially directing mitigation efforts to the wrong target.

To make activation key decryption hard, the bytes that compose it are encoded. Let $B_1, B_2, \ldots, B_N$ be the bytes representing the activation key, in which $B_N$ is the null byte (0x00) that ends the string. The order of these bytes is exchanged for each pair of consecutive bytes and a byte with zero value is inserted between every pair. Thus, the activation key would be encoded in the binary as