Analyzing Variation Among Iot Botnets Using Medium Interaction Honeypots

Analyzing Variation Among IoT Botnets Using Medium Interaction Honeypots Bryson Lingenfelter Iman Vakilinia Shamik Sengupta Dept. of Computer Science and Eng. School of Computing Dept. of Computer Science and Eng. University of Nevada, Reno University of North Florida University of Nevada, Reno Reno, USA Jacksonville, USA Reno, USA [email protected] [email protected] [email protected] Abstract—Through analysis of sessions in which files were pattern comparison, root cause identification, risk assessment, created and downloaded on three Cowrie SSH/Telnet honeypots, attack frequency analysis, and attack origins analysis [2]. we find that IoT botnets are by far the most common source of Medium interaction honeypots such as Cowrie [3] provide a malware on connected systems with weak credentials. We detail our honeypot configuration and describe a simple method for fake shell and virtual file system to the attacker, such that listing near-identical malicious login sessions using edit distance. the attacker can enter commands and see believable output A large number of IoT botnets attack our honeypots, but without having access to a real system [4]. This is an ideal the malicious sessions which download botnet software to the environment for tracking the state of IoT botnets, which are honeypot are almost all nearly identical to one of two common easy to capture samples from because they are automated and attack patterns. It is apparent that the Mirai worm is still the dominant botnet software, but has been expanded and modified not able to perform honeypot detection as thoroughly as a by other hackers. We also find that the same loader devices deploy real attacker. We collect data using the open source Cowrie several different botnet malware strains to the honeypot over the honeypot, which is a continuation of the Kippo honeypot. course of a 40 day period, suggesting multiple botnet deployments Kippo/Cowrie honeypots have previously been used to cluster from the same source. We conclude that Mirai continues to be login sessions based on session time and attacker skill [5], and adapted but can be effectively tracked using medium interaction honeypots such as Cowrie. visualize attacker location and common commands [6]. Index Terms In this paper we describe a honeypot configuration based on the Cowrie SSH/Telnet honeypot for capturing remote login sessions and downloaded files, expanding from our previous I. INTRODUCTION work analyzing password attempts [7]. For our analysis, we Shortly after the Mirai malware made headlines in 2016 due look at sessions which created and downloaded files. Login to its usage in a 620 Gbps attack against krebsonsecurity.com session analysis has previously been done using high inter- [1], the botnet code was publicly released as open source action honeypots and classifying each command as part of software. The original version of Mirai targeted vulnerable a specific state [8]. We instead describe a simple method Internet of Things (IoT) devices using a short wordlist based for clustering these sessions using edit distance to enumerate on default username and password combinations, amassing an identical attack patterns. In light of these sessions almost army of hundreds of thousands of connected devices using universally being associated with the Mirai malware, we just these credentials. The release of Mirai’s source code provide discussion of Mirai’s functionality. Previous work pro- has resulted in numerous clones and modifications by other vides complete description of the original malware’s behavior, malicious actors which have expanded upon the attacks present architecture, and attack methods [9], [10]; we instead focus on in the original version of Mirai. Despite receiving much how much Mirai has been modified. There is some existing attention after the 2016 attacks, IoT security is still a major work in this area. Y. Liu and H. Wang [11] use branch name, issue and new botnets appear regularly. As a result, there is a configuration, IP addresses, attack methods, and credential need to keep track of developments in IoT botnets so current dictionaries to distinguish between binary files from different attacks can be appropriately dealt with. Mirai variants. Other work describing Mirai has identified Honeypots are a popular technique for capturing malicious versions which target different ports and devices [10]. activity. They provide a sandbox environment for malicious Using our clustering results, we note how many different actors to attack, recording each action for later analysis. Hon- Mirai variants attack our honeypots and how much they differ eypots have been used for a wide range of tasks, such as attack from one another and the publicly available Mirai source code in terms of commands entered, choice of filler words, and This research is supported by the National Science Foundation (NSF), USA, downloaded files. We aim to provide an idea of the amount of Award #1739032. variation present in Mirai attacks both in terms of number of distinct variants and variation amount. We also provide 978-1-7281-3783-4/20/$31.00 ©2020 IEEE analysis of variation in files from apparently identical malware 0761 Authorized licensed use limited to: UNIVERSITY OF NEVADA RENO. Downloaded on December 18,2020 at 22:04:09 UTC from IEEE Xplore. Restrictions apply. loaders as well as identical IP addresses and find that many IP the Combined Log Format data generated by Apache into the addresses serve as loader servers for multiple strains of Mirai- format used by the Elasticsearch search and analytics engine. based botnets. Finally, we provide information on other attacks We make an ssh server available on the Postfix machine by observed by the honeypot to give some perspective regarding forwarding port 22222 on the pfsense router to port 22 on the prevalence of Mirai. the mail server. We also make the ELK machine accessible through ssh over port 22 from the internal network, allowing II. METHODOLOGY access to Elasticsearch and Kibana via an ssh tunnel through A. Honeypot Configuration the mail server. This allows us to remotely access ELK without needing to expose the server publicly. Malware data was collected from 3 honeypots running the Cowrie SSH/Telnet honeypot [3], the continuation of the popular Kippo honeypot. The Cowrie honeypot is a medium- B. Dataset interaction honeypot which provides a dummy shell imple- Every event Cowrie adds to its log has a structure defined mented in python to the attacker with a virtual filesystem by the event type, such as cowrie.login.success and and prefab command outputs. Cowrie can be used to emulate cowrie.command.input. Each event, regardless of type, an IoT system by setting the prefab outputs to be consistent has a session id field that can be used to identify events with actual IoT devices, and is therefore a good environment generated by the same session. To collect the data for this for observing IoT botnets without the overhead of a high- paper, we enumerated every unique session id associated with interaction honeypot. Cowrie has also been modified multiple a cowrie.session.file_download event and used times during its development specifically in response to issues Elasticsearch to collect all other events for these ids. The where sessions from the Mirai malware didn’t result in a file file download event is used for both downloads from remote download, making it particularly good for capturing samples servers and files created during the login session, so we of Mirai. consider both for our analysis. The data was collected over We made minimal alterations to Cowrie’s default config- a period of 40 days, from 4 February 2019 to 15 March 2019, uration as of February 2019. Cowrie’s userdb was modified and consists of 84,602 unique login sessions. These sessions slightly to allow login for the three default cowrie users account for 21.2% of the 398,233 session dataset; that is, (root, tomcat, and oracle) given any password. Additionally, roughly 20% of the time a connection was initiated with a the honeypot was configured to accept both ssh and telnet honeypot it resulted in a successful login followed by a file connections on ports 22/2222 and 23/2223, respectively. This creation or download. These login sessions resulted in a total was done to allow as much traffic as possible. Each honeypot of 312 unique files collected by the honeypot. had an Apache webserver running on port 80 associated with a registered domain name and static web page. Finally, each C. Attack Pattern Comparison honeypot had a Postfix mailserver running on port 25 with To identify identical attack patterns, each session was en- several valid email accounts set up. Each portion of the coded as a list of the first argument for each command entered honeypot (login, web, and mail) ran on a separate virtual by the attacker. For example, if an attacker logged in, typed machine and received traffic through port forwarding. For the cat /proc/mounts followed by echo test, then ended analysis in this paper, only the Cowrie component is necessary. the session, the encoded list would be ["cat", "echo"]. A command containing ; (which most shells treat as end of line), but not commands with && or ||, was split into multiple commands. If the first argument was a shell such as /bin/busybox or /bin/bash we instead use the second argument, which is the first argument to the shell being called. To identify identical attack patterns, Levenshtein distance between lists of arguments was used as a similarity metric. Levenshtein distance computes the number of edits (substitution, insertion, and deletion) required to match two sequences of non-equal length, providing a rough estimation of similarity between login sessions. Edit distance has previously been used for applications such as root cause analysis [12]. The metric was normalized by dividing by its upper bound, Fig.

Load more