Systematically Fingerprinting Low- and Medium-Interaction Honeypots at Internet Scale
Total Page:16
File Type:pdf, Size:1020Kb
Bitter Harvest: Systematically Fingerprinting Low- and Medium-interaction Honeypots at Internet Scale Alexander Vetterl Richard Clayton University of Cambridge University of Cambridge [email protected] [email protected] Abstract alistic environment for humans to interact with. Attack- ers have a strong motivation to detect honeypots at an The current generation of low- and medium interac- early stage as they do not want to disclose their methods, tion honeypots uses off-the-shelf libraries to provide the exploits and tools [21]. These attackers have attempted transport layer. We show that this architecture is fa- to distinguish honeypots by executing commands within tally flawed because the protocols are implemented sub- the login shell (or the impersonation of the login shell) tly differently from the systems being impersonated. We and examining the responses. This has led to an arms present a generic technique for systematically finger- race as attackers develop new distinguishers and honey- printing low- and medium interaction honeypots at In- pot authors improve the verisimilitude of their system. ternet scale with just one packet and an ERR (Equal Er- However, if a honeypot can be detected at the transport ror Rate) of 0.0183. We conduct Internet-wide scans and level, for example without completing the SSH hand- identify 7 605 honeypot instances across nine different shake or Telnet options negotiation, the honeypot’s value honeypot implementations for the most important net- will be minimal and efforts to impersonate the service work protocols SSH, Telnet, and HTTP. For SSH hon- will be in vain [25]. This aspect of the detection arms eypots we also determined their patch level and find that race is especially challenging because modern protocols they are poorly maintained – 27% of the honeypots have such as SSH must handle a variety of versions, key ex- not been updated within the last 31 months and only change mechanisms, ciphers and service requests. Sim- 39% incorporate improvements from 7 months ago. We ilarly, in Telnet the client and server can negotiate nu- believe our findings to be a ‘class break’ in that trivial merous settings such as line mode, echo and terminal patches cannot address the issue. type. As the RFCs do not mandate every aspect of a net- work protocol, two implementations of a complex proto- 1 Introduction col may deal with ambiguities differently, and this may reveal the presence of a honeypot. Early detection of new attack vectors and abusive be- Finding deviations between applications and their fin- haviour is a cornerstone of contemporary approaches to gerprints has a long history and a number of theoretical improving Internet security. Honeypots, resources that and prototyped approaches, mainly based on binary anal- appear to be legitimate systems, have long proven effec- ysis, have been proposed [6, 7, 37]. Specific to finger- tive in capturing malware, helping to counter spam and printing honeypots, Holz et al. [21] showed that the exe- providing early warning signals about upcoming threats. cution time of attackers’ commands may be significantly Secure Shell (SSH), Telnet, and HTTP were a focus of longer due to the additional overhead of logging and early honeypot research. Since SSH is the de facto stan- sandboxing the execution of the honeypot itself. Simi- dard to login to servers over an unsecured network, SSH larly, Fu et al. [17] found that attackers could use the la- honeypots continue to be extremely valuable. With the tency of the networking layer to fingerprint honeypots. In rise of the Mirai botnet, which recruits a variety of IoT 2015 Cymmetria Research summarized the known ways devices using Telnet, it became evident that Telnet hon- to fingerprint a variety of honeypots and provided a list eypots remain a key source of information [3]. of recommendations for honeypot developers [5]. For the last decade, honeypot research has received All of these are individual findings, focusing on iden- limited attention, with efforts mainly focused on the tifying single honeypot implementations. However, we monitoring of human activity and the provision of a re- are the first to observe that there is a generic weakness in the current generation of low- and medium-interaction Table 1: Honeypots in this study honeypots because of their reliance on off-the-shelf li- Updated Language Library braries to implement large parts of the transport layer. These libraries are used for their convenience, but they SSH Kippo May 15 Python TwistedConch were never intended to provide identical behaviour to Cowrie May 18 Python TwistedConch ‘real’ servers. We systematically identify these differ- Telnet ences, which can number in the thousands, and show that TPwd Feb 16 C custom this allows us to locate a large variety of honeypots by MTPot Mar 17 Python telnetsrv TIoT May 17 Python custom Internet scale scanning. Cowrie May 18 Python TwistedConch We believe this to be a class break in that patches to HTTP/Web the current generation of honeypots cannot address the Dionaea Sep 16 Python custom issue. Until these honeypots are given a new architec- Glastopf Oct 16 Python BaseHTTPServer Conpot Mar 18 Python BaseHTTPServer ture, anyone with moderate capabilities has a plethora of extremely simple and quick methods of identifying that a honeypot is running on a particular IPv4 address and can large-scale attacks. A 2012 report for the European Net- thereby treat it differently than otherwise. work and Information Security Agency (ENISA) evalu- Overall, we make three main contributions: ated the vast majority of available honeypots, including We present a generic and accurate technique for • high-interaction honeypots. The authors recommend us- systematically fingerprinting low- and medium in- ing the medium-interaction honeypots Kippo for SSH, teraction honeypots by constructing distinguishing Glastopf for HTTP and Dionaea for the remaining proto- probes at the transport layer. We identified thou- cols [20]. Honeypots that support Telnet ‘out of the box’ sands of deviations between honeypots and the ser- were not widely available at the time of that study. vices they are impersonating. Table 1 provides an overview of the honeypots that we We use this technique to perform Internet-wide • considered in our study, which we now discuss in detail. scans for 9 different honeypots for the most im- SSH Honeypots: In this paper we consider SSH portant network protocols SSH, Telnet and HTTP. honeypots emulating generic servers running OpenSSH We find 7 605 honeypot instances residing on 6 125 whose login credentials can be guessed and commands IPv4 addresses: 2 779 honeypot instances for SSH, issued within an operating system shell such as bash. 1 166 for Telnet and 3 660 for HTTP. OpenSSH is the most widely used SSH protocol suite, We provide insights into how honeypots are con- • and is installed on approximately 77% of all SSH servers figured and deployed in practice. We discover that listening on port 22 [13]. only 39% of the honeypots were updated within Many SSH honeypots have been developed over the the previous 7 months. Furthermore, we find that years and one of the first SSH honeypots was Ko- 546 (72%) of the 758 Kippo honeypots are still (44 joney [24] but active development ceased around 2006. months later) vulnerable to a known fingerprinting Kojoney uses the TwistedConch library which dates back technique that was first disclosed in 2014. to 2002 and is the de facto standard implementation of SSHv2 for Python2/3. Kojoney inspired Kippo [23] 2 Background which was developed from 2009 to 2015 but the Kippo author now recommends people use a forked project Honeypots are classified by the type of system that they called Cowrie [31]. Cowrie has added more extensive emulate, such as web or email servers, or general purpose logging and support for Telnet, and it remains under ac- remote access ‘shells’. Honeypots are further classified tive development. The project’s philosophy is to only as low interaction (in this context merely collecting cre- implement shell commands that are being used by at- dential guesses), medium interaction (providing a higher tackers and so as of 2018-01, Cowrie (partly) emulates level of interaction) or high interaction (allowing attack- 34 commands [8]. In 2015, Deutsche Telekom included ers full control of a machine). Kippo in T-Pot, a multi-honeypot platform “based on High-interaction honeypots have significant value, but well-established honeypots” [9]. T-Pot combines differ- many people are unable to accept the risk that such hon- ent honeypots for network services with an intrusion de- eypots may be used for DDoS attacks, malware distri- tection system and a monitoring and reporting engine. bution or the sending of email spam. However, low- As of March 2016, Kippo was replaced by Cowrie “since and medium-interaction honeypots have proven effec- it offers huge improvements over Kippo” [10]. tive as they are easy to deploy and to maintain, while Telnet Honeypots: Since mid-2016, and the rise of at the same time potential harm is minimised. They the Mirai botnet, there has been an increasing interest in are especially useful in collecting quantitative data about Telnet honeypots. In this paper we consider four recent systems: MTPot, Telnet-IoT-Honeypot (TIoT), Telnet- the lowest, i.e. the responses RPi for Ij are overall the Password-Honeypot (TPwd) and Cowrie. least similar. In other words, we try to find the probe MTPot, developed by Cymmetria Research, a com- that results in the most distinctive response across all the pany focusing on cyber deception, is designed to catch protocol implementations. Mirai binaries. It is written in Python 2.x and uses the Cosine similarity is an established technique for com- telnetsrv library for its telnet protocol implementation. paring sets of information, commonly used to measure TIoT also aims to catch IoT malware and is also written text semantic similarity.