See No Evil: Evasions in Honeymonkey Systems
Brendan Dolan-Gavitt & Yacin Nadji
May 7, 2010

Abstract

Client-side attacks have emerged in recent years to become the most popular means of propagating malware. In order to keep up with this new wave of web-based malware, companies such as Google routinely crawl the web, feeding suspicious pages into virtual machines that emulate client systems (known as honeymonkeys or honeyclients). In this paper, we will demonstrate that although this approach has been successful to date, it is vulnerable to evasion by attackers: by guarding exploit code with client-side checks to determine whether a visitor is a human or an automated system, attackers can ensure that only human users are exploited, causing the page to look benign when viewed by a honeyclient. Our 13 evasions, which include both observational and interactive proofs of humanity, are able to easily distinguish between honeyclients and human victims. We evaluate the strengths and weaknesses of these evasions, and speculate on even more powerful attacks and defenses.

1 Introduction

Attacks against end users have undergone a dramatic shift in the past several years, moving from attacks against listening network services to attacks on client software such as browsers and document viewers. These changes have created a need for new defensive tactics. Many new attacks can no longer be found by using honeypots that host vulnerable network services and monitor for signs of compromise. Instead, researchers have proposed using honeyclients, which actively seek out malicious content on the web, to find previously unknown client-side attacks.

We note in passing that, like detection of malicious software, detection of malicious web pages is theoretically undecidable. We provide a proof of this fact, in the spirit of Fred Cohen [1], in Appendix A. However, such theoretical impossibility need not deter us; just as a large industry has grown up around virus detection, honeyclients are in widespread use today, and it is both useful and necessary to consider practical evasions against them.

Just as honeypots can be classified as high- or low-interaction, so too can honeyclients be categorized based on the extent to which they emulate a full end user system. HoneyMonkey [14] and the MITRE Honeyclient [8], which are both high-interaction honeyclients, use a full virtual machine with a web browser to visit potentially malicious sites; if the visit results in a persistent change to the state of the VM outside the browser, one can conclude that a successful attack has occurred. By contrast, low-interaction honeyclients such as HoneyC [13] merely emulate a full browser, and use heuristics to detect attacks rather than monitoring for changes in the state of the OS.

Honeyclients have recently gained wide use in the Google Safe Browsing API [3]. As Google crawls the web, it uses heuristics to detect potentially malicious web sites. These sites are then fed to a custom honeyclient to verify that they are, in fact, malicious [11]. If a web page is found to host attack code, users who attempt to visit the page from Google's search results are presented with the warning shown in Figure 1. The user must ignore this warning to view the potentially malicious page.

Figure 1: Google Malicious Site Warning

Due to the prevalence of client-side attacks and the increasing use of honeyclients to detect them, a likely next step for attackers is to consider honeyclient evasions: techniques that cause a web page to exhibit benign behavior when visited by a honeyclient while still exploiting real users. In this project, we will present a number of such evasions, assess their utility to attackers, and consider possible defenses honeyclients can use to counter our attacks.

2 Related Work

Honeypots have long been used to detect and analyze novel attacks. Efforts such as the Honeynet Project [12] and numerous academic papers [10, 6] have described the use of both real and virtual machines to pick up on network-based attacks on vulnerable services. As the use of honeypots became more common, evasions were developed [5, 7] that used techniques such as virtual machine detection to avoid executing in a honeypot environment. As attacks shifted to the client side, honeyclients were proposed as a means of detecting attacks. Systems such as Strider HoneyMonkey [14] use virtual machines running a vulnerable version of Internet Explorer to detect malicious sites.

Despite the widespread use of honeyclients, honeyclient evasion has received little attention in academic literature. Although Wang et al. discuss evasions, the attacks they consider are either easy to counter, impose a significant cost on the attacker, or run the risk of alerting the user. By contrast, many of the evasions we describe are transparent to the user and do not harm the attacker's chances of success.

3 Methodology

Honeyclients and honeypots are two similar solutions to fundamentally different problems. A honeypot attempts to capture push-based infection: malware that is forced onto unsuspecting servers. Once a server becomes infected, it furthers the campaign by pushing its infection on to other hosts. A honeyclient, however, attempts to capture pull-based infection, which requires some human involvement.

By detecting whether a client is a human or a honeyclient we can mount attacks on humans only, nullifying honeyclients as a tool for analysis. To demonstrate these attacks, we have designed and implemented a reverse Turing test that looks for a variety of features that would most likely be present in a true human client. In the interest of a conservative analysis, we have limited ourselves to legal constructs in common languages seen on the web, such as HTML, JavaScript, Flash and Java. Our approach precludes, for example, the use of a security flaw in Flash to implement our reverse Turing test. This conservative approach also means that our evasions cannot be countered by patching known client-side flaws. If our features determine a client is a human, the exploit is launched. Otherwise, the site executes only benign code.

We have ranked our features by considering the following categories:

1. How difficult is the feature to counter?
2. How noticeable is the feature to a human client?
3. How costly is the feature's implementation for an attacker?
   - Does the attack incur false positives by rejecting real humans?
   - Does it increase the time required to execute the attack?

This ranking can be used by an attacker to determine the features to be used, and the order in which they should be executed to detect honeyclients. A subset of the features can be used depending on the attacker's needs. For example, if it is more important to prevent missed attacks on humans than to defeat honeyclient analysis, a feature that introduces some false positives may be left out.

For each feature, we have created pages that demonstrate the evasion. When loaded by a human, client-side code on the page tests for some feature that indicates the presence of a human, and then performs an XMLHttpRequest back to the server with the test result. In a real attack, the server could then examine the result and, if the test indicates that the client is human, send the exploit code to the client. Thus, an attacker can avoid leaking valuable exploit code to a honeyclient, while remaining effective against human users.

We can evaluate our approach by comparing page execution behavior between the MITRE Honeyclient and a human client.

4 Attacks

Our attacks fall into two different categories: state-based and behavior-based. State-based attacks query the client's state to infer information about the user. For example, if the browser has recently visited sites like Facebook and Twitter, it is more likely that the client is an actual human, rather than a browser instance spawned from a VM snapshot. Behavior-based attacks interact with the user directly. CAPTCHA is a well-known technique for differentiating humans and computers. Here, we use it in a similar fashion; if the CAPTCHA is properly solved, we know a human is in control of the browser and we can deliver our exploit safely. These two attack categories are shown in Figure 2. Both categories occasionally rely on a predefined timeout or threshold that must be crossed to determine whether an exploit should be launched. These values are in no way static, and will change as further testing is done to determine the "best" possible values. An overview of our attacks is presented in Table 1, with explanations of the intuition behind them presented in the subsequent sections.

[Figure 2 diagram: a behavior-based path (stimulus, user interaction) and a state-based path (data collection, such as browser history) both feed the decision "user is human?"; "yes" leads to launching the attack, "no" to doing nothing.]

Figure 2: The two categories of evasions we describe. Behavior-based evasions use user interaction to distinguish between humans and honeyclients, while state-based evasions use only data gathered from the browser and operating system.

It is important to note that all the attacks are written entirely in JavaScript or Flash. All major browsers support JavaScript and 99% of Internet-enabled computers have Flash installed¹, making these suitable languages to perform honeyclient evasion.

4.1 State-based Attacks

State-based attacks infer if a client is a honeyclient based on information harvested from the

user intervention. If a camera is connected, we launch the exploit.

Connection Speed. Large companies that would benefit from honeyclient testing (e.g., google.com) generally have high-throughput Internet connections to deliver fast and reliable services.
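
The three ranking criteria of Section 3 (difficulty to counter, noticeability to a human, and cost to the attacker) can be made concrete with a small scoring sketch. The following is only an illustration under stated assumptions: the 0-5 scores are invented placeholders rather than measured results, the unweighted sum is one arbitrary way to combine the criteria, and the names Feature, rank, and max_cost are hypothetical. It shows how an ordering could be derived and how a feature whose cost (for example, false positives) is too high could be left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One feature scored on the three ranking criteria (lower is better for ranking)."""
    name: str
    ease_of_countering: int  # criterion 1: higher means easier for a honeyclient to counter
    noticeability: int       # criterion 2: higher means more visible to a human client
    attacker_cost: int       # criterion 3: false positives plus added execution time

    def total(self) -> int:
        # Unweighted sum: an assumption, since no combination rule is specified.
        return self.ease_of_countering + self.noticeability + self.attacker_cost


def rank(features, max_cost=None):
    """Order features from lowest to highest total score.

    If max_cost is given, features whose attacker_cost exceeds it are dropped,
    mirroring the option of omitting features that reject too many real humans.
    """
    kept = [f for f in features if max_cost is None or f.attacker_cost <= max_cost]
    return sorted(kept, key=lambda f: (f.total(), f.name))


# Placeholder scores on a 0-5 scale, invented purely for illustration.
FEATURES = [
    Feature("captcha", ease_of_countering=1, noticeability=5, attacker_cost=3),
    Feature("browser_history", ease_of_countering=2, noticeability=0, attacker_cost=1),
    Feature("camera_connected", ease_of_countering=2, noticeability=0, attacker_cost=4),
]

if __name__ == "__main__":
    for feature in rank(FEATURES):
        print(feature.name, feature.total())
```

Calling rank(FEATURES, max_cost=3) drops the camera check in this toy data set, which corresponds to leaving out a feature that would reject too many real humans. A weighted sum or a lexicographic ordering over the criteria would be equally plausible choices.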