On the Feasibility of Online Malware Detection with Performance Counters
Total Page:16
File Type:pdf, Size:1020Kb
On the Feasibility of Online Malware Detection with Performance Counters John Demme Matthew Maycock Jared Schmitz Adrian Tang Adam Waksman Simha Sethumadhavan Salvatore Stolfo Department of Computer Science, Columbia University, NY, NY 10027 [email protected], [email protected], fjared,atang,waksman,simha,[email protected] ABSTRACT 1. INTRODUCTION The proliferation of computers in any domain is followed by Malware { short for malicious software { is everywhere. the proliferation of malware in that domain. Systems, in- In various forms for a variety of incentives, malware exists cluding the latest mobile platforms, are laden with viruses, on desktop PCs, server systems and even mobile devices like rootkits, spyware, adware and other classes of malware. De- smart phones and tablets. Some malware litter devices with spite the existence of anti-virus software, malware threats unwanted advertisements, creating ad revenue for the mal- persist and are growing as there exist a myriad of ways to ware creator. Others can dial and text so-called \premium" subvert anti-virus (AV) software. In fact, attackers today services resulting in extra phone bill charges. Some other exploit bugs in the AV software to break into systems. malware is even more insidious, hiding itself (via rootkits or In this paper, we examine the feasibility of building a mal- background processes) and collecting private data like GPS ware detector in hardware using existing performance coun- location or confidential documents. ters. We find that data from performance counters can be This scourge of malware persists despite the existence of used to identify malware and that our detection techniques many forms of protection software, antivirus (AV) software are robust to minor variations in malware programs. As a being the best example. Although AV software decreases result, after examining a small set of variations within a fam- the threat of malware, it has some failings. First, because ily of malware on Android ARM and Intel Linux platforms, the AV system is itself software, it is vulnerable to attack. we can detect many variations within that family. Further, Bugs or oversights in the AV software or underlying system our proposed hardware modifications allow the malware de- software (e.g., the operating system or hypervisor) can be tector to run securely beneath the system software, thus exploited to disable AV protection. Second, production AV setting the stage for AV implementations that are simpler software typically use static characteristics of malware such and less buggy than software AV. Combined, the robustness as suspicious strings of instructions in the binary to detect and security of hardware AV techniques have the potential threats. Unfortunately, it is quite easy for malware writers to advance state-of-the-art online malware detection. to produce many different code variants that are functionally equivalent, both manually and automatically, thus defeating Categories and Subject Descriptors static analysis easily. For instance, one malware family in C.0 [Computer Systems Organization]: General|Hard- our data set, AnserverBot, had 187 code variations. Alterna- ware/software interfaces; K.6.5 [Management of Com- tives to static AV scanning require extremely sophisticated puting and Information Systems]: Security and Pro- dynamic analysis, often at the cost of significant overhead. tection|Invasive software Given the shortcomings of static analysis via software im- plementations, we propose hardware modifications to sup- General Terms port secure efficient dynamic analysis of programs to detect Security in Hardware, Malware and its Mitigation malware. This approach potentially solves both problems. First, by executing AV protection in secure hardware (with Keywords minimum reliance on system software), we significantly re- Malware detection, machine learning, performance counters duce the possibility of malware subverting the protection mechanisms. Second, we posit that dynamic analysis makes 1 detection of new, undiscovered malware variants easier. The This work was supported by grants FA 99500910389 (AFOSR), FA 865011C7190 (DARPA), FA 87501020253 (DARPA), CCF/TC intuition is as follows: we assume that all malware within a 1054844 (NSF), Alfred P. Sloan fellowship, and gifts from Microsoft certain family of malware, regardless of the code variant, at- Research, WindRiver Corp, Xilinx and Synopsys Inc. Any opinions, tempts to do similar things. For instance, they may all pop findings, conclusions and recommendations do not reflect the views of the US Government or commercial entities. up ads, or they may all take GPS readings. As a result, we would expect them to work through a similar set of program Permission to make digital or hard copies of all or part of this work for phases, which tend to exhibit similar detectable properties personal or classroom use is granted without fee provided that copies are in the form of performance data (e.g., IPC, cache behavior). not made or distributed for profit or commercial advantage and that copies In this paper, we pose and answer the following central bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific feasibility question: Can dynamic performance data be used permission and/or a fee. to characterize and detect malware? We collect longitudi- ISCA ’13 Tel-Aviv, Israel nal, fine-grained microarchitectural traces of recent mobile Copyright 2013 ACM 978-1-4503-2079-5/13/06 ...$15.00. 559 AnserverBot Android malware and Linux rootkits on ARM and Intel plat- DroidKungFu forms respectively. We then apply standard machine learn- ing classification algorithms such as KNN or Decision Trees Coarse-Grained: Dynamic traces to detect variants of known malware. Our results indicate Suspicious Programs Signature that relatively simple classification algorithms can detect Honey Pot Current Generation malware at nearly 90% accuracy with 3% false positives for Fine-Grained: (μ)-arch execution some mobile malware. New Zitmo profiles Plankton Signatures We also describe hardware support necessary to enable Proposed online malware detection in hardware. We propose methods ANALYST Behavioral Analysis and techniques for (1) collection of fine-grained runtime data Internet without slowing down applications, (2) secure execution of USER AV algorithms to detect at runtime the execution of malware and (3) secure updating of the AV algorithms to prevent Internet (Filled With subversion of the protection scheme. Malware) Signatures, Coarse Fine Classifiers Another important contribution of the paper is to describe Grained: Grained: Call & File (μ)-arch Profile the experimental framework for research in the new area of Analysis Analysis hardware malware detection. Towards this we provide a dataset used in this research. This dataset can be down- Alerts (Current) Alerts (Proposed) loaded from: http://castl.cs.columbia.edu/colmalset. The rest of the paper is organized as follows. In Section 2 Figure 1: AV signature creation and deployment. we provide background on malware, then describe the key intuition behind our approach (Section 3), and explain our experimental method (Section 4), followed by evaluations of tion, users anonymously send cryptographic signatures of Android ARM malware, x86 Linux Rootkits and side chan- executables to the AV vendor. The AV vendor then deter- nels (Sections 5, 6, 7). We describe hardware support in mines how often an executable occurs in a large population Section 8 and our conclusions in Section 9. of its users to predict if an executable is malware: often, un- common executable signatures occurring in small numbers 2. BACKGROUND ON MALWARE are tagged as malware. This system is reported to be effec- tive against polymorphic and metamorphic viruses but does In this section, we provide an abbreviated and fairly in- not work against non-executable threats such as malicious formal introduction on malware. pdfs and doc files [8]. Further it requires users to reveal 2.1 What is Malware and Who Creates It? programs installed on their machine to the AV vendor and trust the AV vendor not to share this secret. Malware is software created by an attacker to compro- mise security of a system or privacy of a victim. A list of 2.3 How Good is Anti-Virus Software? different types of malware is listed in Table 1. Initially cre- Just like any other large piece of software, AV systems ated to attain notoriety or for fun, malware development tend to have bugs that are easily exploited, and thus AV today is mostly motivated by financial gains [1, 2]. There are reports of active underground markets for personal in- formation, credit cards, logins into sensitive machines in the United States, etc. [3]. Also, government-funded agencies Table 1: Categories of Malware (allegedly) have created sophisticated malware that target Malware Brief Description specific computers for espionage or sabotage [4, 5, 6]. Mal- Worm Malware that propagates itself from one infected host to other hosts via exploits in the OS interfaces ware can be delivered in a number of ways. To list a few, an typically the system-call interface. unsuspecting user can be tricked into: clicking on links in Virus Malware that attaches itself to running programs \phishing" emails that download and install malware, open- and spreads itself through users' interactions with various systems. ing email attachments with malicious pdfs or document files, Polymorphic A virus that,