CLASSIFICATION OF PERSISTENCE MECHANISMS USING LOW-ARTIFACT DISK INSTRUMENTATION

A Dissertation Presented by

Jennifer Mankin

to The Department of Electrical and Computer Engineering

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Electrical and Computer Engineering

in the field of Computer Engineering

Northeastern University Boston, Massachusetts

September 2013

Abstract

The proliferation of malware in recent years has motivated the need for tools to analyze, classify, and understand intrusions. Current research in analyzing malware focuses either on labeling malware by its maliciousness (e.g., malicious or benign) or classifying it by the variant it belongs to. We argue that, in addition to providing coarse family labels, it is useful to label malware by the capabilities it employs. Capabilities can include keystroke logging, downloading a file from the internet, modifying the Master Boot Record, and trojanizing a system binary. Unfortunately, labeling malware by capability requires a descriptive, high-integrity trace of malware behavior, which is challenging given the complex stealth techniques that malware employs in order to evade analysis and detection. In this thesis, we present Dione, a flexible rule-based disk I/O monitoring and analysis infrastructure. Dione interposes between a system-under-analysis and its hard disk, intercepting disk accesses and reconstructing high-level file system and registry changes as they occur. We evaluate the accuracy and performance of Dione, and show that it can achieve 100% accuracy in reconstructing file system operations, with a performance penalty of less than 2% in many cases.

Given the trustworthy behavioral traces obtained by Dione, we convert file system-level events to high-level capabilities. For this, we use model checking, a formal verification approach that compares a model extracted from a behavioral trace to a given specification. Since we use Dione traces of file system and registry events, we aim to label persistence capabilities—that is, we label a sample by the mechanism it uses not only to persist on disk, but also to restart after a system boot. We model the Windows service, a commonly-employed capability used by malware to persist, load a binary after reboot, and even load dangerous code into the kernel. We model the installation of a Windows service, the system boot, and the file access of the service binary. We test our models on over 1000 real-world malware samples, and show that our approach successfully identifies service-installing malware samples over 99% of the time, and malware that loads that service over 98% of the time. Moreover, we demonstrate that we are able to use traces of disk reads to differentiate between two types of file accesses. We show that we can not only detect when a persistence mechanism is installed, but also that the persistence mechanism is successful, because we detect the automatic load of the program binary after a system reboot. We correctly identify file access types from disk access patterns with less than 4% of samples mislabeled, and demonstrate that even an expert analyst would have difficulty correctly identifying the mislabeled accesses.

Acknowledgements

First and foremost, I would like to thank my husband Dana. Not only would it have been nearly impossible to complete this work without his love and support, but it most definitely would not have been this much fun! I would also like to thank my family for everything they’ve done for me and for supporting me throughout the years. I specifically owe my success to my parents for instilling in me a love of learning and logic, and for emphasizing to me that the most important thing is to try.

The insightful and inspiring help from both my academic and industry advisors was critical throughout this entire process, culminating with this dissertation. I would like to acknowledge the tremendous support of my advisor at Northeastern, Dr. David Kaeli, and thank him for his many years of dedication to helping his students achieve great things. I also want to thank my technical supervisors at MIT Lincoln Laboratory, Charles Wright and Graham Baker, for developing this exciting research and guiding me throughout the process. Finally, I would like to thank my colleagues at Northeastern and MIT Lincoln Labs for their invaluable feedback and discussions.


Contents

Abstract

Acknowledgements

1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Organization of Dissertation

2 Background
  2.1 Malicious Software
    2.1.1 Malware Types
    2.1.2 Anti-Forensics Techniques
    2.1.3 Evasion Techniques
  2.2 Malware Analysis
    2.2.1 Static Binary Analysis
    2.2.2 Dynamic Analysis
  2.3 Windows Concepts
    2.3.1 The Windows Registry
    2.3.2 NTFS File System
    2.3.3 Performance Optimizations for Disk Accesses
  2.4 Formal Verification and Model Checking
    2.4.1 Predicate Logic
    2.4.2 Temporal Logic
    2.4.3 Linear Temporal Predicate Logic
  2.5 Summary

3 Related Work
  3.1 Malware Analysis and Instrumentation
  3.2 Characterizing Malware Behavior
    3.2.1 Characterizing Malware with Machine Learning
    3.2.2 Characterizing Malware Using Modeling

4 Dione: A Disk Instrumentation Framework
  4.1 Threat Model and Assumptions
  4.2 Dione Operation
    4.2.1 Dione Policy Commands
    4.2.2 Dione State Commands
  4.3 Live Updating
    4.3.1 Live Updating Challenges
    4.3.2 Live Updating Operation
  4.4 Disk Sensor Integration
  4.5 Experimental Results
    4.5.1 Experimental Setup
    4.5.2 Evaluation of Live Updating Accuracy
    4.5.3 Evaluation of Performance
  4.6 Registry Monitoring

5 Labeling Malware Persistence Mechanisms with Dione
  5.1 Modeling Persistence Mechanisms with LTPL
    5.1.1 System Boot
    5.1.2 Service Install
    5.1.3 File Access
    5.1.4 Persistent Service Load
  5.2 Dione Capability Labeler Implementation
  5.3 Experimental Setup
    5.3.1 Testbeds
    5.3.2 Malware Corpus
    5.3.3 Assignment of “Truth” Labels
    5.3.4 Model Checker Results
  5.4 Labeling File Access Type
    5.4.1 Motivation
    5.4.2 Program Binary Load Classifier
    5.4.3 SVM Classifier Implementation
    5.4.4 Results

6 Directions for Future Work

7 Thesis Summary and Contributions

8 Appendix
  8.1 Tables

Bibliography

Chapter 1

Introduction

The past decade has been boldly marked by the ongoing arms race between malicious software creators and security researchers. Not only are security companies and researchers overwhelmed by the several million new unique samples discovered each month, but the sophistication of malicious software continues to increase as well [46]. Malicious software, or malware, can take many forms. While the amount of harm caused by a malware sample can vary, all malware shares the property of not having been installed with the full consent and knowledge of the user. Spyware or adware can be installed on a user’s system, causing annoying pop-ups or violating privacy expectations by tracking user habits [54]. Alternatively, malware may force the system to become part of a network of hijacked machines used to send spam, hijack other systems, or perpetrate Distributed Denial of Service (DDoS) attacks on banks or targets of political protest [10]. Increasingly, malware is used for financial gain. For example, banking threats seek to steal credentials from users or banking systems in order to perpetrate financial crimes, while fake-alert and ransomware threats trick the user into paying either for impostor security software or for the safe return of

their “ransomed” data [45]. Rootkits can be particularly dangerous, as they exist to provide additional stealth measures to prevent the user or security products from detecting the presence of the rootkit and any other malware it is packaged with [10]. Rootkits can execute with administrator privilege by attacking and patching the code of the operating system. Though the number of new rootkits discovered in the wild has been decreasing since 2011, tens of thousands of new samples are still discovered every month [46]. Furthermore, there is a common adage in security that the winner between malware and a security product is whichever was loaded first. As a result, rootkits are increasingly turning to infecting the Master Boot Record (MBR); since it performs key startup operations, infection of the MBR is a devastating attack on the system [45]. Once a rootkit has breached kernel-level code, it is difficult to trust any security product or malware analyzer running on the infected system. In the past couple of decades, research into labeling malware has focused on identifying the malware by family or variant. While having labels available for new samples is useful to provide a coarse-grained identification, we argue that labeling the behavior of the malware could be more useful than identifying the family it belongs to. Capability labeling is a promising solution to understanding how malware behaves. Instead of identifying malware by its family or strain, identifying malware by the capabilities it possesses allows security products to identify the high-level behaviors that new malware is employing. There are several benefits to labeling or identifying capabilities present in malware or software. A system equipped with on-the-fly capability detection could provide notifications to users when software or malware is installed with certain malicious capabilities. The information could also be used by security researchers and products to

outline the necessary steps to clean a system of the infection and prevent intrusions. Furthermore, it allows security researchers to build up large corpora of labeled samples for future research and experimentation, identifying what each sample actually does. Unfortunately, identifying high-level malware capabilities is a challenging problem. First, it is difficult to obtain a descriptive, high-integrity trace of system events, since malware writers employ a variety of techniques in order to prevent their malware from being analyzed. Second, it is difficult to derive useful high-level behaviors from the trace of events that has been obtained, as high-level behaviors can manifest themselves in a variety of ways.

1.1 Motivation

Currently, state-of-the-art research into malware labeling focuses predominantly on one of two areas: labeling new samples as either malicious or benign, or labeling new samples by family or variant. Early anti-virus technologies relied on signature matching to identify and label software samples as malicious; these signatures contained unique byte patterns, such as sequences of instructions, with each signature typically only covering a single malware variant [17]. In order to counter attempts at obfuscation, researchers and AV vendors introduced the ability to use regular expressions over the byte sequences, for example to skip over arbitrarily-inserted nop instructions, though these too are easily evaded with polymorphic and metamorphic obfuscation techniques [12].
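The regular-expression-over-bytes idea can be sketched concretely. The following is a hypothetical illustration (the byte sequence and signature are invented for this example, not taken from any real AV product): a signature that matches a short instruction sequence while tolerating any number of inserted nop (0x90) bytes between the instructions.

```python
import re

# Hypothetical syntactic signature: a regular expression over raw bytes.
# "(?:\x90)*" lets the match skip over arbitrarily-inserted nop padding,
# the simple obfuscation countermeasure described in the text.
SIGNATURE = re.compile(
    rb"\x31\xc0"      # xor eax, eax
    rb"(?:\x90)*"     # skip any inserted nops
    rb"\x50"          # push eax
    rb"(?:\x90)*"
    rb"\xcd\x80"      # int 0x80
)

def matches_signature(binary: bytes) -> bool:
    """Return True if the signature occurs anywhere in the binary."""
    return SIGNATURE.search(binary) is not None

clean  = b"\x31\xc0\x50\xcd\x80"                  # unobfuscated sequence
padded = b"\x31\xc0\x90\x90\x50\x90\xcd\x80"      # junk nops inserted
```

Both variants match the one signature, which is exactly why such regex signatures were an improvement over exact byte patterns; polymorphic encryption, however, rewrites the bytes entirely and defeats this approach.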

Instead of using syntactic signatures—that is, raw byte patterns or regular expressions—researchers have developed semantic models of malicious behavior based on instruction sequences [18]. Malicious behaviors are modeled, and the models abstract instruction sequences to use variable names and symbolic constants. Then, templates of malicious behaviors are compared against potentially-malicious binaries to detect instruction sequences that are semantically equivalent, rather than identical on a byte level. Abstracting semantic awareness to an even higher level, recent work has focused on behavioral signatures. These behavioral signatures often look at sequences of system calls, or even higher-level behaviors represented by semantically-equivalent system calls [4, 6, 52]. After building behavioral representations of malware samples, both formal verification and machine learning techniques can be applied to label samples by their maliciousness, or in an effort to divide them into classes based on their family or variant. Unfortunately, deriving family-based labels to identify malware samples presents some significant challenges. Bailey et al. performed a detailed study of anti-virus (AV) products and found that not only do different AV vendors use different labels for different malware samples, but these AV vendors actually disagree on the number and granularity of unique labels in general [4]. The goal of applying familial labels to malware samples is to have a concise clustering of samples, with similar items grouped into clusters that reflect appropriate differences, while avoiding having so many labels that the labels become meaningless. With too coarse a clustering, malware samples may be labeled as being from the same family, when in reality they do not share all functionality or capabilities. With too fine a granularity, similar variants within the same family could be labeled as individual families, resulting in a clustering that

becomes less distinct as clusters blur together. The problem of labeling by family is further exacerbated by the lack of “ground truth” labels. When researchers attempt to assess the quality of their clustering algorithms, they often choose samples that many AV vendors can easily label. This results in a malware corpus of “easy-to-label” samples, and thus the effectiveness of labeling algorithms cannot be extrapolated onto larger datasets for which ground truth is not known [49]. The blending and merging of malware samples arises from the relative ease with which bad actors can generate new malware samples. Malware writers can use obfuscation techniques to produce samples with unique hashes and signatures. For example, polymorphic techniques encrypt the body of the code, decrypting on the fly during execution [20]. Meanwhile, metamorphic techniques change the structure of the code—for example, using instruction reordering, insertion of junk instructions, and register renaming—while ensuring that the semantics of the code remain the same [20]. Additionally, malware can be written in high-level programming languages, and source code and malware kits can be found on the Internet for little or no cost, allowing even those with minimal programming skills to generate malware. This means that malware writers can create new variants by adding new functionality to old variants or by mixing existing components. The result of these techniques is that the differentiation between malware families and variants begins to blur. As the number of unique malware samples found in the wild continues to increase, we posit that it is more useful to identify a malware sample by the behavioral characteristics it possesses than by a variant label. A “capability” can be broadly defined as any intended feature of the software. Keystroke logging, downloading a file from the internet, trojanizing a system binary, and overwriting the Master Boot Record are all examples of malware capabilities. Instead of applying a single family

label to a malware sample, each sample would be labeled with all of the capabilities it employs. Labeling a sample by the capabilities it possesses, rather than by its family, provides several benefits. The first is that it provides an opportunity for alerting a user or administrator when benign or malicious software installs or employs a potentially dangerous or intrusive capability. Capability labeling can also identify how malware infects a system, how it propagates to other systems, how it survives and restarts after a reboot, or how it hides from the user or security products. Understanding each of these characteristics is critical in developing products or advisories to clean systems after infection, and to prevent malware from spreading to other systems. A secondary benefit to capability labeling is in assisting security and malware researchers, as it would allow researchers to build up large corpora of malware for which the high-level behaviors of each sample are known. If a researcher needs to test a malware removal tool, for example, on real-world samples, they could simply query the corpus for all samples that are labeled as having the specific capabilities that allow them to persist on a system. The first challenge in the labeling of malware based on behavior, whether by capability, by family, or by maliciousness, is in obtaining a descriptive, high-integrity behavioral trace, as even the best malware labeling algorithm cannot be accurate if it is processing an incomplete or inaccurate behavioral trace [20]. Behavioral traces can be acquired using both static and dynamic techniques. Static analysis, in which the binary is examined without actually executing it, is scalable but can be thwarted by malware looking to escape analysis [6, 43, 79]. For example, disassembling a malware sample to obtain an instruction trace is useful for extracting the control and data flow of a malware sample. However, in practice, malware often utilizes techniques to

prevent disassembly from occurring in the first place, maintaining the code’s original functionality while transforming the binary. Dynamic analysis attempts to understand malware behavior by observing the malware as it runs, collecting system call information, instruction traces, or other events. While dynamic analysis avoids the obfuscation problems of static analysis, it too has limitations. Manual analysis, performed by attaching a piece of malware to a debugger, is too time-consuming to be scalable; furthermore, malware can detect when it is running under a debugger and evade analysis [34, 57]. Similarly, analyzers running in-host can be detected by malware and uninstalled or misled. Running malware in a virtualization or emulation layer to collect event traces provides protection from the malware and, as a result, provides the broadest coverage of malware analysis. However, even these techniques can be detected by malware, and the malware can then voluntarily exit to avoid analysis. Recent work suggested that nearly 25% of malware utilizes techniques to detect dynamic analyzers, and evades analysis by exiting [50]. Thus, the biggest problem of dynamic analysis is that it only reveals what was actually executed, not all potential behaviors that could manifest on a given system. In order to address this challenge—the acquisition of a descriptive, high-integrity trace—it is important for malware analyzers to work at either a higher privilege level or a lower semantic level than the malware. In this dissertation, we present a disk instrumentation and analysis infrastructure that does both. Dione, the Disk I/O aNalysis Engine, is a flexible, portable, policy-based disk monitoring infrastructure which facilitates the collection and analysis of disk I/O. It uses information from a sensor interposed between a System Under Analysis (SUA) and its hard disk. Since it monitors I/O outside the reach of the operating system, it is resilient to stealth

measures employed by rootkits—including those with administrator-level privilege. Instead of relying on constructs that can be manipulated by malware, Dione reconstructs high-level file system and Windows registry operations using only low-level intercepted metadata and disk sector addresses. The second challenge to capability labeling is that, even after a high-integrity, descriptive event trace has been obtained, it is necessary to convert the lower-level events of the trace into higher-level behaviors or capabilities. The type of trace that is available informs the types of capabilities that can be labeled. Since Dione provides comprehensive, high-integrity events taking place at the disk level, we show that we can infer high-level properties relating to the persistence capabilities of malware. That is, we use the traces generated from Dione to demonstrate not only how malware persists on disk, but how malware automatically restarts after a system is rebooted. Persistence capabilities include trojanizing a system binary, overwriting the MBR to force malicious code to load, utilizing the Windows service mechanism to automatically load code or drivers at boot time, or pointing special auto-start registry keys to the malicious code. Given the descriptive, high-integrity traces produced by Dione, we set out to label malware samples possessing certain capabilities that are used to persist and restart upon system reboot. For our Dione Capability Labeler (DCL), we use model checking, an algorithmic formal verification method used to verify properties of software. Model checking is a property verification approach in which a property is specified using a description language, resulting in a logic formula [28]. Likewise, a system (in this case, a trace of malware execution) is also modeled using a description language. Then, the model of the system is compared to the logic formula to determine whether the model satisfies the specification.
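The model-vs-specification idea can be made concrete with a deliberately minimal sketch. This is not the dissertation's LTPL checker; the event names, the predicate-list encoding of a specification, and the sample trace below are all illustrative assumptions. The "model" is a linear trace of events, and the "specification" is an ordered list of predicates, each of which must eventually be satisfied after the one before it.

```python
# Minimal sketch of trace-vs-specification checking (illustrative only).
# A specification is a list of predicates that must hold, in temporal
# order, somewhere along the trace -- an "eventually p, then eventually
# q, then eventually r" style property.

def check(trace, spec):
    """Return True if the trace satisfies each predicate in spec,
    in order (not necessarily on consecutive events)."""
    i = 0
    for event in trace:
        if i < len(spec) and spec[i](event):
            i += 1
    return i == len(spec)

# Hypothetical capability specification: a service-style registry write,
# then a system boot, then a read of the dropped binary after the boot.
spec = [
    lambda e: e == ("registry_write", "Services\\evil"),
    lambda e: e == ("system_boot",),
    lambda e: e == ("file_read", "evil.sys"),
]

trace = [
    ("file_create", "evil.sys"),
    ("registry_write", "Services\\evil"),
    ("system_boot",),
    ("file_read", "evil.sys"),
]
```

Here `check(trace, spec)` succeeds because the install, boot, and load events appear in the required order; reversing the trace makes it fail, which is the temporal-ordering property that plain event matching cannot express.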

In the context of detecting a property—in this case, a capability—in malware, the capability is specified in the description language, or logic, of the model checker. This specification describes the behaviors (and the temporal ordering between behaviors) that would be present in the behavioral trace if the malware possessed that capability. Then, a model is extracted from each behavioral trace, and the model is compared to the specification to determine whether the model fits the specification. If so, the malware is labeled as having the specified capability. In this work, we use the specification language Linear Temporal Predicate Logic [7, 42, 71], or LTPL, to model our capability specifications. We model three phases of program behavior based on events gathered from Dione traces: (1) installation of the persistence capability, (2) system boot, and (3) file access (program load) after reboot. We chose these three stages because the automatic loading of a program after reboot demonstrates both that the persistence mechanism was installed and that it was successful. While the models of system reboot and program load are shared across all persistence capabilities, each type of persistence capability requires a separate model for the installation phase. We also demonstrate that we can use the patterns detected from the file-read events recorded in Dione traces to differentiate between two types of high-level file accesses: file copy and program binary load. As a result, we can label malware not only as having the capability to install a service, but also as having successfully utilized this mechanism to automatically load after a system boot. We chose the Windows service as the persistence mechanism to model, since it is a common mechanism used by malware to persist and restart after reboot [66]. The service mechanism can be particularly damaging because it allows malware to load malicious code into kernel space, it can be set to run automatically when the system boots, and it may not show up in Task Manager as a process [10, 66]. Using

domain knowledge about malware behavior, the NTFS file system, and the relationship between the Windows XP operating system and corresponding disk behaviors, we generate models for a service installation, a system reboot, and a program load; we then combine the stages into a specification that detects the automatic loading of the service after reboot. Because the pattern of disk accesses for a file load can vary dramatically, we generalize the model for program load such that it specifies any type of file access, to avoid false negatives. Then, we bolster the model of a file access with a supervised learning approach that differentiates between a program binary load and another type of file read operation, a file copy. We generate features based on the file content read pattern of the Dione trace and use a Support Vector Machine (SVM) algorithm [22] to classify a series of disk accesses as either a file load or a file copy. We demonstrate that we not only detect the persistence mechanism being installed, but we also verify that the persistence mechanism is successful, because we can detect the program binary load occurring automatically after a system reboot.
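The feature-based distinction between a file copy and a binary load can be sketched with toy code. The dissertation uses an SVM over features derived from Dione read patterns; the two features and the threshold rule below are stand-in assumptions chosen only to illustrate the intuition (a copy reads blocks once, in order; a loader jumps between headers and sections and may revisit earlier offsets).

```python
# Illustrative sketch, not the dissertation's classifier: hand-rolled
# features over the sequence of file offsets read, plus a trivial
# threshold rule standing in for the trained SVM decision boundary.

def extract_features(read_offsets):
    """Two assumed features: fraction of strictly sequential reads,
    and the number of backward seeks."""
    sequential = backward = 0
    for prev, cur in zip(read_offsets, read_offsets[1:]):
        if cur == prev + 1:
            sequential += 1
        elif cur < prev:
            backward += 1
    n = max(len(read_offsets) - 1, 1)
    return sequential / n, backward

def classify(read_offsets):
    """Label a read pattern: near-perfectly sequential with no backward
    seeks looks like a file copy; anything scattered looks like a
    program binary load."""
    seq_ratio, backward = extract_features(read_offsets)
    return "file_copy" if seq_ratio > 0.9 and backward == 0 else "binary_load"

copy_trace = list(range(100))                  # every block, in order
load_trace = [0, 1, 50, 51, 52, 10, 11, 90]    # header, sections, revisits
```

In practice a trained SVM draws this boundary from labeled examples rather than a hand-picked threshold, but the feature-extraction step is the same shape: reduce a variable-length access pattern to a fixed-length numeric vector.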

1.2 Contributions

With this dissertation, we provide the following contributions to malware analysis and disk forensics:

• We present Dione: The Disk I/O aNalysis Engine. Dione is the first portable disk and file system analyzer to analyze disk traffic outside the system under analysis to provide comprehensive, high-integrity traces for the NTFS file system. We detail the challenges, the design, and the implementation of Dione, explaining how we bridge both the semantic and temporal gaps in reconstructing high-level operations from raw low-level metadata.

• We analyze the accuracy and performance of Dione, demonstrating that it produces traces of file system operations with 100% accuracy, with a performance overhead generally less than 10%—and often below 2%—in reconstructing file system operations.

• We present DCL: the Dione Capability Labeler. We detail our models for three properties: the Windows service installation, system reboot, and file access. We model these properties using the logic language LTPL [71], and implement a problem-specific model checker that checks the events of a Dione disk trace against the specifications for each property. We demonstrate that DCL can process a large number of samples in a short amount of time, labeling each sample based on whether it exhibits a service persistence capability.

• We present a machine learning classifier that identifies a file binary load given a disk access pattern, using this classifier to bolster our model for service persistence. Our classifier mislabels fewer than 4% of traces, yet we show that correctly labeling the mislabeled traces would be difficult for even an expert analyst. By demonstrating that we can detect a file binary being automatically loaded after a system boot, we can decisively label a sample as having a successful persistence mechanism.

• We create an automated malware analysis testbed, which can automatically instrument malware samples using Dione and the Volatility memory introspection framework [76]. We run DCL with the integrated file access pattern classifier on over 1,000 real-world malware samples, detecting Windows service installation over 99% of the time and service persistence over 97% of the time. Furthermore,

we show that, using Dione’s ability to generate on-the-fly traces of malware behavior, we can label more service installs and loads than a memory introspection framework operating on a single snapshot in time.

1.3 Organization of Dissertation

The rest of the dissertation is organized as follows. In Chapter 2, we provide relevant background material. This includes a discussion of malware types, as well as an in-depth explanation of the techniques that malware can utilize in order to hide from analysis. We discuss the techniques of static and dynamic malware analysis, including the advantages and disadvantages of each. We explain relevant Windows concepts; specifically, we describe the structure of the NTFS file system as it pertains to disk instrumentation, as well as the optimizations used by the Windows operating system that make instrumentation more challenging. Additionally, we discuss model checking and the logic language LTPL. In Chapter 3, we discuss the related research in this area. This includes discussions of previous research on disk instrumentation, malware analysis, and the use of machine learning and model checking to perform intrusion detection, malware identification, and capability labeling. In Chapter 4, we describe the Dione infrastructure. We detail the implementation of Dione, including its design challenges and solutions. We evaluate the accuracy of the Dione live updating engine, and the performance of full disk instrumentation using the Xen hypervisor. Finally, we conclude with an explanation of the limitations of Dione. In Chapter 5, we describe our behavioral models for service install, system boot,

and service load; additionally, we model these properties in the logic language LTPL. We describe the classification algorithm used to bolster the file access model, and we detail the results of our integrated model checker and file access classifier on over a thousand real-world samples. Finally, we conclude the dissertation with objectives for future research in Chapter 6 and a summary of contributions in Chapter 7.

Chapter 2

Background

In this chapter, we will outline relevant background information. We will begin with a discussion of malware, including malware types, the common anti-forensics techniques used by malware to avoid detection, and the evasion techniques they use to hide from malware analyzers and security products. We will then discuss the ways in which malware can be analyzed, including both static and dynamic analysis techniques. Understanding how these analyzers can be misled by stealthy malware will motivate the need for an analyzer that provides descriptive yet high-integrity traces. Before we introduce Dione, our file system and disk I/O analysis infrastructure, in Chapter 4, we will provide a thorough introduction to relevant Windows concepts, including the NTFS file system and the optimizations used by the Windows operating system that make file system instrumentation more challenging. Since the Dione Capability Labeler relies on model checking using formal specifications, we will discuss model checking and common description logic languages, including the Linear Temporal Predicate Logic that we use to model persistence capabilities in Chapter 5.

2.1 Malicious Software

The term malware, or malicious software, can be used to describe a variety of unwanted or undesirable software or scripts. Generally, malware includes anything that causes harm to a user, a computer system, or a network, though the amount of harm can vary [54]. In this section, we will define and describe malware types, and detail the anti-forensics and evasion techniques they may employ in order to hide from malware analysis tools.

2.1.1 Malware Types

Viruses and worms can both be categorized as infectious agents; they are similar in that they not only serve some nefarious purpose, but are also capable of replicating themselves [10]. A virus, however, requires an explicit user interaction—double-clicking on an executable, or opening a corrupted email attachment—whereas a worm can propagate on its own, automatically transmitting itself over the network. A trojan or trojan horse is malicious software that a user downloads or installs believing that the software serves some benevolent, useful function [66]. The trojan may indeed be bundled with useful software, or the software may be entirely malicious. The verb trojanizing is also increasing in usage, and refers to malware hijacking and patching an executable that already exists on the system so that the malicious code will execute when the previously-benign program is loaded or run. Spyware and adware may be used separately or together, and vary between merely annoying and malicious [10]. Adware exists to display advertisements on the user’s computer, while spyware tracks the user’s habits, usernames, passwords, or keystrokes. Once a machine has been compromised by a worm, virus, or trojan, additional

types of malware may be installed. A backdoor is a method of bypassing standard authentication to allow the attacker remote access to the compromised machine in the future [54]. A botnet is a collection of machines that have been compromised and are commanded and controlled by a bot herder [10]. Machines in a botnet may wait for orders from the bot herder; these orders could include sending spam, perpetrating Distributed Denial of Service (DDoS) attacks, and harvesting usernames and passwords to commit financial crimes [10]. A rootkit is a particularly interesting component of malware. It exists to conceal itself and other components, and to command and control a system remotely [10]. A rootkit’s most important quality is that it is stealthy: a good rootkit will go undetected by the user to ensure that it stays present on the system as long as possible. A rootkit may attain administrator-level privilege, either by exploiting a program that is running with supervisor privilege, or by tricking an administrator into installing malicious software. If the rootkit has unmitigated access to the kernel code and data, then it can be difficult or impossible to detect. Intuitively, it follows that malware will often be an amalgam of multiple malicious components. For example, a virus may be packaged with a rootkit, so that the rootkit can hide the presence of the virus. A rootkit may install a backdoor, so that the attacker can command and control the compromised system. A botnet may be composed of systems that have all been compromised by a rootkit.

2.1.2 Anti-Forensics Techniques

In order to thwart post-mortem forensic analysis of a compromised system, malware may utilize anti-forensics techniques [66]. On the simplest level, malware could download and then subsequently delete any file-based payload, possibly overwriting

the sectors that held the contents, to avoid any signature-based antivirus disk scan. At a lower level, malware can manipulate the properties of a file. The hidden property specifies whether a file or directory is hidden from the user, both in the graphical explorer and through the command line. Another set of properties is the MAC timestamps. Though “MAC” actually stands for Modified, Accessed, and Creation times, NTFS utilizes a fourth timestamp as well, the Change time, which indicates that metadata was changed. By setting any of these timestamps to an unreasonably low value, Windows Explorer will not display the time [10]. Alternatively, the malware could set the MAC times of a newly created malicious file to the same timestamps as system files, so that the file appears to have been there since the operating system was first installed. Instead of hiding through the use of the hidden property, malware can also hide through more sophisticated mechanisms. The first technique is called In-Band Hiding, as it involves hiding in spaces that are specified by the file system. An example of this is hiding in Alternate Data Streams (ADSs) [10]. As will be detailed further in Section 2.3.2, the contents of a file in NTFS are stored in an attribute called $DATA. However, a file can have multiple $DATA attributes. These ADSs are a way to persistently store information on disk, but they will not appear in Windows Explorer or in command line listings unless explicitly requested. Furthermore, the data stored in an ADS is not included in the total size property of a file; this is because sizes are associated with attributes, so the stated file size is actually just the size of the default $DATA attribute. Conversely, Out-of-Band Hiding utilizes space not specified by the file system. Malware may hide in the Master Boot Record (MBR), discussed in further detail in Section 2.1.3. Alternatively, malware can hide in slack space.
Because file content is always allocated in clusters (commonly 4KB, or 8 sectors), there may be up

to 7 unused sectors for a given file. For certain versions of Windows, writing to this slack space only requires repositioning the logical End-of-File (EOF) pointer, writing to the space, and then non-destructively truncating the file by resetting the logical EOF [10].
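The amount of slack space available for this kind of out-of-band hiding follows directly from the cluster geometry described above. The following sketch (not from the dissertation) computes it, assuming the common case of 512-byte sectors and 8-sector (4 KB) clusters:

```python
# Illustrative sketch: how much slack a file leaves in its final cluster,
# assuming 512-byte sectors and 8-sector (4 KB) clusters as in the text.

SECTOR_SIZE = 512
SECTORS_PER_CLUSTER = 8
CLUSTER_SIZE = SECTOR_SIZE * SECTORS_PER_CLUSTER  # 4096 bytes

def slack_bytes(file_size: int) -> int:
    """Bytes between the logical EOF and the end of the last cluster."""
    if file_size == 0:
        return 0
    remainder = file_size % CLUSTER_SIZE
    return 0 if remainder == 0 else CLUSTER_SIZE - remainder

def slack_sectors(file_size: int) -> int:
    """Whole unused sectors available for hiding (at most 7)."""
    return slack_bytes(file_size) // SECTOR_SIZE

print(slack_sectors(100))   # a tiny file leaves 7 whole sectors of slack
print(slack_sectors(4096))  # a file that exactly fills its cluster leaves 0
```

A 100-byte file, for example, occupies only part of the first sector of its cluster, leaving 7 whole sectors of slack that never appear in the file's stated size.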

2.1.3 Evasion Techniques

Dione can be useful in instrumenting and analyzing the intrusion and presence of each of these various types of malware, assuming that there is some symptom of compromise which percolates to the disk. However, Dione’s particular strength is in instrumenting and analyzing “hard” malware; that is, malware which uses rootkits or rootkit-like technology to hide itself and any other malicious software with which it is packaged. Once a rootkit has attained kernel-level privilege, in-host analysis—and even some virtualization-based analysis infrastructures—cannot be trusted, as the rootkit could thwart or misdirect any attempts to analyze it. These techniques can be broadly divided into four categories: altering control flow through hooking, system call patching, modifying kernel objects, and filter drivers.

The Windows System Call Mechanism

Since the system call provides the interface into kernel space, many of the methods used by rootkits to hide themselves or associated malware occur within the steps used when a system call is invoked [10]. In this section, we describe the system call interface in Windows. The steps taken for a system call in Windows running on a modern x86 processor are summarized in Figure 2.1 [10, 64]. First, a user application calls a native API function (the native API implements the system call interface in Windows). The address of the function is obtained through

Figure 2.1: System call mechanism for Windows running on a modern x86 processor. Four mechanisms by which a rootkit can alter control flow are: (1) Import Address Table, (2) SYSENTER Machine Specific Registers, (3) System Service Dispatch Table, and (4) Filter Driver.

the Import Address Table in the executable. This address points to a function in the kernel32.dll dynamic linked library, which calls another function exported by ntdll.dll. The dynamic linked library ntdll.dll routes system calls between the user mode and kernel mode interfaces [64]. The ntdll.dll function KiFastSystemCall populates regular registers and three Machine Specific Registers (MSRs). Of particular note, this code will store a system call dispatch ID in the lower 12 bits of the EAX register. For example, the CreateFile system call will store a number containing 0x3D as the 12 least-significant bits into EAX. Of the three MSRs, two of them (IA32_SYSENTER_CS and IA32_SYSENTER_EIP) contain the Ring 0 code segment and offset into the code segment, respectively, at which the processor will start executing the code [29]. For Windows, this address will point to the KiFastCallEntry function. Finally, KiFastSystemCall will call the SYSENTER instruction, which is used by modern processors to switch from user mode (Ring 3) to kernel mode (Ring 0). Once in Ring 0, execution begins in the ntoskrnl.exe executable [10]. The execution proceeds to the function KiSystemService, which obtains the system call dispatch ID from EAX and uses it to index into the System Service Dispatch Table (SSDT). An SSDT is an array of addresses, in which each address is a pointer to the entry point of a function in kernel space. There are two SSDTs; one is for Windows GUI functions, and the other is for the Windows Native API (e.g., system calls). Once the kernel-mode function is obtained from the SSDT, control flow proceeds to the appropriate kernel mode component. For disk I/O commands, control flow proceeds to the I/O Manager, which then uses the appropriate driver stack to execute the I/O command.
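The dispatch step above is simple enough to sketch directly. The following illustrative fragment (not from the dissertation) shows how the low 12 bits of EAX select the SSDT entry; the ID-to-name table is a hypothetical stand-in, since the real mapping varies by Windows version:

```python
# Illustrative sketch: recovering the system call dispatch ID from an
# EAX value at SYSENTER time. The 0x3D ID for NtCreateFile mirrors the
# text's Figure 2.1 example; real IDs vary by Windows version.

def dispatch_id(eax: int) -> int:
    """The dispatch ID occupies the lower 12 bits of EAX."""
    return eax & 0xFFF

# Hypothetical fragment of an SSDT index-to-name mapping.
SSDT_NAMES = {0x3D: "NtCreateFile"}

def syscall_name(eax: int) -> str:
    return SSDT_NAMES.get(dispatch_id(eax), "unknown")

print(syscall_name(0x103D))  # NtCreateFile
```

This is also exactly the computation a VMI-based tracer (Section 2.2.2) can perform by reading EAX when SYSENTER executes, without trusting any in-guest interface.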

Altering Control Flow Through Hooking

A rootkit may have many motivations for modifying control flow in its attempt to hide itself and its actions from the user and to collect information from the machine it has compromised. It may block system calls to disrupt the work done by a program (e.g., security software), replace kernel functions altogether, track all system calls made and their input parameters (e.g., to instrument a system or application), or filter output parameters (e.g., to hide a file or process). One way to alter control flow is to modify a call table; a call table is simply an array of addresses, where each address points to a function or routine. By swapping out an address with a new one, the system will call the attacker’s function instead of the correct kernel function. The process of swapping out function pointers is referred to as hooking, and there are several call tables that can be hooked [10]. The Import Address Table (IAT) is an application-level userspace call table. Each entry in the IAT contains the address of a library routine that the program imports from a Dynamic Linked Library (DLL). The IAT is populated when a DLL is linked at load time. While a rootkit could hook any function imported from a DLL, the user-space functions that implement part of the system call interface are particularly dangerous targets. For example, a rootkit could hook the IAT entries pointing to the user-space library kernel32.dll in order to hide newly created malicious files; this scenario is labeled (1) in the system call diagram of Figure 2.1. While any exported library routine can be hooked in this manner, the disadvantage of this approach is that each hook applies only to the given application; and, since it hooks a user-space call table, any program (such as security software) running in kernel space could easily detect it. Unfortunately, several other call tables reside in kernel space; hooking any of these tables results in a system-wide, rather than application-level, hook.
The first option

is to hook hardware call tables at the system call interface. Old processors (e.g., pre-Pentium II [10]) jumped to kernel space to handle system calls via an interrupt (specifically, INT 0x2E). Contemporary processors use the dedicated SYSENTER instruction. To hook the former, a rootkit would hook the interrupt handler corresponding to interrupt 0x2E in the Interrupt Descriptor Table (IDT). To hook the SYSENTER instruction, an attacker would hook an MSR. Given the flat memory model, the Code Segment MSR is unnecessary: it is enough to hook only the IA32_SYSENTER_EIP register. This is done by swapping the original pointer (which points to the kernel function KiFastCallEntry) with a pointer to a new function. This hooking location is labeled (2) in Figure 2.1. The unique disadvantage of the hardware-based approaches is that these call gates are passthrough: control passes through the hook to the system call interface, but does not return through the hook. It is possible to instrument or block any system call, but not to filter output results, thus eliminating the opportunity for a rootkit to hide processes or files. Instead of hooking the hardware system call interface, a rootkit could instead hook a Windows-specific table: the System Service Dispatch Table (SSDT). With this approach, the attacker can both instrument and monitor input and filter output, since control can return to the hook after the system call execution completes. The 391 functions of the Windows Native API comprise the kernel-mode system call interface, and thus provide a dangerous path for control flow modification. Hooking the SSDT is performed by obtaining the index of the function in the SSDT and swapping the function pointer with one that points to its nefarious replacement. This hook is labeled (3) in Figure 2.1. Hooking, whether performed in the IAT, IDT, or SSDT, always suffers from the same disadvantage: it is relatively easy to detect. In order to detect any of these

hooks, security software would need to iterate through each of the pointers in the various tables to ensure that each points to a location in memory that falls within the library or executable that implements it. In other words, the pointers in the IAT should point to the region of memory containing the corresponding DLL, and the pointers in the IDT, the IA32_SYSENTER_EIP register, and the SSDT should all point to memory corresponding to ntoskrnl.exe. However, determining the address ranges of these libraries and modules itself requires using a system call, such as ZwQuerySystemInformation. As a result, if a rootkit can hook this system call before being detected, it can deflect hooking countermeasures.
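The hook-detection pass described above reduces to a range check over a call table. The sketch below (an illustration, not a real detector) shows the core idea, with module load ranges and table pointers modeled as plain integers; in practice, the ranges would come from a potentially hooked API such as ZwQuerySystemInformation:

```python
# Illustrative sketch: flag call-table entries whose pointers fall
# outside the module expected to implement them.

def find_hooked_entries(table, module_base, module_size):
    """Return indices of pointers outside [module_base, module_base+size)."""
    lo, hi = module_base, module_base + module_size
    return [i for i, ptr in enumerate(table) if not (lo <= ptr < hi)]

# Example: an SSDT whose entries should all point into ntoskrnl.exe,
# hypothetically loaded at 0x81800000 with size 0x200000. The last
# entry has been swapped to point at attacker-controlled memory.
ssdt = [0x81864880, 0x81870120, 0x9B000000]
print(find_hooked_entries(ssdt, 0x81800000, 0x200000))  # [2]
```

The weakness noted in the text is visible here: the detector's correctness depends entirely on `module_base` and `module_size` being truthful, which the rootkit can subvert first.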

System Call Patching

Given that hooking can be detected by confirming that critical function pointers point within the bounds of the library or executable in which they are expected to reside, it makes sense that a rootkit may attempt to modify the executable code itself. This technique is more challenging for the malware creator, but also more difficult to detect [10]. Patching is a technique in which the raw bytes of an executable are overwritten in order to, for example, mask or replace instructions. Patching can be performed in two locations: in memory or on disk. Binary patching modifies the bytes of the executable on disk; while the patch is permanent and persistent, it can be detected by looking at binary file checksums. A run-time patch, on the other hand, modifies the binary while it resides in memory, and thus would not survive a reboot. Whether performed in memory or on disk, patching requires overwriting the machine code of system calls or other useful kernel routines. The simplest example of patching would be to perform in-place modification of bytes. For example, the attacker could replace instructions with NOPs to prevent the execution of the original

instructions. This approach is severely limited by the small number of bytes that can be modified in place. A far more flexible approach is to overwrite the original instructions with a jump instruction (JMP, CALL, or RET) that redirects the control flow into another region of code, called trampoline or detour code [10]. The trampoline code has more space in which to operate, including executing the original, overwritten instructions. With this approach, a rootkit could instrument system calls and parameters by placing the trampoline at the start of the patched routine; the trampoline would execute this prologue before optionally calling the original instructions to perform the stated system call’s operation. By placing a trampoline after the system call executes, as an epilogue, a rootkit could filter output parameters. The steps of patching broadly consist of: (a) saving the original code that will be patched, (b) injecting trampoline code, and then (c) performing in-place patching of the original code to force execution to jump to the specified address of the injected trampoline code. The primary means for security software to detect patching would be to look for suspicious jump instructions at the start of a function. Even this heuristic is not foolproof, as a rootkit creator could simply move the jump patch farther from the start of the function. To be particularly stealthy, a rootkit could patch code in the Master Boot Record (MBR). The MBR is located at the first sector on disk. The code of the MBR is loaded by the BIOS at system startup; the MBR then loads the boot sector of the active disk partition, which in turn loads the operating system. This method uses a combination of run-time and binary patching; it patches MBR boot code on disk in order to have it alter system code in memory.
The advantage of this approach is that it is performed before any security software is loaded, and the winner of a battle between security software and malware is often that which embeds itself in the kernel

first.
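The naive patch-detection heuristic mentioned above can be sketched in a few lines. This illustration (not from the dissertation) checks only the one-byte x86 opcodes 0xE9 (JMP rel32), 0xE8 (CALL rel32), and 0xC3 (RET) at a function's first byte; a real detector would need a full instruction decoder, and, as noted, a rootkit can defeat the check by patching deeper into the function:

```python
# Illustrative sketch: flag a function prologue that begins with a
# JMP/CALL/RET opcode, the telltale sign of a trampoline patch.

SUSPICIOUS_OPCODES = {0xE9, 0xE8, 0xC3}  # JMP rel32, CALL rel32, RET

def looks_patched(prologue: bytes) -> bool:
    return len(prologue) > 0 and prologue[0] in SUSPICIOUS_OPCODES

# A detour patch starting with JMP rel32:
print(looks_patched(bytes([0xE9, 0x00, 0x10, 0x00, 0x00])))  # True
# A typical unpatched Windows prologue (mov edi,edi; push ebp; mov ebp,esp):
print(looks_patched(bytes([0x8B, 0xFF, 0x55, 0x8B, 0xEC])))  # False
```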

Modifying Kernel Objects

A third rootkit evasion technique addresses some of the limitations of the previous methods, though it has some limitations of its own. This technique is called Direct Kernel Object Modification (DKOM), and it involves modifying kernel data structures representing processes, drivers, and authentication tokens [10]. A similar method is used whether hiding processes or drivers from the user or security software. Both processes and drivers are maintained in doubly-linked lists. Therefore, to hide a particular process or driver, a rootkit needs only to traverse the appropriate list to find the process or driver to be hidden, and adjust the forward and backward links of its immediate neighbors. A rootkit can also elevate the privileges of a process by modifying the privilege substructures in the process object. There are a few disadvantages to DKOM. First, not all resources are represented by kernel objects; for example, there is no kernel file object, so DKOM could not be used to hide a file. Second, these data structures are undocumented, so Microsoft can adjust the fields of a structure between major and even minor releases, which could break the bit-specific object patches.
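The unlinking trick at the heart of DKOM can be modeled with plain objects standing in for kernel process entries. This sketch (an illustration; real Windows lists are circular LIST_ENTRY structures embedded in EPROCESS objects) shows how rerouting the neighbors' pointers hides a node from any list walker while the hidden object itself remains intact and schedulable:

```python
# Illustrative sketch: DKOM-style unlinking from a doubly-linked list.

class Node:
    def __init__(self, name):
        self.name, self.flink, self.blink = name, None, None

def link(nodes):
    """Chain nodes with forward (flink) and backward (blink) pointers."""
    for a, b in zip(nodes, nodes[1:]):
        a.flink, b.blink = b, a

def unlink(node):
    """Hide `node` by routing its neighbors' pointers around it."""
    node.blink.flink = node.flink
    node.flink.blink = node.blink

def walk(head):
    """What a process lister sees: every node reachable via flink."""
    names, cur = [], head
    while cur:
        names.append(cur.name)
        cur = cur.flink
    return names

a, b, c = Node("system"), Node("rootkit.exe"), Node("explorer")
link([a, b, c])
unlink(b)          # the rootkit hides its own process entry
print(walk(a))     # ['system', 'explorer']
```

Note that the hidden node still holds valid pointers into the list, which is why memory-scanning detectors that search for orphaned objects can sometimes recover it.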

Filter Drivers

The next rootkit evasion technique moves past the system call interface and into the device driver stack. This kernel-mode technique takes advantage of the layered device driver architecture supported by Windows. A Windows device driver does not have a monolithic structure; rather, it features a modular approach by which a series of drivers each perform some work and pass along the Interrupt Request Packet (IRP) to the

next driver in the chain. This is advantageous in that new drivers can be added to the series and still leverage the work done by other drivers in the chain. A Filter Driver is a driver that intercepts and modifies information as it makes its way through the driver stack [10]. While this can be a good thing—for example, filter drivers could be used to encrypt and decrypt data as it passes to persistent storage—it can also be used for malicious purposes. A filter driver could be used for keylogging, to filter network traffic, and to hide files and directories. This scenario is labeled (4) in Figure 2.1, as a malicious filter driver is inserted before the disk driver. As a result, any analyzer or security software examining files by hooking the system call interface will still be deceived, as the filter driver will hide any files before they reach the system call boundary.

2.2 Malware Analysis

Once a sample of malware has been obtained, there are several methods that can be used to learn about the malware’s behavior. Given the sophisticated techniques that malware can employ to prevent security software from detecting it or other malware, the job of analyzing malware behavior is a difficult one. Any malware analysis solution will face several tradeoffs. The closer the analyzer is to the malware, the more semantic information, and the more types of it, there will be to analyze. However, if an analyzer operates at the same or lower privilege level than the malware, it can be evaded, thwarted, or misled. This section discusses various options for malware analysis. Malware analysis can be roughly divided into two categories: static binary analysis and dynamic analysis.

2.2.1 Static Binary Analysis

In static binary analysis, the binary is analyzed before it is run; the binary itself is disassembled to learn more about how the malware might behave. On a basic level, static analysis can yield the architecture the binary was compiled for, the executable type, and the operating system on which it would run. Static analysis can also yield strings, such as passwords, paths, and file names, and imported library functions and symbols can be extracted (though this task is easier if the binary was dynamically linked). Another step of static analysis is disassembly of the binary. In this step, the raw binary bytes are decoded into assembly instructions. This step is difficult for x86 binaries because text and data can exist together, and because instructions are variable length. Additionally, both compiler optimizations and a crafty malware writer may take steps to further obfuscate the code. There are two common techniques for disassembling a binary. The first, linear sweep, will iteratively disassemble one instruction at a time [66]. An attacker could complicate this process by inserting junk between instructions that does not alter control flow, but may cause the disassembler to become out of step with the instructions. The second technique is to use a flow-oriented approach, disassembling instructions until a branch instruction is encountered, then building a list of locations to disassemble (for example, the locations of both the true and false branches of a conditional branch). Anti-disassembly techniques take advantage of the assumptions that a disassembler makes, resulting in inaccurate disassembly [66]. Additionally, disassembly can be difficult or impossible if the binary is compressed or encrypted, and will not capture the program’s behavior if it is unpacked as it runs. In short, static analysis can yield some good first observations about a binary, but it will not always provide a detailed evaluation of the malware’s

behavior.
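The way junk bytes throw a linear sweep out of step can be demonstrated on a toy variable-length instruction set (the opcodes below are invented for the example; this is not x86):

```python
# Illustrative sketch: linear-sweep disassembly over an invented
# variable-length ISA. Opcode -> (mnemonic, total instruction length).

ISA = {0x01: ("NOP", 1), 0x02: ("JMP", 2), 0x03: ("MOV", 3)}

def linear_sweep(code: bytes):
    """Decode one instruction at a time, never following control flow."""
    out, pc = [], 0
    while pc < len(code):
        name, length = ISA.get(code[pc], ("DB", 1))  # unknown byte -> data
        out.append((pc, name))
        pc += length
    return out

# At runtime the JMP skips over the junk byte 0x03, but the sweep
# decodes 0x03 as the start of a 3-byte MOV and swallows the two real
# NOPs that follow it.
code = bytes([0x02, 0x01, 0x03, 0x01, 0x01])
print(linear_sweep(code))  # [(0, 'JMP'), (2, 'MOV')]
```

A flow-oriented disassembler would instead queue the jump's actual target and never decode the junk byte, which is exactly the assumption anti-disassembly tricks then attack from the other direction.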

2.2.2 Dynamic Analysis

Unlike static analysis, dynamic analysis watches the malware as it runs in order to determine its control flow and behavior. One dynamic analysis method is to use a debugger. A debugger is flexible; the analyst can set breakpoints to pause execution at any point in order to construct a control flow for the program, as well as examine memory and CPU registers. Unfortunately, malware writers have developed ways to check for the presence of a debugger, either through an API, by looking for telltale signs of breakpoints (e.g., an INT 0x03 instruction for a software breakpoint, or by using the hardware breakpoint registers to stymie hardware breakpoints), or even by performing timing analysis to determine if the execution is taking too long. If any of these detection methods succeeds, the malware can simply quit or perform other benign activities in order to prevent the analyzer from learning anything useful about its behavior. Dynamic analyzers can also utilize the same techniques that rootkits themselves use to monitor or change control flow. For example, the analyzer could hook into the system call interface using one of the methods described in Section 2.1.3. Then, the analyzer can create a trace of system calls and their parameters in order to understand the behavior of the malware. Host-based tools could also use the Windows API to track processes, registry modifications, and file system operations. In exchange for the rich semantics that such approaches provide, the analyzer sacrifices fidelity, as malware operating at the same privilege level could use evasion techniques to undermine the analysis. In order to operate at a higher privilege level than the malware, analyzers can

run the malware in a sandbox, such as in a virtualization solution (e.g., VMware [75], Xen [5]) or emulation solution (e.g., Qemu [8]). This solution is better logistically, since the analyzer can run the malware in an uncompromised sandbox, and then quickly and easily revert to a previously obtained clean snapshot to be ready for the next analysis. Furthermore, the analyzer can utilize Virtual Machine Introspection (VMI) techniques to understand what is occurring inside the VM, without hooking directly into the kernel structures. This way, the analyzer is operating at both a higher privilege level (at a so-called Ring -1 level), and also looking at lower-level but higher-integrity data. For example, since a system call ID is stored in the EAX register before the SYSENTER instruction is executed, the VMI could create a system call trace by directly examining the EAX register when SYSENTER is executed. Since the analyzer does not rely on host-level interfaces (such as the Windows API) to obtain information, it is not as easily misdirected by malware. It is in malware’s best interest to remain unanalyzable as long as possible, so that it can continue to survive in the wild, perhaps adding additional systems to its botnet or continuing to acquire financial gains. Understanding that a virtualization layer allows analysis and security software to essentially run at a higher privilege level than the operating system, malware may test whether it is being run in an emulated or virtualized environment. If the test is positive, the malware may gracefully exit, or perform some benign operations, in order to hide its true malicious operation. These tests are referred to as red pills, and they can take many forms.
Brute-force, high-level red pills include checking hardware adapters (for example, the VGA adapter in the VMware environment) for a well-known device string [15], or even checking that the disk serial number or user name corresponds to those used by a well-known dynamic analysis emulator such as Anubis [50]. Red pills can also operate at a low level,

such as checking for a well-known bug in common emulators. For example, malware can test whether it is being run in the popular emulator Qemu [8] by executing the following instruction:

or %bh, 0x04(%ebx) [57]

Due to a Qemu bug, the instruction will reference the wrong memory address, so the malware can detect that it is being run in an emulator based on the result of this operation. Even in the presence of perfect emulation, timing or secondary hardware effects (such as TLB flushes on VM exit operations) will still serve as red pills for malware [23]. It is generally agreed that it is impossible to guarantee perfect transparency for virtualization or emulation solutions.

2.3 Windows Concepts

Because the majority of malware attacks are directed at the Windows operating system [53], Dione performs disk instrumentation for the Windows NTFS file system.1 In this section, we discuss relevant Windows-specific concepts, including the particulars of the NTFS file system and the disk optimizations made by the Windows cache manager.

2.3.1 The Windows Registry

The Windows registry is a centralized database for configuration data, storing infor- mation about hardware, device configuration, drivers, user preferences, network and firewall configuration, and program startup information [63]. The hierarchy of the

1Plans to expand Dione to instrument other file systems, such as ext3, are left for future work.

Windows registry can be thought of in terms similar to a file system. At the top level, there are root keys; below each root key are more keys, or subkeys. Thus, keys can be thought of as directories, and each key will have a path, which can be fully qualified from the root key. Just like a directory in a file system, a key will also have a name. Figure 2.2 breaks down the creation of a new key. The key has path HKLM\system\CurrentControlSet\Services and name Beep. The key has no value; for clarity, we also list the type KEY.

Figure 2.2: Breakdown of a Windows Registry key.

Below each key are values. A value is analogous to a file. It has a path, which is comprised of every key above it in the hierarchy. It has a name, and just as a file, the combination of a path and name uniquely identifies the registry value. Finally, just as a file often stores contents, a value stores contents as well. The common terminology is to (confusingly) refer to the data the value is storing as its value, referring to the value structure itself by its name. Keeping this terminology, we refer to the value contents as the value, though we may also refer to the value-name and value-value for clarity. The value may be one of several types, including an integer (DWORD), a string (SZ), or even any arbitrary binary data (BINARY). In Figure 2.3, we show some new values created under the Beep key created in Figure 2.2. The Beep key that was previously created now becomes part of the path, and two values are created below it: Start, which is an integer type and stores the value 0x02, and DisplayName, which

stores the string “BeepService”.

Figure 2.3: Breakdown of two Windows Registry values; each value is associated with a key (in this case, Beep), and has both a name and a value.
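The path/name/value decomposition described above can be made concrete with a small sketch (an illustration only; the backslash-splitting convention is an assumption for this example, and the records mirror the Figure 2.2/2.3 data):

```python
# Illustrative sketch: registry keys and values as (path, name) -> data
# records, mirroring the Beep example from Figures 2.2 and 2.3.

def split_key(full_path: str):
    """Split a fully qualified key into (parent path, key name)."""
    path, _, name = full_path.rpartition("\\")
    return path, name

key = r"HKLM\system\CurrentControlSet\Services\Beep"
print(split_key(key))

# Values created under the Beep key, keyed by (path, value-name):
values = {
    (key, "Start"): 0x02,                 # DWORD value
    (key, "DisplayName"): "BeepService",  # SZ (string) value
}
```

Note how the Beep key, which was the name component when the key was created, becomes part of the path for the values created below it, exactly as described in the text.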

There are six root keys in the registry; each has a long name, but is more commonly referred to by an acronym:

• HKEY_USERS (HKU): Stores configuration data for all users with accounts on the machine

• HKEY_CURRENT_USER (HKCU): Stores configuration data for the user that is currently logged in (and is actually just a link to the subkey in HKU for the logged-in user)

• HKEY_CLASSES_ROOT (HKCR): Stores file association and Component Object Model (COM) object registration information

• HKEY_LOCAL_MACHINE (HKLM): Stores system-related information

• HKEY_PERFORMANCE_DATA (HKPD): Stores performance information

• HKEY_CURRENT_CONFIG (HKCC): Stores the current hardware profile (and is actually just a link to a subkey under the HKLM root key)

Some of the data stored in the registry is populated on system startup, and resides only in memory. Other data is stored on disk, and is loaded into memory when

the system starts up. On disk, the registry is stored in five (extensionless) files in the path WINDOWS\system32\config: default, SAM, SECURITY, software, and system. These files do not correspond directly to the root keys; most of the information stored in the hive files appears in the Windows registry under the HKLM root key. For example, the registry key of Figure 2.2 shows some persistent data stored in a hive file, but when shown in the registry hierarchy the subkeys fall below the HKLM root key. The data stored in the registry hive files has its own file system-like structure; open source tools including regfi [55] can parse the hive files, outputting registry key paths, names, and values.

2.3.2 NTFS File System

Many of the challenges of interpreting NTFS arise from its design goals of being scalable and reliable. Scalability is achieved through multiple levels of indirection. Reliability is accomplished through redundancy and by ordering writes in a systematic way to ensure a consistent result. Unfortunately, from an instrumentation and operation-reconstruction perspective, these writes often occur in the least convenient order. The primary metadata structure of NTFS is the Master File Table, or MFT [14]. The MFT is composed of entries, which are each 1 KB in size. Each file or directory has at least one MFT entry to describe it. The MFT entry is flexible: the first 42 bytes are the MFT entry header and have a defined purpose and format, but the rest of the bytes store only what is needed for the particular file it describes. Among other things, the MFT header contains a sequence number (which is incremented whenever that entry is reused for a new file), a flag indicating whether the entry is currently in use, and a flag indicating whether it describes a file or a directory. In NTFS, everything is a file—even file system administrative metadata. This

means that the MFT itself is a file called $MFT; its contents are the entries of the MFT (therefore, the MFT contains an entry for itself). Figure 2.4 shows a representation of the MFT file, and expands $MFT’s entry (which always resides at index 0 in the MFT). Like any other file, the $MFT file expands and contracts as needed, and if the disk is fragmented, the $MFT can expand into fragmented, non-consecutive clusters anywhere on disk. This is shown in Figure 2.4, whereby the contents of $MFT are stored in two non-contiguous runs of clusters.


Figure 2.4: Representation of the MFT, which is saved in a file called $MFT. The first entry holds the information to describe $MFT itself; the contents of this entry are expanded to show the structure and relevant information of a typical MFT entry.
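The fixed 42-byte entry header described above can be parsed with a few lines of code. The following sketch uses field offsets as published in standard NTFS references (e.g., Carrier's File System Forensic Analysis); it is illustrative, not Dione code, and the sample entry bytes are fabricated:

```python
import struct

def parse_mft_header(entry):
    """Parse the first 42 bytes of an MFT entry (offsets per standard
    NTFS references; an illustrative sketch, not Dione code)."""
    sig = entry[0:4].decode("ascii", errors="replace")   # "FILE" if valid
    seq, links, first_attr, flags = struct.unpack_from("<HHHH", entry, 16)
    base_ref = struct.unpack_from("<Q", entry, 32)[0]
    return {
        "signature": sig,
        "sequence": seq,                   # bumped each time the entry is reused
        "first_attr_offset": first_attr,
        "in_use": bool(flags & 0x01),
        "is_directory": bool(flags & 0x02),
        "base_index": base_ref & 0xFFFFFFFFFFFF,  # low 48 bits: base MFT index
    }

# Fabricated 1 KB entry resembling $MFT's own record (index 0, in-use file).
entry = bytearray(1024)
entry[0:4] = b"FILE"
struct.pack_into("<HHHH", entry, 16, 1, 1, 56, 0x0001)
hdr = parse_mft_header(bytes(entry))
print(hdr["signature"], hdr["in_use"], hdr["is_directory"])  # FILE True False
```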

Everything associated with a file is stored in an attribute. The attribute types are pre-defined by NTFS to serve specific purposes. For example, the $STANDARD_INFORMATION

attribute contains access times and permissions, and the $FILE_NAME attribute contains the file name and the parent directory's MFT index. Even the contents of a file—after all, a file's purpose for existing is to store contents—are stored in an attribute, called the $DATA attribute. The contents of a directory are references to its children; these too are stored in attributes (referred to as $INDEX_ROOT and $INDEX_ALLOCATION). Each attribute consists of the standard attribute header, a type-specific header, and the contents of the attribute. If the contents of an attribute are small, they follow the headers and reside in the MFT entry itself. Such attributes are called resident; a flag in the attribute header indicates whether the attribute contents are resident or not. In Figure 2.4, the contents of the $STANDARD_INFORMATION and $FILE_NAME attributes are resident. If the contents are large, an additional level of indirection is used. In this case, a runlist follows the attribute header. A runlist describes all the disk clusters that actually contain the contents of the attribute, where a run is defined as a starting cluster address and a length in consecutive clusters. (In NTFS terminology, a cluster is the minimum unit of disk allocation, and is generally eight sectors long.) In the example MFT of Figure 2.4, since the contents of the MFT file are very large, $DATA's contents are not resident; its runlist indicates that the contents of $MFT can be found in clusters 104-107 and 220-221. It is easy to see that a small file will occupy only the two sectors of its MFT entry. A large file will occupy the two sectors of its MFT entry, plus the content clusters themselves. Consider, then, the problem of a very large file on a highly fragmented disk: it might take more than the entry's 1024 bytes just to store the content runlist. In this case, NTFS scales with another level of indirection and another

attribute, and multiple MFT entries are allocated (in addition to the base entry) to store all the attributes. Each non-base MFT entry contains the MFT index of the base entry; in the base entry itself, this reference is 0.
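The runlist encoding lends itself to a compact decoder. The sketch below follows the standard NTFS runlist layout (a header byte whose nibbles give the field sizes, then a little-endian length and a signed offset relative to the previous run's start); the byte string is fabricated to match the runs in Figure 2.4:

```python
def decode_runlist(data):
    """Decode an NTFS runlist into (start_cluster, length) pairs.
    Header nibbles give the length-field and offset-field sizes;
    offsets are signed and relative to the previous run's start."""
    runs, pos, prev_start = [], 0, 0
    while pos < len(data) and data[pos] != 0x00:   # 0x00 terminates the list
        header = data[pos]
        len_size, off_size = header & 0x0F, header >> 4
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        offset = int.from_bytes(data[pos:pos + off_size], "little", signed=True)
        pos += off_size
        prev_start += offset
        runs.append((prev_start, length))
    return runs

# The two runs from Figure 2.4: clusters 104-107 and 220-221.
print(decode_runlist(bytes([0x11, 0x04, 0x68, 0x11, 0x02, 0x74, 0x00])))
# [(104, 4), (220, 2)]
```

The signed, relative offsets are what let runs point backward on a fragmented disk.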

2.3.3 Performance Optimizations for Disk Accesses

Disk accesses are expensive in terms of performance. While accesses to disk may take upwards of 5-10 ms, accesses to RAM in a modern computer may take 50-100 ns (with cache speeds even faster) [25]. Therefore, the operating system uses optimizations to minimize unnecessary disk accesses. One such optimization is the page cache, a buffer of disk-backed pages stored in main memory; as a result, frequently-accessed disk clusters are available more quickly. Disk contents are paged into the page cache at the granularity of clusters; this is convenient in modern systems because a cluster is the same size as a page (4 KB). Windows has different policies for reads and writes as they relate to the page cache; these policies are carried out by the cache manager. The multi-threaded cache manager dedicates a thread to intelligent read-ahead, whose goal is for data to already be in faster main memory before it is needed. With intelligent read-ahead, spatial locality is used to prefetch data from disk according to some perceived pattern of read accesses. For example, if the reads are streaming through the disk, the operating system will prefetch the next sequential clusters; if the reads follow a strided pattern, the operating system will prefetch the next clusters along that stride. The size of the data prefetched is double the size of the last access. For write accesses, Windows uses Lazy Writing, courtesy of the cache manager's delay thread [64]. Instead of immediately flushing writes to disk, writes are buffered

and flushed to disk in bursts. When a page is written to, it is marked dirty. Every second, Windows flushes one-eighth of the dirty pages to disk; therefore, it could take as long as 8 seconds for a write to be flushed from RAM to disk. The advantage of this scheme is that it reduces contention on the disk: when multiple writes occur to the same cluster within a short time frame, the number of disk I/O operations is reduced, because instead of flushing to disk every time a change is made, the system performs only one write at the end of the interval. The performance advantage comes at the cost of reliability; while the user is under the impression that a change has been committed to persistent storage, it may actually remain in volatile memory for several seconds more, and would be lost in the event of a hard shutdown.
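The intelligent read-ahead arithmetic described above can be sketched in a few lines. This is a simplified model, not the Windows cache manager's actual algorithm; the function name and interface are invented for illustration:

```python
def plan_readahead(prev_off, last_off, last_size):
    """Simplified stride-based read-ahead model (illustrative only, not a
    Windows API): continue the observed stride between the last two reads,
    and prefetch double the size of the last access."""
    stride = last_off - prev_off       # sequential reads have stride == size
    return last_off + stride, 2 * last_size

# Streaming reads: 4 KB reads at offsets 0 and 4096 -> prefetch 8 KB at 8192.
print(plan_readahead(0, 4096, 4096))   # (8192, 8192)
# Strided reads: every other 4 KB cluster -> the prefetch follows the stride.
print(plan_readahead(0, 8192, 4096))   # (16384, 8192)
```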

2.4 Formal Verification and Model Checking

Formal verification is a technique that has been used to specify and validate sequential circuit designs, communication protocols, and software correctness. Due to its ability to model software behaviors in the face of obfuscated code, it has also recently been used to model malware behaviors and capabilities. Model checking is a property-verification approach that compares a model extracted from a behavioral trace to a given specification [28]. The specification of the property to be detected is represented by a formula φ, which is written using the description language, or logic, of the model checker. Additionally, a model M is extracted from each behavioral trace and represented in the same description language. The model from each behavioral trace is then compared to the specification in order to determine whether the model M satisfies φ. The model checker outputs true or false, indicating whether the property is verified for the system.

The description languages used to describe models and property formulas are based on propositional logic. Verifying a property requires constructing a declarative statement, or proposition, about the system, and then determining whether that proposition is true or false. Propositional logic provides a formal language for describing these declarative statements, and includes the familiar operators not (¬), and (∧), or (∨), and implication (⇒). For example, the proposition "(¬p ∨ r) ⇒ (p ∧ q)" can be translated as "if not p, or r, then p and q". To create a running example of a type of malware behavior that we want to formally specify, let us assume that we have a trace of all x86 instructions executed, gathered using dynamic instrumentation. Our specification formalizes the following behavior:

In the program execution path, at some point in time a register is set to zero, and after that, this same register is eventually pushed on the stack before any other modification occurs to that register.

By looking at key words of that statement, we can determine the ideal way to represent it. The word and indicates that we will need propositional operators, the phrases at some point and after that imply a preferred temporal ordering, and the phrase this same register implies that we care not just about the operations, but also about the inputs to the operations. Using pure propositional logic, we cannot be very specific in formulating this specification. For simplicity, we will focus on a single register, eax. We can only specify that the trace should contain both an instruction that sets register eax to 0 (mov(eax,0)) and an instruction that pushes register eax onto the stack (push(eax)). Each combination of instruction opcode and parameter(s) forms a single propositional

atom; for example, mov(eax,0) and mov(eax,1) are as distinct as the propositional atoms p and q above. Therefore, our propositional statement is specified as:

φ = mov(eax, 0) ∧ push(eax)    (2.1)

This statement evaluates to true only if both of the instruction opcode/parameter events appear in the trace. Note that it does not specify an ordering between the instructions, merely that both must appear in the trace for the statement to evaluate to true (a binary, bag-of-words-like specification).
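A specification like Equation 2.1 can be checked mechanically by treating each opcode/parameter combination as an atom. The following sketch (the trace encoding and function name are our own, not part of any model-checking tool) illustrates the order-insensitive check:

```python
# A trace is a list of (opcode, operands) events; each distinct
# opcode/operand combination is a single propositional atom.
def holds_eq_2_1(trace):
    """phi = mov(eax, 0) AND push(eax): both atoms must appear; order ignored."""
    atoms = set(trace)
    return ("mov", ("eax", 0)) in atoms and ("push", ("eax",)) in atoms

trace = [("push", ("ebx",)), ("push", ("eax",)), ("mov", ("eax", 0))]
print(holds_eq_2_1(trace))  # True, even though the push precedes the mov
```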

2.4.1 Predicate Logic

Predicate logic extends propositional logic, satisfying the need for a richer language. It includes quantifiers such as there exists (∃) and for all (∀). It also allows the use of variables to generalize a statement, acting as placeholders for concrete values. In order to use predicate logic as a formal language, we define two types of "objects" that can appear in a predicate logic statement: terms and formulas. A term is an object; it can refer to a variable or a function. Consider a formula that describes file properties. We can refer generically to our file as the variable f, and we can describe certain properties of our file with functions of f, such as p(f), the path of f, and n(f), the name of f. A term can be recursive: if x is a variable (a term), and f(x) is a function of that variable (also a term), then g(f(x)), a function of the function, is also a term. Conversely, a formula is a predicate—it is a statement that resolves to true or false. For example, we can use predicates to describe whether a certain type of

operation occurred on a certain file: C(f) is true if file f was created. Formulas can be connected using propositional symbols, such as ¬, ∧, ∨, and ⇒.

For example, if φ1 is a formula and φ2 is a formula, then φ1 ∨ φ2 is also a formula. Formulas can also be combined with variables using the quantifier symbols ∃ and ∀. For example, if φ is a formula and f is a variable, then ∃f φ is also a formula, and reads as there exists some f for which the formula φ evaluates to true. Given these two types of objects, we can define a vocabulary for predicate logic (as a formal language) as having three sets: a set of predicate (or formula) symbols P, a set of function symbols F, and a set of constant symbols C (since a constant is a function without arguments, C can also be treated as part of the function set F). For the instruction-trace example, our vocabulary of predicates consists of P = {mov(x, y), push(x)}, where x and y are variables (terms) over the set of functions F. Using predicates allows us to write specifications that differentiate between the type of operation (for example, the high-level behavior, or the instruction opcode) and the parameters of that operation. For example, in Equation 2.1, we simplified the original statement, which referred to "some register", to refer only to register eax. If we were to keep the original statement, the specification would be:

φ = (mov(eax, 0) ∧ push(eax)) ∨ (mov(ebx, 0) ∧ push(ebx)) ∨ (mov(ecx, 0) ∧ push(ecx)) ∨ …    (2.2)

In this case, it is more succinct to generalize the statement using predicate logic, creating a variable to represent the register, which takes on a finite number of values. We

combine the variable with the quantifier there exists. Equation 2.2 can be rewritten in predicate logic as:

φ = ∃r (mov(r, 0) ∧ push(r))    (2.3)

This translates to: In the program execution path, some register is set to zero, and this same register is pushed on the stack. However, this statement still says nothing about the ordering between the instructions.
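Checking Equation 2.3 requires only quantifying the propositional check over a finite register set. A sketch (the register list and trace encoding are illustrative):

```python
REGISTERS = ("eax", "ebx", "ecx", "edx")  # illustrative finite domain for r

def holds_eq_2_3(trace):
    """phi = exists r . mov(r, 0) AND push(r) -- still no ordering constraint."""
    atoms = set(trace)
    return any(("mov", (r, 0)) in atoms and ("push", (r,)) in atoms
               for r in REGISTERS)

print(holds_eq_2_3([("mov", ("ebx", 0)), ("push", ("ebx",))]))  # True
print(holds_eq_2_3([("mov", ("ebx", 0)), ("push", ("ecx",))]))  # False
```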

2.4.2 Temporal Logic

Model checking is based on temporal logic; that is, the system is represented as a sequence of states. The formula representing the model is not always true for the model; rather, the formula is true only at some point in time, when the system has moved through a correct series of states. There are two ways to think of time in a temporal formula. The first is as branching: time is represented as a tree, with an initial state as a single node at the root and possible future paths branching out from that state. Branching time is useful when there are many possible paths but not all will occur, such as in a trace consisting of the statically disassembled instructions of a program. There are multiple possible paths of execution, though not all are taken; for example, at a conditional statement, two different branches are possible, depending on whether the if or the else branch is followed. The second way to think of time is as linear: time is a set of paths, where each path is a sequence of states. For example, a trace of instructions intercepted during the course of execution of a program would be represented in linear time, as each event occurred sequentially in the path of execution.

Due to the temporal nature of model checkers, the language used to specify a behavior or capability uses temporal operators, which define how the different states connect to each other in time. As our dynamically-obtained traces of disk events occur in linear time, we focus here on the logic specification language Linear Temporal Logic, or LTL [28]. LTL contains the expected propositional operators, but also includes the temporal operators X, F, G, U, and R. Respectively, these stand for neXt state, some Future state, Globally in all future states, Until, and Release. More formally, for formulas p and q along a linear path π:

• Xp is true on a path π if p holds in the next state, π1

• Fp is true if p holds at any point in the future on path π

• Gp is true if p holds globally throughout the future on path π

• pUq is true if p holds on the path π until q holds

• pRq is true if q holds along path π up to and including the point where p first holds, with the requirement on q released thereafter; if p never holds, q must hold forever.

Using LTL, we can provide a more specific specification by forcing a temporal ordering between the instructions. The following LTL formula translates to "in the program execution path, at some point, eax is set to zero, and immediately after that, eax is pushed on the stack."

φ = F(mov(eax, 0) ∧ X push(eax))    (2.4)

Alternatively, the following LTL specification allows other instructions to appear between the mov and push instructions, specifying: In the program execution

path, at some point, eax is set to zero, and after that, eax is eventually pushed on the stack.

φ = F(mov(eax, 0) ∧ F push(eax))    (2.5)

Again, we can see the limit of LTL when we want to generalize the statement over all registers, and the statement becomes unnecessarily complex:

φ = F(mov(eax, 0) ∧ F push(eax))
  ∨ F(mov(ebx, 0) ∧ F push(ebx))    (2.6)
  ∨ F(mov(ecx, 0) ∧ F push(ecx)) ∨ …
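The semantics of F and X over a finite trace can be made concrete with a tiny evaluator, where a "state" is a suffix of the trace. This is an illustrative sketch under finite-trace semantics, not a real model checker such as SPIN:

```python
# Minimal LTL combinators over finite traces; each formula is a function
# from a trace suffix to a boolean (all names are our own invention).
def atom(op, args):
    return lambda tr: len(tr) > 0 and tr[0] == (op, args)

def AND(p, q):
    return lambda tr: p(tr) and q(tr)

def X(p):   # neXt state
    return lambda tr: len(tr) > 1 and p(tr[1:])

def F(p):   # some Future state (including the current one)
    return lambda tr: any(p(tr[i:]) for i in range(len(tr)))

# Equation 2.4: push must come immediately after the mov.
phi_2_4 = F(AND(atom("mov", ("eax", 0)), X(atom("push", ("eax",)))))
# Equation 2.5: push may come any time after the mov.
phi_2_5 = F(AND(atom("mov", ("eax", 0)), F(atom("push", ("eax",)))))

trace = [("mov", ("eax", 0)), ("nop", ()), ("push", ("eax",))]
print(phi_2_4(trace), phi_2_5(trace))  # False True
```

The intervening nop satisfies Equation 2.5 but violates Equation 2.4, illustrating the difference between X and F.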

2.4.3 Linear Temporal Predicate Logic

Formulas in Linear Temporal Logic are composed only of propositions, that is, formulas connected by propositional and temporal operators, while predicate logic lacks a notion of time. Therefore, in order to write robust yet succinct specifications, it makes sense to combine the two, forming Linear Temporal Predicate Logic (LTPL). As with predicate logic, LTPL allows us to differentiate between operations and parameters, while maintaining the temporal operators that specify the ordering between operations. Using LTPL, we can write:

φ = ∃r F(mov(r, 0) ∧ F push(r))    (2.7)

This translates to: There exists some register r that is assigned the value 0, and at some point in the future, that same register r is pushed onto the stack.

Using this powerful combination, we can even succinctly specify the original statement, using multiple variables and multiple temporal operators.

φ = ∃r F(mov(r, 0) ∧ X((¬∃t mov(r, t)) U push(r)))    (2.8)

Translating to: In the program execution path, at some point, some register is set to zero, and after that, this same register is eventually pushed on the stack before any other modification occurs to that register.
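Equation 2.8 can likewise be evaluated over a finite trace. The sketch below redefines the small set of combinators so it stands alone; the register set, trace encoding, and helper names are all illustrative:

```python
# Finite-trace evaluation of Equation 2.8 (all names are illustrative).
REGISTERS = ("eax", "ebx", "ecx", "edx")

def atom(op, args):
    return lambda tr: len(tr) > 0 and tr[0] == (op, args)

def AND(p, q): return lambda tr: p(tr) and q(tr)
def NOT(p):    return lambda tr: not p(tr)
def X(p):      return lambda tr: len(tr) > 1 and p(tr[1:])
def F(p):      return lambda tr: any(p(tr[i:]) for i in range(len(tr)))

def U(p, q):   # p Until q: q eventually holds; p holds at every state before
    def check(tr):
        for i in range(len(tr)):
            if q(tr[i:]):
                return True
            if not p(tr[i:]):
                return False
        return False
    return check

def writes_to(r):  # exists t . mov(r, t): any mov whose destination is r
    return lambda tr: len(tr) > 0 and tr[0][0] == "mov" and tr[0][1][0] == r

def holds_eq_2_8(trace):
    """exists r . F(mov(r,0) AND X((NOT exists t. mov(r,t)) U push(r)))"""
    return any(
        F(AND(atom("mov", (r, 0)),
              X(U(NOT(writes_to(r)), atom("push", (r,))))))(trace)
        for r in REGISTERS)

ok  = [("mov", ("ebx", 0)), ("nop", ()), ("push", ("ebx",))]
bad = [("mov", ("ebx", 0)), ("mov", ("ebx", 1)), ("push", ("ebx",))]
print(holds_eq_2_8(ok), holds_eq_2_8(bad))  # True False
```

The second trace fails because ebx is modified again before the push, exactly the case the Until clause rules out.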

2.5 Summary

In this chapter, we outlined the background information relevant to this thesis. In discussing the types of malware and the techniques employed by malware to hide from security products and analyzers, we motivated the need for an analyzer that provides a high-integrity behavioral trace. We discussed the techniques used by malware analyzers to understand malware behavior, including static and dynamic techniques. To prepare for our description of Dione, our file system analysis framework, we introduced relevant Windows concepts, including the NTFS file system and the optimizations used by Windows to increase disk access performance. Finally, in advance of our technique to detect malware persistence capabilities, we introduced model checking and described the logic languages used to generate specifications of malware behaviors.

Chapter 3

Related Work

3.1 Malware Analysis and Instrumentation

The ability to instrument disk accesses and file system operations is useful in many security fields, including intrusion detection and prevention and malware analysis. However, Dione is the first disk analysis infrastructure to provide live, up-to-date instrumentation for Windows NTFS file systems. Research on Intrusion Detection Systems (IDSs) has frequently included techniques to monitor disk accesses or modifications to the file system [3, 31, 33, 35, 58, 59, 73, 77, 81]. Kim and Spafford demonstrated that malware intrusions could be detected by monitoring Unix systems for unauthorized modifications to the file system with Tripwire [35]. Tripwire performed file-level integrity checks and compared the results to a reference database. While it worked quite well for discovering modifications to files, it did not discover changes that were reverted before the utility ran again. Furthermore, it inherently produced many false positives. Stolfo et al. also developed a host-based anomaly detection system which monitored changes to the

file system [73]. Their File Wrapper Anomaly Detection System (FWRAP) consisted of a host-based sensor which wrapped around a modified file system to extract information about each file access. Their anomaly detection algorithm then determined the probability that a file access was abnormal and generated an alert based on the score. Both of these host-based solutions require a trusted OS, whereas Dione does not require that the host be uncompromised. On the other end of the spectrum from host-based solutions, Pennington et al. implemented a rule-based IDS that resided on an NFS server [59]. The authors enumerated the specific ways in which malware modifies data on disk, such as modifications to system administration files, log scrubbing, and timestamp reversal. Their IDS was effective at catching rootkits that modified persistent data on disk. It was, however, implemented for Linux (which has a far lower share of malware intrusions), and it resided on a separate storage processor, and thus could not be easily utilized for a desktop computer. Dione addresses both of these issues: it monitors Windows systems with NTFS file systems, and it can monitor either a virtual machine or any desktop with an interposing hardware sensor. While host-based IDSs are problematic because a privileged rootkit can override or misdirect malware detectors, and network-based IDSs lack visibility into host events, virtualization-based IDSs offer both high visibility and isolation from compromised operating systems. Garfinkel and Rosenblum introduced the first IDS to leverage virtualization technology, thus revolutionizing malware detection [24]. Their IDS, Livewire, utilized Virtual Machine Introspection (VMI) techniques, such as the monitoring of memory and register contents and events such as interrupts, memory accesses, and device state changes. However, it did not incorporate disk accesses, thus missing out on additional system information.

Payne et al. proposed requirements that should guide any virtual machine monitoring infrastructure, and implemented XenAccess to incorporate VMI capabilities [58]. We observed their requirements in our implementation of Dione, as they provide an excellent guide for the proper design of an infrastructure for monitoring VMs. The disk-monitoring capabilities of their proof-of-concept implementation, however, can only be used for paravirtualized guest OSes, which is a simplification of the problem of interpreting a complex file system like NTFS for fully-virtualized Windows guests. Azmandian et al. used low-level architectural events and disk and network accesses in their machine learning-based VMI-IDS [3]. While their instrumentation platform captured more types of events in addition to disk accesses, providing a rich set of features for their IDS, their disk instrumentation lacked the higher-level file system semantics provided by Dione. The work of Zhang et al. is very similar to ours; they presented a VMI-IDS that monitored the disk accesses of the virtual machine under analysis [81]. Their IDS creates a mapping between files and their sectors and monitors accesses to these sectors. Their system allows for the creation of rules that watch for the types of accesses, discussed in [59], that might indicate an intrusion. However, their monitoring framework is dependent upon virtualization technology, and it only runs for FAT32 file systems, a significantly simpler challenge. Jiang et al. also implemented a VMI-IDS, called VMwatcher, which incorporates disk, memory, and system events [31]. However, they too cannot analyze the ubiquitous NTFS file system; instead, their Windows VMs must use the Linux ext2/ext3 file systems. The VMI-IDS of Joshi et al. detects intrusions before the vulnerability is disclosed [33]. Unfortunately, their solution to inspecting disk accesses requires

invoking code in the address space of an application process within the guest operating system itself; to undo the effects of this intrusive action, their heavyweight solution must checkpoint and roll back. With Ghostbuster, Wang et al. [77] present a cross-view, diff-based approach to detecting rootkits. By enumerating files, configuration settings, and processes at both a high level (Windows APIs) and a low level (examining the data structures themselves), Ghostbuster can determine whether a stealthy rootkit is hiding evidence of infection. Their host-based solution has the advantage of being able to check for more than just file system operations; however, it does not provide a ground truth. While their approach will catch many file-evasion techniques (including many described in Section 2.1.3), it can still be evaded by particularly stealthy malware that interposes on the calls used to obtain the raw metadata from which the low-level view is constructed. Furthermore, it only detects file hiding, as opposed to other file system operations, and it performs detection with dedicated snapshot views of the file system. Dione continually updates its view as metadata is written to disk, and thus maintains an up-to-date view. Other researchers have acknowledged the role of disk accesses in malware intrusions by providing rootkit prevention solutions [11, 21]. With Rootkit-Resistant Disks, Butler et al. provide a means to block accesses to directories containing sensitive operating system configuration files and executables [11]. Their hardware-based solution requires that all sensitive directories reside on a separate partition from the rest of the file system, and they physically block access to that partition unless a secure token is present. Chubachi et al. also provide a mechanism to block accesses to disk, and they can operate at a file-level granularity [21].
Unfortunately, they collect their mappings of files and sectors before the VM boots, and do not provide a live updating

capability as files are created, deleted, or changed in size. As a result, their sector watch list becomes inaccurate as the VM executes. Sundararaman et al. protect the disk with a different approach: they developed a new disk format which provides data versioning for roll-back in the event of an intrusion [74]. They selectively version all metadata and user-specified content, allowing users to have block-based protection of their disks through high-level semantics. However, it requires file system modification, and thus is only applicable to open-source file systems. Previous work has also addressed the need for malware analyzers, with different solutions operating at different levels of semantics and isolation from malware. Several solutions perform malware analysis in-host [23, 62, 78]. DiskMon, part of the Sysinternals tools for Windows, is an in-host solution which uses kernel event tracing to track file system operations [62]. Another solution for dynamically analyzing malware samples (including their file system operations) is CWSandbox, which uses hooks in the Windows API to obtain the information it needs and to hide from malware [78]. Since both solutions reside in-host, malware could detect their presence (for example, by checking for hooks in the Windows API) and attempt to deceive the analyzers by providing its own in-host hooks. Many dynamic analyzers instrument the behavior of malware by tracing system calls [23, 38, 41, 43, 68]. These analyzers can use an emulation or virtualization layer to achieve isolation from the malware, and perform low-level semantic reconstruction by introspecting on registers and the VM's memory. King and Chen's BackTracker uses a virtualized environment to gather process and file system-related events that led to a system compromise of a Linux guest [38]. Despite BackTracker's residence in the virtualization layer, the authors concede that

malware can hide from it, preventing live analysis. Sitaraman and Venkatesan extended the functionality of BackTracker, providing improvements that reduce the size of the dependency graph BackTracker generates [68]. However, their event logger is compiled into the kernel or implemented as a loadable kernel module, and as such is not isolated from the malware. The work of Krishnan et al. creates a whole-system analysis by monitoring disk accesses, physical memory, and system calls, and reconstructing their intertwined relationships to provide a complete post-mortem forensic analysis [41]. Their disk monitoring infrastructure logs accesses to disk blocks and periodically performs a scan of the disk to connect blocks to files. The result is that their mappings are only accurate at the time of the scan, and do not reflect file system changes that may occur between scans. Dione, on the other hand, uses live updating to maintain a perpetually up-to-date view of the file system for accurate file system analysis. Kruegel et al. developed TTAnalyze (later renamed Anubis) to profile malware behavior, including file system activities, of Windows systems emulated in Qemu [43]. This approach has many advantages. Their instrumentation provides a rich opportunity to track all system calls and their parameters, as well as all Windows API functions and their parameters. This allows them to have a full-system, on-the-fly reconstruction. They can also identify the running process, limiting the trace to only the functions called by the malware sample. Ether is another dynamic malware analyzer which isolates itself from malware through the virtualization layer [23]. Ether monitors malware at different levels of granularity: a fine-grained (instruction-level) granularity, or a coarse-grained (system-call-level) granularity. The goal of Ether is complete transparency, so that the malware cannot detect that it is being analyzed.
Unfortunately, the performance cost of Ether

is steep (approximately 3000 times slowdown for single-step instrumentation [79]). Chow et al. introduced the idea of a replay approach with Aftersight [16]. Though it was built with bug detection in mind, rather than malware analysis, it provided an interesting foundation for future work. Yan et al. expanded on the heterogeneous replay approach; their V2E records the malware's behavior in a transparent virtual machine, then replays its behavior in a software-based dynamic binary analysis platform [79]. V2E provides both transparency and strong instrumentation support, without the high overhead seen in Ether. The authors were able to demonstrate that V2E could defeat common anti-emulation attacks. While Dione only provides file system-level instrumentation traces, and many of these analyzers provide multi-faceted analysis information, Dione provides some advantages not found in other analyzers. First, those analyzers are inextricably tied to the platforms they were developed for (for example, Qemu and Xen). Therefore, they cannot be ported to other environments in order to provide side-by-side comparisons of environment-sensitive malware. Dione, by contrast, can be ported to a variety of virtualization, emulation, and bare-hardware platforms, and will produce comparable output reports. Additionally, other analyzers cannot provide the same level of ground truth that Dione can. The in-host solutions can be misled by malware utilizing lower-level call table hooking or filter drivers; analysis could also be bypassed if the analyzer provides its own hooks and the malware restores the call tables to eliminate the hooks. Even analyzers which reside in the virtualization or emulation layer face the theoretical chance of being intentionally misled by malware. For example, consider an analyzer that identifies library calls by comparing the executing instruction pointer with exported library function addresses to determine which library function

was called. Malware could hook a call table (such as the SSDT, as described in Section 2.1.3), diverting system calls to a different location in memory which does not correspond to the exported library addresses. As a result, the system call would not be recorded unless the hook eventually returned control to the original library function. Analyzers that rely on other tells for the system call ID (for example, recording the value in EAX at each SYSENTER invocation) might even be misled by theoretical malware that encodes and decodes the system dispatch ID before and after the SYSENTER transition.¹ Since Dione intercepts raw disk accesses, and relies only on state changes and the actual intercepted disk sectors and contents, it cannot be misled by kernel-level malware. Finally, many other analyzers can only obtain information from intercepted system calls and possibly their parameters (Anubis can, by contrast, obtain more information both through the additional Windows API and by injecting new function calls into the executing instruction stream). Dione intercepts all the raw metadata of a file, and can therefore determine every property relating to that file, whether or not it can be read or modified by a Windows API or system call.

3.2 Characterizing Malware Behavior

Though capability labeling for malware behaviors is a more recent discipline, it draws upon work from several related areas. Specifically, previous work has utilized behavioral traces and profiling in order to label malware by its family or variant (malware classification and clustering) or by its maliciousness (intrusion detection). Since the templates or specifications of malicious behavior used in these areas tend to identify

¹ It is worth noting that such deceptive system call obfuscation would be performed with the sole purpose of thwarting VMI- or emulator-based analyzers; it is therefore far more likely that malware would simply detect the analyzer and exit.

specific malware capabilities and techniques, research in this area is directly related to labeling based on the capabilities themselves. Behavioral traces can be generated either statically or dynamically. With these traces, researchers can characterize malware using one of several techniques: machine learning, informal modeling, and formal verification.

3.2.1 Characterizing Malware with Machine Learning

Malware clustering and classification naturally follows from the generation of malware analysis traces. Lee et al. conducted early research using behavioral, rather than signature-based, clustering and classification of malware samples [48]. However, they used simple system call traces to describe malware behavior, and more recent research has shown that better results can be achieved with a high-level behavior-based approach than with system call traces [4, 6]. Bailey et al. also used a malware's behavior in order to create a fingerprint [4]. Rather than system calls, they focus on higher-level descriptions of what the malware is doing on the system. They showed that existing antivirus solutions for characterizing malware are inconsistent across products, incomplete across malware strains, and do not contain concise semantics. With a classification technique that focuses on system state changes, instead of low-level system calls or binary signatures, they could do a better job classifying malware (including malware that had not been seen before, and therefore did not have a signature) than existing antivirus products. Similarly, Bayer et al. use a behavioral profile, rather than just a system call trace, to cluster malware [6]. They introduce taint analysis to Anubis to track dependencies between both native API and Windows API functions, and also track control flow dependencies and network traffic, in order to generate the behavioral profile. This

work achieved better clustering than Bailey et al., and with their Locality-Sensitive Hashing-based clustering algorithm they can scale to real-world data sets. They also achieved significantly better results than a purely system call-based approach, which they attribute to too much noise in the system call traces. Similarly, Jang et al. present BitShred, a clustering technique for malware triage [30]. They use feature hashing to reduce the high-dimensional feature space drawn from behavioral profiles, and use the Jaccard and BitVector Jaccard distances to measure similarity. Rather than focusing on clustering of known malware variants, Rieck et al. developed a classification scheme that can determine whether a new malware instance belongs to a known malware family or is a new malware strain [60]. Behavior traces are obtained from the in-host CWSandbox [78] dynamic analysis platform. They use the Support Vector Machine (SVM) model to classify new behavior; with two variants, they can alternately perform multi-class classification and predict and detect novel malware behavior. Additionally, for each malware family they obtain a feature ranking in order to gain additional insight into its typical behavior patterns. Rieck et al. later argued that batch clustering of malware samples can be extended to include the iterative classification of new samples [61]. However, their approach is closer to system call analysis, as they capture system call traces, encode them into their Malware Instruction Set (MIST), and create behavioral patterns through a sliding window in the instruction stream. They bridge clustering and classification with reports that demonstrate typical behavior for homogeneous groups. These prototypes maintain intermediate results, such as cluster assignments from previous iterations of the algorithm, for use in incrementally analyzing new samples. Though this work had some success in identifying malicious code behavior, it

was dependent upon static analysis using disassembly, which is known to be easily defeated with malicious hand-tuned assembly code [66]. Interestingly, there has been enough work in malware classification and clustering that additional research seeks to verify, analyze, and even constructively criticize the research and evaluation of previous work. Since classification and clustering algorithms require some sort of distance metric to determine how similar two pieces of malware are, Apel et al. devoted research to evaluating different types of distance metrics [2], and found that the Manhattan distance satisfied their criteria the best. Li et al. attempt to shed light on the inherent problems in using machine learning for classification by constructively criticizing the evaluation of previous work [49], including the state-of-the-art by Bayer et al. [6]. They conclude that the problem arises from the lack of "ground truth": using malware samples that can be identified by anti-virus scanners will bias the corpus in favor of easy-to-cluster instances. While the problem is still not solved, it is useful to consider effects such as these when evaluating clustering algorithms.
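To make the role of such distance metrics concrete, the following sketch (illustrative only; the behavioral feature names are invented) computes the Jaccard distance used in BitShred-style similarity measurement and the Manhattan distance evaluated by Apel et al. over simple behavioral profiles:

```python
# Illustrative sketch of two distance metrics used to compare
# behavioral profiles. Feature names below are invented examples.

def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| over sets of behavioral features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def manhattan_distance(a, b):
    """Sum of absolute differences over feature-count vectors (dicts)."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)

sample1 = {"creates_service", "writes_system32", "reads_registry_run_key"}
sample2 = {"creates_service", "writes_system32", "opens_network_socket"}
print(jaccard_distance(sample1, sample2))  # 0.5 (2 shared of 4 total features)
```

A clustering algorithm would apply such a metric pairwise across a corpus; the cited works differ chiefly in which features they extract and how they scale the pairwise comparison.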

3.2.2 Characterizing Malware Using Modeling

Instead of using machine learning techniques on behavioral profiles, complementary research aims to use formal [7, 9, 20, 19, 36, 37, 67, 70, 69, 71] and informal [39, 44, 72] verification techniques to label or classify malware samples. Kruegel et al. aimed to determine whether a Loadable Kernel Module (LKM) in Linux resembled that of a rootkit when loaded into kernel space [44]. They created an abstract model of program behavior using static analysis, generating a control flow graph of preprocessed kernel module code, and compared it to an informally-defined specification of rootkit behavior. Kirda et al. also generated informal specifications

of spyware behavior using a combined static and dynamic analysis approach [39] and a customized browser instrumentation infrastructure. In their work on AccessMiner, Lanzi et al. analyze and model benign program behavior to better understand malicious behavior [47]. After first attempting a machine learning-based approach and demonstrating that an n-gram sliding window of system calls does not produce sufficiently accurate results, they model benign behavior with an access activity model, whereby benign activity is expressed in terms of access tokens to system resources (e.g., files and registry keys). Researchers have observed that a common behavior employed by malicious programs relates to the way sensitive data is treated, and have developed informal policies to define this behavior [72, 80]. Stinson et al. informally define a malicious bot behavior as one in which data is received from the network and subsequently used as an input parameter to a system call; that is, an untrusted source is fed into a trusted sink [72]. They use system call interposition and tainting to achieve their dynamic analysis. With Panorama, a whole-system information flow tracking system, Yin et al. also used taint propagation to detect malicious behavior. Panorama can detect when a malware sample accesses sensitive data that it should not have access to, and can track what it does with that sensitive data [80]. Many researchers have chosen static analysis to obtain the traces that will be used in their behavioral analysis. Bergeron et al. were among the first researchers to utilize formal verification techniques to detect malicious code patterns in malware [9]. The authors used static analysis to generate a control flow graph of security-critical API calls, and then used model checking to verify these graphs against a malicious code specification. Likewise, Singh et al. identify fundamental functionality that sufficiently captures the malicious properties of a virus, which they call organs, including

survey, concealment, propagation, injection, and self-identification [67]. They use Linear Temporal Logic (LTL) formulas to encode malicious behaviors. However, these early works do not provide comprehensive evaluations of their methodologies on sufficient numbers of real-world malware samples. Kinder et al. also sought to describe and identify malware based on behavioral signatures using model checking [36]. In order to succinctly and comprehensively describe these behaviors, they developed and demonstrated the use of a new temporal logic, Computation Tree Predicate Logic (CTPL), on statically-generated instruction traces. They demonstrated that the same specification of malicious behaviors could be used to identify several different real-world worms [37]. CTPL was extended to express stack operations by Song and Touili [70]. The resulting logic was called SCTPL and allowed them to model a program using a Pushdown System with predicates over the stack. They further expanded this work to produce SCTPL formulas that consider the values, rather than the names, of registry and memory locations [69], and they also improve the efficiency of the detection algorithm. Finally, they abandon the branching logic variants of CTL for a Linear Temporal Logic in [71]. In doing so, they describe LTPL, a linear temporal logic with predicates, and then extend it with the stack semantics of their Pushdown System, yielding SLTPL. Similarly, Beaucamps et al. utilized a variation of LTL with predicates [7]. Their two contributions were to abstract static traces into high-level behaviors, and then to use model checking to compare them against a malware specification expressed in First-Order Linear Temporal Logic (FOLTL). While these papers utilized model checking in a novel way, they were more concerned with intrusion detection (labeling a sample as malicious or benign) rather than with generating specifications and detecting capabilities in samples (both malicious and benign), as ours does.
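To illustrate the flavor of temporal-logic behavior checking discussed above, the following is a minimal sketch (not any of the cited systems) of a finite-trace checker for a tiny LTL fragment: atomic propositions, conjunction, and "eventually" (F). The event names are invented for illustration:

```python
# Minimal finite-trace checker for a tiny LTL fragment (atoms, AND,
# and "eventually" F), sketching how a behavioral trace can be checked
# against a temporal specification. Event names are invented.

def holds(formula, trace, i=0):
    op = formula[0]
    if op == "atom":
        return i < len(trace) and formula[1] in trace[i]
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "F":  # eventually: the subformula holds at some j >= i
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(op)

# Specification: eventually a service is installed, and after that the
# service binary is eventually loaded.
spec = ("F", ("and", ("atom", "install_service"),
              ("F", ("atom", "load_service_binary"))))

trace = [{"create_file"}, {"install_service"}, {"reboot"},
         {"load_service_binary"}]
print(holds(spec, trace))  # True
```

The cited logics (CTPL, SCTPL, LTPL/SLTPL, FOLTL) extend this basic idea with predicates, quantification, and stack semantics, but the core check, matching a trace model against a temporal formula, has this shape.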

Christodorescu et al. provided a richer specification of malware behavior [20]. They developed formal templates of malicious behavior consisting of instruction sequences with variables and symbolic constants; a match for malicious behavior is detected when a malware sample's instruction sequence matches a template. Since many of these formal and informal modeling methodologies require hand-written malware specifications, Christodorescu et al. developed an automated system to generate malware specifications, or malspecs [19]. A malspec is generated from the system call-based dependence graphs of malicious programs, and is represented as a dependence graph. Likewise, in their preventative system, Kolbitsch et al. represent malicious behavior in a dependency graph of relevant system calls [40]. Then, their on-line scanner monitors system call invocations and parameters, and determines on-the-fly whether the program matches one of the behavior graphs. Much of the previous work in malware detection demonstrated high detection rates, though without a common testing methodology, and lacking large datasets on which to test the algorithms (in some cases, models were tested with only a dozen or two benign and malicious samples). Recognizing this, Canali et al. performed a detailed study of previously-researched malware detectors [12]. They explored the design space of hundreds of models, and tested the models on hundreds of thousands of samples. They demonstrated that analytical reasoning alone does not demonstrate utility: it must be supplemented by a rigorous evaluation with a sufficiently large dataset. Finally, the research of Martignoni et al. into capability labeling most closely aligns with the work described in this dissertation [52]. They create high-level behavior specifications from domain knowledge. From malware samples, they generate behavior graphs, and use a behavior matching algorithm to determine whether the

sample exhibits each high-level behavior. However, they target different types of capabilities (generally, network-related capabilities and keylogging), and they represent their specifications using behavior graphs (and/or graphs), instead of the more succinct LTPL. Additionally, they tested their approach on a mere 25 samples (11 benign); given the variety of ways in which malware can manifest a certain behavior, this does not sufficiently demonstrate the effectiveness of their solution.

Chapter 4

Dione: A Disk Instrumentation Framework

Dione is a flexible, policy-based disk I/O monitoring and analysis infrastructure [51]. Dione maintains a view of the file system under analysis. A disk sensor intercepts all accesses from the System-Under-Analysis (SUA) to its disk, and passes that low-level information to Dione. The toolkit then reconstructs the operation, updates its view of the file system (if necessary), and passes a high-level summary of the disk access to an analysis engine as specified by the user-defined policies. The rest of this chapter discusses Dione in more detail.

4.1 Threat Model and Assumptions

Our threat model does not require that the SUA is trusted or uncompromised. The SUA can be compromised by malware with administrator-level privileges that can hide its presence from host-level detection mechanisms. The attacker may access, modify, create, or delete files anywhere in the file system.

However, we assume that there is some disk-level artifact of the malware infection. This means that the malware needs to either download files to the hard disk, create new files, or modify existing files. We can still observe these operations even if a kernel-level rootkit has attempted to hide these operations and artifacts from a host detection mechanism. We assume that there is a sensor that interposes between the SUA and its hard disk and provides disk access information. This sensor can be a software sensor (e.g., a virtualization layer) or a hardware sensor. We assume that both the sensor providing the disk access information and the Analysis Machine (that is, the machine which runs Dione) are trusted. Therefore, in a virtualization-based solution, neither the hypervisor nor the virtual domain which serves as the Analysis Machine can have been compromised. In a physical solution, the separate machine running Dione cannot have been compromised.

4.2 Dione Operation

There are four discrete components to Dione: a sensor, a processing engine, an analysis engine, and the Dione Manager. The Dione architecture is shown in Figure 4.1. The Sensor interposes between the SUA and its disk. It intercepts each disk access, and summarizes the access in terms of a Logical Block Address (LBA, or simply sector), a sector count, the operation (read/write), and the actual contents of the disk access. The sensor type is flexible. It can be a physical sensor, which interposes between a physical SUA and the analysis machine, or a virtual sensor, such as a hypervisor, which intercepts disk I/O of a virtual SUA. The Processing Engine is a daemon on the analysis machine. The multithreaded

[Figure 4.1: block diagram showing the System Under Analysis (SUA) and its disk, the Disk Sensor, and the Analysis Machine running the Processing Engine (Dione daemon) with its Disk Access Classification, Live Updating, and Policy Engine stages, alongside the Analysis Engine and the Dione Manager.]

Figure 4.1: High-level overview of Dione Architecture.

Dione daemon interacts with both the user and the sensor. It receives disk access information from the sensor, and performs three steps. The first step is Disk Access Classification: for each sector, it determines which file the sector belongs to (if known) and whether the access was to file content or metadata. In the Live Updating phase, it compares the intercepted metadata to its view of the file system to determine if any high-level changes occurred. It passes the high-level access summary to the Policy Engine, which determines if any policies apply to the file accessed. If so, it passes the information along to the Analysis Engine. The Analysis Engine performs some action on the information it has received from the processing engine. Currently, the analysis engine logs the accesses to a file, but future work will extend the analysis engine. An example portion of an output analysis log is provided in Figure 4.2. The Dione Manager is a command-line program which the user invokes to send commands to the Dione daemon. The commands can be roughly divided into two categories: Policy Commands and State Commands. A summary of all commands is presented in Table 4.1.
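The three processing-engine steps can be sketched as follows. This is a hedged illustration only, assuming a simple sector-to-file map; all class and field names here are invented, not Dione's actual internals:

```python
# Illustrative sketch of the processing-engine pipeline: classification,
# live updating (elided), and policy filtering. Names are invented.

class ProcessingEngine:
    def __init__(self, sector_map, policies, analysis_engine):
        self.sector_map = sector_map   # sector -> (file, "content"|"metadata")
        self.policies = policies       # file -> set of operations to record
        self.analysis = analysis_engine

    def on_disk_access(self, lba, count, op, data):
        for sector in range(lba, lba + count):
            # 1. Disk Access Classification: which file, content or metadata?
            fname, kind = self.sector_map.get(sector, (None, None))
            # 2. Live Updating: compare intercepted metadata against the
            #    current file system view (elided; see Section 4.3).
            summary = {"file": fname, "kind": kind, "op": op, "sector": sector}
            # 3. Policy Engine: forward only accesses a policy applies to.
            if fname in self.policies and op in self.policies[fname]:
                self.analysis.append(summary)

log = []
engine = ProcessingEngine({100: ("svchost.exe", "content")},
                          {"svchost.exe": {"read", "write"}}, log)
engine.on_disk_access(100, 1, "read", b"")
print(log[0]["file"])  # svchost.exe
```

The key design point is that only the final step consults user policy: classification and live updating always run, so the file system view stays current even for files no policy covers.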

Command       Description
declare-rule  Declare a new rule for instrumentation. Types of rules include:
              • access: Record an access to file content/metadata
              • operation: Record a high-level file system operation (e.g.,
                file creation, deletion, move)
              • anti-forensics: Record an anti-forensics operation (e.g., file
                hiding, timestamp reversal, Alternate Data Stream (ADS)
                creation/deletion)
              • MBR Alert: Record read/write access to the Master Boot
                Record (MBR)
delete-rule   Delete a previously-declared rule
list          List all rules
apply         Bulk-apply declared rules to file record data structures
scan          Perform a full scan of a disk image (or mounted disk partition),
              creating all file records from the raw bytes and automatically
              applying all declared rules
save          Save the state of the Dione file record hierarchy to a file to
              be loaded later
load          Load the Dione file record hierarchy from a previously-saved
              configuration file

Table 4.1: Commands used for communication with the Dione daemon.

[Figure 4.2: excerpt of a Dione disk trace. Each entry records flag fields, the accessed file (%SystemRoot%\System32\svchost.exe -k sysmgr, MFT index 664), and the sector offset and sector count of each access.]

Figure 4.2: Sample Dione Disk Trace.

4.2.1 Dione Policy Commands

As Dione instruments the file system under analysis, the user can specify policies to determine whether the instrumentation data should be passed along to the Analysis Engine. A policy specifies an action to be taken on a file for a given operation. The Policy Engine is a flexible framework for declaring new policies. Currently, we have implemented four types of policies: Record, Timestamp Alert, Hide-File Alert, and MBR Alert. Policies can be declared or deleted at any point while Dione is running, including when it is actively monitoring a live system. The Record policy specifies whether accesses should be recorded to a log file. When an access is recorded, Dione will specify whether it was to file content or metadata, whether it was a read or write, and whether it was a special operation such as a file creation, deletion, or renaming. A special annotation is provided for files which are created with their hidden property set, to hide them from the user. The

Timestamp Alert detects a specific symptom of intrusion: the reversal of any of the timestamp properties of a file (the so-called Modification, Access, and Creation (MAC) times). The Hide-File Alert detects the hiding of a file. For each of these three policy types, optional arguments specify whether the policy should apply to reads, writes, or both. If the specified file is a directory, the policy can optionally apply to all of its descendants. If a file does not exist when the policy is declared, the policy will remain in the system and will be automatically applied when the file is created in the SUA. The MBR Alert looks for an access to a specific region of the disk: the sectors on the partition containing the Master Boot Record (MBR). This policy, when applied, records reads and writes to the sectors on the MBR partition. In the Policy Command category, the user can declare, delete, list, or bulk-apply policies.
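The policy semantics above, per-operation filtering and optional application to a directory's descendants, can be sketched as follows. This is an illustrative model, not Dione's implementation; the class and parameter names are invented:

```python
# Sketch of policy matching: a policy applies to reads, writes, or
# both, and can optionally cover all descendants of a directory.
# Names and path values are invented for illustration.

class Policy:
    def __init__(self, path, ops=("read", "write"), recursive=False):
        self.path, self.ops, self.recursive = path, set(ops), recursive

    def matches(self, file_path, op):
        if op not in self.ops:
            return False
        if self.recursive:
            # Directory policy: match any descendant path.
            return file_path.startswith(self.path.rstrip("\\") + "\\")
        return file_path == self.path

p = Policy(r"C:\Windows\System32", ops=("write",), recursive=True)
print(p.matches(r"C:\Windows\System32\drivers\bad.sys", "write"))  # True
print(p.matches(r"C:\Windows\System32\drivers\bad.sys", "read"))   # False
```

A pending policy for a not-yet-existing file would simply be held in this form until a creation event with a matching path is observed.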

4.2.2 Dione State Commands

In the State Command category, Dione loads and saves a view of the state of the file system under analysis. The load step is necessary to pre-populate Dione's data structures with this state, and is required before Dione will begin monitoring I/O. The goal of this stage is that Dione will already know everything about the file system before the SUA boots, so that it can immediately begin monitoring and analyzing disk I/O. This step can be accomplished with a disk scan, which reconstructs the file system from the raw bytes of the disk, or by loading a previously-saved configuration file. The advantage of the load/save functionality is that a disk scan only needs to be performed once, which is useful in the case of very large disks with many files for which a raw scan takes longer than a load.

4.3 Live Updating

As the SUA boots and runs, new files are created, deleted, moved, expanded, shrunk, and renamed. As a result, the pre-populated view of the SUA's file system, including the mappings between sectors and files, quickly becomes out-of-date, reducing the accuracy of the monitoring and logging of disk I/O. The solution to this problem is Live Updating: an on-the-fly reconstruction of disk events based solely on the intercepted disk access information. In the next sections, we will detail the challenges of and solutions to live updating. As our implementation is initially geared toward Windows systems with the NTFS file system, and NTFS is particularly susceptible to the challenges inherent to live updating, we will begin with an introduction to those NTFS concepts which will aid in the understanding of the live updating implementation.

4.3.1 Live Updating Challenges

There are two big challenges to live updating: overcoming the Semantic Gap and the Temporal Gap. The Semantic Gap is a well-studied problem in which low-level data must be mapped to high-level data. In our case, we need to map the raw byte contents of a disk access to files and their properties. Fortunately, there are existing techniques, such as the open-source The Sleuth Kit (TSK) [13], which do much of the work to bridge the semantic gap. The Temporal Gap occurs when low-level behaviors occurring at different points in time must be pieced together to reconstruct high-level operations. The high-level operations that Dione monitors include file creation, deletion, expansion, move/rename, and updates to MAC times and the hidden property.
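The sector-to-file half of the semantic gap can be illustrated with a small sketch: given per-file runlists (runs of contiguous sectors), map a raw LBA back to the file that owns it, much as a tool like TSK does when walking NTFS structures. The runlist values below are invented:

```python
# Sketch of bridging the semantic gap: map a raw LBA to a file by
# walking per-file runlists of (start_sector, num_sectors) runs.
# Runlist values are invented for illustration.

def file_for_sector(lba, runlists):
    """runlists: {filename: [(start_sector, num_sectors), ...]}"""
    for fname, runs in runlists.items():
        offset = 0
        for start, length in runs:
            if start <= lba < start + length:
                return fname, offset + (lba - start)  # file-relative sector
            offset += length
    return None, None

runlists = {"$MFT": [(786432, 2048)],
            "pagefile.sys": [(4096, 1024), (90000, 512)]}
print(file_for_sector(90010, runlists))  # ('pagefile.sys', 1034)
```

The temporal gap is what makes this harder in practice: the runlists themselves change as files grow and move, so the map must be rebuilt on-the-fly from intercepted metadata writes.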

The first challenge of live updating is identifying the fields in an intercepted MFT entry for which a change indicates a high-level operation. Often it is not just a single change in an intercepted MFT entry that indicates a high-level operation, but a combination of changes across multiple intercepted MFT entries. Due to requirements for reliability, these changes will be propagated to disk in an inconvenient ordering. As a result, Dione must piece together the low-level changes across time in order to reconstruct high-level events. The biggest challenge resulting from the temporal gap is the detection of file creation. An intercepted MFT entry lacks two critical pieces of information: the MFT index of that entry, and the full path of the file it describes. For a static image, it is not a challenge to calculate both. However, in live analysis, the metadata creation will occur before the $MFT file's runlist is updated; just like any other file, $MFT can expand to a non-contiguous location on disk. Therefore, in certain cases it can be impossible to determine (at the time of interception) the index of a newly created file. In fact, it can be impossible to determine at interception time whether a file creation actually occurred at all. A similar challenge arises in determining the absolute path of a file. The MFT entry contains only the MFT index of the file's parent, not its entire path. If the parent's file creation has not yet been intercepted, or the intercepted parent did not have an MFT index when its creation was intercepted (due to the previously described problem), Dione has no way to identify the parent and thus reconstruct the path. This situation occurs quite frequently when an application is being installed. In this case, many (up to hundreds or thousands of) files are created in a very short amount of time. Since the OS batches writes to disk in one delayed burst, many hierarchical directory levels are created in which files cannot yet determine their paths.

The temporal gap also proves a challenge when a file's attributes are divided over multiple MFT entries. As Dione will only intercept one MFT entry at a time, it will never see the full picture at once. Therefore, it needs to account for the possibility of intercepting only a partial view of metadata.

4.3.2 Live Updating Operation

Live updating in Dione occurs in three steps. First, file metadata is intercepted as it is written to disk. Next, the pertinent properties of the file are parsed from the metadata, resulting in a reconstructed description of the file whose metadata was intercepted. Finally, Dione uses the intercepted sector, the existing view of the file system, and the reconstructed file description from the second step to determine what event occurred. It updates its data structures to represent the file system change. After intercepting an access to disk, Dione examines the intercepted disk contents and approximates whether they "look like" metadata (i.e., whether the contents appear to be an intercepted MFT entry). If so, Dione parses the raw bytes and extracts the NTFS attributes. It also attempts to calculate the MFT index by determining where the intercepted sector falls within Dione's copy of the MFT runlist. With this calculated index, it can attempt to retrieve a File Record. There are two outcomes of this lookup: either a valid File Record is retrieved, or no File Record matches the index. If a valid File Record is found, Dione will compare the extracted attributes to those attributes found in the existing File Record. If any changes are detected, it will modify the File Record to reflect the changes. A summary of the semantic and temporal artifacts of each type of file operation is presented in Table 4.2. However, if a valid File Record is not found, one of three situations has occurred.
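The decision flow just described can be sketched as follows. This is a simplified illustration under invented assumptions (a fixed MFT start sector and two sectors per entry); the real logic handles non-contiguous runlists and richer attribute sets:

```python
# Sketch of the live-updating decision flow: compute an MFT index from
# the intercepted sector, look up a File Record, and either diff the
# attributes or flag a possible creation. Constants are invented.

MFT_START, SECTORS_PER_ENTRY = 786432, 2

def live_update(sector, parsed_attrs, file_records, mft_runlist):
    in_mft = any(s <= sector < s + n for s, n in mft_runlist)
    index = (sector - MFT_START) // SECTORS_PER_ENTRY if in_mft else None
    record = file_records.get(index)
    if record is None:
        # No record: a creation (if the sector lies in the MFT runlist)
        # or a candidate for the Wait Buffer otherwise.
        return "create" if in_mft else "wait-buffer"
    # Record found: diff intercepted attributes against the stored view.
    changes = {k: v for k, v in parsed_attrs.items() if record.get(k) != v}
    record.update(changes)
    return "update" if changes else "no-change"

records = {0: {"name": "foo.txt", "hidden": 0}}
print(live_update(786432, {"name": "foo.txt", "hidden": 1},
                  records, [(786432, 2048)]))  # update
```

In this sketch the per-attribute diff is where Table 4.2's artifact rules would plug in (e.g., a reversed MAC time or a flipped hidden flag each mapping to a specific high-level operation).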

Operation            Artifacts
File Creation        • No existing File Record for calculated index
                     • Sector falls within MFT runlist; otherwise buffer until
                       MFT runlist expands to include sector
File Deletion        • File Record exists for calculated index
                     • In-Use flag off in intercepted MFT entry header
File Replacement*    • File Record exists for calculated index
                     • Creation Time: Intercepted > FileRecord, OR
                       MFT Entry Sequence Number: Intercepted > FileRecord, OR
                       MFT entry type (base vs. nonbase) changed
File Rename          • File Record exists for calculated index
                     • File Name: Intercepted ≠ FileRecord
File Move            • File Record exists for calculated index
                     • Parent's MFT Index: Intercepted ≠ FileRecord
File Shrink/Expand   • File Record exists for calculated index
                     • Runlist: Intercepted ≠ FileRecord
Timestamp Reversal   • File Record exists for calculated index
                     • MAC Times: Intercepted < FileRecord
File Hidden          • File Record exists for calculated index
                     • Hidden flag: Intercepted = 1 && FileRecord = 0
ADS Creation         • File Record exists for calculated index
                     • List of $Data attributes: Intercepted ≠ FileRecord
ADS Deletion         • File Record exists for calculated index
                     • List of $Data attributes: Intercepted ≠ FileRecord

Table 4.2: Summary of the artifacts for each file system operation. An MFT index is computed based on the intercepted sector and the known MFT runlist. If a file record is found with the calculated index, properties of the file record are compared with properties parsed from the intercepted metadata. * A replacement is characterized by a file deletion and creation within the same flush to disk, whereby the same MFT entry is reused.

In the first case, a new file has just been created, and it has been inserted into a "hole" in the MFT. The file creation can be verified because the intercepted sector falls within the known runlist of the MFT. In the second case, a new file has just been created, but the MFT was full, and thus the entry could not be inserted into a hole. Dione buffers a reference to this file in a list called the Wait Buffer.¹ Eventually Dione will intercept the $MFT file's expansion, and the file creation can be validated and the path constructed. In the final case, the intercepted data had the format of metadata (i.e., the data looked like an MFT entry), but the data actually turned out to be the contents of another file. This happens for redundant copies of metadata and for the file system's $LogFile; additionally, a malicious user could create file contents which mimic the format of an MFT entry. In any of these cases, a reference to this suspected file, and the sector at which it was discovered, will be saved in the Wait Buffer. However, the Wait Buffer is periodically purged of any File Records whose corresponding sectors are verified as belonging to a file which is not $MFT.

4.4 Disk Sensor Integration

In order to be portable to any type of sensor, the core Dione instrumentation functionality is compiled as a library; the Dione daemon is an executable created from this library. Communication between a sensor and the Dione daemon requires two corresponding components: a sensor-side API and a Dione receiver. The receiver is compiled into the Dione library, whereas the sensor-side library is compiled separately. Therefore, an inter-process (in the case of a virtualization- or emulation-based sensor) or inter-system (in the case of a physical sensor) communication protocol is required.

¹ A newly-created file will also be placed in the Wait Buffer if it has a valid MFT index, but its path cannot be constructed because its parent has yet to be intercepted.

The virtualization- and emulation-based sensors (using Xen and Qemu, respectively) utilize an interprocess communication protocol in order to communicate disk access information between the hypervisor/emulator and the Dione daemon. We have implemented a producer/consumer communication protocol using shared memory and semaphores. A sensor-side API (called the DiskMonitor) provides two externally available functions. The first is an initialize function; it is called from the Xen or Qemu I/O initialization function, and it sets up the shared memory region and semaphores. The second is a disk access function; it marshals the disk access information (LBA, count, operation, and access contents) into the shared memory region, and is therefore called once per multi-sector disk access. The Xen-based implementation calls these functions from within the block device driver. The Qemu-based implementation calls these functions from within the dma-helpers device driver. The Xen implementation works for raw disk images, whereas the Qemu implementation works for both raw and the newer Qemu Copy-on-Write (QCOW2) disk image formats. The physical sensor, created with a custom FPGA board, interposes between a system and its hard disk; therefore, it allows Dione to instrument a physical SUA, preventing the malware from detecting that it is being analyzed. The physical sensor parses the disk access information (LBA, count, operation, and access contents) from the SATA commands. It then passes this information along to the Dione daemon, which runs on another physical system, over Ethernet. The Dione library is compiled with a client-side receiver that opens a socket for the given network interface and waits for packets on that socket. The disk access information in each packet is unmarshalled and passed to the rest of the Dione daemon.
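The producer/consumer protocol can be sketched as follows. This is a hedged, in-process stand-in for the actual shared-memory region between sensor and daemon: a ring buffer guarded by two semaphores, with the same two-function sensor-side shape (initialize, then one call per disk access). The field layout is invented:

```python
# Sketch of the producer/consumer protocol: a bounded buffer guarded
# by semaphores stands in for the shared memory region between the
# sensor and the Dione daemon. Field layout is invented.

import threading
from collections import deque

class DiskMonitor:
    """Sensor-side API sketch: construct (initialize), then call
    disk_access() once per multi-sector disk access."""
    def __init__(self, capacity=8):
        self.buf = deque()
        self.slots = threading.Semaphore(capacity)  # free slots
        self.items = threading.Semaphore(0)         # filled slots
        self.lock = threading.Lock()

    def disk_access(self, lba, count, op, contents):  # producer (sensor)
        self.slots.acquire()        # block if the daemon has fallen behind
        with self.lock:
            self.buf.append((lba, count, op, contents))
        self.items.release()

    def receive(self):  # consumer (Dione daemon side)
        self.items.acquire()        # block until a record is available
        with self.lock:
            item = self.buf.popleft()
        self.slots.release()
        return item

mon = DiskMonitor()
mon.disk_access(664, 40, "read", b"...")
print(mon.receive())  # (664, 40, 'read', b'...')
```

The bounded-capacity semaphore pair applies backpressure: a burst of disk I/O blocks the sensor rather than dropping records, which preserves the completeness of the trace.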

4.5 Experimental Results

Next, we evaluate the accuracy and performance of Dione and demonstrate its utility using real-world malware. Though Dione is a flexible instrumentation framework capable of collecting and analyzing data from both physical and virtual sensors, we use a hypervisor-based solution which utilizes the virtualization layer as a data-collecting sensor.

4.5.1 Experimental Setup

Our virtualization-based solution uses the Xen 4.0.1 hypervisor. Our host system contains a dual-core Intel Xeon 3060 processor with 4 GB RAM and Intel VMX hardware virtualization extensions to enable full virtualization. The 160 GB, 7200 RPM SATA disk was partitioned with a 25 GB partition for the root directory and an 80 GB partition for the home directory. The virtual machine SUA ran Windows XP Service Pack 3 with the NTFS file system.

4.5.2 Evaluation of Live Updating Accuracy

In order to gauge the accuracy of live updating, we ran a series of tests to determine whether Dione correctly reconstructs file system operations. For our tests, we chose installation and uninstallation programs, as they perform many file system operations very quickly and stress the live updating system. We chose three open-source applications (OpenOffice, Gimp, and Firefox), and performed both an installation and an uninstallation of each. We also ran an all-inclusive test that installed all three, then uninstalled all three.

Program               Creations  (Delayed)  Deletions  Moves  Errors
OpenOffice Install         3934       3930          1      0       0
Gimp Install               1380       1380          0      0       0
Firefox Install             152        135         71      0       0
OpenOffice Uninstall        353         62       3788   3836       0
Gimp Uninstall                5          0       1388      0       0
Firefox Uninstall             6          0         80      0       0
All                        6500       6114       5986   3815       0

Table 4.3: Breakdown of file system operations for each benchmark. The subset of file creations which wait for the delayed expansion of the MFT are also indicated.

These benchmarks perform a varying number of changes to the file system hierarchy. Table 4.3 lists each of the seven benchmarks and the number of file creations, deletions, and moves.2 As discussed in Section 2.3.1, if many new files are created at once and the MFT does not have enough free space to describe them, there is a delay between when the file creation is intercepted and when the MFT expands to fit the new file metadata (at which point the file creation can be verified). We also include the number of delayed-verification file creations in Table 4.3, as these place additional stress on Dione's live updating accuracy.

For each test, we started from a clean Windows XP SP3 disk image. We executed one of the seven programs in a VM, instrumenting the file system. We then shut down the VM and dumped Dione's view of the dynamically-generated state of the file system to a file. Finally, we ran a scan on the raw static disk image and compared its results to the results of the dynamic execution instrumentation. An error is defined as any difference between the dynamically-generated state and the static disk scan. This includes a missing file (one that was not reported created), an extraneous file (one that was not reported deleted), a misnamed

2The “All” test is not a sum of the individual tests, because the operating system also creates, deletes, and moves files, and the number of these may change slightly through tests.

file, a file with the wrong parent ID or path, a file mislabeled as a file or directory, a file mislabeled as hidden, a file with an incorrect timestamp (of any of the four timestamps maintained by Windows), or a file with an incorrect runlist. Table 4.3 shows the result of the accuracy tests. In each case, Dione maintained a 100% accurate view of the file system, with no differences between the dynamically-generated view and the static disk scan.
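The comparison between the dynamically-generated state and the static scan amounts to a dictionary diff over per-file records. A minimal sketch (the record fields and function name are hypothetical, for illustration only):

```python
def compare_views(dynamic, static):
    """Each argument maps a file ID to a record of attributes
    (e.g., name, parent, is_dir, hidden, timestamps, runlist).
    Returns a list of errors; an empty list means the views agree."""
    errors = []
    for fid, rec in static.items():
        if fid not in dynamic:
            errors.append(f"missing file {fid}")       # creation not reported
        elif dynamic[fid] != rec:
            errors.append(f"mismatched record {fid}")  # wrong name/parent/etc.
    for fid in dynamic:
        if fid not in static:
            errors.append(f"extraneous file {fid}")    # deletion not reported
    return errors
```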

4.5.3 Evaluation of Performance

In order to gauge the performance degradation associated with disk I/O instrumen- tation using Dione, we ran two classes of benchmarks: one high in file content reads and writes, and one high in file metadata reads and writes.

Iozone Benchmark

Iozone generates and measures a variety of file operations. It varies both the file size and the record size (i.e., the amount of data read or written in a given transaction). Because it creates very large files, reading and writing the same file for each test, this is a content-heavy benchmark with very little metadata processing. We ran all Iozone tests on a Windows XP virtual machine with a 16 GB virtual disk and 512 MB of virtual RAM. We used the Write and Read tests (which stream accesses through the file), and Random Write and Random Read (which perform random accesses). We varied the file size from 32 MB to 4 GB, and chose two record sizes: 64 KB and 16 MB. We ran each test 50 times to average out some of the variability that is inherent in running a user-space program in a virtual machine.

For each test, we ran three different instrumentation configurations. For the Baseline configuration, we ran all the tests without instrumentation (that is, with Dione turned off). In the second configuration, called Inst, Dione is on and performing full instrumentation of the system. There are, however, no rules in the system, so it does not log any of these accesses. This configuration measures the minimum cost of instrumentation, including live updating. The final configuration is called Inst+Log. For these tests, Dione is on and providing instrumentation; additionally, a rule is set to record every access to every file on the disk.

Figure 4.3 shows the results of the tests. Each line represents the performance with instrumentation, relative to the baseline configuration. For the Read Iozone tests (Figures 4.3(a) and 4.3(b)), the slowdown attributed to instrumentation is near 0 for files 512 MB and smaller. Since the virtual machine has 512 MB of RAM, Windows prefetches and keeps data in the page cache for nearly the entire test. Practically, this means that accesses rarely go to the virtual disk. Since Dione only instruments actual I/O to the virtual disk—and not file I/O within the guest OS's page cache—Dione is infrequently invoked.

At larger file sizes, Windows needs to fetch data from the virtual disk, which Xen intercepts and communicates to Dione. At this point, the performance of instrumentation drops relative to the baseline case. In the worst case for streaming reads, Dione no-log instrumentation achieves 97% of the performance of the uninstrumented execution.

For the random read tests with large file sizes, there is a larger penalty for instrumentation. Recall that Dione incurs a penalty relative to the amount of data accessed on the virtual disk. Therefore, the penalty is higher when more accesses are performed than are necessary. Windows XP utilizes intelligent read-ahead, in which the cache manager prefetches data from a file according to some perceived pattern. For random reads, the prefetched data may be evicted from the cache before it is

Figure 4.3: Performance of instrumentation, normalized to the baseline (no instrumentation) configuration, for the Iozone streaming and random read and write tests. Panels: (a) Read test, 64 KB record size; (b) Read test, 16 MB record size; (c) Write test, 64 KB record size; (d) Write test, 16 MB record size. Each panel plots performance relative to baseline against file size (32 MB to 4 GB) for the Inst and Inst+Log configurations, for both streaming and random access.

used, resulting in more accesses than necessary. This also explains why the penalty is not as high for the tests using the larger record size (for a given file size). Windows adjusts the amount of data to be prefetched based on the size of the access, so the ratio of prefetched data to file size is higher with a larger record size. With more prefetched data, there is a higher likelihood that the data will be used before it is evicted from the cache. Fortunately, this overhead is unlikely to be incurred in practice, as random access of a 2 GB file is rarely performed.

Another observation is that the performance of Dione actually improves for streaming and random reads as file sizes grow larger than 1 and 2 GB, respectively. This is explained by considering the multiple levels of the memory hierarchy in a virtualized system. As the file size grows larger than the VM's RAM, I/O must go to the virtual disk. However, the file may still be small enough to fit in the RAM of the host, as the host will naturally map files (in this case, the VM's disk image) into its own page cache. Thus, disk reads are not performed from the physical disk until the working size of the file becomes larger than available physical RAM. Since physical disk accesses are very slow, any cost associated with Dione instrumentation is negligible compared to the cost of going to disk.

The Iozone Write tests (Figures 4.3(c) and 4.3(d)) show some performance degradation at small file sizes. Windows must periodically flush writes to the virtual disk, even if the working set fits in the page cache. However, the performance impact is minimal for all file sizes, with a worst-case 10% performance degradation, though it is generally closer to 3%. Additionally, the random write tests do not show the penalty associated with random reads. Since Windows only writes dirty blocks to disk, there are fewer unnecessary accesses to disk.
It is also noticeable that speedup values are sometimes greater than 1 for the 32 MB file size write tests. This would imply that the benchmark runs faster with instrumentation than without. In reality, this effect is explained by an optimization Windows uses when writing to disk. Instead of immediately flushing writes to disk, writes are buffered and flushed to disk in a burst. With this Lazy-Writing, one eighth of the dirty pages are flushed to disk every second, meaning that a flush could be delayed up to eight seconds. From the perspective of the user—and therefore the timer—the benchmark is reported to have completed; in reality, the writes are stored in the page cache and have yet to be flushed to disk. The long-running benchmarks will have flushed the majority of their writes to disk before the process returns. However, a short-running benchmark—such as the Iozone benchmarks operating on a 32 MB file—may still have outstanding writes to flush. The time it takes to flush these varies randomly across tests. We measured a 21-24% standard deviation (normalized to the mean) for the baseline, instrumentation, and logging tests. This effect is examined in more detail in the next section.

For all tests, the cost of logging all accesses is relatively low, falling anywhere from 0-8%. For these tests, the root directory (under which the logs were stored) was on a separate partition from the disk image under instrumentation. Therefore, logging introduced an overhead, as the disk alternated between writing to the log file and accessing the VM's disk image. This performance penalty can be reduced by storing the log on the same partition as the disk image. Future work can also reduce the overhead by buffering log messages in memory—performing a burst write to the log—to reduce the physical movements of the disk.

Figure 4.4: Evaluation of Dione instrumentation for the OpenOffice, Gimp, and Firefox install/uninstall benchmarks. (a) Performance of Dione instrumentation relative to the baseline, for the Inst and Inst+Log configurations (error bars equal one standard deviation); (b) average execution time, in seconds, with and without Dione instrumentation (No Inst, Inst, Inst+Log).

Installation Benchmarks

In the second set of performance experiments, we evaluated the overhead of benchmarks that are high in metadata accesses. These tests heavily stress the live updating part of Dione's execution. We ran the same six install/uninstall benchmarks as in the accuracy tests; the number of creations (including delayed), deletions, and moves were listed in Table 4.3. We ran each test ten times to average out the variation inherent in running a user-space application on a virtual machine; for each run, we started from the same clean disk image snapshot. We used a Windows XP SP3 virtual machine with an 8 GB virtual disk and 512 MB of virtual RAM.

We compared the baseline execution (with no instrumentation) to full instrumentation with Dione, both with and without logging. Figure 4.4 graphs the execution times of the three configurations, as well as the performance of Dione instrumentation

relative to the baseline execution.

As Figure 4.4 shows, even when the workload requires frequent metadata analysis for live updating, the overhead of instrumentation is low. Without logging, the full instrumentation of the benchmarks causes between a 1% and 5% performance degradation.

The three benchmarks with the least penalty are OpenOffice installation and uninstallation and Gimp installation. These have between 1-2% performance degradation for instrumentation without logging, compared to 5% for Firefox Install and Gimp Uninstall (Firefox Uninstall is excluded for now, and explained in more detail below). Figure 4.4(b), which graphs the average execution times of the six benchmarks, provides more insight. These three benchmarks are the longest-running of the six, which is important because of how Windows performs writes to disk. As described in the previous section, Windows will perform a burst flush to disk, and writes could be delayed as many as eight seconds before they are flushed from the page cache. While the program is reported to have completed, there are still outstanding writes that need to be flushed to disk. This effect is especially pronounced in any program with a runtime on the same order of magnitude as the write delay.

We can see this effect in Figure 4.4, which includes error bars showing the normalized standard deviation for the 10 runs of each benchmark. The three longest-running benchmarks also have the lowest standard deviations. This means that the results of these three tests are the most precise, and the average reflects the true cost of instrumentation. While two of the three shortest-running benchmarks have the highest reported cost of instrumentation, the standard deviation between tests is greater than the reported performance penalty. The execution time of the Firefox Uninstall is dwarfed by the time Windows may delay its writes—as reflected in its high standard

deviation. In practice, this means that a user is unlikely to ever notice a slowdown attributed to disk instrumentation for short bursts of disk activity.

These tests also show that logging adds between a 0% and 9% performance decrease. In these tests, the disk image resided on the same partition as the log file; therefore, the cost of logging to a file was lower than for the content tests.

4.6 Registry Monitoring

As discussed in Section 2.3.1, Windows stores configuration data for the operating system, users, and applications in the Windows registry. While some of the registry is created on system boot and remains only in memory, much of it is backed up on disk in the form of Windows registry hive files. There are five registry hive files: system, security, software, default, and SAM.

In order to maintain a dynamic view of the Windows registry hive files that is always up to date, we integrated registry monitoring into Dione. In addition to the file system operations already tracked by Dione, it also tracks when registry keys are created, deleted, or changed. We keep track of the registry hive files in the same way that Windows does: by mapping the files to memory. Initialization of the registry hive files can occur in one of two ways. If the file system state is obtained through a scan of the raw disk, via the scan operation, then an optional argument will carve the registry files out of the raw disk, saving them to memory. These files can also be saved to disk, so that in future system starts they can be loaded automatically from the saved state, rather than carved from the raw disk. We detail the additional commands that are needed for registry monitoring, as well as new arguments for existing commands, in Table 4.4.

New Commands

  save-registry    Save given raw registry hive file from Dione to disk.
  load-registry    Load Dione with a previously-saved raw registry hive file
                   stored on disk.

New Parameters to Existing Commands

  declare-rule     Declare a new rule for instrumentation. New rule type:
                   • registry: Save all registry key creations, deletions, and
                     changes to existing keys and values.
  scan             Perform a full scan of a disk image (or mounted disk
                   partition), creating all file records from the raw bytes and
                   automatically applying all declared rules. Optionally carve
                   the registry files from the raw disk, saving them in memory
                   for Dione use.

Table 4.4: New commands, and new arguments to existing commands, used to communicate with the Dione daemon to perform registry monitoring.

When a disk write comes across the wire, Dione determines whether that write is to a content sector of one of the registry hive files. If so, it patches its view of the hive file in memory using the sector number, sector count, and associated raw file content. Though writes to other files may be intertwined with writes to the registry, there are never more than three files written to simultaneously; this means that we can judge a series of writes to a hive file to be complete once there have been writes to three other consecutive files.

Once a series of writes to the hive file is complete, we parse the hive file using the regfi open source library [55], storing the information for each key and subkey in a list. We then compare the previous view of the hive file to this newly-parsed view and look for any differences. We use a naive algorithm, originally described by Johnson et al. [32] in an internal document, but summarized by [27]. The differencing algorithm consists of three steps: First, it goes through the lists item by item until two items disagree. Second, it compares the kth item ahead in each list with the k items following the mismatch in the other, incrementing k in each round, until a match is found. Third, once the match is found, it resumes the item-by-item comparison and continues to the next disagreement. The advantage of this approach is that, if the item to be matched occurs early, it is found quickly. This is known to be a naive algorithm, though it works well in practice when there are relatively few differences between the items to be compared and relatively few duplications [27]. With registry modifications, the first condition is often true (there are few modifications relative to the list of all keys), and the second is always true (there are no duplications).

Occasionally, however, the algorithm produced an inaccurate, noisy trace. In approximately 22 of 1,084 samples, an error in the diff algorithm resulted in an event trace listing the deletion of every key in the registry, followed by the creation of every key in the registry. This error is easily detectable; as a result, we discarded the traces for which the error occurred. Future work will implement a more robust algorithm to avoid this problem.
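The naive differencing algorithm described above can be sketched as follows (an illustrative reimplementation over generic Python lists, not the code used in Dione):

```python
def naive_diff(old, new):
    """Naive list diff: walk both lists in lockstep; on a mismatch, grow
    a lookahead window k until the lists resynchronize, reporting the
    skipped-over items as deletions (from old) and creations (in new)."""
    i = j = 0
    deleted, created = [], []
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            i += 1
            j += 1
            continue
        k = 1
        while True:
            # Does the k-th item ahead in old match within k of the
            # mismatch point in new?
            if i + k < len(old) and old[i + k] in new[j:j + k + 1]:
                m = new[j:j + k + 1].index(old[i + k])
                deleted.extend(old[i:i + k])
                created.extend(new[j:j + m])
                i += k
                j += m
                break
            # Symmetric check in the other direction.
            if j + k < len(new) and new[j + k] in old[i:i + k + 1]:
                m = old[i:i + k + 1].index(new[j + k])
                created.extend(new[j:j + k])
                deleted.extend(old[i:i + m])
                j += k
                i += m
                break
            if i + k >= len(old) and j + k >= len(new):
                # No resynchronization possible: flush the remainders.
                deleted.extend(old[i:])
                created.extend(new[j:])
                i, j = len(old), len(new)
                break
            k += 1
    deleted.extend(old[i:])
    created.extend(new[j:])
    return deleted, created
```

Applied to two parsed key lists, the `deleted` and `created` outputs correspond directly to registry deletion and creation events in the trace.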

Chapter 5

Labeling Malware Persistence Mechanisms with Dione

In this chapter, we discuss persistence capability labeling with DCL, the Dione Capability Labeler. We generate specifications for properties, including the service install, service load, system boot, and file access, using Linear Temporal Predicate Logic (LTPL). We support our file loading model with a machine learning classifier that differentiates between two types of file access patterns. We implement an automated testbed and generate Dione traces from over one thousand real-world malware samples, evaluating the accuracy of our models in their ability to detect persistence mechanisms.

5.1 Modeling Persistence Mechanisms with LTPL

In order to demonstrate the successful use of a persistence mechanism to survive and automatically restart after a reboot, we broke each persistence capability into three phases, and DCL models each of the phases. The first phase is installation,

whereby the malware makes the necessary changes to the file system (creating new files and modifying existing files) and to the registry (adding new keys and values, and modifying the contents of existing subkeys). The second phase is the system boot, whereby we model the sequence of disk operations that is indicative of a system boot. Without the reboot, we cannot test whether the persistence mechanism was successful. Finally, we model the service load, whereby the binary associated with a service, if one was installed, is automatically loaded after reboot. This stage incorporates another model, the file access, which demonstrates that the file associated with the persistence mechanism was accessed after the system booted. In order to prevent false negatives—a file access going unlabeled—we keep this model sufficiently generic. In Section 5.4, we bolster the file access model with a machine learning algorithm that differentiates between different types of file accesses, to ensure we correctly label the loading of the program binary associated with the persistence mechanism.

In Section 2.4, we discussed model checking and the specification language Linear Temporal Predicate Logic (LTPL), using examples from an x86 instruction trace. In this section, we model persistence capabilities using LTPL, replacing the x86 instruction predicates with seven predicates representing operations obtained from a Dione trace, plus a predicate to perform a regular expression match of two strings. The predicate vocabulary used to model the persistence capabilities from Dione events is provided in Table 5.1.1

1Recall from Section 2.3.1 that, since keys and values are hierarchically organized, it is useful to think of the hierarchy as analogous to a file system. Each key or value has a path (the concatenation of all keys higher in the hierarchy) and a name, and just like a file, it may optionally hold contents (which we also refer to as the value). Consequently, we can use similar terminology between files and registry keys.

Predicates (P):
  RegCreate2(p, n)     Event is creation of registry key or value with path p and name n.
  RegCreate3(p, n, v)  Event is creation of registry key or value with path p, name n, and value v.
  MBRRead(s)           Event is read of sector offset s of the Master Boot Record.
  ContentRead(f, s)    Event is read of sector offset s of file f.
  MetaRead(f)          Event is read of metadata associated with file f.
  FileMove(f)          Event is move of a file to destination file f.
  FileCreate(f)        Event is creation of file f.
  RegExMatch(re, s)    String s matches the regular expression provided in string re.

Functions (F):
  path join            Returns the concatenation of an absolute path with a key or file, resulting in a new path.

Constants (C):
  ServicePath          The path under which all service subkeys are kept: HKLM\system\ControlSet00X\Services
  RegExSvcHostEvent    The regular expression for an event in which a registry value is created for a service run by svchost.exe (a system process that hosts multiple services): REGISTRY CREATION.*ImagePath.*WINDOWS\system32\svchost.*-k
  RegExSvcHostFile     The regular expression used as the value of the ImagePath registry value when a service is run by svchost.exe: WINDOWS\system32\svchost.*-k

Table 5.1: Function (F), Predicate (P), and Constant (C) symbols for property specifications.

We developed the property specifications from domain knowledge. That is, we observed both synthetic and real-world software samples, including hand-coded benign software, real-world benign software, and real-world malware samples. We evaluated the models' accuracy on an entirely different set of samples than those used to develop the models.

5.1.1 System Boot

We first model the specification for a system boot, as a detected boot implies the system was shut down or restarted. A system boot is characterized by a read of Master Boot Record sector 0, followed immediately by a read of the 0th sector of the file content of the file $Boot, followed immediately by a read of the 0th sector of the file content of the file $MFT. Equation 5.1 lists the LTPL specification of a system boot.

φSB = F(MBRRead(0) ∧ X(ContentRead(“$Boot”, 0) ∧ X ContentRead(“$MFT”, 0)))    (5.1)
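A direct Python check of this pattern over an event trace might look like the following (a hypothetical sketch; the event-tuple encoding is invented for illustration and is not DCL's actual representation):

```python
# Each trace event is encoded as a tuple: ("MBRRead", sector) or
# ("ContentRead", filename, sector). This encoding is hypothetical.
def detect_boot(trace):
    """Return the index where the boot pattern of Equation 5.1 begins,
    or -1 if no boot appears (F = eventually, X = next event)."""
    for i in range(len(trace) - 2):
        if (trace[i] == ("MBRRead", 0)
                and trace[i + 1] == ("ContentRead", "$Boot", 0)
                and trace[i + 2] == ("ContentRead", "$MFT", 0)):
            return i
    return -1
```

The X (next) operator in Equation 5.1 corresponds to the strict adjacency of the three events in the scan above.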

5.1.2 Service Install

Next, we model the installation of the service. Several events must occur within the trace in order to satisfy the specification for service installation. At some point in the trace, there must be a creation of a key with name k and path equal to the constant string ServicePath. There must also be creations of three values; all three have a path that is the concatenation of the constant string ServicePath and the key name k, with names Type, Start, and ImagePath, respectively.

If any event e appears in the trace that matches the regular expression of the constant RegExSvcHostEvent, there must also be, somewhere in the trace, a creation of a registry value with name ServiceDll and a path that is the concatenation of ServicePath, the key name, and the string Parameters. Finally, we require that all the previous events occur before a system boot. The LTPL specification for a service installation is given in Equation 5.2.

φsinst = ∃k F( RegCreate2(ServicePath, k)
          ∧ RegCreate2(path join(ServicePath, k), “Type”)
          ∧ RegCreate2(path join(ServicePath, k), “Start”)
          ∧ RegCreate2(path join(ServicePath, k), “ImagePath”)
          ∧ (RegExMatch(RegExSvcHostEvent, e) =⇒
               RegCreate2(path join(ServicePath, k, “Parameters”), “ServiceDll”))
          ∧ FφSB )                                                        (5.2)

It should be noted that we preprocess the paths provided in ImagePath, ServiceDll, and other path-related registry entries to normalize them. For example, we account for Windows allowing certain environment variables in the path and for paths that take advantage of Windows's default-directory convention, and we ensure that slashes match the style of the Dione log (Unix-style forward slashes).
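The core of Equation 5.2 can be rendered in Python roughly as follows (an illustrative sketch, not DCL's actual implementation; the event encoding is hypothetical, and the svchost.exe/ServiceDll clause is omitted for brevity):

```python
# Registry-creation events are encoded as hypothetical (path, name)
# tuples, with paths already normalized as described above.
SERVICE_PATH = r"HKLM\system\ControlSet001\Services"  # ControlSet00X in general

def check_service_install(reg_creates):
    """Return a service key name k for which the Type, Start, and
    ImagePath values were all created under ServicePath\\k, or None."""
    keys = {name for path, name in reg_creates if path == SERVICE_PATH}
    for k in keys:
        subkey = SERVICE_PATH + "\\" + k
        names = {name for path, name in reg_creates if path == subkey}
        if {"Type", "Start", "ImagePath"} <= names:
            return k
    return None
```

In the full specification, this check would only be run over the events preceding the system boot, mirroring the trailing FφSB conjunct.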

5.1.3 File Access

A file access is characterized with respect to a given file f. The file access consists of a read, at some point in the trace, of the file's metadata, followed eventually by a read starting at the 0th sector of the same file's contents. The LTPL specification for a file access is given in Equation 5.3.

φFA = ∃f(F(MetaRead(f) ∧ F ContentRead(f, 0)))    (5.3)

As previously noted, this generalization of a file load only checks whether a file was accessed, and does not differentiate between different types of file accesses. In Section 5.4, we will differentiate between a load and another type of access (a file copy) using a machine learning classifier.
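Because the two F (eventually) operators only require ordering, φFA reduces to a single forward pass over the trace. A sketch, assuming a hypothetical tuple encoding of Dione events:

```python
# Hypothetical event encoding: ("MetaRead", filename) and
# ("ContentRead", filename, sector).
def file_accesses(trace):
    """Return the set of files satisfying phi_FA (Equation 5.3): a
    metadata read eventually followed by a read of sector 0 of the
    same file's contents."""
    meta_seen = set()
    accessed = set()
    for ev in trace:
        if ev[0] == "MetaRead":
            meta_seen.add(ev[1])
        elif ev[0] == "ContentRead" and ev[2] == 0 and ev[1] in meta_seen:
            accessed.add(ev[1])
    return accessed
```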

5.1.4 Persistent Service Load

In this section, we model the persistent service load, which incorporates several stages. Since the service is installed with the goal of persistence, the specified operations must occur before the system reboot in order to ensure that the service binary can automatically load. Additionally, a file access of that service binary must be detected after the reboot.

The specification for a service load relies on the idea of a relevant binary. During service installation, there must be a registry creation event with name ImagePath and value fip. If the service runs in its own process, fip will be the absolute path and filename of its executable binary. A service load then incorporates the registry events required by the service install, plus a file creation or move of a file with the same value as fip. Eventually, there must appear a system boot (specified by φSB), followed eventually by a file access of the same file fip.

Alternatively, if the service is to be run by the SvcHost service, fip will contain a string that must match the constant regular expression RegExSvcHostFile. Furthermore, there must be a creation of a registry value called ServiceDll; the contents of this value, fdll, specify the path of the executable binary that must be created, then loaded after the system reboot.

Notice that in Equation 5.2, we specified that a service installation required registry creation events of ImagePath and/or ServiceDll, with no requirement on the contents of those values. In this specification, we check the values of those registry creation events and ensure that a file is created to match the executable binary.

The LTPL specification for a service load, given a persistent service installation, is given in Equation 5.4.

φsload = ∃k F( RegCreate2(ServicePath, k)
    ∧ RegCreate2(path join(ServicePath, k), “Type”)
    ∧ RegCreate2(path join(ServicePath, k), “Start”)
    ∧ ∃fip ( RegCreate3(path join(ServicePath, k), “ImagePath”, fip)
        ∧ ( ( RegExMatch(RegExSvcHostFile, fip)
              ∧ ∃fdll ( RegCreate3(path join(ServicePath, k, “Parameters”), “ServiceDll”, fdll)
                  ∧ (FileCreate(fdll) ∨ FileMove(fdll))
                  ∧ F(φSB ∧ F(MetaRead(fdll) ∧ F ContentRead(fdll, 0))) ) )
          ∨ ( ¬RegExMatch(RegExSvcHostFile, fip)
              ∧ (FileCreate(fip) ∨ FileMove(fip))
              ∧ F(φSB ∧ F(MetaRead(fip) ∧ F ContentRead(fip, 0))) ) ) ) )    (5.4)

5.2 Dione Capability Labeler Implementation

DCL is implemented as a behavioral model checker using custom Python code. The behavioral model checker implements the specifications of Section 5.1 by hard-coding the states of the model to fit the specification. After the model checker preprocesses the events of the trace, it moves through the states detailed by the specifications of Section 5.1, outputting true if the events satisfy a specification and false otherwise. For each malware sample, it outputs an XML file that lists which of the properties were present in the trace (service installation, reboot, load of service binary).

We chose a behavioral implementation, instead of an exhaustive model checker, because it is specific to the problem being modeled. Since the three phases occur sequentially, we could break the traces down into smaller subtraces. As a result, the resulting state machine contains programmatically simple transitions between states. Because of this property, we were able to achieve higher performance with a hand-coded model checker than would be possible with a generic-but-exhaustive model checker, and such scalability is necessary in a malware analysis environment in which hundreds of thousands of samples are discovered each day.
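The phase-splitting idea can be sketched as follows (a simplified, hypothetical outline of this style of checker, not DCL's actual code; the three callback functions stand in for the specification checks of Section 5.1):

```python
def label_trace(trace, detect_boot, check_install, check_load):
    """Split the trace at the system boot, then check the install
    specification on the prefix and the binary load on the suffix.
    Returns per-property labels, mirroring DCL's XML output."""
    boot = detect_boot(trace)                 # index of boot, or -1
    labels = {"service_install": False, "reboot": boot >= 0,
              "service_load": False}
    prefix = trace if boot < 0 else trace[:boot]
    binary = check_install(prefix)            # service binary path, or None
    labels["service_install"] = binary is not None
    if boot >= 0 and binary is not None:
        labels["service_load"] = check_load(trace[boot:], binary)
    return labels
```

Because each phase is checked on its own subtrace, every transition is a simple linear scan, which is what makes the hand-coded checker fast enough for large sample volumes.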

5.3 Experimental Setup

We created a testbed infrastructure in order to automatically load and run a malware sample, observe it with Dione and other tools, and automatically save instrumentation data. The automation of our testbed was instrumental in analyzing more than a thousand real-world samples, so that we could comprehensively evaluate our model checker.

5.3.1 Testbeds

Our testbed consisted of two servers, each running the Xen 4.1.2 hypervisor with an Ubuntu 12.04 Domain 0, controlled using virsh commands from the libvirt library. The SUA—the system on which the malware was loaded and instrumented—was 32-bit Windows XP Service Pack 3 with an NTFS file system. Each Windows XP SUA had 512 MB of physical memory, a 16 GB disk, and a secondary 2 GB disk. Two different VMs were loaded with differing amounts and types of software, and had seen varying amounts of use, so that they had different disk allocation patterns and fragmentation rates. Our two testbed VMs can be described as follows:

• VM0: This VM was used to generate the traces that provided us with enough domain knowledge to develop our models. It was also used to train, test, and evaluate our models on over 1,000 samples. This VM was generally clean; it had only enough software installed to tempt the malware to run. Of the 16 GB disk, it had approximately 9.5 GB of free space, and had a total fragmentation of 7% and a file fragmentation of 16%.

• VM1: The second testbed was used purely for additional testing; we tested approximately 350 samples on this testbed, including many samples also run on VM0. We used this testbed to demonstrate that our models work for other 32-bit Windows XP SP3 systems besides the one trained on. As such, no traces generated from this testbed were used in the acquisition of domain knowledge, in the creation of our persistence mechanism specifications, or in the training of our machine learning classifier. This VM was not as clean as VM0: it had more files on disk and more programs installed, and thus less free space and higher fragmentation. VM1 had 1.5 GB of free space, 15% total fragmentation, and 30% file fragmentation.

Additionally, we provided a simulated internet to our VMs through the INetSim framework [26]. INetSim simulates common network services, including HTTP, SMTP, DNS, and FTP. Simulating internet services is necessary for two reasons. First, it prevents the malware from causing harm to other systems on the network, since it is not connected to the actual internet. However, it is not enough to simply disable internet access to the VM, as many malware samples will not run unless they can receive responses to simple queries (e.g., a DNS lookup). It is common for malware to attempt to connect to an internet service in order to coarsely detect whether it is running in a sandboxed environment and, if so, exit to avoid being analyzed. As a result, using INetSim causes more malware samples to run, allowing us to obtain meaningful traces.

In addition to integrating Dione with the Xen Domain 0, as detailed in Section 4.4, we also integrated the Volatility memory forensics framework [76] into our testbed. Volatility is an open-source framework that can extract digital artifacts, including process lists, drivers, and services, from the physical memory of a virtual machine. Volatility integrates easily with Xen, reading the memory of the VM; our testbed used Volatility to report process lists, the DLLs loaded by each process, loaded modules, loaded drivers, and details of installed services. Before running any tests, the VM was booted and warmed up, then paused, and a snapshot was taken of its memory. For each malware sample, the sample was loaded onto the paused SUA, which was then restored from the checkpoint using a copy-on-write (COW) disk (of the Xen VHD disk format) so that any modifications the malware made to the system could be discarded after instrumentation was complete. After running for three minutes, allowing the malware to install itself, the system was restarted. Approximately three minutes after the restart, we used Volatility to extract information from the VM's memory, saved the Dione and Volatility logs, and replaced the COW disk with a clean image for the next sample. Due to an artifact of the Xen libvirt library, shutdown of the VM takes on the order of five minutes. As a result, it takes approximately eleven minutes to run and instrument each sample. While this is the dominant time expense of the analysis, it can be alleviated in future work by parallelizing Dione so that multiple instances can run, each monitoring a single VM, allowing multiple VMs to run simultaneously.

5.3.2 Malware Corpus

Most of the samples were acquired from Anubis [1], an online malware analysis platform. Anubis temporarily stores the samples submitted by the general public for analysis. We took a random sampling from its database over the course of October to December 2012. In order to ensure that our corpus contained a sufficient number of service-installing samples to fully evaluate our model checker, we specifically targeted samples known to create a file under the C:\WINDOWS\system32 directory, as many (but not all) services install their executable into this directory. We also manually downloaded numerous samples from Open Malware [56]. In the end, we had obtained 1,084 real-world malware samples with unique MD5 checksums.

5.3.3 Assignment of “Truth” Labels

Having obtained samples found in the wild, we were tasked with determining the ground truth of each malware sample's behavior in order to evaluate the correctness of our models. Unfortunately, the ground truth problem is nontrivial to solve [4, 49]. As discussed in Section 1.1 and in more detail by Bailey et al. [4], anti-virus companies agree on neither the names nor the scope and granularity of labels, and clustering algorithms can only evaluate the effectiveness of their own models using these AV labels. Even if trustworthy family/variant labels could be obtained for each sample, there exists no repository of behavioral descriptions for each sample, so we could not, for example, verify whether a sample was known to install a service. Finally, we face the challenge of environment-sensitive malware. There are many reasons a malware sample may not run in our environment: it may detect that it is running on a virtualized platform, it may attempt to download a specific file from the internet and be unable to, it may require a library that is not present on the system, or it may be written for another version of Windows. Malware is also not known for robustness; it is common for either the malware or the entire OS to crash (resulting in the dreaded BSOD, or “Blue Screen of Death”). Thus, even if a malware sample were known to exhibit the capabilities we are modeling in another environment, we could not be certain that it exhibited that capability in ours. For these reasons, we used cross-view detection, a technique in which two very different views of the same system are compared, to establish a truth for samples running in our environment. We used this definition of truth to label our samples. First, we used Volatility to extract installed services, processes, DLLs, modules, and drivers from the memory space of our SUA.
Then we compared the results of our models to the results extracted using Volatility, and applied a label based on one of two definitions of truth: (1) if Volatility and DCL agree on labels, those labels are deemed truthful; or (2) if DCL and Volatility do not agree, we performed manual analysis to break the tie and apply a truthful label. Manual analysis consisted of both static and dynamic analysis techniques, including disassembly, analysis with IDA Pro, running the samples using in-host analysis tools like ProcMon and RegShot, and using the WinDbg kernel debugger on a connected system to step through and analyze kernel code and data structures.

We ran the 1,084 samples on VM0. We discarded 59 traces, leaving us with 1,025 viable traces. We discarded these traces for one of three reasons. First, occasionally Volatility was unable to parse data structures from the SUA memory, preventing us from establishing truth for those samples. Second, some samples did not reboot (either because the malware blocked shutdown, or because the malware corrupted kernel space so badly that boot caused a BSOD), preventing us from evaluating persistence. Third, we discarded traces affected by the error in the Dione registry monitoring infrastructure discussed in Section 4.6, since this reflects an issue with the instrumentation infrastructure, not the model checker, and is easily detectable.

Of the 1,025 samples that had valid traces, DCL and Volatility output agreed on the labels applied to 974 samples, satisfying our first definition of truth. Of the remaining 52 samples, 27 disagreed on whether a service installation occurred, while the remaining 25 agreed that the service installation took place but not on whether a load after reboot occurred. We manually analyzed these 52 samples to determine the cause of each disagreement in order to assign truthful labels. There were three causes for disagreement on service installation between the DCL and Volatility output. Of these, the analysis revealed that DCL had applied the correct labels in the first two cases; in the last case, the analysis revealed that DCL mislabeled the sample for its service-installing capability.

1. Uninstall Before Snapshot (UBS): A subset of samples were labeled as service installing by DCL, but no service was detected in memory by Volatility. The reason behind this discrepancy is that Volatility only detects the services installed at the time of the single snapshot during which it parses the SUA's memory. Dione, on the other hand, continually monitors the disk, detecting every action. In these samples, the service mechanism is used to load driver code into kernel space. Once the driver is loaded, the service is uninstalled. Evidence of this can be found in the other Volatility logs, which show loaded drivers and modules matching the service name. As such, the resulting truth label applied to these 24 samples is service installing, and the DCL labels are correct.

2. Delete Before Reboot (DBR): This sample was labeled as service installing by DCL, but no service was detected in memory by Volatility. This sample used the service mechanism only to load code into memory, not for persistence; it created and deleted the service in rapid succession upon installation. This sample is labeled service installing, and the DCL labels are correct.

3. Installation After Reboot (IAR): These samples were labeled as service installing by Volatility, but not by DCL. These samples installed the services after reboot. They used the service mechanism to load code, then deleted the service immediately thereafter. As they do indeed install services, these samples are labeled as service installing, and DCL labels are incorrect.
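The two truth-assignment rules used above can be expressed as a small sketch. This is illustrative only; the boolean-label representation is a simplification:

```python
# Sketch of the cross-view truth-assignment rules: agreement between the
# two views is accepted as truth; disagreement is deferred to manual
# static/dynamic analysis. Label values here are simplified booleans.

MANUAL = "manual-analysis-required"

def assign_truth(dcl_label, volatility_label):
    """Rule 1: if the two views agree, the agreed label is truthful.
    Rule 2: otherwise, break the tie with manual analysis."""
    if dcl_label == volatility_label:
        return dcl_label
    return MANUAL

print(assign_truth(True, True))   # both views agree: label accepted
print(assign_truth(True, False))  # views disagree: tie broken manually
```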

There were four causes for disagreement between DCL and Volatility, present in 25 sample traces, regarding whether the installed service was automatically loaded after reboot. If the service successfully loads after reboot, it is considered a service-persisting sample.

4. Unload Before Snapshot (ULBS): These samples were labeled by DCL as service loading, but not by Volatility. The reason is similar to “Uninstall Before Snapshot”, above: Volatility operates on a single snapshot taken after the system boots up, and these samples had loaded and unloaded their service before the snapshot was taken. As a result, these samples are labeled persistent service loading, and the DCL labels are correct.

5. Fast File Deletion (FFD): These samples were labeled as service loading by Volatility, but not by DCL. Analysis showed that these samples created and deleted their files so quickly that the evidence was never flushed to disk. However, system call analysis shows that the files are indeed created, the services loaded into memory, and the original files then deleted. As a result, these samples are labeled persistent service loading, and the DCL labels are incorrect.

6. File Creation After Boot (FCAB): This sample was labeled service loading by Volatility, but not by DCL. This sample installed its service before reboot (verified by both DCL and Volatility), but did not create a service binary corresponding to the service until after reboot. Without a valid binary creation before system shutdown, the service cannot be used for persistence; thus we label this sample not persistent service loading, and the DCL labels are correct. However, we acknowledge that this definition of service persistence prevents us from labeling samples that load a service for any other reason, and address this in Section 5.3.4.

7. Temporary Service Creation (TSC): This sample was labeled service loading by Volatility, but not by DCL. This sample does not use the service mechanism to persist, only to load driver code into the kernel. It installs and loads the service, then deletes the service and its accompanying file. After boot, it repeats the process: installing and starting the service, loading the binary (the service is started by a third mechanism), then deleting all trace of its existence on disk. Since the installed service is not reloaded automatically after reboot, we label this sample not persistent service loading, and the DCL labels are correct. Again, we acknowledge a missed opportunity to label a service load used for non-persistence, and discuss this in Section 5.3.4.

5.3.4 Model Checker Results

Figure 5.1 presents the results of our experiments on VM0 in the form of a confusion matrix, with the labels applied by DCL compared to the actual labels. Of the 1,025 samples that produced valid traces on VM0, 197 of them were service installing. Of these 197 samples, there were 63 unique malware variants; 14 variants appeared multiple times in the corpus, with 5 malware variants appearing more than 10 times in the corpus. Table 8.1 in the Appendix lists all 197 samples that install services.

(a) Service Installing

                DCL: p   DCL: n   Total
   Actual P        196        2     198
   Actual N          0      827     827
   Total           196      829    1025

(b) Service Loading

                DCL: p   DCL: n   Total
   Actual p        152        7     159
   Actual n          0      866     866
   Total           152      873    1025

Figure 5.1: Confusion matrix for service-installing and service-loading labels applied to samples run on VM0 testbed.

DCL correctly labeled 98.9% (196/198) of the service installing samples; of the 827 non-service-installing samples, there were no false positives. This means that DCL correctly labeled 99.8% (1,023/1,025) of all samples in our corpus for the service install capability. This included 25 samples that were not correctly labeled by Volatility, and thus would not be labeled by an approach that relies purely on a single memory snapshot in time. Likewise, of the 1,025 samples that produced valid traces, 159 were service loading and relied on the Windows service mechanism for persistence. Our model successfully labeled 95.6% (152/159) of service loading samples, and did not falsely label any of the remaining 866 non-service-loading samples. Altogether, DCL correctly labeled service

MD5                               Service Name(s)     Install  Load  Cause
0038ee2524f8bc7cb329e01cad411f0f  Forter                 1      0    FFD
0061d7b4c7db34437695853252a82474  wowsub                 1      0    FFD
08cdc80a346508e6d57efe4a782a9531  PSSdk21                1      0    FFD
0ed50455c7ddece3b8989ff5f02dc442  abp470n5               1      0    FFD
0072c5497f7eae033ee9934492f17180  abp470n5               1      0    FFD
015c976f05bdf3942b9f998b9b1eb7e5  asc3360pr              0      0    IAR
025f8ecd28e85da68eb73b58b0d1b1c7  NdisFileServices32     0      0    IAR

Table 5.2: Samples (run on VM0) whose service capabilities were mislabeled by DCL, plus the labels assigned by DCL and the cause of the mislabel (according to Section 5.3.3).

loading samples in 99.3% (1,018/1,025) of samples. This included 43 samples not caught by our Volatility-based memory forensics (including the 25 for which Volatility did not catch the service installation). Table 5.2 lists all samples that were mislabeled by DCL on VM0.

In order to demonstrate the effectiveness of our models on any Windows XP system—not just the particular system on which we tested our samples—we also evaluated the samples on an entirely different testbed, VM1. This Windows XP VM had a different usage history, and as such, it had different software updates applied, different allocation on disk, different programs, and different fragmentation patterns. No traces from VM1 were considered in obtaining the domain knowledge used to develop the service models. We reused most of the service-installing samples, as well as many non-service-installing samples, from VM0. Out of 362 samples, we obtained 353 valid traces. Figure 5.2 shows the confusion matrix of our results from the samples run on VM1. Of the 353 samples, 153 installed a service. Some malware variants appeared more than once in the corpus; altogether, there were 51 unique variants that installed a service, with 3 variants appearing more than 10 times. Of the samples that installed at least one service, 133 loaded that service after reboot. DCL correctly labeled 99.3% (152/153) of the samples that were service installing, and 86.4% (115/133) of the samples that were service loading.
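As a sanity check, the headline VM0 service-install figures can be recomputed from the confusion matrix in Figure 5.1(a). This is a small illustrative computation, not part of DCL:

```python
# Recomputing the VM0 service-installing metrics from Figure 5.1(a):
# TP = 196, FN = 2, FP = 0, TN = 827.

def detection_rate(tp, fn):
    """Fraction of actual positives that were labeled positive."""
    return tp / (tp + fn)

def accuracy(tp, fn, fp, tn):
    """Fraction of all samples labeled correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

tp, fn, fp, tn = 196, 2, 0, 827
print(f"detection rate: {detection_rate(tp, fn) * 100:.2f}%")    # 196/198
print(f"accuracy:       {accuracy(tp, fn, fp, tn) * 100:.2f}%")  # 1023/1025
```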

(a) Service Installing

                DCL: p   DCL: n   Total
   Actual P        152        1     153
   Actual N          0      200     200
   Total           152      201     353

(b) Service Loading

                DCL: p   DCL: n   Total
   Actual p        115       18     133
   Actual n          0      220     220
   Total           115      238     353

Figure 5.2: Confusion matrix for service-installing and service-loading labels applied to samples run on VM1 testbed.

There were no false positives in either category, meaning that DCL correctly applied the service installing label 99.7% of the time, and the service loading label 94.9% of the time. Table 8.2 in the Appendix lists all 153 samples that install at least one service. Table 5.3 lists all samples that were mislabeled by DCL on VM1. As Table 5.3 shows, the low detection rate of service loading samples is due to one malware strain, which loads a service called amsint32. Because this variant appeared in the corpus 11 times, and the detection of its load was prevented by a fast creation-deletion cycle, it accounts for the majority of the false negatives. Altogether, across all 1,378 traces generated on VM0 and VM1, DCL correctly applied the service installing label to 99.8% (1,375/1,378) of traces, and correctly applied the service loading label to 98.2% (1,353/1,378) of traces. The confusion matrix for the results of all traces is shown in Figure 5.3.

Discussion of Results

Though DCL did not attain a 100% detection rate for service installations and service loads, the false negatives provide useful insight for malware researchers into stealthy malware

MD5                               Service Name(s)  Install  Load  Cause
0038ee2524f8bc7cb329e01cad411f0f  Forter              1      0    FFD
0061d7b4c7db34437695853252a82474  wowsub              1      0    FFD
08cdc80a346508e6d57efe4a782a9531  PSSdk21             1      0    FFD
0ed50455c7ddece3b8989ff5f02dc442  abp470n5            1      0    FFD
0072c5497f7eae033ee9934492f17180  abp470n5            1      0    FFD
01cf51be1a4bacb550c35e165d4453d4  amsint32            1      0    FFD
026896a7449afcdac48323afcd71d3c0  amsint32            1      0    FFD
02d275b6110444732f9bef39218d1997  amsint32            1      0    FFD
0685b2bf04e60c65be9bd1f667c07c4a  amsint32            1      0    FFD
0bc7b598af22bd4c7a496b25811dc362  amsint32            1      0    FFD
0d2970588384bed4bdb1221003b0a45a  amsint32            1      0    FFD
0eb98691c031f995c054375a1ebf89b7  amsint32            1      0    FFD
0f068f1d2b014b217773ffaaf79abec2  amsint32            1      0    FFD
0f36825bea6bf4967403dd9dd5f10a11  amsint32            1      0    FFD
0f5adb96c2a975648667451a50c13f28  amsint32            1      0    FFD
12ada88d49498a43cf4b9274f3fb586c  amsint32            1      0    FFD
02fe132fbd9657a60e9bca1b5a3fe747  aic32p              1      0    FFD
144cd9ace18008807329e5c6e7c336e6  aic32p              0      0    IAR

Table 5.3: Samples (run on VM1) whose service capabilities were mislabeled by DCL, plus labels assigned by DCL and the cause of the mislabel (according to Section 5.3.3).

(a) Service Installing

                DCL: p   DCL: n   Total
   Actual P        348        3     351
   Actual N          0     1027    1027
   Total           348     1030    1378

(b) Service Loading

                DCL: p   DCL: n   Total
   Actual p        267       25     292
   Actual n          0     1086    1086
   Total           267     1111    1378

Figure 5.3: Confusion matrix for service-installing and service-loading labels applied to all traces generated on VM0 and VM1.

behavior. As Tables 5.2 and 5.3 show, the most common reason for DCL to mislabel a sample is FFD: Fast File Deletion. The loading of services was missed in 23 sample traces: Forter, Wowsub, PSSdk21 (two traces each), abp470n5 (four traces), amsint32 (eleven traces), and aic32p (two traces). This labeling error occurs when the malware sample creates and deletes a file so quickly that the OS does not have a chance to flush the file to disk. Recall from Section 2.3.3 that it may take up to 8 seconds for data to be flushed to disk, but the liveness of the malicious driver files for these samples is less than one second, sometimes even less than 0.2 seconds. The malware uses the service mechanism not only to persist, but also because it provides a simple way to load malicious driver code into kernel space. Then, knowing that this leaves forensic evidence in the Windows registry and on disk, it deletes the file associated with the driver (the code is hidden elsewhere on disk, so that at the next reboot the file can again be created, mapped to memory, and deleted). Since the driver code has been mapped to main memory, the driver remains loaded, but without the forensic evidence. Unfortunately, this is a shortcoming of using a disk sensor as the only source of events, since it will, by definition, only catch traffic that is flushed to disk. However, the models are still valid in these cases, so combining a disk sensor with another type of sensor (such as a system call interposer) would catch these quick file creations and thus detect the service load.
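In principle, the flush-window evasion described above can be surfaced by flagging files whose create-to-delete interval is shorter than the flush interval. The sketch below operates on a hypothetical timestamped event trace; the event format and file paths are illustrative only:

```python
# Illustrative sketch: flag files whose on-disk lifetime is shorter than
# the OS flush interval (up to ~8 seconds per Section 2.3.3), so their
# contents may never reach the disk. Event format is hypothetical.

FLUSH_INTERVAL_S = 8.0

def short_lived_files(events):
    """events: list of (timestamp_s, op, path) with op in {create, delete}."""
    created = {}   # path -> creation timestamp
    flagged = []
    for ts, op, path in events:
        if op == "create":
            created[path] = ts
        elif op == "delete" and path in created:
            if ts - created.pop(path) < FLUSH_INTERVAL_S:
                flagged.append(path)
    return flagged

trace = [
    (0.0, "create", r"C:\WINDOWS\system32\drivers\evil.sys"),
    (0.2, "delete", r"C:\WINDOWS\system32\drivers\evil.sys"),  # lives 0.2 s
    (1.0, "create", r"C:\WINDOWS\system32\benign.dll"),
]
print(short_lived_files(trace))
```

Note that this check requires a sensor that sees the create and delete events themselves (e.g., a system call interposer), which is exactly what a disk-only sensor lacks in these cases.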

The mislabeling of the two remaining samples, asc3360pr and NdisFileServices32, was due to an overly precise model for our goal. While we aimed to model any service install, by adding the requirement that the installation happen before the system boot, we limited the number of service-installing samples we could detect. In future work, we will eliminate the reboot requirement from the service install model, and test for service persistence only in the service load model.

In the analysis of disagreements between Volatility and DCL, we encountered some interesting observations, even when the samples were correctly labeled by DCL. We discovered that several samples were not using the service mechanism for persistence, but rather because it provides a convenient mechanism to load malicious code into the kernel. Thus, we sometimes saw a service binary being loaded, even though the service was not being used to persist across reboot. Even though our model labeled these samples correctly as not loading a persistent service binary, it did point out a missed opportunity in capability labeling. Therefore, in future work, we will generalize the specification that detects persistent service loads to detect a service load under any circumstance. Then, we will divide samples with service-loading capabilities into two specialized models that label the samples based on their use of the service capability. The result will be two labels for the service loading capability: one for persistence, and the other for loading code into memory.

Performance of Model Checker

We timed our model checker using the Linux time command, and reported the user output (the time spent executing the process code in user space). We ran DCL on the system described in Section 5.3.1. Obtaining the capability labels for each of the 1,062 samples of VM0 (including those whose traces would be discarded) took 330 seconds, or 0.31 seconds per sample. Running DCL on the 362 samples of VM1 took 128 seconds, or 0.35 seconds per sample.

5.4 Labeling File Access Type

The successful use of a service as a persistence mechanism implies that the file associated with the service is loaded after a system boot. Analysis of Dione traces for program binary loads yielded an interesting observation: because the Windows loader grabs certain parts of the binary at different times, the ordering of disk sector accesses in a program binary load looks very different from a standard read (for example, a read for a copy operation) of that same file. Given this observation, we labeled a series of disk accesses as a program binary load, contrasting it with another type of disk read, the file copy.

5.4.1 Motivation

The intuition behind the labeling of a series of reads is apparent when looking at a visualization of disk reads. Figure 5.4 visualizes the disk reads encountered during the loading of four executables for execution. The x-axis represents (unitless) time, while the y-axis represents the file layout, noted as the offset (in sectors) within the file. Each bar represents a single disk read—there is one disk read per time unit, and each bar spans the sectors of the file that were read. The bars are overlaid across a visual representation of the sections of the binary—each background bar represents the offset range within the file in which the section resides.

As Figure 5.4 shows, individual disk reads tend to fall on boundaries corresponding to binary sections. The first access always encompasses the first two sectors of the binary, as these contain the binary header and thus a mapping between the offsets in the file and the other sections. As shown in Figure 5.4 (a) and (b), the loader tends to request the resource (.rsrc) section after reading the header. Often this is followed by reads to the .rdata section, before reading the .text section. These trends are present in many binary loading patterns; unfortunately, as shown in Figure 5.4 (c) and (d), this is not always the case. However, there are still some important observations. First, the disk reads often do not start where the last access stopped. Also, there tends to be some overlap in the sectors read; some sectors are read more than once. Additionally, the accesses tend to be non-uniform in size; that is, the standard deviation of the access size is fairly large. Finally, there may be sectors (even sectors in the middle of the binary) which are not read at all.

By contrasting Figure 5.4 with Figure 5.5, which visualizes the common disk read pattern

[Figure 5.4 shows four panels plotting disk reads as Offset in File (Sector) against (unitless) time, overlaid on the binary sections (Headers, .text, .rdata, .data, .rsrc, .reloc): (a) Hydraq malware service executable (Rasmon.dll); (b) Darkshell malware service executable (regedit32.exe); (c) Dishingy.F malware service executable (alg.exe); (d) the Solitaire executable.]

Figure 5.4: Visualization of disk access patterns for loading program binaries.

for a program copy, it is apparent how different a program load looks from a copy of that same program. In a copy, the accesses do not correspond to section boundaries. The reads tend to occur as a linear sweep across the file, with one disk access picking up precisely where the last one left off. There is no overlap between accesses. Aside from the initial read of the header, the reads are fairly uniform in size, and the number of accesses required to copy the whole file is fairly small. Finally, there are no redundant reads—each sector is read precisely once.
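The linear-sweep property of a copy described above can be checked mechanically. The sketch below operates on hypothetical (start sector, length) read pairs and is illustrative only:

```python
# Sketch of the "linear sweep" test: in a copy, each read begins at the
# sector immediately after the last one read, with no overlap or gaps.
# Read lists below are hypothetical (start_sector, n_sectors) pairs.

def is_linear_sweep(reads):
    expected = reads[0][0]
    for start, length in reads:
        if start != expected:
            return False
        expected = start + length
    return True

copy_reads = [(0, 8), (8, 64), (72, 64), (136, 44)]  # each read picks up where the last left off
load_reads = [(0, 2), (120, 30), (60, 20), (0, 40)]  # jumps around and re-reads sectors
print(is_linear_sweep(copy_reads), is_linear_sweep(load_reads))
```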

[Figure 5.5 shows two panels plotting disk reads as Offset in File (Sector) against time, overlaid on the binary sections: (a) the copy of the executable associated with the service of the Hydraq malware (Rasmon.dll); (b) the copy of the executable of the game Solitaire.]

Figure 5.5: Visualization of disk access patterns for copying program binaries.

It is these observations that motivated our model of disk access patterns. Unfortunately, as is clear from even the few examples of Figures 5.4 and 5.5, the patterns are different enough that a model checking approach will not work accurately. While there are trends present in the differences between loads and copies, the patterns are still varied enough that if we were to specify a load given a specific—even common—pattern, we would miss the loading of other samples. However, if we generalize the model in order to avoid false negatives, as we eventually did in Section 5.1.3, we end up with a specification so generic that it actually models any file access. For that reason, we chose to bolster our file access model with a machine learning classifier, which classifies a series of disk reads as belonging to either a file load or a file copy.

5.4.2 Program Binary Load Classifier

In order to classify a series of disk reads as a program binary load, as opposed to a program copy, we chose the two-class Support Vector Machine (SVM) classifier [22]. The SVM classifier is a supervised learning algorithm; that is, it trains on a set of labeled data, generating a model that is then used to classify new, unlabeled samples. The problem can be modeled easily as a two-class classification: our first class is a program load, and the second class is a program copy. The Linear SVM algorithm is a simplified version of the SVM problem. With this algorithm, each data point xi is represented by a d-dimensional vector (where each value in the vector is a feature), and the goal of the training phase is to find the best way to separate the points into two classes with a hyperplane of dimension d − 1. Any hyperplane can be written as the set of points x satisfying w · x − b = 0, where · is the dot product and w is the normal vector to the hyperplane. The best hyperplane is the one that results in the largest margin between the two classes (any points lying on the margin are referred to as support vectors). A representation of an optimal hyperplane splitting two classes of labeled data with a maximum margin is shown in Figure 5.6.

Since the hyperplane in the Linear SVM algorithm must be linear, nonlinear SVM provides an algorithm to find a nonlinear separating surface in the original feature space. With this method, the dot product is replaced with a nonlinear kernel function, and the algorithm determines the maximum margin in a transformed, nonlinear, high-dimensional feature space. One such kernel is the Gaussian Radial Basis Function (RBF), defined by Equation 5.5.

Figure 5.6: The optimal hyperplane of an SVM classifier on labeled data.

k(xi, xj) = exp(−γ ‖xi − xj‖²),  γ > 0    (5.5)
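Equation 5.5 can be written out directly; a minimal sketch in plain Python, where the vectors and γ value are arbitrary examples:

```python
# The Gaussian RBF kernel of Equation 5.5. x_i and x_j are equal-length
# feature vectors; gamma must be positive.

import math

def rbf_kernel(x_i, x_j, gamma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # identical points -> 1.0
```

The kernel decays toward 0 as the two points move apart, at a rate controlled by γ; this is the similarity measure the nonlinear SVM substitutes for the dot product.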

5.4.3 SVM Classifier Implementation

We used the SVM classifier from the scikit-learn open source Python library [65]. We wrote Python code to process Dione traces of disk reads and, for each series of reads to the same file, generate a feature vector. The scikit-learn library provides several kernels for the SVM classifier; we found that the Radial Basis Function (RBF), described in Equation 5.5, worked best on our data.

We generated labeled training data from several sources on VM0. For the training data generated from load disk accesses, we created a list of binaries that were loaded at boot time (including drivers and services), and extracted these disk reads from boot logs generated by Dione. We also created a corpus of executables, including Windows executables, malware, and third-party software, and generated a script to run each of these executables from the command line. It is important to use the command line because, when using Windows Explorer, Windows may prefetch binary data from disk in anticipation of a read request. For training data that was accessed with a copy, we generated a corpus of Windows system executables, Windows drivers, and third-party applications. Again, we generated a script that copied each of these files via the command line. Each trace was processed with our Python script, and labeled appropriately as either load or copy.

Each file access is composed of a series of one or more disk reads, as was visualized in Figures 5.4 and 5.5. After processing the traces into objects of disk accesses, we constructed a feature vector for each file access. The feature vector comprised the following features:

1. Consecutive Accesses: We count the number of accesses that occur to the next consecutive sector, that is, to the sector that immediately succeeds the last sector accessed. We normalize this number by dividing it by the total number of accesses, resulting in the percent of accesses that are to the next consecutive sector, and then bucketize the value into one of 10 buckets, each representing 10 percentage points.

2. Skip Amount: We calculate the average number of sectors that are skipped between accesses: that is, how far before or after the next consecutive sector each access falls. We normalize this value by dividing by the total number of sectors in the file, resulting in the average percentage of the file that is skipped over between accesses. We then bucketize the value into one of 20 buckets, each representing 5 percentage points.

3. Average Access Size: We calculate the average size of all accesses, as a percent of the whole file. We bucketize the value into one of 10 buckets, each representing 10 percentage points.

4. Access Size Standard Deviation: We calculate the standard deviation of the access size over all accesses, with each access represented as a percentage of the whole file. We bucketize the value into one of 20 buckets, each representing 5 percentage points.

5. Percent of File Accessed: We calculate the percentage of the file that was accessed at least once. We bucketize the value into one of 10 buckets, each representing 10 percentage points.

6. Overlapping Accesses: We count the number of accesses that access at least one sector that has already been accessed. We normalize this number by dividing it by the total number of accesses, resulting in the percent of accesses that touch a previously accessed sector. We then bucketize the value into one of 20 buckets, each representing 5 percentage points.
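To make these six features concrete, the following is a minimal sketch of how such a feature vector might be computed from a list of (start sector, size) disk reads; the function names and exact normalization details are our own illustration, not the dissertation's implementation.

```python
# Hypothetical sketch of the six-feature vector described above; names and
# normalization details are illustrative, not the dissertation's actual code.
def bucketize(value, buckets):
    """Map a value in [0, 1] to one of `buckets` equal-width buckets."""
    return min(int(value * buckets), buckets - 1)

def feature_vector(reads, file_sectors):
    """`reads` is a list of (start_sector, num_sectors) reads to one file."""
    n = len(reads)
    consecutive = overlapping = 0
    skips, seen, prev_end = [], set(), None
    for start, size in reads:
        if prev_end is not None:
            if start == prev_end:          # access to the next consecutive sector
                consecutive += 1
            skips.append(abs(start - prev_end))
        sectors = set(range(start, start + size))
        if sectors & seen:                 # touches a previously accessed sector
            overlapping += 1
        seen |= sectors
        prev_end = start + size
    sizes = [size / file_sectors for _, size in reads]
    mean = sum(sizes) / n
    std = (sum((s - mean) ** 2 for s in sizes) / n) ** 0.5
    avg_skip = (sum(skips) / len(skips) / file_sectors) if skips else 0.0
    return [
        bucketize(consecutive / n, 10),             # 1. consecutive accesses
        bucketize(avg_skip, 20),                    # 2. skip amount
        bucketize(mean, 10),                        # 3. average access size
        bucketize(std, 20),                         # 4. access size std deviation
        bucketize(len(seen) / file_sectors, 10),    # 5. percent of file accessed
        bucketize(overlapping / n, 20),             # 6. overlapping accesses
    ]
```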

5.4.4 Results

Training and Testing

We created 918 labeled samples gathered from traces on VM0 for training and evaluation. The training data is comprised of 737 samples labeled copy and 179 samples labeled load. In order to evaluate our algorithm, we performed 10-fold cross validation. That is, we randomly partitioned the sample set into 10 subsets, and used 9 of the subsets for training and 1 subset for testing. We performed this test 10 times, with a different random partition each time, in order to evaluate our SVM algorithm. Each resulting label is categorized as one of the following: True Load (TL), False Load (FL), True Copy (TC), or False Copy (FC). The results of the 10 rounds of Cross Validation (10CV) testing are shown in the first 10 rows of Table 5.4. Additionally, we generated labeled traces on VM1, as specified in Section 5.3.1, to be used purely for testing. That is, we trained our classifier on all 918 samples gathered from VM0, then tested the classifier on traces from VM1. This dataset, labeled VM1, consists of 530 samples, of which 396 are labeled copy and 134 are labeled load. The goal of this experiment is to show that the classifier is independent of the testbed on which the training traces are generated; that is, that it is universally capable of classifying new samples instrumented on any Windows XP SP3 system. The results of the evaluation of the VM1 traces are also shown in Table 5.4, with “TestID” VM1.

TestID     TL    FL    TC    FC    % Mislabeled
10CV-0     12     0    78     1    1%
10CV-1     16     0    72     3    3%
10CV-2     21     1    67     2    3%
10CV-3     14     0    73     4    4%
10CV-4     18     0    72     1    1%
10CV-5     13     1    74     3    4%
10CV-6     17     0    70     4    4%
10CV-7     19     0    67     5    5%
10CV-8     12     0    73     6    7%
10CV-9     15     0    72     4    4%
10CV-Avg   15.7   0.2  71.8   3.3  3.5%
VM1        122    8    388    12   3.8%

Table 5.4: Results of SVM classifier for 10-fold cross validation (10CV) on the Testbed 0 dataset and for the VM1 dataset.

As Table 5.4 shows, an average of 3.5% of samples were mislabeled across the rounds of the 10-fold cross validation, and 3.8% of samples were mislabeled in the VM1 dataset, with mislabels occurring predominantly as loads mislabeled as copies. Across both tests, there were 55 samples that resulted in a mislabel. If we look at the samples that caused the mislabeling of loads as copies, the reasons behind the mislabels fit what we intuitively expect. The three causes of sample mislabeling are:

1. Single or Double Read: By far the most common source of mislabeled samples, 80% (44/55) of the samples had an access pattern consisting of only one (33) or two (11) disk reads. As we can intuit, there is not enough information to reliably classify an access pattern consisting of one or two reads.

2. Large Files: The second category of mislabeled samples comprises files that are very large; this accounted for 7.3% (4/55) of the mislabeled samples. Large files require so many accesses to disk that, even with some non-sequential and overlapping reads at the beginning of the file access, the loader eventually resorts to many uniform sequential reads as it reads in the bulk of the program binary. The result is that the access overall looks more like a copy as, among other things, the percentage of non-sequential and overlapping accesses approaches zero, and the standard deviation of access size is small. This effect is visualized in Figure 5.7.

3. Windows Drivers: The last category of mislabeled samples is Windows drivers; this category accounted for 9.1% (5/55) of the mislabeled samples. Each of these samples had between three and five disk reads, and yet each read was sequential through the file, roughly uniform in size, did not overlap, and ignored section boundaries. The drivers presenting this problem have a Start value of 0x1, or SERVICE_SYSTEM_START. These driver services, which are set to load at system start, are loaded by the I/O Manager, from the file ntoskrnl.exe. Alternatively, other services and drivers that are set with Start value SERVICE_AUTO_START or SERVICE_DEMAND_START are loaded by a different loader later in the boot process, from the file services.exe. Unfortunately, the loader in ntoskrnl.exe appears to load files by grabbing sequential blocks of data from disk, and the effect is that the load looks exactly like a common copy operation. Thus it would be very difficult for any classifier to correctly label this as a load. An example of this case, the loading of driver MRxSmb, is visualized in Figure 5.8.

Given the difficulty of identifying a file access type given a pattern of only one or two accesses, or when the loader itself uses an algorithm that resembles a standard file copy, we are confident in asserting that even manual labeling by an expert analyst would result in a similar number of mislabeled accesses.
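For concreteness, the training and 10-fold evaluation procedure described above can be sketched with scikit-learn roughly as follows; the feature vectors below are synthetic stand-ins for the six bucketized disk-access features, since the real Dione-derived training data is not reproduced here.

```python
# Sketch of the evaluation setup: an RBF-kernel SVM evaluated with 10-fold
# cross validation, as in Table 5.4. The data is synthetic: "copy" accesses
# are highly sequential and uniform, "load" accesses skip around and overlap.
import random

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

random.seed(0)
X = [[random.randint(8, 9), 0, random.randint(6, 9), 0, 9, 0]
     for _ in range(90)]
y = ["copy"] * 90
X += [[random.randint(0, 4), random.randint(3, 9), random.randint(0, 3),
       random.randint(4, 9), random.randint(3, 8), random.randint(2, 9)]
      for _ in range(90)]
y += ["load"] * 90

clf = SVC(kernel="rbf")                      # Radial Basis Function kernel
scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross validation
print("mean accuracy: %.3f" % scores.mean())
```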

(a) File load pattern for malware sample MD5 0x809705b2f15f33193fb29d204efd7736. (b) File load pattern for malware sample MD5 0x815350b4f362b7fa0fd192b5a173ce5f. (Both panels plot Offset in File (Sector) against Time.)

Figure 5.7: Visualization of disk access patterns for loading program binaries.

MRxSmb: Load. (Plot of Offset in File (Sector) against Time, with section boundaries marked: Headers, .text, SECUR, .rdata, .data, PAGE4BRO, PAGE5NET, PAGE, INIT, .rsrc, .reloc.)

Figure 5.8: Visualization of disk access patterns for Windows driver mrxsmb.sys.

Evaluating Malware Service Loads

Finally, having demonstrated that it is possible to classify a file access based on its disk read pattern, we set out to demonstrate that DCL can label malware that utilizes the service persistence mechanism to automatically start on a system boot. We accomplished this by bolstering our model for service load with our SVM classifier, with the goal of demonstrating that the service does indeed load at the start of system boot. We trained the classifier using the same training data discussed in Section 5.4.4, and then integrated the SVM classifier with DCL. After the model checker has determined whether a service has been installed, which binary is associated with the service, and whether that binary has been accessed during boot, the classifier then attempts to label that file access as a load. This procedure was used on all samples from VM0 and VM1. Ideally, it would detect every access as a load, since none of these malware samples utilize the copy mechanism (though one could conceive of a malware sample which would perform a file copy/delete operation in order to move around its malicious payload). In actuality, on VM0 and VM1 there were 28 and 27 samples, respectively, for which the service binary was labeled as accessed but was not labeled a binary load. However, these two testbeds had many samples in common, and it turned out that the same 27 samples were mislabeled in both datasets (with one additional mislabel present in VM0 but not VM1). In summary, the load classifier mislabeled 28 samples. For VM0, 124/152 accesses were correctly labeled a load, and for VM1, 88/115 were correctly labeled a load. Of these 28 samples, 19 were the same variant (Koutodoor), and 3 others were another variant (Sality). So, more precisely, only 8 malware variants were incorrectly labeled. As with the previously-mislabeled samples, there were two causes of mislabeling. Of the 28 mislabeled samples, 8 were mislabeled because of the Single Read problem described in Section 5.4.4. The other 20 samples were mislabeled because of a combination of the Double Read and Windows Driver problems. Of these 20 samples, 19 were the malware

variant Koutodoor. Each of these 19 Koutodoor samples had a file access pattern consisting of only two or three reads; these two access patterns are visualized in Figure 5.9. Having only two reads of information would already present a challenge to the classifier, but the problem is made even more challenging because the file to be loaded is a driver, and Windows driver loads follow the pattern of a file copy rather than a file load. Again, we posit that an expert would be unlikely to perform better, given the disk access patterns that were present in the mislabeled samples.
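The integration of the classifier with DCL described above might be sketched as follows; all names here, the model checker result fields in particular, are hypothetical stand-ins for DCL internals.

```python
# Hypothetical sketch: the SVM is consulted only for service binaries that the
# model checker has already tied to an install and a boot-time access.
def label_service_persistence(sample, model_checker, classifier, featurize):
    result = model_checker.check(sample)        # install / reboot / access phases
    if not result.service_installed:
        return "no-service"
    if not result.binary_accessed_during_boot:
        return "service-installed"
    reads = result.boot_reads                   # disk reads to the service binary
    label = classifier.predict([featurize(reads)])[0]
    return "service-loaded" if label == "load" else "service-accessed"
```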

(a) Visualization of the 2-read disk load pattern for a Koutodoor sample. (b) Visualization of the 3-read disk load pattern for a Koutodoor sample. (Both panels plot Offset in File (Sector) against Time.)

Figure 5.9: Visualization of the two observed disk access patterns of the load of the driver associated with malware Koutodoor.

The remaining sample, called Hupigon, is also installed as a driver. Its file access consists of only two reads, which are both sequential and non-overlapping, as shown in Figure 5.10. While each of these samples is mislabeled under pure SVM classification, the algorithm could be modified to automatically label as loads those file accesses that consist of a single read, as well as those of drivers with Start values of 0x1 (SERVICE_SYSTEM_START), since this information is already available to DCL when performing the classification. This would

eliminate the false negatives at the risk of future false positives.
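A sketch of this suggested refinement, with hypothetical names, assuming DCL exposes the service's Start value alongside the read pattern:

```python
SERVICE_SYSTEM_START = 0x1   # drivers loaded by the ntoskrnl.exe loader

def label_access(reads, service_start, classifier, featurize):
    # Domain-rule overrides for the two cases the SVM cannot distinguish,
    # falling through to the SVM prediction otherwise.
    if len(reads) == 1:
        return "load"        # a single read carries too little information
    if service_start == SERVICE_SYSTEM_START:
        return "load"        # this loader's read pattern mimics a file copy
    return classifier.predict([featurize(reads)])[0]
```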

Hupigon: Load. (Plot of Offset in File (Sector) against Time.)

Figure 5.10: Visualization of the disk access pattern for the driver file associated with malware Hupigon.

Chapter 6

Directions for Future Work

This dissertation introduces the concept of capability labeling for persistence mechanisms. There are many directions in which the work could continue from here. First of all, there are many persistence mechanisms that are capable of being modeled from Dione logs. Modeling of additional persistence capabilities could include, but is not limited to, auto-start locations, DLL search-order hijacking, and slack-space detection. Auto-start locations are used by Windows to automatically load binaries on system start; the Windows service mechanism can be considered one type of auto-start location. In addition to the service mechanism, there are dozens of keys in the registry which store the paths of binaries that should be loaded on system boot. By modeling the creation of these specific registry keys, a matching file creation, the system reboot, and the post-boot loading of the relevant binary, this persistence capability could be labeled in malware samples. Another auto-start technique involves simply storing the binary, or a soft link to the binary, in certain directories. On boot, Windows loads any binaries in these directories. A model to detect this capability would consist of a file creation or move in the specific directories, followed by reboot and successful load. DLL search-order hijacking is another technique used by malware writers to ensure

their code gets loaded, though it may occur at any time, not necessarily at system boot. A problem arises because many programs do not list an absolute path for the dynamic libraries on which they depend. When a program starts, Windows searches for each library in a pre-defined order: for example, it will first look in the working directory of the binary, then in a series of pre-defined directories. If malware installs a file with the name of the expected DLL in a directory that is searched before the directory containing the legitimate DLL, Windows will load the malicious DLL instead. A model for DLL search-order hijacking would be developed based on the names and absolute paths of files created or moved to a new location. Another area ripe for modeling is detecting slack-space writes. Malware will often write its malicious code into unused sectors of the disk; malware authors may even develop entire file systems to store their malicious code. This code is loaded by another mechanism (for example, the malware may modify the Master Boot Record, forcing it to load the code from the slack-space locations on disk). Before this persistence capability could be modeled, Dione would need to be extended to detect reads and writes to slack space. Then, the specification would model the write to slack space, likely a mechanism to load the file from slack space (such as a write to the MBR), followed by reboot and the loading of code from slack-space sectors. Another avenue of future research is to develop models for these same persistence capabilities on other Windows systems. Our models were verified on, and worked very well for, 32-bit Windows XP SP3. Given that Windows XP will be phased out in favor of newer versions (e.g., Windows 7, Windows 8), it makes sense to evaluate and modify (if necessary) the models to detect persistence capabilities in malware running on those operating systems as well.
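As an illustration of how one such specification might be prototyped, the following is a minimal sketch of the auto-start-location model as a check over an ordered trace of (event, detail) pairs; the event names and the Run-key path are illustrative, and the dissertation's actual models are expressed in LTPL rather than code.

```python
# Illustrative check of the auto-start specification: Run-key creation,
# reboot, then a post-boot read of the named binary. Event names and the
# registry path are hypothetical stand-ins for Dione trace events.
RUN_KEY = r"hklm\software\microsoft\windows\currentversion\run"

def check_autostart(trace):
    """Phases: (1) a Run-key value is created naming a binary, (2) the
    system reboots, (3) that binary is read from disk after the reboot."""
    binary = None
    rebooted = False
    for event, detail in trace:
        if (event == "registry-create"
                and detail.lower().startswith(RUN_KEY)
                and "=" in detail):
            binary = detail.lower().split("=", 1)[1]   # value data: binary path
        elif event == "reboot":
            rebooted = binary is not None
        elif event == "file-read" and rebooted and detail.lower() == binary:
            return True    # the persistence mechanism succeeded
    return False
```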
Additionally, the models could be evaluated for 64-bit systems, to see if they behave differently.

Chapter 7

Thesis Summary and Contributions

In this work, we described persistence capabilities and how malware uses them not only to persist on disk, but to automatically start once the system boots. We argued that labeling malware samples by the capabilities they possess, rather than by their family or variant, is useful in providing valuable information for developing a strategy for removing malware and preventing its spread to other systems, and also provides a tangible benefit to security researchers in need of a labeled corpus of malware samples. The primary goal of this dissertation was to demonstrate the effectiveness of persistence capability labeling using high-integrity traces of file system and registry events. To that end, the contributions of this dissertation are as follows:

• We developed and implemented Dione, a disk I/O monitoring and analysis infrastructure that provides descriptive, high-integrity traces of file system and registry activity for a system-under-analysis (SUA). We discussed the challenges of developing an interposer that would convert raw, low-level metadata to high-level file system and registry operations. We discussed the particular challenges of instrumenting

NTFS, the notoriously complex closed-source file system used by modern Microsoft Windows computing systems, and we explained how we bridged the semantic and temporal gaps to result in descriptive file system traces.

• We evaluated both the performance and the accuracy of Dione, demonstrating that Dione provides 100% accuracy in reconstructing file system operations. Despite this powerful instrumentation capability, Dione has a minimal effect on the performance of the system. For most tests, Dione results in a performance penalty of less than 10%—in many cases less than 2%—even when processing complex sequences of file system operations.

• We implemented DCL: The Dione Capability Labeler. We modeled the three phases needed to demonstrate persistence: Installation, system boot, and binary loading. We chose the Windows service mechanism as our persistence capability to model, due to its common use, its dangerous side effects, and its complexity to model. Using domain knowledge, we generated specifications for the install of a Windows service, a reboot, and a file access, and expressed these specifications using Linear Temporal Predicate Logic (LTPL). We implemented a behavioral model checker, which extracted events

from a Dione trace and compared them to the states of the modeled specification.

• We evaluated our model checker on over one thousand real-world malware samples, and found that it correctly applied the service installing label to over 99% of the traces, and the service loading label to over 98% of the traces. And yet, even when DCL mislabeled a sample, it yielded new and interesting insight into the stealth behaviors employed by malware.

• Understanding that our model for a program was too generalized, we supported it with a Support Vector Machine (SVM) classifier. We visualized disk access patterns, and showed that the disk reads presented different patterns depending on whether the file access was a binary load or a program copy. We used this information to generate

feature vectors for our SVM classifier. We evaluated the classifier using 10-fold cross validation from labeled samples generated from one testbed, then tested it on data generated from an entirely different testbed. In both cases, we reported less than 4% of samples were mislabeled. We applied the same classifier to our malware corpus, and showed that it correctly labeled 79% of the file accesses as loads. We showed that the high rate of mislabels was due to one particular variant of malware that appeared multiple times. Finally, we discussed the causes behind all mislabeled samples in all three tests, and explained how even an expert analyst performing manual analysis of the traces would be unlikely to have a lower error rate.

Chapter 8

Appendix

8.1 Tables

Table 8.1: Traces gathered from VM0 labeled by Dione as service installing; that is, they install at least one service.

MD5 Service Name(s) Loaded

08ced09a00dd0940fde58c06aebc7ce1 6to4 1

0cc18acc6d1d65d638b1fa3842761cd5 servernabs4 1

10b155861d8db8bb4f5974b1221207c2 wmicucltsvc 1

0b886590d1a62ffd93583993971d22c8 nwcworkstation 0

a43b0d7a6cf8bd85beebffd85cc56740 winhelp32 1

0038ee2524f8bc7cb329e01cad411f0f forter 0

0e0555bafe4fd3c04dab4ac94c65c602 npf 1 kabsctch28278241470203 1 a2769b11fb509d5d136f5b0d8f1765d4 kapfa 1



a220f4d07d56e2ef6b9dfccd1cd20543 npf 1

089c2785dc08ae217bd0b6f796c10551 mswindows 1 a5e1533b7c58a1b66cb5579c95a3d3c8 mswindows 1 a2d1fb9c7ae9442635ec1c09a8ce72e2 asc3360pr 0

0a852ff18a07539b18f3bf0e50577d66 npf 1 a20d082368334bda4e9724bf13d22002 261d7905 1 lxrqlvb 0 069c3ee1c2251f36633b24312fbab119 jmtst 1

11a5b416c137601753cef2af6e0e81f3 npf 1

0a417e05aacdff2d6f18670e2cf465a1 wmicucltsvc 1 lxrqlvb 0 006cd94a0d2b6506f924d7062c7f7b19 jmtst 1

09040735f1fc5acc8805583d869ddf49 npf 1

153623fcff9d5e57a098d0ce09637d1c npf 1 eywqojgbztrljdbw 0

012d45a4d6fae317ea11bab576bf8633 gbytqljd 0 nthook 0

0a5c8fab8537fe4804c6d485307e1064 sshnas 1

0b6b39890e1b60a4a4b134277431384d nwcworkstation 0 lxrqlvb 0 11c8b2d430cf08612be40015e5986775 jmtst 1 a11dd7c18905e40551123d7ae2bfece6 serahost 1

1500fe465ce684b153b34d771a1d48e4 dnservice 1




0c8ac3a3c592409b34108ece2c37cd3f npf 1 lxrqlvb 0 006f8e5ccbee29a2cd5dab8a43f8a496 jmtst 1

136e57e2213cc8d9a614f3daabf64c34 rpclookup 1

0813d5fa325caa7cd932b4bd1ddec3b8 npf 1 wuomgezwrojhbztr 0

0138a435b6d6eae4429afe3cc84a0cb5 nthook 0 trljdbvt 0

eywqojgbztrljdbw 0

3a3d624f78c306b200ff4e05247cf66e nthook 0 trljdbvt 0

12502c798a1f2c86bce55f5662befe30 6to4 1

1167ad095a5167613db6f1772d78e0db ndisfileservices32 1

0925c10addf72ab0aa79ddf7b0e3da16 npf 1 svkp 1 11080175b453b16a79ffad80f1463d44 adsl 1

lxrqlvb 0 07095643ce2e77d0cf95e3da4a3b0fad jmtst 1

0bc17bd9bea675932a93cff5c81b92fd cdralw 0 e3046863 1 0e8403a5ba76118e898a4ff77087f862 a12c759a 1

eywqojgbztrljdbw 0

3a28f6dd414827004b3f91d9c1e17ced nthook 0



olgeywqo 0 a30a8c8a9d73ae58d2188d32293445ac npf 1

07f98c9f11744ef73c991cf320eecc35 npf 0

09762d12f1cc882436990f4188308f7c npf 1

0b3e46bb919dac00845a8b6941de11ce 6to4 1

156160432e696b0008db0f954544deb8 system information n321 1 a19389ca75ca11d72442662e43c18541 npf 1

0061d7b4c7db34437695853252a82474 wowsub 0

000fcab138a015cc63b6ad84fb6c0f67 npf 1

0bfea159ebff886a381158dc5ec2b841 17872078 1 a19f08b61c0330da67e8ca1bfb6859a0 nvmini 1

0b65ee8ded5c9fd4306efa11324a7105 network adapter events 1

072b7a011d60593e647eb885ee76316a npf 1 a1d1479ca0a4ae87d7c381538210563e mswindows 1

02fd70e0c08aac516b545d790a770846 nwcworkstation 1

0c623f30e5d60b938afc05c7572eee83 npf 1

13b4fd1595decd18fd0029749f6b0635 npf 1

12515cf32218016a3010805241440dd8 svrwsc 1

06af40767d2e2b63dd40333fed474aa0 mswindows 1

0aa70c1dea42cf425e298d5a71553c17 npf 1 bztrljebwuomgeyw 0

3b452fd0cefabc61d8af7de26f432f6b nthook 0 olgeywqo 0




0c7a64bc4b7371bd5e428fc797e46ab1 ndisfileservices32 0 lxrqlvb 0 016e17c8670eed81d03119d968905057 jmtst 1

12608baf20f111a91c473cbc57fae9ad bord 007 1

0fecf8dae7b58c012e8d0ed816f186c1 mswindows 0

13757d8fd8ee42a20b21f5bd8e56ee82 mswindows 1

01ab74292433b59b8edbfdba8ba51f17 npf 1

09285219e6f361f89ecd25abec02a4d0 osevent 1

0f89ef3397417ba569336a6bbe9a3ba9 npf 1

14fe7fa2c3cc308087b4b09b2ea05751 allowstop 1 a62d37e832fb9b291d8385d043bae510 03f4745e 1 a52a4ee6f29cf99cccb19b7517553ea9 6e4779c5 1

0072c5497f7eae033ee9934492f17180 abp470n5 0

06f9added57c0987bdea3b9196393cff uu0tjp9k 1

0f068f1d2b014b217773ffaaf79abec2 amsint32 0 a01135e7260cf8397bc2f4bb9b8210e4 npf 1

0f4a96ba2ec5ee4cf6ea456dece4eee7 lfdl 0 lxrqlvb 0 102306eda101676228af65e6c1a0c8f5 jmtst 1

ljebwtomgeywrojg 0

106c89ac834064c6957f2c8b97777355 dbvtolge 0 nthook 0 a46ed9d45e9fff42cc4a817d42c9719d npf 1




a63a40a09137e06fc2ae40d40ca22f00 npf 1

0ed50455c7ddece3b8989ff5f02dc442 abp470n5 0 cdralw 0 0093c98d8f04d3bc60041783dc63e221 ndisfileservices32 1

a0fc9b78a843e90c777123146dcd921b npf 1 lxrqlvb 0 09eb5b25f156bb0df7873d7615249212 jmtst 1

107bab0f05ce3ceba6d1a05f062e20d6 ndisfileservices32 1

06e5be21c9f649e24d37969bf15c367c driver 0

077065b0da8266eac9aae4715fe70245 6to4 1

008fed18ab661bfcd26f422c26f1789e rpcremote 1

Table 8.2: Traces gathered from VM1 labeled by Dione as service installing; that is, they install at least one service.

MD5 Service Name(s) Loaded

0ada410bc95d8d3dbe2e143f01edd617 npf 1

0061d7b4c7db34437695853252a82474 wowsub 0

11a5b416c137601753cef2af6e0e81f3 npf 1

111428cf955378b63f7b593f9fc80833 npf 1

025d4c909c5e6b83c011fc331f62f0c8 svschost.dll 1

0925c10addf72ab0aa79ddf7b0e3da16 npf 1




0903221e156b71bb50e80990d3fa5abc npf 1 xkqef 0 0283667e42491442a6cba01187b3db3a tig 1

13a9d4896dd289a72750d6ee847d9356 npf 1 ezwrpjhbztrmjecw 0

3ba8960f956cd77a6566d943a175bc8e nthook 0 wrojhbzt 0

0a053b10ea328aae5a9a12e464f1b4ab npf 1

136e57e2213cc8d9a614f3daabf64c34 rpclookup 1

0bc17bd9bea675932a93cff5c81b92fd cdralw 0

1148bb3fc00f3f62523e798fd7e3e055 npf 1

09bdb377e700d0d50bae68ee528f561a 1c10007c 1 xkqef 0 069c3ee1c2251f36633b24312fbab119 tig 1

3b79532a38ebb1c0de2de639a9e1398f microsoft device manager 1

0c3d0d2e90dc3531cfcffcaca6347e05 6to4 1

0f29b8855a64c171c4ffdc2612c91f9e class file redirector discovery 1 urmkecwuomhezxrp 0

012d45a4d6fae317ea11bab576bf8633 ljebwuom 0 nthook 0

ecwuomhezwrpjhbz 0

3a28f6dd414827004b3f91d9c1e17ced mgeywroj 0 nthook 0




0b6b39890e1b60a4a4b134277431384d nwcworkstation 0

0b886590d1a62ffd93583993971d22c8 nwcworkstation 0

09040735f1fc5acc8805583d869ddf49 npf 1

0bfea159ebff886a381158dc5ec2b841 17872078 1

02d275b6110444732f9bef39218d1997 amsint32 0

0cf7a24db3a4d3abc10299d76038cab6 oyglqecx 1

07b8c7c08ca21865ad6d1cfbd1fc37a6 ndisfileservices32 1

001cd9f69812b1f164c2a463055a7aca mswindows 1

06f9added57c0987bdea3b9196393cff i6rfdkoe 1 uwnoyab 0 07095643ce2e77d0cf95e3da4a3b0fad gngf 1

02fd70e0c08aac516b545d790a770846 nwcworkstation 1

0ecd7673fbecf130c7063de126b344b6 mswindows 1

0cc18acc6d1d65d638b1fa3842761cd5 servernabs4 1

0aa70c1dea42cf425e298d5a71553c17 npf 1 cdralw 0 0093c98d8f04d3bc60041783dc63e221 ndisfileservices32 1

09285219e6f361f89ecd25abec02a4d0 osevent 1

0d2970588384bed4bdb1221003b0a45a amsint32 0

0f068f1d2b014b217773ffaaf79abec2 amsint32 0 ezxrpjhbzurmkecw 0

3a3d624f78c306b200ff4e05247cf66e nthook 0 wrojhbzt 0




uwnoyab 0 113579812e50bb160088d9aa30721feb gngf 1

153623fcff9d5e57a098d0ce09637d1c npf 1 uwnoyab 0 027bc198706f0f0f3fde0f20b29e3a72 gngf 1

06af40767d2e2b63dd40333fed474aa0 mswindows 0 uwnoyab 0 09eb5b25f156bb0df7873d7615249212 gngf 1

xkqef 0 13bfc2e87e9aac39d2b91ef0f719a50b tig 1

10b0f432a915e9ccda134d3be14ab9d5 npf 1

02fe132fbd9657a60e9bca1b5a3fe747 aic32p 0

09f6f4c52870deebb4ead267b7a90329 42e44980 1

0f5adb96c2a975648667451a50c13f28 amsint32 0

0eb98691c031f995c054375a1ebf89b7 amsint32 0

0cf4ded67ff076c840fff0b30ed4a423 npf 1 abp470n5 1 0e29e9f12006fc7534b0e10c66d49005 mcidrv 2600 6 0 1

00abea875f3260d4430b717062d31258 6to4 0

023b1621a8945f43eaf0c320120cde3c zbsvc 1

06e5be21c9f649e24d37969bf15c367c driver 0

13b83d9fb880bbd3f63ec18613e41e3a npf 1

156160432e696b0008db0f954544deb8 system information n321 1




12608baf20f111a91c473cbc57fae9ad bord 007 1

1164c0b138b04ad5cb28f8eca4c12098 npf 1

000fcab138a015cc63b6ad84fb6c0f67 npf 1

017013adb36bd021ddcd651aa54de1af 6to4 1 uwnoyab 0 0714d23d108c582444d812525b53d131 gngf 1

0f89ef3397417ba569336a6bbe9a3ba9 npf 1

06822e5d8d318fa244e354e668d9d394 6to4 1 uwnoyab 0 016e17c8670eed81d03119d968905057 gngf 1

08ced09a00dd0940fde58c06aebc7ce1 6to4 1

13757d8fd8ee42a20b21f5bd8e56ee82 mswindows 1 xkqef 0 102306eda101676228af65e6c1a0c8f5 tig 1

1167ad095a5167613db6f1772d78e0db ndisfileservices32 1 uwnoyab 0 002c1e1520db09cfa07a1adf43bf3dc2 gngf 1

0b3e46bb919dac00845a8b6941de11ce 6to4 1

0c623f30e5d60b938afc05c7572eee83 npf 1

0cf08dd774107ee5bd49ab71dc7b5a1f npf 1

0bb4c544c6cf1d758fc26816c2856c95 mswindows 1

14fe7fa2c3cc308087b4b09b2ea05751 allowstop 1

07f547673f8be16535625ba1e076b765 nwcworkstation 1




hbztrmjecwuomgez 0

3b452fd0cefabc61d8af7de26f432f6b jgbztrlj 0 nthook 0

13b4fd1595decd18fd0029749f6b0635 npf 1

107bab0f05ce3ceba6d1a05f062e20d6 ndisfileservices32 1

09762d12f1cc882436990f4188308f7c npf 1 bzurmkecwuomhezx 0

01fb28e02f21e99ec1102b73a5f89874 ljebwuom 0 nthook 0

080eee3fea8212fd8db2709c574171fe npf 1

112eb493b0e7699dad5e13cad88138b5 vryhsoftebosew 0

0b65ee8ded5c9fd4306efa11324a7105 network adapter events 1

109a5a7f19a9531cba90bfaed04de250 npf 1 oreans32 1 071069c80b36d8ac51ebaea70a568a40 windowsinfo 0

0e0555bafe4fd3c04dab4ac94c65c602 npf 1

01e1e1b626693032d5c8fed4df5e4c09 npf 1 xkqef 0 006cd94a0d2b6506f924d7062c7f7b19 tig 1

0ed50455c7ddece3b8989ff5f02dc442 abp470n5 0

072b7a011d60593e647eb885ee76316a npf 1

12502c798a1f2c86bce55f5662befe30 6to4 1

0197632dfa58d9f60fe97f120536a751 npf 1




uwnoyab 0 06e4d5f5bc1edf5b791c8ea28deeffe3 gngf 1

0bc7b598af22bd4c7a496b25811dc362 amsint32 0 e3046863 1 0e8403a5ba76118e898a4ff77087f862 a12c759a 1

xkqef 0 016b4b33d080858e4bf043b997bcfbc4 tig 1

0c7a64bc4b7371bd5e428fc797e46ab1 ndisfileservices32 0

073cb375c3ed2bb46af89f587627f0e3 micorsoft windows service 0

0f36825bea6bf4967403dd9dd5f10a11 amsint32 0

0cf4108444b6c3eaa475fcd3c10c0db5 npf 1

0169402bf2554abf52528a703a4461bc npf 1

01cf51be1a4bacb550c35e165d4453d4 amsint32 0

0d4a8a75f6d260f75aaa1fefbe65eb3d npf 1

14719bcf0d5ea15702283da32c57b34e ntptdb 1

14413b24225013394dca3564c7974e6d npf 1

09519d7988057272c196cf34c1d0cba7 npf 1

0fecf8dae7b58c012e8d0ed816f186c1 mswindows 1

1160efbf0de0792c08f99d46883a19ca npf 1

07f98c9f11744ef73c991cf320eecc35 npf 0

0a852ff18a07539b18f3bf0e50577d66 npf 1 uwnoyab 0 11c8b2d430cf08612be40015e5986775 gngf 1




090eb88b5a44bb1be4f68a28c04accc9 nwcworkstation 1

0b7fd47a3fe835d6d464a2b429929be5 npf 1

11100da9b9fa6b383671ff8af31d6603 npf 1 dc3fdfde66fffb6cfbec946a237787d8 sysmgr 1 kaseyaagent 1 1035cb9f322146a15346716fb68c42a4 kapfa 1

xrpjhbzurmkecwuo 0

0138a435b6d6eae4429afe3cc84a0cb5 nthook 0 rojhbztr 0

01a08b10703e21f150f17e01828f7119 6to4 1

0685b2bf04e60c65be9bd1f667c07c4a amsint32 0

0de2ef8dc54647b0bc3c2d767a2909ba npf 1

078899307e12aa35227c9d9a465bbf91 npf 0

089c2785dc08ae217bd0b6f796c10551 mswindows 1

10c7b70b011f4151b8725c720b178875 mswindows 1

0813d5fa325caa7cd932b4bd1ddec3b8 npf 1 uwnoyab 0 0270b04226c583414450aea969c5a937 gngf 1

kaseyaagent 1 096a915fd3803433b07734b6c13dcf6a kapfa 1

01ab74292433b59b8edbfdba8ba51f17 npf 1 uwnoyab 0 006f8e5ccbee29a2cd5dab8a43f8a496 gngf 1




0038ee2524f8bc7cb329e01cad411f0f forter 0

0af1b4ebedea4bcdeebca05053e64882 network confg system 1

0c8ac3a3c592409b34108ece2c37cd3f npf 1

1500fe465ce684b153b34d771a1d48e4 dnservice 1

077065b0da8266eac9aae4715fe70245 6to4 1

08cdc80a346508e6d57efe4a782a9531 pssdk21 0 uwnoyab 0 09889986f3605b9bbf5acca56637c238 gngf 1

svkp 1 11080175b453b16a79ffad80f1463d44 adsl 1

0e128dd0e16b50431afb51c1d55d84c1 registry system service 1

0e9f0635c60f7225d10397cdd7d22e4d svrwsc 1

0072c5497f7eae033ee9934492f17180 abp470n5 0

12515cf32218016a3010805241440dd8 svrwsc 1

0f4a96ba2ec5ee4cf6ea456dece4eee7 lfdl 0

02936b913d1688fee664ea02c09bcc03 nwcworkstation 1 zwrpjhbztrmjecwu 0

106c89ac834064c6957f2c8b97777355 nthook 0 ojhbztrm 0

0a5c8fab8537fe4804c6d485307e1064 sshnas 1

026896a7449afcdac48323afcd71d3c0 amsint32 0 uwnoyab 0 025778f4315812baadef07fa35b1a443 gngf 1




11d7b18173a7f131b5f59b92c2d985ea npf 1

12ada88d49498a43cf4b9274f3fb586c amsint32 0

008fed18ab661bfcd26f422c26f1789e rpcremote 1

09a20d0147e3b2fd6ac712e6f88496c5 winxzrssf 1

Bibliography

[1] Anubis: Analyzing Unknown Binaries. http://anubis.iseclab.org. Accessed on September 1, 2013.

[2] Apel, M., Bockermann, C., and Meier, M. Measuring similarity of malware behavior. In Local Computer Networks (LCN) (2009).

[3] Azmandian, F., Moffie, M., Alshawabkeh, M., Dy, J., Aslam, J., and Kaeli, D. Virtual machine monitor-based lightweight intrusion detection. SIGOPS Operating Systems Review 45 (July 2011).

[4] Bailey, M., Oberheide, J., Andersen, J., Mao, Z. M., Jahanian, F., and Nazario, J. Automated classification and analysis of internet malware. In Recent Advances in Intrusion Detection (RAID) (2007), Springer-Verlag.

[5] Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. Xen and the art of virtualization. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (2003), SOSP ’03, ACM, pp. 164–177.

[6] Bayer, U., Kruegel, C., Kirda, E., Comparetti, P. M., and Hlauschek, C. Scalable, behavior-based malware clustering. In Network and Distributed System Security Symposium (NDSS) (2009).

[7] Beaucamps, P., Gnaedig, I., and Marion, J.-Y. Abstraction-based malware analysis using rewriting and model checking. In Computer Security – ESORICS 2012, S. Foresti, M. Yung, and F. Martinelli, Eds., vol. 7459 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 806–823.

[8] Bellard, F. QEMU, a fast and portable dynamic translator. In USENIX Annual Technical Conference (2005), USENIX Association.

[9] Bergeron, J., Debbabi, M., Desharnais, J., Erhioui, M., Lavoie, Y., and Tawbi, N. Static detection of malicious code in executable programs. In Symposium on Requirements Engineering for Information Security (2001).

[10] Blunden, B. The Rootkit Arsenal: Escape and Evasion in the Dark Corners of the System. Wordware Publishing, Inc, 2009.

[11] Butler, K. R., McLaughlin, S., and McDaniel, P. D. Rootkit-resistant disks. In Computer and Communications Security (CCS) (2008), ACM, pp. 403–416.

[12] Canali, D., Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., and Kirda, E. A quantitative study of accuracy in system call-based malware detection. In Proceedings of the 2012 International Symposium on Software Testing and Analysis (New York, NY, USA, 2012), ISSTA 2012, ACM, pp. 122–132.

[13] Carrier, B. The Sleuth Kit (TSK). http://www.sleuthkit.org. Accessed on October 1, 2011.

[14] Carrier, B. File System Forensic Analysis. Addison-Wesley, 2005.

[15] Chen, X., Andersen, J., Mao, Z., Bailey, M., and Nazario, J. Towards an understanding of anti-virtualization and anti-debugging behavior in modern malware. In Dependable Systems and Networks (DSN) (2008), pp. 177–186.

[16] Chow, J., Garfinkel, T., and Chen, P. M. Decoupling dynamic program analysis from execution in virtual environments. In USENIX Annual Technical Conference (2008), USENIX Association.

[17] Christodorescu, M., and Jha, S. Testing malware detectors. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2004) (Boston, MA, USA, July 2004), ACM Press, pp. 34–44.

[18] Christodorescu, M., and Jha, S. Static analysis of executables to detect malicious patterns. Tech. rep., DTIC Document, 2006.

[19] Christodorescu, M., Jha, S., and Kruegel, C. Mining specifications of malicious behavior. In Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering (New York, NY, USA, 2007), ESEC-FSE ’07, ACM, pp. 5–14.

[20] Christodorescu, M., Jha, S., Seshia, S., Song, D., and Bryant, R. Semantics-aware malware detection. In Security and Privacy, 2005 IEEE Symposium on (May 2005), pp. 32–46.

[21] Chubachi, Y., Shinagawa, T., and Kato, K. Hypervisor-based prevention of persistent rootkits. In Symposium on Applied Computing (SAC) (2010), ACM.

[22] Cortes, C., and Vapnik, V. Support-vector networks. Machine Learning 20, 3 (1995), 273–297.

[23] Dinaburg, A., Royal, P., Sharif, M., and Lee, W. Ether: malware analysis via hardware virtualization extensions. In Computer and Communications Security (2008), ACM.

[24] Garfinkel, T., and Rosenblum, M. A virtual machine introspection based architecture for intrusion detection. In Network and Distributed System Security Symposium (NDSS) (2003).

[25] Hennessy, J. L., and Patterson, D. A. Computer Architecture: A Quantitative Approach, 5 ed. Morgan Kaufmann, 2012.

[26] Hungenberg, T., and Eckert, M. INetSim: Internet Services Simulation Suite. http://www.inetsim.org. Accessed on September 25, 2012.

[27] Hunt, J., and McIlroy, M. An algorithm for differential file comparison. Tech. Rep. 41, Bell Laboratories, July 1976.

[28] Huth, M., and Ryan, M. Logic in Computer Science: Modelling and reasoning about systems, vol. 2. Cambridge University Press, 2004.

[29] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual, September 2013.

[30] Jang, J., Brumley, D., and Venkataraman, S. BitShred: feature hashing malware for scalable triage and semantic analysis. In Computer and communications security (CCS) (2011), ACM.

[31] Jiang, X., Wang, X., and Xu, D. Stealthy malware detection through VMM-based “out-of-the-box” semantic view reconstruction. In Computer and communications security (CCS) (2007), ACM, pp. 128–138.

[32] Johnson, S. ALTER: A Comdeck comparing program. Tech. rep., Bell Laboratories Internal Memorandum, 1971.

[33] Joshi, A., King, S. T., Dunlap, G. W., and Chen, P. M. Detecting past and present intrusions through vulnerability-specific predicates. In ACM Symposium on Operating Systems Principles (SOSP ’05) (2005), pp. 91–104.

[34] Kang, M. G., Yin, H., Hanna, S., McCamant, S., and Song, D. Emulating emulation-resistant malware. In Workshop on Virtual Machine Security (2009), ACM.

[35] Kim, G. H., and Spafford, E. H. The design and implementation of tripwire: a file system integrity checker. In Computer and Communications Security (CCS) (1994), ACM, pp. 18–29.

[36] Kinder, J., Katzenbeisser, S., Schallhart, C., and Veith, H. Detecting malicious code by model checking. In Detection of Intrusions and Malware, and Vulnerability Assessment, K. Julisch and C. Kruegel, Eds., vol. 3548 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2005.

[37] Kinder, J., Katzenbeisser, S., Schallhart, C., and Veith, H. Proactive detection of computer worms using model checking. IEEE Transactions on Dependable and Secure Computing 7, 4 (2010).

[38] King, S. T., and Chen, P. M. Backtracking intrusions. In Symposium on Operating Systems Principles (SOSP) (2003), ACM.

[39] Kirda, E., Kruegel, C., Banks, G., Vigna, G., and Kemmerer, R. A. Behavior-based spyware detection. In Proceedings of the 15th conference on USENIX Security Symposium - Volume 15 (Berkeley, CA, USA, 2006), USENIX-SS’06, USENIX Association.

[40] Kolbitsch, C., Comparetti, P. M., Kruegel, C., Kirda, E., Zhou, X., and Wang, X. Effective and efficient malware detection at the end host. In Proceedings of the 18th conference on USENIX security symposium (Berkeley, CA, USA, 2009), SSYM’09, USENIX Association, pp. 351–366.

[41] Krishnan, S., Snow, K. Z., and Monrose, F. Trail of bytes: efficient support for forensic analysis. In Computer and Communications Security (2010), ACM.

[42] Kröger, F., and Merz, S. Temporal Logic and State Systems. Springer, 2008.

[43] Kruegel, C., Kirda, E., and Bayer, U. TTAnalyze: A tool for analyzing malware. In European Institute for Computer Antivirus Research (EICAR) (2006).

[44] Kruegel, C., Robertson, W., and Vigna, G. Detecting kernel-level rootkits through binary analysis. In Computer Security Applications Conference, 2004. 20th Annual (Dec. 2004), pp. 91–100.

[45] McAfee Labs. McAfee Threats Report: First Quarter 2013. Report, McAfee Inc., May 2013.

[46] McAfee Labs. McAfee Threats Report: Second Quarter 2013. Report, McAfee Inc., May 2013.

[47] Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., and Kirda, E. AccessMiner: using system-centric models for malware protection. In Proceedings of the 17th ACM conference on Computer and communications security (New York, NY, USA, 2010), CCS ’10, ACM, pp. 399–412.

[48] Lee, T., and Mody, J. J. Behavioral classification. In European Institute for Com- puter Antivirus Research (2006).

[49] Li, P., Liu, L., Gao, D., and Reiter, M. K. On challenges in evaluating malware clustering. In Proceedings of the 13th international conference on Recent advances in intrusion detection (2010), Springer-Verlag.

[50] Lindorfer, M., Kolbitsch, C., and Comparetti, P. M. Detecting environment- sensitive malware. In Recent Advances in Intrusion Detection (2011).

[51] Mankin, J., and Kaeli, D. Dione: A flexible disk monitoring and analysis framework. In Proceedings of the 15th International Conference on Research in Attacks, Intrusions, and Defenses (Berlin, Heidelberg, 2012), RAID’12, Springer-Verlag.

[52] Martignoni, L., Stinson, E., Fredrikson, M., Jha, S., and Mitchell, J. C. A layered architecture for detecting malicious behaviors. In Proceedings of the 11th international symposium on Recent Advances in Intrusion Detection (Berlin, Heidelberg, 2008), RAID ’08, Springer-Verlag, pp. 78–97.

[53] McAfee Avert Labs. Rootkits, part 1 of 3: The growing threat. White paper, 2006.

[54] Microsoft. Microsoft Security Intelligence Report, June 2012.

[55] Morgan, T. D., and Carter, G. regfi: Windows NT read-only registry library. http://projects.sentinelchicken.org/data/doc/reglookup/regfi/. Accessed on September 1, 2013.

[56] Open Malware. http://www.offensivecomputing.net. Accessed on September 1, 2013.

[57] Paleari, R., Martignoni, L., Roglia, G. F., and Bruschi, D. A fistful of red-pills: How to automatically generate procedures to detect CPU emulators. In USENIX Conference on Offensive Technologies (WOOT) (2009), USENIX Association.

[58] Payne, B. D., de A. Carbone, M. D. P., and Lee, W. Secure and flexible monitoring of virtual machines. In Annual Computer Security Applications Conference (ACSAC) (2007).

[59] Pennington, A. G., Strunk, J. D., Griffin, J. L., Soules, C. A. N., Goodson, G. R., and Ganger, G. R. Storage-based intrusion detection: Watching storage activity for suspicious behavior. In USENIX Security Symposium (2003).

[60] Rieck, K., Holz, T., Willems, C., Düssel, P., and Laskov, P. Learning and classification of malware behavior. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) (2008), Springer-Verlag.

[61] Rieck, K., Trinius, P., Willems, C., and Holz, T. Automatic analysis of malware behavior using machine learning. Journal of Computer Security 19 (December 2011).

[62] Russinovich, M. DiskMon for Windows v2.01. http://technet.microsoft.com/ en-us/sysinternals/bb896646. Accessed on November 24, 2011.

[63] Russinovich, M. Inside the registry. http://technet.microsoft.com/library/ cc750583.aspx. Accessed on September 25, 2013.

[64] Russinovich, M. E., and Solomon, D. A. Microsoft Windows Internals, 4 ed. Microsoft Press, 2005.

[65] scikit-learn: Machine Learning in Python. http://www.scikit-learn.org. Accessed on September 2, 2013.

[66] Sikorski, M., and Honig, A. Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press, 2012.

[67] Singh, P., and Lakhotia, A. Static verification of worm and virus behavior in binary executables using model checking. In Information Assurance Workshop, 2003. IEEE Systems, Man and Cybernetics Society (June 2003), pp. 298–300.

[68] Sitaraman, S., and Venkatesan, S. Forensic analysis of file system intrusions using improved backtracking. In Proceedings of the Third IEEE International Workshop on Information Assurance (Washington, DC, USA, 2005), IEEE Computer Society, pp. 154–163.

[69] Song, F., and Touili, T. Efficient malware detection using model-checking. In FM 2012: Formal Methods, D. Giannakopoulou and D. Méry, Eds., vol. 7436 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 418–433.

[70] Song, F., and Touili, T. Pushdown model checking for malware detection. In Tools and Algorithms for the Construction and Analysis of Systems, C. Flanagan and B. König, Eds., vol. 7214 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 110–125.

[71] Song, F., and Touili, T. LTL model-checking for malware detection. In Proceedings of the 19th international conference on Tools and Algorithms for the Construction and Analysis of Systems (Berlin, Heidelberg, 2013), TACAS’13, Springer-Verlag, pp. 416–431.

[72] Stinson, E., and Mitchell, J. C. Characterizing bots’ remote control behavior. In Proceedings of the 4th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment (Berlin, Heidelberg, 2007), DIMVA ’07, Springer-Verlag, pp. 89–108.

[73] Stolfo, S. J., Hershkop, S., Bui, L. H., Ferster, R., and Wang, K. Anomaly detection in computer security and an application to file system accesses. In Foundations of Intelligent Systems (ISMIS) (2005).

[74] Sundararaman, S., Sivathanu, G., and Zadok, E. Selective versioning in a secure disk system. In Proceedings of the 17th conference on Security symposium (2008), USENIX Association.

[75] VMware. http://www.vmware.com. Accessed on May 10, 2012.

[76] The Volatility Framework: Volatile memory artifact extraction utility framework. http://www.volatilesystems.com/default/volatility. Accessed on May 17, 2012.

[77] Wang, Y.-M., Beck, D., Roussev, R., and Verbowski, C. Detecting stealth software with Strider GhostBuster. In 2005 International Conference on Dependable Systems and Networks (DSN ’05) (2005), IEEE, pp. 368–377.

[78] Willems, C., Holz, T., and Freiling, F. Toward automated dynamic malware analysis using CWSandbox. IEEE Security & Privacy 5, 2 (March–April 2007).

[79] Yan, L.-K., Jayachandra, M., Zhang, M., and Yin, H. V2E: Combining hard- ware virtualization and software emulation for transparent and extensible malware analysis. In Virtual Execution Environments (VEE) (2012).

[80] Yin, H., Song, D., Egele, M., Kruegel, C., and Kirda, E. Panorama: capturing system-wide information flow for malware detection and analysis. In Proceedings of the 14th ACM conference on Computer and communications security (New York, NY, USA, 2007), CCS ’07, ACM, pp. 116–127.

[81] Zhang, Y., Gu, Y., Wang, H., and Wang, D. Virtual-machine-based intrusion detection on file-aware block level storage. In Symposium on Computer Architecture and High Performance Computing (2006), IEEE Computer Society.
