CLASSIFICATION OF MALWARE PERSISTENCE
MECHANISMS USING LOW-ARTIFACT DISK
INSTRUMENTATION
A Dissertation Presented by
Jennifer Mankin
to The Department of Electrical and Computer Engineering
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Electrical and Computer Engineering
in the field of Computer Engineering
Northeastern University Boston, Massachusetts
September 2013

Abstract
The proliferation of malware in recent years has motivated the need for tools to analyze, classify, and understand intrusions. Current research in analyzing malware focuses either on labeling malware by its maliciousness (e.g., malicious or benign) or classifying it by the variant it belongs to. We argue that, in addition to providing coarse family labels, it is useful to label malware by the capabilities it employs. Capabilities can include keystroke logging, downloading a file from the internet, modifying the Master Boot Record, and trojanizing a system binary. Unfortunately, labeling malware by capability requires a descriptive, high-integrity trace of malware behavior, which is challenging given the complex stealth techniques that malware employs in order to evade analysis and detection. In this thesis, we present Dione, a flexible rule-based disk I/O monitoring and analysis infrastructure. Dione interposes between a system-under-analysis and its hard disk, intercepting disk accesses and reconstructing high-level file system and registry changes as they occur. We evaluate the accuracy and performance of Dione, and show that it can achieve 100% accuracy in reconstructing file system operations, with a performance penalty of less than 2% in many cases.
Given the trustworthy behavioral traces obtained by Dione, we convert file system-level events to high-level capabilities. For this, we use model checking, a formal verification approach that compares a model extracted from a behavioral trace to a given specification. Since we use Dione traces of file system and registry events, we aim to label persistence capabilities—that is, we label a sample by the mechanism it uses not only to persist on disk, but to restart after a system boot. We model the Windows service, a commonly-employed capability used by malware to persist, load a binary after reboot, and even load dangerous code into the kernel. We model the installation of a Windows service, the system boot, and the file access of the service binary. We test our models on over 1000 real-world malware samples, and show that our approach successfully identifies service-installing malware samples over 99% of the time, and malware that loads the service over 98% of the time. Moreover, we demonstrate that we are able to use traces of disk reads to differentiate between two types of file accesses. We show that we can not only detect when a persistence mechanism is installed, but also that the persistence mechanism is successful, because we detect the automatic load of the program binary after a system reboot. We correctly identify file access types from disk access patterns with less than 4% of samples mislabeled, and demonstrate that even an expert analyst would have difficulty correctly identifying the mislabeled accesses.
Acknowledgements
First and foremost, I would like to thank my husband Dana. Not only would it have been nearly impossible to complete this work without his love and support, but it most definitely would not have been this much fun! I would also like to thank my family for everything they've done for me and for supporting me throughout the years. I specifically owe my success to my parents for instilling in me a love of learning and logic, and for emphasizing to me that the most important thing is to try.
The insightful and inspiring help from both my academic and industry advisors was critical throughout this entire process, culminating with this dissertation. I would like to acknowledge the tremendous support of my advisor at Northeastern, Dr. David Kaeli, and thank him for his many years of dedication to helping his students achieve great things. I also want to thank my technical supervisors at MIT Lincoln Laboratory, Charles Wright and Graham Baker, for developing this exciting research and guiding me throughout the process. Finally, I would like to thank my colleagues at Northeastern and MIT Lincoln Labs for their invaluable feedback and discussions.
Contents

Abstract ii

Acknowledgements iv

1 Introduction 1
  1.1 Motivation ...... 3
  1.2 Contributions ...... 10
  1.3 Organization of Dissertation ...... 12

2 Background 14
  2.1 Malicious Software ...... 15
    2.1.1 Malware Types ...... 15
    2.1.2 Anti-Forensics Techniques ...... 16
    2.1.3 Evasion Techniques ...... 18
  2.2 Malware Analysis ...... 26
    2.2.1 Static Binary Analysis ...... 27
    2.2.2 Dynamic Analysis ...... 28
  2.3 Windows Concepts ...... 30
    2.3.1 The Windows Registry ...... 30
    2.3.2 NTFS File System ...... 33
    2.3.3 Performance Optimizations for Disk Accesses ...... 36
  2.4 Formal Verification and Model Checking ...... 37
    2.4.1 Predicate Logic ...... 39
    2.4.2 Temporal Logic ...... 41
    2.4.3 Linear Temporal Predicate Logic ...... 43
  2.5 Summary ...... 44

3 Related Work 45
  3.1 Malware Analysis and Instrumentation ...... 45
  3.2 Characterizing Malware Behavior ...... 52
    3.2.1 Characterizing Malware with Machine Learning ...... 53
    3.2.2 Characterizing Malware Using Modeling ...... 55

4 Dione: A Disk Instrumentation Framework 60
  4.1 Threat Model and Assumptions ...... 60
  4.2 Dione Operation ...... 61
    4.2.1 Dione Policy Commands ...... 64
    4.2.2 Dione State Commands ...... 65
  4.3 Live Updating ...... 66
    4.3.1 Live Updating Challenges ...... 66
    4.3.2 Live Updating Operation ...... 68
  4.4 Disk Sensor Integration ...... 70
  4.5 Experimental Results ...... 72
    4.5.1 Experimental Setup ...... 72
    4.5.2 Evaluation of Live Updating Accuracy ...... 72
    4.5.3 Evaluation of Performance ...... 74
  4.6 Registry Monitoring ...... 81

5 Labeling Malware Persistence Mechanisms with Dione 84
  5.1 Modeling Persistence Mechanisms with LTPL ...... 84
    5.1.1 System Boot ...... 87
    5.1.2 Service Install ...... 87
    5.1.3 File Access ...... 88
    5.1.4 Persistent Service Load ...... 89
  5.2 Dione Capability Labeler Implementation ...... 90
  5.3 Experimental Setup ...... 91
    5.3.1 Testbeds ...... 91
    5.3.2 Malware Corpus ...... 93
    5.3.3 Assignment of "Truth" Labels ...... 94
    5.3.4 Model Checker Results ...... 98
  5.4 Labeling File Access Type ...... 103
    5.4.1 Motivation ...... 104
    5.4.2 Program Binary Load Classifier ...... 107
    5.4.3 SVM Classifier Implementation ...... 108
    5.4.4 Results ...... 110

6 Directions for Future Work 117

7 Thesis Summary and Contributions 119

8 Appendix 122
  8.1 Tables ...... 122

Bibliography 137
Chapter 1
Introduction
The past decade has been boldly marked by the ongoing arms race between malicious software creators and security researchers. Not only are security companies and researchers overwhelmed by the several million new unique samples discovered each month, but the sophistication of malicious software continues to increase as well [46]. Malicious software, or malware, can take many forms. While the amount of harm caused by a malware sample can vary, all malware share the property of having not been installed with the full consent and knowledge of the user. Spyware or adware can be installed on a user's system, causing annoying pop-ups or violating privacy expectations by tracking user habits [54]. Alternatively, malware may force the system to become part of a network of hijacked machines used to send spam, hijack other systems, or perpetrate Distributed Denial of Service (DDOS) attacks on banks or targets of political protest [10]. Increasingly, malware is used for financial gain. For example, banking threats seek to steal credentials from users or banking systems in order to perpetrate financial crimes, while fake-alert and ransomware threats trick the user into paying either for impostor security software or for the safe return of
their "ransomed" data [45]. Rootkits can be particularly dangerous, as they exist to provide additional stealth measures to prevent the user or security products from detecting the presence of the rootkit and any other malware it is packaged with [10]. Rootkits can execute with administrator privilege by attacking and patching the code of the operating system. Though the number of new rootkits discovered in the wild has been decreasing since 2011, tens of thousands of new samples are still discovered every month [46]. Furthermore, there is a common adage in security that the winner between malware and a security product is whichever was loaded first. As a result, rootkits are increasingly turning to infecting the Master Boot Record (MBR); since it performs key startup operations, infection of the MBR is a devastating attack on the system [45]. Once a rootkit has breached kernel-level code, it is difficult to trust any security product or malware analyzer running on the infected system. In the past couple of decades, research into labeling malware has focused on identifying the malware by family or variant. While having labels available for new samples is useful to provide a coarse-grained identification, we argue that labeling the behavior of the malware could be more useful than identifying the family it belongs to. Capability labeling is a promising solution to understanding how malware behaves. Instead of identifying malware by its family or strain, identifying malware by the capabilities it possesses allows security products to identify the high-level behaviors that new malware is employing. There are several benefits to labeling or identifying capabilities present in malware or software. A system equipped with on-the-fly capability detection could provide notifications to users when software or malware is installed with certain malicious capabilities. The information could also be used by security researchers and products to
outline the necessary steps to clean a system of the infection and prevent intrusions. Furthermore, it allows security researchers to build up large corpora of labeled samples for future research and experimentation, identifying what each sample actually does. Unfortunately, identifying high-level malware capabilities is a challenging problem. First, it is difficult to obtain a descriptive, high-integrity trace of system events, since malware writers employ a variety of techniques in order to prevent their malware from being analyzed. Second, it is difficult to derive useful high-level behaviors from the trace of events that has been obtained, as high-level behaviors can manifest themselves in a variety of ways.
1.1 Motivation
Currently, state-of-the-art research into malware labeling focuses predominantly on one of two areas: labeling new samples as either malicious or benign, or labeling new samples by family or variant. Early anti-virus technologies relied on signature matching to identify and label software samples as malicious; these signatures contained unique byte patterns, such as sequences of instructions, with each signature typically only covering a single malware variant [17]. In order to counter attempts at obfuscation, researchers and AV vendors introduced the ability to use regular expressions over the byte sequences, for example to skip over arbitrarily-inserted nop instructions, though these too are easily evaded with polymorphic and metamorphic obfuscation techniques [12].
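The fragility of syntactic signatures can be illustrated with a toy sketch. The mini instruction set, interpreter, and hash-based "signature" below are all invented for this illustration, not taken from any AV product: inserting junk instructions leaves the program's behavior untouched while changing its byte-level signature.

```python
import hashlib

def run(program, x):
    """Tiny interpreter: a program transforms an integer accumulator."""
    for op, arg in program:
        if op == "add":
            x += arg
        elif op == "mul":
            x *= arg
        elif op == "nop":
            pass                         # junk instruction: no effect
    return x

def metamorphic_variant(program):
    """Insert junk 'nop's before each instruction.
    Semantics are preserved, but the byte pattern changes."""
    out = []
    for ins in program:
        out.append(("nop", 0))
        out.append(ins)
    return out

def sig(program):
    """Stand-in for a syntactic signature: a hash of the raw bytes."""
    return hashlib.sha256(repr(program).encode()).hexdigest()

p = [("add", 3), ("mul", 2)]
q = metamorphic_variant(p)
print(run(p, 5) == run(q, 5))   # True: identical behavior
print(sig(p) == sig(q))         # False: different byte-level signature
```

This is exactly the gap that regular-expression signatures (skipping nops) partially close, and that register renaming and instruction reordering reopen.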
Instead of using syntactic signatures—that is, raw byte patterns or regular expressions—researchers have developed semantic models of malicious behavior based on instruction sequences [18]. Malicious behaviors are modeled, and the models abstract instruction sequences to use variable names and symbolic constants. Then, templates of malicious behaviors are compared against potentially-malicious binaries to detect instruction sequences that are semantically equivalent, rather than identical on a byte level. Abstracting semantic awareness to an even higher level, recent work has focused on behavioral signatures. These behavioral signatures often look at sequences of system calls, or even higher-level behaviors represented by semantically-equivalent system calls [4, 6, 52]. After building behavioral representations of malware samples, both formal verification and machine learning techniques can be applied to label samples by their maliciousness, or in an effort to divide them into classes based on their family or variant. Unfortunately, deriving family-based labels to identify malware samples presents some significant challenges. Bailey et al. performed a detailed study of anti-virus (AV) products and found that not only do different AV vendors use different labels for different malware samples, but these AV vendors actually disagree on the number and granularity of unique labels in general [4]. The goal of applying familial labels to malware samples is to have a concise clustering of samples, with similar items grouped into clusters that reflect appropriate differences while avoiding having so many labels that the labels become meaningless. With too coarse a clustering, malware samples may be labeled as being from the same family, when in reality they do not share all functionality or capabilities. With too fine a granularity, similar variants within the same family could be labeled as individual families, resulting in a clustering that
becomes less distinct as clusters blur together. The problem of labeling by family is further exacerbated by the lack of "ground truth" labels. When researchers attempt to assess the quality of their clustering algorithms, they often choose samples that many AV vendors can easily label. This results in a malware corpus of "easy-to-label" samples, and thus the effectiveness of labeling algorithms cannot be extrapolated onto larger datasets for which ground truth is not known [49]. The blending and merging of malware samples arises from the relative ease with which bad actors can generate new malware samples. Malware writers can use obfuscation techniques to produce samples with unique hashes and signatures. For example, polymorphic techniques encrypt the body of the code, decrypting on the fly during execution [20]. Meanwhile, metamorphic techniques change the structure of the code—for example, using instruction reordering, insertion of junk instructions, and register renaming—while ensuring that the semantics of the code remain the same [20]. Additionally, malware can be written in high-level programming languages, and source code and malware kits can be found on the Internet for little or no cost, allowing even those with minimal programming skills to generate malware. This means that malware writers can create new variants by adding new functionality to old variants or by mixing existing components. The result of these techniques is that the differentiation between malware families and variants begins to blur. As the number of unique malware samples found in the wild continues to increase, we posit that it is more useful to identify a malware sample by the behavioral characteristics it possesses than by a variant label. A "capability" can be broadly defined as any intended feature of the software.
Keystroke logging, downloading a file from the internet, trojanizing a system binary, and overwriting the Master Boot Record are all examples of malware capabilities. Instead of applying a single family
label to a malware sample, each sample would be labeled with all of the capabilities it employs. Labeling a sample by the capabilities it possesses, rather than by its family, provides several benefits. The first is that it provides an opportunity for alerting a user or administrator when benign or malicious software installs or employs a potentially dangerous or intrusive capability. Capability labeling can also identify how malware infects a system, how it propagates to other systems, how it survives and restarts after a reboot, or how it hides from the user or security products. Understanding each of these characteristics is critical in developing products or advisories to clean systems after infection, and to prevent malware from spreading to other systems. A secondary benefit to capability labeling is in assisting security and malware researchers, as it would allow researchers to build up large corpora of malware for which the high-level behaviors of each sample are known. If a researcher needs to test a malware removal tool, for example, on real-world samples, they could simply query the corpus for all samples that are labeled as having the specific capabilities that allow them to persist on a system. The first challenge in the labeling of malware based on behavior, whether by capability, by family, or by maliciousness, is in obtaining a descriptive, high-integrity behavioral trace, as even the best malware labeling algorithm cannot be accurate if it is processing an incomplete or inaccurate behavioral trace [20]. Behavioral traces can be acquired using both static and dynamic techniques. Static analysis, in which the binary is examined without actually executing it, is scalable but can be prevented by malware looking to escape analysis [6, 43, 79]. For example, disassembling a malware sample to obtain an instruction trace is useful for extracting the control and data flow of a malware sample. However, in practice, malware often utilizes techniques to
prevent disassembly from occurring in the first place, maintaining the code's original functionality while transforming the binary. Dynamic analysis attempts to understand the malware behavior by observing the malware as it runs, collecting system call information, instruction traces, or other events. While dynamic analysis avoids the obfuscation problems of static analysis, it too has limitations. Manual analysis, performed by attaching a piece of malware to a debugger, is too time-consuming to be scalable; furthermore, malware can detect when it is running with a debugger and evade analysis [34, 57]. Similarly, analyzers running in-host can be detected by malware and uninstalled or misled. Running malware in a virtualization or emulation layer to collect event traces provides protection from the malware and, as a result, provides the broadest coverage of malware analysis. However, even these techniques can be detected by malware, and the malware can then voluntarily exit to avoid analysis. Recent work suggested that nearly 25% of malware utilized techniques to detect dynamic analyzers, and evaded analysis by exiting [50]. Thus, the biggest problem of dynamic analysis is that it only reveals what was actually executed, not all potential behaviors that could manifest on a given system. In order to address this challenge—the acquisition of a descriptive, high-integrity trace—it is important for malware analyzers to work at either a higher privilege level or a lower semantic level than the malware. In this dissertation, we present a disk instrumentation and analysis infrastructure that does both. Dione, the Disk I/O aNalysis Engine, is a flexible, portable, policy-based disk monitoring infrastructure which facilitates the collection and analysis of disk I/O. It uses information from a sensor interposed between a System Under Analysis (SUA) and its hard disk. Since it monitors I/O outside the reach of the operating system, it is resilient to stealth
measures employed by rootkits—including those with administrator-level privilege. Instead of relying on constructs that can be manipulated by malware, Dione reconstructs high-level file system and Windows registry operations using only low-level intercepted metadata and disk sector addresses. The second challenge to capability labeling is that, even after a high-integrity, descriptive event trace has been obtained, it is necessary to convert the lower-level events of the trace into higher-level behaviors or capabilities. The type of trace that is available informs the types of capabilities that can be labeled. Since Dione provides comprehensive, high-integrity events taking place at the disk level, we show that we can infer high-level properties relating to the persistence capabilities of malware. That is, we use the traces generated from Dione to demonstrate not only how malware persists on disk, but how malware automatically restarts after a system is rebooted. Persistence capabilities include trojanizing a system binary, overwriting the MBR to force malicious code to load, utilizing the Windows service mechanism to automatically load code or drivers at boot time, or pointing special auto-start registry keys to the malicious code. Given the descriptive, high-integrity traces produced by Dione, we set out to label malware samples possessing certain capabilities that are used to persist and restart upon system reboot. For our Dione Capability Labeler (DCL), we use model checking, an algorithmic formal verification method used to verify properties of software. Model checking is a property verification approach in which a property is specified using a description language, resulting in a logic formula [28]. Likewise, a system (in this case, a trace of malware execution) is also modeled using a description language. Then, the model of the system is compared to the logic formula to determine whether the model satisfies the specification.
In the context of detecting a property—in this case, a capability—in malware, the capability is specified in the description language, or logic, of the model checker. This specification describes the behaviors (and the temporal ordering between behaviors) that would be present in the behavioral trace if the malware possessed that capability. Then, a model is extracted from each behavioral trace, and the model is compared to the specification to determine whether the model fits the specification. If so, the malware is labeled as having the specified capability. In this work, we use the specification language Linear Temporal Predicate Logic [7, 42, 71], or LTPL, to model our capability specifications. We model three phases of program behavior based on events gathered from Dione traces: (1) installation of the persistence capability, (2) system boot, and (3) file access (program load) after reboot. We chose these three stages because the automatic loading of a program after reboot demonstrates that the persistence mechanism was both installed and successful. While the models of system reboot and program load are shared across all persistence capabilities, each type of persistence capability requires a separate model for the installation phase. We also demonstrate that we can use the patterns detected from the file-read events recorded in Dione traces to differentiate between two types of high-level file accesses: file copy and program binary load. As a result, we can label malware as not only having the capability to install a service, but also as having successfully utilized this mechanism to automatically load after a system boot. We chose the Windows service as the persistence mechanism to model, since it is a common mechanism used by malware to persist and restart after reboot [66].
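As a loose illustration of this check-a-model-against-a-specification idea (a toy state machine, not DCL's actual LTPL checker; the event names are invented stand-ins for real trace predicates), the persistence property can be phrased as an ordering constraint over trace events: a service install must eventually be followed by a system boot, which must eventually be followed by an access of the registered binary.

```python
def satisfies_persistence(trace):
    """Toy check of the temporal property:
    install happens-before reboot happens-before binary access.
    `trace` is a list of (event, argument) pairs.
    """
    binary = None
    state = "want_install"
    for event, arg in trace:
        if state == "want_install" and event == "service_install":
            binary = arg                  # remember the registered binary
            state = "want_reboot"
        elif state == "want_reboot" and event == "system_boot":
            state = "want_access"
        elif state == "want_access" and event == "file_access" and arg == binary:
            return True                   # persistence capability confirmed
    return False

trace = [
    ("file_create", r"\evil.sys"),
    ("service_install", r"\evil.sys"),
    ("system_boot", None),
    ("file_access", r"\evil.sys"),
]
print(satisfies_persistence(trace))       # True
```

A real LTPL specification expresses the same "eventually, in this order" structure with temporal operators over predicates; the sketch collapses it into an explicit state machine evaluated over a finite trace.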
The service mechanism is particularly dangerous: it allows malware to load malicious code into kernel space, it can be set to run automatically when the system boots, and it may not show up in Task Manager as a process [10, 66]. Using
domain knowledge about malware behavior, the NTFS file system, and the relationship between the Windows XP operating system and corresponding disk behaviors, we generate models for a service installation, a system reboot, and a program load; we then combine the stages into a specification that detects the automatic loading of the service after reboot. Because the pattern of disk accesses for a file load can vary dramatically, we generalize the model for program load such that it specifies any type of file access to avoid false negatives. Then, we bolster the model of a file access with a supervised learning approach that differentiates between a program binary load and another type of file read operation, a file copy. We generate features based on the file content read pattern of the Dione trace and use a Support Vector Machine (SVM) algorithm [22] to classify a series of disk accesses as either a file load or a file copy. We demonstrate that we not only detect the persistence mechanism being installed, but we also verify that the persistence mechanism is successful because we can detect the program binary load automatically after a system reboot.
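The intuition behind such features can be sketched in a few lines (the two features below are hypothetical examples chosen for illustration only; the actual feature set is described in Chapter 5): a file copy tends to read the content sequentially and completely, while a program load jumps between sections and may skip much of the file.

```python
def read_pattern_features(reads, file_size):
    """Hypothetical features over a list of (offset, length) reads.
    Returns (fraction_of_sequential_reads, fraction_of_file_read).
    """
    if not reads:
        return 0.0, 0.0
    sequential = 0
    covered = 0
    prev_end = None
    for offset, length in reads:
        covered += length
        if prev_end is not None and offset == prev_end:
            sequential += 1               # this read continues the previous one
        prev_end = offset + length
    frac_seq = sequential / max(len(reads) - 1, 1)
    frac_cov = min(covered / file_size, 1.0)
    return frac_seq, frac_cov

# A "copy"-like pattern: strictly sequential, full coverage.
copy_reads = [(0, 4096), (4096, 4096), (8192, 4096)]
# A "load"-like pattern: scattered, partial coverage.
load_reads = [(0, 1024), (32768, 4096), (4096, 512)]
print(read_pattern_features(copy_reads, 12288))   # (1.0, 1.0)
print(read_pattern_features(load_reads, 65536))
```

Features like these, computed per file access, would then be fed to the SVM to separate the two access classes.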
1.2 Contributions
With this dissertation, we provide the following contributions to malware analysis and disk forensics:
• We present Dione: The Disk I/O aNalysis Engine. Dione is the first portable disk and file system analyzer to analyze disk traffic outside the system under analysis to provide comprehensive, high-integrity traces for the NTFS file system. We detail the challenges, the design, and the implementation of Dione, explaining how we bridge both the semantic and temporal gaps in reconstructing high-level operations from raw low-level metadata.
• We analyze the accuracy and performance of Dione, demonstrating that it produces traces of file system operations with 100% accuracy, with a performance overhead generally less than 10%—and often below 2%—in reconstructing file system operations.
• We present DCL: the Dione Capability Labeler. We detail our models for three properties: the Windows service installation, system reboot, and file access. We model these properties using the logic language LTPL [71], and implement a problem-specific model checker that checks the events of a Dione disk trace against the specifications for each property. We demonstrate that DCL can process a large number of samples in a short amount of time, labeling each sample based on whether it exhibits a service persistence capability.
• We present a machine learning classifier that identifies a file binary load given a disk access pattern, using this classifier to bolster our model for service persistence. Our classifier mislabels fewer than 4% of traces, yet we show that correctly labeling the mislabeled traces would be difficult for even an expert analyst. By demonstrating that we can detect a file binary being automatically loaded after a system boot, we can decisively label a sample as having a successful persistence mechanism.
• We create an automated malware analysis testbed, which can automatically instrument malware samples using Dione and the Volatility memory introspection framework [76]. We run DCL with the integrated file access pattern classifier on over 1,000 real-world malware samples, detecting Windows service installation over 99% of the time and service persistence over 97% of the time. Furthermore,
we show that, using Dione's ability to generate on-the-fly traces of malware behavior, we can label more service installs and loads than a memory introspection framework operating on a single snapshot in time.
1.3 Organization of Dissertation
The rest of the dissertation is organized as follows. In Chapter 2, we provide relevant background material. This includes a discussion of malware types, as well as an in-depth explanation of the techniques that malware can utilize in order to hide from analysis. We discuss the techniques of static and dynamic malware analysis, including the advantages and disadvantages of each. We explain relevant Windows concepts; specifically, we describe the structure of the NTFS file system as it pertains to disk instrumentation, as well as the optimizations used by the Windows operating system that make instrumentation more challenging. Additionally, we discuss model checking and the logic language LTPL. In Chapter 3, we discuss the related research in this area. This chapter includes discussions of previous research on disk instrumentation, malware analysis, and the use of machine learning and model checking to perform intrusion detection, malware identification, and capability labeling. In Chapter 4, we describe the Dione infrastructure. We detail the implementation of Dione, including design challenges and solutions. We evaluate the accuracy of the Dione live updating engine, and the performance of full disk instrumentation using the Xen hypervisor. Finally, we conclude with an explanation of the limitations of Dione. In Chapter 5, we describe our behavioral models for service install, system boot,
and service load; additionally, we model these properties in the logic language LTPL. We describe the classification algorithm used to bolster the file access model, and we detail the results of our integrated model checker and file access classifier on over a thousand real-world samples. Finally, we conclude the dissertation with objectives for future research in Chapter 6 and a summary of contributions in Chapter 7.
Chapter 2
Background
In this chapter, we will outline relevant background information. We will begin with a discussion of malware, including malware types, the common anti-forensics techniques used by malware to avoid detection, and the evasion techniques they use to hide from malware analyzers and security products. We will then discuss the ways in which malware can be analyzed, including both static and dynamic analysis techniques. Understanding how these analyzers can be misled by stealthy malware will motivate the need for an analyzer that provides descriptive yet high-integrity traces. Before we introduce Dione, our file system and disk I/O analysis infrastructure, in Chapter 4, we will provide a thorough introduction to relevant Windows concepts, including the NTFS file system and the optimizations used by the Windows operating system that make file system instrumentation more challenging. Since the Dione Capability Labeler relies on model checking using formal specifications, we will discuss model checking and common description logic languages, including the Linear Temporal Predicate Logic that we use to model persistence capabilities in Chapter 5.
2.1 Malicious Software
The term malware, or malicious software, can be used to describe a variety of unwanted or undesirable software or scripts. Generally, malware includes anything that causes harm to a user, a computer system, or a network, though the amount of harm can vary [54]. In this section, we will define and describe malware types, and detail the anti-forensics and evasion techniques they may employ in order to hide from malware analysis tools.
2.1.1 Malware Types
Viruses and worms can both be categorized as infectious agents; they are similar in that they not only serve some nefarious purpose, but they are also capable of replicating themselves [10]. A virus, however, requires an explicit user interaction—double clicking on an executable, or opening a corrupted email attachment—whereas a worm can propagate on its own, automatically transmitting itself over the network. A trojan or trojan horse is malicious software that a user downloads or installs believing that the software serves some benevolent, useful function [66]. The trojan may indeed be bundled with useful software, or the software may be entirely malicious. The verb trojanizing is also increasing in usage, and refers to malware hijacking and patching an executable that already exists on the system so that the malicious code will execute when the previously-benign program is loaded or run. Spyware and adware may be used separately or together, and vary between merely annoying and malicious [10]. Adware exists to display advertisements on the user's computer, while spyware tracks the user's habits, usernames, passwords, or keystrokes. Once a machine has been compromised by a worm, virus, or trojan, additional
types of malware may be installed. A backdoor is a method of bypassing standard authentication to allow the attacker remote access to the compromised machine in the future [54]. A botnet is a collection of machines that have been compromised and are commanded and controlled by a bot herder [10]. Machines in a botnet may wait for orders from the bot herder; these orders could include sending spam, perpetrating Distributed Denial of Service (DDOS) attacks, and harvesting usernames and passwords to commit financial crimes [10]. A rootkit is a particularly interesting component of malware. It exists to conceal itself and other components, and to command and control a system remotely [10]. A rootkit's most important quality is that it is stealthy: a good rootkit will go undetected by the user to ensure that it stays present on the system as long as possible. A rootkit may attain administrator-level privilege, either by exploiting a program that is running with supervisor privilege, or by tricking an administrator into installing malicious software. If the rootkit has unmitigated access to the kernel code and data, then it can be difficult or impossible to detect. Intuitively, it follows that malware will often be an amalgam of multiple malicious components. For example, a virus may be packaged with a rootkit, so that the rootkit can hide the presence of the virus. A rootkit may install a backdoor, so that the attacker can command and control the compromised system. A botnet may be composed of systems that have all been compromised by a rootkit.
2.1.2 Anti-Forensics Techniques
In order to thwart post-mortem forensic analysis of a compromised system, malware may utilize anti-forensics techniques [66]. On the simplest level, malware could download and then subsequently delete any file-based payload, possibly overwriting
the sectors that held the contents, to avoid any signature-based antivirus disk scan. At a lower level, malware can manipulate the properties of a file. The hidden property specifies whether a file or directory is hidden from the user, both in the graphical explorer and through the command line. Another set of properties are the MAC timestamps. Though “MAC” actually stands for Modified, Accessed, and Creation times, NTFS utilizes a fourth timestamp as well, the Change time, which indicates that metadata was changed. By setting any of these timestamps to an unreasonably low value, Windows Explorer will not display the time [10]. Alternatively, the malware could set the MAC times of a newly created malicious file to the same timestamps as system files, so that the file appears to have been there since the operating system was first installed. Instead of hiding through the use of the hidden property, malware can also hide through more sophisticated mechanisms. The first technique is called In-Band Hiding, as it involves hiding in spaces that are specified by the file system. An example of this is hiding in Alternate Data Streams (ADSs) [10]. As will be detailed further in Section 2.3.2, the contents of a file in NTFS are stored in an attribute called $DATA. However, a file can have multiple $DATA attributes. These ADSs are a way to persistently store information on disk, but they will not appear in Windows Explorer or in command line listings unless explicitly requested. Furthermore, the data stored in an ADS is not included in the total size property of a file; this is because sizes are associated with attributes, so the stated file size is actually just the size of the default $DATA attribute. Conversely, Out-of-Band Hiding utilizes space not specified by the file system. Malware may hide in the Master Boot Record (MBR), discussed in further detail in Section 2.1.3. Alternatively, malware can hide in slack space.
Because file content is always allocated in clusters (commonly 4KB, or 8 sectors), there may be up
to 7 unused sectors for a given file. For certain versions of Windows, writing to this slack space only requires repositioning the logical End-of-File (EOF) pointer, writing to the space, and then non-destructively truncating the file by resetting the logical EOF [10].
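The arithmetic behind slack space can be made concrete with a minimal sketch, assuming 512-byte sectors and 4 KB (8-sector) clusters as in the text; the function name is illustrative:

```python
# Illustrative slack-space arithmetic, assuming 512-byte sectors and
# 4 KB (8-sector) clusters as described in the text.
SECTOR_SIZE = 512
SECTORS_PER_CLUSTER = 8
CLUSTER_SIZE = SECTOR_SIZE * SECTORS_PER_CLUSTER

def slack_sectors(file_size: int) -> int:
    """Whole sectors left untouched in the file's final cluster."""
    if file_size == 0:
        return 0
    used = file_size % CLUSTER_SIZE or CLUSTER_SIZE  # bytes in last cluster
    touched = -(-used // SECTOR_SIZE)                # ceiling division
    return SECTORS_PER_CLUSTER - touched

# A 1-byte file touches a single sector, leaving 7 slack sectors in
# which an attacker can hide; a full 4 KB file leaves none.
```

A file whose size is an exact multiple of the cluster size leaves no slack, which is why small, oddly-sized files offer the most hiding room.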
2.1.3 Evasion Techniques
Dione can be useful in instrumenting and analyzing the intrusion and presence of each of these various types of malware, assuming that there is some symptom of compromise which percolates to the disk. However, Dione's particular strength is in instrumenting and analyzing “hard” malware; that is, malware which uses rootkits or rootkit-like technology to hide itself and any other malicious software with which it is packaged. Once a rootkit has attained kernel-level privilege, in-host analysis—and even some virtualization-based analysis infrastructures—cannot be trusted, as the rootkit could thwart or misdirect any attempts to analyze it. These techniques can be broadly divided into several categories, including altering control flow, system call patching, and modifying kernel objects.
The Windows System Call Mechanism
Since the system call provides the interface into kernel space, many of the methods used by rootkits to hide themselves or associated malware occur within the steps used when a system call is invoked [10]. In this section, we describe the system call interface in Windows. The steps taken for a system call in Windows running on a modern x86 processor are summarized in Figure 2.1 [10, 64]. First, a user application calls a native API function (the native API implements the system call interface in Windows). The address of the function is obtained through
Figure 2.1: System call mechanism for Windows running on a modern x86 processor. Four mechanisms by which a rootkit can alter control flow are: (1) Import Address Table, (2) SYSENTER Machine Specific Registers, (3) System Service Dispatch Table, and (4) Filter Driver.
the Import Address Table in the executable. This address points to a function in the kernel32.dll dynamic linked library, which calls another function exported by ntdll.dll. The dynamic linked library ntdll.dll routes system calls between the user mode and kernel mode interfaces [64]. The ntdll.dll function KiFastSystemCall populates general-purpose registers for the transition; of particular note, this code will store a system call dispatch ID in the lower 12 bits of the EAX register. For example, the CreateFile system call will store a number containing 0x3D as the 12 least-significant bits into EAX. Three Machine Specific Registers (MSRs), configured by the kernel, also participate; two of them (IA32_SYSENTER_CS and IA32_SYSENTER_EIP) contain the Ring 0 code segment and the offset into the code segment, respectively, at which the processor will start executing code [29]. For Windows, this address will point to the KiFastCallEntry function. Finally, KiFastSystemCall will execute the SYSENTER instruction, which is used by modern processors to switch from user mode (Ring 3) to kernel mode (Ring 0). Once in Ring 0, execution begins in the ntoskrnl.exe executable [10]. The execution proceeds to the function KiSystemService, which obtains the system call dispatch ID from EAX and uses it to index into the System Service Dispatch Table (SSDT). An SSDT is an array of addresses, in which each address is a pointer to the entry point of a function in kernel space. There are two SSDTs: one is for Windows GUI functions, and the other is for the Windows Native API (e.g., system calls). Once the kernel-mode function is obtained from the SSDT, control flow proceeds to the appropriate kernel mode component. For disk I/O commands, control flow proceeds to the I/O Manager, which then uses the appropriate driver stack to execute the I/O command.
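The dispatch step just described can be modeled as a short sketch. This is a toy stand-in, not Windows code: the table holds Python functions rather than kernel entry points, and only the 0x3D slot from the example above is populated:

```python
# Toy model of KiSystemService dispatch: the low 12 bits of EAX index
# into an SSDT-like table of function pointers. Only the 0x3D slot
# (NtCreateFile, per the example in the text) is populated; the real
# table is a dense array of kernel entry points.
def nt_create_file():
    return "NtCreateFile executed"

SSDT = {0x3D: nt_create_file}

def ki_system_service(eax: int):
    dispatch_id = eax & 0xFFF      # extract the 12 least-significant bits
    return SSDT[dispatch_id]()
```

Any EAX value whose 12 least-significant bits are 0x3D reaches the same table slot, mirroring how the dispatch ID selects the kernel routine.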
Altering Control Flow Through Hooking
A rootkit may have many motivations for modifying control flow in its attempt to hide itself and its actions from the user and to collect information from the machine it has compromised. It may block system calls to disrupt the work done by a program (e.g., security software), replace kernel functions altogether, track all system calls made and their input parameters (e.g., to instrument a system or application), or filter out output parameters (e.g., to hide a file or process). One such way to alter control flow is to modify a call table; a call table is simply an array of addresses, where each address points to a function or routine. By swapping out an address with a new address, the system will call the attacker's function instead of the correct kernel function. The process of swapping out function pointers is referred to as hooking, and there are several call tables that can be hooked [10]. The Import Address Table (IAT) is an application-level userspace call table. Each entry in the IAT contains the addresses of all library routines that a program imports from a Dynamic Linked Library (DLL). The IAT is populated when a DLL is linked at load time. While a rootkit could hook a function from any DLL, the user-space functions that implement part of the system call interface are particularly dangerous. For example, a rootkit could hook the IAT entries pointing to the user-space library kernel32.dll in order to hide newly created malicious files; this scenario is labeled (1) in the system call diagram of Figure 2.1. While any exported library routine can be hooked in this manner, the disadvantage of this approach is that each hook only applies to the given application, and since it hooks a user-space call table, any program (such as security software) running in kernel space could easily detect this. Unfortunately, several other call tables reside in kernel space; hooking any of these tables results in a system-wide, rather than application-level, hook.
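The mechanics of a call-table hook reduce to a pointer swap. The sketch below models this with an invented FindFiles entry (not a real Windows export) to show how a hook can transparently filter output:

```python
# A call table is just an array (or map) of function pointers; hooking
# swaps one entry. "FindFiles" is an invented name, not a real export.
def real_find_files():
    return ["report.doc", "evil.sys"]

def hooked_find_files():
    # The hook calls the original routine, then filters the results to
    # hide the rootkit's file from whoever asked.
    return [f for f in real_find_files() if f != "evil.sys"]

iat = {"FindFiles": real_find_files}   # table as populated at load time
iat["FindFiles"] = hooked_find_files   # the rootkit's pointer swap
```

Any caller that resolves FindFiles through the table now receives the censored listing, while the original routine itself is untouched.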
The first option
is to hook hardware call tables at the system call interface. Old processors (e.g., pre-Pentium II [10]) jumped to kernel space to handle system calls via an interrupt (specifically, INT 0x2E). Contemporary processors use the dedicated SYSENTER instruction. To hook the former, a rootkit would hook the interrupt handler corresponding to interrupt 0x2E in the Interrupt Descriptor Table (IDT). To hook the SYSENTER instruction, an attacker would hook an MSR. Given the flat memory model, the Code Segment MSR is unnecessary: it is enough to hook only the IA32_SYSENTER_EIP register. This is done by swapping the original pointer (which points to the kernel function KiFastCallEntry) with a pointer to a new function. This hooking location is labeled (2) in Figure 2.1. The unique disadvantage of the hardware-based approaches is that these call gates are passthrough: control passes through the hook to the system call interface, but does not return through the hook. It is possible to instrument or block any system call, but not to filter output results, thus eliminating the opportunity for a rootkit to hide processes or files. Instead of hooking the hardware system call interface, a rootkit could instead hook a Windows-specific table: the System Service Dispatch Table (SSDT). With this approach, the attacker can both instrument and monitor input and filter output, since control can return to the hook after the system call execution completes. The 391 functions of the Windows Native API comprise the kernel-mode system call interface, and thus provide a dangerous path for control flow modification. Hooking the SSDT is performed by obtaining the index of the function in the SSDT, and swapping the function pointer with one that points to its nefarious replacement. This hook is labeled (3) in Figure 2.1. Hooking, whether performed in the IAT, IDT, or SSDT, always suffers from the same disadvantage: it is relatively easy to detect. In order to detect any of these
hooks, security software would need to iterate through each of the pointers in the various tables to ensure that each points to a location in memory that falls within the library or executable that implements it. In other words, the pointers in the IAT should point to the region of memory containing the corresponding DLL, and the pointers in the IDT, the IA32_SYSENTER_EIP register, and the SSDT should all point to memory corresponding to ntoskrnl.exe. However, determining the address ranges of these libraries and modules itself requires using a system call, such as ZwQuerySystemInformation. As a result, if a rootkit can hook this system call before being detected, it can deflect hooking countermeasures.
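The detection logic described here is essentially a bounds check on each pointer. A minimal sketch, with made-up addresses (the ntoskrnl.exe base and size below are illustrative, not real load addresses):

```python
# Hook detection as a bounds check: a table entry is suspect if it
# points outside the module that should implement it. The base and
# size below are illustrative, not real load addresses.
NTOSKRNL_BASE = 0x81800000
NTOSKRNL_SIZE = 0x00200000

def is_hooked(pointer: int, base: int, size: int) -> bool:
    return not (base <= pointer < base + size)

legit = 0x81864880   # falls inside the illustrative ntoskrnl.exe range
rogue = 0xF0001000   # points off into a rootkit driver's memory
```

The catch, as noted above, is that obtaining the module base and size honestly itself requires a system call that the rootkit may already have hooked.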
System Call Patching
Given that hooking can be detected by confirming that critical function pointers point within the bounds of the library or executable in which they are expected to reside, it makes sense that a rootkit may attempt to modify the executable code itself. This technique is more challenging for the malware creator, but also more difficult to detect [10]. Patching is a technique in which the raw bytes of an executable are overwritten in order to, for example, mask or replace instructions. Patching can be performed in two locations: in memory or on disk. Binary patching modifies the bytes of the executable on disk; while the patch is permanent and persistent, it can be detected by looking at binary file checksums. A run-time patch, on the other hand, modifies the binary while it resides in memory, and thus would not survive a reboot. Whether performed in memory or on disk, patching requires overwriting the machine code of system calls or other useful kernel routines. The simplest example of patching would be to perform in-place modification of bytes. For example, the attacker could replace instructions with NOPs to prevent the execution of the original
instructions. This approach is severely limited by the number of bytes which are patched. A far more flexible approach is to overwrite the original instructions with a jump instruction (JMP, CALL, or RET) that redirects the control flow into another region of code, called trampoline or detour code [10]. The trampoline code has more space in which to operate, including executing the original, overwritten instructions. With this approach, a rootkit could instrument system calls and parameters by placing the trampoline at the start of the patched routine; the trampoline would execute this prologue before optionally calling the original instructions to perform the stated system call's operation. By placing a trampoline after the system call executes, as an epilogue, a rootkit could filter output parameters. The steps of patching broadly consist of: (a) saving the original code that will be patched, (b) injecting trampoline code, and then (c) performing in-place patching of the original code to force execution to jump to the specified address of the injected trampoline code. The primary means for security software to detect patching would be to look for suspicious jump instructions at the start of a function. Even this heuristic is not foolproof, as a rootkit creator could simply move the jump patch farther from the start of the function. To be particularly stealthy, a rootkit could patch code in the Master Boot Record (MBR). The MBR is located at the first sector on disk. The code of the MBR is loaded by the BIOS at system startup; the MBR then loads the boot sector of the active disk partition, which in turn loads the operating system. This method uses a combination of run-time and binary patching; it patches MBR boot code on disk in order to have it alter system code in memory.
The advantage of this approach is that it is performed before any security software is loaded, and the winner of a battle between security software and malware is often that which embeds itself in the kernel
first.
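The three patching steps (save, inject, patch in place) can be sketched for the common 5-byte JMP rel32 detour. Buffer offsets here stand in for virtual addresses, and the bytearray is a stand-in for patched memory:

```python
# Sketch of a 5-byte JMP rel32 detour patch. Buffer offsets stand in
# for virtual addresses; a real run-time patch would write to kernel
# memory rather than a bytearray.
import struct

def make_jmp(from_addr: int, to_addr: int) -> bytes:
    # rel32 is measured from the end of the 5-byte JMP instruction.
    rel32 = (to_addr - (from_addr + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel32)

def detour(code: bytearray, func: int, trampoline: int) -> bytes:
    saved = bytes(code[func:func + 5])              # (a) save original bytes
    # (b) trampoline injection is assumed done; (c) patch in place:
    code[func:func + 5] = make_jmp(func, trampoline)
    return saved   # the trampoline replays these before or after its own work
```

Detecting this patch is the heuristic described above: a JMP opcode (0xE9) where a function prologue should be.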
Modifying Kernel Objects
A third rootkit evasion technique addresses some of the limitations of the previous methods, though it has some limitations of its own. This technique is called Direct Kernel Object Modification (DKOM), and it involves modifying kernel data structures representing processes, drivers, and authentication tokens [10]. A similar method is used whether hiding processes or drivers from the user or security software. Both processes and drivers are maintained in doubly-linked lists. Therefore, to hide a particular process or driver, a rootkit needs only to traverse the appropriate list to find the process or driver to be hidden, and adjust the forward and backward links of it and its immediate neighbors. A rootkit can also elevate the privileges of a process by modifying the privilege substructures in the process object. There are a few disadvantages to DKOM. First, not all objects have a kernel object to represent them; for example, there is no kernel file object, so DKOM could not be used to hide a file. Second, the data structures are undocumented, so Microsoft can adjust the fields of a structure between major and even minor releases, which could break the bit-specific object patches.
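The unlinking operation at the heart of DKOM can be sketched with a toy doubly-linked process list (forward and backward links modeled as attributes; the process names are invented):

```python
# Toy doubly-linked process list. Hiding a node means splicing it out
# by adjusting its neighbors' links; the hidden process keeps running,
# it just no longer appears when the list is walked.
class Proc:
    def __init__(self, name):
        self.name, self.flink, self.blink = name, None, None

def link(procs):
    for a, b in zip(procs, procs[1:]):
        a.flink, b.blink = b, a
    return procs[0]                       # head of the list

def dkom_hide(victim):
    if victim.blink:
        victim.blink.flink = victim.flink
    if victim.flink:
        victim.flink.blink = victim.blink

def walk(head):
    out = []
    while head:
        out.append(head.name)
        head = head.flink
    return out
```

Note that the victim object is never freed or modified beyond its neighbors' pointers, which is why enumeration-based tools miss it while the scheduler does not.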
Filter Drivers
The next rootkit evasion technique moves past the system call interface and into the device driver stack. This kernel mode technique takes advantage of the layered device driver architecture supported by Windows. A Windows device driver does not have a monolithic structure; rather, it features a modular approach by which a series of drivers each perform some work and pass along the I/O Request Packet (IRP) to the
next driver in the chain. This is advantageous in that new drivers can be added to the series and still leverage the work done by other drivers in the chain. A Filter Driver is a driver that intercepts and modifies information as it makes its way through the driver stack [10]. While this can be a good thing—for example, filter drivers could be used to encrypt and decrypt data as it passes to persistent storage—it can also be used for malicious purposes. A filter driver could be used for keylogging, to filter network traffic, and to hide files and directories. This scenario is labeled (4) in Figure 2.1, as a malicious filter driver is inserted before the disk driver. As a result, any analyzer or security software examining files by hooking the system call interface will still be deceived, as the filter driver will hide any files before they reach the system call boundary.
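The layered driver model, and how a malicious filter subverts it, can be sketched as a chain of handlers. The classes and file names below are invented; a real filter driver would manipulate IRPs in kernel mode:

```python
# Layered driver stack modeled as a chain of handlers; the request
# object is a stand-in for an IRP. Class and file names are invented.
class Driver:
    def __init__(self, next_driver=None):
        self.next = next_driver

class DiskDriver(Driver):
    def handle(self, irp):
        # Bottom of the stack: return the raw directory listing.
        return ["report.doc", "evil.sys"]

class MaliciousFilter(Driver):
    def handle(self, irp):
        # Pass the IRP down, then censor results on the way back up.
        return [f for f in self.next.handle(irp) if f != "evil.sys"]

stack = MaliciousFilter(DiskDriver())
```

Everything above the filter, including a faithfully hooked system call interface, sees only the censored listing (scenario (4) of Figure 2.1).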
2.2 Malware Analysis
Once a sample of malware has been obtained, there are several methods that can be used to learn about the malware's behavior. Given the sophisticated techniques that malware can employ to prevent security software from detecting it or related malicious components, the job of analyzing malware behavior is a difficult one. Any malware analysis solution faces several tradeoffs. The closer the analyzer is to the malware, the more semantic information there is to analyze, and the more types of semantic information are available. However, if an analyzer operates at the same or lower privilege level than the malware, it can be evaded, thwarted, or misled. This section discusses various options for malware analysis. Malware analysis can be roughly divided into two categories: static binary analysis and dynamic analysis.
2.2.1 Static Binary Analysis
In static binary analysis, the binary is analyzed before it is run; the binary itself is disassembled to learn more about how the malware might behave. On a basic level, static analysis can yield the architecture the binary was compiled for, the executable type, and the operating system on which it would run. Static analysis can also yield string names, such as passwords, paths, and file names, and imported library functions and symbols can be extracted (though this task is easier if the binary was dynamically linked). Another step of static analysis is disassembly of the binary. In this step, the raw machine-code bytes are converted into assembly instructions. This step is difficult for x86 binaries because text and data can exist together, and because instructions are variable length. Additionally, both compiler optimizations and a crafty malware writer may take steps to further obfuscate the code. There are two common techniques for disassembling a binary. The first, linear sweep, iteratively disassembles one instruction at a time [66]. An attacker could complicate this process by inserting junk between instructions that does not alter control flow, but may cause the disassembler to become out of step with the instructions. The second technique is to use a flow-oriented approach, disassembling instructions until a branch instruction is encountered, then building a list of locations to disassemble (for example, the locations of both the true and false branches of a conditional branch). Anti-disassembly techniques take advantage of the assumptions that a disassembler makes, resulting in inaccurate disassembly [66]. Additionally, disassembly can be difficult or impossible if the binary is compressed or encrypted, and will not capture the program's behavior if it is unpacked as it runs. In short, static analysis can yield some good first observations about a binary, but it will not always provide a detailed evaluation of the malware's
behavior.
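The linear-sweep desynchronization attack can be demonstrated with a toy instruction set (the opcodes below are invented, not x86): a junk byte placed after an unconditional jump is decoded as an instruction, swallowing the real instruction that follows it:

```python
# Toy fixed-format ISA: an opcode byte followed by a per-opcode number
# of operand bytes. The opcodes are invented for illustration.
OPERANDS = {0x01: 1,    # LOAD imm8
            0x02: 0,    # RET
            0xEB: 1}    # JMP rel8 (at run time, hops over the junk byte)

def linear_sweep(code: bytes):
    out, i = [], 0
    while i < len(code):
        n = OPERANDS.get(code[i], 0)
        out.append(code[i:i + 1 + n])   # one "instruction" per step
        i += 1 + n
    return out

# JMP +1 skips the junk 0x01 byte at run time, but a linear sweep
# decodes the junk as a LOAD and swallows the real RET as its operand.
code = bytes([0xEB, 0x01, 0x01, 0x02])
```

A flow-oriented disassembler would instead follow the jump target and correctly recover the RET; that difference is exactly what this anti-disassembly trick exploits.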
2.2.2 Dynamic Analysis
Unlike static analysis, dynamic analysis watches the malware as it runs in order to detect its control flow and behavior. One dynamic analysis method is to use a debugger. A debugger is flexible; the analyst can set breakpoints to pause execution at any point in order to construct a control flow for the program, as well as examine memory and CPU registers. Unfortunately, malware writers have developed ways to check for the presence of a debugger, either through an API, by looking for tell-tale signs of breakpoints (e.g., an INT 0x03 instruction for a software breakpoint, or by using hardware breakpoint registers to stymie hardware breakpoints), or even by performing timing analysis to determine if the execution is taking too long. If any of these detection methods succeeds, then the malware can simply quit or perform other benign activities in order to prevent the analyzer from learning anything useful about its behavior. Dynamic analyzers can also utilize the same techniques that rootkits themselves use to monitor or change control flow. For example, the analyzer could hook into the system call interface using one of the methods described in Section 2.1.3. Then, the analyzer can create a trace of system calls and their parameters in order to understand the behavior of the malware. Host-based tools could also use the Windows API to track processes, registry modifications, and file system operations. In exchange for the rich semantics that such approaches provide, the analyzer sacrifices fidelity, as malware operating at the same privilege level could use evasion techniques to undermine the analysis. In order to operate at a higher privilege level than the malware, analyzers can
run the malware in a sandbox, such as in a virtualization solution (e.g., VMware [75], Xen [5]) or emulation solution (e.g., Qemu [8]). This solution is better logistically, since the analyzer can run the malware in an uncompromised sandbox, and then quickly and easily revert to a previously obtained clean snapshot to be ready for the next analysis. Furthermore, the analyzer can utilize Virtual Machine Introspection (VMI) techniques to understand what is occurring inside the VM, without hooking directly into the kernel structures. This way, the analyzer operates at a higher privilege level (a so-called Ring -1), while looking at lower-level but higher-integrity data. For example, since the system call ID is stored in the EAX register before the SYSENTER instruction is executed, the VMI could create a system call trace by directly examining the EAX register when SYSENTER is executed. Since the analyzer does not rely on host-level interfaces (such as the Windows API) to obtain information, it is not as easily misdirected by malware. It is in malware's best interest to remain unanalyzable as long as possible, so that it can continue to survive in the wild, perhaps adding additional systems to its botnet or continuing to acquire financial gains. Understanding that a virtualization layer allows analysis and security software to essentially run at a higher privilege level than the operating system, malware may test whether it is being run in an emulated or virtualized environment. If the test is positive, the malware may gracefully exit, or perform some benign operations, in order to hide its true malicious operation. These tests are referred to as red pills, and they can take many forms.
Brute-force, high-level red pills include checking hardware adapters (for example, the VGA adapter in the VMware environment) for a well-known device string [15], or even checking that the disk serial number or user name corresponds to those used by a well-known dynamic analysis emulator such as Anubis [50]. Red pills can also operate at a low level,
such as checking for a well-known bug in common emulators. For example, malware can test whether it is being run in the popular emulator Qemu [8] by executing the following instruction:
or %bh, 0x04(%ebx) [57]
Due to a Qemu bug, the instruction will reference the wrong memory address, so the malware can detect that it is being run in an emulator based on the result of this operation. Even in the presence of perfect emulation, timing or secondary hardware effects (such as TLB flushes on VM exit operations) will still serve as red pills for malware [23]. It is generally agreed that it is impossible to guarantee perfect transparency for virtualization or emulation solutions.
2.3 Windows Concepts
Because the majority of malware attacks are directed at the Windows operating system [53], Dione performs disk instrumentation for the Windows NTFS file system.¹ In this section, we discuss relevant Windows-specific concepts, including the particulars of the NTFS file system and the disk optimizations made by the Windows cache manager.
2.3.1 The Windows Registry
The Windows registry is a centralized database for configuration data, storing infor- mation about hardware, device configuration, drivers, user preferences, network and firewall configuration, and program startup information [63]. The hierarchy of the
¹Plans to expand Dione to instrument other file systems, such as ext3, are left for future work.
Windows registry can be thought of in terms similar to a file system. At the top level, there are root keys; below each root key are more keys, or subkeys. Thus, keys can be thought of as directories, and each key will have a path, which can be fully qualified from the root key. Just like a directory in a file system, a key will also have a name. Figure 2.2 breaks down the creation of a new key. The key has path HKLM\system\CurrentControlSet\Services and name Beep. The key has no value; for clarity, we also list the type KEY.
Figure 2.2: Breakdown of a Windows Registry key.
Below each key are values. A value is analogous to a file. It has a path, which is comprised of every key above it in the hierarchy. It has a name, and just as a file, the combination of a path and name uniquely identifies the registry value. Finally, just as a file often stores contents, a value stores contents as well. The common terminology is to (confusingly) refer to the data the value is storing as its value, referring to the value structure itself by its name. Keeping this terminology, we refer to the value contents as the value, though we may also refer to the value-name and value-value for clarity. The value may be one of several types, including an integer (DWORD), a string (SZ), or even any arbitrary binary data (BINARY). In Figure 2.3, we show some new values created under the Beep key created in Figure 2.2. The Beep key that was previously created now becomes part of the path, and two values are created below it: Start, which is an integer type and stores the value 0x02, and DisplayName, which
stores the string “BeepService”.
Figure 2.3: Breakdown of two Windows Registry values; each value is associated with a key (in this case, Beep) and has both a name and a value.
There are six root keys in the registry; each has a long name, but is more commonly referred to by an acronym:
• HKEY_USERS (HKU): Stores configuration data for all users with accounts on the machine
• HKEY_CURRENT_USER (HKCU): Stores configuration data for the user that is currently logged in (and is actually just a link to the subkey in HKU for the logged-in user)
• HKEY_CLASSES_ROOT (HKCR): Stores file association and Component Object Model (COM) object registration information
• HKEY_LOCAL_MACHINE (HKLM): Stores system-related information
• HKEY_PERFORMANCE_DATA (HKPD): Stores performance information
• HKEY_CURRENT_CONFIG (HKCC): Stores a current hardware profile (and is actually just a link to a subkey under the HKLM root key)
Some of the data stored in the registry is populated on system startup, and resides only in memory. Other data is stored on disk, and is loaded into memory when
the system starts up. On disk, the registry is stored in five (extensionless) files in the path WINDOWS\system32\config: default, SAM, SECURITY, software, and system. These files do not correspond directly to the root keys; most of the information stored in the hive files appears in the Windows registry under the HKLM root key. For example, the registry key of Figure 2.2 shows persistent data stored in a hive file, but when viewed in the registry hierarchy, its subkeys fall below the HKLM root key. The data stored in the registry hive files has its own file system-like structure; open source tools including regfi [55] can parse the hive files, outputting registry key paths, names, and values.
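The key/value semantics described above can be summarized in a small sketch that models keys as paths and values as named, typed entries, using the Beep example from Figures 2.2 and 2.3:

```python
# Minimal model of the registry semantics above: a key is identified by
# its full path; a value under a key has a name, a type, and data.
registry = {}

def set_value(key_path, name, vtype, data):
    registry.setdefault(key_path, {})[name] = (vtype, data)

def get_value(key_path, name):
    return registry[key_path][name]

BEEP = r"HKLM\system\CurrentControlSet\Services\Beep"
set_value(BEEP, "Start", "DWORD", 0x02)          # value-name / value-value
set_value(BEEP, "DisplayName", "SZ", "BeepService")
```

Just as a path plus file name uniquely identifies a file, the key path plus value-name uniquely identifies each entry here.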
2.3.2 NTFS File System
Many of the challenges of interpreting NTFS arise from its design goals of being scalable and reliable. Scalability is achieved through multiple levels of indirection. Reliability is accomplished through redundancy and by ordering writes in a systematic way to ensure a consistent result. Unfortunately, from an instrumentation and operation-reconstruction view, these writes often occur in the least convenient ordering. The primary metadata structure of NTFS is the Master File Table, or MFT [14]. The MFT is composed of entries, which are each 1 KB in size. Each file or directory has at least one MFT entry to describe it. The MFT entry is flexible: the first 42 bytes are the MFT entry header and have a defined purpose and format, but the rest of the bytes store only what is needed for the particular file it describes. Among other things, the MFT header contains a sequence number (which is incremented whenever that entry is reused for a new file), a flag indicating whether the entry is currently in use, and a flag indicating whether it describes a file or a directory. In NTFS, everything is a file—even file system administrative metadata. This
means that the MFT itself is a file called $MFT; its contents are the entries of the MFT (therefore, the MFT has an entry in itself for itself). Figure 2.4 shows a representation of the MFT file, and expands $MFT's entry (which always resides at index 0 in the MFT). Like any other file, the $MFT file expands and contracts as needed, and if the disk is fragmented, the $MFT can expand into fragmented, non-consecutive clusters anywhere on disk. This is shown in Figure 2.4, whereby the contents of $MFT are stored in two non-contiguous runs of clusters.
Figure 2.4: Representation of the MFT, which is saved in a file called $MFT. The first entry holds the information to describe $MFT itself; the contents of this entry are expanded to show the structure and relevant information of a typical MFT entry.
Everything associated with a file is stored in an attribute. The attribute types are pre-defined by NTFS to serve specific purposes. For example, the $STANDARD_INFORMATION
attribute contains access times and permissions, and the $FILE_NAME attribute contains the file name and the parent directory's MFT index. Even the contents of a file—after all, a file's purpose for existing is to store contents—are stored in an attribute, called the $DATA attribute. The contents of a directory are references to its children; these too are stored in attributes (referred to as $INDEX_ROOT and $INDEX_ALLOCATION). Each attribute consists of the standard attribute header, a type-specific header, and the contents of the attribute. If the contents of an attribute are small, then the contents will follow the headers and will reside in the MFT entry itself. Such attributes are called resident; there is a flag in the attribute header to indicate whether the attribute contents are resident or not. In Figure 2.4, the contents of the $STANDARD_INFORMATION and $FILE_NAME attributes are resident. If the contents are large, then an additional level of indirection is used. In this case, a runlist follows the attribute header. A runlist describes all the disk clusters which actually contain the contents of the attribute, where a run is defined as a starting cluster address and a length of consecutive clusters. (In NTFS terminology, a cluster is the minimum unit of disk allocation, and is generally eight sectors long.) In the example MFT of Figure 2.4, since the contents of the MFT file are very large, $DATA's contents are not resident; its runlist indicates that the contents of $MFT can be found in clusters 104-107 and 220-221. It is easy to see that a small file will occupy only the two sectors of its MFT entry. A large file will occupy the two sectors of its MFT entry, plus the content clusters themselves. Consider, then, the problem of a very large file on a highly fragmented disk: it might take more than the 1024 bytes of the MFT entry just to store the content runlist. In this case, NTFS scales with another level of indirection and another
35 attribute, and multiple MFT entries are allocated (in addition to the base entry) to store all attributes. Each of the non-base MFT entries will contain the MFT index of the base index; this reference will be 0 for the base index.
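The attribute and runlist layout described above can be sketched in a few lines of Python. This is an illustrative model only, not Dione's implementation; the class and field names are invented for exposition.

```python
# Sketch of NTFS attributes and runlists as described above (names invented).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Run:
    start_cluster: int   # first cluster of this run
    length: int          # number of consecutive clusters in the run

@dataclass
class Attribute:
    type_name: str                       # e.g. "$DATA", "$FILE_NAME"
    resident: bool                       # flag from the attribute header
    content: Optional[bytes] = None      # present only if resident
    runlist: Optional[List[Run]] = None  # present only if non-resident

    def clusters(self) -> List[int]:
        """Expand the runlist into the clusters holding the contents."""
        if self.resident:
            return []  # contents live inside the MFT entry itself
        return [r.start_cluster + i for r in self.runlist for i in range(r.length)]

# The non-resident $DATA attribute of $MFT from Figure 2.4:
# its runlist covers clusters 104-107 and 220-221.
mft_data = Attribute("$DATA", resident=False,
                     runlist=[Run(104, 4), Run(220, 2)])
print(mft_data.clusters())  # → [104, 105, 106, 107, 220, 221]
```

A run stores only a starting cluster and a length, so a lightly fragmented file needs very few runs; it is heavy fragmentation that inflates the runlist and forces the additional MFT entries described above.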
2.3.3 Performance Optimizations for Disk Accesses
Disk accesses are expensive in terms of performance. While accesses to disk may take upwards of 5-10 ms, accesses to RAM in a modern computer may take 50-100 ns (with cache speeds even faster) [25]. Therefore, the operating system uses optimizations to minimize unnecessary disk accesses. One such optimization is the page cache. The page cache is a buffer of disk-backed pages that are stored in main memory; as a result, frequently-accessed disk clusters will be available more quickly. Disk contents are paged into the page cache at the granularity of clusters; this is convenient in modern systems because a cluster is the same size as a page (4 KB).

Windows has different policies for reads and writes as they relate to the page cache; these policies are carried out by the cache manager. The multi-threaded cache manager utilizes a thread for intelligent read-ahead. The goal of intelligent read-ahead is that the data will already be in faster main memory before it is needed. With intelligent read-ahead, spatial locality is used to prefetch data from disk according to some perceived pattern of read accesses. For example, if the reads are streaming through the disk, the operating system will prefetch the next sequential clusters; if the reads follow a strided pattern, the operating system will prefetch the next clusters that follow the strided pattern. The size of the data that is prefetched is double the size of the last access.

For write accesses, Windows uses Lazy Writing, courtesy of the cache manager's delay thread [64]. Instead of immediately flushing writes to disk, writes are buffered
and flushed to disk in a burst. When a page is written to, it is marked dirty. Every second, Windows flushes one-eighth of the dirty pages to disk; therefore, it could take as long as 8 seconds for a write to be flushed from RAM to disk. The advantage of this scheme is that it reduces contention on the disk, because the number of disk I/O operations is reduced when multiple writes occur to the same cluster within a short time frame: instead of flushing to disk every time a change is made, Windows performs only one write at the end of the interval. The performance advantage comes at the cost of reliability; while the user is under the impression that a change has been committed to persistent storage, it may actually remain in volatile memory for several seconds more, and would be lost in the event of a hard shutdown.
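The one-eighth-per-second policy and the resulting up-to-8-second window can be made concrete with a toy model. This is our simplification for illustration, not Windows internals; it treats each lazy-writer pass as flushing an eighth of the originally dirty set, which is what yields the 8-second bound stated above.

```python
# Toy model (not Windows code) of the lazy writer described above:
# one pass per second, each flushing 1/8 of the dirty page set.
def drain_seconds(dirty: int, fraction: int = 8) -> int:
    """Seconds until `dirty` pages are all flushed, at 1/`fraction`
    of the original set per one-second pass (ceiling division)."""
    batch = max(1, -(-dirty // fraction))  # pages flushed per pass
    seconds = 0
    while dirty > 0:
        dirty -= batch
        seconds += 1
    return seconds

# A page dirtied just after a pass waits through all subsequent passes,
# so the worst case matches the up-to-8-second window in the text.
print(drain_seconds(1024))  # → 8
```

The reliability cost follows directly: any page still counted as dirty when power is lost holds a write the user believes is already on disk.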
2.4 Formal Verification and Model Checking
Formal verification is a technique that has been used to specify and validate sequential circuit designs, communication protocols, and software correctness. Due to its ability to model software behaviors in the face of obfuscated code, it has also recently been used to model malware behaviors and capabilities.

Model checking is a property verification approach that compares a model extracted from a behavioral trace to a given specification [28]. The specification of the property to be detected is represented by a formula φ, which is written using the description language, or logic, of the model checker. Additionally, a model M is extracted from each behavioral trace, and represented in the same description language. Then, the model from each behavioral trace is compared to the specification in order to determine whether the model M satisfies φ. The model checker outputs true or false, indicating whether the property is verified for the system.
The description languages used to describe models and property formulas are based on propositional logic. Verifying a property requires constructing a declarative statement, or proposition, about that system, and then determining whether that proposition is true or false. Propositional logic provides a formal language for describing these declarative statements, and includes the familiar operators not (¬), and (∧), or (∨), and implication (=⇒). For example, the proposition “(¬p ∨ r) =⇒ (p ∧ q)” can be translated as ‘if not p or r, then p and q’.

To create a running example of a type of malware behavior that we want to provide a formal specification for, let us assume that we have a trace of all x86 instructions executed, gathered using dynamic instrumentation. Our specification formalizes the following behavior:
In the program execution path, at some point in time a register is set to zero, and after that, this same register is eventually pushed on the stack before any other modification occurs to that register.
By looking at key words of that statement, we can determine the ideal way to represent it. The word and indicates that we will need propositional operators; the phrases at some point and after that imply a preferred temporal ordering; and the phrase this same register implies that we care not just about the operations, but also about the inputs to the operations.

Using pure propositional logic, we cannot be very specific in formulating this specification. For simplicity, we will focus on a single register, eax. We can only specify that the trace should contain both an instruction that sets register eax to 0 (mov(eax,0)) and an instruction that pushes register eax onto the stack (push(eax)). Each combination of instruction opcode and parameter(s) forms a single propositional atom; for example, mov(eax,0) and mov(eax,1) are as distinct as the propositional atoms p and q above. Therefore, our propositional statement is specified as:
φ = mov(eax, 0) ∧ push(eax)    (2.1)

This statement evaluates to true only if both of the instruction opcode/parameter events appear in the trace. Note that this statement does not specify an ordering between the instructions, merely that they both must appear in the trace in order for the statement to evaluate to true (a binary bag-of-words-like specification).
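The order-insensitive character of Equation 2.1 is easy to demonstrate in code. This is a sketch only; the tuple-based trace format is our invention, not the thesis's trace representation.

```python
# Sketch of Equation 2.1: the trace need only contain both atoms,
# in any order (invented trace format: (opcode, args) tuples).
def satisfies_prop(trace):
    return (("mov", ("eax", 0)) in trace and
            ("push", ("eax",)) in trace)

t1 = [("push", ("eax",)), ("mov", ("eax", 0))]   # reversed order still satisfies
t2 = [("mov", ("eax", 0)), ("mov", ("ebx", 1))]  # push(eax) missing
print(satisfies_prop(t1), satisfies_prop(t2))  # → True False
```

That t1 satisfies the formula even though the push precedes the mov is exactly the bag-of-words weakness that the temporal operators of Section 2.4.2 address.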
2.4.1 Predicate Logic
Predicate logic extends propositional logic, satisfying the need for a richer language. It includes quantifiers such as there exists (∃) and for all (∀). It also allows for the use of variables to generalize a statement, working as placeholders for concrete values. In order to use predicate logic as a formal language, we define two types of “objects” that can appear in a predicate logic statement: terms and formulas.

A term is an object; it can refer to a variable or a function. Consider a formula that will describe file properties. We can refer generically to our file as variable f, and we can describe certain properties of our file with functions of f, such as p(f), the path of f, and n(f), the name of f. A term can be recursive: if x is a variable (a term), and f(x) is a function of that variable (also a term), then g(f(x)), a function of the function, is also a term.

Conversely, a formula is a predicate—it is a statement that resolves to true or false. For example, we can use predicates to describe whether a certain type of
operation occurred on a certain file: for instance, C(f) is true if file f was created. Formulas can be connected using propositional symbols, such as ¬, ∧, ∨, and =⇒.
For example, if φ1 is a formula and φ2 is a formula, then φ1 ∨ φ2 is also a formula. Formulas can also be combined with variables in such a way as to utilize the quantifier symbols ∃ and ∀. For example, if φ is a formula and f is a variable, then ∃f φ is also a formula, and reads as there exists some f for which the formula φ evaluates to true.

Given these two types of objects, we can define a vocabulary for predicate logic (as a formal language) as having three sets: a set of predicate (or formula) symbols P, a set of function symbols F, and a set of constant symbols C (since a constant is a function without arguments, C can also be treated as a part of the function set F). For the instruction trace example, our vocabulary of predicates consists of P = {mov(x, y), push(x)}, where x and y are variables (terms) over the set of functions F.

Using predicates allows us to write specifications that differentiate between the type of operation (for example, the high-level behavior, or the instruction opcode) and the parameters of that operation. For example, in Equation 2.1, we simplified the original statement, which referred to “some register,” to refer only to register eax. If we were to keep the original statement, the specification would be:
φ = mov(eax, 0) ∧ push(eax) ∨ mov(ebx, 0) ∧ push(ebx)    (2.2)
      ∨ mov(ecx, 0) ∧ push(ecx) ∨ ...
In this case, it is more succinct to generalize the statement using predicate logic, creating a variable to represent the register, which ranges over a finite number of values. We combine the variable with the quantifier there exists. Equation 2.2 can be rewritten in predicate logic as:
φ = ∃r mov(r, 0) ∧ push(r) (2.3)
This translates to: In the program execution path, some register is set to zero, and this same register is pushed on the stack. However, this statement still says nothing about the ordering between the instructions.
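The existential quantification of Equation 2.3 amounts to a disjunction over the finite register set, which code makes explicit. As before, this is an illustrative sketch with an invented trace format, and the register list is an assumption for the example.

```python
# Sketch of Equation 2.3: there exists some register r that is both
# set to zero and pushed (ordering still unconstrained).
REGISTERS = ["eax", "ebx", "ecx", "edx"]  # assumed register vocabulary

def satisfies_exists(trace):
    return any(("mov", (r, 0)) in trace and ("push", (r,)) in trace
               for r in REGISTERS)

t = [("mov", ("ebx", 0)), ("push", ("ebx",))]
print(satisfies_exists(t))  # → True
```

The `any(... for r in REGISTERS)` comprehension is the finite-domain reading of ∃r: exactly the expansion written out by hand in Equation 2.2.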
2.4.2 Temporal Logic
Model checking is based on temporal logic; that is, the system is represented as a sequence of states. The formula representing the model is not always true for the model; rather, the formula is true only at some point in time, when the system has moved through a correct series of states.

There are two ways to think of time in a temporal formula. The first is as branching. With branching time, time is represented as a tree, with an initial state as a single node at the root and possible future paths branching out from that state. Branching time is useful when there are many possible paths but not all will occur, such as in a trace consisting of statically disassembled instructions of a program. There are multiple possible paths of execution, though not all are taken; for example, at a conditional statement, two different branches are possible depending on whether the if or else branch is followed. The second way to think of time is as linear; that is, time is a set of paths, where each path is a sequence of states. For example, a trace of instructions intercepted during the course of execution of a program would be represented in linear time, as each event occurred sequentially in the path of execution.
Due to the temporal nature of model checkers, the language used to specify a behavior or capability uses temporal operators, which define how the different states connect to each other in time. As our dynamically-obtained traces of disk events occur in linear time, we focus here on the logic specification language Linear Temporal Logic, or LTL [28]. LTL contains the expected propositional operators, but also includes the temporal operators X, F, G, U, and R. Respectively, these stand for neXt state, some Future state, Globally in all future states, Until, and Release. More formally, for formulas p and q along a linear path π:
• Xp is true on a path π if p holds in the next state, π1
• Fp is true if p holds at any point in the future on path π
• Gp is true if p holds globally throughout the future on path π
• pUq is true if p holds on the path π until q holds
• pRq is true if q always holds on path π, with this requirement released once p holds. Furthermore, it is possible that p will never hold.
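The operator semantics above can be sketched as a small recursive evaluator over a finite trace. This is our toy for exposition, not a real model checker (tools such as SPIN or NuSMV are vastly more sophisticated); formulas are nested tuples, atoms are predicates over a state, and suffix i of the trace plays the role of the path πⁱ.

```python
# Minimal finite-trace LTL evaluator (illustrative sketch only).
def holds(f, trace, i=0):
    op = f[0]
    if op == "atom": return i < len(trace) and f[1](trace[i])
    if op == "not":  return not holds(f[1], trace, i)
    if op == "and":  return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":   return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "X":    return holds(f[1], trace, i + 1)
    if op == "F":    return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "G":    return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "U":    # p holds until q holds; q must eventually hold
        return any(holds(f[2], trace, k) and
                   all(holds(f[1], trace, j) for j in range(i, k))
                   for k in range(i, len(trace)))
    if op == "R":    # q holds up to and including the state where p releases it
        return (all(holds(f[2], trace, j) for j in range(i, len(trace))) or
                any(holds(f[1], trace, k) and
                    all(holds(f[2], trace, j) for j in range(i, k + 1))
                    for k in range(i, len(trace))))
    raise ValueError(op)

# Equation 2.4: F(mov(eax,0) ∧ X push(eax)), over an invented trace format.
mov_eax0 = ("atom", lambda s: s == ("mov", "eax", 0))
push_eax = ("atom", lambda s: s == ("push", "eax"))
phi = ("F", ("and", mov_eax0, ("X", push_eax)))
trace = [("nop",), ("mov", "eax", 0), ("push", "eax")]
print(holds(phi, trace))  # → True
```

Note that evaluating LTL over a finite trace, as here, is a simplification: standard LTL semantics are defined over infinite paths, and finite-trace interpretations of X and G at the end of the trace are a design choice.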
Using LTL, we can provide a more specific specification by forcing a temporal ordering between the instructions. The following LTL formula translates to “in the program execution path, at some point, eax is set to zero, and immediately after that, eax is pushed on the stack.”
φ = F(mov(eax, 0) ∧ Xpush(eax)) (2.4)
Or alternatively, the following LTL specification allows other instructions to appear between the mov and push instructions, specifying: In the program execution
path, at some point, eax is set to zero, and after that, register eax is eventually pushed on the stack.
φ = F(mov(eax, 0) ∧ Fpush(eax)) (2.5)
Again, we can see the limit of LTL when we want to generalize the statement over all registers, and the statement becomes unnecessarily complex:
φ =F(mov(eax, 0) ∧ Fpush(eax))∨
F(mov(ebx, 0) ∧ Fpush(ebx))∨ (2.6)
F(mov(ecx, 0) ∧ Fpush(ecx)) ∨ ...
2.4.3 Linear Temporal Predicate Logic
Formulas using Linear Temporal Logic are composed only of propositions (other formulas connected by propositional operators and temporal operators), while predicate logic lacks a notion of time. Therefore, in order to write robust yet succinct specifications, it makes sense to combine the two, forming Linear Temporal Predicate Logic (LTPL). As with predicate logic, it allows us to differentiate between operations and parameters, while maintaining the temporal operators that specify the ordering between operations. Using LTPL, we can write:
φ = ∃r F(mov(r, 0) ∧ Fpush(r)) (2.7)
Translating to: There exists some register r, which is assigned the value 0, and at some point in the future, that same register r is pushed onto the stack.
Using the powerful combination that is LTPL, we can even succinctly specify the original statement, using multiple variables and multiple temporal operators.
φ = ∃r F(mov(r, 0) ∧ X((¬∃t mov(r, t)) U push(r)))    (2.8)
Translating to: In the program execution path, at some point, some register is set to zero, and after that, this same register is eventually pushed on the stack before any other modification occurs to that register.
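The LTPL specification translated above can also be checked by a direct scan over the trace. This sketch is ours, not the thesis's checker; the trace format is invented, and the toy decoder treats only mov as modifying a register, which we flag as an assumption.

```python
# Direct-scan sketch of Equation 2.8: some register r is set to zero and is
# later pushed before any intervening instruction modifies r.
def written_reg(insn):
    """Register an instruction modifies, if any (toy decoder: only mov)."""
    return insn[1] if insn[0] == "mov" else None

def zero_then_pushed(trace):
    for i, insn in enumerate(trace):
        if insn[0] == "mov" and insn[2] == 0:   # mov(r, 0): candidate start
            r = insn[1]
            for later in trace[i + 1:]:
                if later == ("push", r):
                    return True                 # pushed before any modification
                if written_reg(later) == r:
                    break                       # r modified first: candidate fails
    return False

ok  = [("mov", "ebx", 0), ("add", "eax", 1), ("push", "ebx")]
bad = [("mov", "ebx", 0), ("mov", "ebx", 5), ("push", "ebx")]
print(zero_then_pushed(ok), zero_then_pushed(bad))  # → True False
```

The outer loop plays the role of ∃r combined with F, while the inner scan implements X((¬∃t mov(r, t)) U push(r)): the push must arrive before any other write to r.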
2.5 Summary
In this chapter, we outlined the background information that is relevant to this thesis. In discussing the types of malware and the techniques employed by malware to hide from security products and analyzers, we motivated the need for an analyzer that provides a high-integrity, behavioral trace. We discussed the techniques used by malware analyzers to understand malware behavior, including static and dynamic techniques. To prepare for our description of Dione, our file system analysis framework, we introduced relevant Windows concepts, including the NTFS file system and the optimizations used by Windows to increase disk access performance. Finally, in advance of our technique to detect malware persistence capabilities, we introduced model checking, and described the logic languages used to generate specifications of malware behaviors.
Chapter 3
Related Work
3.1 Malware Analysis and Instrumentation
The ability to instrument disk accesses and file system operations is useful in many security fields, including intrusion detection and prevention and malware analysis. However, Dione is the first disk analysis infrastructure to provide live, up-to-date instrumentation for Windows NTFS file systems.

Research on Intrusion Detection Systems (IDSs) has frequently included techniques to monitor disk accesses or modifications to the file system [3, 31, 33, 35, 58, 59, 73, 77, 81]. Kim and Spafford demonstrated that malware intrusions could be detected by monitoring Unix systems for unauthorized modifications to the file system with Tripwire [35]. Tripwire performed file-level integrity checks and compared the result to a reference database. While it worked quite well to discover modifications to files, it did not discover changes made to files if they were reverted before the utility was run again. Furthermore, it inherently produced many false positives. Stolfo et al. also developed a host-based anomaly detection system which monitored changes to the
file system [73]. Their File Wrapper Anomaly Detection System (FWRAP) consisted of a host-based sensor which wrapped around a modified file system to extract information about each file access. Their anomaly detection algorithm then determined the probability that a file access was abnormal and generated an alert based on the score. Both of these host-based solutions require a trusted OS, whereas Dione does not require that the host is uncompromised.

On the other end of the spectrum from host-based solutions, Pennington et al. implemented a rule-based IDS that resided on an NFS server [59]. The authors enumerated the specific ways in which malware modifies data on disk, such as modifications to system administration files, log scrubbing, and timestamp reversal. Their IDS was effective at catching rootkits that modified persistent data on disk. It was, however, implemented for Linux (which has a far lower share of malware intrusions), and it resided on a separate storage processor, and thus could not easily be utilized for a desktop computer. Dione addresses both of these issues, as it monitors Windows systems with NTFS file systems, and it can monitor either a virtual machine or any desktop with an interposing hardware sensor.

While host-based IDSs are problematic because a privileged rootkit can override or misdirect malware detectors, and network-based IDSs lack visibility into host events, virtualization-based IDSs offer both high visibility and isolation from compromised operating systems. Garfinkel and Rosenblum introduced the first IDS to leverage virtualization technology, thus revolutionizing malware detection [24]. Their IDS, Livewire, utilized Virtual Machine Introspection (VMI) techniques, such as the monitoring of memory and register contents and events such as interrupts, memory accesses, and device state changes. However, it did not incorporate disk accesses, thus missing out on additional system information.
Payne et al. proposed requirements that should guide any virtual machine monitoring infrastructure, and implemented XenAccess to incorporate VMI capabilities [58]. We observed their requirements in our implementation of Dione, as they provide an excellent guide for the proper design of an infrastructure for monitoring VMs. The disk-monitoring capabilities of their proof-of-concept implementation, however, can only be used for paravirtualized guest OSes, which is a simplification of the problem of interpreting a complex file system like NTFS for fully-virtualized Windows guest OSes. Azmandian et al. used low-level architectural events and disk and network accesses in their machine learning-based VMI-IDS [3]. While their instrumentation platform captured more types of events in addition to disk accesses, providing a rich set of features for their IDS, their disk instrumentation lacked the higher-level file system semantics provided by Dione.

The work of Zhang et al. is very similar to ours; they presented a VMI-IDS that monitored the disk accesses of the virtual machine under analysis [81]. Their IDS creates a mapping between files and their sectors and monitors accesses to these sectors. Their system allows for the creation of rules that watch for the types of accesses, discussed in [59], that might indicate an intrusion. However, their monitoring framework is dependent upon virtualization technology, and it only runs for FAT32 file systems, a significantly simpler challenge. Jiang et al. also implemented a VMI-IDS, called VMwatcher, which incorporates disk, memory, and events [31]. However, they too cannot analyze the ubiquitous NTFS file system; instead, their Windows VMs must use the Linux ext2/ext3 file systems. The VMI-IDS of Joshi et al. detects intrusions before the vulnerability is disclosed [33]. Unfortunately, their solution to inspecting disk accesses requires
invoking code in the address space of an application process within the guest operating system itself; to undo the effects of this intrusive action, their heavyweight solution must checkpoint and roll back.

With Ghostbuster, Wang et al. [77] present a cross-view diff-based approach to detecting rootkits. By enumerating files, configuration settings, and processes at both a high level (Windows APIs) and a low level (examining the data structures themselves), Ghostbuster can determine whether a stealthy rootkit is hiding evidence of infection. Their host-based solution has the advantage of being able to check for more than just file system operations; however, it will not provide a ground truth. While their approach will catch many file-evasion techniques (including many described in Section 2.1.3), it can still be evaded by particularly stealthy malware that interposes between the calls used to obtain the raw metadata from which the low-level view is constructed. Furthermore, it only detects file hiding, as opposed to other file system operations, and it performs detection with dedicated snapshot views of the file system. Dione continually updates its view as metadata is written to disk, and thus maintains an up-to-date view of the file system.

Other researchers have acknowledged the role of disk accesses in malware intrusions by providing rootkit prevention solutions [11, 21]. With Rootkit Resistant Disks, Butler et al. provide a means to block accesses to directories containing sensitive operating system configuration files and executables [11]. Their hardware-based solution requires that all sensitive directories reside on a separate partition from the rest of the file system, and they physically block access to that partition unless a secure token is present. Chubachi et al. also provide a mechanism to block accesses to disk, and they can operate at a file-level granularity [21].
Unfortunately, they collect their mappings of files and sectors before the VM boots, and do not provide a live updating
capability as files are created, deleted, or changed in size. As a result, their sector watch list becomes inaccurate as the VM executes. Sundararaman et al. protect the disk with a different approach: they developed a new disk format which provides data versioning for roll-back in the event of an intrusion [74]. They selectively version all metadata and user-specified content, allowing users to have block-based protection of their disks through high-level semantics. However, it requires file system modification, and thus is only applicable to open-source file systems.

Previous work has also addressed the need for malware analyzers, with different solutions operating at different levels of semantics and isolation from malware. Several solutions perform malware analysis in-host [23, 62, 78]. DiskMon, part of the SysInternals tools for Windows, is an in-host solution which uses kernel event tracing to track file system operations [62]. Another solution for dynamically analyzing malware samples (including file system operations) is CWSandbox, which uses hooks in the Windows API to obtain the information it needs and to hide from malware [78]. Since both solutions reside in-host, malware could detect their presence (for example, by checking for hooks in the Windows API) and attempt to deceive the analyzers by providing their own in-host hooks.

Many dynamic analyzers instrument the behavior of malware by tracing system calls [23, 38, 41, 43, 68]. These analyzers can use an emulation or virtualization layer to achieve isolation from the malware, and perform low-level semantic reconstruction by introspecting on registers and the VM's memory. King and Chen's BackTracker uses a virtualized environment to gather process and file system-related events that led to a system compromise of a Linux guest [38]. Despite BackTracker's residence in the virtualization layer, the authors concede that
malware can hide from it, preventing live analysis. Sitaraman and Venkatesan extended the functionality of BackTracker, providing improvements that reduce the size of the dependency graph generated by BackTracker [68]. However, their event logger is compiled into the kernel or implemented as a loadable kernel module, and as such is not isolated from the malware. The work of Krishnan et al. creates a whole-system analysis by monitoring disk accesses, physical memory, and system calls, and reconstructing their intertwined relationships to provide a complete post-mortem forensic analysis [41]. Their disk monitoring infrastructure logs accesses to disk blocks and periodically performs a scan of the disk to connect blocks to files. The result is that their mappings are only accurate at the time of the scan, and do not reflect the file system changes that may occur between scans. Dione, on the other hand, uses live updating to maintain a perpetually up-to-date view of the file system for accurate file system analysis.

Kruegel et al. developed TTAnalyze (later renamed Anubis) to profile malware behavior, including file system activities, of Windows systems emulated in Qemu [43]. This approach has many advantages. Their instrumentation provides a rich opportunity to track all system calls and their parameters, as well as all Windows API functions and parameters. This allows them to have a full-system, on-the-fly reconstruction. They can also identify the running process to limit the trace to only the functions called by the malware sample. Ether is another dynamic malware analyzer which isolates itself from malware through the virtualization layer [23]. Ether monitors malware at different levels of granularity: a fine-grained (instruction-level) granularity, or a coarse-grained (system-call-level) granularity. The goal of Ether is complete transparency, so that the malware cannot detect that it is being analyzed.
Unfortunately, the performance cost of Ether
is steep (approximately 3000 times slowdown for single-step instrumentation [79]).

Chow et al. introduced the idea of a replay approach with Aftersight [16]. Though it was built with bug detection in mind, rather than malware analysis, it provided an interesting foundation for future work. Yan et al. expanded on the heterogeneous replay approach; their V2E records the malware's behavior in a transparent virtual machine, then replays its behavior in a software-based dynamic binary analysis platform [79]. V2E provides both transparency and strong instrumentation support, without the high overhead seen in Ether. The authors were able to demonstrate that V2E could defeat common anti-emulation attacks.

While Dione only provides file system-level instrumentation traces, and many of these analyzers provide multi-faceted analysis information, Dione provides some advantages not found in other analyzers. First, the other analyzers are inextricably tied to the platforms they were developed for (for example, Qemu and Xen). Therefore, they cannot be ported to other environments in order to provide side-by-side comparison of environment-sensitive malware. Dione, by contrast, can be ported to a variety of virtualization, emulation, and bare hardware platforms, and will produce comparable output reports. Additionally, they cannot provide the same level of ground truth that Dione can provide. The in-host solutions can be misled by malware utilizing lower-level call table hooking or filter drivers; analysis could also be bypassed if the analyzer provides its own hooks and the malware restores the call tables to eliminate the hooks. Even analyzers which reside in the virtualization or emulation layer face the theoretical chance of being intentionally misled by malware. For example, consider an analyzer that identifies library calls by comparing the executing instruction pointer with exported library function addresses to determine which library function
was called. Malware could hook a call table (such as the SSDT, as described in Section 2.1.3), diverting system calls to a different location in memory which does not correspond to the exported library addresses. As a result, the system call would not be recorded unless the hook eventually returns control to the original library function. Analyzers that rely on other tells for the system call ID (for example, by recording the value in EAX at each SYSENTER invocation) might even be misled by theoretical malware that encodes and decodes the system dispatch ID before and after the SYSENTER transition.¹ Since Dione intercepts raw disk accesses, and relies only on state changes and the actual intercepted disk sectors and contents, it cannot be misled by kernel-level malware. Finally, many other analyzers can only obtain information from intercepted system calls and possibly their parameters (Anubis can, by contrast, obtain more information both through the additional Windows API and by injecting new function calls into the executing instruction stream). Dione intercepts all the raw metadata of a file, and can therefore determine every property relating to that file, whether or not that property can be read or modified by a Windows API or system call.
3.2 Characterizing Malware Behavior
Though capability labeling for malware behaviors is a more recent discipline, it draws upon work from several related areas. Specifically, previous work has utilized behavioral traces and profiling in order to label malware by its family or variant (malware classification and clustering) or by its maliciousness (intrusion detection). Since the templates or specifications of malicious behavior used in these areas tend to identify
¹ It is worth noting that such deceptive system call obfuscation would only be performed with the unique purpose of thwarting VMI or emulator-based analyzers, and therefore it is far more likely that malware would simply detect the analyzer and exit.
specific malware capabilities and techniques, research in this area is directly related to labeling based on the capabilities themselves. Behavioral traces can be generated either statically or dynamically. With the traces, researchers can characterize malware using one of several techniques: machine learning, informal modeling, and formal verification.
3.2.1 Characterizing Malware with Machine Learning
Malware clustering and classification naturally follows from the generation of malware analysis traces. Lee et al. conducted early research using behavioral, rather than signature-based, clustering and classification of malware samples [48]. However, they used simple system call traces to describe malware behavior, and more recent research has shown that better results can be achieved with a high-level behavior-based approach, rather than with system call traces [4, 6].

Bailey et al. also used a malware's behavior in order to create a fingerprint [4]. Rather than system calls, they focused on higher-level descriptions of what the malware is doing on the system. They showed that existing antivirus solutions for characterizing malware are inconsistent across products, incomplete across malware strains, and do not contain concise semantics. With a classification technique that focuses on system state changes, instead of low-level system calls or binary signatures, they could do a better job classifying malware (including malware that had not been seen before, and therefore did not have a signature) than existing antivirus products.

Similarly, Bayer et al. use a behavioral profile, rather than just a system call trace, to cluster malware [6]. They introduce taint analysis to Anubis to track dependencies between both native API and Windows API functions, and also track control flow dependencies and network traffic, in order to generate the behavioral profile. This
work achieved better clustering than Bailey et al., and with their Locality-Sensitive Hashing based clustering algorithm they can scale to real-world data sets. They also achieved significantly better results than a purely system-call based approach, which they attribute to too much noise in the system call traces. Similarly, Jang et al. present BitShred, a clustering technique for malware triage [30]. They use feature hashing to reduce the high-dimensional feature space drawn from behavioral profiles, and use the Jaccard and BitVector Jaccard distances to measure similarity.

Rather than focusing on clustering of known malware variants, Rieck et al. developed a classification scheme that can determine whether a new malware instance belongs to a known malware family or is a new malware strain [60]. Behavior traces are obtained from the in-host CWSandbox [78] dynamic analysis platform. They use the Support Vector Machine (SVM) model to classify new behavior; with two variants, they can alternately perform multi-class classification and predict and detect novel malware behavior. Additionally, for each malware family they obtain a feature ranking in order to gain additional insight into its typical behavior patterns.

Rieck et al. later argued that batch clustering of malware samples can be extended to include the iterative classification of new samples [61]. However, their approach is closer to system call analysis, as they capture system call traces, encode them into their Malware Instruction Set (MIST), and create behavioral patterns through a sliding window in the instruction stream. They bridge clustering and classification with reports that demonstrate typical behavior for homogeneous groups. These prototypes maintain intermediate results, such as cluster assignments from previous iterations of the algorithm, for use in incrementally analyzing new samples.

Though this work had some success in identifying malicious code behavior, it
was dependent upon static analysis using disassembly, which is known to be easily defeated with malicious hand-tuned assembly code [66]. Interestingly, there has been enough work in malware classification and clustering that additional research seeks to verify, analyze, and even constructively criticize the research and evaluation of previous work. Since classification and clustering algorithms require some sort of distance metric to determine how similar two pieces of malware are, Apel et al. devoted research to evaluating different types of distance metrics [2], and found that the Manhattan distance best satisfied their criteria. Li et al. attempt to shed light on the inherent problems in using machine learning for classification by constructively criticizing the evaluation of previous work [49], including the state-of-the-art by Bayer et al. [6]. They conclude that the problem arises from the lack of “ground truth”—that using malware samples that can be identified by anti-virus scanners will bias the corpus in favor of easy-to-cluster instances. While the problem is still not solved, it is useful to consider effects such as these when evaluating clustering algorithms.
3.2.2 Characterizing Malware Using Modeling
Instead of using machine learning techniques on behavioral profiles, complementary research aims to use formal [7, 9, 20, 19, 36, 37, 67, 70, 69, 71] and informal [39, 44, 72] verification techniques to label or classify malware samples. Kruegel et al. aimed to determine whether a Loadable Kernel Module (LKM) in Linux resembled that of a rootkit when loaded into kernel space [44]. They created an abstract model of program behavior using static analysis, generating a control flow graph of preprocessed kernel module code, and compared it to an informally-defined specification of rootkit behavior. Kirda et al. also generated informal specifications
of spyware behavior using a combined static and dynamic analysis approach [39] and a customized browser instrumentation infrastructure. In their work on AccessMiner, Lanzi et al. analyze and model benign program behavior to better understand malicious behavior [47]. After first attempting a machine learning-based approach and demonstrating that an n-gram sliding window of system calls does not produce sufficiently accurate results, they model benign behavior with an access activity model, whereby benign activity is expressed in terms of access tokens to system resources (e.g., files and registry keys). Researchers have observed that a common behavior employed by malicious programs relates to the way sensitive data is treated, and have developed informal policies to define this behavior [72, 80]. Stinson et al. informally define a malicious bot behavior as one in which data is received from the network and subsequently used as an input parameter to a system call—that is, an untrusted source is fed into a trusted sink [72]. They use system call interposition and tainting to achieve their dynamic analysis. With Panorama, a whole-system information flow tracking system, Yin et al. also used taint propagation to detect malicious behavior. Panorama can detect when a malware sample accesses sensitive data that it should not have access to, and can track what it does with that sensitive data [80]. Many researchers have chosen static analysis to obtain the traces that will be used in their behavioral analysis. Bergeron et al. were among the first researchers to apply formal verification techniques to detecting malicious code patterns in malware [9]. The authors used static analysis to generate a control flow graph of security-critical API calls, and then used model checking to verify these graphs against a malicious code specification. Likewise, Singh et al.
identify fundamental functionality that sufficiently captures the malicious properties of a virus, which they call organs, including survey, concealment, propagation, injection, and self-identification [67]. They use Linear Temporal Logic (LTL) formulas to encode malicious behaviors. However, these early works do not provide comprehensive evaluations of their methodologies on sufficient amounts of real-world malware samples. Kinder et al. also sought to describe and identify malware based on behavioral signatures using model checking [36]. In order to succinctly and comprehensively describe these behaviors, they developed and demonstrated the use of a new temporal logic, Computation Tree Predicate Logic (CTPL), on statically-generated instruction traces. They demonstrated that the same specification of malicious behaviors could be used to identify several different real-world worms [37]. CTPL was extended to express stack operations by Song and Touili [70]. The resulting logic was called SCTPL and allowed them to model a program using a Pushdown System with predicates over the stack. They further expanded this work to produce SCTPL formulas that consider the values, rather than the names, of registry and memory locations [69], and they also improved the efficiency of the detection algorithm. Finally, they abandoned the branching-logic variants of CTL for a Linear Temporal Logic in [71]. In doing so, they describe LTPL, a linear temporal logic with predicates, and then extend it for their Pushdown System with stack semantics as SLTPL. Similarly, Beaucamps et al. utilized a variation of LTL with predicates [7]. Their two contributions were to abstract static traces into high-level behaviors, and then use model checking to compare them against a malware specification expressed in First-Order Linear Temporal Logic (FOLTL). While these papers utilized model checking in a novel way, they were more concerned with intrusion detection—labeling a sample as malicious or benign—rather than with generating specifications and detecting capabilities in samples (both malicious and benign), as ours does.
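To make the temporal-logic flavor of these specifications concrete, the following is an illustrative sketch (not taken from any of the cited systems) of evaluating simple linear-temporal operators over a finite behavioral trace. The event names and predicates are hypothetical.

```python
# Minimal evaluation of LTL-style operators over a finite trace.
# A trace is a list of events; a predicate tests one event.

def eventually(trace, pred):
    """F p: pred holds at some point in the trace."""
    return any(pred(e) for e in trace)

def globally(trace, pred):
    """G p: pred holds at every point in the trace."""
    return all(pred(e) for e in trace)

def until(trace, p, q):
    """p U q: p holds at every step until q first holds (q must occur)."""
    for e in trace:
        if q(e):
            return True
        if not p(e):
            return False
    return False

# Hypothetical persistence-style trace: install a service, then reboot.
trace = ["create_service", "write_binary", "reboot", "read_binary"]
assert eventually(trace, lambda e: e == "reboot")
assert until(trace, lambda e: e != "read_binary", lambda e: e == "reboot")
```

Real model checkers operate on richer structures (Kripke structures, pushdown systems) and support nesting of operators; this sketch only shows how a finite trace is matched against a temporal property.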
Christodorescu et al. provided a richer specification of malware behavior [20]. They developed formal templates of malicious behavior consisting of instruction sequences with variables and symbolic constants; a match for malicious behavior is detected when a malware sample's instruction sequence matches a template. Since many of these formal and informal modeling methodologies require hand-written malware specifications, Christodorescu et al. developed an automated system to generate malware specifications, or malspecs [19]. A malspec is generated from the system call-based dependence graphs of malicious programs, and is represented as a dependence graph. Likewise, in their preventative system, Kolbitsch et al. represent malicious behavior in a dependency graph of relevant system calls [40]. Then, their on-line scanner monitors system call invocations and parameters, and determines on-the-fly whether the program matches one of the behavior graphs. Much of the previous work in malware detection demonstrated high detection rates, but without a common testing methodology and without large datasets on which to test the algorithms (in some cases, models were tested with only a dozen or two benign and malicious samples). Recognizing this, Canali et al. performed a detailed study of previously-researched malware detectors [12]. They explored the design space of hundreds of models, and tested the models on hundreds of thousands of samples. They demonstrated that analytical reasoning alone does not establish a detector's utility; it must be supplemented by a rigorous evaluation with a sufficiently large dataset. Finally, the research of Martignoni et al. into capability labeling most closely aligns with the work described in this dissertation [52]. They create high-level behavior specifications from domain knowledge. From malware samples, they generate behavior graphs, and use a behavior matching algorithm to determine whether the
sample exhibits each high-level behavior. However, they target different types of capabilities (generally, network-related capabilities and keylogging), and they represent their specifications using behavior graphs (and/or graphs), instead of the more succinct LTPL. Additionally, they tested their approach on a mere 25 samples (11 benign); given the variety of ways in which malware can manifest a certain behavior, this does not sufficiently demonstrate the effectiveness of their solution.
Chapter 4
Dione: A Disk Instrumentation Framework
Dione is a flexible, policy-based disk I/O monitoring and analysis infrastructure [51]. Dione maintains a view of the file system under analysis. A disk sensor intercepts all accesses from the System-Under-Analysis (SUA) to its disk, and passes that low-level information to Dione. The toolkit then reconstructs the operation, updates its view of the file system (if necessary), and passes a high-level summary of the disk access to an analysis engine as specified by the user-defined policies. The rest of this section discusses Dione in more detail.
4.1 Threat Model and Assumptions
Our threat model does not require that the SUA is trusted or uncompromised. The SUA can be compromised by malware with administrator-level privileges that can hide its presence from host-level detection mechanisms. The attacker may access, modify, create, or delete files anywhere in the file system.
However, we assume that there is some disk-level artifact of the malware infection. This means that the malware needs to either download files to the hard disk, create new files, or modify existing files. We can still observe these operations even if a kernel-level rootkit has attempted to hide these operations and artifacts from a host detection mechanism. We assume that there is a sensor that interposes between the SUA and its hard disk and provides disk access information. This sensor can be a software sensor (e.g., a virtualization layer) or a hardware sensor. We assume that both the sensor providing the disk access information and the Analysis Machine (that is, the machine which runs Dione) are trusted. Therefore, in a virtualization-based solution, neither the hypervisor nor the virtual domain which is serving as the Analysis Machine can have been compromised. In a physical solution, the separate machine running Dione cannot have been compromised.
4.2 Dione Operation
There are four discrete components to Dione: a sensor, a processing engine, an analysis engine, and the Dione Manager. The Dione architecture is shown in Figure 4.1. The Sensor interposes between the SUA and its disk. It intercepts each disk access, and summarizes the access in terms of a Logical Block Address (LBA, or simply sector), a sector count, the operation (read/write), and the actual contents of the disk access. The sensor type is flexible. It can be a physical sensor, which interposes between a physical SUA and the analysis machine, or a virtual sensor, such as a hypervisor, which intercepts disk I/O of a virtual SUA. The Processing Engine is a daemon on the analysis machine. The multithreaded
Figure 4.1: High-level overview of Dione Architecture.
Dione daemon interacts with both the user and the sensor. It receives disk access information from the sensor, and performs three steps. The first step is Disk Access Classification; for each sector, it determines which file it belongs to (if known) and whether the access was to file content or metadata. In the Live Updating phase, it compares the intercepted metadata to its view of the file system to determine if any high-level changes occurred. It passes the high-level access summary to the Policy Engine, which determines if any policies apply to the file accessed. If so, it passes the information along to the Analysis Engine. The Analysis Engine performs some action on the information it has received from the processing engine. Currently, the analysis engine logs the accesses to a file, but future work will extend the analysis engine. An example of a portion of an output analysis log is provided in Figure 4.2. The Dione Manager is a command-line program which the user invokes to send commands to the Dione daemon. The commands can be roughly divided into two categories: Policy Commands and State Commands. A summary of all commands is presented in Table 4.1.
Command       Description

declare-rule  Declare a new rule for instrumentation. Types of rules include:
              • access: Record an access to file content/metadata
              • operation: Record a high-level file system operation (e.g., file creation, deletion, move)
              • anti-forensics: Record an anti-forensics operation (e.g., file hiding, timestamp reversal, Alternate Data Stream (ADS) creation/deletion)
              • MBR Alert: Record read/write access to the Master Boot Record (MBR)
delete-rule   Delete a previously-declared rule
list          List all rules
apply         Bulk-apply declared rules to file record data structures
scan          Perform a full scan of a disk image (or mounted disk partition), creating all file records from the raw bytes and automatically applying all declared rules
save          Save the state of the Dione file record hierarchy to a file to be loaded from later
load          Load the Dione file record hierarchy from a previously-saved configuration file
Table 4.1: Commands used for communication with the Dione daemon.
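The three processing steps described above (Disk Access Classification, Live Updating, and the Policy Engine) can be sketched as a small pipeline. This is an illustrative sketch only; Dione itself is not implemented this way, and all function and field names here are hypothetical.

```python
# Hypothetical sketch of the per-access processing pipeline:
# classify the sector, (optionally) live-update, then match policies.

def classify(sector, sector_map):
    """Disk Access Classification: map a sector to (file, kind)."""
    return sector_map.get(sector, (None, "unknown"))

def process_access(access, sector_map, policies, analysis_log):
    fname, kind = classify(access["lba"], sector_map)
    # Live Updating would compare intercepted metadata against the
    # cached view of the file system here (omitted in this sketch).
    summary = {"file": fname, "kind": kind, "op": access["op"]}
    # Policy Engine: forward the summary only if some policy matches.
    if any(p(summary) for p in policies):
        analysis_log.append(summary)

# Usage: one known sector, one "record all known files" policy.
sector_map = {2048: ("\\WINDOWS\\foo.dll", "content")}
record_all = lambda s: s["file"] is not None
log = []
process_access({"lba": 2048, "op": "read"}, sector_map, [record_all], log)
assert log[0]["file"] == "\\WINDOWS\\foo.dll"
```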
Figure 4.2: Sample Dione Disk Trace.
4.2.1 Dione Policy Commands
As Dione instruments the file system under analysis, the user can specify policies to determine whether the instrumentation data should be passed along to the Analysis Engine. A policy specifies an action to be taken on a file for a given operation. The Policy Engine is a flexible framework for declaring new policies. Currently, we have implemented four types of policies: Record, Timestamp Alert, Hide-File Alert, and MBR Alert. Policies can be declared or deleted at any point while Dione is running, including when it is actively monitoring a live system. The Record policy specifies whether accesses should be recorded to a log file. When an access is recorded, Dione will specify whether it was to file content or metadata, whether it was a read or write, and whether it was a special operation such as a file creation, deletion, or renaming. A special annotation is provided for files which are created with their hidden property set to hide them from the user. The
Timestamp Alert detects a specific symptom of intrusion: the reversal of any of the time-stamp properties of a file (the so-called Modification, Access, and Creation (MAC) times). The Hide-File Alert detects the hiding of a file. For each of these three policy types, optional arguments specify whether the policy should apply for reads, writes, or both. If the specified file is a directory, the policy can optionally apply to all of its descendants. If a file does not exist when the policy is declared, the policy will remain in the system and will be automatically applied when the file is created in the SUA. The MBR Alert looks for an access to a specific region of disk: the sectors on the partition containing the Master Boot Record (MBR). This policy, when applied, records reads and writes to the sectors on the MBR partition. In the Policy Command category, the user can declare, delete, list, or bulk-apply policies.
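As an illustration of the Timestamp Alert idea, the check below flags an intercepted metadata record whose MAC times move backwards relative to the values Dione already holds for the file. This is a hedged sketch, not Dione's implementation; the field names and integer timestamps are hypothetical.

```python
# Sketch of timestamp-reversal detection: any MAC time in the newly
# intercepted metadata that is EARLIER than the cached value is a
# symptom of timestamp tampering.

def timestamp_reversed(cached, intercepted):
    """True if any MAC timestamp was rolled back."""
    return any(intercepted[k] < cached[k]
               for k in ("modified", "accessed", "created"))

cached = {"modified": 200, "accessed": 210, "created": 100}
ok = {"modified": 205, "accessed": 215, "created": 100}
rolled_back = {"modified": 150, "accessed": 215, "created": 100}
assert not timestamp_reversed(cached, ok)       # times only advance
assert timestamp_reversed(cached, rolled_back)  # modified time reversed
```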
4.2.2 Dione State Commands
In the State Command category, Dione loads and saves a view of the state of the file system under analysis. The load step is necessary to pre-populate Dione's data structures with that state, and is required before Dione will begin monitoring I/O. The goal of this stage is for Dione to already know everything about the file system before the SUA boots, so that it can immediately begin monitoring and analyzing disk I/O. This step can be accomplished with a disk scan, which reconstructs the file system from the raw bytes of the disk, or by loading a previously saved configuration file. The advantage of the load/save functionality is that a disk scan only needs to be performed once, which is useful in the case of very large disks with many files, for which a raw scan takes longer than a load.
4.3 Live Updating
As the SUA boots and runs, new files are created, deleted, moved, expanded, shrunk, and renamed. As a result, the pre-populated view of the SUA's file system, including the mappings between sectors and files, quickly becomes out-of-date, reducing the accuracy of the monitoring and logging of disk I/O. The solution to this problem is Live Updating: an on-the-fly reconstruction of disk events based solely on the intercepted disk access information. In the next sections, we detail the challenges and solutions to live updating. As our implementation is initially geared toward Windows systems with the NTFS file system, and NTFS is particularly susceptible to the challenges inherent to live updating, we begin with an introduction to those NTFS concepts which will aid in the understanding of the live updating implementation.
4.3.1 Live Updating Challenges
There are two big challenges to live updating: overcoming the Semantic Gap and the Temporal Gap. The Semantic Gap is a well-studied problem in which low-level data must be mapped to high-level data. In our case, we need to map the raw byte contents of a disk access to files and their properties. Fortunately, there are existing techniques, such as the open-source The Sleuth Kit (TSK) [13], which do much of the work to bridge the semantic gap. The Temporal Gap occurs when low-level behaviors occurring at different points in time must be pieced together to reconstruct high-level operations. The high-level operations that Dione monitors include file creation, deletion, expansion, move/re- name, and updates in MAC times and the hidden property.
The first challenge of live updating is identifying the fields in an intercepted MFT entry for which a change indicates a high-level operation. Often it is not just a single change in an intercepted MFT entry that indicates a high-level operation, but a combination of changes across multiple intercepted MFT entries. Due to requirements for reliability, these changes will be propagated to disk in an inconvenient ordering. As a result, Dione must piece together the low-level changes across time in order to reconstruct high-level events. The biggest challenge resulting from the temporal gap is the detection of file creation. An intercepted MFT entry lacks two critical pieces of information: the MFT index of that entry, and the full path of the file it describes. For a static image, it is not a challenge to calculate both. However, in live analysis, the metadata creation will occur before the $MFT file's runlist is updated—and just like any other file, $MFT can expand to a non-contiguous location on disk. Therefore, in certain cases it can be impossible to determine (at the time of interception) the index of a newly created file. In fact, it can be impossible to determine at interception time whether a file creation actually occurred in the first place. A similar challenge arises in determining the absolute path of a file. The MFT entry contains only the MFT index of that file's parent, not its entire path. If the parent's file creation has not yet been intercepted, or the intercepted parent did not have an MFT index when its creation was intercepted (due to the previously described problem), Dione has no way to identify the parent and thus reconstruct the path. This situation occurs quite frequently whenever an application is being installed. In this case, many (up to hundreds or thousands) of files are created in a very short amount of time.
Since the OS bunches writes to disk in one delayed burst, many hierarchical directory levels are created in which files cannot determine their paths.
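The index calculation at the heart of this challenge, locating an intercepted sector within the known $MFT runlist, can be sketched as follows. This is an illustrative sketch with made-up extent values, not Dione's implementation; it assumes 1 KB MFT entries spanning two 512-byte sectors, and extents that are multiples of the entry size.

```python
# Sketch: map an intercepted sector to an MFT index using the cached
# $MFT runlist (a list of (start_sector, sector_count) extents).

SECTORS_PER_ENTRY = 2  # 1 KB MFT entry / 512-byte sectors (assumption)

def mft_index(sector, runlist):
    """Return the MFT index for `sector`, or None if the sector falls
    outside every known $MFT extent (e.g., the MFT has expanded but the
    runlist update has not yet been intercepted)."""
    entries_before = 0
    for start, length in runlist:
        if start <= sector < start + length:
            return entries_before + (sector - start) // SECTORS_PER_ENTRY
        entries_before += length // SECTORS_PER_ENTRY
    return None

runlist = [(6000, 1024), (90000, 512)]   # two non-contiguous extents
assert mft_index(6002, runlist) == 1
assert mft_index(90000, runlist) == 512  # first entry of second extent
assert mft_index(12345, runlist) is None # outside the known runlist
```

A `None` result is exactly the ambiguous case described above: the access may be a new file in a not-yet-intercepted MFT expansion, or not metadata at all.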
The temporal gap also proves a challenge when a file's attributes are divided over multiple MFT entries. As Dione will only intercept one MFT entry at a time, it will never see the full picture at once. Therefore, it needs to account for the possibility of only intercepting a partial view of metadata.
4.3.2 Live Updating Operation
Live updating in Dione occurs in three steps. First, file metadata is intercepted as it is written to disk. Next, the pertinent properties of the file are parsed from the metadata, resulting in a reconstructed description of the file whose metadata was intercepted. Finally, Dione uses the intercepted sector, the existing view of the file system, and the reconstructed file description from the second step to determine what event occurred. It updates the data structures to represent the file system change. After intercepting an access to disk, Dione looks at the intercepted disk contents and approximates whether the disk contents “look like” metadata (i.e., whether the contents appear to be an intercepted MFT entry). If it looks like metadata, Dione parses the raw bytes and extracts the NTFS attributes. It also attempts to calculate the MFT index by determining where the intercepted sector falls within Dione’s copy of the MFT runlist. With this calculated index, it can attempt to retrieve a File Record. There are two outcomes of this lookup: either a valid File Record is retrieved, or no File Record matches the index. If a valid File Record is found, Dione will compare the extracted attributes to those attributes found in the existing File Record. If any changes are detected, it will modify the File Record to reflect the changes. A summary of the semantic and temporal artifacts of each type of file operation is presented in Table 4.2. However, if a valid File Record is not found, one of three situations has occurred.
Operation            Artifacts

File Creation        • No existing File Record for calculated index
                     • Sector falls within MFT runlist; otherwise buffer until MFT runlist expands to include sector

File Deletion        • File Record exists for calculated index
                     • In-Use flag off in intercepted MFT entry header

File Replacement*    • File Record exists for calculated index
                     • Creation Time: Intercepted > FileRecord, OR
                       MFT Entry Sequence Number: Intercepted > FileRecord, OR
                       MFT Entry type (base vs. non-base) changed

File Rename          • File Record exists for calculated index
                     • File Name: Intercepted ≠ FileRecord

File Move            • File Record exists for calculated index
                     • Parent's MFT Index: Intercepted ≠ FileRecord

File Shrink/Expand   • File Record exists for calculated index
                     • Runlist: Intercepted ≠ FileRecord

Timestamp Reversal   • File Record exists for calculated index
                     • MAC Times: Intercepted < FileRecord

File Hidden          • File Record exists for calculated index
                     • Hidden flag: Intercepted = 1 && FileRecord = 0

ADS Creation         • File Record exists for calculated index
                     • List of $Data attributes: Intercepted ≠ FileRecord

ADS Deletion         • File Record exists for calculated index
                     • List of $Data attributes: Intercepted ≠ FileRecord

Table 4.2: Summary of the artifacts for each file system operation. An MFT index is computed based on the intercepted sector and the known MFT runlist. If a file record is found with the calculated index, properties of the file record are compared with properties parsed from the intercepted metadata. * A replacement is characterized by a file deletion and creation within the same flush to disk, whereby the same MFT entry is reused.

In the first case, a new file has just been created, and it has been inserted into a "hole" in the MFT. The file creation can be verified because the intercepted sector falls within the known runlist of the MFT. In the second case, a new file has just been created, but the MFT was full, and thus it could not be inserted into a hole. Dione buffers a reference to this file in a list called the Wait Buffer.1 Eventually Dione will intercept the $MFT file's expansion, and the file creation can be validated and the path constructed. In the final case, the intercepted data had the format of metadata (e.g., the data looked like an MFT entry), but the data actually turned out to be the contents of another file. This happens for redundant copies of metadata and for the file system's $Logfile; additionally, a malicious user could create file contents which mimic the format of an MFT entry. In any of these cases, a reference to this suspected file—and the sector at which it was discovered—will be saved in the Wait Buffer. However, the Wait Buffer will be periodically purged of any File Records when their corresponding sectors are verified as belonging to a file which is not $MFT.
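The artifact comparisons summarized in Table 4.2 amount to a dispatch on the differences between the cached File Record and the reconstructed description parsed from the intercepted MFT entry. The following is an illustrative sketch of that dispatch with hypothetical field names; it covers only a subset of the operations and is not Dione's actual code.

```python
# Sketch: infer a high-level operation from (cached record, parsed entry).

def infer_operation(record, parsed):
    if record is None:
        return "creation"                   # no record for this index
    if not parsed["in_use"]:
        return "deletion"                   # in-use flag cleared
    if parsed["created"] > record["created"]:
        return "replacement"                # entry reused for a new file
    if parsed["name"] != record["name"]:
        return "rename"
    if parsed["parent"] != record["parent"]:
        return "move"
    if parsed["runlist"] != record["runlist"]:
        return "resize"
    return "update"                         # e.g., timestamps, flags

rec = {"in_use": True, "created": 100, "name": "a.exe",
       "parent": 5, "runlist": [(64, 8)]}
moved = dict(rec, parent=9)
assert infer_operation(None, rec) == "creation"
assert infer_operation(rec, moved) == "move"
```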
4.4 Disk Sensor Integration
In order to be portable to any type of sensor, the Dione instrumentation functionality is compiled as a library; the Dione daemon is an executable created from that library. Communication between a sensor and the Dione daemon requires two corresponding components: a sensor-side API and a Dione receiver. The receiver is compiled into the Dione library, whereas the sensor-side library is compiled separately. Therefore, an inter-process (in the case of a virtualization- or emulation-based sensor) or inter-system (in the case of a physical sensor) communication protocol is required.
1. A newly-created file will also be placed in the Wait Buffer if it has a valid MFT index, but its path cannot be constructed because its parent has yet to be intercepted.
The virtualization- and emulation-based sensors (using Xen and Qemu, respectively) utilize an interprocess communication protocol in order to communicate disk access information between the hypervisor/emulator and the Dione daemon. We have implemented a producer/consumer communication protocol using shared memory and semaphores. A sensor-side API (called the DiskMonitor) provides two externally available functions. The first is an initialize function; it is called from the Xen or Qemu I/O initialization function, and it sets up the shared memory region and semaphores. The second is a disk access function; it marshals the disk access information (LBA, count, operation, and access contents) into the shared memory region, and is therefore called once per multi-sector disk access. The Xen-based implementation calls these functions from within the block device driver. The Qemu-based implementation calls these functions from within the dma-helpers device driver. The Xen implementation works for raw disk images, whereas the Qemu implementation works for both raw and the newer Qemu Copy-on-Write (QCOW2) disk image formats. The physical sensor, created with a custom FPGA board, interposes between a system and its hard disk; therefore, it allows Dione to instrument a physical SUA, preventing the malware from detecting that it is being analyzed. The physical sensor parses the disk access information (LBA, count, operation, and access contents) from the SATA commands. It then passes them along to the Dione daemon, which is running on another physical system, over Ethernet. The Dione library is compiled with a client-side receiver that opens a socket for the given network interface and waits for packets on that socket. The disk access information for each packet is unmarshalled and passed to the rest of the Dione daemon.
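The semaphore-guarded producer/consumer handoff described above can be sketched as follows. This stand-in uses an in-process queue for clarity; the real DiskMonitor crosses a process (or machine) boundary via shared memory, and all names here are illustrative.

```python
# Sketch of the sensor-to-daemon handoff: a bounded producer/consumer
# buffer guarded by two semaphores (free slots and queued items).

import threading
from collections import deque

class DiskMonitor:
    def __init__(self, depth=64):
        self.buf = deque()
        self.slots = threading.Semaphore(depth)   # free buffer slots
        self.items = threading.Semaphore(0)       # queued disk accesses
        self.lock = threading.Lock()

    def disk_access(self, lba, count, op, data):
        """Sensor side: marshal one disk access into the buffer."""
        self.slots.acquire()
        with self.lock:
            self.buf.append((lba, count, op, data))
        self.items.release()

    def receive(self):
        """Daemon side: block until an access is available, then take it."""
        self.items.acquire()
        with self.lock:
            item = self.buf.popleft()
        self.slots.release()
        return item

mon = DiskMonitor()
mon.disk_access(2048, 8, "read", b"\x00" * 4096)
assert mon.receive() == (2048, 8, "read", b"\x00" * 4096)
```

The two-semaphore design lets the sensor block only when the buffer is full and the daemon block only when it is empty, which mirrors the shared-memory protocol's flow control.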
4.5 Experimental Results
Next, we evaluate the accuracy and performance of Dione and demonstrate its utility using real-world malware. Though Dione is a flexible instrumentation framework capable of collecting and analyzing data from both physical and virtual sensors, we use a hypervisor-based solution which utilizes the virtualization layer as a data-collecting sensor.
4.5.1 Experimental Setup
Our virtualization-based solution uses the Xen 4.0.1 hypervisor. Our host system contains a dual-core Intel Xeon 3060 processor with 4 GB RAM and Intel VMX hardware virtualization extensions to enable full virtualization. The 160 GB, 7200 RPM SATA disk was partitioned with a 25 GB partition for the root directory and an 80 GB partition for the home directory. The virtual machine SUA ran Windows XP Service Pack 3 with the NTFS file system.
4.5.2 Evaluation of Live Updating Accuracy
In order to gauge the accuracy of live updating, we ran a series of tests to determine whether Dione correctly reconstructed the file system operations for live updating. For our tests, we chose installation and uninstallation programs, as they perform many file system operations very quickly and stress the live updating system. We chose three open-source applications (OpenOffice, Gimp, and Firefox), and performed both an installation and an uninstallation of each. We also ran an all-inclusive test that installed all three, then uninstalled all three.
Program               Creations  (Delayed)  Deletions  Moves  Errors
OpenOffice Install      3934      (3930)        1         0      0
Gimp Install            1380      (1380)        0         0      0
Firefox Install          152       (135)       71         0      0
OpenOffice Uninstall     353        (62)     3788      3836      0
Gimp Uninstall             5         (0)     1388         0      0
Firefox Uninstall          6         (0)       80         0      0
All                     6500      (6114)     5986      3815      0
Table 4.3: Breakdown of file system operations for each benchmark. The subset of file creations which wait for the delayed expansion of the MFT are also indicated.
These benchmarks perform a varying number of changes to the file system hierarchy. Table 4.3 lists each of the seven benchmarks and the number of file creations, deletions, and moves.2 As discussed in Section 2.3.1, if many new files are created at once and the MFT does not have enough free space to describe them, there is a delay between when the file creation is intercepted and when the MFT expands to fit the new file metadata (at which point the file creation can be verified). We also include the number of delayed-verification file creations in Table 4.3, as these place additional stress on Dione's live updating accuracy. For each test, we started from a clean Windows XP SP3 disk image. We executed one of the seven programs in a VM, instrumenting the file system. We then shut down the VM, and dumped Dione's view of the dynamically-generated state of the file system to a file. We then ran a disk scan on the raw static disk image, and compared the results of the static raw disk scan to the results of the dynamic execution instrumentation. An error is defined as any difference between the dynamically-generated state and the static disk scan. This includes a missing file (one that was not reported created), an extraneous file (one that was not reported deleted), a misnamed
2. The “All” test is not a sum of the individual tests, because the operating system also creates, deletes, and moves files, and the number of these may change slightly between tests.
file, a file with the wrong parent ID or path, a file mislabeled as a file or directory, a file mislabeled as hidden, a file with an incorrect timestamp (of any of the four timestamps maintained by Windows), or a file with an incorrect runlist. Table 4.3 shows the result of the accuracy tests. In each case, Dione maintained a 100% accurate view of the file system, with no differences between the dynamically-generated view and the static disk scan.
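The error-counting comparison described above, diffing the dynamically-maintained view against the static disk scan, can be sketched as follows. This is an illustrative sketch; the keys, property fields, and example paths are hypothetical.

```python
# Sketch: count accuracy errors as any difference between the dynamic
# view and the static scan (missing file, extraneous file, or any
# differing property such as name, parent, hidden flag, or timestamp).

def count_errors(dynamic, static):
    """Each view maps path -> dict of file properties."""
    errors = 0
    for path in set(dynamic) | set(static):
        if path not in dynamic or path not in static:
            errors += 1                 # missing or extraneous file
        elif dynamic[path] != static[path]:
            errors += 1                 # some property disagrees
    return errors

static = {"\\a.txt": {"hidden": False}, "\\b.txt": {"hidden": True}}
dynamic = {"\\a.txt": {"hidden": False}, "\\b.txt": {"hidden": True}}
assert count_errors(dynamic, static) == 0   # views agree: no errors
dynamic["\\b.txt"] = {"hidden": False}
assert count_errors(dynamic, static) == 1   # one mismatched property
```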
4.5.3 Evaluation of Performance
In order to gauge the performance degradation associated with disk I/O instrumen- tation using Dione, we ran two classes of benchmarks: one high in file content reads and writes, and one high in file metadata reads and writes.
Iozone Benchmark
Iozone generates and measures a variety of file operations. It varies both the file size and the record size (i.e., the amount of data read/written in a given transaction). Because it creates very large files, reading and writing to the same file for each test, this is a content-heavy benchmark with very little metadata being processed. We ran all Iozone tests on a Windows XP virtual machine with a 16 GB virtual disk and 512 MB of virtual RAM. We used the Write and Read tests (which stream accesses through the file), and Random Write and Random Read (which perform random accesses). We varied the file size from 32 MB to 4 GB, and chose two record sizes: 64 KB and 16 MB. We ran each test 50 times to average out some of the variability that is inherent with running a user-space program in a virtual machine. For each test, we ran three different instrumentation configurations. For the Baseline configuration, we ran all the tests without instrumentation (that is, with Dione
turned off). In the second configuration, called Inst, Dione is on, and performing full instrumentation of the system. There are, however, no rules in the system, so it does not log any of these accesses. This configuration measures the minimum cost of instrumentation, including live updating. The final configuration is called Inst+Log. For these tests, Dione is on and providing instrumentation; additionally, a rule is set to record every access to every file on the disk. Figure 4.3 shows the results of the tests. Each of the lines represents the performance with instrumentation, relative to the baseline configuration. For the Read Iozone tests (Figures 4.3(a) and 4.3(b)), the slowdown attributed to instrumentation is near 0 for files 512 MB and smaller. Since the virtual machine has 512 MB of RAM, Windows prefetches and keeps data in the page cache for nearly the entire test. Practically, this means that the accesses rarely go to the virtual disk. Since Dione only instruments actual I/O to the virtual disk—and not file I/O within the guest OS’s page cache—Dione is infrequently invoked. At larger file sizes, Windows needs to fetch data from the virtual disk, which Xen intercepts and communicates to Dione. At this point, the performance of instrumentation drops relative to the baseline case. In the worst case for streaming reads, Dione no-log instrumentation achieves 97% of the performance of the uninstrumented execution. For the random read tests with large file sizes, there is a larger penalty for instrumentation. Recall that Dione incurs a penalty relative to the amount of data accessed on the virtual disk. Therefore, the penalty is higher when more accesses are performed than are necessary. Windows XP utilizes intelligent read-ahead, in which the cache manager prefetches data from a file according to some perceived pattern. For random reads, the prefetched data may be evicted from the cache before it is
[Figure 4.3 here: four panels of instrumentation overhead for the Iozone tests: (a) Read Test, 64 KB Record Size; (b) Read Test, 16 MB Record Size; (c) Write Test, 64 KB Record Size; (d) Write Test, 16 MB Record Size. Each panel plots Performance Relative to Baseline (0.0 to 1.0) against File Size (32 MB to 4096 MB), with curves for Inst (Stream), Inst (Random), Inst+Log (Stream), and Inst+Log (Random).]
Figure 4.3: Performance of instrumentation, normalized to the baseline (no instrumentation) configuration for Iozone benchmarks for streaming and random read and write tests.
used, resulting in more accesses than necessary. This also explains why the penalty is not as high for the tests using the larger record size (for a given file size). Windows adjusts the amount of data to be prefetched based on the size of the access, so the ratio of prefetched data to file size is higher with a larger record size. With more prefetched data, there is a higher likelihood that the data will be used before it is evicted from the cache. Fortunately, this overhead is unlikely to be incurred in practice, as random access of a 2 GB file is rarely performed. Another observation is that the performance of Dione actually improves for streaming and random reads as file sizes get larger than 1 and 2 GB (respectively). This is explained by considering the multiple levels of memory hierarchy in a virtualized system. As the file size grows larger than the VM’s RAM, I/O must go to the virtual disk. However, the file may still be small enough to fit in the RAM of the host, as the host will naturally map files (in this case, the VM’s disk image) to its own page cache. Thus, disk reads are not performed from the physical disk until the working size of the file becomes larger than available physical RAM. Since physical disk accesses are very slow, any cost associated with Dione instrumentation is negligible compared to the cost of going to disk. The Iozone Write tests (Figures 4.3(c) and 4.3(d)) show some performance degradation at small file sizes. Windows must periodically flush writes to the virtual disk, even if the working set fits in the page cache. However, the performance impact is minimal for all file sizes, with a worst-case 10% performance degradation, though it is generally closer to 3%. Additionally, the random write tests do not show the penalty associated with random reads. Since Windows only writes dirty blocks to disk, there are fewer unnecessary accesses to disk.
It is also noticeable that speedup values are sometimes greater than 1 for the
32 MB file size write tests. This would imply that the benchmark runs faster with instrumentation than without. In reality, this effect is explained by an optimization Windows uses when writing to disk. Instead of immediately flushing writes to disk, writes are buffered and flushed as a burst to disk. With this Lazy-Writing, one eighth of the dirty pages are flushed to disk every second, meaning that a flush could be delayed up to eight seconds. From the perspective of the user—and therefore, the timer—the benchmark is reported to have completed. In reality, the writes are stored in the page cache and have yet to be flushed to disk. The long-running benchmarks will have flushed the majority of their writes to disk before the process returns. However, a short-running benchmark—such as the Iozone benchmarks operating on a 32 MB file—may still have outstanding writes to flush. The time it will take to flush these will vary randomly through the tests. We reported a 21-24% standard deviation (normalized to the mean) for the baseline, instrumentation, and logging tests. This effect is examined in more detail in the next section. For all tests, the cost of logging all accesses is relatively low, falling anywhere from 0-8%. For these tests, the root directory (under which the logs were stored) was on a separate partition from the disk image under instrumentation. Therefore, logging introduced an overhead, as the disk alternated between writing to the log file and accessing the VM’s disk image. This performance penalty can be reduced by storing the log on the same partition as the disk image. Future work can also reduce the overhead by buffering log messages in memory—performing a burst write to the log—to reduce the physical movements of the disk.
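The eight-second figure follows directly from the one-eighth-per-second flush rate. The tiny sketch below is illustrative arithmetic only, not a model of the real cache manager; the function name and parameters are hypothetical.

```python
# Back-of-the-envelope check of the Lazy-Writing delay described above.
# The 1/8-per-second flush rate comes from the text; everything else
# here is illustrative.

def worst_case_flush_delay(fraction_per_pass=1/8, pass_interval_s=1.0):
    """If each one-second pass flushes the oldest eighth of the dirty
    pages, a page dirtied just after a pass can wait up to
    pass_interval / fraction seconds before reaching the disk."""
    return pass_interval_s / fraction_per_pass

assert worst_case_flush_delay() == 8.0   # matches the ~8 s delay above
```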
[Figure 4.4 here: two panels for the Open Office (OO), Gimp, and Firefox Install/Uninstall benchmarks. Panel (a) plots the performance of Dione instrumentation relative to baseline for the Inst and Inst+Log configurations (error bar equals one standard deviation); panel (b) plots average execution time in seconds with and without Dione instrumentation.]
Figure 4.4: Evaluation of Dione instrumentation for Open Office, Gimp, and Firefox Install/Uninstall benchmarks.
Installation Benchmarks
In the second set of performance experiments, we evaluated the overhead of benchmarks that are high in metadata accesses. These tests heavily stress the live-updating part of Dione’s execution. We ran the same six install/uninstall benchmarks as in the accuracy tests; the number of creations (including delayed), deletions, and moves were listed in Table 4.3. We ran each test ten times to average out the variation inherent in running a user-space application on a virtual machine; for each run, we started from the same clean disk image snapshot. We used a Windows XP SP3 virtual machine with an 8 GB virtual disk and 512 MB of virtual RAM. We compared the baseline execution (with no instrumentation) to full instrumentation with Dione, with and without logging. Figure 4.4 graphs the execution times of the three configurations, as well as the performance of Dione instrumentation
relative to the baseline execution. As Figure 4.4 shows, even when the workload requires frequent metadata analysis for live updating, the overhead of instrumentation is low. Without logging, the full instrumentation of the benchmarks causes between a 1% and 5% performance degradation. The three benchmarks with the least penalty are OpenOffice installation and uninstallation and Gimp installation. These have between 1-2% performance degradation for instrumentation without logging, compared to 5% for Firefox Install and Gimp Uninstall (Firefox Uninstall is excluded for now, and explained in more detail below). Figure 4.4(b), which graphs the average execution times of the six benchmarks, provides more insight. These three benchmarks are the longest running of the six benchmarks, which is important because of how Windows performs writes to disk. As described in the previous section, Windows will perform a burst flush to disk, and writes could be delayed as many as eight seconds before they are flushed from the page cache. While the program is reported to have completed, there are still outstanding writes that need to be flushed to disk. This effect is especially pronounced in any program with a runtime on the same order of magnitude as the write delay. We can see this effect in Figure 4.4, which includes error bars showing the normalized standard deviation for the 10 runs of each benchmark. The three longest-running benchmarks also have the lowest standard deviations. This means that the results of these three tests are the most precise, and the average reflects the true cost of instrumentation. While two of the three shortest-running benchmarks have the highest reported cost of instrumentation, the standard deviation between tests is greater than the reported performance penalty. The execution time of the Firefox Uninstall is dwarfed by the time Windows may delay its writes—as reflected in its high standard
deviation. In practice, this means that a user is unlikely to ever notice a slowdown attributed to disk instrumentation for short bursts of disk activity. With logging enabled, these tests show between a 0% and 9% performance decrease. In these tests, the disk image resided on the same partition as the log file. Therefore, the cost of logging to a file was lower than for the content tests.
4.6 Registry Monitoring
As discussed in Section 2.3.1, Windows stores configuration data for the operating system, users, and applications in the Windows registry. While some of the registry is created on system boot and remains only in memory, much of it is backed up on disk in the form of Windows registry hive files. There are five registry hive files: system, security, software, default, and SAM. In order to have a dynamic view of the Windows registry hive files that is always up to date, we integrated registry monitoring into Dione. In addition to the file system operations already tracked by Dione, it also tracks when registry keys are created, deleted, or changed. We keep track of the registry hive files in the same way that Windows does: by mapping the files to memory. Initialization of the registry hive files can occur in one of two ways. If the file system state is obtained through a scan of the raw disk, via the scan operation, then an optional argument will carve the registry files out of the raw disk, saving them to memory. These files can also be saved to disk, so that on future system starts, they can be loaded automatically from the saved state, rather than carved from the raw disk. We detail the additional commands that are needed for registry monitoring, as well as new arguments for existing commands, in Table 4.4.
New Commands:
  save-registry: Save a given raw registry hive file from Dione to disk.
  load-registry: Load Dione with a previously-saved raw registry hive file stored on disk.
New Parameters to Existing Commands:
  declare-rule: Declare a new rule for instrumentation. New rule type includes:
    • registry: Save all registry key creations, deletions, and changes to existing keys and values.
  scan: Perform a full scan of a disk image (or mounted disk partition), creating all file records from the raw bytes and automatically applying all declared rules. Optionally carve the registry files from the raw disk, saving them in memory for Dione use.
Table 4.4: New commands, as well as new arguments to existing commands, used to communicate with the Dione daemon to perform registry monitoring.
When a disk write comes across the wire, Dione determines whether that write is to a content sector of one of the registry hive files. If so, it patches its view of the hive file in memory using the sector number, sector count, and associated raw file content. Though writes to other files may be intertwined with writes to the registry, there are never more than three files written to simultaneously; this means that we can judge a series of writes to a hive file to be complete once there have been consecutive writes to three other files. Once a series of writes to the hive file is complete, we parse the hive file using the regfi open source library [55], storing the information for each key and subkey in a list. We then compare the previous view of the hive file to this newly-parsed view, and look for any differences. We use a naive algorithm, originally described by Johnson et al. [32] in an internal document, but summarized by [27]. The differencing algorithm we used consists of three steps: First, it goes through the list item by item until two items disagree. Second, it compares the kth item
ahead in each list with the k lines following the mismatch, incrementing k in each round, until a match is found. The advantage of this approach is that, if the item to be matched occurs quickly, it will be found quickly. Third, once the match is found, the algorithm continues to the next disagreement. The algorithm is known to be naive, though it works well in practice when there are relatively few differences between the items to be compared, and relatively few duplications [27]. With registry modifications, the first case is often true (there are few modifications relative to the list of all keys), and the second is always true (there are no duplications). Unfortunately, the algorithm occasionally produced an inaccurate, noisy trace. In approximately 22 of 1,084 samples, an error in the diff algorithm resulted in an event trace listing the deletion of every key in the registry, followed by the creation of every key in the registry. This error is easily detectable; as a result, we discarded the traces for which the error occurred. Future work will implement a more robust algorithm to avoid this problem.
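The incremental-k resynchronization described above can be sketched as follows. This is a simplified illustration under the no-duplicates assumption, not the regfi-based implementation; it classifies each item at a mismatch as deleted or created once the two lists resynchronize.

```python
def naive_diff(old, new):
    """Simplified sketch of the naive differencing algorithm described
    above: walk both key lists in step; at a mismatch, widen the search
    distance k by one item per round until the lists resynchronize.
    Assumes no duplicate items (true for registry key paths)."""
    deleted, created = [], []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            i += 1
            j += 1
            continue
        k = 1
        while True:
            # Compare the k-th item ahead in each list with the items
            # following the mismatch in the other list.
            if i + k < len(old) and old[i + k] in new[j:j + k + 1]:
                m = new.index(old[i + k], j)
                deleted += old[i:i + k]     # dropped from the old list
                created += new[j:m]         # inserted in the new list
                i += k
                j = m
                break
            if j + k < len(new) and new[j + k] in old[i:i + k + 1]:
                m = old.index(new[j + k], i)
                created += new[j:j + k]
                deleted += old[i:m]
                j += k
                i = m
                break
            if i + k >= len(old) and j + k >= len(new):
                deleted += old[i:]
                created += new[j:]
                i, j = len(old), len(new)
                break
            k += 1
    deleted += old[i:]
    created += new[j:]
    return deleted, created

assert naive_diff(["a", "b", "c"], ["a", "x", "c"]) == (["b"], ["x"])
```

Because a resynchronization point is sought at steadily increasing distance, a nearby match is found quickly, which is exactly the favorable case for registry traces with few modifications.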
Chapter 5
Labeling Malware Persistence Mechanisms with Dione
In this chapter, we discuss persistence capability labeling with DCL, the Dione Capability Labeler. We generate specifications for properties, including the service install, service load, system boot, and file access, using Linear Temporal Predicate Logic (LTPL). We support our file loading model with a machine learning classifier that differentiates between two types of file access patterns. We implement an automated testbed and generate Dione traces from over one thousand real-world malware samples, evaluating the accuracy of our models in their ability to detect persistence mechanisms.
5.1 Modeling Persistence Mechanisms with LTPL
In order to demonstrate the successful use of a persistence mechanism to survive and automatically restart after a reboot, we broke each persistence capability into three phases, and DCL models each of the phases. The first phase is installation,
whereby the malware makes the necessary changes to the file system (creating new files and modifying existing files) and to the registry (adding new keys and values, and modifying the contents of existing subkeys). The second phase is system boot, whereby we model the sequence of disk operations that are indicative of a system boot. Without the reboot, we cannot test whether the persistence mechanism was successful. Finally, we model the service load, whereby the binary associated with a service, if one was installed, is automatically loaded after reboot. This stage incorporates another model, the file access, which demonstrates that the file associated with the persistence mechanism was accessed after the system booted. To prevent false negatives—with a file access going unlabeled—we keep the model sufficiently generic. In Section 5.4, we bolster the file access model with a machine learning algorithm that differentiates between different types of file accesses, to ensure we correctly label the loading of the program binary associated with the persistence mechanism. In Section 2.4, we discussed model checking and the specification language Linear Temporal Predicate Logic (LTPL), using examples from an x86 instruction trace. In this section, we model persistence capabilities using LTPL, replacing the x86 instruction predicates with seven predicates representing operations obtained from a Dione trace, plus a predicate to perform a regular expression match of two strings. The predicate vocabulary used to model the persistence capabilities from Dione events is provided in Table 5.1.¹
¹ Recall from Section 2.3.1 that, since keys and values are hierarchically organized, it is useful to think of the hierarchy as analogous to a file system. Each key or value has a path (the concatenation of all keys higher in the hierarchy) and a name, and just like a file, it may optionally hold contents (which we also refer to as the value). Consequently, we can use similar terminology between files and registry keys.
Predicates (P):
  RegCreate2(p,n): Event is creation of registry key or value with path p and name n
  RegCreate3(p,n,v): Event is creation of registry key or value with path p, name n, and value v
  MBRRead(s): Event is read of sector offset s of the Master Boot Record
  ContentRead(f,s): Event is read of sector offset s of file f
  MetaRead(f): Event is read of metadata associated with file f
  FileMove(f): Event is move of a file to destination file f
  FileCreate(f): Event is creation of file f
  RegExMatch(re,s): String s matches the regular expression provided in string re
Functions (F):
  path_join: Returns the concatenation of an absolute path with a key or file, resulting in a new path
Constants (C):
  ServicePath: The path under which all service subkeys are kept: HKLM\system\ControlSet00X\Services
  RegExSvcHostEvent: The regular expression for an event in which a registry value is created for a service run by svchost.exe (a system process that hosts multiple services): REGISTRY CREATION.*ImagePath.*WINDOWS\system32\svchost.*-k
  RegExSvcHostFile: The regular expression used as the value of the ImagePath registry value when a service is run by svchost.exe: WINDOWS\system32\svchost.*-k
Table 5.1: Function (F), Predicate (P), and Constant (C) symbols for property specifications.
We developed the property specifications from domain knowledge. That is, we observed both synthetic and real-world software samples, including hand-coded benign software, real-world benign software, and real-world malware samples. We evaluated the models’ accuracy on an entirely different set of samples from those used to develop the models.
5.1.1 System Boot
We first model the specification for a system boot, as a detected boot implies the system was shut down or restarted. A system boot is characterized by a read of Master Boot Record sector 0, followed immediately by a read of the 0th sector of the file content of the file $Boot, followed immediately by a read of the 0th sector of the file content of the file $MFT. Equation 5.1 lists the LTPL specification of a system boot.
φSB = F(MBRRead(0) ∧ X(ContentRead(“$Boot”, 0) ∧ X ContentRead(“$MFT”, 0)))   (5.1)
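Operationally, the F (eventually) and X (next) operators in Equation 5.1 amount to finding the three reads consecutively somewhere in the trace. The sketch below checks this directly over an event list; the tuple encoding of events is an assumption for illustration, and the real DCL system uses an LTPL model checker rather than this direct scan.

```python
# Hedged sketch: checking the system-boot specification of Equation 5.1
# against a Dione event trace. Events are modeled as simple tuples; the
# encoding is hypothetical, not Dione's actual trace format.

def detect_boot(trace):
    """True if somewhere in the trace (F), an MBR read of sector 0 is
    immediately followed (X) by reads of sector 0 of $Boot and $MFT."""
    pattern = [("MBRRead", 0),
               ("ContentRead", "$Boot", 0),
               ("ContentRead", "$MFT", 0)]
    return any(list(trace[i:i + 3]) == pattern
               for i in range(len(trace) - 2))

boot_trace = [("MetaRead", "foo.txt"),
              ("MBRRead", 0),
              ("ContentRead", "$Boot", 0),
              ("ContentRead", "$MFT", 0)]
assert detect_boot(boot_trace)
```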
5.1.2 Service Install
Next, we model the installation of the service. Several events must occur within the trace in order to satisfy the specification for service installation. At some point in the trace, there must be a creation of a key with name k and path equal to the constant string ServicePath. There must be a creation of three values; all three have a path that is a concatenation of the constant string ServicePath and the key name k, and with names type, start, and ImagePath, respectively. If there appears any event e in the trace that matches the regular expression of constant RegExSvcHostEvent, there must also be somewhere in the trace a creation of a registry value with name ServiceDll and a path that is the concatenation of
the ServicePath, the key name, and the string Parameters. Finally, we require that all of the previous events occur before a system boot. The LTPL specification for a service installation is given in Equation 5.2.
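The core of this check can also be sketched procedurally. The sketch below is not Equation 5.2 itself and not the real LTPL model checker: registry creations are assumed to be (path, name) tuples in trace order, ControlSet001 stands in for the ControlSet00X placeholder, and the svchost.exe/ServiceDll clause is omitted for brevity.

```python
# Hedged procedural sketch of the service-install check described above.
# The event encoding and the ControlSet001 path are illustrative
# assumptions; the svchost.exe/ServiceDll clause is omitted.
SERVICE_PATH = r"HKLM\system\ControlSet001\Services"

def find_installed_service(reg_creations, boot_index):
    """Return the name of a service key whose type, start, and ImagePath
    values were all created before the system boot, else None."""
    pre_boot = reg_creations[:boot_index]
    for path, name in pre_boot:
        if path != SERVICE_PATH:
            continue                      # not a new key under Services
        subpath = SERVICE_PATH + "\\" + name
        values = {n for p, n in pre_boot if p == subpath}
        if {"type", "start", "ImagePath"} <= values:
            return name
    return None
```

For example, a trace containing a key creation under Services followed by its three value creations, all before the boot index, would be labeled as a service install; the same events after the boot would not.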