Classification of Malware Persistence Mechanisms Using Low-Artifact Disk Instrumentation
CLASSIFICATION OF MALWARE PERSISTENCE MECHANISMS USING LOW-ARTIFACT DISK INSTRUMENTATION

A Dissertation Presented by Jennifer Mankin to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering in the field of Computer Engineering

Northeastern University
Boston, Massachusetts
September 2013

Abstract

The proliferation of malware in recent years has motivated the need for tools to analyze, classify, and understand intrusions. Current research in analyzing malware focuses either on labeling malware by its maliciousness (e.g., malicious or benign) or classifying it by the variant it belongs to. We argue that, in addition to providing coarse family labels, it is useful to label malware by the capabilities it employs. Capabilities can include keystroke logging, downloading a file from the internet, modifying the Master Boot Record, and trojanizing a system binary. Unfortunately, labeling malware by capability requires a descriptive, high-integrity trace of malware behavior, which is challenging given the complex stealth techniques that malware employs in order to evade analysis and detection.

In this thesis, we present Dione, a flexible rule-based disk I/O monitoring and analysis infrastructure. Dione interposes between a system-under-analysis and its hard disk, intercepting disk accesses and reconstructing high-level file system and registry changes as they occur. We evaluate the accuracy and performance of Dione, and show that it can achieve 100% accuracy in reconstructing file system operations, with a performance penalty of less than 2% in many cases.

Given the trustworthy behavioral traces obtained by Dione, we convert file system-level events to high-level capabilities. For this, we use model checking, a formal verification approach that compares a model extracted from a behavioral trace to a given specification.
Since we use Dione traces of file system and registry events, we aim to label persistence capabilities; that is, we label a sample by the mechanism it uses not only to persist on disk, but also to restart after a system boot. We model the Windows service, a commonly employed capability used by malware to persist, load a binary after reboot, and even load dangerous code into the kernel. We model the installation of a Windows service, the system boot, and the file access of the service binary. We test our models on over 1000 real-world malware samples, and show that they successfully identify service-installing malware samples over 99% of the time, and malware that loads that service over 98% of the time.

Moreover, we demonstrate that we are able to use traces of disk reads to differentiate between two types of file accesses. We show that we can detect not only when a persistence mechanism is installed, but also that the persistence mechanism is successful, because we detect the automatic load of the program binary after a system reboot. We correctly identify file access types from disk access patterns with less than 4% of samples mislabeled, and demonstrate that even an expert analyst would have difficulty correctly identifying the mislabeled accesses.

Acknowledgements

First and foremost, I would like to thank my husband Dana. Not only would it have been nearly impossible to complete this work without his love and support, but it most definitely would not have been this much fun! I would also like to thank my family for everything they've done for me and for supporting me throughout the years. I specifically owe my success to my parents for instilling in me a love of learning and logic, and for emphasizing to me that the most important thing is to try.

The insightful and inspiring help from both my academic and industry advisors was critical throughout this entire process, culminating with this dissertation.
I would like to acknowledge the tremendous support of my advisor at Northeastern, Dr. David Kaeli, and thank him for his many years of dedication to helping his students achieve great things. I also want to thank my technical supervisors at MIT Lincoln Laboratory, Charles Wright and Graham Baker, for developing this exciting research and guiding me throughout the process. Finally, I would like to thank my colleagues at Northeastern and MIT Lincoln Labs for their invaluable feedback and discussions.

Contents

Abstract
Acknowledgements
1 Introduction
    1.1 Motivation
    1.2 Contributions
    1.3 Organization of Dissertation
2 Background
    2.1 Malicious Software
        2.1.1 Malware Types
        2.1.2 Anti-Forensics Techniques
        2.1.3 Evasion Techniques
    2.2 Malware Analysis
        2.2.1 Static Binary Analysis
        2.2.2 Dynamic Analysis
    2.3 Windows Concepts
        2.3.1 The Windows Registry
        2.3.2 NTFS File System
        2.3.3 Performance Optimizations for Disk Accesses
    2.4 Formal Verification and Model Checking
        2.4.1 Predicate Logic
        2.4.2 Temporal Logic
        2.4.3 Linear Temporal Predicate Logic
    2.5 Summary
3 Related Work
    3.1 Malware Analysis and Instrumentation
    3.2 Characterizing Malware Behavior
        3.2.1 Characterizing Malware with Machine Learning
        3.2.2 Characterizing Malware Using Modeling
4 Dione: A Disk Instrumentation Framework
    4.1 Threat Model and Assumptions
    4.2 Dione Operation
        4.2.1 Dione Policy Commands
        4.2.2 Dione State Commands
    4.3 Live Updating
        4.3.1 Live Updating Challenges
        4.3.2 Live Updating Operation
    4.4 Disk Sensor Integration
    4.5 Experimental Results
        4.5.1 Experimental Setup
        4.5.2 Evaluation of Live Updating Accuracy
        4.5.3 Evaluation of Performance
    4.6 Registry Monitoring
5 Labeling Malware Persistence Mechanisms with Dione
    5.1 Modeling Persistence Mechanisms with LTPL
        5.1.1 System Boot
        5.1.2 Service Install
        5.1.3 File Access
        5.1.4 Persistent Service Load
    5.2 Dione Capability Labeler Implementation
    5.3 Experimental Setup
        5.3.1 Testbeds
        5.3.2 Malware Corpus
        5.3.3 Assignment of "Truth" Labels
        5.3.4 Model Checker Results
    5.4 Labeling File Access Type
        5.4.1 Motivation
        5.4.2 Program Binary Load Classifier
        5.4.3 SVM Classifier Implementation
        5.4.4 Results
6 Directions for Future Work
7 Thesis Summary and Contributions
8 Appendix
    8.1 Tables
Bibliography

Chapter 1

Introduction

The past decade has been marked by the ongoing arms race between malicious software creators and security researchers. Not only are security companies and researchers overwhelmed by the several million new unique samples discovered each month, but the sophistication of malicious software continues to increase as well [46].

Malicious software, or malware, can take many forms. While the amount of harm caused by a malware sample can vary, all malware shares the property of having not been installed with the full consent and knowledge of the user. Spyware or adware can be installed on a user's system, causing annoying pop-ups or violating privacy expectations by tracking user habits [54]. Alternatively, malware may force the system to become part of a network of hijacked machines used to send spam, hijack other systems, or perpetrate Distributed Denial of Service (DDoS) attacks on banks or targets of political protest [10]. Increasingly, malware is used for financial gain.
For example, banking threats seek to steal credentials from users or banking systems in order to perpetrate financial crimes, while fake-alert and ransomware threats trick the user into paying either for impostor security software or for the safe return of their "ransomed" data [45].

Rootkits can be particularly dangerous, as they exist to provide additional stealth measures to prevent the user or security products from detecting the presence of the rootkit and any other malware it is packaged with [10]. Rootkits can execute with administrator privilege by attacking and patching the code of the operating system. Though the number of new rootkits discovered in the wild has been decreasing since 2011, tens of thousands of new samples are still discovered every month [46]. Furthermore, there is a common adage in security that the winner between malware and a security product is whichever was loaded first. As a result, rootkits are increasingly turning to infecting the Master Boot Record (MBR); since it performs key startup operations, infection of the MBR is a devastating attack on the system [45]. Once a rootkit has breached kernel-level code, it is difficult to trust any security product or malware analyzer running on the infected system.

In the past couple of decades, research into labeling malware has focused on identifying the malware by family or variant. While having labels available for new samples is useful to provide a coarse-grained identification, we argue that labeling the behavior of the malware could be more useful than identifying the family it belongs to. Capability labeling is a promising solution to understanding how malware behaves. Instead of identifying malware by its family or strain, identifying malware by the capabilities it possesses allows security products to identify the high-level behaviors that new malware is employing.

There are several benefits to labeling or identifying capabilities present in malware or software.
A system equipped with on-the-fly capability detection could provide notifications to users when software or malware is installed with certain malicious capabilities.