Tracking Rootkit Footprints with a Practical Memory Analysis System

Weidong Cui, Microsoft Research
Marcus Peinado, Microsoft Research
Zhilei Xu, Massachusetts Institute of Technology
Ellick Chan, University of Illinois at Urbana-Champaign

Abstract

In this paper, we present MAS, a practical memory analysis system for identifying a kernel rootkit's memory footprint in an infected system. We also present two large-scale studies of applying MAS to 848 real-world Windows kernel crash dumps and 154,768 potential malware samples.

Error propagation and invalid pointers are two key challenges that stop previous pointer-based memory traversal solutions from effectively and efficiently analyzing real-world systems. MAS uses a new memory traversal algorithm to support error correction and stop error propagation. Our enhanced static analysis allows the MAS memory traversal to avoid error-prone operations and provides it with a reliable partial type assignment.

Our experiments show that MAS was able to analyze all memory snapshots quickly, with typical running times between 30 and 160 seconds per snapshot, and with near-perfect accuracy. Our kernel malware study observes that the malware samples we tested hooked 191 different function pointers in 31 different data structures. With MAS, we were able to determine quickly that 95 out of the 848 crash dumps contained kernel rootkits.

1 Introduction

Kernel rootkits represent a significant threat to computer security because, once a rootkit compromises the OS kernel, it owns the entire software stack, which allows it to evade detection and launch many kinds of attacks. For instance, the Alureon rootkit [1] was infamous for stealing passwords and credit card data, running botnets, and causing a large number of Windows systems to crash. Kernel rootkits also present a serious challenge for malware analysis because, to hide its existence, a rootkit attempts to manipulate the kernel code and data of an infected system.

An important task in detecting and analyzing kernel rootkits is to identify all the changes a rootkit makes to an infected OS kernel for hijacking code execution or hiding its activities. We call these changes a rootkit's memory footprint. We perform this task in two common scenarios: we detect whether real-world computer systems are infected by kernel rootkits, and we analyze suspicious software in a controlled environment. One can use either execution tracing or memory analysis in a controlled environment, but is usually limited to memory analysis for real-world systems. In this paper we focus on the memory analysis approach since it can be applied in both scenarios.

After many years of research on kernel rootkits, we still lack a practical memory analysis system that is accurate, robust, and performant. In other words, we expect such a practical system to correctly and quickly identify all memory changes made by a rootkit to arbitrary systems that may have a variety of kernel modules loaded. Furthermore, we lack a large-scale study of kernel rootkit behaviors, partly because there is no practical system that can analyze memory infected by kernel rootkits in an accurate, robust, and performant manner.

In this paper, we present MAS, a practical memory analysis system for identifying a rootkit's memory footprint. We also present the results of two large-scale experiments in which we use MAS to analyze 837 kernel crash dumps of real-world systems running Windows 7, and 154,768 potential malware samples from the repository of a major commercial anti-malware vendor. These are the two major contributions of this paper.

Previous work [2, 3, 19] has established that, to identify a rootkit's memory footprint, we need to check not only the integrity of kernel code and static data but also the integrity of dynamic data, and the real challenge lies in the latter task.

In order to locate dynamic data, these systems first locate static data objects in each loaded module, then recursively follow the pointers in these objects and in all newly identified data objects, until no new data object can be added. Unlike the earlier systems, KOP [3] includes generic pointers (e.g., void*) in its memory traversal, and shows that failing to do so will prevent the memory traversal from reaching about two thirds of the dynamic objects.

Previous solutions do not sufficiently address an important practical problem of this memory traversal procedure: its tendency to accumulate and propagate errors. A typical large real-world kernel memory image is bound to contain invalid pointers. That is, there are likely to be dynamic objects with pointer fields not pointing to valid objects. Following such pointers results in objects being incorrectly included in the object mapping. Worse, such identification errors can be propagated due to the nature of the recursive, greedy memory traversal. A single incorrectly identified data object may cause many more mistakes in the subsequent traversal.

Invalid pointers may exist for a variety of reasons. For example, an object may have been allocated, but not yet initialized. KOP is exposed to a second source of potential errors. KOP tries to follow all generic pointers. If the pointer type cannot be uniquely determined, KOP tries to decide the correct type using a heuristic. A fraction of these guesses are bound to be incorrect.

In light of these problems, we design MAS to control the number of errors that arise from following invalid pointers and to contain their effects. Instead of performing a greedy memory traversal that is vulnerable to error propagation, MAS uses a new traversal scheme to support error correction. MAS also uses static analysis to derive information that can be used to uniquely identify many objects and their types without having to rely on the recursive traversal procedure. Furthermore, MAS is not subject to errors caused by ambiguous pointers, i.e., pointers whose type cannot be uniquely determined. It uses an enhanced static analysis to identify unique types for a large fraction of generic pointers and ignores all remaining ambiguous pointers. While this may reduce coverage, it will never cause an object to be recognized incorrectly. Our evaluation will show that the impact on coverage is minor. Finally, before accepting an object, MAS checks a number of constraints, including new constraints we derive from our static analysis.

We implemented a prototype of MAS and compared it with KOP on eleven crash dumps of real-world systems running Windows Vista SP1. MAS's performance is one order of magnitude better than KOP regarding both static analysis and memory traversal. MAS did not miss or misidentify any function pointers found by KOP, but KOP missed or misidentified up to 40% of suspicious function pointers (i.e., function pointers that point to untrusted code).

In our large-scale experiments, we ran MAS over crash dumps taken from 837 real-world systems running Windows 7 and memory snapshots taken from Windows XP SP3 VMs subjected to one of 154,768 potential real-world malware samples. For the Windows 7 crash dumps, MAS took 105 seconds to analyze a single dump on average. It identified a total of about 400,000 suspicious function pointers. We were able to verify the correctness of all but 24 of them. Moreover, with the results of MAS, we were able to quickly identify 90 Windows 7 crash dumps (and five Windows Vista SP1 crash dumps) that were infected by kernel rootkits. In our study of malware samples, MAS required about 30 seconds to analyze each VM memory snapshot. Our study shows that the kernel rootkits we tested hooked 191 function pointer fields in 31 data structures. It also shows that many malware samples had identical footprints, which suggests that we can use MAS to detect new malware samples/families that have different memory footprints.

The rest of this paper is organized as follows. Section 2 provides an overview of the paper. Sections 3 and 4 describe the design of MAS and explain the algorithms used for static analysis and memory traversal. Section 5 explains how we evaluate the set of objects found by MAS for suspicious activity. Section 6 describes our implementation of MAS. Section 7 describes our evaluation of MAS. Section 8 and Section 9 describe two large-scale experiments in which we analyze malware samples and identify rootkits from crash dumps. Section 10 and Section 11 discuss related work and limitations. Finally, Section 12 concludes the paper.

2 Overview

The goal of MAS is to identify all memory changes a rootkit makes for hijacking execution and hiding its activities. MAS does so in three steps: static analysis, memory traversal, and integrity checking.

Static Analysis: MAS takes the source code of the OS kernel and drivers as the input and uses a pointer analysis algorithm to identify candidate types for generic pointers such as void* and linked list constructs. Furthermore, it also computes the associations between data types and pool tags [18].

Memory Traversal: MAS tries to identify dynamic data objects in a given memory snapshot. Besides the snapshot, the input includes the type-related information derived from static analysis and the symbol information [15] for each loaded module (if it is available).

Integrity Checking: MAS identifies the memory changes a rootkit makes by inspecting the integrity of code, static data, and dynamic data (recognized from memory traversal). In addition to checking if some code section is modified, MAS detects two kinds of violations: (1) a function pointer points to a memory region outside of a list of known good modules; (2) a data object is hidden from a system program.