Taint Analysis for Automatic Malware Detection and Analysis

Whole-system Fine-grained Taint Analysis for Automatic Malware Detection and Analysis Heng Yin Dawn Song [email protected] [email protected] College of William and Mary Carnegie Mellon University Abstract or bundled-ware containing spyware or adware. More surprisingly, even software provided by reputable ven- As malware is becoming increasingly sophisticated and dors could contain code that performs undesired actions stealthy, effective techniques for malware detection and such as leaking users’ private data. For example, Google analysis are imperative. Previous detection mechanisms Desktop, a popular local file system search tool, has are insufficient. Signature-based detection cannot detect been reported to send users’ private information back to new malware, and watch-point based behavioral detec- Google’s servers [15]. In another example, SONY Me- tion can be evaded by stealthier design. Most previous dia Player has been reported to send users’ listening be- analysis mechanisms are too coarse-grained to capture havior such as which songs the user has listened to back malware behavior and fail to address kernel-level attacks. to SONY [35]. Thus, as users and computers cannot We propose whole-system fine-grained taint analysis for live in isolation from the rest of the Internet and for- automatic malware detection and analysis, and build a eign code gets downloaded and installed unknowingly or prototype called TaintQemu. By tainting data from hard- knowingly to the local system all the time, the users and ware inputs and monitoring its propagation, TaintQemu computers are completely oblivious to what code is ac- generate taint graphs. The taint graph represents how in- tually installed on the local system and whether they will formation propagates during the system execution. We have malicious or undesired behavior or consequences. demonstrate that such whole-system fine-grained taint When a piece of code is installed and executed, how can analysis can capture the intrinsic properties of many dif- we detect whether it will have certain malicious or unde- ferent classes of malware and thus offer effective meth- sired behavior such as violating user’s privacy? This is ods for automatic malware detection and analysis. Our an important open research question. evaluation using a wide spectrum of real-world malware demonstrates that our system is effective in detecting Traditional methods are insufficient in addressing this many different classes of malware including keyloggers, problem. First, previous approaches on sandboxing or backdoor, etc., and offer indispensable assistance to sys- access control do not apply well in this scenario. For ex- tem administrators and analyzers for better understand- ample, system utility programs such as Google Desktop ing of the behavior and consequences of malware. also need to access user’s files and may need to communicate used to automatically detect a wide spectrum of malware, back to Google servers for updates, etc. Users’ privacy by checking the violations from the normal patterns in is only violated when users’ private information is sent taint graph. back to Google. Thus, the traditional sandboxing or access control model is too rigid for this scenario to be able to detect the undesired behavior and at the same time al- 1 Introduction low Google Desktop to perform its alleged functions. Second, previous malware detection mechanisms are Malicious software (i.e., Malware) is becoming increas- insufficient. Previous malware detection mechanisms ingly prevalent and sophisticated. They creep into users’ mostly fall into two categories: signature-based detec- computer systems in ever more creative ways: They tion and watch-point based behavioral detection. The could be mistakenly installed when a user clicks on an at- first category, signature-based detection suffers from the tachment containing a virus, or visits a malicious website drawback that it cannot detect new malware since it which installs malicious software in the background, or needs to know the signature first, and it can be defeated unknowingly installed when the user installs a free-ware by polymorphic and metamorphic malware variants. 1 The second category, watch-point based behavioral gers capture keystrokes belonging to other processes, and detection monitors specific points in the computer sys- packet sniffers monitor or intercept all incoming and out- tem and detects malicious code based on certain heuris- going packets; (2) Access the information in an eccentric tics. An example is Microsoft’s Strider Gatekeeper [38], way, to circumvent security mechanisms enforced on the which monitors auto-start extensibility points to deter- system. For example, to circumvent the firewall, a back- mine surreptitious restart-surviving behavior. A main door may access incoming packets at a lower layer of the drawback for this approach, however, is that it needs network stack. Thus, by monitoring taint tracking in a to know what detection points should be watched and fine-grained manner, we enable automatic detection and what behavior at that detection point should be moni- analysis of a wide-spectrum of malware based on these tored. Malware authors keep exploring innovative tech- insights. niques to achieve their malicious intent and also conceal In particular, we monitor system execution and en- their presence without going through the previously iden- able taint tracking at the instruction level. The result tified detection points, and thus render the existing detec- from taint tracking forms a taint graph. The taint graph tion points useless. For example, early malware detec- represents how information gets propagated. We then tion monitors userland APIs and system calls to detect show that the taint graph can be used in three ways malware behavior that go through these APIs and sys- for automatic malware detection and analysis. First, we tem calls. To evade detection, more sophisticated mal- can build a policy engine to enforce invariants/properties ware moves into the operating system kernel such as on the taint graph—taint flow violates these invari- kernel rootkit. When detection also moves to the ker- ants/properties would indicate an attack. Second, by nel to identify malicious activities, such as detecting ker- observing the taint graph over time, we can learn about nel hooks [6] and kernel data object manipulation [31], normal patterns/profiles in the taint graph. These pro- much stealthier malware design has recently been in- files can then be used for anomaly detection, thus we can vented to circumvent the detection mechanisms [33]. even detect attacks that we do not have specified policies Its demonstration, a stealthy backdoor called deepdoor for. Finally, the taint graph gives us the causal relation- achieves the backdoor functionality by only modifying a ship which allows us to conduct diagnosis and analysis of few DWORDs in NDIS data block, (which is supposed malware behavior. Given a detection point or a malicious to be modified), and eventually evades all the existing action, we can trace back to see how it happened, where detection tools. Another recent study shows the viabil- the malware came from, and how it was installed, and we ity of building malware completely out of the victim op- can also trace forward to see the subsequent actions that erating system, leveraging the technique of virtual ma- the malware has performed. chines [22]. Thus, the history of the arms race between Note that even though information flow tracking has the malware writers’ innovations and the watch-point been proposed previously for intrusion analysis [21], based behavioral detection mechanisms demonstrate that these previous approaches such as Backtracker [21] suf- we need a holistic approach to prevent the malware to fer from several drawbacks: first, they are at process evade monitoring and once-for-all put such an arms race level which is often too coarse grained for malware de- to an end. tection and analysis; second, they do not apply to ker- To address the above issues, we propose a new ap- nel attacks such as rootkit because they monitor program proach for automatic malware detection and analysis: execution through system calls and do not monitor op- whole-system fine-grained taint tracking. By monitor- erating system behavior. By doing whole-system fine ing the whole system (including the operating system), grained taint tracking, our method provides much higher malware cannot evade our detection by avoiding previ- accuracy than previous work and we can handle kernel ously identified detection points as in the watch-point attacks as well such as rootkit. based behavioral detection methods. Moreover, by an- To examine this approach, we further design and im- alyzing the common traits of malware, we identify a uni- plement a prototype called TaintQemu to analyze and de- fied approach that enables the detection and analysis of tect a wide spectrum of malware in Windows, because a wide spectrum of malware: fine-grained taint tracking. the majority of malware aims for Windows platforms. One fundamental trait of malware lies in its data access. Hence, the following discussion assumes Windows sys- While benign software usually accesses the data of its tem, in particular, Windows 2000 and above versions, own interest, malware is inclined to monitor, intercept, although the fundamental technique can be adapted to and modify the data belonging to other programs and the other operating systems. Through extensive exper- even the operating system. Such malware

Taint Analysis for Automatic Malware Detection and Analysis

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support