Generic Binary Unpacking with Orders-Of-Magnitude Performance Boost
Total Page:16
File Type:pdf, Size:1020Kb
Session 3A: Binary Analysis CCS’18, October 15-19, 2018, Toronto, ON, Canada Towards Paving the Way for Large-Scale Windows Malware Analysis: Generic Binary Unpacking with Orders-of-Magnitude Performance Boost Binlin Cheng∗‡ Jiang Ming∗† Jianming Fu†‡ Wuhan University & Hubei Normal University of Texas at Arlington Wuhan University University Arlington, TX 76019, USA Wuhan, Hubei 430072, China Wuhan, Hubei 430072, China [email protected] [email protected] [email protected] Guojun Peng‡ Ting Chen Xiaosong Zhang Wuhan University University of Electronic Science and University of Electronic Science and Wuhan, Hubei 430072, China Technology of China Technology of China [email protected] Chengdu, Sichuan 611731, China Chengdu, Sichuan 611731, China [email protected] [email protected] Jean-Yves Marion Université de Lorraine, CNRS, LORIA F-54000 Nancy, France [email protected] ABSTRACT dynamic loader, will reconstruct IAT before original code resumes Binary packing, encoding binary code prior to execution and deco- execution. During a packed malware execution, if an API is invo- ding them at run time, is the most common obfuscation adopted ked through looking up a rebuilt IAT, it indicates that the original by malware authors to camouflage malicious code. Especially, most payload has been restored. This insight motivates us to design an packers recover the original code by going through a set of “written- efficient unpacking approach, called BinUnpack. Compared to the then-executed” layers, which renders determining the end of the previous methods that suffer from multiple “written-then-executed” unpacking increasingly difficult. Many generic binary unpacking unpacking layers, BinUnpack is free from tedious memory access approaches have been proposed to extract packed binaries without monitoring, and therefore it introduces very small runtime over- the prior knowledge of packers. However, the high runtime over- head. To defeat a variety of ever-evolving evasion tricks, we design head and lack of anti-analysis resistance have severely limited their BinUnpack’s API monitor module via a novel kernel-level DLL hi- adoptions. Over the past two decades, packed malware is always a jacking technique. We have evaluated BinUnpack’s efficacy extensi- veritable challenge to anti-malware landscape. vely with more than 238K packed malware and multiple Windows This paper revisits the long-standing binary unpacking problem utilities. BinUnpack’s success rate is significantly better than that of from a new angle: packers consistently obfuscate the standard use existing tools with several orders of magnitude performance boost. of API calls. Our in-depth study on an enormous variety of Win- Our study demonstrates that BinUnpack can be applied to speeding dows malware packers at present leads to a common property: up large-scale malware analysis. malware’s Import Address Table (IAT), which acts as a lookup table for dynamically linked API calls, is typically erased by packers for CCS CONCEPTS further obfuscation; and then unpacking routine, like a custom • Security and privacy → Software reverse engineering; ∗Both authors contributed equally to the paper. †Corresponding authors: [email protected] and [email protected]. ‡(1) Key Laboratory of Aerospace Information Security and Trust Computing; KEYWORDS (2) School of Cyber Science and Engineering, Wuhan University Windows Malware Analysis, Generic Binary Unpacking, Import Address Table, Kernel-level DLL Hijacking Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation ACM Reference Format: on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, Binlin Cheng, Jiang Ming, Jianming Fu, Guojun Peng, Ting Chen, Xiaosong to post on servers or to redistribute to lists, requires prior specific permission and/or a Zhang, and Jean-Yves Marion. 2018. Towards Paving the Way for Large-Scale fee. Request permissions from [email protected]. Windows Malware Analysis: Generic Binary Unpacking with Orders-of- CCS ’18, October 15–19, 2018, Toronto, ON, Canada Magnitude Performance Boost. In CCS ’18: 2018 ACM SIGSAC Conference on © 2018 Association for Computing Machinery. Computer & Communications Security Oct. 15–19, 2018, Toronto, ON, Canada. ACM ISBN 978-1-4503-5693-0/18/10...$15.00 https://doi.org/10.1145/3243734.3243771 ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3243734.3243771 395 Session 3A: Binary Analysis CCS’18, October 15-19, 2018, Toronto, ON, Canada 1 INTRODUCTION a well-known commercial packer, even applies virtualization obfus- Malicious software (malware) has become a significant threat to cy- cation to its unpacking routine [85, 102], which will result in 630X bersecurity. Big malware attacks such as ransomware ripple across instruction size explosion when tracing unpacking progress [63]. the world and cause catastrophic damage [6, 66]. Driven by the rich Worse still, malware authors can customize new packers from exis- 2 profit, cyber-criminals are highly motivated to undermine malware ting ones, as many packers are open source (e.g., UPX and Yoda’s 3 detection/analysis by applying numerous obfuscation schemes [94]. Protector ). The lack of anti-analysis resistance renders existing Among them, binary packing is believed to be a panacea to thwart generic unpacking futile for sophisticated packers. Security com- the widely used anti-virus scanning [11, 50, 65, 70]. Binary packers panies have been overwhelmed by packed malware over the past first encode malware code through encryption or compression and two decades, which slows down the response to emerging malware attach an unpacking routine to the packed malware. In addition, threats [15, 64, 73]. An online packing service even utilizes existing packers also erase the Portable Executable (PE) file’s import address packers and anti-malware scanners as a feedback mechanism, and table (IAT) to complicate the analysis of API calls. When a packed it returns the packer that presents the optimal evasion result [69]. malware starts running, the unpacking routine first decodes the In this paper, we present a new generic unpacking idea by stu- payload binary to memory pages and reconstructs its IAT. After dying how packers obfuscate payload binary’s API call resolution. that, the execution flow will jump to the original entry point (OEP) Our approach, named BinUnpack, is motivated by two key observa- to resume malware payload execution. In this way, the actual mali- tions. The first one is, no matter how sophisticated a packer may cious code and data stays unrecognizable until run time, making it evolve, malware payload always interacts with Windows OS to immune to malware analysis techniques that measure static featu- perform malicious behavior (e.g., code remote injection and ran- res. For example, applying machine learning to large-scale packed somware’s file encryption). Malware authors achieve this mainly 4 malware will lead to the detection of packers rather than malicious by calling user-level Windows APIs rather than native APIs , since behavior [100]. An embarrassing fact is many anti-virus scanners, most API semantic information is missing at the native level. Typi- which are widely deployed at the end host, take a particular pac- cally, binary code resolves a Windows API’s address by visiting PE ker’s signature as the identification of malware [93, 104]. A study header’s IAT, which is an address lookup table when calling APIs in 2013 shows that nearly 70% of packed Windows system files are exported by a dynamic-link library (DLL). As the Windows APIs in falsely labeled as malware [67]. payload reveal rich semantics about malware and hence can provide Binary packing technique has evolved from the simple, single- security analysts with an upper hand, our second observation is a layer packer to the complicated, multi-layer packer with a variety packer usually removes the payload’s import address table (IAT) of anti-analysis tricks [97]. When a packed binary starts running, to impede reverse engineering. Afterwards, the unpacking routine the original code is written in memory pages sometime and then will obtain each API’s address and rebuild the IAT at run time. In get executed. Besides, this “written-then-executed” procedure can this way, the restored payload can invoke Windows APIs properly. iterate many times, i.e., the dynamically generated code itself may These observations inspire us to chase down a new heuristic to continue generating new code and executing it. Each iteration of determine the end of unpacking: if an API call is invoked through dynamically generated code is called a layer [97] or wave [14]. The looking up a rebuilt IAT, it indicates that the original code has been 5 previous generic unpacking tools have fully utilized this feature restored, and the control flow has reached OEP already . We call by tracking the “written-then-executed” instructions [3, 9, 36, 55, this property as “rebuilt-then-called”. The key idea of BinUnpack is 77, 81] or memory pages [28, 31, 49, 56]. However, no silver bullet to capture such a “rebuilt-then-called” feature instead of “written- can sharply determine the end of multi-layer unpacking because it then-executed” behavior. To this end, BinUnpack hooks API calls has been proven to be an undecidable problem [83] 1. The existing and finds the first one whose related IAT is rebuilt at run time. Then, approaches have to continuously monitor all possible “written- tracing back from that API, BinUnpack is able to locate OEP within then-executed” layers and detect the existence of original code a very short distance. Compared to the existing work, BinUnpack in a certain layer with several heuristics [28, 35, 37, 55].