Towards Efficient Heap Overflow Discovery
Total Page:16
File Type:pdf, Size:1020Kb
Towards Efficient Heap Overflow Discovery Xiangkun Jia, TCA/SKLCS, Institute of Software, Chinese Academy of Sciences; Chao Zhang, Institute for Network Science and Cyberspace, Tsinghua University; Purui Su, Yi Yang, Huafeng Huang, and Dengguo Feng, TCA/SKLCS, Institute of Software, Chinese Academy of Sciences https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/jia This paper is included in the Proceedings of the 26th USENIX Security Symposium August 16–18, 2017 • Vancouver, BC, Canada ISBN 978-1-931971-40-9 Open access to the Proceedings of the 26th USENIX Security Symposium is sponsored by USENIX Towards Efficient Heap Overflow Discovery Xiangkun Jia1;3, Chao Zhang2 , Purui Su1;3 , Yi Yang1, Huafeng Huang1, Dengguo Feng1 1TCA/SKLCS, Institute of Software, Chinese Academy of Sciences 2Institute for Network Science and Cyberspace 3University of Chinese Academy of Sciences Tsinghua University {jiaxiangkun, yangyi, huanghuafeng, feng}@tca.iscas.ac.cn [email protected] [email protected] Abstract to attack. As the heap layout is not deterministic, heap Heap overflow is a prevalent memory corruption vulner- overflow vulnerabilities are in general harder to exploit ability, playing an important role in recent attacks. Find- than stack corruption vulnerabilities. But attackers could ing such vulnerabilities in applications is thus critical for utilize techniques like heap spray [16] and heap feng- security. Many state-of-art solutions focus on runtime shui [43] to arrange the heap layout and reliably launch detection, requiring abundant inputs to explore program attacks, making heap overflow a realistic threat. paths in order to reach a high code coverage and luckily Several solutions are proposed to protect heap overflow trigger security violations. It is likely that the inputs being from being exploited, e.g., Diehard [4], Dieharder [34], tested could exercise vulnerable program paths, but fail Heaptherapy [52] and HeapSentry [33]. In addition to to trigger (and thus miss) vulnerabilities in these paths. runtime overheads, they also cause denial of service, be- Moreover, these solutions may also miss heap vulnerabil- cause they will terminate the process when an attack is ities due to incomplete vulnerability models. detected. So it is imperative to discover and fix heap over- In this paper, we propose a new solution HOTracer to flows in advance. discover potential heap vulnerabilities. We model heap In general, both static analysis and dynamic analysis overflows as spatial inconsistencies between heap allo- can be used to detect heap vulnerabilities. But static anal- cation and heap access operations, and perform an in- ysis solutions (e.g., [21, 36]) usually have high false pos- depth offline analysis on representative program execu- itives, and are only fit for small programs. In addition tion traces to identify heap overflows. Combining with to its intrinsic challenge (i.e., alias analysis), static anal- several optimizations, it could efficiently find heap over- ysis may generate false positives because the heap layout flows that are hard to trigger in binary programs. We im- is not deterministic [27]. But for any specific execution, plemented a prototype of HOTracer, evaluated it on 17 the spatial relationships between heap objects are deter- real world applications, and found 47 previously unknown ministic. So, it’s easier and more reliable to use dynamic heap vulnerabilities, showing its effectiveness. analysis to detect heap vulnerabilities. Online dynamic analysis is the mostly used state-of- 1 Introduction art heap vulnerability detection solutions. In general, Memory corruption vulnerabilities are the root cause of they monitor target programs’ runtime execution (e.g., many severe threats, including control flow hijacking and by tracking some metadata), and detect vulnerabilities by information leakage attacks. Among them, stack corrup- checking for security violations or program crashes. For tion vulnerabilities used to be the most popular ones. As example, AddressSanitizer [40] creates redzones around effective defenses [3, 12, 15, 19, 24, 35, 46, 47] against objects and tracks addressable bytes at run time, and de- stack corruption are deployed gradually, nowadays heap tects heap overflows when unaddressable redzone bytes overflow vulnerabilities become more popular. For ex- are accessed. Fuzzers (e.g., AFL [51]) test target pro- ample, it is reported that about 25% of exploits against grams with abundant inputs and report vulnerabilities Windows 7 utilized heap corruption vulnerabilities [28]. when crashes are found during testing. There are a lot of sensitive data stored in the heap, in- These solutions are widely adopted by industry to find cluding heap management metadata associated with heap vulnerabilities in their products. However, they all work objects (e.g., size attributes, and linked list pointers), and in a passive way and could miss vulnerabilities. To re- sensitive pointers within heap objects (e.g., pointers for port a vulnerability, they expect a testcase to exercise a virtual function calls). It makes the heap a valuable target vulnerable path and trigger a security violation. Even if a USENIX Association 26th USENIX Security Symposium 989 passive solution could generate a bunch of inputs to reach of-concept) input for the potential vulnerability. a high code coverage and could catch all security viola- In this way, our solution could discover potential vul- tions, e.g., by combining AFL and AddressSanitizer, it nerabilities that may be missed by existing online and of- could still miss vulnerabilities. For example, it may gen- fline solutions. For online solutions, e.g., AFL and Ad- erate a bunch of inputs to exercise a vulnerable path, but dressSanitizer, they rely on delicate inputs to trigger the fail to trigger the vulnerability in that path due to some vulnerabilities. Our solution could work fine as long as critical vulnerability conditions. the inputs could exercise any heap allocation and heap ac- Moreover, multiple vulnerabilities may exist in one cess operations. program path. Passive solutions (e.g., fuzzers) may only On the other hand, a combination of existing offline so- focus on the first one and miss the others. For example, lutions, e.g., DIODE and Dowser, seems to be able to when analyzing a known vulnerability CVE-2014-1761 achieve the same goal as HOTracer. However, the com- in Microsoft Word with our tool HOTracer, we found bination is incomplete. DIODE only considers heap allo- two new heap overflows, in the exact same program path cation vulnerable to integer overflows, and Dowser only which we believe many researchers have analyzed many focuses on heap accesses via loops. Moreover, to make times. It shows that, even for a vulnerable path in the spot- the combination practical and efficient in the real world, light, online solutions could not guarantee to find out all we have to solve several challenges. potential vulnerabilities in it. First, there could be numerous execution traces to an- On the other hand, offline analysis solutions could ex- alyze. Since recording and analyzing a trace is time- plore each program path thoroughly and discover poten- consuming, we could not aim for a high code cover- tial heap vulnerabilities in a more proactive way, e.g., by age. Instead, we analyze programs with representative use reasoning about the relationship between program inputs cases, and explore significantly different program paths. and candidate vulnerable code locations. For example, Second, we need to identify all heap operations from DIODE [41] focuses on memory allocation sites vulner- the huge execution traces (without source code). Even able to integer overflow which will further lead to heap worse, many programs utilize custom memory allocators overflow, and then infers and reasons about constraints and custom memory accesses. HOTracer utilizes a set of the memory allocation size to discover vulnerabilities. of features to identify potential heap operations. More- Dowser [23] and BORG [31] focus on memory accesses over, we need to group related heap allocation and heap sites that are vulnerable to heap overflow, and guide sym- access operations that operate on same heap objects into bolic execution engine to explore suspicious buffer ac- pairs. But the number of such pairs is extraordinary large. cessing instructions. However, neither of these solutions HOTracer reduces the number of pairs by promoting low- accurately model the root cause of heap overflow, and thus level heap access instructions into high-level heap access will miss many heap overflow vulnerabilities. operations, and prioritizes pairs to explore pairs that are We point out that the root cause of heap overflow vul- more likely to be vulnerable. nerabilities is not the controllability of either memory al- Finally, it is challenging to generate concrete inputs to location or memory access, but the spatial inconsistency trigger the potential vulnerability in a specific heap oper- between heap allocation and heap access operations. For ation pair, especially in large real-world applications, due example, if a program first allocates a buffer of size x+2, to the program trace size and constraint complexity. HO- and then writes x + 1 bytes into it, a heap overflow will Tracer mitigates this issue by collecting only partial traces happen if attackers make x+2 integer overflows but x+1 and concretizes inputs that do not affect the vulnerability not. It is nearly impossible to identify this heap over- conditions.