DOUBLETAKE: Fast and Precise Error Detection via Evidence-Based Dynamic Analysis
2016 IEEE/ACM 38th IEEE International Conference on Software Engineering

Tongping Liu*
Dept. of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249
[email protected]

Charlie Curtsinger*
Dept. of Computer Science, Grinnell College, 1116 8th Ave., Grinnell, IA 50112
[email protected]

Emery D. Berger
College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003
[email protected]

ABSTRACT

Programs written in unsafe languages like C and C++ often suffer from errors like buffer overflows, dangling pointers, and memory leaks. Dynamic analysis tools like Valgrind can detect these errors, but their overhead, primarily due to the cost of instrumenting every memory read and write, makes them too heavyweight for use in deployed applications and makes testing with them painfully slow. The result is that much deployed software remains susceptible to these bugs, which are notoriously difficult to track down.

This paper presents evidence-based dynamic analysis, an approach that enables these analyses while imposing minimal overhead (under 5%), making it practical for the first time to perform these analyses in deployed settings. The key insight of evidence-based dynamic analysis is that for a class of errors, it is possible to ensure that evidence that they happened at some point in the past remains for later detection. Evidence-based dynamic analysis allows execution to proceed at nearly full speed until the end of an epoch (e.g., a heavyweight system call). It then examines program state to check for evidence that an error occurred at some time during that epoch. If so, it rolls back execution and re-executes the code with instrumentation activated to pinpoint the error.

We present DOUBLETAKE, a prototype evidence-based dynamic analysis framework. DOUBLETAKE is practical and easy to deploy, requiring neither custom hardware, compiler, nor operating system support. We demonstrate DOUBLETAKE's generality and efficiency by building dynamic analyses that find buffer overflows, memory use-after-free errors, and memory leaks. Our evaluation shows that DOUBLETAKE is efficient, imposing under 5% overhead on average, making it the fastest such system to date. It is also precise: DOUBLETAKE pinpoints the location of these errors to the exact line and memory addresses where they occur, providing valuable debugging information to programmers.

Categories and Subject Descriptors

D.2.5 [Software Engineering]: Testing and Debugging–Debugging Aids, Monitors, Tracing; D.2.4 [Software Engineering]: Software/Program Verification–Reliability

Keywords

Dynamic Analysis, Software Quality, Testing, Debugging, Leak Detection, Buffer Overflow Detection, Use-After-Free Detection

* This work was initiated and partially conducted while Liu and Curtsinger were PhD students at the University of Massachusetts Amherst.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICSE '16, May 14 - 22, 2016, Austin, TX, USA
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM. ISBN 978-1-4503-3900-1/16/05. $15.00
DOI: http://dx.doi.org/10.1145/2884781.2884784

1. INTRODUCTION

Dynamic analysis tools are widely used to find bugs in applications. They are popular among programmers because of their precision (for many analyses, they report no false positives) and because they can pinpoint the exact location of errors, down to the individual line of code. Perhaps the most prominent and widely used dynamic analysis tool for C/C++ binaries is Valgrind [28]. Valgrind's most popular use case, via its default tool, MemCheck, can find a wide range of memory errors, including buffer overflows, use-after-free errors, and memory leaks.

Unfortunately, these dynamic analysis tools often impose significant performance overhead that precludes their use outside of testing scenarios. An extreme example is the widely-used tool Valgrind. Across the SPEC CPU2006 benchmark suite, Valgrind degrades performance by almost 17× on average (geometric mean); its overhead ranges from 4.5× to 42.8×, making it often too slow to use even for testing (see Table 5).

While faster dynamic analysis frameworks exist for finding particular errors (leveraging compiler support to reduce overhead), they sacrifice precision while continuing to impose substantial overhead that would impede their use in deployed settings. The current state-of-the-art, Google's AddressSanitizer, detects buffer overflows and use-after-free errors, but slows applications by around 30% [38]. AddressSanitizer also identifies memory leaks but only at the end of program execution, which is not necessarily useful for servers or other long-lived applications.

Because of their overhead, this class of dynamic analysis tools can generally only be used during testing. However, they are limited by definition to the executions that are tested prior to deployment. Even exhaustive testing regimes will inevitably fail to uncover these errors, which are notoriously difficult to debug.

This paper presents an approach called evidence-based dynamic analysis that is based on the following key insight: it is often possible to discover evidence that an error occurred or plant markers that ensure that such evidence exists. By combining evidence placement with checkpointing and infrequent checking, we can run applications at nearly full speed in the common case (no errors). If we find an error, we can use the checkpoint to roll back and re-execute the program with instrumentation activated to pinpoint the exact cause of the error.

Certain errors, including the ones we describe here, naturally exhibit a monotonicity property: when an error occurs, evidence that it happened tends to remain or even grow so that it can be discovered at a later point during execution. When this evidence is not naturally occurring or not naturally monotonic, it can be forced to exhibit this property by planting what we call tripwires to ensure later detection. A canonical example of such a tripwire is a random canary value placed in unallocated space between heap objects [11]. A corrupted canary is incontrovertible evidence that a buffer overflow occurred at some time in the past.

This paper presents a prototype evidence-based dynamic analysis framework called DOUBLETAKE that locates such errors with extremely low overhead and no false positives. DOUBLETAKE checkpoints program state and performs most of its error analyses only at epoch boundaries (what we call irrevocable system calls) or when segfaults occur; these occur relatively infrequently, amortizing DOUBLETAKE's overhead.

If DOUBLETAKE finds evidence of an error at an epoch boundary or after a segmentation violation, it re-executes the application from the most recent checkpoint. During re-execution, DOUBLETAKE enables instrumentation to let it precisely locate the source of the error. For example, for buffer overflows, DOUBLETAKE sets hardware watchpoints on the tripwire memory locations that were found to be corrupted. During re-execution, DOUBLETAKE pinpoints exactly the point where the buffer overflow occurred.

We have implemented DOUBLETAKE as a drop-in library that can be linked directly with the application, without the need to modify code or even recompile the program. DOUBLETAKE works without the need for custom hardware, compiler, or OS support.

Using DOUBLETAKE as a framework, we have built three different analyses that attack three of the most salient problems for unsafe code: a buffer overflow detector (as described above), a use-after-free detector, and a memory leak detector. These analyses can all run concurrently. By virtue of being evidence-based, they have a zero false positive rate, precisely pinpoint the error location, and operate with extremely low overhead: for example, with DOUBLETAKE, buffer overflow analysis alone operates with virtually no overhead. When all three of these analyses are enabled, DOUBLETAKE's average overhead is under 5%.

Figure 1: Overview of DOUBLETAKE in action: execution is divided into epochs at the boundary of irrevocable system calls. Each epoch begins by taking a snapshot of program state. Execution runs at nearly full-speed during epochs. Evidence-based analysis takes place once an epoch ends, replaying execution from the previous snapshot until it pinpoints the exact location where the error is introduced. Relatively-long epochs amortize the cost of snapshots and analysis, keeping overhead low.

Outline

This paper first provides an overview of the basic operation of DOUBLETAKE in Section 2. Section 3 details the dynamic analyses we have built using DOUBLETAKE. Section 4 describes key implementation details. Section 5 evaluates DOUBLETAKE's effectiveness, performance, and memory overhead, and compares these to the state of the art. Section 6 describes how users can add new analyses to DOUBLETAKE, and discusses limitations of evidence-based analysis and the detectors implemented to date. Section 7 describes key related work and Section 8 concludes.

2. OVERVIEW

DOUBLETAKE is an efficient dynamic analysis framework for a class of errors that exhibit or can be forced to exhibit a monotonicity property: evidence of the error is persistent and can be gathered after-the-fact. With DOUBLETAKE, program execution is divided into epochs, during which execution proceeds at full speed (Figure 1). At the beginning of an epoch, DOUBLETAKE checkpoints program state. Epochs end only when the application issues an irrevocable system call (e.g., a socket read); most system calls are not
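
As a rough illustration of the mechanism described in the introduction (tripwire canaries, a checkpoint at the start of an epoch, an evidence check at the end of the epoch, and replay to pinpoint the faulty write), the following Python sketch simulates the idea on a toy heap. It is a simulation under stated assumptions, not DOUBLETAKE's implementation: the real system works on native process memory and uses hardware watchpoints during re-execution. The names `ToyHeap`, `run_epoch`, and `CANARY_SIZE` are hypothetical, a `bytearray` stands in for the heap, a deep copy stands in for the checkpoint, and an epoch is modeled as a list of writes.

```python
import copy
import os

CANARY_SIZE = 8  # tripwire bytes after each object (illustrative choice)


class ToyHeap:
    """A bytearray standing in for the heap; every object is followed by a canary."""

    def __init__(self, size, canary=None):
        self.mem = bytearray(size)
        # Random canary by default; a fixed CANARY_SIZE-byte value can be
        # passed to make a demo repeatable.
        self.canary = canary if canary is not None else os.urandom(CANARY_SIZE)
        self.next_free = 0
        self.canary_offsets = []

    def malloc(self, n):
        start, end = self.next_free, self.next_free + n
        if end + CANARY_SIZE > len(self.mem):
            raise MemoryError("toy heap exhausted")
        self.mem[end:end + CANARY_SIZE] = self.canary  # plant the tripwire
        self.canary_offsets.append(end)
        self.next_free = end + CANARY_SIZE
        return start

    def write(self, offset, data):
        # Unchecked write, like a raw store in C: it may run past the object.
        self.mem[offset:offset + len(data)] = data

    def corrupted_canaries(self):
        return [off for off in self.canary_offsets
                if bytes(self.mem[off:off + CANARY_SIZE]) != self.canary]


def run_epoch(heap, ops):
    """Run one epoch of (offset, data) writes.

    Returns the index of the first write that corrupts a tripwire,
    or None if no evidence of an overflow is found.
    """
    snapshot = copy.deepcopy(heap)       # checkpoint at the start of the epoch
    for offset, data in ops:             # fast path: no per-write checks
        heap.write(offset, data)
    bad = heap.corrupted_canaries()      # evidence check at the end of the epoch
    if not bad:
        return None
    # Evidence found: roll back to the checkpoint and replay, this time
    # checking the corrupted tripwires after every write (a stand-in for
    # the hardware watchpoints described in the text).
    replay = snapshot
    for index, (offset, data) in enumerate(ops):
        replay.write(offset, data)
        if any(off in bad for off in replay.corrupted_canaries()):
            return index
    return None


heap = ToyHeap(64, canary=b"\xde\xad\xbe\xef\xca\xfe\xba\xbe")
a = heap.malloc(8)
b = heap.malloc(8)
ops = [
    (a, b"12345678"),    # in bounds
    (b, b"ABCDEFGHIJ"),  # 10 bytes into an 8-byte object: overflow
    (a, b"ok"),          # in bounds
]
culprit = run_epoch(heap, ops)
print("overflowing write index:", culprit)  # prints: overflowing write index: 1
```

The per-write check runs only during replay, which mirrors the division of labor described above: the common, error-free case pays only for the snapshot and one scan of the canaries per epoch.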