Dynamic Binary Analysis and Instrumentation
UCAM-CL-TR-606
ISSN 1476-2986

Technical Report Number 606

Computer Laboratory
University of Cambridge

Nicholas Nethercote
November 2004

15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/

© 2004 Nicholas Nethercote

This technical report is based on a dissertation submitted November 2004 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Trinity College.

Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet: http://www.cl.cam.ac.uk/TechReports/

Abstract

Dynamic binary analysis (DBA) tools such as profilers and checkers help programmers create better software. Dynamic binary instrumentation (DBI) frameworks make it easy to build new DBA tools. This dissertation advances the theory and practice of dynamic binary analysis and instrumentation, with an emphasis on the importance of the use and support of metadata.

The dissertation has three main parts. The first part describes a DBI framework called Valgrind, which provides novel features to support heavyweight DBA tools that maintain rich metadata, especially location metadata: the shadowing of every register and memory location with a metavalue. Location metadata is used in shadow computation, a kind of DBA where every normal operation is shadowed by an abstract operation.

The second part describes three powerful DBA tools. The first tool performs detailed cache profiling. The second tool does an old kind of dynamic analysis, bounds-checking, in a new way. The third tool produces dynamic data flow graphs, a novel visualisation that cuts to the essence of a program's execution. All three tools were built with Valgrind and rely on Valgrind's support for heavyweight DBA and rich metadata, and the latter two perform shadow computation.

The third part describes a novel system of semi-formal descriptions of DBA tools.
It gives many example descriptions, and also considers in detail exactly what dynamic analysis is.

The dissertation makes six main contributions. First, the descriptions show that metadata is the key component of dynamic analysis; in particular, whereas static analysis predicts approximations of a program's future, dynamic analysis remembers approximations of a program's past, and these approximations are exactly what metadata is. Second, the example tools show that rich metadata and shadow computation make for powerful and novel DBA tools that do more than the traditional tracing and profiling. Third, Valgrind and the example tools show that a DBI framework can make it easy to build heavyweight DBA tools, by providing good support for rich metadata and shadow computation. Fourth, the descriptions are a precise and concise way of characterising tools, provide a directed way of thinking about tools that can lead to better implementations, and indicate the theoretical upper limit of the power of DBA tools in general. Fifth, the three example tools are interesting in their own right, and the latter two are novel. Finally, the entire dissertation provides many details, and represents a great deal of condensed experience, about implementing DBI frameworks and DBA tools.

Contents

1 Introduction
  1.1 Background
    1.1.1 Static Analysis vs. Dynamic Analysis
    1.1.2 Source Analysis vs. Binary Analysis
    1.1.3 Four Kinds of Program Analysis
    1.1.4 Static Binary Instrumentation vs. Dynamic Binary Instrumentation
  1.2 This Dissertation
    1.2.1 Dissertation Structure
    1.2.2 Contributions
    1.2.3 A Note About Implementations

2 A Framework for Building Tools
  2.1 Introduction
    2.1.1 Dynamic Binary Instrumentation Frameworks
    2.1.2 Overview of Valgrind
    2.1.3 Chapter Structure
  2.2 Using Valgrind
  2.3 How Valgrind Works: The Core
    2.3.1 Overview
    2.3.2 Definition of a Basic Block
    2.3.3 Resource Conflicts
    2.3.4 Starting Up
    2.3.5 Making Translations
    2.3.6 Executing Translations
    2.3.7 Floating Point, MMX and SSE Instructions
    2.3.8 Segment Registers
    2.3.9 Pthreads
    2.3.10 System Calls
    2.3.11 Signals
    2.3.12 Client Requests
    2.3.13 Self-modifying Code
    2.3.14 Memory Management
    2.3.15 Ensuring Correctness
    2.3.16 Termination
    2.3.17 Self-hosting
  2.4 How Valgrind Works: Tool Plug-ins
    2.4.1 An Example Tool: Memcheck
    2.4.2 Execution Spaces
    2.4.3 Tool Structure
    2.4.4 Shadow Computation
    2.4.5 Crucial Features
  2.5 Size of Core and Tool Plug-ins
  2.6 Performance
  2.7 Related Work
    2.7.1 Not Quite Dynamic Binary Analysis
    2.7.2 Not Quite Dynamic Binary Instrumentation
    2.7.3 Dynamic Binary Instrumentation Frameworks
  2.8 Conclusion

3 A Profiling Tool
  3.1 Introduction
    3.1.1 Profiling Tools
    3.1.2 Cache Effects
    3.1.3 Cache Profiling
    3.1.4 Overview of Cachegrind
    3.1.5 Chapter Structure
  3.2 Using Cachegrind
  3.3 How Cachegrind Works
    3.3.1 Metadata
    3.3.2 Instrumentation
    3.3.3 Code Unloading
    3.3.4 Output and Source Annotation
    3.3.5 Performance
    3.3.6 Useful Features
    3.3.7 Simulation Shortcomings
    3.3.8 Usability Shortcomings
  3.4 In Practice
    3.4.1 Language and Implementation
    3.4.2 Benchmark Suite
    3.4.3 Motivating Measurements
    3.4.4 Quantifying Cachegrind's Accuracy
    3.4.5 Use of Cachegrind
    3.4.6 Avoiding Data Write Misses
  3.5 Related Work
  3.6 Conclusion

4 A Checking Tool
  4.1 Introduction
    4.1.1 Checking Tools
    4.1.2 Bounds Errors
    4.1.3 Bounds-Checking
    4.1.4 Overview of Annelid
    4.1.5 Chapter Structure
  4.2 Using Annelid
  4.3 How Annelid Works: Design
    4.3.1 Overview
    4.3.2 Metadata
    4.3.3 Checking Accesses
    4.3.4 Life-cycle of a Segment
    4.3.5 Heap Segments
    4.3.6 Static Segments
    4.3.7 Stack Segments
    4.3.8 Shadow Computation Operations
  4.4 How Annelid Works: Implementation
    4.4.1 Metadata Representation
    4.4.2 Segment Structure Management
    4.4.3 Static Segments
    4.4.4 Stack Segments
    4.4.5 Range Tests
    4.4.6 Pointer Differences
    4.4.7 System Calls . ..