Efficient, Transparent, and Comprehensive Runtime Code
Total Page:16
File Type:pdf, Size:1020Kb
Efficient, Transparent, and Comprehensive Runtime Code Manipulation by Derek L. Bruening Bachelor of Science, Computer Science and Engineering Massachusetts Institute of Technology, 1998 Master of Engineering, Electrical Engineering and Computer Science Massachusetts Institute of Technology, 1999 Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 2004 c Massachusetts Institute of Technology 2004. All rights reserved. Author Department of Electrical Engineering and Computer Science September 24, 2004 Certified by Saman Amarasinghe Associate Professor of Electrical Engineering and Computer Science Thesis Supervisor Accepted by Arthur C. Smith Chairman, Departmental Committee on Graduate Students 2 Efficient, Transparent, and Comprehensive Runtime Code Manipulation by Derek L. Bruening Submitted to the Department of Electrical Engineering and Computer Science on September 24, 2004 in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract This thesis addresses the challenges of building a software system for general-purpose runtime code manipulation. Modern applications, with dynamically-loaded modules and dynamically- generated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every instruction — which is critical for program analysis, instrumentation, trace gathering, optimization, and similar tools — it can now only be done at runtime. Existing run- time tools are successful at inserting instrumentation calls, but no general framework has been developed for fine-grained and comprehensive code observation and modification without high overheads. This thesis demonstrates the feasibility of building such a system in software. We present Dy- namoRIO, a fully-implemented runtime code manipulation system that supports code transforma- tions on any part of a program, while it executes. DynamoRIO uses code caching technology to provide efficient, transparent, and comprehensive manipulation of an unmodified application running on a stock operating system and commodity hardware. DynamoRIO executes large, com- plex, modern applications with dynamically-loaded, generated, or even modified code. Despite the formidable obstacles inherent in the IA-32 architecture, DynamoRIO provides these capabilities efficiently, with zero to thirty percent time and memory overhead on both Windows and Linux. DynamoRIO exports an interface for building custom runtime code manipulation tools of all types. It has been used by many researchers, with several hundred downloads of our public release, and is being commercialized in a product for protection against remote security exploits, one of numerous applications of runtime code manipulation. Thesis Supervisor: Saman Amarasinghe Title: Associate Professor of Electrical Engineering and Computer Science 4 Acknowledgments I thank my advisor, Professor Saman Amarasinghe, for his boundless energy, immense time com- mitment to his students, accessibility, and advice. Thanks also go to my readers, Professors Martin Rinard, Frans Kaashoek, and Arvind, for their time and insightful feedback. Several students at MIT contributed significantly to the DynamoRIO project. I would like to thank Vladimir Kiriansky for his fresh insights and competent approach. He was the primary inven- tor of program shepherding. He also came up with and implemented our open-address hashtable, and proposed our current method of performing arithmetic operations without modifying the con- dition codes. Thank-yous go also to Timothy Garnett for his contributions to our dynamic opti- mizations and to the interpreter project, as well as his help with the DynamoRIO release; to Iris Baron and Greg Sullivan, who spearheaded the interpreter project; and to Josh Jacobs, for imple- menting hardware performance counter profiling for DynamoRIO. I would like to thank the researchers at Hewlett-Packard's former Cambridge laboratory. Dy- namoRIO's precursor was the Dynamo dynamic optimization system [Bala et al. 2000], developed at Hewlett-Packard. Vas Bala and Mike Smith built an initial dynamic optimization framework on IA-32, while Giuseppe Desoli gave valuable early advice on Windows details. I would like to especially thank Evelyn Duesterwald and Josh Fisher for facilitating the source code contract be- tween MIT and Hewlett-Packard that made the DynamoRIO project possible. Evelyn gave support throughout and was instrumental in enabling us to release our work to the public as part of the Hewlett-Packard-MIT Alliance. I also extend thanks to Sandy Wilbourn and Nand Mulchandani, for understanding and aiding my return to school to finish my thesis. Last but not least, special thanks to my wife, Barb, for her innumerable suggestions, invaluable help, homemade fruit pies, and moral support. This research was supported in part by Defense Advanced Research Projects Agency awards DABT63-96-C-0036, N66001-99-2-891702, and F29601-01-2-0166, and by a grant from LCS Project Oxygen. 5 6 Contents 1 Introduction 21 1.1 Goals . 22 1.2 DynamoRIO . 24 1.3 Contributions . 26 2 Code Cache 29 2.1 Basic Blocks . 31 2.2 Linking . 34 2.3 Traces . 37 2.3.1 Trace Shape . 39 2.3.2 Trace Implementation . 43 2.3.3 Alternative Trace Designs . 50 2.4 Eliding Unconditional Control Transfers . 53 2.4.1 Alternative Super-block Designs . 57 2.5 Chapter Summary . 57 3 Transparency 59 3.1 Resource Usage Conflicts . 60 3.1.1 Library Transparency . 60 3.1.2 Heap Transparency . 62 3.1.3 Input/Output Transparency . 62 3.1.4 Synchronization Transparency . 62 3.2 Leaving The Application Unchanged When Possible . 63 3.2.1 Thread Transparency . 63 7 3.2.2 Executable Transparency . 64 3.2.3 Data Transparency . 64 3.2.4 Stack Transparency . 64 3.3 Pretending The Application Is Unchanged When It Is Not . 65 3.3.1 Cache Consistency . 65 3.3.2 Address Space Transparency . 65 3.3.3 Application Address Transparency . 66 3.3.4 Context Translation . 66 3.3.5 Error Transparency . 67 3.3.6 Timing Transparency . 69 3.3.7 Debugging Transparency . 69 3.4 Chapter Summary . 70 4 Architectural Challenges 71 4.1 Complex Instruction Set . 71 4.1.1 Adaptive Level-of-Detail Instruction Representation . 72 4.1.2 Segments . 76 4.1.3 Reachability . 79 4.2 Return Instruction Branch Prediction . 79 4.2.1 Code Cache Return Addresses . 80 4.2.2 Software Return Stack . 83 4.3 Hashtable Lookup . 87 4.3.1 Indirect Jump Branch Prediction . 87 4.3.2 Lookup Routine Optimization . 89 4.3.3 Data Cache Pressure . 92 4.4 Condition Codes . 93 4.5 Instruction Cache Consistency . 101 4.5.1 Proactive Linking . 102 4.6 Hardware Trace Cache . 103 4.7 Context Switch . 104 8 4.8 Re-targetability . 104 4.9 Chapter Summary . 105 5 Operating System Challenges 107 5.1 Target Operating Systems . 108 5.1.1 Windows . 108 5.1.2 Linux . 110 5.2 Threads . 111 5.2.1 Scratch Space . 111 5.2.2 Thread-Local State . 115 5.2.3 Synchronization . 116 5.3 Kernel-Mediated Control Transfers . 117 5.3.1 Callbacks . 118 5.3.2 Asynchronous Procedure Calls . 123 5.3.3 Exceptions . 124 5.3.4 Other Windows Transfers . 127 5.3.5 Signals . 127 5.3.6 Thread and Process Creation . 133 5.3.7 Inter-Process Communication . 135 5.3.8 Systematic Code Discovery . 135 5.4 System Calls . 136 5.5 Injection . 138 5.5.1 Windows . 139 5.5.2 Linux . 140 5.6 Chapter Summary . 140 6 Memory Management 143 6.1 Storage Requirements . 143 6.1.1 Cache Management Challenges . 144 6.1.2 Thread-Private Versus Shared . 145 6.2 Code Cache Consistency . 146 9 6.2.1 Memory Unmapping . 147 6.2.2 Memory Modification . 148 6.2.3 Self-Modifying Code . 150 6.2.4 Memory Regions . 153 6.2.5 Mapping Regions to Fragments . 154 6.2.6 Invalidating Fragments . 156 6.2.7 Consistency Violations . 157 6.2.8 Non-Precise Flushing . 158 6.2.9 Impact on Cache Capacity . ..