Practical Low-Overhead Enforcement of Memory Safety for C Programs
Santosh Ganapati Nagarakatte

A DISSERTATION in Computer and Information Science

Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

2012

Supervisor of Dissertation: Milo M. K. Martin, Associate Professor of Computer and Information Science

Graduate Group Chairperson: Val Tannen, Professor of Computer and Information Science

Dissertation Committee:
Rajeev Alur, Professor of Computer and Information Science
André DeHon, Professor of Electrical and System Engineering
Steve Zdancewic, Associate Professor of Computer and Information Science
Emery Berger, Associate Professor, University of Massachusetts Amherst

COPYRIGHT 2012 Santosh Ganapati Nagarakatte

This dissertation is dedicated to my parents. Without them, this would not have been possible.

Acknowledgments

This dissertation is a direct result of the constant support and encouragement of my parents, who at times had more confidence in my abilities than I did. Apart from my parents, there are numerous people who have been instrumental in the growth of my research career and my development as an individual.

My adviser, Milo Martin, has had a transformative influence on me as a researcher. I am fortunate to have worked with him for the last five years. Milo provided the initial insights and the motivation to work on the problem, and he nourished my ideas as they developed. He was generous with his time and wisdom. He gave me an excellent platform on which I could excel. Apart from the research under him, he gave me the freedom to collaborate with other researchers independently. I have also learned a great deal about teaching, presentations, and mentoring that will be amazingly useful in my future career.
Among the other faculty members at Penn, Steve Zdancewic was like a second adviser to me. Collaborating with Steve kept my programming language interests active and thriving. Amir Roth convinced me to join Penn. I had a great time working with Amir for the first two years. I would like to thank other faculty members (Rajeev Alur, Benjamin Pierce, Boon Thau Loo, Andreas Haeberlen, and Sudipto Guha) for their great courses, advice, encouragement, and discussions on various topics. I would like to thank my committee (Rajeev Alur, Steve Zdancewic, André DeHon, and Emery Berger) for carefully reading my dissertation and for providing great feedback, useful advice, and suggestions along the way. I would also like to thank Madanlal Musuvathi and Sebastian Burckhardt for mentoring me during my internship at MSR and for the subsequent collaboration over the last few years. I would also like to thank my adviser at IISc, R. Govindarajan, for encouraging me to pursue this path in my initial years.

Jianzhou Zhao and Andrew Hilton have been two of my closest collaborators. From Drew I learned a great deal about computer architecture and programming, and acquired a passion for teaching. Jianzhou has been my primary collaborator from my first day at Penn. We did course projects, research projects, and papers together. He inspired and amazed me with his devotion to work, his work ethic, and his constant focus on the task at hand. Other members of the safety project (Jianzhou Zhao, Christian Delozier, Peter Michael Osera, and Richard Eisenberg) have provided help and encouragement and have spent time polishing and refining ideas with me. I also thank the past and present members of the ACG group (Tingting Sha, Andrew Hilton, Colin Blundell, Vivek Rane, Arun Raghavan, Christian Delozier, Sela Mador Haim, Abhishek Udupa, James Anderson, and Laurel Emurian) for the interesting reading groups, the numerous discussions, and for attending my practice talks.
I have had many amazing friends here, who made graduate school pleasant. Numerous discussions with Arun Raghavan on a wide variety of topics (cricket, politics, books, research, and many others) were stimulating and fun. I also enjoyed spending time with other friends: Adam Aviv, Michael Greenberg, Marie Jacob, Changbin Liu, Annie Louis, Emily Pitler, Sudeepa Roy, and Wenchao Zhao. Thank you all.

I want to thank the LLVM community for building a robust and modular compiler, for answering our questions, and for the entire ecosystem. I also thank Mike Felker, who single-handedly managed numerous things for me and many others in the department. I also want to thank all other faculty members at Penn, other students at Penn, the Moore business office, and the CIS departmental staff for the various activities at Penn. I also want to thank Penn recreation for the great squash courts and my numerous squash partners: Rick Titlebaum, Richard Burgos, John Hansen, Vivek Rane, Riju Ray, Shivjit Patil, Christina Castelino, and many others. I had a great time playing squash with you all.

The dissertation research at Penn was also funded in part by grants from numerous funding agencies [1]: National Science Foundation (NSF) grants CNS-0524059, CCF-0644197, CNS-1116682, CCF-1065166, and CCF-0810947, Defense Advanced Research Projects Agency (DARPA) contract HR0011-10-9-0008, Office of Naval Research (ONR) award N000141110596, and donations from Intel Corporation.

Finally, I would like to thank my family for all their sacrifices, motivation, and encouragement. I would like to thank my parents for providing me with a good education throughout my childhood.

[1] This research was funded in part by the U.S. Government. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.
ABSTRACT

Santosh Ganapati Nagarakatte
Supervisor: Milo M. K. Martin

The serious bugs and security vulnerabilities that result from C's lack of bounds checking and unsafe manual memory management are well known, yet C remains in widespread use. Unfortunately, C's arbitrary pointer arithmetic, conflation of pointers and arrays, and programmer-visible memory layout make retrofitting C with memory safety guarantees challenging. Existing approaches suffer from incompleteness, have high runtime overhead, or require non-trivial changes to the C source code. Thus far, these deficiencies have prevented widespread adoption of such techniques.

This dissertation proposes mechanisms that provide comprehensive memory safety for mostly unmodified C code at a low performance overhead. We use a pointer-based approach in which we maintain metadata with pointers and check every pointer dereference. To enable compatibility with existing code, we maintain the metadata for pointers held in memory in a disjoint metadata space, leaving the memory layout of the program intact. For detecting spatial violations, we maintain bounds metadata with every pointer. For detecting temporal violations, we maintain a unique identifier with each pointer. This pointer metadata is propagated with pointer operations and checked on pointer dereferences. Coupling disjoint metadata with a pointer-based approach enables comprehensive detection of all memory safety violations in unmodified C programs.

This dissertation demonstrates the compatibility of this approach by hardening legacy C/C++ code with minimal source code changes. Further, this dissertation shows the effectiveness of the approach by detecting both new and previously known memory safety errors in large code bases.
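As a rough illustration of the approach summarized above, the following C sketch keeps bounds and an identifier ("key" plus a "lock" cell) for each pointer, stores that metadata in a separate table keyed by the address of the memory location holding the pointer, and checks both before a dereference. Every name here (ptr_meta, meta_store, meta_load, checked_malloc, check_deref, checked_free), the fixed-size hash table, and the choice to never reclaim lock cells are assumptions made for this illustration only. It is not the dissertation's implementation, which inserts equivalent operations automatically rather than through explicit calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Metadata kept per pointer (hypothetical layout for this sketch). */
typedef struct {
    uintptr_t base;   /* address of the first valid byte              */
    uintptr_t bound;  /* address one past the last valid byte         */
    uint64_t  key;    /* identifier assigned at allocation time       */
    uint64_t *lock;   /* cell holding the key while the object lives  */
} ptr_meta;

/* Disjoint metadata space: a small open-addressed table keyed by the
 * address of the memory location that holds the pointer. The program's
 * own data layout is left untouched. */
#define META_SLOTS 1024
static struct { void **slot; ptr_meta meta; } meta_table[META_SLOTS];

static size_t meta_find(void **slot) {
    size_t i = ((uintptr_t)slot >> 3) % META_SLOTS;
    for (size_t probes = 0; probes < META_SLOTS; probes++) {
        if (meta_table[i].slot == slot || meta_table[i].slot == NULL)
            return i;
        i = (i + 1) % META_SLOTS;
    }
    assert(!"metadata table full");
    return 0;
}

/* Record metadata for the pointer stored at *slot. */
void meta_store(void **slot, ptr_meta m) {
    size_t i = meta_find(slot);
    meta_table[i].slot = slot;
    meta_table[i].meta = m;
}

/* Fetch metadata for the pointer stored at *slot (all zeros if unknown). */
ptr_meta meta_load(void **slot) {
    return meta_table[meta_find(slot)].meta;
}

static uint64_t next_key = 1;  /* 0 is reserved to mean "deallocated" */

/* Allocate memory and produce the metadata for the returned pointer. */
void *checked_malloc(size_t size, ptr_meta *out) {
    void *p = malloc(size);
    uint64_t *lock = malloc(sizeof *lock);  /* never reclaimed in this sketch */
    if (p == NULL || lock == NULL) {
        free(p);
        free(lock);
        *out = (ptr_meta){0};
        return NULL;
    }
    *lock = next_key;
    *out = (ptr_meta){ (uintptr_t)p, (uintptr_t)p + size, next_key, lock };
    next_key++;
    return p;
}

/* Check performed before dereferencing p for an access of `width` bytes. */
bool check_deref(const void *p, size_t width, ptr_meta m) {
    uintptr_t a = (uintptr_t)p;
    bool spatial_ok = a >= m.base && a <= m.bound && width <= m.bound - a;
    bool temporal_ok = m.lock != NULL && *m.lock == m.key;
    return spatial_ok && temporal_ok;
}

/* Free the object and invalidate its identifier, so every pointer that
 * still carries the old key fails the temporal check afterwards. */
bool checked_free(void *p, ptr_meta m) {
    if (m.lock == NULL || *m.lock != m.key || (uintptr_t)p != m.base)
        return false;  /* double free or free of a non-base pointer */
    *m.lock = 0;
    free(p);
    return true;
}
```

In this sketch, pointer arithmetic and assignment would simply copy the ptr_meta value along with the pointer, and a pointer written to memory would have its metadata recorded with meta_store and recovered with meta_load when it is read back. Leaking the lock cells keeps the example short and keeps stale pointers safe to check; a real design would need a scheme for reusing them.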
To attain low performance overheads, this dissertation proposes efficient instantiations of this approach (1) within a compiler, (2) within hardware, and (3) with hybrid hardware-accelerated compiler instrumentation. These instantiations reduce the overhead of enforcing memory safety, thereby enabling their use in deployed systems.

Contents

1 Introduction
  1.1 Challenges with Existing Proposals
  1.2 Pointer-Based Checking with Disjoint Metadata
  1.3 Can Pointer-Based Checking be Performed within the Compiler?
  1.4 Can Pointer-Based Checking be Performed within the Hardware?
  1.5 Can Pointer-Based Checking be Performed with a Hardware/Compiler Hybrid?
  1.6 Contributions of this Dissertation
  1.7 Dissertation Structure
  1.8 Differences from Previously Published Versions of this Work
2 Overview of Memory Safety Enforcement
  2.1 The Problem of Memory Safety in C
    2.1.1 Spatial and Temporal Safety Violations
    2.1.2 Why are Memory Safety Violations Common with C?
    2.1.3 Consequences of Memory Safety Violations
  2.2 Enforcing Memory Safety
    2.2.1 Relationship Between Type Safety and Memory Safety
    2.2.2 Memory Safety with Safe Languages
    2.2.3 Design Alternatives in Enforcing Memory Safety for C
    2.2.4 Bug Finding vs Always-on Dynamic Checking Tools
  2.3 Detecting Spatial Safety Violations
    2.3.1 Tripwire Approaches
    2.3.2 Object-Based Approaches
    2.3.3 Pointer-Based Approaches
    2.3.4 Source Incompatibility with Fat Pointers and Arbitrary Type Casts
    2.3.5 Comparison of Spatial Safety Approaches
  2.4 Detecting Temporal Safety Violations
    2.4.1 Garbage Collection in Safe Languages
    2.4.2 Garbage Collection in Type-Unsafe Languages like C
    2.4.3 Location-Based Temporal Checking
    2.4.4 Identifier-Based Temporal Checking
    2.4.5 Analysis of the Temporal Checking Design Space
  2.5 Program Instrumentation
  2.6 Summary
3 Pointer-Based Checking with Disjoint Metadata
  3.1 Approach Overview
  3.2 Metadata with Pointers
  3.3 Spatial Memory Safety Metadata and Checking
  3.4 Temporal Memory Safety Metadata and Checking
    3.4.1 Lock and Key Metadata
    3.4.2 Temporal Safety Checks
  3.5 Control Flow Integrity Checks
  3.6 Propagation on Pointer Arithmetic and Assignment
  3.7 Optional Narrowing of Pointer Bounds
  3.8 Disjoint Metadata
    3.8.1 Mapping Pointers to their Metadata