Safe and Efficient Execution of LLVM-Based Languages
Total Page:16
File Type:pdf, Size:1020Kb
Submitted by Dipl.-Ing. Manuel Rigger, M.Phil. Submitted at Institute for System Software Supervisor and First Examiner o.Univ.-Prof. Dipl.-Ing. Dr. Dr.h.c. Safe and Efficient Hanspeter M¨ossenb¨ock Second Examiner Reader Execution of Dr. Cristian Cadar LLVM-based Languages October 2018 Doctoral Thesis to obtain the academic degree of Doktor der technischen Wissenschaften in the Doctoral Program Technische Wissenschaften JOHANNES KEPLER UNIVERSITY LINZ Altenbergerstraße 69 4040 Linz, Osterreich¨ www.jku.at DVR 0093696 ii Statutory Declaration I hereby declare that the thesis submitted is my own unaided work, that I have not used other than the sources indicated, and that all direct and indirect sources are acknowledged as references. This printed thesis is identical with the electronic version submitted. Linz, October 29, 2018 iii iv STATUTORY DECLARATION Abstract In unsafe languages like C/C++, errors such as buffer overflows cause Un- defined Behavior. Typically, compilers handle undefined behavior in some arbitrary way, for example, they disregard it when optimizing and omit in- serting checks that could detect it. Consequently, undefined behavior often results in hard-to-find bugs and security vulnerabilities, for example, when a buffer overflow corrupts memory. Existing bug-finding tools that instrument code to detect undefined be- havior often suffer from compilers that possibly optimize code so that errors are no longer detected. Alternatively, unsafe code could be rewritten in a safe language like Java, which is well defined and where such errors are de- tected. However, this would incur an infeasible-high cost for many projects. To tackle undefined behavior, we came up with an approach to exe- cute unsafe languages on the Java Virtual Machine. We implemented this approach as Safe Sulong, a system that includes an interpreter for unsafe languages, which is written in Java. By relying on Java’s well-definedness and its automatic run-time checks, the interpreter can detect buffer over- flows and other errors during its execution and can terminate the program in such cases. Safe Sulong tracks metadata such as types and object bounds, which we provide to programmers over an introspection interface, so that they can use this data to mitigate errors and to implement additional checks. The interpreter also supports unstandardized elements in C code such as the most common inline assembly and GCC builtins. To implement them, we first studied their usage in a large number of open-source projects. Sulong is used in GraalVM, a commercially-used multi-lingual virtual machine. Since Sulong allows the implementation of efficient native function interfaces, our safe execution mechanism could also make the execution of native extensions of other languages such as Ruby, Python, and R safer. v vi ABSTRACT Kurzfassung In unsicheren Sprachen wie C/C++ f¨uhrenFehler wie Puffer¨uberl¨aufezu undefiniertem Verhalten, das von Compilern in der Regel auf undefinierte Weise behandelt wird. Sie ignorieren es zum Beispiel bei Optimierungen und f¨uhren keine Uberpr¨ufungen¨ durch, die es erkennen k¨onnten. Das f¨uhrt oft zu schwer lokalisierbaren Fehlern und Sicherheitsl¨ucken, beispielsweise wenn ein Puffer¨uberlauf Daten im Speicher zerst¨ort. Vorhandene Fehlererkennungswerkzeuge, die das Programm zum Erken- nen von undefiniertem Verhalten instrumentieren, benutzen oft Compiler, die den Code so optimieren, dass Fehler nicht mehr erkannt werden. Al- ternativ k¨onnten unsichere Programme in einer sicheren Sprache wie Java neu geschrieben werden, die vollst¨andigdefiniert und sicher ist. Dies w¨urde jedoch f¨urviele Projekte zu untragbar hohen Kosten f¨uhren. Um undefiniertes Verhalten in den Griff zu bekommen, haben wir einen Ansatz entwickelt, bei dem unsichere Sprachen auf der Java Virtual Ma- chine ausgef¨uhrtwerden. Der Ansatz wurde unter dem Namen Safe Su- long implementiert, einem System, das auf einem in Java geschriebenen Interpreter f¨urunsichere Sprachen basiert. Indem der Interpreter auf die Wohldefiniertheit und die automatischen Laufzeitpr¨ufungen von Java ver- traut, kann er Puffer¨uberl¨aufeund andere Fehler zur Laufzeit erkennen und das Programm in solchen F¨allenabbrechen. Safe Sulong merkt sich Meta- daten wie Typen und Objektgrenzen, die Programmierern ¨uber eine Intro- spektionsschnittstelle zur Verf¨ugunggestellt werden, mit denen sie Fehler abfangen und zus¨atzliche Pr¨ufungenimplementieren k¨onnen. Der Inter- preter unterst¨utztauch unstandardisierte Elemente in C-Code wie g¨angige Inline-Assembly-Codest¨ucke und GCC-Builtins. Zu diesem Zweck haben wir zun¨achst die Verwendung solcher Elemente in einer großen Anzahl von Open-Source-Projekten untersucht. vii viii KURZFASSUNG Sulong wird in der GraalVM verwendet, einer kommerziellen, mehrsprachi- gen virtuellen Maschine. Sulong erlaubt die Definition von Schnittstellen zu nativem Code, wodurch unsere Sicherheitsmechanismen auch verwendet werden k¨onnten um die Ausf¨uhrungnativer Erweiterungen anderer Sprachen wie Ruby, Python und R sicherer zu machen. Acknowledgments This thesis would not have been possible without the help of many people. First and foremost, I want to thank my advisor Hanspeter M¨ossenb¨ock, for being an excellent mentor, for his constant feedback and support, for allowing me to freely follow my research interests, and for maintaining a successful and mutually-beneficial collaboration with Oracle Labs. I am thankful to Cristian Cadar for the time and effort he put into evaluating this work and providing feedback. I also want to thank the other members of the dissertation jury for their time, namely Armin Biere and Paul Gr¨unbacher. I sincerely thank Stefan Marr for teaching me much about academia and for his mentoring, which helped me to develop myself during the second half of my PhD. I want to thank Matthias Grimmer who mentored me and taught me the essential academic skills during thefirst year of my PhD. I am thankful to Ren´eMayrhofer for his feedback from the viewpoint of a security researcher. I am grateful to Stephen Kell for his insightful feedback on my work, and Bram Adams for his help with improving the study on the use of GCC builtins. I want to thank Ingrid Abfalter for correcting my papers and improving my writing. I would like to express my gratitude to Thomas W¨urthinger for funding my research, and his trust for letting me develop an important puzzle piece of the Graal project. I am thankful to all of the Oracle Labs employees who worked together with me, shared their knowledge, helped me debug issues, gave feedback on my work, and developed the Sulong research prototype into a product. I especially want to thank Roland Schatz, Matthias Grimmer, Lukas Stadler, Chris Seaton, and Christian Wimmer. I want to thank my fellow PhD students, Josef Eisl, David Leopoldseder, and Benoit Daloze for their feedback and help. I want to thank the students that worked on Sulong-related topics, all of which were excellent: Jacob ix x ACKNOWLEDGMENTS Kreindl, Daniel Pekarek, Thomas Pointhuber, and Raphael Mosaner. I am grateful for my family and friends. I want to thank Jin Ting for her love, and for her patience even when I worked on evenings and weekends. I want to thank my parents, sister, and grandparents for all their support. Contents Statutory Declaration iii Abstract v Kurzfassung vii Acknowledgments ix Contents x I Introduction and Overview 1 1 Introduction 3 1.1 Problem . ............. 3 1.2 State of the Art . .......... 4 1.3 Remaining Challenges . 8 1.4 Suggested Approach . 9 1.5 Contributions . 10 1.6 Limitations . 14 1.7 Project Context . 15 1.8 Research Context . 18 1.9 Outline . 20 2 Overview 21 2.1 Sulong . 21 2.2 Introspection . 29 2.3 Empirical Studies . 30 xi xii CONTENTS 3 Publications 35 3.1 Journal Papers . 35 3.2 Conference Papers . 35 3.3 Full Workshop Papers . 37 3.4 Other Publications . 37 3.5 Under Review . 38 Bibliography 38 II Publications 61 4 Native Sulong 63 5 Safe Sulong 75 6 Lenient C 91 7 Introspection 105 8 Context-aware Failure-oblivious Computing 137 9 A Study on the Use of Inline Assembly 153 10 A Study on the Use of GCC Builtins 171 III Future Work and Conclusion 185 11 Future Work 187 11.1 Safe Sulong . 187 11.2 Introspection . 191 11.3 Empirical Studies . 192 12 Conclusion 195 Part I Introduction and Overview 1 Chapter 1 Introduction 1.1 Problem In unsafe languages like C, the semantics of operations are only specified for valid input. For example, while dereferencing a valid pointer is well specified, the effect of dereferencing an out-of-bounds pointer, which is known as a buffer overflow, is not specified by the C standard.1 Such an error is said to cause Undefined Behavior. Compilers are not required to produce code that detects undefined behavior while the program executes, so they do not, for example, insert bounds checks to detect buffer overflows. In fact, state-of- the-art compilers such as Clang or GCC even optimize the program based on the assumption that undefined behavior never occurs [139]. Executing programs with undefined behavior that have been compiled by Clang or GCC can yield various unintended or even disastrous results. For example, out-of-bounds reads can result in sensitive data of adjacent objects being read, even when those objects were not explicitly referenced [125]. Ex- amples for disastrous out-of-bounds reads were Heartbleed in the OpenSSL SSL library [35] and Cloudbleed in the online service Cloudfare [46], which both allowed attackers to leak sensitive information on web servers, and af- fected millions of users. Attackers might not only exploit buffer overflows to read private data; out-of-bounds writes can be exploited to overwrite control- flow data, allowing attackers to divert the controlflow of the program and even to take control over the process [114, 18, 9, 132]. Furthermore, a pro- 1If not further specified, we refer to the ISO/IEC 9899:2018 standard, which is infor- mally known as C17 or C18.