University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange

Doctoral Dissertations Graduate School

5-2012

A Framework for File Format Fuzzing with Genetic Algorithms

Roger Lee Seagle Jr. University of Tennessee - Knoxville, [email protected]


Recommended Citation
Seagle, Roger Lee Jr., "A Framework for File Format Fuzzing with Genetic Algorithms." PhD diss., University of Tennessee, 2012. https://trace.tennessee.edu/utk_graddiss/1347

This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected].

To the Graduate Council:

I am submitting herewith a dissertation written by Roger Lee Seagle Jr. entitled "A Framework for File Format Fuzzing with Genetic Algorithms." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, with a major in Computer Science.

Michael D. Vose, Major Professor

We have read this dissertation and recommend its acceptance:

Itamar Arel, Bradley Vander Zanden, Sergey Gavrilets

Accepted for the Council:

Carolyn R. Hodges

Vice Provost and Dean of the Graduate School

(Original signatures are on file with official student records.)

A Framework for File Format Fuzzing with Genetic Algorithms

A Thesis Presented for The Doctor of Philosophy Degree
The University of Tennessee, Knoxville

Roger Lee Seagle, Jr.
May 2012

© by Roger Lee Seagle, Jr., 2012
All Rights Reserved.

Dedication

To my wife, Mindy and my son, Bodhi, with your constant love and inspiration, you put my dreams within reach.

Acknowledgements

Over the course of my academic career, I have had the distinct pleasure of being enlightened by a prestigious group of academics. From this group, several individuals have fostered an everlasting love for computer science and made a significant impact upon my career. For their infinite erudition, I owe a debt of gratitude.

First and foremost, I am grateful for my academic advisor, Dr. Michael D. Vose. His guidance, numerous reviews, and wealth of knowledge brought my vision to fruition. He demonstrated a mastery of knowledge and ingenuity to which I can only aspire. I am also grateful for my committee, Dr. Itamar Arel, Dr. Tom Dunigan, Dr. Sergey Gavrilets, and Dr. Bradley Vander Zanden, for their advice and feedback which greatly refined my research.

While my professors from my graduate institution have been a significant influence, I cannot omit professors from my undergraduate institution, Wake Forest University. They are responsible for planting the seed that culminated in this dissertation. I am thankful for Dr. David John for introducing me to genetic algorithms and Dr. Errin Fulp for igniting a passion for computer security.

However, none of this would have been possible without extensive support and guidance from my family. I am thankful for my father, Dr. Roger Seagle Sr., for instilling a life-long pursuit of learning and exemplifying the perseverance and determination needed to achieve my dreams. I am grateful for my mother, Janie Halley, whose nurturing spirit laid a foundation upon which my career has been built. To my wife's parents, Jackie and Lisa Shelton, I owe a special thanks for their enduring support throughout this process. I thank my step-parents for their love and compassion, my grandparents who continue to be role models and challenge me to be a better person, and my siblings for embodying the true spirit of life.

Last but not least, I am grateful for my friends and colleagues, Mark Eklund, Kenny Gilbert, Jon Gill, Jay Koehler, Steve Rich, and Mark Shirley, who have consistently introduced me to bleeding edge technologies, inspired me with their fervor, and openly shared their creative forces. It is their provocation that makes me aspire to be a better computer scientist.

Abstract

Secure software, meaning software free from vulnerabilities, is desirable in today's marketplace. Consumers are beginning to value a product's security posture as well as its functionality. Software development companies are recognizing this trend, and they are factoring security into their entire software development lifecycle. Secure development practices like threat modeling, static analysis, safe programming libraries, run-time protections, and software verification are being mandated during product development. Mandating these practices improves a product's security posture before customer delivery, and these practices increase the difficulty of discovering and exploiting vulnerabilities.

Since the 1980s, security researchers have uncovered software defects by fuzz testing an application. In fuzz testing's infancy, randomly generated data could discover multiple defects quickly. However, as software matures and software development companies integrate secure development practices into their development life cycles, fuzzers must apply more sophisticated techniques in order to retain their ability to uncover defects. Fuzz testing must evolve, and fuzz testing practitioners must devise new algorithms to exercise an application in unexpected ways.

This dissertation's objective is to create a proof-of-concept genetic algorithm fuzz testing framework to exercise an application's file format parsing routines. The framework includes multiple genetic algorithm variations, provides a configuration scheme, and correlates data gathered from static and dynamic analysis to guide negative test case evolution. Experiments conducted for this dissertation illustrate the effectiveness of a genetic algorithm fuzzer in comparison to standard fuzz testing tools. The experiments showcase a genetic algorithm fuzzer's ability to discover multiple unique defects within a limited number of negative test cases. These experiments also highlight an application's increased execution time when fuzzing with a genetic algorithm. To combat increased execution time, a distributed architecture is implemented and additional experiments demonstrate a decrease in execution time comparable to standard fuzz testing tools. A final set of experiments provides guidance on fitness function selection for a CHC genetic algorithm fuzzer with different population size configurations.

Contents

List of Figures

1 Introduction
  1.1 Problem Statement
  1.2 Approach
  1.3 Contributions
  1.4 Scope

2 Literature Review

3 Reverse Engineering
  3.1 Mach Object (Mach-O) Format Analysis
  3.2 Mach-O Disassembly
  3.3 Attribute Extraction

4 Mamba Fuzzing Framework
  4.1 Configuration
  4.2 Attack Heuristics
  4.3 Executor
  4.4 Monitor
  4.5 Logger
  4.6 Reporter

5 Genetic Algorithms
  5.1 Foundations
  5.2 CHC-GA
  5.3 Configuration
  5.4 Fitness Function Variables
  5.5 Performance

6 Distributed File Fuzzing
  6.1 Architecture
  6.2 Results

7 Fitness Function Case Study
  7.1 Results

8 Conclusions and Future Work
  8.1 Conclusions
  8.2 Future Work

Bibliography

Appendix A: Abbreviations

Appendix B: Code Examples

Appendix C: Command Reference

Appendix D: Graphs

Vita

List of Figures

2.1 Fuzzer Classifications

3.1 C Code TLV Parser Example
3.2 Assembly Code TLV Parser Example
3.3 YAML Export TLV Parser Example

4.1 Mamba Global Configuration File
4.2 Mangle Fuzzer Configuration File
4.3 VEX IR Example
4.4 Vulnerable Library Trace
4.5 Rufus Fault Detection

5.1 Illustration of Selection, Crossover, and Mutation
5.2 Mamba Genetic Algorithm Configuration
5.3 Fitness Function Variables
5.4 Function Control Flow Graph
5.5 Transition Probability Matrices

6.1 Distributed Fuzzer Architecture
6.2 Mamba.yml Distributed Global Configuration File

B.1 Mangle Wrapper Script
B.2 notSPIKEfile Data Model
B.3 Peach XML Configuration

C.1 Mamba Fuzzing Framework Command Listing
C.2 Mamba tools:otool Command Reference
C.3 Mamba tools:disassemble Command Reference
C.4 Mamba create Command Reference
C.5 Mamba tools:seed Command Reference
C.6 Mamba fuzz:package Command Reference
C.7 Mamba fuzz:unpackage Command Reference
C.8 Mamba distrib:qstart Command Reference
C.9 Mamba distrib:qstop Command Reference
C.10 Mamba distrib:qreset Command Reference
C.11 Mamba distrib:qstatus Command Reference
C.12 Mamba distrib:dstart Command Reference
C.13 Mamba distrib:dstop Command Reference
C.14 Mamba distrib:start Command Reference
C.15 Mamba distrib:stop Command Reference

D.1 Mangle Fuzzer Performance
D.2 Peach Fuzzer Performance
D.3 Simple Genetic Algorithm Fuzzer Performance
D.4 Simple Genetic Algorithm Fuzzer with Mangle Mutator Performance
D.5 Simple Genetic Algorithm Fuzzer with Mangled Initial Population Performance
D.6 Byte Genetic Algorithm Fuzzer Performance
D.7 CHC Genetic Algorithm Fuzzer Performance
D.8 Defects Found per Fuzzing Algorithm
D.9 Performance Comparison with Distributed Environment
D.10 Distributed CHC GA Unique Defects Found per Fitness Function (250)
D.11 Distributed CHC GA Unique Defects Found per Fitness Function (100)

Chapter 1

Introduction

Software testing is an integral component of a software development life cycle. Many different software development processes, from older constructs such as the Spiral Model [Boehm 1986] to modern practices like Agile Development [Beck 1999], mandate thorough testing. Oftentimes, these models differentiate themselves by prescribing a unique testing methodology or testing interval. While these variances can help classify a development process, the goal behind prescribing testing remains consistent. This goal is to improve software quality and resiliency through structured, rigorous verification and validation.

Verification and validation form a basis for software quality control and testing theory. A tester assessing software in these terms strives to assure a developer that software is built without design or implementation errors (verification) and a customer that software fulfills its intended purpose (validation). More formally, verification occurs throughout a development process and determines an implementation's correctness. Tasks associated with verification assist a tester in reducing occurrences of defects. Validation determines if the software was developed according to customer requirements and specification. Requirements are desired features and functions, and a specification defines the software's behavior in formalized documentation based upon the requirements. Validation assures a customer that software developers implemented software to meet the customer's requirements. Together, verification and validation improve software quality and resiliency.

A vast array of concepts, tools, and techniques comprise software testing and facilitate verifying and validating software. These range from evaluating software's ability to consistently operate in different environments, known as compatibility testing, to assuring correct operation of specified sections of code, referred to as unit testing. Each concept, tool, or technique can have positive and negative characteristics and may even focus on eliminating a particular flaw. For instance,

stress testing evaluates the ability of a program to operate under heavy load. It measures resiliency under undue stress, but, contrary to unit testing, it fails to verify the correctness of implemented functionality.

Fuzz testing, more commonly referred to as fuzzing, is a popular software testing technique supporting software verification and, to a lesser degree, validation. In most instances, fuzzing discovers implementation flaws, yet, in other cases, it can discover undesirable functionality not defined by customer requirements. This broad range of defect discovery highlights the difficulty of arranging testing techniques into discrete categories. Nevertheless, whether fuzz testing, stress testing, or unit testing, an understanding of fundamental software testing terminology is desirable.

One of the more important distinctions in terminology is the dichotomy between structural and functional testing. Structural testing is the practice of systematically reviewing the requirements and specification of a software program to determine if any mistakes or omissions are present. It analyzes requirements along with the specification to clarify vagueness that may lead to undesirable functionality or costly errors. An example of structural testing is the Sequence-Based System Specification. This technique improves software quality by clarifying requirements through a systematic enumeration of possible inputs and outputs to a software system [Prowell and Poore 2003]. This thorough enumeration prevents developers from misinterpreting a requirement and uncovers differences between the requirements, specification, and actual implementation. The usage of the Sequence-Based Specification combined with statistical testing reduces the cost of software testing and improves software quality [Prowell 1998].

Functional testing abandons in-depth analysis of requirements and specifications to focus on behavioral analysis of an implementation. Crafted test cases exercise portions of software to gauge whether it behaves as expected, behaves erroneously, throws an exception, or crashes. The composition and development of a test case directly impacts the ability of functional testing to find defects. Fuzz testing is generally classified as a version of functional testing even though it can uncover gaps between the requirements and implementation [Takanen et al. 2008].

Another important distinction in software testing terminology, applicable to the classification of fuzz testing, is the difference between white, gray, and black-box testing. The delineation between these three terms depends upon the amount of information available about a program under test. For instance, if a program's source code is available to the individual testing the software, then the testing is referred to as white-box testing. With black-box testing, test cases are crafted without information about the application's architecture or implementation. The only information gathered from the program under test is the response to a stimulus, hence, the term black-box. Many forms of functional testing are deemed black-box testing techniques. In its purest form, fuzzing is a black-box testing technique.

Gray-box testing mixes aspects of black-box and white-box testing. In this testing form, partial information about a program is available to guide test case generation. An example of partial information is a referral to a program's disassembly for assigning code coverage metrics to a test case. The disassembly provides the tester with information about code structure, flow, and behavior, but it obfuscates higher-level details. Black box testing may consult other information such as a program's Application Programming Interface (API) or architecture documentation. However, the software tester does not have a complete listing of the program's source code.

The overarching goal of this research is to improve gray-box functional testing by extending current fuzz testing techniques. Much research has been conducted in this area thereby improving software quality. While previous research has made great strides in fuzz testing, the algorithms developed can be built upon and improved. In order to understand where improvements are possible, the next section presents the current problem with fuzz testing algorithms.

1.1 Problem Statement

Software testing, as shown by the previous section and documented further in [Beizer 1990], is an important, complex field of Computer Science. It encompasses a wide array of concepts and tools to help improve software quality. Whether through rigorous verification by software testers or a comprehensive validation process whereby requirements are traced and matched to implemented features, software requires intense scrutiny to improve its robustness, resiliency, and stability. Many different testing techniques exist to assess and improve software based on these character- istics. One well-known method of testing is to enumerate a program’s input vectors and provide each vector with a set of negative test cases. An input vector is a method a program uses to collect data, such as file IO or network connections, and a negative test case refers to input designed to test a program’s ability to handle erroneous data. Negative test cases contain common data type boundary condition values like the maximum value of an integer, a nefariously long string, invalid character sequences, or even a randomly generated byte string. By enumerating a program’s input vectors with negative test cases, a software tester can assess whether a program can continue operating as expected when presented with erroneous data. If a program processes these values without crashing or exhibiting other undesirable behavior, then a program correctly handles a negative test case and passes the test. Otherwise, a negative test case pinpoints a software defect, and the program fails the test. The generation and delivery of negative test cases along with monitoring, logging, and assessing a program’s ability to process a negative test case is commonly referred to as fuzz testing. This type of testing uses a tool, known as a fuzzer or fuzzing framework, to automate negative test case

creation, delivery, program monitoring, and logging. An assessment of program behavior can also be included in the fuzzer's functionality, yet this assessment can also be handled by accompanying tools.

A fuzzer's main challenge is to efficiently generate negative test cases from the possible input space which are likely to elicit program errors. An input space is all the possible byte sequences capable of being processed by a program. Since this collection of permutations is typically quite large, it is infeasible for a fuzzer to enumerate an entire input space in a reasonable time period. To combat this challenge, over the past ten years, researchers have started to apply search algorithms for finding and generating good negative test cases. Within the past five years, Sparks et al. [2007] and DeMott et al. [2007] introduced genetic algorithms as a viable solution to this problem.

The combination of a genetic algorithm to explore an input space for negative test cases paired with a fuzzer to test a program appears promising. However, the genetic algorithms combined with fuzzers eliminated the ability to define custom fitness functions which, in turn, can constrict the evolutionary process. These genetic algorithms currently focus upon evolving negative test cases to locate and exercise unsafe library functions (strcpy(), strncpy(), strcat(), etc.) or improve code coverage by rewarding a negative test case that exercises less frequently executed code. This focus can constrain evolution and may impede a fuzzer's ability to adequately locate other software defects. The effectiveness and flexibility of genetic algorithm fuzzing can be improved by broadening the current focus to incorporate additional information from static and dynamic analysis. This research develops a gray-box proof-of-concept genetic algorithm fuzzing framework aspiring to reveal software defects such as integer overflows, heap overflows, stack overflows, double frees, signed comparisons, and off-by-one errors.
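To make the notion of an attack heuristic set concrete, the C sketch below substitutes a handful of integer boundary values, one at a time, into a copy of a valid baseline file and writes each result out as a negative test case. The file name, the fixed offset, and the particular values are illustrative assumptions only; they are not taken from the framework developed in this dissertation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* A small, illustrative attack heuristic set: integer boundary values that
 * historically expose length handling and signedness defects. */
static const uint32_t heuristics[] = {
    0x00000000, 0x0000007F, 0x00000080, 0x000000FF,
    0x00007FFF, 0x00008000, 0x0000FFFF,
    0x7FFFFFFF, 0x80000000, 0xFFFFFFFF
};

int main(void) {
    /* Read a valid baseline file into memory (the path is a placeholder). */
    FILE *in = fopen("baseline.tiff", "rb");
    if (in == NULL) { perror("baseline.tiff"); return 1; }
    fseek(in, 0, SEEK_END);
    long size = ftell(in);
    rewind(in);
    unsigned char *base = malloc(size);
    if (base == NULL || fread(base, 1, size, in) != (size_t)size) return 1;
    fclose(in);

    /* Substitute each heuristic into an arbitrary 4-byte field (offset 16)
     * and write one negative test case per value. */
    for (size_t i = 0; i < sizeof(heuristics) / sizeof(heuristics[0]); i++) {
        unsigned char *copy = malloc(size);
        if (copy == NULL) break;
        memcpy(copy, base, size);
        if (size >= 20)
            memcpy(copy + 16, &heuristics[i], sizeof(uint32_t));

        char name[64];
        snprintf(name, sizeof(name), "testcase_%zu.tiff", i);
        FILE *out = fopen(name, "wb");
        if (out != NULL) {
            fwrite(copy, 1, size, out);
            fclose(out);
        }
        free(copy);
    }
    free(base);
    return 0;
}
```

Each generated file would then be opened by the program under test while a monitor watches for crashes or other abnormal exits.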

1.2 Approach

The proof-of-concept fuzzing framework developed through this research introduces a novel approach to negative test case generation by combining concepts from reverse engineering and search algorithms. These concepts are:

1. Static Analysis - A technique for reviewing program behavior without the benefit of code execution. The analysis can be performed either directly on the program’s source code, or, in the absence of source code, a disassembler can translate a program’s machine code to assembly language before analysis. The analysis can then proceed manually, by predefined algorithms, or through scriptable interfaces. Examples of static analysis tools are Coverity Prevent [Coverity 2010], Klocwork Insight [Klocwork 2010], and Fortify 360 [Fortify Software 2010]. This research

assumes a software tester is evaluating a closed source application. All static analysis occurs on assembly code generated by a disassembler.

2. Dynamic Analysis - A technique for analyzing program behavior by monitoring, recording, or altering program execution. One strategy of dynamic analysis is to emulate a program to determine its behavior. Program emulation isolates a program to a controllable sandbox where instrumentation can then be applied. Tools categorized as dynamic analysis emulation tools include DynamoRIO [Bruening 2004], Pin [keung Luk et al. 2005], and Valgrind [Nethercote 2004].

3. Genetic Algorithms - An evolutionary algorithm mimicking Darwinian evolution to explore search spaces of optimization problems. These algorithms evolve candidate solutions to a problem by applying genetic operators, such as selection, crossover, and mutation, to a population of previous candidate solutions. Several other flavors of evolutionary algorithms exist and include evolutionary programming [Fogel et al. 1966], evolutionary strategies [Rechenberg 1971], and genetic programming [Koza 1990]. A minimal sketch of these genetic operators follows this list.
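As a minimal illustration of the genetic operators named above, and nothing more, the C sketch below evolves fixed-length byte strings using tournament selection, one-point crossover, and per-byte mutation. The fitness function here simply counts 0xFF bytes; it is a stand-in chosen for brevity and is unrelated to the fitness measures developed later in this dissertation, where a test case's fitness is derived from observed program behavior.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define POP      20   /* population size        */
#define LEN      32   /* bytes per individual   */
#define GENS     50   /* number of generations  */
#define MUT_RATE 0.02 /* per-byte mutation rate */

/* Stand-in fitness: count of 0xFF bytes.  A fuzzer would instead score a
 * test case by monitoring the program under test. */
static int fitness(const unsigned char *ind) {
    int score = 0;
    for (int i = 0; i < LEN; i++)
        if (ind[i] == 0xFF) score++;
    return score;
}

/* Tournament selection: return the fitter of two randomly chosen individuals. */
static const unsigned char *select_parent(unsigned char pop[POP][LEN]) {
    const unsigned char *a = pop[rand() % POP];
    const unsigned char *b = pop[rand() % POP];
    return fitness(a) >= fitness(b) ? a : b;
}

int main(void) {
    srand(1); /* fixed seed for reproducibility */
    unsigned char pop[POP][LEN], next[POP][LEN];

    /* Random initial population. */
    for (int i = 0; i < POP; i++)
        for (int j = 0; j < LEN; j++)
            pop[i][j] = (unsigned char)(rand() & 0xFF);

    for (int g = 0; g < GENS; g++) {
        for (int i = 0; i < POP; i++) {
            const unsigned char *p1 = select_parent(pop);
            const unsigned char *p2 = select_parent(pop);

            /* One-point crossover. */
            int cut = rand() % LEN;
            memcpy(next[i], p1, cut);
            memcpy(next[i] + cut, p2 + cut, LEN - cut);

            /* Mutation: occasionally replace a byte with a random value. */
            for (int j = 0; j < LEN; j++)
                if ((double)rand() / RAND_MAX < MUT_RATE)
                    next[i][j] = (unsigned char)(rand() & 0xFF);
        }
        memcpy(pop, next, sizeof(pop));
    }

    int best = 0;
    for (int i = 1; i < POP; i++)
        if (fitness(pop[i]) > fitness(pop[best])) best = i;
    printf("best fitness after %d generations: %d\n", GENS, fitness(pop[best]));
    return 0;
}
```

In a fuzzing setting, each individual would be written to disk as a negative test case and its fitness computed from monitoring data rather than from the bytes themselves.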

By combining these concepts, several of the fuzzers described in Chapter 2 are unified. This unification extends the canon of fuzz testing methodology by increasing the number of concepts incorporated into fuzz testing. This research specifically improves upon genetic algorithm fuzzing by addressing limitations with fitness function metrics. A fitness function and selection algorithm heavily impact a genetic algorithm’s ability to find an optimal solution to a search or optimization problem. Poor fitness functions may incompletely characterize a search space and assign incorrect weights to population members. These weights may then cause a selection algorithm to favor suboptimal solutions leading to a divergence from an optimal solution or a convergence to an inadequate solution. Fitness functions and selection algorithms must be evaluated to guard against these undesirable situations [Sivanandam and Deepa 2007a]. This research extends the metrics available for specifying fitness functions when fuzzing. The extension allows software testers to tailor fitness functions by composing mathematical equations based on data collected from static and dynamic analysis. Tailoring a fitness function on a case by case basis improves the flexibility of genetic algorithm fuzzers. Whereas a fitness function rewarding code coverage could equate to more resilient code for some programs, it might for other programs fail to efficiently evolve a useful collection of negative test cases. By extending available metrics and exposing a set of variables to compose a mathematical equation for a fitness function, this research significantly advances genetic algorithm fuzz testing.
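For illustration only, such a fitness function might be composed as a weighted combination of static and dynamic measurements; the symbols below are hypothetical and do not correspond to the variable names the framework actually exposes, which Chapter 5 defines:

\[
f(t) \;=\; w_1 \,\frac{B_{\mathrm{exec}}(t)}{B_{\mathrm{total}}} \;+\; w_2 \, U(t) \;+\; w_3 \, M(t)
\]

where B_exec(t) is the number of branches a test case t exercises, B_total is the number of branches identified during static analysis, U(t) counts unsafe library calls reached during execution, M(t) is a normalized measure of memory allocated while processing t, and the weights w_1, w_2, and w_3 reflect the tester's priorities for a particular target.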

1.3 Contributions

The main contribution of this research is the development of a proof-of-concept fuzzing framework with multiple genetic algorithms and configurable fitness functions based on static and dynamic analysis. This notion of a configurable fitness function is novel in the domain of genetic algorithm fuzzing; previous research efforts present functions fine-tuned to evolve towards known unsafe C APIs or maximize code coverage. These fine-tuned functions were intended to unearth common programmer errors, yet the fuzzers built upon these fitness functions lock users into a particular evolutionary model. The state of the art in fuzz testing is enriched by unlocking fitness function composition and applying other genetic algorithms. Also, in support of this main contribution, this research enhances fuzz testing by:

1. Integrating emulation monitoring into genetic algorithm fuzzing. Molnar and Wagner [2007] and Campana [2009] demonstrate the usefulness of combining symbolic execution and constraint solving with fuzzing. Their research illustrates the ability to find integer conversion errors and target fuzzing by tracing tainted data. This research combines the proven concept of program emulation with genetic algorithms to monitor program execution and guide negative test case evolution.

2. Establishing a set of variables about program attributes. A variable is a symbol created to abstractly represent information which could change. In this case, a variable coalesces data from static and dynamic analysis to describe a computer program's observed behavior. Static analysis extracts attributes about assembly code structure, and dynamic analysis determines the actual code paths traversed during an instantiation of a program with a test case. When this data is combined, the measurements paint a more complete picture of program behavior and resource utilization.

3. Presenting a configurable genetic algorithm fuzzer. Applications of genetic algorithms to fuzz testing are incompletely described or do not permit users to alter parameters influencing evolution. For example, the source code for the genetic algorithm of Sparks et al. [2007] is unavailable, and the fitness function of DeMott et al. [2007] is fixed. This research provides a complete genetic algorithm fuzzing framework allowing the user to tailor genetic algorithm parameters.

4. Implementing an ad hoc architecture for distributed, co-operative fuzzing. The ability of fuzzing algorithms to execute test cases in a timely manner diminishes as the complexity of the algorithm increases. This observation is the archetypical struggle between speed and complexity. Fuzzer effectiveness, in part, relies upon an algorithm’s ability to balance this

struggle in order to generate and execute large numbers of test cases. The absence of this balance inhibits an algorithm's performance. This research implements a distributed model of fuzzing to combat algorithm complexity. The architecture builds upon open source software which permits any individual the opportunity to deploy an ad hoc open source co-operative fuzzing environment. The framework extends previous research on distributed file fuzzing by demonstrating an open source, ad hoc clustering model [Conger et al. 2010].

5. Providing a comparative analysis of genetic algorithm fuzzing against other common fuzzing algorithms. The true measure of fuzzer effectiveness is its ability to discover software errors. A case study is presented within this dissertation to articulate the pros and cons of genetic algorithm fuzzing with regards to other available fuzzers. The case study gauges effectiveness by systematically testing a commercially released software application with the implemented genetic algorithm fuzzing framework and other common fuzzing algorithms.

6. Comparing and contrasting multiple genetic algorithm fitness functions based on program attributes. The exportation of static and dynamic analysis attributes to users for inclusion into a fitness function is not wholly sufficient to increase the ability of a genetic algorithm fuzzer to find software errors. Extra information may cause performance degradation if users are given options but no direction concerning their effective use. A lack of direction may prohibit users from determining any appropriate fitness measures in a timely manner. This research addresses that issue by exploring several candidate fitness functions to provide users with a baseline of knowledge about fitness function composition.

1.4 Scope

The problem outlined by this dissertation is addressed through the design and development of an extensible, configurable proof-of-concept gray-box genetic algorithm fuzzing framework. The framework focuses on validating the robustness of a program's file parsing routines. It disregards testing other input vectors such as network communication, environment variables, or command line parsing. While it is important to harden these interfaces, file parsing presents an easily accessible input vector to experiment with new fuzzing algorithms. Also, file parsing functions and libraries notoriously contain defects targeted by malicious hackers.

An example of a common defect leveraged by malicious hackers is a function blindly trusting file content specifying data length values. Defects such as these empower knowledgeable individuals with the ability to gain arbitrary control of a victim's computer by specifying incorrect data length values and overwriting important sections of memory with malicious data. Since a malevolent individual

can create arbitrary files and distribute them freely on the Internet, a program processing files should treat file content as an untrusted data source. File content should always be validated before processing. Sutton and Greene [2005] and Miller [2010] demonstrate the relevance and fruitfulness of this input vector, even with software released today.

The fuzzing framework developed in this dissertation only targets applications running on the Mac OS X operating system. The Mac OS X operating system is quickly gaining market share against competing operating systems like Microsoft Windows. For years, security researchers have chosen to develop fuzzers for Windows or Linux due to the impact on consumers or ease of implementation. Mac OS X presents fertile territory for fuzzing research. The concepts developed by this research, while implemented in a Mac OS X environment, are portable to other operating systems.

The structure of this dissertation is as follows. Chapter 2 presents the basic anatomy of a fuzzer and a literature review of common fuzzing algorithms. Chapter 3 discusses the requisite pre-processing for a genetic algorithm fuzzing framework, namely binary analysis. Chapter 4 presents the Mamba fuzzing framework and examines the inclusion of program emulation into the framework. Chapter 5 explains the addition of a genetic algorithm to the Mamba fuzzing framework, and it introduces a set of variables for defining fitness functions representing program attributes extracted during binary analysis and observed during program execution. Chapter 6 defines Mamba's distributed, co-operative file fuzzing architecture, and illustrates improvements in execution time for a genetic algorithm by using the distributed architecture. Chapter 7 presents results from experiments testing multiple genetic algorithm fitness functions and population sizes. Chapter 8 concludes the dissertation with overarching observations and proposals for future research.
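Before turning to the literature, the class of file parsing defect described in this section, blindly trusting a length value taken from file content, can be illustrated with a small, hypothetical C routine. The function, field layout, and buffer size below are invented for illustration and are not drawn from any application tested in this dissertation; a similar type-length-value parser is examined during reverse engineering in Chapter 3.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical record parser: a 32-bit length field is read from the file
 * and trusted when filling a fixed-size stack buffer.  A file that declares
 * a length greater than sizeof(record) overwrites adjacent memory. */
int read_record(FILE *fp) {
    uint32_t declared_len;
    char record[64];

    if (fread(&declared_len, sizeof(declared_len), 1, fp) != 1)
        return -1;

    /* Missing check: declared_len is never compared to sizeof(record). */
    if (fread(record, 1, declared_len, fp) != declared_len)
        return -1;

    return (int)declared_len;
}

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    FILE *fp = fopen(argv[1], "rb");
    if (fp == NULL) return 1;
    printf("record length: %d\n", read_record(fp));
    fclose(fp);
    return 0;
}
```

A hardened version would reject any declared length larger than the destination buffer, and any length extending past the end of the file, before copying.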

Chapter 2

Literature Review

Many companies and individuals today utilize open source or proprietary fuzzers during software testing to ensure program robustness. The term fuzzing means presenting a program’s input vectors with random or anomalous data to test a parser’s reliability. Each anomalous input (negative test case) tries to locate programming errors that could cause memory corruption or even aid in the execution of maliciously injected code. Microsoft’s distributed fuzzing framework, recently publicized for finding 1,800 defects in Microsoft Office 2010, is a prime example of the increased reliance on fuzz testing [Conger et al. 2010]. A fuzzer’s goal, when used by a company, is to eliminate costly programmer errors before shipping software to consumers, or proactively find software defects before customers, vulnerability researchers, or malevolent entities encounter them. If a defect is found after shipment by a vulnerability researcher or a malevolent entity, then the ramifications to the software company are significantly more severe than discovery during internal testing. Examples of possible ramifications are public scrutiny of software design, revenue shortfalls, or low customer adoption. While fuzzers can reduce the occurrence of externally found defects, they alone are not a software testing panacea. A well defined software test plan incorporating periodic code reviews and unit testing coupled with fuzzing is superior to only fuzz testing [Chess et al. 2010]. Fuzzing genealogy and even the term “fuzz” originates from a researcher at the University of Wisconsin, Madison named Barton Miller. In the mid to late eighties, Professor Miller observed that thunderstorms caused line noise in his modem connection to remote computers. This line noise periodically injected junk characters into the remote sessions causing interference with program operation. Eventually, programs crashed due to these junk characters, and, Professor Miller coined the term “fuzz” to describe this behavior [Neystadt 2008].

Over the next few years, Professor Miller's curiosity with random data injection grew. In the Fall of 1988, he designed an operating systems class project based upon fuzzing. The assignment challenged students to write a program for testing the reliability of standard Unix utilities' options parsing routines. These programs, now referred to as a fuzzer or fuzzing framework, generated random data, delivered the data to the parsing routines of a program under test, and monitored that program for faults [Miller 1988]. Most of the students' fuzzers were lackluster; however, the results from one student's submission were astounding. These results showed that even though Unix utilities were regarded as stable, several of their options parsing routines lacked proper command line argument sanitization. When these parsing routines received anomalous data from command line arguments, the absence of parameter sanitization and bounds checking caused segmentation violations and other exceptions. These observations were so groundbreaking that they spawned the development of a software testing group at the University of Wisconsin, Madison. This research group later released a report in 1990 detailing the defects observed from fuzzing Unix utilities [Miller et al. 1990], and this report sparked the beginning of formal research in fuzz testing by academics and security researchers.

Fuzz testing methodology has matured since its initial discovery by Professor Miller. A fuzzer is now expected to adhere to a minimum set of requirements. Most notably, Sutton et al. [2007] advocate for a fuzzer to properly document test cases, guarantee reproducibility, effectively monitor a program under test, provide informative metrics, and properly detect program exceptions. These requirements are derived directly from their experiences with building real-world fuzzing frameworks and through observations of other fuzzer implementations. These requirements describe an acceptable baseline for fuzzer development, and each requirement is vital for creating a usable and effective testing tool.

Test case documentation informs a researcher of a negative test case's storage location, time at which a program ran the test case, and the result of executing a program with the test case. This documentation guarantees that the observed behavior can be reliably reproduced. If a negative test case is not properly documented and stored, then a crash caused by it is incapable of being reproduced or analyzed. Improper documentation and storage can also cause a higher false positive rate. In this context, a false positive indicates a negative test case causes a program exception when in actuality program execution terminates normally.

Process monitoring tracks a program's state during the execution of a negative test case. Information collected from the monitoring serves as historical data about program behavior. The historical data can then be aggregated to generate useful metrics, such as code coverage. Various forms of code coverage metrics exist like branch coverage. This metric calculates the number of branches executed by a test case out of the total number of branches within a program [Beizer 1990].

For completeness, a branch is a program decision point where execution can proceed in more than one alternative. Accurate and relevant metrics, like branch coverage, are helpful in determining the amount of code tested by a fuzzer.

Error detection, also known as exception monitoring or health monitoring, determines a program's reaction to a test case by examining process state after execution ceases. A monitor evaluates process state for abnormal exit conditions like segmentation violations, bus errors, and arithmetic exceptions among many others. The fidelity of an exception monitor impacts the rate of false positives as well as false negatives. A false negative states a test case causes a program to exit normally when, in reality, the test case causes an exception. Issues about exception monitoring fidelity are important to consider while developing or evaluating a fuzzer.

Takanen et al. [2008] extend upon Sutton's requirements by encouraging the addition of exception analysis. Exception analysis combines information gathered during process and exception monitoring to determine the cause and impact of an observed exception. The effort required for this analysis is dependent upon the level of detail obtained from process and exception monitoring. Exceptions, also referred to as program faults, have underlying defects ranging in severity from benign Denial Of Service (DoS) to catastrophic remotely exploitable memory corruption. Defects resulting in DoS are less severe than remotely exploitable defects, because it is infeasible to run arbitrary malicious code. With remotely exploitable defects, an attacker may be capable of exploiting the defect and assuming full control of the running process. To determine a defect's cause and severity, exception analysis can be conducted by rerunning a test case with a debugger connected to the program under test, inserting special debugging shared libraries (e.g. libgmalloc), or by running third party utilities to parse and interpret error logs (e.g. !exploitable or crashwrangler). A determination of cause and severity derived from exception analysis informs a developer of a defect's impact to a program's customers.

While the aforementioned requirements vividly describe fuzzer characteristics, they do not provide a taxonomy of fuzzers. Figure 2.1 illustrates a rough taxonomy with three different classifications. Each classification contains a set of required and optional components correlating directly with one or more of the requirements stated by Sutton et al. [2007] and Takanen et al. [2008]. These essential components are a set of attack heuristics, an executor, a monitor, a logger, and a classifier. A solid border around a component signifies a required component and a dashed border signifies an optional component. Fuzzer classification primarily relies on a dissection of a fuzzer's algorithm for generating negative test cases. This algorithm determines the method of injecting anomalous data from the set of attack heuristics into a negative test case. Different classifications have adopted different methods of injecting this data and their methodology can uniquely determine whether a fuzzer can be classified

as a mutation, generation, or feedback fuzzer.

[Figure 2.1: Fuzzer Classifications. Mutation, generation, and feedback fuzzers are arranged in order of increasing complexity. All three build upon attack heuristics, an executor, a monitor, and a logger, with optional reporter and classifier components; generation and feedback fuzzers add a modeler, and feedback fuzzers add a search algorithm.]

While different fuzzer requirements have rapidly surfaced, evolution of different fuzzer classifications has been slower. Mutation fuzzers were the only classification until the turn of the 21st century when generation fuzzers were introduced. Only recently, over the past five years, feedback fuzzers have debuted and increased in prevalence.

Attack heuristics define the set of anomalous data to be inserted by a fuzzer into a test case. More formally, attack heuristics are techniques proven to find defects in applications by past experiences [Takanen et al. 2008]. These techniques could be random, such as sampling from /dev/urandom, inserting historically known troublesome values, or any combination thereof. An instance of known troublesome values are boundary conditions of basic types. For instance, the maximum value of an 8-bit unsigned char or a 16-bit unsigned short may be members of an attack heuristic set. These values, 255 and 65,535, are in the attack heuristics of several fuzzers and may cause programs to crash [Amini and Portnoy 2007, Eddington 2006]. A fuzzer, either randomly or systematically, selects values from this set and inserts the selected value into a negative test case. The quality and number of attack heuristics can impact the ability of a fuzzer to find defects.

An executor is the mechanism a fuzzer uses to run a program under test and deliver a generated negative test case. Executor design depends on several factors such as the programs intended to be tested, input vectors, and operating system. An input vector is the avenue which data is presented to an application for processing. Two common input vectors are network traffic and files. Executor design is dependent on multiple factors, because different programs and operating systems handle input vectors in different manners. For instance, if a program under test accepts command line arguments to select file input and the designated input vector to fuzz is file parsing,

then an executor only requires running a program with the correct command line arguments and negative test case. In another instance, a program may not accept command line arguments to specify file input. The lack of a command line argument for file input increases executor complexity, since the executor must present the file through a different method. One way this can be achieved on the Mac OS X operating system is through the AppleScript language. This scripting language controls scriptable applications with actions like file opening, program closing, and file saving among many others. A well designed executor accounts for differences in program operation and presents robust cross-platform interaction and control [Apple 2008].

A monitor or health monitor watches the program under test in an effort to determine the outcome from running a negative test case. As with an executor, the target application and, to a larger degree, the operating system determines the method of monitoring. Several different methods exist to reliably monitor and detect application faults. A program under test can be monitored by starting the process with an attached debugger. The debugger automatically catches faults generated by an application, and the debugger provides detailed information about a fault. Monitoring with a debugger can also be achieved with scriptable interfaces like the popular Python library pydbg [Amini 2006] or the newly released Ruby library Ragweed [Monti et al. 2009]. Scriptable interfaces are convenient, but unfortunately, they can be platform dependent.

Another option for monitoring is to detect the operating system writing to an error log. One example, on Mac OS X, is the directory ~/Library/Logs/CrashReporter. If a monitor watches this directory for changes, then it can detect program faults. However, since directory watching does not monitor program execution, the information collected by this method is not as detailed as monitoring with a debugger. Error directory or file watching is also platform dependent in terms of the files to watch and the different watching interfaces. This platform dependency is evident when comparing Linux and Mac OS X. For Linux, directories and files are capable of being watched with the inotify library, and for Mac OS X, a program can register to receive notifications of directory changes through the FSEvents API.

Another, more heavy-weight monitoring option is to emulate the program under test with a Dynamic Binary Instrumentation Framework (DBI). These frameworks translate native instructions into an intermediate representation thereby providing hooks for developers to interact directly with the translation during execution. The translated instructions are instrumented based on the programmer hooks, and then the intermediate representation is executed. Since the DBI translates every instruction and executes the translation, every detail about execution can be gathered. Faults are capable of being detected in real-time and detailed information about execution can be logged.

A logger records fuzzer progress with a log of fuzzer events which can be audited later. The log contains details such as the time each negative test case is generated, location of a negative test

case, and any fuzzer specific debugging information. A logger can also catalog fuzzer configurations and start and stop times for performance comparisons. Logs created by the logger aid a tester in fuzzer development and auditing fuzzer results. Logs are the primary resource for debugging and historical data.

Reporters summarize pertinent information on demand about a fuzzer run and notify a tester of important events. Summary information includes the number of test cases executed, current runtime, and the current list of test cases causing faults. A notable event for the reporter is a program generating an exception from a test case. Instant program crash notification provides immediate feedback about the fuzzer's attack heuristics and algorithm. Loggers and reporters are similar, yet a reporter is distinguishable by its on demand summarization and instant notification. By contrast, a logger records more detailed information for later review after a fuzzer concludes testing.

The last core component of a fuzzer is optional and known as a classifier. Classifiers coalesce information collected by a monitor and logged by a logger. Once coalesced, the classifier determines cause and severity of program exceptions. This determination describes the impact of a negative test case on a program under test. Classifiers are mostly relegated to post-processing, and these programs are written independently by major software vendors. These separate analysis programs run on the system crash logs after all the negative test cases have been executed or instrument the program under test and rerun the suspect negative test case. They attempt to sort crashes in an effort to determine their uniqueness. Two of the prominent classifiers are Mac OS X's crashwrangler and Microsoft's !exploitable. These classifiers, while generally sufficient for categorizing the types of faults found by a fuzzing run, are not always capable of determining cause or severity. Johnson and Miller [2010] recently discussed the known limitations of these analysis toolkits. Due to these limitations, other manual techniques are often required for crash analysis and classification.

Fuzzers implementing only the core components are classifiable as mutation fuzzers. Mutation fuzzing, synonymous with dumb fuzzing, is the earliest classification developed by Barton Miller. These fuzzers generate negative test cases by mutating sample inputs with attack heuristics. For example, a valid image file could be altered by randomly selecting and flipping bytes. The fuzzing algorithm, in this example, is random selection and the attack heuristic is byte manipulation. A mutation fuzzer may also randomly select values of a formal attack heuristic set and insert those values directly into a sample valid file.

Mutation fuzzers are easy to implement and adapt well to testing arbitrary software. These qualities are a direct result from their simplistic design. However, mutation fuzzers lack the sophistication required to generate valid targeted tests. Some test cases may require updating a

checksum field based on an attack heuristic inserted in order to exercise different code sections. Mutation fuzzers lack this capability.

One of the most prominent mutation file fuzzers is Mangle written by Ilja van Sprundel. Released around 2005, Mangle targets the header section of input files. This section stores metadata pertaining to a file's content such as length information or tables of pointers to other sections within the file. By altering this information, a file parser may incorrectly parse another file section or even blindly trust incorrect length values [van Sprundel 2005].

van Sprundel's [2005] algorithm is elementary with the implementation only spanning around sixty lines of C code. Based upon pseudo-random numbers, it fuzzes between 0% and 10% of a file's header. The algorithm starts by requesting a random number from /dev/urandom. This number is then used as a seed for the standard C library pseudo-random number generator. Once seeded, a call to the rand() function determines the number of bytes to alter within the header bounded between 0% and 10%. Subsequent calls to rand() select the individual bytes to flip. These byte flips are achieved by flipping higher order bits of the selected bytes more often. Flipping the higher order bits targets integer overflows and signage errors.

While this algorithm is steeped in pseudo-randomness, it is responsible for uncovering many serious defects. It has found problems in QuickTime Player, Preview, and OpenOffice among many other programs. It does, however, fail to implement most of the requirements attributed to a fuzzing framework and some of the core fuzzing components. Most notably, it does not perform test case documentation or logging, process monitoring, or exception analysis. A shell script must be written wrapping the fuzzer to account for these missing features. Figure B.1, written by this author and found in Appendix B with all other code examples, is an example wrapper bash script.

Sutton and Greene [2005] advanced mutation file fuzzing by introducing the notSPIKEfile fuzzer at the Blackhat security conference. This fuzzer is part of a duo of fuzzers based upon a general purpose network protocol fuzzer named SPIKE [Aitel 2002b]. notSPIKEfile implements most core components associated with a mutation fuzzer. It contains two sets of attack heuristics. One set of heuristics targets arbitrary binary values and another set targets troublesome strings. The binary heuristics test for integer overflows, while the other set scrutinizes string parsing. A couple of examples of troublesome strings are inordinately long strings or long strings with format specifiers. notSPIKEfile's code refers to these sets as the blob list and the string list. Other attack heuristics can be added to notSPIKEfile by inserting desired values into either set.

The algorithm behind this fuzzer, like Mangle and all other mutation fuzzers, requires a valid test case file as a baseline. For example, in the case of testing a Portable Document Format (PDF) reader, notSPIKEfile requires a valid PDF file. Once a tester provides a baseline test case, the fuzzer chooses one of the attack heuristic sets and substitutes every value in the set into the baseline

file byte by byte. Each substitution causes the program under test to be executed with the newly generated test case. Since each substitution generates a new negative test case, the number of test cases generated by this algorithm is the attack heuristic set's cardinality times the number of bytes in the baseline test case.

An interesting implementation detail of notSPIKEfile is its process monitoring and exception handling. Sutton and Greene [2005] chose the ptrace() system call for handling these tasks. The fuzzer forks the program under test, attaches to the forked process with ptrace(), and passively monitors execution for interesting signals. If a signal occurs, then the fuzzer determines if it is one of SIGILL, SIGBUS, SIGSEGV, or SIGSYS. These signals signify interesting deviations from normal execution, and, in the event of these signals being sent to the process, notSPIKEfile dumps the processor state of the target program for further analysis. If the observed signal is not one of the aforementioned, then the fuzzer passes the signal directly to the target process. This fuzzer is an improvement over Mangle due to the inclusion of process monitoring, fault detection, and logging.

The next category of fuzzing, generation or intelligent fuzzing, increases complexity by adding a modeling component to the set of core components. A modeler is an API for specifying a test case's format. Modeling APIs vary widely and are dependent upon a generation fuzzer's implementation. One fuzzer may define C functions to represent different formats, and another fuzzer may define a modeling language with an expressive markup language like Extensible Markup Language (XML). Modelers increase the initial setup time required for starting a fuzzing job. Setup time is increased because a tester must evaluate a format specification prior to testing. A format specification evaluation translates a specification into a fuzzer specific API and marks the fields desired for substitution. Researching format specifications may prove to be time consuming. If a format or protocol is proprietary, then reverse engineering the specification may preclude the ability to precisely define a model in a timely manner. Furthermore, a generation fuzzer's ability to elicit defects is based partially upon model accuracy and attack heuristics. If the attack heuristics do not include sufficient anomalies or appropriate portions of the model are not marked for substitution, then some defects may not be found. Attack heuristic quality is a concern of all fuzzers. While increased setup time, model accuracy, and attack heuristics are concerns with generation fuzzing, a precise protocol model targets specific code sections. It aids a fuzzer in uncovering defects hidden in complex parsing logic.

Sutton and Greene [2005] presented a robust intelligent file fuzzer at the Blackhat security conference. The fuzzer, SPIKEfile, complements the mutation fuzzer, notSPIKEfile, which they presented at the same conference. SPIKEfile encompasses most of the functionality of its relative, yet it extends functionality by adding block-based modeling. As detailed by Aitel [2002a] in his seminal paper on network fuzzing, block-based modeling examines the interaction between protocol

layers. Such modeling exemplifies the connection between layered protocol header size fields and the data encapsulated by the headers. With this interconnection in mind, Aitel [2002a] presented a C style scripting language for modeling network traffic by breaking it into blocks with length and data fields. Sutton and Greene [2005] applied this concept to file fuzzing with SPIKEfile.

SPIKEfile requires a baseline test case like notSPIKEfile, but it also requires a model file. Figure B.2 illustrates an example Tagged Image File Format (TIFF) model based on the TIFF specification. As shown, the scripting language defined by Aitel [2002a] conforms to the C standard. All statements in the scripting language are C function calls with literal string arguments enclosed by quotes. Statements are terminated by semicolons and comments can be added throughout a model by starting the comment line with two forward slashes (//).

The data model defined in Figure B.2 illustrates a few of the multiple features of SPIKEfile's modeling language. It starts by defining a fundamental modeling structure, a block. Blocks are composed of one or more data fields and an optional length field automatically calculated based upon the data field values contained within the block. The s_block_start() function call denotes the beginning of a block, and it is ended with the s_block_end() function call. Start and end functions require matching string parameters to designate the matching calls. If these parameters do not match, then an error occurs during model compilation. Between block start and end calls reside example data. Figure B.2 demonstrates a block with multiple s_binary() function calls. This function requires a hex string argument, and it places that value directly in the generated test case. The location of the value within the generated test case is dictated by the function call's location within the model. Values defined with the s_binary() function are not marked for substitution. To mark a value for substitution, a function ending in the variable keyword must be used. The function s_string_variable() is an example function marking a string for substitution.

SPIKEfile's attack heuristics are split into two sets: a set for troublesome string values and a set for troublesome integer values. These sets are analogous to notSPIKEfile's attack heuristics. The fuzzing algorithm walks through the data model substituting appropriate attack heuristics into the marked variable. String variables are replaced with a value from the string attack heuristics, and integer variables are replaced with values from the integer attack heuristics. Since the algorithm walks through the entire file substituting marked variables, the number of negative test cases generated is equal to the summation of the number of variables marked for substitution times the cardinality of each variable's corresponding set of attack heuristics.

Eddington [2008] developed the most recognizable generation fuzzing framework around 2006. This fuzzer, Peach, while labeled as a generation fuzzer, is capable of operating as a mutation or generation fuzzer. It contains all of the core components attributed to a generation fuzzer along with

other features like distributed fuzzing and tools for baseline test case selection. Peach's portability, flexibility, logging, and integration with multiple debugging tools make it a robust framework. Peach is configured by an XML document based on a published schema. The XML configuration defines a data model, state model, test configuration, and fault monitoring for a fuzzing run. The fuzzer also includes a Graphical User Interface (GUI) to simplify configuration, absolving testers from learning the XML markup. Figure B.3 presents a configuration for generating a set of fuzzed TIFF images. The model does not include extra configuration for automatically running or monitoring test case runs. Information about configuring Peach for these activities can be found at [Eddington 2006]. The DataModel element in Figure B.3 defines the composition and structure of a basic TIFF image. The model is composed of child elements denoting strings, numbers, choices, and references to other models. Each child element has a descriptive and unique name attribute. These child elements also contain a value attribute providing Peach with a default value for the defined field. Some elements also define endianness to direct Peach on data formatting. An interesting and versatile child element demonstrated in this model is the Blob element. It defines a byte array of arbitrary length. This element is useful when the structure of a section of a file is unknown or the model requires a series of specific data. The child elements presented in Figure B.3 are only a representative sample of data modeling options and not an exhaustive list. A complete list of elements and attributes is available at [Eddington 2006]. Aside from data modeling, Figure B.3 defines a StateModel, Publisher, and a Logger. A StateModel assigns a series of steps the fuzzer must take during the generation of a negative test case. Generically, a state transition diagram for file fuzzing requires writing the negative test case to disk, closing the file, and launching the prescribed application with the test case. Figure B.3 only illustrates writing a negative test case to disk for later testing by invoking the FilePerIteration Publisher. A Publisher is the mechanism Peach utilizes to generate a negative test case. This mechanism is chosen based upon the input vector being tested. The FilePerIteration Publisher writes every test case to disk with a filename specified by the value attribute of the filename Param element. Other Publishers are capable of sending traffic over the network or even launching external processes. The last notable element of the example configuration is the Logger element. It defines the area on disk for storing log files. In this example, all logs are stored in a directory named logs within the directory where the fuzzer was invoked. Log files are uniquely named by appending the fuzzer's start date and time to the base name of the log file. Peach's logging functionality is extensible; however, at this time, logging to a directory is the only supported option.

Peach validates the supplied configuration and data model before generating any negative test cases. If an error is found during validation, the fuzzer notifies the tester and ceases operation. After validation, the fuzzer proceeds to select attack heuristics based on the data model's fields. Peach contains many generators and mutators specifically targeting certain data types. One example generator is the UCS Transformation Format 8-bit (UTF8) generator. It is responsible for encoding strings in the UTF8 format and inserting long attack strings encoded as UTF8. Peach continues selecting different fields for attack heuristic insertion and generating different permutations of the data model until all possible combinations are exhausted. This algorithm can be modified by marking fields in the data model as immutable. Peach is a mature fuzzing framework that has found many software defects. The learning curve can be steep, but the fuzzer can thoroughly test a program due to its rich data modeling language. The multitude of modeling options combined with different encodings, mutators, and generators covers a wide range of attack heuristics. Peach is definitely one of the best generation fuzzers available today. The last category of fuzzer, feedback, is a relative newcomer to negative testing. Introduced only in the past five years, feedback fuzzers are slowly evolving. A feedback fuzzer evaluates software from a search space optimization viewpoint with information from static and dynamic binary analysis. Static binary analysis interprets compiled code by extracting program characteristics. Tools such as IDA Pro or objdump handle converting a binary representation into intelligible assembly code. Dynamic analysis monitors a running program to profile execution. It is achieved through DBI frameworks like DynamoRIO, PIN, and Valgrind or other debugging interfaces. Information obtained about code structure and flow from static and dynamic analysis is fed into common search space algorithms such as genetic algorithms or constraint solvers. The fuzzer makes informed decisions about future test cases based upon the feedback from analysis of a currently running test case. The algorithms try to pinpoint problematic values in particular test cases or optimize metrics such as code coverage. These attempts to optimize metrics or solve constraints are an effort to minimize the number of negative test cases generated in order to find a defect. Vuagnoux [2005] demonstrated one of the first feedback algorithms in 2005 with the fuzzer Autodafé. Its algorithm shifts the focus from accurate test case modeling combined with a robust set of attack heuristics to an examination of the complexity involved in finding a fault. Fuzzer complexity can be defined by determining the cardinality of the attack heuristics and the number of variables labeled for substitution in a test case model. Since each variable is replaced with each attack heuristic, the complexity of a single fuzzer run is:

Complexity = L × F    (2.1)

where L represents the cardinality of the attack heuristics and F denotes the number of variables labeled for substitution. A decrease in complexity then requires either lowering the cardinality of the attack heuristics or eliminating some of the variables labeled for substitution. Removing attack heuristics degrades a fuzzer's ability to elicit a fault, because attack heuristics are predefined based on past observations. These heuristics have proven able to elicit faults from programs in the past, so these values are likely able to elicit faults in other programs [Vuagnoux 2005]. Based on these observations, the Autodafé fuzzer implements an algorithm known as fuzzing by weighting attacks with markers. The concept is to decrease the number of variables labeled for substitution, F, by tracing a program with a debugger before fuzzing. The tracer informs the fuzzer of variables labeled for substitution (markers) used as arguments to unsafe C library functions. These functions, for instance strcpy() or strcat(), are notoriously a source of buffer overflows. If a marker is a parameter to an unsafe C library function, then the fuzzer is more likely to elicit a fault by substituting an attack heuristic for that marker. Weights are assigned to each marker based on its frequency of usage as a parameter to an unsafe C library function. All markers are then ranked according to weight, and higher-weighted markers are substituted with attack heuristics first. Markers assigned a weight of zero are eliminated from the initial fuzzing runs. This classification of markers decreases the complexity by eliminating some markers and prioritizing testing. Sparks et al. [2007] also reduced negative test case generation to a search space optimization problem by applying genetic algorithms to fuzzing. This application relies on a disassembly of the program under test. The disassembly provides a wealth of information pertaining to possible control flow paths during execution. Their algorithm, Sidewinder, employs a well-known reverse engineering tool for mining a program disassembly, IDA Pro. This application is an immensely useful tool capable of disassembling many different file formats, particularly executables, and synthesizing the binary into intelligible assembly instructions. It also exposes the capability to examine data within a file as different data types such as strings, integers, and even data structures. More importantly, for Sidewinder, IDA is adept at generating control flow graphs from a disassembly. All of this information is exportable to external programs through a Software Development Kit (SDK) [Eagle 2008], which Sidewinder leverages to gather control flow graphs. Once the control flow graph information is exported from IDA Pro, Sidewinder determines all interesting execution paths. A path is interesting if a node in the control flow path is a vulnerable C function call. An example of a vulnerable C function call is a library function that does not ensure the length of the data being copied is less than or equal to the length of the destination buffer. Functions like strcpy() embody this definition. Also, any function may be labeled as a vulnerable function call by the tester; strcpy() is only one such example.

Sidewinder prunes, from the larger control flow graph, subgraphs containing these nodes along with paths from external interfaces, such as recv() calls, to guide execution analysis. Up to this point, the algorithm only knows static information about execution paths and possible problematic code sections. To be effective, the fuzzer still requires a method for negative test case generation. It employs a context-free grammar as a modeling language to aid in this endeavor. A context-free grammar is a way of formally representing a language through the use of nonterminals, terminals, and productions. Nonterminals can be transformed into a different set of terminals and nonterminals through a rule. A language's terminal symbols are the members of its alphabet which cannot be replaced through a production. Transformations, also known as the language's productions, are the rules for substituting a nonterminal for another sequence of nonterminals or terminals. A string is produced in the language when all nonterminals are replaced with terminals. Sidewinder encodes test cases as integers corresponding to a series of productions within the defined context-free grammar. These encodings are members of a genetic algorithm that evolves the context-free grammar, so that the fuzzer can learn to create valid input. This grammatical evolution creates successive populations of integer strings measured by a predefined fitness function. To calculate the fitness of a population member, the fuzzer tracks all node traversals within a program's control flow graph. Based on the recorded traversals, the fuzzer calculates the probability of a test case executing code in one of the next connected nodes on the graph. For each edge in the graph, it calculates these probabilities. The probabilities are only based on node traversals observed during the current generation. After all population members execute, their fitness values are calculated by multiplying all the transition probabilities of the taken transitions for the test case. Sidewinder then tries to increase code coverage by selecting population members with a lower fitness. The tracking of transition probabilities in this manner builds a Markov model of program execution [Sparks et al. 2007]. Sidewinder traces execution by placing breakpoints with a debugger on all entry points in the subgraphs extracted from the IDA Pro disassembly. To decrease execution time, the graphs are broken into two sets. The first set includes nodes leading down paths that could not lead to the execution of an unsafe API call. The second set is the complement of the first set. If execution enters a node in the first set, the rejection set, then execution is halted. Otherwise, execution continues until the target vulnerable function call occurs. The breakpoints in the subgraphs serve as informational points for fitness calculations and monitoring points to kill execution. The fuzzer feeds the fitness calculations back into test case generation using the typical genetic algorithm operators to evolve the next generation of cases.
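The fitness calculation just described can be expressed compactly; the following Python sketch is a simplified reconstruction of the idea in Sparks et al. [2007], not Sidewinder's implementation. Edge probabilities are estimated only from the traversals recorded during the current generation, and a test case's fitness is the product of the probabilities of the edges it took, so rarely exercised paths receive lower fitness and are favored when selecting for coverage.

from collections import defaultdict

# Simplified sketch of Sidewinder-style fitness over a control flow graph.
# Each test case's trace is the ordered list of CFG nodes it visited.
def edge_probabilities(traces):
    edge_counts = defaultdict(int)
    node_counts = defaultdict(int)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            edge_counts[(src, dst)] += 1
            node_counts[src] += 1
    # P(dst | src) estimated from this generation's traversals only
    return {e: c / node_counts[e[0]] for e, c in edge_counts.items()}

def fitness(trace, probs):
    f = 1.0
    for edge in zip(trace, trace[1:]):
        f *= probs.get(edge, 1.0)
    return f  # product of taken transition probabilities

traces = [["entry", "A", "B"], ["entry", "A", "B"], ["entry", "C", "unsafe_call"]]
probs = edge_probabilities(traces)
# Selecting members with lower fitness favors the rarely exercised path toward unsafe_call.
print(sorted((fitness(t, probs), t) for t in traces)[0][1])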

Sidewinder is innovative in its approach to test case generation. It is the first of its kind to rely upon evolutionary techniques to guide negative test case generation. Modeling with a context-free grammar yields a sharp contrast from traditional block-based modeling approaches. Also, mixing static and dynamic analysis is another novel concept carried throughout feedback fuzzers today. Sidewinder is a historical milestone in fuzzer evolution. DeMott et al. [2007] enhanced genetic algorithm fuzzing by focusing on evolutionary theory and paralleling white-box evolutionary testing [Takanen et al. 2008]. Evolutionary testing is a code analysis technique based on the principles of genetic algorithm meta-heuristic search [Wegener 2001]. These algorithms analyze a program's source code by breaking it down into a control flow graph and then reference this graph during test runs to calculate the fitness of a particular case [Sthamer et al. 2001]. The makeup of one of these fitness calculations is the summation of a case's approach distance and branch distance. The branch distance equates to the number of branches taken by a case on the control flow graph, while the approach distance equates to the distance from the current value required to execute a particular section of code [Takanen et al. 2008]. In this fitness function, values closer to zero reflect a higher test case fitness. Jared DeMott's fuzzer, Evolutionary Fuzzing System (EFS), borrows the notion of program analysis and evolutionary search, but it shifts the evaluation to the grey-box realm, where the machine code is available but the source code is not. With grey-box testing, the engineer can only examine machine code disassembly to aid in testing. EFS further distinguishes itself from evolutionary testing by focusing on protocol discovery instead of building program-specific knowledge into the testing structure. The only specific information pertaining to the program under test is derived from a seed file containing example valid input. This seeding decreases the initial time required to learn protocol basics, so the fuzzer is capable of traversing deeper into the program under test [DeMott et al. 2007]. EFS utilizes the same evolutionary ingredients as a traditional genetic algorithm. The most basic of these ingredients is the notion of a population member. A member, in EFS, is defined as a session with the program under test. Since the fuzzer mostly exercises a program's network interfaces, a session denotes a complete conversation between a client and server. The individual reads and writes composing the session are named legs. These legs are dissected into tokens that contain a type and an actual piece of data. Several token types are available such as text, binary, and length fields. Multiple sessions are lumped into a management structure known as a pool. These structures purely arrange sessions into a logical collection for evaluation purposes. A number of pools containing a number of individual sessions constitutes a generation. The fuzzer executes a program under test with each session within each pool. It monitors each instantiation for the number of blocks hit in the control flow graph along with program crashes.
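The population structure just described might be sketched as follows. This is an illustrative Python reconstruction, not EFS's actual classes, and the field names are assumptions: sessions are lists of legs, legs are lists of typed tokens, pools group sessions, and a session's fitness counts the distinct code blocks its execution touched.

from dataclasses import dataclass, field
from typing import List, Set

# Illustrative sketch of EFS-style population structures (not DeMott's implementation).
@dataclass
class Token:
    kind: str          # e.g. "text", "binary", "length"
    data: bytes

@dataclass
class Leg:             # one read or write in the client/server conversation
    tokens: List[Token]

@dataclass
class Session:         # one complete conversation with the program under test
    legs: List[Leg]
    blocks_hit: Set[int] = field(default_factory=set)  # CFG blocks observed at runtime

    def fitness(self) -> int:
        return len(self.blocks_hit)

@dataclass
class Pool:            # logical grouping of sessions evaluated together
    sessions: List[Session]

    def fitness(self) -> int:
        return sum(s.fitness() for s in self.sessions)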

After every session in every pool has been executed, each session is evaluated by the number of code blocks exercised to derive the session's fitness. The usual genetic algorithm operators, crossover and mutation, are applied to the sessions to derive the next generation. Also, at configurable generational intervals, the fitness of the logical pools is evaluated. Pool fitness is the summation of the fitness of all sessions comprising the individual pool. When EFS calculates pool fitness, it crosses over and mutates pools to produce the next population of pools. The algorithm for session and pool crossover is the same: both structures rely on single-point crossover for breeding. The addition of elitism into the algorithm allows the most fit session in a generation and the overall best fit pool to be placed directly in the next generation of test cases. As for mutation, the pool and session algorithms completely differ. For a session, the mutation randomly modifies a token with a predefined attack heuristic. Pool mutation relates to the addition or deletion of a session from the pool. After a generation is computed via crossover and mutation, the fuzzing continues with the same procedures until the system converges or the tester suspends testing. The EFS fuzzer is a significant step forward in genetic algorithm feedback fuzzer design. It adds more concepts and techniques of genetic algorithms and pulls inspiration from evolutionary testing. The evolution of individual sessions and pools is a novel concept. Campana [2009] extended feedback fuzzing by applying constraint solving to negative test case generation. Campana [2009] borrows from a technique known as taint tracing, where the flow of data input into a program is monitored throughout execution to determine the different paths taken. A fuzzer named Flayer [Drewry and Ormandy 2007] demonstrated this ability with the Valgrind DBI. Valgrind emulation is extremely useful, especially when tracing data, although translating a program to an intermediary representation drastically increases execution time. Fuzzgrind builds upon Flayer's methodology by recording information about the data dependencies leading to different branch points. The tracing information obtained during emulation feeds into a custom Python library responsible for transforming the taint tracing information into a set of constraints. These constraints are then formatted to work with the STP constraint solver library. This library is a decision procedure for bit-vectors and arrays [Ganesh and Dill 2007]. Released in 2007 by Dr. Vijay Ganesh, STP's grammar defines satisfiability equations which it attempts to solve. Fuzzgrind leverages the STP library to solve constraint equations causing program execution to steer down differing paths of execution. The algorithm for generating these test cases is based on the work done by David Molnar with SAGE [Godefroid et al. 2008]. This algorithm, combined with the Valgrind DBI, enables the fuzzer to probe deep into the program under test's logic by calculating the values required to exercise different code paths. The algorithm is an innovative technique for

fuzzing. It leverages cutting-edge technology to reduce the problem of fuzzing to the problem of constraint solving. This reduction to an NP-complete problem does have some drawbacks, the most notable being path explosion with sufficiently large programs. This problem is being addressed with STP, and even the SAGE implementation shows promise for large-scale program evaluation [Molnar 2009]. This literature review has introduced expected requirements for a fuzzer and a rough fuzzer taxonomy. An understanding of requirements and a method for fuzzer classification is helpful for understanding the evolution of fuzzer design. Fuzzer classifications are not necessarily complex and, as an easy-to-remember analogy, the different fuzzer classifications are roughly comparable to the sport of darts. Mutation fuzzing, or dumb fuzzing, constructs a negative test case by inserting random or predefined values into valid test data. This behavior can be likened to a dart thrower trying to hit a bullseye while wearing a blindfold. The player's throws may or may not ever hit the dartboard due to a lack of environmental information. Generation fuzzing, or intelligent fuzzing, creates a negative test case by referencing a model of valid test data and inserting random or predefined values into specific portions of the model. The fuzzer selects the portions of code to test by selecting the portion of the model to change. It is akin to a dart thrower being able to see a dartboard for aiming but not adjusting throws to rectify past errors. The thrower is destined to hit the same areas of the dartboard every time. Feedback fuzzing improves upon the other categories by adding a search or learning algorithm. A fuzzer generates test cases based on metrics collected from the execution of previous negative test cases. This resembles a dart player improving by factoring in previous experiences. In the long run, if the player learns sufficiently well, then the player should eventually hit the bullseye consistently. In summation, fuzzer classification determines fuzzer composition and complexity. It assists a software tester with fuzzer selection, and the classification can aid in achieving a better understanding of a fuzzer's internals.

Chapter 3

Reverse Engineering

When embarking on intelligently fuzzing an application's file format parsing, the first step is to assess one's access to source code. Oftentimes, access to source code is unavailable, because the individual evaluating the software does not work for the company that developed the product. In these instances, if the application being evaluated is a machine language executable, then the evaluator must reverse engineer the binary to extract information about code structure. This research targets testing proprietary, closed-source Mac OS X applications. Since access to source code, in this situation, is infeasible, the extraction of information about code must occur by analyzing a binary's executable format, determining shared libraries linked against an executable, selecting a shared library to reverse engineer, disassembling the selected library, and finally programmatically extracting any pertinent attributes from the disassembly. To begin this journey, some background in reverse engineering is required. The following briefly reviews reverse engineering and the Mach-O format before walking through an example of gathering information to incorporate into negative test case generation. Reverse engineering analyzes a physical product or software program to generate a conceptualization of its implementation and component interrelationships [Chikofsky and Cross 1990]. It creates alternative representations of the original, finished product. These representations aid engineers in dissecting a product for comprehension of its nuances and intricacies. Alternative representations, for the purpose of this research, supply fuzzing algorithms with supplemental information about a program under test. The practice of reverse engineering originates from hardware engineers disassembling physical products. Hardware disassembly catalogues a product's internal layout, circuits, and interconnections. Government agencies and industry competitors quickly adopted these practices to find design flaws or catalogue competitor features, functionality, and cost breakdown. While the origins of

reverse engineering are firmly rooted in hardware, software reverse engineering has gained popularity over the past twenty years [Chikofsky and Cross 1990]. Reverse engineering applied to software embodies many of the same objectives as hardware reverse engineering. However, instead of examining physical components, software reverse engineering analyzes binary data for comprehension of control flow, data flow, and data structure. This analysis is not restricted to software programs, and it may be applied to analyzing arbitrary binary data such as proprietary file formats or network protocols. Analysis of such arbitrary binary data only formulates a model of the data's structure. For reversing software programs, an alternative view of data flow, control flow, and program structure is necessary to foster understanding of machine code. The accuracy of reverse engineering software programs is contingent on the disassembly algorithm utilized. Two well-known methods exist for translating an executable's or shared library's machine code into assembly language. The first, the Linear Sweep algorithm, begins by deciphering an executable's file format and extracting the text section start and end addresses from the file's metadata. The algorithm then reads the bytes stored at the starting address and decodes those bytes into an assembly language instruction. An executable's architecture (x86, MIPS, PowerPC, etc.) determines an assembly instruction's encoding and the number of bytes comprising the encoding. For instance, Microprocessor without Interlocked Pipeline Stages (MIPS32) assembly instructions are 32-bit fixed-width instructions [Sweetman 2006]. This means the Linear Sweep algorithm should always read four bytes of data from the text section to decode an instruction. On the other extreme, x86 instructions are variable-length and alignment-independent. Instruction length variation complicates the disassembly process, because the disassembly algorithm must calculate instruction length. For x86 instructions, the first few bytes of an encoded instruction represent a prefix and an opcode. These encoded values together determine an instruction's encoded size. Once the disassembly algorithm decodes these values, the instruction's total encoded length can be determined by consulting a lookup table keyed on prefix and opcode. This description simplifies decoding x86 instructions only to illustrate the added complexity of supporting variable-length instruction sets [Intel 1999]. Once the encoded bytes are translated into an assembly instruction, the Linear Sweep algorithm reads and decodes the next sequential bytes. It continues linearly sweeping through an executable's text section decoding instructions until it encounters the text section's end address. Linear Sweep efficiently translates machine code into intelligible assembly instructions, but it often misinterprets data interspersed throughout the text section. Compilers commonly optimize code in many ways, such as generating jump tables, and these optimizations cause data to be interspersed in the text section. Since the algorithm linearly sweeps the entire text section without interpreting assembly instruction semantics, it cannot differentiate between text and data. The objdump utility, found on

most Linux operating systems and in the GNU's Not Unix! (GNU) binutils package, implements the Linear Sweep algorithm [Linn et al. 2003]. The Recursive Traversal algorithm combats this weakness of the Linear Sweep algorithm by interpreting the semantics of assembly language instructions altering control flow. It starts, like the previous algorithm, by determining an executable's file format and extracting text section start and end addresses. The algorithm then reads the sequence of bytes at the start address and decodes them into an assembly instruction. It continues linearly reading and translating bytes until the algorithm detects a branch instruction. When encountered, instead of reading the next sequential instruction, the algorithm attempts to follow all possible control flow paths from the branch instruction (target and fall-through) and continues disassembling at each new path. It ceases operation once all control flow paths for all branches have been decoded [Schwarz et al. 2002]. Recursive Traversal, while an improvement upon the Linear Sweep algorithm, is also fallible. An assembly instruction sequence can disrupt the algorithm by directly altering a function's return address before issuing a return from a function call. If a disassembler does not account for this nuance and relies purely on calling conventions (e.g. cdecl, stdcall, or fastcall), then an incorrect disassembly can result [Eagle 2008]. Recursive Traversal also has trouble following indirect jumps if other heuristics are not applied to detect these constructs. Nuances such as these happen less frequently than the compiler optimizations affecting the Linear Sweep algorithm, which makes disassemblers implementing the Recursive Traversal algorithm preferable. IDA Pro, an industry-standard disassembler and debugger, implements a variation of the Recursive Traversal algorithm. Many engineers regard it as the best-in-class disassembler for binary and malware analysis. It supports multiple executable formats for multiple architectures. The information generated by IDA Pro also easily exports to other programs through two scripting interfaces. The first, IDC, resembles the C programming language, and the second, IDAPython, extends the Python scripting language with Simplified Wrapper and Interface Generator (SWIG) bindings [Eagle 2008]. Both extensions support automating common tasks inside IDA Pro along with retrieving attributes from a program disassembly. This disassembler overshadows many other disassemblers and disassembly libraries like udis86, distorm64, and Metasm. Its superior disassembly algorithm, vast architecture support, and multiple scripting interfaces make it the best choice for inclusion in this dissertation's fuzzing framework. Other disassemblers and libraries provide less flexibility and lack support for many architectures and platforms.
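The two strategies can be contrasted with a short sketch. The decode() helper below is hypothetical, a stand-in for a real architecture-specific decoder, and the instruction attributes (is_branch, is_unconditional, target) are assumptions; the point is how each algorithm chooses the next address to decode.

# Sketch contrasting Linear Sweep and Recursive Traversal disassembly.
# decode(data, offset) is assumed to return (instruction, length); it is not defined here.

def linear_sweep(data, start, end):
    insns, offset = {}, start
    while offset < end:
        insn, length = decode(data, offset)   # may misinterpret inline data as code
        insns[offset] = insn
        offset += length                      # always fall through to the next bytes
    return insns

def recursive_traversal(data, entry, end):
    insns, worklist = {}, [entry]
    while worklist:
        offset = worklist.pop()
        while offset < end and offset not in insns:
            insn, length = decode(data, offset)
            insns[offset] = insn
            if insn.is_branch:                # follow target and fall-through paths
                worklist.append(insn.target)
                if not insn.is_unconditional:
                    worklist.append(offset + length)
                break
            offset += length
    return insns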

3.1 Mach-O Format Analysis

Fuzzing frameworks have historically targeted programs running on popular operating systems like Windows, Unix, and Linux. The first fuzzer, from Dr. Barton Miller's research group at the University of Wisconsin, Madison, tested Unix utilities [Miller et al. 1990]. Later fuzzers, such as notSPIKEfile and Peach, started by supporting Linux and Windows. Even new fuzzers like EFS started by testing Windows applications. Due to the increasing popularity of the Mac OS X operating system and the lack of compatible fuzzers, this research develops a fuzzing framework with an emphasis on testing Mac OS X applications. Since Mac OS X applications are the primary focus of this research, the platform's executable file format must be explained to exhibit how to extract pertinent information from Mac OS X binaries. An operating system designer defines an executable file format with a formal specification. These specifications describe the required composition and structure of compiled executables or shared libraries. An operating system then includes functionality, known as a loader, to recognize the defined format, parse the data, lay out the program in memory, and begin execution of the machine code. Many different standards exist defining executable formats. Microsoft supports the Portable Executable (PE) format for its programs and dynamically linked libraries on the Windows operating system. The Executable and Linkable Format (ELF) and a.out formats are commonplace on Linux and Unix. As for Mac OS X, most executables conform to the Mach-O format. A Mach-O file is an object file storing machine code, data, and metadata describing the object file's contents. Machine code results from compiling a language, such as C, C++, or Objective-C, and linking it against shared libraries. During compilation, the code loses most human-readable constructs and symbols, especially when symbol tables are stripped from the binary. A disassembler attempts to reconstruct contextual information and semantics by translating machine code into assembly language. Individuals auditing programs released from companies with whom they have no association must rely on information from a disassembler to help understand control flow and code structure. Mach-O file structure consists of three major regions, the first of which begins with a header. The header region starts with a magic number, 0xCAFEBABE, to differentiate it from other file formats. The Mac OS X loader detects this magic number when loading an executable and recognizes the format as a Universal Mach-O file. As a side note, Java class files are also identifiable by the 0xCAFEBABE magic number [Apple 2009]. The Universal Mach-O is one of two variations of Mach-O files. The other format is known as a Standard Mach-O. The universal format is the predominant format and a consequence of backward compatibility support for standard Mach-O files of different architectures. Earlier versions of Mac

OS X only supported the PowerPC architecture. Later versions ported the operating system to the ubiquitous x86 architecture. In order to guarantee operation of newer applications on older systems, Apple created the universal format. It concatenates x86 and PowerPC Mach-O files into a single object file. The next region of a Mach-O file defines a set of load commands. These commands describe object attributes like the location of symbol tables, the dynamic linker for loading the object, and addresses of shared library initialization routines. In general, object file load commands can be extracted by the otool utility packaged with Mac OS X. To list all commands, otool must be executed with the -l option. More importantly for this research, the load commands contain the names of libraries linked against the object file [Iozzo 2009]. The fuzzing framework resulting from this research selectively monitors execution of shared libraries linked against an object file. While the load commands specify these libraries, output from otool's -l option does not succinctly display all shared libraries. Fortunately, otool supports another option, -L, which only lists shared library load commands. This listing completely displays libraries for the shared object itself, but each shared library may be linked against other shared libraries. In order to get a full listing, a recursive query must be performed against all shared libraries linked against the original object file. Since determining and selecting a shared library to evaluate is an important component for the fuzzing framework defined in this research, the framework contains a tool, 'thor tools:otool', in support of this task. This tool wraps the otool command and recursively interrogates all shared libraries linked against an executable. The recursive interrogation runs otool -L against each shared library and adds newly discovered libraries to the list of libraries yet to be interrogated. Once all libraries are scrutinized, the tool prints a full listing, and then the software tester can choose a library for testing. The inclusion of this tool within the framework decreases the amount of time required to select a shared library for monitoring. It also presents a unified fuzzing environment where tasks associated with fuzzing can be carried out along with the actual fuzz testing. For more information on this tool, its usage, or any tool included with the fuzzing framework, please consult Appendix C. The last region of a Mach-O file is the data region. It is broken into multiple segments encapsulating multiple sections. Segments define different pieces of an executable such as text and data. A section within a segment is logically specific data pertaining to an overall segment. For example, the text segment contains a text section holding an application's machine code. Another example is the data segment encapsulating a data section holding initialized mutable variables [Apple 2009].
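The recursive interrogation performed by 'thor tools:otool', described above, can be approximated in a few lines of Python. This is a simplified stand-in for the framework's Ruby tool, and the output parsing (one library path per indented line after the first) reflects otool -L's usual output format rather than a guaranteed interface.

import subprocess

# Simplified stand-in for the framework's recursive otool interrogation.
def linked_libraries(binary):
    """Return the shared libraries reported by `otool -L` for one Mach-O file."""
    out = subprocess.run(["otool", "-L", binary], capture_output=True, text=True).stdout
    libs = []
    for line in out.splitlines()[1:]:          # first line echoes the file name
        line = line.strip()
        if line:
            libs.append(line.split(" (compatibility")[0].strip())
    return libs

def all_linked_libraries(binary):
    """Recursively interrogate every discovered library for its own dependencies."""
    seen, pending = set(), [binary]
    while pending:
        current = pending.pop()
        for lib in linked_libraries(current):
            if lib not in seen:
                seen.add(lib)
                pending.append(lib)
    return sorted(seen)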

3.2 Mach-O Disassembly

Once a shared library is chosen from the list of libraries linked against a Mach-O executable, disassembly of the shared library must occur to begin the attribute extraction process. As an example of this process, Figure 3.1 introduces a fictitious, errant Objective-C library to parse a file in a Type-length-value (TLV) format. This format prevails in many network protocols such as Simple Network Management Protocol (SNMP) and Intermediate System-to-Intermediate System (IS-IS). To compile this library as a shared library on Mac OS X, the -dynamiclib flag must be supplied to GNU Compiler Collection (GCC) during linking.

#include <Foundation/Foundation.h>

void vulnerable_parser(NSFileHandle *fileHandle, UInt32 filesize) {
    char value[24];
    int type;
    int length;

    while ([fileHandle offsetInFile] < filesize - 8) {
        [[fileHandle readDataOfLength:4] getBytes:&type];
        [[fileHandle readDataOfLength:4] getBytes:&length];
        switch (type) {
            case 16:
                /* length is trusted blindly: possible stack buffer overflow */
                [[fileHandle readDataOfLength:length] getBytes:&value];
                break;
            default:
                if (length < 24)
                    [[fileHandle readDataOfLength:length] getBytes:&value];
                break;
        }
    }
}

Figure 3.1: C Code TLV Parser Example

As background for the TLV format, three different fields constitute a TLV encoding. The first, the type field, specifies what kind of data is stored in the value field. The second, the length field, dictates the total size of the data stored in the value field, and the last field stores the actual data. TLV encoding exists to store variable-length data for transmission, storage, and parsing. The example library exhibits a common error found in many TLV parsing libraries. The function, vulnerable_parser(), consumes TLV encodings one at a time from an already opened file. For each encoding, it first reads the type and length fields to determine the case to execute in the switch statement. If the type field equals sixteen, then the code blindly trusts the length read from the file, which can overflow the value buffer on the stack. Blindly trusting user-supplied lengths is a common mistake of many parsing libraries.
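A negative test case exercising this first flaw can be built directly from the format description: a type of sixteen followed by a length far larger than the twenty-four byte value buffer. The Python sketch below assumes the type and length fields are 4-byte little-endian integers, which matches how the example parser reads them on an x86 machine but is otherwise an assumption.

import struct

# Build a malformed TLV file that drives the example parser into the type == 16 case
# with a length much larger than the 24-byte stack buffer (little-endian assumed).
def overflow_tlv(path, length=4096):
    with open(path, "wb") as f:
        f.write(struct.pack("<i", 16))        # type field: the blindly trusted case
        f.write(struct.pack("<i", length))    # length field: far larger than value[24]
        f.write(b"A" * length)                # value field: filler that overruns the buffer

overflow_tlv("overflow.tlv")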

Another, more subtle error also resides in the example library. If the type field does not equal sixteen, then the code validates the length field to ensure it is less than the value buffer's size. If the length is not less than twenty-four, then the parser abstains from copying any data from the file into the value buffer and simply continues reading type and length information. Continuing to parse without skipping the oversized value is errant, because the parser then consumes type and length information from what was intended to be a data field. The parser is now misaligned for the remainder of file parsing. These errors are especially easy to identify when source code is available and the code lacks complexity. However, oftentimes source code cannot be consulted. In these cases, the binary must be evaluated by first translating the machine code into readable assembly language. Figure 3.2 presents snippets of disassembly from the compiled library presented in Figure 3.1. The assembly language shown in the figure is taken from the IDA Pro disassembler.

Location                  Disassembled Instruction
========                  ========================
__text:00000E40           _vulnerable_parser proc near
__text:00000E40
__text:00000E40           var_4C = dword ptr -4Ch
__text:00000E40           var_48 = dword ptr -48h
__text:00000E40           var_44 = dword ptr -44h
__text:00000E40           var_40 = dword ptr -40h
__text:00000E40           var_3C = dword ptr -3Ch
__text:00000E40           var_38 = byte ptr -38h
__text:00000E40           var_20 = dword ptr -20h
__text:00000E40           var_1C = dword ptr -1Ch
__text:00000E40           arg_0  = dword ptr  8
__text:00000E40           arg_4  = dword ptr  0Ch
__text:00000E40
__text:00000E40           push    ebp
__text:00000E41           mov     ebp, esp        ; Function Prologue
__text:00000E43           push    edi
__text:00000E44           push    esi
__text:00000E45           push    ebx
__text:00000E46           sub     esp, 92
__text:00000E49           call    $+5             ; PIC
__text:00000E4E           pop     ebx
__text:00000E80           mov     eax, ds:(msg_aOffsetinfile - 0E4Eh)[ebx] ; message
__text:00000E86           mov     [esp+4], eax
__text:00000E8A           mov     [esp], esi
__text:00000E8D           call    _objc_msgSend

Figure 3.2: Assembly Code TLV Parser Example

IDA Pro displays assembly instructions in Intel syntax instead of AT&T syntax. Intel syntax is discernible from AT&T syntax by the absence of percent signs prefixing register names and the usage of square brackets for addressing. Determining assembly language syntax is important, because AT&T syntax displays source operands on the left followed by destination operands. Intel syntax

formats assembly instructions opposite of AT&T syntax (for example, Intel's mov eax, [ebp+8] corresponds to AT&T's movl 8(%ebp), %eax). Incorrectly interpreting assembly formats completely changes an instruction's semantics [Eagle 2008]. As mentioned earlier, IDA Pro implements a variation of the Recursive Traversal disassembly algorithm. While this algorithm outperforms the Linear Sweep algorithm, IDA Pro's implementation suffers from a few peculiarities when disassembling Objective-C Mach-O binaries. IDA Pro, in versions prior to the recently released 6.2, has trouble following Objective-C class method calls, resolving EIP-relative data addressing, and identifying some functions. However, IDA Pro is an interactive disassembler, and a tester can fix these issues by automating or manually altering the generated disassembly. Miller and Zovi [2009] discuss techniques and present scripts to resolve these failures. Most Mac OS X libraries and executables are programmed using Objective-C. This language is an object-oriented programming language relying on message passing to invoke object class methods. Message passing complicates disassembly, because class methods are not directly invoked. Instead, messages are sent to objects with a function from the objc_msgSend() family of functions. The message sent reflects the class method to invoke. Generally, calls to objc_msgSend() are responsible for invoking class methods, and the parameters to this function are the object to receive the message, the message to send to the object, and any parameters required by the object's message-handling function. IDA Pro fails to follow Objective-C message passing because intermediate calls to objc_msgSend() invoke class methods, and the parameters to objc_msgSend() must be examined to determine which class method is invoked [Miller and Zovi 2009]. Figure 3.2 illustrates an example of this message passing convention. At address 0x0E8D in the figure, a call is made to objc_msgSend() with the offsetInFile message. This function call invokes the NSFileHandle offsetInFile method to retrieve the current offset of the file handle. Comparing the disassembly to the source code in Figure 3.1, this portion of disassembly maps to the while loop condition. EIP-relative data addressing is an adaptation of RIP-addressing defined by the x86_64 Application Binary Interface (ABI). These addressing modes access data by offsetting from the current instruction pointer [Matz et al. 2010b]. Addressing in this manner assists in generating position-independent code, because data accesses are no longer reliant on fixed memory addresses. For IDA Pro, relative addressing from the instruction pointer causes the disassembly algorithm to miss jump tables referenced by this addressing. Since the jump tables are not found, portions of code fail to disassemble. These jump tables can be corrected by manual inspection and IDA Pro automation [Miller and Zovi 2009]. Missed functions are another unfortunate occurrence when disassembling binaries. These functions may elude IDA Pro's disassembly algorithm for a variety of reasons. One such instance may be that unrecognized jump tables indirectly access sections of code. If IDA Pro fails

to recognize a jump table branch, then the disassembly algorithm will not descend into the code section to disassemble it. Fortunately, additional functions can be found after IDA Pro disassembles a binary by searching for assembly instructions indicating a function prologue. Instructions saving the stack frame base pointer, ebp, and immediately resetting its value are hallmarks of function prologues [Hotchkies 2008]. The effects of Address Space Layout Randomization (ASLR) also must be addressed while reverse engineering Mach-O binaries. ASLR randomizes the address space layout of a program when it is loaded into memory. This randomization prevents malicious code from returning to known memory locations to leverage existing code or begin executing injected code [Shacham et al. 2004]. On Mac OS X, software upgrades or executing the update_dyld_shared_cache program randomizes dynamic libraries. Files in the /var/db/dyld/ directory detail the randomized offsets. While these libraries remain at fixed offsets until updates occur, the offsets are different from machine to machine. Combating randomization is unnecessary if the goal of reverse engineering is only to understand a library. However, if monitoring code execution on many different machines is desirable, then the random offsets must be eliminated to guarantee that an instruction executed at a particular address is the same across multiple machines. With the disassembler, the best way to minimize the effects of ASLR during attribute extraction is to set the loading segment and offset in IDA Pro to zero. Setting these values to zero offsets all disassembled instructions from address zero. This offsetting now provides a measurable displacement from any base address. Later, Chapter 4 clarifies the importance of using addresses of basic blocks and functions as offsets from zero. Appropriately addressing IDA Pro deficiencies before attribute extraction improves attribute accuracy. Incorrect disassembly misrepresents the executing code. This misrepresentation hinders proper measurement of code execution and calculation of fitness measures. Errors at this stage of fuzzing ripple throughout all subsequent stages.
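Searching for prologues, as suggested above, reduces to scanning the text section for the byte patterns that encode saving and resetting the frame pointer. The Python sketch below looks for the two common x86 encodings of push ebp; mov ebp, esp (55 89 E5 and 55 8B EC) in a raw byte buffer; in practice an equivalent search would be run through IDA Pro's scripting interface and functions would be defined at the hits.

# Scan a raw text-section byte buffer for likely x86 function prologues.
# push ebp is 0x55; mov ebp, esp encodes as 89 E5 or 8B EC depending on the compiler.
PROLOGUES = (b"\x55\x89\xe5", b"\x55\x8b\xec")

def find_prologues(text_bytes, base_address=0):
    hits = []
    for pattern in PROLOGUES:
        start = text_bytes.find(pattern)
        while start != -1:
            hits.append(base_address + start)
            start = text_bytes.find(pattern, start + 1)
    return sorted(hits)

# Example: a small buffer containing a prologue like the one at 0xE40 in Figure 3.2.
print(find_prologues(b"\x90\x90\x55\x89\xe5\x57\x56", base_address=0xE3E))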

3.3 Attribute Extraction

IDA Pro internally represents disassembled programs with several logical structures. At the top of these logical structures is one referred to as a segment. Segments are akin to a segment and section as described during the discussion of the Mach-O file format. This structure contains similar portions of an object file. Examples of segments are text, data, and Block Started by Symbol (BSS). A disassembler, IDA Pro, determines the start and end of a segment by parsing an object file’s header. A person performing analysis of an object file may create segments manually or programmatically as well.
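As an illustration of how these logical structures are reached programmatically, the following IDAPython sketch walks the segments of an open disassembly and the functions IDA Pro found in each. It assumes the pre-7.0 style bindings (idautils, idc) that were current for this work; exact function names differ across IDA versions, so treat this as a sketch rather than a definitive listing.

# IDAPython sketch (run inside IDA Pro; pre-7.0 style bindings assumed).
import idautils
import idc

for seg_start in idautils.Segments():
    seg_end = idc.SegEnd(seg_start)
    print("segment %s: 0x%x - 0x%x" % (idc.SegName(seg_start), seg_start, seg_end))
    for func_start in idautils.Functions(seg_start, seg_end):
        print("  %s 0x%x-0x%x frame=0x%x" % (
            idc.GetFunctionName(func_start),
            func_start,
            idc.FindFuncEnd(func_start),
            idc.GetFrameSize(func_start)))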

Focusing on the text segment of an IDA Pro disassembly, the next and most important logical structure is a function. Functions are a collection of assembly instructions grouped to perform an operation. Functions contain a prologue and an epilogue which create and destroy a stack frame for local variable storage. Parameters to functions may also be stored on the stack prior to calling a function or passed to the function by storing the parameter in a predefined register. Argument passing and stack frame allocation methods depend upon the architecture's ABI, which defines function calling conventions. Different programming languages may also implement different calling conventions. An example of different calling conventions is illustrated by comparing how the x86 and x86_64 ABIs define passing arguments to functions. For the x86 cdecl calling convention, function arguments are passed by pushing them onto the stack. The order they are pushed is from right to left, meaning the first argument to a function is on top of the stack. For x86_64, depending upon the argument type, most arguments are first stored in a register (e.g. rdi, rsi, rcx, r8, r9), and any leftover parameters after register allocation are stored on the stack [Matz et al. 2010b]. Due to these possible differences, when analyzing object files and extracting attributes, architecture ABIs and calling conventions must be consulted to ensure the accuracy of extracted information, and extraction algorithms must be altered to account for these differences. A number of properties are automatically associated with a function by IDA Pro. Two attributes are the function's start and end addresses. These addresses define the location of the function's first instruction and the address of the instruction immediately following the function. IDA Pro also calculates the stack size for a function's local variables and other qualitative attributes like labeling a function as a library function. Library functions are functions identified by IDA Pro based on its Fast Library Identification and Recognition Technology (FLIRT) signatures. These signatures identify code sequences by pattern matching. For example, a FLIRT signature may match code with functionality similar to strcpy() [Eagle 2008]. IDA Pro also logically dissects a function into one or more chunks. A compiler creates a chunk to optimize the location of frequently executed code. Less frequently executed code is moved to memory regions less likely to be swapped into memory while the program is running. Code reorganization guards against frequently executed code being continuously swapped in and out of memory [Eagle 2008]. Only certain compilers optimize code into chunks, and, in most instances, a single chunk constitutes a function. Chunk attributes include a start address, end address, and a list of basic blocks. A basic block is a sequence of assembly instructions with exactly one entry point and one exit point. Execution of the first instruction in a basic block guarantees execution of all subsequent instructions, because the only exit point occurs at the basic block's end. A basic block's exit point transfers control to another basic block. These blocks, like functions, have start and end address

attributes automatically associated with them. A basic block's size, in terms of the number of assembly instructions composing the block, can also be computed through a scripting interface. All of these logical structures and associated attributes are exportable from IDA Pro through either the IDC or IDAPython scripting interface. Both interfaces are well documented, yet the IDAPython interface's extension of the Python scripting language makes it more flexible. Due to this flexibility, many scripts and even a framework have been developed in Python to aid in reverse engineering and debugging binaries with IDAPython. The most popular of these is the PaiMei library written by Pedram Amini. This library contains a module, pida, that iterates over functions, chunks, and basic blocks of a disassembly's text segment. While iterating over these structures, it extracts the automatically associated attributes and calculates additional pertinent attributes. These calculated attributes are information such as the number of arguments to a function, the total size of those arguments, the number of local variables in a function, the size of those local variables, and the total size of a function's stack frame. The pida module then stores this information in a file by pickling the Python objects containing those attributes [Amini 2006]. Pickling or marshaling converts a Python object to a binary format suitable for storage. The binary format easily and seamlessly stores the objects, maintaining their structure for later reuse. However, the format defined by Python's pickling routine is incompatible with other programming languages. It is not possible to pickle a Python object, easily read the pickled format, and decode it into an object represented in another language. As such, leveraging the PaiMei framework to extract disassembly attributes potentially locks development into using Python. This Python dependency is undesirable, because other tools written in other languages may benefit from information extracted from IDA Pro. By formatting extracted information in a language-agnostic markup, the information can then be shared with individuals without access to IDA Pro, and other tools can be developed in a variety of languages. To tackle this issue, alternatives for Python object serialization were researched. One format satisfied the requirement of a language-independent representation with broad adoption. This language, YAML Ain't Markup Language (YAML), is a "human friendly data serialization standard" supported by C, C++, Ruby, Python, Perl, and other languages. YAML is human friendly, because it is human readable and expressive. Formatting disassembly information in YAML facilitates information sharing and development of tools in other languages [Ben-Kiki et al. 2009]. This dissertation altered the pida module of PaiMei to serialize attribute information in YAML. This alteration produced a fork of the original code and a companion Ruby language library capable of parsing serialized pida Python objects into Ruby objects. Figure 3.3 illustrates a snippet of attribute information extracted from the example TLV parsing library and serialized in YAML.

!fuzz.io/Object
disName: /Users/rseagle/Desktop/dissertation/sharedLibrary/tlvparser.dylib
functionList:
- !fuzz.io/Function
  argSize: 8
  chunkList:
  - !fuzz.io/Chunk
    blockList:
    - !fuzz.io/Block
      branchFrom: []
      branchTo:
      - '0xf26'
      - '0xf55'
      endEA: '0xe80'
      numInstructions: 22
  frameSize: 112
  isImport: false
  localVarSize: 104
  name: _vulnerable_parser
  numArgs: 2
  numChunks: 0
  numLocalVars: 8
  startAddress: '0xe40'

Figure 3.3: YAML Export TLV Parser Example

As shown, the serialized information begins by defining a domain, fuzz.io, and a top-level tag, Object. The domain definition prevents collisions with other YAML specifications, and the Object tag encapsulates all other tags. The next line specifies the fully qualified file name of the disassembled library. As will be seen in Chapter 4, the original full path name identifies the library as it is loaded into memory during emulation. This identification helps determine the address of the shared library in memory and neutralizes the effects of ASLR. The next entries in the file encode the lists of functions, chunks, and blocks found during disassembly. These lists correspond to the logical IDA Pro structures discussed previously. Each YAML Function, Chunk, and Block contains attributes profiling its assembly counterpart. For instance, the depicted function, vulnerable_parser(), has a total frame size of 112 bytes and 104 bytes of local variable storage. Function attributes such as these will be included in fitness calculations in later chapters. The task of disassembling and exporting attribute information of an object in YAML is as important as determining the shared libraries linked against an executable. Since this task is important for fuzzer configuration, the fuzzing framework, described in the next chapter, includes a tool, 'thor tool:disassemble', for disassembling and extracting attributes. This tool wraps IDA Pro's command-line interface and runs a modified version of the pida IDAPython script. This modified version is suitable for extracting information from an x86 binary and all information except

attributes about function arguments for x86_64 binaries. For usage information, please consult Appendix C. This chapter introduced concepts pertaining to the evaluation of closed-source software. It reviewed common disassembler algorithms and discussed the preferred Mac OS X shared library format. A fictitious TLV parsing shared library demonstrated the steps required to extract attributes from machine code with IDA Pro and IDAPython. This demonstration explained extensions made to the popular reverse engineering framework PaiMei to support language-agnostic information sharing with YAML. Future chapters rely upon processing the attributes extracted from IDA Pro to direct fuzzing algorithms.

Chapter 4

Mamba Fuzzing Framework

Binary reconnaissance, as described in Chapter 3, is the first step in fuzzing applications with a genetic algorithm. The next step is to leverage data collected during reverse engineering, correlate it with dynamic analysis, and have a genetic algorithm measure performance based on the correlation. To facilitate measuring a negative test case’s performance based on static and dynamic analysis, a fuzzing framework must first fulfill most of the requirements derived in Chapter 2, and additionally support processing information exported during binary analysis along with tracing program execution. An analysis of preexisting fuzzing frameworks determined that the best option for fulfilling these requirements is to develop a new framework. This new framework, Mamba, concentrates on testing the robustness of a Mac OS X application’s file parsing routines. It implements general feedback fuzzer functionality and includes multiple feedback fuzzing algorithms. Mamba is an object oriented framework designed to create a rich environment for application testing. It is written in the Ruby scripting language which promotes rapid development and eventual cross platform support. Yukihiro Matsumoto developed Ruby around 1995 and adoption of the language has significantly increased since its introduction. Ruby is an interpreted language, and its interpreter and core are written in C. All major operating systems, Mac OS X, Windows, and Linux, support this language, and several alternative implementations of the original standard have been developed and adopted. By implementing a framework in a scripting language like Ruby, many low-level implementation errors are nullified and implementation time is decreased. For instance, Ruby mollifies most memory management problems, because it handles freeing memory with its own garbage collection algorithm. Mamba’s architecture conforms to traditional object oriented design principles. The framework implements a base fuzzer class which all other fuzzers inherit and extend. Other common fuzzer requirements, random attack heuristics, execution monitoring, logging, and reporting, are handled

38 by their own classes. The base fuzzer class then includes instances of the other classes to gain their functionality. This architecture resembles the architecture of Sulley, a popular Python fuzzing framework. At first blush, extending Sulley may have proved beneficial. However, at the outset of this research, Sulley lacked adequate support for file fuzzing. Sulley’s support of Mac OS X also languished behind its Windows support. The creation of a new framework provided tangible research in fuzzer design while improving file fuzzing on Mac OS X. The version of Mamba released with this dissertation, version 1.0.0, delivers five fuzzers. These fuzzers are Mangle, Simple Genetic Algorithm (SGA), Byte Genetic Algorithm (BGA), Simple Genetic Algorithm Mangle Mutator (SGA-MM), and Cross-Generational Elitist Selection, Heterogeneous Recombination, and Cataclysmic Mutation Genetic Algorithm (CHC-GA). Each fuzzer is capable of being executed in one of two operational modes, centralized or distributed. This chapter only discusses Mamba’s installation, configuration, centralized operation, and architecture with the Mangle fuzzer. Chapter 5 describes the included genetic algorithm fuzzers, and Chapter 6 depicts Mamba’s distributed model. The source code for Mamba is available at https://github.com/rogwfu/mamba-refactor. It can be installed by cloning the source directory with the git version control tool and then installing the package via Ruby’s package management system, RubyGems. To do so, the user changes into the cloned repository and runs the ‘rake install’ command. During installation, Mamba automatically installs all required Ruby dependencies, downloads mongodb and rabbitmq, and compiles a special version of the Valgrind emulator for execution monitoring as described in Section 4.4∗. Common development tools (gcc, svn, curl, tar, and autotools) are assumed to be installed on the target Mac OS X system. The installation also assumes Erlang is installed (preferably version R14B). Mamba also installs a custom Ruby utility, opener.rb, to support test case delivery. Once installed, a fuzzing environment is created through the mamba command line utility. It has many options configurable with command line arguments specified by flags. These include setting the application to fuzz (-a), the fuzzer type (-y), and the test case timeout (-t), among other parameters. Running ‘mamba help create’ displays a full listing of available options. Additionally, Figure C.4, in the command reference appendix, shows the currently supported options. For now, the fastest route to starting a fuzzing environment is running ‘mamba create <name>’, where <name> is the directory to create for the fuzzing environment. Once run, the framework creates that directory and populates it with seven subdirectories (configs, databases, disassemblies, logs, queues, tasks, tests). Each directory maps to a fuzzer functionality. The configs

∗A dependency exists on another Ruby package developed during this research, Plympton, which is available at https://github.com/rogwfu/plympton-refactor and is installed in the same manner as Mamba. This package must be installed prior to installing Mamba.

39 directory holds global and fuzzer-specific YAML configuration files. The databases and queues directories contain a database and messaging queues for distributed fuzzing as described in Chapter 6. YAML files generated by the IDA Pro attribute extraction defined in Chapter 3 should be stored in the disassemblies directory. Logs of fuzzer runs are found in the logs directory, and generated negative test cases are stored in the tests directory. The tasks directory is a special directory. Mamba pre-populates this directory with three files containing Ruby code. Each file, distrib.thor, fuzz.thor, and tools.thor, defines multiple functions for interacting with the fuzzing environment via the command line. Functions exist to start and stop a fuzzing job or start and stop the components of the distributed environment. As mentioned in Chapter 3, tools also exist to help with shared library selection and disassembly. Mamba includes these functions to provide the tester with a complete fuzzing environment. Each function is written with the help of the Ruby package Thor. Thor implements an interface for scriptable tasks similar to the make command from the GNU build system. To retrieve a list of valid tasks for the environment, run ‘thor -T’ in the directory above the seven newly created directories. To execute any given task, run ‘thor <task>’. For more information about a task or to list a task’s options, run the command ‘thor help <task>’. As shown in Figure C.1, tasks are grouped in namespaces based on similar functionality. The most important tasks for starting and stopping fuzzing are fuzz:start and fuzz:stop. When started, Mamba creates a file, MambaFuzzingFramework, in the top-level directory. The process id of the currently running fuzzer is stored within this file. The fuzz:stop thor task parses this file to determine the correct Ruby process to kill. Now that a basic overview of installation and usage has been conveyed, the remaining sections of this chapter depict the configuration of an instance of the Mangle fuzzer and walk through Mamba’s architectural design.
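As a brief illustration of how such task files are written, the following sketch shows a fuzz namespace in the Thor style; the task bodies, and the choice of signal in fuzz:stop, are simplified assumptions rather than Mamba’s actual implementation.

    require 'thor'

    class Fuzz < Thor
      namespace :fuzz

      desc "start", "Start a fuzzing job using the configuration in configs/"
      def start
        # Record the fuzzer's process id, mirroring the MambaFuzzingFramework file.
        File.open("MambaFuzzingFramework", "w") { |file| file.puts(Process.pid) }
        puts "Fuzzer started with process id #{Process.pid}"
      end

      desc "stop", "Stop the currently running fuzzing job"
      def stop
        pid = File.read("MambaFuzzingFramework").to_i
        Process.kill("INT", pid)   # signal choice is illustrative
      end
    end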

4.1 Configuration

Once a fuzzing environment is created with the mamba command line utility, the next step in the fuzzing process, prior to starting the fuzzer, is to configure it. During the initial creation of a fuzzing environment, Mamba creates a configs directory where two configuration files in YAML format reside. The first file, fuzzer.yml, configures global fuzzer parameters. This file is present for all fuzzers regardless of type. Figure 4.1 illustrates an example of a global configuration file. YAML format stores these configurable parameters in key-value pairs. Fields are separated by a colon (:) with the key on the left-hand side of the colon and the value on the right-hand side. Mamba parses this file during startup and stores the key-value pairs in a hash. The contents of this

40 hash are initially used by Mamba to set up an instance of a fuzzer. The six parameters shown in Figure 4.1 are all standard parameters applicable to all fuzzers.

---
:app: /Applications/Preview.app/Contents/MacOS/Preview
:executor: appscript
:os: darwin10.7.0
:timeout: 5
:type: Mangle
:uuid: fz70f8eba2e26611e0b2720026b0fffee7

Figure 4.1: Mamba Global Configuration File

The first parameter, app, configures the application to test. In this example, the fuzzer is testing the Preview application, which is a PDF viewer. The value of this parameter must include the full path to the executable. The next parameter, executor, defines the delivery mechanism for negative test cases. A delivery mechanism determines how a test case is sent to a program under test for processing. Currently, two different mechanisms are supported, and these are Applescript and the Command-line Interface (CLI). Section 4.3 provides additional details on executors and test case delivery mechanisms. Directly following the executor parameter is the os parameter. This parameter is strictly for informational purposes. The example shows a value of darwin10.7.0, which corresponds to the Mac OS X 10.6 kernel version. After the os parameter, the timeout parameter determines the amount of time, in seconds, an executor should wait before terminating an individual negative test case run. A timeout value of five, as shown in Figure 4.1, runs a test case for five seconds before terminating the application. The next parameter, type, assigns a particular fuzzer type to use during testing. This configuration file sets the fuzzer to be Mangle. If a fuzzer type is not specified when using the mamba command line utility during initial creation, then Mamba defaults to the centralized Mangle fuzzer. When the fuzzer is started with the fuzz:start thor task, Mamba parses the type field and checks if the currently installed framework supports this type. To check the type, Mamba queries the framework with the const_defined() Ruby Module method. If the query succeeds, then the framework allocates and initializes a new instance of the fuzzer type by acquiring a reference to its class with the Ruby Module function const_get(). The newly allocated type-specific fuzzer then calls its fuzz method to begin testing. Metaprogramming, in this manner, defines a design pattern for development of additional fuzzers. When a new fuzzer is added to the framework, the only requirements are for it to inherit from the base fuzzer class, define and parse type-specific YAML configuration, and define a fuzz method for the fuzz:start task to call.
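A condensed sketch of this dispatch pattern follows; const_defined() and const_get() are the Module methods named above, while the module name, stand-in fuzzer class, and configuration hash are illustrative assumptions rather than Mamba’s exact code.

    # Stand-in module and fuzzer class so the sketch runs on its own; in the
    # framework these would be Mamba and its installed fuzzer classes.
    module Mamba
      class Mangle
        def initialize(config)
          @config = config
        end

        def fuzz
          puts "fuzzing #{@config[:app]}"
        end
      end
    end

    # Values normally parsed from configs/fuzzer.yml (Figure 4.1).
    config = { :type => "Mangle",
               :app  => "/Applications/Preview.app/Contents/MacOS/Preview" }

    if Mamba.const_defined?(config[:type])
      fuzzer = Mamba.const_get(config[:type]).new(config)   # acquire the class and instantiate it
      fuzzer.fuzz                                           # every fuzzer type defines fuzz()
    else
      abort("Unsupported fuzzer type: #{config[:type]}")
    end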

41 The last parameter in the fuzzer.yml file is the Universally Unique Identifier (UUID). For each fuzzer created, a UUID is generated by querying the operating system for the current time. This time is then used to seed a secure random number generator. The value returned by the secure random number generator is then prepended with the characters ‘fz’ to yield the final UUID. While it is possible for two fuzzer instances to be created at the same time on different machines, the UUID is only meant to be locally significant when running a fuzzer in distributed mode. Chapter 6 provides more details on the usage of the UUID. The second file within the configs directory designates fuzzer-specific parameters. The name of this file and the key-value pairs within it change depending on the fuzzer type configured in the global configuration file. The file’s name equals the fuzzer type name appended with ‘.yml’. Figure 4.2 displays an example configuration file, Mangle.yml, for the centralized Mangle fuzzer.

---
Number of Testcases: 1024
Header Size: 1024
Basefile: tests/test2
Default Offset: 0

Figure 4.2: Mangle Fuzzer Configuration File

The first parameter in this file sets the number of test cases to generate and run. In this case, 1024 files will be tested with the Preview application. The next parameter, Header Size, configures Mangle to read and replace 1024 bytes of the base test case file for each generated negative test case. The Basefile parameter configures the base test case file, here the file test2 in the tests directory. The last parameter, Default Offset, extends the original Mangle algorithm, as presented in Chapter 2, by allowing Mangle to skip to a file offset before reading and altering the number of bytes dictated by the other parameters. In Figure 4.2, the offset is set to the start of the file (0), but the offset value can be any offset in the base file as long as the offset plus the header size is less than the total base file’s size. Classes implementing new fuzzers handle generation and parsing of these fuzzer-specific configuration files. Offloading generation and parsing of these files to their own classes decouples base fuzzer functionality from specific fuzzer functionality. An algorithm can now be independently developed and configured as deemed appropriate by the algorithm’s author without interfering with global configuration parameters. This flexibility is possible by mandating each fuzzer define its own generate_config() class method for the creation of a configuration file. These type-specific files are then read by the fuzzer’s type-specific class during initialization.
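The following sketch shows what such a class might look like for the Mangle example; the class name mirrors the fuzzer type, and the defaults echo Figure 4.2, but the code is an illustration of the pattern rather than Mamba’s implementation.

    require 'yaml'

    class MangleFuzzer
      CONFIG_FILE = 'configs/Mangle.yml'

      # Each fuzzer supplies a class method that creates its type-specific
      # configuration file with default values.
      def self.generate_config
        defaults = {
          'Number of Testcases' => 1024,
          'Header Size'         => 1024,
          'Basefile'            => 'tests/test2',
          'Default Offset'      => 0
        }
        File.open(CONFIG_FILE, 'w') { |file| file.write(defaults.to_yaml) }
      end

      # The type-specific configuration is read back during initialization.
      def initialize
        config = YAML.load_file(CONFIG_FILE)
        @number_of_testcases = config['Number of Testcases']
        @header_size         = config['Header Size']
        @base_file           = config['Basefile']
        @default_offset      = config['Default Offset']
      end
    end

With this arrangement, the base fuzzer never needs to know which keys a particular algorithm requires; it simply asks the type-specific class to generate and later parse its own file.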

42 Now that the basics of configuration have been stated, the remainder of this chapter articulates the functionality implemented by Mamba. As a reference, Figure 2.1 in Chapter 2 depicts a suggested fuzzer classification scheme. Mamba’s software architecture adheres to this classification scheme by creating classes corresponding to each required component. These classes provide functionality for attack heuristics, an executor, a monitor, a logger, and a reporter. Depending upon the fuzzing algorithm implemented in the framework, additional components, such as a search algorithm, are also included.
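A skeletal view of that class structure, with placeholder classes standing in for Mamba’s actual components, is sketched below.

    # One class per component of the classification scheme; the base fuzzer
    # composes instances of them, and specific fuzzers inherit the result.
    # All names here are placeholders, not Mamba's actual class names.
    class AttackHeuristics; end
    class Executor;         end
    class Monitor;          end
    class Logger;           end
    class Reporter;         end

    class Fuzzer
      def initialize
        @heuristics = AttackHeuristics.new
        @executor   = Executor.new
        @monitor    = Monitor.new
        @logger     = Logger.new
        @reporter   = Reporter.new
      end
    end

    class MangleFuzzer < Fuzzer
      def fuzz
        # generate, deliver, and monitor negative test cases here
      end
    end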

4.2 Attack Heuristics

Attack heuristics are the set of malicious inputs to substitute into a test case before executing the program under test with a negative test case. These heuristics can be simple, such as adding a variable number of random bytes, or more precise, such as inserting known troublesome values. In either case, a model of a test case is not required, but the latter method benefits from data modeling. Since this version of Mamba does not support data modeling, the attack heuristics are simplified to randomly generated data. These random values are created by the RFuzz RandomGenerator library by Zed A. Shaw [Shaw 2006]. This library is capable of producing arbitrary-length random sequences of bytes, floats, ints, and base64 encoded data. For random generation, it implements the ArcFour algorithm documented by Bruce Schneier [Schneier 1995] and also defined in a Request For Comments (RFC) draft [Kaukonen and Thayer 1999]. Once the ArcFour algorithm creates the random data, the RandomGenerator Ruby class, depending upon the requested type, either returns the data without modification (bytes) or converts it to an appropriate type (int, float). The Simple Genetic Algorithm fuzzer, centralized and distributed, inserts random bytes during mutation with this method. Conversely, the Mangle algorithm, centralized and distributed, follows a predefined algorithm for attack heuristic generation as defined in Chapter 2 and forgoes requesting attack heuristics from the RFuzz library. This simplified approach to attack heuristics is a calculated decision. By simplifying attack heuristics, the viability of a genetic algorithm fuzzer can be measured independently of attack heuristic quality. If a genetic algorithm fuzzer is capable of finding defects with randomly generated attack heuristics, then targeted attack heuristics, known troublesome values, should yield additional defects. Since one goal of this research is to improve the metrics available for genetic algorithm fitness functions, improvements in attack heuristics are left as an additional research topic.
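For concreteness, the following is a minimal ArcFour-style keystream generator in Ruby; it only illustrates the kind of byte generation described above and is not RFuzz’s RandomGenerator implementation or its API.

    # Minimal ArcFour (RC4) keystream generator used here purely as an
    # illustration of random byte generation for attack heuristics.
    class ArcFour
      def initialize(key)
        @s = (0..255).to_a
        j = 0
        256.times do |i|
          j = (j + @s[i] + key.bytes[i % key.bytesize]) % 256
          @s[i], @s[j] = @s[j], @s[i]
        end
        @i = @j = 0
      end

      # Return n pseudo-random bytes from the keystream.
      def bytes(n)
        out = []
        n.times do
          @i = (@i + 1) % 256
          @j = (@j + @s[@i]) % 256
          @s[@i], @s[@j] = @s[@j], @s[@i]
          out << @s[(@s[@i] + @s[@j]) % 256]
        end
        out.pack("C*")
      end
    end

    rng = ArcFour.new("seed value")
    random_bytes = rng.bytes(16)   # e.g. substituted into a test case during mutation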

43 4.3 Executor

Executors launch a program under test and define a mechanism for delivery of negative test cases to the running program. The choice of how to run a program and how to deliver a test case is largely fuzzer and application dependent. Some fuzzers require gathering detailed traces of program execution and others only spawn a program. Some applications can process files when specified through the command line and others do not have this capability. Fortunately, Mamba implements two different program execution methods and two different test case delivery mechanisms to accommodate this variability in applications. These execution methods and delivery mechanisms are completely independent of one another, in that an execution method can be combined with either delivery mechanism. The first implemented program execution method is process spawning. With this method, Mamba forks a process from the currently running fuzzer and replaces it with the program under test. The replacement happens through the Ruby standard library’s Process.spawn() method. Once replaced, the framework delivers the negative test case to the program under test and monitors execution of the forked process to determine when to kill it. The timeout parameter, found in the global configuration file (fuzzer.yml), prescribes the amount of time to wait before killing the process. After it is killed, the framework proceeds to the next generated test case, and the same procedure continues until all negative test cases have been tested. Process spawning provides no additional information about functions or basic blocks executed during processing of a test case. It resembles a black box stimulus and response system. The only available information is whether the program under test exits normally or due to an exception. Exiting due to an exception would be a positive response since a fuzzer’s goal is to locate software defects. Process spawning is only suitable for fuzzers without a requirement for dynamic analysis metrics to generate their negative test cases. These fuzzers are typically of the mutation or generation classification. Specifically, within the Mamba framework, the Mangle fuzzer demonstrates this execution method. The other execution method implemented for Mamba is emulation with Valgrind. Valgrind, a dynamic binary instrumentation framework, supports dynamic binary analysis through an innovative plug-in tool architecture. It starts by loading a program specified by a user, translating the program’s instructions into an Intermediate Representation (IR), permitting a plug-in tool to examine and alter the IR, and then translating the resulting IR back into machine code. This translation occurs continuously as basic blocks of a program are encountered. It is similar to how a just-in-time (JIT) compiler interprets and compiles code as it is encountered [Nethercote 2004]. Translating machine

44 code to an IR, instrumenting the IR, and translating it back to machine code before executing the code is known as a disassemble-and-resynthesize process [Nethercote and Seward 2007]. Valgrind’s IR is a Reduced Instruction Set Computer (RISC)-like language named VEX. The emulator supports conversion of multiple architectures like x86, ppc32, ppc64, and x86_64 into this IR. However, for these architectures, only certain platforms are supported. These platforms are Mac OS X and Linux. Other platforms, namely Windows, are currently unsupported, although experimental support is available through the Cygwin project. Other DBI frameworks concentrate on supporting Windows and Linux while ignoring direct support for Mac OS X. The Pin instrumentation tool only supports x86 Mac OS X applications, and the last software release happened in 2007. Since some applications on Mac OS X 10.6, released in 2009, are x86_64 executables, using the Pin instrumentation tool reduces the set of programs to test on the current version of Mac OS X. The other popular instrumentation tool, DynamoRIO, has not been ported to Mac OS X. Based on this information about Pin and DynamoRIO, Valgrind is the only DBI framework currently suitable for inclusion into Mamba. To emulate a program, Valgrind translates its version of a basic block between VEX and machine code. For Valgrind, a basic block is defined as a sequence of assembly instructions ending with a control flow altering instruction (e.g. jmp or call) [Nethercote 2004]. This termination of a basic block aligns with IDA Pro’s definition of a basic block as containing only one exit point, which is a control flow altering instruction. The start of a Valgrind basic block is also comparable to IDA Pro’s definition. Valgrind basic blocks start with an assembly instruction that is the target of a control transfer instruction. IDA Pro similarly marks the start of a basic block at the target of a control flow altering instruction, such as a branch [Eagle 2008]. However, in some instances, these definitions of basic blocks diverge, making correlation of start and end addresses between Valgrind and IDA Pro infeasible. An example of this divergence happens when Valgrind follows an indirect jump through a stub code section. In many cases, IDA Pro attribute extraction does not export information about indirect jumps, especially through different sections, but Valgrind correctly records control flow changes throughout these sections. An example would be an indirect call through the symbol stub section of a Mach-O binary. Another example of divergent basic block definitions occurs due to Valgrind’s just-in-time compilation. Valgrind’s basic block definition permits code to jump into the middle of a basic block. This behavior is permissible because Valgrind can only calculate the start of a basic block by observing control flow diverting to another set of sequential instructions, which may contain a target of another branch or jump elsewhere in the program. While these differences in accounting decrease the fidelity of static and dynamic analysis address resolution, the addresses of function prologues can always be matched. Since most metrics

45 are associated with executing a function and not individual blocks, recording hit tracing information with Valgrind for later comparison with IDA Pro disassembly is possible. Translation of machine code in Valgrind occurs in an eight-phase process. The first phase disassembles machine code into an unoptimized form of VEX IR. Figure 4.3 displays an example translation of an x86 instruction to VEX IR. Note that, unlike IDA Pro, Valgrind’s debugging output for basic blocks uses AT&T syntax. The figure shows a movl x86 instruction setting the ebp register to the value stored in the esp register, an instruction commonly found in a function prologue. The corresponding VEX translation of this instruction is three statements: an IMark, PUT, and GET. The IMark statement is purely informational. It specifies the address at which the x86 instruction is found and its length in bytes. The GET and PUT VEX statements read and store data from and to the guest state, which is a set of registers stored in memory to emulate program execution. The parameters in these statements are the byte offsets of the registers in the guest state. For this research, only addresses of executed blocks are of concern since the Mamba framework cross-references these addresses with addresses exported from IDA Pro. A more thorough treatment of the VEX instruction set is available in the comments of the libvex_ir.h file found within Valgrind’s source code.

0x8FE01032:  movl %esp,%ebp

------ IMark(0x8FE01032, 2) ------
PUT(60) = 0x8FE01032:I32
PUT(20) = GET:I32(16)

Figure 4.3: VEX IR Example

Valgrind’s next phase optimizes the translation before passing the IR to the third phase for instrumentation. During the instrumentation phase, a tool can alter the IR in any way before relinquishing control. For the Mamba fuzzing framework, since tracing the addresses of executed blocks is of concern, a new instrumentation tool, named Rufus, was developed. This tool passively monitors and records program execution in XML format so that Mamba can parse the trace after execution ceases. Figure 4.4 illustrates an example trace recorded by Rufus of an application using the vulnerable TLV parsing library. Notice that the address of the vulnerable_parser() function, 0xe40, matches the start address exported from IDA Pro (Figure 3.3). To make these addresses match, the Rufus Valgrind tool must normalize recorded addresses by subtracting the base address of the shared library. This base address is the address in memory where the start of the library is loaded. Normalization is required partially because of Mac OS X address space layout randomization and partially due to Valgrind’s control of these addresses.

46
RUNNING
vulnerable_parser                               0xe40
vulnerable_parser+336                           0xf90
stub helpers+12                                 0xfa2
markov: -[NSConcreteFileHandle offsetInFile]    0x6543ba

Figure 4.4: Vulnerable Library Trace

When a program is loaded for emulation, Valgrind sets the DYLD_SHARED_REGION environment variable to the keyword avoid. By setting this environment variable, it directs the Mac OS X loader, dyld, to ignore the shared cache files when determining the address at which to load a shared library in memory. This effectively nullifies address space layout randomization. However, Valgrind may still load a shared library at different memory locations on different machines. Since IDA Pro exported addresses are based on a displacement from zero, the Rufus Valgrind tool queries the Valgrind core for a shared library’s start address with the VG_(DebugInfo_get_text_avma)() function. The returned address is subtracted from all recorded addresses. Normalization guarantees address resolution independent of Valgrind runs and even independent of the machines running the emulation. The Rufus Valgrind tool also minimizes which addresses are recorded during execution by only monitoring one shared library. The library to monitor is set by the --object command line option when starting Valgrind with --tool=rufus. When setting this option, it is important to specify the full path of the shared library since Rufus performs a string comparison of this value against the full path used by the linker. The Mamba framework hides setting this option and determining full path information from the tester. The shared library is parsed from the disName field of the IDA Pro exported information (Figure 3.3). After a Valgrind tool has instrumented the IR, control passes to phase four. This phase optimizes the freshly altered IR by removing dead code. Optimized IR then passes to phase five where it is converted into a tree structure. The tree structure passes to phase six where the statements are converted to assembly instructions using virtual registers. Phase seven allocates actual registers based on the instructions by using a linear-scan register allocator. The last phase then converts the assembly instructions into machine code and executes it [Nethercote and Seward 2007]. Utilizing Valgrind for dynamic analysis, in this case hit tracing, is expensive in terms of execution time. Research has shown a slowdown of as little as 4 times normal execution time to 22 times normal

47 execution time depending upon the Valgrind instrumentation tool. The decision to use emulation, in spite of this slowdown, is based on inconsistent support of other hit tracing tools. Most other methods use low level system calls to create automated debugger interfaces. Two notable examples of this technique are the Ruby Ragweed module and PaiMei’s Pydbg. Each of these requires setting breakpoints for all basic blocks before executing an application. When an application hits one of the breakpoints, then a script can record it. However, relying on wrapping a debugger complicates hit tracing since address space layout randomization changes shared library starting addresses. This situation can be complicated even more if symbol tables are removed from an executable or shared library. On Mac OS X, the shared cache files can be parsed to determine randomized addresses, but initial testing with Ragweed setting over a thousand breakpoints caused the Ruby interpreter to stall. By emulating a program, polling for the shared library start address, and subtracting this address from each recorded address to create a displacement from zero, Mamba can scale in a distributed environment as shown in Chapter 6 to combat the increase in execution time due to emulation. To complement process spawning and emulation, Mamba provides two negative test case delivery mechanisms. The first is through Mac OS X’s Applescript language. Applescript is a scripting language that allows the control of common application tasks. Applescript, like Ruby, is object-oriented, but scripts are compiled before execution instead of interpreted. The Automator utility installed with the Mac OS X operating system presents a graphical user interface that abstracts programming with Applescript so users can write scripts without learning the language. Fortunately, a Ruby library, appscript, exists to bridge Ruby and Applescript. This bridge allows a Ruby script to send commands directly to a running application to perform tasks such as opening a file. The opener.rb script, installed with the Mamba framework, uses this library for that purpose. When the executor in the global configuration file, fuzzer.yml, is set to the value appscript, then, when a test case is generated by a fuzzer, Mamba calls opener.rb with command line arguments specifying the process id of the program to test and the file for the program to open. The opener.rb script uses Applescript to obtain a reference to the program, and then calls the open() function with that reference and the test case filename. This function call causes a program scriptable by Applescript to process a negative test case. Not all programs are scriptable through Applescript. The other delivery mechanism handles this deficiency by passing files as command line arguments. This functionality should be useful on other platforms like Linux where Applescript does not exist. To use the command line delivery mechanism, the executor must be set to ‘cli#’. After the hash, any command line option can be set. For example, a valid executor configuration could be ‘cli#-l -z -f’. When Mamba delivers a negative test case to the program under test, it will remove the ‘cli#’, spawn a process with the options -l -z

48 -f, and append the test case filename to the end of the parameter string. This feature is currently experimental, and most testing to date has focused on test case delivery by Applescript.
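The steps described for the ‘cli#’ executor can be sketched as follows; the application path, options, and file name are placeholder values, and Mamba’s executor class performs the equivalent steps internally.

    # Assumed example values.
    app       = "/usr/bin/someparser"
    executor  = "cli#-l -z -f"
    test_case = "tests/1.tst"

    options = executor.sub(/\Acli#/, "")                    # remove the 'cli#' prefix
    pid = Process.spawn("#{app} #{options} #{test_case}")   # test case filename appended last
    Process.wait(pid)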

4.4 Monitor

Program monitoring observes the execution of a program under test to evaluate the results of running a negative test case. Traditional methods for monitoring include tracing the program under test with a debugger or scriptable debugger interface, or monitoring all the signals bound for the program being tested. Since Mamba implements two disparate execution methods, process spawning and emulation, a different methodology for program monitoring is required. For this, Mamba monitors filesystem activity of directories where crash log information is stored. For Mac OS X, when a program crashes, the operating system creates a file in the directory ~/Library/Logs/DiagnosticReports and symlinks another file in ~/Library/Logs/CrashReporter to the file created in the DiagnosticReports directory. This newly created file provides a traceback of the crash, the state of all the crashed process’s threads, and the state of the CPU registers at the time of the crash. By monitoring either of these directories for the creation of a new file, the framework can determine if a test case causes a crash. A Ruby library, Directory Watcher, can monitor file system activity on most operating systems supporting Ruby. The usage of this library alleviates the need for platform-specific filesystem monitoring libraries like Linux’s inotify or Mac OS X’s FSEvents API. However, a crash occurring during emulation does not cause the operating system to create a crash log. Instead, the emulator, Valgrind, handles the fault, prints descriptive information about the crash, and exits emulation. Only crashes of the emulator itself cause Mac OS X to create a crash log. Emulator crashes typically correspond to an unsupported assembly instruction for Valgrind, and, in rarer situations, an actual error due to a defect in the emulated program. In order to unify monitoring for both executors, the Rufus Valgrind tool tracks faults by registering a fault catcher with the Valgrind core. When Valgrind encounters a fault, it calls the registered fault catcher, track_faults(), which in turn creates a file in the directory being watched by Mamba. This function then populates the file storing trace information with a stack trace of the currently executing thread. Figure 4.5 displays a snippet from an example stack trace collected during a test of an executable using the vulnerable TLV parsing library. The stack trace is printed in XML with the tag fault denoting the start of a detected crash. The fault tag encloses a signal tag with the signal sent to the emulated program and multiple frame tags collected from the Valgrind core. Each frame tag contains an ip tag specifying an instruction address, an obj tag denoting the shared library where the address resides, and an fn tag with the

49 name of the function containing the assembly instruction at the ip address. The first frame in the fault is the location of the uncovered defect. In Figure 4.5, a defect has occurred with the instruction at address 0xFFFF088C. Since Valgrind is unable to resolve the function name for this address, the fault is more likely in the function before this instruction. Examining the stack trace, the function before this one is the vulnerable_parser() function in the TLV parser shared library, which contains a stack buffer overflow.

<fault>
  <signal>11</signal>
  <frame>
    <ip>0xFFFF088C</ip>
  </frame>
  <frame>
    <ip>0x12F54</ip>
    <obj>/Users/rseagle/Desktop/dissertation/sharedLibrary/tlvparser.dylib</obj>
    <fn>vulnerable_parser</fn>
  </frame>
</fault>

Figure 4.5: Rufus Fault Detection
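Returning to the crash log directories mentioned at the start of this section, the monitoring amounts to watching for newly created files. A bare-bones polling sketch is shown below; Mamba itself relies on the Directory Watcher library rather than a loop like this, and the interval and iteration count are illustrative.

    # Watch the Mac OS X crash log directory for files that appear while a
    # negative test case executes.
    crash_dir = File.expand_path("~/Library/Logs/DiagnosticReports")
    known     = Dir.glob("#{crash_dir}/*")

    5.times do
      sleep(5)                                       # periodic polling interval
      new_logs = Dir.glob("#{crash_dir}/*") - known
      unless new_logs.empty?
        puts "Possible crash detected: #{new_logs.join(', ')}"
        break
      end
    end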

Normally, a monitor only watches for errors resulting from executing a negative test case. However, with file format fuzzing, a program may execute a test case forever without exiting. To handle this situation, two different techniques can be implemented. The first technique is to time program execution and kill a process after a predetermined threshold. This technique is implemented by Mamba and illustrated in Figure 4.1 where the timeout value is set to five. Mamba uses this timeout value to determine when to kill a program. While a predetermined timeout value guarantees uniform execution time of test cases, this restriction can cause a fuzzer to miss defects. A program may not process all test cases equally and additional parsing may yield an error. Due to this parsing inequality, a second technique based on CPU monitoring is required. Mamba permits the timeout value in the general fuzzer configuration file to be set to zero. When it is set to zero, the framework monitors CPU utilization of the process executing a test case. When utilization drops below five percent or a test case runs for more than sixty seconds, then Mamba kills the process. An upper bound of sixty seconds guarantees a test case does not run forever, but it still permits an application to parse a test case for an extended period of time. It is recommended to always use CPU monitoring when testing applications, unless testing must be done within a certain time period. Implementing multiple executors and monitors can lead to unmanageable code. Allowing each executor to be mixed with each monitoring technique can cause spaghetti code with multiple conditionals checking global configuration parameters during the execution of each test case. These

50 conditionals can significantly impact runtime performance and code structure. Mamba addresses this issue by employing metaprogramming and dynamically writing executor methods when a fuzzer starts. To achieve this flexibility, Mamba checks the global executor parameter on startup to deduce how test cases will be delivered to a program under test. Based on this value, the executor class dynamically defines a run() and valgrind() method for test case delivery via CLI or Applescript with the define_method() Ruby Module function. The run() method handles process spawning and the valgrind() method handles emulation. Within these dynamically defined methods, a monitoring function is called after the program under test is spawned. Monitoring function selection occurs based on the value of the timeout global configuration parameter. If the timeout value is set to zero, then the executor calls the cpu_scale() method to monitor process utilization. Otherwise, the executor calls cpu_sleep(), which sleeps for the time period specified by the timeout parameter. The call to either method is dynamically defined in the run() and valgrind() methods during fuzzer initialization. By using metaprogramming to dynamically define these methods, Mamba factors out conditionals and simplifies overall code structure.
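A condensed sketch of this pattern follows. The method names run, cpu_scale, and cpu_sleep come from the text above; the bodies, the class name, and the use of ps for sampling CPU utilization are illustrative assumptions rather than Mamba’s exact implementation.

    class Executor
      def initialize(timeout)
        # Decide once, at startup, how a test case run will be monitored.
        monitor = timeout.zero? ? :cpu_scale : :cpu_sleep

        # Dynamically define run() so no conditionals are evaluated per test case.
        self.class.send(:define_method, :run) do |app, test_case|
          pid = Process.spawn(app, test_case)   # process spawning execution method
          send(monitor, pid, timeout)           # monitoring method chosen above
        end
      end

      # Fixed timeout: sleep, then terminate the program under test.
      def cpu_sleep(pid, timeout)
        sleep(timeout)
        terminate(pid)
      end

      # CPU-based monitoring: stop when utilization falls below five percent
      # or after an upper bound of sixty seconds.
      def cpu_scale(pid, _timeout)
        start = Time.now
        loop do
          usage = `ps -o %cpu= -p #{pid}`.to_f
          break if usage < 5.0 || (Time.now - start) > 60
          sleep(1)
        end
        terminate(pid)
      end

      def terminate(pid)
        Process.kill("KILL", pid)
        Process.wait(pid)
      rescue Errno::ESRCH
        # the program under test already exited, possibly because of a crash
      end
    end

With this structure, Executor.new(5).run('/usr/bin/someparser', 'tests/1.tst') runs a test case under a five second timeout, while a timeout of zero switches to CPU monitoring without any per-test-case branching.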

4.5 Logger

Logging is arguably one of the most important components of a fuzzing framework. A logger records the progress of a fuzzer, details surrounding negative test case generation, and when a program executes a test. Loggers can even record detailed information about attack heuristics inserted into a test case. The more detailed information a fuzzer logs during testing, the easier it becomes to reconstruct a testing scenario. For Mamba, the logging requirement is satisfied by the usage of the Ruby Log4r library. It is an open source logging library providing different log levels (DEBUG, INFO, WARN, ERROR, FATAL), various output schemes (file, stdout, email, etc.), and remote logging capabilities. The Mamba framework uses Log4r to record information about fuzzer runs to files in the logs directory of a fuzzing environment. Files under this directory are named ‘type.YYYY-MM-DD-HH-MM-SS.log’. The type corresponds to the type in the global configuration file, and the date and time reflect when the fuzzer started. Each time a fuzzer is started, a new log file is created to hold only information about that run. Mamba permits fuzzers to use this logging mechanism in any manner suitable for their algorithms. During initial setup in the base fuzzer class, before yielding execution to fuzzer-specific code, the framework initializes an instance of the logging class and creates a logging file. This instance is

51 inherited by all fuzzer-specific code using the base fuzzer class as its parent. Any fuzzer can then log messages to the log file by using Log4r’s API. When logging information about fuzzing, the minimum recorded should be the current fuzzer configuration, the time when a program runs a test, and statistics about the job after it concludes. These concluding statistics should present a summary of the number of crashes found, the test cases causing crashes, the total elapsed time for the job, and the total number of tests run. Other information about a particular algorithm is also helpful for debugging. All of these messages must be prepended with a date and time, which is automatically handled by Log4r. By systematically recording detailed information about fuzzing runs, the accuracy and speed of crash validation and analysis can be improved.
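A minimal sketch of this setup follows; the logger name and messages are illustrative, while the filename pattern mirrors the ‘type.YYYY-MM-DD-HH-MM-SS.log’ convention described above and assumes the logs directory already exists.

    require 'log4r'

    logfile = "logs/Mangle.#{Time.now.strftime('%Y-%m-%d-%H-%M-%S')}.log"
    logger  = Log4r::Logger.new('MambaFuzzer')
    logger.outputters << Log4r::FileOutputter.new('fuzzer_log', :filename => logfile)

    logger.info("Fuzzer configuration loaded")
    logger.info("Running test case tests/1.tst")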

4.6 Reporter

The distinction between a logger and a reporter is subtle. A logger continuously records information about a fuzzing run as it is occurring, while a reporter polls the environment on demand for summary statistics. The polling presents an immediate update on interesting statistics like the crashes found to date, elapsed run time, or the number of tests executed up to this point in time. Reports provide snapshots of the current fuzzer status. Mamba implements a reporter by defining a thor task. This task, ‘thor fuzz:report’, sends a SIGINFO signal to the currently running instance of Mamba. When received, Mamba’s reporter class catches the signal and prints summary statistics about the currently running fuzzer. These statistics are the job start time, elapsed run time, number of test cases run, number of crashes found, and the files that likely caused the crashes. All of this information is written to a file in the fuzzer environment’s logs directory, ‘Reporter-YYYY-MM-DD-HH-MM-SS.log’. The date and time within the filename reflect the fuzzer’s start time. It is important to note that test cases listed in the report as causing a crash are only suspects, because the crash monitoring polls the filesystem at five second intervals. It is possible for a test case to cause a crash and another test case to begin executing before the monitor polls the filesystem. However, this can be resolved by examining the timestamps in the logs and cross-referencing timestamps on core files. This chapter discussed the theory and implementation behind the Mamba fuzzing framework. It walked through the creation of a fuzzing environment and the configuration of the Mangle fuzzer, and described the software architecture of Mamba. This framework overview solidified the classification strategy presented in Chapter 2, so that further chapters can build upon it by adding search algorithms, namely genetic algorithms.

52 Chapter 5

Genetic Algorithms

Previous chapters presented a foundation for fuzzing programs with genetic algorithms. Chapter 3 walked through binary analysis of Mac OS X programs and described a disassembly attribute extraction process. Chapter 4 introduced Mamba, a fuzzing framework especially designed for genetic algorithm fuzzing, and detailed its components and their relationships to the fuzzer taxonomy given in Chapter 2. This chapter unifies previous concepts and processes by extending the Mamba fuzzing framework with multiple genetic algorithms. This extension consults information gathered during static and dynamic analysis to shape evolution and leverages Mamba’s configuration scheme, attack heuristics, emulation executor, monitoring, logging, and reporting for typical fuzzer behavior. Before delving into implementation specifics, this chapter first introduces some terms related to genetic algorithms. After defining terminology, the next section describes Mamba’s genetic algorithm implementations while illustrating how to configure Mamba for genetic algorithm fuzzing. Following that section, this chapter outlines a set of variables to create mathematical equations for fitness functions based on static and dynamic analysis. The chapter concludes with a comparison of Mamba’s genetic algorithms against two popular fuzzing algorithms, Peach and Mangle.

5.1 Foundations

John Holland pioneered genetic algorithms, a subcategory of what is known today as evolutionary computation, at the University of Michigan during the 1960’s and 1970’s. Holland was fascinated by how an organism in nature evolved to progressively improve its ability to survive in an environment. He questioned if and how this environmental adaptation could be applied to other domains outside of Biology. Through a comparative study of multiple fields (Genetics, Game Theory, Economic Planning, etc.), Holland determined that evolution, specifically adaptation, could be directly applied

53 to other domains by defining a formal framework to model adaptive systems. Central to this formalization was the notion of using a performance measure of an individual within a population to model natural selection and survival of the fittest [Holland 1992]. To solidify his framework, Holland adjusted multiple concepts from Biology and Genetics to conform to a simplified mathematical model of evolution. Instead of defining these concepts without perspective, let’s first define an example problem and terminology while walking through an application of a genetic algorithm to solve this problem. For this example, recall the problem of maximizing a quadratic equation, specifically the one shown as Equation 5.1.

f(x) = −x² + 4 (5.1)

A direct method for finding the maximum value of f(x) is to take the first derivative of f(x) and solve for zero. Since the degree of the polynomial is two and the coefficient of the squared term is negative, then the parabola defined by the function opens downward. This means the parabola’s vertex must be the function’s maximum value, and solving the first derivative of the equation for zero produces the parabola’s vertex. For this example, let’s now constrain the permissible methods for finding the x value maximizing the equation to only direct function evaluation. One might then proceed to enumerate or randomly select a number from the set of integers (Z) in an attempt to locate the x value that maximizes the function. Another approach is to search for a solution by applying a genetic algorithm. A genetic algorithm can measure an x value’s performance, otherwise known as fitness, by evaluating Equation 5.1. A higher value returned by the equation denotes an x value more suitable for maximizing the function. This measurement of a candidate solution based on its performance is the same concept Holland borrowed from Darwin to promote evolutionary adaptation in a mathematical model. How do genetic algorithms represent a candidate solution, and, how do these candidate solutions evolve over time? For the answer to these questions, let’s delve deeper into the composition of a candidate solution. A candidate solution is referred to as a chromosome, and chromosomes have associated encodings. A chromosome’s encoding is a matter of choice, but the most common encoding is a binary representation where a chromosome is a sequence of zeros and ones. For our example problem, and all genetic algorithms created for this dissertation, chromosomes conform to a binary encoding. Chromosomes are also divided into smaller building blocks known as genes. For a binary encoding, a gene is an adjacent subsequence of zeros and ones. The length of this sequence can range from a single zero or one to the total size of the chromosome. These genes encode different

54 traits of a chromosome [Mitchell 1998]. For instance, a chromosome could encode two one-byte integers, because a function being optimized requires values for two variables. A chromosome for this problem could then be composed of two genes with each gene being a byte in length. Each of the genes is a value for one of the function’s variables, and these values are traits of the chromosome. For our example problem, let’s define a chromosome as a one-byte entity with the genes being one bit in length. Each gene encodes a trait, namely the value of a bit of an 8-bit signed integer composing the x variable for f(x). This encoding and composition allows a genetic algorithm to evolve individual bits of an integer’s base two representation. When a chromosome is evaluated for fitness, it will be decoded into an eight-bit signed integer, and Equation 5.1 will be evaluated for the chromosome’s fitness. Genes also have a range of possible values and a position within the chromosome. The range of values is referred to as a gene’s alleles. The different locations of genes in a chromosome are the chromosome’s loci. Alleles and loci are convenient terms to use when discussing the reproductive process of chromosomes and genetic operators [Mitchell 1998]. Since our example defines genes as one bit, the alleles of these genes are the values zero and one. The loci of the chromosomes are the bit positions of the one-byte signed integer representation. At this point, we have defined a genetic algorithm to represent and measure the fitness of binary encoded candidate solutions. However, encoding, decoding, and evaluating a function for fitness may seem synonymous with an enumeration or random search algorithm. How do genetic algorithms differentiate themselves from these other algorithms? The answer is by evaluating a group of candidate solutions and evolving the group over time with genetic operators. A group of candidate solutions, chromosomes, is known as a population, and a population has an associated size. This size dictates the number of chromosomes that must be in the population at any particular time [Sivanandam and Deepa 2007b]. A special designation is given to the population initializing a genetic algorithm. This population is referred to as the initial population. These populations may be randomly generated or hand-selected by a researcher to possibly decrease the computational time required to converge to a “good” candidate solution. An initial population fulfills an important role in evolution. Hand-selected initial populations may maximize differences between chromosomes or start with a set of chromosomes with particularly high fitness. For our example, let’s define the population size to be four and the initial population to be G0 = {1, 5, 2, 6}. Since populations evolve over time, a term must be created to differentiate populations at particular points in time. This term is a generation, and in some literature, genetic algorithms are referred to as generational search techniques, because a genetic algorithm evaluates groups of possible solutions over time. More formally, a generation is a population at

55 a given time before genetic operators are applied to create the next population. In our case, G0 is the first generation. After all chromosomes in the current generation are evaluated and new chromosomes are bred based on the older chromosomes, then the resulting population becomes the next generation, G1. A genetic algorithm halts producing generations when either a predefined number of generations has been reached or when a particularly “good” solution is found. These criteria are not the only valid stopping conditions, but, in practice, the former usually serves as a useful boundary. The only undefined components of a genetic algorithm left are its reproductive process and genetic operators. For reproduction, Holland suggested four operators: selection, crossover (recombination), mutation, and inversion [Holland 1992]. Of these operators, the inversion operator is seldom used, and is not defined or used by genetic algorithms developed for this dissertation. Figure 5.1 abstractly illustrates the other genetic operators, namely selection, crossover, and mutation.

Selection:  X0 X1 X2 X3 X4 X5 X6 X7      Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7

Crossover:  X0 X1 X2 X3 Y4 Y5 Y6 Y7

Mutation:   X0 X1 M0 X3 Y4 Y5 Y6 M1

Figure 5.1: Illustration of Selection, Crossover, and Mutation

Reproduction occurs after all of the chromosomes in a population have been evaluated by the fitness function being optimized. After evaluation, all chromosomes have fitness metrics associated with them indicating how well each chromosome solved the problem. The selection operator now chooses parent chromosomes, usually with replacement, based on their fitness metrics. Replacement means a chromosome may be selected as a parent multiple times. Many selection schemes exist, and, in most schemes, the selection operator strives to choose the most fit chromosomes from a population to breed. By choosing fitter chromosomes, a genetic algorithm can model survival of the fittest [Mitchell 1998]. Some of the common selection schemes for a genetic algorithm are roulette wheel selection, rank selection, and tournament selection. Roulette wheel selection, a form of fitness proportionate selection, weights the probability of selecting a chromosome by its proportion of fitness out of the entire population’s fitness [Mitchell 1998]. Many texts analogize this scheme to a roulette wheel where the size of each chromosome’s slot on the wheel is derived from its proportional fitness. This selection method is used by most of the genetic algorithms implemented for this

56 dissertation. Rank selection normalizes proportionate selection by numbering chromosomes based on their observed fitness values. These ranks begin at one for the worst fit and increment through the number of population members. Higher ranked chromosomes have a higher probability of being selected. By ranking individuals based on fitness, the probability of selecting one particular chromosome can be lessened in situations where that chromosome’s fitness accounts for the majority of the population’s total fitness [Sivanandam and Deepa 2007b]. Tournament selection randomly picks a smaller subset of chromosomes from the entire population and compares their fitness metrics. The most fit chromosome of the subset is chosen as the winner of the tournament and as a parent for reproduction. Revisiting our example, the fitnesses of chromosomes in the initial population are 3, −21, 0, and −32. Since the fitness function being optimized (Equation 5.1) can produce negative numbers for fitness metrics of population members, these fitness measures must be remapped to facilitate roulette wheel selection. To remap fitness metrics, we will exploit the fact that an eight bit encoding of signed integers only permits the range of values [−128, 127]. Since a genetic algorithm is trying to maximize a quadratic equation, it can remap fitness metrics by adding the absolute value of the fitness function’s minimum value for an eight bit signed integer to all fitness metrics. This minimum occurs when the x value of the fitness function is -128. The absolute value of the result returned by f(x) for this x value is 16380. This leads to Equation 5.2 as a remapping function for f(x).

g(x) = f(x) + 16380 (5.2)

Equation 5.2 guarantees fitness values to be greater than or equal to zero while preserving the order of chromosomes based on fitness calculated from Equation 5.1. With this remapping function, the fitness values of the example’s initial population are 16,383, 16,359, 16,380, and 16,348. The sum of fitness metrics constituting the overall initial population’s fitness is 65,470. A probability of selection can now be assigned to each chromosome by dividing its fitness derived by g(x) by 65,470. Reproduction selects chromosomes as parents as many times as required to breed another population of identical size. Selected parents are passed, often in pairs, to the next genetic operator, crossover. This operator combines genes from the selected parents by uniformly choosing a single locus of the parents at random. The genes above the locus of the first parent are copied and the genes below the same locus of the second parent are copied to create an intermediate child chromosome. For optimization purposes, the sections of genes copied from both parent chromosomes are switched to create a second child chromosome. This optimization reduces the number of selections from the population. The scheme presented here is known as single point crossover, because a single locus is

57 used as a crossover point. Other crossover schemes exist such as two point crossover, multipoint crossover, uniform crossover, and many others [Sivanandam and Deepa 2007b].

Crossover is also governed by a probability of crossover (PC). Crossover probability defines the probability of applying the crossover operator to the selected parents. This probability ranges from 0% to 100%. In the case where crossover is not applied to two parents, the parents are passed unchanged to the next genetic operator, mutation; otherwise, the mutation operator alters the child chromosomes. Figure 5.1 illustrates an intermediate child chromosome resulting from single point crossover of parent chromosomes with eight genes (X0, X1, ..., X7 and Y0, Y1, ..., Y7). The crossover point selected occurs at gene five, and the resulting intermediate child chromosome is shown. For most genetic algorithms implemented for this dissertation, single point crossover serves as the crossover operator. The only exception is the CHC-GA, which is described in more detail in Section 5.2. The last genetic operator, mutation, selects genes of the intermediate children of parent chromosomes and randomly changes those genes’ values. Like crossover, mutation is governed by a probability of mutation (PM). However, this probability dictates the probability of mutating a gene in the chromosome. This means each gene in a chromosome has a probability of PM of being mutated. In Figure 5.1, mutation alters gene X2 to M0 and gene Y7 to M1. The value of the mutation must be a possible value of the gene’s alleles. To tie these concepts back to our example, let’s select two highly fit individuals from our initial population. For this, we select the chromosomes with decoded signed values 1 and 2 as parent one and parent two. The speculation here is that roulette wheel selection caused these two parents to be chosen. The eight bit binary representations of these chromosomes are 00000001 and 00000010, respectively. If single point crossover is followed as in Figure 5.1 and a crossover point at gene five is chosen, then the intermediate child chromosome would be 00000010. Continuing through the reproductive process and matching the figure’s depiction, applying mutation creates a child chromosome of 00100011, equaling 35. This child chromosome is then placed into the population of the next generation, and more children are bred to reach the next population’s required size. One last operation that can be performed after all child chromosomes are created for the next generation is known as elitism. Elitism selects the best n chromosomes from the current generation and replaces n chromosomes of the next generation with these selected chromosomes. The n, in this case, is a number significantly less than the total population size. By selecting the fittest chromosomes of the current generation and promoting their survival in the next generation, a genetic algorithm preserves highly fit chromosomes which may degrade in fitness through crossover and mutation [Mitchell 1998, De Jong 1975]. Every genetic algorithm, except for CHC-GA, currently implemented in the Mamba fuzzing framework uses elitism to preserve highly fit chromosomes.
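To make the walkthrough concrete, the following Ruby sketch performs one reproduction step for the example problem; the helper names are illustrative, and the selected parents, crossover point, and mutated genes are fixed to mirror the walkthrough above rather than chosen at random.

    # Decode an eight bit two's complement string into a signed integer.
    def decode(bits)
      value = bits.to_i(2)
      value >= 128 ? value - 256 : value
    end

    def f(x); -(x ** 2) + 4; end    # Equation 5.1
    def g(x); f(x) + 16380;  end    # Equation 5.2, the non-negative remapping

    # Initial population G0 = {1, 5, 2, 6} as eight bit chromosomes.
    population = [1, 5, 2, 6].map { |n| (n & 0xFF).to_s(2).rjust(8, '0') }
    fitnesses  = population.map { |chromosome| g(decode(chromosome)) }   # 16383, 16359, 16380, 16348
    total      = fitnesses.reduce(:+)                                    # 65470

    # Roulette wheel selection: probability of selection is fitness / total.
    def roulette(population, fitnesses, total)
      spin = rand * total
      population.each_with_index do |chromosome, index|
        spin -= fitnesses[index]
        return chromosome if spin <= 0
      end
      population.last
    end

    # To mirror the walkthrough, take the chromosomes encoding 1 and 2 as the
    # parents chosen by roulette wheel selection.
    parent1 = "00000001"
    parent2 = "00000010"

    # Single point crossover at gene five: the first five genes come from
    # parent1 and the remaining three from parent2.
    child = parent1[0, 5] + parent2[5, 3]            # => "00000010"

    # Mutation: flip the third and eighth genes, matching Figure 5.1.
    child[2] = (child[2] == "0" ? "1" : "0")
    child[7] = (child[7] == "0" ? "1" : "0")

    puts child           # => "00100011"
    puts decode(child)   # => 35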

58 As shown by our example problem, genetic algorithms can be applied to search space problems.

In the example, the search space was the set of eight-bit signed integers, a subset of Z, and a genetic algorithm tried to maximize Equation 5.1 by finding the best x value in that search space. While genetic algorithms are helpful for generating candidate solutions to search space problems in these cases, how are these algorithms applicable to file format fuzzing? From a quality assurance perspective, genetic algorithms do not appear applicable to the problem since this perspective presents fuzzing as a methodology for improving software quality by constantly injecting semi-valid data into a program under test and monitoring its responses. However, this formalization provides little perspective of the actual problem a fuzzing algorithm is trying to solve, and it is a classical illustration of approaches taken by mutation and generation fuzzers. In actuality, fuzzing algorithms are searching for negative test cases in the set of all inputs that cause a program under test to exit abnormally. Vuagnoux [2005] demonstrated this shift of perspective with his Autodafé fuzzer. In these cases, the search space of a program is the set of all possible inputs through any external program interface. For file format fuzzing, this external interface is any file parser. Posing fuzzing in this form reclassifies fuzzing as a search space problem with multiple optimal solutions. A fuzzer’s goal in this characterization is now to find as many of these optimal solutions, crashes, as possible in a limited time period defined by a tester. Since genetic algorithms are useful for examining a search space and their operators generate variations of chromosomes, it is natural to extend these algorithms to file format fuzzing with files as chromosomes. Perhaps by optimizing certain fitness functions associated with different program metrics, a genetic algorithm may cause program faults while optimizing a function.

5.2 CHC-GA

Traditional genetic algorithms, those following a strict generational evolutionary model with traditional selection, crossover, and mutation operators, do not constitute all possible evolutionary algorithms. Other evolutionary algorithms exist which build upon Darwinian concepts. These other algorithms try to improve evolutionary algorithms with variations of genetic operators. Some of these algorithms also extend breeding competition by allowing children to compete with parents in cross-generational selection schemes. One such genetic algorithm extending the mathematical evolutionary model is the CHC-GA. CHC-GA differs from traditional genetic algorithms in several respects. In general, CHC-GA heavily weights survival of the fittest during its reproductive cycle. This heavy reliance on survival of the fittest is exhibited by CHC-GA's conservative cross-generational selection operator. For selection, CHC-GA increases its population size to include all current chromosomes as well as all children yielded from crossover.

The resulting children are evaluated for fitness, and then CHC-GA selects the most fit individuals from this combined, intermediate population to create the next generation. Combining parents and children and selecting from the intermediate population allows CHC-GA to test new chromosomes yet retain the most fit parent chromosomes, which may be damaged during a destructive crossover or mutation operator. Eshelman [1990] denotes this selection mechanism as population-elitist selection.

While population-elitist selection strives to conserve high performing chromosomes, the crossover operator for CHC-GA maximally increases genetic differences between parent and child chromosomes. To achieve such a highly disruptive crossover operator, CHC-GA utilizes a variation of Uniform Crossover (UX) deemed Half Uniform Crossover (HUX) [Eshelman 1990]. HUX compares each bit (a gene, if the encoding dictates a gene as a bit) of two selected parents, P1 and P2, to determine their equivalence. If P1 is denoted as a template for the resulting child, then exactly half the differing bits from P2 are chosen to be copied to the resulting child. These chosen bits remain at the same loci in the child as found in P2, and the remaining bits are collected from P1 to complete crossover. This highly disruptive crossover operator yields children distinct from either P1 or P2.

One concern, noted by Eshelman [1990], when combining a conservative selection strategy with a highly disruptive crossover operator is the tendency for a population to converge prematurely. Premature convergence should be thwarted by crossing over multiple bits to create maximally different children, but situations can arise where "good" chromosomes cross over with one another and their differences are minimal. In some cases, this incest leading to convergence is desirable, but in others, the convergence may be premature, curtailing search. CHC-GA incorporates an incest avoidance mechanism. This avoidance mechanism occurs during the breeding of children, more specifically during selection. When parents are selected, at random with replacement, the Hamming distance between the two selected chromosomes is calculated. If the Hamming distance between these parents falls below a quarter of a chromosome's length, then the parents are not permitted to mate. This threshold is denoted by CHC-GA as a difference threshold [Eshelman 1990]. The reproduction cycle continually selects parents for mating as many times as the size of the current generation despite failures due to the difference threshold. If a child does not result from selection of two parents because of the incest prevention threshold (difference threshold), then the reproductive cycle does not redo selection for that child. This means the number of children generated by selection and crossover can be less than the current generation's size. The next generation's size will nevertheless be the same as the current generation's, because the next generation is selected from the combined intermediate population of parents and children. In the event no children are generated, CHC-GA has detected convergence, and the incest threshold is decremented by one bit.

When the incest prevention threshold reaches zero and no children result from breeding, or no children are selected for the next generation, CHC-GA introduces a mutation operator. For mutation, the algorithm partially reinitializes the next generation by creating a template of the current fittest chromosome. The template is copied into the next generation and 35% of its bits are randomly flipped. This procedure repeats until a new generation is created with a size equivalent to the starting generation's size minus one. To generate the remaining chromosome, an exact copy of the fittest chromosome is copied to the next generation for preservation (elitism). CHC-GA then resets the incest prevention threshold and continues evolving new generations until this process repeats or a stopping condition is reached. This mutation operator is known as cataclysmic mutation [Eshelman 1990].

CHC-GA, being a more sophisticated evolutionary algorithm, is an interesting algorithm to apply to file format fuzzing. However, it must be adapted to work with file fuzzing, and Mamba's implementation of CHC-GA alters the algorithm in multiple ways to accommodate files as chromosomes. The first modification is a redefinition of a gene. Originally, CHC-GA represented chromosomes as a binary encoding with each bit as a chromosome's gene. For file fuzzing, defining crossover and mutation on a bit boundary is too refined, since files are generally several kilobytes to megabytes in size. Mamba's implementation of CHC-GA redefines a chromosome as a file with each byte as a chromosome's gene. Redefining a chromosome's genes as bytes instead of bits can lessen the effects of disruptive genetic operators. Since a chromosome's genes are defined as bytes, Mamba's CHC-GA modifies the original crossover operator to be compatible with this new representation. The highly disruptive crossover operator (HUX) now compares each byte of two parent chromosomes and exchanges half of the differing bytes. While this adaptation matches the new encoding, mixing half of the differing bytes from one parent with another generally corrupts a resulting child file to a state beyond recognition. In mature application code, conditionals should catch these highly corrupted files fairly early during execution. In anticipation of this problem, Mamba's CHC-GA provides a configurable crossover parameter (PC) allowing a tester to set the percentage of differing bytes to cross over. This flexibility permits a software tester to customize their CHC-GA based on application code maturity.

Another modification to CHC-GA made by Mamba is the semantics of the Hamming distance algorithm used during selection. Traditionally, the Hamming distance between two strings is only defined for strings of equal length. If the strings are unequal in length, then another similarity measure such as the Levenshtein distance is used to compute the edit distance between strings. Computing the edit distance by another measure like the Levenshtein distance can increase runtime, since insertions and deletions are also factored into the similarity measure. Since file fuzzing generates files of different sizes and chromosomes can be several megabytes, the Hamming distance calculation for Mamba's CHC-GA counts the number of bytes that differ between parents and adds the difference in size between the parents to the calculation.

Altering the semantics of the Hamming distance computation aligns with the original CHC-GA while keeping the comparison operator linear. Also, since chromosomes are variable length, CHC-GA's difference threshold must be redefined to account for variable sized chromosomes. Originally, CHC-GA initialized the difference threshold as a predefined number of bits and decremented the threshold by one bit when convergence was detected. When the threshold equaled zero, the algorithm entered cataclysmic mutation. For variable length chromosomes, the difference threshold is defined as a percentage instead of an absolute number of bits. To derive the number of bytes required to differ between parents, the largest chromosome in the population is multiplied by the difference threshold percentage. By defining the threshold as a percentage, the adapted algorithm accounts for variable length at each generation. If CHC-GA detects convergence, then instead of decrementing the threshold by one bit, the difference threshold is divided by two. When the threshold drops below 5%, the algorithm enters cataclysmic mutation. Initially, the difference threshold is set to 25%, which requires convergence to be detected four times before CHC-GA mutates the population.

Mamba's last modification to CHC-GA is within the cataclysmic mutation operator. Mamba's modified mutation operator continues to mutate the current fittest chromosome, but instead of flipping 35% of its bits, the operator inserts a random sequence of bytes (an attack heuristic) at a random offset within the new chromosome. This procedure repeats to generate 35% of the next generation's chromosomes. The remaining 65% of the next generation is reinitialized by randomly choosing with replacement from the initial population and preserving the current fittest chromosome (elitism). By altering CHC-GA in this manner, diversity is introduced into the next generation with well-formed chromosomes as well as attack heuristics.
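The sketch below (illustrative Python with hypothetical names, not Mamba's source) captures the two adaptations just described: the byte-level HUX operator with a configurable percentage of differing bytes, and the linear distance measure that counts differing bytes plus the difference in file size.

    import random

    def distance(a, b):
        # Count differing bytes over the shared prefix, then add the size difference;
        # this keeps the comparison linear for variable length chromosomes (files).
        shared = min(len(a), len(b))
        differing = sum(1 for i in range(shared) if a[i] != b[i])
        return differing + abs(len(a) - len(b))

    def hux(p1, p2, pc):
        # Copy a fraction pc of the differing byte positions from p2 into a child
        # templated on p1; classical HUX would exchange exactly half of them.
        child = bytearray(p1)
        diffs = [i for i in range(min(len(p1), len(p2))) if p1[i] != p2[i]]
        for i in random.sample(diffs, int(len(diffs) * pc)):
            child[i] = p2[i]
        return bytes(child)

    population = [b"%PDF-1.4 sample one", b"%PDF-1.5 sample two"]   # hypothetical chromosomes
    parent_one, parent_two = population
    threshold = 0.25 * max(len(c) for c in population)              # initial 25% difference threshold
    if distance(parent_one, parent_two) >= threshold:               # incest prevention check
        child = hux(parent_one, parent_two, pc=0.2)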

5.3 Configuration

The Mamba fuzzing framework includes four types of genetic algorithms for fuzzing. All of these algorithms (SGA, SGA-MM, BGA, and CHC-GA) use the same basic configuration file layout; minor additions are made where necessary, as mentioned below for SGA-MM. Some graphs in this dissertation reference a Simple Genetic Algorithm with Mangled Initial Population (SGA-MIP). This genetic algorithm is a special case of the SGA where the initial population was generated independently by the Mangle fuzzer before starting evolution. As such, the SGA-MIP configuration file is the same as SGA's. Before tuning genetic algorithm fuzzing parameters, Mamba must first be configured to run a genetic algorithm fuzzer. To configure Mamba for this, the type field of the global configuration file (configs/fuzzer.yml) must be set to one of the available genetic algorithm types.

Matching the above algorithms to their parameterized counterparts, the possible values are SimpleGeneticAlgorithm, MangleSimpleGeneticAlgorithm, ByteGeneticAlgorithm, or CHCGeneticAlgorithm. A global and fuzzer specific configuration file supporting genetic algorithm fuzzing can be automatically created with one of these types by the mamba command line tool. To generate a fuzzing environment for one of these types, specify the type with the --type (-y) option when creating a new fuzzing environment. Once the global configuration file references a valid genetic algorithm fuzzer, fuzzer specific parameters can be tuned. Depending upon the type chosen, fuzzer specific parameters are found in the configuration files SimpleGeneticAlgorithm.yml, SimpleGeneticAlgorithmMangle.yml, ByteGeneticAlgorithm.yml, or CHCGeneticAlgorithm.yml. Figure 5.2 presents an example configuration for the SGA type.

---
Crossover Rate: 0.2
Mutation Rate: 0.5
Maximum Generations: 2
Population Size: 4
Fitness Function: As
Initial Population: tests/testset.zip
Disassembly: disassemblies/libFontParser.dylib.fz

Figure 5.2: Mamba Genetic Algorithm Configuration

This configuration file has many tunable parameters that should be familiar from the previous theoretical discussion of genetic algorithms. The first parameter, Crossover Rate (PC), defines the probability for two parent chromosomes to exchange genes during reproduction. Most genetic algorithms in Mamba implement single point crossover with each gene having an equal probability of selection as a crossover point. This means PC does not control the probability of selecting a gene as a crossover point, but dictates whether two selected parents will exchange genes. Chromosomes of these algorithms are variable length, which alters the semantics of choosing a crossover point. To handle the special case of two parent chromosomes with unequal lengths exchanging genes, most genetic algorithms in Mamba select a crossover point from the smaller of the two parent chromosomes. This selection scheme ensures the crossover point exists in both parents. For crossover itself (see Figure 5.1), the top half of parent chromosome one up to the crossover point is copied into the top half of child chromosome one, and the bottom half of parent chromosome two from the crossover point is copied into the bottom of child chromosome one. The process switches to generate a second child.

For the second child, the top half of parent two up to the crossover point is copied to the top half of child two, and the bottom half of parent one from the crossover point is copied into the bottom half of child two.
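A short sketch of this variable-length single point crossover (illustrative only, not Mamba's implementation); the crossover point is drawn from the smaller parent so it is guaranteed to exist in both.

    import random

    def single_point_crossover(parent_one, parent_two, pc):
        # With probability 1 - PC the parents pass through to mutation unchanged.
        if random.random() >= pc:
            return parent_one, parent_two
        # Every gene (byte) of the smaller parent is an equally likely crossover point.
        point = random.randrange(1, min(len(parent_one), len(parent_two)))
        child_one = parent_one[:point] + parent_two[point:]
        child_two = parent_two[:point] + parent_one[point:]
        return child_one, child_two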

The only algorithm diverging from this definition of PC is CHC-GA, which uses HUX as its crossover operator. For HUX, typically, half the differing genes between the selected parents are crossed. Mamba redefines PC for this algorithm to dictate the percentage of differing genes to cross over between parents. This means a PC value of 0.25 would cause a quarter of the differing bytes to be crossed between parents to create a child. No matter the algorithm selected, the bounds for PC remain constant: 0 ≤ PC ≤ 1.

The next parameter, Mutation Rate (PM ), sets the probability of mutation during reproduction.

The bounds for this value are 0 ≤ PM ≤ 1. For SGA, SGA-MM, and BGA, the semantics of the mutation operator differ. SGA defines PM as the probability of selecting an intermediate child from the next generation to mutate.

For example, let's set the number of population members to 100 and PM to 0.20. Based on these settings, during reproduction, SGA selects 20% of the intermediate child chromosomes for mutation. Once a child is selected, mutation uniformly selects a random gene from the chromosome and replaces it with a randomly generated sequence of bytes; the length of this newly generated sequence is randomly chosen to be between 0 and 10. This mutation operator serves as the algorithm's attack heuristic by inserting a random byte sequence into a percentage of chromosomes for the next generation.
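A sketch of this mutation operator under the stated semantics (a hypothetical helper, not the framework's code): a fraction PM of intermediate children are selected, and each selected child has one gene replaced with a random byte sequence of length 0 to 10.

    import os
    import random

    def sga_mutate(child, pm):
        # With probability PM this intermediate child is selected for mutation.
        if random.random() >= pm:
            return child
        # Replace one uniformly chosen gene (byte) with a random 0-10 byte sequence,
        # the attack heuristic inserted by SGA.
        offset = random.randrange(len(child))
        heuristic = os.urandom(random.randint(0, 10))
        return child[:offset] + heuristic + child[offset + 1:]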

The SGA-MM's mutation operator is completely different from SGA's, but the algorithm uses PM in the same manner: the probability again determines whether an intermediate chromosome is selected for mutation. Once a chromosome is selected, however, mutation occurs by applying the Mangle fuzzing algorithm to it. Mangle randomly selects genes from a configurable range of the chromosome and flips the higher order bits of the selected bytes; both the size of the range and its starting gene are configurable. These parameters are contained in the fuzzer specific configuration file, MangleSimpleGeneticAlgorithm.yml. For more information on these parameters, please refer to Figure 4.2 in Chapter 4.

BGA diverges from SGA and SGA-MM by defining mutation in terms of a classical binary genetic algorithm [Holland 1992]. With this algorithm, PM defines the probability of mutating each gene of a chromosome during reproduction. For example, if PM is set to 0.20 and the number of genes in the chromosome is 100 (i.e., a 100 byte file), then on average 20 genes will be mutated throughout the chromosome. Furthermore, every chromosome generated during reproduction will contain mutated genes with the same probability. The mutation operator, in this case, randomly generates a single byte and replaces the selected byte with the randomly generated one. These randomly mutated bytes scattered throughout the chromosome are BGA's attack heuristics.
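For contrast with SGA, a sketch of BGA's per-gene mutation under these semantics (again with illustrative names only); with PM = 0.20, roughly a fifth of the bytes are replaced.

    import os
    import random

    def bga_mutate(chromosome, pm):
        # Every gene (byte) mutates independently with probability PM; a mutated
        # gene is replaced by a single randomly generated byte.
        mutated = bytearray(chromosome)
        for i in range(len(mutated)):
            if random.random() < pm:
                mutated[i] = os.urandom(1)[0]
        return bytes(mutated)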

For all genetic algorithms implemented in Mamba, a chromosome is a variable length negative test case file, and a gene is a byte in that file. CHC-GA does not include a mutation rate in its configuration file. This parameter is absent because CHC-GA follows a set algorithm for cataclysmic mutation: the fittest chromosome of the current generation is copied into the next generation, 35% of the remaining next generation's chromosomes are created by inserting attack heuristics at random locations in the current fittest chromosome, and the remaining chromosomes are randomly selected with replacement from the initial population. This deviation from the original CHC-GA adapts the algorithm to file format fuzzing and increases population diversity, because most members of the new generation are not based on the fittest chromosome of the current generation.

The next parameter, Maximum Generations, sets an upper bound on the number of generations a genetic algorithm fuzzer will create; this upper bound serves as a reliable stopping condition for evolution. After that, Population Size configures the total number of chromosomes in a population. Population size remains constant for each generation created during evolution. A parameter coupled with Population Size is Initial Population, which sets the location of a zip file containing a genetic algorithm's initial population. Since these fuzzers are file fuzzers and chromosomes are files, the zip file must contain the same number of files as set by Population Size. To collect an initial population of files, Mamba includes a tool, ‘thor tools:seed’, that queries Google for files of a specified type (i.e. pdf, jpg, gif, xls). This tool randomly chooses among the returned search results, parses them, and downloads the requested number of documents, ensuring each downloaded document is unique. Figure C.5 provides additional usage details for this tool within the fuzzing environment.

The Fitness Function parameter is one of the most important parameters for genetic algorithm fuzzing. It defines the mathematical function which a genetic algorithm will try to optimize. This function is a valid mathematical expression in infix notation, and it can include one or more predefined variables. The next section provides a comprehensive discussion of valid fitness function syntax. For any provided function, a genetic algorithm will optimize it by maximizing the function's return value. The last parameter, Disassembly, configures the YAML file to read for disassembly attribute information. These files are generated by the Mamba tool, ‘thor tools:disassemble’, which uses an IDAPython script distributed with the Plympton Ruby gem and IDA Pro. Please refer to Chapter 3 and Figure 3.3 for more information on disassembly attribute extraction. Once all of these parameters are configured with valid values, a tester starts a genetic algorithm fuzzer with the ‘thor fuzz:start’ task.

5.4 Fitness Function Variables

At this point, the process for genetic algorithm fuzzing has been laid out: select a program to fuzz; select a shared library used by that program for execution monitoring; disassemble the selected library with IDA Pro; extract disassembly attribute information with Plympton's IDAPython script; create a genetic algorithm fuzzer with Mamba; and configure the selected genetic algorithm's parameters. Tools are included within a Mamba fuzzing environment to help with these steps. The missing puzzle piece is a set of variables representing disassembly attributes and the connection of these variables to deriving fitness functions. The Mamba fuzzing framework includes a set of variables based on disassembly attribute information and application execution. These variables are split into two classes, modifiable and unmodifiable. Figure 5.3 presents these variables along with brief semantics and the available modifiers.

Variable   Semantics
A          Number of Function Arguments
B          Number of Function Local Variables
C          Size of Function Arguments
D          Size of Function Local Variables
E          Number of Assembly Instructions per Function
F          Function Stack Frame Size
G          Function Cyclomatic Complexity
R          Test Case Execution Time
U          Execution Path Uniqueness
V          Function Coverage

Modifier   Semantics
a          Average
s          Sum

Figure 5.3: Fitness Function Variables

The main division of this variable set is between modifiable and unmodifiable variables. Modifiable variables only have meaning when a valid modifier, average (a) or sum (s), is appended to the variable. When these variables are included in a fitness function, Mamba parses the execution trace of a negative test case to find all functions executed in a traced shared library. Once collected, Mamba gathers the appropriate metrics from the attribute extraction information. Then, depending on the modifier added to the variable, the gathered metrics for each function are either averaged or summed. If a function occurs multiple times in the trace, then its metrics are included multiple times in the variable's calculation. Modifiable variables are denoted by the set M = {A, B, C, D, E, F, G}. To modify one of these variables, the letter a or s is appended to the variable to attain the desired semantics. For instance, Aa calculates the average number of arguments over all functions executed. For completeness, the set of unmodifiable variables is the set Z = {R, U, V}.

These variables cannot have a modifier appended, because semantically the average and the sum of these variables are meaningless. The following subsections describe the semantics of each variable in more detail.
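As an illustration of how a modified variable resolves to a number (the data structures here are hypothetical stand-ins for Plympton's attribute and trace representations), consider variable A with each modifier:

    # Hypothetical extracted attributes: function name -> number of arguments (variable A).
    attributes = {"parse_table": 2, "read_glyph": 3, "alloc_buffer": 1}

    # Hypothetical execution trace of the traced shared library; repeated calls count repeatedly.
    trace = ["parse_table", "read_glyph", "read_glyph", "alloc_buffer"]

    values = [attributes[name] for name in trace]
    Aa = sum(values) / len(values)   # 'a' modifier: average number of arguments = 2.25
    As = sum(values)                 # 's' modifier: total number of arguments = 9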

Number of Function Arguments (A)

Variable A represents the number of parameters passed to a called function. The number of parameters passed is calculated on a per function basis and stored in the YAML attribute information by the IDAPython script, func-auto.py. This script is an adaptation of the PaiMei pida library included in the Plympton Ruby gem developed by this dissertation [Amini 2006]. The func-auto.py script is used by the Mamba task ‘thor tools:disassemble’. During attribute extraction, func-auto.py walks through every function defined in a disassembled shared library's text section and queries IDA Pro for each function's stack frame and stack frame size. While stack frames are a convention of executing code, IDA Pro can determine the structure of a function's stack frame by analyzing alterations of the stack pointer register, esp, prior to calling a function and during a function's prologue, epilogue, and execution. By evaluating these alterations to esp, IDA Pro can reconstruct the number and location of a function's local variables and arguments [Eagle 2008].

For the IDAPython script to obtain these metrics, it first accesses a reference to a function's stack frame as represented in IDA Pro by calling the get_frame() API function, and obtains the frame's size with the get_frame_size() API function. Once obtained, the script walks through IDA Pro's frame structure calculating the offsets of each member in the structure. The stack frame offsets of each member are obtained by examining the soff and eoff attributes of each structure member. Since x86 stack frames grow from high memory addresses to low memory addresses, any access to a memory address higher than the saved stack base pointer, ebp, and the saved return address designates a function parameter. IDA Pro names these structure members s and r respectively. When the script encounters a reference to memory higher than the saved return address, it adds it to the number of function parameters and continues processing. Once all the frame structure members have been evaluated for their position in the stack frame, the total number of function parameters is calculated. This calculation of function arguments aligns with the x86 cdecl calling convention, where a function caller is responsible for pushing parameters onto the stack and cleaning up those arguments after return from the callee.

An interesting side note is the behavior of the x86_64 ABI. With x86_64, some function arguments are passed through registers (rdi, rsi, rdx, rcx, r8, r9) instead of being pushed onto the stack before calling the function [Matz et al. 2010a]. Because these arguments are passed through registers, the current IDAPython script is incapable of correctly calculating the number of function arguments for x86_64 objects.

Only arguments passed on the stack after the registers are filled will be counted. Care must be taken when planning a fuzzing run to ensure the accuracy of this variable, and, at this time, this variable should only be used when testing 32 bit applications.

Number of Function Local Variables (B)

Variable B denotes the number of local variables allocated for a function. As with the number of function arguments, IDA Pro automatically calculates this number by analyzing the stack pointer register, esp, while disassembling a function [Eagle 2008]. The included IDAPython script ascertains this information while calculating the number of function arguments. However, due to IDA Pro's analysis algorithm, the number of local variables reported by disassembly attribute extraction may not equal the actual number of variables declared in the original source code. This discrepancy arises because IDA Pro increments the number of local variables whenever a read or write happens to an address within a function's stack frame. The discrepancy is noticeable, for example, when unused variables are allocated in source code. Since IDA Pro does not have direct knowledge of source code structure, it must assume that accessed stack addresses are the only local variables. If desired, this discrepancy can be corrected by manually analyzing a disassembly in IDA Pro before extracting attributes with the ‘thor tools:disassemble’ tool.

Size of Function Arguments (C)

Variable C holds the size in bytes of all parameters passed to a function. This size is calculated by the IDAPython script, func-auto.py, during attribute extraction. It calculates this value by calling the get_member_size() IDA Pro API function for every function parameter encountered while calculating the number of function arguments and local variables. The size calculated by this function depends upon IDA Pro's auto analysis, which only recognizes function arguments if they are accessed by an assembly instruction within a function. If no instruction references the memory address of a function parameter, then IDA Pro and the attribute extraction script are unable to include the size of that parameter in the calculation for variable C. These discrepancies can be resolved by manually marking function arguments after IDA Pro's auto analysis concludes [Eagle 2008].

Size of Function Local Variables (D)

Variable D represents the size in bytes of a function's local variables. This size is the space allocated in bytes by a function's prologue. The prologue for x86 cdecl functions typically subtracts a fixed number of bytes from the stack pointer, esp, within the first few assembly instructions of a function.

The most commonly recognized instruction for this behavior is a sub esp instruction in the prologue. This subtraction allocates space for stack variables and may include padding created by the compiler for stack frame alignment.

Number of Assembly Instructions per Function (E)

Variable E represents the number of assembly instructions within a function. This number is calculated by traversing IDA Pro's representation of a function. The IDAPython script, func-auto.py, first traverses a function's chunk list, then traverses the block list contained within each chunk. The blocks in this list contain the count of assembly instructions composing the block, and summing these counts yields the total number of assembly instructions in the function. A potential pitfall encountered early in this research is to naively calculate the number of instructions by subtracting a function's start address from its end address. That calculation assumes assembly instructions are fixed length, which is not the case for x86 instructions. The reworked attribute extraction script avoids this naive solution and correctly calculates this variable.
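A hedged IDAPython fragment showing one way to count instructions per function without assuming fixed-length instructions; this is not the dissertation's func-auto.py, and it simply iterates the items IDA associates with each function.

    import idautils

    def count_instructions(function_ea):
        # idautils.FuncItems yields the addresses of the items belonging to a function,
        # so counting them avoids the naive end-minus-start calculation.
        return sum(1 for _ in idautils.FuncItems(function_ea))

    instruction_counts = {hex(ea): count_instructions(ea) for ea in idautils.Functions()}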

Function Frame Size (F)

Variable F represents the total size of a function's stack frame in bytes. This variable is calculated by the IDAPython script, func-auto.py, by querying IDA Pro with the get_frame_size() API during analysis of the other function attributes. This API function returns the size in bytes including all local variables, saved registers (ebp, eip), the saved return address, and the size of the function arguments [Eagle 2008].

Function Cyclomatic Complexity (G)

Variable G corresponds to the cyclomatic complexity of a function. Cyclomatic complexity is related to the number of linearly independent paths within a program. These paths are created by decision points altering control flow. For instance, an if statement creates a decision point where code can branch in two divergent directions depending on the result of the if statement's conditional. While traditionally cyclomatic complexity is a metric for a software program, it can also be applied to individual pieces of a program such as functions. Thomas McCabe developed cyclomatic complexity as a software metric in 1976 [McCabe 1976]. To calculate this metric, a program, module, or function is converted to a control flow graph. The nodes of the graph correspond to sequential instructions, and edges relate to branches to and from non-sequential sections of code. This representation may be familiar from discussions of IDA Pro disassemblies in Chapter 3.

Once converted, the complexity is computed by the formula shown in Equation 5.3.

CV = e − n + 2p (5.3)

where e is the number of edges, n is the number of nodes, and p is the number of connected components in the control flow graph. A minor simplification of this formula is possible by assuming all components of the graph are connected. Since this research calculates cyclomatic complexity of a function, it is feasible to assume all assembly blocks composing a function are connected. Then, the calculation simplifies to Equation 5.4.

CV = e − n + 2 (5.4)

The IDAPython script, func-auto.py, calculates cyclomatic complexity for every function during attribute extraction. McCabe prescribed minimizing cyclomatic complexity and recommended not exceeding a value of 10 [McCabe 1976]. The evidence on cyclomatic complexity as a predictor of defects is mixed: some studies show a positive correlation between higher cyclomatic complexity and defect counts, while others find no correlation between the two. Including this metric in the set of modifiable variables adds the ability to study this correlation with genetic algorithms.
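A small worked example of Equation 5.4 on a hypothetical control flow graph: a function containing a single if/else has four nodes and four edges, so its complexity is 2.

    def cyclomatic_complexity(edges):
        # Equation 5.4: e - n + 2, assuming the control flow graph is connected.
        nodes = {v for edge in edges for v in edge}
        return len(edges) - len(nodes) + 2

    # A function containing a single if/else: entry -> {then, else} -> exit.
    print(cyclomatic_complexity([("entry", "then"), ("entry", "else"),
                                 ("then", "exit"), ("else", "exit")]))   # prints 2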

Test Case Execution Time (R)

Variable R represents the execution time for an application processing a negative test case. This execution time is measured as wall clock time: the clock starts when the program under test begins execution and stops when the program terminates. This variable is helpful for rewarding test cases that run for longer time periods and punishing test cases that exit early. Test cases with low run times may signify error handling code catching a test case prematurely.

Execution Path Uniqueness (U)

Variable U represents an execution path's uniqueness as calculated from a weighted directed graph derived from a shared library's control flow graph. To explain this variable, let us first define how to create a Markov chain from a shared library. In Chapter 2, Sparks et al. [2007] demonstrated with the Sidewinder fuzzer how a control flow graph can be treated as a Markov chain. The vertices of a control flow graph represent assembly basic blocks, and edges are control flow transfers (e.g. jumps and branches) between blocks.

A Markov chain can be constructed by assigning transition probabilities to control flow graph edges derived from observations of program execution. These observations record the assembly blocks executed, the number of times each block is executed, and the number of transfers between each assembly block. Sparks et al. [2007] also added two absorbing states (an acceptance state and a rejection state) to their graph. These states permitted the Sidewinder fuzzer to target vulnerable functions (e.g. strcpy()) by defining those vulnerable functions as acceptance states and assigning any state without a path to an acceptance state as a rejection state.

A coarser control flow graph can be constructed by graphing transitions between functions instead of between assembly blocks. In this graph, vertices are functions and edges record function calls. A formalization of this coarser model, used by this dissertation, is a weighted directed graph G = (V,E) where V = {f1, f2, f3, ..., fn} and E is the set of edges connecting vertices in V. Each fi in V corresponds to a function in a shared library, and n is the total number of functions identified by statically analyzing the shared library. The edges between vertices are function calls observed within a shared library when a program processes a negative test case. These definitions of V and E accurately model execution of functions within a shared library, yet a program calls shared library functions from code defined outside of the shared library, and shared library functions may also call functions not defined by the shared library. The current vertex set (V) does not account for these situations. To account for these undefined states, the vertex set must include a special vertex (S0), where S0 represents any function not implemented by the shared library. This node serves as the starting state for the directed graph (G). The amended vertex set is now

V = {S0} ∪ {f1, f2, f3, ..., fn}. Figure 5.4 illustrates a control flow graph gathered from two distinct execution traces of the same shared library.

(a) Execution Trace One    (b) Execution Trace Two

Figure 5.4: Function Control Flow Graph

Each weighted directed graph in Figure 5.4 has V = {S0} ∪ {f1, f2, f3, f4, f5, f6, f7, f8, f9}. Grey vertices denote functions not called while tracing execution for the current test case, and red vertices highlight functions called during program execution with the current test case.

The grey vertices and transitions to these vertices are included in both graphs to demonstrate how edge weights are modified as more negative test cases are executed. They illustrate how historical data from previous test case runs influences current edge weights and the calculation of function path uniqueness. Since historical data influences function path uniqueness, edge weights are recalculated after each negative test case.

For the control flow graph in Figure 5.4a, the observed function calls are E = {(S0, S0), (S0, f1), (S0, f4), (f1, f5), (f4, f4)}, and the observed function calls for Figure 5.4b are E = {(S0, S0), (S0, f1), (S0, f2), (f1, f3), (f2, f6), (f6, f7), (f6, f8), (f6, f9)}. As mentioned previously, these control flow graphs can be converted into a discrete-parameter, finite-state Markov chain by assigning probabilities to each of the edges. If each negative test case is treated as a sample of all possible execution traces, then stationary probabilities can be assigned to control flow graph edges based on the current set of samples [Sparks et al. 2007]. However, this dissertation does not define absorbing acceptance or rejection states, because the metric being calculated (function path uniqueness) does not direct execution towards any predetermined state. Also, labeling a function as an absorbing state would require adding a transition from the function to itself in the directed graph; such a loop would imply the function makes a recursive call, which may not be correct. The directed graphs assembled for function path uniqueness are similar to Markov chains in that probabilities are associated with edges. However, those edge weights are used to measure the uniqueness of execution paths, rather than being part of a stationary probability computation. To assign edge weights, shared library functions are monitored as a program runs a negative test case. The monitoring reports the number of times a function was executed and the number of calls to other functions. When execution of a negative test case ceases, weights are assigned to each edge by the formula:

p_{ij} = t_{ij} / v_i (5.5)

where p_{ij} is the weight of the edge from vertex i to j, t_{ij} is the number of transitions from i to j, and v_i is the total number of transitions from state i. After each negative test case, the edge weights are updated to incorporate information gathered from past test cases and the current test case. These weights can be summarized by a matrix. Figure 5.5 summarizes the function calls collected from running the two negative test cases depicted by the control flow graphs in Figure 5.4. Notice in Figure 5.5a the matrix is sparse and populated only by function calls found in Figure 5.4a. In Figure 5.5b, the matrix combines tracing information collected from both execution traces. An example of combining information is the transition from state S0 to state f4.

This transition did not occur during the second execution trace, but its edge weight is updated in the control flow graph. The number of observed transitions from S0 to f4 in execution trace one also impacts the edge weight calculated for the transition from S0 to f1 observed in execution trace two.

       S0     f1     f2     f3     f4     f5     f6     f7     f8     f9
S0   10/20   5/20    0      0     5/20    0      0      0      0      0
f1     0      0      0      0      0     5/5     0      0      0      0
f2     0      0      0      0      0      0      0      0      0      0
f3     0      0      0      0      0      0      0      0      0      0
f4     0      0      0      0     5/5     0      0      0      0      0
f5     0      0      0      0      0      0      0      0      0      0
f6     0      0      0      0      0      0      0      0      0      0
f7     0      0      0      0      0      0      0      0      0      0
f8     0      0      0      0      0      0      0      0      0      0
f9     0      0      0      0      0      0      0      0      0      0

(a) Matrix One

       S0     f1     f2     f3     f4     f5     f6     f7     f8     f9
S0   20/40  10/40   5/40    0     5/40    0      0      0      0      0
f1     0      0      0     5/10    0     5/10    0      0      0      0
f2     0      0      0      0      0      0     5/5     0      0      0
f3     0      0      0      0      0      0      0      0      0      0
f4     0      0      0      0     5/5     0      0      0      0      0
f5     0      0      0      0      0      0      0      0      0      0
f6     0      0      0      0      0      0      0     1/5    2/5    2/5
f7     0      0      0      0      0      0      0      0      0      0
f8     0      0      0      0      0      0      0      0      0      0
f9     0      0      0      0      0      0      0      0      0      0

(b) Matrix Two

Figure 5.5: Transition Probability Matrices

Now that the control flow graph has edge weights assigned to the observed function calls, a metric can be devised to measure the uniqueness of a call trace. Sparks et al. [2007] proposed equation 5.6 as a method for calculating uniqueness.

f(x) = 1 / ∏_{i=1}^{l} p_i (5.6)

In this equation, l is the length of the recorded transitions for a single negative test case and p_i is the probability of each transition occurring. By taking the product of the executed transition probabilities and dividing one by this product, the equation rewards a test case executing rarely taken paths [Sparks et al. 2007]. However, this equation suffers from underflow when a recorded execution path is sufficiently long. To combat underflow, this research uses equation 5.7.

f(x) = −∑_{i=1}^{l} log p_i (5.7)

Like equation 5.6, l is the length of the recorded transitions (the number of function calls recorded) and p_i is the edge weight associated with each function call.

By summing the logarithms of these edge weights and negating the sum, the equation can still reward test cases executing rarely used paths while avoiding underflow. With equation 5.7, as with equation 5.6, higher values of f(x) denote fitter chromosomes.
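The sketch below ties Equations 5.5 and 5.7 together on made-up transition counts (names and values are illustrative): edge weights are computed per source vertex, and a recorded call trace is scored by the negated sum of log weights, so rarely taken edges contribute the most.

    import math
    from collections import Counter, defaultdict

    # Hypothetical accumulated transition counts over all negative test cases so far.
    transitions = Counter({("S0", "f1"): 10, ("S0", "f2"): 5, ("f1", "f5"): 5, ("f2", "f6"): 5})

    # Equation 5.5: weight of edge i->j is transitions i->j over all transitions out of i.
    out_totals = defaultdict(int)
    for (i, j), count in transitions.items():
        out_totals[i] += count
    weights = {(i, j): count / out_totals[i] for (i, j), count in transitions.items()}

    # Equation 5.7: uniqueness of one recorded call trace; the rare S0->f2 edge
    # (weight 1/3) contributes more than the common f2->f6 edge (weight 1).
    recorded_trace = [("S0", "f2"), ("f2", "f6")]
    fitness = -sum(math.log(weights[edge]) for edge in recorded_trace)   # about 1.10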

Function Coverage (V)

Variable V represents the percentage of functions executed while running a test case. This percentage is calculated by dividing the total number of unique functions executed by the total number of unique functions in a shared library. An increase in function coverage means a test case executed more of a shared library’s functions. Executing more functions may increase the likelihood of eliciting an error. However, since a program can use a shared library but not call every function within the shared library, it may not be feasible to achieve high percentages of function coverage.

Fitness Function Implementation

Mamba implements fitness function evaluation using an ANother Tool for Language Recognition (ANTLR) parser. The framework includes an ANTLR grammar specification defining valid mathematical operators and valid variables. The grammar permits the mathematical operators +, -, *, /, ^, (), and natural log. Real numbers can be added to equations, as can modifiable and unmodifiable variables. The grammar also preserves order of operations and requires specifying a mathematical equation in infix notation. After a program under test ceases executing a negative test case, the ANTLR parser tokenizes the configured fitness function and resolves variables through Ruby's metaprogramming facility. Variable resolution occurs by the parser querying the Ruby Plympton module's Object class to determine whether a method is defined matching the current variable's name. If a method exists, it is called with the appropriate parameters. The method then calculates the value represented by the variable by consulting the extracted attribute information and hit tracing information, and returns the calculated value to the parser. The parser substitutes the value into the equation and continues evaluating the remaining expression. Once the parser finishes its evaluation, the result of the evaluated expression is returned to the genetic algorithm fuzzer as the chromosome's fitness.
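The framework's grammar and resolver are ANTLR and Ruby; as a rough analogue only, the Python sketch below shows the same idea of resolving variable names to methods before evaluating an infix expression (the metric values and the naive substitution scheme are invented for illustration).

    import math

    class TraceMetrics:
        # Hypothetical per-test-case metrics resolved from hit tracing and the
        # extracted disassembly attributes.
        def Ba(self): return 6.5     # average number of local variables
        def U(self): return 12.25    # execution path uniqueness
        def V(self): return 0.5      # function coverage

    def evaluate(expression, metrics):
        # Naively substitute each known variable name with its resolved value,
        # then evaluate the infix expression.
        for name in ("Ba", "U", "V"):
            expression = expression.replace(name, str(getattr(metrics, name)()))
        return eval(expression, {"log": math.log})

    print(evaluate("Ba + 2 * U", TraceMetrics()))   # 31.0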

5.5 Performance

The Mamba fuzzing framework now contains several genetic algorithm fuzzers, as described previously. While each of these genetic algorithms performs well on certain domain specific search problems, which genetic algorithm is best suited for finding software defects? To investigate this question, each genetic algorithm fuzzer's performance must be analyzed. A common practice for vetting a fuzzer is a direct comparison with other fuzzers that have proven capable of uncovering software defects. For this comparison, the steadfast mutational fuzzer, Mangle, and the prolific generational fuzzer, Peach, have been selected for benchmarking. Over the years, each of these fuzzers has found many defects in a plethora of programs. Comparing multiple genetic algorithm fuzzers against these time-tested fuzzers helps ascertain each genetic algorithm's aptitude for locating software defects.

To compare these fuzzers against Mamba's genetic algorithm fuzzers, a set of meaningful metrics must be defined to measure fuzzer performance. Some individuals have proposed measuring performance based on coverage of previous vulnerabilities, the amount of program input space generated, or the number of interfaces supported (file, network, command line, etc.) [Takanen et al. 2008]. While these metrics are interesting from several perspectives, in my opinion, two important metrics are a fuzzer's average negative test case execution time and the number of unique defects found. Average test case execution time accurately informs a tester of the number of test cases that can be generated and executed in a finite time period. Fuzzer designers strive to decrease average execution time, because if a fuzzer can execute more negative test cases in a shorter time frame, then the chances of it finding a software defect are likely increased. However, one fuzzer generating and executing more negative test cases than another does not guarantee a better rate of return: if the negative test cases generated by a slower running fuzzer are more prone to uncover defects, then the slower fuzzer's higher defect count favors running the slower fuzzer. In the best case, a fuzzer should generate and execute as many quality negative test cases as possible in a reasonable time period. Unique defect count, when normalized on a program or set of programs across multiple fuzzers, is a reliable measure of an individual fuzzer's ability to unearth defects. For these reasons, all fuzzers tested by this dissertation are measured and analyzed based on both of these metrics.

Also, a testing environment for fuzzer comparison must be standardized to ensure each fuzzer runs in the same environment. The standardization guarantees each fuzzer runs on the same operating system version and tests the same version of an application, ensuring a fair assessment of each fuzzer's abilities.

The testing environment created for the experiments conducted by this dissertation is the Mac OS X 10.6.5 build 10H574 operating system, and all fuzzers generate negative test cases for the Preview application version 5.0.3 (504.1). The test environment includes four Mac Pro desktops of heterogeneous hardware configuration installed with these software versions. Table 5.4 documents the general hardware configuration of each node in the testing environment. The environment is purposefully heterogeneous, because a heterogeneous testing environment replicates practical usage of the Mamba framework and gathers metrics from multiple machine configurations.

Table 5.4: Test Environment Setup

Hostname   Processor                      RAM
ARP        Quad-Core Intel Xeon 2.6 GHz   5.0 GB
GARP       Quad-Core Intel Xeon 2.6 GHz   1.0 GB
RARP       Quad-Core Intel Xeon 3.0 GHz   9.0 GB
SLARP      Quad-Core Intel Xeon 3.0 GHz   1.0 GB

For this performance evaluation, Mangle, Peach, and Mamba's four genetic algorithms, along with the slightly modified genetic algorithm SGA-MIP, conducted six different tests on each of the four fuzzing machines. On each machine, each fuzzer independently generated and executed 500, 1000, 1500, 2000, 2500, and 5000 negative test cases for the Preview application. Experiments recorded the amount of time required for each fuzzer to generate and execute each set of test cases on each machine. After recording these times, the average execution time for a single test case was calculated for each set on each machine.

Figure D.1 and Figure D.2, found in Appendix D with all other graphs, plot the average execution time observed for each set of test cases on the four different machines for Mangle and Peach. These execution time graphs show the average execution time for a single negative test case to be around 4 seconds for each fuzzer. Mangle's average execution time consistently remains around 4.4 seconds on each test machine for each set of test cases. The execution time slightly increases and decreases as the number of negative test cases increases, with one anomaly encountered on the GARP computer's 2000 negative test case set. For Peach, average execution time varies more than Mangle's. On each test computer, average execution time rose and fell as the number of negative test cases increased, but, overall, it still remained consistently around 4 seconds per test case. This variation between fuzzers is explained by the different attack heuristics implemented by each. Peach inserts targeted attack heuristics, known troublesome values, systematically into different sections of a PDF model. In Peach's case, average execution time depends upon the location and value of the inserted attack heuristic.

Mangle's attack heuristic algorithm deviates from Peach's more precise fuzzing by altering more bytes randomly throughout a negative test case. This coarser approach tends to execute error handling code in Preview more often, which halts processing of a negative test case early. These error checks smooth Mangle's average negative test case execution time.

Figures D.3, D.4, D.5, D.6, and D.7 illustrate the average execution time of a negative test case for Mamba's SGA, SGA-MM, SGA-MIP, BGA, and CHC-GA algorithms. The SGA algorithm and its variants were configured with a population size of 500, a crossover rate of 30%, a mutation rate of 50%, a fitness function of Ba, and shared library tracing for libFontParser.dylib. A population size of 500 was chosen to introduce a significant amount of variability within the initial population. Larger populations with higher variation between chromosomes can help prevent premature convergence, because the algorithm can select diverse parents. A conservative crossover rate of 30% permits the algorithm to retain more chromosomes from the previous population without alteration. Since crossover, applied to files as chromosomes, is a destructive operation, children derived through crossover have a higher probability of being halted by error handling code. The mutation rate is set higher than the crossover rate, because mutation is a less destructive operation than crossover: random insertion of an attack heuristic at a byte offset is not as destructive as mixing parts of two files. The only exception to this comparison occurs with SGA-MM; with this algorithm, the Mangle algorithm as the mutator is as destructive as crossover, but the mutation rate was left the same for consistency. The fitness function tried to maximize the average number of local variables (Ba) over all functions executed in the font parsing library, libFontParser.dylib. This library is historically known for containing defects [Mitre 2010].

For BGA, all parameters remained the same except for the mutation rate. This parameter was significantly decreased to 0.1% due to the change in mutation semantics between SGA and BGA. Since BGA applies the mutation rate to every gene in a chromosome, the mutation operator is highly destructive; to limit the probability of error handling code halting execution, this rate was decreased. The CHC-GA was configured to have the same effect as the BGA algorithm. Since CHC-GA does not have a mutation rate, the crossover rate was decreased from 50% (HUX) to only 0.2%. This rate allows only 0.2% of the parents' differing genes to be copied into a child. Once again, the lower rate tries to lessen the probability of error handling code halting processing of derived children.

As illustrated by these graphs, all genetic algorithm fuzzers add overhead to the average negative test case runtime. Average execution time increases by 6 to 10 times the average execution time of Mangle or Peach. This increase is due to these algorithms leveraging Valgrind emulation to trace shared libraries. Valgrind emulation by itself slows down execution time by 2.5 times to 93 times normal program execution time [Nethercote 2004].

The tool developed for this dissertation, Rufus, is a lightweight hit tracing tool, and, as such, its execution time impact aligns with Valgrind's less invasive tools, Nulgrind and Addrcheck [Nethercote 2004]. It is anticipated that the slowdown associated with the Rufus tool should not exceed 20 times normal program execution time.

The graphs also depict another interesting trend based on the number of negative test cases generated. As this number increases, SGA's average execution time decreases almost linearly, BGA's average execution time gradually decays towards 20 seconds per test case, and CHC-GA's decreases sharply after the first generation and then increases periodically as the number of generations grows. These decreases, in general, are attributable to error checks in Preview. As these genetic algorithms produce new generations, the populations slowly come to include corrupted files carrying more attack heuristics. Preview is likely to discard more corrupt files earlier in processing, because additional error checks are encountered. This phenomenon causes the average negative test case execution time to slowly decrease as the number of test cases generated increases. For CHC-GA, the periodic increases in execution time are attributable to its cataclysmic mutation operator. As the population converges, cataclysmic mutation adds variability back into the population by inserting chromosomes from the initial population. Since these chromosomes are not corrupted, Preview runs the test cases for longer time periods, causing upward movement in average execution time.

At this point, based purely on average execution time, traditional fuzzers outperform genetic algorithm fuzzers. Mangle and Peach can clearly generate and execute more negative test cases for Preview in a shorter time period than any of Mamba's genetic algorithms. However, an interesting result emerges from an analysis of the number of unique defects found per fuzzing algorithm. Figure D.8 illustrates the number of unique defects found by each algorithm from all testing runs across all test machines. The graph depicts an approximately 6 to 8 times increase in the number of unique defects found between Mangle and the SGA, CHC-GA, and BGA algorithms. When compared against Peach, these genetic algorithms find between 2 and 3.5 times the number of unique defects. A comparison of all the algorithms shows Mangle finding the fewest unique defects, with 2 segmentation violations. SGA-MIP followed with 2 segmentation violations and 1 floating point exception. Peach beat SGA-MIP by finding 4 segmentation violations and 1 floating point exception. SGA-MM ranked fourth overall, by this metric, with 7 segmentation violations. CHC-GA ranked third overall with 11 segmentation violations and 1 floating point error. SGA and BGA found 13 and 17 unique defects, respectively; both of these algorithms uncovered multiple segmentation violations, bus errors, and floating point exceptions.

Moreover, all of the defects found by Mangle and most of the defects found by Peach are subsets of the defects found by the genetic algorithms. Peach uncovered one segmentation violation not found by any of the genetic algorithms. SGA, CHC-GA, and BGA have some overlapping defects, but the defects of SGA and CHC-GA are not a proper subset of BGA's.

Out of all the unique defects found by all algorithms, five were determined by Apple's crashwrangler software to be possibly exploitable, meaning malevolent individuals could run arbitrary code. Mangle uncovered one of these exploitable defects, and the genetic algorithms found the rest. Multiple defects were found in the font parsing library, and the rest were found in other shared libraries used by Preview.

Based on the number of unique defects found, Mamba's genetic algorithms appear promising for finding defects in Apple's Preview application. These algorithms outperformed Mangle and Peach in finding unique defects when the number of generated test cases is limited. Moreover, Mangle and Peach's poorer performance may be due to their algorithms. Mangle alters many bytes of a test case, so most of its negative test cases are likely too corrupt for processing; Preview may catch these cases early and not permit them to reach deeper sections of code. This theory is corroborated by the observed poor performance of the SGA-MIP and SGA-MM genetic algorithms, since Mangle is used to alter chromosomes in these genetic algorithms. Peach, on the other hand, replaces data in a predetermined sequential algorithm; by limiting the number of test cases generated, Peach may not reach its more sophisticated attack heuristics. Since the genetic algorithms, especially CHC-GA and BGA, slowly and arbitrarily corrupt files over several generations, these algorithms create varied negative test cases with smaller amounts of corruption. These negative test cases are less likely to be caught by Preview's error handling code early in file processing. Once again, this explanation is supported by the observation that the SGA-MM and SGA-MIP fuzzers uncover fewer unique defects than SGA, CHC-GA, or BGA.

While Mamba's genetic algorithms show promise, with a small number of negative test cases causing many faults, the average negative test case execution time prohibits recommending these algorithms as feasible strategies. In order to make these algorithms attractive, the average execution time must be decreased to a level competitive with Mangle and Peach. By decreasing average execution time while retaining their ability to unearth more defects in Preview with fewer test cases, Mamba's genetic algorithms may prove beneficial for fuzzing. The next chapter, Chapter 6, introduces a strategy for combating high average execution time.

Chapter 6

Distributed File Fuzzing

Genetic algorithms proved promising in the previous chapter when fuzzing Apple’s Preview application. They exceeded other popular fuzzers, Mangle and Peach, at uncovering unique defects. However, a comparison of average negative test case execution time revealed genetic algorithm fuzzers incurring a 6 to 10 times increase in average execution time over other fuzzers. This incurred overhead is mainly a symptom of emulation. While these algorithms are promising in terms of unique defects found, for genetic algorithms to be competitive, they must mitigate increases in average execution time due to emulation. One approach to combating an increase in execution time is to determine an algorithm’s complexity, find and eliminate inefficiencies, or find opportunities to distribute time intensive tasks to multiple processors. To paraphrase a corollary of Amdahl’s Law, make the common or most time intensive case fast [Amdahl 2000] [Tanenbaum and Goodman 1998]. Let us review Mamba’s SGA algorithm to determine where improvements are possible, and if these improvements lead to a decrease in execution time. Examining the 5,000 test case run of SGA with 500 population members reveals a total execution time for seeding an initial population, evaluating all chromosomes for fitness, and generating the next population based on their fitness to be 14,504 seconds. This measurement is sampled from the ARP computer’s 5,000 test case run shown in Figure D.3. Dissecting this execution time further divulges that seeding an initial population requires 50 seconds, chromosome fitness evaluation lasts for 14,434 seconds, and reproduction takes 20 seconds. Clearly, improvements made with chromosome fitness evaluation should positively impact average negative test case execution time. Fortunately, improvements in chromosome fitness evaluation are possible, because this component of a genetic algorithm can be distributed. Chromosome fitness evaluation is an independent task meaning each chromosome is independently tested and measured by a pre-defined fitness function.

Results from one chromosome's fitness evaluation do not alter the results of another chromosome's fitness. By distributing this component, multiple processors can independently evaluate sets of chromosomes, thereby partitioning a genetic algorithm fuzzer's most computationally intensive task. Applying this approach to Mamba's SGA example revises observed execution time by dividing it into two components, serial and parallel. SGA's serial component ran for a total of 70 seconds (0.5% of observed execution time), while SGA's parallel component absorbed most of the observed execution time with 14,434 seconds (99.5% of observed execution time). Based on these observations, Mamba's SGA algorithm should be capable of achieving improved performance, in terms of speedup, by distributing chromosome fitness evaluation, namely program emulation. A version of Amdahl's Law, shown in Equation 6.1, can help estimate anticipated execution time based on parallelization.

Speedup = fT + (1 − f)T / n    (6.1)

In this equation, variable f represents the fraction of time required by an algorithm’s serial component. Variable T corresponds to the observed execution time on one processor, and variable n is the number of processors assigned to process the algorithm’s parallel component [Tanenbaum and Goodman 1998]. For Mamba’s SGA example, f equals 0.005, T equals 14,504 seconds, and n is defined to be 4. This assignment of n means SGA’s distributed version distributes chromosome fitness evaluation to four processors. Substituting these values into Equation 6.1 yields an anticipated execution time of 3,680.64 seconds for evaluating and producing one generation. Dividing this result by the total number of negative test cases executed per generation (500) produces an average execution time of 7.36 seconds. These calculations highlight a performance bottleneck for Mamba’s genetic algorithm fuzzers. By distributing chromosome fitness evaluation to only four processors, theoretically, a revised version of Mamba’s SGA algorithm may realize a speedup ratio of 3.92. However, these calculations fail to account for communication overhead or inefficiencies introduced as a result of distributing algorithm components. While these calculations do not reflect possible overhead, these optimistic calculations do provide a reason to devise a distributed fuzzing architecture. They also set a goal for a distributed architecture to strive to achieve. The remaining portion of this chapter designs a distributed architecture striving to achieve the anticipated speedup ratio.
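For reference, the short Ruby sketch below reproduces this arithmetic. It uses only the measurements quoted above and is purely illustrative; it is not part of Mamba.

    # Plug the measured values into Equation 6.1 (illustrative sketch).
    f = 0.005        # fraction of execution time that is serial (seeding + reproduction)
    t = 14_504.0     # observed single-processor execution time for one generation (seconds)
    n = 4            # processors evaluating chromosome fitness in parallel
    population = 500 # negative test cases executed per generation

    parallel_time = (f * t) + ((1 - f) * t) / n
    puts "anticipated generation time: #{parallel_time.round(2)} seconds"       # roughly 3,680 seconds
    puts "average per test case: #{(parallel_time / population).round(2)} seconds"  # roughly 7.36 seconds
    puts "anticipated speedup ratio: #{(t / parallel_time).round(2)}"           # roughly 3.9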

6.1 Architecture

Distributing an algorithm requires careful planning to avoid introducing bottlenecks between multiple processors or algorithm components. When distributing an algorithm, one of the main goals,

aside from avoiding bottlenecks, is to increase efficiency leading to a decrease in overall execution time. To achieve this goal, an algorithm's designer must identify an algorithm's parallel components, then devise a strategy for how multiple processors will communicate and co-operate to ensure efficient and scalable processing. Efficiency, in this instance, measures utilization of multiple processors as derived from observed speedup. This performance metric illustrates perceivable bottlenecks in communication between multiple processors and actual processor usage. Scalability, as measured by speedup, shows that the addition of more processors will decrease execution time. As illustrated in this chapter's introduction, in the Mamba framework, distributing a genetic algorithm's chromosome fitness evaluation can potentially reduce average negative test case execution time. For this distribution, chromosomes must be divided between multiple processors for fitness evaluation, and these fitness metrics must be recorded for biasing reproduction. Since reproduction selects from an entire population based on fitness and combines chromosomes yielding a new population, a serial processing environment supports reproductive tasks better. It is conceivable to distribute reproduction to create a completely parallel genetic algorithm [Sivanandam and Deepa 2007a], but distributing chromosome fitness evaluation achieves a majority of efficiency and speedup gains. Based on these facts, Figure 6.1 illustrates a generalized distributed architecture for Mamba's genetic algorithms based on the master-slave parallelization method [Sivanandam and Deepa 2007a].

[Figure 6.1 diagram: the five distributed components (Organizer, Distributor, Queues, Workers, and Data Store) and the connections between them.]

Figure 6.1: Distributed Fuzzer Architecture

As shown by the figure, Mamba's distributed architecture consists of five components, the first of which is a fuzzing organizer. An organizer is the master of a fuzzing job, serving as a central command

and control node for an entire fuzzing cluster. Organizers are responsible for creating a fuzzing job, posting information about a job through a distributor, handling command and control of all nodes through managed queues, aggregating chromosome fitness metrics, performing reproduction, storing negative test cases in a data store, and halting fuzzing on all nodes when complete. Organizers are arguably the most vital component of a Mamba distributed fuzzing job. To improve usability, Mamba implements multiple thor tasks for organizer functionality. Before using these tasks, a distributed fuzzing environment must be created by running the mamba command line utility with the -d option. This option causes Mamba to append additional parameters to the global configuration file, Mamba.yml. Figure 6.2 displays an example of a distributed configuration file.

---
:app: /Applications/Preview.app/Contents/MacOS/Preview
:executor: appscript
:os: darwin10.7.0
:timeout: 0
:type: DistributedCHCGeneticAlgorithm
:uuid: fz6fe26cc4ed2a11e091700026b0fffee7
:organizer: true
:server: fuzz.io
:port: 27017

Figure 6.2: Mamba.yml Distributed Global Configuration File

Aside from the normal parameters, the first addition to this configuration file is the organizer flag. This boolean value directs Mamba to run in either an organizer or worker operational mode. Since these modes share code sections within Mamba, the organizer flag directs Mamba to execute different code paths when necessary. In the absence of this flag, Mamba always executes as a worker. The next parameter, server, sets the Internet Protocol (IP) address or hostname of the server through which the organizer and workers communicate. This server houses a distributed data store for negative test cases and queues for command and control. An organizer can house both of these components, and, to encourage operating in this manner, Mamba installs all required infrastructure for these components when the Mamba gem is installed. However, Mamba provides the flexibility to run these components independently in order to increase scalability and dedicate a server specifically to data storage and queueing. Running these services on different servers aligns with Microsoft's distributed file fuzzing architecture [Conger et al. 2010]. The last added parameter, port, defines the port on the server used to connect to the data storage component. As briefly mentioned in Chapter 4, the uuid parameter serves an important role in distributed fuzzing. It defines a unique identifier for a fuzzing job, and the distributed data store and

queueing systems use the uuid in their naming conventions. By using this unique string to name the data store and queues, an organizer can host multiple instances of distributed fuzzing jobs. This unique string also composes part of the file name for Mamba package files. Package files are a convenient mechanism for distributing configurations to other nodes within a fuzzing cluster. An organizer generates these files after its environment is set up. Configuring an organizer consists of changing algorithm-specific configuration file parameters, storing shared library disassemblies in the environment's disassemblies directory, and creating an initial population seed file in the environment's tests directory. Once the environment is configured accordingly, running the 'thor fuzz:package' task creates a package file. This task generates a zip file named uuid.mamba where the uuid is the unique identifier defined in the Mamba.yml configuration file. To create this file, the task compresses all files contained in the fuzzing environment's configs, disassemblies, and tests directories. The task also alters the organizer parameter in Mamba.yml to be false, since an organizer distributes this package file to workers only. After package file creation, an organizer uses the next distributed model component, a distributor, to dispense the package file to potential workers. A distributor is an entity or method for relaying packaged configurations to interested workers. Distributors are meant to tackle a problem encountered by many independent security researchers. This problem is the need for a homogeneous distributed environment for generating and running negative test cases against an application. While such homogeneity is not necessary for a centralized fuzzer, in a distributed environment, every node's configuration should be consistent to guarantee reproducibility of crashes. Several hurdles may impact the ability of a security researcher to create a homogeneous environment. These issues include financial barriers due to multiple hardware instances or licensing issues associated with running multiple instances of an operating system. While virtualization has decreased costs, setup and maintenance of a distributed fuzzing environment remains a factor. A method for alleviating these issues is to provide an open call for other individuals to join in testing, thereby distributing associated costs, licensing, and maintenance across multiple users. This method is similar to popular distributed computation platforms like SETI@home. The user collective, or crowd, can then build a private fuzzing network by downloading or trading Mamba packaged configurations. A distributor is the method or centralized distribution point for providing an open call for participation. Example distributors can be as high tech as a web application where willing participants sign up to contribute CPU time [Conger et al. 2010] or as low tech as emailing Mamba package files to a closed group. For this research, distribution was achieved by copying packaged Mamba configurations to a controlled worker group. However, Mamba's distributed model does permit other distributor types without any alteration, and this area is open for further research.
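To make the packaging step described above concrete, the sketch below approximates what a task like 'thor fuzz:package' might do: read the uuid from Mamba.yml, archive the configs, disassemblies, and tests directories into uuid.mamba, and mark the packaged configuration as a worker copy. It assumes the rubyzip gem and is an illustration of the workflow, not Mamba's actual implementation.

    require 'yaml'
    require 'zip'   # rubyzip gem (an assumption; Mamba's real task may compress files differently)

    config = YAML.load_file('Mamba.yml')
    uuid   = config[:uuid]

    Zip::File.open("#{uuid}.mamba", Zip::File::CREATE) do |package|
      # Archive the directories an organizer shares with its workers.
      %w[configs disassemblies tests].each do |dir|
        Dir.glob("#{dir}/**/*").select { |path| File.file?(path) }.each do |path|
          package.add(path, path)
        end
      end

      # Workers never act as the organizer, so the packaged copy flips the flag.
      worker_config = config.merge(:organizer => false)
      package.get_output_stream('Mamba.yml') { |io| io.write(worker_config.to_yaml) }
    end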

Once a worker discovers a fuzzing job worthy of joining and its environment matches the organizer's environment, an individual controlling a worker node downloads the Mamba package file. A matching environment means the worker node runs the same operating system version and application version being tested by the organizer. After the package file is downloaded, the worker creates a fuzzing environment with the mamba command line utility and copies the package file into the environment. During creation, the fuzzing environment's type (-y option) is inconsequential, because Mamba implements a thor task, 'thor fuzz:unpackage', that erases the current environment's configuration, unzips the packaged file, and configures the environment to work with the organizer. All files packaged by the organizer, configs, disassemblies, and tests, are installed on the worker by using the unpackage thor task. Once unpackaged, the worker can then contribute to the distributed fuzzing environment by running 'thor fuzz:start'. Packaging and distributing Mamba configuration information lowers setup time and potentially opens co-operative distributed fuzzing to a larger audience. It can also reduce software dependencies, such as IDA Pro, since a fuzzing organizer already packages and exports disassembly information with Mamba package files. Organizers, workers, and distributors are conceptually complete at this point, but key components of a distributed system are still missing. Distributed systems must have the ability to divide work between multiple processors and share data. In Mamba's distributed architecture, division of labor and control mechanisms originate from the organizer, and the organizer distributes these messages via a queueing system. A centralized data store accumulates shared data to relay between an organizer and workers. For these components, Mamba installs and uses Rabbitmq and Mongodb, respectively. Rabbitmq is an implementation of the Advanced Message Queuing Protocol (AMQP) specification. AMQP defines an open standard for peer-to-peer messaging. The standard prescribes an encoding for peer-to-peer messages and a layered messaging architecture built upon the message encoding. This layered architecture adds transport mechanisms, formatting, transactions, and authentication and encryption mechanisms [Group 2011]. Rabbitmq implements this standard in the functional programming language Erlang. Before delving into Mamba's Rabbitmq usage, a few terms specific to AMQP must be defined. AMQP's messaging architecture behaves much like a producer and consumer messaging model. Producers generate messages in AMQP format and deliver those messages to an exchange. An exchange is akin to a mailbox or namespace with semantics associated to an exchange based on its type. Mamba utilizes two of the three possible exchange types, direct and topic. A direct exchange is analogous to unicast routing where a producer sends a message to only one consumer. On the other hand, a topic exchange routes messages sent by a producer to multiple queues based on a

routing key. A routing key defines a textual pattern with or without wildcards (*) to compare with an incoming AMQP message's header. Based on the matching keys, an exchange appends a message to one or more queues assigned to the key. Queues are entities storing messages from a producer, and they deliver those messages to subscribed consumers. Queues bind to different exchanges to create routing scenarios for published messages. Consumers subscribe to these queues to receive messages [Group 2011]. For Mamba, an organizer creates a direct exchange and binds a queue named testcases to this exchange. Then, the organizer and all workers subscribe to this queue. Once a population of chromosomes (negative test cases) is ready for fitness evaluation, the organizer publishes messages to the direct exchange telling all subscribed consumers which chromosomes to evaluate. Each message published to the exchange is a string of the form "generationNumber.chromosomeID". The generationNumber specifies the generation currently being evaluated, and the chromosomeID specifies a chromosome's unique identifier within the current population. Chromosome IDs number the chromosomes from zero to the population size minus one. Once the organizer publishes a message to the direct exchange, all consumers subscribed to the testcases queue receive these messages by round-robin scheduling. This ordering means the first subscribed consumer receives the first message, the next consumer receives the next, and so on until the queue is empty. When a worker finishes evaluating a chromosome, it acknowledges receipt and processing of the message, and the worker requests another test case from the queue to evaluate. This round-robin scheduling with acknowledgements guarantees persistence of test case delegations and on-demand servicing. Additionally, a Mamba organizer creates a topic exchange and binds four queues to the exchange. The organizer assigns each of the routing keys, commands, crashes, remoteLogging, and results, to one of the four queues. If an AMQP message is sent to this exchange with one of these keys, then the exchange delivers the message to the appropriately bound queue. All consumers, the organizer and workers, receive messages published to these queues. The commands queue handles messages for halting all nodes. Once a fuzzing algorithm approaches a stopping condition, the organizer publishes a shutdown message to the exchange with the commands routing key. All workers receive this message and halt their running instances. The crashes queue notifies all participants of a negative test case causing a crash. When an organizer or worker detects a crash from a negative test case, the participant publishes a message to the topic exchange with the crashes routing key. Distributed crash notification helps crash analysis after fuzzing ceases. The remoteLogging queue is for informational messages to be sent to all participants. These messages show up in the organizer's and workers' logs after receipt. The results queue is a special queue. Once a worker calculates a chromosome's fitness, the worker posts this fitness value to

the topic exchange with the results routing key. The format of the message is of the form "generationNumber.chromosomeID:fitness". The organizer then collects these results and parses the textual string to assign fitness metrics to a chromosome. Once all chromosomes in a population are evaluated, the organizer uses these fitness measures to bias selection and produce the next generation. The queueing system behind Mamba's distributed environment appears complex, but Mamba hides all of these intricacies from its users. Mamba implements thor tasks to start and stop the Rabbitmq component. These tasks are 'thor distrib:qstart' and 'thor distrib:qstop'. The start task forks a detached instance of Rabbitmq and directs it to store queuing information in the fuzzing environment's queues directory. The start task also redirects Rabbitmq's logging information to the fuzzing environment's logs directory. The stop task wraps the rabbitmqctl utility to stop the currently running instance of Rabbitmq. As will be seen with all distributed components, Mamba implements thor tasks for distributed components in the distrib namespace. Hence, a task controlling any component of the distributed environment will be of the form distrib:action. Rabbitmq suffices for the queueing component of Mamba's distributed architecture, but it does not solve Mamba's data storage problem. Queues are good for storing short messages and delivering those messages to one or more consumers. However, queues are not good for persistent storage of data several megabytes in size. For Mamba's genetic algorithms, chromosomes are files, so chromosomes are often several megabytes. Also, if a chromosome causes an application to crash, then the chromosome must be retrieved from storage for analysis after fuzzing ceases. Mamba's distributed architecture is also designed to require minimal configuration, so that setting up a fuzzing job demands little system administration knowledge. Making this a requirement of Mamba lowers the threshold for any individual to host or join a distributed fuzzing environment. Multiple software packages implement functionality required for a distributed data store. The predominant solution for distributed data storage and processing is the Hadoop Distributed File System (HDFS). Unfortunately, HDFS employs a complicated configuration scheme leveraging XML files for setting configuration parameters. While this configuration scheme is appropriate for an enterprise level distributed filesystem, it is unsuitable for an average user wanting to quickly host a distributed fuzzing environment. Fortunately, the recent NoSQL movement produced a viable solution for a distributed data store with an option for minimal user configuration via the command line. This solution is the document-oriented database, Mongodb. Mongodb is an upcoming competitor to traditional relational databases like MySQL. Released in February 2009, Mongodb provides a cross-platform solution for data storage in a format known as Binary JSON (BSON). BSON is a flexible data encoding with several basic types (byte, int, double) along with the ability to store complex data types like strings, arrays, and arbitrary

binary data [10gen 2009]. Mongodb stores BSON encoded data, in its vernacular, as documents in collections within a database. Collections are analogous to a relational database's tables, and like entries in a table, Mongodb can index, update, and query stored documents [Chodorow and Dirolf 2010]. For Mamba, Mongodb's attractive property is its GridFS specification. GridFS is a standard for storing arbitrarily large BSON objects within Mongodb. This specification assigns two collections, files and chunks, for storing and accessing these large objects. The documents within the files collection store metadata about an object such as its filename, content type, MD5 hash, and length. Mongodb fragments large objects into smaller chunks of 256 kilobytes and stores those fragments as documents within the chunks collection. Mongodb then exposes an API for accessing a large BSON object as a file, and it handles reassembling the object's chunks stored in the chunks collection. The API hides all fragmentation, storage, and reassembly from the end user by providing functions equivalent to standard file input/output APIs. Storage and retrieval of a file then only relies on connecting to a Mongodb database and opening a file by its unique identifier or filename [Chodorow and Dirolf 2010]. Due to Mongodb's GridFS file abstraction for arbitrarily large BSON encoded objects combined with Mongodb's configuration via the command line, Mamba leverages it for a distributed data store. When an organizer begins a fuzzing job, it starts an instance of Mongodb with the thor task, 'thor distrib:dstart'. This task forks an instance of Mongodb, directs Mongodb's logging to the fuzzing environment's logs directory, and directs Mongodb's data storage files to the databases directory. After the data store and fuzzing job are started, the organizer creates a database with the unique identifier found in the Mamba.yml configuration file as the database's name. The organizer then seeds the database with a genetic algorithm's initial population (i.e. chromosomes represented as files). Each document created from a chromosome sets a unique ID of the form "generationNumber.chromosomeID". This unique ID supports Mongodb's indexing and searching, and its format matches the format of messages sent by an organizer to the direct exchange for distributing chromosome fitness evaluation. Since these identifiers are equivalent, a consumer receiving an AMQP message from the testcases queue can then use it to query the Mongodb database for the corresponding chromosome. Once the consumer obtains a GridFS file handle corresponding to the chromosome, the consumer downloads the document and executes the application under test with the downloaded negative test case. The consumer then monitors the application's execution and records hit tracing information in a local XML file. The consumer computes the chromosome's fitness by parsing this XML file and evaluating the configured fitness function. Chromosome fitness is then uploaded to the topic exchange with the results routing key for the organizer to receive, and the consumer uploads the XML trace file to the Mongodb database. The unique identifier for

the trace file is set to "generationNumber.chromosomeID.trace". Workers then remove negative test cases and XML trace files from their local storage to preserve disk space. Uploading the trace files permits the organizer to analyze and review crashes after fuzzing finishes. For convenience, Mamba includes two thor tasks to start up and shut down all distributed components on an organizer. The 'thor distrib:start' task forks a Mongodb instance and starts Rabbitmq. The 'thor distrib:stop' task finds the currently running instance of Mongodb, terminates the process, and uses rabbitmqctl to stop the Rabbitmq queuing system. Also, during installation of the Mamba gem, the packaged gem downloads and installs Rabbitmq and Mongodb locally in the system's Ruby gems directory. By providing convenience tasks and automatically downloading and installing dependencies, Mamba alleviates some of the difficulty involved with installing a distributed fuzzing environment.
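The worker side of this exchange can be summarized in a few lines. The sketch below is a simplified, hypothetical rendering of the loop described above; the collaborator object and its methods (fetch_next_message, download_chromosome, run_and_trace, evaluate_fitness, publish_result, upload_trace) are placeholders for Mamba's actual queueing, GridFS, and tracing code, not its real API.

    # Hypothetical worker loop illustrating the message and identifier formats
    # used by Mamba's distributed mode.
    class WorkerLoop
      def initialize(cluster)
        @cluster = cluster   # supplies the queue, data store, and tracing helpers
      end

      def run
        while (message = @cluster.fetch_next_message('testcases'))   # e.g. "3.42"
          generation, chromosome_id = message.split('.')

          test_case = @cluster.download_chromosome(message)    # GridFS file named "generationNumber.chromosomeID"
          trace_xml = @cluster.run_and_trace(test_case)        # execute the target, record the XML hit trace
          fitness   = @cluster.evaluate_fitness(trace_xml)     # apply the configured fitness function

          @cluster.publish_result("#{generation}.#{chromosome_id}:#{fitness}")      # results routing key
          @cluster.upload_trace("#{generation}.#{chromosome_id}.trace", trace_xml)  # kept for later crash review

          File.delete(test_case, trace_xml)                    # workers reclaim local disk space
        end
      end
    end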

6.2 Results

The creation of a master-slave distributed architecture is an attempt to decrease average execution time for a negative test case. The aforementioned architecture creates a master (organizer) and a set of slaves (workers). To test the effectiveness of a master-slave parallelization scheme with Mamba's genetic algorithms, a new set of experiments was conducted. These experiments replicated the initial experiments devised for testing genetic algorithm fuzzers as described in Chapter 5. These new experiments distributed chromosome fitness evaluation among four computers within a local area network testbed (ARP, GARP, RARP, SLARP). In this configuration, ARP acted as the master for all experiments, and the other computers served as the slaves. This configuration matches the speedup calculations presented at the beginning of this chapter where n is set to four. The experiments tested the Preview application with the same parameters for each genetic algorithm as tested in Chapter 5. Each algorithm ran a set of tests generating 500, 1000, 1500, 2000, 2500, and 5000 negative test cases. The average execution time for each algorithm for each set is shown in Figure D.9. The figure also graphs serial execution time of Mangle and Peach to show that distributing Mamba's evolutionary fuzzing is required to make it competitive with other fuzzers. In the figure, lighter colored lines plot the average execution time per negative test case observed for genetic algorithms in serial mode. Darker lines graph the distributed configuration for each algorithm as well as execution times observed for Mangle and Peach. As shown, distributed genetic algorithms decreased average execution time in most cases by at least 20 seconds when distributed amongst four computers. Average execution time for all genetic algorithms in a distributed environment measured within 3 seconds of the 7.36 second average execution time projected by Equation 6.1. Since the population size for each genetic algorithm is set to 500 chromosomes,

the graph depicts the highest execution time for each distributed genetic algorithm occurring with the initial population. This behavior is expected, because the initial population contains a limited number of corrupted files. As the genetic algorithms evolve new generations, the average execution time declines, because newer generations encompass more corrupted files. Preview's error handling code detects these corrupted files and halts execution early. As more corrupted files enter the population, the smaller execution time for the corrupted chromosomes decreases the average execution time. When each algorithm generates the largest number of negative test cases (5000), average execution time of a negative test case is between 5.97 seconds for CHC-GA and 7.36 seconds for SGA. These measures show that the CHC-GA exceeds speedup projections of a distributed SGA. This behavior is justifiable because initially CHC-GA achieved a lower average execution time than SGA when operating serially. Overall, distributing Mamba's genetic algorithms in a master-slave parallelization scheme to four computers decreases average negative test case execution time to levels competitive with serial Mangle and Peach. When the genetic algorithms generate 5000 test cases, SGA's average execution time is within 2.87 seconds of Mangle's and 3.72 seconds of Peach's. BGA's average execution time is within 1.65 seconds of Mangle and 2.5 seconds of Peach. CHC-GA averages execution time per negative test case within 1.48 seconds of Mangle's and 2.33 seconds of Peach's average negative test case execution time. These observed speedups increase the number of negative test cases that can be tested by a genetic algorithm fuzzer within a fixed time period, and the speedup makes the algorithms attractive options when compared against Mangle and Peach. Distributing these algorithms to additional computers should decrease average execution time further.

Chapter 7

Fitness Function Case Study

Genetic algorithm fuzzers proved promising in Chapter 5 in comparison to Mangle and Peach. Mamba's genetic algorithms managed to uncover more defects than Mangle or Peach in the Preview application when the number of negative test cases is constrained. For the best performing genetic algorithm, BGA, the rate of defect discovery equated to 17 unique defects from 12,500 negative test cases generated. While this rate of return may be considered low, it is encouraging that these defects were found with a randomly selected initial population, and the genetic algorithm traced 1 out of 127 possible shared libraries. In Chapter 6, distributed file fuzzing corrected an inherent flaw in Mamba's architecture. Gathering a hit trace from an application through dynamic binary instrumentation is an expensive operation. The DBI framework must translate an application from its original binary representation to an intermediate representation, instrument the IR, and compile the newly instrumented IR back into machine code. Overhead associated with these operations increases execution time significantly. While other methods may exist to gather this information, fortunately, genetic algorithms can be parallelized in a master-slave architecture. Distributing chromosome fitness evaluation (emulation) counteracted emulation's increase in execution time, thereby improving the outlook of genetic algorithm fuzzers using emulation. Since Mamba's architecture is now defined with a set of genetic algorithms, the next step is to evaluate the effectiveness of multiple fitness functions. This case study of fitness functions should provide insight into the usage of Mamba's variable set when measuring negative test case fitness. Table 7.1 summarizes the five fitness functions selected for this case study. To measure effectiveness, multiple experiments were conducted with these fitness functions and varying population sizes. The experiments catalogued the number of unique defects found by each fitness function and population size pairing. Execution time is also factored into the comparison

of fitness functions. However, it is assumed that any variations in execution time can be combated by adding more or fewer workers to a distributed environment using these fitness functions. Before presenting the results from this case study, let us first examine the composition and reasoning behind the fitness functions selected.

Table 7.1: Fitness Function Case Study

ID  Fitness Function  Semantics
1   Ba * R            Average Number of Local Variables, Rewarding Longer Execution Time
2   Ga                Average Cyclomatic Complexity
3   Ea * R            Average Number of Instructions Per Function, Rewarding Longer Execution Time
4   Ba * V            Average Number of Local Variables, Rewarding Function Coverage
5   Ba / V            Average Number of Local Variables, Penalizing Higher Function Coverage
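To illustrate the variable semantics summarized in Table 7.1, the following sketch shows how one of these expressions might be computed from a parsed hit trace. The trace structure, numeric values, and helper names are hypothetical stand-ins; only the meaning of the variables (B, G, and E averaged over executed functions via the a modifier, R as execution time, and V as function coverage) is taken from this chapter.

    # Hypothetical per-function metrics recovered from a hit trace and the shared
    # library disassembly: local variable counts (B), cyclomatic complexity (G),
    # and instruction counts (E) for each executed function.
    executed = [
      { locals: 4, complexity: 7,  instructions: 120 },
      { locals: 9, complexity: 12, instructions: 310 },
      { locals: 2, complexity: 3,  instructions: 45  },
    ]

    total_functions = 1_000.0   # functions in the traced shared library (illustrative value)
    execution_time  = 6.2       # R: negative test case execution time in seconds (illustrative value)

    avg = ->(key) { executed.sum { |func| func[key] } / executed.length.to_f }

    ba = avg.call(:locals)                  # Ba: average number of local variables
    ga = avg.call(:complexity)              # Ga: average cyclomatic complexity
    ea = avg.call(:instructions)            # Ea: average instructions per executed function
    v  = executed.length / total_functions  # V: fraction of library functions executed

    fitness = ba * execution_time           # fitness function 1 from Table 7.1 (Ba * R)
    puts "Ba=#{ba.round(2)} Ga=#{ga.round(2)} Ea=#{ea.round(2)} V=#{v.round(4)} fitness=#{fitness.round(2)}"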

The first selected fitness function is Ba ∗ R. Deconstructing this expression and the modifiable variable, Ba, provides insight into the logic behind maximizing this function. The B variable represents the number of local variables for an executed function. By appending the a modifier to the variable, the variable's meaning changes to the average number of local variables for all functions executed. As shown from previous experiments in Chapter 5, multiple genetic algorithms maximizing the function Ba demonstrated an aptitude for evolving negative test cases for Preview that resulted in the application crashing. Across all genetic algorithms tested in Chapter 5, maximizing this variable caused a total of 29 crashes. These crashes are especially significant, because all experiments limited genetic algorithms to generating fewer than 5,000 negative test cases per run, equating to at most 10 generations with a population size of 500. A limitation on the number of negative test cases decreases the probability of randomly creating a negative test case that causes an application to crash. While maximizing this variable proved promising, initial experiments highlighted one dilemma. As genetic algorithms progressed through generations, the number of successful negative test cases declined. Later generations contained a higher proportion of severely corrupted chromosomes than the initial starting population. Naturally, this phenomenon is a byproduct of reproductive operators, but these severely corrupted chromosomes provide minimal value to the overall population. Severely corrupted chromosomes, in most instances, are incapable of exercising code other than error handling routines. These chromosomes can be beneficial if an application's code base is immature, because error handling routines may not exist to gracefully handle error conditions. Adding the R variable (negative test case execution time) balances the composition of later generations by rewarding test cases with longer execution times. The equation's goal is to evolve negative test cases exercising

functions with higher averages of local variables and discourage the genetic algorithm from retaining severely corrupted chromosomes. The second fitness function tested by this case study is Ga. The G variable represents cyclomatic complexity, which is a software complexity measure [McCabe 1976]. Cyclomatic complexity, applied to an application, is based on a decomposition of code into a set of vertices and edges. The vertices signify the basic blocks of a program, and the edges define control flow paths between a program's basic blocks. For Mamba's application of cyclomatic complexity, a basic block is a contiguous set of assembly instructions with one entry point and one exit point. The exit point is an assembly instruction altering control flow such as a conditional or unconditional jump. This research applies cyclomatic complexity to a function instead of an entire program, so cyclomatic complexity only measures the different possible control flow paths of an individual function. Appending the a modifier to the G variable alters the variable's semantics to be the average cyclomatic complexity of executed functions. In [McCabe 1976], McCabe prescribed minimizing cyclomatic complexity to simplify code structure, which could decrease or eliminate software defects. The fitness function, Ga, tests cyclomatic complexity's relevance when evolving negative test cases. Genetic algorithms configured with this fitness function will try to evolve negative tests exercising complex functions containing numerous conditionals or loops. The hypothesis being tested is whether more defects are discovered when executing functions with higher complexity. The third fitness function is based on the complexity measure E, which is the number of assembly instructions per function. This fitness function appends the a modifier to the E variable to measure the average number of assembly instructions of executed functions. By multiplying Ea by a negative test case's execution time (R), the fitness function rewards a negative test case for longer execution time and increases emphasis on a higher average number of assembly instructions per function. The fourth fitness function, Ba ∗ V, is a variation on rewarding a higher average number of local variables. Rewarding a higher average number of local variables is attractive, because programmers rely on local variables to store temporary values. Since these variables are local in scope, programmers tend to be careless with allocation and management of these variables. Many defects, experienced firsthand by this author, are a direct result of local variable mismanagement. By multiplying this variable by V, function coverage, the fitness function rewards a negative test case which executes a larger percentage of functions in the traced shared library. Rewarding based on increased function coverage should cause the genetic algorithm to evolve negative test cases executing different and more shared library functions. The last fitness function, Ba/V, checks the genetic algorithm implementation to ensure test cases are evolving as expected and results are not due to random chance. As before, maximizing Ba promotes executing complex functions using multiple local variables. However, dividing Ba by

function coverage (V) punishes negative test cases for executing a larger percentage of functions within a traced shared library. A higher percentage of functions executed causes the influence of Ba to be decreased. This function should reward corrupted chromosomes and find fewer defects than all other fitness functions. While choosing fitness functions for testing is important, another consideration in the design of this case study is equally important. This consideration is the genetic algorithm to use with these fitness functions. From Figure D.8, a natural choice is BGA since it uncovered the most defects, 17, and it found defects in multiple categories (SIGSEGV, SIGBUS, SIGFPE). However, the results from using this choice are likely to be more of the same, and it is of greater interest to explore two additional factors. These factors are the genetic algorithm's population size and the number of generations to evolve. Each of these parameters can impact an evolutionary algorithm in unintended ways. For these experiments, two population sizes are evaluated. These population sizes are 250 and 100 chromosomes. As a side note, the initial populations for both of these sizes are independently acquired through Mamba's 'thor fuzz:seed' tool. All experiments configured for a population size, 250 or 100, reuse the acquired initial population corresponding to its size. The important aspect of these population sizes is that each size is at most half the size of the populations tested in Chapter 5. Decreasing the size limits the mating pool, which hinders reproduction for some genetic algorithms, and the degree to which reproduction is disadvantaged is being tested. The other parameter, the number of generations to evolve, is set to create 25,000 negative test cases. For populations of 100, this stopping condition means 250 generations, and for populations of 250, evolution stops after 100 generations. Comparing these conditions with the configuration of tests executed in Chapter 5 shows that case study experiments will generate 5 times as many test cases. This increase questions whether some genetic algorithm variations will converge prematurely, causing later generations to be useless. Consistent with the changes in population size and the number of generations produced, this case study chooses Mamba's CHC-GA for testing. From Figure D.8, CHC-GA discovered the third most defects, 12, which narrowly fails to overtake the second place algorithm with 13 defects. Choosing CHC-GA for longer experiments is interesting, because CHC-GA implements a crossover operator similar to BGA and a cataclysmic mutation operator. The mutation operator rectifies premature convergence. Once the algorithm detects convergence, it reinvigorates evolution by copying chromosomes from the initial population. By using CHC-GA, longer fuzzing runs have a mechanism against premature convergence, and it is interesting to see how well that mechanism will perform.

7.1 Results

The test environment for this case study remains consistent with experiments conducted during development of Mamba's genetic algorithms and distributed environment. The study evaluates Preview version 5.0.3 (504.1) on Mac OS X 10.6.5 build 10H574. Since Mamba's distributed environment decreases execution time, this study uses it to distribute chromosome fitness evaluation, and all machines listed in Table 5.4 are used for distributed testing. Distribution results in experiments running between 1.44 and 2.62 days, equating to between 4.63 and 9.07 seconds per negative test case. Execution time variation is a side effect of rewarding negative test cases for longer execution times. For instance, the fitness function, Ea ∗ R, with a population size of 100 recorded the longest average negative test case execution time. The smallest average negative test case execution time belonged to the fitness function, Ba/V, since this function rewarded chromosomes executing a smaller percentage of shared library functions. No trends in average execution time emerged from testing the same fitness function with different population sizes. In the case of the fitness function, Ga, average execution time for a population size of 250 was 5.91 seconds and 4.98 seconds for a population size of 100. The fitness function, Ea ∗ R, recorded average execution time for a population of 250 chromosomes as 7.11 seconds. This recorded average is smaller than the average execution time for a population of 100 chromosomes with the same function (9.07 seconds). As for unique defects discovered, Figure D.10 and Figure D.11 illustrate the number of unique defects found by each fitness function for population sizes of 250 and 100, respectively. All CHC-GAs with a population size of 250 found more defects overall than their 100 population size counterparts. Additionally, the fitness function, Ba ∗ R, performed well in both configurations. With a population size of 250, it found six segmentation violations, and with a population size of 100, the algorithm discovered four segmentation violations. The fitness functions, Ea ∗ R and Ba ∗ V, managed to discover six defects each when configured with a population size of 250. The first function found five segmentation violations and one floating point exception, while the second function uncovered six segmentation violations. Performing the tests again with a population size of 100 caused the Ea ∗ R function to find one segmentation violation and the Ba ∗ V function to uncover two segmentation violations. These results imply that rewarding a higher average of local variables may be capable of finding more defects than other defined variables regardless of population size. This is supported by the Ba ∗ R function finding four segmentation violations when configured for a population size of 100. The results also highlight a trend with each CHC-GA configured with a smaller population size. For all case study experiments,

a smaller population size caused each genetic algorithm to find fewer defects due to minimal diversity in the mating pool. The Ga fitness function performed almost as well as Ea ∗ R and Ba ∗ V by finding five segmentation violations when configured with a population size of 250. This function followed the same trend in finding fewer defects in its smaller population size configuration. In that configuration, the function found only one segmentation violation. While this function did not outperform Ea ∗ R, the Ga fitness function produced results almost identical to Ea ∗ R. As expected, the worst overall fitness function was Ba/V. The function discovered only one defect in all experiments. This function underperforms since it punishes test cases with higher function coverage, thereby reducing higher averages of local variables. These results provide evidence that maximizing different equations composed of predefined variables can influence negative test case evolution and defect discovery. While selecting an algorithm for these experiments, the decision considered the effects of premature convergence. For experiments configured with a population size of 250, these algorithms did not enter cataclysmic mutation. Not entering cataclysmic mutation means the population size and the initial population composition were sufficient and diverse enough to prevent convergence for 100 generations. However, the algorithms configured with a population size of 100 chromosomes did require cataclysmic mutation for three of the five fitness functions. The fitness function, Ba ∗ R, caused the genetic algorithm to enter cataclysmic mutation at generation 127. The fitness function, Ea ∗ R, converged the population at generation 74, and the function Ba ∗ V converged at generations 74, 101, and 135. This result supports a recommendation of CHC-GA for larger fuzzing runs when the population size is sufficiently small. For this case study, sufficiently small is a population size of 100 chromosomes. Overall, this case study's recommendation is to use the average number of local variables to direct evolution. This variable found multiple defects during genetic algorithm development and in multiple population size configurations. Code complexity measures, cyclomatic complexity and the number of assembly instructions, also proved to be valuable options. The complexity measures both uncovered almost as many defects as rewarding a higher average number of local variables. Rewarding negative test cases with longer execution times did not help premature convergence as originally hypothesized, but rewarding longer execution times did outperform rewarding higher function coverage in smaller population sizes.

Chapter 8

Conclusions and Future Work

8.1 Conclusions

Genetic algorithm fuzzers are a viable option for generating negative test cases to exercise an application’s file parsing routines. From the experiments conducted by this research, genetic algorithms discovered a total of thirty-one unique defects in Apple’s Preview application. Of these thirty-one defects, genetic algorithms found four defects in the traced library, libFontParser, and four defects were found in libraries directly called from functions in the libFontParser shared library. That means 27% of all defects found during this research were directly or indirectly related to a chosen shared library. The remaining defects were found in three other shared libraries not directly related to the libFontParser library. These numbers reflect how tracing a chosen shared library and evolving negative test cases based on metrics from tracing that library can cause a genetic algorithm to target defect finding even with random attack heuristics. Aside from discovering defects, this dissertation also contributed to the advancement of feedback fuzzing in multiple areas. These contributions include:

• Development of a genetic algorithm fuzzing framework with multiple genetic algorithms. - The only available genetic algorithm fuzzing framework known by this author is EFS by Jared DeMott [DeMott et al. 2007]. EFS relies on the PaiMei reverse engineering framework, and specifically, it uses pydbg from PaiMei to collect hit tracing information. PaiMei is largely written to support Windows and patches for Mac OS X do not work with current versions of Mac OS X (10.6 and 10.7). EFS also implements only one evolutionary algorithm. Mamba extends genetic algorithm fuzz testing on Mac OS X and provides multiple genetic algorithms with configurable parameters.

• Definition of a set of variables for fitness function composition. - This research created a set of variables to include in mathematical fitness functions for evolving chromosomes. The variables combine information from static analysis as well as dynamic analysis. By combining these domains and exposing a set of variables to a tester, the tester can now define a fitness function to suit their own requirements. This architecture also allows experimentation with different fitness functions and different genetic algorithms.

• Proof of concept implementation of an ad hoc open source distributed file fuzzing framework. - Distributed file fuzzing frameworks are not widely deployed today. Most distributed fuzzing frameworks are proprietary implementations by corporations or security researchers. The most widely known open source distributed fuzzing framework is Peach. The development of Mamba demonstrates how to build a distributed file fuzzing framework upon open source components.

8.2 Future Work

This dissertation focused solely on creating a genetic algorithm fuzzing framework for Mac OS X applications. The decision to focus on applications running on Mac OS X was based on the limited support of current fuzzing frameworks for this operating system. Designing a new framework for Mac OS X presented a challenge and limited some techniques. Many tools, such as the dynamic binary instrumentation frameworks Pin and DynamoRIO, and common debugger functionality, such as gdb watchpoints on library loads, are not currently supported or implemented on Mac OS X. These limitations caused several design decisions to be made, and other platforms may implement functionality or support other tools which would make Mamba more efficient. However, design decisions made to support Mac OS X should minimally impact porting Mamba to other operating systems. Regardless of any implementation specific issues, this fuzzing framework opens several areas for future research. Some of these research areas include:

• Improve Genetic Algorithm Hit Tracing Technique - Mamba addresses hit tracing by using Valgrind to emulate an application, and this emulation increases an application’s execution time. Future research should explore other options for collecting hit tracing information, especially for Mac OS X. A likely candidate is gdb breakpoints combined with gdb’s Python integration. Mamba could install a gdb catchpoint for shared library loading (currently not supported on Mac OS X), and once triggered, set breakpoints for all addresses exported from IDA Pro offset by the library’s load address. Gdb’s Python integration could then record XML trace information in Mamba’s defined format. Other candidate tracing solutions are

Ruby's Ragweed library, a custom ptrace() library, Apple's dtrace implementation, or scripting lldb with Python. Each candidate mechanism will have design trade-offs and cross-platform concerns, but they may improve execution time and permit tracing more than one shared library.

• Comparative Study of Genetic Operators and Genetic Algorithms - Mamba’s configuration paradigm opens the possibility of testing different rates for genetic operators. The two

prominent parameters to test are crossover rate (PC) and mutation rate (PM). Different values for these rates may improve or diminish a genetic algorithm's effectiveness in uncovering software defects. Experiments with different rates might test all genetic algorithms within Mamba to measure effects of these parameter values on the different algorithms. Further studies could also compare Mamba's genetic algorithms by testing multiple applications for more generations. The distributed framework makes running all genetic algorithms to create over a million negative test cases feasible. Trends may emerge depicting one genetic algorithm as a better choice for fuzz testing.

• Introduce Data Modeling - Generational fuzzing started with Dave Aitel’s seminal paper on block-based modeling [Aitel 2002a]. By modeling a file or protocol, a fuzzer can insert an attack heuristic within data based on a model. The modeling guides a fuzzer to choose a semantically correct attack heuristic. Modeling could improve genetic algorithm performance, and it could advise a genetic algorithm on crossover points. By consulting a model for crossover point selection, a genetic algorithm could forego splitting a negative test case across a model field. Model consultation also opens the ability to apply genetic programming concepts to model crossover and mutation. Representing a model as a tree would permit operations like subtree crossover and even adding a rudimentary repair operator to fix severely damaged chromosomes. Developing genetic programming concepts in Mamba could also permit testing of programming language implementation, e.g. Javascript, within web browsers.

• Develop a Multiple-Population Parallel Genetic Algorithm - Mamba’s distributed model does not restrict usage of any of its components. It also does not constrain all worker nodes to measure a chromosome by the same fitness function. A multiple-population parallel genetic algorithm consists of multiple populations running on different processors possibly being evaluated by different fitness functions. At different intervals, these populations exchange individuals to add variation back into their populations [Sivanandam and Deepa 2007a]. Altering Mamba’s distributed architecture to designate sets of chromosomes to preset processors could help in achieving a multiple-population parallel genetic algorithm. However,

distributing reproductive tasks to worker nodes is required for cultivating subpopulations along with the development of a migration operator.

• Distributor Component Applied Research and Peer-To-Peer Fuzzing - This dissertation did not explore the possible multiple distributor types. The most promising type, hinted by this dissertation and used by Microsoft’s distributed file fuzzer [Conger et al. 2010], is a web application. A web application could be designed with a restful API to support Mamba’s organizers and workers. Coalescing fuzzing jobs and advertising for help aligns with Microsoft’s model as well [Conger et al. 2010], but it expands their efforts by permitting endpoints to communicate directly without an intermediary controlling all endpoints and all jobs. A study of distributors could also examine social issues with fuzzing darknets and responsible disclosure in situations where multiple entities may have collectively discovered a software defect. Mamba’s distributed environment could also be improved by eliminating organizer and worker classifications. The fuzzing environment could allow any node to distribute work and provide failover when a command node stops responding. This failover would require all nodes to run Mongodb as a replica set and cluster with Rabbitmq. If each node runs these services and all nodes announce joining and leaving a fuzzing job, then Mamba’s architecture could be transformed into a peer-to-peer fuzzing network.

Bibliography


10gen, I. (2009). BSON Specification Version 1.0. 10gen,Inc. 88

Aitel, D. (2002a). The advantages of block-based protocol analysis for security testing. 16, 17, 99

Aitel, D. (2002b). An introduction to spike, the fuzzer creation kit. 15

Amdahl, G. M. (2000). Readings in computer architecture. chapter Validity of the single processor approach to achieving large scale computing capabilities, pages 79–81. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 80

Amini, P. (2006). Paimei - reverse engineering framework. 13, 35, 67

Amini, P. and Portnoy, A. (2007). Sulley: Fuzzing Framework. 12

Apple (2008). AppleScript Language Guide. 13

Apple (2009). Mac os x abi mach-o file format reference. 28, 29

Beck, K. (1999). Embracing change with extreme programming. Computer, 32(10):70 –77. 1

Beizer, B. (1990). Software testing techniques (2nd ed.). Van Nostrand Reinhold Co., New York, NY, USA. 3, 10

Ben-Kiki, O., Evans, C., and dot Net, I. (2009). YAML Ain't Markup Language (YAML) Version 1.2. 35

Boehm, B. (1986). A spiral model of software development and enhancement. SIGSOFT Softw. Eng. Notes, 11(4):14–24. 1

Bruening, D. L. (2004). Efficient, transparent, and comprehensive runtime code manipulation. PhD thesis, Cambridge, MA, USA. AAI0807735. 5

Campana, G. (2009). Fuzzgrind: an automatic fuzzing tool. Technical report. 6, 23

Chess, B., McGraw, G., and Migues, S. (2010). BSIMM2: Building Security In Maturity Model. 9

Chikofsky, E. J. and Cross, J. H. (1990). Reverse engineering and design recovery: A taxonomy. IEEE Software, 7:13–17. 25, 26

Chodorow, K. and Dirolf, M. (2010). MongoDB - The Definitive Guide: Powerful and Scalable Data Storage. O’Reilly. 88

Conger, D., Srinivasamurthy, K., and Cooper, R. (2010). Distributed file fuzzing. U.S. Patent #7,743,281 issued 6/22/2010. 7, 9, 83, 84, 100

Coverity, I. (2010). http://www.coverity.com. 4

De Jong, K. A. (1975). An analysis of the behavior of a class of genetic adaptive systems. PhD thesis, Ann Arbor, MI, USA. AAI7609381. 58

DeMott, J., Enbody, R., and Punch, W. F. (2007). Revolutionizing the field of grey-box attack surface testing with evolutionary fuzzing. 4, 6, 22, 97

Drewry, W. and Ormandy, T. (2007). Flayer: exposing application internals. In WOOT ’07: Proceedings of the first USENIX workshop on Offensive Technologies, pages 1–9, Berkeley, CA, USA. USENIX Association. 23

Eagle, C. (2008). The IDA Pro Book: The Unofficial Guide to the World’s Most Popular Disassembler. no starch press, San Francisco, CA, USA. 20, 27, 32, 34, 45, 67, 68, 69

Eddington, M. (2006). Peach fuzzing platform. Electronic. 12, 18

Eddington, M. (2008). Advanced fuzzing with peach 2. 17

Eshelman, L. J. (1990). The chc adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination. In Rawlins, G. J. E., editor, FOGA, pages 265–283. Morgan Kaufmann. 60, 61

Fogel, L., Owens, A., and Walsh, M. (1966). Artificial intelligence through simulated evolution. Wiley, Chichester, WS, UK. 5

Fortify Software, I. (2010). Electronic: https://www.fortify.com. 4

Ganesh, V. and Dill, D. L. (2007). A decision procedure for bit-vectors and arrays. In Damm, W. and Hermanns, H., editors, CAV, volume 4590 of Lecture Notes in Computer Science, pages 519–531. Springer. 23

Godefroid, P., Levin, M. Y., and Molnar, D. A. (2008). Automated whitebox fuzz testing. In NDSS. The Internet Society. 23

Group, A. W. (2011). AMQP v1.0 revision 0. AMQP Working Group. 85, 86

Holland, J. H. (1992). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press, Cambridge, MA, USA. 54, 56, 64

Hotchkies, C. (2008). Under the ihood. 33

Intel (1999). Intel Architecture Software Developer’s Manual: Instruction Set Reference Manual. Intel, Inc. 26

Iozzo, V. (2009). Let your mach-o fly. 29

Johnson, N. and Miller, C. (2010). Crash analysis using bitblaze. 14

Kaukonen, K. and Thayer, R. (1999). A stream cipher encryption algorithm "arcfour". Electronic. 43

Luk, C.-K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V. J., and Hazelwood, K. (2005). Pin: Building customized program analysis tools with dynamic instrumentation. In Programming Language Design and Implementation, pages 190–200. ACM Press. 5

Klocwork, I. (2010). Electronic: http://www.klocwork.com. 4

Koza, J. R. (1990). Genetic programming: a paradigm for genetically breeding populations of computer programs to solve problems. Technical report, Stanford, CA, USA. 5

Linn, C. and Debray, S. (2003). Obfuscation of executable code to improve resistance to static disassembly. In ACM Conference on Computer and Communications Security (CCS), pages 290–299. ACM Press. 27

Matz, M., Hubicka, J., Jaeger, A., and Mitchell, M. (2010a). System v application binary interface: Amd64 architecture process supplement. 67

Matz, M., Hubicka, J., Jaeger, A., and Mitchell, M. (2010b). System V Application Binary Interface: AMD64 Architecture Processor Supplement. 32, 34

McCabe, T. J. (1976). A complexity measure. In Proceedings of the 2nd international conference on Software engineering, ICSE ’76, pages 407–, Los Alamitos, CA, USA. IEEE Computer Society Press. 69, 70, 93

Miller, B. P., Fredriksen, L., and So, B. (1990). An empirical study of the reliability of unix utilities. Commun. ACM, 33(12):32–44. 10, 28

Miller, C. (2010). Babysitting an army of monkeys - fuzzing 4 products with 5 lines of python. Electronic: http://securityevaluators.com/files/slides/cmiller_CSW_2010.ppt. 8

Miller, C. and Zovi, D. D. (2009). The Mac Hacker’s Handbook. Wiley Publishing. 32

Miller, B. (1988). CS736 project list. Electronic. 10

Mitchell, M. (1998). An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, USA. 55, 56, 58

Mitre (2010). Cve-2010-1120. Electronic. 77

Molnar, D. and Wagner, D. (2007). Catchconv: Symbolic execution and run-time type inference for integer conversion errors. Technical report, UC Berkeley EECS. 6

Molnar, D. A. (2009). Dynamic Test Generation for Large Binary Programs. PhD thesis, EECS Department, University of California, Berkeley. 24

Monti, E., Rohlf, C., and Tracy, M. (2009). Ruby for pentesters. 13

Nethercote, N. (2004). Dynamic Binary Analysis and Instrumentation. PhD thesis, Trinity College, Cambridge, UK. 5, 44, 45, 77, 78

Nethercote, N. and Seward, J. (2007). Valgrind: A framework for heavyweight dynamic binary instrumentation. In Proceedings of ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), pages 89–100, San Diego, California, USA. 45, 47

Neystadt, J. (2008). Automated penetration testing with white-box fuzzing. Electronic: http://msdn.microsoft.com/en-us/library/cc162782.aspx. 9

Prowell, S. (1998). Impact of sequence-based software specification on statistical software testing. In Proceedings of the Second International Software Quality Week Europe, San Francisco, California. Software Research Institute, Inc. 2

Prowell, S. J. and Poore, J. H. (2003). Foundations of sequence-based software specification. IEEE Transactions on Software Engineering, 29:417–429. 2

Rechenberg, I. (1971). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. PhD thesis, TU Berlin. 5

Schneier, B. (1995). Applied cryptography (2nd ed.): protocols, algorithms, and source code in C. John Wiley & Sons, Inc., New York, NY, USA. 43

Schwarz, B., Debray, S., and Andrews, G. (2002). Disassembly of executable code revisited. In Proc. IEEE 2002 Working Conference on Reverse Engineering (WCRE), pages 45–54. IEEE Computer Society. 27

Shacham, H., Goh, E.-J., Modadugu, N., Pfaff, B., and Boneh, D. (2004). On the effectiveness of address-space randomization. In CCS '04: Proceedings of the 11th ACM Conference on Computer and Communications Security, pages 298–307. ACM Press. 33

Shaw, Z. (2006). Rfuzz. Electronic. 43

Sivanandam, S. and Deepa, S. (2007a). Introduction to Genetic Algorithms. Springer, New York, NY, USA. 5, 82, 99

Sivanandam, S. N. and Deepa, S. N. (2007b). Introduction to Genetic Algorithms. Springer Publishing Company, Incorporated, 1st edition. 55, 57, 58

Sparks, S., Embleton, S., Cunningham, R., and Zou, C. C. (2007). Automated vulnerability analysis: Leveraging control flow for evolutionary input crafting. In ACSAC, pages 477–486. 4, 6, 20, 21, 70, 71, 72, 73

Sthamer, H., Baresel, A., and Wegener, J. (2001). Evolutionary testing of embedded systems. In 14th International Internet and Software Quality Week. 22

Sutton, M. and Greene, A. (2005). The art of file format fuzzing. Electronic: http://www.blackhat.com/presentations/bh-jp-05/bh-jp-05-sutton-greene.pdf. 8, 15, 16, 17

Sutton, M., Greene, A., and Amini, P. (2007). Fuzzing: Brute Force Vulnerability Discovery. Addison-Wesley Professional. 10, 11

Sweetman, D. (2006). See MIPS Run, Second Edition. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 26

Takanen, A., DeMott, J., and Miller, C. (2008). Fuzzing for Software Security Testing and Quality Assurance. Artech House, Inc., Norwood, MA, USA. 2, 11, 12, 22, 75

Tanenbaum, A. S. and Goodman, J. R. (1998). Structured Computer Organization. Prentice Hall PTR, Upper Saddle River, NJ, USA, 4th edition. 80, 81

van Sprundel, I. (2005). Fuzzing. Chaos Computer Club. 15

Vose, M. D. (1998). The Simple Genetic Algorithm: Foundations and Theory. MIT Press, Cambridge, MA, USA.

Vuagnoux, M. (2005). Autodafe: an act of software torture. Chaos Computer Club. 19, 20, 59

Wegener, J. (2001). Overview of evolutionary testing. Electronic. 22

Appendices

Appendix A: Abbreviations

ABI Application Binary Interface

AMQP Advanced Message Queuing Protocol

ANTLR ANother Tool for Language Recognition

API Application Programming Interface

ASLR Address Space Layout Randomization

BGA Byte Genetic Algorithm

BSON Binary JSON

BSS Block Started by Symbol

CHC-GA Cross-Generational Elitist Selection, Heterogeneous Recombination, and Cataclysmic Mutation Genetic Algorithm

CLI Command-line Interface

DBI Dynamic Binary Instrumentation Framework

DoS Denial Of Service

ELF Executable and Linkable Format

EFS Evolutionary Fuzzing System

FLIRT Fast Library Identification and Recognition Technology

GCC GNU Compiler Collection

GNU GNU’s Not Unix!

GPL GNU General Public License

GUI Graphical User Interface

HDFS Hadoop Distributed File System

HUX Half Uniform Crossover

IP Internet Protocol

IS-IS Intermediate System-to-Intermediate System

IR Intermediate Representation

JIT Just-In-Time

Mach-O Mach Object

MIPS32 Microprocessor without Interlocked Pipeline Stages

PDF Portable Document Format

PE Portable Executable

RISC Reduced Instruction Set Computer

RFC Request For Comments

SDK Software Development Kit

SGA Simple Genetic Algorithm

SGA-MIP Simple Genetic Algorithm with Mangled Initial Population

SGA-MM Simple Genetic Algorithm Mangle Mutator

SNMP Simple Network Management Protocol

TLV Type-length-value

TIFF Tagged Image File Format

SWIG Simplified Wrapper and Interface Generator

UTF8 UCS Transformation Format 8-bit

UX Uniform Crossover

UUID Universally Unique Identifier

XML Extensible Markup Language

YAML YAML Ain’t Markup Language

Appendix B: Code Examples

#!/bin/bash
# Usage: ./fuzz.sh input.tiff #testcases program

# Make sure testcase directory exists
if [ ! -e testcases ]; then
  mkdir testcases
fi

# Check for crashes directory
if [ ! -e crashes ]; then
  mkdir crashes
fi

# Make sure original test case exists
if [ ! -e $1 ]; then
  echo "Missing base case: $1"
  exit 1
fi

# Loop for specified number of test cases
for i in $(seq $2); do
  echo "Running Testcase: $i"

  # Generate the testcase
  cp $1 testcases/$1.$i
  ./mangle testcases/$1.$i

  # Just running tiffinfo for now
  $3 testcases/$1.$i > /dev/null 2>&1

  # Save return variable
  returnVar=$?
  sleep 2

  # Check for crash (SIGILL, SIGABRT, SIGBUS, SIGSEGV (128 + signal number))
  if [ $returnVar -eq 132 -o $returnVar -eq 134 -o $returnVar -eq 138 -o $returnVar -eq 139 ]; then
    mv testcases/$1.$i crashes/
  fi

  # Kill outstanding processes if any
  progPid=`ps aux | grep $3 | grep -v grep | awk '{ print $2 }'`
  kill -9 $progPid > /dev/null 2>&1
done

exit 0

Figure B.1: Mangle Wrapper Script

s_block_start("imageFileHdr");
s_binary("0x49 0x49");                    // Endianness
s_binary("0x2A 0x00");                    // Magic Number
s_binary("0x3c 0x00 0x00 0x00");          // Image File Directory pointer
s_block_end("imageFileHdr");
s_block_start("dataSection");
s_binary("ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff dd cc bb aa");
s_block_end("dataSection");
s_block_start("imageFileDirectory");
s_binary("0x03 0x00");
s_block_start("imageLengthDirectoryEntry");
s_binary("0x01 0x01 0x03 0x00 0x01 0x00 0x00 0x00 0x19 0x00 0x00 0x00");
s_block_end("imageLengthDirectoryEntry");
s_block_start("bitsPerSampleDirectoryEntry");
s_binary("0x02 0x01 0x03 0x00");
s_int_variable(30, 9);                    // bitsPerSample Entry Number of Entries
s_binary("0x08 0x00 0x00 0x00");
s_block_end("bitsPerSampleDirectoryEntry");
s_block_start("stripOffsetsDirectoryEntry");
s_binary("0x11 0x01 0x04 0x00 0x01 0x00 0x00 0x00 0x08 0x00 0x00 0x00");
s_block_end("stripOffsetsDirectoryEntry");
s_binary("0x00 0x00");                    // Pointer to other Image File Directory
s_block_end("imageFileDirectory");

Figure B.2: notSPIKEfile Data Model


Figure B.3: Peach XML Configuration

Appendix C: Command Reference

distrib
-------
thor distrib:dstart    # Start the Mongodb database for distribution
thor distrib:dstop     # Stop the Mongodb database for distribution
thor distrib:qreset    # Resetting the rabbitmq queueing system
thor distrib:qstart    # Start the rabbitmq queueing system
thor distrib:qstatus   # Query the rabbitmq queueing system
thor distrib:qstop     # Stop the rabbitmq queueing system
thor distrib:start     # Start all distributed fuzzing components
thor distrib:stop      # Stop all distributed fuzzing components

fuzz
----
thor fuzz:package      # Package Fuzzer Configuration Files
thor fuzz:report       # Print a report of current activity to the logfile
thor fuzz:start        # Start a Mamba Fuzzer
thor fuzz:stop         # Stop a Mamba Fuzzer
thor fuzz:unpackage    # Unpackage Fuzzer Configuration Files

tools
-----
thor tools:disassemble # Extract disassembly information from IDA Pro into Y...
thor tools:otool       # Runs otool recursively over an object to display li...
thor tools:seed        # Seed test cases from google searches

Figure C.1: Mamba Fuzzing Framework Command Listing

Usage:
  thor tools:otool

Options:
  -o, [--object=OBJECT]   # Executable or shared library to examine with otool
  -a, [--architecture]    # Turn 64 bit support on

Runs otool recursively over an object to display linked objects

Figure C.2: Mamba tools:otool Command Reference

Usage:
  thor tools:disassemble

Options:
  -o, [--object=OBJECT]   # Executable or shared library to disassemble and extract
  -a, [--architecture]    # Turn 64 bit support on
  -i, [--ida=IDA]         # Set a different IDA Pro path

Extract disassembly information from IDA Pro into YAML serialization format

Figure C.3: Mamba tools:disassemble Command Reference

Usage:
  mamba create

Options:
  -a, [--app=APP]            # Path to an executable (ex. /Applications/Preview.app/Contents/MacOS/Preview)
                             # Default: /Applications/Preview.app/Contents/MacOS/Preview
  -d, [--distributed]        # Set the fuzzer type as distributed
  -e, [--executor=EXECUTOR]  # Type of executor to use (cli, appscript)
                             # Default: appscript
  -p, [--port=N]             # Port for distributed fuzzing
                             # Default: 27017
  -t, [--timeout=N]          # Time to run an executable for each testcase in seconds (ex. 20)
                             # Default: 0
  -s, [--server=SERVER]      # Server address for distributed fuzzing (ex. fuzz.io)
                             # Default: fuzz.io
  -y, [--type=TYPE]          # Type of fuzzer to create (Supported Types: ["ByteGeneticAlgorithm", "CHCGeneticAlgorithm", "Mangle", "MangleGeneticAlgorithm", "SimpleGeneticAlgorithm"])
                             # Default: Mangle

Create a Mamba instance

Figure C.4: Mamba create Command Reference

Usage:
  thor tools:seed

Options:
  -c, [--clean]              # Cleanup downloaded files, since they are zipped
  -f, [--filetype=FILETYPE]  # Filetype to download examples (ex. pdf, doc, ppt)
                             # Default: pdf
  -n, [--num=N]              # Number of files to download
                             # Default: 10
  -t, [--terms=TERMS]        # File containing search terms to mix into test case seeding
  -z, [--zip]                # Zip the test cases downloaded by the google crawler

Seed test cases from google searches

Figure C.5: Mamba tools:seed Command Reference

Usage:
  thor fuzz:package

Package Fuzzer Configuration Files

Figure C.6: Mamba fuzz:package Command Reference

Usage:
  thor fuzz:unpackage

Unpackage Fuzzer Configuration Files

Figure C.7: Mamba fuzz:unpackage Command Reference

Usage:
  thor distrib:qstart

Start the rabbitmq queueing system

Figure C.8: Mamba distrib:qstart Command Reference

Usage:
  thor distrib:qstop

Stop the rabbitmq queueing system

Figure C.9: Mamba distrib:qstop Command Reference

Usage:
  thor distrib:qreset

Resetting the rabbitmq queueing system

Figure C.10: Mamba distrib:qreset Command Reference

Usage:
  thor distrib:qstatus

Query the rabbitmq queueing system

Figure C.11: Mamba distrib:qstatus Command Reference

Usage:
  thor distrib:dstart

Start the Mongodb database for distribution

Figure C.12: Mamba distrib:dstart Command Reference

Usage:
  thor distrib:dstop

Stop the Mongodb database for distribution

Figure C.13: Mamba distrib:dstop Command Reference

Usage:
  thor distrib:start

Start all distributed fuzzing components

Figure C.14: Mamba distrib:start Command Reference

Usage:
  thor distrib:stop

Stop all distributed fuzzing components

Figure C.15: Mamba distrib:stop Command Reference

Appendix D: Graphs

[Plot: Mangle Fuzzer Runtime Performance (Mac OS X Preview); series: ARP, GARP, RARP, SLARP; y-axis: Average Runtime Per Test Case (secs); x-axis: Number of Test Cases]

Figure D.1: Mangle Fuzzer Performance

[Plot: Peach Generation Fuzzer Runtime Performance (Mac OS X Preview); series: ARP, GARP, RARP, SLARP; y-axis: Average Runtime Per Test Case (secs); x-axis: Number of Test Cases]

Figure D.2: Peach Fuzzer Performance

[Plot: Simple Genetic Algorithm Fuzzer Runtime Performance (Mac OS X Preview); series: ARP, GARP, RARP, SLARP; y-axis: Average Runtime Per Test Case (secs); x-axis: Number of Test Cases]

Figure D.3: Simple Genetic Algorithm Fuzzer Performance

[Plot: Simple Genetic Algorithm Fuzzer - Mangle Mutator Runtime Performance (Mac OS X Preview); series: ARP, GARP, RARP, SLARP; y-axis: Average Runtime Per Test Case (secs); x-axis: Number of Test Cases]

Figure D.4: Simple Genetic Algorithm Fuzzer with Mangle Mutator Performance

[Plot: Simple Genetic Algorithm Fuzzer - Mangled Initial Population Runtime Performance (Mac OS X Preview); series: ARP, GARP, RARP, SLARP; y-axis: Average Runtime Per Test Case (secs); x-axis: Number of Test Cases]

Figure D.5: Simple Genetic Algorithm Fuzzer with Mangled Initial Population Performance

[Plot: Byte Genetic Algorithm Fuzzer Runtime Performance (Mac OS X Preview); series: ARP, GARP, RARP, SLARP; y-axis: Average Runtime Per Test Case (secs); x-axis: Number of Test Cases]

Figure D.6: Byte Genetic Algorithm Fuzzer Performance

[Plot: CHC Adaptive Search Algorithm Runtime Performance (Mac OS X Preview); series: ARP, GARP, RARP, SLARP; y-axis: Average Runtime Per Test Case (secs); x-axis: Number of Test Cases]

Figure D.7: CHC Genetic Algorithm Fuzzer Performance

[Bar chart: Unique Defects Discovered (Mac OS X Preview - 5.0.3); series: SIGSEGV, SIGBUS, SIGFPE; y-axis: Number of Unique Defects; x-axis: Algorithm (Mangle, SGA, SGA-MM, SGA-MIP, BGA, CHC, Peach)]

Figure D.8: Defects Found per Fuzzing Algorithm

[Plot: Distributed Fuzzer Runtime Comparison (Mac OS X Preview); series: Mangle, Peach, SGA, Distributed SGA, BGA, Distributed BGA, CHC, Distributed CHC; y-axis: Runtime (secs); x-axis: Number of Test Cases]

Figure D.9: Performance Comparison with Distributed Environment

[Bar chart: Defects Found by Distributed CHC Genetic Algorithm, Population Size: 250, Generations: 100 (Mac OS X Preview); series: SIGSEGV, SIGBUS, SIGFPE; y-axis: Unique Defects Found; x-axis: Fitness Function (FF 1 (Ba * R), FF 2 (Ga), FF 3 (Ea * R), FF 4 (Ba * V), FF 5 (Ba / V))]

Figure D.10: Distributed CHC GA Unique Defects Found per Fitness Function (250)

[Bar chart: Defects Found by Distributed CHC Genetic Algorithm, Population Size: 100, Generations: 250 (Mac OS X Preview); series: SIGSEGV, SIGBUS, SIGFPE; y-axis: Unique Defects Found; x-axis: Fitness Function (FF 1 (Ba * R), FF 2 (Ga), FF 3 (Ea * R), FF 4 (Ba * V), FF 5 (Ba / V))]

Figure D.11: Distributed CHC GA Unique Defects Found per Fitness Function (100)

Vita

Roger Seagle, Jr. was born on March 24, 1981 in Wilmington, NC. He spent most of his childhood in the foothills town of Morganton, NC. In Fall 1999, he attended Wake Forest University and subsequently graduated with a degree in Computer Science in the Spring of 2003. Enamored by the problems within Computer Science, he continued his education towards a doctoral degree at the University of Tennessee, Knoxville in the Fall of 2003. During his initial two and a half year stint at the university, Roger served a term as ACM Vice President and ACM President along with performing duties as a Teaching Assistant and Research Assistant. He completed a Master's degree in Computer Science in the Winter of 2005 and pursued employment as a software engineer at Cisco Systems, Inc. Over the next few years, he maintained software for routing/switching and other telecommunications devices. In Fall 2009, he re-entered the University of Tennessee, Knoxville to finish his doctoral research and eventually completed his Doctor of Philosophy degree in Fall 2011. Currently, he resides in Knoxville, TN and continues working for Cisco Systems, Inc. as a Technical Leader. He is a member of ACM and USENIX, and his research interests are in network architectures, network protocols, computer security, reverse engineering, and evolutionary algorithms.
