Towards Resource-Aware Security Testing of Software
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering

Sang Kil Cha
B.S., Electrical Engineering, Korea University
M.S., Electrical and Computer Engineering, Carnegie Mellon University

Thesis Committee:
Dr. David Brumley, Chair
Dr. Lujo Bauer
Dr. David Molnar
Dr. Vyas Sekar

Carnegie Mellon University
Pittsburgh, PA
August 10, 2015

Copyright © 2015 Sang Kil Cha

For my daughter, Jaen.

Abstract

As software permeates every facet of life, it is imperative to assure the safety of software systems. Software vulnerabilities (exploitable software bugs) allow an attacker to destroy privacy, steal identities, and even extort money from victims. Therefore, software bugs must be discovered before an attacker can exploit them.

This dissertation presents our work on mutational fuzzing, a software testing technique for finding software bugs. Specifically, we argue that the efficiency of mutational fuzzing can change drastically depending on its parameters, and thus automatic parameter optimization can help improve fuzzing efficiency. We validate this argument by designing, implementing, and evaluating several systems that employ novel techniques for optimizing parameter selection in mutational fuzzing. Our specific contributions are that (1) we precisely define fuzzing and its parameter space; (2) we analytically study the effectiveness of mutational fuzzing in terms of bug-finding probability; (3) we then address three strategies for optimizing mutational fuzzing over the parameter space in terms of the number of bugs found; and (4) we finally show a post-fuzzing strategy that enables prioritizing security-relevant bugs under limited resources.

Acknowledgments

I am deeply indebted to my advisor David Brumley for his support. I came to the USA as a master's student without a clue.
I struggled with courses due to the language barrier and cultural differences. Fortunately, I met David in one of his courses. He rekindled my passion for computer hacking and security research. I would like to say that he has turned a computer hacker into a researcher.

I am grateful to Mahadev Satyanarayanan for teaching me the spirit of being a great engineer and a researcher. He has always been my role model during my graduate years, and his courses were my favorite of all time. I am thankful to Charles P. Neuman for giving me a great opportunity to think about being a good professor. I would especially like to thank Weidong Cui for being a great mentor. His relentless passion for his work always inspires me. I would also like to thank my mentors David Andersen, Lujo Bauer, David Molnar, Marcus Peinado, and Vyas Sekar for their helpful feedback and constant support.

Heejo Lee gave me invaluable opportunities both in Korea and in Pittsburgh. I had the privilege of teaching a short course in Korea, which gave me a lot of inspiration. We also had fruitful discussions in Pittsburgh about life and research.

Thanassis Avgerinos and Alexandre Rebert tolerated me as a collaborator and as a friend. I am incredibly fortunate to have had the opportunity to work closely with them. I will never forget the team Mayhem. John Truelove helped me get through cultural differences. He always tried to respect me even though I could not express myself, and that was one of the reasons I could gain confidence during my master's years.

I am also grateful to my awesome colleagues: Tiffany Bao, Jonathan Burket, Peter Chapman, Nicholas Christine, Anupam Datta, Samantha Gottlieb, Ivan Jager, Jiyong Jang, Limin Jia, Minsuk Kang, Gihyuk Ko, Jonghyup Lee, Soobum Lee, Yanlin Li, Brent Lim, Yue-Hsun Lin, Matthew Maurer, Iulian Moraru, Brian Pak, Adrian Perrig, Edward Schwartz, Divya Sharma, Arunesh Sinha, Michael Stroucken, Spencer Whitman, and Maverick Woo.
My apologies to anyone I may have forgotten through oversight.

In addition, I would like to thank Ramakrishna Battala and Prasanna Kumar for welcoming me into Indian culture. The first year of my master's program cannot be explained without my awesome Indian friends.

Finally, my deep and heartfelt gratitude goes to my wife, Yeon Yim, for her support, patience, constant encouragement, and love. Without her, I could not have done this.

Funding Acknowledgments

This material was supported fully or in part by grants from the National Science Foundation, the Department of Defense, the Defense Advanced Research Projects Agency, Software Engineering Institute, CyLab Army Research Office, Lockheed Martin, and Northrop Grumman as part of the Cybersecurity Research Consortium. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the sponsors.

Contents

1 Introduction
  1.1 A Vision for Securing Software
  1.2 Overview: Resource-Aware Security Testing Challenge
  1.3 Fuzzing for Bug Finding
  1.4 Parameter Space Reduction for Resource-Aware Fuzzing
  1.5 Parameter Inference for Resource-Aware Fuzzing
  1.6 Resource-Aware Fuzzing with Dynamic Parameter Scheduling
  1.7 Resource-Aware Bug Prioritization
  1.8 Summary of Contributions
2 Theory of Fuzzing
  2.1 Terminology
  2.2 Our Mathematical Model
  2.3 Fuzzing
  2.4 Taxonomy of Fuzzing
  2.5 Fuzzing Algorithms
    2.5.1 Random Fuzzing
    2.5.2 Ball-based Mutational Fuzzing
    2.5.3 Surface-based Mutational Fuzzing
  2.6 Measuring the Fuzzing Efficiency
    2.6.1 Random Fuzzing
    2.6.2 Ball-based Mutational Fuzzing
    2.6.3 Surface-based Mutational Fuzzing
    2.6.4 Algorithmic Implementation
3 Parameter Reduction
  3.1 Exploiting Characteristics of Fuzzing Outcome
  3.2 Seed Selection Challenge
  3.3 Seed Selection Algorithms
  3.4 Measuring Seed Selection Quality
    3.4.1 ILP Formulation
    3.4.2 Optimal Seed Selection for Round-Robin
  3.5 Experiments
    3.5.1 Establishing Ground Truth
    3.5.2 Seed Selection Algorithms vs. Random Sampling
    3.5.3 Comparison
    3.5.4 Seed Reduction Usefulness
    3.5.5 Seed Transferability
  3.6 Discussion
  3.7 Summary
4 Parameter Inference
  4.1 Exploiting Characteristics of Fuzzing Outcome
  4.2 Input-Bit Dependence
  4.3 Failure Rate based on Mutation Ratio
  4.4 Mutation Ratio Optimization
    4.4.1 Mutation Ratio Optimization Challenge
    4.4.2 Solving for an Optimal Mutation Ratio
    4.4.3 Estimating r
  4.5 Input-Bit Dependence Inference
    4.5.1 The Algorithm
    4.5.2 Example
  4.6 SymFuzz Design
    4.6.1 Implementation
    4.6.2 Symbolic Analysis
    4.6.3 Safe Stack Hash
  4.7 Evaluation
    4.7.1 Experimental Setup
    4.7.2 Mutation Ratio Optimization
    4.7.3 Distribution of b Values
    4.7.4 Estimating r
    4.7.5 SymFuzz Practicality
  4.8 Discussion
  4.9 Summary
5 Parameter Scheduling
  5.1 Exploiting Characteristics of Fuzzing Outcome
  5.2 Problem Setting
  5.3 Algorithmic Considerations
  5.4 Multi-Armed Bandits
  5.5 Fuzzing as a Weighted CCP
  5.6 Impossibility Results
  5.7 Scheduling Algorithm Design
    5.7.1 Rule of Three
    5.7.2 Design Space
  5.8 Design & Implementation of FuzzSim
  5.9 FuzzSim Evaluation
    5.9.1 Experimental Setup
    5.9.2 Fuzzing Data Collection
    5.9.3 Data Analysis
    5.9.4 Simulation