An Ir-Based Fuzzing Approach for Finding Context-Aware Bugs in Api-Based Systems

AN IR-BASED FUZZING APPROACH FOR FINDING CONTEXT-AWARE BUGS IN API-BASED SYSTEMS A Dissertation Presented to The Academic Faculty By Wen Xu In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Computing Department of Computer Science Georgia Institute of Technology May 2021 © Wen Xu 2021 AN IR-BASED FUZZING APPROACH FOR FINDING CONTEXT-AWARE BUGS IN API-BASED SYSTEMS Thesis committee: Dr. Taesoo Kim (Advisor) Dr. Qirun Zhang School of Computer Science School of Computer Science Georgia Institute of Technology Georgia Institute of Technology Dr. Wenke Lee Dr. Weidong Cui School of Computer Science Microsoft Research Lab - Redmond Georgia Institute of Technology Microsoft Dr. Alessandro Orso School of Computer Science Georgia Institute of Technology Date approved: April 27, 2021 ACKNOWLEDGEMENTS First of all, I would like to thank my advisor, Dr. Taesoo Kim. He is, indeed, a smart and hardworking person who has strongly motivated, guided, and supported me through my entire graduate study. I would also like to acknowledge my thesis committee, Dr. Wenke Lee, Dr. Alessandro Orso, Dr. Qirun Zhang and Dr. Weidong Cui, for their invaluable comments and suggestions on my dissertation. I feel fortunate to work with many colleagues in the SSLab, including Dr. Sanidhya Kashyap, Dr. Changwoo Min, Dr. Sanghoo Lee, Dr. Byoungyoung Lee, Dr. Insu Yun, Dr. Hyungon Moon, Dr. Daehee Jang, Dr. Hong Hu, Dr. Meng Xu, Soyeon Park, Seulbae Kim, Ren Ding, Po-Ning Tseng, Fan Sang, and Jungyeon Yoon. Last but not least, I would like to thank my parents, Dongnian Xu and Yiping Xiao, for their endless love and support during these years. iii TABLE OF CONTENTS Acknowledgments................................... iii List of Tables...................................... ix List of Figures..................................... xi Summary........................................ xv Chapter 1: Introduction................................ 1 Chapter 2: Overview................................. 5 2.1 Background.................................. 5 2.2 Approach ................................... 7 2.3 Challenges................................... 8 Chapter 3: RPG IR.................................. 10 3.1 Scope ..................................... 10 3.2 Object..................................... 12 3.3 API ...................................... 14 3.3.1 API Return .............................. 14 3.3.2 API Arguments............................ 15 3.3.3 API Effects .............................. 19 iv 3.3.4 API Body............................... 21 3.3.5 API Display.............................. 21 3.4 Summary ................................... 22 Chapter 4: Context-aware API Fuzzing via RPG IR................ 23 4.1 Context-aware API Generation........................ 23 4.2 Context-aware API Mutation......................... 25 Chapter 5: FreeDom: Context-aware DOM Fuzzing via RPG IR......... 27 5.1 Background.................................. 27 5.1.1 DOM: An HTML Document Representation............. 27 5.1.2 DOM Engine Bugs.......................... 29 5.1.3 A Primer on DOM Fuzzing...................... 29 5.2 Motivation................................... 30 5.2.1 On the Ineffectiveness of Static Grammars.............. 30 5.2.2 Exploring Coverage-guided DOM Fuzzing ............. 32 5.3 Design..................................... 33 5.3.1 Document Representation in RPG IR................. 33 5.3.2 Context-aware DOM Fuzzing .................... 34 5.4 Evaluation................................... 40 5.4.1 Discovering New DOM Engine Bugs ................ 40 5.4.2 Effectiveness of Context-aware Fuzzing............... 42 5.4.3 Effectiveness of Coverage-driven Fuzzing.............. 45 5.5 Conclusion .................................. 48 v Chapter 6: Janus: Context-aware File System Fuzzing via RPG IR........ 49 6.1 File System Fuzzing.............................. 49 6.1.1 Disk Image Fuzzing.......................... 50 6.1.2 File Operation Fuzzer......................... 50 6.2 Design..................................... 51 6.2.1 Overview ............................... 51 6.2.2 Building Corpus............................ 53 6.2.3 Fuzzing Images............................ 53 6.2.4 File Operations in RPG IR...................... 54 6.2.5 File Operation Generation and Mutation............... 55 6.2.6 Exploring Two-Dimensional Input Space .............. 56 6.2.7 Library OS based Kernel Execution Environment.......... 57 6.3 Evaluation................................... 57 6.3.1 Bug Discovery in the Upstream File Systems ............ 58 6.3.2 Exploring File Operations ...................... 60 6.3.3 Exploring Two-Dimensional Input Space .............. 62 6.4 Conclusion .................................. 63 Chapter 7: RPG: Random Program Generator................... 65 7.1 Overview ................................... 65 7.2 ASL...................................... 67 7.2.1 Object Declaration .......................... 68 7.2.2 Macro................................. 68 vi 7.2.3 Argument Definition ......................... 71 7.2.4 API Definition ............................ 73 7.2.5 Further Notes............................. 75 7.2.6 Summary ............................... 76 7.3 Implementation................................ 77 7.3.1 ASL Compiler and Transpiler .................... 77 7.3.2 LIBRPG................................ 78 7.3.3 DOM Fuzzing via RPG........................ 78 7.4 Evaluation................................... 78 7.4.1 Comparison between ASL and Existing Grammars ......... 79 7.4.2 Effectiveness of RPG......................... 81 Chapter 8: Discussion................................. 84 8.1 Comparison between RPG IR and Existing Input Representations . 84 8.2 Limited API Semantics by RPG IR...................... 85 8.3 Automated ASL Programming........................ 85 8.4 Potential Improvements in RPG IR...................... 86 8.5 Potential Improvements in ASL........................ 86 Chapter 9: Related work............................... 87 9.1 API Fuzzing.................................. 87 9.2 Description-based Fuzzing .......................... 87 9.3 Semantics-aware Fuzzing........................... 88 vii Chapter 10:Conclusion................................ 89 References....................................... 90 viii LIST OF TABLES 2.1 A list of recent API fuzzers. The Grammar column indicates whether the fuzzer is based on grammars. The Context-awareness column indicates whether the fuzzer is aware of the context when generating random API calls. Syzkaller is not completely context-aware and still suffers from semantic errors. FreeDom and Janus are two context-aware API fuzzer presented in this thesis. .................................. 5 5.1 The classification of existing DOM fuzzers. Dynamic fuzzers themselves are web pages executed by the target browser, while static fuzzers gener- ate documents first and then test them. FreeDom is a static DOM fuzzer, which supports both blackbox generation and coverage-guided mutation in a context-aware manner............................. 29 5.2 The examples of the mutation algorithms used by FreeDom for three dif- ferent parts of a document. The Wgt. column indicates the preference of FreeDom to those algorithms. We mark the algorithms that are beyond simple appending and difficult to support by extending old DOM fuzzers. 36 5.3 The reported bugs found by FreeDom in Apple Safari (WebKit), Google Chrome, and Mozilla Firefox. We mark out the latest browser versions that are affected by the bugs. The Component column indicates what specific DOM components a document is required to contain for triggering the bugs. In particular, bugs #18, #19, and #24 only affect the beta version and are never shipped into a release; though they are security bugs, there are no CVEs assigned for them. ........................... 41 5.4 The unique crashes discovered by Domato, FreeDom, and both fuzzers during three 24-hour runs. We also count the number of crashes that have one of the four types of context-dependent values (CDVs) described in §5.2.1. Note that some crashes that involve more than one type of CDVs are counted more than once. ........................... 45 ix 5.5 Fuzzing results of running FreeDom with random generation (FDG) and coverage-guided mutation (FDM) against WebKit for 24 hours. We list the total number of unique security-related crashes found in three fuzzing runs and the average values for other metrics. The New column presents the number of distinct crashes that are only found by the coverage-guided mutation-based fuzzing. ........................... 47 6.1 An overview of bugs found by Janus in eight widely-used file systems in upstream Linux kernels. The column #Reported shows the number of bugs reported to the Linux kernel community; #Confirmed presents the number of reported bugs that are previously unknown and confirmed by kernel developers; #Fixed indicates the number of bugs that have already been fixed, at least in the development branch, and #Patches reports the number of git commits for fixing found bugs; #CVEs lists the number of CVEs assigned for confirmed bugs....................... 59 6.2 The list of previously unknown bugs in widely used file systems found by Janus that have already been fixed in Linux kernel v4.16, v4.17, and v4.18. 64 7.1 Syntax of object state assertions in ASL and their corresponding definitions in RPG IR.................................... 72 7.2 Syntax of argument value expressions in ASL and their

Load more