Seed Selection for Successful Fuzzing
Total Page:16
File Type:pdf, Size:1020Kb
Seed Selection for Successful Fuzzing Adrian Herrera Hendra Gunadi Shane Magrath ANU & DST ANU DST Australia Australia Australia Michael Norrish Mathias Payer Antony L. Hosking CSIRO’s Data61 & ANU EPFL ANU & CSIRO’s Data61 Australia Switzerland Australia ABSTRACT ACM Reference Format: Mutation-based greybox fuzzing—unquestionably the most widely- Adrian Herrera, Hendra Gunadi, Shane Magrath, Michael Norrish, Mathias Payer, and Antony L. Hosking. 2021. Seed Selection for Successful Fuzzing. used fuzzing technique—relies on a set of non-crashing seed inputs In Proceedings of the 30th ACM SIGSOFT International Symposium on Software (a corpus) to bootstrap the bug-finding process. When evaluating a Testing and Analysis (ISSTA ’21), July 11–17, 2021, Virtual, Denmark. ACM, fuzzer, common approaches for constructing this corpus include: New York, NY, USA, 14 pages. https://doi.org/10.1145/3460319.3464795 (i) using an empty file; (ii) using a single seed representative of the target’s input format; or (iii) collecting a large number of seeds (e.g., 1 INTRODUCTION by crawling the Internet). Little thought is given to how this seed Fuzzing is a dynamic analysis technique for finding bugs and vul- choice affects the fuzzing process, and there is no consensus on nerabilities in software, triggering crashes in a target program by which approach is best (or even if a best approach exists). subjecting it to a large number of (possibly malformed) inputs. To address this gap in knowledge, we systematically investigate Mutation-based fuzzing typically uses an initial set of valid seed and evaluate how seed selection affects a fuzzer’s ability to find bugs inputs from which to generate new seeds by random mutation. Due in real-world software. This includes a systematic review of seed to their simplicity and ease-of-use, mutation-based greybox fuzzers selection practices used in both evaluation and deployment con- such as AFL [74], honggfuzz [64], and libFuzzer [61] are widely texts, and a large-scale empirical evaluation (over 33 CPU-years) of deployed, and have been highly successful in uncovering thousands six seed selection approaches. These six seed selection approaches of bugs across a large number of popular programs [6, 16]. This include three corpus minimization techniques (which select the success has prompted much research into improving various as- smallest subset of seeds that trigger the same range of instrumen- pects of the fuzzing process, including mutation strategies [39, 42], tation data points as a full corpus). energy assignment policies [15, 25], and path exploration algo- Our results demonstrate that fuzzing outcomes vary significantly rithms [14, 73]. However, while researchers often note the impor- depending on the initial seeds used to bootstrap the fuzzer, with min- tance of high-quality input seeds and their impact on fuzzer perfor- imized corpora outperforming singleton, empty, and large (in the mance [37, 56, 58, 67], few studies address the problem of order of thousands of files) seed sets. Consequently, we encourage optimal de- for mutation-based fuzzers [56, 58], seed selection to be foremost in mind when evaluating/deploying sign and construction of corpora and none assess the precise impact of these corpora in coverage- fuzzers, and recommend that (a) seed choice be carefully considered guided mutation-based greybox fuzzing. and explicitly documented, and (b) never to evaluate fuzzers with Intuitively, the collection of seeds that form the initial corpus only a single seed. should generate a broad range of observable behaviors in the target. Similarly, candidate seeds that are behaviorally similar to one an- CCS CONCEPTS other should be represented in the corpus by a single seed. Finally, • Software and its engineering ! Software testing and de- both the total size of the corpus and the size of individual seeds bugging; • Security and privacy ! Software and application should be minimized. This is because previous work has demon- security. strated the impact that file system contention has on industrial-scale fuzzing. In particular, Xu et al.[71] showed that the overhead from KEYWORDS opening/closing test-cases and synchronization between workers each introduced a 2× overhead. Time spent opening/closing test- fuzzing, corpus minimization, software testing cases and synchronization is time diverted from mutating inputs and expanding code coverage. Minimizing the total corpus size and the size of individual test-cases reduces this wastage and enables Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed time to be (better) spent on finding bugs. for profit or commercial advantage and that copies bear this notice and the full citation Under these assumptions, simply gathering as many input files on the first page. Copyrights for components of this work owned by others than ACM as possible is not a reasonable approach for constructing a fuzzing must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a corpus. Conversely, these assumptions also suggest that beginning fee. Request permissions from [email protected]. with the “empty corpus” (e.g., consisting of one zero-length file) ISSTA ’21, July 11–17, 2021, Virtual, Denmark may be less than ideal. And yet, as we survey here, the majority © 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8459-9/21/07. of published research uses either (a) the “singleton corpus” (e.g., a https://doi.org/10.1145/3460319.3464795 single seed representative of the target program’s input format), ISSTA ’21, July 11–17, 2021, Virtual, Denmark Adrian Herrera, Hendra Gunadi, Shane Magrath, Michael Norrish, Mathias Payer, and Antony L. Hosking or (b) the empty corpus. In contrast, industrial-scale fuzzing (e.g., chain model for seed scheduling [15], and MOpt’s particle swarm Google’s OSSFuzz [6]) typically uses large corpora containing hun- optimization for selecting mutation operators [42]. dreds of inputs. These inputs may generate similar behavior in the target, potentially leading to wasted fuzzing effort in exhaustively 3 SEED SELECTION PRACTICES exploring mutation from all available seeds. The practice is to use The process for selecting the initial corpus of seeds varies wildly corpus minimization tools that eliminate seeds that display duplicate across fuzzer evaluations and deployments. We systematically re- behavior. However, these tools are based on heuristic algorithms view this variation in the following sections, first considering ex- and generate corpora that vary wildly in size. It is unclear which of perimental evaluations, followed by industrial-scale deployments. these approaches (fuzzing with an empty, singleton, or minimized corpus) is best, or even if a best approach exists. 3.1 In Experimental Evaluation Thus, we undertake a systematic study to better understand the Remarkably, Klees et al.[37] found that “most papers treated the impact that seed selection has on a fuzzer’s ultimate goal: finding choice of seed casually, apparently assuming that any seed would bugs in real-world software. We make the following contributions: work equally well, without providing particulars”. In particular, of • A systematic review of seed selection practices and corpus the 32 papers that they surveyed (authored between 2012 and 2018): minimization techniques used across fuzzer evaluations and ten used non-empty seed(s), but it was not clear whether these deployments. Our review finds that seed selection practices seeds were valid inputs; nine assumed the existence of valid seed(s), vary wildly, and that there is no consensus on the best way to but did not report how these were obtained; five used a random select the seeds that bootstrap the fuzzing process (Section3). sampling of seed(s); four used manually constructed seed(s); and • A new corpus minimization tool, OptiMin, which produces two used empty seed(s). Additionally, six studies used a combination an optimal minimum corpus (Section4). of seed selection techniques. Klees et al.[37] find thatit “ is clear • A quantitative evaluation and comparison of various seed- that a fuzzer’s performance on the same program can be very different selection practices. This evaluation covers three corpus min- depending on what seed is used” and recommend that “papers should imization tools (including OptiMin), and finds that corpora be specific about how seeds are collected”. produced with these tools perform better than singleton and We examine an additional 28 papers published since 2018, to see empty seeds with respect to bug-finding ability (Section5). if these recommendations have been adopted. Table1 summarizes our findings. 2 FUZZING Unreported seeds. Three studies make no mention of their seed Fuzzing is the most popular technique for automatically finding selection procedure. One, FuzzGen, explicitly mentions that “stan- bugs and vulnerabilities in software. This popularity can be attrib- dardized common seeds”[33] are key to a valid comparison, yet does uted to its simplicity and success in finding bugs in “real-world” not mention the seeds used. software [16, 61, 64, 74]. How a fuzzer generates test-cases depends on whether it is gen- Benchmark and fuzzer-provided seeds. Three studies (Hawkeye, eration-based or mutation-based. Generation-based fuzzers (e.g., FuzzFactory,