Whitebox Fuzzing for Security Testing

Total Page:16

File Type:pdf, Size:1020Kb

Whitebox Fuzzing for Security Testing practice DOI:10.1145/2093548.2093564 and its users millions of dollars. If a Article development led by queue.acm.org monthly security update costs you $0.001 (one tenth of one cent) in just electricity or loss of productivity, then SAGE has had a remarkable this number multiplied by one bil- impact at Microsoft. lion people is $1 million. Of course, if malware were spreading on your ma- BY PATRICE GODEFROID, MICHAEL Y. LEVIN, AND DAVID MOLNAR chine, possibly leaking some of your private data, then that might cost you much more than $0.001. This is why we strongly encourage you to apply those pesky security updates. SAGE: Many security vulnerabilities are a result of programming errors in code for parsing files and packets that are transmitted over the Internet. For ex- Whitebox ample, Microsoft Windows includes parsers for hundreds of file formats. If you are reading this article on a computer, then the picture shown in Figure 1 is displayed on your screen Fuzzing for after a jpg parser (typically part of your operating system) has read the image data, decoded it, created new data structures with the decoded data, Security and passed those to the graphics card in your computer. If the code imple- menting that jpg parser contains a bug such as a buffer overflow that can Testing be triggered by a corrupted jpg image, then the execution of this jpg parser on your computer could potentially be hijacked to execute some other code, possibly malicious and hidden in the jpg data itself. This is just one example of a pos- MOST COMMUNicatiONS READERS might think of sible security vulnerability and at- tack scenario. The security bugs dis- “program verification research” as mostly theoretical cussed throughout the rest of this with little impact on the world at large. Think again. article are mostly buffer overflows. If you are reading these lines on a PC running some Hunting for “Million-Dollar” Bugs form of Windows (like over 93% of PC users—that is, Today, hackers find security vulnera- more than one billion people), then you have been bilities in software products using two primary methods. The first is code in- affected by this line of work—without knowing it, spection of binaries (with a good dis- which is precisely the way we want it to be. assembler, binary code is like source Every second Tuesday of every month, also known code). The second is blackbox fuzzing, as “Patch Tuesday,” Microsoft releases a list of a form of blackbox random testing, security bulletins and associated security patches to which randomly mutates well-formed program inputs and then tests the be deployed on hundreds of millions of machines program with those modified inputs,3 worldwide. Each security bulletin costs Microsoft hoping to trigger a bug such as a buf- 40 COMMUNICATIONS OF THE ACM | MARCH 2012 | VOL. 55 | NO. 3 Figure 1. A sample jpg image. fer overflow. In some cases, grammars then branch of the conditional state- called whitebox fuzzing.5 It builds upon are used to generate the well-formed ment in recent advances in systematic dynamic inputs. This also allows encoding test generation4 and extends its scope application-specific knowledge and int foo(int x) { // x is an input from unit testing to whole-program test-generation heuristics. int y = x + 3; security testing. Blackbox fuzzing is a simple yet if (y == 13) abort(); // error Starting with a well-formed input, effective technique for finding se- retur n 0; whitebox fuzzing consists of symboli- curity vulnerabilities in software. } cally executing the program under test Thousands of security bugs have dynamically, gathering constraints on been found this way. At Microsoft, has only 1 in 232 chances of being ex- inputs from conditional branches en- fuzzing is mandatory for every un- ercised if the input variable x has a countered along the execution. The trusted interface of every product, as randomly chosen 32-bit value. This in- collected constraints are then sys- prescribed in the Security Develop- tuitively explains why blackbox fuzzing tematically negated and solved with a NASA 7 F ment Lifecycle, which documents usually provides low code coverage and constraint solver, whose solutions are recommendations on how to devel- can miss security bugs. mapped to new inputs that exercise OURTESY O OURTESY op secure software. different program execution paths. C H P Although blackbox fuzzing can be Introducing Whitebox Fuzzing This process is repeated using novel RA G remarkably effective, its limitations A few years ago, we started develop- search techniques that attempt to sweep PHOTO are well known. For example, the ing an alternative to blackbox fuzzing, through all (in practice, many) feasible MARCH 2012 | VOL. 55 | NO. 3 | COMMUNICATIONS OF THE ACM 41 practice execution paths of the program while straint leading to it, and attempted checking simultaneously many prop- to be solved by a constraint solver. erties using a runtime checker (such as This way, a single symbolic execution Purify, Valgrind, or AppVerifier). can generate thousands of new tests. For example, symbolic execution of (In contrast, a standard depth-first the previous program fragment with SAGE has had or breadth-first search would negate an initial value 0 for the input vari- a remarkable only the last or first constraint in each able x takes the else branch of the path constraint and generate at most conditional statement and generates impact at Microsoft. one new test per symbolic execution.) the path constraint x+3 ≠ 13. Once It combines and The program shown in Figure 2 this constraint is negated and solved, takes four bytes as input and con- it yields x = 10, providing a new input extends program tains an error when the value of that causes the program to follow the analysis, testing, the variable cnt is greater than or then branch of the conditional state- equal to four. Starting with some ment. This allows us to exercise and verification, initial input good, SAGE runs this test additional code for security bugs, program while performing a sym- even without specific knowledge of model checking, bolic execution dynamically. Since the input format. Furthermore, this and automated the program path taken during this approach automatically discovers and first run is formed by all the else tests “corner cases” where program- theorem-proving branches in the program, the path mers may fail to allocate memory or techniques constraint for this initial run is the manipulate buffers properly, leading conjunction of constraints i0 ≠ b, to security vulnerabilities. that have been i1 ≠ a, i2 ≠ d and i3 ≠ !. Each of these In theory, systematic dynamic test developed over constraints is negated one by one, generation can lead to full program placed in a conjunction with the path coverage, that is, program verifica- many years. prefix of the path constraint lead- tion. In practice, however, the search ing to it, and then solved with a con- is typically incomplete both because straint solver. In this case, all four the number of execution paths in the constraints are solvable, leading to program under test is huge, and be- four new test inputs. Figure 2 also cause symbolic execution, constraint shows the set of all feasible program generation, and constraint solving can paths for the function top. The left- be imprecise due to complex program most path represents the initial run statements (pointer manipulations of the program and is labeled 0 for and floating-point operations, among Generation 0. Four Generation 1 in- others), calls to external operating- puts are obtained by systematically system and library functions, and large negating and solving each constraint numbers of constraints that cannot in the Generation 0 path constraint. all be solved perfectly in a reasonable By repeating this process, all paths amount of time. Therefore, we are are eventually enumerated for this forced to explore practical trade-offs. example. In practice, the search is typically incomplete. SAGE SAGE was the first tool to perform Whitebox fuzzing was first imple- dynamic symbolic execution at the mented in the tool SAGE, short for x86 binary level. It is implemented on Scalable Automated Guided Execu- top of the trace replay infrastructure tion.5 Because SAGE targets large ap- TruScan,8 which consumes trace files plications where a single execution generated by the iDNA framework1 may contain hundreds of millions and virtually re-executes the recorded of instructions, symbolic execution runs. TruScan offers several features is its slowest component. Therefore, that substantially simplify symbolic SAGE implements a novel directed- execution, including instruction de- search algorithm—dubbed genera- coding, providing an interface to pro- tional search—that maximizes the gram symbol information, monitor- number of new input tests generated ing various input/output system calls, from each symbolic execution. Given keeping track of heap and stack frame a path constraint, all the constraints allocations, and tracking the flow of in that path are systematically negat- data through the program structures. ed one by one, placed in a conjunc- Thanks to offline tracing, constraint tion with the prefix of the path con- generation in SAGE is completely de- 42 COMMUNICATIONS OF THE ACM | MARCH 2012 | VOL. 55 | NO. 3 practice Figure 2. Example of program (left) and its search space (right) with the value of cnt at the end of each run. void top(char input[4]) { int cnt=0; if (input[0] == ’b’) cnt++; if (input[1] == ’a’) cnt++; if (input[2] == ’d’) cnt++; if (input[3] == ’!’) cnt++; if (cnt >= 4) abort(); // error } 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 good goo! godd god!gaod gao! gadd gad! bood boo! bodd bod! baod bao! badd bad! terministic because it works with an SAGE Architecture constraint solver (we currently use the execution trace that captures the out- The high-level architecture of SAGE Z3 SMT solver2).
Recommended publications
  • Automatically Bypassing Android Malware Detection System
    FUZZIFICATION: Anti-Fuzzing Techniques Jinho Jung, Hong Hu, David Solodukhin, Daniel Pagan, Kyu Hyung Lee*, Taesoo Kim * 1 Fuzzing Discovers Many Vulnerabilities 2 Fuzzing Discovers Many Vulnerabilities 3 Testers Find Bugs with Fuzzing Detected bugs Normal users Compilation Source Released binary Testers Compilation Distribution Fuzzing 4 But Attackers Also Find Bugs Detected bugs Normal users Compilation Attackers Source Released binary Testers Compilation Distribution Fuzzing 5 Our work: Make the Fuzzing Only Effective to the Testers Detected bugs Normal users Fuzzification ? Fortified binary Attackers Source Compilation Binary Testers Compilation Distribution Fuzzing 6 Threat Model Detected bugs Normal users Fuzzification Fortified binary Attackers Source Compilation Binary Testers Compilation Distribution Fuzzing 7 Threat Model Detected bugs Normal users Fuzzification Fortified binary Attackers Source Compilation Binary Testers Compilation Distribution Fuzzing Adversaries try to find vulnerabilities from fuzzing 8 Threat Model Detected bugs Normal users Fuzzification Fortified binary Attackers Source Compilation Binary Testers Compilation Distribution Fuzzing Adversaries only have a copy of fortified binary 9 Threat Model Detected bugs Normal users Fuzzification Fortified binary Attackers Source Compilation Binary Testers Compilation Distribution Fuzzing Adversaries know Fuzzification and try to nullify 10 Research Goals Detected bugs Normal users Fuzzification Fortified binary Attackers Source Compilation Binary Testers Compilation
    [Show full text]
  • BUGS in the SYSTEM a Primer on the Software Vulnerability Ecosystem and Its Policy Implications
    ANDI WILSON, ROSS SCHULMAN, KEVIN BANKSTON, AND TREY HERR BUGS IN THE SYSTEM A Primer on the Software Vulnerability Ecosystem and its Policy Implications JULY 2016 About the Authors About New America New America is committed to renewing American politics, Andi Wilson is a policy analyst at New America’s Open prosperity, and purpose in the Digital Age. We generate big Technology Institute, where she researches and writes ideas, bridge the gap between technology and policy, and about the relationship between technology and policy. curate broad public conversation. We combine the best of With a specific focus on cybersecurity, Andi is currently a policy research institute, technology laboratory, public working on issues including encryption, vulnerabilities forum, media platform, and a venture capital fund for equities, surveillance, and internet freedom. ideas. We are a distinctive community of thinkers, writers, researchers, technologists, and community activists who Ross Schulman is a co-director of the Cybersecurity believe deeply in the possibility of American renewal. Initiative and senior policy counsel at New America’s Open Find out more at newamerica.org/our-story. Technology Institute, where he focuses on cybersecurity, encryption, surveillance, and Internet governance. Prior to joining OTI, Ross worked for Google in Mountain About the Cybersecurity Initiative View, California. Ross has also worked at the Computer The Internet has connected us. Yet the policies and and Communications Industry Association, the Center debates that surround the security of our networks are for Democracy and Technology, and on Capitol Hill for too often disconnected, disjointed, and stuck in an Senators Wyden and Feingold. unsuccessful status quo.
    [Show full text]
  • Moonshine: Optimizing OS Fuzzer Seed Selection with Trace Distillation
    MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation Shankara Pailoor, Andrew Aday, and Suman Jana Columbia University Abstract bug in system call implementations might allow an un- privileged user-level process to completely compromise OS fuzzers primarily test the system-call interface be- the system. tween the OS kernel and user-level applications for secu- OS fuzzers usually start with a set of synthetic seed rity vulnerabilities. The effectiveness of all existing evo- programs , i.e., a sequence of system calls, and itera- lutionary OS fuzzers depends heavily on the quality and tively mutate their arguments/orderings using evolution- diversity of their seed system call sequences. However, ary guidance to maximize the achieved code coverage. generating good seeds for OS fuzzing is a hard problem It is well-known that the performance of evolutionary as the behavior of each system call depends heavily on fuzzers depend critically on the quality and diversity of the OS kernel state created by the previously executed their seeds [31, 39]. Ideally, the synthetic seed programs system calls. Therefore, popular evolutionary OS fuzzers for OS fuzzers should each contain a small number of often rely on hand-coded rules for generating valid seed system calls that exercise diverse functionality in the OS sequences of system calls that can bootstrap the fuzzing kernel. process. Unfortunately, this approach severely restricts However, the behavior of each system call heavily de- the diversity of the seed system call sequences and there- pends on the shared kernel state created by the previous fore limits the effectiveness of the fuzzers.
    [Show full text]
  • How to Analyze the Cyber Threat from Drones
    C O R P O R A T I O N KATHARINA LEY BEST, JON SCHMID, SHANE TIERNEY, JALAL AWAN, NAHOM M. BEYENE, MAYNARD A. HOLLIDAY, RAZA KHAN, KAREN LEE How to Analyze the Cyber Threat from Drones Background, Analysis Frameworks, and Analysis Tools For more information on this publication, visit www.rand.org/t/RR2972 Library of Congress Cataloging-in-Publication Data is available for this publication. ISBN: 978-1-9774-0287-5 Published by the RAND Corporation, Santa Monica, Calif. © Copyright 2020 RAND Corporation R® is a registered trademark. Cover design by Rick Penn-Kraus Cover images: drone, Kadmy - stock.adobe.com; data, Getty Images. Limited Print and Electronic Distribution Rights This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited. Permission is given to duplicate this document for personal use only, as long as it is unaltered and complete. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial use. For information on reprint and linking permissions, please visit www.rand.org/pubs/permissions. The RAND Corporation is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous. RAND is nonprofit, nonpartisan, and committed to the public interest. RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors. Support RAND Make a tax-deductible charitable contribution at www.rand.org/giving/contribute www.rand.org Preface This report explores the security implications of the rapid growth in unmanned aerial systems (UAS), focusing specifically on current and future vulnerabilities.
    [Show full text]
  • The CLASP Application Security Process
    The CLASP Application Security Process Secure Software, Inc. Copyright (c) 2005, Secure Software, Inc. The CLASP Application Security Process The CLASP Application Security Process TABLE OF CONTENTS CHAPTER 1 Introduction 1 CLASP Status 4 An Activity-Centric Approach 4 The CLASP Implementation Guide 5 The Root-Cause Database 6 Supporting Material 7 CHAPTER 2 Implementation Guide 9 The CLASP Activities 11 Institute security awareness program 11 Monitor security metrics 12 Specify operational environment 13 Identify global security policy 14 Identify resources and trust boundaries 15 Identify user roles and resource capabilities 16 Document security-relevant requirements 17 Detail misuse cases 18 Identify attack surface 19 Apply security principles to design 20 Research and assess security posture of technology solutions 21 Annotate class designs with security properties 22 Specify database security configuration 23 Perform security analysis of system requirements and design (threat modeling) 24 Integrate security analysis into source management process 25 Implement interface contracts 26 Implement and elaborate resource policies and security technologies 27 Address reported security issues 28 Perform source-level security review 29 Identify, implement and perform security tests 30 The CLASP Application Security Process i Verify security attributes of resources 31 Perform code signing 32 Build operational security guide 33 Manage security issue disclosure process 34 Developing a Process Engineering Plan 35 Business objectives 35 Process
    [Show full text]
  • Active Fuzzing for Testing and Securing Cyber-Physical Systems
    Active Fuzzing for Testing and Securing Cyber-Physical Systems Yuqi Chen Bohan Xuan Christopher M. Poskitt Singapore Management University Zhejiang University Singapore Management University Singapore China Singapore [email protected] [email protected] [email protected] Jun Sun Fan Zhang∗ Singapore Management University Zhejiang University Singapore China [email protected] [email protected] ABSTRACT KEYWORDS Cyber-physical systems (CPSs) in critical infrastructure face a per- Cyber-physical systems; fuzzing; active learning; benchmark gen- vasive threat from attackers, motivating research into a variety of eration; testing defence mechanisms countermeasures for securing them. Assessing the effectiveness of ACM Reference Format: these countermeasures is challenging, however, as realistic bench- Yuqi Chen, Bohan Xuan, Christopher M. Poskitt, Jun Sun, and Fan Zhang. marks of attacks are difficult to manually construct, blindly testing 2020. Active Fuzzing for Testing and Securing Cyber-Physical Systems. In is ineffective due to the enormous search spaces and resource re- Proceedings of the 29th ACM SIGSOFT International Symposium on Software quirements, and intelligent fuzzing approaches require impractical Testing and Analysis (ISSTA ’20), July 18–22, 2020, Virtual Event, USA. ACM, amounts of data and network access. In this work, we propose active New York, NY, USA, 13 pages. https://doi.org/10.1145/3395363.3397376 fuzzing, an automatic approach for finding test suites of packet- level CPS network attacks, targeting scenarios in which attackers 1 INTRODUCTION can observe sensors and manipulate packets, but have no existing knowledge about the payload encodings. Our approach learns re- Cyber-physical systems (CPSs), characterised by their tight and gression models for predicting sensor values that will result from complex integration of computational and physical processes, are sampled network packets, and uses these predictions to guide a often used in the automation of critical public infrastructure [78].
    [Show full text]
  • The Art, Science, and Engineering of Fuzzing: a Survey
    1 The Art, Science, and Engineering of Fuzzing: A Survey Valentin J.M. Manes,` HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, and Maverick Woo Abstract—Among the many software vulnerability discovery techniques available today, fuzzing has remained highly popular due to its conceptual simplicity, its low barrier to deployment, and its vast amount of empirical evidence in discovering real-world software vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. While researchers and practitioners alike have invested a large and diverse effort towards improving fuzzing in recent years, this surge of work has also made it difficult to gain a comprehensive and coherent view of fuzzing. To help preserve and bring coherence to the vast literature of fuzzing, this paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature. We methodically explore the design decisions at every stage of our model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective. Index Terms—software security, automated software testing, fuzzing. ✦ 1 INTRODUCTION Figure 1 on p. 5) and an increasing number of fuzzing Ever since its introduction in the early 1990s [152], fuzzing studies appear at major security conferences (e.g. [225], has remained one of the most widely-deployed techniques [52], [37], [176], [83], [239]). In addition, the blogosphere is to discover software security vulnerabilities. At a high level, filled with many success stories of fuzzing, some of which fuzzing refers to a process of repeatedly running a program also contain what we consider to be gems that warrant a with generated inputs that may be syntactically or seman- permanent place in the literature.
    [Show full text]
  • Psofuzzer: a Target-Oriented Software Vulnerability Detection Technology Based on Particle Swarm Optimization
    applied sciences Article PSOFuzzer: A Target-Oriented Software Vulnerability Detection Technology Based on Particle Swarm Optimization Chen Chen 1,2,*, Han Xu 1 and Baojiang Cui 1 1 School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China; [email protected] (H.X.); [email protected] (B.C.) 2 School of Information and Navigation, Air Force Engineering University, Xi’an 710077, China * Correspondence: [email protected] Abstract: Coverage-oriented and target-oriented fuzzing are widely used in vulnerability detection. Compared with coverage-oriented fuzzing, target-oriented fuzzing concentrates more computing resources on suspected vulnerable points to improve the testing efficiency. However, the sample generation algorithm used in target-oriented vulnerability detection technology has some problems, such as weak guidance, weak sample penetration, and difficult sample generation. This paper proposes a new target-oriented fuzzer, PSOFuzzer, that uses particle swarm optimization to generate samples. PSOFuzzer can quickly learn high-quality features in historical samples and implant them into new samples that can be led to execute the suspected vulnerable point. The experimental results show that PSOFuzzer can generate more samples in the test process to reach the target point and can trigger vulnerabilities with 79% and 423% higher probability than AFLGo and Sidewinder, respectively, on tested software programs. Keywords: fuzzing; model-based fuzzing; vulnerability detection; code coverage; open-source program; directed fuzzing; static instrumentation; source code instrumentation Citation: Chen, C.; Han, X.; Cui, B. PSOFuzzer: A Target-Oriented 1. Introduction Software Vulnerability Detection The appearance of an increasing number of software vulnerabilities has made auto- Technology Based on Particle Swarm matic vulnerability detection technology a widespread concern in industry and academia.
    [Show full text]
  • Threats and Vulnerabilities in Federation Protocols and Products
    Threats and Vulnerabilities in Federation Protocols and Products Teemu Kääriäinen, CSSLP / Nixu Corporation OWASP Helsinki Chapter Meeting #30 October 11, 2016 Contents • Federation Protocols: OpenID Connect and SAML 2.0 – Basic flows, comparison between the protocols • OAuth 2.0 and OpenID Connect Vulnerabilities and Best Practices – Background for OAuth 2.0 security criticism, vulnerabilities related discussion and publicly disclosed vulnerabilities, best practices, JWT, authorization bypass vulnerabilities, mobile application integration. • SAML 2.0 Vulnerabilities and Best Practices – Best practices, publicly disclosed vulnerabilities • OWASP Top Ten in Access management solutions – Focus on Java deserialization vulnerabilites in different commercial and open source access management products • Forgerock OpenAM, Gluu, CAS, PingFederate 7.3.0 Admin UI, Oracle ADF (Oracle Identity Manager) Federation Protocols: OpenID Connect and SAML 2.0 • OpenID Connect is an emerging technology built on OAuth 2.0 that enables relying parties to verify the identity of an end-user in an interoperable and REST-like manner. • OpenID Connect is not just about authentication. It is also about authorization, delegation and API access management. • Reasons for services to start using OpenID Connect: – Ease of integration. – Ability to integrate client applications running on different platforms: single-page app, web, backend, mobile, IoT. – Allowing 3rd party integrations in a secure, interoperable and scalable manner. • OpenID Connect is proven to be secure and mature technology: – Solves many of the security issues that have been an issue with OAuth 2.0. • OpenID Connect and OAuth 2.0 are used frequently in social login scenarios: – E.g. Google and Microsoft Account are OpenID Connect Identity Providers. Facebook is an OAuth 2.0 authorization server.
    [Show full text]
  • Configuration Fuzzing Testing Framework for Software Vulnerability Detection
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Columbia University Academic Commons CONFU: Configuration Fuzzing Testing Framework for Software Vulnerability Detection Huning Dai, Christian Murphy, Gail Kaiser Department of Computer Science Columbia University New York, NY 10027 USA ABSTRACT Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations and inputs together with a certain runtime environment. One approach to detecting these vulnerabilities is fuzz testing. However, typical fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, we present a new testing methodology called Configuration Fuzzing. Configuration Fuzzing is a technique whereby the configuration of the running application is mutated at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants'' that, if violated, indicate a vulnerability. We discuss the approach and introduce a prototype framework called ConFu (CONfiguration FUzzing testing framework) for implementation. We also present the results of case studies that demonstrate the approach's feasibility and evaluate its performance. Keywords: Vulnerability; Configuration Fuzzing; Fuzz testing; In Vivo testing; Security invariants INTRODUCTION As the Internet has grown in popularity, security testing is undoubtedly becoming a crucial part of the development process for commercial software, especially for server applications. However, it is impossible in terms of time and cost to test all configurations or to simulate all system environments before releasing the software into the field, not to mention the fact that software distributors may later add more configuration options.
    [Show full text]
  • Fuzzing with Code Fragments
    Fuzzing with Code Fragments Christian Holler Kim Herzig Andreas Zeller Mozilla Corporation∗ Saarland University Saarland University [email protected] [email protected] [email protected] Abstract JavaScript interpreter must follow the syntactic rules of JavaScript. Otherwise, the JavaScript interpreter will re- Fuzz testing is an automated technique providing random ject the input as invalid, and effectively restrict the test- data as input to a software system in the hope to expose ing to its lexical and syntactic analysis, never reaching a vulnerability. In order to be effective, the fuzzed input areas like code transformation, in-time compilation, or must be common enough to pass elementary consistency actual execution. To address this issue, fuzzing frame- checks; a JavaScript interpreter, for instance, would only works include strategies to model the structure of the de- accept a semantically valid program. On the other hand, sired input data; for fuzz testing a JavaScript interpreter, the fuzzed input must be uncommon enough to trigger this would require a built-in JavaScript grammar. exceptional behavior, such as a crash of the interpreter. Surprisingly, the number of fuzzing frameworks that The LangFuzz approach resolves this conflict by using generate test inputs on grammar basis is very limited [7, a grammar to randomly generate valid programs; the 17, 22]. For JavaScript, jsfunfuzz [17] is amongst the code fragments, however, partially stem from programs most popular fuzzing tools, having discovered more that known to have caused invalid behavior before. LangFuzz 1,000 defects in the Mozilla JavaScript engine. jsfunfuzz is an effective tool for security testing: Applied on the is effective because it is hardcoded to target a specific Mozilla JavaScript interpreter, it discovered a total of interpreter making use of specific knowledge about past 105 new severe vulnerabilities within three months of and common vulnerabilities.
    [Show full text]
  • Seed Selection for Successful Fuzzing
    Seed Selection for Successful Fuzzing Adrian Herrera Hendra Gunadi Shane Magrath ANU & DST ANU DST Australia Australia Australia Michael Norrish Mathias Payer Antony L. Hosking CSIRO’s Data61 & ANU EPFL ANU & CSIRO’s Data61 Australia Switzerland Australia ABSTRACT ACM Reference Format: Mutation-based greybox fuzzing—unquestionably the most widely- Adrian Herrera, Hendra Gunadi, Shane Magrath, Michael Norrish, Mathias Payer, and Antony L. Hosking. 2021. Seed Selection for Successful Fuzzing. used fuzzing technique—relies on a set of non-crashing seed inputs In Proceedings of the 30th ACM SIGSOFT International Symposium on Software (a corpus) to bootstrap the bug-finding process. When evaluating a Testing and Analysis (ISSTA ’21), July 11–17, 2021, Virtual, Denmark. ACM, fuzzer, common approaches for constructing this corpus include: New York, NY, USA, 14 pages. https://doi.org/10.1145/3460319.3464795 (i) using an empty file; (ii) using a single seed representative of the target’s input format; or (iii) collecting a large number of seeds (e.g., 1 INTRODUCTION by crawling the Internet). Little thought is given to how this seed Fuzzing is a dynamic analysis technique for finding bugs and vul- choice affects the fuzzing process, and there is no consensus on nerabilities in software, triggering crashes in a target program by which approach is best (or even if a best approach exists). subjecting it to a large number of (possibly malformed) inputs. To address this gap in knowledge, we systematically investigate Mutation-based fuzzing typically uses an initial set of valid seed and evaluate how seed selection affects a fuzzer’s ability to find bugs inputs from which to generate new seeds by random mutation.
    [Show full text]