Change-Aware Dynamic Program Analysis for JavaScript


Dileep Ramachandrarao Krishna Murthy and Michael Pradel
Department of Computer Science, TU Darmstadt, Darmstadt, Germany
[email protected], [email protected]

This work was supported by the German Federal Ministry of Education and Research and by the Hessian Ministry of Science and the Arts within CRISP, by the German Research Foundation within the ConcSys and Perf4JS projects, and by the Hessian LOEWE initiative within the Software-Factory 4.0 project.

Abstract—Dynamic analysis is a powerful technique to detect correctness, performance, and security problems, in particular for programs written in dynamic languages, such as JavaScript. To catch mistakes as early as possible, developers should run such analyses regularly, e.g., by analyzing the execution of a regression test suite before each commit. Unfortunately, the high overhead of these analyses makes this approach prohibitively expensive, hindering developers from benefiting from the power of heavyweight dynamic analysis. This paper presents change-aware dynamic program analysis, an approach to make a common class of dynamic analyses change-aware. The key idea is to identify parts of the code affected by a change through a lightweight static change impact analysis, and to focus the dynamic analysis on these affected parts. We implement the idea based on the dynamic analysis framework Jalangi and evaluate it with 46 checkers from the DLint and JITProf tools. Our results show that change-aware dynamic analysis reduces the overall analysis time by 40%, on average, and by at least 80% for 31% of all commits.

Index Terms—program analysis, JavaScript

I. INTRODUCTION

Dynamic analyses help developers identify programming mistakes by analyzing the runtime behavior of a program. For example, the Valgrind analysis framework [1] is the basis for various tools, e.g., to identify references to undefined values via a memory analysis and unintended flows of data via a taint analysis. More recently, the Jalangi framework [2] has popularized dynamic analysis for dynamic languages, in particular JavaScript, and is the basis for several tools that find, e.g., inconsistent types [3], optimization opportunities [4], memory leaks [5], and various other code quality issues [6].

Given the power of these analyses, it would be desirable to use them frequently during the development process. For example, one could apply such analyses on each commit, e.g., by running a regression test suite and analyzing its execution. By running each analysis after each commit, the analyses could warn developers about mistakes right when the mistakes are introduced, similar to static checks performed by a compiler or lint-like tools. An important prerequisite for this promising usage scenario is that an analysis produces results quickly.

Unfortunately, most dynamic analyses implemented on top of general-purpose analysis frameworks, such as Valgrind and Jalangi, are not only powerful but also heavyweight. Typically, such an analysis imposes a significant runtime overhead, ranging between factors of tens and hundreds. For projects with large regression test suites, this overhead practically prohibits running the analysis after every change. Such large regression test suites are particularly common for dynamic languages, where static checks are sparse and likely to miss problems. For example, applying two dynamic analyses for JavaScript, DLint [6] and JITProf [4], to the regression test suite of the widely used underscore.js library (http://underscorejs.org) takes over five minutes on a standard PC, whereas running the test suite without any analysis takes only about five seconds.

As a motivating example, consider Figure 1, which shows a change in a piece of JavaScript code. Suppose we use a dynamic analysis that warns about occurrences of NaN (not a number). NaN values may arise in JavaScript when arithmetic operations are applied to non-numbers, yet they do not trigger any runtime exception. Furthermore, suppose that the project's regression test suite has three tests that call the functions a, b, and e, respectively, as indicated in the last three lines of the example. In the original code, the analysis reports a NaN warning at line 22. In the modified code, the analysis warns again about line 22 and also about the newly introduced NaN at line 12. Naively applying the dynamic analysis after the change will analyze the executions of all code in Figure 1. However, most of the code is not impacted by the change, and there is no need to re-analyze the non-impacted code.
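To make the NaN behavior concrete, the following stand-alone snippet (our illustration, not part of the paper) shows that arithmetic on a non-number silently yields NaN rather than raising an exception:

    // Arithmetic on a non-number silently produces NaN; no exception is thrown.
    var n = "hi" - 23;             // "hi" cannot be coerced to a number
    console.log(n);                // prints: NaN
    console.log(n === n);          // prints: false (NaN is not equal to itself)
    console.log(Number.isNaN(n));  // prints: true

Because nothing in the normal execution signals that the computation went wrong, such mistakes are a natural target for a dynamic checker.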
Old version:

 1 var x = 4, y, z;
 2
 3 function a() {
 4   x = 42;
 5 }
 6 function b() {
 7   c();
 8   d();
 9 }
10 function c() {
11
12   y = x / 2;
13 }
14 function d() {
15   /* expensive
16    * computation
17    * independent
18    * of x, y, z */
19 }
20 function e() {
21   // NaN warning
22   var n = "hi" - 23;
23   return n;
24 }
25
26 a();
27 b();
28 z = e();

New version:

 1 var x = 4, y, z;
 2
 3 function a() {
 4   x = "aaa"; // changed
 5 }
 6 function b() {
 7   c();
 8   d();
 9 }
10 function c() {
11   // new NaN warning
12   y = x / 2;
13 }
14 function d() {
15   /* expensive
16    * computation
17    * independent
18    * of x, y, z */
19 }
20 function e() {
21   // old NaN warning
22   var n = "hi" - 23;
23   return n;
24 }
25
26 a();
27 b();
28 z = e();

Fig. 1: Code change in a program to be analyzed dynamically.

Existing approaches that reduce the overhead of dynamic analysis use sampling. One approach is to sample the execution in an analysis-specific way, e.g., based on how often a code location has been executed [4]. While such sampling can significantly reduce the overhead, it still remains too high for practical regression analysis. Another approach is to distribute the instrumentation overhead across multiple users [7]. However, this approach is not suitable for regression analysis, where results are expected within a short time and without relying on users.

As a technique complementary to reducing the overhead of dynamic analysis, regression test selection [8] can reduce the number of tests to execute, which will also reduce the overall analysis time. Regression test selection partly addresses the problem but has the drawback that, even after reducing the number of tests, the analysis may still consider code that is unaffected by a change. For example, in Figure 1, selecting the test that calls function b will nevertheless cause the dynamic analysis of function d, which is called by b but not affected by the change. Furthermore, as a practical limitation, we are not aware of any generic regression test selection tool available for JavaScript, which is the target language of this paper.

This paper addresses the problem of reducing the time required to run a dynamic analysis that has already been applied to a previous version of the analyzed code. We present change-aware dynamic analysis, or CADA for short, an approach that turns a common class of dynamic analyses into change-aware analyses. The key idea is to reduce the number of code locations to dynamically analyze by focusing on functions that are impacted by a code change. When entering a function at runtime, the dynamic analysis checks whether this function is impacted, and only if it is impacted, analyzes the function. To determine which functions are impacted, CADA uses a lightweight static change impact analysis that produces a function dependence graph. Starting from functions that are modified in a particular change, the approach then propagates the change to all possibly impacted functions. Key to the success of our approach is that the change impact analysis is efficient. To this end, we use a scalable yet not provably sound analysis. We find that, despite the theoretical unsoundness, the analysis is sufficient in practice, in the spirit of soundiness [9].

For the example in Figure 1, the change impact analysis determines that function a has been changed and that, as a result of this change, also function c is impacted by the change. By only applying the dynamic analysis to these functions, but not to any others, the approach reduces the number of analyzed functions from five to two. As a result, the change-aware analysis reports the newly introduced warning at line 12. To obtain the set of all warnings in the program, one can combine the warnings found by the change-aware analysis for the current version with warnings found for previous versions of the program.
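The impact analysis itself is described later in the paper; purely as an illustration, the propagation step over an already-extracted dependence graph could look as follows, where the graph encoding and function names are our assumptions, not the paper's implementation:

    // Illustrative sketch: propagate a change over a function dependence
    // graph. An edge f -> g means that g may behave differently when f
    // changes, e.g., because g reads a variable that f writes.
    function computeImpacted(dependents, changedFunctions) {
      var impacted = new Set(changedFunctions);
      var worklist = changedFunctions.slice();
      while (worklist.length > 0) {
        var f = worklist.pop();
        (dependents[f] || []).forEach(function (g) {
          if (!impacted.has(g)) { // process each function at most once
            impacted.add(g);
            worklist.push(g);
          }
        });
      }
      return impacted;
    }

    // Figure 1: a writes x, which c reads, so a -> c; no other function
    // depends on a.
    var dependents = { a: ["c"], b: [], c: [], d: [], e: [] };
    console.log(computeImpacted(dependents, ["a"])); // Set { 'a', 'c' }

Because each function enters the impacted set at most once, the worklist terminates and the propagation runs in time linear in the size of the graph.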
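At runtime, the per-function check described above can then gate the analysis callbacks on the impacted set. The following sketch uses Jalangi2-style callbacks for a simple NaN checker; identifying functions by name and the hard-coded impacted set are illustrative simplifications, not the paper's actual mechanism:

    // Hypothetical sketch: a change-aware NaN checker on top of Jalangi2.
    (function (sandbox) {
      var impacted = new Set(["a", "c"]); // result of the change impact analysis
      var frames = [];                    // one entry per active call: impacted?

      sandbox.analysis = {
        functionEnter: function (iid, f, dis, args) {
          frames.push(impacted.has(f.name)); // name-based lookup is a simplification
        },
        functionExit: function (iid, returnVal, wrappedExceptionVal) {
          frames.pop();
        },
        binary: function (iid, op, left, right, result) {
          // Analyze an operation only if its enclosing function is impacted.
          if (frames.length > 0 && frames[frames.length - 1] &&
              typeof result === "number" && isNaN(result)) {
            console.log("NaN produced by '" + op + "' operation");
          }
        }
      };
    })(J$);

With the impacted set {a, c}, such a checker skips the expensive function d entirely and reports the new warning in c, while the unchanged warning in e is obtained from the previous version's results.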
To evaluate the CADA approach, we apply it to the version histories of seven popular JavaScript projects. Without our approach, analyzing the changed versions of the projects takes 133 seconds, on average over all commits. CADA reduces the analysis time by 40%, on average, and by at least 80% for 31% of all commits. Comparing CADA to naively running non-change-aware analyses on every version, we find that CADA does not miss any warnings reported by the class of dynamic analyses targeted in this paper.

In summary, this paper contributes the following:

• An approach for turning a common class of dynamic analyses into change-aware dynamic analyses without modifying the analyses themselves.

• A lightweight, static change impact analysis for JavaScript. We believe this analysis to be applicable beyond the current work, e.g., for helping developers to understand the impact of a potential change or for test case prioritization.

• Empirical evidence that the approach significantly reduces the time required to apply heavyweight dynamic analysis. As a result, such analyses become more amenable to regular use, e.g., at every commit.

II. PROBLEM STATEMENT

The following section describes the problem that this paper tackles. We consider dynamic analyses that reason about the behavior of a program to identify potential programming mistakes and that report warnings to the developer. Given a