
Li et al. Cybersecurity (2018) 1:6  https://doi.org/10.1186/s42400-018-0002-y

SURVEY  Open Access

Fuzzing: a survey

Jun Li, Bodong Zhao and Chao Zhang*

Abstract

Security vulnerabilities are one of the root causes of cyber-security threats. To discover vulnerabilities and fix them in advance, researchers have proposed several techniques, among which fuzzing is the most widely used. In recent years, fuzzing solutions, like AFL, have made great improvements in vulnerability discovery. This paper presents a summary of the recent advances, analyzes how they improve the fuzzing process, and sheds light on future work in fuzzing. First, we discuss the reasons why fuzzing is popular, by comparing commonly used vulnerability discovery techniques. Then we present an overview of fuzzing solutions, and discuss in detail one of the most popular types of fuzzing, i.e., coverage-based fuzzing. We then present other techniques that could make the fuzzing process smarter and more efficient. Finally, we show some applications of fuzzing, and discuss new trends of fuzzing and potential future directions.

Keywords: Vulnerability discovery, Software security, Fuzzing, Coverage-based fuzzing

Introduction

Vulnerabilities have become the root cause of threats to cyberspace security. As defined in RFC 2828 (Shirey 2000), a vulnerability is a flaw or weakness in a system's design, implementation, or operation and management that could be exploited to violate the system's security policy. Attacks on vulnerabilities, especially on zero-day vulnerabilities, can result in serious damage. The WannaCry attack (Wikipedia, WannaCry ransomware attack, 2017), which broke out in May 2017 and exploits a vulnerability in the Server Message Block (SMB) protocol, is reported to have infected more than 230,000 computers in over 150 countries within one day. It caused serious crisis-management problems and huge losses to many industries, such as finance, energy and medical treatment.

Considering the serious damage caused by vulnerabilities, much effort has been devoted to vulnerability discovery techniques for software and information systems. Techniques including static analysis, dynamic analysis, symbolic execution and fuzzing (Liu et al. 2012) have been proposed. Compared with other techniques, fuzzing requires little knowledge of its targets and can easily be scaled up to large applications, and has thus become the most popular vulnerability discovery solution, especially in industry.

The concept of fuzzing was first proposed in the 1990s (Wu et al. 2010). Though the concept has stayed fixed over decades of development, the way fuzzing is performed has greatly evolved. However, years of practice reveal that fuzzing tends to find simple memory-corruption bugs in the early stage of a test and covers only a small part of the target code. Besides, the randomness and blindness of fuzzing result in low efficiency in finding bugs. Many solutions have been proposed to improve the effectiveness and efficiency of fuzzing.

The combination of a feedback-driven fuzzing mode and genetic algorithms provides a more flexible and customizable fuzzing framework, and makes the fuzzing process more intelligent and efficient. With AFL as a landmark, feedback-driven fuzzing, especially coverage-guided fuzzing, has made great progress, and many efficient solutions or improvements inspired by AFL have been proposed recently. Fuzzing is much different from what it was several years ago, so it is necessary to summarize recent work in fuzzing and shed light on future work.

In this paper, we try to summarize state-of-the-art fuzzing solutions and how they improve the effectiveness and efficiency of vulnerability discovery. Besides, we show how traditional techniques can help improve the effectiveness and efficiency of fuzzing and make fuzzers smarter. Then we give an overview of how state-of-the-art fuzzers detect vulnerabilities in different targets, including applications, kernels, and protocols.

*Correspondence: [email protected]
Tsinghua University, Beijing 100084, China

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

At last, we try to point out new trends in how the fuzzing technique develops.

The rest of the paper is organized as follows: "Background" section presents background knowledge on vulnerability discovery techniques. "Fuzzing" section gives a detailed introduction to fuzzing, including the basic concepts and key challenges of fuzzing. In "Coverage-based fuzzing" section, we introduce coverage-based fuzzing and related state-of-the-art works. In "Techniques integrated in fuzzing" section we summarize how other techniques can help improve fuzzing, and "Fuzzing towards different applications" section presents several applications of fuzzing. In "New trends of fuzzing" section, we discuss and summarize the possible new trends of fuzzing. We conclude our paper in "Conclusion" section.

Background

In this section, we give a brief introduction to traditional vulnerability discovery techniques, including static analysis, dynamic analysis, taint analysis, symbolic execution, and fuzzing. Then we summarize the advantages and disadvantages of each technique.

Static analysis

Static analysis is the analysis of programs that is performed without actually executing them (Wichmann et al. 1995). Instead, static analysis is usually performed on the source code, and sometimes on the binary code as well. Through analysis of lexical, grammar and semantic features, data flow analysis, and model checking, static analysis can detect hidden bugs. The advantage of static analysis is its high detection speed: an analyst can quickly check the target code with a static analysis tool and act on the results in a timely manner. However, static analysis endures a high false rate in practice. Due to the lack of easy-to-use vulnerability detection models, static analysis tools are prone to a large number of false positives, so triaging the results of static analysis remains tough work.

Dynamic analysis

In contrast to static analysis, in dynamic analysis an analyst needs to execute the target program in a real system or an emulator (Wikipedia 2017). By monitoring the running state and analyzing runtime knowledge, dynamic analysis tools can detect program bugs precisely. The advantage of dynamic analysis is its high accuracy, but there are the following disadvantages. First, analyzing and running the target programs in dynamic analysis causes heavy human involvement and results in low efficiency. Besides, the human involvement requires strong technical skills of analysts. In short, dynamic analysis has the shortcomings of slow speed, low efficiency, high requirements on the technical level of testers, and poor scalability, and it is difficult to carry out large-scale testing.

Symbolic execution

Symbolic execution (King 1976) is another vulnerability discovery technique that is considered very promising. By symbolizing the program inputs, symbolic execution maintains a set of constraints for each execution path. After the execution, constraint solvers are used to solve the constraints and determine what inputs drive the execution down each path. Technically, symbolic execution could cover any execution path in a program and has shown good results on small programs, but there are many limitations. First, the path explosion problem: as the scale of the program grows, the number of execution states explodes, which exceeds the solving ability of constraint solvers. Selective symbolic execution has been proposed as a compromise. Second, environment interactions: in symbolic execution, when the target program's execution interacts with components outside of the symbolic execution environment, such as system calls, signal handling, etc., consistency problems may arise. Previous work has shown that symbolic execution is still difficult to scale up to large applications (Böhme et al. 2017).

Fuzzing

Fuzzing (Sutton et al. 2007) is currently the most popular vulnerability discovery technique. Fuzzing was first proposed by Barton Miller at the University of Wisconsin in the 1990s. Conceptually, a fuzzing test starts with generating massive normal and abnormal inputs for target applications, and tries to detect exceptions by feeding the generated inputs to the target applications and monitoring the execution states. Compared with other techniques, fuzzing is easy to deploy, has good extensibility and applicability, and can be performed with or without the source code. Besides, as the fuzzing test is performed on the real execution, it gains high accuracy. What's more, fuzzing requires little knowledge of target applications and can easily be scaled up to large applications. Though fuzzing faces disadvantages such as low efficiency and low code coverage, its advantages outweigh them, and fuzzing has become the most effective and efficient state-of-the-art vulnerability discovery technique.

Table 1 shows the advantages and disadvantages of the different techniques.

Fuzzing

In this section, we try to give a perspective on fuzzing, including the basic techniques, background knowledge and the challenges in improving fuzzing.

Table 1 Comparison of different techniques

Technique            Easy to start?   Accuracy   Scalability
static analysis      easy             low        relatively good
dynamic analysis     hard             high       uncertain
symbolic execution   hard             high       bad
fuzzing              easy             high       good

Working process of fuzzing

Figure 1 depicts the main process of a traditional fuzzing test. The working process is composed of four main stages: the testcase generation stage, the testcase running stage, program execution state monitoring, and analysis of exceptions.

A fuzzing test starts from the generation of a bunch of program inputs, i.e., testcases. The quality of the generated testcases directly affects the test results. The inputs should meet the input-format requirements of the tested programs as far as possible, while on the other hand, the inputs should be broken enough that processing them would very likely fail the program. Depending on the target programs, inputs could be files in different formats, network communication data, executable binaries with specified characteristics, etc. How to generate broken-enough testcases is a main challenge for fuzzers. Generally, two kinds of generators are used in state-of-the-art fuzzers: generation based generators and mutation based generators.

Testcases are fed to target programs after being generated in the previous phase. Fuzzers automatically start and finish the target program process and drive the testcase handling of the target programs. Before the execution, analysts can configure the way the target programs start and finish, and predefine parameters and environment variables. Usually, the fuzzing process stops at a predefined timeout, a program hang, or a crash.

Fuzzers monitor the execution state during the execution of target programs, watching for exceptions and crashes. Commonly used exception monitoring methods include monitoring for specific system signals, crashes, and other violations.
For violations without intuitive abnormal program behaviors, many tools can be used, including AddressSanitizer (Serebryany et al. 2012), DataFlowSanitizer (The Clang Team 2017a), ThreadSanitizer (Serebryany and Iskhodzhanov 2009), LeakSanitizer (The Clang Team 2017b), etc. When violations are captured, fuzzers store the corresponding testcase for later replay and analysis.

In the analyzing stage, analysts try to determine the location and root cause of captured violations. The analysis is often processed with the help of debuggers, like GDB or WinDbg, or other binary analysis tools, like IDA Pro, OllyDbg, etc. Binary instrumentation tools, like Pin (Luk et al. 2005), can also be used to monitor the exact execution state of collected testcases, such as the executed instructions, register information and so on. Automated analysis is another important field of research.
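The monitoring step described above can be sketched in a few lines. The snippet below is a minimal, illustrative harness, not any particular fuzzer's implementation; the target command is a placeholder, and on POSIX systems a crash surfaces as a negative return code (the negated signal number) in Python's subprocess module.

```python
import subprocess
import tempfile

# Exit statuses that indicate a crash rather than a normal failure.
# On POSIX, subprocess reports death-by-signal as a negative returncode,
# e.g. -11 for SIGSEGV and -6 for SIGABRT.
CRASH_SIGNALS = {-11, -6, -4, -8}  # SIGSEGV, SIGABRT, SIGILL, SIGFPE

def classify(returncode):
    """Classify a target's exit status."""
    if returncode in CRASH_SIGNALS:
        return "crash"
    return "ok" if returncode == 0 else "error"

def run_once(target_argv, data, timeout=5):
    """Feed one generated testcase to the target and classify the result.
    `target_argv` is a placeholder command taking an input file path."""
    with tempfile.NamedTemporaryFile(suffix=".bin") as f:
        f.write(data)
        f.flush()
        try:
            proc = subprocess.run(target_argv + [f.name],
                                  capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return "hang"
        return classify(proc.returncode)
```

A fuzzer's main loop would call run_once for every generated testcase and save any input classified as a crash or hang for later replay under a debugger or a sanitizer-instrumented build.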

Types of fuzzers

Fuzzers can be classified in various ways. A fuzzer can be classified as generation based or mutation based (Van Sprundel 2005). For a generation based fuzzer, knowledge of the program input is required. For file format fuzzing, usually a configuration file that predefines the file format is provided, and testcases are generated according to the configuration file. With the given file format knowledge, testcases generated by generation based fuzzers are able to pass the validation of programs more easily and are more likely to test the deeper code of target programs. However, without a friendly document, analyzing the file format is tough work. Thus mutation based fuzzers are easier to start and more applicable, and are widely used by state-of-the-art fuzzers. For mutation based fuzzers, a set of valid initial inputs is required, and testcases are generated through the mutation of the initial inputs and of testcases generated during the fuzzing process. We compare generation based fuzzers and mutation based fuzzers in Table 2.

Fig. 1 Working process of fuzzing test

Table 2 Comparison of generation based fuzzers and mutation based fuzzers

                   Easy to start?   Priori knowledge          Coverage                          Ability to pass validation
Generation based   hard             needed, hard to acquire   high                              strong
Mutation based     easy             not needed                low, affected by initial inputs   weak

With respect to the dependence on program source code and the degree of program analysis, fuzzers can be classified as white box, gray box and black box. White box fuzzers are assumed to have access to the source code of programs, and thus more information can be collected through analysis of the source code and of how testcases affect the program running state. Black box fuzzers perform fuzzing tests without any knowledge of the target program internals. Gray box fuzzers work without source code either, and gain the internal information of target programs through program analysis. We list some common white box, gray box and black box fuzzers in Table 3.

Table 3 Common white box, gray box and black box fuzzers

                   White box fuzzers: SAGE (Godefroid et al. 2012), Libfuzzer (libfuzzer 2017) (mutation based)
                   Gray box fuzzers: AFL (Zalewski 2017a), Driller (Stephens et al. 2016), Vuzzer (Rawat et al. 2017), TaintScope (Wang et al. 2010), Mayhem (Cha et al. 2012) (mutation based)
                   Black box fuzzers: SPIKE (Bowne 2015), Sulley (Amini 2017), Peach (PeachTech 2017) (generation based); Miller (Takanen et al. 2008) (mutation based)

According to the strategy for exploring the programs, fuzzers can be classified as directed fuzzers and coverage-based fuzzers. A directed fuzzer aims at the generation of testcases that cover target code and target paths of programs, while a coverage-based fuzzer aims at the generation of testcases that cover as much code of the programs as possible. Directed fuzzers aim for a faster test of particular parts of programs, and coverage-based fuzzers aim for a more thorough test that detects as many bugs as possible. For both directed fuzzers and coverage-based fuzzers, how to extract the information of executed paths is a key problem.

Fuzzers can also be classified as dumb fuzzers and smart fuzzers according to whether there is feedback between the monitoring of the program execution state and testcase generation. Smart fuzzers adjust the generation of testcases according to collected information about how testcases affect the program behavior. For mutation based fuzzers, feedback information can be used to determine which parts of a testcase should be mutated and the way to mutate them. Dumb fuzzers achieve a better testing speed, while smart fuzzers generate better testcases and gain better efficiency.

Key challenges in fuzzing

Traditional fuzzers usually utilize a random fuzzing strategy in practice. Limitations of program analysis techniques result in a present situation in which fuzzers are not smart enough, so fuzzing still faces many challenges. We list some key challenges as follows.

The challenge of how to mutate seed inputs. The mutation based generation strategy is widely used by state-of-the-art fuzzers for its convenience and easy setup. However, how to mutate and generate testcases that are capable of covering more program paths and more easily triggering bugs is a key challenge (Yang et al. 2007). Specifically, mutation based fuzzers need to answer two questions when doing mutation: (1) where to mutate, and (2) how to mutate. Only mutation at a few key positions will affect the control flow of the execution, so how to locate these key positions in testcases is of great importance. Besides, the way fuzzers mutate the key positions is another key problem, i.e., how to determine the values that can direct the testing to interesting paths of the programs. In short, blind mutation of testcases results in a serious waste of testing resources, and better mutation strategies can significantly improve the efficiency of fuzzing.

The challenge of low code coverage. Higher code coverage represents a higher coverage of program execution states, and a more thorough test. Previous work has shown that better coverage results in a higher probability of finding bugs. However, most testcases only cover the same few paths, while most of the code is never reached. As a result, it is not wise to pursue high coverage only through large amounts of testcase generation and testing resources. Coverage-based fuzzers try to solve the problem with the help of program analysis techniques, like program instrumentation. We will introduce the details in the next section.

The challenge of passing the validation. Programs often validate their inputs before handling them. The validation works as a guard of the program, saving computing resources and protecting the program against invalid inputs and damage caused by maliciously constructed

inputs. Invalid testcases are always ignored or discarded. Magic numbers, magic strings, version number checks, and checksums are common validations used in programs. Testcases generated by black box and gray box fuzzers with a blind generation strategy are hard-pressed to pass the validation, which results in quite inefficient fuzzing. Thus, how to pass the validation is another key challenge.

Various methods have been proposed as countermeasures to these challenges; both traditional techniques, like program instrumentation and taint analysis, and new techniques, like RNNs and LSTMs (Godefroid et al. 2017; Rajpal et al. 2017), are involved. How these techniques address the challenges will be discussed in "Techniques integrated in fuzzing" section.

Coverage-based fuzzing

The coverage-based fuzzing strategy is widely used by state-of-the-art fuzzers, and has proved to be quite effective and efficient. To achieve a deep and thorough program fuzzing, fuzzers should try to traverse as many program running states as possible. However, there does not exist a simple metric for program states, owing to the uncertainty of program behaviors. Besides, a good metric should be easy to determine while the process is running. Thus measuring the code coverage becomes an approximate alternative solution. Using such a scheme, an increase in code coverage is representative of new program states. Besides, with both compile-in and external instrumentation, code coverage can be easily measured. However, we say code coverage is an approximate measurement because, in practice, constant code coverage does not indicate a constant number of program states: there can be a certain loss of information using this metric. In this section, we take AFL as an example and shed light on coverage-based fuzzing.

Code coverage counting

In program analysis, a program is composed of basic blocks. Basic blocks are code snippets with a single entry and exit point; the instructions in a basic block are executed sequentially, exactly once per entry. In code coverage measuring, state-of-the-art methods take the basic block as the best granularity. The reasons include that (1) the basic block is the smallest coherent unit of program execution, (2) measuring at function or instruction granularity would result in information loss or redundancy, and (3) a basic block can be identified by the address of its first instruction, and basic block information can be easily extracted through code instrumentation.

Currently, there are two basic measurement choices based on basic blocks: simply counting the executed basic blocks, and counting the basic block transitions. In the latter method, programs are interpreted as a graph, where vertices represent the basic blocks and edges represent the transitions between basic blocks; the latter method records edges while the former records vertices. Experiments show that simply counting executed basic blocks results in serious information loss. As shown in Fig. 2, if the program path (BB1, BB2, BB3, BB4) is executed first, and then the path (BB1, BB2, BB4) is encountered, the new edge (BB2, BB4) information is lost.

Fig. 2 A sample of BB transitions

AFL was the first to introduce the edge measurement method into coverage-based fuzzing. We take AFL as an example and show how coverage-based fuzzers gain coverage information during the fuzzing process. AFL gains the coverage information via lightweight program instrumentation. Depending on whether the source code is provided, AFL provides two instrumentation modes: compile-in instrumentation and external instrumentation. In compile-in instrumentation mode, AFL provides both a gcc mode and an llvm mode, according to the compiler used, which instrument code snippets when the binary is generated. In external mode, AFL provides a qemu mode, which instruments code snippets when a basic block is translated to TCG blocks.

Listing 1 shows a sketch of the instrumented code snippet (Zalewski 2017b). In the instrumentation, a random ID, i.e., the variable cur_location, is assigned to each basic block. The shared_mem array is a 64 KB shared memory region, where each byte is mapped to a hit of a particular edge (BB_src, BB_dst). A hash value is computed when a basic block transition happens, and the corresponding byte value in the bitmap is updated. Figure 3 depicts the mapping of hashes onto the bitmap.

cur_location = <COMPILE_TIME_RANDOM>;
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;

Listing 1 AFL's instrumentation

Working process of coverage-based fuzzing

Algorithm 1 shows the general working process of a coverage-based fuzzer. The test starts from an initial set of given seed inputs. If the seed input set is not given, the fuzzer constructs one itself. In the main fuzzing loop, the fuzzer repeatedly chooses an interesting seed for the following mutation and testcase generation. The target program is then driven to execute the generated testcases under the monitoring of the fuzzer. Testcases that trigger crashes are collected, and other interesting ones are added to the seed pool. For coverage-based fuzzing, testcases that reach new control flow edges are considered interesting. The main fuzzing loop stops at a pre-configured timeout or an abort signal.

Algorithm 1 Coverage-based Fuzzing
Input: Seed Inputs S
1:  T = S
2:  Tx = ∅
3:  if T = ∅ then
4:      T.add(empty file)
5:  end if
6:  repeat
7:      t = choose_next(T)
8:      s = assign_energy(t)
9:      for i from 1 to s do
10:         t' = mutate(t)
11:         if t' crashes then
12:             Tx.add(t')
13:         else if isInteresting(t') then
14:             T.add(t')
15:         end if
16:     end for
17: until timeout or abort-signal
Output: Crashing Inputs Tx

During the process of fuzzing, fuzzers track the execution via various methods. Basically, fuzzers track the execution for two purposes: code coverage and security violations. The code coverage information is used to pursue a thorough program state exploration, and the security violation tracking is for better bug finding. As detailed in the previous subsections, AFL tracks the code coverage through code instrumentation and the AFL bitmap. Security violation tracking can be processed with the help of many sanitizers, such as AddressSanitizer (Serebryany et al. 2012), ThreadSanitizer (Serebryany and Iskhodzhanov 2009), LeakSanitizer (The Clang Team 2017b), etc.

Figure 4 shows the working process of AFL, a very representative coverage-based fuzzer. The target application is instrumented before execution for the coverage collection. As mentioned before, AFL supports both compile time instrumentation and external instrumentation, with its gcc/llvm modes and qemu mode. An initial set of seed inputs should also be provided. In the main fuzzing loop, (1) the fuzzer selects a favorite seed from the seed pool according to the seed selection strategy; AFL prefers the fastest and smallest ones. (2) Seed files are mutated according to the mutation strategy, and a bunch of testcases are generated. AFL currently employs some random modifications and testcase splicing methods, including sequential bit flips

Fig. 3 bitmap in AFL
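The counting scheme above can be simulated directly. The following sketch is illustrative rather than AFL's actual code: the block IDs are made-up stand-ins for the compile-time random values, and the 64 KB map size follows the description above. Replaying the two paths of Fig. 2 shows that the second path touches a new edge bucket even though it executes no new basic block.

```python
MAP_SIZE = 1 << 16  # 64 KB shared bitmap, one byte per edge bucket

def edge_trace(path, ids):
    """Replay a path of basic blocks and collect the AFL-style
    edge buckets it touches: bucket = cur_id ^ (prev_id >> 1)."""
    touched = set()
    prev = 0
    for bb in path:
        cur = ids[bb]                         # per-block random ID
        touched.add((cur ^ prev) % MAP_SIZE)  # bucket for edge (prev, bb)
        prev = cur >> 1                       # shift so A->B differs from B->A
    return touched

# Hypothetical random IDs assigned at instrumentation time.
ids = {"BB1": 0x1234, "BB2": 0x5678, "BB3": 0x9ABC, "BB4": 0x0DEF}

first = edge_trace(["BB1", "BB2", "BB3", "BB4"], ids)
second = edge_trace(["BB1", "BB2", "BB4"], ids)

# Block coverage sees nothing new in the second path, but edge
# coverage catches the transition (BB2, BB4).
print(second - first)  # a non-empty set: the bucket of the new edge
```

The right shift on prev_location is what makes the hash directional: without it, the edges A->B and B->A would collide in the same bucket.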

Fig. 4 Working process of AFL

with varying lengths and stepovers, sequential addition and subtraction of small integers, and sequential insertion of known interesting integers like 0, 1, INT_MAX, etc. (Zalewski 2017b). (3) Testcases are executed, and the execution is tracked. The coverage information is collected to determine the interesting testcases, i.e., the ones that reach new control flow edges. Interesting testcases are added to the seed pool for the next round.

Key questions

The previous introduction indicates that lots of questions need to be solved to run efficient and effective coverage-based fuzzing, and lots of exploration has been done around these questions. We summarize and list some state-of-the-art works in this subsection, as shown in Table 4.

A. How to get initial inputs? Most state-of-the-art coverage-based fuzzers employ a mutation based testcase generation strategy, which depends heavily on the quality of the initial seed inputs. Good initial seed inputs can significantly improve the efficiency and effectiveness of fuzzing. Specifically, (1) providing well-formatted seed inputs can save lots of cpu time otherwise consumed by constructing them; (2) good initial inputs can meet the requirements of complicated file formats, which are hard to guess in the mutation phase; (3) mutation based on well-formatted seed inputs is more likely to generate testcases that reach deeper and hard-to-reach paths; and (4) good seed inputs can be reused across multiple tests.

Commonly used methods of gathering seed inputs include using standard benchmarks, crawling from the Internet, and using existing POC samples. Applications are often released with a standard benchmark, which is free to use to test the project. The provided benchmark is constructed according to the characteristics and functions of the application, and thus naturally constitutes a good set of seed inputs. Considering the diversity of target application inputs, crawling from the Internet is the most intuitive method: one can easily download files of certain formats. Besides, for some commonly used file formats, there are many open test projects on the network that provide free test data sets. Furthermore, using existing POC samples is also a good idea. However, too big a quantity of seed inputs will result in a waste of time in the first dry run, which brings another concern: how to distill the initial corpus. AFL provides a tool that extracts a minimum set of inputs that achieves the same code coverage.

B. How to generate testcases? The quality of testcases is an important factor affecting the efficiency and effectiveness of fuzzing. Firstly, good testcases explore more program execution states and cover more code in a shorter time. Besides, good testcases can target potentially vulnerable locations and bring a faster discovery of program bugs. Thus how to generate good testcases based on seed inputs is an important concern.

Rawat et al. (2017) proposed Vuzzer, an application-aware gray box fuzzer that integrates static and dynamic analysis. Mutation of seed inputs involves two key questions: where to mutate and what value to use for the mutation. Specifically, Vuzzer extracts immediate values, magic values and other characteristic strings that

Table 4 Comparison of different techniques

Initial inputs            Inputs mutation                        Seed selection                   Testing efficiency
Standard benchmarks;      Vuzzer (Rawat et al. 2017)             AFLFast (Böhme et al. 2017)      Forkserver (lcamtuf 2014)
Crawling from Internet;   Skyfire (Wang et al. 2017)             Vuzzer                           Intel PT (Schumilo et al. 2017)
POC samples               Learn & Fuzz (Godefroid et al. 2017)   AFLGo (Böhme et al. 2017)        Work (Xu et al. 2017)
                          Faster Fuzzing (Nichols et al. 2017)   QTEP (Wang et al. 2017)
                          Work (Rajpal et al. 2017)              SlowFuzz (Petsios et al. 2017)

affect the control flow via static analysis before the main fuzzing loop. During the program execution, Vuzzer utilizes dynamic taint analysis to collect information that affects the control flow branches, including the specific values compared and their corresponding offsets. By mutating with the collected values and at the recognized locations, Vuzzer can generate testcases that are more likely to meet the branch conditions and pass magic value validations. However, Vuzzer still cannot pass other types of validation in programs, like hash-based checksums. Besides, Vuzzer's instrumentation, taint analysis, and main fuzzing loop are implemented based on Pin (Luk et al. 2005), which results in a relatively slow testing speed compared to AFL.

Wang et al. (2017) proposed Skyfire, a data-driven seed generation solution. Skyfire learns a probabilistic context-sensitive grammar (PCSG) from crawled inputs, and leverages the learned knowledge in the generation of well-structured inputs. The experiments show that testcases generated by Skyfire cover more code than those generated by AFL, and find more bugs. The work also proves that the quality of testcases is an important factor affecting the efficiency and effectiveness of fuzzing.

With the development and wide use of machine learning techniques, some research tries to use machine learning to assist the generation of testcases. Godefroid et al. (2017) from Microsoft Research use neural-network-based statistical machine-learning techniques to automatically generate testcases. Specifically, they first learn the input format from a bunch of valid inputs via machine learning, and then leverage the learned knowledge to guide the testcase generation. They present a fuzzing run on the PDF parser in Microsoft's Edge browser. Though the experiment did not give an encouraging result, it is still a good attempt.

Rajpal et al. (2017) from Microsoft use neural networks to learn from past fuzzing explorations and predict which bytes to mutate in input files. Nichols et al. (2017) use Generative Adversarial Network (GAN) models to help reinitialize the system with novel seed files. Their experiments show that the GAN is faster and more effective than an LSTM, and helps discover more code paths.

C. How to select seeds from the pool? Fuzzers repeatedly select a seed from the seed pool to mutate at the beginning of each new round of testing in the main fuzzing loop. How to select seeds from the pool is another important open problem in fuzzing. Previous work has shown that a good seed selection strategy can significantly improve the fuzzing efficiency and help find more bugs, faster (Rawat et al. 2017; Böhme et al. 2017; Wang et al. 2017). With good seed selection strategies, fuzzers can (1) prioritize seeds that are more helpful, i.e., that cover more code and are more likely to trigger vulnerabilities, (2) reduce the waste of repeatedly executing the same paths and save computing resources, and (3) optimally select seeds that cover deeper and more vulnerable code and help identify hidden vulnerabilities faster. AFL prefers smaller and faster testcases to pursue a fast testing speed.

Böhme et al. (2017) proposed AFLFast, a coverage-based gray box fuzzer. They observe that most of the testcases concentrate on the same few paths. For instance, in a PNG processing program, most of the testcases generated through random mutation are invalid and trigger the error handling paths. AFLFast divides the paths into high-frequency ones and low-frequency ones. During the fuzzing process, AFLFast measures the frequency of executed paths, prioritizes seeds that have been fuzzed fewer times, and allocates more energy to seeds that exercise low-frequency paths.

Rawat et al. (2017) integrate static and dynamic analysis to identify hard-to-reach deeper paths, and prioritize seeds that reach deeper paths. Vuzzer's seed selection strategy can help find vulnerabilities hidden in deep paths.

AFLGo (Böhme et al. 2017) and QTEP (Wang et al. 2017) employ a directed selection strategy. AFLGo defines some vulnerable code as target locations, and optimally selects testcases that are closer to the target locations. Four types of vulnerable code are mentioned in the AFLGo paper, including patches, program crashes lacking enough tracking information, results reported by static analysis tools, and sensitive-information-related code snippets. With a properly directed algorithm, AFLGo can allocate more testing resources to interesting code. QTEP leverages static code analysis to detect fault-prone source code and prioritizes seeds that cover more faulty code. Both AFLGo and QTEP depend heavily on the effectiveness of static analysis tools. However, the false positive rate of current static analysis tools is still high, so they cannot give an accurate verification.

Characteristics of known vulnerabilities can also be used in the seed selection strategy. SlowFuzz (Petsios et al. 2017) aims at algorithmic complexity vulnerabilities, which often come with significantly high computing resource consumption. Thus SlowFuzz prefers the seeds that consume more resources, like cpu time and memory. However, gathering resource-consumption information brings a heavy overhead and brings down the fuzzing efficiency; for instance, to gather the cpu time, SlowFuzz counts the number of executed instructions. Besides, SlowFuzz requires high accuracy of the resource-consumption information.

D. How to efficiently test applications? Target applications are repeatedly started up and finished by fuzzers in the main fuzzing loop. As we know, for the fuzzing of user-land applications, the creation and finishing of processes consume a large amount of cpu time. Frequently creating and finishing processes will badly bring down the fuzzing

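The startup cost described above is exactly what AFL's forkserver removes: pay program loading and initialization once, then fork a cheap copy of the already-initialized process for every testcase. The POSIX-only Python sketch below illustrates the idea only; `expensive_startup` and `run_one` are invented stand-ins, and AFL's real forkserver instead injects itself into the instrumented target at startup and is driven over a control pipe.

```python
import os

def expensive_startup():
    """Stand-in for program loading, linking and initialization."""
    return {"parsed_config": "...", "tables": list(range(1000))}

def run_one(state, data):
    # Per-testcase work; a crash here would only kill the forked child.
    return len(data) % 2 == 0

def forkserver(testcases):
    """Pay the startup cost once, then fork a fresh copy per run."""
    state = expensive_startup()          # done once, inherited by every child
    results = []
    for data in testcases:
        pid = os.fork()
        if pid == 0:                     # child: run the testcase and exit
            os._exit(0 if run_one(state, data) else 1)
        _, status = os.waitpid(pid, 0)   # parent: collect the exit status
        results.append(os.WEXITSTATUS(status) == 0)
    return results
```

Because each run starts from a forked snapshot, the `execve()` and initialization overhead is paid once per campaign rather than once per testcase.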
As a result, lots of optimizations have been made by previous work, using both traditional system features and new hardware features. AFL employs a forkserver, which creates an identical clone of the already-loaded program and reuses the clone for each single run. Besides, AFL also provides a persistent mode, which helps avoid the overhead of the notoriously slow execve() syscall and of the linking process, and a parallel mode, which helps parallelize testing on multi-core systems. Intel's Processor Trace (PT) (James 2013) technology is used in kernel fuzzing to save the overhead of coverage tracking. Xu et al. (2017) aim at solving the performance bottlenecks of parallel fuzzing on multi-core machines; by designing and implementing three new operating primitives, they show that their work can significantly speed up state-of-the-art fuzzers like AFL and LibFuzzer.

Techniques integrated in fuzzing
Modern applications often use very complex data structures, and parsing complex data structures is likely to introduce vulnerabilities. Blind fuzzing strategies that use random mutation produce massive numbers of invalid testcases and have low fuzzing efficiency. State-of-the-art fuzzers therefore generally employ a smart fuzzing strategy: smart fuzzers collect program control flow and data flow information through program analysis techniques and leverage the collected information to improve testcase generation. Testcases generated by smart fuzzers are better targeted and more likely to fulfill the program's requirements on data structures and logical judgments. Figure 5 depicts a sketch of smart fuzzing. To build a smart fuzzer, a variety of techniques are integrated into fuzzing. As mentioned in previous sections, fuzzing in practice faces lots of challenges; in this section, we summarize the techniques used by previous work and how these techniques address those challenges.

We summarize the main techniques integrated in fuzzing in Table 5, listing some representative work for each technique. Both traditional techniques, including static analysis, taint analysis, code instrumentation and symbolic execution, and relatively new techniques, like machine learning, are used. We select two key phases of fuzzing, the testcase generation phase and the program execution phase, and summarize how the integrated techniques improve fuzzing.

Testcase generation
As mentioned before, testcases in fuzzing are produced by generation-based or mutation-based methods. How to generate testcases that fulfill the requirements of complex data structures and are more likely to trigger hard-to-reach paths is a key challenge, and previous work proposed a variety of countermeasures integrated with different techniques.

In generation-based fuzzing, the generator produces testcases according to knowledge of the input data format. Though several commonly used file formats come with documentation, many more do not, and how to obtain the format information of inputs is a hard open problem. Machine learning techniques and format methods are used to solve it. Godefroid et al. (2017) use machine learning, specifically recurrent neural networks, to learn the grammar of input files and then use the learned grammar to generate format-fulfilling testcases. Skyfire (Wang et al. 2017) uses a format method: it defines a probabilistic context-sensitive grammar and extracts format knowledge to generate well-formatted seed inputs.

More state-of-the-art fuzzers employ a mutation-based fuzzing strategy, where testcases are generated by modifying parts of the seed inputs. In a blind mutation process, mutators randomly modify bytes of seeds with random or special values, which has proved quite inefficient. Thus how to determine which locations to modify, and which values to use, is another key challenge. In coverage-based fuzzing, bytes that affect control flow transfers should be modified first; taint analysis is used to track the effect of input bytes on control flow and thereby locate the key bytes of seeds for mutation (Rawat et al. 2017).

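The key-byte location step can be approximated even without a taint engine by a brute-force byte-flipping probe: flip each input byte and observe whether the branch outcome changes. The target check below is an invented stand-in for a real branch, and Vuzzer itself uses proper dynamic taint analysis rather than this probe, but the sketch shows what "key bytes" means in practice.

```python
def locate_key_bytes(seed, branch_reached):
    """Return offsets of input bytes that influence a branch outcome.

    A byte whose modification changes the branch outcome is treated as
    'key' and would be mutated first by a smart mutator.
    """
    baseline = branch_reached(seed)
    key = []
    for i in range(len(seed)):
        flipped = seed[:i] + bytes([seed[i] ^ 0xFF]) + seed[i + 1:]
        if branch_reached(flipped) != baseline:
            key.append(i)
    return key

# Toy target: the branch compares a 2-byte "version" field at offset 4.
def branch_reached(data):
    return data[4:6] == b"\x02\x00"
```

For this toy target the probe reports offsets 4 and 5 as key, so mutation effort can be concentrated there instead of spread uniformly over the input.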
Fig. 5 A sketch of smart fuzzing

Table 5 Techniques integrated in fuzzing

Techniques           Testcase generation       Program execution
                     Generation   Mutation     Guiding   Path exploration
Static analysis                   √            √         √
Taint analysis                    √                      √
Instrumentation                   √            √         √
Symbolic execution                                       √
Machine learning     √            √
Format method        √

Knowing the key locations, however, is just the beginning: fuzzing processes are often blocked at certain branches, including validations and checks, for example magic bytes and other value comparisons in conditional judgments. Techniques including reverse engineering and taint analysis are used here. By scanning the binary code, collecting immediate values from comparison statements, and using the collected values as candidate values in the mutation process, fuzzers can pass some key validations and checks, like magic bytes and version checks (Rawat et al. 2017). New techniques like machine learning are also being tried on these old challenges: researchers from Microsoft utilize deep neural networks (DNN), specifically an LSTM, to predict which bytes to mutate and what values to use in mutation based on previous fuzzing experience (Rajpal et al. 2017).

Program execution
In the main fuzzing loop, target programs are executed repeatedly, and information about the program execution status is extracted and used to improve subsequent executions. Two key problems in the execution phase are how to guide the fuzzing process and how to explore new paths.

The fuzzing process is usually guided to cover more code and discover bugs faster, so path execution information is required. Instrumentation is used to record path execution and calculate coverage information in coverage-based fuzzing; depending on whether source code is provided, both compiled-in instrumentation and external instrumentation are used. For directed fuzzing, static analysis techniques like pattern recognition are used to specify and identify the target code, which is more vulnerable. Static analysis can also gather control flow information, e.g. the path depth, which can serve as another reference in the guiding strategy (Rawat et al. 2017). Path execution information collected via instrumentation can thus help direct the fuzzing process. New system and hardware features are also used for collecting execution information. Intel Processor Trace (Intel PT) is a new feature of Intel processors that exposes an accurate and detailed trace of activity, with triggering and filtering capabilities to help isolate the tracing that matters (James 2013). With the advantages of high execution speed and no source dependency, Intel PT can trace execution accurately and efficiently; the feature is utilized for fuzzing OS kernels in kAFL (Schumilo et al. 2017) and has proved quite efficient.

Another concern in the execution phase is exploring new paths. Fuzzers need to pass complex conditional judgments in the control flow of programs. Program analysis techniques, including static analysis and taint analysis, can identify the blocking points in the execution for subsequent solving. Symbolic execution has a natural advantage in path exploration: by solving the constraint set, it can compute values that fulfill specific conditional requirements. TaintScope (Wang et al. 2010) utilizes symbolic execution to solve the checksum validations that often block the fuzzing process. Driller (Stephens et al. 2016) leverages concolic execution to bypass conditional judgments and find deeper bugs.

After years of development, fuzzing has become more fine-grained, flexible and smarter than ever. Feedback-driven fuzzing provides an efficient way of guided testing, while traditional and new techniques play the role of sensors that gather various information during test execution and make the guidance accurate.

Fuzzing towards different applications
Fuzzing has been used to detect vulnerabilities in massive numbers of applications since its appearance. According to the characteristics of different target applications, different fuzzers and strategies are used in practice. In this section, we present and summarize several of the most commonly fuzzed types of applications.

File format fuzzing
Most applications involve file handling, and fuzzing is widely used to find bugs in these applications. Fuzz testing can operate on files with or without a standard format; most commonly used document, image and media files have standard formats. Most research on fuzzing focuses on file format fuzzing, and lots of fuzzing tools have been proposed, like Peach (PeachTech 2017) and the state-of-the-art AFL and its extensions (Rawat et al. 2017; Böhme et al. 2017, 2017). The previous sections already covered a variety of file format fuzzers, so we do not emphasize other tools here.

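File format fuzzers lean heavily on the magic-value trick described earlier: immediates harvested from the target's comparison instructions are kept as a dictionary and spliced into seeds, so header checks that random byte mutation almost never satisfies are passed quickly. A minimal sketch of this mutation strategy follows; the dictionary constants and the toy header check are made up for illustration.

```python
import random

# Constants that static scanning of the target's compare instructions
# might yield (values here are invented for illustration).
DICTIONARY = [b"\x89PNG", b"\xde\xad\xbe\xef", b"v2.1"]

def dictionary_mutate(seed, dictionary):
    """Overwrite a random position of the seed with a harvested constant."""
    data = bytearray(seed)
    token = random.choice(dictionary)
    pos = random.randrange(max(1, len(data) - len(token) + 1))
    data[pos:pos + len(token)] = token
    return bytes(data)

def passes_magic_check(data):
    # Toy stand-in for a parser's header validation.
    return data[:4] == b"\x89PNG"

random.seed(1)
hits = sum(
    passes_magic_check(dictionary_mutate(b"\x00" * 16, DICTIONARY))
    for _ in range(3000)
)
```

With the dictionary, a valid header lands at offset 0 roughly once every few dozen mutations, whereas four independently random bytes would match the magic value only about once in four billion attempts.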
An important subfield of file format fuzzing is the fuzzing of web browsers. With their continuous development, browsers have been extended to support more functions than ever, and the types of files they handle have expanded from traditional HTML, CSS and JS files to others, like PDF, SVG and further formats handled by browser extensions. Specifically, browsers parse web pages into a DOM tree, which interprets the web page as a document object tree with associated events and responses. The DOM parsing and page rendering components of browsers are currently popular fuzzing targets. Well-known fuzzing tools for web browsers include the Grinder framework (Stephenfewer 2016), COMRaider (Zimmer 2013), BF3 (Aldeid 2013) and so on.

Kernel fuzzing
Fuzzing OS kernels has always been a hard problem with many challenges. First, unlike userland fuzzing, crashes and hangs in the kernel bring down the whole system, and how to catch the crashes is an open problem. Secondly, the system authority mechanism results in a relatively closed execution environment: fuzzers generally run in ring 3, and how to interact with the kernel is another challenge. The current best practice is to communicate with the kernel by calling kernel API functions. Besides, widely used kernels like the Windows and macOS kernels are closed source, and are hard to instrument with low performance overhead. With the development of smart fuzzing, some new progress has been made in kernel fuzzing.

Generally, OS kernels are fuzzed by randomly calling kernel API functions with randomly generated parameter values. According to their focus, kernel fuzzers can be divided into two categories: knowledge-based fuzzers and coverage-guided fuzzers.

In knowledge-based fuzzers, knowledge of kernel API functions is leveraged in the fuzzing process. Specifically, fuzzing with kernel API calls faces two main challenges: (1) the parameters of API calls should have random yet well-formed values that follow the API specification, and (2) the ordering of kernel API calls should appear to be valid (Han and Cha 2017). Representative work includes Trinity (Jones 2010) and IMF (Han and Cha 2017). Trinity is a type-aware kernel fuzzer: testcases are generated based on the types of parameters, and the parameters of syscalls are modified according to their data types. Besides, enumeration values and value ranges are provided to help generate well-formed testcases. IMF tries to learn the right order of API execution and the value dependencies among API calls, and leverages the learned knowledge in testcase generation.

Coverage-based fuzzing has proved a great success in finding userland application bugs, and people have begun to apply it to finding kernel vulnerabilities. Representative work includes syzkaller (Vyukov 2015), TriforceAFL (Hertz 2015) and kAFL (Schumilo et al. 2017). Syzkaller instruments the kernel at compile time and runs the kernel in a set of QEMU virtual machines; both coverage and security violations are tracked during fuzzing. TriforceAFL is a modified version of AFL that supports kernel fuzzing with QEMU full-system emulation. kAFL utilizes the new hardware feature, Intel PT, to track coverage, and tracks only kernel code; the experiment shows that kAFL is about 40 times faster than TriforceAFL and greatly improves efficiency.

Fuzzing of protocols
Currently, many local applications are being transformed into network services, in a B/S mode: services are deployed on the network, and client applications communicate with servers via network protocols. Fuzzing of network protocols therefore becomes another important concern. Security problems in protocols can result in more serious damage than in local applications, such as denial of service, information leakage and so on.

Fuzzing of protocols involves different challenges compared to file format fuzzing. First, services may define their own communication protocols, whose standards are difficult to determine. Besides, even for documented protocols with standard definitions, it is still very hard to follow specifications such as RFC documents.

Representative protocol fuzzers include SPIKE, which provides a set of tools that allow users to quickly create network protocol stress testers. Serge Gorbunov and Arnold Rosenbloom proposed AutoFuzz (Gorbunov and Rosenbloom 2010), which learns the protocol implementation by constructing a finite state automaton and leverages the learned knowledge to generate testcases. Greg Banks et al. proposed SNOOZE (Banks et al. 2006), which identifies protocol flaws with a stateful fuzzing approach. Joeri de Ruiter's work (De Ruiter and Poll 2015) proposes a protocol state fuzzing method, which describes the TLS working state in a state machine and fuzzes according to the logical flow. Previous work generally employs a stateful method to model the protocol working process and generates testcases according to protocol specifications.

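The stateful approach shared by AutoFuzz, SNOOZE and the TLS state fuzzing work can be sketched as driving the target along valid protocol states while fuzzing each message's payload. The three-state protocol, message types and `send` callback below are invented for illustration; a real harness would deliver the messages over a socket to the implementation under test.

```python
import random

# A made-up three-state toy protocol: HELLO -> AUTH -> DATA/QUIT.
TRANSITIONS = {
    "INIT":      {"HELLO": "GREETED"},
    "GREETED":   {"AUTH": "LOGGED_IN"},
    "LOGGED_IN": {"DATA": "LOGGED_IN", "QUIT": "INIT"},
}

def fuzz_message(msg_type):
    """Keep the message type valid but randomize its payload."""
    payload = bytes(random.randrange(256) for _ in range(random.randrange(1, 64)))
    return msg_type.encode() + b" " + payload

def walk_and_fuzz(send, rounds=100):
    """Drive the target along valid state paths, fuzzing every message.

    `send(state, message)` delivers one message to the target and is
    supplied by the harness (e.g., a socket write plus response check).
    """
    for _ in range(rounds):
        state = "INIT"
        while TRANSITIONS.get(state):
            msg_type = random.choice(list(TRANSITIONS[state]))
            send(state, fuzz_message(msg_type))
            state = TRANSITIONS[state][msg_type]
            if state == "INIT":      # session closed, start a new one
                break
```

Because the walk respects the state machine, malformed payloads reach deep states like LOGGED_IN instead of being rejected at the first handshake message, which is exactly the advantage the stateful fuzzers above exploit.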
New trends of fuzzing
As an automated method for detecting vulnerabilities, fuzzing has shown high effectiveness and efficiency. However, as discussed in previous sections, many challenges remain to be solved. In this section, we give a brief account of our own views for reference.

First, smart fuzzing provides more possibilities for improving fuzzing. In previous work, traditional static and dynamic analysis were integrated into fuzzing to help improve the process; certain improvements have been made, but they are limited. By collecting target program execution information in various ways, smart fuzzing provides more elaborate control of the fuzzing process, and lots of fuzzing strategies have been proposed. With a deeper understanding of different types of vulnerabilities, and by utilizing the characteristics of vulnerabilities in fuzzing, smart fuzzing can help find more sophisticated vulnerabilities.

Second, new techniques can help improve vulnerability discovery in many ways. New techniques like machine learning and related methods have been used to improve testcase generation in fuzzing. How to combine the advantages and characteristics of new techniques with fuzzing, and how to transform or split the key challenges of fuzzing into problems that new techniques are good at, is another question worthy of consideration.

Third, new system features and hardware features should not be ignored. Work such as (Vyukov 2015) and (Schumilo et al. 2017) has shown that new hardware features greatly improve the efficiency of fuzzing, and gives us good inspiration.

Conclusion
Fuzzing is currently the most effective and efficient vulnerability discovery solution. In this paper, we give a comprehensive review and summary of fuzzing and its latest progress. Firstly, we compared fuzzing with other vulnerability discovery solutions, and then introduced the concepts and key challenges of fuzzing. We emphatically introduced state-of-the-art coverage-based fuzzing, which has made great progress in recent years. At last, we summarized the techniques integrated with fuzzing, the applications of fuzzing, and possible new trends.

Abbreviations
AFL: American Fuzzy Lop; BB: Basic Block; DNN: Deep Neural Networks; LSTM: Long Short-Term Memory; POC: Proof of Concept

Acknowledgements
This research was supported in part by the National Natural Science Foundation of China (Grant No. 61772308, 61472209, and U1736209), the Young Elite Scientists Sponsorship Program by CAST (Grant No. 2016QNRC001), and an award from Tsinghua Information Science And Technology National Laboratory.

Authors' contributions
JL drafted most of the manuscript, BZ helped proofread and summarized parts of the literature, and Prof. CZ abstracted the classifier for existing solutions and designed the overall structure of the paper. All authors read and approved the final manuscript.

Competing interests
CZ is currently serving on the editorial board for Journal of Cybersecurity.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 3 January 2018 Accepted: 17 April 2018

References
Aldeid (2013) Browser fuzzer 3. https://www.aldeid.com/wiki/Bf3. Accessed 25 Dec 2017
Amini P (2017) Sulley fuzzing framework. https://github.com/OpenRCE/sulley. Accessed 25 Dec 2017
Banks G, Cova M, Felmetsger V, Almeroth K, Kemmerer R, Vigna G (2006) Snooze: toward a stateful network protocol fuzzer. In: International Conference on Information Security. Springer, Berlin. pp 343–358
Böhme M, Pham V-T, Nguyen M-D, Roychoudhury A (2017) Directed greybox fuzzing. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS '17). ACM, New York. pp 2329–2344. https://doi.org/10.1145/3133956.3134020
Böhme M, Pham VT, Roychoudhury A (2017) Coverage-based greybox fuzzing as markov chain. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM. pp 1032–1043
Bowne S (2015) Fuzzing with spike. https://samsclass.info/127/proj/p18-spike.htm. Accessed 25 Dec 2017
Cha SK, Avgerinos T, Rebert A, Brumley D (2012) Unleashing mayhem on binary code. In: Security and Privacy (SP), 2012 IEEE Symposium on. IEEE, San Francisco. pp 380–394. https://doi.org/10.1109/SP.2012.31
De Ruiter J, Poll E (2015) Protocol state fuzzing of tls implementations. In: Proceedings of the 24th USENIX Conference on Security Symposium (SEC '15). USENIX Association, Berkeley. pp 193–206
Godefroid P, Levin MY, Molnar D (2012) Sage: whitebox fuzzing for security testing. Queue 10(1):20
Godefroid P, Peleg H, Singh R (2017) Learn & fuzz: Machine learning for input fuzzing. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017). IEEE Press, Piscataway. pp 50–59
Gorbunov S, Rosenbloom A (2010) Autofuzz: Automated network protocol fuzzing framework. IJCSNS 10(8):239
Han H, Cha SK (2017) Imf: Inferred model-based fuzzer. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS '17). ACM, New York. pp 2345–2358. https://doi.org/10.1145/3133956.3134103
Hertz J (2015) TriforceAFL. https://github.com/nccgroup/TriforceAFL. Accessed 25 Dec 2017
James R (2013) Processor tracing. https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing. Accessed 25 Dec 2017
Jones D (2010) Trinity. https://github.com/kernelslacker/trinity. Accessed 25 Dec 2017
King JC (1976) Symbolic execution and program testing. Commun ACM 19(7):385–394
lcamtuf (2014) Fuzzing random programs without execve(). https://lcamtuf.blogspot.jp/2014/10/fuzzing-binaries-without-execve.html. Accessed 25 Dec 2017
libfuzzer (2017) A library for coverage-guided fuzz testing. https://llvm.org/docs/LibFuzzer.html. Accessed 25 Dec 2017
Liu B, Shi L, Cai Z, Li M (2012) Software vulnerability discovery techniques: A survey. In: Multimedia Information Networking and Security (MINES), 2012 Fourth International Conference on. IEEE, Nanjing. pp 152–156. https://doi.org/10.1109/MINES.2012.202
Luk C-K, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, volume 40. ACM, Chicago. pp 190–200
Nichols N, Raugas M, Jasper R, Hilliard N (2017) Faster fuzzing: Reinitialization with deep neural models. arXiv preprint arXiv:1711.02807
PeachTech (2017) Peach. https://www.peach.tech/. Accessed 25 Dec 2017
Petsios T, Zhao J, Keromytis AD, Jana S (2017) Slowfuzz: Automated domain-independent detection of algorithmic complexity vulnerabilities. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS '17). ACM, New York. pp 2155–2168. https://doi.org/10.1145/3133956.3134073
Rajpal M, Blum W, Singh R (2017) Not all bytes are equal: Neural byte sieve for fuzzing. arXiv preprint arXiv:1711.04596
Rawat S, Jain V, Kumar A, Cojocar L, Giuffrida C, Bos H (2017) Vuzzer: Application-aware evolutionary fuzzing. In: Proceedings of the Network and Distributed System Security Symposium (NDSS). https://www.vusec.net/download/?t=papers/vuzzer_ndss17.pdf

Schumilo S, Aschermann C, Gawlik R, Schinzel S, Holz T (2017) kAFL: Hardware-assisted feedback fuzzing for OS kernels. In: Kirda E, Ristenpart T (eds) 26th USENIX Security Symposium, USENIX Security 2017. USENIX Association, Vancouver. pp 167–182
Serebryany K, Bruening D, Potapenko A, Vyukov D (2012) Addresssanitizer: A fast address sanity checker. In: Proceedings of the 2012 USENIX Annual Technical Conference (ATC '12). USENIX Association, Berkeley. pp 309–318
Serebryany K, Iskhodzhanov T (2009) Threadsanitizer: data race detection in practice. In: Proceedings of the Workshop on Binary Instrumentation and Applications. pp 62–71
Shirey RW (2000) Internet security glossary. https://tools.ietf.org/html/rfc2828. Accessed 25 Dec 2017
Stephenfewer (2016) Grinder. https://github.com/stephenfewer/grinder. Accessed 25 Dec 2017
Stephens N, Grosen J, Salls C, Dutcher A, Wang R, Corbetta J, Shoshitaishvili Y, Kruegel C, Vigna G (2016) Driller: Augmenting fuzzing through selective symbolic execution. In: NDSS, volume 16, San Diego. pp 1–16
Sutton M, Greene A, Amini P (2007) Fuzzing: brute force vulnerability discovery. Pearson Education, Upper Saddle River
Takanen A, Demott JD, Miller C (2008) Fuzzing for software security testing and quality assurance. Artech House
The Clang Team (2017) Dataflowsanitizer. https://clang.llvm.org/docs/DataFlowSanitizerDesign.html. Accessed 25 Dec 2017
The Clang Team (2017) Leaksanitizer. https://clang.llvm.org/docs/LeakSanitizer.html. Accessed 25 Dec 2017
Van Sprundel I (2005) Fuzzing: Breaking software in an automated fashion. December 8th
Vyukov D (2015) Syzkaller. https://github.com/google/syzkaller. Accessed 25 Dec 2017
Wang J, Chen B, Wei L, Liu Y (2017) Skyfire: Data-driven seed generation for fuzzing. In: Security and Privacy (SP), 2017 IEEE Symposium on. IEEE, San Jose. https://doi.org/10.1109/SP.2017.23
Wang S, Nam J, Tan L (2017) Qtep: quality-aware test case prioritization. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, New York. pp 523–534. https://doi.org/10.1145/3106237.3106258
Wang T, Wei T, Gu G, Zou W (2010) Taintscope: A checksum-aware directed fuzzing tool for automatic software vulnerability detection. In: Security and Privacy (SP), 2010 IEEE Symposium on. IEEE, Berkeley. pp 497–512. https://doi.org/10.1109/SP.2010.37
Wichmann BA, Canning AA, Clutterbuck DL, Winsborrow LA, Ward NJ, Marsh DWR (1995) Industrial perspective on static analysis. Softw Eng J 10(2):69–75
Wikipedia (2017) WannaCry ransomware attack. https://en.wikipedia.org/wiki/WannaCry_ransomware_attack. Accessed 25 Dec 2017
Wikipedia (2017) Dynamic program analysis. https://en.wikipedia.org/wiki/Dynamic_program_analysis. Accessed 25 Dec 2017
Wu Z-Y, Wang H-C, Sun L-C, Pan Z-L, Liu J-J (2010) Survey of fuzzing. Appl Res Comput 27(3):829–832
Xu W, Kashyap S, Min C, Kim T (2017) Designing new operating primitives to improve fuzzing performance. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS '17). ACM, New York. pp 2313–2328. https://doi.org/10.1145/3133956.3134046
Yang Q, Li JJ, Weiss DM (2007) A survey of coverage-based testing tools. The Computer Journal 52(5):589–597
Zalewski M (2017) American fuzzy lop. http://lcamtuf.coredump.cx/afl/. Accessed 25 Dec 2017
Zalewski M (2017) AFL technical details. http://lcamtuf.coredump.cx/afl/technical_details.txt. Accessed 25 Dec 2017
Zimmer D (2013) Comraider. http://sandsprite.com/tools.php?id=16. Accessed 25 Dec 2017