
The Future of Grey-Box Fuzzing


Isak Hjelt

Isak Hjelt, Spring term (VT) 2017. Degree project (Examensarbete), 15 hp. Supervisor: Pedher Johansson. Examiner: Kai-Florian Richter. Bachelor's program in Computing Science, 180 hp.

Abstract

Society is becoming more dependent on software, and more artifacts are being connected to the Internet each day [31]. This makes the work of tracking down vulnerabilities in software a moral obligation for software developers. Since software testing is expensive [7], automated bug finding techniques are attractive within the quality assurance field, since they can save companies a lot of money. This thesis summarizes the research on an automated bug finding technique called grey-box fuzzing, with the goal of saying something about its future. Grey-box fuzzing is a breed of fuzzing, where the basic concept of fuzzing is to provide random data as input to an application in order to test it for bugs. To portray the current state of grey-box fuzzing, two tools which are relevant to the current research will be presented and discussed. A definition of what grey-box fuzzing is will also be extracted from the research papers by looking at what they all have in common. The combination of fuzzing with symbolic execution or dynamic taint analysis are two of the approaches which this work has identified and discussed; of the two, dynamic taint analysis is argued to be the more promising for the future. Lastly, the trend within fuzzing is predicted to go more towards the grey-box style of fuzzing, leading to grey-box fuzzing rising in popularity.

Acknowledgements

I would like to thank my supervisor Pedher Johansson for helping me guide this thesis in the right direction and always answering all my questions to the best of his ability. I also want to thank my sister for proofreading this thesis and giving me tips on how to improve my writing. Last, but by no means least, thanks to my significant other for always motivating and supporting me during my three years of study.

Contents

1 Introduction
  1.1 Purpose of the thesis
2 The basics of fuzzing
  2.1 Black-box fuzzing
  2.2 White-box fuzzing
3 Grey-box fuzzing
  3.1 Strengths and weaknesses
4 Grey-box fuzzers
  4.1 AFL
  4.2 VUzzer
5 Grey-box fuzzing now and in the future
  5.1 Combining techniques
  5.2 Dynamic taint analysis
6 Discussion and conclusion
  6.1 Conclusion regarding the future
  6.2 Conclusion validity
  6.3 Importance of categorization
  6.4 When should grey-box fuzzing be used?
References


1 Introduction

With a society that becomes increasingly dependent on technology, finding and eliminating bugs in software is essential. Many people depend on software; it needs to be available when they need it and to work the way they expect. Security bugs in software can have devastating impacts on companies and might cost them enormous amounts of money [47]. To stand up to the competition nowadays, companies are most likely forced to rely on software whether they like it or not. The research for better methods cannot stop; it would be morally wrong if researchers stopped trying to come up with new smart techniques to find bugs in code.

A lot of time and money is spent on testing; often 50% of a software project's expenses go towards manual testing [7]. Creating tests manually is expensive, error-prone and most of the time inconclusive [7]. Vulnerabilities in software are often caused by bugs in the code that escape the detection of software quality assurance. Despite the efforts to make software more resistant against security vulnerabilities, research from recent years suggests that vulnerabilities in software are more common than ever [35]. This is a big problem to fight, and the Internet of Things [31] phenomenon does not make it easier or any less important.

Bugs are commonly hidden in paths of the code that rarely get executed. Error-prone software such as parsers or decoders that handle complex file formats must be able to handle many different inputs and corner cases. Bugs in software that handles files or any other complex input can have serious consequences which may lead to security exploits [10]. Sometimes companies purposely ignore their responsibility to test their applications for these security faults and go for a less demanding approach [33].

Because of the problems stated, automated security testing has become more popular. One complementing testing technique that has emerged is called fuzzing. Fuzzing can be used to test applications where the space of possible inputs is large. The technique is used to see how well an application handles unexpected inputs and thereby reveal bugs. Fuzzing generates input in a random fashion and feeds it to the program. The input might trigger an exception, make the program do something valid, or even put the program in an invalid state that was never thought of, thus revealing a bug. Generating input for a program is mainly done in two ways: grammar-based fuzzing, which uses a model or format to generate input, or mutational fuzzing, which starts with a valid input and changes it in a random way.

Grey-box fuzzing is a subject without an excessive amount of documentation, which is partially the motivation for this study. It can briefly be explained as a fuzzing technique located between black- and white-box fuzzing, where black-box means only looking at the input/output of the system being tested, and white-box indicates access to the source code. Grey-box fuzzers do not require source code, but are still more refined than black-box fuzzers since they can glean information regarding the internal state of the system being tested, usually using dynamic or static analysis on a binary level [27].

Hackers often exploit bugs in software for their attacks, and they mainly use two methods to find them [17]. One of them is using reverse engineering on binaries to retrieve the original source code (as close as possible). Once the source code is retrieved, it can be inspected with the intention of finding security flaws. The other method is to use black-box fuzzing on the software to reveal bugs. This method is fruitful for a hacker, since the process of feeding a system with random permutations of data can easily be automated and run until a bug occurs. Finding one bug in software is easy compared to finding all of them; even an unsophisticated fuzzer might do the job, and finding one bug might be all it takes for a malicious hacker to do damage. To fight this problem, ethical security specialists need more sophisticated fuzzing methods to find and fix bugs before they can be found and exploited by hackers.

1.1 Purpose of the thesis

There exists no generally accepted method to distinguish grey-box fuzzing from the other two fuzzing techniques, white-box and black-box [20]. The purpose of this thesis is to clarify the term grey-box fuzzing and analyze the tools which employ the technique, thus giving an understanding of where grey-box fuzzing research stands today and where it is heading. With the hope of stating something about the future of grey-box fuzzing, these are the aspects that have been focused on:

• Strategies the recent papers are presenting, and how they compare to other approaches.

• What the current state is like within the research of the techniques that grey-box fuzzers utilize.

• Things that are considered to be obstacles within the field of research.

2 The basics of fuzzing

The idea of fuzzing was first described by Miller in 1989 [21]. The basic idea of fuzzing is to use random strings as input to a monitored piece of software with the intention of uncovering bugs. Fuzzing is an automated or semi-automated technique where the input can be based on knowledge of the program internals, be totally random, or be based on some kind of initial seed. It is typically used to test applications that take structured files as input, but might also be used for other things such as network protocols. The term is another word for interface robustness testing [14], where the interface is the attack surface and usually the thing available to the users. Fuzzing is a type of security testing but should not be confused with penetration testing.

The technique is used to test how well a system handles unexpected input, usually with the intention of finding memory-related errors like buffer overflows, heap overflows, stack overflows and the like. Fuzzing might trigger any kind of bug, but deciding when a bug has occurred is a job for the test oracle [2]. Determining when a bug has occurred is a hard problem that requires an oracle to know what behavior to expect given a certain input. If a fuzzer manages to trigger a bug, but nothing registers its occurrence, the work of triggering the bug is insignificant. Research in the fuzzing technique has enhanced the knowledge of random input generation, but fuzzers still suffer from a bottleneck in the form of the test oracle problem [2]. This topic is of importance for automated testing techniques, and improving it could have a big impact on the future.
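To make the basic idea concrete, the following is a minimal sketch of such a random fuzzing loop in C. It is an illustration only: the target program ./parser is hypothetical, and the exit status is used as a very crude test oracle that only catches crashes.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <time.h>

    /* Minimal fuzz loop: feed random bytes to a target program and use
     * its exit status as a crude test oracle. "./parser" is a
     * hypothetical target; any file-consuming program would do. */
    int main(void) {
        srand((unsigned)time(NULL));
        for (int iter = 0; iter < 1000; iter++) {
            FILE *f = fopen("input.bin", "wb");
            if (!f) return 1;
            int len = 1 + rand() % 4096;       /* random input length */
            for (int i = 0; i < len; i++)
                fputc(rand() % 256, f);        /* random input bytes  */
            fclose(f);

            int status = system("./parser input.bin > /dev/null 2>&1");
            /* A crash (e.g. SIGSEGV from a buffer overflow) shows up
             * as termination by a signal, not as a normal exit code. */
            if (status != -1 && WIFSIGNALED(status))
                printf("iteration %d: target killed by signal %d\n",
                       iter, WTERMSIG(status));
        }
        return 0;
    }

Note how the oracle problem discussed above shows up even in this toy: any bug that does not kill the process goes unnoticed.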

2.1 Black-box fuzzing

Black-box fuzzing is what we usually call traditional fuzzing. It is the simplest form of fuzzing and is based on the assumption that the input and output of the SUT (System Under Test) are the only things known to the fuzzer. The inner workings of the SUT are unknown, therefore making it a black box. An example of this would be a network protocol: both the server-side and client-side implementations could be fuzzed for vulnerabilities [38]. As mentioned in the introduction, an attacker could leave the fuzzing process running until a bug is revealed. Since implementations of protocols might be the same on many servers, an attacker could set up the same system to run the fuzzing process against. If a vulnerability is discovered, it can, in theory, be exploited on every server running the same implementation of that protocol.
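As a sketch of how little a black-box fuzzer needs to know, the following C program sends random datagrams to a UDP service. Everything specific here is an assumption for illustration: the service at 127.0.0.1:9999 is hypothetical, and whether a bug was triggered must be observed externally, for example by monitoring the server process.

    #include <arpa/inet.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <time.h>
    #include <unistd.h>

    /* Black-box protocol fuzzing sketch: the fuzzer only controls the
     * input side (datagrams it sends) and knows nothing about the
     * server's internals. */
    int main(void) {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) { perror("socket"); return 1; }

        struct sockaddr_in dst;
        memset(&dst, 0, sizeof dst);
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9999);               /* hypothetical SUT */
        inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

        srand((unsigned)time(NULL));
        unsigned char buf[512];
        for (int iter = 0; iter < 10000; iter++) {
            size_t len = 1 + (size_t)(rand() % (int)sizeof buf);
            for (size_t i = 0; i < len; i++)
                buf[i] = (unsigned char)(rand() % 256);
            sendto(sock, buf, len, 0,
                   (struct sockaddr *)&dst, sizeof dst);
            /* Crashes must be detected out-of-band, e.g. by watching
             * the server process; the fuzzer itself sees nothing.    */
        }
        close(sock);
        return 0;
    }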

2.2 White-box fuzzing

Unlike black-box fuzzing, white-box fuzzing uses program analysis to understand the impact of the input and to increase the code coverage of the SUT. White-box fuzzing takes advantage of its access to the source code and design specifications of the SUT. Symbolic execution is highly associated with white-box fuzzing and is a way of determining how inputs cause different paths in the program to execute. During symbolic execution, input variables take symbolic values instead of concrete values. All the actions done to a symbolic value are recorded and later taken into account when reaching an if statement. This enables the symbolic execution to set the symbolic value so that it will take a certain path, with consideration to the actions done to the symbolic value. Constraints can be gathered during symbolic execution of a program. The constraints are gathered from conditional branches encountered along the execution, negated, and solved using a constraint solver. The solutions provided by the solver are then used in the generation of new inputs. The inputs are then used to discover new paths or reveal security vulnerabilities. White-box fuzzing was used in the development of Windows 7 and discovered one-third of all the vulnerabilities found prior to the release [6].
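The following toy example sketches, under heavy simplification, the negate-and-solve step described above. The branch constraint of the small target is linear, so the "constraint solver" reduces to inverting the arithmetic; a real symbolic executor would collect such constraints automatically and hand them to an SMT solver. All names here are invented for illustration.

    #include <stdio.h>

    /* Toy target: the "then" branch is hit for exactly one input. */
    static int target(int x) {
        int y = x * 4 + 3;
        if (y == 43)          /* path constraint: x*4 + 3 == 43 */
            return 1;         /* interesting path               */
        return 0;
    }

    /* What a symbolic executor conceptually does with the branch
     * above: record the constraint y == 43 with y = x*4 + 3, and ask
     * a solver for an x satisfying it. For this linear constraint the
     * "solver" is plain arithmetic: x = (43 - 3) / 4.               */
    static int solve_linear(int a, int b, int k) {
        return (k - b) / a;   /* solve a*x + b == k for x */
    }

    int main(void) {
        int x = solve_linear(4, 3, 43);   /* -> 10 */
        printf("solver proposes x = %d, target(x) = %d\n",
               x, target(x));
        return 0;
    }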

3 Grey-box fuzzing

The term grey-box fuzzing is mentioned for the first time in 2007 by DeMott, Enbody and Punch [14]. They state that they have created a new "breed" of fuzzing, but also mention that theirs is the only grey-box fuzzer that uses evolutionary computing, which indicates that there were other grey-box fuzzers around at the time of writing. Identifying those other fuzzers is not straightforward, mainly because the documentation of the tools never mentions what kind of fuzzing they are utilizing; examples of this are one paper by Cha, Avgerinos, Rebert, and Brumley, 2012 [10] and another by Wang, Wei, Gu, and Zou, 2010 [40]. Even though there is an insufficient amount of literature defining what grey-box fuzzing is and what it is not, most of the literature concerning grey-box fuzzing presents concrete implementations of fuzzers which the authors categorize as grey-box fuzzers [14, 5, 27]. There are several articles that touch on the matter, for example one by McNally, Yiu, Grove and Gerhardy, 2012 [20], but the more thorough information can be found in articles presenting an implementation that adopts the grey-box fuzzing technique.

By looking at the literature, there are some facts regarding grey-box fuzzing which seem to be agreed upon by the authors. In some papers [26, 20], grey-box fuzzing is implicitly defined as a subcategory of grey-box testing, thus suggesting that grey-box fuzzing is more about what a fuzzer knows about a SUT, rather than what the fuzzer does to the SUT. Delineating between the different techniques is still not an easy task. Since white-box fuzzing is characterized by its access to the source code, what should a fuzzer that uses reverse engineering to first recover the source code of the SUT be called? For example, in the article Model-Based White-box Fuzzing for Program Binaries [25], a fuzzer is presented that uses reverse engineering on program binaries to perform lightweight analysis. Should this be called a grey- or white-box fuzzer? It is a grey-box fuzzer since it does not require access to the source code, and it analyses more than the input and output of the program. But it is still a white-box fuzzer seeing that it has access to the source code, even though it did not initially.

Now that grey-box fuzzing is said to be situated in between black- and white-box fuzzing, does it in practice manage to capture the benefits of the respective techniques? To determine if the tools capture the strengths and minimize the weaknesses of black- and white-box fuzzing, these attributes need to be clarified. Let us start with black-box fuzzing, as it is the foundation of fuzzing. The technique is still used in a variety of fuzzing tools and frameworks, such as Peach [24] and Sulley [37]. So, why is this simple technique, which has been around since 1990, still used when there are more sophisticated techniques around, like white-box fuzzing? It is simple to write a custom black-box fuzzer and anyone can do it in a limited amount of time [16]. Despite being bound to provide input and look at the output of the SUT, black-box fuzzers can still be clever in the way they generate inputs, and therefore achieve high code coverage without suffering from problems in scalability [18, 27]. Although black-box fuzzing can achieve high code coverage, its limitations are still a fact, and with no information about the inner workings of the SUT, the chances of exercising a certain path can be slim.
As explained in [6] with the help of the following code example,

    int foo(int x) {  // x is an input
        int y = x + 3;
        if (y == 13)
            abort();  // error
        return 0;
    }

there are times when a path has a low chance of being exercised by randomly mutating well-formed input. Assuming that the integer is a 32-bit value, there is only a 1 in 2^32 chance of exercising the then branch of the conditional expression. One of white-box fuzzing's capabilities is to cope with this sort of limitation. White-box fuzzing solves this problem using symbolic execution, which means that by inspecting the code during execution, constraints from conditional statements can be gathered and used to determine how the input should be generated to exercise new paths. Grey-box fuzzers can also solve problems like this using symbolic execution, but are also capable of doing it using dynamic taint analysis.

3.1 Strengths and weaknesses

The main strength of grey-box fuzzing is that it does not require the source code to perform the fuzzing. This is beneficial for testing and security verification of third-party software [14]. Since grey-box fuzzing is based on the lightweight black-box fuzzing technique, but may still glean information about the SUT, code coverage can still be leveraged without sacrificing much time on program analysis [5]. Grey-box fuzzing is, however, not suitable for situations where the internals of the program cannot be accessed for different reasons.

4 Grey-box fuzzers

This section focuses on describing two of the current tools which are interesting to the research of grey-box fuzzing. The purpose of this section is to give concrete examples of the grey-box fuzzing technique being used in software testing tools. AFL and VUzzer were selected as the tools to be presented in this section. AFL was selected based on its connection to the field and its repeated appearance within the research papers [35, 5, 27, 32]. VUzzer was mainly chosen since the article presenting the tool was published in 2017 (the same year as this thesis was written), and also because it tries to target some of the flaws that AFL possesses. Other similar tools exist, but are not always explicitly classified as grey-box fuzzers. Some of the other tools employing this technique are EFS [14], Driller [35] and LynxFuzzer [19]. The article presenting EFS was published in 2007, which is the first article about grey-box fuzzing that has been identified during this thesis.

4.1 AFL

AFL (American Fuzzy Lop) [44] is a state-of-the-art [5, 27] coverage-based grey-box fuzzer. It is responsible for uncovering many vulnerabilities in lots of popular software, such as bash, OpenSSL and Mozilla Firefox [45]. Since the launch of AFL, many improvements have been made to the tool. Some of the improvements made to AFL have been due to researchers integrating their fuzzing solutions into AFL [35, 5], which gives an indication that AFL is considered one of, or even, the leading grey-box fuzzer in the area of research.

AFL uses an evolutionary algorithm to find new test data that can be used as input to the SUT. The input generation is done using a feedback loop which determines how good an input is. Any input that exercises a new path in the program is considered good and therefore retained for mutation. The data gets mutated and thereafter used as input to the SUT to see if it leads to a new path. AFL is considered application-agnostic [27] and does not know how to mutate the input in a beneficial manner (i.e., so that it explores the most meaningful path). It is also worth mentioning that AFL can inject instrumentation code straight into the binary or at compile time, which is an important ability for not being dependent on source code.
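The following is a minimal sketch of such a coverage feedback loop, not AFL's actual implementation: the toy target marks branch edges in a coverage map the way AFL's injected instrumentation does, and any mutant that reaches a new edge is retained in the queue for further mutation. All names, sizes and the "FUZZ" target are invented for illustration.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define MAP_SIZE  64
    #define INPUT_LEN 4
    #define MAX_QUEUE 256

    static unsigned char cov[MAP_SIZE];   /* edges hit by current run */
    static unsigned char seen[MAP_SIZE];  /* edges ever hit            */

    /* Toy "instrumented" target: each taken branch marks an entry in
     * the coverage map; the deepest branch stands in for a bug.      */
    static int target(const unsigned char *in) {
        cov[0] = 1;
        if (in[0] != 'F') return 0; cov[1] = 1;
        if (in[1] != 'U') return 0; cov[2] = 1;
        if (in[2] != 'Z') return 0; cov[3] = 1;
        if (in[3] != 'Z') return 0; cov[4] = 1;
        return 1;                          /* "bug" reached */
    }

    static int has_new_coverage(void) {
        int found = 0;
        for (int i = 0; i < MAP_SIZE; i++)
            if (cov[i] && !seen[i]) { seen[i] = 1; found = 1; }
        return found;
    }

    int main(void) {
        srand((unsigned)time(NULL));
        unsigned char queue[MAX_QUEUE][INPUT_LEN];
        int n = 1;
        memcpy(queue[0], "AAAA", INPUT_LEN);            /* seed */

        for (long iter = 0; iter < 5000000; iter++) {
            unsigned char buf[INPUT_LEN];
            memcpy(buf, queue[rand() % n], INPUT_LEN);  /* pick    */
            buf[rand() % INPUT_LEN] =
                (unsigned char)(rand() % 256);          /* mutate  */

            memset(cov, 0, sizeof cov);
            int bug = target(buf);
            /* Feedback: keep any input that found a new edge. */
            if (has_new_coverage() && n < MAX_QUEUE)
                memcpy(queue[n++], buf, INPUT_LEN);
            if (bug) {
                printf("bug reached after %ld runs\n", iter);
                return 0;
            }
        }
        printf("no bug found; queue size %d\n", n);
        return 0;
    }

Even this crude loop finds the four-byte "FUZZ" input quickly, because each retained input turns one astronomically unlikely guess into four independent cheap ones; this incremental effect is exactly what the feedback loop exists to create.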

4.2 VUzzer

VUzzer [27] is an application-aware grey-box fuzzer that targets some of the flaws that AFL has while preserving its strengths. One of the weaknesses that the researchers behind VUzzer have targeted is AFL's mutation strategy. Like EFS [14], VUzzer utilizes an evolutionary fuzzing strategy: an evolutionary process is used for input generation until a crash occurs, or the maximum number of generations is reached. VUzzer focuses on generating meaningful input that explores new interesting paths in the SUT. The term input gain (IG) is used by the researchers of VUzzer; IG indicates an input's ability to exercise new paths. VUzzer strives to generate non-zero IG by answering the question: where in the input should we mutate, and what value should be put there?

Through static and dynamic analysis of the SUT, VUzzer creates a "smart" feedback loop. What the researchers mean by the loop being smart is that it considers control- and data-flow features from the previous execution and uses that information to generate new input. The data-flow features present information about the association between computations within the SUT and the input data. By extracting this information, VUzzer can answer the question of where and how to mutate. The control-flow feature allows reasoning about the significance of an execution path. Usually error-handling blocks are not of importance and can be ignored, thus speeding up the generation of input which will take the fuzzing to more relevant parts of the SUT. Other than ignoring error-handling blocks, VUzzer can prioritize paths that are more likely to take the execution deeper into the program.
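To illustrate the "where and what" question, the sketch below applies a single taint-guided mutation. It assumes that taint tracking has already reported that input offsets 8–11 flow into one operand of a cmp instruction whose other operand is a constant; the hard-coded taint report and the magic bytes are invented for illustration, whereas VUzzer obtains this information from its dynamic taint analysis of the binary.

    #include <stdio.h>

    /* Hypothetical result of taint analysis at a 4-byte cmp:
     * which input offsets taint operand 1, and the constant bytes of
     * operand 2. A real implementation gets this from an
     * instrumentation framework, not a hard-coded table. */
    struct taint_report {
        int offset[4];            /* "where" in the input */
        unsigned char value[4];   /* "what" to put there  */
    };

    static void taint_guided_mutation(unsigned char *input,
                                      const struct taint_report *r) {
        for (int i = 0; i < 4; i++)
            input[r->offset[i]] = r->value[i];
    }

    int main(void) {
        unsigned char input[16] = "AAAAAAAAAAAAAAA";
        struct taint_report r = {
            { 8, 9, 10, 11 },
            { 'M', 'J', 'Y', 'P' }   /* compared constant's bytes */
        };
        taint_guided_mutation(input, &r);
        /* The cmp now succeeds on the next run, instead of after an
         * expected 2^32 blind mutations of the same four bytes.     */
        printf("mutated input: %.15s\n", (char *)input);
        return 0;
    }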

5 Grey-box fuzzing now and in the future

5.1 Combining techniques

Unlike the classic grey-box fuzzing tools discussed earlier, there are other promising approaches that have recently emerged, which combine grey-box fuzzing with other techniques. These techniques are combined in an attempt to tackle the weaknesses that fuzzing has. In 2016, a paper [35] presenting a tool called Driller was published. The tool tries to mitigate the path explosion problem, and the need for input test cases, by combining concolic execution (also called dynamic symbolic execution) with fuzzing. The path explosion problem, where the number of paths grows exponentially for each branch and becomes unmanageable, is something that concolic execution is considered to suffer from [32]. They leverage AFL as the fuzzer to combine with their approach, making it a grey-box fuzzer. In the article, Driller gets categorized as a white-box fuzzer, but since the authors explicitly state that they "opted for a QEMU-backend to remove reliance on source code availability", it should, in my opinion, be categorized as a grey-box fuzzer. In the paper they motivate their research by stating that fuzzing is proficient at providing general values within a compartment, but struggles with the more precise values, with the consequence of struggling to take the execution between compartments. On the contrary, concolic execution is clever when it comes to selecting the right value to pass specific checks. Combining these two strategies results in a fuzzer that is able to solve branch constraints in a sophisticated manner, thereby taking the execution to deeper path explorations.

Similar approaches to Driller have been presented before, for example in a master's thesis [23]. In the thesis, the concept of what the author calls hybrid fuzzing is examined. Hybrid fuzzing is explained as the combination of symbolic execution and classic black-box fuzzing. It is stated that the combination of techniques is based on their different characteristics, one being faster at exploring deeper in the code, and the other being better at achieving code coverage in breadth. The method initially uses symbolic execution to identify "frontier nodes", which ensures that different paths are taken early in the execution. The Driller article states that a problem with this method is its inability to solve complex checks deeper in the SUT, which is not a problem for Driller.

The previously discussed articles [35, 23] are similar in how they combine fuzzing with symbolic execution; both mention being hybrid, and both claim to combine the best of both worlds. Other fuzzing tools also combine regular fuzzing with other techniques, but are still not called "hybrid fuzzers". To make advances within fuzzing research, combination of techniques is unavoidable, thereby forcing the fuzzing term to broaden its meaning. VUzzer and SAGE are two clear examples of this phenomenon; neither of them uses pure random fuzzing as an underlying method, and the fuzzing connection might even be hard to spot, but they still categorize themselves as fuzzers [17, 27]. However, in the recent paper regarding VUzzer [27], the authors argue that heavyweight and non-scalable solutions like Driller and similar are not the definitive solution. They argue that the key strengths of fuzzing are its ability to be lightweight and scale well with bigger programs; that reasoning does not, however, exclude that there are other techniques which could be interesting to combine with fuzzing.
The article also mentions that VUzzer could be improved if it was combined with other techniques. An example of this is the method presented in [42], which focuses on how to schedule the fuzzing of a given set of seed pairs, so that it maximizes the number of found bugs at any point in time. This method is very similar to the work presented in [28, 11], which are all recent studies focusing on optimizing seed selection and choice of mutation ratio.

Although the VUzzer paper dismisses symbolic execution as a way forward for fuzzing, by stating that it is hard to scale, suffers from the path explosion problem, and therefore weakens one of fuzzing's key strengths, there are papers that present solutions to fight those weaknesses. One article [1] presents a technique called Veritesting, and shows how it enables symbolic execution for large-scale bug finding. On the following code,

    int counter = 0, values = 0;
    for (int i = 0; i < 100; i++) {
        if (input[i] == 'B') {
            counter++;
            values += 2;
        }
    }
    if (counter == 75)
        bug();

which has 2^100 possible paths, Veritesting was able to find the bug and achieve full code coverage within 47 seconds, compared to four other state-of-the-art symbolic executors,

where none of them managed to find the bug within an hour. That symbolic execution suffers from the path explosion problem is stated as a fact throughout the grey-box related literature discussed in this thesis, which might be the case right now. To state that symbolic execution is not scalable, and therefore not the answer for fuzzing, can on the other hand be dangerous to the field of research. As shown above, and as presented in [9], there are solutions to make symbolic execution scale better. In that article [9], the authors explain the technique of combining heuristic search with symbolic execution, and state that it has shown promising results which could have a groundbreaking impact on the path explosion problem.

Given that there are many papers presenting new improvements to fuzzing research, and there are state-of-the-art fuzzers which are still not implementing those techniques, modern grey-box fuzzers could benefit a lot from implementing already researched improvements within the field of fuzzing. In order to take advantage of the potential improvements available when combining techniques with grey-box fuzzing, it is important to keep an open mind and not get stuck on the assumption that some weaknesses of certain techniques will remain. It is considered a fact that symbolic execution is not scalable, which can be seen in a lot of literature and publications, not to mention VUzzer's argument that it is not the right way forward. Scientists within the field need to constantly keep themselves updated on progress regarding previously tested techniques, so that they can benefit from possibly new solutions.
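As a conceptual summary of the Driller-style combination discussed in this section, the control loop below alternates between a cheap fuzzing phase and an expensive concolic phase. Every helper is a stub standing in for a real fuzzer and a real concolic engine; the sketch only shows the division of labor, not any actual tool's code.

    #include <stdio.h>

    /* Hypothetical stub: would run the fuzzer and return 1 as long as
     * coverage keeps growing, 0 once it stalls. */
    static int fuzz_until_stuck(void) {
        static int rounds = 0;
        return ++rounds % 3 != 0;
    }

    /* Hypothetical stub: would symbolically execute the stuck path,
     * negate the blocking branch constraint, and return 1 if a new
     * input was produced for the fuzzer to continue from. */
    static int solve_stuck_branch(void) {
        static int solved = 0;
        return solved++ < 2;
    }

    int main(void) {
        for (;;) {
            while (fuzz_until_stuck())
                ;                          /* cheap, broad phase     */
            if (!solve_stuck_branch())     /* targeted, costly phase */
                break;                     /* nothing left to solve  */
            printf("concolic phase produced a new input; "
                   "resuming fuzzing\n");
        }
        return 0;
    }

The point of the structure is that the expensive engine is invoked only on the few branches the fuzzer cannot pass, rather than on every path, which is how the hybrid sidesteps path explosion.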

5.2 Dynamic taint analysis

Grey-box fuzzers employ lightweight program analysis in some way or other; mainly this means monitoring of the SUT [27]. Some fuzzers are more application-aware than others, and symbolic execution is one way to enable reasoning about software under execution. Another method that has been used in grey-box fuzzers is dynamic taint analysis [27, 40]. Dynamic taint analysis can, during the execution, tell which memory locations and registers relate to tainted inputs, thus mapping the values to their related offsets in the input. Values in the program that originate from, or are arithmetically derived from, a taint source are considered tainted values [29]. Taint sources are where tainted inputs are derived from, for example network packets, files and user input [27]. The purpose of the technique is to track the tainted input data within the application, from source to sink (where the data comes from and where it ends up). The technique is commonly used to tie untrusted input to an exploited vulnerability. By tracking the propagation of tainted data, dangerous behavior caused by tainted input, like overwrite attacks and such, can be detected. This enables an easy way to find relations between an exploit and the part of the input that led to it [22].

This technique suits the grey-box fuzzing approach well, and one of the reasons is that it does not require any source code or special compilation instrumentation [22]. Another reason why dynamic taint analysis goes well with fuzzing is its core ability to trace tainted input to events within the SUT. Those are the properties that VUzzer takes advantage of to trace which offsets of the input taint the operands at cmp instructions. Dynamic taint analysis can be leveraged at cmp op1, op2 instructions to determine how the operands are tainted by a set of offsets. This is done at byte level, and can for each byte in the operand provide information about which offset of the input taints that specific byte.

Dynamic taint analysis has been used in many different security research fields beyond grey-box fuzzing, and the number of applications which utilize the technique is enormous [29]. The technique is used to check whether user input is executed, thus preventing code injections [12, 13, 22, 36]. It is also used for protocol reverse engineering, to extract information about a protocol given only an implementation and no specifications of the protocol [8, 41]. Automatic behavior analysis of malware is another reverse-engineering related field where the technique is well established [3, 4, 15, 30, 43]. The technique is also utilized to detect SQL injections and cross-site scripting in web applications, and much more [29].
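A minimal sketch of the byte-level bookkeeping described above: every value carries a label naming the input offset it was derived from, taint is introduced at the source (reading input) and propagated through arithmetic, so that at a cmp the label answers "which input byte controls this operand". The two rules shown are a tiny invented subset of what a real taint engine implements for all instructions and all of memory.

    #include <stdio.h>

    #define UNTAINTED -1

    /* Shadow state: a value plus the input offset that tainted it. */
    struct tbyte { unsigned char v; int label; };

    /* Taint source: reading input byte i yields a value labeled i. */
    static struct tbyte read_input(const unsigned char *in, int i) {
        struct tbyte t = { in[i], i };
        return t;
    }

    /* Propagation rule: the result of an arithmetic operation is
     * tainted by its tainted operand. */
    static struct tbyte add_const(struct tbyte a, unsigned char c) {
        struct tbyte t = { (unsigned char)(a.v + c), a.label };
        return t;
    }

    int main(void) {
        unsigned char input[4] = { 10, 20, 30, 40 };
        struct tbyte x = read_input(input, 2);  /* labeled offset 2 */
        struct tbyte y = add_const(x, 3);       /* still offset 2   */

        /* At a cmp, the label tells the fuzzer which input offset
         * controls the compared value, i.e. where to mutate. */
        if (y.label != UNTAINTED)
            printf("cmp operand (value %d) is tainted by "
                   "input offset %d\n", y.v, y.label);
        return 0;
    }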

6 Discussion and conclusion

6.1 Conclusion regarding the future

As mentioned earlier, symbolic execution is considered to suffer from the path explosion problem, and is thus hard to scale to bigger systems. Dynamic taint analysis, on the other hand, is scalable, but most importantly it is used in many other related research areas. The technique might not be better than symbolic execution if the path explosion problem were mitigated, but the research already done on dynamic taint analysis could be a big advantageous factor, and hence make it more successful within the grey-box fuzzing field.

Beyond the question of which technique looks most promising for grey-box fuzzers, I agree with McNally [20] when he states that fuzzing is trending more towards the grey-box style than the other two. He argues that the trend is a result of the increasing availability of tools to perform binary analysis, which I think is one of the contributing factors. I think the main reason, however, is the availability of open-source state-of-the-art tools like AFL [44] and VUzzer [27], which allows research approaches to be combined, compared and analyzed using these tools, and therefore leads to more progress within the field. This could be a problem for white-box fuzzing research, since the most established white-box fuzzer within the research, SAGE [17], is not publicly available and only used within Microsoft. Black-box fuzzing will probably still be around since it fills another purpose, like testing systems without access to the binary file or the source code. Frameworks like Peach [24], which enable faster creation of custom-made black-box fuzzers for particular scenarios, can also be attractive in some cases.

6.2 Conclusion validity

Making assumptions about whether a complex technique is more promising than another can be difficult if you are not an expert in the area. The assumptions made in this thesis are based on statements from articles related to the field, but also on how widely adopted each technique is within related security testing areas. The future of grey-box fuzzing is also affected by other things, such as the demand for and availability of tools that employ the technique, where the availability is the main factor discussed in the conclusion.

6.3 Importance of categorization

You might wonder if categorizing fuzzers is of any value, whether mentioning if a fuzzer is black-, white- or grey-box is important, and if anyone benefits from this information. First, I can speak from my own personal experience and say that being new to the concept of fuzzing and not understanding those categories made it hard to understand why articles kept bringing them up. Intuitively, I think that there must be a reason why those terms are so widely used in the literature. Some papers even have the classification of the fuzzer in the title, unlike others, which only provide this information implicitly, leaving it to the reader to classify the fuzzer for themselves.

From a company's point of view, I think that distinct classification of fuzzers can be helpful when trying to identify fuzzers suitable for their needs. Categorization provides guidance for people trying to find the fuzzers best suited for a specific application where there might be limitations concerning access to source code or other internal knowledge about the application. Sometimes companies might want to separate testing from development, and hence it might be useful to know which fuzzers to even consider looking at.

6.4 When should grey-box fuzzing be used?

Information regarding when grey-box fuzzing is more suitable than black- or white-box fuzzing can be interesting for anyone interested in running this technique against their system. Since one of the key properties of grey-box fuzzing is that it does not require any source code, it might seem like a technique only directed towards black-hat hackers. Some of the arguments for using grey-box fuzzing are the increasing availability of tools to perform binary analysis [20], the fact that examining the binary provides the "ground truth" since computers execute binaries [34], and the scalability being better compared to symbolic execution (usually used in white-box fuzzing) [27]. Shoshitaishvili et al. 2016 [32] point out that the importance of binary analysis is on the rise, with the motivation that it is the only way to prove or disprove properties of a program; properties gathered about a program from analyzing the source code might not hold after compilation [39]. This fact might be worth considering when choosing between different breeds of fuzzers. Even when the source code of a system is available, grey-box fuzzers enable the SUT to be analyzed in a more honest fashion, which represents the potentially vulnerable system more accurately.

As mentioned in the introduction, automated software testing techniques are attractive, and software might not be tested properly if the testing process takes too much time. Grey-box fuzzers can be more generic than white-box fuzzers when analyzing applications. Since white-box fuzzers analyze the internals of a program using the source code, knowledge about the language's syntax is required in order to perform analysis. This can make it hard to automatically test software that is a combination of different languages, and will also prevent the fuzzing tool from being used on any language that is not known by the tool. Grey-box fuzzers can, unlike black-box fuzzers, automatically learn to create valid inputs [46], which removes the need for seed inputs. By not being dependent on valid seed inputs, which have the purpose of taking the execution away from branches that only lead to rejection in the parser code, grey-box fuzzers can be automated in a more sophisticated manner than black-box fuzzers.

References

[1] Thanassis Avgerinos, Alexandre Rebert, Sang Kil Cha, and David Brumley. Enhancing symbolic execution with veritesting. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 1083–1094, New York, NY, USA, 2014. ACM.

[2] Earl T Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz, and Shin Yoo. The oracle problem in software testing: A survey. IEEE Transactions on Software Engineering, 41(5):507–525, 2015.

[3] Ulrich Bayer, Paolo Milani Comparetti, Clemens Hlauschek, Christopher Kruegel, and Engin Kirda. Scalable, behavior-based malware clustering. In NDSS, volume 9, pages 8–11. Citeseer, 2009.

[4] Ulrich Bayer, Andreas Moser, Christopher Kruegel, and Engin Kirda. Dynamic analysis of malicious code. Journal in Computer Virology, 2(1):67–77, 2006.

[5] Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. Coverage-based greybox fuzzing as Markov chain. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pages 1032–1043, New York, NY, USA, 2016. ACM.

[6] Ella Bounimova, Patrice Godefroid, and David Molnar. Billions and billions of constraints: Whitebox fuzz testing in production. In Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, pages 122–131, Piscataway, NJ, USA, 2013. IEEE Press.

[7] J. Burnim and K. Sen. Heuristics for scalable dynamic test generation. In Proceedings of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering, ASE '08, pages 443–446, Washington, DC, USA, 2008. IEEE Computer Society.

[8] Juan Caballero, Heng Yin, Zhenkai Liang, and Dawn Song. Polyglot: Automatic extraction of protocol message format using dynamic binary analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS '07, pages 317–329, New York, NY, USA, 2007. ACM.

[9] Cristian Cadar and Koushik Sen. Symbolic execution for software testing: Three decades later. Commun. ACM, 56(2):82–90, February 2013.

[10] Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. Unleashing Mayhem on binary code. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP '12, pages 380–394, Washington, DC, USA, 2012. IEEE Computer Society.

[11] Sang Kil Cha, Maverick Woo, and David Brumley. Program-adaptive mutational fuzzing. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, SP '15, pages 725–741, Washington, DC, USA, 2015. IEEE Computer Society.

[12] Manuel Costa, Jon Crowcroft, Miguel Castro, Antony Rowstron, Lidong Zhou, Lintao Zhang, and Paul Barham. Vigilante: End-to-end containment of internet worms. SIGOPS Oper. Syst. Rev., 39(5):133–147, October 2005.

[13] Jedidiah R. Crandall, Zhendong Su, S. Felix Wu, and Frederic T. Chong. On deriving unknown vulnerabilities from zero-day polymorphic and metamorphic worm exploits. In Proceedings of the 12th ACM Conference on Computer and Communications Security, CCS '05, pages 235–248, New York, NY, USA, 2005. ACM.

[14] Jared DeMott. Revolutionizing the Field of Grey-box Attack Surface Testing with Evolutionary Fuzzing. In BlackHat and DefCon, 2007.

[15] Manuel Egele, Christopher Kruegel, Engin Kirda, Heng Yin, and Dawn Xiaodong Song. Dynamic spyware analysis. In USENIX Annual Technical Conference, pages 233–246, 2007.

[16] Patrice Godefroid. Random testing for security: Blackbox vs. whitebox fuzzing. In Proceedings of the 2nd International Workshop on Random Testing: Co-located with the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007), RT '07, pages 1–1, New York, NY, USA, 2007. ACM.

[17] Patrice Godefroid, Michael Y. Levin, and David Molnar. Sage: Whitebox fuzzing for security testing. Queue, 10(1):20:20–20:27, January 2012.

[18] Ulf Kargén and Nahid Shahmehri. Turning programs against each other: High coverage fuzz-testing using binary-code mutation and dynamic slicing. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 782–792, New York, NY, USA, 2015. ACM.

[19] Stefano Bianchi Mazzone, Mattia Pagnozzi, Aristide Fattori, Alessandro Reina, Andrea Lanzi, and Danilo Bruschi. Improving Mac OS X security through gray box fuzzing technique. In Proceedings of the Seventh European Workshop on System Security, EuroSec '14, pages 2:1–2:6, New York, NY, USA, 2014. ACM.

[20] Richard McNally, Ken Yiu, Duncan Grove, and Damien Gerhardy. Fuzzing: the state of the art. 2012.

[21] Barton P. Miller, Louis Fredriksen, and Bryan So. An empirical study of the reliability of unix utilities. Commun. ACM, 33(12):32–44, December 1990.

[22] James Newsome and Dawn Song. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. 2005.

[23] Brian S. Pak. Hybrid fuzz testing: Discovering software bugs via fuzzing and symbolic execution, 2012.

[24] Peach. Black-box fuzzing tool. http://peachfuzzer.com/ (visited 2017-06-05).

[25] Van-Thuan Pham, Marcel Böhme, and Abhik Roychoudhury. Model-based whitebox fuzzing for program binaries. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, pages 543–553, New York, NY, USA, 2016. ACM.

[26] Lenin Ravindranath, Suman Nath, Jitendra Padhye, and Hari Balakrishnan. Automatic and scalable fault detection for mobile applications. In Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '14, pages 190–203, New York, NY, USA, 2014. ACM.

[27] Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, and Herbert Bos. VUzzer: Application-aware Evolutionary Fuzzing. In NDSS, February 2017.

[28] Alexandre Rebert, Sang Kil Cha, Thanassis Avgerinos, Jonathan Foote, David Warren, Gustavo Grieco, and David Brumley. Optimizing seed selection for fuzzing. In Proceedings of the 23rd USENIX Conference on Security Symposium, SEC'14, pages 861–875, Berkeley, CA, USA, 2014. USENIX Association.

[29] Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In Proceedings of the 2010 IEEE Symposium on Security and Privacy, SP '10, pages 317–331, Washington, DC, USA, 2010. IEEE Computer Society.

[30] Monirul Sharif, Andrea Lanzi, Jonathon Giffin, and Wenke Lee. Automatic reverse engineering of malware emulators. In Security and Privacy, 2009 30th IEEE Symposium on, pages 94–109. IEEE, 2009.

[31] Yan Shoshitaishvili, Ruoyu Wang, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. Firmalice - automatic detection of bypass vulnerabilities in binary firmware. In NDSS. The Internet Society, 2015.

[32] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In IEEE Symposium on Security and Privacy, 2016.

[33] Christopher Smith and Guillermo Francia, III. Security fuzzing toolset. In Proceedings of the 50th Annual Southeast Regional Conference, ACM-SE ’12, pages 329–330, New York, NY, USA, 2012. ACM.

[34] Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. Bitblaze: A new approach to computer security via binary analysis. In Proceedings of the 4th International Conference on Information Systems Security, ICISS ’08, pages 1–25, Berlin, Heidelberg, 2008. Springer-Verlag.

[35] Nick Stephens, John Grosen, Christopher Salls, Andrew Dutcher, Ruoyu Wang, Jacopo Corbetta, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. Driller: Augmenting fuzzing through selective symbolic execution. In Proceedings of the Network and Distributed System Security Symposium, 2016.

[36] G. Edward Suh, Jae W. Lee, David Zhang, and Srinivas Devadas. Secure program execution via dynamic information flow tracking. SIGARCH Comput. Archit. News, 32(5):85–96, October 2004.

[37] Sulley. Black-box fuzzing tool. https://github.com/OpenRCE/sulley (visited 2017-06-05).

[38] Ari Takanen, Jared D Demott, and Charles Miller. Fuzzing for software security testing and quality assurance. Artech House, 2008.

[39] Ken Thompson. Reflections on trusting trust. Commun. ACM, 27(8):761–763, August 1984.

[40] Tielei Wang, Tao Wei, Guofei Gu, and Wei Zou. Taintscope: A checksum-aware directed fuzzing tool for automatic software vulnerability detection. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, SP '10, pages 497–512, Washington, DC, USA, 2010. IEEE Computer Society.

[41] Gilbert Wondracek, Paolo Milani Comparetti, Christopher Kruegel, Engin Kirda, and Scuola Superiore S Anna. Automatic network protocol analysis.

[42] Maverick Woo, Sang Kil Cha, Samantha Gottlieb, and David Brumley. Scheduling black-box mutational fuzzing. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS ’13, pages 511–522, New York, NY, USA, 2013. ACM.

[43] Heng Yin, Dawn Song, Manuel Egele, Christopher Kruegel, and Engin Kirda. Panorama: capturing system-wide information flow for malware detection and analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 116–127. ACM, 2007.

[44] Michał Zalewski. American fuzzy lop. http://lcamtuf.coredump.cx/afl/ (visited 2017-06-06).

[45] Michał Zalewski. The bug-o-rama trophy case. http://lcamtuf.coredump.cx/afl/#bugs (visited 2017-04-20).

[46] Michał Zalewski. Pulling jpegs out of thin air. https://lcamtuf.blogspot.se/2014/11/pulling-jpegs-out-of-thin-air.html (visited 2017-06-05).

[47] M. Zhivich and R. K. Cunningham. The real cost of software errors. IEEE Security and Privacy, 7(2):87–90, March 2009.