Whitebox Fuzzing for Security Testing

practice DOI:10.1145/2093548.2093564 and its users millions of dollars. If a Article development led by queue.acm.org monthly security update costs you $0.001 (one tenth of one cent) in just electricity or loss of productivity, then SAGE has had a remarkable this number multiplied by one bil- impact at Microsoft. lion people is $1 million. Of course, if malware were spreading on your ma- BY PATRICE GODEFROID, MICHAEL Y. LEVIN, AND DAVID MOLNAR chine, possibly leaking some of your private data, then that might cost you much more than $0.001. This is why we strongly encourage you to apply those pesky security updates. SAGE: Many security vulnerabilities are a result of programming errors in code for parsing files and packets that are transmitted over the Internet. For ex- Whitebox ample, Microsoft Windows includes parsers for hundreds of file formats. If you are reading this article on a computer, then the picture shown in Figure 1 is displayed on your screen Fuzzing for after a jpg parser (typically part of your operating system) has read the image data, decoded it, created new data structures with the decoded data, Security and passed those to the graphics card in your computer. If the code imple- menting that jpg parser contains a bug such as a buffer overflow that can Testing be triggered by a corrupted jpg image, then the execution of this jpg parser on your computer could potentially be hijacked to execute some other code, possibly malicious and hidden in the jpg data itself. This is just one example of a pos- MOST COMMUNicatiONS READERS might think of sible security vulnerability and at- tack scenario. The security bugs dis- “program verification research” as mostly theoretical cussed throughout the rest of this with little impact on the world at large. Think again. article are mostly buffer overflows. If you are reading these lines on a PC running some Hunting for “Million-Dollar” Bugs form of Windows (like over 93% of PC users—that is, Today, hackers find security vulnera- more than one billion people), then you have been bilities in software products using two primary methods. The first is code in- affected by this line of work—without knowing it, spection of binaries (with a good dis- which is precisely the way we want it to be. assembler, binary code is like source Every second Tuesday of every month, also known code). The second is blackbox fuzzing, as “Patch Tuesday,” Microsoft releases a list of a form of blackbox random testing, security bulletins and associated security patches to which randomly mutates well-formed program inputs and then tests the be deployed on hundreds of millions of machines program with those modified inputs,3 worldwide. Each security bulletin costs Microsoft hoping to trigger a bug such as a buf- 40 COMMUNICATIONS OF THE ACM | MARCH 2012 | VOL. 55 | NO. 3 Figure 1. A sample jpg image. fer overflow. In some cases, grammars then branch of the conditional state- called whitebox fuzzing.5 It builds upon are used to generate the well-formed ment in recent advances in systematic dynamic inputs. This also allows encoding test generation4 and extends its scope application-specific knowledge and int foo(int x) { // x is an input from unit testing to whole-program test-generation heuristics. int y = x + 3; security testing. Blackbox fuzzing is a simple yet if (y == 13) abort(); // error Starting with a well-formed input, effective technique for finding se- retur n 0; whitebox fuzzing consists of symboli- curity vulnerabilities in software. } cally executing the program under test Thousands of security bugs have dynamically, gathering constraints on been found this way. At Microsoft, has only 1 in 232 chances of being ex- inputs from conditional branches en- fuzzing is mandatory for every un- ercised if the input variable x has a countered along the execution. The trusted interface of every product, as randomly chosen 32-bit value. This in- collected constraints are then sys- prescribed in the Security Develop- tuitively explains why blackbox fuzzing tematically negated and solved with a NASA 7 F ment Lifecycle, which documents usually provides low code coverage and constraint solver, whose solutions are recommendations on how to devel- can miss security bugs. mapped to new inputs that exercise OURTESY O OURTESY op secure software. different program execution paths. C H P Although blackbox fuzzing can be Introducing Whitebox Fuzzing This process is repeated using novel RA G remarkably effective, its limitations A few years ago, we started develop- search techniques that attempt to sweep PHOTO are well known. For example, the ing an alternative to blackbox fuzzing, through all (in practice, many) feasible MARCH 2012 | VOL. 55 | NO. 3 | COMMUNICATIONS OF THE ACM 41 practice execution paths of the program while straint leading to it, and attempted checking simultaneously many prop- to be solved by a constraint solver. erties using a runtime checker (such as This way, a single symbolic execution Purify, Valgrind, or AppVerifier). can generate thousands of new tests. For example, symbolic execution of (In contrast, a standard depth-first the previous program fragment with SAGE has had or breadth-first search would negate an initial value 0 for the input vari- a remarkable only the last or first constraint in each able x takes the else branch of the path constraint and generate at most conditional statement and generates impact at Microsoft. one new test per symbolic execution.) the path constraint x+3 ≠ 13. Once It combines and The program shown in Figure 2 this constraint is negated and solved, takes four bytes as input and con- it yields x = 10, providing a new input extends program tains an error when the value of that causes the program to follow the analysis, testing, the variable cnt is greater than or then branch of the conditional state- equal to four. Starting with some ment. This allows us to exercise and verification, initial input good, SAGE runs this test additional code for security bugs, program while performing a sym- even without specific knowledge of model checking, bolic execution dynamically. Since the input format. Furthermore, this and automated the program path taken during this approach automatically discovers and first run is formed by all the else tests “corner cases” where program- theorem-proving branches in the program, the path mers may fail to allocate memory or techniques constraint for this initial run is the manipulate buffers properly, leading conjunction of constraints i0 ≠ b, to security vulnerabilities. that have been i1 ≠ a, i2 ≠ d and i3 ≠ !. Each of these In theory, systematic dynamic test developed over constraints is negated one by one, generation can lead to full program placed in a conjunction with the path coverage, that is, program verifica- many years. prefix of the path constraint lead- tion. In practice, however, the search ing to it, and then solved with a con- is typically incomplete both because straint solver. In this case, all four the number of execution paths in the constraints are solvable, leading to program under test is huge, and be- four new test inputs. Figure 2 also cause symbolic execution, constraint shows the set of all feasible program generation, and constraint solving can paths for the function top. The left- be imprecise due to complex program most path represents the initial run statements (pointer manipulations of the program and is labeled 0 for and floating-point operations, among Generation 0. Four Generation 1 in- others), calls to external operating- puts are obtained by systematically system and library functions, and large negating and solving each constraint numbers of constraints that cannot in the Generation 0 path constraint. all be solved perfectly in a reasonable By repeating this process, all paths amount of time. Therefore, we are are eventually enumerated for this forced to explore practical trade-offs. example. In practice, the search is typically incomplete. SAGE SAGE was the first tool to perform Whitebox fuzzing was first imple- dynamic symbolic execution at the mented in the tool SAGE, short for x86 binary level. It is implemented on Scalable Automated Guided Execu- top of the trace replay infrastructure tion.5 Because SAGE targets large ap- TruScan,8 which consumes trace files plications where a single execution generated by the iDNA framework1 may contain hundreds of millions and virtually re-executes the recorded of instructions, symbolic execution runs. TruScan offers several features is its slowest component. Therefore, that substantially simplify symbolic SAGE implements a novel directed- execution, including instruction de- search algorithm—dubbed genera- coding, providing an interface to pro- tional search—that maximizes the gram symbol information, monitor- number of new input tests generated ing various input/output system calls, from each symbolic execution. Given keeping track of heap and stack frame a path constraint, all the constraints allocations, and tracking the flow of in that path are systematically negat- data through the program structures. ed one by one, placed in a conjunc- Thanks to offline tracing, constraint tion with the prefix of the path con- generation in SAGE is completely de- 42 COMMUNICATIONS OF THE ACM | MARCH 2012 | VOL. 55 | NO. 3 practice Figure 2. Example of program (left) and its search space (right) with the value of cnt at the end of each run. void top(char input[4]) { int cnt=0; if (input[0] == ’b’) cnt++; if (input[1] == ’a’) cnt++; if (input[2] == ’d’) cnt++; if (input[3] == ’!’) cnt++; if (cnt >= 4) abort(); // error } 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 good goo! godd god!gaod gao! gadd gad! bood boo! bodd bod! baod bao! badd bad! terministic because it works with an SAGE Architecture constraint solver (we currently use the execution trace that captures the out- The high-level architecture of SAGE Z3 SMT solver2).

Load more