<<

practice

doi:10.1145/2093548.2093564 and its users millions of dollars. If a Article development led by queue.acm.org monthly security update costs you $0.001 (one tenth of one cent) in just electricity or loss of productivity, then SAGE has had a remarkable this number multiplied by one bil- impact at . lion people is $1 million. Of course, if were spreading on your ma- by Patrice Godefroid, Michael Y. Levin, and David Molnar chine, possibly leaking some of your private data, then that might cost you much more than $0.001. This is why we strongly encourage you to apply those pesky security updates. SAGE: Many security vulnerabilities are a result of programming errors in code for files and packets that are transmitted over the Internet. For ex- Whitebox ample, includes parsers for hundreds of file formats. If you are reading this article on a computer, then the picture shown in Figure 1 is displayed on your screen for after a jpg parser (typically part of your ) has read the image data, decoded it, created new data structures with the decoded data, Security and passed those to the graphics card in your computer. If the code imple- menting that jpg parser contains a bug such as a that can Testing be triggered by a corrupted jpg image, then the execution of this jpg parser on your computer could potentially be hijacked to execute some other code, possibly malicious and hidden in the jpg data itself. This is just one example of a pos- Most Communications readers might think of sible security vulnerability and at- tack scenario. The security bugs dis- “program verification research” as mostly theoretical cussed throughout the rest of this with little impact on the world at large. Think again. article are mostly buffer overflows.

If you are reading these lines on a PC running some Hunting for “Million-Dollar” Bugs form of Windows (like over 93% of PC users—that is, Today, hackers find security vulnera- more than one billion people), then you have been bilities in software products using two primary methods. The first is code in- affected by this line of work—without knowing it, spection of binaries (with a good dis- which is precisely the way we want it to be. assembler, binary code is like source Every second Tuesday of every month, also known code). The second is blackbox fuzzing, as “ Tuesday,” Microsoft releases a list of a form of blackbox random testing, security bulletins and associated security patches to which randomly mutates well-formed program inputs and then tests the be deployed on hundreds of millions of machines program with those modified inputs,3 worldwide. Each security bulletin costs Microsoft hoping to trigger a bug such as a buf-

40 communications of the acm | march 2012 | vol. 55 | no. 3 Figure 1. A sample jpg image.

fer overflow. In some cases, grammars then branch of the conditional state- called whitebox fuzzing.5 It builds upon are used to generate the well-formed ment in recent advances in systematic dynamic inputs. This also allows encoding test generation4 and extends its scope application-specific knowledge and int foo(int x) { // x is an input from to whole-program test-generation heuristics. int y = x + 3; . Blackbox fuzzing is a simple yet if (y == 13) abort(); // error Starting with a well-formed input, effective technique for finding se- retur n 0; whitebox fuzzing consists of symboli- curity vulnerabilities in software. } cally executing the program under test Thousands of security bugs have dynamically, gathering constraints on been found this way. At Microsoft, has only 1 in 232 chances of being ex- inputs from conditional branches en- fuzzing is mandatory for every un- ercised if the input variable x has a countered along the execution. The trusted interface of every product, as randomly chosen 32-bit value. This in- collected constraints are then sys- prescribed in the Security Develop- tuitively explains why blackbox fuzzing tematically negated and solved with a ment Lifecycle,7 which documents usually provides low and constraint solver, whose solutions are recommendations on how to devel- can miss security bugs. mapped to new inputs that exercise op secure software. different program execution paths. Although blackbox fuzzing can be Introducing Whitebox Fuzzing This process is repeated using novel remarkably effective, its limitations A few years ago, we started develop- search techniques that attempt to sweep

ourtesy o f NASA g ra p h ourtesy Photo are well known. For example, the ing an alternative to blackbox fuzzing, through all (in practice, many) feasible

march 2012 | vol. 55 | no. 3 | communications of the acm 41 practice

execution paths of the program while straint leading to it, and attempted checking simultaneously many prop- to be solved by a constraint solver. erties using a runtime checker (such as This way, a single Purify, Valgrind, or AppVerifier). can generate thousands of new tests. For example, symbolic execution of (In contrast, a standard depth-first the previous program fragment with SAGE has had or breadth-first search would negate an initial value 0 for the input vari- a remarkable only the last or first constraint in each able x takes the else branch of the path constraint and generate at most conditional statement and generates impact at Microsoft. one new test per symbolic execution.) the path constraint x+3 ≠ 13. Once It combines and The program shown in Figure 2 this constraint is negated and solved, takes four bytes as input and con- it yields x = 10, providing a new input extends program tains an error when the value of that causes the program to follow the analysis, testing, the variable cnt is greater than or then branch of the conditional state- equal to four. Starting with some ment. This allows us to exercise and verification, initial input good, SAGE runs this test additional code for security bugs, program while performing a sym- even without specific knowledge of model checking, bolic execution dynamically. Since the input format. Furthermore, this and automated the program path taken during this approach automatically discovers and first run is formed by all the else tests “corner cases” where program- theorem-proving branches in the program, the path mers may fail to allocate memory or techniques constraint for this initial run is the manipulate buffers properly, leading conjunction of constraints i0 ≠ b,

to security vulnerabilities. that have been i1 ≠ a, i2 ≠ d and i3 ≠ !. Each of these In theory, systematic dynamic test developed over constraints is negated one by one, generation can lead to full program placed in a conjunction with the path coverage, that is, program verifica- many years. prefix of the path constraint lead- tion. In practice, however, the search ing to it, and then solved with a con- is typically incomplete both because straint solver. In this case, all four the number of execution paths in the constraints are solvable, leading to program under test is huge, and be- four new test inputs. Figure 2 also cause symbolic execution, constraint shows the set of all feasible program generation, and constraint solving can paths for the function top. The left- be imprecise due to complex program most path represents the initial run statements (pointer manipulations of the program and is labeled 0 for and floating-point operations, among Generation 0. Four Generation 1 in- others), calls to external operating- puts are obtained by systematically system and library functions, and large negating and solving each constraint numbers of constraints that cannot in the Generation 0 path constraint. all be solved perfectly in a reasonable By repeating this process, all paths amount of time. Therefore, we are are eventually enumerated for this forced to explore practical trade-offs. example. In practice, the search is typically incomplete. SAGE SAGE was the first tool to perform Whitebox fuzzing was first imple- dynamic symbolic execution at the mented in the tool SAGE, short for x86 binary level. It is implemented on Scalable Automated Guided Execu- top of the trace replay infrastructure tion.5 Because SAGE targets large ap- TruScan,8 which consumes trace files plications where a single execution generated by the iDNA framework1 may contain hundreds of millions and virtually re-executes the recorded of instructions, symbolic execution runs. TruScan offers several features is its slowest component. Therefore, that substantially simplify symbolic SAGE implements a novel directed- execution, including instruction de- search algorithm—dubbed genera- coding, providing an interface to pro- tional search—that maximizes the gram symbol information, monitor- number of new input tests generated ing various input/output system calls, from each symbolic execution. Given keeping track of heap and stack frame a path constraint, all the constraints allocations, and tracking the flow of in that path are systematically negat- data through the program structures. ed one by one, placed in a conjunc- Thanks to offline tracing, constraint tion with the prefix of the path con- generation in SAGE is completely de-

42 communications of the acm | march 2012 | vol. 55 | no. 3 practice

Figure 2. Example of program (left) and its search space (right) with the value of cnt at the end of each run.

void top(char input[4]) { int cnt=0; if (input[0] == ’b’) cnt++; if (input[1] == ’a’) cnt++; if (input[2] == ’d’) cnt++; if (input[3] == ’!’) cnt++; if (cnt >= 4) abort(); // error }

0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 good goo! godd god!gaod gao! gadd gad! bood boo! bodd bod! baod bao! badd bad!

terministic because it works with an SAGE Architecture constraint solver (we currently use the execution trace that captures the out- The high-level architecture of SAGE Z3 SMT solver2). All satisfiable con- come of all nondeterministic events is depicted in Figure 3. Given one (or straints are mapped to N new inputs, encountered during the recorded run. more) initial input Input0, SAGE that are tested and ranked according Working at the x86 binary level allows starts by running the program under to incremental instruction coverage. SAGE to be used on any program re- test with AppVerifier to see if this initial For example, if executing the program gardless of its source language or build input triggers a bug. If not, SAGE then with new Input1 discovers 100 new process. It also ensures that “what you collects the list of unique program in- instructions, then Input1 gets a score fuzz is what you ship,” as compilers can structions executed during this run. of 100, and so on. Next, the new input perform source-code changes that may Next, SAGE symbolically executes the with the highest score is selected to affect security. program with that input and generates go through the (expensive) symbolic SAGE uses several optimizations a path constraint, characterizing the execution task, and the cycle is repeat- that are crucial for dealing with huge current program execution with a con- ed, possibly forever. Note that all the execution traces. For example, a sin- junction of input constraints. SAGE tasks can be executed in parallel gle symbolic execution of Excel with Then, implementing a generation- on a multicore machine or even on a 45,000 input bytes executes nearly one al search, all the constraints in that set of machines. billion x86 instructions. To scale to path constraint are negated one by Building a system such as SAGE such execution traces, SAGE uses sev- one, placed in a conjunction with the poses many other challenges: how to eral techniques to improve the speed prefix of the path constraint leading recover from imprecision in symbolic and memory usage of constraint gen- to it, and attempted to be solved by a execution, how to check many proper- eration: symbolic-expression caching ensures that structurally equivalent Figure 3. Architecture of SAGE. symbolic terms are mapped to the same physical object; unrelated con- straint elimination reduces the size of Converge Input0 Constraints constraint solver queries by remov- Data ing the constraints that do not share symbolic variables with the negated constraint; local constraint caching skips a constraint if it has already Check for Generate Solve Code Coverage been added to the path constraint; Crashes Constraints Constraints (Nirvana) flip count limit establishes the maxi- (AppVerifier) (TruScan) (Z3) mum number of times a constraint generated from a particular program branch can be flipped; using a cheap Input1 Input2 syntactic check, constraint subsump- … tion eliminates constraints logically InputN implied by other constraints injected at the same program branch (mostly likely resulting from successive itera- tions of an input-dependent loop).

march 2012 | vol. 55 | no. 3 | communications of the acm 43 practice

ties together efficiently, how to lever- Windows 7. Because SAGE is typically The bottom line? In practice, use age grammars (when available) for run last, those bugs were missed by ev- both. We do at Microsoft. complex input formats, how to deal erything else, including static program with path explosion, how to reason pre- analysis and blackbox fuzzing. Acknowledgments cisely about pointers, how to deal with Finding all these bugs has saved Many people across different groups floating-point instructions and input- Microsoft millions of dollars as well as at Microsoft have contributed to dependent loops. Space constraints saved world time and energy, by avoid- SAGE’s success; more than space al- prevent us from discussing these chal- ing expensive security patches to more lows here. To review this detailed list, lenges here, but the authors’ Web pag- than one billion PCs. The software please visit queue.acm.org/detail. es provide access to other papers ad- running on your PC has been affected cfm?id=2094081. dressing these issues. by SAGE. Since 2008, SAGE has been run- An Example Related articles ning 24/7 on approximately 100-plus on queue.acm.org On April 3, 2007, Microsoft released machines/cores automatically fuzzing an out-of-band critical security patch hundreds of applications in Microsoft Black Box James A. Whittaker, Herbert H. Thompson (MS07-017) for code that parses ANI- security testing labs. This is more than http://queue.acm.org/detail.cfm?id=966807 format animated cursors. The vulner- 300 machine-years and the largest com- Security – Problem Solved? ability was originally reported to Micro- putational usage ever for any Satisfiabil- John Viega soft in December 2006 by Alex Sotirov ity Modulo Theories (SMT) solver, with http://queue.acm.org/detail.cfm?id=1071728 of Determina Security Research, then more than one billion constraints pro- Open vs. Closed: Which Source made public after exploit code ap- cessed to date. is More Secure? peared in the wild.9 It was only the SAGE is so effective at finding bugs Richard Ford third such out-of-band patch released that, for the first time, we faced “bug http://queue.acm.org/detail.cfm?id=1217267 by Microsoft since January 2006, indi- triage” issues with dynamic test gen- cating the seriousness of the bug. The eration. We believe this effectiveness References 1. bhansali, S., Chen, W., De Jong, S., Edwards, A. and Microsoft SDL Policy Weblog stated comes from being able to fuzz large Drinic, M. Framework for instruction-level tracing that extensive blackbox fuzzing of this applications (not just small units as and analysis of programs. In Proceedings of the 2nd International Conference on Virtual Execution code failed to uncover the bug and that previously done with dynamic test Environments (2006). existing static-analysis tools were not generation), which in turn allows us 2. de Moura, L. and Bjorner, N. Z3: An efficient SMT solver. In Proceedings of Tools and Algorithms for the capable of finding the bug without ex- to find bugs resulting from problems Construction and Analysis of Systems; Lecture Notes cessive false positives.6 across multiple components. SAGE is in Computer Science. Springer-Verlag, 2008, 337–340. 3. Forrester, J.E. and Miller, B.P. An empirical study of the SAGE, in contrast, synthesized also easy to deploy, thanks to x86 bi- robustness of Windows NT applications using random a new input file exhibiting the bug nary analysis, and it is fully automat- testing. In Proceedings of the 4th Usenix Windows System Symposium (Seattle, WA, Aug. 2000). within hours of starting from a well- ic. SAGE is now used daily in various 4. Godefroid, P., Klarlund, N. and Sen, K. DART: Directed Automated Random Testing. In Proceedings of formed ANI file, despite having no groups at Microsoft. Programming Language Design and Implementation knowledge of the ANI format. A seed (2005), 213–223. 5. Godefroid, P., Levin, M. Y. and Molnar, D. Automated file was picked arbitrarily from a li- Conclusion whitebox fuzz testing. In Proceedings of Network and brary of well-formed ANI files, and SAGE has had a remarkable impact at Distributed Systems Security, (2008), 151– 166. 6. howard, M. Lessons learned from the animated SAGE was run on a small test pro- Microsoft. It combines and extends cursor ; http://blogs.msdn.com/sdl/ gram that called user32.dll to parse , testing, verification, archive/2007/04/26/lessons-learned-fromthe- animated-cursor-security-bug.aspx. ANI files. The initial run generated a model checking, and automated theo- 7. howard, M. and Lipner, S. The Security Development path constraint with 341 branch con- rem-proving techniques that have been Lifecycle. Microsoft Press, 2006. 8. narayanasamy, S., Wang, Z., Tigani, J., Edwards, straints after executing 1,279,939 to- developed over many years. A. and Calder, B. Automatically classifying benign tal x86 instructions over 10,072 sym- Which is best in practice—blackbox and harmful data races using replay analysis. In Programming Languages Design and Implementation bolic input bytes. SAGE then created or whitebox fuzzing? Both offer differ- (2007). a crashing ANI file after 7 hours 36 ent cost/precision trade-offs. Blackbox 9. sotirov, A. Windows animated cursor stack overflow vulnerability, 2007; http://www.determina.com/ minutes and 7,706 test cases, using is simple, lightweight, easy, and fast but security.research/vulnerabilities/ani-header.html. one core of a 2GHz AMD Opteron 270 can yield limited code coverage. White- dual-core processor running 32-bit box is smarter but more complex. Patrice Godefroid ([email protected]) is a principal researcher at Microsoft Research. From 1994 to 2006, he Windows Vista with 4GB of RAM. Which approach is more effective at worked at Bell Laboratories. His research interests include finding bugs? It depends. If an applica- program specification, analysis, testing, and verification. Impact of SAGE tion has never been fuzzed, any form Michael Y. Levin ([email protected]) is a principal Since 2007, SAGE has discovered many development manager in the Windows Azure Engineering of fuzzing is likely to find bugs, and Infrastructure team where he leads a team developing security-related bugs in many large Mi- simple blackbox fuzzing is a good start. the Windows Azure Monitoring and Diagnostics Service. His additional interests include automated test generation, crosoft applications, including image Once the low-hanging bugs are gone, anomaly detection, data mining, and scalable debugging in processors, media players, file decod- however, fuzzing for the next bugs has distributed systems. ers, and document parsers. Notably, to become smarter. Then it is time to David Molinar ([email protected]) is a researcher SAGE found approximately one-third use whitebox fuzzing and/or user-pro- at Microsoft Research, where his interests focus on software security, electronic privacy, and cryptography. of all the bugs discovered by file fuzzing vided guidance, for example, using an during the development of Microsoft’s input grammar. © 2012 ACM 0001-0782/12/03 $10.00

44 communications of the acm | march 2012 | vol. 55 | no. 3