A Symbolic Execution Framework for JavaScript

Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, Dawn Song Computer Science Division, EECS Department University of California, Berkeley {prateeks, devdatta, sch, fmao, smcc, dawnsong}@cs.berkeley.edu

Abstract—As AJAX applications gain popularity, client-side XMLHttpRequest, and data from code running concur- JavaScript code is becoming increasingly complex. However, rently in other browser windows. Each kind of input string few automated vulnerability analysis tools for JavaScript exist. has its own format, so developers use a combination of cus- In this paper, we describe the first system for exploring the execution space of JavaScript code using symbolic execution. tom routines and third-party libraries to parse and validate To handle JavaScript code’s complex use of string operations, the inputs they receive. To effectively explore a program’s we design a new language of string constraints and implement execution space, a tool must be able to supply values for all a solver for it. We build an automatic end-to-end tool, Kudzu, of these different kinds of inputs and reason about how they and apply it to the problem of finding client-side code injection are parsed and validated. vulnerabilities. In experiments on 18 live web applications, Kudzu automatically discovers 2 previously unknown vulner- Approach. In this paper, we develop the first com- abilities and 9 more that were previously found only with a plete symbolic-execution based framework for client-side manually-constructed test suite. JavaScript code analysis. We build an automated, stand- Keywords-web security; symbolic execution; string decision alone tool that, given a URL for a web application, automat- procedures ically generates high-coverage test cases to systematically explore its execution space. Automatically reasoning about I. INTRODUCTION the operations we see in real JavaScript applications requires Rich web applications have a significant fraction of their a powerful constraint solver, especially for the theory of code written in client-side scripting languages, such as strings. However, the power needed to express the semantics JavaScript. As an increasing fraction of code is found on of JavaScript operations is beyond what existing string the client, client-side security vulnerabilities (such as client- constraint solvers [14], [18] offer. As a central contribution side code injection [20], [26]–[28]) are becoming a promi- of this work, we overcome this difficulty by proposing a nent threat. However, a majority of the research on web constraint language and building a practical solver (called vulnerabilities so far has focused on server-side application Kaluza) that supports the specification of boolean, machine code written in PHP and Java. There is a growing need for integer (bit-vector), and string constraints, including regular powerful analysis tools for the client-side components of expressions, over multiple variable-length string inputs. This web applications. This paper presents the first techniques language’s rich support for string operations is crucial for and system for automatically exploring the execution space reasoning about the parsing and validation checks that of client-side JavaScript code. To explore this execution JavaScript applications perform. space, our techniques generate new inputs to cover a pro- To show the practicality of our constraint language, we gram’s value space using dynamic symbolic execution of detail a translation from the most commonly used JavaScript JavaScript, and to cover its event space by automatic GUI string operations to our constraints. This translation also exploration. harnesses concrete information from a dynamic execution Dynamic symbolic execution for JavaScript has numerous of the program in a way that allows the analysis to scale. applications in web security. In this paper we focus on one We analyze the theoretical expressiveness of the theory of of these applications: automatically finding client-side code strings supported by our language (including in comparison injection vulnerabilities. A client-side code injection attack to existing constraint solvers), and bound its computational occurs when client-side code passes untrusted input to a complexity. We then give a sound and complete decision dynamic code evaluation construct, without proper validation procedure for the bounded-length version of the constraint or sanitization, allowing an attacker to inject JavaScript code language. We develop an end-to-end system, called Kudzu, that runs with the privileges of a web application. that performs symbolic execution with this constraint solver JavaScript execution space exploration is challenging for at its core. many reasons. In particular, JavaScript applications accept many kinds of input, and those inputs are structured just End-to-end system. We identify further challenges in build- as strings. For instance, a typical application might take ing an end-to-end automated tool for rich web applications. user input from form fields, messages from its server via For instance, because JavaScript code interacts closely with a user interface, its input space can be divided into two classes, and the state of the GUI elements together affects the the events space and the value space. The former includes behavior of the application code. the state (check boxes, list selections) and sequence of • Value space. The values of inputs supplied to a program actions of user-interface elements, while the latter includes also determine its behavior. JavaScript has numerous the contents of external inputs. These kinds of input jointly interfaces through which input is received: determine the code’s behavior, but they are suited to differ- – User data. Form fields, text areas, and so on. ent exploration techniques. Kudzu uses GUI exploration to – URL and cross-window communication abstrac- explore the event space, and symbolic execution to explore tions. Web principals hosted in other windows the value space. or frames can communicate with JavaScript code We evaluate Kudzu’s end-to-end effectiveness by applying via inter-frame communication abstractions such as it to a collection of 18 JavaScript applications. The results URL fragment identifiers and HTML 5’s proposed show that Kudzu is effective at getting good coverage postMessage, or via URL parameters. by discovering new execution paths, and it automatically – HTTP channels. Client-side JavaScript code can discovers 2 previously-unknown vulnerabilities, as well as 9 exchange data with its originating web server using client-side code injection vulnerabilities that were previously XMLHttpRequest, HTTP cookies, or additional found only with a manually-created test suite. HTTP GET or POST requests. Contributions. In summary, this paper makes the following This paper primarily focuses on techniques to systemat- main contributions: ically explore the value space using symbolic execution of • We identify the limitations of previous string constraint JavaScript, with the goal of generating inputs that exercise languages that make them insufficient for parsing-heavy new program paths. However, automatically exploring the JavaScript code, and design a new constraint language event space is also required to achieve good coverage. To to resolve those limitations. (Section IV) demonstrate the efficacy of our techniques in an end-to-end • We design and implement Kaluza, a practical decision system, we combine symbolic execution of JavaScript for the procedure for this constraint language. (Section V) value space with a GUI exploration technique for the event • We build the first symbolic execution engine for space. This full system is able to automatically explore the JavaScript, using our constraint solver. (Sections III and combined input space of client-side web application code. VI) Application: finding client-side code injection vulnerabil- • Combining symbolic execution of JavaScript with au- ities. Exploring a program’s execution space has a number of tomatic GUI exploration and other needed components, applications in the security of client-side web applications. we build the first end-to-end automated system for In this paper, we focus specifically on one security applica- exploration of client-side JavaScript. (Section III) tion, finding client-side code injection vulnerabilities. • We demonstrate the practical use of our implementation Client-side code injection attacks, which are sometimes by applying it to automatically discovering 11 client- referred to as DOM-based XSS, occur when client-side side code injection vulnerabilities, including two that code uses untrusted input data in dynamic code evaluation were previously unknown. (Section VII) constructs without sufficient validation. Like reflected or II. PROBLEM STATEMENT AND OVERVIEW stored XSS attacks, client-side code injection vulnerabilities can be used to inject script code chosen by an attacker, In this section we state the problem we focus on, exploring giving the attacker the full privileges of the web application. the execution space of JavaScript applications; describe We call the program input that supplies the data for an attack one of its applications, finding client-side code injection the untrusted source, and the potentially vulnerable code vulnerabilities; and give an overview of our approach. evaluation construct the critical sink. Examples of critical Problem statement. We develop techniques to systemati- sinks include eval, and HTML creation interfaces like cally explore the execution space of JavaScript application document.write and .innerHTML. code. In our threat model, we treat all URLs and cross-window JavaScript applications often take many kinds of input. communication abstractions as untrusted sources, as such in- We view the input space of a JavaScript program as split puts may be controlled by an untrusted web principal. In ad- into two categories: the event space and the value space. dition, we also treat user data as an untrusted source because • Event space. Rich web applications typically define tens we aim to find cases where user data may be interpreted as to hundreds of JavaScript event handlers, which may code. The severity of attacks from user-data on client-side is execute in any order as a result of user actions such often less severe than a remote XSS attack, but developers as clicking buttons or submitting forms. Event handler tend to fix these and Kudzu takes a conservative approach of code may check the state of GUI elements (such as reporting them. HTTP channels such as XMLHttpRequest check-boxes or selection lists). The ordering of events are currently restricted to communicating with a web server from the same domain as the client application, so we as symbolic variables. Dynamic symbolic execution differs do not treat them as untrusted sources. Developers may from normal execution in that while many variable have their wish to treat HTTP channels as untrusted in the future usual (concrete) values, like 5 for an integer variable, the when determining susceptibility to cross-channel scripting values of other variables which depend on symbolic inputs attacks [5], or when enhanced abstractions (such as the are represented by symbolic formulas over the symbolic proposed cross-origin XMLHttpRequest [30]) allow cross- inputs, like input1 + 5. Whenever any of the operands of a domain HTTP communication directly from JavaScript. JavaScript operation is symbolic, the operation is simulated To effectively find XSS vulnerabilities, we require two by creating a formula for the result of the operation in terms capabilities: (a) generating directed test cases that explore of the formulas for the operands. When a symbolic value the execution space of the program, and (b) checking, on propagates to the condition of a branch, Kudzu can use its a given execution path, whether the program validates all constraint solver to search for an input to the program that untrusted data sufficiently before using it in a critical sink. would cause the branch to make the opposite choice. Custom validation checks and parsing routines are the norm Event space exploration. As a component of Kudzu we rather than the exception in JavaScript applications, so our develop a GUI explorer that searches the space of all event tool must check the behavior of validation rather than simply sequences using a random exploration strategy. Kudzu’s GUI confirming that it is performed. explorer component randomly selects an ordering among the In previous work, we developed a tool called FLAX which user events registered by the web page, and automatically employs taint-guided for finding client-side code fires these events using an instrumented version of the web injection attacks [27]. However, FLAX relies on an external, browser. Kudzu also has an input-feedback component that manually developed test harness to explore the path space. can replay the sequence of GUI events explored in any Kudzu, in contrast, automatically generates a test suite that given run, along with feeding new values generated by the explores the execution space systematically. Kudzu also uses constraint solver to the application’s data inputs. symbolic reasoning (with its constraint solver) to check if the Testing for client-side code injection vulnerabilities. For validation logic employed by the application is sufficient to each input explored, Kudzu determines whether there is a block malicious inputs — this is a one-step mechanism for flow of data from an untrusted data source to a critical directed exploit generation as opposed to multiple rounds sink. If it finds one, it seeks to determine whether the of undirected fuzzing employed in FLAX. Static analysis program sanitizes and/or validates the input correctly to techniques have also been employed for JavaScript [12] prevent attackers from injecting dangerous elements into to reason about multiple paths, but can suffer from false the critical sink. Specifically, it attempts to prove that the positives and do not produce test inputs or attack instances. validation is insufficient by constructing an attack input. As Symbolic analyses and model-checking have been used for we will describe in more detail in Section III-B, it combines server-side code [2], [21]; however, the complexity of path the results of symbolic execution with a specification for conditions we observe requires more expressive symbolic attacks to create a constraint solver query. If the constraint reasoning than supported by tools for server-side code. solver finds a solution to the query, it represents an attack that can reach the critical sink and exploit a client-side code Approach Overview. The value space and event space of injection vulnerability. a web application are two different components of its input space: code reachable by exploring one part of the input III. END-TO-END SYSTEM DESIGN space may not be reachable by exploring the other com- ponent alone. For instance, exploring the GUI event space This section describes the various components that work results in discovering new views of the web application, but together to make a complete Kudzu-based vulnerability- this does not directly affect the coverage that can be achieved discovery system work. The full explanation of the constraint by systematically exploring all the paths in the code imple- solver is in Sections IV through VI. For reference, the menting each view. Conversely, maximizing path coverage is relationships between the components are summarized in unlikely to discover functionality of the application that only Figure 1. happens when the user explores a different application view. Therefore, Kudzu employs different techniques to explore A. System Components each part of the input space independently. First, we discuss the core components that would be used Value space exploration. To systematically explore differ- in any application of Kudzu: the GUI explorer that generates ent execution paths, we develop a component that performs input events to explore the event space, the dynamic symbolic dynamic symbolic execution of JavaScript code, and a new that performs symbolic execution of JavaScript, constraint solver that offers the desired expressiveness for the path constraint extractor that builds queries based on automatic symbolic reasoning. the results of symbolic execution, the constraint solver that In dynamic symbolic execution, certain inputs are treated finds satisfying assignments to those queries, and the input Figure 1: Architecture diagram for Kudzu. The components drawn in the dashed box perform functions specific to our application of finding client-side code injection. The remaining components are application-agnostic. Components shaded in light gray are the core contribution of this paper. feedback component that uses the results from the constraint interpreter. For recording an execution trace, Kudzu employs solver as new program inputs. an existing instrumentation component [27] implemented in The GUI explorer. The first step in automating JavaScript the web browser’s JavaScript interpreter. For each JavaScript application analysis is exploring the event space of user bytecode instruction executed, it records the semantics of interactions. Each event corresponds to a user interaction the operation, its operands and operand values in a sim- such as clicking a check-box or a button, setting focus plified intermediate language called JASIL [27]. The set on a field, adding data to data fields, clicking a link, of JavaScript operations captured includes all operations and so on. Kudzu currently explores the space of all se- on integers, booleans, strings, arrays, as well as control- quences of events using a random exploration strategy. One flow decisions, object types, and calls to browser-native of the challenges is to comprehensively detect all events methods. For the second step, dynamic symbolic execution, that could result in JavaScript code execution. To address we have developed from scratch a symbolic interpreter for this, Kudzu instruments the browser functions that process the recorded JASIL instructions. HTML elements on the current web page to record when an Symbolic inputs for Kudzu are configurable to match the event handler is created or destroyed. Kudzu’s GUI explorer needs of an application. For instance, in the application we component randomly selects an ordering among the user consider, detecting client-side code injection, all URL data, events registered by the web page and executes them1. The data received over cross-window communication abstrac- random seed can be controlled to replay the same ordering tions, and user data fields are marked symbolic. Symbolic of events. While invoking handlers, the GUI component also inputs may be strings, integers, or booleans. Symbolic execu- generates (benign) random test strings to fill text fields. tion proceeds on the JASIL instructions in the order they are (Later, symbolic execution will generate new input values recorded in the execution trace. At any point during dynamic for these fields to explore the input space further.) Links symbolic execution, a given operand is either symbolic or that navigate the page away from the application’s domain concrete. If the operand is symbolic, it is associated with a are cancelled, thereby constraining the testing to a single symbolic value; otherwise, its value is purely concrete and is application domain at a time. In the future, we plan to stored in the dynamic execution trace. When interpreting a investigate alternative strategies to prioritize the execution JASIL operation in the dynamic symbolic interpreter, the of events discovered as well. operation is symbolically executed if one or more of its input operands is symbolic. Otherwise the operation of the Dynamic symbolic interpreter. Kudzu performs dynamic symbolic interpreter on concrete values would be exactly the symbolic execution by first recording an execution of the same as the real JavaScript interpreter, so we simply reuse program with concrete inputs, and then symbolically in- the concrete results already stored in the execution trace. terpreting the recorded execution in a dynamic symbolic The symbolic value of an operand is a formula that represents its computation from the symbolic inputs. For 1Invoking an event handler may invalidate another handler (for instance, instance, for the JASIL assignment operation y := x, if x when the page navigates as a result). In that case, the invalidated handlers are ignored and if new handlers are created by the event that causes is symbolic (say, with the value input1 + 5), then symbolic invalidation, these events are explored subsequently. execution of the operation copies this value to y, giving y the same symbolic value. For an arithmetic operation, Input feedback. Solving the path constraint formula using say y := x1 + x2 where x1 is symbolic (say with value the solver creates a new input that explores a new program input2 + 3) and x2 is not (say with the concrete value 7), path. These newly generated inputs must be fed back to the symbolic value for y is the formula representing the the JavaScript program: for instance simulated user inputs sum (input2 + 10). Operations over strings and booleans are must go in their text fields, and GUI events should be treated in the same way, generating formulas that involve replayed in the same sequence as on the original run. The string operations like match and boolean operations like input feedback component is responsible for this task. As a and. At this point, string operations are treated simply particular HTML element (e.g a text field) in a document as uninterpreted functions. During the symbolic execution, is likely allocated a different memory address on every whenever the symbolic interpreter encounters an operation execution, the input feedback component uses XPath [32] outside the supported formula grammar, it forces the desti- and DOM identifiers to uniquely identify HTML elements nation operand to be concrete. For instance, if the operation across executions and feed appropriate values into them. If is x = parseFloat(s) for a symbolic string s, x and an input comes from an attribute for a DOM object, the input s can be replaced with their concrete values from the trace feedback component sets that attribute when the object is (say, 4.3 and “4.3”). This allows symbolic computation to created. If the input comes via a property of an event that continue for other values in the execution. is generated by the browser when handling cross-window communication, such as the origin and data properties Path constraint extractor. The execution trace records of a postMessage event, the component updates that each control-flow branch (e.g., if statement) encountered property when the JavaScript engine accesses it. Kudzu during execution, along with the concrete value (true or instruments the web browser to determine the context of false) representing whether the branch was taken. During accesses, to distinguish between accesses coming from the symbolic execution, the corresponding branch condition is JavaScript engine and accesses coming from the browser recorded by the path constraint extractor if it is symbolic. core or instrumentation code. As execution continues, the formula formed by conjoining the symbolic branch conditions (negating the conditions of B. Application-specific components branches that were not taken) is called the path constraint. If an input value satisfies the path constraint, then the program Next, we discuss three components that are specialized for execution on that input will follow the same execution path. the task of finding client-side code injection vulnerabilities: a To explore a different execution path, Kudzu selects a sink-source identification component that determines which branch on the execution path and builds a modified path critical sinks might receive untrusted input, a vulnerability constraint that is the same up to that branch, but that has condition extractor that captures domain knowledge about the negation of that branch condition (later conditions from client-side code injection attacks, and the attack verification the original branch are omitted). An input that satisfies component that checks whether inputs generated by the tool this condition will execute along the same path up to the in fact represent exploits. selected branch, and then explore the opposite alternative. Sink-source identification. To identify if external inputs There are several strategies for picking the order in which are used in critical sink operations such as eval or branch conditions can be negated — Kudzu currently uses document.write, we perform a dynamic data flow anal- a generational search strategy [11]. ysis on the execution trace. As outlined earlier, we treat all Constraint solver. Most symbolic execution tools in the past URL data, data received over cross-window communication have relied on an existing constraint solver. However, Kudzu abstractions (such as postMessage), and data filled into generates a rich set of constraints over string, integer and user data fields as potentially untrusted. The data flow boolean variables for which existing off-the-shelf solvers are analysis is similar to a dynamic taint analysis. Any execution not powerful enough. Therefore, we have built a new solver, trace that reveals a flow of data to a critical sink is subject Kaluza, for our constraints (we present the algorithm and to further symbolic analysis for exploit generation. We design details in Section V). In designing this component, use an existing framework, FLAX, for this instrumentation we examined the symbolic constraints Kudzu generates and taint-tracking [27] in a manner that is faithful to the in practice. From the string constraints arising in these, implementation of JavaScript in the WebKit interpreter. we distilled a set of primitive operations required in a Vulnerability condition extractor. An input represents an core constraint language. (This core language is detailed attack against a program if it passes the program’s validation in Section IV, while the solver’s full interface is given in checks, but nonetheless implements the attacker’s goals (i.e., Section VI.) We justify our intuition that solving the core causes a client-side code injection attack) when it reaches a constraints is sufficient to model JavaScript string operations critical sink. The vulnerability condition extractor collects in Section VI, where we show a practical translation of from the symbolic interpreter a formula representing the JavaScript string operations into our constraint language. (possibly transformed) value used at a critical sink, and combines it with the path constraint to create a formula Formula ::= ¬Formula describing the program’s validation of the input.2. To de- | Formula ∧ Formula termine whether this value constitutes an attack, the vulner- | Constraint ability condition extractor applies a sink-specific vulnera- Constraint ::= Var ∈ RegExp bility condition specification, which is a (regular) grammar | Var = Var encoding a set of strings that would constitute an attack | Var = Var ◦ Var against a particular sink. This specification is conjoined with | length(Var) Rel Number the formula representing the transformed input to create a | length(Var) Rel length(Var) formula representing values that are still dangerous after the RegExp ::= Character transformation. | ǫ For instance, in the case of the eval sink, the vulnera- | RegExp RegExp bility specification asserts that a valid attack should be zero | RegExp|RegExp * or more statements each terminated by a ‘;’, followed by | RexExp the payload. Such grammars can be constructed by using Rel ::= < | ≤ | = | ≥ | > publicly available attack patterns [13]. The tool’s attack Figure 2: Abstract grammar of the core constraint language. grammars are currently simple and can be extended easily for comprehensiveness and to incorporate new attacks. To search only for realistic attacks, the specification also ordering operations. Regular expressions are formed from incorporates domain knowledge about the possible values characters or the empty string (denoted by ǫ) via the usual of certain inputs. For instance, when a string variable cor- operations of concatenation (represented by juxtaposition), responds to the web URL for the application, we assert that alternation (|), and repetition zero or more times (Kleene the string starts with the same domain as the application. star *). Attack verification. Kudzu automatically tests the exploit The constraints all have their usual meanings. Any number instance by feeding the input back to the application, and of variables may be introduced, and Characters are drawn checking if the attack payload (such as a script with an alert from an arbitrary non-empty alphabet, but Numbers must be message) is executed. If this verification fails, Kudzu does non-negative integers. For present purposes, strings may be not report an alarm. of unbounded length, though we will introduce upper bounds on their lengths later. IV. CORE CONSTRAINT LANGUAGE In order to support a rich language of input constraints B. Expressiveness and Complexity with a simple solving back end, we have designed an Though the core constraint language is intentionally small, intermediate form we call the core constraint language. it is not minimal: some types of constraints are included for This language is rich enough to express constraints from the convenience of translating to and from the core language, JavaScript programs, but simple enough to make solving the but do not fundamentally increase its expressiveness. String constraints efficient. In this section we define the constraint equality, comparisons between lengths and constants, and language, analyze its expressiveness and the theoretical inequality comparisons between lengths can be expressed complexity of deciding it, and compare its expressiveness using concatenation, regular expressions, and equality be- to the core languages of previous solvers. tween string lengths respectively; the details are omitted for space. A. Language Definition Each of the remaining constraint types (regular expres- The abstract syntax for our core constraint language is sion membership, concatenation, and length) makes its own shown in Figure 2. A formula in the language is an ar- contribution to the expressiveness of the core constraints. bitrary boolean combination of constraints. Variables which Appendix A gives examples of the sets of strings that each represent strings may appear in five types of constraints. The constraint type can uniquely define. The core constraint lan- first three constraint types indicate that a string is a member guage is expressive enough that the complexity of deciding of the language defined by a regular expression, that two it is not known precisely; it is at least PSPACE-hard. These strings are equal, or one string is equal to the concatenation relationships are summarized in Figure 3. The complexity of two other strings. The two remaining constraints relate of our core constraint language falls to NP-complete when the length of one string to a constant natural number, or to the lengths of string variables are bounded, as they are in the length of another string, by any of the usual equality or our implementation. Further details are in the appendix.

2Sanitization for critical client-side sink operations may happen on the C. Expressiveness Comparison server side (when data is sent back via XMLHttpRequest). Our imple- mentation handles this by recognizing such transformations using approxi- Our system’s core constraint language is more expressive mate tainting techniques [27] for data transmitted over XMLHttpRequest than the constraints used in similar previous systems, and strings, but these string expressions cannot be compared with each other, only with regular expressions, so HAMPI lacks the full generality of word equations. For instance, HAMPI constraints cannot define the set {uv#u#v : u,v ∈ {0, 1}∗}. It is worth reemphasizing that these limitations are not just theoretical: they make these previous systems unsuitable for our applications. One of the most common operations in the programs we examine is to parse a single input string (such as a URL) into separate input variables using split or repeated regular expression matching. Representing the semantics of such an operation requires relating the contents of one string variable to those of another, something that neither DPRLE nor HAMPI supports.

V. CORE CONSTRAINT SOLVING APPROACH Figure 3: Relations between the unbounded versions of several theories of strings. Theories higher in the graph are We have implemented a decision procedure called Kaluza strictly more expressive but are also at least as complex for the core set of constraints, which is available online [16]. to decide. Kudzu’s core constraint language (shaded) is In this section, we explain our algorithm for solving the core strictly more expressive than either the core language of set of constraints. We introduce a bounded version of the HAMPI [18] or the theory of word equations and an equal constraints where we assume a user-supplied upper bound k length predicate (the “pure library language” of [4]). on the length of the variables. This allows us to employ a SAT-based solution strategy without reducing the practical expressiveness of the language. this expressiveness is key in allowing it to handle a more The algorithm satisfies three important properties, whose complex class of applications. informal proof appears in the appendix. Bjørner et al. [4] present a “pure library language” that, 1) Soundness. Whenever the algorithm terminates with an like our core constraint language, includes word equations assignment of values to string variables, the solution and the ability to assert that two strings have the same length, is consistent with the input constraints. so like our language its decidability is open. However, 2) Bound-k completeness. If there exists a solution for their language does not include regular expressions. Regular the string variables where all strings have length k or expressions may be less common in the .NET applications less, then our algorithm finds one such solution. Bjørner et al. study, but they are used ubiquitously in 3) Termination. The algorithm requires only a finite num- JavaScript, so regular expression support is mandatory in our ber of steps (a function of the bound) for any input. domain. Similarly the work of Caballero et al. [3], [7] deals The solver translates constraints on the contents of strings with programs compiled from C-family languages, whose into bit-vector constraints that can be solved with a SAT- string operations are much more limited. based SMT solver. For this purpose, the solver translates The DPRLE tool [14] focuses on a class of constraints that each input string into a sequence of n-bit integers (n = 8 in combine concatenation and regular expression matching, but the current implementation). Each string variable S also has in a more limited way than our tool supports. DPRLE ad- an associated integer variable LS representing its length. A dresses a different problem domain, since it gives solutions single string is converted to a bit-vector by concatenating for constraints over languages (potentially infinite sets of the binary representations of each character. Then, the bit- strings) rather than single strings, but this makes the task vectors representing each string are themselves concatenated significantly more difficult. We were unable to express the into a single long bit-vector. (The order in which the constraints from our application in DPRLE’s input format or strings are concatenated into the long vector reflects the any straightforward extension of it. For instance, there is no concatenation constraints, as detailed in step 1 below.) The way to express the constraint that two language expressions solver passes the constraints over this bit vector to a SMT should be equal, not surprising since such constraints in (satisfiability modulo theories) decision procedure for the general are undecidable [8]. theory of bit vectors, STP [10] in our implementation. HAMPI [18] provides support for regular expression Informally, it is convenient to refer to the combined bit constraints (and in fact we build on its implementation for vector as if it were an array indexed by character offsets, but this feature), but its support for other constraints is limited, we do not use STP’s theory of arrays, and character offsets particularly by the fact that it supports only a single string are multiplied by n to give bit offsets before producing the variable. The variable can be concatenated with constant final constraints. S1

Input: C : constraint list L (1,3) R Output: (IsSat : bool, Solutions : string list)

G ← BuildConcatGraph(C); S2 S3 S6 (C′, StrOrderMap) ← DecideOrder(G); ′ (1,1) L (2,3) R L (3,4) R C ← C ∪ C ; S1 = S2 . S3 S3 = S4 . S5 FailLenDB : length assignment list ← ∅; S6 = S5 . S7 S4 S5 S7 while true do INPUT CONCAT CONSTRAINTS (X, lengths) ← SolveLengths(C, FailLenDB); (2,2) (3,3) (4,4) if (X = UNSAT) then print “Unsatisfiable”; Figure 5: A sample concat graph for a set of concatenation halt(false, ∅); end constraints. The relative ordering of the strings in the final Final : bitvector constraints; character array is shown as start and end positions in Final ← CreateBVConstraints(StrOrderMap, C, lengths); parentheses alongside each node. (Result, BVSolutions) ← BVSolver(Final); if (Result = SAT) then print “Satisfiable”; printSolutions(BVSolutions, lengths, StrOrderMap); halt(true, BVsToStrings(BVSolutions)); a graph of concatenation constraints (a concat graph for end short). The graph has a node for each string variable, and for else FailLenDB ← FailLenDB ∪ lengths; each constraint S1 = S2 ◦S3, S2 and S3 are the left and right end children (respectively) of S1. An example of such a graph is end shown in Figure 5. Without loss of generality, we can assume that the graph is acyclic: if there is a cycle from S1 to S2 Figure 4: Algorithm for solving the core constraints. to S3 . . . back to S1, then S1= S2 ◦ S3 ◦···◦ S1 (or some other order), so all the variables other than S1 must be the empty string, and can be removed from the constraints. (In Our algorithm is shown in Figure 4. At a high level, our applications the constraints will in any case be acyclic by it has three steps. First, it translates string concatenation construction.) Given this graph, the algorithm then chooses constraints into a layout of variables (with overlap) in the the relative ordering of the strings in the character array by final character array mentioned above. Second, it extracts assigning start and end positions to each node with a post- integer constraints on the lengths of strings and finds a order traversal of the graph. (In Figure 5, these positions are satisfying length assignment using the SMT solver. Finally, shown in parentheses next to each node.) given a position and length for each string, the solver For the layout generated by the algorithm to be correct, the translates the constraints on the contents of each string into concat graph must be a DAG in which each internal node bit-vector constraints and checks if they are satisfiable. has exactly two children, and those children are adjacent In general, because of the interaction of length constraints in the layout. (This implies that the graph is planar.) The and regular expressions, the length assignment chosen in step graph may not have these properties at construction; for 2 might not correspond to satisfiable contents constraints, instance, Figure 6 gives a set of example constraints with even when a different length assignment would. So if step 3 contradictory ordering requirements: S2 cannot be simulta- fails to find a satisfying assignment, the algorithm returns to neously to the left and to the right of S3. The algorithm step 2 to generate a new length assignment (distinct from any resolves such requirements by duplicating a subtree of the tried previously). Steps 2 and 3 repeat until the solver finds graph (for instance as shown in the right half of Figure 6). a satisfying assignment, or it has tried all possible length To maintain the correct semantics, the algorithm adds string assignments (up to the length bound k). equality constraints to ensure that any duplicated strings Step 1: Translating concatenation constraints.. The intu- have the same contents as the originals. The algorithm ition behind Kudzu’s handling of concatenation constraints performs duplications to ensure that the graph satisfies the correctness invariant, but our current algorithm does not is that for a constraint S1 = S2 ◦ S3, it would be sufficient attempt to perform the minimal number of copies (for to ensure that S2 comes immediately before S3 in the final instance, in Figure 6 it would suffice to copy either only character array, and to lay out S1 as overlapping with S2 S2 or only S3), which in our experience has not hurt the and S3 (so that S1 begins at the same character as S2 and solver’s performance. ends at the same character as S3). This overlapping layout also has the advantage of reducing the total length of bit- Step 2: Finding a satisfiable length assignment. Each vectors required. Each concatenation constraint suggests an string variable S has an associated length variable LS. Each ordering relation among the string variables, but it might core string constraint implies a corresponding constraint not be possible to satisfy all such ordering constraints on the lengths of the strings, as detailed in Table I. For simultaneously. the regular expression containment constraint (S1 ∈ R), To systematically choose an ordering, the solver builds the set of possible lengths is an ultimately periodic set: S1 S4 S1 (1,2) S4 (3,4) L R L R the HAMPI tool [18]. At a high-level, the translation first L R L R COPY CREATION unrolls uses of the Kleene star ∗ in a regular expression

S2 S3 S2 S3 S3_COPY S2_COPY into a finite number of repetitions (never more than the (1,1) (2,2) (3,3) (4,4) string length). Next, where the regular expression has con- S1 = S2 . S3 S2 = S2_COPY S4 = S3 . S2 S3 = S3_COPY catenation, HAMPI determines all possible combinations of

INPUT CONCAT CONSTRAINTS NEW CONSTRAINTS DUE TO COPY CREATION lengths that sum to the total length, and instantiates each as a conjunction of constraints. Along with the alternations that Figure 6: A set of concat constraints with contradictory appeared in the original regular expression, each of these ordering requirements. Nodes are duplicated to resolve the conjunctions also represents an alternative way in which the contradiction. regular expression could match the string. To complete the translation, the choice between all of these alternatives is represented with a disjunction. (See [18] for a more detailed whether a length is possible depends only on its remainder explanation and some optimizations.) when divided by a fixed period, except for a finite set of HAMPI supports only a single, fixed-length input, so we exceptions. (Yu et al. use the equivalent concept of a semi- invoke it repeatedly to translate each constraint between a linear set in a conservative automaton-based approach [33].) regular expression and a string into an STP formula. We then The details of computing this set are covered in the lit- combine each of these translations with our translations of erature [23]; we note that such sets can be conveniently other string contents constraints (e.g., string equality), and represented with our SMT solver since it supports a modulus conjoin all of these constraints so that they apply to the operation. At each iteration of step 2, the solver conjoins same single long character array. It is this single combined the length constraints corresponding to all of the original formula that we pass to the SMT solver (STP) to find a string constraints, along with a constraint to rule out each satisfying assignment. length assignment that had previously been tried, and passes this formula to the SMT solver. If it returns a satisfying VI. REDUCING JAVASCRIPT TO STRING CONSTRAINTS assignment, it represents a new length assignment to try; if the constraint is unsatisfiable, then so were the original In this section we describe our tool’s translation from string constraints. JavaScript to the language of our constraint solver, focusing on the treatment of string operations. We start by giving the Core Constraint Implication on lengths full constraint language the solver supports, then describe S1 = S2 ◦ S3 LS1 = LS2 + LS3 our general approach to modeling string operations, our use S1 ∈ R LS1 ∈ LengthSet(R) of concrete values from the dynamic trace, and the process of S1 = S2 LS1 = LS2 length(S1) ⋄ i LS1 ⋄ i translating real regular expressions into textbook-style ones. length(S1) ⋄ length(S2) LS ⋄ LS 1 2 Full constraint language. The core constraint language Table I: Length constraints implied by core string con- presented in Section IV captures the essence of our solving straints, where LS is the length of a string S, and ⋄ ranges approach, but it excludes several features for simplicity, most over the operators {<, ≤, =, ≥, >}. notably integer constraints. The full constraint language sup- ported by our solver supports values of string, integer, and It is not necessary for correctness that the length abstrac- boolean types, and its grammar is given in Figure 8, along tion performed by the solver be precise, but determining with its type system in Figure 7. The additional constraints precise length bounds improves performance by avoiding are solved at step 2 of the string solution procedure, together wasted iterations. In the complete system, the integer con- with the integer constraints on the lengths of strings. To straints over lengths are solved together with integer con- match common JavaScript implementations (which reserve straints arising directly from the original program, discussed a bit as a type tag), we model integers as 31-bit signed bit- in Section VI. In our experience it is important for good vectors in our SMT solver, which supports all the integer performance to solve these two sets of integer constraints operations that JavaScript does. The solver replaces each together. The two sets of constraints may be interrelated, toString constraint with the appropriate string once a and solving them together prevents the solver from wasting value for its argument is selected: for instance, if i is given time exploring solutions that satisfy one set but not the other. the value 12, toString(i) is replaced with “12”. Step 3: Encoding as bitvectors. Given the array layout and JavaScript string operations. JavaScript has a large library lengths computed in steps 1 and 2, the remaining constraints of string operations, and we do not aim to support every over the contents of strings can be expressed as constraints operation, or the full generality of their behavior. Beyond the over fixed-size bit-vectors. String equality translates directly engineering challenge of building such a complete transla- into bit-vector equality. For the encoding of regular expres- tion, having very complex symbolic translations for common sion constraints, we reuse part of the implementation of operators would likely cause the system to bog down, and τ ::= string | int | bool ConstRegex ::= Regex | CapturedBrack(R, i) | six times. This sacrifices some generality (for instance, if a BetweenCapBrack(R,i,j) certain attack is only possible when the string appears seven times). However, we believe this is a beneficial trade-off Figure 7: Type system for the full constraint language since it allows our tool to analyze and find bugs in many uses S1 : string = (S2 : string) ◦ (S3 : string) ◦ ··· I1 : int = length(S : string) of replace. For comparison, most previous string con- S1 : string ∈ R : ConstRegex straint solvers do not support replace at all, and adding S1 : string ∈/ R : ConstRegex a replace that applied to any number of occurrences of a I1 : int = (I2 : int) {+, −, ·, /} (I3 : int) B1 : bool = (A1 : τ) {=, 6=} (A2 : τ) string (even limited to single-character strings) would make B1 : bool = (I1 : int) {<, ≤, ≥,>} (I2 : int) our core constraint language undecidable in the unbounded B1 : bool = ¬(B2 : bool) case [6]. B1 : bool = (B2 : bool) {∧, ∨} (B3 : bool) S1 : string = toString(I1 : int) Regular expressions in practice. The “regular expressions” Figure 8: Grammar and types for the full constraint supported by languages like JavaScript have many more language including operations on strings, integers, and features than the typical definition given in a computability booleans. textbook (or Figure 2). Figure 9 shows an example (adapted function validate (input) { from a real web site) of one of many regular expressions //input = ’{"action":"","val":""}’; Kudzu must deal with. Kudzu handles a majority of the mustMatch = ’{]:],]:]}’; syntax for regular expressions in JavaScript, which includes re1 =/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g; support for (possibly negated) character classes, escaped re2 =/"[ˆ"\\\n\r]*"|true|false|null| -?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g; sequences, repetition operators ({n}/?/*/+/) and sub- re3 = /(?:ˆ|:|,)(?:\s*\[)+/g; match extraction using capturing parentheses. Kudzu keeps rep1 = str.replace(re1, "@"); rep2 = rep1.replace(re2, "]"); track of the nesting of capturing parentheses, so that it rep3 = rep2.replace(re3, ""); can express the relation between the input string and the if(rep3==mustMatch) { eval(input); return true; } parts of it that match the captured groups (as shown in return false; } Table II). Kudzu does not currently support back-references Figure 9: Example of a regular-expression-based validation (they are strictly more expressive than true regular expres- check, adapted from a real-world JavaScript application. This sions), though if we see a need in the future, many uses illustrates the complexity of real regular expression syntax. of back-references could be translated using (non-regular) concatenation constraints.

VII. EXPERIMENTAL EVALUATION the generality would usually be wasted. Instead, our choice We have built a full-implementation of Kudzu using the has been to model the string operations that occur commonly WebKit browser, with 650, 7430 and 2200 lines of code in in web applications, and the core aspects of their behavior. the path constraint extraction component, constraint solver, For other operations and behavior aspects our tool uses and GUI explorer component, respectively. The system is values from the original execution trace (described further written in a mixture of C++, Ruby, and OCaml languages. below), so that they are accurate with respect to the original We evaluate Kudzu with three objectives. One objective execution even if the tool cannot reason symbolically about is to determine whether Kudzu is practically effective in how they might change on modified executions. The detailed exploring the execution space of real-world applications and translation from several common operators (a subset of those uncovering new code. The second objective is to determine supported by our implementation) to our constraint language the effectiveness of Kudzu as a stand-alone vulnerability dis- is shown in Table II. covery tool — whether Kudzu can automatically find client- Using dynamic information. One of the benefits of dy- side code injection vulnerabilities and prune away false namic symbolic execution is that it provides the flexibil- reports. Finally, we measure the efficiency of the constraint ity to choose between symbolic values (which introduce solver. Our evaluation results are promising, showing that generality) and concrete values (which are less general, Kudzu is a powerful system that finds previously unknown but guaranteed to be precise) to control the scope of the vulnerabilities in real-world applications fully automatically. search process. Our tool’s handling of string constraints takes advantage of concrete values from the dynamic traces A. Experiment Setup in several ways. An example is string replace, which We select 18 subject applications consisting of popular is often used in sanitization to transform unsafe characters iGoogle gadgets and AJAX applications, as these were into safe ones. Our translation uses a concrete value for studied by our previous tool FLAX [27]. FLAX assumes the number of occurrences of the searched-for pattern: if availability of an external (manually developed) test suite to a pattern was replaced six times in the original run, the seed its testing; in contrast, Kudzu automatically generates tool will search for other inputs in which the pattern occurs a much more comprehensive test suite and finds the points JavaScript operation Reduction to our constraint language S1 : string = charAt(S : string,I : int) (((LS = length(S)) ∧ (I ≥ 0) ∧ (I 0) ∧ (S ∈ r) ∧ ((S1 ∈ CapturedBrack(r, 1)) ∨ (S1 = “”)) k (non-greedy) ∧(S = T0 ◦ S1 ◦ T1 ◦···◦ Sk ◦ Tk) ∧ (Vi=0 Ti ∈ BetweenCapBrack(r, i, i + 1))) ∧···∧ ((Sk ∈ CapturedBrack(r, k)) ∨ (Sk = “”))) ∨((k ≤ 0) ∧ (S∈ / Regex(.*r.*)))) [S1, S2,...,Sn] : string list = match(S : string,r : ConstRegex,n : int) (((S = T1 ◦ M1 ◦ T2 ◦ M2 ◦···◦ Tn ◦ Mn ◦ Tn+1) ∧ (T1,T2,...,Tn+1 ∈/ Regex(.*r.*)) (greedy match) ∧(M1,M2,...,Mn ∈ Regex(r))) n is the concrete number of occurrences of strings matching r. S1 : string = replace(S : string,r : ConstRegex, s : string,n : int) ((S = T1 ◦ M1 ◦ T2 ◦ M2 ◦···◦ Tn ◦ Mn ◦ Tn+1) ∧ (T1,T2,...,Tn+1 ∈/ Regex(.*r.*)) ∧(M1,M2,...,Mn ∈ Regex(r)))) ∧ (S1 = T1 ◦ s ◦ T2 ◦ s ◦···◦ Tn+1) n is the concrete number of occurrences of strings matching r in S. [S1, S2,...,Sk] : string list = split(S : string, s : string,n : int) (((S = S1 ◦ s ◦ S2 ◦ s ◦···◦ Sn ◦ s ◦ Sn+1) ∧ (S1, S2,...,Sn+1 ∈/ Regex(.*s.*)))) n is the concrete number of occurrences of strings matching r. I1 : int = search(S : string,r : ConstRegex) (((I1 < 0) ∧ (S∈ / .*r.*)) ∨((I1 ≥ 0) ∧ (S = T1 ◦ T2 ◦ T3) ∧ (I1 = length(T1)) ∧ (T2 ∈ Regex(r)) ∧ (T1,T3 ∈/ (.*r.*)))) S1 : string = substring(S : string, start : int, end : int) ((start ≥ 0) ∧ (end < length(S)) ∧ (end ≥ start) ∧ (S = T1 ◦ S1 ◦ T2) ∧ (start = length(T1)) ∧(I1 = end − start) ∧ (I1 = length(S1)) ∨((start ≥ 0) ∧ (end ≥ length(S)) ∧ (end ≥ start) ∧ (S = T1 ◦ S1 ◦ T2) ∧ (start = length(T1)) ∧(LS = length(S)) ∧ (I1 = LS − start) ∧ (I1 = length(S1)) ∨((start < 0) ∧ (end < length(S)) ∧ (end ≥ start) ∧ (S = T1 ◦ S1 ◦ T2) ∧ (0 = length(T1)) ∧(I1 = end − start) ∧ (I1 = length(S1)) ∨((start < 0) ∧ (end ≥ length(S)) ∧ (end ≥ start) ∧ (S = T1 ◦ S1 ◦ T2) ∧ (0 = length(T1)) ∧(LS = length(S))(LS = length(S1)) B1 : bool = match(S, r) ((B1) ∧ (S ∈ Regex(r)) ∨ (¬(B1) ∧ (S∈ / Regex(r)))) I : int = parseInt(S) (S = toString(I)) ∧ ((S = “NaN”) ∨ (S = Regex([0-9]+))) Table II: Our reduction from common JavaScript operations to our full constraint language. Capitalized variables may be concrete or symbolic, while lowercase variables (including regular expressions) must take a concrete value. Our implementation supports a larger set of operations, whose translations are similar. Match / Test Substr / Application # of new Initial / Final Bug / Split Substring / inputs Code Coverage found X 1% Concat CharAt / Academia 20 30.27 / 76.47% X Replace / 8% CharCodeAt AJAXIm 15 49.58 / 77.67% EncodeURI / 5% FaceBook Chat 54 26.85 / 76.84% - X DecodeURI ParseUri 13 53.90 / 86.10% X 8% Plaxo 31 5.72 / 76.43% AskAWord 10 29.30 / 67.95 % X Birthday Reminder 27 59.47 / 73.94% - Block Notes 457 65.06 / 71.50 % X IndexOf / Calorie Watcher 16 64.54 / 73.53% - LastIndexOf Expenses Manager 133 61.09 / 76.56% - / Strlen Listy 19 65.31 / 79.73% X 78% Notes LP 25 46.62 / 76.67% - Progress Bar 12 63.60 / 75.09% - Simple Calculator 1 46.96 / 80.52% X Todo List 15 72.51 / 86.41% X Figure 10: Distribution of string operations in our subject TVGuide 6 30.39 / 75.13% X Word Monkey 20 14.84 / 75.36% X applications. Zip Code Gas 11 59.05 / 74.28% - Average 49 46.95 / 76.68% 11 of vulnerability without requiring any external test harness Table III: The top 5 applications are AJAX applications, a priori. Further, in our experiments Kudzu discovers 2 new while the rest are Google/IG gadget applications. Column vulnerabilities within a few hours of testing which were 2 reports the number of distinct new inputs generated, and missed by the FLAX because of its lack of coverage. In column 3 reports the increase in code coverage from the addition, as we show later in this section, many of the initial run to and the final run. generated constraints are highly complex and not suitable for manual inspection or fuzzing, whereas Kudzu either asserts the safety of the validation checks or finds exploits largest fraction are operations like indexOf that take string for vulnerabilities in one iteration as opposed to many rounds inputs and return an integer, which motivate the need for of random testing. a solver that reasons about integers and strings simulta- To test each subject application, we seed the system with neously. A significant fraction of the operations, including the URL of the application. For the gadgets, the URLs are subtring, split and replace, implicitly give rise the same as those used by iGoogle page to embed the gadget. to new strings from the original one, thereby giving rise We configure Kudzu to give a pre-prepared username and to constraints involving multiple string variables. Of the login password for applications that required authentication. match, split and replace operations, 31% are regular We report the results for running each application under expression based. Over 33% of the regular expressions have Kudzu, capping the testing time to a maximum of 6 hours one or more capturing parentheses. Capturing parentheses in for each application. All tests ran on a Ubuntu 9.10 Linux regular expression based match operations lead to constraints workstation with 2.2 GHz Intel dual-core processors and 2 involving multiple string variables, similar to operations such GB of RAM. as split. These characteristics show that a significant fraction of the B. Results string constraints arising in our target applications require a Table III presents the final results of testing the subject solver that can reason about multiple string variables. We applications. The summary of our evaluation highlights empirically see examples of complex regular expressions three features of Kudzu: (a) it automatically discovers new as well as concatenation operations, which stresses the program paths in real applications, significantly enhancing need for our solver that handles both word equations and code coverage; (b) it finds 2 client-side code injection in regular expression constraints. Prior to this work, off-the- the wild and several in applications that were known to shelf solvers did not support word equations and regular contain vulnerabilities; and (c) Kudzu significantly prunes expressions simultaneously. away false positives, successfully discarding cases that do Vulnerability Discovery. Kudzu is able to find client-side employ sufficient validation checks. code injection vulnerabilities in 11 of the applications tested. Characteristics of string operations in our applications. 2 of these were not known prior to these experiments Constraints arising from our applications have an average and were missed by FLAX. One of them is on a social- of 63 JavaScript string operations, while the remaining networking application (http://plaxo.com) that was are boolean, logical and arithmetic constraints. Figure 10 missed by our FLAX tool because the vulnerability exists groups the observed string operations by similarity. The on a page linked several clicks away from the initial post- 4 4 x 10 x 10 6 6

5 5 Compiled Executed Compiled Executed

4 4

3 3

2 2

Number of instructions 1 Number of instructions 1

0 0

listy listy plaxo ajaxim ajaxim plaxo birthday todo−listtv−guide parseUri birthday todo−listtv−guide parseUri notes−lab calculator academia notes−lab calculator academia ask−a−wordblock−notes block−notes progress−bar zipcode−gas ask−a−word progress−bar zipcode−gas word−monkey facebook−chat word−monkey facebook−chat calorie−watcher calorie−watcher expense−manage expense−manage Figure 11: Kudzu code coverage improvements over the Figure 12: Benefits from symbolic execution alone (dark testing period. For each experiment, the right bar shows the bars) vs. complete Kudzu (light bars). For each experiment, increase in the executed code from the initial run to total the right bar shows the increase in the total executed code code executed. The left bar shows the increase in the code when the event-space exploration is also turned on. The left compiled from initial run to the total code compiled in the bar shows the observed increase in the code compiled when entire test period. the event-space exploration is turned on.

evaluation of FLAX and Kudzu. authentication page. The vulnerable code is executed only as part of a feature in which a user sets focus on a text Code and Event-space Coverage. Table III shows the box and uses it to update his or her profile. This is one code coverage by executing the initial URL, and the final of the many different ways to update the profile that the coverage after the test period. Measuring code coverage in application provides. Kudzu found that only one of these a dynamically compiled language is challenging because all ways resulted in a client-side code injection vulnerability, the application code is not known prior to the experiments. while the rest were safe. In this particular functionality, In our experiments, we measured the total code compiled 3 the application fails to properly validate a string from a during our experiments and the total code executed . postMessage event before using it in an eval operation. Table III shows an average improvement of over 29% in The application implicitly expects to receive this message code coverage. The coverage varies significantly depending from a window hosted at a sub-domain of facebook.com; on the application. Figure 11 provides more detail. On however, Kudzu automatically determines that any web several large applications, such as Facebook Chat, AjaxIM, principal could inject any data string matching the format and Plaxo, Kudzu discovers a lot of new code during testing. FB_msg:.*{.*}. This subsequently results in code injec- Kudzu is able to concretely execute several code paths, as tion because the vulnerable application fails to validate the shown by the increase in the right-side bars in Figure 11. origin of the sender and the structure of JSON string before On the other less complex gadget applications, most of the its use in eval. relevant code is observed during compilation in the initial run itself, leaving a relatively smaller amount of new code The second new vulnerability was found in a ToDo for Kudzu to discover. We also manually analyzed the source Google/IG gadget. Similar to the previous case, the vul- code of these applications and found that a large fraction of nerability becomes reachable only when a specific value is their code branches were not dependent on data we treat as selected from a dropdown box. This interaction is among untrusted. many that the gadget provides and we believe that Kudzu’s To measure the benefits of symbolic execution alone, we automatic exploration is the key to discovering this use case. repeated the experiments with the event-space exploration In several other cases, such as AjaxIM, the vulnerable code turned off during the test period and report the comparison is executed only after several events are executed after initial to full-featured Kudzu in Figure 12. We consistently observe sign-in page—Kudzu automatically reaches them during its exploration. 3One unit of code in our experiments is a JavaScript bytecode compiled Kudzu did not find vulnerabilities in only one case that by the interpreter. To avoid counting the same bytecode across several runs, we adopted a conservative counting scheme. We assigned a unique identifier FLAX reported a bug. This is because the vulnerability to each bytecode based on the source file name, source line number, line was patched in the time period between our experimental offset and a hash of the code block (typically one function body) compiled. Application # of initial # of total Total events events fired events fired discovered manual intervention or undirected iteration. This eliminates Academia 20 78 310 false positives significantly in practice. For instance, Kudzu AJAXIm 72 481 988 found that the Facebook web application has several uses of FaceBook Chat 15 989 1354 postMessage data in eval constructs, but all uses were ParseUri 5 16 17 correctly preceded by checks that assert that the origin of the Plaxo 88 381 688 AskAWord 2 8 11 message is a domain ending in .facebook.com. In con- Birthday Reminder 12 20 20 trast, the vulnerability in Plaxo fails to check this and Kudzu Block Notes 7 85 319 identifies the vulnerability the first time it reaches that point. Calorie Watcher 14 18 22 Some of the validation checks Kudzu deals with are quite Expenses Manager 10 107 1473 Listy 15 470 638 complex — Figure 9 shows an example which is simplified Notes LP 10 592 1034 from a real application. These examples are illustrative of Progress Bar 8 24 36 the need for automated reasoning tools, because checking Simple Calculator 17 34 67 the sufficiency of such validation checks would be onerous Todo List 8 26 61 TVGuide 17 946 1517 by hand and impractical by random fuzzing. Lastly, we point Word Monkey 3 10 22 out that like most other vulnerability discovery tools, Kudzu Zip Code Gas 12 12 12 can have false negatives because it may fail to cover code, Average 18.61 238.72 477.17 or because of overly strict attack grammars. Table IV: Event space Coverage: Column 2 and 3 show the Constraint Solver Evaluation. Figure 13 shows the running number of events fired in the first run and in total. The last times for solving queries of various input constraint sizes. column shows the total events discovered during the testing. Each constraint is a either a JavaScript string, arithmetic, logical, or boolean operation. The sizes of the equations varied from 1 to up to 250 constraints. The solver decides that symbolic execution alone discovers and executes a satisfiability of the constraints typically under a second for significant fraction of the application by itself. The event- satisfiable cases. As expected, to assert unsatisfiability, the exploration combined with symbolic execution does perform solver often takes time varying from nearly a second to 50 strictly better than symbolic execution in all but 3 cases. In a seconds. The variation is large because in many cases the majority of the cases, turning on the event-space exploration solver asserts unsatisfiable by asserting the unsatisfiability significantly complements symbolic execution, especially of length constraints, which is inexpensive because the for the AJAX applications which have a significant GUI step of bit-vector encoding is avoided. In other cases, the component. In the 3 cases where improvements are not unsatisfiability results only when the solver determines the significant, we found that the event exploration generally unsatisfiability of bit-vector solutions. either led to off-site navigations or the code executed could Our solver requires only an upper bound on the lengths be explored by symbolic execution alone. For example, in of input variables, and is able to infer satisfiable lengths of the parseUri case, same code is executed by text-box input variables internally. In these experiments, we increase the as well as by clicking a button on the GUI. upper bound of the input variables from 10 to 100 characters Table IV shows the increase in number of events executed in steps of 20 each. If the solver asserts unsatisfiability by Kudzu from the initial run to the total at the end of test up to the length bound of 100, the constraints are deemed period. These events include all keyboard and mouse events unsatisfiable. which result in execution of event handlers, navigation, form submissions and so on. We find that new events are VIII. RELATED WORK dynamically generated during one particular execution as Kudzu is the first application of dynamic symbolic execu- well as when new code is discovered. As a result, Kudzu tion to client-side JavaScript. Here, we discuss some related gradually discovers new events and was able to execute projects that have applied similar techniques to server-side approximately 50% of the events it observes during the web applications, or have used different techniques to search period of testing. for JavaScript bugs. Finally, we summarize why we needed Effectiveness. Kudzu automatically generates a test suite of to build a new string constraint solver. 49 new distinct inputs on average for an application in the Server-side analysis. JavaScript application code is similar test period (shown in column 2 of table III). in some ways to server-side code (especially PHP); for In the exploitable cases we observed, Kudzu was able instance, both tend to make heavy use of string operations. to show the existence of a vulnerability with an attack Several previous tools have demonstrated the use of sym- string once it reached the point of vulnerability. That is, bolic execution for finding SQL injection and reflected or its constraint solver correctly determines the sufficiency or stored cross-site scripting attacks to code written in PHP insufficiency of validation checks in a single query without and Java [1], [19], [31]. However, JavaScript code usually 50 jax builds a static representation of a Web 2.0 application by clicking elements on the page and building a state graph from the resulting transitions. Kudzu’s value space exploration complements such GUI exploration techniques

5 and enables a more complete analysis of the application using combined symbolic execution and GUI exploration. String constraint solvers. String constraint solvers have recently seen significant development, and practical tools are beginning to become available, but as detailed in Sec- 0.5 tion IV-C, no previous solvers would be sufficient for JavaScript, since they lack support for regular expres- sions [3], [4], [7], string equality [14], or multiple vari- ables [18], which are needed in combination to reason

0.05 about JavaScript input parsing. In concurrent work, Veanes 0 50 100 150 200 250 et al. give an approach based on automata and quantified Solve Time (SAT cases) Solve Time(UNSAT cases) axioms to reduce regular expressions to the Z3 decision Figure 13: The constraint solver’s running time (in seconds) procedure [29]. Combined with [4], this would provide as a function of the size of the input constraints (in terms similar expressiveness to Kudzu. of the number of symbolic JavaScript operations) IX. CONCLUSION With the rapid growth of AJAX applications, JavaScript parses its own input, so JavaScript symbolic execution code is becoming increasingly complex. In this regard, requires more expressive constraints, specifically to relate security vulnerabilities and analysis of JavaScript code is different strings that were previously part of a single string. an important area of research. In this paper, we presented Additional challenges unique to JavaScript arise because the design of the first complete symbolic-execution based JavaScript programs take many different kinds of input, system for exploring the execution space of JavaScript some of which come via user interface events. programs. In making the system practical we addressed Like Kudzu, the Saner [2] tool for PHP aims to check challenges ranging from designing a more expressive lan- whether sanitization routines are sufficient, not just that they guage for string constraints to implementing exploration and are present. However their techniques are quite different: replay of GUI events. We have implemented our ideas in they select paths and model transformations statically, then a tool called Kudzu. Given a URL for a web application, perform testing to verify some vulnerabilities. Their def- Kudzu automatically generates a high-coverage test suite. inition of sanitization covers only string transformations, We have applied Kudzu to find client-side code injection not validation checks involving branches, which occur fre- vulnerabilities and Kudzu finds 11 vulnerabilities (2 previ- quently in our applications. ously unknown) in live applications without producing false Analysis frameworks for JavaScript. Several works have positives. recently applied static analysis to detect bugs in JavaScript X. ACKNOWLEDGMENTS applications (e.g., [9], [12]). Static analysis is complemen- tary to symbolic execution: if a static analysis is sound, an We thank David Wagner, Adam Barth, Vijay Ganesh, absence of bug reports implies the absence of bugs, but static Adam Kiezun,˙ Domagoj Babic, Adrian Mettler, Juan Ca- analysis warnings may not be enough to let a developer ballero, Pongsin Poosankam and our anonymous reviewers reproduce a failure, and in fact may be false positives. for helpful feedback on the paper and suggestions for FLAX uses taint-enhanced blackbox fuzzing to detect if improvements to earlier drafts of this paper. This material is the JavaScript application employs sufficient validation or based upon work partially supported by the National Science not [27]; like Kudzu, it searches for inputs to trigger a Foundation under Grants No. 0311808, No. 0448452, No. failure. However, FLAX requires an external test suite to 0627511, and CCF-0424422, by the Air Force Office of be able to reach the vulnerable code, whereas Kudzu gen- Scientific Research under Grant No. 22178970-4170, and erates a high-coverage test suite automatically. Also, FLAX by the Army Research Office under Grant No. DAAD19- performs only black-box fuzz testing to find vulnerabilities, 02-1-0389. Any opinions, findings, and conclusions or rec- while Kudzu’s use of a constraint solver allows it to reason ommendations expressed in this material are those of the about possible vulnerabilities based on the analyzed code. authors and do not necessarily reflect the views of the Na- Crawljax is a recently developed tool for event-space tional Science Foundation, the Air Force Office of Scientific exploration of AJAX applications [24]. Specifically, Crawl- Research, or the Army Research Office. REFERENCES Automatic creation of SQL injection and cross-site script- ing attacks. In 30th International Conference on Software [1] S. Artzi, A. Kiezun,˙ J. Dolby, F. Tip, D. Dig, A. Paradkar, and Engineering (ICSE), May 2009. M. D. Ernst. Finding bugs in dynamic web applications. In [20] A. Klein. DOM based cross site scripting or XSS of the third International Symposium on and Analysis, kind. Technical report, Web Application Security Consortium, 2008. 2005. [2] D. Balzarotti, M. Cova, V. Felmetsger, N. Jovanovic, E. Kirda, [21] M. Martin and M. S. Lam. Automatic generation of XSS and C. Kruegel, and G. Vigna. Saner: Composing Static and SQL injection attacks with goal-directed model checking. In Dynamic Analysis to Validate Sanitization in Web Applica- 17th USENIX Security Symposium, 2008. tions. In Proceedings of the IEEE Symposium on Security [22] Y. Matiyasevich. Word equations, Fibonacci numbers, and and Privacy, Oakland, CA, May 2008. Hilbert’s tenth problem (extended abstract). In Workshop on [3] A. Barth, J. Caballero, and D. Song. Secure content sniffing Fibonacci Words, Turku, Finland, Sept. 2006. for web browsers or how to stop papers from reviewing [23] A. B. Matos. Periodic sets of integers. Theoretical Computer themselves. In Proceedings of the 30th IEEE Symposium on Science, 127(2):287–312, May 1994. Security and Privacy, Oakland, CA, May 2009. [24] A. Mesbah, E. Bozdag, and A. van Deursen. Crawling ajax [4] N. Bjørner, N. Tillmann, and A. Voronkov. Path feasibility by inferring user interface state changes. In Proceedings of analysis for string-manipulating programs. In Proceedings of the International Conference on Web Engineering, 2008. the 15th International Conference on Tools and Algorithms [25] W. Plandowski. Satisfiability of word equations with con- for the Construction and Analysis of Systems, 2009. stants is in PSPACE. In 40th Annual Symposium on Foun- [5] H. Bojinov, E. Bursztein, and D. Boneh. XCS: Cross channel dations of Computer Science (FOCS), pages 495–500, Oct. scripting and its impact on web applications. In CCS, 2009. 1999. [6] J. R. Buchi¨ and S. Senger. Definability in the existential [26] Samy. I’m popular. Description of the MySpace worm by theory of concatenation and undecidable extensions of this the author, including a technical explanation., Oct 2005. theory. Mathematical Logic Quarterly, 34(4):337–342, 1988. [27] P. Saxena, S. Hanna, P. Poosankam, and D. Song. FLAX: [7] J. Caballero, S. McCamant, A. Barth, and D. Song. Ex- Systematic discovery of client-side validation vulnerabilities tracting models of security-sensitive operations using string- in rich web applications. In 17th Annual Network & Dis- enhanced white-box exploration on binaries. Technical Re- tributed System Security Symposium, (NDSS), 2010. port UCB/EECS-2009-36, EECS Department, University of [28] TwitPwn. DOM based XSS in Twitterfall. 2009. California, Berkeley, 2009. [29] M. Veanes, P. de Halleux, and N. Tillmann. Rex: Symbolic [8] A. Chandra, J. Halpern, A. Meyer, and R. Parikh. Equations regular expression explorer. In International Conference on between regular terms and an application to process logic. In Software Testing, Verification and Validation, 2010. Proceedings of the 13th annual ACM Symposium on Theory of Computing (STOC), pages 384–390, May 1981. [30] W3C. HTML 5 specification. http://www.w3.org/TR/html5/. [9] R. Chugh, J. A. Meister, R. Jhala, and S. Lerner. Staged [31] G. Wassermann, D. Yu, A. Chander, D. Dhurjati, H. Inamura, information flow for JavaScript. In PLDI, 2009. and Z. Su. Dynamic test input generation for web appli- cations. In Proceedings of the International symposium on [10] V. Ganesh and D. L. Dill. A decision procedure for bit-vectors Software testing and analysis, 2008. and arrays. In Computer Aided Verification, 19th International Conference (CAV), pages 519–531, July 2007. [32] XML path language 2.0. http://www.w3.org/TR/xpath20/. [11] P. Godefroid, M. Y. Levin, and D. Molnar. Automated [33] F. Yu, T. Bultan, and O. H. Ibarra. Symbolic string verifica- whitebox fuzz testing. In Network and Distributed System tion: Combining string analysis and size analysis. In Tools Security, Feb. 2008. and Algorithms for the Construction and Analysis of Systems (TACAS), pages 322–336, Mar. 2009. [12] S. Guarnieri and B. Livshits. Gatekeeper: mostly static enforcement of security and reliability policies for JavaScript code. In Usenix Security, 2009. APPENDIX [13] R. Hansen. XSS cheat sheet. http://ha.ckers.org/xss.html. [14] P. Hooimeijer and W. Weimer. A decision procedure for A. Expressiveness subset constraints over regular languages. In ACM SIGPLAN Conference on Programming Language Design and Imple- Each of the following constraint types allows new sets of mentation (PLDI), pages 188–198, June 2009. strings to be defined: [15] H. B. Hunt III. The equivalence problem for regular expres- • The theory of just concatenation constraints v1 = v2◦v3 sions with intersection is not polynomial in tape. Technical (plus constant strings) is equivalent to the theory of Report 73-161, Cornell University Department of Computer Science, Mar. 1973. word equations, which is known to be neither a subset [16] Kaluza. Kaluza web page. nor a superset of the theory of regular expressions in http://webblaze.cs.berkeley.edu/2010/kaluza/, 2010. terms of expressiveness [17]. For instance, the set of [17] J. Karhumaki,¨ W. Plandowski, and F. Mignosi. The express- strings that consist of a single string repeated {ww : ibility of languages and relations by word equations. In w ∈ Σ∗} can be easily expressed with a word equation, Automata, Languages and Programming, 24th International but is not regular. Colloquium, (ICALP), pages 98–109, 1997. • Conversely, the regular languages expressed by regular [18] A. Kiezun,˙ V. Ganesh, P. J. Guo, P. Hooimeijer, and M. D. Ernst. HAMPI: A solver for string constraints. In Interna- expressions include sets of strings that cannot be ex- tional Symposium on Software Testing and Analysis, 2009. pressed solely with word equations. An example given [19] A. Kiezun,˙ P. J. Guo, K. Jayaraman, and M. D. Ernst. in [17] is the set of strings consisting of a and b over a three-letter alphabet {a, b, c}, which is represented adjacency ordering is that for each original constraint S1 = by the regular expression (a|b)*. S2 ◦ S3, the first length(S2) characters of S1 should be • The constraint length(v1) = length(v2) also adds constrained to be equal to the characters of S2, and similarly expressiveness, since the language {u$v : |u| = |v|} is between the remaining length(S3) characters of S1 and not regular, and the relation of two strings having equal S3. When the nodes are adjacent in the concatenation graph, length also can not be expressed in word equations for this holds because the characters are in fact the same char- any non-unary alphabet [17]. acters; when nodes are copied to resolve ordering conflicts, the invariant holds because the copies are constrained to be B. Complexity equal to the originals. The problem of deciding Kudzu’s core constraint language Length abstraction. The key property of the length con- is at least PSPACE-hard, which can be seen by adapting the straints synthesized in step 2 is that they be a sound construction given by Hunt [15] to prove the same hardness abstraction of the possible lengths of strings that could for the emptiness problem for regular expressions with an satisfy the constraints: for any strings that satisfy a string intersection operator. The construction is a reduction from constraint, their lengths must satisfy the corresponding the acceptance problem for a linearly-bounded automaton; length constraint. It is not necessary for correctness that since it uses the intersection operator only at the top level the abstraction be precise (that every length correspond of the constructed expression, it can also be expressed to a possible string solution), though precision improves as a conjunction of regular expression constraints in our the solver’s performance by pruning the search space of language. (The best known algorithms for the word equation lengths. For the concatenation, string equality, and length- problem are also in PSPACE [25], but the best known lower related constraints, the correctness of the length abstractions bound for that fragment is only NP-hardness.) is immediate. For a detailed discussion of the operation On the upper bound side, the complexity of our deciding LengthSet(R) we refer readers to [23]. our constraints is still an open problem. The fragment consisting of word equations and length equalities is known Soundness. For soundness, suppose that the algorithm pro- to be strictly more expressive that word equations alone [6], duces a solution that assigns string si with length li to each [17], but its decidability has been mentioned as open by string variable Si that appears in a constraint set. By the several authors [4], [6], [22]. Our core constraint language soundness of the bit-vector solver, the solution corresponds does not contain any features that obviously cause it to be to an assignment of truth values to each string constraint in undecidable, but it is strictly more expressive than fragment the formula that gives the formula the boolean result true. of word equations and length equalities, so its decidability Let the constraint literals be a set which contains C for is also an open problem. each constraint C assigned true, and ¬C for each constraint The potential difficulty of deciding the unbounded version C assigned false. By the soundness of the bit-vector solver, of our core constraint language (even PSPACE problems the lengths li are a correct solution to the length constraints, are usually infeasible at scale) motivates our decision in the so the length-related constraint literals will be satisfied. practical system to search just for solutions using strings of By the soundness of the translation of regular expression lengths bounded by a user-specified parameter. The bounded constraints and the bit-vector solver, si ∈ L(Ri) for each version of the constraint decision problem is NP-complete constraint Si ∈ Ri, so each regular expression constraint (where the size of the input is measured by the bound, for- literal is satisfied. Finally, for concatenation and string mally by specifying the bound in unary). The reduction from equality constraints, step 1 guaranteed that the corresponding SAT is immediate, since the constraint language includes characters were either identified or constrained to be equal, arbitrary boolean combinations, and the bounded language so by construction or by the soundness of the bit-vector is in NP because solutions (of size proportional to the bound) solver (respectively), their constraint literals will be satisfied. can be checked in polynomial time using standard regular Thus the assignment produced by the algorithm gives an expression matching algorithms. appropriate value to each constraint literal such that the constraint formula is satisfied. C. Correctness Bounded completeness. For bounded completeness, sup- We discuss the correctness of our algorithm in terms of pose to the contrary that there exists a set of solu- soundness, completeness (up to the bound), and termination. tion strings si with lengths li for a set of constraints, These properties build on the corresponding properties for but that the algorithm when run on those constraints the underlying bit-vector solver and the translation from prints Unsatisfiable. Consider two cases, according to regular expressions to bit-vectors, which have already been whether the lengths li were ever returned by the procedure established. SolveLengths during execution. If the lengths were returned Concatenation constraints. The key invariant established on some iteration j, then by the bounded completeness of by the translation from concatenation constraints to an the regular expression translation, and the correctness of the translation of string equality and concatenation constraints, the bit-vector constraints for those lengths must be satisfiable by the string values si. However, the algorithm could only have printed Unsatisfiable if the bit-vector solver found the bit-vector constraints unsatisfiable, contradicting its completeness. On the other hand, suppose that the lengths li were never returned by SolveLengths. Because the program printed Unsatisfiable, the final call to SolveLengths must have returned UNSAT when supplied with a set of length constraints L, and some number of failed length assignments not including the length assignment li. By the soundness of the length abstraction, the lengths li satisfy L, but since the li assignment satisfies L and was not excluded, the completeness of the bit-vector solver implies that it should have been returned by SolveLengths, contradicting our assumption that it was never returned by SolveLengths. Thus neither case is possible, so we reject the assumption and conclude that if a solution exists, the algorithm will not print Unsatisfiable. Termination. For termination, only the main while loop must be considered, since all of the subroutines are guaran- teed to terminate. To see that the loop terminates, we observe that the set FailLenDB of failed length assignments grows on each iteration of the main loop, since when SolveLengths gives SAT, it always returns a new length assignment distinct from those previously in FailLenDB. On the other hand, the length assignments returned by SolveLengths always satisfy the length bounds in C. But because the length of each string is a non-negative integer no greater than the corresponding length bound, there are only finitely many possible length assignments that satisfy the bounds. Thus the correctness of SolveLengths and the fact that FailLenDB grows monotonically together guarantee that SolveLengths will eventually return UNSAT, causing the algorithm to terminate.