Symbolic Execution for Javascript
Total Page:16
File Type:pdf, Size:1020Kb
Symbolic Execution for JavaScript José Fragoso Santos Petar Maksimović Théotime Grohens Imperial College London, UK Imperial College London, UK ENS Paris, France [email protected] Mathematical Institute SASA, Serbia [email protected] [email protected] Julian Dolby Philippa Gardner IBM Research, New York, USA Imperial College London, UK [email protected] [email protected] ABSTRACT Whole-program analysis is not sufficient for JavaScript appli- We present a framework for trustworthy symbolic execution of cations, which are commonly run in highly dynamic execution JavaScripts programs, whose aim is to assist developers in the test- environments: for example, client-side programs dynamically load ing of their code: the developer writes symbolic tests for which and execute third-party code as a matter of course. One also needs the framework provides concrete counter-models. We create the to be able to analyse incomplete code in a compositional manner. framework following a new, general methodology for designing Designing compositional analyses for dynamic languages such as compositional program analyses for dynamic languages. We prove JavaScript, however, is non-trivial. This is because, unlike static lan- that the underlying symbolic execution is sound and does not gen- guages such as C, C++ and Java, dynamic languages do not observe erate false positives. We establish additional trust by using the the frame property [39], essential for compositionality. Intuitively, theory to precisely guide the implementation and by thorough test- the frame property means that the output of a program cannot ing. We apply our framework to whole-program symbolic testing change if the state in which it is run is extended. In JavaScript, of real-world JavaScript libraries and compositional debugging of it is possible to introduce bugs by extending the state in which a separation logic specifications of JavaScript programs. program is run. To our knowledge, there has been no previous work on compositional symbolic execution for JavaScript. CCS CONCEPTS We introduce Cosette, a framework for trusted symbolic execu- tion of JavaScript (ECMAScript 5 Strict [18]). Its aim is to provide • Theory of computation → Separation logic; Program anal- general-purpose symbolic analysis and assist developers in the test- ysis; Logic and verification; • Software and its engineering → ing of their code. We present a new, general methodology for deve- Automated static analysis; loping compositional analyses for languages that do not satisfy the KEYWORDS frame property, and apply it to the design of the symbolic execution of Cosette. In contrast to existing tools, the symbolic analysis of Symbolic execution, Separation logic, Formal semantics, JavaScript Cosette is fully formalised in a way that guides the implementation ACM Reference Format: and is also trusted, in that it follows the JavaScript semantics and José Fragoso Santos, Petar Maksimović, Théotime Grohens, Julian Dolby, does not produce false positive bug reports. We apply Cosette to and Philippa Gardner. 2018. Symbolic Execution for JavaScript. In Pro- whole-program symbolic testing of real-world JavaScript libraries ceedings of The 20th International Symposium on Principles and Practice of and compositional debugging of separation logic specifications of Editor E. Editor (Ed.). ACM, New York, Declarative Programming (PPDP’18), JavaScript programs. We illustrate these use cases in §2. NY, USA, Article -, 14 pages. https://doi.org/sampleDOI We show the architec- 1 INTRODUCTION ture of Cosette on the right. We extend JS with con- JavaScript (JS) is the most widespread dynamic language, used by structs for creating and 95.1% of websites [51]. Due to its dynamic, complex nature, it is a reasoning about symbolic difficult target for symbolic analysis. Recent symbolic execution values. Using JS-2-JSIL [22], tools for JS [30, 42, 52] have made significant progress: they are fully a trusted compiler from JS automatic, aim at code in the large, and primarily focus on scalabil- to the JSIL intermediate ity and coverage issues. However, they do have some limitations. language, we compile the extended JS program to an extended They only target specific bug patterns rather than general-purpose JSIL program. The core of Cosette is a new symbolic interpreter for symbolic execution, and depend on whole-program analysis. JSIL, written in Rosette [48, 49], a framework for creating symbolic Permission to make digital or hard copies of part or all of this work for personal or interpreters. The JSIL symbolic interpreter either outputs a concrete classroom use is granted without fee provided that copies are not made or distributed counter-model for the given assertions, or guarantees correctness for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. up to a given bound for unfolding loops and recursive predicates. For all other uses, contact the owner/author(s). Our new, general methodology underpinning the compositional PPDP’18, September 2018, Frankfurt, Germany JSIL symbolic interpreter consists of three stages: (1) we design a © 2018 Copyright held by the owner/author(s). ACM ISBN sampleISBN. JSIL instrumented semantics that exhibits the frame property by https://doi.org/sampleDOI explicitly keeping track of object properties that we know are not PPDP’18, September 2018, Frankfurt, Germany J. Fragoso Santos, P. Maksimović, T. Grohens, J. Dolby, P. Gardner present; (2) we define the JSIL symbolic semantics by lifting theJSIL Our running example is a key-value map implementation, given instrumented semantics; and (3) we link the JSIL symbolic semantics in Figure 1a. It contains four functions: Map, for constructing an to the JSIL concrete semantics by describing the frames that can empty map; get, for retrieving the value associated with a given be safely added to the initial state. The key innovation is to have key; put, for inserting/updating key-value pairs; and validKey, for the instrumented semantics as a proper interim stage in the design deciding whether or not a key is valid. The library implements a of the symbolic execution, obtaining more modular reasoning and key-value map as an object with property _contents, denoting the substantially simpler proofs. We present this methodology in §3. object storing the map contents. The named properties of _contents An essential goal for us was to establish trust in Cosette. In §3.5, and their value attributes correspond to the map keys and values. we give a bounded soundness result for the JSIL symbolic semantics The functions get, put, and validKey are shared between all map and prove that Cosette never produces false positive bug reports. objects, as they are defined in Map.prototype, the prototype1 of These results, combined with the correctness of JS-2-JSIL and the objects created using Map as a constructor. The get function returns fact that the memory models of JavaScript and JSIL are the same the value associated with a given key, or null if the key is not in by design, enable us to lift the results of analyses done on com- the map. To check that the given key is in the map, get uses the piled JSIL code back to JS (cf. §3.7, §4.4). We implement the JSIL built-in function hasOwnProperty, which lives in Object.prototype, instrumented interpreter following the instrumented semantics of the default prototype of all objects. The put function updates the §3.3 to the letter. We test the combination of JS-2-JSIL and the JSIL map if the supplied key is valid, and otherwise throws an error. instrumented interpreter using Test262, the official ECMAScript In Figure 1b, we show a general heap of key-value maps created test suite [19]. Out of the 10469 tests for ES5 Strict, we identify by the library. The map object, whose prototype is Map.prototype, 8330 tests appropriate for our coverage, of which we pass 100%. has the property _contents, pointing to the contents object. The Finally, we ensure that the JSIL symbolic interpreter obtained by contents object holds the key-value pairs, and its prototype is the Rosette lifting of the instrumented interpreter is consistent with Object.prototype. The Map.prototype object holds the get, put, and the symbolic semantics of §3.4 by constructing symbolic unit tests validKey functions,2 and its prototype is also Object.prototype. Fi- for each JSIL command, assuming the premises and asserting the nally, the Object.prototype holds the hasOwnProperty function that conclusion of the appropriate rule of the symbolic semantics. is called by Map.prototype.get. We apply Cosette to whole-program symbolic testing. A gen- 2.1 Whole-program Symbolic Testing eral developer can use Cosette for symbolic testing of their code by having symbolic inputs instead of concrete inputs and stating Developers are used to writing unit tests—verifying that, given the constraints that the output needs to satisfy as simple, intuitive some concrete inputs, their code produces the expected outputs. first-order assertions over these inputs. If a test fails, Cosette pro- Using Cosette, they can write unit tests with symbolic inputs and vides the concrete input that causes it to fail, exposing bugs in the outputs, testing a broad range of behaviours with a single symbolic tested code. In §5, we evaluate Cosette on two real-world JavaScript test. For example, one unit test for the put function consists of data structure libraries, where, using fewer tests, we achieve better inserting a valid key-value pair (k, v) into a map and then verifying coverage (100%) than the concrete unit test suites shipped with the that it has been inserted correctly. In Cosette, this test can be written libraries, and discover unexpected bugs in both libraries. as in Figure 1c. First, we declare k to be a symbolic string and v We also apply Cosette to specification-driven bug-finding.