Type Refinement for Static Analysis of JavaScript

Vineeth Kashyap†, John Sarracino†‡, John Wagner†, Ben Wiedermann‡, Ben Hardekopf†
†University of California, Santa Barbara
‡Harvey Mudd College

Abstract

Static analysis of JavaScript has proven useful for a variety of purposes, including optimization, error checking, security auditing, program refactoring, and more. We propose a technique called type refinement that can improve the precision of such static analyses for JavaScript without any discernible performance impact. Refinement is a known technique that uses the conditions in branch guards to refine the analysis information propagated along each branch path. The key insight of this paper is to recognize that JavaScript semantics include many implicit conditional checks on types, and that performing type refinement on these implicit checks provides significant benefit for analysis precision.

To demonstrate the effectiveness of type refinement, we implement a static analysis tool for reporting potential type-errors in JavaScript programs. We provide an extensive empirical evaluation of type refinement using a benchmark suite containing a variety of JavaScript application domains, ranging from the standard performance benchmark suites (SunSpider and Octane), to open-source JavaScript applications, to machine-generated JavaScript via Emscripten. We show that type refinement can significantly improve analysis precision by up to 86% without affecting the performance of the analysis.

Categories and Subject Descriptors F.3.2 [Semantics of Programming Languages]: Program Analysis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. DLS '13, October 28, 2013, Indianapolis, Indiana, USA. Copyright © 2013 ACM 978-1-4503-2433-5/13/10. $15.00. http://dx.doi.org/10.1145/2508168.2508175

1. Introduction

Dynamic languages have become ubiquitous. For example, JavaScript is used to implement a large amount of critical online infrastructure, including web applications, browser addons/extensions, and interpreters such as Adobe Flash. In response to the growing prominence of dynamic languages, the research community has begun to investigate how to apply static analysis techniques in this domain. Static analysis is used to deduce properties of a program's execution behavior; these properties can be used for a variety of useful purposes including optimization [21, 27], error checking [33], verification [11], security auditing [18, 19], and program refactoring [13], among other uses. However, dynamic languages present a unique challenge to static analysis, inherent in their very name: the dynamic nature of these languages makes creating precise, sound, and efficient static analyses a daunting task.

In this paper we focus on the static analysis of JavaScript, though in principle our proposed techniques are applicable to other dynamic languages as well. Our work is complementary to other recent work on JavaScript analysis, which has focused on understanding a program's types by proposing various novel abstract domains to track type information [24, 27]. This focus on types is essential for JavaScript analysis; because JavaScript behavior relies heavily on the runtime types of the values being operated on, understanding types is a necessary prerequisite to understanding many other properties of program behavior.

However, with one exception (discussed further in Section 5) this prior work on JavaScript analysis has ignored an observation that has been profitably exploited in more traditional static analyses: that branch conditions (i.e., predicates that determine a program's control flow) necessarily constrain the set of values that can flow into the corresponding branches. This observation can be used to refine the abstract information propagated by the static analysis within each branch, thus improving the precision of the analysis. The details of how this concept works and how it can be applied to improve the precision of static analysis are explained in Appendix A (for any analysis in general) and Section 2 (for JavaScript analysis specifically).

While this general observation is well-known in the static analysis community, applying it specifically to JavaScript raises several important questions that must be answered to gain any useful benefit: (1) what kinds of conditions provide the most useful information for refinement; (2) how prevalent are these kinds of conditions in realistic JavaScript programs; and (3) how can we best exploit these conditions, based on their prevalence and usefulness, to substantially increase the precision of static analysis?

1.1 Key Insight

The key insight that informs our proposed technique is that the most prevalent and useful conditional branches are not explicit in the text of a JavaScript program, i.e., these conditions do not show up syntactically as if or while statements. Rather, they are implicit in the JavaScript semantics themselves. As an example, consider the statement var result = myString.length;. While syntactically there are no conditional branches in this statement, during execution there are several conditional branches taken by the JavaScript interpreter:

• Is myString either null or undefined? If so then raise a type-error exception, otherwise continue execution.
• Is myString a primitive value or an object? If it's a primitive value then convert it to an object first, then access the length property; otherwise just access the length property.
• Does the object (or one of its prototypes) contain a length property? If so then return the corresponding value, otherwise return undefined.

Our thesis is that JavaScript static analysis can take advantage of these implicit conditional executions to refine the type information of the abstract values being propagated by the analysis, and that this type refinement can provide significant improvement in analysis precision.

1.2 Contributions

Our specific contributions are:

• A definition of type refinement for static analysis of JavaScript, including several variations that use different kinds of conditions to refine types (Section 2).
• An empirical evaluation of the proposed type refinement variations (Section 4). This evaluation is carried out on a more comprehensive set of JavaScript benchmarks than any presented in previous literature on JavaScript static analysis; it includes not only the standard SunSpider and V8 benchmark suites, but also a number of open-source JavaScript applications [1, 3] and a number of machine-generated JavaScript programs created using Emscripten [2].
• A set of recommendations for including type refinement in JavaScript analyses (Section 6). Our evaluation shows that taking advantage of implicit conditional branches provides a critical precision advantage for finding type errors, while the explicit typeof conditional branches exploited in previous work [20] provide only marginal benefit.

We conclude that type refinement is a promising technique for JavaScript analysis. This technique's design is informed by the semantics of JavaScript, enabling it to take advantage of language features hidden from plain sight and thus gain precision that would be lost by a technique that does not specifically exploit JavaScript semantics. Furthermore, type refinement is orthogonal to the question of designing abstract domains for JavaScript analysis; this means that it can profitably be combined with interesting new abstract domains in the future to achieve even better results.

2. The Potential for Refinement in JavaScript

Refinement allows an analysis to safely replace a less-precise answer with a more-precise answer. Appendix A gives suitable background on static analysis and the concept of refinement; readers unfamiliar with these notions may wish to refer to that appendix before continuing. Refinement can apply to many different abstract domains for analysis, but we hypothesize that, for JavaScript, the abstract domain of types is a particularly fruitful target for refinement. In JavaScript, as with many dynamic languages, the type of a value strongly influences the behavior of a program. Thus, refining type information intuitively would seem likely to improve the precision of JavaScript static analysis (and our empirical results bear out this intuition).

2.1 Key Insight

While the typeof check is an obvious candidate for refinement, our key insight is that most of the conditionals involving types aren't even syntactically present in the JavaScript program; rather, they are implicit in the semantics of the JavaScript language itself. Consider the following statement:

    var x = myString[i];

This seemingly simple statement requires a large number of implicit type checks. Example 1 makes all of these checks explicit. None of these checks involve typeof. Instead, we see three new kinds of conditions that involve type information.

Example 1 The semantics of var x = myString[i];

     1: if myString is null or undefined then
     2:   type-error
     3: else
     4:   // convert myString to an object first?
     5:   if myString is a primitive then
     6:     obj = toObject(myString)
     7:   else
     8:     obj = myString
     9:   end if
    10:   // convert i to a string
    11:   // case 1: i is a primitive
    12:   if i is a primitive then
    13:     prop = toString(i)
    14:   else
    15:     if i.toString is callable then
    16:       tmp = i.toString()
    17:     else
    18:       goto line 26
    19:     end if
    20:   end if
    21:   // case 2: i is not a primitive, but i.toString() is
    22:   if tmp is a primitive then
    23:     prop = toString(tmp)
    24:   // case 3: i.toString() is not a primitive; try i.valueOf()
    25:   else
    26:     if i.valueOf is callable then
    27:       tmp2 = i.valueOf()
    28:     else
    29:       type-error
    30:     end if
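To make the idea of refining on an implicit check concrete, the following is a minimal sketch and not the analysis implemented in the paper. It assumes an abstract value is simply a set of type tags, and it models only the first implicit check of a property read (is the base null or undefined?). The names analyzePropertyRead and countWarnings and the tag strings are hypothetical, introduced purely for illustration.

```javascript
// Hypothetical abstract domain: an abstract value is a Set of type tags
// describing every type a variable may have at a program point.
const NULLISH = new Set(["null", "undefined"]);

// Abstractly interpret a property read such as `name.length`.
// `env` maps variable names to Sets of type tags.
// Returns whether a type-error warning is reported, plus the environment
// that holds on the path where execution continues past the read.
function analyzePropertyRead(env, name, refine) {
  const types = env.get(name);
  // Implicit check: the read throws if the base is null or undefined.
  const warning = [...types].some((t) => NULLISH.has(t));
  const out = new Map(env);
  if (refine) {
    // Execution only continues if the implicit check passed, so the
    // base cannot be null or undefined on the continuing path.
    out.set(name, new Set([...types].filter((t) => !NULLISH.has(t))));
  }
  return { warning, env: out };
}

// Count warnings for `reads` consecutive property reads on `myString`.
function countWarnings(initialTypes, reads, refine) {
  let env = new Map([["myString", new Set(initialTypes)]]);
  let warnings = 0;
  for (let i = 0; i < reads; i++) {
    const result = analyzePropertyRead(env, "myString", refine);
    if (result.warning) warnings++;
    env = result.env;
  }
  return warnings;
}

// myString may be a string or undefined before the first read.
const withoutRefinement = countWarnings(["string", "undefined"], 3, false);
const withRefinement = countWarnings(["string", "undefined"], 3, true);
console.log(withoutRefinement, withRefinement); // prints "3 1"
```

Under these assumptions, the unrefined analysis reports the same potential type error at every read, while the refined analysis reports it once, at the first read, and then treats myString as a string on the continuing path.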
