Refinement Types for TypeScript
Panagiotis Vekris, Benjamin Cosman, Ranjit Jhala
University of California, San Diego, USA
{pvekris, blcosman, jhala}@cs.ucsd.edu
Abstract
We present Refined TypeScript (RSC), a lightweight refinement type system for TypeScript, that enables static verification of higher-order, imperative programs. We develop a formal system for RSC that delineates the interaction between refinement types and mutability, and enables flow-sensitive reasoning by translating input programs to an equivalent intermediate SSA form. By establishing type safety for the intermediate form, we prove safety for the input programs. Next, we extend the core to account for imperative and dynamic features of TypeScript, including overloading, type reflection, ad hoc type hierarchies and object initialization. Finally, we evaluate RSC on a set of real-world benchmarks, including parts of the Octane benchmarks, D3, Transducers, and the TypeScript compiler. We show how RSC successfully establishes a number of value dependent properties, such as the safety of array accesses and downcasts, while incurring a modest overhead in type annotations and code restructuring.

Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features – Classes and objects, Constraints, Polymorphism; F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs – Object-oriented constructs, Type structure

General Terms Languages, Verification

Keywords Refinement Types, TypeScript, Type Systems, Immutability

PLDI'16, June 13–17, 2016, Santa Barbara, CA, USA. © 2016 ACM. 978-1-4503-4261-2/16/06...$15.00. http://dx.doi.org/10.1145/2908080.2908110

1. Introduction
Modern scripting languages, like JavaScript, Python, and Ruby, have popularized the use of higher-order constructs that were once solely in the functional realm. This trend towards abstraction and reuse poses two related problems for static analysis: modularity and extensibility. First, how should analysis precisely track the flow of values across higher-order functions and containers or modularly account for external code like closures or library calls? Second, how can analyses be easily extended to new, domain specific properties, ideally by developers, while they are designing and implementing the code? (As opposed to by experts who can at best develop custom analyses run ex post facto and are of little use during development.)

Refinement types hold the promise of a precise, modular and extensible analysis for programs with higher-order functions and containers. Here, basic types are decorated with refinement predicates that constrain the values inhabiting the type [29, 40]. The extensibility and modularity offered by refinement types have enabled their use in a variety of applications in typed, functional languages, like ML [28, 40], Haskell [37], and F♯ [33]. Unfortunately, attempts to apply refinement typing to scripts have proven to be impractical due to the interaction of the machinery that accounts for imperative updates and higher-order functions [5] (§6).

In this paper, we introduce Refined TypeScript (RSC): a novel, lightweight refinement type system for TypeScript, a typed superset of JavaScript. Our design of RSC addresses three intertwined problems by carefully integrating and extending existing ideas from the literature. First, RSC accounts for mutation by using ideas from IGJ [42] to track which fields may be mutated, and to allow refinements to depend on immutable fields, and by using SSA-form to recover path and flow-sensitivity. Second, RSC accounts for dynamic typing by using a recently proposed technique called Two-Phase Typing [39], where dynamic behaviors are specified via union and intersection types, and verified by reduction to refinement typing. Third, the above are carefully designed to permit refinement inference via the Liquid Types [28] framework to render refinement typing practical on real-world programs. Concretely, we make the following contributions:

• We develop a core calculus that permits locally flow-sensitive reasoning via SSA translation and formalizes the interaction of mutability and refinements via declarative refinement type checking that we prove sound (§3).
• We extend the core language to TypeScript by describing how we account for its various dynamic and imperative features; in particular we show how RSC accounts for type reflection via intersection types, encodes interface hierarchies via refinements and handles object initialization (§4).

• We implement rsc, a refinement type checker for TypeScript, and evaluate it on a suite of real-world programs from the Octane benchmarks, Transducers, D3 and the TypeScript compiler. We show that RSC's refinement typing is modular enough to analyze higher-order functions, collections and external code, and extensible enough to verify a variety of properties from classic array-bounds checking to program specific invariants needed to ensure safe reflection: critical invariants that are well beyond the scope of existing techniques for imperative scripting languages (§5).

    function reduce(a, f, x) {
      var res = x;
      for (var i = 0; i < a.length; i++)
        res = f(res, a[i], i);
      return res;
    }

    function minIndex(a) {
      if (a.length <= 0) return -1;
      function step(min, cur, i) {
        return cur < a[min] ? i : min;
      }
      return reduce(a, step, 0);
    }

Figure 1: Computing the Min-Valued Index with reduce

2. Overview
We begin with a high-level overview of refinement types

Summaries Function Types (x1:T1, ..., xn:Tn) ⇒ T, where arguments are named xi and have types Ti and the output is a T, are used to specify the behavior of functions. In essence, the input types Ti specify the function's preconditions, and the output type T describes the postcondition. Each input type and the output type can refer to the arguments xi, yielding precise function contracts. For example, (x:nat) ⇒ {ν:nat | x < ν} is a function type that describes functions that require a non-negative input, and ensure that the output exceeds the input.

Higher-Order Summaries This approach generalizes directly to precise descriptions for higher-order functions. For example, reduce from Figure 1 can be specified as Treduce:

    (a:A[], f:(B, A, idx<a>) ⇒ B, x:B) ⇒ B        (1)

This type is a precise summary for the higher-order behavior of reduce: it describes the relationship between the input array a, the step ("callback") function f, and the initial value of the accumulator, and stipulates that the output satisfies the same properties B as the input x. Furthermore, it critically specifies that the callback f is only invoked on valid indices for the array a being reduced.

2.1 Applications
Next, we show how refinement types let programmers specify and statically verify a variety of properties, namely array safety, reflection (value-based overloading), and downcasts, which are potential sources of runtime problems that cannot be prevented via existing techniques.

2.1.1 Array Bounds
Specification We specify safety by defining suitable refinement types for array creation and access. For example, we view read a[i], write a[i] = e and length access a.length as calls get(a,i), set(a,i,e) and length(a) where get :
which reduces to the logical verification condition (VC)

    0 < len(arr) ⇒ (ν = 0 ⇒ 0 ≤ ν < len(arr))

The VC is proved valid by an SMT solver [24], verifying subtyping, and hence, the safety of the array access.

Path Sensitivity is obtained by adding branch conditions into the typing environment. Consider

    function head0(a: number[]): number {
      if (0 < a.length) return head(a);
      return 0;
    }

Recall that head should only be invoked with non-empty arrays. The call to head above occurs under the environment Γhead0, defined as a : number[], 0 < len(a), i.e. the environment that has the binder for the formal a and the guard predicate established by the branch condition. Thus, the call to head yields the obligation

    Γhead0 ⊢ {ν = a} ⊑ NEArray⟨number⟩

yielding the valid VC

    0 < len(a) ⇒ (ν = a ⇒ 0 < len(ν))

Polymorphic, Higher-Order Functions Next, let us assume that reduce has the type Treduce described in (1), and see how to verify the array safety of minIndex (Figure 1). The challenge here is to precisely track which values can flow into min (used to index into a), which is tricky since those values are actually produced inside reduce.

Types make it easy to track such flows: we need only determine the instantiation of the polymorphic type variables of reduce at this call site inside minIndex. The type of the f parameter in the instantiated type corresponds to a signature for the closure step which will let us verify the closure's implementation. Here, rsc automatically instantiates (by building complex logical predicates from simple terms that have been predefined in a prelude)

    A ↦ number    B ↦ idx⟨a⟩        (2)

Let us reassure ourselves that this instantiation is valid, by checking that step and 0 satisfy the instantiated type. If we substitute (2) into Treduce we obtain the following types for step and 0, i.e. reduce's second and third arguments:

    step : (idx<a>, number, idx<a>) ⇒ idx<a>    0 : idx<a>

The initial value 0 is indeed a valid idx<a> thanks to the a.length check at the start of the function. To check step, assume that its inputs have the above types:

    min: idx<a>, cur: number, i: idx<a>

The body is safe as the index i is trivially a subtype of the required idx<a>, and the output is one of min or i and hence, of type idx<a> as required.

2.1.2 Overloading
Dynamic languages extensively use value-based overloading to simplify library interfaces. For example, a library may export

    function $reduce(a, f, x) {
      if (arguments.length === 3) return reduce(a, f, x);
      return reduce(a.slice(1), f, a[0]);
    }

The function $reduce has two distinct types depending on its parameters' values, rendering it impossible to statically type without path-sensitivity. Such overloading is ubiquitous: in more than 25% of TypeScript libraries, more than 25% of the functions are value-overloaded [39].

Intersection Types Refinements let us statically verify value-based overloading via Two-Phase Typing [39]. First, we specify overloading as an intersection type. For example, $reduce gets the following signature, which is just the conjunction of the two overloaded behaviors:

    (a:A[]+, f:(A, A, idx<a>) ⇒ A) ⇒ A          // 1
    (a:A[],  f:(B, A, idx<a>) ⇒ B, x:B) ⇒ B     // 2

The type A[]+ in the first conjunct indicates that the first argument needs to be a non-empty array, so that the call to slice and the access of a[0] both succeed.

Dead Code Assertions Second, we check each conjunct separately, replacing ill-typed terms in each context with assert(false). This requires the refinement type checker to prove that the corresponding expressions are dead code, as assert requires its argument to always be true:

    assert : (b:{v:bool | v = true}) ⇒ A

To check $reduce, we specialize it per overload context:

    function $reduce1(a, f) {
      if (arguments.length === 3) return assert(false);
      return reduce(a.slice(1), f, a[0]);
    }

    function $reduce2(a, f, x) {
      if (arguments.length === 3) return reduce(a, f, x);
      return assert(false);
    }

In each case, the "ill-typed" term (for the corresponding input context) is replaced with assert(false). Refinement typing easily verifies the asserts, as they respectively occur under the inconsistent environments

    Γ1 ≐ arguments:{len(ν) = 2}, len(arguments) = 3
    Γ2 ≐ arguments:{len(ν) = 3}, len(arguments) ≠ 3

which bind arguments to an array-like object corresponding to the arguments passed to that function, and include the branch condition under which the call to assert occurs.

2.2 Analysis
Next, we outline how rsc uses refinement types to analyze programs with closures, polymorphism, assignments, classes and mutation.
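The value-based overloading of $reduce can be approximated in plain TypeScript with overload signatures. The following is a minimal sketch under stated assumptions, not rsc input: standard TypeScript cannot express the non-empty array type A[]+ or the index type idx<a>, so the first overload is guarded by a runtime check (the RangeError is an assumption of this sketch) where rsc would discharge the obligation statically. The implementation signature also uses any, which rsc itself eliminates.

```typescript
// reduce from Figure 1, with explicit TypeScript generics.
function reduce<A, B>(a: A[], f: (acc: B, cur: A, i: number) => B, x: B): B {
  let res = x;
  for (let i = 0; i < a.length; i++) res = f(res, a[i], i);
  return res;
}

// Overload 1: no initial value, so the array must be non-empty.
function $reduce<A>(a: A[], f: (acc: A, cur: A, i: number) => A): A;
// Overload 2: explicit initial value, any array.
function $reduce<A, B>(a: A[], f: (acc: B, cur: A, i: number) => B, x: B): B;
// Implementation: branches on the number of arguments actually passed.
function $reduce(a: any[], f: (acc: any, cur: any, i: number) => any, x?: any): any {
  if (arguments.length === 3) return reduce(a, f, x);
  // Runtime stand-in (assumption) for the static non-emptiness requirement A[]+.
  if (a.length === 0) throw new RangeError("$reduce: empty array and no initial value");
  return reduce(a.slice(1), f, a[0]);
}

// First overload: the accumulator starts at a[0].
const sum = $reduce([1, 2, 3], (acc, cur) => acc + cur);
// Second overload: the accumulator type differs from the element type.
const joined = $reduce([1, 2, 3], (acc: string, cur: number) => acc + cur, "");
```

In the two-argument case the callback receives indices into a.slice(1), not into a, which mirrors the behavior of the original $reduce.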
2.2.1 Polymorphic Instantiation
rsc uses the framework of Liquid Typing [28] to automatically synthesize the instantiations of (2). In a nutshell, rsc (a) creates templates for unknown refinement type instantiations, (b) performs type checking over the templates to generate subtyping constraints over the templates that capture value-flow in the program, (c) solves the constraints via a fixpoint computation (abstract interpretation).

Step 1: Templates Recall that reduce has the polymorphic type Treduce. At the call-site in minIndex, the type variables A, B are instantiated with the known base-type number. Thus, rsc creates fresh templates for the (instantiated) A, B:

    A ↦ {ν:number | κA}    B ↦ {ν:number | κB}

where the refinement variables κA and κB represent the unknown refinements. We substitute the above in the signature for reduce to obtain a context-sensitive template

    (a:κA[], (κB, κA, idx⟨a⟩) ⇒ κB, κB) ⇒ κB        (3)

Step 2: Constraints Next, rsc generates subtyping constraints over the templates. Intuitively, the templates describe the sets of values that each static entity (e.g. variable) can evaluate to at runtime. The subtyping constraints capture the value-flow relationships e.g. at assignments, calls and returns, to ensure that the template solutions, and hence inferred refinements, soundly over-approximate the set of runtime values of each corresponding static entity.

We generate constraints by performing type checking over the templates. As a, 0, and step are passed in as arguments, we check that they respectively have the types κA[], κB and (κB, κA, idx⟨a⟩) ⇒ κB. Checking a and 0 yields the subtyping constraints

    Γ ⊢ number[] ⊑ κA[]    Γ ⊢ {ν = 0} ⊑ κB

where Γ ≐ a:number[], 0 < len(a) from the else-guard that holds at the call to reduce. We check step by checking its body under the environment Γstep that binds the input parameters to their respective types

    Γstep ≐ min:κB, cur:κA, i:idx⟨a⟩

As min is used to index into the array a we get

    Γstep ⊢ κB ⊑ idx⟨a⟩

As i and min flow to the output type κB, we get

    Γstep ⊢ idx⟨a⟩ ⊑ κB    Γstep ⊢ κB ⊑ κB

Step 3: Fixpoint The above subtyping constraints over the κ variables are reduced via the standard rules for co- and contra-variant subtyping, into Horn implications over the κs. rsc solves the Horn implications via (predicate) abstract interpretation [28] to obtain the solution κA ↦ true and κB ↦ 0 ≤ ν < len(a) which is exactly the instantiation in (2) that satisfies the subtyping constraints, and proves minIndex is array-safe.

2.2.2 Assignments
Next, let us see how the signature for reduce in Figure 1 is verified by rsc. Unlike in the functional setting, where refinements have previously been studied, here, we must deal with imperative features like assignments and for-loops.

SSA Transformation We solve this problem in three steps. First, we convert the code into SSA form, to introduce new binders at each assignment. Second, we generate fresh templates that represent the unknown types (i.e. set of values) for each Φ-variable. Third, we generate and solve the subtyping constraints to infer the types for the Φ-variables, and hence, the "loop-invariants" needed for verification.

Let us see how this process lets us verify reduce from Figure 1. First, we convert the body to SSA form (§3.3):

    function reduce(a, f, x) {
      var r0 = x, i0 = 0;
      while [i2 = φ(i0, i1), r2 = φ(r0, r1)] (i2 < a.length) {
        r1 = f(r2, a[i2], i2);
        i1 = i2 + 1;
      }
      return r2;
    }

where i2 and r2 are the Φ-variables for i and r respectively. Second, we generate templates for the Φ-variables:

    i2:{ν:number | κi2}    r2:{ν:B | κr2}        (4)

We need not generate templates for the SSA variables i0, r0, i1 and r1 as their types are those of the expressions they are assigned. Third, we generate subtyping constraints as before; the Φ-assignment generates additional constraints

    Γ0 ⊢ {ν = i0} ⊑ κi2    Γ1 ⊢ {ν = i1} ⊑ κi2
    Γ0 ⊢ {ν = r0} ⊑ κr2    Γ1 ⊢ {ν = r1} ⊑ κr2

where Γ0 is the environment at the "exit" of the basic block where i0 and r0 are defined:

    Γ0 ≐ a:number[], x:B, i0:natN⟨0⟩, r0:{ν:B | ν = x}

Similarly, the environment Γ1 includes bindings for variables i1 and r1. In addition, code executing the loop body has passed the conditional check, so our path-sensitive environment is strengthened by the corresponding guard:

    Γ1 ≐ Γ0, i1:natN⟨i2 + 1⟩, r1:B, i2 < len(a)

Finally, the above constraints are solved to

    κi2 ↦ 0 ≤ ν < len(a)    κr2 ↦ true

which verifies that the "callback" f is indeed called with values of type idx⟨a⟩, as it is only called with i2 : idx⟨a⟩, obtained by plugging the solution into the template in (4).
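To make the inferred loop invariant concrete, the sketch below rewrites reduce as a while loop that mirrors the SSA form and checks the solution 0 ≤ ν < len(a) for the index at run time. This is only an illustration: rsc establishes the invariant statically, and the name checkedReduce and the thrown error are assumptions of this sketch.

```typescript
function checkedReduce<A, B>(
  a: readonly A[],
  f: (acc: B, cur: A, i: number) => B,
  x: B
): B {
  let r = x; // r0 = x
  let i = 0; // i0 = 0
  while (i < a.length) {
    // Inside the body the guard i < a.length holds, so the index handed to
    // the callback should satisfy 0 <= i < a.length.
    if (!(Math.floor(i) === i && 0 <= i && i < a.length)) {
      throw new Error("index invariant violated: i = " + i);
    }
    r = f(r, a[i], i); // r1 = f(r2, a[i2], i2)
    i = i + 1;         // i1 = i2 + 1
  }
  return r;
}

// Record every index the callback is invoked with.
const seenIndices: number[] = [];
const total = checkedReduce(
  [10, 20, 30],
  (acc: number, cur: number, i: number) => {
    seenIndices.push(i);
    return acc + cur;
  },
  0
);
```

Here the callback only ever sees the indices 0, 1 and 2, which is the run-time counterpart of the statement that f is only called with values of type idx⟨a⟩.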
type ArrayN
    var z = new Field(3, 7, new Array(45));
    z.reset(new Array(45)); // OK
    z.reset(new Array(5));  // BAD

2.2.3 Mutability
In the imperative, object-oriented setting (common in dynamic scripting languages), we must account for class and object invariants and their preservation in the presence of field mutation. For example, consider the code in Figure 2, modified from the Octane Navier-Stokes benchmark.

Class Invariants Class Field implements a 2-dimensional vector, "unrolled" into a single array dens, whose size is the product of the width and height fields. We specify this invariant by requiring that width and height be strictly positive (i.e. pos) and that dens be a grid with dimensions specified by this.w and this.h. An advantage of SMT-based refinement typing is that modern SMT solvers support non-linear reasoning, which lets rsc specify and verify program specific invariants outside the scope of generic bound checkers.

Mutable and Immutable Fields The above invariants are only meaningful and sound if fields w and h cannot be modified after object creation. We specify this via the immutable qualifier, which is used by rsc to then (1) prevent updates to the field outside the constructor, and (2) allow refinements

3. Formal System
Next, we formalize the ideas outlined in §2. We introduce our formal core FRSC (§3.1): an imperative, mutable, object-oriented subset of Refined TypeScript, that resembles the core of Safe TypeScript [27]. To ease refinement reasoning, we SSA-transform (§3.3) FRSC to a functional, yet still mutable, intermediate language IRSC (§3.2) that closely follows the design of CFJ [25] (the language used to formalize X10), which in turn is based on Featherweight Java [18]. We then formalize our static semantics in terms of IRSC (§3.4), prove them sound and connect them to those of FRSC (§3.5).

3.1 Source Language (FRSC)
The syntax of this language is given below. Meta-variable e ranges over expressions, which can be variables x, constants c, property accesses e.f, method calls e.m(e), object creations new C(e), and cast operations
e ::= x | c | this | e.f | e.m(e) | new C(e) |
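Judgment forms:  δ ⊩ e ↪ ê    δ ⊩ s ↪ u; δ′    δ ⊩ B ↪ ê    M ↪ M̂

Here e, x and m are FRSC terms and ê, x̂ and m̂ their IRSC counterparts; an overbar denotes a sequence.

S-VAR
    δ ⊩ x ↪ δ(x)

S-THIS
    δ ⊩ this ↪ this

S-VARDECL
    δ ⊩ e ↪ ê    δ′ = δ[x ↦ x̂]    x̂ fresh
    -----------------------------------------
    δ ⊩ var x = e ↪ let x̂ = ê in ⟨ ⟩; δ′

S-ITE
    δ ⊩ e ↪ ê    δ ⊩ s1 ↪ u1; δ1    δ ⊩ s2 ↪ u2; δ2
    (x̄, x̄1, x̄2) = δ1 ⋈ δ2    δ′ = δ[x̄ ↦ x̄′]    x̄′ fresh
    ------------------------------------------------------------------------
    δ ⊩ if(e){s1} else {s2} ↪ letif [x̄′, x̄1, x̄2](ê) ? u1 : u2 in ⟨ ⟩; δ′

S-ASGN
    δ ⊩ e ↪ ê    δ′ = δ[x ↦ x′]    x′ fresh
    -----------------------------------------
    δ ⊩ x = e ↪ let x′ = ê in ⟨ ⟩; δ′

S-SKIP
    δ ⊩ skip ↪ ⟨ ⟩; δ

S-DOTASGN
    δ ⊩ e ↪ ê    δ ⊩ e′ ↪ ê′
    ------------------------------------------------
    δ ⊩ e.f = e′ ↪ let _ = ê.f ← ê′ in ⟨ ⟩; δ

S-SEQ
    δ ⊩ s1 ↪ u1; δ1    δ1 ⊩ s2 ↪ u2; δ2
    --------------------------------------
    δ ⊩ s1; s2 ↪ u1⟨u2⟩; δ2

S-BODY
    δ ⊩ s ↪ u; δ′    δ′ ⊩ e ↪ ê
    ------------------------------
    δ ⊩ s; return e ↪ u⟨ê⟩

S-MDECL
    toString(m) = toString(m̂)    {x̄ ↦ x̂} ⊩ B ↪ ê    m̂, x̂ fresh
    ---------------------------------------------------------------
    m(x̄: T̄) {p} : T {B} ↪ def m̂(x̂: T̄) {p} : T = ê

Figure 3: Selected SSA Transformation Rules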
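The core of the translation, threading an environment δ through assignments and joining the two branches of a conditional with Φ-variables, can be sketched for a toy statement language. This is a minimal illustration loosely following S-ASGN, S-ITE and S-SEQ; the AST types, the SsaBuilder class, the textual output format and the naming scheme x0, x1, ... are all assumptions of this sketch and not the rsc implementation.

```typescript
// Toy source language: integer expressions, assignments and if/else.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "add"; left: Expr; right: Expr };

type Stmt =
  | { kind: "assign"; name: string; value: Expr }
  | { kind: "if"; cond: Expr; thenBranch: Stmt[]; elseBranch: Stmt[] };

// The environment δ: maps each source variable to its current SSA name.
type Env = { [source: string]: string };

function has(o: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(o, key);
}

class SsaBuilder {
  readonly out: string[] = [];
  private counters: { [source: string]: number } = {};

  // Fresh names x0, x1, ... (assumes no source variable already has such a name).
  private fresh(x: string): string {
    const n = has(this.counters, x) ? this.counters[x] : 0;
    this.counters[x] = n + 1;
    return x + n;
  }

  // Replace every source variable by its current SSA name (cf. S-VAR).
  private rename(e: Expr, env: Env): string {
    if (e.kind === "num") return String(e.value);
    if (e.kind === "var") {
      if (!has(env, e.name)) throw new Error("unbound variable " + e.name);
      return env[e.name];
    }
    return this.rename(e.left, env) + " + " + this.rename(e.right, env);
  }

  // Translate a statement list and return the updated environment (cf. S-SEQ).
  translate(stmts: Stmt[], env: Env): Env {
    let cur: Env = { ...env };
    for (const s of stmts) {
      if (s.kind === "assign") {
        // cf. S-ASGN: bind a fresh name and update δ.
        const rhs = this.rename(s.value, cur);
        const name = this.fresh(s.name);
        this.out.push("let " + name + " = " + rhs);
        cur[s.name] = name;
      } else {
        // cf. S-ITE: translate both branches from the same δ, then join.
        this.out.push("if (" + this.rename(s.cond, cur) + ") {");
        const envThen = this.translate(s.thenBranch, cur);
        this.out.push("} else {");
        const envElse = this.translate(s.elseBranch, cur);
        this.out.push("}");
        const joined: Env = {};
        for (const x of Object.keys(envThen)) {
          if (!has(envElse, x)) continue; // defined on one path only
          const x1 = envThen[x];
          const x2 = envElse[x];
          if (x1 === x2) {
            joined[x] = x1;
          } else {
            const phi = this.fresh(x);
            this.out.push("let " + phi + " = phi(" + x1 + ", " + x2 + ")");
            joined[x] = phi;
          }
        }
        cur = joined;
      }
    }
    return cur;
  }
}

// x = 1; if (x) { x = x + 1 } else { }; y = x
const program: Stmt[] = [
  { kind: "assign", name: "x", value: { kind: "num", value: 1 } },
  {
    kind: "if",
    cond: { kind: "var", name: "x" },
    thenBranch: [
      {
        kind: "assign",
        name: "x",
        value: { kind: "add", left: { kind: "var", name: "x" }, right: { kind: "num", value: 1 } },
      },
    ],
    elseBranch: [],
  },
  { kind: "assign", name: "y", value: { kind: "var", name: "x" } },
];

const builder = new SsaBuilder();
builder.translate(program, {});
const ssaText = builder.out.join("\n");
```

For this program the variable x is assigned in only one branch, so the join introduces a single Φ-variable whose operands are the name from the then-branch and the name that was live before the conditional.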
K; e −→ K0; e0 K; e −→ K0; e0
Q-CAST Γ ` K.H (l): S; S ≤ T c = true ⇒ i = 1 c = false ⇒ i = 2 K;
Figure 4: Selected Reduction Rules for FRSC and IRSC
Theorem 1 (Forward Simulation). If Q ↪_Δ Q̂, where Q is an FRSC configuration and Q̂ an IRSC configuration, then:
(a) if Q is terminal, then ∃ Q̂′ s.t. Q̂ ⟶* Q̂′ and Q ↪_Δ Q̂′
(b) if Q ⟶ Q′, then ∃ Q̂′ s.t. Q̂ ⟶* Q̂′ and Q′ ↪_Δ Q̂′

3.4 Static Semantics
We proceed by describing refinement checking for IRSC.

Types Type annotations on the source language are propagated unaltered through the translation phase. Our type language (shown below) resembles that of existing refinement type systems [19, 25, 28]. A refinement type T may be an existential type or have the form {ν:N | p}, where N is a class name C or a primitive type B, and p is a logical predicate (over some decidable logic) which describes the properties that values of the type must satisfy. Type specifications (e.g. method types) are existential-free, while inferred types may be existentially quantified [20].

Logical Predicates Predicates p are logical formulas over terms t. These terms can be variables x, primitive constants c, the reserved value variable ν, the reserved variable this to denote the containing object, field accesses t.f, uninterpreted function applications f t and applications of terms on built-in operators b, such as ==, <, +, etc.

    T, S, R, U ::= ∃x: T1.T2 | {ν:N | p}
    N          ::= C | B
    p          ::= p1 ∧ p2 | ¬p | t
    t          ::= x | c | ν | this | t.f | f t | b t

Structural Constraints Following CFJ, we reuse the notion of an Object Constraint System, to encode constraints related to the object-oriented nature of the program. Most of the rules carry over to our system. A key extension in our setting is that we partition C has I (that encodes inclusion of an element I in a class C) into two cases: C hasMut I and C hasImm I, to account for elements that may be mutated. These elements can only be fields (i.e. there is no mutation on methods).

Environments and Well-formedness A type environment Γ contains type bindings x : T and guard predicates p that encode path sensitivity. Γ is well-formed if all of its bindings are well-formed. A refinement type is well-formed in an environment Γ if all symbols (simple or qualified) in its logical predicate (i) are bound in Γ, and (ii) correspond to immutable fields of objects. We omit the rest of the well-formedness rules as they are standard in refinement type systems.

Besides well-formedness, our system's main judgment forms are those for subtyping and refinement typing [19]. Subtyping is defined by the judgment Γ ⊢ S ≤ T. The rules are standard among refinement type systems with existential types. For example, the rule for subtyping between two refinement types Γ ⊢ {ν:N | p} ≤ {ν:N | p′} reduces to a verification condition: Valid(⟦Γ⟧ ⇒ (⟦p⟧ ⇒ ⟦p′⟧)), where ⟦Γ⟧ is the embedding of environment Γ into our logic accounting for both guard predicates and variable bindings:

    ⟦Γ⟧ ≐ ⋀{p | p ∈ Γ} ∧ ⋀{[x/ν]p | x:{ν:N | p} ∈ Γ}

Here, we assume existential types are simplified to non-existential bindings when they enter the environment. Details regarding structural and well-formedness constraints, and subtyping rules are included in the extended version [38].

Judgment forms:  Γ ⊢ e : T    Γ ⊢ u ▷ Γ′

T-VAR
    Γ(x) = T
    --------------------
    Γ ⊢ x : self(T, x)

T-CST
    Γ ⊢ c : ty(c)

T-CTX
    Γ ⊢ u ▷ x̄:S̄    Γ, x̄:S̄ ⊢ e : T
    ----------------------------------
    Γ ⊢ u⟨e⟩ : ∃x̄:S̄. T

T-FLD-I
    Γ ⊢ e : T    z fresh    Γ, z:T ⊢ z hasImm fi : Ti
    ---------------------------------------------------
    Γ ⊢ e.fi : ∃z:T. self(Ti, z.fi)

T-FLD-M
    Γ ⊢ e : T    z fresh    Γ, z:T ⊢ z hasMut gi : Ti
    ---------------------------------------------------
    Γ ⊢ e.gi : ∃z:T. Ti

T-INV
    Γ ⊢ e : T, ē : T̄    Γ, z:T ⊢ z has (def m(z̄:R̄) {p} : S = e′)
    Γ, z:T, z̄:T̄ ⊢ T̄ ≤ R̄, p    z, z̄ fresh
    ----------------------------------------------------------------
    Γ ⊢ e.m(ē) : ∃z:T. ∃z̄:T̄. S

T-DOTASGN
    Γ ⊢ e1 : T1, e2 : T2    Γ, z1:⌊T1⌋ ⊢ z1 hasMut f : S, T2 ≤ S    z1 fresh
    ---------------------------------------------------------------------------
    Γ ⊢ e1.f ← e2 : T2

T-NEW
    Γ ⊢ ē : (T̄I, T̄M)    ⊢ class(C)    Γ, z:C ⊢ fields(z) = ◦ f̄:R̄, ḡ:Ū
    Γ, z:C, z̄I : self(T̄I, z.f̄) ⊢ T̄I ≤ R̄, T̄M ≤ Ū, inv(C, z)    z, z̄I fresh
    -------------------------------------------------------------------------
    Γ ⊢ new C(ē) : ∃z̄I:T̄I. {ν:C | ν.f̄ = z̄I ∧ inv(C, ν)}

T-CAST
    Γ ⊢ e : S    Γ ⊢ T    Γ ⊢ S ≲ T
    ----------------------------------
    Γ ⊢ e as T : T

T-CTXEMP
    Γ ⊢ ⟨ ⟩ ▷ ·

T-LETIN
    Γ ⊢ e : T
    ----------------------------------
    Γ ⊢ let x = e in ⟨ ⟩ ▷ x : T

T-LETIF
    Γ ⊢ e : S, S ≤ bool    Γ, z:S, z ⊢ u1 ▷ Γ1    Γ, z:S, ¬z ⊢ u2 ▷ Γ2
    Γ, Γ1 ⊢ Γ1(x̄1) ≤ T̄    Γ, Γ2 ⊢ Γ2(x̄2) ≤ T̄    Γ ⊢ T̄    T̄ fresh
    ----------------------------------------------------------------------
    Γ ⊢ letif [x̄, x̄1, x̄2](e) ? u1 : u2 in ⟨ ⟩ ▷ x̄ : T̄

Figure 5: Static Typing Rules for IRSC

Refinement Typing Rules Figure 5 contains rules of the two forms of our typing judgements: Γ ⊢ e : T and Γ ⊢ u ▷ Γ′. The first assigns a type T to an expression e under an environment Γ, and the second checks the body of an SSA context u under Γ and returns the environment Γ′ of the variables introduced in u that are available when checking its hole (rule T-CTX). Below, we discuss the novel rules:

[T-FLD-I] Immutable object parts can be assigned a more precise type, by leveraging the preservation of their identity. This notion, known as self-strengthening [20, 25], is defined with the aid of the strengthening operator ◁:

    {ν:N | p} ◁ p′ ≐ {ν:N | p ∧ p′}
    (∃x:S. T) ◁ p  ≐ ∃x:S. (T ◁ p)
    self(T, t)     ≐ T ◁ (ν = t)

[T-FLD-M] Here we avoid such strengthening, as the value of field gi is mutable, so cannot appear in refinements.

[T-DOTASGN] Only mutable fields may be reassigned.

[T-NEW] Similarly, only immutable fields are referenced in the refinement of the inferred type at object construction.

[T-INV] Extracting the method signature using the has operator has already performed the necessary substitutions to account for the specific receiver object.

[T-CAST] Cast operations are checked statically obviating the need for a dynamic check. This rule uses the notion of compatibility subtyping (≲), which is defined as:

Definition 1 (Compatibility Subtype). Γ ⊢ S ≲ T iff ⟨S →_Γ ⌊T⌋⟩ = R ≠ fail with Γ ⊢ R ≤ T.

Here, the operation ⌊T⌋ extracts the base type of T, and ⟨T →_Γ D⟩ succeeds when under environment Γ we can statically prove D's invariants, starting from the invariants contained in T. We use the predicate inv(D, ν) (as in CFJ) to denote the conjunction of the class invariants of D and its supertypes (with the necessary substitutions of this by ν). We assume that part of these invariants is a predicate that states inclusion in the specific class (instanceof(ν, D)). Therefore, we can prove that T can safely be cast to D. For the output of this operation it holds that: ⌊⟨T →_Γ D⟩⌋ = D, which enables the use of traditional subtyping. Formally:

    ⟨{ν:_ | p} →_Γ D⟩ ≐ D ◁ p    if ⟦Γ⟧ ∧ ⟦p⟧ ⇒ inv(D, ν)
                         fail     otherwise
    ⟨∃x:S. T →_Γ D⟩   ≐ ∃x:S. ⟨T →_(Γ, x:S) D⟩
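The shape of the subtyping verification condition, Valid(⟦Γ⟧ ⇒ (⟦p⟧ ⇒ ⟦p′⟧)), can be illustrated with a bounded check. The sketch below is not how rsc discharges VCs (it uses an SMT solver); it merely enumerates small models, so a result of true is evidence rather than proof. The Model shape, the function name holdsOnSmallModels and the bound are assumptions of this sketch.

```typescript
// A model assigns values to the value variable ν and to len(a).
interface Model {
  nu: number;
  len: number;
}
type Pred = (m: Model) => boolean;

// Bounded stand-in for Valid(Γ ⇒ (p ⇒ p')): search small models for a counterexample.
function holdsOnSmallModels(gamma: Pred, p: Pred, pPrime: Pred, bound: number): boolean {
  for (let len = 0; len <= bound; len++) {
    for (let nu = -bound; nu <= bound; nu++) {
      const m: Model = { nu, len };
      if (gamma(m) && p(m) && !pPrime(m)) return false; // counterexample found
    }
  }
  return true;
}

const nonEmptyGuard: Pred = (m) => 0 < m.len;               // Γ: 0 < len(a)
const noGuard: Pred = () => true;                           // Γ without the branch condition
const isZero: Pred = (m) => m.nu === 0;                     // p: ν = 0
const validIndex: Pred = (m) => 0 <= m.nu && m.nu < m.len;  // p': 0 ≤ ν < len(a)

// With the guard, {ν = 0} is a subtype of the index type on all small models.
const withGuard = holdsOnSmallModels(nonEmptyGuard, isZero, validIndex, 8);
// Without it, len = 0 and ν = 0 is a counterexample.
const withoutGuard = holdsOnSmallModels(noGuard, isZero, validIndex, 8);
```

The second call shows why path sensitivity matters: dropping the guard predicate from the environment makes the same subtyping obligation fail.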
[T-LETIF] To type conditional structures, we first infer a type for the condition and then check each of the branches u1 and u2, assuming that the condition is true or false, respectively, to achieve path sensitivity. Each branch assigns types to the Φ-variables which compose Γ1 and Γ2, and the propagated types for these variables are fresh types operating as upper bounds to their respective bindings in Γ1 and Γ2.

3.5 Type Safety
To state our safety results, we extend our type checking judgment to runtime locations l with the use of a heap typing Σ, mapping locations to types, and add a location typing rule:

    T-LOC
    Σ(l) = T
    ----------------
    Γ; Σ ⊢ l : T

We establish type safety for IRSC in the form of a subject reduction (preservation) and a progress theorem that connect the static and dynamic semantics of IRSC. These theorems employ the notions of heap and signature well-formedness: Σ ⊢ H and ⊢ S.

Theorem 2 (Subject Reduction). If Γ; Σ ⊢ e : T, K; e ⟶ K′; e′ and Σ ⊢ K.H then ∃ T′, Σ′ ⊇ Σ s.t. Γ; Σ′ ⊢ e′ : T′, Γ ⊢ T′ ≲ T, and Σ′ ⊢ K′.H.

Theorem 3 (Progress). If Γ; Σ ⊢ e : T, ⊢ S and Σ ⊢ H then either e is a value, or ∃ e′, H′, Σ′ ⊇ Σ s.t. Σ′ ⊢ H′ and S; H; e ⟶ S; H′; e′.

We defer the proofs to the extended version [38]. As a corollary of the Progress Theorem we get that cast operators are guaranteed to succeed, hence they can safely be erased.

Corollary 4 (Safe Casts). Cast operations can safely be erased when compiling to executable code.

With the use of our Simulation Theorem and extending our checking judgment for terms in IRSC to runtime configurations (⊢ Q), we can state a soundness result for FRSC:

Theorem 5 (FRSC Type Safety). If Q ↪_Δ Q̂, where Q is an FRSC configuration and Q̂ an IRSC configuration, and ⊢ Q̂ then either Q is a terminal form, or ∃ Q′ s.t. Q ⟶ Q′, Q′ ↪_Δ Q̂′ and ⊢ Q̂′.

4. Scaling to TypeScript
TypeScript (TS) extends JavaScript (JS) with modules, classes and a lightweight type system that enables IDE support for auto-completion and refactoring. TS deliberately eschews soundness [3] for backwards compatibility with existing JS code. In this section, we show how to use refinement types to regain safety, by presenting the highlights of Refined TypeScript (and our tool rsc), that scales the core calculus from §3 up to TS by extending the support for types (§4.1), reflection (§4.2), interface hierarchies (§4.3), and imperative programming (§4.4).

4.1 Types
First, we discuss how rsc handles core TS features like object literals, interfaces and primitive types.

Object Literal Types TS supports object literals, i.e. anonymous objects with field and method bindings. rsc types object members in the same way as class members: method signatures need to be explicitly provided, while field types and mutability modifiers are inferred based on use, e.g. in:

    var point = { x: 1, y: 2 }; point.x = 2;

the field x is updated and hence, rsc infers that x is mutable.

Interfaces TS supports named object types in the form of interfaces, and treats them in the same way as their structurally equivalent class types. For example, the interface

    interface PointI { x: number; y: number; }

is equivalent to a class PointC defined as

    class PointC { constructor(public x: number, public y: number) {} }

In rsc these two types are not equivalent, as objects of type PointI do not necessarily have PointC as their constructor:

    var pI = { x: 1, y: 2 }, pC = new PointC(1, 2);
    pI instanceof PointC; // returns false
    pC instanceof PointC; // returns true

However, ⊢ PointC ≤ PointI i.e. instances of the class may be used to implement the interface.

Primitive Types We extend rsc's support for primitive types to model the corresponding types in TS. TS has undefined and null types to represent the eponymous values, and treats these types as the "bottom" of the type hierarchy, effectively allowing those values to inhabit every type via subtyping. rsc also includes these two types, but does not treat them as "bottom" types. Instead rsc handles them as distinct primitive types inhabited solely by undefined and null, respectively, that can take part in unions. Consequently, the following code is accepted by TS but rejected by rsc:

    var x = undefined; var y = x + 1;

Unsound Features in TS include (1) treating undefined and null as inhabitants of all types, (2) co-variant input subtyping, (3) allowing unchecked overloads, and (4) allowing a special "dynamic" any type to be ascribed to any term. rsc ensures soundness by (1) performing checks when non-null (non-undefined) types are required (e.g. during field accesses), (2) using the correct variance for functions and constructors, (3) checking overloads via two-phase typing (§2.1.2), and (4) eliminating the any type.

Many uses of any (indeed, all uses, in our benchmarks §5) can be replaced with a combination of union or intersection types or downcasting, all of which are soundly checked via path-sensitive refinements. In future work, we wish to support the full language, namely allow dynamically checked uses of any by incorporating orthogonal dynamic techniques
318 dynamic cast from the contracts literature. We envisage a op- interface Type{ eration castT :: (x: any) ⇒ {ν :T | ν = x}. It is straight- immutable flags: TypeFlags; forward to implement castT for first-order types T as a id: number; dynamic check that traverses the value, testing that its com- symbol?: Symbol; ... ponents satisfy the refinements [30]. Wrapper-based tech- } niques from the contracts/gradual typing literature should then let us support higher-order types. interface ObjectType extends Type { ... } 4.2 Reflection interface InterfaceType extends ObjectType { baseTypes: ObjectType[]; JS programs make extensive use of reflection via “dynamic” declaredProperties: Symbol[]; type tests. rsc statically accounts for these by encoding ... type-tags in refinements. The following tests if x is a number } before performing an arithmetic operation on it: enum TypeFlags{ var r = 1; Any = 0x00000001, String = 0x00000002, if ( typeof x ==="number") r += x; Number = 0x00000004, Class = 0x00000400, Interface = 0x00000800, Reference = 0x00001000, We account for this idiomatic use of typeof by statically Object = Class | Interface | Reference tracking the “type” tag of values inside refinements using ... } uninterpreted functions (akin to the size of arrays). Thus, values v of type boolean, number, string, etc. are re- Figure 6: Type hierarchies in the tsc compiler fined with the predicate ttag(v)="boolean" , ttag(v)= "number", ttag(v)="string" , etc., respectively. Fur- Specifying Hierarchies with Refinements rsc allows devel- thermore, typeof has type opers to create and use Type objects with the above invariant typeof : (z:A) ⇒ {v:string | v = ttag(z)} by specifying a predicate typeInv 3: so the output type of typeof x and the path-sensitive guard isMask
2 rsc handles other type tests, e.g. instanceof, via an extension of the technique used for typeof tests; we omit a discussion for space.
3 Modern SMT solvers easily handle formulas over bit-vectors, including operations that shift, mask bit-vectors, and compare them for equality.
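As a rough illustration of the two reflection idioms discussed here (typeof tag tests and bit-mask tests on a flags field), the following plain TypeScript sketch may help. The declarations are simplified from Figure 6; the `members` field and the helper names `addIfNumber`, `isObjectType` and `memberCount` are hypothetical and do not come from the paper. Plain TS simply trusts the user-defined guard, whereas rsc is described as establishing statically that a non-zero mask implies the interface.

```typescript
// Simplified, hypothetical versions of the Figure 6 declarations.
enum TypeFlags {
  Any = 0x00000001,
  String = 0x00000002,
  Number = 0x00000004,
  Class = 0x00000400,
  Interface = 0x00000800,
  Reference = 0x00001000,
  Object = Class | Interface | Reference,
}

interface Type {
  readonly flags: TypeFlags; // TS `readonly` stands in for rsc's `immutable`
  id: number;
}

interface ObjectType extends Type {
  members: string[]; // illustrative field, not taken from the paper
}

// typeof-style tag test: the addition only happens on the branch
// where the tag of x is known to be "number".
function addIfNumber(x: unknown): number {
  let r = 1;
  if (typeof x === "number") r += x;
  return r;
}

// Flag-mask test: a non-zero intersection with TypeFlags.Object is taken
// to mean that t is an ObjectType. TS does not verify this claim.
function isObjectType(t: Type): t is ObjectType {
  return (t.flags & TypeFlags.Object) !== 0;
}

// Inside the guarded branch, t.members is accessible without a cast.
function memberCount(t: Type): number {
  return isObjectType(t) ? t.members.length : 0;
}
```

The guard is only as sound as the discipline of whoever constructs `Type` values: an object with the `Class` flag but no `members` field would pass the test and fail at run time, which is the kind of gap a statically checked invariant is meant to close.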
    (x:{A | impl(x,ObjectType)}) ⇒ {v:ObjectType | v=x}

The if-test ensures that the immutable field t.flags masked with 0x00003C00 is non-zero, satisfying the third line in the type definition of typeInv, which in turn implies that t in fact implements the ObjectType interface.

4.4 Imperative Features

Immutability Guarantees Our system uses ideas from Immutability Generic Java [42] (IGJ) to provide statically checked immutability guarantees. In IGJ a type reference is of the form C

leak references of the constructed object (this) or read any of its fields, as they might still be in an uninitialized state.

(a) Internal Initialization: Constructors Type invariants do not hold while the object is being “cooked” within the constructor. To safely account for this idiom, rsc defers the checking of class invariants (i.e. the types of fields) by replacing: (a) occurrences of this.fi = ei, with _fi = ei, where _fi are local variables, and (b) all return points with a call ctor_init(_fi), where the signature for ctor_init is: (x: T ) ⇒ void. Thus, rsc treats field initialization in a
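The constructor rewriting described above can be pictured with a small sketch in plain TypeScript. Everything here is an assumption made for illustration: the `Interval` class, the invariant `lo <= hi`, and the names `ctorInit` and `makeInterval` are hypothetical, and the run-time check inside `ctorInit` merely stands in for the static check that rsc is described as performing at that call.

```typescript
// Source-level constructor, as a developer would write it.
class Interval {
  readonly lo: number;
  readonly hi: number;
  constructor(lo: number, width: number) {
    this.lo = lo;
    this.hi = lo + width;
  }
}

// Hypothetical stand-in for ctor_init: its parameters carry the field
// types, so calling it is the single point where the invariant is checked.
function ctorInit(lo: number, hi: number): void {
  if (!(lo <= hi)) throw new RangeError("invariant lo <= hi violated");
}

// Desugared view of the constructor body: field writes become writes to
// locals, and the return point is preceded by a call to ctorInit.
function makeInterval(lo: number, width: number): Interval {
  const _lo = lo;         // was: this.lo = lo
  const _hi = lo + width; // was: this.hi = lo + width
  ctorInit(_lo, _hi);     // inserted at the return point
  return { lo: _lo, hi: _hi };
}
```

Because the locals `_lo` and `_hi` are ordinary immutable variables, nothing can observe a half-initialized object before the check runs, which is the point of deferring the invariant to the end of the constructor.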
Benchmark        LOC     T     M     R   Time (s)
navier-stokes    366     3    18    39   473
splay            206    18     2     0     6
richards         304    61     5    17     7
raytrace         576    68    14     2    15
transducers      588   138    13    11    12
d3-arrays        189    36     4    10    37
tsc-checker      293    10    48    12    62
TOTAL           2522   334   104    91

Figure 7: LOC is the number of non-comment lines of source (computed via cloc v1.62). The number of RSC specifications given as JML style comments is partitioned into T trivial anno-

• Property Accesses rsc verifies that each field (x.f) or method lookup (x.m(...)) succeeds. Recall that undefined and null are not considered to inhabit the types to which the fields or methods belong.
• Array Bounds rsc verifies that each array read (x[i]) or write (x[i] = e) occurs within the bounds of the array (x).
• Overloads rsc verifies that functions with overloaded (i.e. intersection) types correctly implement the intersections in a path-sensitive manner as described in (§2.1.2).
• Downcasts rsc verifies that at each TS (down)cast of the form
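To make two of the properties listed above concrete, here is a minimal sketch in plain TypeScript. It is not taken from the benchmarks: the functions `negate` and `readAt` are hypothetical, and the explicit run-time guard in `readAt` only marks the obligation that rsc is described as discharging statically.

```typescript
// Overloads: the implementation must satisfy each signature on the
// matching path (number in, number out; boolean in, boolean out).
function negate(x: number): number;
function negate(x: boolean): boolean;
function negate(x: number | boolean): number | boolean {
  if (typeof x === "number") return -x; // path for the first overload
  return !x;                            // path for the second overload
}

// Array bounds: a read is safe only when 0 <= i < xs.length. The guard
// below spells out that condition at run time.
function readAt<T>(xs: readonly T[], i: number): T {
  if (!Number.isInteger(i) || i < 0 || i >= xs.length) {
    throw new RangeError("index out of bounds");
  }
  return xs[i] as T;
}
```

TS accepts the overload implementation as long as it is compatible with the union signature; it does not check that the number path actually returns a number, which is the path-sensitive obligation the Overloads item refers to.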
5.1 Benchmarks

We ported a number of existing JS or TS programs to rsc. We selected benchmarks that make heavy use of language constructs relevant to the safety properties described above. These include parts of the Octane test suite, developed by Google as a JavaScript performance benchmark [12] and already ported to TS by Rastogi et al. [27], the TS compiler [22], and the D3 [4] and Transducers [7] libraries:

• navier-stokes, which simulates two-dimensional fluid motion over time; richards, which simulates a process scheduler with several types of processes passing information packets; splay, which implements the splay tree data structure; and raytrace, which implements a raytracer that renders scenes involving multiple lights and objects; all from the Octane suite,
• transducers: a library that implements composable data transformations, a JavaScript port of Hickey’s Clojure library, which is extremely dynamic in that some functions have 12 (value-based) overloads,
• d3-arrays: the array manipulating routines from the D3 [4] library, which makes heavy use of higher-order functions as well as value-based overloading,
• tsc-checker, which includes parts of the TS compiler (v1.0.1.0), abbreviated as tsc. We check 15 functions from compiler/core.ts and 14 functions from compiler/checker.ts (for which we needed to import 779 lines of type definitions from compiler/types.ts). These code segments were selected among tens of thousands of lines of code comprising the compiler codebase, because they exemplified interesting properties, like the bit-vector based type hierarchies explained in §4.3.

Results Figure 7 quantitatively summarizes the results of our evaluation. Overall, we had to add about 1 line of annotation per 5 lines of code (529 for 2522 LOC). The vast majority (334/529 or 63%) of the annotations are trivial, i.e. are TS-like types of the form (x:nat)⇒ nat; 20% (104/529) are trivial but have mutability information, and only 17% (91/529) mention refinements, i.e. are definitions like type nat = {v:number|0≤v} or dependent signatures like (a:T[],n:idx)⇒T. These numbers show rsc has annotation overhead comparable to TS, as in 83% of the cases the annotations are either identical to TS annotations or to TS annotations with some mutability modifiers. Of course, in the remaining 17% of the cases, the signatures are more complex than the (non-refined) TS version.

Code Changes We had to modify the source in various small (but important) ways to facilitate verification. The total number of changes is summarized in Figure 8. The trivial changes include the addition of type annotations (accounted for above) and simple transformations to work around the current limitations of our front-end, e.g. converting x++ to x=x+1. The important classes of changes are the following:

• Control-Flow: Some programs had to be restructured to work around rsc’s currently limited support for certain control flow structures (e.g. break). We also modified some loops to use explicit termination conditions.
• Classes and Constructors: As rsc does not yet support default constructor arguments, we changed relevant new calls in Octane to supply them explicitly, and refactored navier-stokes to use traditional OO style classes and constructors instead of JS records with function fields.
• Non-null Checks: In splay we added 5 explicit non-null checks for mutable objects as proving those required precise heap analysis that is outside rsc’s scope.
• Ghost Functions: navier-stokes has more than a hundred (static) array access sites, most of which compute indices via non-linear arithmetic (i.e. via computed indices of the form arr[r*s + c]); SMT support for non-linear integer arithmetic is brittle (and accounts for the anomalous time for navier-stokes). We factored axioms about non-linear arithmetic into ghost functions whose types were proven once via non-linear SMT queries, and
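The ghost-function idea can be sketched roughly as follows, in plain TypeScript. The names `lemmaIndexInBounds`, `cellIndex` and `readCell` are hypothetical, and the run-time assertions are only a stand-in: in rsc the lemma would be a function whose refined type states the non-linear fact and is proven once, so that call sites need no further non-linear reasoning.

```typescript
function isNat(n: number): boolean {
  return Number.isInteger(n) && n >= 0;
}

// Ghost lemma: for naturals with r < rows and c < s, the flattened index
// r*s + c is below rows*s. The body has no computational role.
function lemmaIndexInBounds(rows: number, s: number, r: number, c: number): void {
  if (![rows, s, r, c].every(isNat) || r >= rows || c >= s) {
    throw new RangeError("lemma precondition violated");
  }
  // r <= rows - 1 and c <= s - 1 give r*s + c <= (rows - 1)*s + (s - 1) < rows*s.
  if (!(r * s + c < rows * s)) throw new Error("lemma conclusion failed");
}

// Index computation of the form r*s + c; the lemma call supplies the
// bound that the array access below relies on.
function cellIndex(rows: number, s: number, r: number, c: number): number {
  lemmaIndexInBounds(rows, s, r, c);
  return r * s + c;
}

// Reads cell (r, c) of a row-major grid stored in a flat array.
function readCell(grid: readonly number[], rows: number, s: number, r: number, c: number): number {
  if (grid.length !== rows * s) throw new RangeError("grid has the wrong size");
  return grid[cellIndex(rows, s, r, c)] as number;
}
```

Factoring the fact into one function means the hard query is posed once; each of the many access sites then only needs the lemma's stated conclusion, which is linear in the index.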
function distinct
we build on Immutability Generic Java which encodes object and reference immutability using Java generics [42]. Subsequent work extends these ideas to allow (1) richer ownership patterns for creating immutable cyclic structures [43], (2) unique references, and ways to recover immutability after violating uniqueness, without requiring alias analysis [13].

Reference immutability has recently been combined with rely-guarantee logics (originally used to reason about thread interference), to allow refinement type reasoning. Gordon et al. [14] treat references to shared objects like threads in rely-guarantee logics, and so multiple aliases to an object are allowed only if the guarantee condition of each alias implies the rely condition for all other aliases. Their approach allows refinement types over mutable data, but resolving their proof obligations depends on theorem-proving, which hinders automation. Militão et al. present Rely-Guarantee Protocols [23] that can model complex aliasing interactions, and, compared to Gordon’s work, allow temporary inconsistencies, can recover from shared state via ownership tracking, and resort to more lightweight proving mechanisms. The above extensions are orthogonal to rsc; in the future, it would be interesting to see if they offer practical ways for accounting for (im)mutability in TS programs.

Object Initialization A key challenge in ensuring immutability is accounting for the construction phase where fields are initialized. We limit our attention to lightweight approaches i.e. those that do not require tracking aliases, capabilities or separation logic [11, 31]. Haack and Poll [17] describe a flexible initialization schema that uses secret tokens, known only to stack-local regions, to initialize all members of cyclic structures. Once initialization is complete the tokens are converted to global ones. Their analysis is able to infer the points where new tokens need to be introduced and committed. The Masked Types [26] approach tracks, within the type system, the set of fields that remain to be initialized. X10’s hardhat flow-analysis based approach to initialization [44] and Freedom Before Commitment [32] are the most permissive of the lightweight methods, allowing, unlike rsc, method dispatches or field accesses in constructors.

7. Conclusions and Future Work

We have presented RSC which brings SMT-based modular and extensible analysis to dynamic, imperative, class-based languages by harmoniously integrating several techniques. First, we restrict refinements to immutable variables and fields (cf. X10 [34]). Second, we make mutability parametric (cf. IGJ [42]) and recover path- and flow-sensitivity via SSA. Third, we account for reflection and value overloading via two-phase typing [39]. Finally, our design ensures that we can use liquid type inference [28] to automatically synthesize refinements. Consequently, we have shown how rsc can verify a variety of properties with a modest annotation overhead similar to TS. Finally, our experience points to several avenues for future work, including: (1) more permissive but lightweight techniques for object initialization [44], (2) automatic inference of trivial types via flow analysis [16], (3) verification of security properties, e.g. access-control policies in JS browser extensions [15].

Acknowledgments

We would like to thank our anonymous reviewers and our shepherd Cormac Flanagan for their feedback, and Alexander Bakst for his helpful comments on earlier drafts of this paper. This work was supported by NSF Grants CNS-1223850, CNS-0964702 and gifts from Microsoft Research.

References

[1] C. Anderson, P. Giannini, and S. Drossopoulou. Towards Type Inference for JavaScript. In Proceedings of ECOOP, 2005.
[2] G. M. Bierman, A. D. Gordon, C. Hriţcu, and D. Langworthy. Semantic Subtyping with an SMT Solver. In Proceedings of ICFP, 2010.
[3] G. M. Bierman, M. Abadi, and M. Torgersen. Understanding TypeScript. In Proceedings of ECOOP, 2014.
[4] M. Bostock. http://d3js.org/.
[5] R. Chugh, D. Herman, and R. Jhala. Dependent Types for JavaScript. In Proceedings of OOPSLA, 2012.
[6] R. Chugh, P. M. Rondon, and R. Jhala. Nested Refinements: A Logic for Duck Typing. In Proceedings of POPL, 2012.
[7] Cognitect Labs. https://github.com/cognitect-labs/transducers-js.
[8] A. Feldthaus and A. Møller. Checking Correctness of TypeScript Interfaces for JavaScript Libraries. In Proceedings of OOPSLA, 2014.
[9] C. Flanagan, K. R. M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe, and R. Stata. Extended Static Checking for Java. In Proceedings of PLDI, 2002.
[10] M. Furr, J.-h. D. An, J. S. Foster, and M. Hicks. Static Type Inference for Ruby. In Proceedings of the Symposium on Applied Computing, 2009.
[11] P. Gardner, S. Maffeis, and G. D. Smith. Towards a program logic for JavaScript. In Proceedings of POPL, 2012.
[12] Google Developers. https://developers.google.com/octane/.
[13] C. S. Gordon, M. J. Parkinson, J. Parsons, A. Bromfield, and J. Duffy. Uniqueness and Reference Immutability for Safe Parallelism. In Proceedings of OOPSLA, 2012.
[14] C. S. Gordon, M. D. Ernst, and D. Grossman. Rely-guarantee References for Refinement Types over Aliased Mutable Data. In Proceedings of PLDI, 2013.
[15] A. Guha, M. Fredrikson, B. Livshits, and N. Swamy. Verified Security for Browser Extensions. In Proceedings of the IEEE Symposium on Security and Privacy, 2011.
[16] S. Guo and B. Hackett. Fast and Precise Hybrid Type Inference for JavaScript. In Proceedings of PLDI, 2012.
[17] C. Haack and E. Poll. Type-Based Object Immutability with Flexible Initialization. In Proceedings of ECOOP, 2009.
[18] A. Igarashi, B. C. Pierce, and P. Wadler. Featherweight Java: A Minimal Core Calculus for Java and GJ. ACM Trans. Program. Lang. Syst., May 2001.
[19] K. Knowles and C. Flanagan. Hybrid Type Checking. ACM Trans. Program. Lang. Syst., Feb. 2010.
[20] K. Knowles and C. Flanagan. Compositional Reasoning and Decidable Checking for Dependent Contract Types. In Proceedings of PLPV, 2008.
[21] B. S. Lerner, J. G. Politz, A. Guha, and S. Krishnamurthi. TeJaS: Retrofitting Type Systems for JavaScript. In Proceedings of DLS, 2013.
[22] Microsoft Corporation. TypeScript v1.4. http://www.typescriptlang.org/.
[23] F. Militão, J. Aldrich, and L. Caires. Rely-Guarantee Protocols. In Proceedings of ECOOP, 2014.
[24] G. Nelson. Techniques for Program Verification. Technical Report CSL81-10, Xerox Palo Alto Research Center, 1981.
[25] N. Nystrom, V. Saraswat, J. Palsberg, and C. Grothoff. Constrained Types for Object-oriented Languages. In Proceedings of OOPSLA, 2008.
[26] X. Qi and A. C. Myers. Masked Types for Sound Object Initialization. In Proceedings of POPL, 2009.
[27] A. Rastogi, N. Swamy, C. Fournet, G. Bierman, and P. Vekris. Safe & Efficient Gradual Typing for TypeScript. In Proceedings of POPL, 2015.
[28] P. M. Rondon, M. Kawaguchi, and R. Jhala. Liquid Types. In Proceedings of PLDI, 2008.
[29] J. Rushby, S. Owre, and N. Shankar. Subtypes for Specifications: Predicate Subtyping in PVS. IEEE TSE, 1998.
[30] E. L. Seidel, N. Vazou, and R. Jhala. Type Targeted Testing. In Proceedings of ESOP, 2015.
[31] F. Smith, D. Walker, and G. Morrisett. Alias Types. In Proceedings of ESOP, 2000.
[32] A. J. Summers and P. Mueller. Freedom Before Commitment: A Lightweight Type System for Object Initialisation. In Proceedings of OOPSLA, 2011.
[33] N. Swamy, J. Chen, C. Fournet, P.-Y. Strub, K. Bhargavan, and J. Yang. Secure Distributed Programming with Value-dependent Types. In Proceedings of ICFP, 2011.
[34] O. Tardieu, N. Nystrom, I. Peshansky, and V. Saraswat. Constrained Kinds. In Proceedings of OOPSLA, 2012.
[35] P. Thiemann. Towards a Type System for Analyzing JavaScript Programs. In Proceedings of ESOP, 2005.
[36] S. Tobin-Hochstadt and M. Felleisen. Logical Types for Untyped Languages. In Proceedings of ICFP, 2010.
[37] N. Vazou, E. L. Seidel, R. Jhala, D. Vytiniotis, and S. Peyton-Jones. Refinement Types for Haskell. In Proceedings of ICFP, 2014.
[38] P. Vekris, B. Cosman, and R. Jhala. Refinement Types for TypeScript (Extended version). http://arxiv.org/abs/1604.02480.
[39] P. Vekris, B. Cosman, and R. Jhala. Trust, but Verify: Two-Phase Typing for Dynamic Languages. In Proceedings of ECOOP, 2015.
[40] H. Xi and F. Pfenning. Dependent Types in Practical Programming. In Proceedings of POPL, 1999.
[41] B. Yankov. http://definitelytyped.org.
[42] Y. Zibin, A. Potanin, M. Ali, S. Artzi, A. Kiezun, and M. D. Ernst. Object and Reference Immutability Using Java Generics. In Proceedings of ESEC/FSE, 2007.
[43] Y. Zibin, A. Potanin, P. Li, M. Ali, and M. D. Ernst. Ownership and Immutability in Generic Java. In Proceedings of OOPSLA, 2010.
[44] Y. Zibin, D. Cunningham, I. Peshansky, and V. Saraswat. Object Initialization in X10. In Proceedings of ECOOP, 2012.