ac tif t r Comp * * let A nt e e * A t s W i E * s e n l C l o

I D

C

o *

D

*

c

u

L e

m s

E

u P

e

e

n

R

t

v

e

o

d

Occurrence Typing Modulo Theories t

* y

* s

E a

a

l d u e a t

Andrew M. Kent David Kempe Sam Tobin-Hochstadt Indiana University Bloomington, IN, USA &M/KF2Mi-/F2KT2-bKi?'!BM/BMX2/m

Abstract 1. Introduction We present a new combining occurrence typ- Applying a static type discipline to an existing code base ing—a technique previously used to type check programs in written in a dynamically-typed language such as JavaScript, dynamically-typed languages such as Racket, Clojure, and Python, or Racket requires a type system tailored to the id- JavaScript—with dependent refinement types. We demon- ioms of the language. When building gradually–typed sys- strate that the addition of refinement types allows the inte- tems, designers have focused their attention on type systems gration of arbitrary solver-backed reasoning about logical with relatively simple goals, e.g. ruling out dynamic type er- propositions from external theories. By building on occur- rors such as including a string in an arithmetic computation. rence typing, we can add our enriched type system as a natu- These systems—ranging from widely-adopted industrial ef- ral extension of Typed Racket, reusing its core while increas- forts such as TypeScript [6], Hack [16], and Flow [15] to ing its expressiveness. The result is a well-tested type system more academic systems such as Typed Racket [25], Typed with a conservative, decidable core in which types may de- Clojure [2], Reticulated Python [29], and Gradualtalk [1]— pend on a small but extensible set of program terms. have been successful in this narrow goal. In addition to describing our design, we present the fol- However, advanced type systems can express more pow- lowing: a formal model and proof of correctness; a strategy erful properties and check more significant invariants than for integrating new theories, with specific examples includ- merely the absence of dynamic type errors. Refinement and ing linear arithmetic and bitvectors; and an evaluation in the dependent types, as well as sophisticated encodings in the context of the full Typed Racket implementation. Specifi- type systems of languages such as Haskell and ML [11, 30], cally, we take safe vector operations as a case study, exam- allow programmers to capture more precise correctness crite- ining all vector accesses in a 56,000 line corpus of Typed ria for their programs such as those for balanced binary trees, Racket programs. Our system is able to prove that 50% of sized vectors, and much more. these are safe with no new annotations, and with a few an- In this paper, we combine these two strands of research, notations and modifications we capture more than 70%. producing a system we dub Refinement Typed Racket, or RTR. RTR follows in the tradition of Dependent ML [32] and Categories and Subject Descriptors F.3.1 [Specifying and Liquid Haskell [27] by supporting dependent and refinement Verifying and Reasoning about Programs] types over a limited but extensible expression language. Ex- perience with these languages has already demonstrated that arXiv:1511.07033v2 [cs.PL] 4 Oct 2016 General Terms Languages, Design, Verification expressive and rich program properties can be captured by a fully-decidable type system. Keywords Refinement types, occurrence typing, Racket Furthermore, by building on the existing framework of occurrence typing, refinement types prove to be a natural ad- dition to the Typed Racket implementation, formal model, and soundness results. Occurrence typing is designed to rea- son about dynamic type tests and control flow in existing Permission to make digital or hard copies of part or all of this work for personal or Permissionclassroom use to makeis granted digital without or hard fee copies provided ofall that or copies part of are this not work made for or personaldistributed or Racket programs, using propositions about the types of terms classroomfor profit or use commercial is granted advantage without fee and provided that copies that bear copies this are notice not madeand the or full distributed citation and simple rules of logical inference. Extending this logic foron profitthe first or commercial page. Copyrights advantage for andcomponents that copies of bear this this work notice owned and theby fullothers citation than onACM the firstmust page. be honored. Copyrights Abstracting for components with credit of this is workpermitted. owned To by copy others otherwise, than ACM to to encompass refinements of types as well as propositions mustrepublish, be honored. to post Abstracting on servers, with or to credit redistribute is permitted. to lists, To copy contact otherwise, the Owner/Author. or republish, drawn from solver-backed theories produces an expressive toRequest post on permissions servers or tofrom redistribute [email protected] to lists, requires or Publications prior specific Dept., permission ACM, and/orInc., fax a fee. Request permissions from [email protected]. +1 (212) 869-0481. Copyright held by Owner/Author. Publication Rights Licensed to system which scales to real programs. In this paper, we show CopyrightACM. is held by the owner/author(s). Publication rights licensed to ACM. examples drawn from the theory of linear inequalities and the PLDI’16PLDI ’16,, JuneJune 13–17, 13 - 17, 2016,2016, Santa Barbara, Barbara, CA, CA, USA USA ACM.Copyright 978-1-4503-4261-2/16/06...$15.00 © ACM 978-1-4503-4261-2/16/06…$15.00 theory of bitvectors. http://dx.doi.org/10.1145/2908080.2908091DOI: http://dx.doi.org/10.1145/2908080.2908091

296 (: max : [x : Int] [y : Int] The remainder of this paper is structured as follows: sec- -> [z : Int #:where (∧ (≥ z x) (≥ z y))]) tion 2 reviews the basics of occurrence typing and introduces (define (max x y) (if (> x y) x y)) dependency and refinement types via examples; section 3 formally presents the details of our type system, as well as Figure 1. max with refinement types soundness results for the calculus; section 4 describes the challenges of scaling the calculus to our implementation in Typed Racket; section 5 contains the results of our empirical Figure 1 presents a simple demonstration of integrating evaluation of the effectiveness of using refinement types for refinement types with linear arithmetic. The max function vector bounds verification; section 6 discusses related work; takes two integers and returns the larger, but instead of de- and section 7 concludes. scribing it as a simple binary operator on values of type Int, as the current Typed Racket implementation specifies, we 2. Beyond Occurrence Typing give a more precise type describing the behavior of max. The syntax for function types in RTR allows for ex- Occurrence typing—an approach whereby different occur- plicit dependencies between the domain and range by giv- rences of the same identifier may be type checked at different ing names to arguments which are in scope in any log- types throughout a program—is the strategy Typed Racket ical refinements. For the range in this example we use uses to type check idiomatic Racket code [25, 26]. To illus- [z : Int #:where (∧ (≥ z x) (≥ z y))] as syn- trate, consider a Typed Racket function which accepts either tactic sugar for a logical refinement on the base type Int: an integer or a list of bits as input and returns the least sig- (Refine [z : Int] (∧ (≥ z x) (≥ z y))). Note nificant bit: that the max function definition does not require any changes (: least-significant-bit : to accommodate the stronger type, nor do clients of max (U Int (Listof Bit)) -> Bit) need to care that the type provides more guarantees than (define (least-significant-bit n) before; the conditional in the body of max enables the use (if (int? n) of the refinement type in the result, as in most refinement (if (even? n) 0 1) type systems. Typed Racket’s pre-existing ability to reason (last n))) about conditionals means that abstraction and combination Type checking the function body begins with the assump- of conditional tests properly works in RTR without requiring tion that n is of type (U Int (Listof Bit)) (an ad hoc anything more from solvers for various theories. union containing Int and (Listof Bit)). Typed Racket With these features we enable programmers to enforce then verifies the test-expression(int? n) is well typed and new invariants in existing Typed Racket code and thus eval- checks the remaining branches of the program with the fol- uate the effectiveness of our type checker on real-world lowing insights: programs. As evidence, we have implemented our system in Typed Racket, including support for linear arithmetic 1. In the then branch we know (int? n) produced a non- and provably-safe vector access, and automatically analyzed #f value, thus n must have type Int. The type system can three large libraries totalling more than 56,000 lines of code. then verify (if (even? n) 0 1) is well-typed at Bit. We determined that approximately 50% of the vector ac- 2. In the else branch we know (int? n) produced #f, im- cesses are provably safe with no code changes. We then ex- plying n is not of type Int. This fact, combined with our amined one of the libraries in detail, finding that of the 75% previous knowledge of the type of n, tells us n must have that were not automatically proved safe, an additional 47% type (Listof Bit). Thus (last n) is also well typed could be verified by adding type annotations and minor code and of type Bit. modifications. Our primary contributions are as follows: This strategy of gleaning typed-based information from tests in control flow statements is an essential part of oc- 1. We present the design of a novel and sound integration currence typing. Instead of only describing the expression between occurrence typing and refinement types drawn (int? n) as being of type Boolean, from arbitrary logical theories. (int? n) : B 2. We describe how to scale our design to a realistic imple- $ mentation (i.e. Typed Racket). RTR adds additional type-based logical information: 3. We validate our design by using our implementation to (int? n) : pB ; n I n I ; ✁0q verify the majority of vector accesses in a large Racket $ P | R library.1 The first additional element, n I, states ‘if the result is P non-#f, then n is an integer.’ We dub this statement the ‘then’ 1 Our accompanying artifact is available at the following url: https:// proposition, since it is what holds in the first branch of a github.com/andmkent/pldi16-artifact conditional using (int? n) as the test. The second element,

297 n I, tells us ‘if the result is #f, then n is not an integer.’ Because there is no explicit knowledge about the length of R This we dub the ‘else’ proposition, since it is what holds in B, our attempt verify one of the indices in safe-dot-prod the second branch of a conditional using (int? n) as the will not type check: test. The final element, ✁0, tells us the expression is not a term hvT2 *?2+F2` 2``Q` BM (safe-vec-ref B i) that can be lifted into types. `;mK2Mi k- 2tT2+i2/, This additional typed-based logical information and the (Refine [i : Int] (∧ (≤ 0 i) (< i (len B)))) usual logical connectives (conjunction, disjunction, etc.) are #mi ;Bp2M, Int already an integral part of Typed Racket’s well-tested type system. By enriching and extending this foundation—adding In order to type check safe-dot-prod, the types for the extensible refinement types and new approaches for reason- domain must either be enriched to include the assumption ing about aliasing and identifiers going out of scope—we can that the vectors are of equal length, or a dynamic check provide a more expressive, theory-extensible system capable must be added which verifies the assumption at runtime. of verifying a wider array of practical program invariants. Also note that without carefully examining the use sites of this function it is difficult to know which solution would 2.1 Occurrence Typing with Linear Arithmetic be ideal—demanding clients statically verify the property at Consider how a standard vector access function vec-ref every call may be an unreasonable requirement. Fortunately might be implemented in a relatively simply-typed language a middle ground can be achieved by allowing for both: (e.g. standard Typed Racket). In order to ensure we only access valid indices of the vector, our function must conduct (: dot-prod : a runtime check before performing the raw, unsafe memory (Vecof Int) (Vecof Int) -> Int) access at the user-specified index: (define (dot-prod A B) (unless (= (len A) (len B)) (: vec-ref : (∀ {A} (Vecof A) Int -> A)) (error "invalid vector lengths!")) (define (vec-ref v i) (safe-dot-prod A B)) (if (≤ 0 i (sub1 (len v))) (unsafe-vec-ref v i) Legacy code and clients who cannot easily verify their (error "invalid vector index!"))) vectors’ lengths may continue to call dot-prod while clients Although the type for vec-ref prevents some runtime wishing to statically eliminate this error may call a safe ver- errors, invalid indices remain a potential problem. In order sion which uses a stronger type. to eliminate these, we can extend our new system to con- Safe vector access is a simple example of the program sider propositions from the theory of linear integer arithmetic invariants expressible with occurrence typing extended with (with a simple implementation of Fourier-Motzkin elimina- the theory of linear integer arithmetic—we have chosen it tion [7] as a lightweight solver). This allows us to give ≤ a for thorough examination because it relates directly to our dependent function type where the truth-value of the result sizable case study on existing Typed Racket code. Xi [31], reports the intuitively implied linear inequality. We can then however, demonstrates at length in the presentation of De- design a safe function safe-vec-ref: pendent ML how the invariants of far richer programs, such as balanced red-black trees and simple type-preserving eval- (: safe-vec-ref : uators, can be expressed using this same class of refinements. (∀ {A} [v : (Vecof A)] [i : Int #:where (∧ (≤ 0 i) (< i (len v)))] 2.2 Occurrence Typing with Bitvectors -> [res : A])) Linear arithmetic, however, is merely one example of ex- (define safe-vec-ref unsafe-vec-ref) tending RTR with an external theory. To illustrate, we ad- Now the type guarantees only provably valid indices are ditionally experimented by adding the theory of bitvectors used. While replacing all occurrences of vec-ref with to RTR. By leveraging Z3’s bitvector reasoning [8] we were xtime safe-vec-ref in a program may seem desirable, such a able to type check the helper function found in many change would likely result is programs that no longer type implementations of AES [19] encryption. This function com- putes the result of multiplying the elements of the field F28 by check! One reason for this is the validity of an index is not 8 4 3 always apparent at the actual use site. For example, consider x (i.e. polynomials of the form F2[x]/(x +x +x +x+1), a standard vector dot product function: which AES conveniently represents using a byte): (: safe-dot-prod : (: xtime : Byte -> Byte) (Vecof Int) (Vecof Int) -> Int) (define (xtime num) (define (safe-dot-prod A B) (let ([n (AND (* #x02 num) #xff)]) (for/sum ([i (in-range (len A))]) (cond (* (safe-vec-ref A i) [(= #x00 (AND num #x80)) n] (safe-vec-ref B i)))) [else (XOR n #x1b)])))

298 In this example the type Byte is a shorthand for the type n ::= ... 2 1 0 1 2 ... Integers ´ |´ | | | (Refine [b : BitVector] (≤ #x00 b #xff)). To p ::= MQi //R BMi? ... Primitive Ops | | | verify this program, we enrich the types of the relevant bit- e ::= Expressions wise operations (e.g. =, AND, etc.) to include propositions and x variable | refinements relating the values to bitvector-theoretic state- n i`m2 7Hb2 p base values | | | | ments and add bitvector literals to the set of terms which λx:τ.e (ee) abstraction, application | | may be lifted to the type level. Adding the theory of bitvec- (if eee) conditional | tors and verifying this program proved to be a relatively (let (xe) e) local binding | straightforward process; in subsection 3.4 we discuss in de- (cons ee) pair construction | tail our general strategy for adding new theories to RTR. (fst e) (snd e) field access | | v ::= Values 3. Formal Model n i`m2 7Hb2 p base values | | | | v, v [ρ,λx:τ.e] pair, closure Our base system λRT R is a natural extension of Typed |x y| τ,σ ::= Types Racket’s previous formal model, λTR [26]; new language universal type forms and judgments are highlighted. |J I T F τ τ basic types The typing judgment for λRT R resembles a standard typ- | | | | ˆ ( ⃗τ ) ad-hoc union type ing judgment except that instead of assigning types, it assigns | x:τ R function type type-results to well-typed expressions: | Ť Ñ x:τ ψ refinement type Γ e : pτ ; ψ+ ψ ; oq |t | u $ | ´ ψ ::= Propositions This judgment states that in environment Γ tt ff trivial/absurd prop | | o τ o τ o is/is not of type τ • | P | R e has type τ; ψ ψ ψ ψ compound props | ^ | _ • if e evaluates to a non-7Hb2 (i.e. treated as true) value, o o object aliasing ‘then proposition’ ψ holds; | ” + χT prop from theory | T • if e evaluates to 7Hb2, ‘else proposition’ ψ holds; ϕ ::= 7bi bM/ Fields ´ | • e’s value corresponds to the symbolic object o. o ::= Symbolic Objects ✁0 null object | 3.1 Syntax x variable reference | (ϕo) object field reference The syntax of terms, types, propositions, and other forms are | o, o object pair given in Figure 2. |x y Expressions. λRT R uses a standard set of expressions R ::= Type-Results with explicit pair operations for simplicity (so our presen- pτ ; ψ ψ ; oq type-result | | tation may omit polymorphism). x:τ.R existential type-result |D Types. The universal ‘top’ type is the type which de- Γ::= ÝÑψ Environments scribes all well typed terms. I is the typeJ of integers, while T ρ ::= ÝÝÝx Ñv Runtime Environments and F are the types of the boolean values i`m2 and 7Hb2. Pair ÞÑ Figure 2. λ Syntax types are written τ σ. ( ÝÑτ ) describes a ‘true’ (i.e. un- RT R tagged) union of its ˆcomponents. For convenience we write the boolean type ( TF) asŤB and the uninhabited ‘bottom’ type ( ) as . Function types consist of a named argument solver. In this way our logic describes an extensible system x, a domain Ktype Ťτ, and range type-result R in which x is that can be enriched with various theories according to the bound.Ť x:τ ψ is a standard refinement type, describing needs of the application at hand. any valuet x of| typeu τ for which proposition ψ holds. Fields. A field allows us to reference a subcomponent of a Propositions. At our system’s core is a propositional logic structural value. For example, if p is a tree-like structure built with domain specific features. tt and ff are the trivial and using nested pairs, (7bi (bM/ p)) would describe the value absurd propositions, while and represent the conjunc- found by accessing the first field of the result of accessing ^ _ tion and disjunction of propositions. Type information is ex- p’s second field. In this model having the 7bi and bM/ fields pressed by propositions of the form o τ or o τ, which for pairs suffices; in general, fields for both built-in and user- state that symbolic object o is or is not ofP type τ respectively.R defined data types are needed in order to type check real- o o describes an ‘alias’ between symbolic objects, stat- world programs. Our vector case study, for example, required 1 ” 2 ing that the object o1 points to the same value as o2. Finally, a H2M field which described a vector’s length. an atomic propositions of the form χT represents a statement Symbolic Objects. Instead of allowing our types to de- from a theory for which λ has been provided a sound pend on arbitrary program terms (as is done in systems with T RT R

299 full dependent types), we define a canonical subset of terms ∆(MQi)=x: pB ; x F x F ; ✁0q JÑ P | R called symbolic objects which represent the terms which may ∆(//R)=x:I pI ; tt ff ; ✁0q Ñ | be lifted to the type level in our system. These objects act as ∆(BMi?) = x: pB ; x I x I ; ✁0q JÑ P | R a conservative ‘whitelist’ of sorts, allowing our type system ∆(#QQH?) = x: pB ; x B x B ; ✁0q JÑ P | R to work in a full-scale programming language by only con- ∆(TB`?) = x: pB ; x x ; ✁0q sidering obviously safe terms (i.e. excluding mutated fields, JÑ PJˆJ| RJˆJ potentially non-deterministic functions, etc.). Figure 3. Primitive Types Initially these objects are only used to describe values bound to variables, field accesses, and pairs of objects, while the null symbolic object ✁0 represents terms we do not lift to metafunction described in Figure 3 for primitive operators. the type level. These objects (excluding pairs) are what al- Note that the then- and else-propositions are consistent with lows standard Typed Racket to type check many dynamic their being 7Hb2 or non-7Hb2. Additionally, by default none programming idioms. When extending RTR to handle addi- of these terms will appear in types and propositions, as sig- tional theories, the grammar of symbolic objects is extended nified by the null symbolic object ✁0. to include program terms the new theory must reason about. T-Var may assign any type τ to variable x so long as Finally, when performing standard capture-avoiding sub- the system can derive Γ x τ. The then- and else- stitution we keep symbolic objects in the obvious normal propositions reflect the self$ evidentP fact that if x is found to form (e.g. (7bi x, y ) is reduced to x). Propositions that end x y equal 7Hb2 then x is of type F, otherwise x is not of type up directly referring to ✁0, such as ✁0 I, are treated as equiv- P F. The symbolic object informs the type system that this alent to tt (i.e. meaningless) and are discarded. expression corresponds to the program term x. Type-Results. In order to allow our system to easily rea- T-Abs, the rule for checking lambda abstractions, checks son about more than the just the simple type τ of an expres- the body of the abstraction in the extended environment sion, we assign a well typed expression a type-result. In addi- which maps x to τ. We use the standard convention of choos- tion to describing an expression’s type, a type-result further ing fresh names not currently bound when extending Γ with informs the system by explicitly capturing two additional new bindings. The type-result from checking the body is properties: (1) what is learned when the expression’s value is then used as the range for the function type, and the then- used as the test-expression in a conditional—this is described and else-propositions report the non-falseness of the value. by the pair of propositions ψ+ ψ in the type-result—and (2) T-App handles function application, first checking that e | ´ 1 which symbolic object o the expression’s value corresponds and e2 are well-typed individually and then ensuring the type to. of e2 is a subtype of the domain of e1. The overall type-result Existentially quantified type-results allow types to depend of the application is the range of the function, R, with the on terms with no in-scope symbolic object. Our usage of symbolic object of the operand, o2, lifted and optionally sub- existential quantification resembles the technique introduced stituted for x. This lifting substitution is defined as follows: by Knowles and Flanagan [17] in many ways, except that τ our usage is restricted to when substitution is simply not R[x ✁0] = x:τ.R úùñτ D possible (i.e. when the variable’s assigned expression has a R[x o]=R[x o] null object). úùñ ÞÑ Environments. For simplicity in this model we use an In essence, if the operand corresponds to a value our type environment built entirely of propositions. In a real imple- system can reason directly about (i.e. its object is non-null), mentation it is useful to separate the environment into two we perform capture avoiding substitution as expected. Oth- portions: a traditional mapping of variables to types along erwise, an existential quantifier à la Knowles and Flanagan with a set of currently known propositions. The latter can [17] is used to capture the argument expression’s precise then be used to refine the former during type checking. type, even though it’s exact identity is unknown; this enables Runtime Environments. Our runtime environments are the function’s range to depend on its argument regardless of standard mappings of variables to closed runtime values, whether the term can soundly be lifted to the type level. appearing in closures and our big-step reduction semantics. T-If is used for conditionals, describing the important pro- cess by which information learned from test-expressions is 3.2 Typing Rules projected into the respective branches. After ensuring e1 is The typing judgment is defined in Figure 4 and an executable well-typed at some type, we make note of the then- and PLT Redex [9] model is included in our accompanying arti- else-propositions ψ1+ and ψ1 . We then extend the envi- fact. The individual rules are those previously used by Typed ronment with the appropriate ´proposition, dependent upon Racket with only a few minor modifications to incorporate which branch we are checking: ψ1+ is assumed while check- our new forms (i.e. existential type-results and aliases). ing the then-branch and ψ1 for the else-branch. The type T-Int, T-True, T-False, and T-Prim are used for type check- result of a conditional is simply´ the type result implied by ing the respective base values, with T-Prim consulting the ∆ both branches.

300 T-Int T-True T-False T-Prim Γ n : pI ; tt ff ; ✁0q Γ i`m2 : pT ; tt ff ; ✁0q Γ 7Hb2 : pF ; ff tt ; ✁0q Γ p : p∆(p); tt ff ; ✁0q $ | $ | $ | $ | T-Var T-Abs T-Subsume Γ x τ Γ,x τ e : R Γ e : R1 Γ R1 : R $ P P $ $ $ ă Γ x : pτ ; x F x F ; xq Γ λx:τ.e : px:τ R ; tt ff ; ✁0q Γ e : R $ R | P $ Ñ | $ T-If T-Let Γ e1 : p ; ψ1+ ψ1 ; ✁0q Γ e1 : pτ1 ; ψ1+ ψ1 ; o1q T-App $ J | ´ $ | ´ Γ,ψ1+ e2 : R ψx =(x F ψx+) (x F ψx ) Γ e1 : px:τ R ; tt tt ; ✁0q $ R ^ _ P ^ ´ $ Ñ | Γ,ψ1 e3 : R Γ,x τ,x o1,ψx e : R2 Γ e2 : pσ ; tt tt ; o2q Γ σ : τ ´ $ P ” $ $ | $ ă τ1 σ Γ (if e1 e2 e3):R Γ (let (xe ) e ):R [x o ] Γ (e1 e2):R[x o2] $ $ 1 2 2 úùùñ 1 $ úùñ T-Cons Γ e1 : pτ1 ; tt tt ; o1q T-Fst T-Snd $ | Γ e2 : pτ2 ; tt tt ; o2q Γ e : pτ1 τ2 ; tt tt ; oq Γ e : pτ1 τ2 ; tt tt ; oq $ | $ ˆ | $ ˆ | R = pτ1 τ2 ; tt ff ; x1,x2 q R = pτ1 ; tt tt ;(7bi x)q R = pτ2 ; tt tt ;(bM/ x)q ˆ | τ x y τ | τ | τ Γ (cons e e ):R[x 1 o ][x 2 o ] Γ (fst e):R[x 1 o] Γ (snd e):R[x 2 o] $ 1 2 1 úùùñ 1 2 úùùñ 2 $ úùùñ $ úùùñ Figure 4. Typing Judgment

T-Let first checks whether the expression e1 being bound main and range; in order to reason correctly about dependen- to x is well typed. When checking the body, the environment cies when checking the range, the environment is extended is extended with the type for x, a proposition describing x’s to assign x the more specific domain type. Pair subtyping is then- and else- propositions, and an alias stating that x refers standard. to o1 (i.e. the symbolic object of e1). Since x is unbound For refinement types we have three rules: S-Weaken states outside the body, we perform a lifting substitution of o1 for if τ is a subtype of σ in Γ then so is any refinement of τ;S- x on the result as we do with function application. Refine1 and S-Refine2 allow subtyping inquiries about re- In order to omit polymorphism we use explicit pair in- finements to be translated into their equivalent logical in- troduction and elimination rules. T-Cons introduces pairs, quiries. first checking the types and symbolic objects for e1 and e2. The subtyping relation for type-results relies on subtyping The type-result then includes the product of these individ- for the type and object, and logical implication for the then- ual types, propositions reflecting the non-7Hb2 nature of the and else-propositions. Since existentially quantified type- value, and a symbolic pair object (all modulo the two lift- results are only used as a tool for type checking, there is only ing substitutions). Pair elimination forms are checked with one explicit subtyping rule for them: SR-Exists. This rule re- T-Fst and T-Snd, which ensure their arguments are indeed a sembles the standard existential instantiation rule from first pair before returning the expected type and a symbolic object order logic, stating an existentially quantified type-result is a describing which field was accessed. subtype of another type result if the subtyping relation holds in the appropriately extended environment. 3.3 Subtyping and Proof System The subtyping and proof system use a combination of famil- 3.3.2 Proof System iar rules from type theory and formal logic. Figure 6 describes the type-specific portion of the proposi- tional logic for λRT R. We omit the introduction and elim- 3.3.1 Subtyping ination rules for forms from propositional logic, since they Figure 5 describes the subtyping relation : for types, sym- are identical to those used by λTR [26] (i.e. resembling those bolic objects, and type-results. ă found in any natural deduction system). For objects, the null object ✁0 is the top object and objects L-Sub says an object o is of type τ when it is a known are sub-objects of any alias-equivalent object. Pair objects subtype of τ. L-Not conversely lets us prove object o is not of are sub-objects in a pointwise fashion. type τ when assuming the opposite implies a contradiction. All types are subtypes of themselves and of the top type L-Bot serves as an ‘ex falso quodlibet’ of sorts, allowing us . A type is a subtype of a union if it is a subtype of any to draw any conclusion since is uninhabited. elementJ of the union. Unions are only subtypes of a type if Object aliasing allows us Kto reason about the statically every member of the union is a subtype of that type. Function known equivalences classes of symbolic objects. L-Refl and subtyping has the standard contra- and co-variance in the do- L-Sym provide reflexivity and symmetry for aliasing, while

301 SO-Equiv S-Union1 S-Union2 SO-Null S-Refl S-Top Γ o1 o2 τ in ⃗τ . Γ τ : σ σ in ⃗σ . Γ τ : σ $ ” Γ o : ✁0 Γ τ : τ Γ τ : @ $ ă D $ ă Γ o1 : o2 $ ă $ ă $ ă J Γ ( ⃗τ ) : σ Γ τ :( ⃗σ ) $ ă $ ă $ ă ď ď SO-Pair S-Pair S-Refine2 Γ o1 : o3 Γ τ1 : τ2 S-Weaken S-Refine1 Γ τ : σ Γ $ o ă: o Γ $ σ ă: σ Γ τ : σ Γ,x τ,ψ x σ Γ,x$ τă ψ $ 2 ă 4 $ 1 ă 2 $ ă P $ P P $ Γ o ,o : o ,o Γ τ σ : τ σ Γ x:τ ψ : σ Γ x:τ ψ : σ Γ τ : x:σ ψ $x 1 2yă x 3 4y $ 1 ˆ 1 ă 2 ˆ 2 $t | uă $t | uă $ ă t | u S-Fun SR-Result Γ τ : τ Γ τ : τ Γ,ψ ψ SR-Exists $ 2 ă 1 $ 1 ă 2 1+ $ 2+ Γ,x τ2 R1 : R2 Γ o1 : o2 Γ,ψ1 ψ2 Γ,x τ R1 : R2 P $ ă $ ă ´ $ ´ P $ ă Γ x:τ1 R1 : x:τ2 R2 Γ pτ1 ; ψ1+ ψ1 ; o1q : pτ2 ; ψ2+ ψ2 ; o2q Γ x:τ.R1 : R2 $ Ñ ă Ñ $ | ´ ă | ´ $D ă Figure 5. Subtyping

L-Sym L-Sub L-Not L-Bot L-Refl Γ o σ Γ σ : τ Γ,o τ ff Γ o Γ o2 o1 $ P $ ă P $ $ PK Γ o o $ ” Γ o τ Γ o τ Γ ψ $ ” Γ o o $ P $ R $ $ 1 ” 2 L-Update+ L-Update– L-Transport Γ o τ Γ (⃗ϕ o ) σ Γ o τ Γ (⃗ϕ o ) σ Γ ψ(o1)Γo1 o2 $ P $+ P $ P $ R $ $ ” Γ o mT/i2 (τ,⃗ϕ,σ) Γ o mT/i2´(τ,⃗ϕ,σ) Γ ψ(o2) $ P Γ $ P Γ $ L-RefI L-Theory L-TypeFork L-ObjFork Γ o τ L-RefE $ P [[Γ]] χT Γ o1,o2 τ1 τ2 Γ o1,o2 o3,o4 Γ ψ[x o] Γ o x:τ ψ T , $x yP ˆ $x y”x y $ ÞÑ $ Pt | u Γ χT Γ o1 τ1 o2 τ2 Γ o1 o3 o2 o4 Γ o x:τ ψ Γ ψ[x o] $ $ P ^ P $ ” ^ ” $ Pt | u $ ÞÑ Figure 6. RTR-specific Logic Rules

L-Transport allows us replace alias-equivalent objects in any derivable proposition (giving us transitivity). L-ObjFork and mT/i2˘(τ τ ,⃗ϕ::7bi,σ)=mT/i2˘(τ ,⃗ϕ,σ) τ Γ 1 ˆ 2 Γ 1 ˆ 2 L-TypeFork provide a means for reducing claims about ob- mT/i2Γ˘(τ1 τ2,⃗ϕ::bM/,σ)=τ1 mT/i2Γ˘(τ2,⃗ϕ,σ) + ˆ ˆ ject pairs to be claims about their fields. mT/i2Γ (τ,ϵ,σ) = `2bi`B+iΓ(τ,σ) L-Update+ and L-Update– play a key role in our system, mT/i2Γ´(τ,ϵ,σ) = `2KQp2Γ(τ,σ) allowing positive and negative type statements to refine the mT/i2˘(( ⃗τ ),⃗ϕ,σ) =( mT/i2ÝÝÝÝÝÝÝÝÝÝݢ(τ,⃗ϕ,Ñσ)) known types of objects. Roughly speaking, if we know an Γ Γ object o is of type τ, updating some field (ϕn (... (ϕ0 o))) Ť Ť `2bi`B+iΓ(τ,σ) = if τ σ = within the object (abbreviated (⃗ϕ o )) with additional infor- K X H `2bi`B+i (( ⃗τ ),σ)=(ÝÝÝÝÝÝÝÝÝ`2bi`B+i (τ,Ñσ)) mation computes the following: if we know (⃗ϕ o ) σ—that Γ Γ P `2bi`B+iΓ( x:τ ψ ,σ)= x:`2bi`B+iΓ(τ,σ) ψ the field is of type σ—we update that field’s type τ 1 to be ap- tŤ | u tŤ | u `2bi`B+iΓ(τ,σ) = τ if Γ τ : σ proximately τ 1 σ (i.e. a conservative ‘intersection’ of the $ ă X `2bi`B+iΓ(τ,σ) = σ otherwise two types); conversely, updating a field’s type τ 1 with the knowledge that the field is not σ updates the field to be ap- `2KQp2Γ(τ,σ) = if Γ τ : σ proximately τ 1 σ (i.e. the ‘difference’ between the two). A K $ ă ´ (( ⃗τ ),σ)=( (τ,σ)) full definition of mT/i2 is given in Figure 7. `2KQp2Γ ÝÝÝÝÝÝÝÝÝ`2KQp2Γ Ñ `2KQp2Γ( x:τ ψ ,σ)= x:`2KQp2Γ(τ,σ) ψ L-RefI and L-RefE construct and eliminate refinement tŤ | u tŤ | u types in the expected ways, essentially saying that the propo- `2KQp2Γ(τ,σ) = τ otherwise sition o x:τ ψ is equivalent to the compound proposition Figure 7. Update metafunction o τ Ptψ[x | ou]. P ^ ÞÑ

302 Finally, a proposition χT from theory is derived using 3.5.1 Models L-Theory. This rule consults a solver for Ttheory with the T Because our formalism is described as a type-theory aware relevant knowledge from Γ. logic, it is convenient to examine its soundness using a model-theoretic approach commonly used in proof theory. 3.4 Integrating Additional Theories For λRT R a model is any runtime-value environment ρ and Our system is designed in an extensible fashion, allowing is said to satisfy a proposition ψ (written ρ ψ) when its as- an arbitrary external theory to be added so long as a theory- signment of values to the free variables of ψ(make the propo- specific solver is provided. To illustrate, we discuss the linear sition a tautology. The details of satisfaction are defined in arithmetic extension we implemented in Typed Racket in Figure 8. The satisfaction relation extends to environments order to perform our vector-related case study. in a pointwise manner. To add this theory, we first must identify the canonical set In order to complete our definition of satisfaction, we also of program terms which appear in the theory’s sentences. For require a typing rule for closures: our case study this included integer arithmetic expressions of the form a x + a x + ... + a x (i.e. linear combinations T-Closure 0 0 1 1 n n Γ.ρ ΓΓλx:τ.e : R over Z) and a field which describes a vector’s length. We D ( $ [ρ,λx:τ.e]:R can extend the grammar of fields and symbolic objects to $ naturally include these terms: The satisfaction rules are mostly straightforward. tt is always satisfied, while the logical connectives and are ϕ ::= ... H2M satisfied in the standard ways. Aliases are satisfied_ when^ the | o ::= ... n n o o + o objects are equivalent values in ρ. | | ¨ | The satisfaction rules M-Refine, M-RefineNot1, and M- Now our type system and logic can reason directly about the RefineNot2 allow refinement types to be satisfied by satisfy- terms our theory discusses. ing the type and proposition separately. M-Theory consults We then identify the theory-relevant predicates and ex- a decider for the specific theory in order to satisfy sentences tend our grammar of propositions to include them: in its domain. From M-Type we see propositions stating an object o is of χLI ::= o o o o type τ are satisfied when the value of o in ρ is a subtype of τ. ă | ď Similarly M-TypeNot tells us if an object o’s value in ρ has Finally, the types of some language primitives must be a type which does not overlap with τ, then the proposition enriched so these newly added forms are emitted during type o τ is satisfied. checking. For example, we must modify the typing judgment R for integer literals to include the precise symbolic object: 3.5.2 Soundness Our first lemma states that our proof theory respects models. T-Int Lemma 1. If ρ Γ and Γ ψ then ρ ψ. Γ n : pI ; tt ff ; nq ( $ ( $ | Proof. By structural induction on Γ ψ $ Similarly, primitive functions which perform arithmetic computation, arithmetic comparison, and report a vector’s With our proof theory and models in sync and our opera- length must be updated to return the appropriate proposi- tional semantics defined, we can state and prove the next key lemma for type soundness which deals with evaluation. tions and symbolic objects (similar to how BMi? and 7bi are handled in our presentation of λRT R). Lemma 2. If Γ e : pτ ; ψ+ ψ ; oq, ρ Γ and ρ e v With these additions in place, a simple function which then all of the following$ hold:| ´ ( $ ó converts linear integer propositions into solver-compatible assertions allows our system to begin type checking pro- 1. all non-✁0 structural parts of o are equal in ρ to the corre- grams with these theory-specific types. sponding parts of v, 2. v 7Hb2 and ρ ψ+, or v = 7Hb2 and ρ ψ , and ‰ ( ( ´ 3.5 Semantics and Soundness 3. Γ v : pτ ; tt tt ; ✁0q $ | λRT R uses the big-step reduction semantics described in Fig- Proof. By induction on the derivation of ρ e v. ure 8, which notably treats all non-7Hb2 values as ‘true’ for $ ó the purposes of conditional test-expressions. The evaluation Now we can state our soundness theorem for λRT R. judgment ρ e v states that in runtime-environment ρ, $ ó Theorem 1. (Type Soundness for λRT R). If e : τ and expression e evaluates to the value v. A model-theoretic sat- e v then v : τ. $ isfaction relation is used to prove type soundness, just as in $ ó $ prior work on occurrence typing [26]. Proof. Corollary of Lemma 2.

303 B-Let B-Var ρ e1 v1 B-Fst B-Snd B-Val $ ó B-Abs ρ(x)=v ρ[x := v1] e2 v ρ e v1,v2 ρ e v1,v2 ρ v v $ ó ρ λx:τ.e [ρ,λx:τ.e] $ óx y $ óx y $ ó ρ x v ρ (let (xe ) e ) v $ ó ρ (7bi e) v ρ (bM/ e) v $ ó $ 1 2 ó $ ó 1 $ ó 2 B-Beta B-Prim B-IfTrue ρ e1 [ρc,λx:τ.e] ρ e1 p ρ e1 v1 B-IfFalse B-Pair $ ó $ ó $ ó ρ e2 v2 ρ e2 v2 v1 7Hb2 ρ e1 7Hb2 ρ e1 v1 ρ [x :=$ v ]ó e v δ($p, v )=ó v ρ ‰e v ρ$ eó v ρ $ e ó v c 2 $ ó 2 $ 2 ó $ 3 ó $ 2 ó 2 ρ (e e ) v ρ (e e ) v ρ (if e e e ) v ρ (if e e e ) v ρ (cons e e ) v ,v $ 1 2 ó $ 1 2 ó $ 1 2 3 ó $ 1 2 3 ó $ 1 2 óx 1 2y M-Or M-And M-Alias M-Refine M-Top ρ ψ1 or ρ ψ2 ρ ψ1 ρ ψ2 ρ(o1)=ρ(o2) ρ o τρψ[x o] ρ tt ( ( ( ( ( P ( ÞÑ ( ρ ψ ψ ρ ψ ψ ρ o o ρ o x:τ ψ ( 1 _ 2 ( 1 ^ 2 ( 1 ” 2 ( Pt | u M-Type M-TypeNot M-Theory M-RefineNot1 M-RefineNot2 ρ(o):τ ρ(o):σστ = [[ ρ]] ( χT ρ o τ ρ ψ[x o] $ $ X H T ( R (␣ ÞÑ ρ o τ ρ o τ ρ χT ρ o x:τ ψ ρ o x:τ ψ ( P ( R ( ( Rt | u ( Rt | u Figure 8. Big-step Reduction and Model Relation

Although this model-theoretic proof technique works representative members from alias-equivalent classes of ob- quite naturally, it includes the standard drawbacks of big- jects. By eagerly substituting and using a single representa- step soundness proofs, saying nothing about diverging or tive member in the environment, large complex propositions stuck terms. We could address this by adding an 2``Q` value which conservatively but inefficiently tracked dependencies— of type that is propagated upward during evaluation and such as those arising from local-bindings—can be omitted K modify our soundness claim to show 2``Q` is not derived entirely, resulting in major performance improvements for from evaluating well-typed terms. real world Typed Racket programs. Propogating existentials. Our typing judgments use sub- 4. Scaling to a Real Implementation sumption to omit the less interesting details of type checking. Making this system algorithmic would not only require the Although λRT R describes the essence of our approach, there are additional details to consider when reasoning about a standard inlining of subtyping throughout many of the judg- realistic programming language. ments, but would also require that existential bindings on the type-results of subterms be propagated upward by the current 4.1 Efficient, Algorithmic Subtyping term’s type-result. This ensures all identifiers in the raw re- sults of type checking are still bound and frees us from sim- In order to highlight the essential features of λRT R we chose a more declarative description of the type system. To make plifying every intermediate type-result (as our model with this process efficient and algorithmic several additional steps subsumption often requires). This technique is thoroughly can be taken. described in Knowels and Flanagan’s [17] algorithmic type Hybrid environments. Instead of working with only a system, which served as an important motivation for this as- set of propositions while type checking, it is helpful to use pect of our approach. an environment with two distinct parts: one which resembles a standard type environment—mapping objects to the cur- 4.2 Mutation rently known positive and negative type information—and We soundly support mutation in our type system in a conser- another which contains only the set of currently known com- vative fashion. First, a preliminary pass identifies which vari- pound propositions (since all atomic type-propositions can ables and fields may be mutated during program execution. be efficiently stored in the former part). With these pieces in The type checker then proceeds to type check the program, place, it is easy to iteratively refine the standard type envi- omitting symbolic objects for mutable variables and fields. ronment with the mT/i2 metafunction while traversing the This way, the initial type of a newly introduced variable will abstract syntax tree instead of saving all logical reasoning for be recorded but no potentially unsound assumptions will be checking non-trivial terms. made from runtime tests in the code. Representative objects. Another valuable simplification An illustrative example of this approach in action was which greatly reduced type checking times was the use of found during our vector access case study and analysis of

304 the Racket math library. It contained a module with a vari- Although initially verifying these vector accesses appears able cache-size of type Int. The type system ensured any somewhat straightforward, Typed Racket’s type checker runs updates to the value of cache-size were indeed of type after macro expansion. At that point the obvious nature of the Int, but tests on the relative size of the cache—such as original program may be obfuscated in the sea of primitives (> cache-size n)—failed to produce any logical infor- that emerge, and the system is left to infer types for the newly mation about the size of cache-size. This failure made it introduced identifiers and lambda abstractions: impossible to verify accesses whose correctness relied on the (letrec result of this test, since a concurrent thread could easily mod- ([start 0] [end (len A)] ify the cache and its size between our testing and performing [step 1] [initial 0] the operation, invalidating any supposed guarantees. Indeed, [loop without much effort we were able to cause a runtime error in (λ (pos acc) the math library by exploiting this fact before patching the (cond offending code. [(< pos end) (define i pos) 4.3 Type Inference and Polymorphism (loop (+ step pos) (+ acc (* (vec-ref A i) Typed Racket (and RTR) relies on local type inference [21] (vec-ref B i))))] to instantiate type variables for polymorphic functions when- [else acc]))]) ever possible. Since type inference is such an essential part (loop start initial)) of type checking real programs, we were unable to check any interesting examples until we had accommodated refinement Here RTR is left to infer types for both the domain and types. range of the inner loop function (note that its arguments The constraint generation algorithm in local type infer- were not even annotatable identifiers in the original pro- V gram). Initially, our local type inference chooses type Int ence, written Γ X¯ S : T C, takes as input an envi- ronment Γ, a set$ of typeă variablesñ V , a set of unknown type for the position argument pos. This might be perfectly ac- variables X¯, and two types S and T , and produces a con- ceptable in Typed Racket, since Int is a valid argument type straint set C. Since the implementation of the algorithm al- for vec-ref. However, when attempting to verify the vector ready correctly handled when S is a subtype of T , we merely access, Int is too permissive: it does not express the loop- needed to add the natural cases which allow constraint gen- invariant that pos is always non-negative. eration to properly recurse into the types being refined: In an effort to effectively reason about these macros we experimented with adding an additional heuristic to our in- CG-Ref ference for anonymous lambda applications: if a variable Γ,x τ,ψ ψ is, directly or indirectly, used as a vector index within the P 1 $ 2 Γ V τ : σ C function, we try the type Nat instead of Int. This type, $X¯ ă ñ V combined with the upper-bounds check within the loop, Γ X¯ x:τ ψ1 : x:σ ψ2 C $ t | uă t | uñ is enough to verify the access in (vec-ref A i) and CG-RefUpper (vec-ref B i) (assuming they are of equal length). How- CG-RefLower Γ,x τ ψ ever, the heuristic quickly fails in the reverse iteration case, V V P $ (in-range (len A) 0 -1) (i.e. where i steps from Γ X¯ τ : σ C Γ X¯ τ : σ C V $ ă ñ V $ ă ñ (sub1 (len A)) to 0) since for the last iteration pos is -1 Γ ¯ x:τ ψ : σ C Γ ¯ τ : x:σ ψ C $X t | uă ñ $X ă t | uñ and not a Nat. This naturally requires maintaining the full environment of More advanced techniques for inferring invariants—such propositions throughout the constraint generation process. as those used by Liquid Types[22]—will be needed if id- Although we did not perform a detailed analysis, the annota- iomatic patterns such as Racket’s for are to seamlessly in- tion burden for polymorphic functions seems unaffected by tegrate with refinement types. our changes. 5. Case Study: Safe Vector Access 4.4 Complex Macros In order to evaluate RTR’s effectiveness on real programs we 2 Racket programmers use a series of for-macros for many it- examined all unique vector accesses in three large libraries eration patterns [10]. This simple dot-product example iter- written in Typed Racket, totalling more than 56,000 lines of ates i from 0 to (sub1 (len A)) to perform the relevant code: computations: • The math library, a Racket standard library covering op- erations ranging from number theory to linear algebra. It (for/sum ([i (in-range (len A))]) (* (vec-ref A i) 2 Since we type check programs after macro expansion, vector accesses were (vec-ref B i))) assessed at this time as well, and accesses in macros were only counted once.

305 100 ing: pattern matching on vectors and loops using a vector’s length as an explicit bound were extremely common. For the remaining vector accesses we performed a pre- 80 6% liminary review of the plot and pict3d libraries and an in depth examination of the math library. 13% 60 5.1 Enriching the Math Library For the math library we examined each individual access to 40 34% determine how many of the failing cases our system might 74% handle with reasonable effort. We identified five general cat- 33% egories that describe these initially unverified vector opera-

% of vector ops verifiable 20 tions: 25% Annotations Added. 34% of the failed accesses were 13% unverified until additional (or more specific) type annotations plot pict3d math were added to the original program. In this recursive loop snippet taken from our case study, for example, the Nat Verified after code modifications annotation for the index i is not specific enough to verify the vector reference: Verified with type annotations added Automatically verified (let loop ([i : Nat (len ds)] [res : Nat 1]) (cond [(zero? i) res] Figure 9. safe-vec-ref case study [else (loop (- i 1) (* res (safe-vec-ref ds i)))])) contains 22,503 lines of code and 301 unique vector op- erations. Using (Refine [i : Nat] (≤ i (len ds))) for the type of i, however, allows RTR to verify the vector access • The plot library, also a part of Racket’s standard library, immediately. As we discussed in subsection 4.4, a more ad- which supports both 2- and 3-dimensional plotting. It con- vanced inference algorithm could potentially help by auto- tains 14,987 lines of code and 655 unique vector opera- matically inferring these types. On the other hand, as code tions. documentation these added annotations often made programs • The pict3d library,3 which defines a performant 3D en- easier to understand and helped us navigate our way through gine with a purely functional interface, has 19,345 lines the large, unfamiliar code base. of code and 129 unique vector operations. Code Modified. 13% of the unverified accesses were ver- ifiable after small local modifications were made to the body These libraries were chosen because of their size and fre- of the program. In some cases, these modifications moved quent use of vector operations. During our analysis we tested the code away from particularly complex macros; other pro- whether each vector read and write could be replaced with its grams presented opportunities for a few well-placed dynamic equivalent safe-vec- counterpart and still type check. checks to prove the safety of a series of vector operations. An To reason statically about vector bounds and linear inte- example of the latter can be seen in the function vec-swap!: ger arithmetic we first enriched Typed Racket’s base type (: vec-swap! : environment, modifying the type of 36 functions. This in- ∀ {A} (Vecof A) Int Int -> Void) cluded enriching the types of 7 vector operations, 16 arith- (define (vect-swap! vs i j) metic operations, 12 arithmetic fixnum operations (i.e. oper- (unless (= i j) ations that work only on fixed-width integers), and the typing (cond of Racket’s equal?. [(and (< -1 i (len vs)) ;; added We initially verified over 50% of accesses without the aid (< -1 j (len vs))) ;; added of additional annotations to the source code. As Figure 9 il- (define i-val (safe-vec-ref vs i)) lustrates, our success rate for entirely automatic verification (define j-val (safe-vec-ref vs j)) of vector indices was 74% for plot, 13% for pict3d, and (safe-vec-set! vs i j-val) 25% for math. We attribute plot’s unusually high automatic (safe-vec-set! vs j i-val)] [else (error "bad index(s)!")]))) success rate relative to the other libraries to a few heavily re- peated patterns which are guaranteed to produce safe index- This function swaps the values at two indices within a vec- tor. Our initial investigation concluded adding constraints to 3 https://github.com/jeapostrophe/pict3d the type was unreasonable for this particular function (i.e.

306 clients could not easily satisfy the more specific types), how- F‹ [23, 24] adds full dependent types and refinement types ever we noticed adding two simple tests on the indices in (along with other features) to an Fω-like core while allowing question allowed us to safely verify four separate vector op- manual and SMT solver-backed discharging of proof obliga- erations without perturbing any client code. This approach tions. Although our system shares the goal of allowing users seemed like an advantageous tradeoff in this and other situ- to further enrich their typed programs beyond the expressive- ations and worked well in our experience. ness of the core system, we have chosen a simpler, less ex- Beyond our scope. 22% were unverifiable because, in pressive approach aimed at allowing dynamically typed pro- their current form, their invariants were too complex to de- grams to gradually adopt a simpler set of dependent types. scribe (i.e. they were outside the scope of our type system Chugh et al. [5] explore how extensive use of refinement and/or linear integer theory). One simple example of this in- types and an SMT solver enable type checking for rich dy- volved determining the maximum dimension dims for a list namically typed languages such as JavaScript [4]. This ap- of arrays: proach feels similar to ours in terms of features and expres- siveness. As seen in our respective metatheories, however, (define dims (apply max (map len dss))) their system requires a much more complicated design and Because of the complex higher order nature of these oper- a complex stratified soundness proof; this fact has made it ations, our simple syntactic analysis and linear integer theory “[difficult to] add extra (basic) typing features to the lan- was unable to reason about how the integer dims related to guage” [28]. In contrast, our system uses a well-understood the vectors in the list dss. core and does not require interaction with an external SMT Unimplemented features 6% of the unverified accesses solver. This allows us to use many common type-theoretic involved Racket features we had neglected to support dur- algorithms and techniques—as witnessed by Typed Racket’s ing implementation (e.g. dependent record fields), but which continued adoption of new features. seemed otherwise amenable to our verification techniques. Vekris et al. [28] explore how refinements can help rea- Unsafe code. As previously mentioned in subsection 4.2, son about complex JavaScript programs utilizing a novel two we discovered 2 vector operations which made unsafe as- phase approach. The first phase elaborates the source lan- sumptions about a mutable cache whose size could shrink guage into a ML-like target that is checked using standard and cause errors at runtime. Both of these correctly did not techniques, at which point the second phase attempts to ver- typecheck using our system and were subsequently patched. ify all ill-typed branches are in fact infeasible using refine- Total. In all, 72% of the vector accesses in the math library ments in the spirit of Knowles and Flanagan [18] and Rondon were verifiable using these approaches without drastically et al. [22]. Our single-phase approach, however, does not re- altering any internal algorithms or data representations.4 quire elaboration into an ML-like language and allows our system to work more directly with a larger set of types. 6. Related Work Sage’s use of a dynamic and static types is similar to There is a history of using refinements and dependent types our approach for type checking programs. However, their to enrich already existing type systems. Dependent ML [32] usage of first-class types and arbitrary refinements means adds a practical set of dependent types to standard ML to their core system is expressive yet undecidable [13]. Our allow for richer specifications and compiler optimizations system utilizes a more conservative, decidable core in which through simple refinements, using a small custom solver to only a small set of immutable terms are lifted into types. check constraints. Liquid Haskell [27] extends Haskell’s type Because of this, having impure functions and data in the system with a more general set of refinement types supported language does not require changes to the type system. Also, by an SMT solver and predicate abstraction. We similarly our approach only reasons about non-type related theories strive to provide an expressive, practical extension to an ex- when they are explicitly added. isting type system by adding dependent refinements. Our Our usage of existential quantification to enable depen- approach, however, seeks to enrich a type system designed dent yet abstract reasoning for values no longer in scope specifically for dynamically typed languages and therefore is strongly resembles the approach described by Knowels and built on a different set of foundational features (e.g. subtyp- Flanagan [17]. Our design, however, lifts fewer terms into ing, ‘true’ union types, type predicates, etc.). types in general and substitutes terms directly into types Some approaches, aiming for more expressive type spec- when possible. Additionally, our design includes features ifications, have shown how enriching an ML-like typesys- specifically aimed at dynamic languages instead of refining tem with dependent types and access to theorem proving (au- a more standard type theory. tomated and manual) provides both expressive and flexible Ou et al. [20] aim to make the process of working with de- programming tools. ATS, the successor of DML, supports pendent types more palatable by allowing fine-grained con- both dependent and linear types as well as a form of inter- trol over the trade-offs between dependent and simple types. active theorem proving for more complex obligations [3]. This certainly is similar to our system in spirit, but there are several important differences. They choose to automatically 4 Our modified math library can be found in our artifact.

307 insert coercions when dependent fragments and simple types [2] Ambrose Bonnaire-Sergeant, Rowan Davies, and Sam Tobin- interact, while we do not explicitly distinguish between the Hochstadt. Practical Optional Types for Clojure. In Proc. two and require explicit code to cast values. Additionally, ESOP, 2016. while they convert their programs from a surface language [3] Chiyan Chen and Hongwei Xi. Combining Programming with into an entirely dependently typed language, our programs Theorem Proving. In Proc. ICFP, 2005. are translated into dynamically typed Racket code, which is [4] Ravi Chugh, David Herman, and Ranjit Jhala. Dependent void of any artifacts of our type system. This places us in a Types for Javascript. In Proc. OOPSLA, 2012. more suitable position for supporting sound interoperability [5] Ravi Chugh, Patrick M. Rondon, and Ranjit Jhala. Nested between untyped and dependently typed programs. Refinements: A Logic for Duck Typing. In Proc. POPL, 2012. Manifest contracts [12] are an approach that uses depen- [6] Microsoft Co. Typescript Language Specification. http: dent contracts both as a method for ensuring runtime sound- //www.typescriptlang.org, 2014. ness and as a way to provide static typing information. Un- [7] George B. Dantzig and B. Curtis Eaves. Fourier-Motzkin like our system, this method only reasons about explicit casts Elimination and Its Dual. J. Combinatorial Theory Series A, (i.e. program structure does not inform the type system), and 1973. there is no description of how a solver would be utilized to [8] Leonardo De Moura and Nikolaj Bjorner. Z3: An Efficient dispatch proof goals. SMT Solver. In Proc. TACAS, 2008. [9] Matthias Felleisen, Robert Bruce Findler, and Matthew Flatt. 7. Conclusion Semantics Engineering with PLT Redex.MITPress, 2009. Enriching existing programs with stronger static guarantees [10] Matthew Flatt and PLT. Reference: Racket. Technical Re- is the original goal of the scripts-to-programs approach. Its port PLT-TR-2010-1, PLT Design Inc., 2010. https:// realization in the type system underlying Typed Racket al- racket-lang.org/tr1. lows Racket programmers to add simple types to their pro- [11] Matthew Fluet and Riccardo Pucella. Practical Datatype grams with relatively little effort. In this paper, we show how Specializations with Phantom Types and Recursion Schemes. refinement types and an extensible logic allows programmers Electronic Notes in Theoretical Computer Science, 2006. to continue this process by adding additional invariants to [12] Michael Greenberg, Benjamin C. Pierce, and Stephanie their repertoire which allow for strong new guarantees. Our Weirich. Contracts Made Manifest. In Proc. POPL, 2010. integration of refinement types with the occurrence typing [13] Jessica Gronski, Kenneth Knowles, Aaron Tomb, Stephen N. underlying Typed Racket produces a new system that is both Freund, and Cormac Flanagan. Sage: Hybrid Checking for more expressive and simpler than previous approaches. Flexible Specifications. In Proc. Wksp. on Scheme and Func- Additionally, our evaluation demonstrates that despite the tional Programming, 2006. relatively simple nature of RTR’s dependent types, the in- [14] David Herman and Philippe Meunier. Improving the Static variants that can be expressed are powerful. Our case study Analysis of Embedded Languages via Partial Evaluation. In of vector operations finds that half of existing operations in Proc. ICFP, 2004. a large Typed Racket code base can be already proven safe [15] Facebook Inc. Flow: A static type checker for JavaScript. with many of the remainder checked with simple annotations http://flowtype.org, 2014. and changes to the code. We anticipate that other programs, [16] Facebook Inc. Hack. http://hacklang.org, 2014. ranging from fixed-width arithmetic to theories of regular ex- [17] Kenneth Knowles and Cormac Flanagan. Compositional pressions [14], can similarly benefit from the strong specifi- Reasoning and Decidable Checking for Dependent Contract cations provided via refinement types. Types. In Proc. PLPV, 2009. [18] Kenneth Knowles and Cormac Flanagan. Hybrid Type Check- Acknowledgments ing. ACM Trans. Program. Lang. Syst., 2010. [19] Frederic P. Miller, Agnes F. Vandome, and John McBrewster. This work was supported by funding from the National Sci- Advanced Encryption Standard.Alpha Press, 2009. ence Foundation and the National Security Agency. We [20] Xinming Ou, Gang Tan, Yitzhak Mandelbaum, and David would like to thank the anonymous reviewers for their help- Walker. Dynamic Typing with Dependent Types. IFIP Intl. ful feedback on this article, and our colleagues at Indiana Conf. on Theoretical Computer Science, 2004. University whose helpful discussions influenced our work: [21] Benjamin C. Pierce and David N. Turner. Local Type Infer- Ambrose Bonnaire-Sergeant, David Christiansen, Cameron ence. ACM Trans. Program. Lang. Syst., 2000. Swords, Andre Kuhlenschmidt, and Jeremy Siek. [22] Patrick M. Rondon, Ming Kawaguci, and Ranjit Jhala. Liquid Types. In Proc. PLDI, 2008. References [23] Nikhil Swamy, Juan Chen, Cédric Fournet, Pierre-Yves Strub, [1] Esteban Allende, Oscar Callau, Johan Fabry, Éric Tanter, and Karthikeyan Bhargavan, and Jean Yang. Secure Distributed Marcus Denker. Gradual Typing for Smalltalk. Science of Programming with Value-dependent Types. In Proc. ICFP, Computer Programming, 2014. 2011.

308 [24] Nikhil Swamy, Cătălin Hriţcu, Chantal Keller, Aseem Rastogi, [28] Panagiotis Vekris, Benjamin Cosman, and Ranjit Jhala. Trust, Antoine Delignat-Lavaud, Simon Forest, Karthikeyan Bharga- but Verify: Two-Phase Typing for Dynamic Languages. In van, Cédric Fournet, Pierre-Yves Strub, Markulf Kohlweiss, Proc. ECOOP, 2015. Jean-Karim Zinzindohoue, and Santiago Zanella-Béguelin. [29] Michael M. Vitousek, Andrew M. Kent, Jeremy G. Siek, and Dependent Types and Multi-monadic Effects in F*. In Proc. Jim Baker. Design and Evaluation of Gradual Typing for POPL, 2016. Python. In Proc. DLS, 2014. [25] Sam Tobin-Hochstadt and Matthias Felleisen. Interlanguage [30] Stephanie Weirich. Depending on Types. In Proc. ICFP, 2014. Migration: From Scripts to Programs. In Proc. DLS, 2006. [31] Hongwei Xi. Dependent ML: An Approach to Practical Pro- [26] Sam Tobin-Hochstadt and Matthias Felleisen. Logical Types gramming with Dependent Types. J. Functional Program- for Untyped Languages. In Proc. ICFP, 2010. ming, 2007. [27] Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, [32] Hongwei Xi and Frank Pfenning. Eliminating Array Bound and Simon Peyton-Jones. Refinement Types for Haskell. In Checking Through Dependent Types. In Proc. PLDI, 1998. Proc. ICFP, 2014.

309