<<

Practical Optional Types for

Ambrose Bonnaire-Sergeant Rowan Davies Sam Tobin-Hochstadt Indiana University University of Western Australia Indiana University [email protected] [email protected] [email protected]

Abstract (ann parent[’{:file(U nil File)} ->(U nil Str)]) (defn parent[{^Filef :file}] Typed Clojure is an optional for Clojure, a dynamic (iff(.getParentf) nil)) language in the Lisp family that targets the JVM. Typed Clojure’s type system build on the design of Typed Racket, repurposing in particular occurrence typing, an approach to statically reasoning Figure 1. A simple Typed Clojure program about predicate tests. However, in adapting the type system to Clo- jure, changes and extensions are required to accommodate addi- tional language features and idioms used by Clojure programmers. and features of Clojure, producing an expressive synthesis of ideas In this paper, we describe Typed Clojure and present these type and demonstrating a surprising coincidence between multiple dis- system extensions, focusing on three features widely used in Clo- patch in Clojure and Typed Racket’s occurrence typing framework. jure. First, Java interoperability is central to Clojure’s mission but The essence of Typed Clojure, of course, is Clojure, a dynam- introduces challenges such as ubiquitous null; Typed Clojure han- ically typed language in the Lisp family built to run on the Java dles Java interoperability while ensuring the absence of null-pointer Virtual Machine (JVM) which has recently gained popularity as an exceptions in typed programs. Second, Clojure programmers id- alternative JVM language. It offers the flexibility of a Lisp dialect, iomatically use immutable dictionaries for data structures; Typed including macros, emphasizes a functional style via a standard li- Clojure handles this in the type system with multiple forms of het- brary of immutable data structures, and provides interoperability erogeneous dictionary types. Third, multimethods provide extensi- with existing Java code, allowing programmers to use existing Java ble operations, and their Clojure semantics turns out to have a sur- libraries without leaving Clojure. Since its initial release in 2007, prising synergy with the underlying occurrence typing framework. Clojure has been widely adopted for “backend” development in We provide a formal model of the Typed Clojure type system places where its support for parallelism, , incorporating these and other features, with a proof of soundness. and Lisp-influenced abstraction is desired on the JVM. As a result, Additionally, Typed Clojure is now in use by numerous corpora- it now has an extensive base of existing untyped programs, whose tions and developers working with Clojure, and we report on expe- developers can now benefit from Typed Clojure. As a result, Typed rience with the system and its lessons for the future. Clojure is used in industry, experience we discuss in this paper. Figure 1 presents a simple program demonstrating many aspects 1. Clojure with static typing of our system, from simple type annotations to explicit handling of Java’s null (written nil) in interoperation, as well as an extended The popularity of dynamically-typed languages in software devel- form of occurrence typing and Clojure’s type hints, which are opment, combined with a recognition that types often improve pro- central to Typed Clojure’s approach to interoperability. grammer productivity, software reliability, and performance, has The parent function has the type led to the recent development of a wide variety of optional and gradual type systems aimed at checking existing programs writ- [’{:file(U nil File)} ->(U nil Str)] ten in existing languages. These include Microsoft’s TypeScript for JavaScript, Facebook’s Hack for PHP and Flow for JavaScript, and which means that it takes a hash table whose :file key maps to MyPy for Python among the optional systems, and Typed Racket, either nil or a File, and it produces either nil or a String. Reticulated Python, and GradualTalk among gradually-typed sys- The parent function uses the :file keyword as an accessor tems.1 to get the file, checks that it isn’t nil, and then obtains the One key lesson of these systems, indeed a lesson known to early parent by making a Java method call. The annotation ^Filef developers of optional type systems such as StrongTalk, is that type is a type hint on f, which instructs the Clojure compiler (run- systems for existing languages must be designed to work with the ning prior to Typed Clojure typechecking) to statically resolve features and idioms of the target language. Often this takes the form the getParent call to File’s getParent method with signature of a core language, be it of functions or classes and objects, together String getParent();, rather than using reflection at runtime. with extensions to handle distinctive language features. In the remainder of this paper, we describe how Typed Clojure’s We synthesize these lessons to present Typed Clojure, an op- central innovations, including Java interoperability, multimethods, tional type system for Clojure. Typed Clojure builds on the core and heterogeneously-typed immutable maps, enable this example type checking approach of Typed Racket, an existing gradual type and many others. We begin with an example-driven presentation of system for Racket. However, Typed Clojure extends this basic the main type system features in Section 2. We then incrementally framework in multiple ways to accommodate the unique idioms present a core calculus for Typed Clojure covering all of these features together in Section 3 and prove type soundness (Section 4). 1 We reserve the term “gradual typing” for systems such as Typed Racket We then discuss the full implementation of Typed Clojure, dubbed which soundly interoperate between typed and untyped code; systems like core.typed, which extends the formal model in many ways, and Typed Clojure or TypeScript which do not enforce type invariants we the experience gained from its use in Section 5. Finally, we discuss describe as “optionally typed”. related work and conclude.

1 2015/2/28 2. Overview of Typed Clojure Here is the above program in Typed Clojure. We now begin a tour of the central features of Typed Clojure, Example 1 beginning with Clojure itself. In our presentation, we will make use (ns demo.eg1 of the full Typed Clojure system to illustrate the key type system (:refer-clojure :exclude[fn]) ideas, before studying the core features in detail in section 3. (:require[clojure.core.typed :refer[fnU Num]])) 2.1 Clojure (fn[x:-(U nil Num)] Clojure (Hickey 2008) is a Lisp built to run on the Java Virtual Ma- (if(number?x)(incx)0)) chine with exemplary support for concurrent programming and im- This is a regular Clojure file compiled with the Clojure com- mutable data structures. It emphasizes mostly-functional program- piler, with the ns form declaring a namespace for managing var ming, restricting imperative updates to a limited set of structures and class imports. Here :require declares a runtime dependency which have specific thread synchronization behaviour. By default, on clojure.core.typed, Typed Clojure’s core namespace, and it provides fast implementations of immutable lists, vectors, and :refer brings a collection of vars into scope by name. The hash tables, which are used for most data structures, although it :refer-clojure :exclude option unmaps core vars from the also provides means for defining new records. current namespace—here we unmap clojure.core/fn, which One of Clojure’s primary advantages is easy interoperation with creates a function from a parameter vector and a body expression, existing Java libraries. It automatically generates appropriate JVM and import a typed variant of fn that supports type annotations. bytecode to make Java method and constructor calls, and treats Java The typed fn supports optional annotations by adding :- and a values as any other Clojure value. However, this smooth interop- type after a parameter position or binding vector to annotate param- erability comes at the cost of pervasive null, which leads to the eter types and return types respectively. Typed Clojure provides a possibility of null pointer exceptions—a drawback we address in check-ns function to type checks the current namespace. number? Typed Clojure. is a Java instanceof test of java.lang.Number. As in Typed Racket, U creates an untagged union type, which can take any num- 2.2 Clojure Syntax ber of types. We describe new syntax as they appear in each example, but we Typed Clojure can already check all of the examples in Tobin- also include the essential basics of Clojure syntax. Hochstadt and Felleisen (2010)—the rest of this section describes nil is exactly Java’s null. Parentheses indicate applications, the extensions necessary to check Clojure code. brackets delimit vectors, braces delimit hash-maps and double quotes delimit Java strings. Symbols begin with an alphabetic char- 2.4 Exceptional control flow acter, and a colon prefixed symbol like :a is a keyword. Along with conditional control flow, Clojure programmers rely on Commas are always whitespace. exceptions to assert type-related invariants.

2.3 Typed Racket and occurrence typing (fn[x:-(U nil Num)] Example 2 Tobin-Hochstadt and Felleisen (2010) presented Typed Racket with (do(if(number?x) nil(throw(Exception.))) occurrence typing, a technique for deriving type information from (incx))) conditional control flow. They introduced the concept of occurrence The do form sequences two expressions returning the latter, typing with the following example. throw corresponds to Java’s throw and (class. args*) is the #lang typed/racket syntax for Java constructors—that is a class name with a dot suffix as the operator followed the arguments to the constructor. (lambda([x:(U #f Number)]) In this example a throw expression guards (incx) , the in- (if(number?x)(add1x)0)) crement function for numbers, from being evaluated if x is nil, preventing a possible null-pointer exception. This function takes a value that is either #f or a number, repre- To check this example, occurrence typing automatically as- sented by an untagged union type. The ‘then’ branch has an implicit sumes x is a number when checking the second do subexpression invariant that x is a number, which is automatically inferred with based on the first subexpression. 2 We model this formally (sec- occurrence typing and type checked without further annotations. tion 3.1) and prove null-pointer exceptions are impossible in typed We chose to build on the ideas and implementation of Typed code (section 4). Racket to implement a type system targeting Clojure for several reasons. Initially, the similarities between Racket and Clojure drew 2.5 Heterogeneous hash-maps us to investigate the effectiveness of repurposing occurrence typing Hash-maps with keyword keys play a major role in Clojure pro- for a Clojure type system—both languages share a Lisp heritage, gramming. HMap types model the most common usages of key- similar standard functions (for instance map in both languages is word maps. variable-arity) and idioms. While Typed Racket is gradually typed and has sophisticated dynamic semantics for cross-language inter- (defalias Expr Example 3 action, we chose to first implement the static semantics with the (U’{:op’:if, :test Expr, :then Expr, :else Expr} hope to extend Typed Clojure to be gradually typed at a future date. ’{:op’:do, :left Expr, :right Expr} Finally, Typed Racket’s combination of bidirectional checking and ’{:op’:const, :val Num})) occurrence typing presents a successful model for type check- ing dynamically typed programs without compromising soundness, (defn an-exp[]:- Expr which is appealing over success typing (Lindahl and Sagonas 2006) (let[v{:op :const, :val1}] which cannot prove strong properties about programs and soft typ- {:op :do, :leftv, :rightv})) ing (Cartwright and Fagan 1991) which has proved too complicated in practice. 2 See https://github.com/typedclojure/examples for full exam- ples. From here we omit ns forms.

2 2015/2/28 The defn macro defines a top-level function, with syntax like however we could stash the raw private key in another entry the typed fn. The function an-exp is verified to return an Expr. like :secret-key which is not mentioned by the partial HMap The defalias macro defines a type abbreviation which can EncKeyPair without Typed Clojure noticing. reference itself recursively. Here defalias defines Expr that de- scribes the structure of a recursively-defined AST as a union of Branching on HMaps Finally, testing on HMap properties al- HMaps. A quoted keyword in a type, such as ’:if, is a singleton lows us to refine its type down branches. dec-map takes an Expr, type that contains just the keyword. A type that is a quoted map like traverses to its nodes and decrements their values by dec, then ’{:op’:if} is a HMap type with a fixed number of keyword en- builds the Expr back up with the decremented nodes. present tries of the specified types known to be , zero entries known Example 6 to absolutely be absent, and an infinite number of unknown entries 1 (ann dec-leaf[Expr -> Expr]) entries. Since only keyword keys are allowed, they do not require 2 (defn dec-leaf[m] quoting. 3 (if(=(:opm) :if) 4 {:op :if, HMaps in Practice The next example is extracted from a produc- 5 :test(dec-leaf(:testm)), tion system at CircleCI, a company with a large production Typed 6 :then(dec-leaf(:thenm)), Clojure system (section 5.3 presents a case study). 7 :else(dec-leaf(:elsem))} 8 (if(=(:opm) :do) Example 4 (defalias RawKeyPair 9 {:op :do, (HMap :mandatory{:public-key RawKey, 10 :left(dec-leaf(:leftm)), :private-key RawKey}, 11 :right(dec-leaf(:rightm))} :complete? true)) 12 {:op :const, (defalias EncKeyPair 13 :val(dec(:valm))}))) (HMap :mandatory{:public-key RawKey, :enc-private-key EncKey}, If we go down the then branch (line 4), since (=(:opm) :if) :complete? true)) is true we remove the :do and :const Expr’s from the type of m (because their respective :op entries disagrees with (=(:opm) :if) ) (ann enc-keypair[RawKeyPair -> EncKeyPair]) and we are left with an :if Expr. On line 8, we instead strike out the (defn enc-keypair[kp] :if Expr since it contradicts (=(:opm) :if) being false. Line (assoc(dissoc kp :private-key) 9 know we can remove the :const Expr from the type of m because :enc-private-key(encrypt(:private-key kp)))) it contradicts (=(:opm) :do) being true, and we know m is a :do Expr. Line 12 we strike out :do because (=(:opm) :do) enc-keypair takes an unencrypted keypair and returns an en- is false, so we are left with m being a :const Expr. crypted keypair by non-destructively dissociating the raw :private-key Section 3.3 discusses how this automatic reasoning is achieved. entry with dissoc and associating an encrypted private key as :enc-private-key on an immutable map with assoc. The 2.6 Java interoperability expression (:private-key kp) shows that keywords are also Clojure supports interoperability with Java, including the ability to functions that look themselves up in a map returning the asso- call constructors, methods and access fields. ciated value or nil if the key is missing. Since EncKeyPair is :complete?, Typed Clojure enforces the return type does not con- (fn[f](.getParentf)) tain an entry :private-key, and would complain if the dissoc operation forgot to remove it. Calls to Java methods and fields have prefix notation like The next example is the same except we use the :absent-keys (.method target args*) and (.field target) respectively, HMap option. We also utilize map destructuring, which is binding with method and field names prefixed with a dot and methods tak- position syntax for pattern matching. Parameters are replaced with ing some number of arguments. maps of symbols to keywords that bind the symbols to lookups on Unlike Java, Clojure is dynamically typed. We have no type that keyword, optionally terminated by :as which aliases the pa- information about f but we still need to pick a method to call. rameter. For example (fn[{^Filex :x, :asm}] ...) ex- The Clojure compiler delegates the choice to runtime using Java pands to a let binding m to the first argument and ^Filex to reflection. Unfortunately reflection is slow and unpredictable, so (:xm) . Clojure’s let takes a flat binding vector which can refer Clojure supports type hints to help eliminate it where possible, to previous bindings like Scheme’s let*, and a body expression. (fn[^Filef](.getParentf)) (defalias RawKeyPair Example 5 Symbols support metadata—the syntax ^Filef is a single ex- (HMap :mandatory{:public-key RawKey, pression that is a symbol f with metadata {:tag File}. In binding :private-key RawKey})) positions like (fn[^Filef] ...) syntactic occurrences pre- (defalias EncKeyPair serve metadata. (HMap :mandatory{:public-key RawKey, The Clojure compiler uses the type hint to statically resolve :enc-private-key EncKey}, the method call to the public String getParent() method of :absent-keys#{:private-key})) java.io.File. The call to getParent is unambiguous at runtime but type checking fails—Typed Clojure considers f to be of type (ann enc-keypair[RawKeyPair -> EncKeyPair]) Any, which is unsafe to use as the target of even a resolved method. (defn enc-keypair[{pkey :private-key, :as kp}] If instead we annotate the function parameter with type File, we (assoc(dissoc kp :private-key) get a static type error. :enc-private-key(encrypt pkey))) (fn[f:- File](.getParentf)) ;; type error Since this example enforces that :private-key must not ap- pear in a EncKeyPair Typed Clojure would still complain if we Typed Clojure disallows reflection in typed code so we must add forgot to dissoc :private-key from the return value. Now, back the type hint to obtain a well-typed expression.

3 2015/2/28 (isa? :a :a) ;=> true (fn[^Filef:- File](.getParentf)) Example 7 (isa? Keyword Object) ;=> true The type hinting system and Typed Clojure’s static type check- We include the above sequence of definitions as example 11. ing are separate, the latter predating the former by several years. The interaction between them is often not as obvious, for example Typed Clojure has an explicit type for null null-pointer exceptions (ann path[Any ->(U nil String)]) Example 11 are impossible. (defmulti path class) do not need to take nil into account, (defmethod path String[x]x) (defmethod path File[^Filex](.getPathx)) (defn parent[^Filef:-(U nil File)] Example 8 (iff(.getParentf) nil)) (path "dir/a") ;=> "a" Typed Clojure and Java treat null differently. In Clojure, where it is known as nil, Typed Clojure assigns it an explicit type called nil. In Java null is implicitly a member of any refer- Typed Clojure does not predict if a runtime dispatch will be ence type. This means the Java static type String is equivalent successful—(path :a) type checks because :a agrees with the to (U nil String) in Typed Clojure. parameter type Any, but throws an error at runtime. Reference types in Java are nullable, so to guarantee a method call does not leak null into a Typed Clojure program we must HMap dispatch The flexibility of isa? is key to the generality assume methods can return nil. of multimethods. In example 12 we dispatch on the :op key of our HMap AST Expr. Since keywords are functions that look (ann parent[(U nil File) ->(U nil Str)]) Example 9 themselves up in their argument, we simply use :op as the dispatch (defn parent[^Filef] function. (iff(.getParentf) nil)) (ann inc-leaf[Expr -> Expr]) Example 12 In contrast, JVM invariants guarantee that constructors cannot (defmulti inc-leaf :op) return null, so we are safe to assume constructors are non-nullable. (defmethod inc-leaf :if[{tt :test,t :then,e :else}] {:op :if, Example 10 (fn[^Strings:- String]:- File :test(inc-leaf tt), (File.s)) :then(inc-leaft), By default Typed Clojure conservatively assumes method and :else(inc-leafe)}) constructor arguments to be non-nullable, but can be configured (defmethod inc-leaf :do[{l :left, :right}] globally for particular positions if needed. {:op :do, :left(inc-leafl), 2.7 Multimethods :right(inc-leafr)}) (defmethod inc-leaf :const[{v :val}] A multimethod in Clojure is a function with a dispatch function {:op :const, and a dispatch table of methods. Multimethods are created with :val(incv)}) defmulti. inc-map is like example 6 except the nodes are incremented. (ann path[Any ->(U nil String)]) The reasoning is similar, except we only consider one branch (the (defmulti path class) current method) by locally considering the current dispatch value The multimethod path has type [Any ->(U nil String)] , and reasoning about how it relates to the dispatch function. For an initially empty dispatch table and dispatch function class, a example, in the :do method we learn the :op key is a :do, which function that returns the class of its argument or nil if passed nil. narrows our argument type to the :do Expr, and similarly for the We can use defmethod to install a method to path. :if and :const methods. (defmethod path String[x]x) Multiple dispatch isa? is special with vectors—vectors of the same length recursively call isa? on the elements pairwise. Now the dispatch table maps the dispatch value String to the function (fn[x]x) . We add another method which maps File (isa?[Keyword Keyword][Object Object]) ;=> true to the function (fn[^Filex](.getPathx)) in the dispatch Example 13 simulates multiple dispatch by dispatching on a table. vector containing the class of both arguments. open takes two (defmethod path File[^Filex](.getPathx)) arguments which can be strings or files and returns a new file that concatenates their paths. After installing both methods, the call We call three different File constructors, each known at (path(File. "dir/a")) compile-time via type hints. Multiple dispatch follows the same kind of reasoning as example 12, except we update multiple bind- dispatches to the second method we installed because ings simultaneously. (isa?(class "dir/a") String) 2.8 Final example is true, and finally returns Example 14 combines everything we will cover for the rest of the paper: multimethod dispatch, reflection resolution via type hints, ((fn[^Filex](.getPathx)) "dir/a") . Java method and constructor calls, conditional and exceptional flow The isa? function first tries an equality check on its arguments, reasoning, and HMaps. then if that fails and both arguments are classes a subclassing check We dispatch on :p to distinguish the two cases of FSM—for is returned. example on :F we know the :file is a file. The body of the first

4 2015/2/28 , e ::= x | v | (e e) | λxτ .e Expressions Example 13 (defaliasFS(U File String)) | (if e e e) | (let [x e] e) τ v ::= s | n | | [ρ, λx .e]c Values (ann open[FSFS -> File]) c ::= class | inc | number? Constants ψ|ψ (defmulti open(fn[lr] σ, τ ::= > | (S −→τ ) | x:τ −−−→ τ Types [(classl)(classr)])) o (defmethod open[File File][^File f1,^File f2] | (Val s) | C s ::= k | C | nil | b Value types (let[s(.getPath f2)] b ::= true | false Boolean values (do(if(string?s) nil(throw(Exception.)))

(File. f1s)))) ψ ::= τπ(x) | τπ(x) | ψ ⊃ ψ Propositions (defmethod open[String String][s1 s2] | ψ ∧ ψ | ψ ∨ ψ | tt | ff (File.(str s1 "/" s2))) o ::= π(x) | ∅ Objects (defmethod open[File String][^File s1,^String s2] π ::= pe−→ Paths (File. s1 s2)) −→ Γ ::= ψ Proposition Environment −−−−→ (open(File. "dir") "a") ;=> # ρ ::= {x 7→ v} Value environments (open "dir" "a/b") ;=> # (open(File. "a/b")(File. "c")) ;=> # Figure 3. Syntax of Terms, Types, Propositions and Objects

(defalias FSM Example 14 how the value of this expression can be derived from the current (U’{:p’:F :file(U nil File)} runtime environment. Type propositions can only reference non- ’{:p’:S :str(U nil String)})) empty objects. The second insight is that we can replace the traditional repre- (ann maybe-parent[FSM ->(U nil Str)]) sentation of a type environment (eg., a map from variables to types) (defmulti maybe-parent :p) with a set of propositions, written Γ. Instead of mapping x to the (defmethod maybe-parent:F[{file :file :asm}] type Number, we use the proposition Numberx . (if(:filem)(.getParent^File file) nil)) Given a set of propositions, we can use logical reasoning to (defmethod maybe-parent:S[{^String str :str}] derive new information about our programs with the judgement (do(if str nil(throw(Exception.))) Γ ` ψ. In addition to the standard rules for the logical connectives, (.getParent(File. str)))) the key rule is L-Update, which combines multiple propositions about the same variable, allowing us to refine its type. (maybe-parent{:p:S :str "dir/a"}) ;=> "dir" L-UPDATE (maybe-parent{:p:F :file(File. "dir/a")}) ;=> "dir" Γ ` τπ0(x) Γ ` νπ(π0(x)) (maybe-parent{:p:F :file nil}) ;=> nil Γ ` update(τ, ν, π)π0(x) Figure 2. Multimethod Examples For example, with L-Update we can use the knowledge of Γ ` S ( nil N)x and Γ ` nil x to derive Γ ` Nx . (The metavariable ν ranges over τ and τ (without variables).) We cover L-Update in method uses type hints to resolve reflection and conditional control more detail in Section 3.3. flow to prove null-pointer exceptions are impossible. The second Finally, this approach allows the type system to track program- method is similar except it uses exceptional control flow. ming idioms from dynamic languages using implicit type-based reasoning based on the result of conditional tests. For instance, 3. A Formal Model of λTC Example 1 only utilises x once the programmer is convinced it is safe to do so based whether (number?x) is true or false. To Now that we have demonstrated the core features Typed Clojure express this in the type system, every expression is described by provides, we link them together in a formal model called λ . Our TC two propositions: a ‘then’ proposition for when it reduces to a true presentation will start with a review of occurrence typing (Tobin- value, and an ‘else’ proposition when it reduces to a false value— Hochstadt and Felleisen 2010). Then for the rest of the section for (number?x) the then proposition is Number and the else we incrementally add each novel feature of Typed Clojure to the x formalism, interleaving presentation of syntax, typing rules, opera- proposition is Numberx . tional semantics and . We formalise our type system following Tobin-Hochstadt and The first insight about occurrence typing is that logical formu- Felleisen (2010) (with differences highlighted in blue). The typing las can be used to represent type information about our programs by judgement relating parts of the runtime environment to types via propositional Γ ` e : τ ; ψ+|ψ− ; o logic. Type Propositions ψ make assertions like “variable x is of says expression e is of type τ in the proposition environment Γ, type Number” or “variable x is not nil”—in our logical system we with ‘then’ proposition ψ+, ‘else’ proposition ψ− and object o. write these as Numberx and nil x . The other propositions are stan- We write Γ ` e : τ if we are only interested in the type. dard logical connectives: implications, conjunctions, disjunctions, Syntax is given in Figure 3. Expressions include variables, val- and the trivial (tt) and impossible (ff ) propositions (Figure 3). ues, application, abstractions, conditionals and let expressions. All The particular part of the runtime environment we reference binding forms introduce fresh variables. Values include booleans, in a type proposition is called the object. The typing judgement nil, class literals, keywords, numbers, constants and closures. Value relates an object to every expression in the language. An object is environments map local bindings to values. either empty, written ∅, which says this expression is not known Types include the top type, untagged unions, functions, sin- to evaluate to a particular part of the current runtime environment, gleton types and class instances. We abbreviate Boolean as B, or a variable with some path, written π(x), that exactly indicates Keyword as K and Number and N. The type (Val K) is inhabited

5 2015/2/28 S by the class literal K and :a is of type K. We abbreviate ( ) as S-UNIONSUB ⊥ nil true false S-UNIONSUPER −−−−−−→i ,(Val ) as nil,(Val ) as true and (Val ) as false. Func- S-REFL S-TOP ∃i. ` τ <: σ ` τ <: σ tion types contain latent (terminology from (Lucassen and Gifford ` τ <: τ ` τ <: > i i [ −→i [ −→i 1988)) propositions and object, which, along with the return type, ` τ <:( σ ) ` ( τ ) <: σ may refer to the function argument. They are latent because they are instantiated with the actual object of the argument in applications S-FUNMONO ψ |ψ S-OBJECT S-SCLASS before they are used in the proposition environment. ` x:σ −−−−−→+ − τ <: Fn ` C <: Object ` (Val C) <: Class Figure 4 contains the core typing rules. The key rule for reason- o ing about conditional control flow is T-If. S-FUN ` σ0 <: σ ` τ <: τ 0 S-SBOOL 0 0 T-IF ψ+ ` ψ + ψ− ` ψ − ` (Val b) <: B 0 Γ ` e1 : τ1 ; ψ1+|ψ1− ; o1 ` o <: o 0 0 Γ, ψ1 ` e2 : τ ; ψ2 |ψ2 ; o ψ |ψ ψ |ψ + + − S-SKW + − 0 + − 0 Γ, ψ1 ` e3 : τ ; ψ3 |ψ3 ; o ` x:σ −−−−−→ τ <: x:σ −−−−−−→ τ − + − ` (Val k) <: K o o0 Γ ` (if e1 e2 e3): τ ; ψ2+ ∨ ψ3+|ψ2− ∨ ψ3− ; o Figure 5. Core Subtyping rules The propositions of the test expression e1, ψ1+ and ψ1−, are used as assumptions in the then and else branch respectively. If the result of the if is a true value, then it either came from e2, in which 3.1 Reasoning about Exceptional Control Flow case ψ2+ is true, or from e3, which implies ψ3+ is true. The else We extend our model with sequencing expressions and errors, proposition is ψ2− ∨ ψ3− similarly. The T-Local rule connects the err type system to the proof system over type propositions via Γ ` τx where models the result of calling Clojure’s throw special to derive a type for a variable. Using this rule, the type system can form with some Throwable. then appeal to L-Update to refine the type assigned to x. e ::= ... | err | (do e e) Expressions We are now equipped to type check Example 1, starting at body: Our main insight is as follows: if the first subexpression in a ...(if(number?x)(incx)0) ... sequence reduces to a value, then it is either true or false. If we learn some proposition in both cases then we can use that proposition as S We know Γ = ( nil N)x . The test expression uses T-App, an assumption to check the second subexpression. T-Do formalises this intuition. Γ ` (number? x): B ; Nx |Nx ; ∅ T-DO Nx |Nx since number? has type x: > −−−−−→ B and x has object x. Γ ` e1 : τ1 ; ψ1+|ψ1− ; o1 ∅ Γ, ψ1 ∨ ψ1 ` e : τ ; ψ |ψ ; o Finally we check both branches using the extended proposition + − + − environment as specified by T-If. Going down the then branch, our Γ ` (do e1 e): τ ; ψ+|ψ− ; o new assumption Nx is crucial to check The introduction of errors, which do not evaluate to either a true or false value, makes our insight interesting. Γ, Nx ` x : N ; (∪ nil false) x |(∪ nil false) x ; ∅ because we can now satisify the premise of T-Local: T-ERROR Γ ` err : ⊥ ; ff |ff ; ∅ Γ, Nx ` Nx . Recall Example 2. Operational semantics We define the dynamic semantics for ...(do(if(number?x) nil(throw(Exception.))) λTC in a big-step style using an environment, following Tobin- (incx)) ... Hochstadt and Felleisen (2010). We include both errors and a wrong value, which is provably ruled out by the type system. As before, checking (number? x) allows us to use the proposi- The main judgement is ρ ` e ⇓ α which states that e evaluates to tion Nx when checking the then branch. answer α in environment ρ. We chose to omit the core rules (see By T-Nil and subsumption we can propagate this information to Figure A.17) however a notable difference is nil is a false value, both propositions. which affects the semantics of if: Nx ` nil : nil ; Nx |Nx ; ∅ Furthermore, using T-Error and subsumption we can conclude any- B-IFTRUE thing in the else branch. ρ ` e1 ⇓ v1 B-IFFALSE v1 6= false v1 6= nil ρ ` e1 ⇓ false or ρ ` e1 ⇓ nil Nx ` err : ⊥ ; Nx |Nx ; ∅ ρ ` e2 ⇓ v ρ ` e3 ⇓ v Using the above as premises to T-If we conclude that if the first ρ ` (if e1 e2 e3) ⇓ v ρ ` (if e1 e2 e3) ⇓ v expression in the do evaluates successfully, Nx must be true. Subtyping (Figure 5) is a reflexive and transitive relation with S ( nil N) ` (if (number? x) nil err): B ; Nx |Nx ; ∅ top type >. Singleton types are instances of their respective x classes—boolean singleton types are of type B, class literals are We can now use Nx in the environment to check the second subex- instances of Class and keywords are instances of K. Instances of pression (inc x), completing the example. classes C are subtypes of Object. Since function types are sub- types of Fn, all types except for nil are subtypes of Object, so > 3.2 Precise Types for Heterogeneous maps = (S nil Object). Function subtyping is contravariant left of the Figure 6 presents syntax, typing rules and dynamic semantics in arrow—latent propositions, object and the result type are covariant. detail. The type (HMapE MA) includes M, a map of present en- Subtyping for untagged unions is standard. tries (mapping keywords to types), A, a set of keyword keys that

6 2015/2/28 UBSUME T-S T-CONST T-TRUE Γ ` e : τ ; ψ+|ψ− ; o tt ff tt ff 0 0 Γ ` c : δτ (c); | ; ∅ Γ ` true : true ; | ; ∅ T-ABS Γ, ψ+ ` ψ + Γ, ψ− ` ψ − 0 0 Γ, σx ` e : τ ; ψ+|ψ− ; o ` τ <: τ ` o <: o T-KW T-FALSE 0 0 0 0 Γ ` k :(Val k); tt|ff ; ∅ Γ ` false : false ; ff |tt ; ∅ ψ+|ψ− Γ ` e : τ ; ψ |ψ ; o Γ ` λxσ .e : x:σ −−−−−→ τ ; tt|ff ; ∅ + − o T-CLASS T-NIL T-NUM tt ff ff tt Γ ` n : N ; tt|ff ; ∅ Γ ` C :(Val C); | ; ∅ Γ ` nil : nil ; | ; ∅

T-LET Γ ` e1 : σ ; ψ1+|ψ1− ; o1 T-APP 0 ψf |ψf ψ = (∪ nil false) x ⊃ ψ1+ T-LOCAL Γ ` e : x:σ −−−−−−−→+ − τ ; ψ |ψ ; o 00 o + − ψ = (∪ nil false) x ⊃ ψ1− Γ ` τx f 0 00 0 0 0 0 Γ, σx , ψ , ψ ` e : τ ; ψ+|ψ− ; o σ = (∪ nil false) Γ ` e : σ ; ψ +|ψ − ; o 0 0 0 0 Γ ` (let [x e1] e): τ ; ψ+|ψ−[o1/x]; o[o1/x] Γ ` x : τ ; σx |σx ; x Γ ` (e e ): τ[o /x]; ψf +|ψf −[o /x]; of [o /x]

Figure 4. Typing rules

are known to be absent and tag E which is either C (“complete”) if the map is fully specified by M, and P (“partial”) if there are un- e ::= ... | (get e e) | (assoc e e e) Expressions known entries. To ease presentation, if an HMap has completeness v ::= ... | {} Values tag C then A implicitly contains all keywords not in the domain of τ ::= ... | (HMapE MA) Types M −−−→ . Keys cannot be both present and absent. M ::= {k 7→ τ} HMap mandatory entries The expressions (getm :a) and (:am) are semantically −→ A ::= { k } HMap absent entries identical, though we only model the former to avoid the added E ::= C | P HMap completeness tags complexity of keywords being functions. To simplify presentation, pe ::= ... | keyk Path Elements we only provide syntax for the empty map literal and restrict lookup and extension to keyword keys. The metavariable m ranges over T-GETHMAP −−−−→ −→ −−−−−−−−−−→i [ E the runtime value of maps {k 7→ v}, usually written {k v}. Γ ` e :( (HMap MA) ); ψ1+|ψ1− ; o Subtyping for HMaps designate Map as a common supertype −−−−−−→i Γ ` ek :(Val k) M[k] = τ for all HMaps. S-HMap says that an HMap is a subtype of another [ i HMap if they agree on E, agree on mandatory entries with subtyp- Γ ` (get e e ):( −→τ ); tt|tt ; key (x)[o/x] k k ing and at least cover the absent keys of the supertype. Complete maps are subtypes of partial maps as long as they agree on the T-GETHMAPABSENT E mandatory entries of the partial map via subtyping (S-HMapP). Γ ` e :(HMap MA); ψ1 |ψ1 ; o + − The typing rules for get consider three possible cases. T- Γ ` ek :(Val k) k ∈ A GetHMap models a lookup that will certainly succeed, T-GetHMapAbsent Γ ` (get e e ): nil ; tt|tt ; key (x)[o/x] k k a lookup that will certainly fail and T-GetHMapPartialDefault a T-GETHMAPPARTIALDEFAULT lookup with unknown results. Lookups on unions of HMaps are P Γ ` e :(HMap MA); ψ1+|ψ1− ; o only supported in T-GetHMap, in particular to support looking up Γ ` ek :(Val k) k 6∈ dom(M) k 6∈ A :op on a map of type Expr (Example 3) where every element in tt tt the union contains the key we are looking up. The objects in the Γ ` (get e ek): > ; | ; keyk(x)[o/x] T-Get rules are more complicated than those in T-Local—the next T-ASSOCHMAP section discusses this in detail. Finally T-AssocHMap extends an Γ ` e :(HMapE MA) HMap with a mandatory entry while preserving completeness and Γ ` ek :(Val k)Γ ` ev : τ k 6∈ A absent entries, and enforcing k 6∈ A to prevent badly formed types. E Γ ` (assoc e ek ev):(HMap M[k 7→ τ] A); tt|ff ; ∅ The semantics for get and assoc are straightforward. If the entry is missing, B-GetMissing produces nil. S-HMAP ∀i. M[ki] = σi and ` σi <: τi A1 ⊇ A2 E E −−−→ i 3.3 Paths ` (HMap MA1) <:(HMap {k 7→ τ} A2) Recall the first insight of occurrence typing—we can reason about S-HMAPP specific parts of the runtime environment using propositions. The ∀i. M[k ] = σ ` σ <: τ S-HMAPMONO i i and i i E way to refer to parts of the runtime environment with occurrencing −−−→ i ` (HMap MA) <: Map ` (HMapC MA0) <:(HMapP {k 7→ τ} A) is via path elements.A path consists of a series of path elements applied right-to-left to a variable written π(x). Tobin-Hochstadt B-ASSOC B-GET and Felleisen (2010) introduce the path elements car and cdr to ρ ` e ⇓ m ρ ` e ⇓ m B-GETMISSING 0 reason about selector operations on cells. We instead want to ρ ` ek ⇓ k ρ ` e ⇓ k ρ ` e ⇓ m 0 ρ ` ev ⇓ vv k ∈ dom(m) ρ ` e ⇓ k reason about HMap lookups and calls to class. v = m[k 7→ vv] m[k] = v k 6∈ dom(m) 0 0 ρ ` (assoc e ek ev) ⇓ v ρ ` (get e e ) ⇓ v ρ ` (get e e ) ⇓ nil Key path element We introduce our first path element keyk , which represents the operation of looking up a key k. We directly Figure 6. HMap Syntax, Typing and Operational Semantics relate this to our typing rule T-GetHMap (Figure 6) by checking the then branch of the first conditional test is checked in an equivalent version of Example 3.

7 2015/2/28 (fn[m:- Expr] e ::= ... bC x | (let [bC x e] e) Expressions (if(=(getm :op) :if) | (. e fld) | (. e (mth −→e )) | (new C −→e ) {:op :if, ...} C −→ | (. e (mth −→ e )) Non-reflective Expressions [[C ],C] (if ...))) C −→ −→ | (. e fldC ) | (new C e ) We do not specifically support = in our calculus, but on key- −−−−→ [C ] word arguments it works identically to isa? which we model in v ::= ... | C {fld : v} Values γ ::= ? | C Section 3.5. Intuitively, if Γ ` e : τ ; ψ |ψ ; o then (= e :if) has Type Hints + − Σ ::= {x−−→: γ} Type Hint Environment the true and false propositions −−−−−−−−−→−→ ce ::= {mths 7→ {[mth, [C ],C]}, Class descriptors (Val :if) |(Val :if) [o/x] −−−−−→ x x flds 7→ {[fld, C]}, tt −−→−→ where substitution reduces to if o = ∅. ctors 7→ {[C ]}} −−−−−→ We start with proposition environment Γ = Expr m. Since Expr CT ::= {C 7→ ce} Class Table is a union of HMaps, each with the entry :op, we can use T- T-NEWSTATIC GetHMap. −−−−−−−−−−−−−→ −−−−−−→ tt tt Convert(Ci) = τi Convert(C) = τ Γ ` ei : τi Γ ` (get m :op): K ; | ; key:op(m) −→ Γ ` (new −→ C ei ): τ ; tt|ff ; ∅ Using our intuitive definition of = above, we know [Ci] T-METHODSTATIC Γ ` (= (get m :op):if):B ;(Val :if) |(Val :if) ; ∅ −−−−−−−−−−−−−→ key:op(m) key:op(m) Convert(Ci) = τi Convert(C1) = σ 0 −−−−−−→ Going down the then branch gives us the extended environment Γ Convertnil(C2) = τ Γ ` e : σ Γ ` ei : τi = Expr ,(Val :if) . Using L-Update we can combine what C1 −→ m key:op(m) Γ ` (. e (mth −→ ei )) : τ ; tt|tt ; ∅ [[Ci],C2] we know about object m and object key:op(m) to derive Fld(CT , C, fld) = [C,Cf ] if [fld, Cf ] ∈ CT [C][flds] 0 P −→ −→ −→ Γ ` (HMap {:op :if, :test Expr , :then Expr , :else Expr } {})m Ctor(CT ,C, [Cp]) = [Cp] if [Cp] ∈ CT [C][ctors] −→ −→ −→ The full definition of update is given in Figure 10 which con- Mth(CT , C, mth, [Cp]) = [C, [Cp],Cr] if [mth, [Cp],Cr] ∈ CT [C][mths] siders both keys a path elements as well as the class path element Convertnil(Void ) = nil Convert(Void ) = nil described below. In the absence of paths, update simply performs S Convertnil(C) = ( nil C) Convert(C) = C set-theoretic operations on types; see Figure 11 for details. B-NEW B-FIELD −−−−−−−→ Class path element Our second path element class is used in the ρ ` e ⇓ v ρ ` ei ⇓ vi 1 −→ −→ latent object of the constant class function. Like Clojure’s class JVMgetstatic[C1, v1, fld, C2] = v JVMnew[C1, [Ci], [vi ]] = v function class returns the argument’s class or nil if passed nil. C −→ ρ ` (. e fld 1 ) ⇓ v ρ ` (new −→ C ei ) ⇓ v C2 [Ci] pe ::= ... | class Path Elements B-METHOD tt|tt −−−−−−−−→ S ρ ` em ⇓ vm ρ ` ea ⇓ va δτ (class) = x: > −−−−→ ( nil Class ) −→ class(x) −→ JVMinvokestatic[C1, vm, mth, [Ca], [va],C2] = v The dynamic semantics are given in Figure 8. The definition of C1 −→ ρ ` (. em (mth −→ ea)) ⇓ v update supports various idioms relating to class which we intro- [[Ca],C2] duce in Section 3.5. Figure 7. Java Interoperability Syntax, Typing and Operational 3.4 Java Interoperability and Type Hints Semantics In Section 2.6 we discussed the role of type hints to help eliminate reflective calls. In this section, we introduce our model of Java −→ signature C2 fld;. The constructor invocation (new −→ C ei ) calls and provide user-facing syntax corresponding to Clojure’s Java −→ [Ci] interoperability forms and type hinted forms. Then we model the the constructor with Java signature C (Ci);. Clojure compiler’s compile-time reflection resolution algorithm. To We model Clojure’s reflection resolution algorithm as a rewrite CT 0 achieve this, first we define notation for non-reflective Java forms relation Σ `r e ⇒ e which rewrites e to a possibly-less that unambiguously call a field, method or constructor. Then we reflective expression e0 with respect to type hint environment Σ and define a rewrite relation that uses type hints to resolve reflection Java class table CT . For example, R-FieldElimRefl emits a non- explicitly. Finally we give typing rules that model how Typed reflective field if it can find a field matching the type hint inferred Clojure interacts with non-reflective calls. on e0. We present Java interoperability in a restricted setting without R-FIELDELIMREFL CT 0 0 class inheritance, overloading or Java Generics. Σ `r e ⇒ e Σ `h e : C We extend the syntax in Figure 7 with type hinted expressions, Fld(CT , C, fld) = [Ct,Cf ] reflective and non-reflective Java field lookups and calls to methods Σ `CT (. e fld) ⇒ (. e0 fldCt ) and constructors. We model the syntax after the ‘dot’ special form r Cf to prevent ambiguity—(.flde) is now (. e fld), (.mthe es*) Type hint inference is given by the judgement Σ `h e : γ which is (. e (mth −→es)) and (.class es*) is (new C −→es). The reflective infers the (possibly-unknown) type hint γ of expression e in type expressions come without typing rules because Typed Clojure only hint environment Σ. We defer the remainder of the rules for rewrit- reasons about resolved reflection (as demonstrated in Section 2.6). ing and the definition of type hint inference to the supplemental C1 −→ The method call (. e (mth −→ ei )) is a non-reflective call to material. [[Ci],C2] −→ As an example, we rewrite a simple field reference, with the the mth method on class C1, with Java signature C2 mth (Ci);. assumptions that that the class Point has a field x of Java type N and The field access (. e fldC1 ) calls the field on class C with Java Σ(p) = Point . In rewriting the expression (. p x), Point is inferred C2 1

8 2015/2/28 e ::= ... | (defmulti τ e) Expressions −−−−→ | (defmethod e e e) | (isa? e e) δ(class,C {fld : v}) = C δ(class, true) = B v ::= ... | [v, t]m Values δ(class,C) = Class σ, τ ::= ... | (Multi τ τ) Types τ δ(class, false) = B −−−→ δ(class, [ρ, λx .e]c) = Fn δ(class, k) = K t ::= {v 7→ v} Multimethod dispatch table δ(class, [vd, t]m) = Multi δ(class, nil) = nil T-DEFMULTI δ(class, m) = Map 0 0 ψ |ψ ψ |ψ σ = x:τ −−−−−→+ − τ 0 σ0 = x:τ −−−−−−→+ − τ 00 Γ ` e : σ0 o o0 Figure 8. Primitives Γ ` (defmulti σ e):(Multi σ σ0 ); tt|ff ; ∅

T-DEFMETHOD as the type hint of p and N is the field type by Fld (Figure 7). Now 0 0 ψ |ψ ψ |ψ we just plug in the new information into our new expression + − + − 0 τm = x:τ −−−−−→ σ τd = x:τ −−−−−−→ σ o o0 Point 0 00 00 (. p xN ). Γ ` em :(Multi τm τd) IsAProps(o , τv) = ψ +|ψ − Γ ` e : τ Γ, τ , ψ00 ` e : σ ; ψ |ψ ; o Now we present the typing rules for resolved Java interoperabil- v v x + b + − τ tt ff ity. T-FieldStatic checks a resolved field expression by ensuring the Γ ` (defmethod em ev λx .eb):(Multi τm τd); | ; ∅ target has the correct static type, then returns a nilable type corre- IsAProps(class(π(x)), (Val C)) = Cπ(x)|Cπ(x) sponding the Java type. IsAProps(o, (Val s)) = ((Val s)x |(Val s)x )[o/x] if s 6= C T-FIELDSTATIC IsAProps(o, τ) = tt|tt otherwise Convert(C1) = σ Convertnil(C2) = τ Γ ` e : σ S-PMULTIFN C 1 tt tt ψ |ψ Γ ` (. e fldC ): τ ; | ; ∅ + − 2 ` σt <: x:σ −−−−−→ τ o To continue our example, assume Γ = Pointp. T-FieldStatic ψ0 |ψ0 S + − 0 S-PMULTI therefore produces the type ( nil N) for the entire expression. ` σd <: x:σ −−−−−−→ τ 0 0 0 The rules T-MethodStatic and T-NewStatic work similarly o ` σ <: σ ` τ <: τ 0 0 (Figure 7), varying in the choice of nilability in the conversion ψ+|ψ− ` (Multi σ τ) <:(Multi σ τ ) ` (Multi σt σd) <: x:σ −−−−−→ τ function—methods can return nil but constructors cannot. o The evaluation rules B-Field, B-New and B-Method (Figure 7) S-MULTIMONO simply evaluate their arguments and call the relevant JVM opera- 0 0 ψ |ψ ψ |ψ tion, which do not model—Section 4 states our exact assumptions. ` (Multi x:σ −−−−−→+ − τ x:σ −−−−−−→+ − τ 0 ) <: Multi o o0

3.5 Multimethod preliminaries: isa? B-BETAMULTI B-DEFMETHOD ρ ` e ⇓ [vd, t]m We now consider the isa? operation, a core part of the dispatch 0 0 mechanism for multimethods. Recalling the examples in Sec- ρ ` e ⇓ [vd, t]m ρ ` e ⇓ v ρ ` e0 ⇓ v ρ ` (v v0) ⇓ v tion 2.7, isa? is a subclassing test for classes, otherwise an equality v B-DEFMULTI d e ρ ` ef ⇓ vf ρ ` e ⇓ vd GM(t, ve) = vf test—we do not model the semantics for vectors. 0 v = [vd, t[vv 7→ vf ]]m v = [vd, {}]m ρ ` (vf v ) ⇓ v The key component of the T-IsA rule is the IsAProps metafunc- ρ ` (defmethod e e0 e ) ⇓ v ρ ` (defmulti τ e) ⇓ v ρ ` (e e0) ⇓ v tion (Figure 9), used to calculate the propositions for isa? tests. f −→ GM(t, ve) = v if v = {v } T-ISA −→f fs f 0 0 where vfs = {vf |(vv, vf ) ∈ t and IsA(vv, ve) = true} Γ ` e : σ ; ψ +|ψ − ; o 0 GM(t, ve) = err otherwise Γ ` e : τ IsAProps(o, τ) = ψ+|ψ− Γ ` (isa? e e0): B ; ψ |ψ ; ∅ B-ISA + − ρ ` e1 ⇓ v1 IsA(v, v) = true v 6= C As an example, (isa? (class x) K) has the true and false proposi- ρ ` e2 ⇓ v2 IsA(C,C0 ) = true ` C <: C0 IsA(v1, v2) = v 0 tions IsAProps(class(x), (Val K)) = Kx |Kx , meaning that if this IsA(v, v ) = false otherwise expression produces true, x is a keyword, otherwise it is not. ρ ` (isa? e1 e2) ⇓ v The operational behavior of isa? is given by B-IsA (Figure 9). IsA explicitly handles classes in the second case. Figure 9. Multimethod Syntax, Typing and Operational Semantics 3.6 Multimethods To ease presentation, we present immutable multimethods, with dispatch function e. The expression (defmethod em ev ef ) ex- syntax and semantics given in Figure 9. defmethod returns a new tends multimethod em and to map dispatch value ev to ef in an extended multimethod without changing the original multimethod. extended dispatch table. The value [v, t]m is the runtime value of a Example 11 is now written multimethod with dispatch function v and dispatch table t. The T-DefMulti rule ensures that the type of the dispatch func- (let[path(defmulti[Any ->(U nil String)] class)] tion has at least as permissive a parameter type as the interface (let[path(defmethod path String[x]x)] type. For example, we can check the definition from our translation (let[path(defmethod path File[^Filex] above of Example 11 using T-DefMulti. (.getPathx))] (path "dir/a")))) ;=> "a" ` (defmulti σ class):(Multi σ σ0 ); tt|ff ; ∅ 0 tt tt tt tt The type (Multi σ σ ) characterizes multimethods with in- | 0 | S 0 where σ = x: > −−−→ τ and σ = x: > −−−−→ ( nil Class). terface type σ and dispatch function type σ . The expression ∅ class(x) (defmulti σ e) defines a multimethod with interface type σ and Since the parameter types agree, this is well-typed.

9 2015/2/28 For the purposes of our soundness proof, we require that all restrict(τ, σ) = ⊥ values are consistent. Consistency ensures that occurrence typing if 6 ∃v. ` v : τ ; ψ ; o 1 1 does not refer to variables hidden inside a closure. and ` v : σ ; ψ2 ; o2 −−−−−−−−→ σ restrict((S −→τ ), σ) = (S restrict(τ, σ)) Definition 1. v is consistent with ρ iff ∀ [ρ1, λx .e]c in v, if ` σ tt ff 0 0 0 restrict(τ, σ) = τ ` τ <: σ [ρ1, λx .e]c : τ ; | ; ∅ and ∀ o in τ, either o = ∅, or o = if 0 0 0 restrict(τ, σ) = σ otherwise π (x), or ρ(o ) = ρ1(o ). Our main lemma says if there is a defined reduction, then remove(τ, σ) = ⊥ if ` τ <: σ −−−−−−−−→ the propositions, object and type are correct. The metavariable α remove((S −→τ ), σ) = (S remove(τ, σ)) ranges over v, err and wrong. remove(τ, σ) = τ otherwise Lemma 1. If Γ ` e : τ ; ψ+|ψ− ; o, ρ |= Γ, ρ is consistent, and ρ ` e ⇓ α then either Figure 11. Restrict and Remove • ρ ` e ⇓ v and all of the following hold: 1. either o = ∅ or ρ(o) = v,

The T-DefMethod rule is carefully constructed to ensure we 2. either TrueVal(v) and ρ |= ψ+ or FalseVal(v) and ρ |= have a syntactic lambda expression as the right-most subexpres- ψ−, 0 0 0 0 0 0 sion. This way we can manually check the body of the lambda un- 3. ` v : τ ; ψ +|ψ − ; o for some ψ +, ψ − and o , and der an extended environment as sketched in Example 12. We use 4. v is consistent with ρ, or IsAProps to compute the proposition for this method, since isa? is • ρ ` e ⇓ err. used at runtime in multimethod dispatch. We continue with the next line of the translation of Example 11. Proof. By induction on the derivation of the typing judgement. From the previous line we have Γ = (Multi σ σ0 ) , so path (Full proof given as lemma A.7). Γ ` (defmethod prop String λx> .x):(Multi σ σ0 ); tt|ff ; ∅ We can now state our soundness theorem. We know prop is a multimethod by Γ, so now we check the body Theorem 1 (Type soundness). If Γ ` e : τ ; ψ+|ψ− ; o and of this method. 0 0 0 0 0 0 ρ ` e ⇓ v then ` v : τ ; ψ +|ψ − ; o for some ψ +, ψ − and o Γ, > , String ` x : String ; tt|ff ; ∅ x x Theorem 2 (Well-typed programs don’t go wrong). If ` e : τ ; ψ |ψ ; o then 6` e ⇓ wrong. The new proposition Stringx is derived by + − IsAProps(class(x), (Val File )) = String |String . x x 5. Experience 0 The body of the let is checked by T-App because (Multi σ σ ) is Typed Clojure is implemented as a Clojure named core.typed. a subtype of its interface type σ. In contrast to Racket, Clojure does not provide extension points to Multimethod definition semantics are straightforward. B-DefMulti the macroexpander. To satisfy our goals of providing Typed Clojure creates a multimethod with the given dispatch function and an as a library that works with the latest version of the Clojure com- empty dispatch table. B-DefMethod produces a new multimethod piler, core.typed is implemented as an external static analysis with an extended dispatch table. B-BetaMulti invokes the dispatch pass that must be explicitly invoked by the programmer. Therefore, function with the evaluated argument to obtain the dispatch value, core.typed is in a sense a linter. and uses GM (which models Clojure’s get-method) to extract the This means that type checking is truly optional. On the positive appropriate method. The call to GM only returns a value if there side, core.typed is flexible to the needs of a dynamically typed is exactly one method such that the corresponding dispatch value programmer, encouraging experimentation with programs that may is compatible, using IsA, with the result of the dispatch function. not type check. On the negative side, programmers must remember Finally we return the result of applying the extracted method and to type check their namespaces, though since type checking is a the original argument. function call away, it is easily integrated as editor shortcuts or continuous integration. Also, programs cannot depend on the static 4. Metatheory semantics of Typed Clojure, meaning that type-based optimisation is impossible. If this were not the case, we could dispose of type- We prove type soundness follow using the same technique as Tobin- hints altogether, and simply use static types to resolve reflection. Hochstadt and Felleisen (2010). We also include errors and a wrong value and prove well-typed programs do not go wrong. 5.1 Further Extensions Rather than modelling Java’s dynamic semantics, we instead Datatypes, Records and Protocols Clojure features datatypes and make our assumptions about Java explicit. We concede that method protocols. Datatypes are Java classes declared final with public final and constructor calls may diverge or error, but we assume they can fields. They can implement Java interfaces or protocols, which are never go wrong. (Assumptions for other operations are given in the similar to interfaces but already-defined classes and nil may extend supplemental material). protocols. Typed Clojure can reason about most of these features, −−−−−→ including the ability to define polymorphic datatypes and protocols Assumption 1 (JVMnew). If ∀i. vi = Ci {fldj : vj } or vi = nil and utilising the Java type system to help check implemented inter- and vi is consistent with ρ then either face methods. −→ −→ −−−−−→ • JVMnew[C, [Ci], [vi ]] = C {fldk : vk} which is consistent with ρ, Mutation and Polymorphism Clojure supports mutable refer- −→ −→ ences with software-transactional-memory which Typed Clojure • JVMnew[C, [Ci], [vi ]] = err, or −→ −→ defines bivariantly—with write and read type parameters as in • JVMnew[C, [Ci], [vi ]] is undefined. the atomic reference (Atom2 Int Int) which can write and read

10 2015/2/28 E E update((HMap MA), ν, π :: keyk ) = (HMap M[k 7→ update(τ, ν, π)] A) if M[k] = τ E update((HMap MA), τ, π :: keyk ) = ⊥ if ` nil 6<: τ and k ∈ A E update((HMap MA), τ, π :: keyk ) = ⊥ if ` nil <: τ and k ∈ A E E update((HMap MA), ν, π :: keyk ) = (HMap MA) if k ∈ A P P update((HMap MA), τ, π :: keyk ) = (∪ (HMap M[k 7→ τ] A) if ` nil <: τ, (HMapP M (A ∪ {k}))) k 6∈ dom(M) and k 6∈ A P P update((HMap MA), ν, π :: keyk ) = (HMap M[k 7→ update(>, ν, π)] A) if k 6∈ dom(M) and k 6∈ A P P update((HMap MA), ν, π :: keyk ) = (HMap M[k 7→ update(>, ν, π)] A) −−−−−−−−−−→i −−−−−−−−−−−−−−−−−−−−−−−−−−→i S E S E update(( (HMap MA) ), ν, π :: keyk ) = ( update((HMap MA), ν, π :: keyk ) ) update(τ, (Val C), π :: class) = update(τ, C, π) update(τ, (Val C), π :: class) = update(τ, C, π) if 6 ∃C0 . ` C0 <: C and C0 6= C update(τ, σ, π :: class) = update(τ, Object, π) if ` σ <: Object update(τ, σ, π :: class) = update(τ, nil , π) if ` Object <: σ update(τ, σ, π :: class) = update(τ, nil, π) if ` σ <: nil update(τ, σ, π :: class) = update(τ, Object, π) if ` nil <: σ update(τ, ν, π :: class) = τ update(τ, σ, ) = restrict(τ, σ) update(τ, σ, ) = remove(τ, σ)

Figure 10. Type Update

(ann clojure.core/swap! the expectations of Typed Clojure. We hope to add support for (All[wrb ...] gradually typing in the future. [(Atom2wr)[rb ...b ->w]b ...b ->w])) 5.3 Case Study (swap!(atom:- Num1)+23) ;=> 6 (atom contains 6) CircleCI provides continuous integration services built with a mix- ture of open- and closed-source. Typed Clojure has been used at Figure 12. Type annotation and example call of swap! CircleCI in production Clojure systems for at least two years. CircleCI provided the first author access to the main closed- source backend system written in Clojure and Typed Clojure. We Int. Typed Clojure also supports , in- conducted a study of the effectiveness of Typed Clojure in prac- cluding Typed Racket’s variable-arity polymorphism (Strickland tice. There is no clear metric for quantifying typed Clojure code, et al. 2009), which enables us to assign a type to functions like since untyped code can be freely mixed and some seemingly typed swap! (figure 12), which takes a mutable atom, a function and ex- namespaces are not checked regularly. We manually type checked tra arguments, and swaps into the atom the result of applying the all namespaces that depend on clojure.core.typed and consid- function to the atom’s current value and the extra arguments. ered those with type errors as untyped. We then searched the re- maining typed code for unsafe Typed Clojure operations like var 5.2 Limitations annotations with :no-check and the tc-ignore macro, which in- struct Typed Clojure to ignore the specified code, and also consid- Java Arrays Java arrays are known to be statically unsound. Bracha ered those untyped. Furthermore, we manually collected and in- et al. (1998) summarises the approach taken to regain runtime spected all top-level annotations and classified them. soundness, which involves checking array writes at runtime. We determined that CircleCI has a Clojure code base of approx- Typed Clojure implements an experimental partial solution, imately 50,000 lines, including around 10,000 lines of typed code. making arrays bivariant, separating the write and read types into Out of 588 top-level var annotations, 270 (46%) were checked an- contravariant and covariant parameters. If the array originates from notations of functions defined in typed code, 129 (22%) annotations typed code, then we may track the write and read parameters stati- assigned types to external libraries and the remaining 189 (32%) cally. Currently arrays from foreign sources have their write param- annotated ‘unchecked’ user code. HMaps were a valuable feature, eter set to to ⊥, protecting typed code from writing something of with 38 (59%) out of 64 total type aliases featuring them; see ex- incorrect type. However there are currently no casting mechanisms ample 4 for an instance. to convince Typed Clojure the foreign array is writeable. Based on this and other interactions with Typed Clojure users, Array-backed sequences Typed Clojure assumes sequences are it is clear the main barrier to entry to Typed Clojure for large immutable. This is almost always true, however for performance systems is the requirement to annotate functions outside the borders reasons, sequences created from Java arrays (and Iterables) reflect of typed code. We conjecture that this can be addressed by making future writes to the array in the ‘immutable’ sequence. While dis- annotations available for popular libraries. turbing and a clear unsoundness in Typed Clojure, this has not yet been an issue in practice and is strongly discouraged as undefined behavior: “Robust programs should not mutate arrays or Iterables 6. Related Work that have seqs on them.” (Hickey 2015). Multimethods Millstein and Chambers and collaborators present Gradual typing Gradual typing ensures sound interoperability a sequence of systems (Chambers 1992; Chambers and Leavens between typed and untyped code by enforcing invariants of the 1994; Millstein and Chambers 2002) with statically-typed multi- type system via run-time contracts. Currently, interactions between methods and modular type checking. In contrast to Typed Clojure, typed and untyped Clojure code are unchecked which can violate in these system methods declare the types of arguments that they

11 2015/2/28 expect which corresponds to exclusively using class as the dis- gradual typing—ineraction between typed and untyped code is patch function in Typed Clojure. However, Typed Clojure does not unchecked and thus unsound. We hope to explore the possibilites attempt to rule out failed dispatches at runtime. of using existing mechanisms for contracts and proxies in Java and Clojure to enable sound gradual typing for Clojure. Record Types Row polymorphism (Wand 1989; Cardelli and Additionally, the Clojure compiler is unable to use Typed Clo- Mitchell 1991; Harper and Pierce 1991), used in systems such jure’s wealth of static information to optimze programs. Addressing as the OCaml object system, provides many of the features of this requires not only first enabling sound gradual typing, but also HMap types, but defined using universally-quantified row vari- integrating Typed Clojure into the Clojure tool chain more deeply, ables. HMaps in Typed Clojure are instead designed to be used so that its information can be passed on to the compiler. with subtyping, but nonetheless provide similar expressiveness, in- Finally, our case study and broader experience indicate that cluding the ability to require presence and absence of certain keys. Clojure programmers still find themselves unable to use Typed Dependent JavaScript (Chugh et al. 2012) can track similar Clojure on some of their programs for lack of expressiveness. This invariants as HMaps with types for JS objects. They must deal with requires continued effort to analyze and understand the relevant mutable objects, they feature refinement types and strong updates features and idioms and develop new type checking approaches. to the heap to track changes to objects. Typed Lua (Maidl et al. 2014) has table types which track Acknowledgements entries in a mutable Lua table. Typed Lua changes the dynamic Thanks to Andrew Kent and Andre Kuhlenschmidt for comment on semantics of Lua to accommodate mutability: Typed Lua raises drafts of this paper. a runtime error for lookups on missing keys—HMaps consider lookups on missing keys normal. The integration of completeness information, crucial for many References examples in Typed Clojure, is not provided by any of these systems. E. Allende, O. Callau, J. Fabry, E.´ Tanter, and M. Denker. Gradual typing for . Science of Computer Programming, 96:52–69, 2014. Java Interoperability in Statically Typed Languages Scala (Oder- G. Bracha, M. Odersky, D. Stoutamire, and P. Wadler. Making the future sky et al. 2006) has nullable references for compatibility with Java. safe for the past: Adding genericity to the java . Programmers must manually check for null as in Java to avoid In OOPSLA, 1998. null-pointer exceptions. L. Cardelli and J. C. Mitchell. Operations on records. In Mathematical Other optional and gradual type systems In addition to Typed Structures in Computer Science, pages 3–48, 1991. Racket, several other gradual type systems have been developed re- R. Cartwright and M. Fagan. Soft typing. In Proc. PLDI, 1991. cently, targeting existing dynamically-typed languages. Reticulated C. Chambers. Object-oriented multi-methods in cecil. In Proc. ECOOP, Python (Vitousek et al. 2014) is an experimental gradually typed 1992. system for Python, implemented as a source-to-source translation C. Chambers and G. T. Leavens. Typechecking and modules for multi- that inserts dynamic checks at language boundaries and supporting methods. In Proc. OOPSLA, 1994. Python’s first-class object system. Typed Clojure does not support R. Chugh, D. Herman, and R. Jhala. Dependent types for . In a first-class object system because Java (and Clojure) have nominal Proc. OOPSLA, 2012. classes, however HMaps offer an alternative to the structural ob- Facebook. Hack language specification. Technical report, 2014. jects offered by Reticulated. Similarly, GradualTalk (Allende et al. 2014) offers gradual typing for SmallTalk, with nominal classes. Facebook. Flow language specification. Technical report, 2015. Optional types, requiring less implementation effort and avoid- R. Harper and B. Pierce. A record calculus based on symmetric concatena- ing any runtime cost, have been widely adopted in industry, includ- tion. In Proc. POPL, 1991. ing Hack, an extension to PHP (Facebook 2014), and Flow (Face- R. Hickey. The clojure programming language. In Proc. DLS, 2008. book 2015) and TypeScript (Microsoft 2014), two extensions of R. Hickey. Clojure sequence documentation, February 2015. URL http: JavaScript. These systems all support some form of occurrence typ- //clojure.org/sequences. ing, but not in the generality presented here, nor do they include the T. Lindahl and K. Sagonas. Practical type inference based on success other features we have presented. typings. In Proc. PPDP, 2006. J. M. Lucassen and D. K. Gifford. Polymorphic effect systems. In Proc. 7. Conclusion POPL, 1988. We have presented Typed Clojure, an optionally-typed version of A. M. Maidl, F. Mascarenhas, and R. Ierusalimschy. Typed lua: An optional Clojure whose type system works with a wide variety of distinctive type system for lua. In Proc. Dyla, 2014. Clojure idioms and features. Although based on the foundation of Microsoft. Typescript language specification. Technical Report Version 1.4, Typed Racket’s occurrence typing approach, Typed Clojure both 2014. extends the fundamental control-flow based reasoning as well as T. Millstein and C. Chambers. Modular statically typed multimethods. In applying it to handle seemingly unrelated features such as multi- Information and Computation, pages 279–303. Springer-Verlag, 2002. methods. In addition, Typed Clojure supports crucial features such M. Odersky, V. Cremet, I. Dragos, G. Dubochet, B. Emir, S. McDirmid, as heterogeneous maps and Java interoperability while integrating S. Micheloud, N. Mihaylov, M. Schinz, E. Stenman, L. Spoon, these features into the core type system. M. Zenger, and et al. An overview of the scala programming language The result is a sound, expressive, and useful type system which, (second edition). Technical report, EPFL Lausanne, Switzerland, 2006. when implemented in core.typed with appropriate extensions, T. S. Strickland, S. Tobin-Hochstadt, and M. Felleisen. Practical variable- suitable for typechecking significant amount of existing Clojure arity polymorphism. In Proc. ESOP, 2009. programs. As a result, Typed Clojure is already successful: it is S. Tobin-Hochstadt and M. Felleisen. Logical types for untyped languages. widely used in the Clojure community among both enthusiasts and In Proc. ICFP, ICFP ’10, 2010. professional programmers and recieves contributions from many M. M. Vitousek, A. M. Kent, J. G. Siek, and J. Baker. Design and evaluation developers. of gradual typing for python. In Proc. DLS, 2014. However, there is much more that Typed Clojure can pro- M. Wand. Type inference for record concatenation and multiple inheritance, vide. Most significantly, Typed Clojure currently does not provide 1989.

12 2015/2/28