<<

Submitted to the Ninth International Conference on Foundations of Object-Oriented Languages

First-Class Modules for Haskell Mark Shields Simon Peyton Jones Microsoft Research Cambridge {markshie, simonpj}@microsoft.com

Abstract gether towards this goal. • Record types, with fields of polymorphic type, dot no- Though Haskell’s module language is quite weak, its core tation, and the ability to use a single field name in language is highly expressive. Indeed, it is tantalisingly close distinct record types (Section 2.1). to being able to express much of the structure traditionally • Nested type declarations inside such records (Sec- delegated to a seperate module language. However, the en- tion 2.3). These nested declarations are purely syn- codings are awkward, and some situations can’t be encoded tactic sugar; there is nothing complicated. at all. • First-class universal and existential quantification (Sec- In this paper we refine Haskell’s core language to support tion 4.1). Together with record types, this allows us first-class modules with many of the features of ML-style conveniently to express the types of (generative) func- modules. Our proposal cleanly encodes signatures, struc- tors. tures and functors with the appropriate type abstraction and • A declaration-oriented construct for opening an type sharing, and supports recursive modules. All of these existentially-quantified value (Section 4.2), together features work across compilation units, and interact harmo- with a notation to allow opened types to appear in niously with Haskell’s class system. Coupled with support type annotations (Section 4.3). The standard approach for staged computation, we believe our proposal would be an is expression-oriented, which is unbearably clumsy in elegant approach to run-time dynamic linking of structured practice, whereas our construct works fine at the top code. level (Section 4.4). Our work builds directly upon Jones’ work on parameterised Taken individually, all of these ideas have been proposed signatures, Odersky and L¨aufer’s system of higher-ranked before. Our contribution is to put them all together in type annotations, Russo’s semantics of ML modules using a coherent design for a core language that can reasonably ordinary existential and universal quantification, and Oder- claim to compete with, and in some ways improve on, the sky and Zenger’s work on nested types. We motivate the ML brand leader. In particular, our system treats module system by examples, and include a more formal presenta- structures as first-class values, supports , and tion in the appendix. interacts harmoniously with Haskell’s constrained polymor- phism. From the module point of view, it separates signa- tures from structures, and offers type abstraction, generative 1 Introduction functors, type sharing, separate compilation, and recursive and nested signatures and structures. Our proposal is, at the There are two competing techniques for expressing the large- top-level, fairly compatible with Haskell’s existing module scale structure of programs. The “brand leader” is the two- system (though for clarity we shall bend the syntax some- level approach, in which the language has two layers: a core what in this paper). language, and a module language. The most sophisticated We present our system by a series of worked examples. A example of this structure is ML and its variants, but many more formal presentation may be found in the appendix. other languages, such as Haskell or Modula, take the same At the time of writing we have only just begun to estab- form, only with weaker module languages. lish the formal properties of our system. We have, however, In the last few years, however, the core language of (extended implemented a prototype interpreter which is available for versions of) Haskell has become very rich, to the point where download, and hope to merge these extensions into GHC, a it is tantalisingly close to being able to compete in the large- production Haskell compiler. scale-structure league. If that were possible, it would of course be highly desirable: it would remove the need for a second language; and it would automatically mean that 2 Concrete Modules as Records modules were first-class citizens, so that functors become ordinary functions. Following Jones [6], we encode interfaces as parameterised The purpose of this paper is to show that, by bringing to- record types, and implementations as records. Haskell al- gether several separate pieces of existing work, we can indeed ready has some record-like syntax for data constructors with bridge this final gap. More specifically, we propose several named arguments, and many Haskell implementations allow more-or-less orthogonal extensions to Haskell that work to- these fields to be assigned a polymorphic type. However, our

1 requirements are more demanding, as we wish to share field mkListSet’ :: forall a . Eq a => Set a [] names between records, and allow nested type declarations. mkListSet’ = Set { So we begin by introducing a new form of type declaration. empty = [] add = \x xs -> x : filter ((/=) x) xs asList = id 2.1 Parameterised Records } Record types are introduced (only) by explicit declaration, and may be parameterised: By using the overloaded operator (/=) we have replaced the explicit parameterisation over the record EqR a with implicit record Set a f = { parameterisation over the class Eq a. empty :: f a add :: a -> f a -> f a Record fields may have polymorphic types: union :: f a -> f a -> f a record Monad f = { asList :: f a -> [a] fmap :: forall a b . (a -> b) -> f a -> f b } unit :: forall a . a -> f a bind :: forall a b . f a -> (a -> f b) -> f b (Note that f has kind Type -> Type.) Equality between } record types is nominal rather than structural. Unlike Haskell, a single field name may be re-used in different record Such records may be constructed and taken apart in the types. same way as before: Record terms are constructed by applying a record construc- tor to a set of (possibly mutually recursive1) bindings: listMonad :: Monad [] {- inferred -} listMonad = Monad { intListSet :: Set Int [] {- inferred -} fmap = map intListSet = Set { unit = \a -> [a] empty = [] bind = \ma f -> concat (map f ma) add = \(x :: Int) xs -> x : filter (/= x) xs } union = foldr add asList = id singleton :: a -> [a] } singleton x = listMonad.unit x

Record terms may be used within patterns, but we also sup- We do not permit subtyping or extensibility for records, de- port the usual “dot notation” for field projection: ferring such extensions to future work.

one :: [Int] {- inferred -} one = intListSet.asList 2.2 Type inference (intListSet.add 1 intListSet.empty) Type inference in this system is problematic. For example, As in Haskell, the type signature on a binding — such as consider: one :: [Int] — is optional; the system will infer a type for g = \m f x -> m.fmap f (m.unit x) one, but the programmer may constrain the type with a type signature. Since fmap may be a field name in many records, the type of Regarding a module as a record allows an ML functor to be m.fmap depends on the type of m — which we do not know. replaced by an ordinary function. For example: We avoid this, and other difficulties relating to higher-ranked record EqR a = { eq :: a -> a -> Bool } polymorphism, by placing imposing the binder rule: the pro- grammer must supply a type annotation for every lambda- mkListSet :: forall a . EqR a -> Set a [] bound, or letrec-bound, variable whose type mentions a record mkListSet eq = Set { type constructor. With the binder rule in place, it becomes empty = [] easy to share field names between distinct record types. The add = \x xs -> binder rule is somewhat conservative — a clever inference x : filter (\y -> not (eq.eq x y)) xs engine could sometimes do without such an annotation — asList = id but it ensures that the typability of a program does not } depend on the inference algorithm. We discuss alternative approaches in Section 6.

Since “functors” are ordinary functions, they integrate In practice, it may be tricky to give such a type annotation. smoothly with Haskell’s type class mechanism: In our example, the type of m is Monad α, where α is the type in which g is polymorphic. We provide two ways to 1This is another (cosmetic, but important) difference solve this, both of which have been validated by practical from Haskell 98 records. experience in GHC. First, we can suppply a type signature

2 for g rather than m: transClosure :: Rep -> Rep } g :: forall m a b . Monad m -> (a -> b) -> a -> m b g = \m f x -> m.fmap f (m.unit x) We may reference nested data and record constructors within terms by a similar projection syntax:

Here, we rely on the type checker to propagate the type leaf :: forall a . Set a^BinTree {- inferred -} annotation for g to an annotation for m, in the “obvious” leaf = Set^Leaf way — this statement can be made precise, but we do not edge :: Graph Int^Edge {- inferred -} do so here. Alternatively, g’s argument m may be annotated edge = Graph^Edge { from = 1; to = 2 } directly:

g = free t in \(m :: Monad t) f x -> Notice that type inference supplies the necessary type argu- m.fmap f (m.unit x) ments for Set and Graph. Record terms whose types contain nested types are con- Here the term free t in ... introduces a fresh type variable structed in the usual way: t standing for any type within the scope of a term. Thus g’s trivGraph :: Graph () {- inferred -} first argument may be assigned any type of the form Monad τ trivGraph = Graph { for some type τ. mkGraph = \vs es -> Rep [()] [Edge { from = (); to = () }] 2.3 Nested Type Declarations transClosure = \r -> r } Modules typically contain a mix of term-level and type-level declarations. Following Odersky and Zenger [13], we allow As for types, data and record constructors may be referred record declarations to contain nested type declarations: to relatively or absolutely. record BTSet a = { Our approach to nested types diverges from the usual treat- data BinTree = Leaf | Node BinTree a BinTree ment of ML-style nested modules in two critical ways. Firstly, we never allow record terms to contain type dec- empty :: BinTree larations. (Later we will allow type declarations within top- add :: a -> BinTree -> BinTree level implementations, but this is merely a syntactical con- } venience.) As a consequence, our system avoids entirely the need for any dependent types, and manifestly respects the phase distinction [3] between type checking and evaluation. A nested type may be projected from a type in much the The work of Odersky and Zenger [13] takes a similar ap- same way as a field may be projected from a term. For proach. Secondly, we never allow record types to contain various syntactical reasons, we write ^ instead of the usual abstract types, i.e., types which are named but whose def- . to denote type projection. For example, we may write: inition is hidden. (Again, we will later allow abstract type unitSet :: BTSet a -> a -> Set a^BinTree declarations within top-level interfaces, but again this is a unitSet set a = set.add a set.empty syntactical convenience.) Together, these restrictions mean that nested type declara- (^ binds weaker than type application.) Notice that there tions may always be flattened into a non-nested declarations. is another way of writing the signature for add in the above For example, our BTSet declaration may be rewritten: record declaration: data BTSet_BinTree a add :: a -> Set a^BinTree -> Set a^BinTree = BTSet_Leaf | BTSet_Node (BTSet_BinTree a) a (BTSet_BinTree a) Indeed, all four ways of writing add’s type signature are equivalent: referring to a type relatively (by relying on the record BTSet a = { type declarations currently in scope) is equivalent to refer- empty :: BTSet_BinTree a ring to it absolutely (by following a path from some top-level cmp :: a -> a -> BTSet_Cmp record type). add :: a -> BTSet_BinTree a -> BTSet_BinTree a Since records are just another type declaration, they may } also be nested within other records:

record Graph ver = { Jones [6] advocates not supporting nested types on the record Edge = { from :: ver; to :: ver } grounds that they may always be translated away in this data Rep = Rep [ver] [Edge] manner. We support them in our system because they are convenient, they subsume the usual namespace mechanism, mkGraph :: [ver] -> [edge] -> Rep and they turn out to be easily implemented.

3 3 Abstract Modules and Existentials stract, but equal types, known as the “diamond import prob- lem” [9] in the literature. For example, assume we have a We now turn our attention to one of the essential properties function which, given any implementation of sets, generates of a module language: the ability to hide implementation some additional “helper” functions: types. As we mentioned in the Introduction there are two main approaches to implementation hiding, which we briefly record SetHelp a f = { review in this Section. The classic approach is to use exis- unionAll :: [f a] -> f a tential types (Section 3.1), but the approach that has so far } been more successful in practice, exemplified by ML, uses dependent sums (Section 3.2). mkSetHelp :: forall a f . Set a f -> SetHelp a f mkSetHelp set = SetHelp { unionAll = foldr set.union set.empty 3.1 Type Abstraction in Haskell }

In the intListSet example of Section 2.1 the representation Now consider constructing some set helpers for our abstract type of sets as lists was exposed. This is bad, because a intSet. Clearly we cannot simply write: client of the module could pass any list to the add operation, whereas the implementation of add will expect the set it is intSetHelp = mkSetHelp intSet passed to obey invariants maintained by the module (e.g. error: Type "AbsSet Int" is incompatible with the list has no duplicates). type "Set Int f" It has long been recognised that existential quantification provides an appropriate mechanism for hiding such a repre- One way to avoid this mismatch between AbsSet and Set sentation type [11]. Many Haskell implementations already is to write a version of mkSetHelp which works on abstract support existential types, allowing us to write:2 sets directly:

data AbsSet a = exists f . MkAbsSet (Set a f) data AbsSetHelp a = exists f . MkAbsSetHelp (SetHelp a f) intSet :: AbsSet Int {- inferred -} intSet = MkAbsSet intListSet mkAbsSetHelp :: forall a . AbsSet a -> AbsSetHelp a Consider typing the binding of intSet. Within the body mkAbsSetHelp absset of the MkAbsSet data constructor, f is bound to [], and so = case absset of the application MkAbsSet intListSet is well-typed. Outside MkAbsSet set -> of the AbsSet constructor, the existential quantifier over f MkAbsSetHelp (mkSetHelp set) hides this binding3. Programs wishing to use the operations of intSet must first Notice that we had to introduce (another) datatype, “open” the existential quantification using a case expres- AbsSetHelp, to hide the representation of sets in SetHelp. sion: Using this function, we may now write:

one :: [Int] {- inferred -} intSetHelp :: AbsSetHelp Int {- inferred -} one = case intSet of intSetHelp = mkAbsSetHelp intSet MkAbsSet s -> s.asList (s.add 1 s.empty) However, intSet and intSetHelp may never be mixed, de- Typing the arm of the case involves checking that feating the whole purpose of mkAbsSetHelp: the term s.asList (s.add 1 s.empty) is well-typed un- main = case (intSet, intSetHelp) of der the assumption s :: Set Int F for any type con- (MkAbsSet s, MkAbsSetHelp h) -> structor F . Equivalently, we must check the term s.asList (h.unionAll \s -> s.asList (s.add 1 s.empty) has the polymorphic [s.add 1 s.empty, s.add 2 s.empty]) type forall f . Set Int f -> υ, for υ a type not containing error: arm of case is insufficiently polymorphic f. Often we wish to manipulate implementations containing ab- Somehow we must convey the information that a particular

2 set and its helpers share the same representation, without Somewhat confusingly, these implementations require exposing the representation itself. the keyword forall to be used in this situation rather than exists. The only solution to this problem within Haskell is to care- 3The alert reader may be alarmed by our use of ex- fully structure our program so that intSet is opened in a istential quantification over higher-kinded type variables. scope containing both the definition of intSetHelp, and all Haskell uses a simple but incomplete unification algorithm uses of these two terms which need to share their represen- for higher-kinded types which turns out to work very well in tation types: practice [4].

4 intSet = MkAbsSet intListSet end

two :: forall f . The binding of f to list in IntListSet is hidden by the Set Int f -> SetHelp Int f -> [Int] explicit signature coercion. Of course, the binding of a to two s h = s.asList (h.unionAll [s.add 1 s.empty, int is also hidden, even though this is probably not intended. s.add 2 s.empty]) Sharing of abstract types is expressed using manifest types: main = case intSet of module type SETHELP = MkAbsSet s -> let h = mkSetHelp s sig in two s h type a type ’a f These examples illustrate two serious drawbacks to the unionAll : (a f) list -> a f existential-type approach to type abstraction within Haskell: end

(i) We are forced to introduce an entirely spurious module type MKSETHELP = datatype (e.g., AbsSet) for every instance of type ab- functor (S : SET) -> straction. This datatype is simply there to tell the type (SETHELP with type a = S.a inference system where to expect existential types. type ’a f = ’a S.f)

(ii) More seriously, this spurious datatype must be stripped module MkSetHelp : MKSETHELP = away within a scope which covers all of the terms which functor (S : SET) -> need to share a particular implementation type. This struct is awkward in large programs, and impossible if uses type a = S.a of an abstract type must be split between compilation type ’a f = ’a S.f units. let unionAll = fold_right S.union end

3.2 Type Abstraction in ML/OCaml Here the sharing of types a and f in the argument and result of the MkSetHelp functor is made explicit by the with clause These drawbacks have led most module language designers in the functor’s type. to abandon the simple-minded approach to type abstraction through existential quantification in preference for strong or To sum up: In OCaml, all nested types are abstract un- translucent (dependent) sums [2] (the later are also known less explicitly made manifest, while in Haskell all type pa- as manifest types [7]). For example, in OCaml our abstract rameters are concrete unless explicitly hidden by existential set would be described by the signature: quantification.

module type SET = sig type a 4 Putting Existentials to Work type ’a f val empty : a f val add : a -> a f -> a f The dependent-sum approach to modular structure has val union : a f -> a f -> a f proved to be very fruitful in practice. Nevertheless, there val asList : a f -> a list are strong reasons for continuing to search for alternatives. end ML-style module systems can be extended to support both first-class and recursive modules but, although the details for these extensions have been worked out [16, 18], the re- Here the type constructor f is a nested type of SET which sulting system is dauntingly complicated. Furthermore, it is left abstract. In ML-based module languages, signatures would be difficult to adopt such a system for Haskell, be- are not parameterised, and nested types are abstract by de- cause the interaction with Haskell’s system of type classes is fault. A binding for f must be supplied in any structure entirely unclear. Indeed, no one has even attempted to work implementing signature SET: out the details for an ML-style module system supporting module IntListSet : SET = type classes. Lastly, there is an uncomfortable duplication struct of functionality between a rich core language and a rich mod- type a = int ule language; other things being equal, it would clearly be type ’a f = ’a list better to have a single layer. let empty = [] So, instead of abandoning existentials for dependent sums, let add = fun x xs -> we shall tackle head-on the two problems we identified with x :: filter (fun y -> y <> x) xs existentials: the need for spurious datatypes (Section 4.1), let union = fun xs ys -> fold_right add xs ys and the need to open existentials in a common scope (Sec- let asList = fun xs -> xs tion 4.4).

5 4.1 Type Inference for Higher-ranked Polymorphism forall b . b -> c -> b. Type checking the outer applica- tion reduces to checking

We would like to get rid of the spurious AbsSet forall b c . b -> c -> b ≤ forall a . a -> Int -> a that we were forced to introduce in Section 3.1. The data type served to tell the type inference engine where where we write ≤ to denote “subsumes.” The check proceeds to introduce existential quantification (at occurrences of the by skolemizing the right-hand side quantified variables: MkAbsSet constructor) and where to eliminate it (at case expressions that match MkAbsSet). forall b c . b -> c -> b ≤ a’ -> Int -> a’ where a’ skolem constant Instead, we would like to be able to use existential quantifi- cation freely within type schemes, without a mediating data then freshening the left-hand-side quantifier variables: type. For example, we’d like to write the intSet example directly, thus: b’ -> c’ -> b’ ≤ a’ -> Int -> a’ where a’ skolem constant intSet :: exists f . Set Int f intSet = Set { and, finally, unifying the result. Since [b’ 7→ a’, c’ 7→ Int] empty = [] is a most general unifier, the subsumption check succeeds. add = \x xs -> x : filter ((/=) x) xs We must extend the system of Odersky and L¨aufer in two asList = id ways. Firstly, we allow type schemes to arbitrarily mix uni- } versal and existential quantifiers. Though this adds no ex- pressive power4, it is a great aid when reporting type errors! Existential quantifiers must now be able to occur in the re- The subsumption of existentials is exactly dual to that of sult of a . For example, here are the types we universals. would like for mkListSet and mkListSet’, which we saw in Secondly, we must account for Haskell’s class constraints. In Section 2.1: particular, any quantifier may introduce a constraint, and we mkListSet :: forall a . may need to decide constraint entailment during subsump- EqR a -> exists f . Set a f tion checking. Consider our previous example amended to include class constraints:

mkListSet’ :: forall a . Eq a => (\(f :: forall a . Num a => a -> Int -> a) -> exists f . Set a f f 1 2) (\x y -> if x == x then x else undefined) The type signature for mkListSet expresses both that we can construct a set implementation from any equality on type a, To type check the outer application, the system must decide and that for each such equality the representation type of the subsumption: the result is abstract. That is to say, this type signature mimics the generative functor application of ML. (We shall forall b c . Eq b => b -> c -> b ≤ see in Section 4.2 that our system cannot mimic OCaml’s forall a . Num a => a -> Int -> a applicative functors [8], and instead requires all type sharing to be made manifest.) The Eq b constraint arises from the use of ==, and Num a from the type annotation on f. The check proceeds as be- Our system supports higher-ranked signatures such as these fore, skolemizing the right-hand side quantified variables, by adopting the system of type annotations of Odersky and freshening the left, to yield: and L¨aufer [12]. We extend the binder rule of Section 2.2 by requiring a type annotation on every lambda-bound, or Eq b’ => b’ -> c’ -> b’ ≤ Num a’ => a’ -> Int -> a’ letrec-bound, variable whose type uses existential or univer- where a’ skolem constant sal quantification. (Exception: in the case of letrec, when the universal quantification is at the top level, the anno- Then the constraint Num a’ is added to the set of “known” tation may be omitted, using the standard Hindley-Milner constraints: trick for recursive definitions.) Eq b’ => b’ -> c’ -> b’ ≤ a’ -> Int -> a’ The Odersky/L¨aufer system strictly generalises the type in- assuming Num a’, and a’ skolem constant ference algorithm used by those Haskell implementations al- ready supporting rank-two polymorphism. Type inference For the moment, the Eq b’ constraint is ignored, and the reduces to solving a set of subsumption constraints over left- and right-hand side types are unified to yield the most types with mixed prefix. For example, consider inferring general unifier [b’ 7→ a’, c’ 7→ Int]. Finally, the system the type of: must check that Num a’ `e Eq a’ (\(f :: forall a . a -> Int -> a) -> f 1 2) (\x y -> x) 4E.g. the rank-one existential exists a . τ(a) may be replaced by the rank-two universal The system discovers that \x y -> x has most general type forall b . (forall a . τ(a) -> b) -> b.

6 where `e denotes the constraint entailment relation. This is let open as follows6: true, since Eq is a superclass of Num. Hence the term is well typed. Γ ` u : exists ∆ . σ dom(∆) ∩ dom(Γ) = ∅ We have given only illustrative examples here, but the Ap- Γ ++ ∆, x : σ ` t : σ0 pendix gives the technical details. This type inference algo- Γ ` σ0 : scheme rithm is potentially more expensive than that used by ex- openlet 0 isting Haskell implementations. In particular, the expensive Γ ` let open x = u in t : σ operations of constraint simplification and generalisation oc- Notice how the existentially quantified type variables ∆ aris- cur, by default, for every step of type inference rather than ing from u are “lifted over” the binding for x, and become just once per let-bound term. We plan to investigate re- free (skolemized) type variables when checking the type of t. fining the inference algorithm to avoid these operations as The side condition on ∆ ensures each existentially quantified much as possible. type variable is indeed free—alpha-conversion may always be used to satisfy this condition. The check that σ0 is a well- formed type scheme prevents any type variable in ∆ from escaping the scope of x. For example, this term is ill-typed: 4.2 Opening Existentials let open s = intSet Now that we allow existentials to appear without a medi- in s.empty ating data constructor, we must find a replacement for the error: Skolemized type "f’" introduced in open of rˆole previously played by case. For example, recall from the "s" escapes scope of binding in type previous section that: "f’ Int" intSet :: exists f . Set Int f Without this restriction our system would be unsound:

Attempting to project from intSet directly would lead to a let f = \x -> let open y = type error: ((x, (== x)) :: exists a . (a, a -> Bool)) in y in (snd (f 1)) (fst (f True)) -- Crash! one = intSet.asList (intSet.add 1 intSet.empty) error: cannot project "empty" from term of non-record type "exists f . Set Int f" An alternative design would be to modify the typing rule for projection instead of that for let; in other words, make existential quantifiers transparent to projection. We prefer Motivated by Russo’s semantics for ML modules [17], we the present design because it makes explicit the generative introduce a variation of let which explicitly “opens” any nature of existential types. For example, the following term existential quantified type variables of the let-bound term: is (rightly) ill-typed, because it attempts to mix sets created one :: [Int] {- inferred -} from different equalities on Int: one = let open s = intSet inEq :: EqR Int -- Normal equality on integers in s.asList (s.add 1 s.empty) z2Eq :: EqR Int -- Equality mod 2

The keyword open indicates that the let-body let open s1 = mkListSet intEq s.asList (s.add 1 s.empty) should be type checked in let open s2 = mkListSet z2Eq assuming s :: Set Int f’, where f’ is a fresh (skolem) type in s1.asList s2.empty constant replacing the existentially quantified f in the type error: Incompatible types "Set Int f1" and of intSet5. By opening intSet explicitly we eliminate the "Set Int f2", where type variable "f1" existential quantifier on its type without compromising its arises from open of "s1", and type type abstraction: variable "f2" arises fromm open of "s2"

bad = let open s = intSet in s.add 2 [1] A limitation of our approach is that we cannot mimic the error: Incompatible types "f’" and "[]", where applicative functors of OCaml: type variable "f’" arises from open of let open s1 = mkListSet intEq "absIntSet" in let open s2 = mkListSet intEq in s1.asList s2.empty Writing Γ to range over type and kind contexts, and ∆ to error: ... range over kind contexts, we may write the typing rule for Even though (thanks to the absence of side effects) s1 and s2 are observationally equivalent, the considers 5Haskell’s existing monadic do notation also uses a bind- ing construct whose left-hand and right-hand side types dif- 6See Figure 14 for the actual rule, which this only ap- fer. proximates.

7 their implementation types to be distinct. All type sharing t = s in our system must be manifest; even this extreme case is rejected: The type “x!” where x is an in-scope term variable, denotes 7 let open s1 = intSet the type of x . This new type form can occur in any type. in let open s2 = intSet For example, in s1.asList s2.empty \x (y :: x!) . (x, y) error: ...

has type forall a . a -> a -> (a, a), since the annotation on y forces it to have the same type as x. It is illegal to take 4.3 Type Annotations for Opened Bindings the type of a variable of scheme type:

Recall again our running example: id :: forall a . a -> a {- inferred -} id = \x -> x intSet :: exists f . Set Int f \(f :: id!) . f 1 one :: [Int] {- inferred -} error: "id" has a type scheme as its type, and one = let open s = intSet cannot be dereferenced in s.asList (s.add 1 s.empty) Even this is not quite enough, however. Consider yet another Is it possible for the programmer to give a type signature version of our example: for s? The trouble is that, in the body of the let, s has type Set Int f’, where f’ is a fresh (skolem) type constant, let open s :: exists f. Set Int f and the programmer has no way to write such a thing. Yet open s = intSet such annotations are desirable for documentation reasons, in let unit = \x -> s.add x s.empty and will be absolutely essential when we come to top-level in s.asList (unit 1) bindings (see Section 4.5). Our solution is to add a new open form of type signature, How can we write the type of unit? If s has type Set Int f , dual to the open form of term binding: unit has type Int -> f Int. So we need to be able to refer to a component of s’s type. Again following Odersky and one = let open s :: exists f. Set Int f Zengler, we add another new type form, thus: open s = intSet in s.asList (s.add 1 s.empty) unit :: Int -> s!^f Int unit = \x -> s.add x s.empty The open type signature simply declares that s has the type obtained by opening (skolemizing) the type The “^f” projects the f-component out of the type applica- exists f. Set Int f. The type signature for s behaves ex- tion s!. We use the type-variable names from the original actly like any other type signature: it is optional, and may definition of Set (back in Section 2.1) as the “field names” constrain the type to be less polymorphic than the inferred for these type projections. type. These two new type forms give rise to a small algebra over However, we are not done yet. Suppose we write (rather types. For example, the following three types are all equal artificially): to Int:

let open s :: exists f. Set Int f Set Int []^a (1) open s = intSet (Int, Set Int [])^t2^a (2) in let t = s (Set Int [] -> Int)^arg^a (3) in s.asList (t.add 1 t.empty) In (1) we know record Set has a type parameter named How can we give a type signature to t? We cannot say: a, and this parameter is bound to Int in the application Set Int [] (recall type application binds tighter than ^). In open t :: exists f. Int -> f Int (2), we rely on the built-in type parameters t1, t2 etc. to t = s refer to the successive type arguments of the tuple type con- structor. Similarly, in (3) we rely on the built-in type pa- because that would introduce a fresh skolemized type f, dis- rameters arg and res to refer to the argument and result tinct from the one introduced by the type signature for s. types respectively of the function type constructor. Instead, we want to say “t has the same type as s”. Follow- We also allow a record field to be dereferenced. For example: ing some preliminary work of Odersky and Zenger [13], we 7 allow the programmer to say precisely that: Though this notation involves term variables in type ex- pressions, the type does not depend on the value of the term t :: s! variable, only on its type.

8 unit :: Int -> s!^empty! Int level implementation to be the body of a record, both ap- unit = \x -> s.add x s.empty pearing in a notional “cosmic” global scope. Top-level interfaces appear in “hsi” files. For example, file Here we say the result of unit has the same type as the Lists.hsi could look something like: empty field in the record type denoted by s!. module Lists where

4.4 Opening Top-Level Bindings data List a = Nil | Cons a (List a) map :: forall a b . (a -> b) -> List a -> List b So far we have not tackled the second of the two problems ... etc ... we identified in Section 3.1, namely that an existential must be opened over a scope that contains all terms that must share an implementation type. Indeed, we identified it as Such a file induces the type declaration in the “cosmic” the more serious of the two problems. scope: The design we have presented so far was carefully chosen to record Lists = { solve this problem as well. All that is needed is to allow open data List a = Nil | Cons a (List a) to be used for top-level bindings. map :: forall a b . (a -> b) -> List a-> List b Consider again the intSet and intSetHelp example of Sec- ... etc ... tion 3.1. Our improved support for existential quantification } eliminates the need for any spurious AbsSet and AbsSetHelp datatypes. By using open, we may also both open and bind Top-level implementations appear in “hs” files. Continuing intSet in a single top-level declaration: the above example, file Lists.hs could resemble:

open intSet :: exists f . Set Int f module Lists where open intSet = intListSet map = ... In the rest of the program, intSet has type Set Int f’, ... etc ... where f’ is a fresh skolem type constant. The mkSetHelp and two functions remain unchanged: This induces the “cosmic” term declaration:

mkSetHelp :: forall a f . Set a f -> SetHelp a f Lists :: Lists mkSetHelp set = SetHelp { ... } Lists = Lists { map = ... two :: forall f . Set Int f -> ... etc ... SetHelp Int f -> [Int] } two s h = ... In effect, we simply introduce a term variable, Lists, into With these definitions, we may now create setHelp directly: the initial type context with type Lists. Both interfaces and implementations may import other in- setHelp = mkSetHelp intSet terfaces. Interfaces supply enough type information to be main = two intSet setHelp able to type check implementations independently and in any order. Interfaces may be mutually recursive in their Looking at the type of mkHelpSet we see setHelp has type type declarations, subject to the usual rule that all recur- Set Int f’, and thus the application of two is well-typed. sion passes through a data or record constructor. Using type dereferencing, it is also possible to write mutually recursive In Section 4.2 we mentioned that, to preserve soundness, type signatures (see Section 5.1). skolemized type variables cannot escape the scope of the term variable which introduced them. Since the scope of a We must be a little more generous with “cosmic” record top-level binding is the entire program, this check is unnec- type declarations and record terms in order to support open 8 essary for opened top-level signatures. This is indeed fortu- in signatures and bindings, and instance declarations. nate, since separate compilation means that we may not be For example, we may have within Lists.hsi the declara- able to “see” the entire scope of the binding. tions:

4.5 Top-level Interfaces and Implementations 8Neither of these constructs make sense within arbitrary Haskell’s existing module system combines the implementa- records. Allowing open anywhere leads to unsoundness for tion of a module and its specification into a sin- the same reason given in Section 4.2. Allowing instance gle compilation unit. In our system we split these notions. declarations anywhere leads to local instance declarations Roughly speaking, we take a top-level interface to be the [19] and would be a profound change to Haskell’s type class body of a parameterless record type declaration, and a top- system.

9 open llen :: exists a . forall b . List b -> a isGT :: Int -> llen!^res -> Bool However, this may be awkward in practice, and impossible if x and y must be defined in separate compilation units. instance eqList :: Eq a => Eq (List a) A better solution is to allow type dereferences to be mutually recursive: Here we declare a function llen to calculate an abstract rep- resentation of a list’s length which function isGT may test. open x :: exists a . Pair a (y!^b) We also have an instance declaration which is named eqList x = Pair { fst = 1; snd = y.snd } so that it may later be reconciled against its definition. These declarations must have matching bindings within open y :: exists b . Pair (x!^a) b Lists.hs: y = Pair { fst = x.fst; snd = True }

open llen = \xs -> xs Notice the type-level recursion of x! and y! exactly mir- isGT n xs = case xs of rors the term-level recursion of x and y. Furthermore, even Nil -> n > 0 if x and y were defined in separate implementation files, Cons _ xs’ -> both their signatures would be visible to the compiler within if n > 0 then isGT (n - 1) xs’ their respective interface files. Thus these mutually recursive else False bindings may be type checked in isolation. instance eqList where ... We must be a little more restrictive on type-level recursion than term-level recursion. For example, all of the following bindings are rejected: Haskell’s existing module system allows top-level term and type bindings to be hidden. Our system supports a similar undefined :: undefined! mechanism, though for brevity we do not consider it here. undefined = undefined error: "undefined" has a cyclicly defined type

5 Working Out The Details pair :: Pair (pair!^b) (pair!^a) pair = Pair { fst = pair.snd; snd = pair.fst } In this section we complete our exposition by describing how error: "pair" has a cyclicly defined type existentials interact with recursion and type classes.

They must instead be annotated in the usual way: 5.1 Recursive Abstract Types undefined :: forall a . a Being a lazy language, Haskell allows top-level definitions to undefined = undefined be arbitrarily mutually recursive. In this section we consider how mutual recursion interacts with our type abstraction pair :: forall a . Pair a a mechanism. pair = Pair { fst = pair.snd; snd = pair.fst } Consider the recursive top-level definitions: Why are the bindings for x and y accepted, while those for record Pair a b = { fst :: a; snd :: b } undefined and pair rejected? Roughly speaking, though x and y are mutually recursive, their resulting values are fully x :: Pair Int Bool defined, and similarly their types. However, undefined and x = Pair { fst = 1; snd = y.snd } pair contain undefined elements, and hence their types in those positions remain undetermined. y :: Pair Int Bool y = Pair { fst = x.fst; snd = True } To deal with this, we typecheck a recursive binding group in five phases; we illustrate using the x, y example of above.

We now wish to hide the implementation types Int and Bool. 1 The first phase skolemizes the existentially quantified type Of course, for this example its easy to collapse the recursion variables of all opened definitions, producing an environ- into a single term: ment that gives the types of x and y: open xy :: exists a b . (Pair a b, Pair a b) open xy = let x = Pair { fst = 1; snd = y.snd } x :: Pair a’ (y!^b) y = Pair { fst = x.fst; snd= True } y :: Pair (x!^a) b’ in (x, y) Here, a’ and b’ are the skolem types introduced to instan- x :: xy!^t1 tiate a and b respectively. x = fst xy y :: xy!^t2 2 In phase 2, all types are rewritten to avoid any use of type y = snd xy dereference, type variable projection, and field projection.

10 Furthermore, relative types are rewritten to a canonical Hence, kind inference finds nothing amiss here! We could absolute form. We use a normal-order (call-by-name) eval- perform kind inference before rewriting by augmenting the uation strategy so as to accept as many recursively defined kind system with record kinds, but the additional complexity types as possible. A type of the form x! is rewritten to the does not seem justified. Though confusing, these types are type of x already in the environment (though care must harmless. be taken to detect cycles.) The above exposition also applies to recursive let bindings. In our example, we rewrite the type of x as follows: The only difference is that we must ensure no skolemized type variables escape the scope of the term as a whole. To Pair a’ (y!^b) ensure type inference remains complete in the presence of −→ Pair a’ ((Pair (x!^a) b’)^b) recursive bindings, we require that all letrec-bound vari- −→ Pair a’ b’ ables be type annotated if any single letrec-bound variable is opened. After rewriting our environment, we have:

x :: Pair a’ b’ 5.2 Type Classes and Existentials y :: Pair a’ b’ So far we have used existential quantification to hide every- thing about a type parameter: 3 Next, we perform standard kind inference for the types in the new environment, which for Haskell reduces to type open intSet :: exists f . Set Int f inference for a simply-typed lambda-calculus. However, by exploiting Haskell’s type class system we can 4 Next, we carry out standard type inference for the right- selectively expose information about abstract types. For ex- hand side of each binding, in the type environment com- ample, we can expose that f is a functor: puted by the earlier phases. In Haskell, type inference involves a weak form of higher-kinded kind-preserving uni- open intSet :: exists f . Functor f => Set Int f fication. Since all relative types have been normalized to intSet = intListSet an absolute form, the equality theory on types is free. With this signature for intSet we have two interfaces for Pair { fst = 1; snd = y.snd } :: Pair Int b’ sets of integers. We have already been using the first ex- Pair { fst = x.fst; snd = True} :: Pair a’ Bool plicit interface, which is simply the fields of Set reached via projection from intSet. The second implicit interface is pro- vided by the overloaded operators of class Functor. These operations may be used directly. For example, the following 5 Lastly, we check that each right-hand side does indeed term has type [Int]: have the claimed existentially quantified type: intSet.asList (fmap (+ 2) (intSet.add 1 intSet.empty)) Pair { fst = 1; snd = y.snd } :: exists a. Pair a b’ Pair { fst = x.fst; snd = True} Notice the use of the overloaded operator from the Functor :: exists b. Pair a’ b class: fmap :: forall f a b . Functor f => (a -> b) -> f a -> f b (Notice that we must, of course, rewrite the original exis- tential type signatures, just as in phase 2, to obtain the claimed types.) When type checking the binding of intSet, the system checks that Functor [] is satisfiable, then extends the A consequence of rewriting types (phase 2) before kind infer- known constraint context with Functor f’, where f’ is the ence (phase 3) is that our system admits some very dubious skolemized type corresponding to f in the signature for looking type annotations. For example: intSet. Hence the function fmap may be instantiated at type (Int -> Int) -> f’ Int -> f’ Int. record Pair a b = { fst :: a; snd :: b } In Haskell, an instance declaration allows the programmer strange :: (Pair, Int)^t1 Int Pair^a to make a new data type into an instance of a given class. strange = 1 For example:

In phase 2, the type annotation for strange is rewritten: data Age = MkAge Int

(Pair, Int)^t1 Int Pair^a instance Eq Age where −→ Pair Int Pair^a (==) (MkAge i) (MkAge j) = i == j −→ Int

11 Our open mechanism also introduces a new data type, the Finally, we borrowed some notation (but, as it turns out, skolemized type constant, so it makes sense to allow it, too, not the underlying type-theoretic machinery) from the work to be an instance of a class. For example: of Odersky and Zenger on Nested Types [13]. This notation allowed type annotations to capture type sharing of abstract open absEq :: exists a . (a -> a -> Bool) types by refering to the type of other term variables in scope. absEq = ((==) :: Int -> Int -> Bool) This aspect of our system is probably the most unusual.

instance Eq (absEq!^arg) where (==) = absEq 7 Conclusions

We tried to make each of our refinements as orthogonal as possible. That is to say, our proposal is not to add a monolithic module language to Haskell, but rather to refine 6 Related work Haskell’s core language with a number of features which, taken together, capture the desired expressiveness. Our system draws together the work of four separate sys- The biggest deficiency of our system is that programs are tems. Firstly, from Jones [6] work on Parameterised Sig- subject to non-local changes when making a previously con- natures we took the idea that, at heart, a module imple- crete types abstract. Not only must record types be changed mentation is just a record, and a module interface is just to parameterise over such types, but all uses of those record a record type with polymorphic fields parameterised over types must be similarly changed to encode the appropriate all its abstract types. The problems of type abstraction propagation of type information. This has long been used as and type sharing then became almost trivial: we used or- a justification for the move to dependent sum-based module dinary existential quantification to hide types, and ordinary systems [3, 1]. Hovever, we feel this will not be a problem in parametric polymorphism to capture type sharing. This ap- practice. After all, Haskell programmers have been getting proach avoided the need for dependent types, and thus au- by for years with a comparatively simple module system in tomatically respected a phase distinction between types and which only top-level type abstraction was possible. terms. To further simplify matters, we disallowed anony- mous records, and thus type equality for record types in our Our aim in this paper was to motivate each of our design system is nominal. decisions. Most of our effort to date has been invested in experimenting with a prototype compiler, which we have Secondly, we adopted the annotations-based type system of found to be an invaluable design tool. At the time of writing Odersky and L¨aufer [12] to allow higher-ranked polymorphic we have only just begun to establish the usual soundness and types to be used in conjunction with type inference of rank- completeness properties. one types. In particular, this system allowed us to write ex- istential quantifiers within the result type of functions, and We hope to transfer these ideas into GHC, an industrial- thus write Haskell functions which mimic ML functors. This strength Haskell compiler. We have also begun to explore system also allowed us to share field names of polymorphic an extension of our system with method constraints [19], and type between records without further complicating type in- believe this provides an expressive framework for interface- ference. A little care had be taken to extend this system oriented programming. with support for Haskell’s constrained polymorphism. Some existing Haskell implementations support rank-two polymor- Acknowledgements phism. Our extension of Odersky and L¨aufer’s system can be seen as a natural generalisation of the existing type inference We thank Claudio Russo for useful discussions. algorithm to arbitrary-ranked polymorphism. Another pos- sibility would have been to abandon Hindley/Damas/Milner- References style type inference in preference for local type inference [15, 14]. However, we felt that would have been too great a [1] K. Crary, R. Harper, and D. Dreyer. A type system for change for Haskell. higher-order modules. (To appear in POPL’02), Sept. Thirdly, we examined Russo’s semantics for ML signatures 2001. and structures [17] in order to understand how the dot notation of ML modules interacts with ordinary existen- [2] R. Harper and M. Lillibridge. A type-theoretic approach tial and universal polymorphism. As a result, we refined to higher-order modules with sharing. In Proceedings Haskell’s let construct so as to be able to optionally open of the Twenty-First Annual ACM Symposium on Prin- an existentially-quantified type within the scope of the let- ciples of Programming Languages, Portland, Oregon, binding. This new construct made it possible to access pages 123–137, Portland, Oregon, Jan. 1994. records of existential type using the dot notation, which in [3] R. Harper, J. C. Mitchell, and E. Moggi. Higher-order turn allowed records of abstract type to be used across com- modules and the phase distinction. In Proceedings of pilation units. With this refinement in place, we may view the Seventeenth Annual ACM Symposium on Principles our research agenda as one of refining Haskell to be as ex- of Programming Languages, pages 341–354, 1990. pressive as ML’s language of sematic objects [10], and argue this is almost as convenient as programming in ML’s module [4] M. P. Jones. A system of constructor classes: Overload- language directly. ing and implicit higher-order polymorphism. In Pro-

12 ceedings of the ACM SIGPLAN Conference on Func- [17] C. V. Russo. Types for Modules. PhD thesis, Depart- tional Programming Languages and Computer Architec- ment of Computer Science, The University of Edin- ture (FPCA’93), Copenhagen, Denmark, 1993. burgh, June 1998.

[5] M. P. Jones. Qualified Types: Theory and Practice. [18] C. V. Russo. Recursive structures for Standard ML. In Distinguished Dissertations in Computer Science. Cam- Proceedings of the sixth ACM SIGPLAN International bridge University Press, 1994. Conference on Functional Programming (ICFP’01), Firenze, Italy, pages 50–61, Sept. 2001. [6] M. P. Jones. Using parameterized signatures to express modular structure. In Proceedings of the 23rd Annual [19] M. Shields and S. Peyton Jones. Object-oriented ACM SIGPLAN-SIGACT Symposium on Principles of style overloading for Haskell. In First Workshop Programming Languages, St. Petersburg Beach, Florida, on Multi-language Inferastructure and Interoperability pages 68–78. ACM Press, Jan. 1996. (BABEL’01), Firenze, Italy, Sept. 2001.

[7] X. Leroy. Manifest types, modules, and separate com- pilation. In Proceedings of the Twenty-First Annual A Type Checking ACM Symposium on Principles of Programming Lan- guages, Portland, Oregon, pages 109–122. ACM Press, This section gives a brief overview of type checking for an Jan. 1994. idealized Haskell with our extensions. We offer this without any proofs of correctness: A proper account is in prepara- [8] X. Leroy. Applicative functors and fully transparent tion. higher-order modules. In Proceedings of the 22nd An- Figure 1 presents the syntax. In common with many other nual ACM SIGPLAN-SIGACT Symposium on Princi- module systems, we represent names by a pair, written X i , of ples of Programming Languages (POPL’95), San Fran- a base name X and a stamp i. This way it is always possible cisco, California, pages 142–153. ACM Press, Jan. 1995. to alpha-convert a binder without changing its base name, [9] D. B. MacQueen. Using dependent types to express which is significant when projecting types and fields. We as- modular structure. Proceedings of the 13th ACM Sym- sume stamps in source-program names begin as 0. (Actually, posium on Principles of Programming Languages, St. we could avoid stamps on class, term variable and instance Petersburg, USSR, pages 277–286, Jan. 1986. names since type substitution can never capture such names. However, we retain them here for uniformity.) [10] R. Milner, M. Tofte, R. Harper, and D. MacQueen. The Constructs marked “top-level only” may only appear within Definition of Standard ML (Revised). The MIT Press, record declaration bodies or record term bodies at the out- Oct. 1997. ermost level of a program. We assume a program is a set of unparameterised record type declarations (corresponding to [11] J. C. Mitchell and G. D. Plotkin. Abstract types have imported interfaces) and a single record term (corresponding existential type. ACM Transactions on Programming to the implementation being compiled). In practice, each Languages and Systems, 10(3):470–502, July 1988. of these elements would reside in a separate file. To pre- serve soundness our system must reject any record declara- [12] M. Odersky and K. L¨aufer. Putting type annotations tion which shadows a top-level interface name. However, we to work. In Proceedings of the 23rd Annual ACM allow shadowing in all other contexts. SIGPLAN-SIGACT Symposium on Principles of Pro- gramming Languages, St. Petersburg Beach, Florida, Type contexts (ranged over by Γ) map type variables, record pages 54–67. ACM Press, Jan. 1996. names, class names and term variables to their kinds, record bodies, class bodies and type schemes respectively. All four [13] M. Odersky and C. Zenger. Nested types. In Proceed- of these constructs have a separate name space. Kind con- ing of the Workshop on Foundations of Object-Oriented texts (ranged over by ∆) map only type variables to kinds. Languages (FOOL8), London, UK, Jan. 2001. We let dom(X ) denote the domain of X , whether it be a type context, kind context, or record type declaration body. [14] M. Odersky, C. Zenger, and M. Zenger. Colored lo- cal type inference. In Proceedings of the 28th Annual Figure 2 presents rules which simultaneously kind-check and ^ ^ ! ACM SIGPLAN-SIGACT Symposium on Principles of rewrite a type. Types of the form τ a and τ x are ex- Programming Languages (POPL’01), London, England, panded to the types they denote. Furthermore, all relative pages 41–53. ACM Press, Jan. 2001. types are rewritten as absolute types. This is achieved by recording, with each record and class declaration in scope, [15] B. C. Pierce and D. N. Turner. Local type inference. an optional type prefix denoting the scope at which the dec- In Proceedings of the 25th Annual ACM SIGPLAN- laration was introduced. When a relative record or class SIGACT Symposium on Principles of Programming type is looked-up in the environment, it is replaced by the Languages, San Diego, California, pages 252–265. ACM coresponding absolute type. Press, Jan. 1998. For example, in the declaration:

[16] C. V. Russo. First-class structures for Standard ML. record A a = { Nordic Journal of Computing, 7(4):348–374. record B b = { ... }

13 Γ ` τ : κ ,→ τ 0 Type var names a, b Type names A, B Instance names w Class names C, D Γ ` Int : Type ,→ Int Term var names x, y Integers/Stamps i, j Γ ` υ : Type ,→ υ0 Γ ` τ : Type ,→ τ 0 0 Kinds κ ::= Type | κ -> κ Γ ` (υ -> τ) : Type ,→ (υ0 -> τ 0)

Types τ, υ ::= Int | υ -> τ | ( τ ) Γ ` τ : Type ,→ τ 0 | ai υ | Ai υ | x i ! | τ^a υ | τ^A υ | τ^x! Γ ` ( τ ) : Type ,→ ( τ 0 )

Type schemes σ ::= forall ∆ . φ => σ i 0 | exists ∆ . φ => σ (a : κ -> κ ) ∈ Γ 0 | σ -> σ0 Γ ` υ : κ ,→ υ | τ Γ ` (ai υ) : κ0 ,→ (ai υ0) Primitive constraints φ ::= C i υ | τ^C υ Constraint scheme ξ ::= forall ∆ . φ0 => φ (record {τ^}opt Ai (ai : κ ++ bi : κ0) = { S }) ∈ Γ Record decl bodies S ::= record Ai ∆ = { S 0 } ; S Γ ` υ : κ ,→ υ0 i | x :: σ ; S Γ ` (Ai υ) : κ0 -> Type ,→ ({τ^}opt Ai υ0) (top-level only) | open x i :: σ ; S i (top-level only) | class φ => C ∆ where τ ; S (x i : τ) ∈ Γ (top-level only) | instance w :: ξ ; S Γ ` τ : Type ,→ τ 0 | ² Γ ` x i ! : Type ,→ τ 0 Record constructor P ::= Ai | P^A Γ ` τ : hθ | ∆ | Si Class constructor Q ::= C i | P^C (ai : κ0) ∈ ∆ Terms t, u ::= i | x i | \x i {:: σ}opt . t | t u Γ ` (θ ai ) υ : κ ,→ υ0 | ( t ) | let ( x i ) = u in t 0 0 | let {open}opt x i {:: σ}opt Γ ` (τ^a υ) : κ ,→ υ = u in t | (t :: σ) | free ∆ in t Γ ` τ : hθ | ∆ | Si i i i 0 0 | P { s } | t.x (record A (a : κ ++ b : κ ) = { S }) ∈ S 0 | ?Q Γ ` τ : Type ,→ τ 0 | record Ai ∆ = { S } in t Γ ` υ : κ ,→ υ opt i Record bodies s ::= {open} x = t ; s Γ ` (τ^A υ) : κ0 -> Type ,→ (τ 0^A υ0) (top-level only) | instance w i = t ; s | ² Γ ` τ : hθ | ∆ | Si (x i : υ) ∈ S Type var contexts ∆ ::= ∆, a i : κ | · Γ ` υ : Type ,→ υ0 opt i Type contexts Γ ::= record {τ^} A ∆ = { S }, Γ 0 Γ ` (τ^x!) : Type ,→ υ | class φ => {τ^}opt C i ∆ where τ, Γ Figure 2: Well-kinded types | ai : κ, Γ | x i : σ, Γ | ² A a. i Constraint contexts Ξ ::= w : ξ, Ξ | ² Note that the record declarations within the body of a pro- gram prog have no prefix, since they are not contained within i j Programs prog ::= record A = { S } ; B { s } any other record declaration. Hence we write prefix types as {τ}opt , and write · to denote the absent prefix. Figure 1: Syntax Figure 3 presents supporting rules for resolving a type into a record. Three components are derived: a substitution from the record’s type arguments to their actual values in the x :: B Int type; the kind context of the record’s arguments, and the } record type declaration body, suitably instantiated. If the given type is not a record, a built-in record is returned where possible. For example, given: the relative type B Int of x is rewritten to the absolute type A a^B Int, since the prefix for the declaration of B is the type record A a^C b c = { x :: (a, b, c) } ∈ Γ

14 Γ ` τ : hθ | ∆ | Si Γ ` φ : hφ0 | ∆ | τi

aj ∩ dom(Γ) = ∅ opt i 0 Γ ` Int : hId | · | ²i (class φ => {τ^} C aj : κ where τ ) ∈ Γ |κ| = |υ| Γ ` C i υ : h[aj 7→ υ] φ | aj : κ | [aj 7→ υ] τ 0i Γ ` (υ -> τ) : h [arg 7→ υ, res 7→ τ] | 0 0 0 arg : Type, res : Type | ² i Γ ` τ : hθ | ∆ | S i aj ∩ dom(Γ) = ∅ (class φ => C i aj : κ where τ 0) ∈ S 0 |κ| = |υ| Γ ` ( τ ) : h [t1 7→ τ1, . . . , tn 7→ τn ] | 0 t1 : Type, . . . , tn : Type | ² i Γ ` τ^C υ : h[aj 7→ υ] φ | aj : κ | [aj 7→ υ] τ i

opt i Figure 4: (record {τ^} A ai : κ = { S }) ∈ Γ Class corresponding to a primitive constraint ai ∩ dom(Γ) = ∅ dom(S) ∩ dom(Γ) = ∅ Γ ` φ ,→ φ constraint |υ| = |κ| i Γ ` (A υ) : h[ai 7→ υ] | ai : κ | ai 7→ υ] Si (class φ => {τ^}opt C i aj : κ where τ 0) ∈ Γ Γ ` υ : κ ,→ υ0 (x i : τ) ∈ Γ i opt i 0 constraint Γ ` τ : hθ | ∆ | Si Γ ` C υ ,→ {τ^} C υ i Γ ` x ! : hθ | ∆ | Si Γ ` τ : hθ0 | ∆0 | S 0i (class φ => C i aj : κ where τ 0) ∈ S 0 Γ ` τ : hθ | ∆ | Si Γ ` τ : Type ,→ τ 0 i 0 (a : κ ) ∈ ∆ Γ ` υ : κ ,→ υ0 Γ ` (θ ai ) υ : hθ0 | ∆0 | S 0i Γ ` τ^C υ ,→ τ 0^C υ0 constraint Γ ` (τ^a υ) : hθ0 | ∆0 | S 0i Figure 5: Well-kinded primitive constraints Γ ` τ : hθ | ∆ | Si (record Ai ai : κ = { S 0 }) ∈ S ai ∩ dom(Γ) = ∅ declaration bodies. This is little involved because of the com- dom(S 0) ∩ dom(Γ) = ∅ bination of recursive record bodies with type skolemization |υ| = |κ| on “opened” signatures. Γ ` (τ^A υ) : h[ai 7→ υ] | ai : κ | [ai 7→ υ] S 0i Figure 7 gives rules for collecting the skolemized type vari- ables of a record body. The result is the set of these type Γ ` τ : hθ | ∆ | Si variables, and a rewritten body in which each variable sig- (x i :: υ) ∈ S nature is augmented by its external type signature (which Γ ` υ : hθ0 | ∆0 | S 0i we write within braces). For ordinary (unopened) variables, 0 0 0 the external type signature is identical to the supplied (in- Γ ` (τ^x!) : hθ | ∆ | S i ternal) type signature. For opened variables, the external Figure 3: Record corresponding to a type type signature is a copy of the supplied signature with any outermost existential quantifiers removed. For example, the signature: we have: open x :: exists a b . C a b => Γ ` C Int Bool : h [b 7→ Int, c 7→ Bool] forall c . a -> b -> c | b : Type, c : Type | x :: (a, Int, Bool) i is rewritten to the signature:

Figure 4 is the analogue of Figure 3 for primitive constraints, open x :: exists a b . C a b => and allows the given primitive constraint to be resolved to a forall c . a -> b -> c class. Three components are derived: the superclass context { C a b => forall c . a -> b -> c } of the class declaration, the kind context of the class’s argu- ments, and the class’s body type, all suitably instantiated. assuming the type variables a and b are fresh. Figures 5 and 6 extends the kind-checking rules to primitive Figure 8 shows how to extract both the type context and constraints and type schemes in the obvious way. constraint context corresponding to a record body. This The next three figures formalize kind-checking record type judgement is parameterised by a type prefix representing

15 Γ ` σ ,→ σ0 scheme {τ}opt ` S : hΓ | Ξi

dom(∆) ∩ dom(Γ) = ∅ aj ∩ fv({τ}opt ) = ∅ ∀i . fv(φ) ∩ dom(∆) 6= ∅ {τ^}opt Ai aj ` S 0 : h | Ξ0i 0 Γ ++ ∆ ` φ ,→ φ constraint {τ}opt ` S : hΓ | Ξi Γ ++ ∆ ` σ ,→ σ0 scheme {τ}opt ` record Ai aj : κ = { S 0 } ; S : Γ ` forall/exists ∆ . φ => σ ,→ hrecord {τ^}opt Ai aj : κ = { S 0 }, Γ | Ξ0 ++ Ξi forall/exists ∆ . φ0 => σ0 scheme {τ}opt ` S : hΓ | Ξi Γ ` σ ,→ σ00 scheme Γ ` σ0 ,→ σ000 scheme {τ}opt ` {open}opt x i :: σ {φ => σ0} ; S : i 0 Γ ` (σ -> σ0) ,→ (σ00 -> σ000) scheme hx : σ , Γ | named(φ) ++ Ξi

opt Γ ` τ ,→ τ 0 : Type {τ} ` S : hΓ | Ξi opt i Γ ` τ ,→ τ 0 scheme {τ} ` class φ => C ∆ where τ ; S : hclass φ => {τ^}opt C i ∆ where τ, Γ | Ξi Figure 6: Well-kinded type schemes {τ}opt ` S : hΓ | Ξi Γ ` S ,→ h∆0 | S 0i skolem {τ}opt ` instance w i :: ξ ; S : hΓ | compname({τ}opt , w i ) : ξ, Ξi

dom(∆1) ∩ dom(Γ) = ∅ Γ ++ ∆1 ` S1 ,→ h∆2 | S3i skolem opt Γ ++ ∆2 ` S2 ,→ h∆3 | S4i skolem {τ} ` ² : h² | ²i i Γ ` record A ∆1 = { S1 } ; S2 ,→ i Figure 8: Extracting a type and constraint context from a h∆2 ++ ∆3 | record A ∆1 = { S3 } ; S4i skolem record body dom(∆) ∩ dom(Γ) = ∅ Γ ++ ∆ ` S ,→ h∆0 | S 0i skolem and class declaration so that relative types may be rewritten i Γ ` open x :: exists ∆ . φ => σ ; S ,→ to absolute types (see Figure 2). The constraint context col- h∆ ++ ∆0 | open x i :: exists ∆ . φ => σ lects all instance declarations, and all constraints appearing { φ => σ } ; S 0i skolem underneath existentially quantified type variables in opened signatures. The latter are assigned arbitrary fresh names Γ ` S ,→ h∆ | S 0i skolem using the function named. Γ ` {open}opt x i :: σ ; S ,→ For reasons which will soon become clear, instance declara- h∆ | {open}opt x i :: σ { σ } ; S 0i skolem tions must be added to the constraint context with a name which combines both the instance declaration name and the Γ ` S ,→ h∆0 | S 0i skolem prefix of its scope of definition. We assume compname is some injective function from a type prefix and an instance i Γ ` class φ => C ∆ where τ ; S ,→ name to a composite instance name. For example: h∆0 | class φ => C i ∆ where τ ; S 0i skolem compname(·, w^i) = w^i 0 Γ ` S ,→ h∆ | S i skolem compname(A, w^i) = A_w^i Γ ` instance w i :: ξ ; S ,→ h∆ | instance w i :: ξ ; S 0i skolem Figure 9 extends the kind checking judgments to record bod- ies which have already been rewritten by the rules of Fig- ure 7. Since record bodies are mutually recursive, and since types may de-reference any term variable in scope, when Γ ` ² ,→ h² | ²i skolem kind-checking a nested record body we must first extend the type context with the definitions of the record body itself Figure 7: Skolemizing opened type schemes (suitably alpha-converted). We may now turn our attention to type checking. Figure 10 the record body’s scope of definition. The type context col- presents constraint entailment for class constraints, which lects just the outmost declarations of the record body S. is entirely standard [5]. Figure 11 present rules for decid- Notice that only the external signature of variable declara- ing type scheme subsumption. They extend the system pre- tions is significant when forming type contexts. We record sented in [12] to handle existential types and constraints. the type prefix of the record body with each nested record Figure 12 presents rules to expand record constructors and

16 Γ | Ξ ` σ ≤ σ0 {τ}opt | Γ ` S ,→ S 0 record

aj ∩ dom(Γ) = ∅ opt i j 0 Γ | Ξ ` τ ≤ τ {τ^} A a ` S1 : hΓ | i 0 dom(Γ ) ∩ dom(Γ) = ∅ 0 Γ | Ξ ` σ ≤ σ1 opt i j j 0 1 {τ^} A a | Γ ++ a : κ ++ Γ ` S1 ,→ S3 record 0 Γ | Ξ ` σ2 ≤ σ2 opt i j {τ} | Γ ` record A a : κ = { S1 } ; S2 ,→ 0 0 Γ | Ξ ` (σ1 -> σ2) ≤ (σ1 -> σ2) i j record A a : κ = { S3 } ; S4 record ai ∩ dom(Γ) = ∅ Γ ` σ1 ,→ σ3 scheme Γ ` τ : κ Γ ` φ ,→ φ0 constraint Γ | Ξ ` [ai 7→ τ] σ ≤ σ0 Γ ` σ ,→ σ scheme 2 4 Ξ `e [ai 7→ τ] φ {τ}opt | Γ ` S ,→ S 0 record i 0 opt opt i Γ | Ξ ` forall a : κ . φ => σ ≤ σ {τ} | Γ ` {open} x :: σ1 {φ => σ2} ; S ,→ opt i 0 0 record {open} x :: σ3 {φ => σ4} ; S ai ∩ dom(Γ) = ∅ Ξ0 = named(φ) aj ∩ dom(Γ) = ∅ Γ ++ ai : κ | Ξ ++ Ξ0 ` σ ≤ σ0 Γ ++ aj : κ ` φ ,→ φ0 constraint i 0 Γ ++ aj : κ ` τ : Type ,→ τ 0 Γ | Ξ ` exists a : κ . φ => σ ≤ σ {τ}opt | Γ ` S ,→ S 0 record ai ∩ dom(Γ) = ∅ opt i j {τ} | Γ ` class φ => C a : κ where τ ; S ,→ Ξ0 = named(φ) 0 i j 0 0 class φ => C a : κ where τ ; S record Γ ++ ai : κ | Ξ ++ Ξ0 ` σ ≤ σ0 i 0 dom(∆) ∩ dom(Γ) = ∅ Γ | Ξ ` σ ≤ forall a : κ . φ => σ Γ ++ ∆ ` φ ,→ φ0 constraint i Γ ++ ∆ ` φ0 ,→ φ000 constraint a ∩ dom(Γ) = ∅ {τ}opt | Γ ` S ,→ S 0 record Γ ` τ : κ Γ | Ξ ` σ ≤ [ai 7→ τ] σ0 {τ}opt | Γ ` instance w i :: forall ∆ . φ0 => φ ; S ,→ e i instance w i :: forall ∆ . φ000 => φ0 ; S 0 record Ξ ` [a 7→ τ] φ Γ | Ξ ` σ ≤ exists a i : κ . φ => σ0

Figure 11: Type subsumption Γ ` ² ,→ ² record

Figure 9: Well-kinded record declaration bodies within terms elide all type arguments. (Of course our type inference system infers these types.) Γ | Ξ `p φ Figures 13 and 14 present the type checking rules. In com- mon with [12] we must generalise the type of nearly every p Γ | Ξ ` φ sub-term, in contrast to Hindley/Damas/Milner-based sys- i (w : φ) ∈ Ξ Γ ` φ : hφ0 | ∆ | τi tems in which only let-bound terms need be generalised. p p 0 This is accomplished using the `G judgement form. Each Γ | Ξ ` φ Γ | Ξ ` φi binding construct shadows any variables in scope with the record newrec Γ | Ξ `e φ same name. In rules and , we use the func- tion unstamp to reset to 0 the stamps of the context cor- responding to the record declaration body before using it i j 0 (w : forall a : κ . φ => φ) ∈ Ξ to extend the current context. This is because the alpha- aj ∩ dom(Γ) = ∅ conversion of record bodies during kind checking is not re- Γ ` τ : κ Γ | Ξ `p φ flected within terms, which continue to use names with a e j 0 e Γ | Ξ ` [a 7→ τ] φ default stamp of 0. Since all types within the type con- Γ | Ξ ` φ e text are absolute, and we prevent the shadowing of top-level Γ | Ξ ` [aj 7→ τ] φ module names, it is sound to shadow type declarations in project Figure 10: Constraint entailment this simple-minded way. In rule , we use the ex- ternal signature of the projected variable to determine the expression’s type. class constructors to types denoting a particular record or Record bodies are type checked by the rules of Figure 15. class. This is necessary because record and class constructors We assume record bodies are permuted to match their record

17 Γ | Ξ `G t : σ Γ ` P ,→ τ dom(∆) ∩ dom(Γ) = ∅ record {τ^}opt Ai aj : κ = { S } ∈ Γ Γ ++ ∆ | φ constraint Γ ` υ : κ Ξ0 = named(φ) i opt i Γ ` A ,→ {τ^} A υ Γ ++ ∆ | Ξ ++ Ξ0 ` t : σ gen Γ ` P ,→ τ Γ | Ξ `G t : forall ∆ . φ => σ Γ ` τ : hθ | ∆ | Si record Ai aj : κ = { S 0 } ∈ S Γ | Ξ ` t : σ ,→ T Γ ` υ : κ Γ ` P^A ,→ τ^A υ int Γ | Ξ ` i : Int Γ ` Q ,→ σ (x i : σ) ∈ Γ opt i var class => ^ j where i φ {τ } C a : κ τ ∈ Γ Γ | Ξ ` x : σ Γ ` C i ,→ forall a j : κ . {τ^}opt C i a => τ Γ ` τ : Type i j Γ i , x : τ | Ξ `G t : σ b ∩ dom(Γ) = ∅ \x abs Γ ++ bj : κ0 ` P ,→ τ Γ | Ξ ` \x i . t : (τ -> σ) Γ ` τ : hθ | ∆ | Si class φ => C i aj : κ where τ ∈ S Γ ` σ0 scheme i 0 j j Γ i , x : σ | Ξ `G t : σ a ∩ b = ∅ \x aabs i 0 0 Γ ` P^C ,→ forall a j : κ ++ bj : κ0 . τ^C aj => τ Γ | Ξ ` \x ::σ . t : (σ -> σ)

Figure 12: Expanding record and class constructors Γ | Ξ `G u : σ1 Γ | Ξ `G t : σ2 Γ | Ξ ` σ2 ≤ (σ3 -> σ4) Γ | Ξ ` σ ≤ σ declaration bodies. Checking instance bindings against in- 1 3 app stance declarations is somewhat involved, as we must ensure Γ | Ξ ` t u : σ4 not only that the supplied term is of the appropriate type, but also that the superclass context of the instance declara- Γ | Ξ `G t : σ tion’s class may also be satisfied. The later check requires Γ | Ξ ` σ ≤ τ the instance declaration under construction to be temporar- tup ( ) ( ) ily removed from the known constraint context. This is the Γ | Ξ ` t : τ reason we used compname to form the composite name of 0 Γ | Ξ `G u : σ the instance declaration in Figure 8. Γ | Ξ ` σ0 ≤ ( υ ) For example, assume we have the class declaration: Γ ++ x i : υ | Ξ ` t : σ \x i lettup class B a => C a where a -> Int Γ | Ξ ` let ( x i ) = u in t : σ

0 Γ | Ξ `G u : σ the instance declaration: i 0 Γ i , x : σ | Ξ ` t : σ \x let instance w :: forall a . A a => C a Γ | Ξ ` let x i = u in t : σ and the instance binding: Γ ` σ1 ,→ σ2 scheme Γ | Ξ `G u : σ3 instance w = t Γ | Ξ ` σ3 ≤ σ2 i Γ i , x : σ | Ξ ` t : σ \x 2 4 alet i Furthermore, assume t has type: Γ | Ξ ` let x :: σ1 = u in t : σ4

forall a b . A a => a -> b Figure 13: Well-typed terms (part 1 of 2)

We must now check that (i) t matches its instance declara- tion, that (ii) its type is compatible with that given by its accomplish all three of these tests by checking: corresponding class declaration, and (iii) that the superclass constraints of the class are satisfiable. Let Ξ be the known Ξ\w ` forall a b . (A a, B a) => a -> b ≤ constraint context. With a little cunning, its possible to forall a . A a => a -> Int

18 Notice that the constraint B a was added to the type scheme 0 for t so as to check (iii). This subsumption test succeeds if Γ | Ξ `G u : exists ∆ . φ => σ Ξ contains a constraint of the form: dom(∆) ∩ dom(Γ) = ∅ Ξ0 = named(φ) w’’ :: forall a . B a i 0 0 Γ\x i ++ ∆, x : σ | Ξ ++ Ξ ` t : σ Γ ` σ scheme If we didn’t remove w from Ξ, the above subsumption test openlet would also succeed if Ξ contains a constraint of the form: Γ | Ξ ` let open x i = u in t : σ w’ :: forall a . A a σ1 = exists ∆ . φ => σ2 dom(∆) ∩ dom(Γ) = ∅ This is because: Γ ++ ∆ ` φ ,→ φ0 constraint e w’ :: forall a . A a, w :: forall a . A a => C a ` B a Γ ++ ∆ ` σ2 ,→ σ3 scheme Γ | Ξ `G u : σ4 0 To avoid this circularity problem, we must temporarily re- Γ | Ξ ` σ4 ≤ exists ∆ . φ => σ3 move the instance declaration we are trying to type check Ξ0 = named(φ0) from the known constraint context. i 0 Γ\x i ++ ∆, x : σ3 | Ξ ++ Ξ ` t : σ5 Γ ` σ scheme Finally, Figure 16 shows how to type check an entire pro- 5 aopenlet gram. We first split the program into a record type dec- i Γ | Ξ ` let open x :: σ1 = u in t : σ5 laration body and a single term. The record declaration body is then skolemized, kind checked, and the result used Γ ` σ1 ,→ σ2 scheme to construct the initial type context. The term is then type Γ | Ξ `G t : σ3 checked assuming that, for each interface A, a variable A Γ | Ξ ` σ3 ≤ σ2 of type A is in scope. Since the term must correspond to annot an implementation, we require it has the appropriate record Γ | Ξ ` (t :: σ1) : σ2 type. a ∩ dom(Γ) = ∅ For brevity we have not formalized three aspects of the sys- Γ ` υ : κ tem. Firstly, we must detect cyclic types such as: Γ | Ξ ` [ai 7→ υ] t : σ free x :: x! Γ | Ξ ` free ai : κ in t : σ

Γ ` P ,→ τ This is easily achieved by keeping track of how many times Γ ` τ : hθ | ∆ | Si each variable has been dereferenced while rewriting a given τ ` S : hΓ0 | ²i type. Secondly, we must prevent top-level interface names Γ00 = unstamp(Γ0) from being shadowed, and must disallow the explicit con- 00 τ | Γ 00 ++ Γ | Ξ ` s : S struction of a record using a top-level interface constructor: \dom(Γ ) record Γ | Ξ ` P { s } : τ module A where record A = { ... } -- illegal Γ | Ξ `G t : σ1 a = A { ... } -- illegal Γ | Ξ ` σ1 ≤ τ Γ ` τ : hθ | ∆ | Si ({open}opt x i :: σ {φ => σ }) ∈ S Finally, we have not formalized abstract top-level types. 2 3 project These are handled by augmenting the interface with the local Γ | Ξ ` t.x : σ3 type declarations of the implementation. Abstract top-level types named in the interface are simply overwritten by their Γ ` Q ,→ σ class concrete declaration given in the implementation. The only Γ | Ξ ` ?Q : σ subtlety is checking the two type declarations have equal kinds. Ai 6∈ dom(Γ) Type inference for our system has been for- dom(∆) ∩ dom(Γ) = ∅ malized as a Haskell program, available from Γ ++ ∆ ` S ,→ h² | S 0i skolem 0 0 http://www.cse.ogi.edu/~mbs. This is easily transliter- · ` S : hΓ | ²i ated into a more familiar presentation as a set of inductive Γ00 = unstamp(Γ0) 00 0 00 relations. · | Γ\dom(Γ00) ++ ∆ ++ Γ ` S ,→ S record Γ, record Ai ∆ = { S 00 } | Ξ ` t : σ Γ ` σ scheme newrec Γ | Ξ ` record Ai ∆ = { S } in t : σ

Figure 14: Well-typed terms (part 2 of 2)

19 {τ}opt | Γ | Ξ ` s : S ` prog toplevel

{τ}opt | Γ | Ξ ` s : S ² ` record Ai = { S } ,→ h∆ | S 0i skolem 0 opt i 0 · ` S : hΓ | i {τ} | Γ | Ξ ` s : (record A ∆ = { S } ; S) · | ∆ ++ Γ ` S 0 ,→ S 00 record 00 0 opt · ` S : hΓ | Ξi {τ} | Γ | Ξ ` s : S 0 i i j j ∆ ++ Γ ++ A : A | Ξ `G B { s } : B opt i {τ} | Γ | Ξ ` s : (class φ => C ∆ where τ ; S) ` record Ai = { S } ; B j { s } toplevel

Γ | Ξ `G t : σ3 Figure 16: Well-formed programs Γ | Ξ ` σ3 ≤ σ1 {τ}opt | Γ | Ξ ` s : S {τ}opt | Γ | Ξ ` ({open}opt x i = t ; s) : opt i ({open} x :: σ1 {φ => σ2} ; S)

aj ∩ dom(Γ) = ∅ Γ ++ aj : κ ` φ : hφ00 | ∆ | τ 0i j 0 0 σ1 = forall a : κ . φ => τ Γ | Ξ `G t : σ2 j 0 000 00 σ2 = forall b : κ . φ => τ bj ∩ dom(Γ) = ∅ Γ ++ bj : κ0 ` υ : κ Γ ++ bj : κ0 ` υ0 : κ [aj 7→ υ] τ 0 = [bj 7→ υ0] τ 00 j 0 000 j 00 00 σ4 = forall b : κ . φ ++ [a 7→ υ] φ => τ Γ | Ξ\compname({τ}opt ,wi ) ` σ4 ≤ σ1 {τ}opt | Γ | Ξ ` s : S {τ}opt | Γ | Ξ ` (instance w i = t ; s) : (instance w i :: forall a j : κ . φ0 => φ ; S)

{τ}opt | Γ | Ξ ` ² : ²

Figure 15: Well-typed record bodies

20