<<

First-class labels for extensible rows Technical report: UU-CS-2004-051

Daan Leijen Institute of Information and Computing Sciences, Utrecht University P.O.Box 80.089, 3508 TB Utrecht, The Netherlands [email protected]

Abstract ants based on the theory of qualified types. Unfortu- nately, as we show later in this article, type inference This paper describes a for extensible for their proposed extension is incomplete. Sulzmann records and variants with first-class labels; labels are acknowledges this problem and describes a record cal- polymorphic and can be passed as arguments. This in- culus with first-class labels as an instance of HM(X) creases the expressiveness of conventional record calculi [27, 28, 29], but this calculus is not extensible and lacks significantly, and we show how we can encode intersec- a general compilation method. Shields and Meijer intro- tion types, closed-world overloading, type case, label duce an intriguing label-less calculus, λtir, where records selective calculi, and first-class messages. We formally with first-class labels can be encoded with opaque types motivate the need for row equality predicates to express [25, 24]. However, due to general type equality con- type constraints in the presence of polymorphic labels. straints, λtir is a very complicated system. We wanted This naturally leads to an orthogonal treatment of unre- to explore an alternative point in the design space with stricted row polymorphism that can be used to express a simpler calculus where labels are explicit; indeed, we first-class patterns. can express many of the motivating examples of λtir Based on the theory of qualified types, we present an without having to work from first principles. effective type inference algorithm and efficient compila- We base our calculus directly on the original record tion method. The type inference algorithm, including system of Gaster and Jones, with two important tech- the discussed extensions, is fully implemented in the nical differences: we introduce row equality predicates experimental Morrow compiler. to express required type constraints in the presence of polymorphic labels, and we move the language of 1 Introduction rows into the predicate language to ensure complete- ness of type inference. Furthermore, our system is the Records and variants provide a convenient way to con- first to naturally describe row polymorphic operations, struct data types. Furthermore, record calculi can be and it completely avoids the complex unification prob- used as a foundation for features as objects and mod- lems, and resulting restriction to non-empty rows, as ule systems. Over the years, many aspects of record described by Gaster [4]. Being able to use unrestricted calculi have been studied, including polymorphic exten- row polymorphism, we can express first-class extensible sion, record concatenation, type directed compilation, patterns and views [30]. and even label-less calculi. We have fully implemented our type inference algo- The holy grail [6] of polymorphic extensible record rithm, including discussed extensions like row polymor- calculi are first-class labels, where labels are polymor- phic operations, in the experimental Morrow compiler phic and can be passed as arguments. First-class labels [12]. Indeed, Morrow can infer all the types of the ex- increase the expressiveness of conventional record cal- amples in this paper. However, at the time of writing, culi significantly, and we will show how we can encode the code generator for Morrow is not yet complete. many interesting programming idioms, including inter- In our calculus, row terms are no longer part of the section types, closed-world overloading, type case, label type language but are only present in the predicate lan- selective calculi, and first-class messages. guage, to ensure completeness of type inference with One of the earliest works on first-class labels was polymorphic labels. By not folding labels back into done by Gaster and Jones [5, 4]. They describe a poly- the language of types, we also reduce the complexity morphic type system for extensible records and vari- of improvement with respect to λtir. Unfortunately, the improvement algorithm, and thus the test for satisfia- (|l1 :: τ1, ..., ln :: τn|) ≡ (|l1 :: τ1 |...| (|ln :: τn |(||)|)...|) bility, ambiguity, and entailment, is still exponential. (|l1 :: τ1, ..., ln :: τn |r|) ≡ (|l1 :: τ1 |...| (|ln :: τn | r |)...|) However, just like normal type inference, we believe The fields of a row are distinguished by their label and that the worst case is unlikely to occur in practical pro- not by their position, and we consider rows equal up to grams. This intuition is largely confirmed by practical permutation of their fields: experience with the Morrow compiler. In Section 2 we give an overview of conventional ex- (|l :: a, m :: b | r|) = (|m :: b, l :: a | r|) tensible records and variants `ala Gaster and Jones [5]. In Section 3 we extend this calculus with first-class poly- morphic labels, and we discuss many interesting exam- 2.2 Lacks contraints ples in the following section. Section 5 discusses first- For the purposes of this paper, we restrict ourselves to class patterns. Section 6 and 7 formally define our cal- rows without duplicate labels. Without this constraint, culus, and give the typing rules and an inference algo- fields can not be addressed unambiguously and, as a rithm. Section 8 describes simplification and improve- result, some programs can not be assigned a principal ment, and discusses complexity issues. We finish with type [31]. A particularly elegant approach to enforce a discussion of related work and the conclusion. uniqueness of labels are lacks (or insertion) constraints [5, 25]. A predicate (r\l) restricts r to rows that do not 2 Records and variants contain a label l. In general, a row extension (|l :: a | r|) is only valid when the predicate (r\l) holds. For clarity, There are two dual concepts to describe the structure of we always write the lacks constraints explicitly in this data types: products group data items together, while paper, but a practical system can normally infer them sums describe a choice between alternatives. Here are from row expressions automatically, which simplifies the two examples of both operations: type signatures a lot. type Point = Int × Int type Event = Char + Point 2.3 Record operations In most programming languages, we can name the com- A record interprets a row as a product of types. The ponents of a product and sum with a label. A labeled empty record is the only ground value with an empty product is called a record (or struct), while a labeled record type: sum is called a variant (or union, or ). We de- {} :: {} scribe records with curly braces {}, and variants with angled brackets hi: Furthermore, there are three basic operations that can be performed on records, namely selection, restriction, type Point = {x :: Int, y :: Int} and extension: type Event = hkey :: Char, mouse :: Pointi ( .l) :: ∀ra. (r\l) ⇒ {l :: a | r } → a We can see that both records and variants are described ( − l) :: ∀ra. (r\l) ⇒ {l :: a | r } → {r } by a sequence of labeled types, which we call a row. In {l = | } :: ∀ra. (r\l) ⇒ a → {r } → {l :: a | r } this article, we enclose rows in banana brackets (||). For convenience, we leave these out when the row is directly Note that we assume a distfix notation where argument enclosed by a record or variant brackets. For example, positions are written as “ ”. Furthermore, we explicitly the unabbreviated type of a Point is {(|x ::Int, y ::Int|)}. quantify all types in this paper, but practical systems can normally use implicit quantification. The three ba- 2.1 Extensible rows sic record operations are not arbitrary but based on the corresponding primitive operations on products in cate- Following Gaster and Jones [5], we consider an exten- gory theory or logic. Note that all type schemes contain sible row calculus where a row is either empty or an a predicate (r\l) to ensure the validity of row extension. extension of a row. The empty row is written as (||) and For repeated record extension on terms, we apply the an extension of a row r with a label l and type τ is same abbreviations as for row extension on types. The written as (|l :: τ | r|). Here is an example of a row that basic operations can be used to implement a number of can be used to describe coordinates: other common operations like update and rename: (|x :: Int | (|y :: Int | (||)|)|) {l := x | r} ≡ {l = x | r − l} To reduce the number of brackets, we use the following {l ← m | r} ≡ {m = r.l | r − l} abbreviations:

2 2.4 Variant operations Note that we write r[i] to select the ith field of a mem- ory block. The offset 1 is the evidence for the predicate A variant interprets a row as a sum of types. Dual to (|age :: Int, name :: String|)\male, that resulted from se- records, the basic operations are based on the corre- lecting the male field. It is resolved to 1 as male is sponding operations on sums in category theory. The larger than age but smaller than name. Gaster and primitives consist of the empty variant, injection, em- Jones describe the formal translation from lacks predi- bedding, and decomposition: cates to evidence [5] and we will not repeat that here. hi :: hi Note that common transformations like inlining can be hl = i :: ∀ra. (r\l) ⇒ a → hl :: a | ri used to optimize this expression. As such, the calcu- hl | i :: ∀ra. (r\l) ⇒ hri → hl :: a | ri lus combines the flexibility of dynamic offset resolution (l ∈ ? : ) :: ∀rab. (r\l) ⇒ hl :: a | ri → (a → b) with the efficiency of static offsets. → (hri → b) → b It is also straightforward to support efficient vari- ant operations. Variants are represented at runtime by Concrete implementations can provide special pattern tagged values. The evidence for a variant operation is matching syntax that use the primitive decomposition not interpreted as an offset but as the value of its tag, operator. Here is a short example to demonstrate basic and it enables constant time matching of variant values. record selection and a possible pattern matching syntax:

showEvent :: Event → String 3 First class labels showEvent e = case e of In the previous section, we have described a conven- hkey = ci → "key " ++ showChar c tional of record and variant operations. Labels are hmouse = pi → "mouse " ++ showInt p.x considered part of the syntax of the language and each ++ ", " ++ showInt p.y basic operation was described as a family of functions test = showEvent hkey = ’a’i parameterized by the labels. For example, it is not pos- sible to write a general select function:

2.5 Implementation select r l = r.l A na¨ıve implementation of records would use a dynamic As it stands, the l parameter is unrelated to the con- map from labels to values. Unfortunately, this prevents stant l label. The function simply selects the constant constant time access to the components of the record. label l, i.e. its type is: Ohori [18], Gaster and Jones [5], and Shields and Meijer select :: ∀rab. (r\l) ⇒ {l :: a | r} → b → a [25] represent records as contiguous blocks of memory where the type of the record is used to calculate the In this article we describe an extension where labels be- runtime offsets of its fields. come first-class values that can be passed as arguments. In the presence of polymorphic extensible rows, the On the type level, we introduce a new offset of a field must sometimes be passed at runtime. Lab, where a value of type Lab l is the term representa- This seems rather complicated at first, but, in a shin- tion of a label l. The basic record and variant operations ing instance of the Curry-Howard isomorphism, runtime are now primitive functions that take a label as an ar- offsets directly correspond to lacks predicates. That is, gument. For example, the types of record selection and each lacks predicate is translated to an implicit runtime restriction become: parameter that carries the evidence for that predicate, ( . ) :: ∀rla. (r\l) ⇒ {l :: a | r} → Lab l → a namely an offset into the record. In the general treat- ( − ) :: ∀rla. (r\l) ⇒ {l :: a | r} → Lab l → {r} ment of qualified types, this is just an instance of evi- dence translation [9]. As an example, we consider the These functions are now polymorphic in the label l. following expression: Furthermore, the l parameter to the Lab constructor nicely captures the connection between the label value {male = True, age = 31, name = "Daan"}.male and the label type; without it, we can not express the After evidence translation, the record is represented by necessary typing constraints. A lacks predicate is now a contiguous block of memory (31, True, "Daan"), where a binary constraint that takes a row and a label as ar- the order is determined by the lexicographical order of guments. the labels. The field is selected by passing evidence to a general selection function: (λev r → r[ev]) 1 (31, True, "Daan")

3 3.1 Label constants f r l m = r.l + r.m With labels as first-class citizens, we can create func- The type we could give to f using lacks predicates is tions that use labels passed as arguments. For example, not as general as we would expect, as it restricts the the zero function creates a record with two zero values labels l and m to be distinct: when given two distinct labels: f :: ∀rlm. ((|m :: Int | r|)\l, r\m) zero :: ∀lm. ((|m :: Int|)\l) ⇒ {l :: Int, m :: Int | r} → Lab m → Lab l → Int ⇒ Lab l → Lab m → {l :: Int, m :: Int} Sulzmann acknowledges this problem in his thesis [29] zero l m = {l = 0, m = 0} where he describes an instantiation of HM(X) [17] However, to use a function like zero, we need initial with first-class labels, called REC l. In this system, label constants. We write a constant label type l as only record update and selection are label polymorphic; @l. On the term level, we introduce a family of label record construction is still restricted to constant labels. constructors that are also written as @l: Due to this restriction, has predicates suffice to type all operations, and the type of f becomes: @l :: Lab @l f :: ∀rlm. (l :: Int ∈ r, m :: Int ∈ r) The zero function can now be called with specific con- ⇒ {r} → Lab m → Lab l → Int stant labels: Unfortunately, as discussed by Sulzmann [29], it is not origin :: {@x :: Int, @y :: Int} straightforward to extend HM(REC l) to label polymor- origin = zero @x @y phic record extension. The explicit annotation of label constants quickly be- We will instead use row equality predicates. The comes a burden as most labels are constant in practice. predicate (r ∼ s) restricts row r and s to equal rows. To simplify notation, we use the following constantifi- The row equality predicate arises naturally as the most cation rule in Morrow: an identifier that is only used general predicate from the row unification rule of Gaster in label positions, is regarded as a constant label. An and Jones [5]. This rule is approximately: identifier is in a label position when it is used as a label in a row or as a selector. The constantification rule is (l :: τ) ∈ s r ∼ (s − l) used for both labels in types and labels in terms. In the (|l :: τ | r|) ∼ s following example, the l label is a variable, while the i label is regarded as a constant label: As we show later, there is not always a unique unifier between rows in the presence of polymorphic labels, and f :: ∀l. ((|i :: Int|)\l) ⇒ Lab l → {l :: Int, i :: Int} row equality predicates can be viewed as a delayed unifi- f l = {l = 0, i = 0} cation. As is apparent from the row unification rule, has In Morrow, the quantification is implicit and the lacks constraints alone lose information; namely r ∼ (s − l). predicate is inferred from the record expression. The This property is sometimes stated as “has constraints type of f can therefore also be written as: encode only positive information”. This is no problem when only selection and update are label polymorphic, f :: Lab l → {l :: Int, i :: Int} but record extension can not be typed with has pred- The weakness of constantification is that it is possible to icates alone. In an unpublished report by Sulzmann invalidate code by adding a binding with an identifier [28], in addition to the has predicate, the distinct pred- that previously only occurred in label positions. We icate is introduced especially to capture the required have not found this a problem yet in Morrow, but more type constraints. In contrast, it is easy to express has practical experience is needed. Actually, many times predicates in terms of row equality: this is an advantage, as we could redefine a label with (l :: a) ∈ r ≡ s\l, (|l :: a | s|) ∼ r (fresh s) a simple top-level definition. Morrow uses the above equivalence to allow has predi- 3.2 Equality predicates cates as convenient syntactic sugar. However, for clar- First class labels seem an innocent extension at first. In- ity, we normally use explicit lacks and row equality deed the thesis of Gaster [4] describes first-class labels as predicates in this paper. Using row equality predicates, a straightforward extension of extensible records. How- we can now assign the following principal type to f : ever, first-class labels require a new predicate to enjoy f :: ∀rstlm. (s\l, t\m, r ∼ (|l :: Int | s|), completeness of type inference. For example, the fol- r ∼ (|m :: Int | t|)) lowing program is not typeable in a na¨ıve extension: ⇒ {r} → Lab l → Lab m → Int

4 Note that we can substitute r with either row to get an side, we remark that the type of intBool can also be equivalent type signature: written more conveniently as: f :: ∀stlm. (s\l, m\t, (|l :: Int | s|) ∼ (|m :: Int | t|)) intBool :: ∀la. ((l :: a) ∈ (|i :: Int, b :: Bool|)) ⇒ a ⇒ {l :: Int | s} → Lab l → Lab m → Int We have not done so yet to illustrate clearly that lacks predicates correspond to the runtime evidence. How- 4 Expressiveness of first-class labels ever, in the next sections we will freely use this notation as convenient syntactic sugar. This section shows that first-class labels do not only get rid of families of primitive functions, but also increase 4.2 Overloading the expressiveness of our calculus. We assume the defi- nition of a polymorphic label any: Using intersection types, we can encode a form of closed world type overloading. This view of overloading is any :: ∀l. Lab l adopted by many object-oriented languages, for exam- any = ⊥ ple in Java. It assumes that all definitions for an over- With strict semantics, the any label can also be defined loaded identifier are known, and requires that each defi- with an abstract phantom datatype for labels [13]: nition is declared at a distinct type. As an example, we show how to overload addition in Morrow. The example data Lab a = Lab String | Any is adapted from Shields [24] where a similar encoding any :: ∀l. Lab l was presented based on λtir. First, we declare a record any = Any that contains all primitive addition functions: Such definition also gives a straightforward implemen- type PlusRec = {int :: Int → Int → Int, tation for showing and comparing labels. float :: Float → Float → Float } plusRec :: PlusRec 4.1 Intersection types plusRec = {int = plusInt, float = plusFloat } Using the polymorphic any label, we can select an ar- The (+) operator simply selects one of these functions bitrary field from a record. This can be used to encode using any: an . For example: (+) :: ∀la. ((l :: a) ∈ PlusRec)) ⇒ a intBool :: ∀rla. (r\l, (|l :: a | r|) ∼ (+) = plusRec.any (|i :: Int, b :: Bool|)) ⇒ a Addition can now be used at different types: intBool = {i = 1, b = True}.any ok = (1 + 2, 1.0 + 2.0) The only possible instances of a are Int or Bool, depend- ing on the context. As such it can be seen equivalent Just like the previous example, the evidence passed to to the type Int ∧ Bool in an intersection type discipline (+) selects the appropiate addition function based on [19, 22]. Here is an example of the use of intBool: the type context. In conventional closed-world over- loading, each overloaded definition must be sufficiently n = if intBool then intBool + 1 else 0 monomorphic to resolve the overloading statically, but On first sight, it seems that even though the expression this is not necessary in our system. We show an ex- type checks, the expression will fail at runtime since any ample by Shields [24] where we overload the identifier is undefined. However, the label values themselves are nList over functions that return a list with either one not used by selection at runtime; only the evidence pro- or two arguments: vided by the (r\l) predicate is used to select the proper nList :: ∀labc. ((l :: a) ∈ (|one :: b → [b ], field. The two occurrences of intBool above each unify two :: c → c → [c ]|)) ⇒ a with a different type, and the predicate (r\l) is resolved nList = {one = λx → [x ], two = λx y → [x, y ]}.any to either (|i :: Int|)\b or (|b :: Bool|)\i, corresponding to the appropiate offset at runtime. After evidence trans- Note that we do not experience the simplification lim- lation, the example looks like: itations described by Shields [24] as we always use ex- plicit labels. Besides closed-world overloading, we also intBool ev = (λi r l → r[i]) ev (True, 1) any have the open-world view of overloading, as adopted by n = if (intBool 0) then (intBool 1) + 1 else 0 Haskell. The type of each definition is required to be The implicit evidence parameter ensures that the cor- an instance of a certain type scheme, but the defini- rect value is chosen based on the type context. On the tions do not have to be in scope at each point of use.

5 As suggested by Shields [24], we can encode open-world tion into a function that takes its arguments in any or- overloading using implicit parameters [14], but a full der. First, we define a record that holds both variants: discussion is beyond the scope of this paper. type Perm a b c = {normal :: a → b → c, flipped :: b → a → c } 4.3 Type case Next, we define a function that constructs a permuta- Using the same mechanism as with intersection types, tion record from a two-argument function: we can also encode a form of type case. Given a record of functions, we select any function and apply it to the permRec :: ∀abc. (a → b → c) → Perm a b c type case argument: permRec f = {normal = f , flipped = λx y → f y x} typecase :: ∀rlab. (r\l) ⇒ a → {l :: a | r} → b Finally, we use any to select either function: typecase x r = (r.any) x permute :: ∀labcd. ((l :: d) ∈ Perm a b c) Note that this differs from the usual notion of type case, ⇒ (a → b → c) → d since the result of the operation can also depend on the permute f = (permRec f ).any type of the argument. We shall see in Section 5 how Using permute, a function with two arguments of a dif- we can use row polymorphic operations to enforce that ferent type can take them in any order: all functions return the same result type. The function typecase does not perform any runtime checking; the square xs = let pmap = permute map evidence for the predicate r\l determines which function in pmap xs (λi → i ∗ i) is selected, which is resolved at compile-time and passed We can also apply permute to a function that takes two at run-time as a fixed offset. arguments of equal type. However, when such function is used, the type inferencer complains about unresolved 4.4 Label selective functions predicates since the lacks predicate can not be resolved As shown by Sulzmann [28], we can also encode a label unambigiously. This is a weakness of our approach; selective calculus along the lines of Garrigue [3]. In a deferring unifications may lead to an expressive system, label selective calculus, each function argument is asso- but as a result, errors are also reported late. ciated with a label. Effectively, the arguments form a record, but we can still use curried function application. 4.6 First-class messages As an example, we create a label selective function lmap In object-oriented languages, methods are invoked by from the standard map function: sending a message. Even though messages are central map :: (a → b) → [a ] → [b ] to object-orientation, they are not first-class citizens in map f xs = ... most object-oriented languages, and most languages re- sort to dynamic type checks in the presence of poly- lmap :: ∀lmabcde. ((|m :: e|)\l, morphic messages. As a result, there has been a lot of (|l :: d, m :: e|) ∼ (|fun :: a → b, list :: [a ]|)) research into static type systems for dynamic messages ⇒ (Lab l, d) → (Lab m, e) → [b ] [26, 15, 16]. First-class messages can also be viewed as lmap (l, x)(m, y) = let r = {l = x, m = y} first-class labels in a record of functions, and we can in map r.fun r.list directly encode the proxy example by Nishimura [16]: square xs = lmap (list, xs)(fun, λi → i ∗ i) type Ftp = (|put :: String → IO (), ...|) We use label polymorphic record extension here and the ftp :: {Ftp} type of lmap cannot be expressed with has constraints ftp = {put = ...} alone. Sulzmann [28] specifically introduces the distinct predicate to capture the necessary type constraints. proxy :: ∀rla. ((l :: a) ∈ r) ⇒ {new :: r → {send :: Lab l → a}} 4.5 Type selective functions proxy = {new = λobj → {send = λm → obj .m}} fproxy :: ∀la. ((l :: a) ∈ Ftp) ⇒{send :: Lab l → a} A type selective function determines the rˆoleof an ar- fproxy = proxy.new ftp gument based on its type. In general we can encode submit :: IO () functions that take arguments of a different type in any submit = fproxy.send @put "paper.ps" permutation. As an example, we implement a func- tion permute that transforms any two argument func- Of course, although we can encode first-class messages, we still lack the usual contravariance and

6 rules associated with object-oriented languages. In our This operation can be used to give a principal types calculus, the subtype relation between records is always to two general functions for decomposing variants and explicit through polymorphic row extension. records: case :: ∀ra. hri → {to a r} → a 5 First-class patterns recElim :: ∀ra. {r} → hto a ri → a Our basic system can be naturally extended with row The to operator nicely captures that for a case on a vari- polymorphic type operations. Using those operations, ant over row r, we need to provide a function for each we can express first-class patterns. Instead of introduc- field in r that transforms the corresponding value into ing special syntax for destructing variants as shown in a common type a, i.e. a record over the row (to a r)! Section 2.4, we just use a normal function case that Using the dual type operator from, with the obvious takes a record of functions as the pattern. Based on the definition, we can specify the dual operations for con- variant label, the case function applies the correspond- structing records and variants: ing function from the record to the variant value. Here varIntro :: ∀ra. a → hfrom a ri → hri is an example in Morrow, where we calculate the sum recIntro :: ∀ra. a → {from a r} → {r} of a list of integers: We are are also investigating the use of other type oper- newtype List a = List hnil :: {}, ators, like zip and join, but a full discussion is beyond :: {hd :: a, tl :: List a}i the scope of this article. sum :: List Int → Int Even though Gaster and Jones describe the to and sum (List xs) from type operators, they also mention severe technical = case xs {nil = λr → 0, difficulties with unification. When unifying two rows cons = λr → r.hd + sum r.tl} to τ r and from τ 0 r 0, the types τ and τ 0 are not related 0 Just like Haskell, Morrow uses a newtype declaration when r (and r ) are empty. To obtain most general to express recursive types. A newtype constructor like unifiers, Gaster must restrict the entire record system List only serves as a hint to the type inferencer and has to deal with non-empty rows only [4]. This complicates no runtime representation. The pattern that we use to the system greatly. For example, special has constraints match on the list variant is just a normal record, and are added to give a principal type to record selection. patterns become first-class polymorphic and extensible Furthermore, they lose some of the mathematical ele- entities. As a result, we can express views on data types gance as they are no longer able to express the empty [30]. Here is an example of a function snoc that gives a record and variant. reversed view on lists: Serendipitously, we found that row equality con- straints that were introduced to support principal types snoc xs r = case (reverse xs) r for first-class labels could also express the necessary last xs = snoc xs {nil = λr → ⊥, type constraints for row polymorphic operations. As cons = λr → r.hd} we will see during the formal development in Section 6, the language of rows is restricted to predicates only, Note that we can implement the case function very ef- and the above unification problem simply becomes a ficiently: the tag of the variant corresponds directly to deferred predicate: to τ r ∼ from τ 0 r 0. the offset in the record! Even though patterns are first class, a practical system should probably add syntactic sugar to support nested patterns. Conventional pattern 6 Typing rules matching also supports default patterns; this is much This section starts the formal development of our cal- harder to accomodate in our system as a single default culus. First the kinds, types and syntax are explained, pattern represents a family of fields [2]. followed by the type rules and inference. We also dis- Gaster and Jones [5] describe two dual type opera- cuss issues as improvement, satisfiability, simplification, tions, to and from, to give a principal type to row poly- and subsumption in more detail. morphic functions such as case. These type operations Our formal development is based fundamentally on are built into the type inferencer and are treated as key- the theory of qualified types [8, 9] and higher-order words in the language. The to operator is recursively polymorphism [11]. The theory of qualified types gives defined as: us a general framework for constrained type inference, to a (||) = (||) while higher-order polymorphism allows us to introduce to a (|l :: b | r|) = (|l :: b → a | to a r|) rows and labels as orthogonal language features.

7 6.1 Kinds By definition, the kind of rows r is always row. Fur- thermore, labels become first-class entities since type A kind system distinguishes between different kinds of variables αlab are included in the set of labels. As a type constructors and ensures well-formedness of types. notational convenience, we write l for constant labels The set of kinds is defined by the following grammar: c and lα for label variables. κ ::= ∗ the kind of value types When we define relations over rows, we denote syn- | row the kind of row types tactic label equality by using the same label name. Du- | lab the kind of label types ally, the relation (l 6= l0) holds when two labels are 0 0 | κ → κ the kind of type constructors syntactically unequal. We also define a relation l 6=c l that only holds when two constant labels are distinct. All terms have types of kind star. The row and lab kinds are used for row and label types and only occur at the l 6= l0 l ∈ C l0 ∈ C type level. The arrow kind is used for type constructors 0 l 6=c l like polymorphic lists. In constrast to the work of Gaster and Jones [5], we 6.2 Types need to define this new notion of inequality, as it is closed under substitution while syntactic inequality is We assume an initial set of type variables α ∈ A and not. This becomes important when proving that our type constants c ∈ C, and we assume that the initial set entailment relation is closed under substitution; a nec- of type constants C includes at least: essary requirement in the theory of qualified types [8]. Int ::: ∗ integers We consider rows equal up to permutation of their → ::: ∗ → ∗ → ∗ function space fields: { } ::: row → ∗ record constructor (|l :: τ, l0 :: τ 0 | r|) = (|l0 :: τ 0, l :: τ | r|) h i ::: row → ∗ variant constructor Lab ::: lab → ∗ label constructor The membership relation (l :: τ) ∈ r defines whether a For each kind κ we have a collection of constructors Cκ particular field (l :: τ) is an element of row r. Gaster of kind κ. The set of all constructors is given by the [4] defines this relation as: following grammar: 0 (l :: τ) ∈ r l 6=c l (l :: τ) ∈ (|l :: τ | r|) Cκ ::= cκ type constants (l :: τ) ∈ (|l0 :: τ 0 | r|) | ακ type variables 0 0 | Cκ →κ Cκ type application Although the relation corresponds to our intuitive no- tion of membership, it is no longer well-defined with τ ::= C∗ (mono) types respect to row equality. For example, we can derive: The set of value types τ is simply the set of construc- (c :: τ) ∈ (|c :: τ, α :: τ 0|) tors of kind star. Although we explicitly annotated all but not: constructors with kinds, this not necessary in practice, 0 as they can be inferred from type expressions [11]. (c :: τ) ∈ (|α :: τ , c :: τ |) since label unequality is not defined on label variables, 6.3 Rows i.e. (c 6=c α). A solution is to strengthen the mem- bership relation such that the first property no longer Notably absent from the constructors and type con- holds. First, we define the lacks relation as: stants are row expressions such as row extension; only 0 row variables are part of the constructors. We will see l∈ / r l 6=c l l∈ / (||) later that row expressions are limited to the set of pred- l∈ / (|l0 :: τ | r|) icates in order to ensure a sound translation to qualified types. The set of row expressions r is defined as: The membership relation is now strengthened by requir- ing that a label no longer occurs in the tail: ρ ::= αrow row variables lab 0 l ::= c label constants l∈ / r (l :: τ) ∈ r l 6=c l lab | α label variables (l :: τ) ∈ (|l :: τ | r|) (l :: τ) ∈ (|l0 :: τ 0 | r|)

r ::= ρ row variable A straightforward proof by induction on the rows shows | (||) the empty row that this relation is well-defined with respect to equality | (|l :: τ | r|) row extension on rows. Furthermore, it closed under substitution.

8 The restriction relation (r−l) removes a label l from π ∈ P (taut) a row r. Just like membership, we strengthen the rela- P `` π tion with respect to Gaster and Jones’ original defini- tion to make it well-defined over row equality: (lackEmpty) P `` (||)\l (|l :: τ | r|) − l = r if l∈ / r 0 (|l0 :: τ | r|) − l = (|l0 :: τ | r − l|) if l 6= l0 P `` r\l l 6=c l c (lackTail) P `` (|l0 : τ 0 | r|)\l Gaster proves various properties of these relations [4], including a nice law that relates membership and re- (eqEmpty) P `` (||) ∼ (||) striction:

(l :: τ) ∈ r ⇒ r = (|l :: τ | r − l|) (eqVar) P `` ρ ∼ ρ (l : τ) ∈ s P `` r ∼ (s − l) 6.4 Predicates (eqHead) P `` (|l : τ | r|) ∼ s Based on the theory of qualified types [9], we will use predicates to express necessary type constraints over Figure 1: Entailment row operations. A predicate π is either a lacks or a row equality predicate: induction. Furthermore, we can prove that entailment π ::= r\l lacks predicate is well-defined with respect to row equality. | r ∼ r0 row equality predicate In our calculus, row expressions are only part of the predicate language. This means that a practical system A lacks predicate (r\l) restricts a row r to rows that should view row expressions in types as syntactic sugar; do not contain a label l. An equality predicate (r ∼ s) indeed, we can view any row expression in a type as restricts the rows r and s to be equal (up to permuta- a row variable that is restricted to be equal to that tion). expression: Entailment relates two finite set of predicates P and Q. A derivation of P `` Q is a proof that when all pred- ... {l :: a} ... ≡ r ∼ (|l :: a|) ⇒ ... {r} ... (fresh r) icates in the finite set P hold, the predicates Q hold too. We can formalize lacks and row equality predi- 6.5 Qualified types cates using the entailment relation defined in Figure 1. We write P `` π as a shorthand for P ``{π}. The We use the same structure as in the theory of qualified first three entailment rules are standard. Note how the types [8] where we distinguish between qualified types ϕ rules on lacks predicates nicely correspond to our lacks and type schemes σ: relation defined in the previous section. The last three ϕ ::= τ | π ⇒ ϕ qualified type rules handle equality predicates. Rule (eqHead) is in- σ ::= ϕ | ∀α.σ quantified type or type scheme teresting: two non-empty rows are equal if the first field of the first row is an element of the second, and when The free type variables of a type scheme σ are written the remaining rows without this field are also equal. as ftv(σ). We overload this function on other objects in To make our calculus an instance of the theory of the obvious way. Since the order of the predicates and qualified types, we need to prove that the entailment quantified type variables does not matter, we sometimes relation is monotone, transitive, and closed under sub- use the following abbreviations: stitution. ∀α1 ... ∀αn.ϕ ≡ ∀α. ϕ Theorem 1 The instance relation in Figure 1 satisfies: π1 ⇒ ... ⇒ πn ⇒ τ ≡ π ⇒ τ ≡ P ⇒ τ • Monotonicity: Q ⊆ P ⇒ P `` Q. The term language of our calculus is an implicitly typed λ-calculus, extended with constants and let bindings: • Transitivity: P `` Q ∧ Q `` R ⇒ P `` R. e ::= x | k | e e0 | λx.e | let x = e in e0 • Closure: P `` Q ⇒ SP `` SQ, for any substitution S mapping type variables to type expressions. The constants k should include at least the basic record, variant, and label functions. Each constant k is also The first two properties are trivial as we use sets for assigned an appropiate (closed) type scheme σk. An predicates. The closure property is proved by laborious

9 (const) P | A ` k : σ k id α 6∈ ftv(C) α 6∈ ftv(C) C ∼ C [α7→C] [α7→C] (x : σ) ∈ A α ∼ C C ∼ α (var) 0 P | A ` x : σ C ∼U DUC0 ∼U UD0 U 0U P | A ` e : τ 0 → τ P | A ` e0 : τ 0 CC0 ∼ DD0 (→E) P | A ` e e0 : τ Figure 3: Kind preserving unification P | A, x : τ 0 ` e : τ (→I ) P | A ` λx.e : τ 0 → τ 7.1 Substitution and unification A substitution is an idempotent map from type vari- P | A ` e : π ⇒ ϕ P `` π (⇒E) ables α to constructors C, and which is the identity P | A ` e : ϕ function on all but a finite set of type variables. We write the empty substitution as id. A substitution that P ∪ {π} | A ` e : ϕ maps a type variable α to a constructor C is written as (⇒I ) P | A ` e : π ⇒ ϕ [α 7→ C]. Finally, we write TS for the composition of substitution T with S. For the purposes of this article, P | A ` e : ∀α.σ we restrict ourselves to kind preserving substitutions (∀E) P | A ` e :[α 7→ τ]σ where a type variable always maps to a constructor of the same kind. P | A ` e : σ α 6∈ ftv(A) ∪ ftv(P ) A substitution S is a unifier of two constructors C (∀I ) and C0 when SC = SC0. We call a substitution S of P | A ` e : ∀α.σ two constructors C and C0 most general when every unifier of these constructors can be written as TS for P | A ` e : σ Q | A , x : σ ` e0 : τ x some substitution T . As types are represented by simple (let) 0 P ∪ Q | A ` let x = e in e : τ Herbrand terms, we can use standard kind-preserving unification [11, 4] as shown in Figure 3. The expression Figure 2: Typing rules U τ ∼ τ 0 calculates a most general unifier U for types τ and τ 0. assumption A is a finite set of type assignments x : σ Theorem 2 The unification algorithm in Figure 3 re- in which no term variable x occurs more than once. turns a most general unifier when it exists. It fails pre- cisely when no such unifier exists. A ::= {x1 : σ1, . . . , xn : σn} Typothesis A proof of this result is given by Robinson [23]. With We write Ax to denote an assumption A where the the given unification algorithm, we can directly use the type assignment for x is removed. We also abbrevi- type inference algoritm for qualified types [9] as a type ate A ∪ {x : σ} as A, x : σ. Figure 2 shows the standard inference algorithm for our calculus. For completeness, typing rules for the theory of qualified types extended we include the inference algorithm in Figure 4. The type with the typing rule for constants. A typing judgement inference rules are interpreted as an attribute grammar P | A ` e : σ asserts that expression e has type σ when where each judgement of the form P | S | A `w e : τ is a the predicates P are satisfied and the types of the free semantic rule where the assumption A and expression variables are given by the assumption A. e are inherited, while the predicates P , substitution S, and the type τ are synthesized. 7 Type inference The (let w) rule uses a generalization function to quantify a qualified type. The generalization function This section discusses the standard inference algorithm quantifies over all free variables that are not present in by Jones [8] that calculates principal types with respect the assumption: to the typing rules of the previous section. Before show- ing the algorithm, we first look at substitution and uni- gen(A, ϕ) = ∀α.ϕ where α = ftv(ϕ) − ftv(A) fication. Jones proves that the given algorithm is both sound and complete with respect to the typing rules in Figure 2 [8]. Theorem 3 The algorithm in Figure 4 calculates a

10 (x : ∀α.P ⇒ τ) ∈ A to ∀sla. (s\l, r ∼ (l :: a | s)) ⇒ Lab l → a, where r is (var w) fresh(β) S = [α 7→ β] a monomorphic (scoped) type variable. SP | ² | A `w x : Sτ The second technical problem arises during unifica- tion. With constant labels, we can always find a single P | S | A `w e : τ Q | T | SA `w e0 : τ 0 most general unifier between two rows (if it exists). This makes it possible to extend Robinson unification with w U 0 (→E ) T τ ∼ τ → α fresh(α) row unification. However, with first-class labels there w U(TP ∪ Q) | UTS | A ` e e0 : Uα can be many general unifiers. For example:

w P | S | (Ax, x : α) ` e : τ fresh(α) (|α :: Int | β|) ∼ (|x :: Int, y :: Int|) (→I w) P | S | A `w λx.e : Sα → τ For this expression, there are two unifiers that can not w be written in terms of each other, namely: P | S | A ` e : τ σ = gen(SA, P ⇒ τ) w w 0 0 (let ) Q | T | SAx, x : σ ` e : τ [α 7→ x, β 7→ (|y :: Int|)] and [α 7→ y, β 7→ (|x :: Int|)] Q | TS | A `w let x = e in e0 : τ 0 The lack of unique most general unifiers has profound Figure 4: Type inference algorithm W implications with respect to efficient algorithms for sim- plifying predicates, and we discuss the unification of rows in more detail in the next section. principal type for a given expression e and assumption A. It fails precisely when no type exists for e under the assumption A. 8 Simplification and improvement By restricting row expressions to predicates, we have 7.2 Rows as predicates nicely sidestepped the problem of row unification in our calculus. Furthermore, we can use the standard We made rows part of the predicate language and use inference algorithm for qualified types to infer princi- standard Robinson unification to unify types. This pal types. Unfortunately, while technically correct, the means that we have deviated quite a from the ex- principal types do not take account of satisfiability of tensible records of Gaster and Jones. In their system, predicates. As a result, the inferred types are not al- row operations are part of the type language and the ways as accurate or simple as we may expect. Jones unification algorithm is extended to deal with row uni- describes a refined inference algorithm that takes satis- fication. There are two important technical problems fiability into account [10]. The new algorithm adds two with this approach in the presence of first-class labels. new rules: one for simplification and one for improve- First of all, we would lose completeness of type in- ment of predicates. ference when row extension is structural (i.e. part of the type language). Take for example the following def- 8.1 Simplification inition: Simplification allows us to replace a set of predicates in f :: ∀rsab. (s\x, s\y, r ∼ (|x :: a, y :: b | s|)) a type by another set of equivalent predicates. This can ⇒ {r} → (a, b) be used for example to discharge constant predicates. f r = let select l = r.l in (select @x, select @y) For example, the predicate (|x :: Int | r|)\y, can be sim- If row extension was part of the type language, the type plified to the equivalent predicate r\y, when x 6=c y. of f could not be inferred. When inferring the type of We write P ⇔ Q when P and Q entail each other, i.e. r.l, the record r would unify with {l :: a | s}. When P `` Q and Q `` P . The simplification rule is: generalizing the type of the body of select, namely w Lab l → a, the type variables l and a are free in the type P | S | A ` e : τ P ⇔ Q assignment for r in the assumption A. This would pre- Q | S | A `w e : τ vent generalization of l and a, making select monomor- phic in the label argument, which eventually leads to The addition of this rule makes the algorithm non- rejection of this example. deterministic, but the algorithm is sound as the inferred In our calculus, the row extension is part of the lan- types are equivalent [10]. In practice, we apply this rule guage of predicates. The expression r.l would not unify at generalisation. The rule is also used to detect type r with a row, but add a predicate r ∼ (|l :: a | s|) in- errors: during simplification, we can detect unsatisfiable stead. The type of select can now generalize properly constraints and emit an appropriate error message.

11 The algorithm for simplication in Morrow simplifies {id} ρ 6∈ ftv(r) ρ 6∈ ftv(r) constraints according to the entailment rules in Fig- (||) ∼ (||) {[ρ7→r]} {[ρ7→r]} ure 1. Full simplification of lacks constraints leads to ρ ∼ r r ∼ ρ constant evidence, while partial simplification of lacks Θ constraints introduces a coercion term that possibly in- (l : τ) ∈ s 0 creases the passed evidence. Θ00 = {θ0θ | θ ∈Θ, θr ∼Θ θs − θl, θ0 ∈Θ0} Θ00 8.2 Improvement (|l : τ | r|) ∼ s The improvement rule allows us to apply a substitution Figure 5: Row unifiers to the predicates, as long as the substitution does not change the satisfiable instances of a type. For example, fresh(β) α∈ / ftv(τ) the expression {x = 2}.x has the following principal (insVar) θ = [α 7→ (|l :: τ | β|)] type: {θ} (l :: τ) ∈ α ∀rsa. (s\x, r ∼ (|x :: Int|), r ∼ (|x :: a | s|)) ⇒ a Although correct, in a practical system we would rather Θ0 (l : τ)∼Θ (l0 : τ 0)(l :: τ) ∈ r infer the principal satisfiable type. If we would unify (insHead) Θ ∪ Θ0 the rows, we find an improving substitution, namely (l :: τ) ∈ (|l0 : τ 0 | r|) [a 7→ Int, s 7→ (||), r 7→ (|x :: Int|)]. When we apply this substitution and use simplification, we get the (ex- Figure 6: Inserters pected) principal satisfiable type: Int.

We write bP cP0 for the set of satisfiable instances of a predicate set P with respect to an initial predicate set 8.3 Row unification P0. It is defined as: Improvement in Morrow depends essentially on row uni- fication. As shown before, rows with label variables do bP cP0 = {SP | S ∈ Subst,P0 `` SP } not possess a most general unifier. We define a unifica- For the purposes of this article, the initial predicate tion algorithm between rows that derives a set of most set P is empty, and we will not write the subscript 0 general unifiers. Of course, these unifiers are not ‘most explicitly. We say that a substitution S improves P if general’ in the usual sense, but only most general for a bP c = bSP c, and when the free variables in S that do fixed permutation of row fields. not appear in P are ‘fresh’. The inference algorithm is Figure 5 shows the unification algorithm for rows. extended with the following improvement rule: Θ 0 w The expression r ∼ r finds the set Θ of most general P | S | A ` e : τ T improves P unifiers between two rows r and r0. Single substitutions w TP | TS | TA ` e : T τ are written as θ. Instead of failing when no rule applies, Jones proves that the inference algorithm is still sound we assume that row unification returns the empty set and complete when the simplification and improvement ∅ when no unification can be found. The first three rules are added [10]. However, the completeness of the unification rules are standard. The last rule finds all algorithm with respect to the typing rules is now defined inserters of a field in a row, and returns the cartesian in terms of principal satisfiable types. The inference product of the unifiers of the inserters with the unifiers algorithm will not always find the most general type of of the row tails. an expression, but it will find a most general type that The algorithm for finding inserters is defined in Fig- has satisfiable instances (if it exists). ure 6 and uses the field unification algorithm shown in Note that it is not a requirement to find optimal im- Figure 7. The rule (insVar) is standard. The rule (in- provements, even though it is desirable in practice. This sHead) takes the union between the unifiers of the head provides us with a flexible design space for exploring ef- field and the inserters of the tail. Note that field unifi- ficient algorithms for improving row equality predicates. cation returns either a singleton set or an empty set of We parameterize our inference algorithm with an im- unifiers. proving function impr such that impr(P ) improves P Using the unification of rows, we can now define an for any predicate set P . A trivial instance always re- improving function. Figure 8 defines an improving turns the identity substitution, impr(P ) = id. In the function impr(P ) that returns an improving substitu- ∼ next subsection, we discuss a more sophisticated algo- tion θ for the predicates P . We write P for the predi- \ rithm that is used by Morrow and that finds nearly cates P restricted to equality predicates, and P for the optimal improvements. restriction to lacks predicates. The function unis re-

12 τ ∼θ τ 0 equality predicates. Unfortunately, the algorithm is also (fieldEq) exponential in the number of polymorphic labels! This {θ} (l : τ) ∼ (l : τ 0) should not come as a surprise: Sulzmann shows that even for an weaker system of first-class labels, satis- 0 θ = [α 7→ l] θτ ∼θ θτ 0 fiability is NP-complete [29]. The worst case of our (fieldR) algorithm is formed by a row of n polymorphic labels {θ0θ} (l : τ) ∼ (α : τ 0) with polymorphic types that is unified with a row of constant labels. Each of the n polymorphic labels can 0 θ = [α 7→ l] θτ ∼θ θτ 0 unify with any of the n constant labels, and we find n! (fieldL) total unifiers. {θ0θ} (α : τ) ∼ (l : τ 0) However, we feel that there are some mitigating fac- tors. First of all, even though it is desirable to find Figure 7: Field unification ‘optimal’ improvements, this is not a requirement of the calculus. A practical implementation could resort unis(P ∼) = Θ to a more na¨ıve improvement algorithm if the optimal \ 0 (impr) single({θ | θ ∈ Θ,P0 `` θP }) = θ one takes too long. Secondly, in practice, the algorithm can be made more efficient by doing unambigious in- impr(P ) = θ0 sertions first. Morrow first transforms all the equality predicates to inserters and performs substitutions until (unis1 ) unis(∅) = {id} there are no more unambigious choices, before resorting to a brute-force enumeration of unifiers. Finally, just Θ 0 r∼ r like in normal Hindley-Milner type inference, practical 00 0 0 0 0 (unis2 ) Θ = {θ θ | θ∈Θ, unis(θP ) = Θ , θ ∈Θ } experience shows that the worst case does not seem to unis({r ∼ r0} ∪ P ) = Θ00 appear in normal programs. Even when it occurs, the number of labels tends to be low enough to be able to Figure 8: Improvement solve the improvement fast. However, experience with larger programs is necessary to validate this claim. turns the cartesian product of row equality unifications. In the rule (impr), the set of all row equality unifiers 9 Related work Θ is further restricted to those unifiers that satisfy the lacks constraints. Subtyping is one of the earliest approaches to record The function single reduces the resulting set of uni- type systems [1, 20]. Predicates include a subtype re- lation on rows, such that a row r is a subrow of r0 if r fiers to a single improving substitution. When the 0 unifier set is empty, the predicates are not satisfiable includes all fields of row r . The type of selection is: and an error message is emitted. When the unifier set ∀ra. (r 6 {l :: a}) ⇒ r → a consists of a single unifier, we can unambigiously im- prove the predicate set. A possible definition of single Unfortunately, with this approach information about therefore only succeeds when the choice is unambigious; the other fields of a row is lost, which makes it hard single({θ}) = θ. to describe operations like row extension. Cardelli and If more than one unifier is returned, there may still Mitchell [1] especially introduce an overriding operator be improving substitutions. In Morrow, we calculate on types to overcome this problem. Wand [31, 32] was the greatest common unifier of all the substitutions: the first to introduce row variables to capture row sub- single(Θ) = gcu(Θ). In order to calculate the great- typing explicitly through . In est common unifier efficiently, we need to be careful to his system, the type of selection is: construct only normalized substitutions during unifica- ∀ra. {l :: a | r} → a tion. For example, we take the order on type variables into account to only produce variable substitutions of The calculus does not impose constraints on record ex- the form [α 7→ β] instead of [β 7→ α]; However, a full tension though, and as a result, not all programs have discussion is beyond the scope of this paper. a principal type [31]. Remy [21] extended the work of Wand with constrained record operations. His calculus contains flags that denote the presence or absence of 8.4 Complexity of improvement certain fields. The type of selection becomes: The previous section developed an algorithm to cal- ∀ra. {l :: pre(a) | r} → a culate improving substitutions in the presence of row

13 The system of Remy does not lead directly to an effi- 10 Conclusion cient or simple compilation method though. Ohori [18] was the first to present a polymorphic record system We have described a sound and complete system of that had a simple and efficient compilation scheme, but polymorphic extensible records and variants with first- it can not handle extensible row operations. class labels. We have shown how first-class labels in- Gaster and Jones [5, 4] presented a polymorphic crease expressiveness significantly, and are able to ex- type system for extensible records and variants that was press intersection types, closed-world overloading, and based on the theory of qualified types [8, 9]. This sys- type selective functions. The type inference algorithm tem has a simple and efficient compilation method and is fully implemented in Morrow [12], including discussed is implemented as the TREX extension to Hugs. extensions such as row polymorphic operations. Type rules for record concatenation have been pro- In the future, we expect to try the inferencer on posed too [7, 32], where Sulzmann presents an effective larger programs. In particular, we would like to inves- type inference method [27]. However, no efficient com- tigate the use of higher-rank types to express first-class pilation method for concatenation has been published. modulesand we would like to experiment with first-class patterns to encode views [30]. First-class labels Acknowledgements Gaster describes first-class labels in his thesis [4] but does not introduce predicates beyond lacks predicates. I would like to thank Bastiaan Heeren, Mark Shields, As a result, not all programs can be assigned a (prin- and Frank Atanassow for their valuable comments on cipal) type. Sulzmann [29, 28] introduces first-class la- the draft paper. Furthermore, I thank Andres L¨oh,Erik bels as an instance of HM(X) [17] and his system enjoys Meijer, Martijn Schrage, and Arjan van IJzendoorn for principal types as it uses has constraints (instead of just their remarks and interesting discussions about the de- lacks constraints). Unfortunately, it is not obvious how sign of Morrow. to extend this system with polymorphic row extension, as it would prevent resolution of equality predicates [29]. References This is one of the main reasons to base our work on the more general theory of qualified types [9] instead, [1] L. Cardelli and J. Mitchell. Operations on records. which gives us a flexible design space with respect to Journal of Mathematical Structures in Computer improvement of equality predicates. Another reason is Science, 1(1):3–48, Mar. 1991. that a system based on HM(X) has no general compi- lation method, whereas we can use standard evidence [2] J. Garrigue. Typing deep pattern-matching in pres- translation. ence of polymorphic variants. In Proceedings of the Shields and Meijer introduce an extremely expres- JSSST Workshop on Programming and Program- sive row calculus, called λtir, where types themselves are ming Languages, Mar. 2004. tir used as indices in the rows [25, 24]. The λ calculus [3] J. Garrigue and H. A¨ıt-Kaci. The typed poly- is strictly more general than our system; our calculus tir morphic label selective calculus. In 21th ACM can be encoded in λ using opaque newtypes to ex- Symp. on Principles of Programming Languages press first-class labels. However, to avoid the exponen- (POPL’94), pages 35–47, Portland, OR, Jan. 1994. tial behaviour of our improvement algorithm, λtir uses a restricted polynomial improvement algorithm. Due to [4] B. R. Gaster. Records, Variants, and Qualified the lack of labels, this is more or less essential for λtir as Types. PhD thesis, Dept. of , there are much more ambigious unifications in practical University of Nottingham, July 1998. programs. As a result of restricted improvement, λtir [5] B. R. Gaster and M. P. Jones. A polymorphic type can not always infer that a given type signature entails system for extensible records and variants. Techni- an inferred type. cal Report NOTTCS-TR-96-3, Dept. of Computer Our work is strongly motivated by λtir, where we Science, University of Nottingham, 1996. wanted to develop a simpler calculus based on the the- ory of qualified types. By using explicit labels that are [6] T. Gilliam and T. Jones. Monty Python and the not part of the type language, we explored a useful al- Holy Grail. By G. Chapman and J. Cleese, 1975. ternative point in the design space, where we can ex- press many of the motivating examples of λtir while us- [7] R. Harper and B. C. Pierce. A record calculus ing standard theory. based on symmetric concatenation. In 18th ACM Symp. on Principles of Programming Languages (POPL’91), pages 131–142, Jan. 1991.

14 [8] M. P. Jones. A theory of qualified types. In 4th. Eu- [21] D. R´emy. Type inference for records in a nat- ropean Symposium on Programming (ESOP’92), ural extension of ML. In C. A. Gunter and volume 582 of Lecture Notes in Computer Science, J. C. Mitchell, editors, Theoretical Aspects Of pages 287–306. Springer-Verlag, Feb. 1992. Object-Oriented Programming. Types, Semantics and Language Design. MIT Press, 1993. [9] M. P. Jones. Qualified types in Theory and Prac- tice. Distinguished Dissertations in Computer Sci- [22] J. C. Reynolds. Design of the programming lan- ence. Cambridge University Press, 1994. guage Forsythe. In P. O’Hearn and R. Tennent, editors, ALGOL-like languages, pages 173–233. [10] M. P. Jones. Simplifying and improving qualified Birkh¨auser,1997. types. Technical Report YALEU/DCS/RR-1040, Dept. of Computer Science, Yale University, 1994. [23] J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, [11] M. P. Jones. A system of constructor classes: over- 12(1):23–41, Jan. 1965. loading and implicit higher-order polymorphism. Journal of , 5(1):1–35, [24] M. Shields. Static Types for Dynamic Documents. Jan. 1995. PhD thesis, Oregon Graduate Institute, Feb. 2001. [12] D. Leijen. Morrow: a row-oriented programming [25] M. Shields and E. Meijer. Type indexed rows. language. http://www.cs.uu.nl/∼daan/morrow. In 28th Annual ACM SIGPLAN-SIGACT Sym- html, July 2004. posium on Principles of Programming Languages, pages 261–275, London, England, Jan. 2001. [13] D. Leijen and E. Meijer. Domain specific embed- ded compilers. In 2nd USENIX Conference on Do- [26] P. Shroff and S. F. Smith. Type inference for first- main Specific Languages (DSL’99), pages 109–122, class messages with match-functions. In Founda- Austin, Texas, Oct. 1999. Also appeared in ACM tions of Object-Oriented Languages, Jan. 2004. SIGPLAN Notices 35, 1, (Jan. 2000). [27] M. Sulzmann. Designing record systems. Techni- [14] J. Lewis, M. Shields, E. Meijer, and J. Launchbury. cal Report YALEU/DCS/RR-1128, Dept. of Com- Implicit parameters: dynamic scoping with static puter Science, Yale University, Apr. 1997. types. In 27th ACM Symp. on Principles of Pro- gramming Languages (POPL’00), pages 108–118, [28] M. Sulzmann. Type systems for records revisited. Boston, Massachussets, Jan. 2000. Unpublished report, June 1998. [15] M. M¨ullerand S. Nishimura. Type inference for [29] M. Sulzmann. A General Framework for Hind- first-class messages with feature constraints. In- ley/Milner Type Systems with Constraints. PhD ternational Journal of Foundations of Computer thesis, Dept. of Computer Science, Yale University, Science, 11(1):29– 63, 2000. May 2000. [16] S. Nishimura. Static typing for dynamic messages. [30] P. Wadler. Views: a way for pattern matching In 25th ACM Symp. on Principles of Programming to cohabit with data abstraction. In 14th ACM Languages (POPL’98), pages 266–278, 1998. Symp. on Principles of Programming Languages (POPL’87), pages 307–313. ACM Press, 1987. [17] M. Odersky, M. Sulzmann, and M. Wehr. Type in- ference with constrained types. Theory and Prac- [31] M. Wand. Complete type inference for simple ob- tice of Object Systems, 5(1):35–55, 1999. jects. In Proceedings of the 2nd. IEEE Symposium on Logic in Computer Science, pages 37–44, 1987. [18] A. Ohori. A polymorphic record calculus and its Corrigendum in LICS’88, page 132. compilation. ACM Transactions on Programming Languages and Systems, 17(6):844–895, 1995. [32] M. Wand. Type inference for record concatenation and multiple inheritance. Information and Com- [19] B. C. Pierce. Intersection types and bounded poly- putation, 93:1–15, 1991. morphism. Mathematical Structures in Computer Science, 7(2):129–193, Apr. 1997. [20] B. C. Pierce and D. N. Turner. Simple type theo- retic foundations for object-oriented programming. Journal of Functional Programming, 4(2):207–247, Apr. 1994.

15