First-Class Labels for Extensible Rows Technical Report: UU-CS-2004-051

First-class labels for extensible rows Technical report: UU-CS-2004-051 Daan Leijen Institute of Information and Computing Sciences, Utrecht University P.O.Box 80.089, 3508 TB Utrecht, The Netherlands [email protected] Abstract ants based on the theory of quali¯ed types. Unfortu- nately, as we show later in this article, type inference This paper describes a type system for extensible for their proposed extension is incomplete. Sulzmann records and variants with ¯rst-class labels; labels are acknowledges this problem and describes a record cal- polymorphic and can be passed as arguments. This in- culus with ¯rst-class labels as an instance of HM(X) creases the expressiveness of conventional record calculi [27, 28, 29], but this calculus is not extensible and lacks signi¯cantly, and we show how we can encode intersec- a general compilation method. Shields and Meijer intro- tion types, closed-world overloading, type case, label duce an intriguing label-less calculus, ¸tir, where records selective calculi, and ¯rst-class messages. We formally with ¯rst-class labels can be encoded with opaque types motivate the need for row equality predicates to express [25, 24]. However, due to general type equality con- type constraints in the presence of polymorphic labels. straints, ¸tir is a very complicated system. We wanted This naturally leads to an orthogonal treatment of unre- to explore an alternative point in the design space with stricted row polymorphism that can be used to express a simpler calculus where labels are explicit; indeed, we ¯rst-class patterns. can express many of the motivating examples of ¸tir Based on the theory of quali¯ed types, we present an without having to work from ¯rst principles. e®ective type inference algorithm and e±cient compila- We base our calculus directly on the original record tion method. The type inference algorithm, including system of Gaster and Jones, with two important tech- the discussed extensions, is fully implemented in the nical di®erences: we introduce row equality predicates experimental Morrow compiler. to express required type constraints in the presence of polymorphic labels, and we move the language of 1 Introduction rows into the predicate language to ensure completeness of type inference. Furthermore, our system is the Records and variants provide a convenient way to con- ¯rst to naturally describe row polymorphic operations, struct data types. Furthermore, record calculi can be and it completely avoids the complex uni¯cation prob- used as a foundation for features as objects and mod- lems, and resulting restriction to non-empty rows, as ule systems. Over the years, many aspects of record described by Gaster [4]. Being able to use unrestricted calculi have been studied, including polymorphic exten- row polymorphism, we can express ¯rst-class extensible sion, record concatenation, type directed compilation, patterns and views [30]. and even label-less calculi. We have fully implemented our type inference algo- The holy grail [6] of polymorphic extensible record rithm, including discussed extensions like row polymor- calculi are ¯rst-class labels, where labels are polymorphic operations, in the experimental Morrow compiler phic and can be passed as arguments. First-class labels [12]. Indeed, Morrow can infer all the types of the ex- increase the expressiveness of conventional record cal- amples in this paper. However, at the time of writing, culi signi¯cantly, and we will show how we can encode the code generator for Morrow is not yet complete. many interesting programming idioms, including inter- In our calculus, row terms are no longer part of the section types, closed-world overloading, type case, label type language but are only present in the predicate lan- selective calculi, and ¯rst-class messages. guage, to ensure completeness of type inference with One of the earliest works on ¯rst-class labels was polymorphic labels. By not folding labels back into done by Gaster and Jones [5, 4]. They describe a poly- the language of types, we also reduce the complexity morphic type system for extensible records and vari- of improvement with respect to ¸tir. Unfortunately, the improvement algorithm, and thus the test for satis¯a- (jl1 :: ¿1; :::; ln :: ¿nj) ´ (jl1 :: ¿1 j:::j (jln :: ¿n j(jj)j):::j) bility, ambiguity, and entailment, is still exponential. (jl1 :: ¿1; :::; ln :: ¿n jrj) ´ (jl1 :: ¿1 j:::j (jln :: ¿n j r j):::j) However, just like normal type inference, we believe The ¯elds of a row are distinguished by their label and that the worst case is unlikely to occur in practical pro- not by their position, and we consider rows equal up to grams. This intuition is largely con¯rmed by practical permutation of their ¯elds: experience with the Morrow compiler. In Section 2 we give an overview of conventional ex- (jl :: a; m :: b j rj) = (jm :: b; l :: a j rj) tensible records and variants `ala Gaster and Jones [5]. In Section 3 we extend this calculus with ¯rst-class polymorphic labels, and we discuss many interesting exam- 2.2 Lacks contraints ples in the following section. Section 5 discusses ¯rst- For the purposes of this paper, we restrict ourselves to class patterns. Section 6 and 7 formally de¯ne our cal- rows without duplicate labels. Without this constraint, culus, and give the typing rules and an inference algo- ¯elds can not be addressed unambiguously and, as a rithm. Section 8 describes simpli¯cation and improve- result, some programs can not be assigned a principal ment, and discusses complexity issues. We ¯nish with type [31]. A particularly elegant approach to enforce a discussion of related work and the conclusion. uniqueness of labels are lacks (or insertion) constraints [5, 25]. A predicate (rnl) restricts r to rows that do not 2 Records and variants contain a label l. In general, a row extension (jl :: a j rj) is only valid when the predicate (rnl) holds. For clarity, There are two dual concepts to describe the structure of we always write the lacks constraints explicitly in this data types: products group data items together, while paper, but a practical system can normally infer them sums describe a choice between alternatives. Here are from row expressions automatically, which simpli¯es the two examples of both operations: type signatures a lot. type Point = Int £ Int type Event = Char + Point 2.3 Record operations In most programming languages, we can name the com- A record interprets a row as a product of types. The ponents of a product and sum with a label. A labeled empty record is the only ground value with an empty product is called a record (or struct), while a labeled record type: sum is called a variant (or union, or data type). We de- fg :: fg scribe records with curly braces fg, and variants with angled brackets hi: Furthermore, there are three basic operations that can be performed on records, namely selection, restriction, type Point = fx :: Int; y :: Intg and extension: type Event = hkey :: Char; mouse :: Pointi ( :l) :: 8ra: (rnl) ) fl :: a j r g ! a We can see that both records and variants are described ( ¡ l) :: 8ra: (rnl) ) fl :: a j r g ! fr g by a sequence of labeled types, which we call a row. In fl = j g :: 8ra: (rnl) ) a ! fr g ! fl :: a j r g this article, we enclose rows in banana brackets (jj). For convenience, we leave these out when the row is directly Note that we assume a dist¯x notation where argument enclosed by a record or variant brackets. For example, positions are written as \ ". Furthermore, we explicitly the unabbreviated type of a Point is f(jx ::Int; y ::Intj)g. quantify all types in this paper, but practical systems can normally use implicit quanti¯cation. The three ba- 2.1 Extensible rows sic record operations are not arbitrary but based on the corresponding primitive operations on products in cate- Following Gaster and Jones [5], we consider an exten- gory theory or logic. Note that all type schemes contain sible row calculus where a row is either empty or an a predicate (rnl) to ensure the validity of row extension. extension of a row. The empty row is written as (jj) and For repeated record extension on terms, we apply the an extension of a row r with a label l and type ¿ is same abbreviations as for row extension on types. The written as (jl :: ¿ j rj). Here is an example of a row that basic operations can be used to implement a number of can be used to describe coordinates: other common operations like update and rename: (jx :: Int j (jy :: Int j (jj)j)j) fl := x j rg ´ fl = x j r ¡ lg To reduce the number of brackets, we use the following fl Ã m j rg ´ fm = r:l j r ¡ lg abbreviations: 2 2.4 Variant operations Note that we write r[i] to select the ith ¯eld of a mem- ory block. The o®set 1 is the evidence for the predicate A variant interprets a row as a sum of types. Dual to (jage :: Int; name :: Stringj)nmale, that resulted from se- records, the basic operations are based on the corre- lecting the male ¯eld. It is resolved to 1 as male is sponding operations on sums in category theory. The larger than age but smaller than name. Gaster and primitives consist of the empty variant, injection, em- Jones describe the formal translation from lacks predi- bedding, and decomposition: cates to evidence [5] and we will not repeat that here. hi :: hi Note that common transformations like inlining can be hl = i :: 8ra: (rnl) ) a ! hl :: a j ri used to optimize this expression.

Load more