A Polymorphic Type System for Extensible Records and Variants
Benedict R. Gaster and Mark P. Jones
Technical report NOTTCS-TR-96-3, November 1996
Department of Computer Science, University of Nottingham,
University Park, Nottingham NG7 2RD, England.
fbrg,[email protected]
Abstract

Records and variants provide flexible ways to construct datatypes, but the restrictions imposed by practical type systems can prevent them from being used in flexible ways. These limitations are often the result of concerns about efficiency or type inference, or of the difficulty in providing accurate types for key operations.

This paper describes a new type system that remedies these problems: it supports extensible records and variants, with a full complement of polymorphic operations on each; and it offers an effective type inference algorithm and a simple compilation method. It is a practical system that can be understood and implemented as a natural extension of languages like Standard ML and Haskell. In an area that has already received a great deal of attention from the research community, the type system described here is the first to combine all of these features in a practical framework.

One important aspect of this work is the emphasis that it places on the use of rows in the construction of record and variant types. As a result, with only small extensions of the core type system, we are able to introduce new, powerful operations on records using features such as row polymorphism and first-class labels.

1 Introduction

Products and sums play fundamental roles in the construction of datatypes: products describe grouping of data items, while sums express choices between alternatives. For example, we might represent a date as a product, with three integer components specifying the day, month, and year:

$$\mathit{Date} = \mathit{Int} \times \mathit{Int} \times \mathit{Int}.$$

For a simple example of a sum, consider the type of input events to a window system, with one alternative indicating that a character has been entered on the keyboard, and another indicating a mouse click at a particular point on the screen:

$$\mathit{Event} = \mathit{Char} + \mathit{Point}.$$

These definitions are adequate, but they are not particularly easy to work with in practice. For example, it is easy to confuse datatype components when they are accessed by their position within a product or sum, and programs written in this way can be hard to maintain.

To avoid these problems, many programming languages allow the components of products, and the alternatives of sums, to be identified using names drawn from some given set of labels. Labelled products are more commonly known as records or structs, while labelled sums are perhaps better known as variants or unions. For example, the Date and Event datatypes described above might be defined more attractively as:

$$\mathit{Date} = \mathit{Rec}\ \{| \mathit{day} : \mathit{Int},\ \mathit{month} : \mathit{Int},\ \mathit{year} : \mathit{Int} |\}$$
$$\mathit{Event} = \mathit{Var}\ \{| \mathit{key} : \mathit{Char},\ \mathit{mouse} : \mathit{Point} |\}.$$

This notation captures a common feature in the construction of record and variant types, using rows of the form $\{| l_1 : \tau_1, \ldots, l_n : \tau_n |\}$ to describe mappings that associate a type $\tau_i$ with each of the (distinct) labels $l_i$. Record types are obtained by preceding rows with the symbol $\mathit{Rec}$. Variant types are constructed using $\mathit{Var}$. For example, if $r = \{| l_1 : \tau_1, \ldots, l_n : \tau_n |\}$ and $e_1, \ldots, e_n$ have types $\tau_1, \ldots, \tau_n$, respectively, then we can form a record $(l_1 = e_1, \ldots, l_n = e_n)$ of type $\mathit{Rec}\ r$, or distinct variants, $\langle l_1 = e_1 \rangle, \ldots, \langle l_n = e_n \rangle$, each of type $\mathit{Var}\ r$. Thus, $(\mathit{day} = 25, \mathit{month} = 12, \mathit{year} = 1996)$ and $\langle \mathit{key} = \text{'a'} \rangle$ represent values of type $\mathit{Date}$ and $\mathit{Event}$, respectively.

1.1 Polymorphism and extensibility

Unfortunately, practical languages are often less flexible in the operations that they provide to manipulate records and variants. For example, many languages, from C to Standard ML (SML), will only allow the programmer to select the $l$ component, $r.l$, from a record $r$
if the type of $r$ is uniquely determined at compile-time.¹ These languages do not support polymorphic operations on records, such as a general selector function $(\_.l)$ that will extract a value from any record that has an $l$ field. A further weakness in many of these languages is that they provide no real support for extensibility; there are no general operators for adding a field, removing a field, renaming a field, or replacing a field (possibly with a value of a different type) in a record value.

¹ In implementations using boxed representations for values, only the set of labels in $r$ is needed; the actual component types are not required.

There have been many previous attempts to design type systems for records and variants that support polymorphism and extensibility. Such work is important, not just in its own right, but also in its application to the study of object-oriented or database programming languages where these facilities seem particularly useful. We will summarize the key features of some of these earlier systems here before describing our own proposal.

Subtyping: Subtyping is one of the most widely used techniques for building type systems for records and variants [3, 5, 4, 21]. We can define a subtyping relation by specifying that a row $r_1$ is a subrow of $r_2$, written $r_1 \leq r_2$, if $r_1$ contains all the fields of $r_2$, and possibly more. The intuition here is that, for example, a record of type $\mathit{Rec}\ r_1$ could be used in any context where a value of type $\mathit{Rec}\ r_2$ is expected, and conversely, that a variant of type $\mathit{Var}\ r_2$ can be substituted in any context where a value of type $\mathit{Var}\ r_1$ is required. In particular, the selection operator $(\_.l)$ can be treated as a function of type:

$$\forall\alpha.\forall r \leq \{| l : \alpha |\}.\ \mathit{Rec}\ r \to \alpha.$$

This operation is implemented, at least conceptually, by coercing from $\mathit{Rec}\ r$ to a known type (the singleton $\mathit{Rec}\ \{| l : \alpha |\}$) and then extracting the required field. One weakness of this approach is that information about the other fields in the record is lost, so it is harder to describe operations like record extension. For example, observing that bounded quantification is not by itself sufficient, Cardelli and Mitchell [5] used an overriding operator on types to overcome this problem.

Row extension: Motivated by studies of object-oriented programming, Wand [26] introduced the concept of row variables to allow incremental construction of record and variant types. For example, a record of type $\mathit{Rec}\ \{| l : \tau \mid r |\}$ has all of the fields of a record of type $\mathit{Rec}\ r$, together with a field $l$ of type $\tau$. Wand did not discuss compilation, but his approach supports both polymorphism and extensibility. For example, the selection operator $(\_.l)$ has type:

$$\forall\alpha.\forall r.\ \mathit{Rec}\ \{| l : \alpha \mid r |\} \to \alpha.$$

However, the operations and types in Wand's system are unchecked; for example, extending a row with an $l$ field may either add a completely new field, or replace an existing field labelled with $l$. As a result, some programs do not have principal types [27].

Flags: Rémy has developed a flexible treatment for extensible records and variants in a natural extension of ML [23]. A key feature of his system is the use of flags to encode both positive and negative information, that is, to indicate which fields must be present, and which must be absent. Again, a concept of row variables is used to deal with other fields whose presence or absence is not significant in a particular situation. For example, the selection operator has type:

$$\forall\alpha.\forall r.\ \mathit{Rec}\ \{| l : \mathit{pre}(\alpha) \mid r |\} \to \alpha,$$

where $\mathit{pre}(\alpha)$ is a flag indicating the presence of a field of type $\alpha$, and $r$ is a row variable representing the rest of the record. Intuitively, records in Rémy's system are represented by tuples with a slot for each possible label. The type system prevents access to undefined components, but does not lead directly to a simple and efficient implementation.

Predicates: Harper and Pierce [8, 7] studied type systems for extensible records using predicates on types to capture information about presence or absence of fields, and to restrict attention to checked operations. For example, writing $r_1 \# r_2$ for the assertion that the rows $r_1$ and $r_2$ have disjoint sets of labels, the selection operator $(\_.l)$ has type:

$$\forall\alpha.\forall r.\ (r \# \{| l : \alpha |\}) \Rightarrow \mathit{Rec}\ (r \parallel \{| l : \alpha |\}) \to \alpha,$$

where $r_1 \parallel r_2$ is the row obtained by merging $r_1$ and $r_2$, and is only defined if $r_1 \# r_2$. Harper and Pierce's work does not deal with variants, type inference, or compilation, and does not provide an operational interpretation of predicates. However, their approach to record typing was one of the motivating examples in Jones' work on qualified types [11] where a general framework for type inference and compilation was developed, including a type system for records as a special case. One of the achievements of the present paper is to refine and extend that work to a practical system, avoiding problems such as the lack of most general unifiers in Jones' full system (the result of including record restriction in the type language).
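The contrast between Wand's unchecked extension and the checked merge of Harper and Pierce can be illustrated at the value level. The following is a minimal Python sketch and is not taken from the paper: it assumes that a row is modelled as a dictionary from labels to type names, and the helper names `extend_unchecked`, `disjoint`, and `merge` are hypothetical.

```python
# Rows are modelled as dicts from labels to type names.
# This representation is an assumption made purely for illustration.

def extend_unchecked(row, label, ty):
    """Unchecked extension: adds a new field or silently replaces an existing one."""
    new_row = dict(row)
    new_row[label] = ty
    return new_row

def disjoint(r1, r2):
    """Model of the assertion r1 # r2: the rows have disjoint sets of labels."""
    return not (set(r1) & set(r2))

def merge(r1, r2):
    """Model of the merge of r1 and r2, defined only when r1 # r2 holds."""
    if not disjoint(r1, r2):
        raise ValueError("merge undefined: rows share a label")
    return {**r1, **r2}

point = {"x": "Int", "y": "Int"}
replaced = extend_unchecked(point, "x", "Bool")  # x already present: replaced
added = extend_unchecked(point, "z", "Int")      # z absent: a new field
merged = merge(point, {"z": "Int"})              # allowed: labels are disjoint
```

In this model the unchecked operation cannot tell an addition from a replacement, whereas the checked merge rejects the overlapping case outright. The type systems discussed above draw the same distinction statically rather than with a run-time test.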
Kinds: Ohori [18] described a type system that extends SML with polymorphic operations on both records and variants. Significantly, Ohori also presented a simple and effective compilation method: input programs are translated into a target language that adds extra parameters to specify field offsets. In fact, the end result is much the same as that suggested by Jones' work on qualified types, even though the two approaches were developed independently. But Ohori's work differs substantially from other systems in its use of a kind system; this allows variables ranging over record types to be annotated with a specification of the fields that they are expected to contain. For example, the selection operator $(\_.l)$ has type:

$$\forall\alpha.\forall r^{\{| l : \alpha |\}}.\ \mathit{Rec}\ r \to \alpha.$$

The main limitation of Ohori's type system is its lack of support for extensibility.

1.2 This paper

The type system described in this paper combines many of the ideas that have been used in previous work into a practical type system for implicitly typed languages like SML [16] and Haskell [20]. In particular, it supports polymorphism and extensibility, records and variants, type inference, and compilation. The type system is an application of qualified types, extended to deal with a general concept of rows. Positive information about the fields in a given row is captured in the type language using row extension, while negative information is reflected by the use of predicates.

The most obvious benefit of this approach is that we can adapt results and properties from the general framework of qualified types (such as the type inference algorithm and the compilation method) without having to go back to first principles. The result is a considerable simplification of both the overall presentation and of specific proofs. Another important advantage of this approach is that it guarantees compatibility with other applications of qualified types. For example, our type system can be used (and indeed, has already been used in our prototype implementation) in conjunction with the type class mechanisms of Haskell.

The remaining sections of this paper are as follows. Section 2 provides a general overview of our new type system, with a more detailed formal presentation in Section 3. This is followed by discussions of type inference in Section 4 and compilation in Section 5. With some small extensions to the core system, Section 6 shows how our framework can be used to support some new, powerful operations on records and variants using row polymorphism and labels as first-class values. Finally, Section 7 concludes with pointers to further work.

2 Overview

Both record and variant types are defined in terms of rows, and these are constructed by extension, starting from the empty row, $\{||\}$. It is convenient to introduce abbreviations for rows obtained in this way:

$$\{| l_1 : \tau_1, \ldots, l_n : \tau_n \mid r |\} = \{| l_1 : \tau_1 \mid \ldots \{| l_n : \tau_n \mid r |\} \ldots |\}$$
$$\{| l_1 : \tau_1, \ldots, l_n : \tau_n |\} = \{| l_1 : \tau_1, \ldots, l_n : \tau_n \mid \{||\} |\}.$$

Note, however, that we treat rows, and hence record or variant types, as equal if they include the same fields, regardless of the order in which those fields are listed.

2.1 Basic operations

Intuitively, a record of type $\mathit{Rec}\ \{| l : \tau \mid r |\}$ is like a pair whose first component is a value of type $\tau$, and whose second component is a record of type $\mathit{Rec}\ r$. This motivates our choice of basic operations on records, which correspond directly to the two projections and the pairing constructor for products in category theory or logic. There is, however, one complication; we do not allow repeated uses of any label within a particular row, so the expression $\{| l : \tau \mid r |\}$ is only valid if $l$ does not appear in $r$. This is reflected by prefixing each of the types below with a predicate $(r \backslash l)$, pronounced "$r$ lacks $l$", that restricts instantiation of $r$ to rows without an $l$ field:²

• Selection: to extract the value of a field $l$:

$$(\_.l) :: (r \backslash l) \Rightarrow \mathit{Rec}\ \{| l : \alpha \mid r |\} \to \alpha.$$

• Restriction: to remove a field labelled $l$:

$$(\_ - l) :: (r \backslash l) \Rightarrow \mathit{Rec}\ \{| l : \alpha \mid r |\} \to \mathit{Rec}\ r.$$

• Extension: to add a field $l$ to an existing record:

$$(l = \_ \mid \_) :: (r \backslash l) \Rightarrow \alpha \to \mathit{Rec}\ r \to \mathit{Rec}\ \{| l : \alpha \mid r |\}.$$

² To simplify the notation, we assume implicit universal quantification over free type variables. For example, the full type of the selection operator is $\forall r.\forall\alpha.\ (r \backslash l) \Rightarrow \mathit{Rec}\ \{| l : \alpha \mid r |\} \to \alpha$.

We can use these basic operations to implement a number of additional operators, including:

• Update/replace: to update the value in a particular field, possibly with a value of a different type:

$$(l := \_ \mid \_) :: (r \backslash l) \Rightarrow \alpha \to \mathit{Rec}\ \{| l : \beta \mid r |\} \to \mathit{Rec}\ \{| l : \alpha \mid r |\}$$
$$(l := x \mid r) = (l = x \mid r - l)$$
• Renaming: to change the label attached to a particular field:

$$\_[l \leftarrow m] :: (r \backslash l,\ r \backslash m) \Rightarrow \mathit{Rec}\ \{| l : \alpha \mid r |\} \to \mathit{Rec}\ \{| m : \alpha \mid r |\}$$
$$r[l \leftarrow m] = (m = r.l \mid r - l)$$

The empty record, $()$, plays an important role as the only proper value of type $\mathit{Rec}\ \{||\}$. Again, it is convenient to introduce abbreviations for the construction of record values by repeated extension:

$$(l_1 = e_1, \ldots, l_n = e_n \mid r) = (l_1 = e_1 \mid \ldots (l_n = e_n \mid r) \ldots)$$
$$(l_1 = e_1, \ldots, l_n = e_n) = (l_1 = e_1, \ldots, l_n = e_n \mid ())$$

We can specify the basic operations on variants in a similar way. Again, they correspond closely to the standard operations on sums in category theory or logic:

• Injection: to tag a value with the label $l$:

$$\langle l = \_ \rangle :: (r \backslash l) \Rightarrow \alpha \to \mathit{Var}\ \{| l : \alpha \mid r |\}.$$

• Embedding: to embed a value in a variant type that also allows for a new label, $l$:

$$\langle l \mid \_ \rangle :: (r \backslash l) \Rightarrow \mathit{Var}\ r \to \mathit{Var}\ \{| l : \alpha \mid r |\}.$$

• Decomposition: to act on the value in a variant, according to whether or not it is labelled with $l$:

$$(l \in \_\ ?\ \_ : \_) :: (r \backslash l) \Rightarrow \mathit{Var}\ \{| l : \alpha \mid r |\} \to (\alpha \to \beta) \to (\mathit{Var}\ r \to \beta) \to \beta.$$

The empty variant, $\langle\rangle$, is the only value of type $\mathit{Var}\ \{||\}$.

More sophisticated language constructs, for example, pattern matching facilities, or record update, are easily described using the operations listed here. In addition, we expect that practical implementations will use, but not display, predicates implied by the context in which they appear. For example, all of the types above include a row $\{| l : \alpha \mid r |\}$ that is only valid if $r \backslash l$; so displaying this predicate is, in some sense, redundant. However, as we will see in the next section, this predicate plays a central role in the implementation of the basic operations.

2.2 Implementation details

Our next task is to explain how the data structures and operations described above can be implemented. We will focus on the treatment of records and, in particular, the implementation of selection, $(\_.l)$, which is probably the most frequently used basic operation. A naive approach would be to represent a record by an association list, pairing labels with values. This would allow simple implementations for each of the basic operations, with the type system providing a guarantee that the search for any given labelled field would not fail. A major disadvantage is that it does not allow constant time access to record components.

To avoid these problems, we will assume instead that a record value is represented by a contiguous block of memory that contains a value for each individual field. To select a particular component $r.l$ from a record $r$, we need to know the offset of the $l$ field in the block of memory representing $r$. Languages without polymorphic selection will usually only allow an expression of the form $r.l$ if the offset value, and hence the structure or even the full type of $r$, is known at compile-time.

However, it is not actually necessary to know the position of every field at compile-time; instead, we can treat unknown offsets as implicit parameters whose values will be supplied at run-time when the full types of the records concerned are known. This is essentially the compilation method that was used by Ohori [18], and also suggested, independently, by Jones [11]. If we forget about typing issues for a moment and assume that records are implemented as arrays or tuples, then the $(\_.l)$ operator can be implemented by a function $\lambda i.\lambda r.r[i]$, using the extra parameter $i$ to supply the offset of $l$ in $r$. For example, the expression $(\mathit{day} = 25, \mathit{month} = 12, \mathit{year} = 1996).\mathit{day}$ can be implemented by compiling it to:

$$(\lambda i.\lambda r.r[i])\ 0\ (25, 12, 1996)$$

which evaluates to 25, as expected. Of course, there are run-time overheads in calculating and passing offset values as extra parameters. However, an attractive feature of our system is that these costs are only incurred when the extra flexibility of polymorphic selection is required.

Each predicate $(r \backslash l)$ in the type of a function signals the need for an extra run-time parameter to specify the offset at which a field labelled $l$ would be inserted into a record of type $\mathit{Rec}\ r$. Obviously, the same offset can also be used to locate or remove the $l$ field from a record of type $\mathit{Rec}\ \{| l : \alpha \mid r |\}$, or treated as ordinal numbers to access and tag values in a variant. So, this one extra piece of information is all that we need to implement the basic operations.

Operations like record extension and restriction will, in general, be implemented by copying. Optimizations can be used to combine multiple extensions or restrictions of records, avoiding unnecessary allocation and initialization of intermediate values. For example, a compiler can generate code that will allocate and initialize the storage for a record $(x = 1, y = 2, z = 3)$ in
a single step, rather than a sequence of three individual allocations and extensions as a naive interpretation might suggest.

The typechecker gathers and simplifies the predicates generated by each use of an operator on records or variants. For example, if $\mathit{today}$ is a value of type $\mathit{Date}$, then an expression like $\mathit{today}.\mathit{month}$ will generate a single constraint, $\{| \mathit{day} : \mathit{Int},\ \mathit{year} : \mathit{Int} |\} \backslash \mathit{month}$. Predicates like this, involving rows whose structure is known at compile-time, are easily discharged by calculating the appropriate offset value. Obviously, a compiler can use this information to produce efficient code by inlining and specializing the selector function, $(\_.\mathit{month})$.

Predicates that are not discharged within a section of code will, instead, be reflected in the type assigned to it. For example, there is nothing in the following definition to indicate the full type of $d$:

$$\mathit{newYear}\ d = d.\mathit{day} = 1 \land d.\mathit{month} = 1,$$

so the inferred type will be:

$$(r \backslash \mathit{day},\ r \backslash \mathit{month}) \Rightarrow \mathit{Rec}\ \{| \mathit{day} : \mathit{Int},\ \mathit{month} : \mathit{Int} \mid r |\} \to \mathit{Bool}.$$

We would not expect this definition to have been accepted at all by a compiler for SML which requires the set of labels in a record to be uniquely determined by 'program context'. But the meaning of this phrase is defined only loosely by an informal note in the definition of SML [16]. Now, with the ideas used in this paper, there is a way to make this precise: a definition is only acceptable in SML if the inferred type does not contain any predicates. For programs written with these restrictions, a language based on our type system should offer the same levels of performance as SML.

It is possible that our more general treatment of record operations could result in compiled programs that are littered with unwanted offset parameters; experience with our prototype implementations will help to substantiate or dismiss these concerns. In any case, there are simple steps that can be taken to avoid such problems. For example, a compiler might reject any definition with an inferred type containing predicates, unless an explicit type signature has been given to signal the programmer's acceptance. This is closely related to the monomorphism restriction in Haskell [20] and to proposals for a value restriction in SML [29, 13].

3 Formal presentation

This section provides a formal presentation of our type system, based on two particular ingredients:

• The theory of qualified types [11], which provides a general framework for describing restricted polymorphism and overloading. In the current application, we use constraints to capture assumptions about the occurrences of labels within rows.

• A higher-order version of the Hindley-Milner type system [9, 15, 6], originally introduced in the study of constructor classes [12]. Amongst other things, this provides a simple way to introduce the new constructs for rows, records, and variants without the need for special, ad-hoc syntax.

We split the presentation into sections: kinds (Section 3.1), types and constructors (Section 3.2), predicates (Section 3.3), and typing rules (Section 3.4).

3.1 Kinds

One of the most important aspects of the work described here is the use of a kind system to distinguish between different kinds of type constructor. Formally, the set of kinds is specified by the following grammar:

$$\begin{array}{rcll} \kappa & ::= & * & \text{the kind of all types} \\ & \mid & \mathit{row} & \text{the kind of all rows} \\ & \mid & \kappa_1 \to \kappa_2 & \text{function kinds} \end{array}$$

Intuitively, the kind $\kappa_1 \to \kappa_2$ represents constructors that take something of kind $\kappa_1$ and return something of kind $\kappa_2$. The $\mathit{row}$ kind is new to the system presented here and was not part of the type system used in the development of constructor classes.

3.2 Types and constructors

For each kind $\kappa$, we have a collection of constructors $C^\kappa$ (including variables $\alpha^\kappa$) of kind $\kappa$:

$$\begin{array}{rcll} C^\kappa & ::= & \chi^\kappa & \text{constants} \\ & \mid & \alpha^\kappa & \text{variables} \\ & \mid & C^{\kappa' \to \kappa}\ C^{\kappa'} & \text{applications} \\ \tau & ::= & C^{*} & \text{types} \end{array}$$

The usual collection of types, represented here by the symbol $\tau$, is just the constructors of kind $*$. For the purposes of this paper, we assume that the set of constant constructors includes at least the following, writing $\chi :: \kappa$ to indicate the kind $\kappa$ associated with each constant $\chi$:

$$\begin{array}{rcll} (\to) & :: & * \to * \to * & \text{function space} \\ \{||\} & :: & \mathit{row} & \text{empty row} \\ \{| l : \_ \mid \_ |\} & :: & * \to \mathit{row} \to \mathit{row} & \text{extension, for each } l \\ \mathit{Rec} & :: & \mathit{row} \to * & \text{record construction} \\ \mathit{Var} & :: & \mathit{row} \to * & \text{variant construction} \end{array}$$

For example:

• The result of applying the function space constructor $\to$ to two types $\tau$ and $\tau'$ is the type of functions
from $\tau$ to $\tau'$, and is written as $\tau \to \tau'$ in more conventional notation.

• The result of applying the $\mathit{Rec}$ constant to the empty row $\{||\}$ of kind $\mathit{row}$ is the type $\mathit{Rec}\ \{||\}$ of kind $*$.

• The result of applying an extension constructor $\{| l : \_ \mid \_ |\}$ to a type $\tau$ and a row $r$ is a row, usually written as $\{| l : \tau \mid r |\}$, obtained by extending $r$ with a field labelled $l$ of type $\tau$. Note that we include an extension constructor for each different label $l$. To avoid problems later, we will also need to prohibit partial application of extension constructors.

The kind system is used to ensure that type expressions are well-formed. While it is sometimes convenient to annotate individual constructors with their kinds, there is no need in practice for a programmer to supply these annotations. Instead, they can be calculated automatically using a simple kind inference process [12].

We consider two rows to be equivalent if they include the same fields, regardless of the order in which they are listed. This is described formally by the equation:

$$\{| l : \tau,\ l' : \tau' \mid r |\} = \{| l' : \tau',\ l : \tau \mid r |\},$$

and extends in the obvious way to an equality on arbitrary constructors.

For the purposes of later sections, we define a membership relation, $(l : \tau) \in r$, to describe when a particular field $(l : \tau)$ appears in a row $r$:

$$(l : \tau) \in \{| l : \tau \mid r |\} \qquad\qquad \frac{(l : \tau) \in r}{(l : \tau) \in \{| l' : \tau' \mid r |\}}\ (l \neq l')$$

and a restriction operation, $r - l$, that returns the row obtained from $r$ by deleting the field labelled $l$:

$$\{| l : \tau \mid r |\} - l = r$$
$$\{| l' : \tau \mid r |\} - l = \{| l' : \tau \mid r - l |\}.$$

It is easy to prove that these operations are well-defined with respect to the equality on constructors, and to confirm intuitions about their interpretation by showing that, if $(l : \tau) \in r$, then $r = \{| l : \tau \mid r - l |\}$.

3.3 Predicates

The syntax for rows allows examples like $\{| l : \tau,\ l : \tau' |\}$ where a single label appears in more than one field. While this might be useful in some applications, it is not appropriate for work with records or variants where we allow at most one field with any given label. Clearly, some additional mechanisms are needed to enable us to specify that a type of the form $\mathit{Rec}\ \{| l : \tau \mid r |\}$, for example, is only valid if the row $r$ does not contain a field labelled with $l$.

One way to achieve this is to use a more sophisticated kind system, with sets of labels, $L$, as kinds instead of the single $\mathit{row}$ kind. For example, rows with field labels $l_1, \ldots, l_n$ can be represented by the kind $L = \{l_1, \ldots, l_n\}$; this is essentially the approach adopted by Ohori [18]. Unfortunately, this becomes much more complicated if we try to extend it to deal with extensible rows. In particular, we would need to assign whole families of kinds, indexed by label sets, $L$, to some of the constructor constants introduced in the previous section:

$$\begin{array}{rcll} \{| l : \_ \mid \_ |\} & :: & * \to L \to (L \cup \{l\}) & l \notin L \\ \mathit{Rec},\ \mathit{Var} & :: & L \to * & \end{array}$$

The alternative that we adopt in this paper is based on the theory of qualified types [11], using predicates to capture any side conditions that are required to ensure that a given type expression is valid. In fact, only a single form of predicate is needed for this purpose:

$$\pi ::= C^{\mathit{row}} \backslash l \qquad \text{predicates}$$

Intuitively, the predicate $r \backslash l$ can be read as an assertion that the row $r$ does not contain an $l$ field. More precisely, we explain the meaning of predicates using the entailment relation defined in Figure 1. A derivation of $P \Vdash \pi$ from these rules can be understood as a proof that, if all of the predicates in the set $P$ hold, then so does $\pi$. It is easy to prove that the relation $\Vdash$ is well-defined with respect to equality of constructors.

$$P \cup \{\pi\} \Vdash \pi \qquad\qquad P \Vdash \{||\} \backslash l \qquad\qquad \frac{P \Vdash r \backslash l \qquad l \neq l'}{P \Vdash \{| l' : \tau \mid r |\} \backslash l}$$

Figure 1: Predicate entailment for rows.

3.4 Typing rules

Following Damas and Milner [6], we distinguish between the simple types, $\tau$, described above, and type schemes, $\sigma$, described by the grammar below:

$$\begin{array}{rcll} \sigma & ::= & \rho \mid \forall\alpha.\sigma & \text{type schemes} \\ \rho & ::= & \tau \mid \pi \Rightarrow \rho & \text{qualified types} \end{array}$$

Restrictions on the instantiation of universal quantifiers, and hence on polymorphism, are described by encoding the required constraints as a set of predicates, $P$, in a qualified type of the form $P \Rightarrow \tau$. The set of free type variables in an object $X$ is written as $TV(X)$.
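The membership relation, the restriction operation, and the entailment rules of Figure 1 can be read as small recursive procedures over the structure of a row. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: a row is modelled as a pair `(fields, tail)`, where `fields` is a tuple of `(label, type)` pairs and `tail` is `None` for the empty row or a string naming a row variable; the function names `member`, `restrict`, and `entails_lacks` are hypothetical.

```python
# A row is (fields, tail): fields is a tuple of (label, type) pairs, and
# tail is None for the empty row or a string naming a row variable.
# This encoding is an assumption made for illustration only.

def member(label, row):
    """Return the type of the listed field with this label, or None if absent."""
    fields, _tail = row
    for l, ty in fields:
        if l == label:
            return ty
    return None

def restrict(row, label):
    """Model of r - l: delete the listed field with this label."""
    fields, tail = row
    return (tuple((l, ty) for l, ty in fields if l != label), tail)

def entails_lacks(P, row, label):
    """Decide whether the predicates in P entail that row lacks label.

    P is a set of assumed predicates, each a (row_variable, label) pair.
    """
    fields, tail = row
    # Extension rule: each listed field must carry a different label.
    if any(l == label for l, _ in fields):
        return False
    # Empty-row rule: the empty row lacks every label.
    if tail is None:
        return True
    # Assumption rule: a row variable lacks the label only if P says so.
    return (tail, label) in P

known = ((("day", "Int"), ("year", "Int")), None)   # {| day:Int, year:Int |}
open_row = ((("day", "Int"),), "r")                 # {| day:Int | r |}
```

With these definitions, `entails_lacks(set(), known, "month")` holds without any assumptions, which mirrors how a predicate on a fully known row is discharged at compile-time. For `open_row` the same question succeeds only when `("r", "month")` is among the assumptions, which is the case in which the predicate would remain in the inferred type.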
The term language is just core-ML, an implicitly typed $\lambda$-calculus, extended with constants and a let construct, and described by the following grammar:

$$E ::= x \mid c \mid EF \mid \lambda x.E \mid \mathbf{let}\ x = E\ \mathbf{in}\ F$$

We assume that the set of constants $c$ includes the operations and values required for manipulating records and variants as described in Section 2, and that each constant $c$ is assigned a closed type scheme, $\sigma_c$.

The typing rules are presented in Figure 2. A judgement of the form $P \mid A \vdash E : \sigma$ represents an assertion that, if the predicates in $P$ hold, then the term $E$ has type $\sigma$, using assumptions in $A$ to provide types for free variables. These are just the standard rules for qualified types [11], extending the rules of Damas and Milner [6] to account for the use of predicates. Note that the use of the symbols $\tau$, $\rho$, and $\sigma$ in these rules is particularly important in restricting their application to particular classes of types or type schemes.

$$(\mathit{const})\quad P \mid A \vdash c : \sigma_c$$
$$(\mathit{var})\quad \frac{(x : \sigma) \in A}{P \mid A \vdash x : \sigma}$$
$$({\to}E)\quad \frac{P \mid A \vdash E : \tau' \to \tau \qquad P \mid A \vdash F : \tau'}{P \mid A \vdash EF : \tau}$$
$$({\to}I)\quad \frac{P \mid A_x, x : \tau' \vdash E : \tau}{P \mid A \vdash \lambda x.E : \tau' \to \tau}$$
$$({\Rightarrow}E)\quad \frac{P \mid A \vdash E : \pi \Rightarrow \rho \qquad P \Vdash \pi}{P \mid A \vdash E : \rho}$$
$$({\Rightarrow}I)\quad \frac{P \cup \{\pi\} \mid A \vdash E : \rho}{P \mid A \vdash E : \pi \Rightarrow \rho}$$
$$(\forall E)\quad \frac{P \mid A \vdash E : \forall\alpha.\sigma}{P \mid A \vdash E : [\tau/\alpha]\sigma}$$
$$(\forall I)\quad \frac{P \mid A \vdash E : \sigma \qquad \alpha \notin TV(A) \cup TV(P)}{P \mid A \vdash E : \forall\alpha.\sigma}$$
$$(\mathit{let})\quad \frac{P \mid A \vdash E : \sigma \qquad Q \mid A_x, x : \sigma \vdash F : \tau}{P, Q \mid A \vdash (\mathbf{let}\ x = E\ \mathbf{in}\ F) : \tau}$$

Figure 2: Typing rules.

4 Type Inference

This section provides a formal presentation of a type inference algorithm for our system. The most important feature is the introduction of inserters in Section 4.1 to account for non-trivial equalities between row expressions during unification.

4.1 Unification and insertion

Unification is a standard tool in type inference, and is used, for example, to ensure that the formal and actual parameters of a function have the same type. Formally, a substitution³ $S$ is a unifier of constructors $C, C' \in C^\kappa$ if $SC = SC'$, and is a most general unifier of $C$ and $C'$ if every unifier of these two constructors can be written in the form $RS$, for some substitution $R$.

³ For the purposes of this paper, we restrict our attention to kind-preserving substitutions; that is, to substitutions that map variables $\alpha^\kappa \in C^\kappa$ to constructors of the corresponding kind, $\kappa$.

The rules in Figure 3 provide an algorithm for calculating unifiers, writing $C \overset{U}{\sim} C'$ for the assertion that $U$ is a unifier of the constructors $C, C' \in C^\kappa$. The first three rules are standard [25], and are even suitable for unifying two row expressions that list exactly the same components with exactly the same ordering in each. But the fourth rule, $(\mathit{row})$, is needed to deal with the more general problems of row unification.

$$(\mathit{id})\quad C \overset{\mathit{id}}{\sim} C$$
$$(\mathit{bind})\quad \alpha \overset{[C/\alpha]}{\sim} C \quad\text{and}\quad C \overset{[C/\alpha]}{\sim} \alpha, \qquad \alpha \notin TV(C)$$
$$(\mathit{apply})\quad \frac{C \overset{U}{\sim} C' \qquad UD \overset{U'}{\sim} UD'}{CD \overset{U'U}{\sim} C'D'}$$
$$(\mathit{row})\quad \frac{(l : \tau) \overset{I}{\in} r' \qquad Ir \overset{U}{\sim} (Ir' - l)}{\{| l : \tau \mid r |\} \overset{UI}{\sim} r'}$$

Figure 3: Kind-preserving unification.

To understand how this rule works, consider the task of unifying two rows $\{| l : \tau \mid r |\}$ and $\{| l' : \tau' \mid r' |\}$, where $l$, $l'$ are distinct labels, and $r$, $r'$ are distinct row variables. Our goal is to find a substitution $S$ such that:

$$\{| l : S\tau \mid Sr |\} = S\{| l : \tau \mid r |\} = S\{| l' : \tau' \mid r' |\} = \{| l' : S\tau' \mid Sr' |\}.$$

Clearly, the row on the left includes an $(l : S\tau)$ field, while the last row on the right includes an $(l' : S\tau')$ field. If these two types are to be equal, then we must
choose the substitution $S$ so that it will `insert' the missing fields into the two rows $r$ and $r'$, respectively. In this particular case, assuming that $r, r' \notin TV(\tau, \tau')$, then we can choose:

\[ S = [\{|l' : \tau' \mid r''|\}/r,\ \{|l : \tau \mid r''|\}/r'] \]

where $r''$ is a new type variable.

More generally, we will say that a substitution $S$ is an inserter of $(l : \tau)$ into $r \in C^{row}$ if $(l : S\tau) \in Sr$. $S$ is a most general inserter of $(l : \tau)$ into $r$ if every such inserter can be written in the form $RS$, for some substitution $R$. The rules in Figure 4 define an algorithm for calculating inserters, writing $(l : \tau) \stackrel{I}{\in} r$ for the assertion that $I$ is an inserter of $(l : \tau)$ into $r \in C^{row}$.

\[ (\mathit{inVar}) \quad (l : \tau) \stackrel{[\{|l : \tau \mid r'|\}/r]}{\in} r, \qquad r \notin TV(\tau),\ r' \text{ new} \]

\[ (\mathit{inTail}) \quad \frac{(l : \tau) \stackrel{I}{\in} r \qquad l \neq l'}{(l : \tau) \stackrel{I}{\in} \{|l' : \tau' \mid r|\}} \]

\[ (\mathit{inHead}) \quad \frac{\tau \stackrel{U}{\sim} \tau'}{(l : \tau) \stackrel{U}{\in} \{|l : \tau' \mid r|\}} \]

Figure 4: Kind-preserving insertion.

Note that the last rule here, (inHead), makes use of the unification algorithm in Figure 3, so the two algorithms are mutually recursive. The important properties of the two algorithms (both soundness and completeness) are captured in the following result:

Theorem 1 The unification (insertion) algorithm defined by the rules in Figure 3 (Figure 4) calculates most-general unifiers (inserters) whenever they exist. The algorithm fails precisely when no unifier (inserter) exists.

4.2 A type inference algorithm

Given the unification algorithm described in the previous section, we can use the type inference algorithm for qualified types [11] as a type inference algorithm for the type system presented in this paper. For completeness, we include a definition of the algorithm using the rules in Figure 5. Following Rémy [23], these rules can be understood as an attribute grammar; in each typing judgement $P \mid TA \vdash^W E : \tau$, the type assignment $A$ and the term $E$ are inherited attributes, while the predicate assignment $P$, type $\tau$, and substitution $T$ are synthesized. The (let) rule uses an auxiliary function to calculate the generalization of a qualified type $\rho$ with respect to a type assignment $A$. This is defined as:

\[ \mathit{Gen}(A, \rho) = \forall\alpha_i.\rho, \quad \text{where } \{\alpha_i\} = TV(\rho) \setminus TV(A). \]

\[ (\mathit{var})^W \quad \frac{(x : \forall\alpha_i.P \Rightarrow \tau) \in A \qquad \beta_i \text{ new}}{[\beta_i/\alpha_i]P \mid A \vdash^W x : [\beta_i/\alpha_i]\tau} \]

\[ (\to E)^W \quad \frac{P \mid TA \vdash^W E : \tau \qquad Q \mid T'TA \vdash^W F : \tau' \qquad T'\tau \stackrel{U}{\sim} \tau' \to \alpha \qquad \alpha \text{ new}}{U(T'P \cup Q) \mid UT'TA \vdash^W EF : U\alpha} \]

\[ (\to I)^W \quad \frac{P \mid T(A_x, x : \alpha) \vdash^W E : \tau \qquad \alpha \text{ new}}{P \mid TA \vdash^W \lambda x.E : T\alpha \to \tau} \]

\[ (\mathit{let})^W \quad \frac{P \mid TA \vdash^W E : \tau \qquad \sigma = \mathit{Gen}(TA, P \Rightarrow \tau) \qquad P' \mid T'(TA_x, x : \sigma) \vdash^W F : \tau'}{P' \mid T'TA \vdash^W (\mathbf{let}\ x = E\ \mathbf{in}\ F) : \tau'} \]

Figure 5: Type inference algorithm W.

The type inference algorithm is both sound and complete with respect to the original typing rules.

Theorem 2 The algorithm described by the rules in Figure 5 can be used to calculate a principal type for a given term $E$ under assumptions $A$. The algorithm fails precisely when there is no typing for $E$ under $A$.

5 Compilation

Previously, we have described informally how programs involving operations on records and variants can be compiled and executed using a language that adds extra parameters to supply appropriate offsets. This section shows how this process can be formalized, including the calculation of offset values.

5.1 Compilation by translation

In the general treatment of qualified types [11], programs are compiled by translating them into a language that adds extra parameters to supply evidence for predicates appearing in the types of the values concerned. The whole process can be described by extending the typing rules to use judgements of the form:

\[ P \mid A \vdash E \leadsto E' : \sigma \]
which include both the original source term $E$ and a possible translation, $E'$. A further change here is the switch from predicate sets to predicate assignments; the symbol $P$ used above represents a set of pairs $(v : \pi)$ in which no variable $v$ appears twice. Each variable $v$ corresponds to an extra parameter that will be added during compilation; $v$ can be used whenever evidence for the corresponding predicate $\pi$ is required in $E'$.

In the current setting, predicates are expressions of the form $(r \backslash l)$ whose evidence is the offset in $r$ at which a field labelled $l$ would be inserted. The calculation of evidence is described by the rules in Figure 6, which are direct extensions of the earlier rules for predicate entailment that were given in Figure 1.

\[ P \cup \{v : \pi\} \Vdash v : \pi \]

\[ \frac{P \Vdash e : (r \backslash l)}{P \Vdash m : (\{|l' : \tau \mid r|\} \backslash l)} \qquad m = \left\{ \begin{array}{ll} e, & l < l' \\ e + 1, & l' < l \end{array} \right. \]

\[ P \Vdash 0 : (\{||\} \backslash l) \]

Figure 6: Predicate entailment for rows with evidence.

Intuitively, a derivation of $P \Vdash e : \pi$ tells us that we can use $e$ as evidence for the predicate $\pi$ in any environment where the assumptions in $P$ are valid. The second rule is the most interesting and tells us how to find the position at which a label $l$ should be inserted in a row $\{|l' : \tau \mid r|\}$:

• If $l$ comes before $l'$ in the total ordering, <, on labels, then the required offset will be the same as the offset $e$ of $l$ in $r$.

• But, if $l'$ comes before $l$, then we need to use an offset of $e + 1$ to account for the insertion of $l'$.

In general, these rules calculate offsets that are either a fixed natural number, or a fixed offset from one of the variables in $P$. For simplicity, we have assumed a boxed representation in which all record components occupy the same amount of storage. It is actually quite easy to allow for varying component sizes by replacing $e + 1$ in the calculation above with $e + \mathit{size}(\tau)$.

For reasons of space, we omit the complete description of translation from this paper, and restrict ourselves instead to describing the two rules that account for the use and introduction of offset parameters. The first of these is a variation on function application:

\[ \frac{P \mid A \vdash E \leadsto E' : \pi \Rightarrow \rho \qquad P \Vdash e : \pi}{P \mid A \vdash E \leadsto E'e : \rho} \]

This tells us that we need to supply suitable evidence $e$ in the translation of any program whose type is qualified by a predicate $\pi$. The second rule is analogous to function abstraction, and allows us to move constraints from the predicate assignment $P$ into the inferred type:

\[ \frac{P \cup \{v : \pi\} \mid A \vdash E \leadsto E' : \rho}{P \mid A \vdash E \leadsto \lambda v.E' : \pi \Rightarrow \rho} \]

These two rules are direct extensions of $(\Rightarrow E)$ and $(\Rightarrow I)$ in Figure 2, and combined with simple extensions of the other rules there, we can construct a translation for any term in the source calculus.

6 Extensions

The type system that we have described in the previous sections offers a flexible, but fairly conventional set of operations on records and variants. In this section we consider two extensions to the original system. Section 6.1 describes a generalization of the record and variant operations to include, amongst others, first-class extensible case statements. Section 6.2 shows how the system can be modified to allow labels to be treated as first-class values.

6.1 Row polymorphism

Working with a general notion of rows has provided us with an elegant way to deal with the common structure in record and variant types. However, we have not seen any compelling examples in the previous sections where it was essential to consider rows separately from records and variants; we could have just defined completely independent sets of record and variant types.

However, there are some applications in which the ability to separate rows from records and variants offers significant benefits. To illustrate this, consider again the basic operations that were discussed in Section 2. For example, if we generalize the rules in category theory or logic for decomposing a sum to deal with $n$-ary sums, then we obtain the following rule:

\[ \frac{A_1 \to C \quad \cdots \quad A_n \to C}{A_1 + \cdots + A_n \to C}. \]

In terms of records and variants, this rule provides a method for decomposing a variant (represented by the sum $A_1 + \cdots + A_n$ in the conclusion) using a record of functions (represented by the hypotheses $A_i \to C$). This suggests a general operation for variant elimination:

\[ \mathit{plusElim} :: \forall\alpha.\forall r.\ \mathit{Rec}\ (\mathit{to}\ \alpha\ r) \to \mathit{Var}\ r \to \alpha. \]
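As a rough illustration of what this operation does at run time, the following Python sketch models a variant as a (label, payload) pair and a record of functions as a dictionary from labels to handlers. The name `plus_elim`, the dictionary encoding, and the example labels `key` and `mouse` are assumptions made for this sketch only; the representation described in this paper is based on offsets rather than dictionaries, and Python does not check the row constraints expressed by the type.

```python
from typing import Any, Callable, Dict, Tuple

Variant = Tuple[str, Any]                    # (label, payload): one alternative of a sum
Handlers = Dict[str, Callable[[Any], Any]]   # a record of functions, one per label


def plus_elim(handlers: Handlers, variant: Variant) -> Any:
    """Eliminate a variant by applying the handler stored under its label."""
    label, payload = variant
    return handlers[label](payload)


# Hypothetical two-alternative variant: a key press or a mouse click.
describe: Handlers = {
    "key": lambda c: f"key {c}",
    "mouse": lambda p: f"click at {p}",
}

print(plus_elim(describe, ("key", "a")))
print(plus_elim(describe, ("mouse", (3, 4))))
```

In the type of plusElim, the record of handlers is what the argument of type $\mathit{Rec}\ (\mathit{to}\ \alpha\ r)$ describes: one function into the result type $\alpha$ for each alternative in the row $r$.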
The $\mathit{to}\ \alpha\ r$ construct used here is defined as follows:

\[ \begin{array}{lcl} \mathit{to}\ \alpha\ \{||\} & = & \{||\} \\ \mathit{to}\ \alpha\ \{|l : \alpha' \mid r|\} & = & \{|l : \alpha' \to \alpha \mid \mathit{to}\ \alpha\ r|\}. \end{array} \]

This behaves like a particular kind of map operation on rows, replacing each component type $\alpha'$ in $r$ with a type of the form $\alpha' \to \alpha$.

For an example of where such an operation may prove useful, consider the type of integer lists that can be obtained as a fixpoint of the following functor [14]:

\[ \mathbf{data}\ L\ l = L\ (\mathit{Var}\ \{|\mathit{nil} : \mathit{Rec}\ \{||\},\ \mathit{cons} : \mathit{Rec}\ \{|\mathit{tl} : l,\ \mathit{hd} : \mathit{Int}|\}|\}) \]

The sum of a list of integers can be calculated using a general catamorphism:

\[ \mathit{cata}\ (\lambda(L\ v).\ \mathit{plusElim}\ (\mathit{nil} = \lambda().0,\ \mathit{cons} = \lambda r.\ r.\mathit{hd} + r.\mathit{tl})\ v). \]

From this example, it is clear that plusElim is an alternative to the case construct of languages like Haskell and SML. However, unlike these languages, it is not a builtin part of the syntax; instead, it allows us to treat case constructs as first-class, extensible values.

In a similar way, we can adapt the other rules for constructing sums, and for decomposing or constructing products, to functions involving records and variants. The full set of operations is specified by the following type signatures (see footnote 4):

\[ \begin{array}{lcl} \mathit{plusElim} & :: & \forall\alpha.\forall r.\ \mathit{Rec}\ (\mathit{to}\ \alpha\ r) \to \mathit{Var}\ r \to \alpha \\ \mathit{plusIntro} & :: & \forall\alpha.\forall r.\ \mathit{Var}\ (\mathit{from}\ \alpha\ r) \to \alpha \to \mathit{Var}\ r \\ \mathit{prodElim} & :: & \forall\alpha.\forall r.\ \mathit{Var}\ (\mathit{to}\ \alpha\ r) \to \mathit{Rec}\ r \to \alpha \\ \mathit{prodIntro} & :: & \forall\alpha.\forall r.\ \mathit{Rec}\ (\mathit{from}\ \alpha\ r) \to \alpha \to \mathit{Rec}\ r \end{array} \]

Footnote 4: The $\mathit{from}\ \alpha\ r$ construct used in the types of plusIntro and prodIntro is defined as an obvious dual to the $\mathit{to}\ \alpha\ r$ construct that we used previously.

Given our earlier representations for records and variants, it is easy to implement each of these operations as builtin primitives.

One technical difficulty that we face with this approach is in extending the treatment of unification to deal with uses of the from and to constructs. This turns out to be straightforward, except for potential complications caused by the presence of empty rows. For example, in unifying two rows $\mathit{to}\ t\ r$ and $\mathit{to}\ t'\ r'$, we cannot simply unify $t$ with $t'$; if $r$, and hence $r'$, is empty, then there is no direct relationship between $t$ and $t'$. More precisely, to obtain most general unifiers for from and to constructs, we need to restrict ourselves to work with non-empty rows. There are several alternatives to consider here. For example, the obvious approach would be to banish the empty row from our original type system. A more satisfactory solution might be to adopt a kind system that distinguishes between empty and non-empty rows. We leave further investigation of this topic to future work.

6.2 First-class labels

In previous sections of the paper, we have considered the labels used to refer to fields as part of the basic syntax of our language. As a result, we had to describe selection from a record using a family of functions:

\[ (\_.l) :: (r \backslash l) \Rightarrow \mathit{Rec}\ \{|l : \alpha \mid r|\} \to \alpha, \]

with one function for each choice of label $l$. A more attractive approach might be to allow primitive operations on records and variants to be parameterized over labels. We can extend the type system described in previous sections to accomplish this, treating selection, for example, as a single function of type:

\[ (\_.\_) :: (r \backslash l) \Rightarrow \mathit{Rec}\ \{|l : \alpha \mid r|\} \to \mathit{Label}\ l \to \alpha. \]

This requires an extension of the kind system in Section 3.1 with a new kind $\mathit{lab}$, and also a new type constant $\mathit{Label}$ of kind $\mathit{lab} \to *$. Intuitively, each type of the form $\mathit{Label}\ l$ contains a unique label value. The $l$ parameter is important because it establishes a connection between types and label values; a nullary Label type would not have provided any way for us to express the necessary typing constraints. We can also dispense with the family of extension constructors defined in Section 3.2, replacing them with a single constructor constant:

\[ \{|\_ : \_ \mid \_|\} :: \mathit{lab} \to * \to \mathit{row} \to \mathit{row}. \]

Finally, we need to generalize the lacks predicate from Section 3.3 to a two place relation $r \backslash l$ which takes both a row $r$ and a label $l$ of kind $\mathit{lab}$. This can be defined in much the same way as before, and has the same interpretation as an offset value in the underlying implementation.

We can generalize the other basic operations on records and variants in a similar way. For example, the expression $\lambda x.\lambda y.(y = 2, x = 3)$, which involves two uses of extension, will be assigned the type:

\[ (\{|x : \mathit{Int}|\} \backslash y) \Rightarrow \mathit{Label}\ x \to \mathit{Label}\ y \to \mathit{Rec}\ \{|x : \mathit{Int},\ y : \mathit{Int}|\}. \]

To our knowledge, this is the first work, in either implicitly or explicitly typed record and variant calculi, to consider a type system in which labels can be treated as first-class values. This increases the expressiveness of our system quite dramatically, and we already have some interesting applications for these new features.
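The run-time side of label-parameterized operations can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: records are modelled as tuples of values stored in label order, labels are plain strings ordered by Python's string comparison, and the helper names `lacks_offset`, `extend` and `select` are hypothetical. The offset calculation follows the shape of the rules in Figure 6.

```python
from typing import Any, Sequence, Tuple


def lacks_offset(row_labels: Sequence[str], label: str) -> int:
    """Evidence that a row lacks a label: the offset where it would be inserted.

    The empty row gives offset 0, and each existing label that precedes the
    new one in the ordering on labels adds 1.
    """
    if label in row_labels:
        raise ValueError(f"row already contains label {label!r}")
    return sum(1 for existing in row_labels if existing < label)


def extend(offset: int, value: Any, record: Tuple[Any, ...]) -> Tuple[Any, ...]:
    """Record extension: insert a value at the given offset."""
    return record[:offset] + (value,) + record[offset:]


def select(offset: int, record: Tuple[Any, ...]) -> Any:
    """Selection: a field is found at the offset where it was inserted."""
    return record[offset]


# Build the record (y = 2, x = 3) with two extensions, passing labels as values.
rec: Tuple[Any, ...] = ()
off_x = lacks_offset([], "x")      # evidence that the empty row lacks x
rec = extend(off_x, 3, rec)
off_y = lacks_offset(["x"], "y")   # evidence that the row {x : Int} lacks y
rec = extend(off_y, 2, rec)

print(rec)
print(select(lacks_offset(["y"], "x"), rec))
```

The sketch covers only the offset arithmetic; the typing constraints that the $\mathit{Label}\ l$ types are meant to express are not modelled.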
However, it remains to be seen how well this will work in practice, and what its implications for language design might be; we leave these topics to future work.

7 Conclusion

We have described a flexible, polymorphic type system for extensible records and variants with an effective type inference algorithm and compilation method. Prototype implementations have been written in SML and Haskell, and an implementation of records has been added to Hugs, an implementation of Haskell 1.3. Our experience to date shows that these implementations work well in practice. There are a number of areas for further work:

• Object systems. Another potential application for rows would be in a type system for object-oriented programming. For example, a constructor $\mathit{Obj}$ of kind $\mathit{row} \to *$ might be used to describe objects with a given row of methods [17, 22, 1, 10, 21, 2].

• More sophisticated basic operations. Our type system does not support record append [28, 24] or the database join that was one of the original motivations for Ohori's work [19]. One approach would be to adopt the ideas of Harper and Pierce [7], choosing an appropriate form of evidence for the $(r_1 \# r_2)$ predicates described in Section 1.1.

• A new approach to datatypes. Languages like SML and Haskell already provide facilities for defining new datatypes and these overlap to some extent with the mechanisms described in this paper. Unifying and combining these different approaches would obviously be useful, with a long term goal of developing a general framework for extensible datatypes.

Acknowledgements

This work was supported in part by an EPSRC studentship 9530 6293. We would also like to thank our colleagues, Colin Taylor and Graham Hutton, in the functional programming group at Nottingham, for the valuable contributions that they have made to the work described in this paper.

References

[1] M. Abadi and L. Cardelli. A semantics of object types. In Proceedings of the 9th Symposium on Logic in Computer Science, pages 332-341, July 1994.

[2] K. B. Bruce. A paradigmatic object-oriented programming language: Design, static typing and semantics. Journal of Functional Programming, 4(2):127-206, April 1994.

[3] L. Cardelli. A semantics of multiple inheritance. In G. Kahn, D. MacQueen, and G. Plotkin, editors, Semantics of Data Types, volume 173 of Lecture Notes in Computer Science, pages 51-67. Springer-Verlag, 1984. Full version in Information and Computation 76(2/3):138-164, 1988.

[4] L. Cardelli. Extensible records in a pure calculus of subtyping. Research report 81, DEC Systems Research Center, Jan. 1992. Also in Carl A. Gunter and John C. Mitchell, editors, Theoretical Aspects of Object-Oriented Programming: Types, Semantics, and Language Design (MIT Press, 1994).

[5] L. Cardelli and J. Mitchell. Operations on records. Mathematical Structures in Computer Science, 1:3-48, 1991. Also in Carl A. Gunter and John C. Mitchell, editors, Theoretical Aspects of Object-Oriented Programming: Types, Semantics, and Language Design (MIT Press, 1994); available as DEC Systems Research Center Research Report #48, August, 1989, and in the proceedings of MFPS '89, Springer LNCS volume 442.

[6] L. Damas and R. Milner. Principal type schemes for functional programs. In Proceedings of the 9th ACM Symposium on Principles of Programming Languages, pages 207-212, 1982.

[7] R. Harper and B. Pierce. A record calculus based on symmetric concatenation. In Proceedings of the 18th Annual ACM Symposium on Principles of Programming Languages, Orlando FL, pages 131-142. ACM, Jan. 1991. Extended version available as Carnegie Mellon Technical Report CMU-CS-90-157.

[8] R. W. Harper and B. C. Pierce. Extensible records without subsumption. Technical Report CMU-CS-90-102, School of Computer Science, Carnegie Mellon University, February 1990.

[9] J. R. Hindley. The principal type scheme of an object in combinatory logic. Transactions of the American Mathematical Society, 146:29-60, December 1969.

[10] M. Hofmann and B. Pierce. A unifying type-theoretic framework for objects. Journal of Functional Programming, 1995. Previous versions appeared in the Symposium on Theoretical Aspects of Computer Science, 1994, (pages 251-262) and, under the title "An Abstract View of Objects and Subtyping (Preliminary Report)," as University of
Edinburgh, LFCS technical report ECS-LFCS-92-226, 1992.

[11] M. P. Jones. Qualified Types: Theory and Practice. Distinguished Dissertations in Computer Science. Cambridge University Press, 1994.

[12] M. P. Jones. A system of constructor classes: overloading and implicit higher-order polymorphism. Journal of Functional Programming, 5(1):1-35, Jan. 1995. An earlier version appeared in Proc. FPCA 1993.

[13] X. Leroy. Polymorphism by name for references and continuations. In Principles of Programming Languages, pages 220-231. ACM Press, 1993.

[14] E. Meijer and G. Hutton. Bananas in space: Extending fold and unfold to exponential types. In Proc. 7th International Conference on Functional Programming and Computer Architecture. ACM Press, San Diego, California, June 1995.

[15] R. Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17:348-375, August 1978.

[16] R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. The MIT Press, 1990.

[17] J. C. Mitchell, F. Honsell, and K. Fisher. A lambda calculus of objects and method specialization. In 1993 IEEE Symposium on Logic in Computer Science, June 1993.

[18] A. Ohori. A polymorphic record calculus and its compilation. ACM Transactions on Programming Languages and Systems, 17(6):844-895, Nov. 1995. Preliminary version in Proceedings of ACM Symposium on Principles of Programming Languages, 1992, under the title, A compilation method for ML-style polymorphic record calculi.

[19] A. Ohori and P. Buneman. Type inference in a database programming language. In Proceedings of the ACM Conference on LISP and Functional Programming, pages 174-183. ACM, 1988.

[20] J. Peterson and K. Hammond. Report on the Programming Language Haskell, A Non-strict, Purely Functional Language (Version 1.3). Technical Report YALEU/DCS/RR-1106, Yale University, Department of Computer Science, May 1996.

[21] B. C. Pierce and D. N. Turner. Simple type-theoretic foundations for object-oriented programming. Journal of Functional Programming, 4(2):207-247, Apr. 1994. A preliminary version appeared in Principles of Programming Languages, 1993, and as University of Edinburgh technical report ECS-LFCS-92-225, under the title "Object-Oriented Programming Without Recursive Types".

[22] D. Rémy. Programming objects with ML-ART: An extension to ML with abstract and record types. In M. Hagiya and J. C. Mitchell, editors, Theoretical Aspects of Computer Software, volume 789 of Lecture Notes in Computer Science, pages 321-346. Springer-Verlag, April 1994.

[23] D. Rémy. Type inference for records in a natural extension of ML. In C. A. Gunter and J. C. Mitchell, editors, Theoretical Aspects of Object-Oriented Programming: Types, Semantics, and Language Design, Foundations of Computing Series. MIT Press, 1994. Early version appeared in Sixteenth Annual Symposium on Principles of Programming Languages. Austin, Texas, January 1989.

[24] D. Rémy. Typing record concatenation for free. In C. A. Gunter and J. C. Mitchell, editors, Theoretical Aspects of Object-Oriented Programming: Types, Semantics, and Language Design, Foundations of Computing Series. MIT Press, 1994.

[25] J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the Association for Computing Machinery, 12(1):23-41, January 1965.

[26] M. Wand. Complete type inference for simple objects. In Proceedings of the IEEE Symposium on Logic in Computer Science, Ithaca, NY, June 1987.

[27] M. Wand. Corrigendum: Complete type inference for simple objects. In Proceedings of the IEEE Symposium on Logic in Computer Science, 1988.

[28] M. Wand. Type inference for record concatenation and multiple inheritance. Information and Computation, (93):1-15, 1991. Preliminary version appeared in Proc. 4th IEEE Symposium on Logic in Computer Science, 1989, 92-97.

[29] A. Wright. Simple imperative polymorphism. Lisp and Symbolic Computation, 8(4):343-356, December 1995.