<<

A Polymorphic Typ System for Extensible Records and Variants

Benedict . Gaster and Mark P. Jones

Technical rep ort NOTTCS-TR-96-3, November 1996

Department of Science, University of Nottingham,

University Park, Nottingham NG7 2RD, England.

fbrg,[email protected]

Abstract oard, and another indicating a mouse click at a par-

ticular p oint on the screen:

Records and variants provide exible ways to construct

Event = Char + Point :

datatyp es, but the restrictions imp osed by practical

typ e systems can prevent them from b eing used in ex-

These de nitions are adequate, but they are not par-

ible ways. These limitations are often the result of con-

ticularly easy to work with in practice. For example, it

cerns ab out eciency or typ e inference, or of the di-

is easy to confuse datatyp e comp onents when they are

culty in providing accurate typ es for key op erations.

accessed by their p osition within a pro duct or sum, and

This pap er describ es a new typ e system that reme-

programs written in this way can b e hard to maintain.

dies these problems: it supp orts extensible records and

To avoid these problems, many programming lan-

variants, with a full complement of p olymorphic op era-

guages allow the comp onents of pro ducts, and the al-

tions on each; and it o ers an e ective al-

ternatives of sums, to b e identi ed using names drawn

gorithm and a simple compilation metho . It is a prac-

from some given of labels. Lab elled pro ducts are

tical system that can b e understo o d and implemented

more commonly known as records or structs, while la-

as a natural extension of languages like Standard ML

b elled sums are p erhaps b etter known as variants or

and Haskell. In an area that has already received a

unions. For example, the Date and Event datatyp es

great deal of attention from the research community,

describ ed ab ove might b e de ned more attractively as:

the typ e system describ ed here is the rst to combine

all of these features in a practical framework.

Date = Rec fjday :Int ; month : Int ; year :Int jg

One imp ortant asp ect of this work is the emphasis

Event = Var fjkey :Char ; mouse : Point jg:

that it places on the use of rows in the construction of

This notation captures a common feature in the con-

record and variant typ es. As a result, with only small

struction of record and variant types, using rows of the

extensions of the core typ e system, we are able to intro-

form fjl : ;:::; l : jg to describ e mappings that as-

duce new, p owerful op erations on records using features

1 1 n n

so ciate a type  with each of the (distinct) lab els l .

such as row p olymorphism and rst-class lab els.

i i

Record types are obtained by preceding rows with the

symb ol Rec . Variant types are constructed using Var .

1 Intro duction

For example, if r = fjl : ;:::; l : jg and e ,::: ,e have

1 1 n n 1 n

types  ,::: , , resp ectively, then we can form a record

1 n

Pro ducts and sums play fundamental roles in the con-

(l = e ;:::; l = e ) of type Rec r , or distinct vari-

1 1 n n

struction of datatyp es: pro ducts describ e grouping of

ants, hl = e i;:::; hl = e i, each of type Var r . Thus,

1 1 n n

data items, while sums express choices b etween alter-

0 0

(day = 25; month = 12; year = 1996) and hkey = a i

natives. For example, we might represent a date as a

represent values of type Date and Event , resp ectively.

pro duct, with three comp onents sp ecifying the

day, month, and year:

1.1 Polymorphism and extensibility

Date = Int  Int  Int :

Unfortunately, practical languages are often less exi-

ble in the op erations that they provide to manipulate

For a simple example of a sum, consider the type of

records and variants. For example, many languages|

input events to a window system, with one alternative

from to Standard ML (SML)|will only allow the pro-

indicating that a has b een entered on the key-

grammer to select the l comp onent, r :l , from a record r

1

selection op erator ( if the typ e of r is uniquely determined at compile-time . :l ) has type:

These languages do not supp ort polymorphic op erations

8 :8r :Rec fjl : j r jg ! :

on records|such as a general selector function ( :l ) that

will extract a value from any record that has an l eld.

However, the op erations and types in Wand' system

A further weakness in many of these languages is that

are unchecked; for example, extending a row with an l

they provide no real supp ort for extensibility; there are

eld may either add a completely new eld, or replace

no general op erators for adding a eld, removing a eld,

an existing eld lab elled with l . As a result, some pro-

renaming a eld, or replacing a eld (p ossibly with a

grams do not have principal types [27].

value of a di erent typ e) in a record value.

There have b een many previous attempts to design

Flags: Remy has developed a exible treatment for

typ e systems for records and variants that supp ort p oly-

extensible records and variants in a natural extension of

morphism and extensibility. Such work is imp ortant,

ML [23]. A key feature of his system is the use of ags

not just in its own right, but also in its application to

to enco de b oth p ositive and negative information|that

the study of ob ject-oriented or database programming

is, to indicate which elds must b e present, and which

languages where these facilities seem particularly use-

must b e absent. Again, a concept of row variables is

ful. We will summarize the key features of some of these

used to deal with other elds whose presence or absence

earlier systems here b efore describing our own prop osal.

is not signi cant in a particular situation. For example,

the selection op erator has type:

Subtyping: is one of the most widely used

techniques for building typ e systems for records and

8 :8r :Rec fjl : pre ( ) j r jg ! ;

variants [3, 5, 4, 21]. We can de ne a subtyping relation

by sp ecifying that a row r is a subrow of r , written

1 2

where pre ( ) is a ag indicating the presence of a eld

r  r , if r contains all the elds of r , and p ossibly

1 2 1 2

of type , and r is a row variable representing the rest

more. The intuition here is that, for example, a record

of the record. Intuitively, records in Remy's system

of typ e Rec r could b e used in any context where a

1

are represented by tuples with a slot for each p ossible

value of typ e Rec r is exp ected, and conversely, that a

2

lab el. The prevents access to unde ned

variant of typ e Var r can b e substituted in any context

2

comp onents, but do es not lead directly to a simple and

where a value of typ e Var r is required. In particular,

1

ecient implementation.

:l ) can b e treated as a function the selection op erator (

of typ e:

Predicates: Harp er and Pierce [8, 7] studied type

8 :8r  fjl : jg:Rec r ! :

systems for extensible records using predicates on types

This op eration is implemented, at least conceptually,

to capture information ab out presence or absence of

by co ercing from Rec r to a known type|the singleton

elds, and to restrict attention to checked op erations.

Rec fjl : jg|and then extracting the required eld. One

For example, writing r #r for the assertion that the

1 2

weakness of this approach is that information ab out the

rows r and r have disjoint sets of lab els, the selection

1 2

other elds in the record is lost, so it is harder to de-

op erator op erator ( :l ) has type:

scrib e op erations like record extension. For example,

8 :8r :(r #fjl : jg) ) Rec (r fjl : jg) ! ;

observing that b ounded quanti cation is not by itself

sucient, Cardelli and Mitchell [5] used an overriding

where r k r is the row obtained by merging r and r ,

1 2 1 2

op erator on typ es to overcome this problem.

and is only de ned if r #r . Harp er and Pierce's work

1 2

do es not deal with variants, type inference, or compila-

Row extension: Motivated by studies of ob ject-

tion, and do es not provide an op erational interpretation

oriented programming, Wand [26] intro duced the con-

of predicates. However, their approach to record typing

cept of row variables to allow incremental construction

was one of the motivating examples in Jones' work on

of record and variant typ es. For example, a record of

quali ed types [11] where a general framework for type

typ e Rec fjl :  j r jg has all of the elds of a record of

inference and compilation was developed, including a

typ e Rec r , together with a eld l of type  . Wand

type system for records as a sp ecial case. One of the

did not discuss compilation, but his approach supp orts

achievements of the present pap er is to re ne and ex-

b oth p olymorphism and extensibility.For example, the

tend that work to a practical system, avoiding problems

1

In implementations using b oxed representations for values, only

such as the lack of most general uni ers in Jones' full

the set of lab els in r is needed; the actual comp onent types are not

system|the result of including record restriction in the

required.

type language. 2

Kinds: Ohori [18] describ ed a type system that ex- 2 Overview

tends SML with p olymorphic op erations on b oth records

Both record and variant types are de ned in terms of

and variants. Signi cantly, Ohori also presented a sim-

rows, and these are constructed by extension, starting

ple and e ective compilation metho d: input programs

from the empty row, fjjg. It is convenient to intro duce

are translated into a target language that adds extra pa-

abbreviations for rows obtained in this way:

rameters to sp ecify eld o sets. In fact, the end result

is much the same as that suggested by Jones' work on

fjl : ;:::; l : j r jg = fjl : j ::: fjl : j r jg ::: jg

1 1 n n 1 1 n n

quali ed typ es, even though the two approaches were

fjl : ;:::; l : jg = fjl : ;:::; l : j fjjgjg:

1 1 n n 1 1 n n

developed indep endently. But Ohori's work di ers sub-

stantially from other systems in its use of a system;

Note, however, that we treat rows, and hence record or

this allows variables ranging over record types to b e an-

variant types, as equals if they include the same elds,

notated with a sp eci cation of the elds that they are

regardless of the order in which those elds are listed.

exp ected to contain. For example, the selection op era-

tor op erator ( :l ) has typ e:

2.1 Basic op erations

fjl : jg

8 :8r :Rec r ! :

Intuitively, a record of type Rec fjl : j r jg is like a pair

The main limitation of Ohori's typ e system is its lack

whose rst comp onent is a value of type , and whose

of supp ort for extensibility.

second comp onent is a record of type Rec r . This mo-

tivates our choice of op erations on records, which

corresp ond directly to the two pro jections and the pair-

1.2 This pap er

ing constructor for pro ducts in category theory or logic.

The typ e system describ ed in this pap er combines many

There is, however, one complication; we do not allow re-

of the ideas that have b een used in previous work into a

p eated uses of any lab el within a particular row, so the

practical typ e system for implicitly typed languages like

expression fjl : j r jg is only valid if l do es not app ear

SML [16] and Haskell [20]. In particular, it supp orts

in r . This is re ected by pre xing each of the types b e-

p olymorphism and extensibility, records and variants,

low with a predicate (r nl ), pronounced \r lacks l ", that

typ e inference, and compilation. The type system is

restricts instantiation of r to rows without an l eld:

an application of quali ed typ es, extended to deal with

2

 Selection: to extract the value of a eld l :

a general concept of rows. Positive information ab out

the elds in a given row is captured in the type lan-

:l ) :: (r nl ) ) Rec fjl : j r jg ! : (

guage using row extension, while negative information

is re ected by the use of predicates.

 Restriction: to remove a eld lab elled l :

The most obvious b ene of this approach is that

we can adapt results and prop erties from the general

( l ) :: (r nl ) ) Rec fjl : j r jg ! Rec r :

framework of quali ed typ es|such as the type infer-

ence algorithm and the compilation metho d|without

 Extension: to add a eld l to an existing record:

having to go back to rst principles. The result is a con-

siderable simpli cation of b oth the overall presentation

j ) :: (r nl ) ) ! Rec r ! Rec fjl : j r jg: (l =

and of sp eci c pro ofs. Another imp ortant advantage of

this approach is that it guarantees compatibility with

We can use these basic op erations to implement a num-

other applications of quali ed typ es. For example, our

b er of additional op erators, including:

typ e system can b e used|and indeed, has already b een

used in our prototype implementation|in conjunction

 Up date/replace: to up date the value in a particu-

with the typ e class mechanisms of Haskell.

lar eld, p ossibly with a value of a di erent type:

The remaining sections of this pap er are as follows.

Section 2 provides a general overview of our new type

(l := j ) :: (r nl ) ) ! Rec fjl : j r jg

system, with a more detailed formal presentation in Sec-

! Rec fjl : j r jg

tion 3. This is followed by discussions of type inference

(l := x j r ) = (l = x j r l )

in Section 4 and compilation in Section 5. With some

2

To simplify the notation, we assume implicit universal quanti -

small extensions to the core system, Section 6 shows

cation over free type variables. For example, the full type of the

how our framework can b e used to supp ort some new,

selection op erator is 8r :8 :(r nl ) ) Rec fjl : j r gj ! .

p owerful op erations on records and variants using row

p olymorphism and lab els as rst-class values. Finally,

Section 7 concludes with p ointers to further work. 3

 Renaming: to change the lab el attached to a par- the most frequently used basic op eration. A naive ap-

ticular eld: proach would b e to represent a record by an asso ciation

list, pairing lab els with values. This would allow simple

[l m ] :: (r nl ; r nm ) ) Rec fjl : j r jg

implementations for each of the basic op erations, with

! Rec fjm : j r jg

the type system providing a guarantee that the search

r [l m ] = (m = r :l j r l )

for any given lab elled eld would not fail. A ma jor dis-

advantage is that it do es not allow constant time access

The empty record, (), plays an imp ortant role as the

to record comp onents.

only prop er value of typ e Rec fjjg. Again, it is convenient

To avoid these problems, we will assume instead that

to intro duce abbreviations for the construction of record

a record value is represented by a contiguous blo ck of

values by rep eated extension:

memory that contains a value for each individual eld.

To select a particular comp onent r :l from a record r ,

(l =e ;:::; l =e j r ) = (l = e j ::: (l =e j r ) :::)

1 1 n n 1 1 n n

we need to know the o set of the l eld in the blo ck of

(l =e ;:::; l =e ) = (l = e ;:::; l =e j ())

1 1 n n 1 1 n n

memory representing r . Languages without p olymor-

phic selection will usually only allow an expression of

We can sp ecify the basic op erations on variants in a sim-

the form r :l if the o set value, and hence the structure

ilar way. Again, they corresp ond closely to the standard

or even the full type of r , is known at compile-time.

op erations on sums in category theory or logic:

However, it is not actually necessary to know the

p osition of every eld at compile-time; instead, we can

 Injection: to tag a value with the lab el l :

treat unknown o sets as implicit parameters whose val-

ues will b e supplied at run-time when the full types of

i :: (r nl ) ) ! Var fjl : j r jg: hl =

the records concerned are known. This is essentially

the compilation metho d that was used by Ohori [18],

 Emb edding: to emb ed a value in a

and also suggested, indep endently, by Jones [11]. If

that also allows for a new lab el, l :

we forget ab out typing issues for a moment and as-

hl j i :: (r nl ) ) Var r ! Var fjl : j r jg: sume that records are implemented as arrays or tu-

:l ) op erator can b e implemented by a ples, then the (

function i :r :r [i ], using the extra parameter i to sup-

 Decomp osition: to act on the value in a variant,

ply the o set of l in r . For example, the expression:

according to whether or not it is lab elled with l :

(day = 25; month = 12; year = 1996):day can b e imple-

(l 2 ? : ) :: (r nl ) ) Var fjl : j r jg

mented by compiling it to:

! ( ! )

(i :r :r [i ]) 0 (25; 12; 1996)

! (Var r ! )

! :

which evaluates to 25, as exp ected. Of course, there are

run-time overheads in calculating and passing o set val-

The empty variant, hi, is the only value of type Var fjjg.

ues as extra parameters. However, an attractive feature

More sophisticated language constructs, for exam-

of our system is that these costs are only incurred when

ple, pattern matching facilities, or record up date, are

the extra exibility of p olymorphic selection is required.

easily describ ed using the op erations listed here. In ad-

Each predicate (r nl ) in the type of a function signals

dition, we exp ect that practical implementations will

the need for an extra run-time parameter to sp ecify the

use, but not display predicates implied by the context

o set at which a eld lab elled l would b e inserted into

in which they app ear. For example, all of the types

a record of type Rec r . Obviously, the same o set can

ab ove include a row fjl : j r jg that is only valid if r nl ;

also b e used to lo cate or remove the l eld from a record

so displaying this predicate is, in some sense, redun-

of type Rec fjl : j r jg, or treated as ordinal numb ers to

dant. However, as we will see in the next section, this

access and tag values in a variant. So, this one extra

predicate plays a central role in the implementation of

piece of information is all that we need to implement

the basic op erations.

the basic op erations.

Op erations like record extension and restriction will,

in general, b e implemented by copying. Optimizations

2.2 Implementation details

can b e used to combine multiple extensions or restric-

Our next task is to explain how the data structures and

tions of records, avoiding unnecessary allo cation and

op erations describ ed ab ove can b e implemented. We

initialization of intermediate values. For example, a

will fo cus on the treatment of records and, in particular,

can generate co de that will allo cate and ini-

.l), which is probably the implementation of selection, (

tialize the storage for a record (x = 1; y = 2; z = 3) in 4

a single step, rather than a sequence of three individ- morphism and overloading. In the current appli-

ual allo cations and extensions as a naive interpretation cation, we use constraints to capture assumptions

might suggest. ab out the o ccurrences of lab els within rows.

The typ echecker gathers and simpli es the predi-

 A higher-order version of the Hindley-Milner type

cates generated by each use of an op erator on records or

system [9, 15, 6], originally intro duced in the study

variants. For example, if today is a value of type Date ,

of constructor classes [12]. Amongst other things,

then an expression like today :month will generate a sin-

this provides a simple way to intro duce the new

gle constraint, fjday : Int ; year : Int jgnmonth . Predi-

constructs for rows, records, and variants without

cates like this, involving rows whose structure is known

the need for sp ecial, ad-ho c syntax.

at compile-time, are easily discharged by calculating the

We split the presentation into sections: kinds (Sec-

appropriate o set value. Obviously, a compiler can use

tion 3.1), types and constructors (Section 3.2), predi-

this information to pro duce ecient co de by inlining

cates (Section 3.3), and typing rules (Section 3.4).

and sp ecializing the selector function, ( :month ).

Predicates that are not discharged within a section

of co de will, instead, b e re ected in the type assigned

3.1 Kinds

to it. For example, there is nothing in the following

de nition to indicate the full typ e of d : One of the most imp ortant asp ects of the work de-

scrib ed here is the use of a kind system to distinguish

newYear d = d :day = 1 ^ d :month = 1;

b etween di erent kinds of . Formally,

the set of kinds is sp eci ed by the following grammar:

so the inferred typ e will b e:

 ::=  the kind of al l types

(r nday ; r nmonth ) )

j row the kind of al l rows

Rec fjday : Int ; month : Int j r jg ! Bool :

j  !  function kinds

1 2

We would not exp ect this de nition to have b een ac-

Intuitively, the kind  !  represents constructors

1 2

cepted at all by a compiler for SML which requires the

that take something of kind  and return something of

1

set of lab els in a record to b e uniquely determined by

kind  . The row kind is new to the system presented

2

`program context'. But the meaning of this phrase is

here and was not part of the type system used in the

de ned only lo osely by an informal note in the de -

development of constructor classes.

nition of SML [16]. Now, with the ideas used in this

pap er, there is a way to make this precise: a de ni-

tion is only acceptable in SML if the inferred type do es

3.2 Types and constructors

not contain any predicates. For programs written with

For each kind , we have a collection of constructors

these restrictions, a language based on our type system

 

C (including variables ) of kind :

should o er the same levels of p erformance as SML.

It is p ossible that our more general treatment of

 

C ::= constants

record op erations could result in compiled programs



j variables

0 0

that are littered with unwanted o set parameters; ex-

 ! 

j C C applications

p erience with our prototype implementations will help



 ::= C types

to substantiate or dismiss these concerns. In any case,

there are simple steps that can b e taken to avoid such

The usual collection of types, represented here by the

problems. For example, a compiler might reject any

symb ol  , is just the constructors of kind . For the pur-

de nition with an inferred typ e containing predicates,

p oses of this pap er, we assume that the set of constant

unless an explicit typ e signature has b een given to sig-

constructors includes at least the following, writing ::

nal the programmer's acceptance. This is closely related

to indicate the kind  asso ciated with each constant :

to the monomorphism restriction in Haskell [20] and to

! ::  !  !  function space

prop osals for a value restriction in SML [29, 13].

fjjg :: row empty row

j jg ::  ! row ! row extension, for each l fjl :

3 Formal presentation

Rec :: row !  record construction

Var :: row !  variant construction

This section provides a formal presentation of our type

For example:

system, based on two particular ingredients:

 The theory of quali ed typ es [11], which provides  The result of applying the function space construc-

0

a general framework for describing restricted p oly- tor ! to two types  and  is the type of functions 5

0 0

from  to  , and is written as  !  in more con- example, is only valid if the row r do es not contain a

ventional notation. eld lab elled with l .

One way to achieve this is to use a more sophisticated

 The result of applying the Rec constant to the

kind system, with sets of lab els, L, as kinds instead of

empty row fjjg of kind row is the type Rec fjjg of

the single row kind. For example, rows with eld lab els

kind .

l ,::: l can b e represented by the kind L = fl ;:::; l g;

1 n 1 n

this is essentially the approach adopted by Ohori [18].

 The result of applying an extension constructor

Unfortunately, this b ecomes much more complicated if

fjl : j jg to a typ e  and a row r is a row, usually

we try to extend it to deal with extensible rows. In

written as fjl : j r jg, obtained by extending r with a

particular, we would need to assign whole families of

eld lab elled l of typ e  . Note that we include an

kinds, indexed by lab el sets, L, to some of the construc-

extension constructor for each di erent lab el l . To

tor constants intro duced in the previous section:

avoid problems later, we will also need to prohibit

partial application of extension constructors.

fjl : j jg ::  ! L ! (L [ fl g) l 62 L

Rec ; Var :: L ! 

The kind system is used to ensure that type expressions

are well-formed. While it is sometimes convenient to an-

The alternative that we adopt in this pap er is based

notate individual constructors with their kinds, there is

on the theory of quali ed types [11], using predicates to

no need in practice for a programmer to supply these

capture any side conditions that are required to ensure

annotations. Instead, they can b e calculated automati-

that a given type expression is valid. In fact, only a

cally using a simple kind inference pro cess [12].

single form of predicate is needed for this purp ose:

We consider two rows to b e equivalent if they include

the same elds, regardless of the order in which they are

row

 ::= C nl predicates

listed. This is describ ed formally by the equation:

Intuitively, the predicate r nl can b e read as an asser-

0 0 0 0

fjl : ; l : j r jg = fjl : ; l : j r jg;

tion that the row r do es not contain an l eld. More

precisely, we explain the meaning of predicates using

and extends in the obvious way to an equality on arbi-

the entailment relation de ned in Figure 1. A deriva-

trary constructors.

For the purp oses of later sections, we de ne a mem-

bership relation, (l :  )2 r , to describ e when a particular

0

eld (l :  ) app ears in a row r :

P `` r nl l 6= l

P [  g ``  P `` fjjgnl

0

P `` fjl :  j r jgnl

(l :  )2 r

0

(l 6= l )

(l :  )2 fjl :  j r jg

0 0

(l :  )2 fjl :  j r jg

Figure 1: Predicate entailment for rows.

and a restriction op eration, r l , that returns the row

obtained from r by deleting the eld lab elled l :

tion of P ``  from these rules can b e understo o d as

a pro of that, if all of the predicates in the set P hold,

fjl :  j r jg l = r

then so do es  . It is easy to prove that the relation ``

0 0

fjl :  j r jg l = fjl :  j r l jg:

is well-de ned with resp ect to equality of constructors.

It is easy to prove that these op erations are well-de ned

with resp ect to the equality on constructors, and to

3.4 Typing rules

con rm intuitions ab out their interpretation by showing

Following Damas and Milner [6], we distinguish b etween

that, if (l :  )2 r , then r = fjl :  j r l jg.

the simple types,  , describ ed ab ove, and type schemes,

 , describ ed by the grammar b elow:

3.3 Predicates

 ::=  j 8 : type schemes

0

The syntax for rows allows examples like fjl :  ; l :  jg

 ::=  j  )  quali ed types

where a single lab el app ears in more than one eld.

While this might b e useful in some applications, it is not

Restrictions on the instantiation of universal quanti-

appropriate for work with records or variants where we

ers, and hence on p olymorphism, are describ ed by en-

allow at most one eld with any given lab el. Clearly,

co ding the required constraints as a set of predicates,

some additional mechanisms are needed to enable us

P , in a quali ed type of the form P )  . The set of

to sp ecify that a typ e of the form Rec fjl :  j r jg, for

free type variables in a ob ject X is written as TV (X ). 6

The term language is just core-ML, an implicitly feature is the intro duction of inserters in Section 4.1 to

typ ed -calculus, extended with constants and a let account for non-trivial equalities b etween row expres-

construct, and describ ed by the following grammar: sions during uni cation.

E ::= x j c j EF j x :E j let x = E in F

4.1 Uni cation and insertion

We assume that the set of constants c includes the op-

Uni cation is a standard to ol in type inference, and is

erations and values required for manipulating records

used, for example, to ensure that the formal and actual

and variants as describ ed in Section 2, and that each

parameters of a function have the same type. Formally,

constant c is assigned a closed typ e scheme,  .

c

3 0 

a substitution S is a uni er of constructors C ; C 2 C

The typing rules are presented in Figure 2. A judge-

0 0

if SC = SC , and is a most general uni er of C and C

ment of the form P j A ` E :  represents an assertion

if every uni er of these two constructors can b e written

that, if the predicates in P hold, then the term E has

in the form RS , for some substitution R .

typ e  , using assumptions in A to provide types for free

The rules in Figure 3 provide an algorithm for cal-

variables. These are just the standard rules for quali ed

id

(id ) C  C

(const ) P j A ` c : 

c

9

[C = ]

(x :  ) 2 A

=

 C

(var )

(bind ) 62 TV (C )

P j A ` x : 

[C = ]

;

C 

0 0

P j A ` E :  !  P j A ` F : 

(!E )

0

U U

0 0

P j A ` EF : 

C  C UD  UD

(apply )

0

0

P j A ; x :  ` E : 

U U

x

0 0

CD  C D

(!I )

0

P j A ` x :E :  ! 

I

U

0 0

(l :  )2 r Ir  (Ir l )

P j A ` E :  )  P `` 

(row )

()E )

UI

0

fjl :  j r jg  r

P j A ` E : 

P [ f g j A ` E : 

()I )

Figure 3: Kind-preserving uni cation.

P j A ` E :  ) 

U

P j A ` E : 8 :

0

culating uni ers, writing C  C for the assertion that

(8E )

0 

P j A ` E :[ = ]

U is a uni er of the constructors C ; C 2 C . The

rst three rules are standard [25], and are even suit-

P j A ` E :  62 TV (A) [ TV (P )

able for unifying two row expressions that list exactly

(8I )

P j A ` E : 8 :

the same comp onents with exactly the same ordering in

each. But the fourth rule, (row ), is needed to deal with

P j A ` E :  Q j A ; x :  ` F : 

x

the more general problems of row uni cation.

(let )

P ; Q j A ` (let x = E in F ): 

To understand how this rule works, consider the task

0 0 0 0

of unifying two rows fjl :  j r jg and fjl :  j r jg, where l ,l

0

are distinct lab els, and r ,r are distinct row variables.

Figure 2: Typing rules.

Our goal is to nd a substitution S such that:

typ es [11], extending the rules of Damas and Milner [6]

fjl : S  j Sr jg = S fjl :  j r jg

to account for the use of predicates. Note that uses of

0 0 0

= S fjl :  j r jg

the symb ols  , , and  in these rules is particularly

0 0 0

= fjl : S  j Sr jg:

imp ortant in restricting their application to particular

classes of typ es or typ e schemes.

Clearly, the row on the left includes an (l : S  ) eld,

0 0

while the last row on the right includes an (l : S  )

eld. If these two types are to b e equal, then we must

4 Type Inference

3

For the purp oses of this pap er, we restrict our attention to kind-

This section provides a formal presentation of a type in-

preserving substitutions; that is, to substitutions that map variables



2 C to constructors of the corresp onding kind, .

ference algorithm for our system. The most imp ortant 7

choose the substitution S so that it will `insert' the miss-

0

ing elds into the two rows r and r , resp ectively. In

(x : 8 :P )  ) 2 A new

i i

0 0

W

this particular case, assuming that r ; r 62 TV ( ;  ),

(var )

W

[ = ]P j A ` x :[ = ]

i i i i

then we can choose:

0 0 00 00 0

S = [fjl :  j r jg=r ; fjl :  j r jg=r ]

W W

0 0

P j TA ` E :  Q j T TA ` F : 

00

U

where r is a new typ e variable.

0 0

T    ! new

W

More generally, we will say that a substitution S is

(!E )

W

0 0

row

U (T P [ Q ) j UT TA ` EF : U

an inserter of (l :  ) into r 2 C if (l : S  ) 2 Sr . S

is a most general inserter of (l :  ) into r if every such

inserter can b e written in the form RS , for some substi-

W

P j T (A ; x : ) ` E :  new

x

W

tution R . The rules in Figure 4 de ne an algorithm for

(!I )

W

I

P j TA ` x :E : T ! 

calculating inserters, writing (l :  )2 r for the assertion

row

that I is an inserter of (l :  ) into r 2 C . Note that

W

P j TA ` E :   = Gen (TA; P )  )

W

0 0 0

0

P j T (TA ; x :  ) ` F : 

[fjl :jr jg=r ]

x

0

W

(inVar ) (l :  ) 2 r ; r 62 TV ( ); r new

(let )

W

0 0 0

P j T TA ` (let x = E in F ): 

I

0

(l :  )2 r l 6= l

(inTail ) Figure 5: Typ e inference algorithm W.

I

0

(l :  )2 fjl :  j r jg

to calculate the generalization of a quali ed type  with

U

0

  

resp ect to a type assignment A. This is de ned as:

(inHead )

U

(l :  ) 2 fjl :  j r jg

Gen (A; ) = 8 :; where f g = TV ()nTV (A):

i i

The type inference algorithm is b oth sound and com-

Figure 4: Kind-preserving insertion. plete with resp ect to the original typing rules.

Theorem 2 The algorithm described by the rules in

the last rule here, (inHead ), makes use of the uni ca-

Figure 5 can be used to calculate a principal type for

tion algorithm in Figure 3, so the two algorithms are

a given term E under assumptions A. The algorithm

mutually recursive. The imp ortant prop erties of the

fails precisely when there is no typing for E under A.

two algorithms|b oth soundness and completeness|

are captured in the following result:

5 Compilation

Theorem 1 The uni cation (insertion) algorithm de-

ned by the rules in Figure 3 (Figure 4) calculates most-

Previously, we have describ ed informally how programs

general uni ers (inserters) whenever they exist. The al-

involving op erations on records and variants can b e

gorithm fails precisely when no uni er (inserter) exists.

compiled and executed using a language that adds extra

parameters to supply appropriate o sets. This section

shows how this pro cess can b e formalized, including the

4.2 A typ e inference algorithm

calculation of o set values.

Given the uni cation algorithm describ ed in the pre-

vious section, we can use the typ e inference algorithm

5.1 Compilation by translation

for quali ed typ es [11] as a typ e inference algorithm for

the typ e system presented in this pap er. For complete-

In the general treatment of quali ed types [11], pro-

ness, we include a de nition of the algorithm using the

grams are compiled by translating them into a language

rules in Figure 5. Following Remy [23], these rules can

that adds extra parameters to supply evidence for pred-

b e understo o d as an attribute grammar; in each typ-

icates app earing in the types of the values concerned.

W

ing judgement P j TA ` E :  , the type assignment

The whole pro cess can b e describ ed by extending the

A and the term E are inherited attributes, while the

typing rules to use judgements of the form:

predicate assignment P , typ e  , and substitution T are

0

W

synthesized. The (let ) rule uses an auxiliary function

P j A ` E ; E :  8

This tells us that we need to supply suitable evidence which include b oth the original source term E and a

0

e in the translation of any program whose type is qual- p ossible translation, E . A further change here is the

i ed by a predicate  . The second rule is analogous to switch from predicate sets to predicate assignments; the

function abstraction, and allows us to move constraints symb ol P used ab ove represents a set of pairs (v :  )

from the predicate assignment P into the inferred type: in which no variable v app ears twice. Each variable v

corresp onds to an extra parameter that will b e added

0

P [ fv :  g j A ` E ; E : 

during compilation; v can b e used whenever evidence

0

for the corresp onding predicate  is required in E .

0

P j A ` E ; v :E :  ) 

In the current setting, predicates are expressions of

These two rules are direct extensions of ()E ) and ()I )

the form (r nl ) whose evidence is the o set in r at which

in Figure 2, and combined with simple extensions of the

a eld lab elled l would b e inserted. The calculation of

other rules there, we can construct a translation for any

evidence is describ ed by the rules in Figure 6, which

term in the source calculus.

P [ fv :  g `` v : 

6 Extensions

(

The type system that we have describ ed in the previous

0

P `` e :(r nl )

e ; l < l

sections o ers a exible, but fairly conventional set of

m =

0

0

P `` m :(fjl :  j r jgnl )

e + 1; l < l

op erations on records and variants. In this section we

consider two extensions to the original system. Section

6.1 describ es a generalization of the record and variant

P `` 0 : (fjjgnl )

op erations to include, amongst others, rst-class exten-

sible case statements. Section 6.2 shows how the system

can b e mo di ed to allow lab els to b e treated as rst-

Figure 6: Predicate entailment for rows with evidence.

class values.

are direct extensions of the earlier rules for predicate

entailment that were given in Figure 1. Intuitively, a

6.1 Row p olymorphism

derivation of P `` e :  tells us that we can use e as

Working with a general notion of rows has provided us

evidence for the predicate  in any environment where

with an elegant way to deal with the common structure

the assumptions in P are valid. The second rule is the

in record and variant types. However, we have not seen

most interesting and tells us how to nd the p osition at

0

any comp elling examples in the previous sections where

which a lab el l should b e inserted in a row fjl :  j r jg:

it was essential to consider rows separately from records

0

 If l comes b efore l in the total ordering, <, on

and variants; we could have just de ned completely in-

lab els, then the required o set will b e the same as

dep endent sets of record and variant types.

the o set e of l in r .

However, there are some applications in which the

0 ability to separate rows from records and variants o ers

 But, if l comes b efore l , then we need to use an

0 signi cant b ene ts. To illustrate this, consider again

o set of e + 1 to account for the insertion of l .

the basic op erations that were discussed in Section 2.

In general, these rules calculate o sets that are either a

For example, if we generalize the rules in category the-

xed natural numb er, or a xed o set from one of the

ory or logic for decomp osing a sum to deal with n -ary

variables in P . For simplicity, we have assumed a b oxed

sums, then we obtain the following rule:

representation in which all record comp onents o ccupy

the same amount of storage. It is actually quite easy to

A ! C ::: A ! C

1 n

:

allow for varying comp onent sizes by replacing e + 1 in

A + ::: + A ! C

1 n

the calculation ab ove with e + size ( ).

For reasons of space, we omit the complete descrip-

In terms of records and variants, this rule provides a

tion of translation from this pap er, and restrict our-

metho d for decomp osing a variant|represented by the

selves instead to describing the two rules that account

sum A + ::: + A in the conclusion|using a record

1 n

for the use and intro duction of o set parameters. The

of functions|represented by the hypotheses A ! C .

i

rst of these is a variation on function application:

This suggests a general op eration for variant elimina-

tion:

0

P j A ` E ; E :  )  P `` e : 

0

plusElim :: 8 :8r :Rec (to r ) ! Var r ! :

P j A ` E ; E e :  9

The to  r construct used here is de ned as follows: kind system that distinguishes b etween empty and non-

empty rows. We leave further investigation of this topic

to  fjjg = fjjg

to future work.

0 0

to  fjl :  j r jg = fjl :  !  j to  r jg:

This b ehaves like a particular kind of map op eration on

6.2 First-class lab els

0

rows, replacing each comp onent typ e  in r with a type

0 In previous sections of the pap er, we have considered

of the form  !  .

the lab els used to refer to elds as part of the basic

For an example of where such an op eration may

syntax of our language. As a result, we had to describ e

prove useful, consider the typ e of integer lists that can

selection from a record using a family of functions:

b e obtained as a xp oint of the following functor [14]:

:l ) :: (r nl ) ) Rec fjl : j r jg ! ; (

data L l = L (Var fjnil : Rec fjjg;

: Rec fjtl : l ; hd : Int jgjg)

with one function for each choice of lab el l . A more

attractive approach might b e to allow primitive op era-

The sum of a list of can b e calculated using a

tions on records and variants to b e parameterized over

general catamorphism:

lab els. We can extend the type system describ ed in pre-

cata ((L v ):plusElim (nil = ():0;

vious sections to accomplish this, treating selection, for

cons = r :r :hd + r :tl ) v ):

example, as a single function of type:

From this example, it is clear that plusElim is an alter-

: ) :: (r nl ) ) Rec fjl : j r jg ! Label l ! ; (

native to the case construct of languages like Haskell

and SML. However, unlike these languages, it is not a

This requires an extension of the kind system in Sec-

builtin part of the syntax; instead, it allows us to treat

tion 3.1 with a new kind lab , and also a new type con-

case constructs as rst-class, extensible values.

stant Label of kind lab ! . Intuitively, each type of

In a similar way, we can adapt the other rules for

the form Label l contains a unique lab el value. The

constructing sums, and for decomp osing or constructing

l parameter is imp ortant b ecause it establishes a con-

pro ducts, to functions involving records and variants.

nection b etween types and lab el values; a nullary Label

The full set of op erations are sp eci ed by the following

type would not have provided any way for us to ex-

4

typ e signatures :

press the necessary typing constraints. We can also dis-

p ense with the family of extension constructors de ned

plusElim :: 8 :8r :Rec (to r ) ! Var r !

in Section 3.2, replacing them with a single constructor

plusIntro :: 8 :8r :Var (from r ) ! ! Var r

constant:

prodElim :: 8 :8r :Var (to r ) ! Rec r !

prodIntro :: 8 :8r :Rec (from r ) ! ! Rec r

fj : j jg :: lab !  ! row ! row :

Given our earlier representations for records and vari-

Finally, we need to generalize the lacks predicate from

ants, it is easy to implement each of these op erations as

Section 3.3 to a two place relation r nl which takes b oth

builtin primitives.

a row r and a lab el l of kind lab . This can b e de ned

One technical diculty that we face with this ap-

in much the same way as b efore, and has the same in-

proach is in extending the treatment of uni cation to

terpretation as an o set value in the underlying imple-

deal with uses of the from and to constructs. This turns

mentation.

out to b e straightforward, except for p otential complica-

We can generalize the other basic op erations on records

tions caused by the presence of empty rows. For exam-

and variants in a similar way. For example, the expres-

0 0

ple, in unifying two rows to t r and to t r , we cannot

sion x :y :(y = 2; x = 3), which involves two uses of

0 0

simply unify t with t ; if r , and hence r , is empty, then

extension, will b e assigned the type:

0

there is no direct relationship b etween t and t . More

precisely, to obtain most general uni ers for from and

fjx : Int jgny )

to constructs, we need to restrict ourselves to work with

Label x ! Label y ! Rec fjx : Int ; y : Int jg:

non-empty rows. There are several alternatives to con-

To our knowledge, this is the rst work|in either im-

sider here. For example, the obvious approach would

plicitly or explicitly typed record and variant calculi|

b e to banish the empty row from our original type sys-

to consider a type system in which lab els can b e treated

tem. A more satisfactory solution might b e to adopt a

as rst-class values. This increases the expressiveness

4

The from r construct used in the types of plusIntro and

of our system quite dramatically, and we already have

prodIntro is de ned as an obvious dual to the to r construct that

we used previously. some interesting applications for these new features. 10

However, it remains to see how well this will work in semantics. Journal of ,

practice, and what its implications for language design 4(2):127{206, April 1994.

might b e; we leave these topics to future work.

[3] L. Cardelli. A semantics of multiple inheritance. In

G. Kahn, D. MacQueen, and G. Plotkin, editors,

Semantics of Data Types, volume 173 of Lecture

7 Conclusion

Notes in , pages 51{67. Springer-

We have describ ed a exible, p olymorphic type sys-

Verlag, 1984. Full version in Information and Com-

tem for extensible records and variants with an e ec-

putation 76(2/3):138{164, 1988.

tive typ e inference algorithm and compilation metho d.

[4] L. Cardelli. Extensible records in a pure calculus of

Prototype implementations have b een written in SML

subtyping. Research rep ort 81, DEC Systems Re-

and Haskell, and a implementation of records has b een

search Center, Jan. 1992. Also in Carl A. Gunter

added to Hugs, an implementation of Haskell 1.3. Our

and John C. Mitchell, editors, Theoretical Aspects

exp erience to date shows that these implementations

of Object-Oriented Programming: Types, Seman-

works well in practice. There are a numb er of areas for

tics, and Language Design (MIT Press, 1994).

further work:

Ob ject systems. Another p otential application for

[5] L. Cardelli and J. Mitchell. Op erations on records.

rows would b e in a typ e system for ob ject-oriented pro-

Mathematical Structures in Computer Science, 1:3{

gramming. For example, a constructor Obj of kind

48, 1991. Also in Carl A. Gunter and John C.

row !  might b e used to describ e ob jects with a given

Mitchell, editors, Theoretical Aspects of Object-

row of metho ds [17, 22, 1, 10, 21, 2].

Oriented Programming: Types, Semantics, and

More sophisticated basic op erations. Our type sys-

Language Design (MIT Press, 1994); available as

tem do es not supp ort record app end [28, 24] or the

DEC Systems Research Center Research Rep ort

database join that was one of the original motivations

#48, August, 1989, and in the pro ceedings of

for Ohori's work [19]. One approach would b e to adopt

MFPS '89, Springer LNCS volume 442.

the ideas of Harp er and Pierce [7], choosing an appro-

[6] L. Damas and R. Milner. Principal type schemes

priate form of evidence for the (r #r ) predicates de-

1 2

for functional programs. In Proceedings of the 9th

scrib ed in Section 1.1.

ACM Symposium on Principles of Programming

A new approach to datatyp es. Languages like SML

Languages, pages 207{212, 1982.

and Haskell already provide facilities for de ning new

datatyp es and these overlap to some extent with the

[7] R. Harp er and B. Pierce. A record calculus based

mechanisms describ ed in this pap er. Unifying and com-

on symmetric concatenation. In Proceedings of

bining these di erent approaches would obviously b e

the 18th Annual ACM Symposium on Principles of

useful, with a long term goal of developing a general

Programming Languages, Orlando FL, pages 131{

framework for extensible datatyp es.

142. ACM, Jan. 1991. Extended version available

as Carnegie Mellon Technical Rep ort CMU-CS-90-

157.

Acknowledgements

[8] R. W. Harp er and B. C. Pierce. Extensible records

This work was supp orted in part by an EPSRC stu-

without subsumption. Technical Rep ort CMU-CS-

dentship 9530 6293. We would also like to thank our

90-102, School of Computer Science, Carnegie Mel-

colleagues, Colin Taylor and Graham Hutton, in the

lon University,Feburary 1990.

functional programming group at Nottingham, for the

valuable contributions that they have made to the work

[9] J. R. Hindley. The principal type scheme of an

describ ed in this pap er.

ob ject in combinatory logic. Transactions of the

American Mathematical Society, 146:29{60, De-

cemb er 1969.

References

[10] M. Hofmann and B. Pierce. A unifying type-

[1] M. Abadi and L. Cardelli. A semantics of ob ject

theoretic framework for ob jects. Journal of Func-

typ es. In In Proceedings of the 9th Symposium on

tional Programming, 1995. Previous versions ap-

Logic in Computer Science, pages 332{341, July

p eared in the Symp osium on Theoretical Asp ects

1994.

of Computer Science, 1994, (pages 251{262) and,

under the title \An Abstract View of Ob jects and

[2] K. B. Bruce. A paradigmatic ob ject-oriented pro-

Subtyping (Preliminary Rep ort)," as University of

gramming language: Design, static typing and 11

Edinburgh, LFCS technical rep ort ECS-LFCS-92- 1993, and as University of Edinburgh technical re-

226, 1992. p ort ECS-LFCS-92-225, under the title \Ob ject-

Oriented Programming Without Recursive Typ es".

[11] M. P. Jones. Quali ed Types Theory and Practice.

Distinguished Dissertations in Computer Science.

[22] D. Remy. Programming ob jects with ML-ART: An

Cambridge University Press, 1994.

extension to ML with abstract and record types. In

M. Hagiya and J. C. Mitchell, editors, Theoretical

[12] M. P. Jones. A system of constructor classes: over-

Aspects of Computer Software, volume 789 of Lec-

loading and implicit higher-order p olymorphism.

ture Notes in Computer Science, pages 321{346.

Journal of Functional Programming, 5(1):1{35,

Springer-Verlag, April 1994.

Jan. 1995. An earlier version app eared in Pro c.

FPCA 1993.

[23] D. Remy. Typ e inference for records in a nat-

ural extension of ML. In C. A. Gunter and

[13] X. Leroy. Polymorphism by name for references

J. C. Mitchell, editors, Theoretical Aspects of

and continuations. In Principles of Programming

Object-Oriented Programming: Type, Semantics,

Languages, pages 220{231. ACM press, 1993.

and Language Design, Foundations of Computing

Series. MIT Press, 1994. Early version app eared

[14] E. Meijer and G. Hutton. Bananas in space: Ex-

in Sixteenth Annual Symp osium on Principles of

tending fold and unfold to exp onential types. In

Programming Languages. Austin, Texas, January

Proc. 7th International Conference on Functional

1989.

Programming and Computer Architecture. ACM

Press, San Diego, California, June 1995.

[24] D. Remy. Typing record concatenation for free.

In C. A. Gunter and J. C. Mitchell, editors, The-

[15] R. Milner. A theory of typ e p olymorphism in pro-

oretical Aspects Of Object-Oriented Programming.

gramming. Journal of Computer and System Sci-

Types, Semantics and Language Design, Founda-

ences, 17:348{375, August 1978.

tions of Computing Series. MIT Press, 1994.

[16] R. Milner, M. Tofte, and R. Harp er. The de nition

[25] J. A. Robinson. A machine-oriented logic based

of Standard ML. The MIT Press, 1990.

on the resolution principle. Journal of the Associ-

[17] J. C. Mitchell, F. Honsell, and K. Fisher. A lamb da

ation for Computing and Machinery, 12(1):23{41,

calculus of ob jects and metho d sp ecialization. In

January 1965.

1993 IEEE Symposium on Logic in Computer Sci-

[26] M. Wand. Complete type inference for simple ob-

ence, June 1993.

jects. In Proceedings of the IEEE Symposium on

[18] A. Ohori. A p olymorphic record calculus and its

Logic in Computer Science, Ithaca, NY, June 1987.

compilation. ACM Transactions on Programming

[27] M. Wand. Corrigendum: Complete type inference

Languages and Systems, 17(6):844{895, Nov. 1995.

for simple ob jects. In Proceedings of the IEEE Sym-

Preliminary version in Pro ceedings of ACM Sym-

p osium on Principles of Programming Languages, posium on Logic in Computer Science, 1988.

1992, under the title, A compilation metho d for

[28] M. Wand. Typ e inference for record concatenation

ML-style p olymorphic record calculi.

and multiple inheritance. Information and Com-

[19] A. Ohori and P. Buneman. Typ e inference in a

putation, (93):1{15, 1991. Preliminary version ap-

database . In Proceddings

p eared in Pro c. 4th IEEE Symp osium on Logic in

of the ACM Conference on LISP and Functional

Computer Science, 1989, 92{97.

Programming, pages 174{183. ACM, 1988.

[29] A. Wright. Simple imp erative p olymorphism. Lisp

[20] J. Peterson and K. Hammond. Rep ort on the Pro-

and Symbolic Computation, 8(4):343{356, Decem-

gramming Language Haskell, A Non-strict, Purely

b er 1995.

Functional Language (Version 1.3). Technical Re-

p ort YALEU/DCS/RR-1106, Yale University, De-

partment of Computer Science, May 1996.

[21] B. C. Pierce and D. N. Turner. Simple type-

theoretic foundations for ob ject-oriented pro-

gramming. Journal of Functional Programming,

4(2):207{247, Apr. 1994. A preliminary version ap-

p eared in Principles of Programming Languages, 12