Henk a typed intermediate language

Simon Peyton Jones

University of Glasgow and Oregon Graduate Institute

Erik Meijer

University of Utrecht and Oregon Graduate Institute

May

Abstract accurate compiletime types and the compiler can check its

own activity if desired by checking the typecorrectness of

the intermediate program We elab orate each of these p oints

There is growing interest in the use of richlytyped interme

in Section

diate languages in sophisticated compilers for higherorder

typed source languages These intermediate languages are

This pap er describ es the design of a new typed intermediate

typically stratied involving terms types and kinds As

language Henk designed for compilers for purelyfunctiona l

the sophisticati on of the type system increases these three

languages It has the following distinctive features

levels b egin to lo ok more and more similar so an attractive

approach is to use a single syntax and a single data type in

Henk is based directly on the lambdacube an expres

the compiler to represent all three

sive family of typed lambda calculi We have found

a shortage of introductory material ab out the lambda

The theory of socalled pure type systems makes precisely

cub e so we present a tutorial in Section

such an identication This pap er describ es Henk a new

typed intermediate language based closely on a particular

Henk is a smal l language there are only seven con

the lambda cube On the way we give a

structors in the data type of expressions Even so it

tutorial introduction to the lambda cub e

is a real language in the sense that it is rich enough

to use as a compiler intermediate language

Because of its lambdacub e heritage Henk uses a sin

Overview

gle syntax for terms types and kinds Better still

compilers for Henk can use a single data type to repre

Many compilers can b e divided into three main stages The

sent all three levels This leads to considerable econ

front end translates the source language into an intermedi

omy in b oth the syntax of the language and the utility

ate language the midd le end transforms the intermediate

functions of the compiler itself

language into a more ecient form and the back end trans

Henk has an explicit concrete syntax Intermediate lan

lates the intermediate language into the target language

guages are typically expressed only as a data type in

In the past intermediate languages for source languages

a particular compiler Giving a concrete syntax is an

with rich type systems have usually b een untyped The

apparently trivial step but we b elieve it is an imp or

compiler rst typechecks the source program and then

tant one and it is one to which compilerwriters often

translates the program to the intermediate language dis

pay little attention A compiler front end can pro duce

carding all the type information After all the type check

Henk to b e consumed by the back end of a dierent

ing simply ensured that the program would not go wrong

compiler or to b e transformed by some external pro

at runtime and once that is checked there is no further use

gram b efore b eing fed back into the original compiler

for types

This pap er introduces no new technical results Rather our

Recently however there has b een increasing interest in typed

main contribution is to build a bridge b etween recently

intermediate languages Peyton Jones Peyton Jones

developed and the compiler research commu

et al Shao App el Tarditi et al

nity

There are several motivations for such an approach the

compiler may b e able to take advantage of type information

to generate b etter co de it may b e desirable to treat types as

We are the rst to suggest using the lambda cub e

values at runtime in which case it is necessary to maintain

as the basis for a typed intermediate language Why

b other We see two p ersuasive reasons

{ It dramatically reduces the number of data types

This pap er will b e presented at the June Work

and the volume of co de required in the com

shop on Types in Compilation

piler For example the Glasgow Haskell Compiler GHC has separate data types for terms types

and kinds and separate algorithms for parsing functions Gill Launchbury Peyton Jones

printing typing and transforming them Col Hu Iwasaki Takeichi Launch

lapsing the three levels gets rid of all this dupli bury Sheard

cation

Traditional static type checking is p erformed com

{ It is easier to accommo date new developments in

pletely at compile time In more sophisticated set

the type system of the source language b ecause

tings however it may b e useful to p ostp one some type

the lambda cub es type language is already as ex

checking until run time For example

pressive as its term language

{ Some reasonable programs cannot readily b e ex

We give a tutorial on the lambda cub e emphasising

pressed in a static type systems notably ones in

asp ects relevant to compiler builders Most of the lit

volving metaprogramming There are many pro

erature is relatively recent p ost and written

p osals for incorp orating a type Dynamic in an oth

from the p oint of view of theorists

erwise staticallytyped language but all involve

some runtime check that the type of a value

matches an exp ected type

Our initial fo cus is on nonstrict languages but we hop e to

{ Rather than statically sp ecialise a p olymorphic

make Henk neutral with resp ect to the strictnonstrict ques

function for the types at which it is called one

tion That is rather than having two variants of Henk one

can pass the type as an explicit argument so that

strict one nonstrict we hop e to have a single language

the function can b ehave appropriately Harp er

that treats b oth styles as rst class citizens and allows free

Morrisett

mixing of the two in a single program

{ Tagfree garbage collectors need some runtime

We assume familiarity with the in general

type information to guide them Tarditi

and with the secondorder lambda calculus also called F

Tolmach

in particular

All these applicatio ns require accurate type informa

tion to b e available at runtime and hence at compile

time to o

Background and motivation

Debugging a compiler can b e a nightmare The only

evidence of an incorrect transformation can b e a seg

Typedirected compilation

mentation fault in a large program compiled by the

compiler Identifying the cause of the crash and trac

ing it back to the faulty transformation is a long

It has b een recognized for some time that maintaining type

slow business If instead the compiler typechecks

information right through to co de generation and b eyond

the intermediatelanguag e program after every trans

can b e b enecial Sp ecicall y

formation the huge ma jority of transformation bugs

can b e nailed in short order It is quite dicult to write

Accurate type information can guide compiler analy

an incorrect transformation that is type correct Fur

ses and transformations

thermore the program cannot crash if it is type cor

For example

rect so even incorrect transformations can only give an

unexp ected result not a crash which is usually much

{ Strictness analysis over nonat domains has to

easier to trace

b e guided by type information Peyton Jones

We call the Henk typechecker Core Lint If the

Partain

compiler is correct Core Lint do es nothing useful and

{ A compiler may b e able to use more ecient rep

is switched o by default In our exp erience its ability

resentations of data values if it knows their types

to lo calise compiler bugs would by itself justify the use

For example an integer can b e represented by the

of a typed intermediate language

integer itself rather than a p ointer to a b ox con

taining the integer either in a strict language

Compilers that maintain and use type information through

or in a lazy one in contexts where the integer

out the compiler have come to b e called typedirected

is certainly evaluated The price to b e paid is

Standard ML of New Jersey has b een typedirected for some

that such unboxed integers cannot b e passed to

time Shao App el but its internal type system is

p olymorphic functions since their representation

monomorphic so it cannot track types through p olymorphic

diers from the simple p ointer that p olymorphic

functions Since the Glasgow Haskell Compiler GHC

co de typically exp ects Guided by type informa

has used a language based on the secondorder lambda calcu

tion one can sp ecialise the p olymorphic function

lus as its intermediate language Peyton Jones Pey

to the unboxed types at which it is used Pey

ton Jones et al

ton Jones Launchbury Tarditi et al

More recently the TIL compiler uses a considerably more so

{ Transformations that remove intermediate data phisticated and complicated intermediate language capa

structures often called deforestation or fusion ble of expressing intensional p olymorphism includin g recur

transformations rely heavily on types to guide sive functions at the level of types Morrisset Tarditi

the transformation In some cases the very cor et al TIL uses a stratied type system which sim

rectness of the transformation relies on para plies the pro of theory but it do es lead to considerable du

metricity a prop erty of welltyped p olymorphic plication in compiler transformations Tarditi We

sp eculate that the lambda cub e might provide a theoreti Sp ecicall y Henk go es b eyond the second order lambda cal

cally sound way to get the b est of b oth worlds culus in the following ways

Shao is also developing a typed intermediate language

It is elegantly parameterised Simply by selecting or

FLINT with similar goals to TILs Shao

discarding type rules one can force Henk to b e equiva

lent to the simply the second

order lambda calculus Girard its extension to

Implicit vs explicit typing

higherorder kinds F Girard or the calcu

lus of constructions C Co quand Huet

It is imp ortant to distinguish the typing requirements of a

In Section we show how we can also extract a pred

source language and an intermediate language

icative version of F

Almost all source languages are to some degree implicitly

As it happ ens Haskells type system allows type vari

typed There is a continuum b etween full type inference

ables to range over type constructors not merely

where no type information is given by the programmer

types so the generality of F is already required

and type checking where all type information is given

Henk puts this extension on a rm theoretical foun

The type system of a source language is usually a delicate

dation whereas it is a somewhat ad hoc extension of

compromise of expressiveness and type inference The more

GHCs earlier Core language

expressive the type system the more guidance in the form

Henk provides a full lambda calculus at the level of

of type signatures and the like must b e given to the type

types Provided recursion is disallowed a restric

checker

tion easily expressed in the type system evaluation

On the other hand it is highly implausibl e to have an

of types is strongly normalising ie guaranteed to ter

implicitl ytyped intermediate language b ecause it hard to

minate something that is really the dening prop erty

ensure that every transformation preserves the delicate

of a type

prop erty that types can b e inferred from the program source

This lambda calculus subsumes Haskells type syn

Indeed many transfomations do not eg desugaring a let

onyms which receive a rather ad hoc treatment in

expression to a lambda abstraction applied to the righthand

GHC and it also p ermits us to explore more ambi

side of the let denition An intermediate language can

tious paths such as those suggested by TIL

however b e explicitly typed When the front end trans

lates a source program into the intermediate language it

Despite this extra richness Henk is a very small lan

can decorate it with type information based on the results

guage The data type of expressions for example has

of type inference on the source program If the compiler

only seven constructors Better still the very same

for some reason wants to check the typecorrectness of an

syntax is used for expressions types and kinds This

intermediatelanguage program it should b e a matter of

economy is reected

type checking and not type inference In practical terms

{ in the type system by a single set of rules that

type checking do es not involve unication or other sophisti

say when a term is welltyped when a type is

cated algorithms

wellkinded and when a kind is well formed

Furthermore b ecause it is explicitly typed the intermedi

{ in the compiler by a single data type that repre

ate language can have a much more expressive type system

sents terms types and kinds and a single set of

than the source language For example the intermediate

utility functions to parse print typecheck and

language might p ermit arbitrary universal quantication in

so on

types whereas a source language like ML or Haskell restrict

1

universal quantication to the top level of a type

One might ob ject that using the same compiler data type

to represent terms and types would allow a buggy compiler

to construct illformed terms such as attributing the type

Towards the lambda cub e

to a variable With GHCs current structure such a thing

would b e identied as illtyped when compiling the compiler

While it is obvious enough in retrosp ect it was a break

b ecause something from the datatype representing terms

through when we realised in that the secondorder

cannot b e used in place of something from the datatype

lambda calculus was precisely what we needed to express

representing types

and maintain type information in the intermediate language

of a compiler Peyton Jones gives some examples A

GHCs current structure may prevent the compiler from ac

large b o dy of theory and its design choices could b e pressed

cidentally constructing some b ogus terms but it do es not

into immediate service

eliminate al l of them For example True is a legal value

in the datatype of terms but of course it is illformed In

With Henk we aim to take the same idea one step further

stead we rely on Core Lint Section to identify such

by appropriating another b o dy of theory the lambdacub e

bugs Folding together the three levels do es however p ost

and adopting its design choices to structure the language

p one the detection of certain actually rather unusual errors

1

from compilercompil atio ntime to compilerruntime

Haskells type classes actually give rise to intermediatelanguage

constants of rank p olymorphic type a nice example of the way in

which a language feature can make use of a type discipline which

would b e unworkable in its full generality

In our PTS language the same abstraction and application

E K Constant

forms serve for b oth types and terms so we can rewrite the

j x Variable

expression thus

j EE Application

j x E E Abstraction

x id x

j x E E Quantication

Notice that this decision forces us to attribute a type to the

variable The type of a type is called its kind and is

a kind constant usually pronounced type Thus

Figure Syntax of Pure Type System expressions

simply says that is of kind type that is is a type

variable The question that b egs to b e asked is this is the

only kind The answer is yes for F and no for F In

Pure Type Systems and the Lambda

F type variables can range over type constructors as well

Cub e

as over types indeed this is precisely what distinguishes it

from F For example in F we might write

So what should the new typed intermediate language lo ok

m x m Int Tree

like Fortunately Barendregt has done all the hard work

for us Barendregt Pure Type Systems PTS are

Here the rst lambda abstracts over m whose kind is

an elegant way of presenting explicitly typed lambdacalcul i

The second lambda abstracts over values of type m Int

in a uniform way and give us almost exactly what we want

The whole abstraction is applied to a Tree

However the literature on pure type systems is mostly writ

whose kind is presumably

ten from the p ersp ective of theorists and while much of it

is excellent it is not for the faint hearted In this section

In short even if the syntax had not forced us to attribute

we therefore b egin with a tutorial on Pure Type Systems

a type to the move from F to F would have done

using a very small expression language In Section we then

so Is the extra p ower of F required in a compiler inter

elab orate this calculus into a real language

mediate language Clearly this dep ends on the source lan

guage but the extra p ower is certainly required for Haskell

whose type system explicitly includes higherkinded type

2

variables most particularly to supp ort constructor classes

The familiar core

Jones

The syntax of PTS expressions is given in Figure The

So far this all seems attractive but there are at least three

rst four pro ductions should b e familiar constans vari

worries

ables applicatio ns and abstractions The language is ex

plicitly typed so that the variable b ound by a lambda ab

How should we interpret function arrow in the

straction is annotated with its type The fth pro duction

language of types As a constant Perhaps but it is

abstraction is a key idea of PTS and as we will see shortly

a very sp ecial one indeed b ecause it has an intimate

subsumes b oth function arrow and universal quantication

relationshi p with abstraction and application at the

term level

What is to play the role of universal quantication

Mixing terms and types

in the language of types It do es not take much

exp erimentation to convince oneself that is inappro

The unusual feature ab out the PTS world is that the type

priate for this purp ose

that decorates the b ound variable of a lambda abstraction

is simply another expression That is types have the very

Now that the types and values are mixed up together

same syntax as terms

how can we b e sure that the resulting expression still

makes sense That is can we give sensible type rules

This seems like an attractive idea After all like terms the

for the language

conventional syntax of types includes constants eg Int

type variables application s eg Tree Int abstractions

These questions are all elegantly resolved by the fth form of

in the form of type synonyms and in a p olymorphic sys

expressions given in Figure to which we turn our attention

tem a binding construct namely universal quantication

next

Furthermore the second order lambda calculus requires ab

stractions and applicatio ns of types to app ear in terms

In a PTS a single form of abstraction and application suf

Notation

ces at least from a syntactic p oint of view For example

here is an expression written in F

Before we do it will b e helpful to establish some terminol

ogy We have identied three levels so far terms types and

x id x

kinds An expression describ ed by the syntax of Figure

introduces a type abstraction that binds the type variable

can denote a term a type or a kind We call these three

which in turn is used as the type of x In the b o dy of

levels sorts so that we might say that an expression is of

the abstraction the p olymorphic identity function of type

2

is applied to and x We use square brackets to

This was an innovation in Haskell earlier versions of Haskell

did not have higherkinded type variables indicate type application

sort Term or of sort Type or of sort Kind More commonly For example consider the K combinator dened thus

though we simply say that an expression is a term or is a K x y y In F we would write the typing judgement

type or is a kind for K s b o dy like this

Unfortunately it is very hard to avoid using the word type

x y y

for multiple purp oses In particular note the dierence b e

3

tween sort Type and kind type For example b oth Int

In a PTS we would write the judgement like this

and Tree are of sort Type but only Int is of kind

x y y

type Int

Each wellformed term has b elongs to a type and each

Voila With one blow deals with two of the three worries

wellformed type has a kind A type system sp ecies pre

at the end of the Section However it do es so at the price

cisely which expressions are wellformed and which are not

of making the third worry even more worrying For example

In general a PTS may have more than three levels or even

what is to stop us from writing types like this one

an innite number but in this pap er we study a particular

family of PTSs called the lambda cube The type system of

x Intif x then Int else Bool

the lambda cub e ensures that no more than three levels are

This is the type of functions from values of type Int to

required apart from a solitary constant 2 at the fourth

a result of type Int if the argument is greater than or

level as we shall see

Bool otherwise Type checking may now require arbitrary

When writing example programs we will use typewriter font

computations at the term level

for term variables eg x and all constants eg Int

The PTS framework allows us to answer the question in one

and Greek font for type variables eg When writing

program schemes for example in type rules we will use of two ways

x y z to range over variables of all sorts and A B C a b c

to range over expressions of all sorts Generally A will b e

either we can arrange for strange types like the one

of a sort one higher than a thus we might say that the

ab ove to b e illformed

term a has type A Finally we use s t to range over the

constants and 2

or we can decide that we like the expressiveness that

it gives and p ermit it

Quantication

The latter choice is equivalent to adopting the calculus of

constructions Co quand Huet

The fth pro duction in the syntax of PTS expressions Fig

The PTS framework allows terms and types to mix We

ure introduces the dependent product There are

ensure that we can only construct expressions that make

many essentiallyequivalent ways of interpreting the expres

sense by means of a type system which is what we discuss

sion x AB but for our present purp oses the most useful

next

one is this

x AB is the type of functions from values of

The lambda cub e type system

type A to values of type B in which the result

type B may p erhaps dep end on the value of the

We write typing judgements in the conventional way The

argument x

judgement

E A

From this denition it is immediately clear that subsumes

is read in environment the expression E has type A

the function arrow

The environment gives types for the free variables of the

expression E So for example we could correctly state

A B is an abbreviation for AB

fInt Int Int Intg

where we use the underscore symbol to denote an anony

x Int x x Int Int

mous variable We could instead say x AB where x

do es not o ccur free in B but AB is briefer

The type rules for the lambda cub e are given in Figure and

What is not so obvious is that also subsumes universal

The VAR rule should b e quite familiar it simply says that

quantication Consider the type A where A is

if the environment attributes the type A to x then we can

a type This type is the type of functions from values of

conclude that x A The premise checks that the type A

kind that is types to values of type A that is terms

of x is itself well formed The context is a sequence not

where the type A may mention But that is precisely the

a set with inner bindings to the right of outer ones The

interpretation we would give to A In short

weakening rule WEAK allows us to throw away irrelevant

bindings but checking that they are each well formed it

is usually applied as often as necessary just b efore the VAR

rule

A is an abbreviation for A

3

Remember A ! B is just an abbreviation for _ AB

STAR ;

2 ;

2 ;

2 ;

A s

2 ; 2

VAR

2 ; 2

x A x A

; 2

; 2

b B A s

WEAK

Figure The ; judgement

x A b B

Rule APP exp ects f to have a type The next rule in

f x AB a A

APP

Figure LAM shows how types are introduced in the

f a B x a

rst place As b efore it is helpful to compare it with the

rules for F The F rules for value and type abstractions

are

x A b B x AB t

LAM

x Ab x AB

x A b B

V LAM

F 2

A s x A B t s ; t

x Ab A B

PI

x AB t

b B

T LAM

F 2

a A B s A B

b B

CONV

a B

The rst of these states that if assuming x has type A we

Figure Type rules for the Lambda Cub e

can prove that the b o dy b of the abstraction has type B

then the abstraction x Ab has type A B Compare

this with rule LAM in Figure the rst premise and the

The second rule STAR is also easy It states that the con

conclusion match the F rules directly The second premise

stant has sup erkind 2 This is where the hierarchy stops

is more interesting Its mission is to check that it is legiti

in the lambda cub e There is no typing rule for 2 and hence

mate to abstract a variable x A from an expression of type

it cannot app ear explicitl y in a program

B It do es this simply by requiring that the type in the

conclusion is a legitimate type that is that it itself has a

Things b ecome more interesting when we meet the rule for

type For example the type abstraction E is p ermitted

application s APP In ordinary lambda calculus one usually

i its type is p ermitted where E

sees a rule like this

Of course this just b egs the question When precisely is a

type a valid type To answer that lo ok at the PI rule It

f A B a A

sp ecies how to nd the type of the expression x AB

AP P

F 2

f a B

The rst premise checks that A is well formed and nds

its type s The second do es the same for B remembering

to bring x into scop e rst The third premise uses a new

Remembering that A B is an abbreviation for AB

judgement form ; which is also dened by the four axioms

it is easy to see the ordinary rule can b e obtained by

in Figure Since these axioms only involve constants the

sp ecialisi ng rule APP with x The substitution of a for

; judgement do es no more than generate four copies of rule

x in B do es nothing b ecause in this sp ecial case x cannot

PI each with its own values for s and t In fact the lambda

o ccur in B that is what the meant

cub e describ es a family of eight type systems each dened

The exciting thing is that the same rule APP also deals

by selecting a subset of the axioms for ; as we now discuss

correctly with type application s In F we have this rule

Let us rst sp ecialise rule PI with the axiom ; Then

for type applications often called SPEC since it sp ecialises

s and t What do es this mean It means that A and

a p olymorphic type

B must b oth b e of kind that is they must b oth b e types

and hence moving back to rule LAM we can abstract a

term variable over a term expression Indeed we can read

f B

; as terms can dep end on terms Furthermore if

SPEC

F 2

f A B A

; is the only axiom for ; then these value abstractions

are the only sort of abstractions we are allowed so we have

the simplytyped lambda calculus

A few moments thought remembering that B is an ab

breviation for B should convince you that APP in

What is needed to allow type abstractions which we need

deed subsumes SPEC This time the substitution of a

F 2

for the secondorder lambda calculus F Lo oking at rule

for x in B in APP is vital just as we must substitute A for

LAM we will need x to b e a type variable so A must b e a

in B in SPEC

F 2

LAM STAR

conv

x AB C B x C 2

plus the usual rules for symmetry reexivity and tran

x A

sitivity

VAR

x A

Figure Conversion rules for expressions

0 0

f x AB a A A A

APP

f a B x a

kind so s must b e 2 However the b o dy of a type abstrac

tion is a term so B is a type and hence t is Hence to get

F we need the axiom 2 ; which we can read as terms

x A b B x AB t

4

LAM

can dep end on types

x Ab x AB

To get F we need not only that A is a kind but also that

it can b e a higher kind such as If A the

A s x A B t s ; t

PI

rst premise of PI requires that s which in

x AB t

turn requires us to give the typing judgement for that

is for So we have to reuse rule PI It is an almost

immediate consequence that we require the axiom 2 ; 2

w h

a A A B

in order to conclude 2 Hence F requires the

RED

a B

axiom 2 ; 2 which we can read as types can dep end on

types

Figure Syntax Directed rules for the Lambda Cub e

What would happ en if we added the nal axiom ; 2

That would allow types can dep end on terms taking us

into the Calculus of Constructions Co quand Huet

Syntax directed rules

This is swampy territory for compilers so we stay well clear

and avoid ; 2

The inference rules given in Figure are not directly suitable

for use in a Core Lint type checker as the rules are not syntax

To summarise we have the following equivalences

directed In particular you cannot decide at which p oint

System Corresp onding subset of ;

in a derivation the rule CONV must b e applied simply by

Simply typed calculus f ; g lo oking at the structure of the term under consideration

F f ; 2 ; g

What we seek is a syntaxdirected presentation of the rules

f ; 2 ; 2 ; 2g F

that is b oth sound and complete with resp ect to the old set

f ; 2 ; 2 ; Calculus of constructions

that is the new presentation should type exactly the same

2 ; 2g

terms in exactly the same way The generality of PTSs

makes this task quite tricky but Pollack van Benthem Jut

There are eight systems given by selecting ; and any

ting McKinna have done much of the hard work

subset of the remaining three axioms which is what gives

for us Figure gives a simplied version of their syntax

rise to the term lambda cub e All eight systems make

f

directed system The main dierence compared with the

sense but the ones just identied are the interesting ones

rules of Figure is the strategic distributio n of reduction

w h

over the other rules The notation A B means

that A reduces to the weak head normal form B The new

Completing the type system

judgement form a B means that a A and

w h

A B as rule RE D states

The nal rule in Figure is CONV which states that if

we can deduce a type A for an expression a and B is

We have also taken the opp ortunity to introduce the so

equivalent to A then A is a valid type for a Why is this

called valid context optimization If we assume that the

rule necessary First notice that rule AP P might substitute

initial context is wellformed a notion it is easy to dene

an arbitrary term into the type of an expression so the type

formally then it will remain wellformed b ecause the only

of an expression is not necessarily in normal form it might

rules that extend it LAM and PI do so with a binding

b e an application for example Second notice that the

whose type is wellformed There is therefore no need to

rst premise of rule AP P requires the type of f to b e a

check for this wellformedness in rules VAR and WEAK

expression But supp ose the type of f is an expression that

Furthermore adopt the Barendregt convention that all vari

evaluates to a expression such as x xy AB

able names are distinct As a result we can regard as an

Rule CONV simply allows the necessary reduction to take

unordered bag rather than a sequence and this means we

place The relation is dened in Figure

can elininate WEAK altogether in favour of the premise

x A in VAR All of this is well known Pollack

4

There is a slight awkwardness here b ecause you have to read

van Benthem Jutting McKinna Section

the notation backwards p ; q means q can dep end on p Also

confusingly we read terms for and types for 2 b ecause

the ; judgement is two levels away from the original thing

This new presentation has the practical advantage in a com

a A means a has type A

type

piler of allowing us to separate the typender from the eval

uator in the compiler since no longer mentions substi

type

tution Instead we can extract the type of an expression

STAR

type

2

type

and only then evaluate it

A curious feature of this way of doing things is that the

VAR

type

type of an expression may not b e well formed Consider

x A A

type

the expression id Int Rule APP would say that it

type

has type Int This type evaluates to the

f F

type

wellformed Int Int as exp ected but it is not itself well

APP

type

formed Why Because has type rather

f a F a

type

than a type We are a bit suspicious of this intermediate

illformedness but its advantages are p ersuasive

b B

type

LAM

type

Note that the rules VAR and APP rules of gure can

x Ab x AB

type

easily b e mo died to incorp orate the changes introduced in

this section

B s

type

PI

type

x AB s

type

A predicative variant

0

A B B B

type 

CONV

type

One disadvantage of the system we have describ ed so far is

0

A B

type

that it is impredicative In an impredicative system type

variables can range over universallyquantied types For

Figure The typeof judgement

type

example supp ose that f Int Then the follow

ing type applicati on is legitimate

f

Factoring the typing judgement `

Here f is instantiated at the p olytype There

is nothing wrong with this in the sense that welltyped

The typing judgement actually do es two things

programs cant go wrong but the ability for type variable

to range over p olytypes greatly complicates the business of

It checks that an expression is well formed

providing a model for the calculus Mitchell Chapter

It nds its type

Inside a compiler one would hop e that the program was

Dening the predicative variant

always wellformed after an initial typecheck that is but

the compiler might quite frequently want to nd the type of

Fortunately it is fairly easy to pro duce a predicative variant

an expression

of our calculus Instead of just the kind we need two

The typeof judgement in Figure nds the type of

type

constants

a wellformed expression without using an environment To

is the kind of monotypes

achieve this we annotate each bound occurrence of a variable

is the kind of p olytypes

with its type This latter prop erty is very useful in practice

b ecause it means that the compiler can contain a simple

Corresp onding to these two kinds are two sup erkinds 2 and

function corresp onding to that maps a expression to

type

22 with 2 and 22 The latter requires a new rule

its type without needing to plumb around an environment

STAR given in Figure

But is it not rather exp ensive to annotate every variable

To make the system predicative requires that we make more

o ccurrence Not necessarily If the compiler maintains the

distinctions ab out what can dep end on what This is what

exression as a graph it can use a single data structure to

the new rule PI in Figure do es It makes use of a three

represent xInt say and simply p oint to that data structure

place judgement s ; t u also dened in the same Figure

from the binding site and each o ccurrence site

The rst four rules of the new ; judgement sp ecify which

The APP rule in Figure also uses a neat trick due

type

types are p olytypes and which are monotypes For example

to Kamareddine and Nederp elt Kamareddine Nederp elt

assuming that Int we can deduce that

Rather than actually p erform the substitution in

the rule as we did in APP Figure we simply apply the

type of the function F to the argument a As b efore

Int

CONV allows us to evaluate application s when neces

type

Int Int

sary but with the additional reduction rule

Int

x AB a B x a



Int Int

The rules VAR LAM APP and PI

STAR

A set S of constants f 2 22g in our case

22

A set A of typing axioms relating these constants

f 2 22g in our case

A s x A B t s ; t u

PI

x AB u

A set R of ; rules ranging over the constants S q

A PTS is called functional if

s t A and s t A t t

1 2 1 2

;

;

s t r R and s t r R r r

1 2 1 2

;

;

In practice almost all practically useful PTSs and certainly

the ones in this pap er are functional Many useful theo

rems such as the substitution lemma sub ject reduction

;

have b een proved for arbitrary PTSs and more are true of ;

functional PTSs such as uniqueness of types Barendregt

Section Functional PTSs seem to combine these

;

desirable theorems with a remarkable degree of exibili ty

;

for example we developed the predicative variant of this sec

tion well after the rst draft of this pap er was completed

2 ;

2 ;

A Real Language

2 ;

2 ;

We now elab orate the small language of the previous section

into a full language complete with a concrete syntax The

2 ; 2

5

2 ; 2 2

full language is given in Figure Compared to the previous

section we add the following features

Figure Mo died rules for the predicative variant

A program consists of

As the last example shows a value of p olymorphic type

{ A set of mutually recursive data declarations

can still b e passed as an argument to a function but the

each of which introduces a new data type Sec

resulting function type is of kind Notice that in the third

tion

rule t and u dier that is why we now need a threeplace

{ A sequence of value declarations each of which

judgement

may b e recursive or nonrecursive Section

The next two rules in ; 2 ; and 2 ; say that it

Expressions are augmented with let letrec and

is legitimate to abstract a monotyped variable from either a

case Section

monotype or a p olytype giving a p olytype in either case

A sp ecial anonymous variable is provided Sec

Finally 2 ; 2 says that it is OK to create lambda ab

tion

stractions at the type level provided we only abstract a

monotyped variable from a monotype One could imagine

A variety of abbreviations are provided Section

also having 2 ; 22 allowing us to abstract a monotyped

as syntactic sugar These abbreviations are marked

variable from a p olytype but there do esnt seem to b e a

with y in the left column of Figure Their purp ose

pressing reason to add it and life is simpler without

is to reduce the number of characters it takes to print

out a program and make it more comprehensible to

The imp ortant thing is that there is no axiom of the form

the human reader

22 ; corresp onding to the fact that we cannot abstract

a p olytyped variable from anything

There is only one name space Haskell programs that use

the same name for a data type and a data constructor will

need to b e renamed b efore b eing expressed in Henk

Pure Type Systems

5

Actually there is still one pro duction to come for primitive

op erations

The rules given in Figure take us outside the lambda cub e

Fortunately even this generalisation has b een well studied

The generalised PI rule together with the original rules

VAR LAM APP dene a Pure Type System A PTS is

dened by

Program pr og r am tdecl tdecl v decl v decl n m

1 n 1 m

Type declaration tdecl data tv ar tv ar tv ar n

1 n

Value declaration v decl let bind Nonrecursive

j letrec bind bind Recursive n

1 n

Binding bind tv ar expr

Expression expr bexpr

j tv ar tv ar expr n

1 n

j tv ar tv ar expr n

1 n

y j tv ar tv ar expr n

1 n

y j tv ar tv ar expr n

1 n

y j bexpr expr

j v decl in expr Lo cal declaration

j case expr of al t al t n

1 n

at aexpr aexpr m

1 m

y j case expr of al t al t

1 n

bexpr bexpr aexpr Application

j aexpr

aexpr tv ar Variable

j l iter al Literal

j Constant

j BOX Constant

j expr

Typed variable tv ar v ar aexpr

y j v ar

Variable v ar binding sites only

j identif ier

Case alternative al t pat expr

y j pat tv ar tv ar expr

1 n

Case patterns pat tv ar

j l iter al

y indicates syntactic sugar

Figure Concrete syntax

Type declarations Case expressions

An imp ortant design choice is that we introduce new types A case expression takes apart values built with constructors

with explicit declarations outside the syntax of expressions Here is an example

A data declaration denes a new algebraic data type It

case reverse xs of

introduces a new type constructor plus a number of data

Cons y ys rhs

constructors into the environment For example the type of

Nil rhs

generalized trees with values of type a is dened as

data Tree at Int

Branch m a

A case expression scrutinises an expression called the scru

a m Tree m a Tree m a

tinee The scrutinee is reverse xs in this example It eval

uates the expression to head normal form and matches it

This declaration introduces the type constructor Tree and against the alternatives The pattern in a case alternative

the data constructor Branch each with the sp ecied type must b e a literal a constructor name or All the patterns

in a case expression must have the same type The list of

Like ML and unlike Haskell type constructors and data

types in the at clause gives the types at which the construc

constructors are not required to start with an upp ercase

tors are instantiated it is empty if the patterns are literals

letter

In the example xs is presumably a list of Int so Cons and

Nil are instantiated at Int

The main design alternative would b e to provide primitive

type constructors for unit sum pro duct lifting and recur

When an alternative is selected its righthandside is ap

sion and then use ordinary value declarations to introduce

plied to the values of the arguments of the constructor

new types For example we might introduce lists thus

Thus if reverse xs evaluates to Cons Int e e the

result of the case expression will b e

List a rec l Lift Unit a l

y ys rhs e e

Whilst this is nice from a theoretical standp oint it has sev

eral disadvantages

If is used as a pattern it is selected only if all the others

fail to match regardless of order

It do es not generalise easily to handle mutual recursion

This form of case is a little unusual More typically the

and nonuniform data types the former is very com

patterns can also bind variables thus

mon and the latter legal in b oth ML and Haskell

case reverse as of

It is hard to know when to unwind the recursion for

Cons y ys rhs

example when testing types for equality

Nil rhs

It is harder to prove strong normalisation for types

We provide the conventional form as syntactic sugar Sec

We have opted to b e conservative and exploit the type struc

tion but the core form reduces the number of expres

ture we know we have The extra generality of arbitrary

sions that bind new variables This in turn reduces clutter

pro ducts and sums do es not seem worth the complexity

in the compiler without losing expressiveness

Value Declarations

Syntactic sugar

A value binding binds a variable to a value This can ei

The following syntactic sugar greatly reduces the size of the

ther b e a term value eg xInt or a type value

printed form of a program apart from the rst two which

eg Diag aPair a a Value bindings can b e

simply give more conventional equivalent forms

group ed into recursive or nonrecursive declarations These

value declarations app ear b oth at the top level of a program

and as lo cal declarations inside expressions

means the same as

means the same as

The anonymous variable _

x x e means the same as x x e

1 n 1 n

e e means the same as e e

1 2 1 2

The anonymous variable can b e used at the binding site

of a variable that is not mentioned in its scop e It is useful

A binding o ccurrence of an unannotated variable v

in the concrete syntax to reduce the creation of new names

means the same as v This allows us to omit the

More imp ortantly it is useful inside the compiler b ecause it

annotation for most type variables

allows the evaluator to eliminate a substitution step when

applying an abstraction or whose b ound variable is A bound o ccurrence of an unannotated variable v

unused Such abstractions are very common indeed every means the same as v t where t is the annotation

function arrow turns into one at its binding site It may b e necessary to p erform

conversion for this to have the exp ected meaning We should provide an op erational semantics for Henk

This abbreviation allows us to omit annotations on all

b ound o ccurrences

Subsequently we plan to move towards an implementation

in GHC More ambitiously we hop e that by giving Henk a

The at clause on a case expression can b e omitted

clear denition and a concrete syntax we may b e able to

b ecause it can readily b e reinferred

work more closely with other compilerwriters GHC has

The case alternative c y y e means the same as always b een intended as a substrate for others research

1 n

but it is such a daunting monster that it requires a certain

c y y e The variables y y need not

1 n 1 n

b e annotated with their types if they are not their determination to delve into its innards We hop e that Henk

may provide a route from GHC into other back ends or

types are recovered from the type of the constructor

and the instantiating types in the at clause analysis to ols and vice versa

Type rules

Acknowledgements

Figure extend our typechecking rules to deal with the ex

Thanks for Ko en Claessen John Launchbury Randy Pol

tended language

lack and Mark Shields for comments on earlier drafts We

acknowledge gratefully the supp ort of the Oregon Graduate

The rules for declarations return an environment as the

Institute during our sabbaticals funded by a contract with

type of a declaration To avoid clutter the rules for

US Air Force Material Command FC

letrec and case mention only a single binding or alter

native resp ectively

The main interesting p oint is in the LET and CASE rules

References

Their form is very like the APP rule in Figure in that

the type derived is of the same form as the expression b eing

typed We rely on the conversion rules which we dont have

H Barendregt Lambda calculi with types in Hand

space for here to convert the resulting type to the required

b o ok of Logic in Computer Science Volume I I S

form where necessary ie in APP and CASE

Abramsky DM Gabbay TSE Maibaum eds Ox

Rule LET raises an interesting question Clearly we need ford University Press

check that it is legal to abstract the b ound variables over

the b o dy ; A But if the binding is recursive

TH Co quand G Huet The calculus of construc

say letrec x a x a do we need to check that tions Information and Computation

1 1 2 2

it is legal to abstract the b ound variables over each of the

a We b elieve that the answer is no To see why consider

A Gill J Launchbury SL Peyton Jones June A

i

the expression let x A a let y B b c Here we

short cut to deforestation in Pro c Functional Pro

clearly do not check that we can abstract x A over b It is

gramming Languages and Computer Architecture

not clear to us what the correct answer is here but we plan

Cop enhagen ACM

to nd out

JY Girard Interpretationfo ctionelle et elimination

des coupures

dans larithmetique dordre superieur PhD the

Conclusions and further work

sis UniversiteParis VI I

We b elieve that the lambda cub e provides a solid foundation

R Harp er G Morrisett Jan Compiling p olymor

for the intermediate langauge of sophisticated compilers It phism using intensional type analysis in nd

is a subtle system and extending it to a real language raises

ACM Symp osium on Principles of Programming

new technical issues as we have just seen Probably some Languages San Francisco ACM

of these problems are old hat to the theorists which is why

the direct link to a wellstudied system is so valuable

RW Harp er JC Mitchell E Moggi Jan Higher

order mo dules and the phase distinction in th

There is still much groundwork left to do

ACM Symp osium on Principles of Programming

Languages San Francisco ACM

We have gaily extended the PTS framework with re

Z Hu H Iwasaki M Takeichi May Deriving struc

cursive data types let letrec case and constants

tural hylomorphisms from recursive denitions in

but it is necessary to prove that all the standard PTS

Pro c International Conference on Functional Pro

theorems still go through Indeed we noted some un

gramming Philadelphi a ACM

certainty ab out the exact typing rule for letrec in the

previous section

MP Jones Jan A system of constructor classes over

Section denes a predicative variant of Henk with

loading and implicit higherorder p olymorphism

the intention that it has a more tractable mo del but

Journal of Functional Programming

we have yet to exhibit such a mo del

S

t k n

i j k

1in

S S

v k m

i j k k

1in 1j k 1

PROG

t t v v

1 n 1 n

a A

VDECL

x A let x A a fx Ag

x A a A

VDECL

x A letrec x A a fx Ag

A s x A B t

DATA

x A y B data x A y B fx A y B g

v decl a A ; A

LET

v decl in a v decl in A

a A b X B c T X A

CASE

case a of c b at T case a of c B at T

w

Figure Extra rules for

R Pollack L van Benthem Jutting J McKinna May F Kamareddine R Nederp elt March Canonical

Typechecking in pure type systems in Types for typing and Piconversion in the Barendregt Cub e

Pro ofs and Programs Nijmegen May Selected Journal of Functional Programming

Papers Springer Verlag LNCS

J Launchbury T Sheard June Warm fusion

Z Shao The FLINT compiler system Presentation in Pro c Functional Programming Languages and

at IFIP Working Group Computer Architecture La Jolla ACM

Z Shao AW App el June A typebased compiler JC Mitchell Foundations for programming languages

for Standard ML in SIGPLAN Symp osium on MIT Press

Programming Language Design and Implementa

tion PLDI La Jolla ACM G Morrisset Dec Compiling with types PhD thesis

CMUCS Carnegie Mellon University

D Tarditi Optimizing ML PhD thesis Carnegie

Mellon University SL Peyton Jones Jan Compilation by transfor

mation a rep ort from the trenches in Euro

D Tarditi G Morrisett P Cheng C Stone R Harp er P p ean Symp osium on Programming ESOP

Lee May TIL A TypeDirected Optimiz LinkopingSweden Springer Verlag LNCS

ing Compiler for ML in SIGPLAN Symp osium

on Programming Language Design and Implemen SL Peyton Jones CV Hall K Hammond WD Partain PL

tation PLDI Philadelp hi a ACM

Wadler March The Glasgow Haskell com

piler a technical overview in Pro ceedings of Joint

A Tolmach June Tagfree garbage collection using Framework for Information Technology Technical

explicit type parameters in ACM Symp osium on Conference Keele DTISERC

Lisp and Functional Programming Orlando ACM

SL Peyton Jones J Launchbury Sept Unboxed

values as rst class citizens in Functional Pro

gramming Languages and Computer Architecture

Boston Hughes ed LNCS Springer Verlag

SL Peyton Jones WD Partain Measuring the ef

fectiveness of a simple strictness analyser in Func

tional Programming Glasgow K Hammond

JT ODonnell eds Workshops in Computing

Springer Verlag