
Mads Tofte
March

Laboratory for Foundations of Computer Science
Department of Computer Science
Edinburgh University

Four Lectures on Standard ML

The following notes give an overview of Standard ML with emphasis placed on the Modules part of the language.

The notes are, to the best of my knowledge, faithful to "The Definition of Standard ML, Version …" as regards syntax and terminology. They have been written so as to be independent of any particular implementation. The exercises in the first three lectures can be tackled without the use of a machine, although having access to an implementation will no doubt be beneficial. The project in Lecture 4 presupposes access to an implementation of the full language, including modules. (At present, the Edinburgh compiler does not fall into this category; the author used the New Jersey Standard ML compiler.)

Lecture 1 gives an introduction to ML aimed at the reader who is familiar with some programming language but does not know ML. Both the Core Language and the Modules are covered by way of example.

Lecture 2 discusses the use of ML modules in the development of large programs. A useful methodology for programming with functors, signatures and structures is presented.

Lecture 3 gives a fairly detailed account of the static semantics of ML modules, for those who really want to understand the crucial notions of sharing and signature matching.

Lecture 4 presents a one-day project intended to give the student an opportunity of modifying a non-trivial piece of software using functors, signatures and structures.

R. Harper, R. Milner and M. Tofte. The Definition of Standard ML, Version …. ECS-LFCS-…, Laboratory for Foundations of Computer Science, Dept. of Computer Science.

ML at a Glance

Suppose we were to draw a map of the landscape of programming languages. Where would ML fit in? COBOL and ML could safely be put down far apart. The input/output facilities in COBOL operate on specific kinds of input/output devices, for instance allowing the programmer to declare index sequential files; ML just has the notion of streams, a stream being a sequence of characters (much like streams in UNIX or text files in PASCAL). On the other hand, ML is extremely concise compared to the verbose COBOL, and ML is much better suited for structuring data and algorithms than COBOL is.

ML is more closely related to PASCAL. Like PASCAL, ML has data types, and there is a type checker which checks the validity of programs before they are run. Both PASCAL and ML follow the tradition of ALGOL in that variables can have local scope which is determined statically from the source program.

However, PASCAL and ML are radically different in how algorithms are expressed. In PASCAL, as in many other languages, a variable can be updated (using :=). Algorithms are often expressed as iterated sequences of statements (using while loops, for instance), where the effect of executing one statement is to change the underlying store. In ML, statements are replaced by expressions; the effect of evaluating an expression is to produce a value. Moreover, variables cannot be updated. (References are special values that can be updated and, as all other values, they can be bound to identifiers, but only rarely are the values one binds to variables references.) Iteration is expressed using recursive functions instead of loops. In ML, functions are values which can be passed as arguments to functions and returned as results from functions, and ML programmers do this all the time. ML is an example of a functional language; PASCAL is an example of a procedural language.

LISP is also sometimes referred to as a functional language. In LISP, programs can be treated as data, so that LISP programs can directly decompose and transform LISP programs. This is harder in ML. On the other hand, the type discipline of ML is extremely helpful in detecting many of the mistakes that pass unnoticed in a LISP program.

Like ADA, ML has language constructs for writing large programs. Roughly speaking, a structure in ML corresponds to a package in ADA, a signature corresponds to a package interface, and a functor in ML corresponds to a generic package in ADA. However, ML admits structures, not just types, as parameters to functors.

An ML session

An ML session is an interactive dialogue between the ML system and the user. You type a program in the form of one or more declarations (terminated by semicolon) and the system responds either by accepting the declarations or, in case the program is ill-formed, by printing an error message.

To give a concrete idea about what ML programs look like, we shall work through the following example. Consider the problem of implementing heaps. A heap is a binary tree of items, for example:

[Figure: a heap drawn as a binary tree of integer items]

For a binary tree to be a heap, it must satisfy that for every item i in the tree, i is less than or equal to all items occurring below i. In the above picture, items are integers and the relation "less than or equal" is the normal <= on integers. The advantage of a heap is that it always gives fast access to a minimal item and that it is easy to insert and delete items from a heap. This has made the heap a popular data structure in a number of very different applications. It was originally conceived (under the name "priority queue") as a means of scheduling processes in an operating system; in that case the items are processes, and the partial ordering is that process p is less than or equal to process q if p should be executed no later than q. Heaps are also used in the heap sort algorithm, which is based on the observation that one can sort a list of items by first inserting the items one by one in a heap and then removing them one by one.

Types and Values

In the following figures we present the ML declarations the author provided in this particular session. (The responses from the ML compiler are not shown.) For clarity, the actual input has been edited, using typewriter font for the reserved words and italics for identifiers, regardless of whether these identifiers are pervasives (e.g. int) or declared by the user (e.g. item).

    type item = int

    fun leq(p: item, q: item): bool =
      p <= q

    infix leq

    fun max(p, q) = if p leq q then q else p
    and min(p, q) = if p leq q then p else q

    datatype tree = L of item
                  | N of item * tree * tree

    val t = N(…, L …, N(…, L …, L …))

    fun top(L i) = i
      | top(N(i, _, _)) = i

We start out by considering integer heaps only; therefore we first declare the type item to be an abbreviation for int. Then we declare a function leq to be the pervasive <= on integers. We then declare that leq is to be used as an infix operator, as illustrated in the declaration of the two functions max and min.

Every binary tree is either a leaf containing an item, or it is a node containing an item and two trees (the subtrees). This is expressed by the datatype declaration; datatype declarations are automatically recursive, i.e. data types can be declared in terms of themselves. This is illustrated by the declaration of tree.

This data type has two constructors, L and N. Note that an integer i is an item, whereas L applied to i, written L(i) or just L i, is of type tree. Then the heap from the earlier picture is bound to the value variable t.

Exercise. Declare a heap t' of the same depth as t containing the integers … and ….

To define a function on trees, it will suffice to define its value in the case the argument tree is a leaf and in the case the tree is a node. The declaration of the function top illustrates this; top applied to a tree returns the item at the top of the tree. L i and N(i, _, _) are examples of patterns. Applying a function (here top) to an argument (e.g. t) is done by matching the argument against the patterns till a matching pattern is found. For example, top t evaluates to the item at the root of t.

Recursive Functions

    fun depth(L _) = 1
      | depth(N(i, l, r)) =
          1 + max(depth l, depth r)

    depth t;

    fun isHeap(L _): bool = true
      | isHeap(N(i, l, r)) =
          i leq top l andalso
          i leq top r andalso
          isHeap l andalso
          isHeap r

The function depth maps trees to integers; for instance, depth t evaluates to 3. As spelled out in the declaration of depth, the depth of a leaf is 1 and the depth of any other tree is 1 plus the maximum of the depths of the left and right subtrees. The function depth is recursive, i.e. defined in terms of itself. Another example of a recursive function is the function isHeap which, when applied to a tree, returns the value true if the tree is a heap and false otherwise.

Exercise. Write a function size which, when applied to a tree, returns the total number of items in the tree.

Exercise. The function top returns a minimal item of a heap. Write a recursive function maxItem which returns a maximal item.

Raising Exceptions

One often wants to define a function that cannot return a result for some of its argument values. Suppose for example that we wish to define a function initHeap which, for given integer n, returns a heap of depth n. This only makes sense for n >= 1. This can be expressed in ML by raising an exception in the case n < 1. The effect of evaluating the expression raise e, where e is an exception, is to discontinue the current evaluation. Often, the exception will be handled by a handle expression (not illustrated by our examples); if no handler catches the exception, it propagates to the top level, where it will be reported as an uncaught exception.

    val initial = …

    exception InitHeap

    fun initHeap n =
      if n < 1 then raise InitHeap
      else if n = 1 then L(initial)
      else let val t = initHeap(n - 1)
           in N(initial, t, t)
           end

Notice the let dec in exp end expression. To evaluate it, one first evaluates initHeap(n - 1) and binds the resulting value to t. Then one evaluates the body N(initial, t, t) using this value for t. Notice that the scope of the declaration of t is the expression N(initial, t, t); in particular, the two occurrences of t in that expression do not refer to the heap bound to t at top level.

Exercise. Define functions leftSub and rightSub which, when applied to a tree, return the left and the right subtree, respectively.

Finally, we shall write a function replace which, when applied to a pair (i, h), where i is an item and h is a heap, returns a pair (i', h'), where i' is the item at the top of the heap h and h' is a heap obtained from h by inserting i in place of the top of h. We must make sure that the resulting tree really is a heap. Therefore, in the case that i is to be inserted in a node above a subtree with a smaller item, i swaps places with the smaller item.

    fun replace(i, h) = (top h, insert(i, h))
    and insert(i, L _) = L(i)
      | insert(i, N(_, l, r)) =
          if i leq min(top l, top r)
          then N(i, l, r)
          else if top l leq top r then
            N(top l, insert(i, l), r)
          else (* top r < min(i, top l) *)
            N(top r, l, insert(i, r))

    val (out1, t1) = replace(…, t);
    t;
    val (out2, t2) = replace(…, t1);

The special parentheses (* and *) delimit comments.

Exercise. In the case where one recursively inserts i in the left subtree, how can one be sure that it is valid to put top l above r in the tree?

If one types an expression followed by a semicolon, such as t; in the above program, the ML system evaluates the expression and prints the result. In the above example, it will turn out that even after the replacement, t is bound to the original heap. Indeed, this replacement in no way affects the value bound to t; it simply results in a new value which subsequently is bound to t1.

Exercise. After the last declaration, what values are bound to out2 and t2?

Structures

The above declarations of heaps and operations on heaps belong together. In ML there is a program unit, called a structure, which encapsulates a sequence of declarations. The following declaration declares a structure Heap containing all the declarations copied from above, encapsulated by struct and end.

    structure Heap =
    struct
      type item = int
      fun leq(p: item, q: item): bool = p <= q
      fun max(p, q) = ...
      and min(p, q) = ...
      datatype tree = L of item
                    | N of item * tree * tree
      val t = ...
      fun top(L i) = ...
      fun depth(L _) = ...
      fun isHeap(L _): bool = ...
      val initial = ...
      exception InitHeap
      fun initHeap n = ...
      fun replace(i, h) = ...
      and insert(i, L _) = ...
    end (* Heap *)

    val smallHeap = Heap.initHeap(…);
    Heap.replace(…, smallHeap);

    signature HEAP =
    sig
      type item
      val leq: item * item -> bool
      val max: item * item -> item
      val min: item * item -> item
      datatype tree = L of item
                    | N of item * tree * tree
      val t: tree
      val top: tree -> item
      val depth: tree -> int
      val isHeap: tree -> bool
      val initial: item
      exception InitHeap
      val initHeap: int -> tree
      val replace: item * tree -> item * tree
      val insert: item * tree -> tree
    end (* HEAP *)
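The structure declaration above abbreviates most bodies with "...". As a sketch only, the structure can be spelled out in full as follows, using the declarations from the session. The items in t, the value of initial and the arguments in the last two lines are assumptions made for illustration (any integers satisfying the heap property would do), and the names out and h are hypothetical.

```sml
structure Heap =
struct
  type item = int
  fun leq (p: item, q: item): bool = p <= q
  infix leq
  fun max (p, q) = if p leq q then q else p
  and min (p, q) = if p leq q then p else q

  datatype tree = L of item
                | N of item * tree * tree

  (* assumed example heap: every item is <= the items below it *)
  val t = N (1, L 4, N (2, L 5, L 3))

  fun top (L i) = i
    | top (N (i, _, _)) = i

  (* a leaf has depth 1 *)
  fun depth (L _) = 1
    | depth (N (_, l, r)) = 1 + max (depth l, depth r)

  fun isHeap (L _) = true
    | isHeap (N (i, l, r)) =
        i leq top l andalso i leq top r andalso
        isHeap l andalso isHeap r

  val initial = 0                      (* assumed initial item *)
  exception InitHeap
  fun initHeap n =
    if n < 1 then raise InitHeap
    else if n = 1 then L initial
    else let val t = initHeap (n - 1) in N (initial, t, t) end

  (* replace returns the old top and a heap with i inserted in its place *)
  fun replace (i, h) = (top h, insert (i, h))
  and insert (i, L _) = L i
    | insert (i, N (_, l, r)) =
        if i leq min (top l, top r) then N (i, l, r)
        else if top l leq top r then N (top l, insert (i, l), r)
        else N (top r, l, insert (i, r))
end

(* long identifiers reach into the structure from outside *)
val smallHeap = Heap.initHeap 3
val (out, h) = Heap.replace (7, smallHeap)
```

With these assumed values, out is bound to the old top of smallHeap and h is again a heap of depth 3, which can be confirmed with Heap.isHeap h and Heap.depth h.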

Identifiers declared in a structure are accessed from outside the structure by a long identifier, for instance Heap.initHeap, which can be read "the initHeap in Heap" or just "Heap dot initHeap". In a large program containing many structures, long identifiers make it much easier for the reader to find the definition of an identifier.

Signatures

The type of a structure is called a signature. A signature can specify types without necessarily saying what the types are. Moreover, one can specify values, in particular functions, by specifying a type for each variable without saying how such a specification can be met by an actual declaration.

As one can check, the structure Heap matches signature HEAP in the following sense: for every type specified in HEAP there is a corresponding type in Heap; for every exception specified in HEAP there is a corresponding exception in Heap; and for every value specified in HEAP there is a corresponding value in Heap which has the specified type.

Coercive Signature Matching

However, the signature HEAP reflects details of the implementation in Heap which heap users should not have to worry about. Obviously, the value t is completely unnecessary, and there is no reason why users should have access to the constructors L and N, given that we have already given the user initHeap and replace. By pruning the signature we obtain

the following shorter declaration of HEAP:

    signature HEAP =
    sig
      type item
      val leq: item * item -> bool
      type tree
      val top: tree -> item
      exception InitHeap
      val initHeap: int -> tree
      val replace: item * tree -> item * tree
    end (* HEAP *)

This is a much cleaner interface, so whenever we refer to HEAP in the following, we mean this version.

In practice, one should write down a signature before one attempts to write down a structure which matches it. In this way one can decide what types and operations are needed without having to think about algorithms at the same time. So let us assume that we started out by declaring the HEAP signature. We then imprint the view provided by HEAP on the declaration of the structure Heap by a signature constraint:

    structure Heap: HEAP =
    struct
      type item = int
      ...
    end (* Heap *)

    Heap.t;
    Heap.replace(…, Heap.initHeap …);

After this declaration of Heap, we cannot write Heap.t, since t is not mentioned in HEAP. However, we can write Heap.replace, as replace is specified. Moreover, although HEAP does not specify that item should be int, the ML system discovers that item is in fact int in Heap, and that is why an integer will be accepted as an item in the application of Heap.replace above. Thus a signature constraint may hide components of a structure, but it does not hide the true identity of the types declared in the structure, except that one can hide the constructors of a datatype by specifying it as a type.

Functor Declaration

Almost all of what we did for heaps containing integer items would work for a heap whose items are of a different type. More precisely, given any type item, any binary function leq on items and any initial item, the signature HEAP is satisfied by the declarations we have already written. Let us specify the general requirements of a Heap structure:

    signature ITEM =
    sig
      type item
      val leq: item * item -> bool
      val initial: item
    end

What we are after is a structure which is parameterised on any structure, Item say, which matches ITEM. In ML, a parameterised structure is called a functor. The following table contains the complete functor declaration; the new bits are in bold face.

    functor Heap(Item: ITEM): HEAP =
    struct
      type item = Item.item
      fun leq(p: item, q: item): bool =
        Item.leq(p, q)
      fun intmax(i: int, j) =
        if i > j then i else j
      infix leq
      fun max(p, q) = if p leq q then q else p
      and min(p, q) = if p leq q then p else q
      datatype tree = L of item
                    | N of item * tree * tree
      fun top(L i) = i
        | top(N(i, _, _)) = i
      fun depth(L _) = 1
        | depth(N(i, l, r)) =
            1 + intmax(depth l, depth r)
      fun isHeap(L _): bool = true
        | isHeap(N(i, l, r)) =
            i leq top l andalso
            i leq top r andalso
            isHeap l andalso
            isHeap r
      exception InitHeap
      fun initHeap n =
        if n < 1 then raise InitHeap
        else if n = 1 then L(Item.initial)
        else let val t = initHeap(n - 1)
             in N(Item.initial, t, t)
             end
      fun replace(i, h) = (top h, insert(i, h))
      and insert(i, L _) = L(i)
        | insert(i, N(_, l, r)) =
            if i leq min(top l, top r)
            then N(i, l, r)
            else if top l leq top r then
              N(top l, insert(i, l), r)
            else (* top r < min(i, top l) *)
              N(top r, l, insert(i, r))
    end (* Heap *)

In the first line, Item is the parameter structure of the functor and HEAP is the result signature of the functor. The body of the functor is everything after the = in the first line.

Notice that we included declarations of item and leq in the body of the functor; since the result signature specifies them, they must be provided. If you read the body carefully, you will see that it makes sense for any structure which matches ITEM.

Exercise. Declare a functor Pair which takes as a parameter a structure matching the simple signature

    sig type coord end

and has the following result signature:

    sig
      type point
      val mkPoint: coord * coord -> point
      val x_coord: point -> coord
      val y_coord: point -> coord
    end

(You do not have to name these signatures by the use of signature declarations; they can be written down directly where you need them, if you prefer.)

Exercise. When the author first tried to write the Heap functor, he simply copied the original depth function (which used max, not intmax). However, the type checker did not let him get away with that. Why?

Functor Application

We can now get various heaps, indeed heaps of heaps, by applying the Heap functor to different argument structures. Of course, we can only apply it to structures that match ITEM; this will be checked by the compiler.

Here is how one can get a string heap:

    structure StringItem =
    struct
      type item = string
      fun leq(i: item, j) =
        ord(i) <= ord(j)
      val initial = …
    end

    structure StringHeap = Heap(StringItem)

    val (out1, t1) =
      StringHeap.replace("abe",
        StringHeap.initHeap …)
    val (out2, t2) =
      StringHeap.replace("man", t1)

The pervasive ord function applied to a string s returns the ASCII ordinal value of the first character in s and raises exception Ord when s is empty.

Exercise. Declare a structure IntItem using the declarations we originally used for integer heaps. Then obtain a structure IntHeap by functor application.

Exercise. How does one get an integer heap whose top is always maximal?

Exercise. Declare a structure IntHeapHeap whose items themselves are integer heaps. (You can use the top function to define a leq function on integer heaps.)

Summary

ML consists of a core language and a modules language. The core language has values (functions are values), data types, type abbreviations and exceptions. The modules language has structures, signatures and functors. There is no actual language construct called a module, but ML programmers often refer to "a module", meaning a structure or a functor.
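As a recap of functor declaration and application, here is a minimal runnable sketch of the string heap. It makes several assumptions that are not in the notes: the implementation is taken to provide the later Standard ML Basis Library, in which ord has type char -> int (so the first character is fetched with String.sub, which raises Subscript rather than Ord on the empty string); the initial item is assumed to be " "; and the depth passed to initHeap is assumed to be 2. The functor body is a cut-down copy of the one given earlier, keeping only what the pruned HEAP signature requires.

```sml
signature ITEM =
sig
  type item
  val leq: item * item -> bool
  val initial: item
end

signature HEAP =
sig
  type item
  val leq: item * item -> bool
  type tree
  val top: tree -> item
  exception InitHeap
  val initHeap: int -> tree
  val replace: item * tree -> item * tree
end

functor Heap (Item: ITEM): HEAP =
struct
  type item = Item.item
  fun leq (p: item, q: item): bool = Item.leq (p, q)
  infix leq
  fun min (p, q) = if p leq q then p else q
  datatype tree = L of item
                | N of item * tree * tree
  fun top (L i) = i
    | top (N (i, _, _)) = i
  exception InitHeap
  fun initHeap n =
    if n < 1 then raise InitHeap
    else if n = 1 then L Item.initial
    else let val t = initHeap (n - 1) in N (Item.initial, t, t) end
  fun replace (i, h) = (top h, insert (i, h))
  and insert (i, L _) = L i
    | insert (i, N (_, l, r)) =
        if i leq min (top l, top r) then N (i, l, r)
        else if top l leq top r then N (top l, insert (i, l), r)
        else N (top r, l, insert (i, r))
end

structure StringItem =
struct
  type item = string
  (* order strings by their first character; fails on "" *)
  fun leq (i: item, j) =
    ord (String.sub (i, 0)) <= ord (String.sub (j, 0))
  val initial = " "                    (* assumed initial item *)
end

structure StringHeap = Heap (StringItem)

val (out1, t1) = StringHeap.replace ("abe", StringHeap.initHeap 2)
val (out2, t2) = StringHeap.replace ("man", t1)
```

Because the result signature does not hide the identity of item, string literals are accepted as items of StringHeap, exactly as integers were accepted by Heap.replace in the earlier discussion of signature constraints.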

Programming with ML Modules

Introduction

This lecture gives a more thorough introduction to the modules part of ML and describes a methodology for programming with its main constructs: structures, signatures and functors.

The core language is interactive: you type a declaration, get a reply, type another declaration and so on, thus gradually adding more and more bindings to the top-level environment. If we could think strictly bottom-up, declaring one value or type in terms of the preceding values and types, without ever making unfortunate implementation decisions or losing the perspective of the entire project, then this gradual expansion of the top-level environment would be quite sufficient. Unfortunately we cannot; indeed, a program which is written as one long list of core language declarations can easily end up looking rather like a long shopping list where items have been added in the order they came to mind.

Regardless of whether a programming language is interactive or not, one needs the ability to divide large programs into relatively independent units which can be written, read, compiled and changed in relative isolation from each other.

One approach, taken by some, is to provide more or less language independent software packages that help programmers organise collections of programs, typically by allowing (or forcing) them to document their programs in specific ways. The crucial problem with this approach is of course to ensure consistency between the documentation and the programs, in particular to ensure that the information held by the tool really is sufficient to ensure that the constituent units can be put together in a consistent manner.

Another approach, taken in several programming languages (e.g. Ada and ML), is to provide module facilities in the programming language itself. Many of the operations one needs when programming with modules are similar to operations one needs when programming in the small, so many ideas from usual programming languages apply to programming in the large as well. For instance, just as it is a type error in the small to add true and an integer, so it is a type error in the large to write a module M, say, assuming the existence of a module M' which provides a function f, and then combine M with an actual module M' which either does not provide any f or provides an f of the wrong type. The idea is that such mistakes should be detected by a type checker at the modules level.

This leads to the exciting idea of having just one language with constructs that work uniformly for small as well as for large programs. One such language is Pebble, by Burstall and Lampson. In Pebble, records can contain types, so a module consisting of a collection of types and values is now itself a value which, for example, can be passed as an argument to a function. There are some tradeoffs, however. The ML type checker is based on a strict separation of run-time and compile-time. In designing the modules language, it has been necessary to restrict the operations on types in comparison with the operations on values in order to maintain the static type checking. This has led to a stratified language in which the modules language contains phrases from the core language, but not the other way around.

I shall use the term module rather vaguely to mean a relatively independent program unit. In particular languages they have been called "packages", "clusters" and "modules", and in ML we use the word "structure". Likewise, there is no standard terminology for the type of a module, which has acquired names such as "package description", "interface" and the ML term "signature".

As we shall see, the real power of a modules system comes from the ability to parameterise modules. Ada has generic packages; ML has functors.

In ML, a structure is a collection of data types, types, values, exceptions and even other structures. A signature specifies types and data types and gives the types of values and exceptions. A functor is essentially a function from structures to structures. Functors cannot take functors as arguments, nor can they produce functors as results. The purpose of this lecture is to convince you that even this apparently simple notion of functor constitutes a powerful extension of the core language. As will be demonstrated, one can write an entire system using just signatures and functors and then build the system using functor applications.

Imagine we want to write a parser for a programming language. In order to build the parser top-down, we might start by sketching the parser itself. However, as programming normally is a complex process, involving both the odd low-level implementation idea and more high-level considerations about overall structure, let us start at an intermediate level: the problem of writing a symbol table. A symbol table is simply a facility which allows one to store and retrieve information about symbols.

Signatures

The way one in ML sketches a structure is to write down a signature. Here is a first sketch of a symbol table signature, called OTable as it is opaque in the sense that it does not reveal many implementation details:

    signature OTable =
    sig
      type table
      exception Lookup
      val lookup: table * Sym.sym -> Val.value
      val update: table * Sym.sym * Val.value -> table
    end

At this early stage we cannot know exactly what symbols are going to be, nor can we know what kind of values we are going to store with the symbols. Therefore we imagine structures Sym and Val which declare the types sym and value, respectively. (Sym.sym is an example of a long identifier, in this case a long type constructor.) The two structure identifiers Sym and Val are free in OTable.

There are many different ways of implementing a symbol table which matches this signature. One possibility is to use an association list, i.e. a list of pairs of symbols and values. Since the symbol table is going to be used extensively, we will probably want something more efficient. We cannot use an array, for arrays map integers, rather than symbols, to values. (Actually, the ML language definition does not include arrays, but they are provided in most implementations.) But we can implement the symbol table as a hash table: we can require that the Sym structure provides a hash function from symbols to integers and then assume the existence of another structure, IntMap, which implements maps on the integers. Since the hash function may map different symbols to the same integer, we take an IntMap which maps integers

to lists of pairs of symbols and values.

    signature TTable =
    sig
      datatype table = TBL of
        (Sym.sym * Val.value) list IntMap.map
      exception Lookup
      val lookup: table * Sym.sym -> Val.value
      val update: table * Sym.sym * Val.value -> table
    end

Structures

Here is a structure which implements a symbol table:

    structure SymTbl =
    struct
      datatype table = TBL of
        (Sym.sym * Val.value) list IntMap.map
      exception Lookup

      fun find(sym, []) = raise Lookup
        | find(sym, (sym', v) :: rest) =
            if sym = sym' then v
            else find(sym, rest)

      fun lookup(TBL map, s) =
        let val n = Sym.hash(s)
            val l = IntMap.apply(map, n)
        in find(s, l)
        end handle IntMap.NotFound => raise Lookup
      ...
    end

When binding a structure to a structure identifier, one can impose a signature constraint on the structure:

    structure SymTbl: OTable =
    struct
      ...
    end

As a result, all identifiers of the structure that are not mentioned in the signature are hidden. In the above example, we hide the constructor TBL and the function find. Besides update, the "..." in SymTbl may declare extra values, exceptions and types, but as a result of the signature constraint, none of these extra components will be visible from outside the structure.

It is often the case that there is not a single signature which best constrains a given structure, because different parts of the program should see different degrees of detail of the structure. The parser should be written using an opaque signature for the symbol table; by contrast, a structure which prints out the symbol table (for testing, for example) will need to know more details.

During the design of ML, it was decided that it is vital to admit different views of the same structure. One way of achieving this is to bind the structure to more than one structure identifier, each time using a different signature constraint:

    structure SymTbl: TTable =
    struct
      datatype table = TBL of
        (Sym.sym * Val.value) list IntMap.map
      exception Lookup

      fun find(sym, []) = raise Lookup
        | find(sym, (sym', v) :: rest) =
            if sym = sym' then v
            else find(sym, rest)

      fun lookup(TBL map, s) =
        let val n = Sym.hash(s)
            val l = IntMap.apply(map, n)
        in find(s, l)
        end handle IntMap.NotFound => raise Lookup
      ...
    end

    structure SmallTbl: OTable = SymTbl

The dynamic evaluation of the struct ... end yields an environment, just as if we had typed the constituent declarations at top level. Dynamically, there is just one lookup function, for example, and as a result of the above declarations this function is shared between SymTbl and SmallTbl.

Statically, however, the elaboration of the above declarations yields two different views of this environment. Since there is just one lookup function, we should of course be free to refer to it as SymTbl.lookup or SmallTbl.lookup, whichever we prefer. This requires that these two long identifiers have the same type, so the static semantics must be such that the types SymTbl.table and SmallTbl.table are considered shared.

Functors

Unfortunately, neither the declaration of the signature OTable nor the declaration of the structure SymTbl makes sense on its own. The reason is that they both contain free identifiers: OTable relies on structures Val and Sym, and SymTbl in addition relies on IntMap. As a consequence, we can compile neither OTable nor SymTbl in the initial top-level environment.

What we need to achieve this is clearly the ability to abstract both OTable and SymTbl on their free identifiers. Such an abstraction is called a functor in ML:

    functor SymTblFct(
      structure IntMap: IntMapSig
      structure Val: ValSig
      structure Sym: SymSig):
      sig
        type table
        exception Lookup
        val lookup: table * Sym.sym -> Val.value
        val update: table * Sym.sym * Val.value -> table
      end =
    struct
      datatype table = TBL of
        (Sym.sym * Val.value) list IntMap.map
      exception Lookup

      fun find(sym, []) = raise Lookup
        | find(sym, (sym', v) :: rest) =
            if sym = sym' then v
            else find(sym, rest)

      fun lookup(TBL map, s) =
        let val n = Sym.hash(s)
            val l = IntMap.apply(map, n)
        in find(s, l)
        end handle IntMap.NotFound => raise Lookup
      ...
    end

Now the Sym, Val and IntMap that occur in the result signature and in the functor body are bound as formal parameters of the functor. Of course, for this functor declaration to make sense we must first declare the three signatures IntMapSig, ValSig and SymSig, but that can be done without worrying about how we get the corresponding structures. Indeed, declaring these signatures is healthy exercise, as it makes us summarise what a symbol table needs to know about symbols, values and intmaps.

Exercise. Declare the signatures IntMapSig, ValSig and SymSig. Also complete the functor body by declaring update (extending your signatures, if needed).

When, in due course, we have defined actual structures FastIntMap, Data and Identifier, say, corresponding to the formal structures IntMap, Val and Sym, respectively, we can obtain a particular symbol table by applying the symbol table functor:

    structure MyTbl =
      SymTblFct(structure IntMap = FastIntMap
                structure Val = Data
                structure Sym = Identifier)

Dynamically, the functor body is not evaluated when the functor is declared, but once for each time the functor is applied. In this respect functors behave as functions in the core language.

As part of the functor application, the compiler will check to see whether the actual argument structures match the specified signatures. If that is not the case (for instance, if Identifier does not contain a type sym, or FastIntMap.apply takes three instead of two arguments) then an error will be reported, and hence we are prevented from putting together the inconsistent structures.

What is the signature of MyTbl? It is not simply the result signature of SymTblFct, for that signature refers to the formal functor parameters Val and Sym. Clearly, if Identifier.sym is string and Data.value is real, then we should be able to write, for instance,

    sqrt(MyTbl.lookup(t, "pi"))

The signature of MyTbl therefore is obtained by substituting the types of the actual arguments for the types in the formal result signature of SymTblFct:

    sig
      type table
      exception Lookup
      val lookup: table * Identifier.sym -> Data.value
      val update: table * Identifier.sym * Data.value -> table
    end

Substructures

As we saw above, the signature of the result of a functor application depends on the actual arguments to the functor. So apparently there is no single signature which describes all the symbol tables which can be created by applying SymTblFct. But how then are we going to declare a functor ParseFct, say, which we can apply to any symbol table created by SymTblFct?
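Before turning to that question, the earlier point about the signature of a functor application can be tried out on a deliberately smaller functor. The sketch below is not the SymTblFct of these notes: ListTblFct is a hypothetical name, it uses an association list instead of a hash table, its parameter signatures are written inline, sym is assumed to be an equality type, and the component empty is an assumed addition so that a table can be created at all. The binding for "pi" is an arbitrary illustrative value, and Math.sqrt assumes the later Standard ML Basis Library.

```sml
functor ListTblFct (structure Val: sig type value end
                    structure Sym: sig eqtype sym end):
  sig
    type table
    exception Lookup
    val empty: table                   (* assumed extra component *)
    val lookup: table * Sym.sym -> Val.value
    val update: table * Sym.sym * Val.value -> table
  end =
struct
  (* association list of symbols and values, newest binding first *)
  type table = (Sym.sym * Val.value) list
  exception Lookup
  val empty = []
  fun lookup ([], _) = raise Lookup
    | lookup ((s', v) :: rest, s) =
        if s = s' then v else lookup (rest, s)
  fun update (tbl, s, v) = (s, v) :: tbl
end

structure Identifier = struct type sym = string end
structure Data = struct type value = real end

structure MyTbl = ListTblFct (structure Val = Data
                              structure Sym = Identifier)

(* the actual types string and real have replaced Sym.sym and Val.value *)
val t = MyTbl.update (MyTbl.empty, "pi", 3.14159)
val r = Math.sqrt (MyTbl.lookup (t, "pi"))
```

The last line type-checks only because the type of MyTbl.lookup mentions Identifier.sym and Data.value, that is string and real, rather than the formal parameters of the functor.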

The solution to this problem is to make explicit in the symbol table signature that any symbol table depends on a Val and a Sym structure:

  signature SymTblSig =
  sig
    structure Val: ValSig
    structure Sym: SymSig
    type table
    val lookup: table * Sym.sym -> Val.value
    val update: table * Sym.sym * Val.value -> table
  end

The specifications of Val and Sym no longer refer to particular structures outside the signature, i.e. Val and Sym are now considered bound in the signature. Of course, the signature identifiers SymSig and ValSig are still free, but those you have already declared in the exercise.

The idea is that a structure can contain not just values, exceptions and types as components, but even other structures. These are called the substructures of the structure. To make the result of SymTblFct match SymTblSig, we have to declare structures Val and Sym in the body. But that is easily done; we simply bind them to the formal parameters:

  functor SymTblFct(
      structure IntMap: IntMapSig
      structure Val: ValSig
      structure Sym: SymSig): SymTblSig =
  struct
    structure Val = Val
    structure Sym = Sym
    datatype table = TBL of (Sym.sym * Val.value) list IntMap.map
    exception Lookup
    fun find(sym, []) = raise Lookup
      | find(sym, (sym', v)::rest) =
          if sym = sym' then v
          else find(sym, rest)
    fun lookup(TBL map, s) =
      let val n = Sym.hash(s)
          val l = IntMap.apply(map, n)
      in find(s, l)
      end handle IntMap.NotFound => raise Lookup
  end

Sharing

A signature for lexical analysers might be as follows (a lexical analyser reads individual characters from an input file and assembles them into symbols; in the case of a programming language, typically reserved words and identifiers).

  signature LexSig =
  sig
    structure Sym: SymSig
    val getsym: unit -> Sym.sym
  end

We have included the specification of a substructure Sym because a lexical analyser needs to know about symbols. Indeed, if we want to declare LexSig before defining any particular Sym structure, we are forced to include the substructure specification.

Here is a first attempt at defining ParseFct:

  functor ParseFct(
      structure SymTbl: SymTblSig
      structure Lex: LexSig) =
  struct
    ...
    let val next = Lex.getsym()
    in SymTbl.update(table, next, "declared")
    end
    ...
  end

However, the let expression in the body is not type correct. Since the type of getsym() is Lex.Sym.sym, next has type Lex.Sym.sym. However, by the specification of update, its second argument must be of type SymTbl.Sym.sym. The problem is that although we have specified that SymTbl depends on a Sym structure and Lex depends on a Sym structure, nowhere have we specified that they depend on the same Sym structure. The type checker will not make an attempt to identify these two types, for the idea is that the functor should be applicable to any arguments that satisfy the formal parameter specification, not just those that satisfy the specification and in addition have extra sharing. Therefore, one is allowed to specify needed sharing, as well as needed components, by a so-called sharing specification. Grammatically, a sharing specification can occur anywhere amongst structure, type, value and exception specifications.

  functor ParseFct(
      structure SymTbl: SymTblSig
      structure Lex: LexSig
      sharing SymTbl.Sym = Lex.Sym) =
  struct
    ...
    let val next = Lex.getsym()
    in SymTbl.update(table, next, "declared")
    end
    ...
  end

One can specify sharing of structures and of types, but not of values or exceptions. In our example we have to add yet another sharing specification, this time a type sharing specification:

  functor ParseFct(
      structure SymTbl: SymTblSig
      structure Lex: LexSig
      sharing SymTbl.Sym = Lex.Sym
          and type SymTbl.Val.value = string) =
  struct
    ...
    let val next = Lex.getsym()
    in SymTbl.update(table, next, "declared")
    end
    ...
  end

Building the System

Notice that we have now written the code of the parser solely by declaring signatures and functors. We have not had to write a single toplevel structure declaration. Having finished declaring the parser functor, we can go back and declare functors that implement Sym and Val. These functors can be nullary, i.e. have an empty specification of formal parameters.

Exercise. Write nullary functors ValFct, SymFct and IntMapFct whose results match your signatures from the previous exercise.

We can now build the entire system by functor applications and toplevel structure declarations:

  structure Val = ValFct()
  structure Sym = SymFct()
  structure TTable =
    SymTblFct(structure IntMap = IntMapFct()
              structure Val = Val
              structure Sym = Sym)
  structure Lex = LexFct(Sym)
  structure Parser =
    ParseFct(structure SymTbl = TTable
             structure Lex = Lex)

The compiler will check that the sharing specified in the declaration of ParseFct really is met by the actual arguments.

Exercise. What is wrong with the following attempt to build the parser?

  structure Val = ValFct()
  structure TTable =
    SymTblFct(structure IntMap = IntMapFct()
              structure Val = Val
              structure Sym = SymFct())
  structure Lex = LexFct(SymFct())
  structure Parser =
    ParseFct(structure SymTbl = TTable
             structure Lex = Lex)

Separate Compilation

Some ML implementations have facilities that allow you to compile declarations, for instance functor declarations, in such a way that the compiled code can persist between sessions. However, even without such a facility, using signatures and functors in the manner described above gives the valuable ability to separately compile modules consisting of signature and functor declarations, although the result of the compilation will not outlive the session.

Most ML systems have a use function which allows the ML source to be read from a file rather than from the terminal. One can then keep signatures in suitably named files and use these files at the beginning of each module:

  use "symb.sig";
  use "val.sig";
  use "symtbl.sig";
  use "lex.sig";
  use "parse.sig";
  functor ParseFct(
      structure SymTbl: SymTblSig
      structure Lex: LexSig
      sharing SymTbl.Sym = Lex.Sym
          and type SymTbl.Val.value = string): ParseSig =
  struct
    ...
  end

In this way one avoids repeating the same signature declaration in many files, and thus also the problem of updating all copies if the signature is changed.
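The build sequence from "Building the System" can be tried out with toy implementations. The following is only a sketch under stated assumptions: the components mksym and empty, the association-list table (used here instead of the hashed IntMap) and the function declareNext are illustrative and not taken from the text. It also assumes an SML '97 compiler, where sharing with a fixed type such as string is generally rejected, so the type sharing specification is written with where type instead.

```sml
(* Toy signatures; mksym and empty are hypothetical additions. *)
signature SymSig =
sig
  eqtype sym
  val hash  : sym -> int
  val mksym : string -> sym
end

signature ValSig = sig type value end

signature SymTblSig =
sig
  structure Val : ValSig
  structure Sym : SymSig
  type table
  exception Lookup
  val empty  : table
  val lookup : table * Sym.sym -> Val.value
  val update : table * Sym.sym * Val.value -> table
end

signature LexSig =
sig
  structure Sym : SymSig
  val getsym : unit -> Sym.sym
end

functor SymFct () : SymSig =
struct
  type sym = string
  fun hash s = String.size s
  fun mksym s = s
end

functor ValFct () : ValSig = struct type value = string end

(* An association list stands in for the IntMap of the text. *)
functor SymTblFct (structure Val : ValSig
                   structure Sym : SymSig) : SymTblSig =
struct
  structure Val = Val
  structure Sym = Sym
  type table = (Sym.sym * Val.value) list
  exception Lookup
  val empty = []
  fun lookup ([], _) = raise Lookup
    | lookup ((s', v) :: rest, s) = if s = s' then v else lookup (rest, s)
  fun update (tbl, s, v) = (s, v) :: tbl
end

functor LexFct (Sym : SymSig) : LexSig =
struct
  structure Sym = Sym
  fun getsym () = Sym.mksym "x"   (* toy lexer: always the same symbol *)
end

(* Without the sharing specification, update would not accept next. *)
functor ParseFct (structure SymTbl : SymTblSig where type Val.value = string
                  structure Lex : LexSig
                  sharing SymTbl.Sym = Lex.Sym) =
struct
  fun declareNext table =
    let val next = Lex.getsym ()
    in SymTbl.update (table, next, "declared")
    end
end

structure Val    = ValFct ()
structure Sym    = SymFct ()
structure TTable = SymTblFct (structure Val = Val structure Sym = Sym)
structure Lex    = LexFct (Sym)
structure Parser = ParseFct (structure SymTbl = TTable structure Lex = Lex)

val t = Parser.declareNext TTable.empty
val found = TTable.lookup (t, Sym.mksym "x")   (* "declared" *)
```

Because TTable and Lex are both built from the same structure Sym, the sharing required by ParseFct is met by the actual arguments.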

Good Style

It is good practice to keep signatures as small as possible. If one programs using functors and signatures as described above, then writing the body of a functor will reveal which components of its formal parameters that particular functor needs to know about.

Different functors will need different details. Rather than gradually extending a single signature till it gets very large, one can use the include specification to enrich an existing signature:

  signature SmallTbl = sig ... end

  signature BigTbl =
  sig
    include SmallTbl
    datatype DebugInfo = ...
    val printInfo: unit -> unit
  end

Bad Style

Signature declarations can contain free structure and type identifiers. Structure declarations can contain free identifiers of any kind. This allows you to write, for example,

  structure Parser: ParseSig = ParseFct(...)

Unfortunately, it also allows you to write things like

  structure Parser =
  struct
    structure Lex = Lex
    structure MyPervasives = MyPervasives
    structure ErrorReports = ErrorReports
    structure PrintFcns = PrintFcns
    structure Table = Table
    structure BigTable = BigTable
    structure Aux = Aux
    ...
    fun f ... = ... Table.lookup ...
  end

Here the programmer has apparently made some effort to show that the parser depends on the structures listed at the beginning. However, if he has missed out a couple of structures from his list, it will have no effect on the declarations that follow, and so one does not, as a reader, feel confident that the list is exhaustive.

Moreover, when the reader wants to find out what the type of the lookup function is, he has to look in the declaration of the Table structure. In case Table is constrained by a signature, the search continues in the declaration of the signature. Otherwise one will have to look at the code for lookup.

Most serious of all, when encountering the call of lookup, one has no idea whether lookup has side effects that are important to other structures. In that case, the value of structuring code into structures and substructures is purely cosmetic. The only reliable help it gives you is a pointer to the structure in which the identifier is declared.

One particular horror is the misuse of open. Available both in the core language and in the modules language, open S is a declaration which has the effect of adding all the bindings of the structure S to the current environment. This is helpful if one has a single structure, MyPervasives say, which is used everywhere in the project. But look at this:

  structure Parser =
  struct
    structure Lex = Lex
    open MyPervasives ErrorReports PrintFcns
         Table BigTable Aux
    ...
    fun f ... = ... lookup ...
  end

Now finding lookup is reduced to pure guesswork.

Exercise. For each of the above points of criticism, consider to what extent it applies if one programs with signatures and functors only.
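The include specification recommended under "Good Style" can be illustrated with a small sketch. All component names below (empty, insert, size, debugInfo, INFO) and the functor CountFct are hypothetical; the point is only that a functor asking for the small signature can still be applied to a structure that matches the enriched one.

```sml
signature SmallTbl =
sig
  type table
  val empty  : table
  val insert : table * string -> table
  val size   : table -> int
end

(* BigTbl enriches SmallTbl rather than repeating its specifications. *)
signature BigTbl =
sig
  include SmallTbl
  datatype DebugInfo = INFO of string
  val debugInfo : table -> DebugInfo
end

(* This functor only needs the small view of a table. *)
functor CountFct (T : SmallTbl) =
struct
  fun sizeAfterInsert s = T.size (T.insert (T.empty, s))
end

structure Table : BigTbl =
struct
  type table = string list
  val empty = []
  fun insert (t, s) = s :: t
  fun size t = length t
  datatype DebugInfo = INFO of string
  fun debugInfo t = INFO (String.concatWith "," t)
end

(* Table matches BigTbl, hence also the smaller SmallTbl. *)
structure Count = CountFct (Table)
val n = Count.sizeAfterInsert "x"   (* 1 *)
```

Reading the parameter specification of CountFct tells the reader exactly which table components the functor can depend on, which is the property the free-identifier style criticised above does not give.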

The Static Semantics of Modules

The purpose of this lecture is to explain the static semantics of modules. In particular, we shall look into the details of the crucial concepts signature matching and sharing.

Elaboration

Consider the two following signatures, the first of which stems from the MyTbl example of the previous lecture:

  sig
    type table
    exception Lookup
    val lookup: table * Identifier.sym -> Data.value
    val update: table * Identifier.sym * Data.value -> table
  end

  sig
    type table
    exception Lookup
    val lookup: table * string -> real
    val update: table * string * real -> table
  end

In one sense these signatures are very different: the meaning of the first one depends on the free structures Data and Identifier, whereas the second depends on the pervasives only. However, if Identifier.sym happens to be string and Data.value happens to be real, then the two expressions are just different ways of expressing the same meaning. In that sense the two signatures turn out to be equal.

To avoid such confusion concerning equality, it is often helpful to distinguish between a signature expression (the syntactic object) and a signature (its meaning). The transition from signature expressions to signatures is called elaboration. We use the word elaboration instead of evaluation because, unlike evaluation, all elaboration can be done statically by a compiler. The result of elaborating a signature expression depends on the meaning of the identifiers occurring free in the expression. In any given context, there are infinitely many signature expressions that elaborate to the same signature. It is even the case that, in every context, every signature expression elaborates to infinitely many signatures, if it elaborates to any at all. However, among these there will always be some that are principal, which means that, in a certain technical sense, all the others are instances of them, and one always takes a principal signature as the meaning of a signature declared at toplevel.

Elaboration applies to structure expressions and functor declarations as well, yielding structures and functor signatures, respectively.

Essentially, the modules part of ML is a language for computing these abstract signatures, structures and functor signatures. The purpose of this lecture is to explain the principles that govern elaboration.

We shall not introduce a separate notation for structures, signatures and functor signatures. In many cases these are very similar to the expressions from which they were obtained, so we make do with the device of decorating expressions with so-called names. Names are semantic objects, completely distinct from identifiers; in the above examples, string and Identifier.sym are both identifiers which elaborate to the same type name.
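That two different signature expressions can elaborate to the same signature may be checked with a compiler. In the sketch below, the names TblSigA, TblSigB, Tbl, UseA and the component empty are assumptions introduced for illustration; Identifier and Data are toy structures in which sym is string and value is real, as supposed in the text.

```sml
structure Identifier = struct type sym = string end
structure Data       = struct type value = real end

(* Written in terms of the free structures Identifier and Data. *)
signature TblSigA =
sig
  type table
  exception Lookup
  val empty  : table
  val lookup : table * Identifier.sym -> Data.value
  val update : table * Identifier.sym * Data.value -> table
end

(* Written in terms of the pervasives only. *)
signature TblSigB =
sig
  type table
  exception Lookup
  val empty  : table
  val lookup : table * string -> real
  val update : table * string * real -> table
end

structure Tbl =
struct
  type table = (string * real) list
  exception Lookup
  val empty = []
  fun lookup ([], _) = raise Lookup
    | lookup ((s', v) :: rest, s) = if s = s' then v else lookup (rest, s)
  fun update (t, s, v) = (s, v) :: t
end

(* A functor whose parameter is specified with TblSigA ... *)
functor UseA (T : TblSigA) =
struct
  val pi = T.lookup (T.update (T.empty, "pi", 3.14), "pi")
end

(* ... accepts a structure that was constrained by TblSigB. *)
structure ViaB : TblSigB = Tbl
structure R = UseA (ViaB)
```

The application UseA (ViaB) is accepted because, in this context, Identifier.sym and string elaborate to the same type name, and likewise Data.value and real.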

Names Decorating Structures

Each elab oration of a structure expression of

the form

structure Stack

struct end

struct

type elt int

yields a fresh structure ie a structure

datatype stack ST of elt list ref

decorated by a new name Therefore such

val initStack STref

expressions are called generative struc

end

ture expressions

Each elab oration of a data type declaration

structure StackUser

datatype yields a fresh type ie a type

struct

decorated by a new name

structure Stack Stack

structure Stack

end

n1

struct

type elt int

structure StackUser

int

datatype stack ST of elt list ref

struct

t1

val initStack STref

structure Stack Stack

t1

end

datatype stack ST of elt list ref

structure StackUser

end

n2

struct

structure Stack Stack

n1

All the following sharing equations hold

end

StackUser Stack StackUser Stack elt

int Stack stack StackUser

structure StackUser

n38

Stack stack None of the following shar

struct

ing equations hold StackUser StackUser

structure Stack Stack

n1

Stack stack StackUser stack

Sharing equations can b e decided by deco

datatype stack ST of elt list ref

t23

rating programs with names There are two

end

kinds

structure names n n

To b e complete one would have to dec

m m

orate each structure not merely by a name

type names t t

but also with its decorated comp onents and

s s

sub comp onents A structure expression in

unit int bool

the form of a functor application do es not in

itself reveal the comp onents of the resulting Two structures share if they are deco

structure However to keep decorations to a rated by the same structure name two types

minimum we shall usually not sp ell out the share if they are decorated by the same type

decorated sub comp onents name

Decorating Signatures at runtime Therefore when decorating sig

natures one must make sure that if two struc

tures are made to share by b eing given the

signature StackSig

(m1s1s2)

same name then any type or structure which

sig

m1

is visible in b oth structures must b e made to

type elt

s1

share as well

type stack

s2

Exercise Consider the following signa

val new unitstack

unit !s2

tures most of which you have already seen in

end

Lecture

signature TranspSig

(m1s1)

sig

m1

signature ValSig

type elt

s1

sig

type stack

t1

type value

sharing type stack Stack stack

t1 t1

end

val new unitstack

unit !t1

end

signature SymSig

sig

eqtype sym

Bound names are collected at the signature

val hash symint

identier They are listed b etween paren

end

thesis to indicate that they are merely place

holders

signature LexSig

The b ound names of StackSig are m s

sig

and s The free names of StackSig are unit

structure Sym SymSig

and The b ound names of TranspSig are

val getsym unitSym sym

m and s The free names of TranspSig are

end

t unit and

Exercise Consider

signature SymTblSig

sig

structure Val ValSig

signature Symbol

structure Sym SymSig

sig

type table

type symbol

val lookup

type value

table Sym sym Val value

sharing type value int

end

end

Decorate this signature declaration with type

signature ParseSig

and structure names How many b ound

sig

names are there How many free

structure Lex LexSig

structure Tbl SymTblSig

If two structures are found to share by the

sharing Lex Sym Tbl Sym

static analysis then they really are the same

Signature Instantiation

type abstsyn

val parse unitabstsyn

end

structure Stack

n1

struct

type elt int

int

Decorate these signatures When one signa

datatype stack ST of elt list ref

t1

ture refers to another for instance LexSig

fun new STref

unit !t1

refers to SymSig you should put a full deco

end

ration on the structure identier Sym ie a

decoration which shows b oth a name and the

signature StackSigA

(m1s1s2)

sub comp onents of the structure Full decora

sig

m1

tions can b e drawn as trees in the example

type elt

s1

at hand you can decorate Sym by

datatype stack ST of elt list ref

s2

m

val new unitstack

unit !s2

sym hash

R

end

s s int

Make sure that you decorate shared substruc

tures for instance Sym in ParseSig consis

Note that if we substitute n for m int

tently so as to represent that sharing of two

for s and t for s in the decoration of

structures implies sharing of their substruc

StackSigA then we get the decoaration of

tures

Stack We say that Stack is an instance

of StackSig More generally we say that a

structure is an instance of a signature if the

decoration of the former is obtained from the

decoration of the latter by p erforming a sub

stitution of names for the b ound names of

the signature the free names of the signa

ture must b e left unchanged The pro cess of

substituting names for b ound names is called

realisation

ab ove is concerned with instantiating the

b ound names of the signature to the real

structure Stack

n1

names of the structure The second is con

struct

cerned with ignoring information in the struc

type elt int

int

ture which is not required by the signature

datatype stack ST of elt list ref

t1

fun new STref

unit !t1

end

structure Tree

n1

struct

signature StackSigB

(m1s1)

datatype a tree LEAF of a

t1

sig

m1

NODE of a tree a tree

type elt

s1

type intTree int tree

int t1

datatype stack ST of elt list ref

t1

fun maxaint bint

sharing type stack Stack stack

t1 t1

if a b then a else b

val new unitstack

unit !t1

fun depth LEAF

0

a t1!int

end

depthNODEleftright

maxdepth left depth right

end

Stack is an instance of StackSigB via the re

alisation fm n s int g

signature TreeSig

(m1s1s2)

sig

m1

type a tree

structure OddStr

s1

n1

type intTree

struct

s2

fun depth intTreeint

type elt int

int

s2int

end

val test false

bool

end

signature WrongSig

(m1s1)

A structure matches a signature if the struc

sig

m1

ture can b e cut down to an instance of the

type elt

s1

signature by

val test elt

s1

end

forgetting comp onents

forgetting p olymorphism of variables

OddStr is not an instance of WrongSig for

Tree matches TreeSig First p erform the re

s would have to b e realised by int b ecause

alisation fm n s t s int tg on

of elt but then test is decorated by int in the

the signature The resulting decoration can

signature and by bool in the structure

b e obtained from the decoration of Tree by

forgetting the constructors LEAF and

Signature Matching

NODE

Matching of a structure against a signa

0

a t int to int t int instantiating ture is a combination of two op erations

ie the realisation of s int The rst signature instantiation describ ed

Exercise Let mytype b e a type which

signature SymSig

is declared in a structure and sp ecied in a

sig

signature In which of the following cases can

type sym

the structure match the signature

type code

mytype is declared as a datatype and

sharing type code int

sp ecied as a datatype

val hash symint

mytype is declared as a datatype and

val mksym stringsym

sp ecied as a type

val nameof symstring

mytype is declared as a type and sp ec

end

ied as a type

mytype is declared as a type and sp ec

structure Sym SymSig

ied as a datatype

struct

datatype sym SYM of string int

type code int

Signature Constraints

fun converts string code

fun hashSYMs n n

structure Tree TreeSig

fun mksyms SYMs convert s

struct end

fun nameofSYMs s

end

In a structure declaration with an explicit sig

nature constraint the resulting view of the

Exercise Complete the declaration of

declared structure is precisely the one given

convert

by the instantiated signature

In the example ab ove the resulting view

Exercise Declare a dierent structure

of Tree will hide the constructors NODE and

NewSym also constrained by SymSig such

LEAF and the function max Notice how

that NewSym sym shares with string Which

ever that intTree is decorated by the instance

of the following expressions are valid

of s ie by int t where t is the decora

a NewSym mksym d

0

tion of a tr ee Consequently Tree intTree

a Sym mksym d

and int Tree tree now mean the same thing

namely int t This sharing was obtained

through relisation it was not explicit in

TreeSig

In short explicit signature constraints can

remove comp onents and p olymorphism but

they do not aect existing sharing

Here is an example of a structure declared

with an explicit signature constraint You

should convince youself that the structure re

ally do es match the signature

Decorating Functors

Dynamically, the body of a functor is not evaluated when the functor is declared, but it is evaluated once for each time the functor is applied.

  functor StackFct() =
  struct
    datatype stack = ST of int list ref
    val data = ST(ref [])
  end

  structure Stack1 = StackFct()
  structure Stack2 = StackFct()

Since the two applications of StackFct create two distinct references, Stack1 and Stack2 are different and must not be seen to share.

Now let us consider the problem of decorating StackFct with names. We start out by decorating the body in the usual way. However, each time we need a fresh name, we record it at the = in the first line. The names that hence are accumulated are called the generative names of the functor. The generative names are bound, in the sense that they stand as place holders for fresh names which we choose when we eventually apply the functor. Like the bound names in signatures, we write generative names between parentheses; unlike the bound names of signatures, generative names are written on the right, because they concern the right side only.

In the case of nullary functors, i.e. functors that take an empty argument, the structure resulting from a functor application is decorated by taking the decoration of the functor body with each generative name replaced by a fresh name.

  functor StackFct() =^(m1,s1)
  struct^m1
    datatype stack^s1 = ST of int list ref
    val data^s1 = ST(ref [])
  end

  structure Stack1^n7 = StackFct()
  structure Stack2^n8 = StackFct()

Exercise. The decorations of Stack1 and Stack2 show the topmost structure name only. Complete the decorations.

Notice that Stack1 and Stack2 do not share; not even the types Stack1.stack and Stack2.stack share. Consequently, the variables Stack1.data and Stack2.data have different types, and so the type checker prevents one from mistaking the one for the other.

External Sharing

Within the body of a functor, one may refer to identifiers of any kind declared in the context of the functor. Such identifiers are said to occur free in the functor. This results in external sharing, i.e. a decoration in which some of the names stem from outside the functor.
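The generative behaviour of functor application described under "Decorating Functors" can be observed directly. In this sketch the structure names Stack1 and Stack2 and the helper functions push and contents are illustrative assumptions; the functor body otherwise follows the StackFct example.

```sml
functor StackFct () =
struct
  datatype stack = ST of int list ref
  val data = ST (ref [])
  (* Hypothetical helpers for observing the reference held in data. *)
  fun push n = let val ST r = data in r := n :: !r end
  fun contents () = let val ST r = data in !r end
end

(* Each application evaluates the body again: two references, two types. *)
structure Stack1 = StackFct ()
structure Stack2 = StackFct ()

val () = Stack1.push 1
val c1 = Stack1.contents ()   (* [1] *)
val c2 = Stack2.contents ()   (* []  *)

(* Rejected by the type checker, since Stack1.stack and Stack2.stack
   are different types:
     val bad = [Stack1.data, Stack2.data]                            *)
```

Pushing onto Stack1 leaves Stack2 untouched, which is why the static semantics must not let the two structures, or their stack types, be seen to share.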

Functors with Arguments

structure MyPervasives

struct

n1

signature SymSig

(m1s1)

datatype num NUM of int

t1

sig

m1

eqtype sym

s1

end

end

0

functor StackFct

(m2)

functor SymDirSym SymSig

(m2s2)

struct

m2

struct

m2

structure MyPer MyPervasives

n1

datatype dir DIR of

s2

type stack

t1 list ref

Sym symint

MyPer num list ref

fun update

val data stack ref

t1 list ref

end

end

0 0

structure Stack StackFct

n8

0 0

When decorating the b o dy of a functor which

structure Stack StackFct

n9

has an argument we assume that we have a

structure by the name of the formal param

eter which precisely matches the parameter

signature We assume neither more comp o

Notice that external names are not genera

nents nor more sharing than is sp ecied in

tive they are left unchanged when the func

the signature for we want the functor to b e

tor is applied

applicable to all actual argument structures

Exercise Which of the following shar

that match the formal parameter signature

ing equations hold

In the ab ove example we simply assume

0 0

Stack Stack

that the name of Sym is m and that the

0 0

Stack MyPer Stack MyPer

name of Sym sym is s as those names are

0 0

type Stack stack Stack stack

not used of free structures elsewhere in gen

eral one might have to rename some of the

b ound names Having used m and s we

simply start generating names from m and

s in the b o dy

structure Actual

n1

struct type sym string

string

end

structure Result SymDirActual

n2

Same question but for the earlier de Result receives a fresh structure name and

nition of SymDir Result dir a fresh type name Note that

Actual matches SymSig

A full decoration of the result of applying

a functor with one argument can b e obtained

as follows

Sharing Between Argu

ment and Result

Match the actual argument against the

formal parameter signature yielding a re

signature SymSig

(m1s1)

alisation which maps b ound names to

sig

formal signature to names in the actual

m1

eqtype sym

argument

s1

end

Apply this realisation to the decoration

of the functor b o dy

functor SymDirSym SymSig

(m2)

struct

m2

also substitute fresh names for the gen

type dir Sym symint

s1!int

erative names of the functor b o dy

fun update

end

Explicit Result Signa

tures

When a functor declaration contains a result

The type name s is shared b etween the ar

signature the decoration of the functor dec

gument and the b o dy When the functor is

laration pro ceeds as follows

applied this sharing must b e translated into

sharing b etween the actual argument and the

decorate the functor without the result

actual result

signature

decorate the result signature If one can

structure Actual

n1

get an instance of the result signature

struct type sym string

string

by removing comp onents and p olymor

end

phicm from the decorated b o dy then

this instance is used as a formal result

structure Result SymDirActual

n2

instead of the decorated b o dy otherwise

the declaration is rejected

This has the eect that up on application of

Exercise Complete the decoration of

the functor sharing is propagated as b efore

Result

but only the comp onents and p olymorphism

of the result signature are visible in the actual

Exercise Using the latest denition

result

of SymDir is the following expression legal

Exercise Why is it not always the case

fn dResult dir d ab c that obtaining R by

functor F S SIG SIG

struct end

structure R F

is equivalent to obtaining R by

functor F S SIG

struct end

structure R SIG F

Give a condition on SIG under which this

dierence disapp ears

Implementing an Interpreter in ML

The purpose of this lecture is to show a worked example of program development using ML modules. We shall tackle the problem of implementing a small ML system. The system is, of course, going to be considerably simplified compared to a real ML implementation. We implement only a few of the language constructs found in real ML. The user of our system will not get the ability to declare new types and data types; however, there will be arithmetic on the built-in integers, if-then-else expressions and indeed lists, higher-order functions and recursion, so it is far from a trivial language. We shall refer to this language as Mini ML.

Moreover, the system will be an interpreter rather than a compiler. It still has a type checker; indeed, we shall see how one can implement a restricted form of polymorphism.

The system is actually running and you can modify and extend it, provided you have access to an implementation and to the files listed in Appendix B. To make life easier for you, we provide a parse functor which can parse a string (the Mini ML source expression) into an abstract syntax tree, the shape of which will be defined below. The rest of the interpreter works on abstract syntax trees.

The interpreter uses a typechecker to check the validity of input expressions and an evaluator to evaluate them. Initially, the typechecker and evaluator handle only a tiny subset of Mini ML. In this lecture I shall show how one can, in successive steps, extend the typechecker to handle polymorphic lists, variables and let expressions. In the practical sessions, you can extend the evaluator in the same manner; it is easier than extending the typechecker.

The typechecker and the evaluator can be developed independently, as long as you do not change the signatures we provide. This will allow you to take the typechecker functors I have written and plug them into your own system as you improve the power of your evaluator. Alternatively, you might want to modify or extend my typechecker functors and take over evaluator functors that other people write.

The source of the bare interpreter is in Appendix A. An overview of how to run the systems is provided in Appendix B.

The development of the typechecker and the evaluator need not be in step. You can disable either by assigning false to one of the variables tc and eval:

  signature INTERPRETER =
  sig
    val interpret: string -> string
    val eval: bool ref
    and tc: bool ref
  end

The syntax of the language is as follows:

  exp ::= exp + exp
        | exp - exp
        | exp * exp
        | true
        | false
        | exp = exp
        | if exp then exp else exp
        | exp :: exp
        | [exp1, ..., expn]           (n >= 0)
        | let x = exp in exp
        | let rec x = exp in exp
        | x
        | fn x => exp
        | exp exp                     function application
        | n                           natural numbers
        | (exp)

The abstract syntax of Mini ML is defined as a datatype in the signature EXPRESSION.

Exercise. Find this signature. What is the constructor corresponding to let expressions?

We program with signatures and functors only. After the signatures, which we shall not yet study, the first functor is the interpreter itself.

Exercise. Find this functor. Find the application of Ty.prType. Find its type. What do you think Ty.prType is supposed to do? What is the type of abstsyn? What do you think the evaluator is supposed to do when asked to evaluate something which has not yet been implemented?

We shall now describe Version 1, the bare typechecker, and then proceed to the extensions.

VERSION 1: The bare Typechecker (Appendix A)

The first version is just able to type check integer constants and +. As signature TYPE reveals, the type Type of types is abstract, but there are functions we can use to build types and decompose them. unTypeInt is one of the latter; it is supposed to raise Type if applied to any Mini ML type different from int, however the type int is represented. This is a common way of hiding implementation details, and it might be helpful to look at how functor Type produces a structure which matches the signature TYPE.
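The idea of an abstract type of types with builder and decomposer functions can be sketched as follows. This is not the code of Appendix A: the exact contents of TYPE are not reproduced here, and mkTypeInt, mkTypeBool and the constructors INT and BOOL are assumptions made for the illustration. Only the names TYPE, Type and unTypeInt come from the text.

```sml
signature TYPE =
sig
  type Type                        (* representation not revealed *)
  exception Type
  val mkTypeInt  : unit -> Type    (* hypothetical builder *)
  val mkTypeBool : unit -> Type    (* hypothetical builder *)
  val unTypeInt  : Type -> unit    (* raises Type unless given int *)
end

functor Type () : TYPE =
struct
  (* One possible representation; clients cannot see INT and BOOL. *)
  datatype Type = INT | BOOL
  exception Type
  fun mkTypeInt ()  = INT
  fun mkTypeBool () = BOOL
  fun unTypeInt INT = ()
    | unTypeInt _   = raise Type
end

structure Ty = Type ()

val okInt  = (Ty.unTypeInt (Ty.mkTypeInt ());  true) handle Ty.Type => false
val okBool = (Ty.unTypeInt (Ty.mkTypeBool ()); true) handle Ty.Type => false
```

Since the signature specifies Type merely as a type, a client such as a typechecker functor can only build and take apart types through the functions provided, so the representation can later be changed without touching the client.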

As revealed by signature TYPECHECKER, the typechecker is going to depend on the abstract syntax and a Type structure. However, as you can see from the declaration of functor TypeChecker, all the typechecker knows about the implementation of types is what is specified by the signature TYPE. This allows us to experiment with the implementation of types (to obtain greater efficiency) without changing the typechecker, as we shall see in the later stages. As you see from functor TypeChecker, all the typechecker is capable of handling is integer constants and +.

Exercise. Modify the typechecker to handle true, false and multiplication of integers.

Given the signature and functor declarations in Appendix A, one can build the system. First we import the parser:

  use "parser.sml";

and then we build the system by the following declarations, which can be read from file build.sml:

  structure Expression = Expression()
  structure Parser = Parser(Expression)
  structure Value = Value()
  structure Evaluator =
    Evaluator(structure Expression = Expression
              structure Value = Value)
  structure Ty = Type()
  structure TyCh =
    TypeChecker(structure Ex = Expression
                structure Ty = Ty)
  structure Interpreter =
    Interpreter(structure Ty = Ty
                structure Value = Value
                structure Parser = Parser
                structure TyCh = TyCh
                structure Evaluator = Evaluator)
  open Interpreter

VERSION 2: Adding lists and polymorphism

The first extension is to implement the type checking of lists. In Version 1, the type of an expression could be inferred either directly (as in the case of true and false) or from the type of the subexpressions (as in the case of the arithmetic operations). When we introduce lists, this is no longer the case. Consider for example the expression

  if ... then ... else ...

Suppose we want to type check the equality test by first type checking the left subexpression, then the right subexpression, and finally checking that the left- and right-hand sides are of the same type before returning the type bool. The problem now is that when we try to type check [], we cannot know that this empty list is supposed to be an integer list. The typechecker therefore just ascribes the type 'a list to [], where 'a is a type variable. The other side, of course, turns out to be an int list. The typechecker now compares the two types 'a list and int list and discovers that they can be made the same by applying the substitution that maps 'a to int. Hence the type of the expression depends not just on the expression itself, but also on the context of the expression. The context can force the type inferred for the expression to become more specific.

This comparison of types p erformed by the typechecker is called unification and

is an algebraic op eration of great imp ortance in symbolic computing Indeed whole pro

gramming languages have evolved around the idea of unication for example

Here is a couple of examples to illustrate how unications works in the sp ecial case of

interest that of unifying types

Consider a two-element list whose first element is [] and whose second element is a list
containing a one-element integer list. This expression is well-typed. The point is that
the [] can be regarded as an int list list. Let us see how the typechecker manages to
infer the type int list list list for the whole expression.

The typechecker first rewrites the expression to the equivalent form built from :: and [].
Checking the first argument of the topmost ::, namely [], yields

    'a4 list

To check the second argument we first check its left-hand part. To check this we first
check its left-hand part. To check this we first check its left-hand part, an integer
constant, for which the typechecker wisely infers the type int. Continuing to the
right-hand part of the innermost ::, the [] gets the type 'a1 list. To check this ::
we now unify int list and 'a1 list, which results in the substitution

    S1 = {'a1 |-> int}

Thus the type of the innermost list is int list.

Returning to the enclosing ::, the right-hand [] first gets the type 'a2 list, which
by unification with int list list yields the substitution

    S2 = {'a2 |-> int list}

Thus the type of the enclosing list is int list list.

Returning to the second argument of the topmost ::, the right-hand [] gets the type 'a3
list, which by unification with int list list list yields the substitution

    S3 = {'a3 |-> int list list}

Thus the type of the second argument is int list list list.

Finally, returning to the first argument and the topmost ::, we get to unify 'a4 list with
int list list list, yielding the substitution

    S4 = {'a4 |-> int list list}

The type of the rewritten expression, and therefore the type of the original one, is thus
found to be int list list list.

Note that if the first element were an int list instead of [], the expression would NOT
be well-typed. In an attempt to compute S4 we would now be unifying int list list and
int list list list, and that gives a unification error.
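The unification steps described above can be replayed with a small, self-contained sketch. It is not the course's Unify functor, whose source is not shown in these notes: the datatype ty, the integer-numbered type variables and the functions single, occurs and unify are names assumed here for illustration, and the code is written for SML '97. Because every type constructor in this fragment takes at most one argument, a single substitution suffices as the result and no composition is needed.

```sml
(* A sketch of unification on Mini ML types; all names are illustrative. *)
datatype ty = INT | BOOL | LIST of ty | TYVAR of int

exception Unify

(* A substitution is represented as a function from types to types. *)
type subst = ty -> ty

val id : subst = fn t => t

(* The singleton substitution that maps type variable tv to t. *)
fun single (tv, t) : subst =
  let fun su INT = INT
        | su BOOL = BOOL
        | su (LIST u) = LIST (su u)
        | su (TYVAR tv') = if tv = tv' then t else TYVAR tv'
  in su end

(* Does type variable tv occur in the given type? *)
fun occurs (tv, INT) = false
  | occurs (tv, BOOL) = false
  | occurs (tv, LIST u) = occurs (tv, u)
  | occurs (tv, TYVAR tv') = tv = tv'

(* unify (t1, t2) returns S with S t1 = S t2, or raises Unify. *)
fun unify (INT, INT) = id
  | unify (BOOL, BOOL) = id
  | unify (LIST t1, LIST t2) = unify (t1, t2)
  | unify (TYVAR a, TYVAR b) = if a = b then id else single (a, TYVAR b)
  | unify (TYVAR a, t) = if occurs (a, t) then raise Unify else single (a, t)
  | unify (t, TYVAR a) = unify (TYVAR a, t)
  | unify _ = raise Unify

(* Unify 'a4 list with int list list list, as in the last step above. *)
val s4 = unify (LIST (TYVAR 4), LIST (LIST (LIST INT)))
val t4 = s4 (TYVAR 4)          (* LIST (LIST INT), i.e. int list list *)
```

Unifying LIST (LIST INT) with LIST (LIST (LIST INT)) in this sketch eventually compares INT with LIST INT and raises Unify, which corresponds to the unification error mentioned above.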

To implement all this, we first extend the TYPE signature and introduce a new signature
UNIFY:

    signature TYPE =
    sig
      ...
      eqtype tyvar
      val freshTyvar: unit -> tyvar
      val mkTypeTyvar: tyvar -> Type
      and unTypeTyvar: Type -> tyvar
      val mkTypeList: Type -> Type
      and unTypeList: Type -> Type
      type subst
      val Id: subst                      (* the identity substitution *)
      val mkSubst: tyvar * Type -> subst (* make singleton substitution *)
      val on: subst * Type -> Type       (* application *)
      val prType: Type -> string         (* printing *)
    end

    signature UNIFY =
    sig
      structure Type: TYPE
      exception NotImplemented of string
      exception Unify
      val unify: Type.Type * Type.Type -> Type.subst
    end

The nice thing is that we can extend the typechecker without knowing anything about
the inner workings of unification, simply by including a formal parameter of signature
UNIFY in the typechecker functor:

    functor TypeChecker
      (...
       structure Ty: TYPE
       structure Unify: UNIFY
         sharing Unify.Type = Ty
      ) =
    struct
      infix on
      val op on = Ty.on
      ...
      fun tc (exp: Ex.Expression): Ty.Type =
        (case exp of
           ...
         | Ex.LISTexpr [] =>
             let val new = Ty.freshTyvar ()
              in Ty.mkTypeList(Ty.mkTypeTyvar new)
             end
         | Ex.CONSexpr(e1, e2) =>
             let val t1 = tc e1
                 val t2 = tc e2
                 val new = Ty.freshTyvar ()
                 val newt = Ty.mkTypeTyvar new
                 val t2' = Ty.mkTypeList newt
                 val S1 = Unify.unify(t2, t2')
                          handle Unify.Unify =>
                            raise TypeError(e2, "expected list type")
                 val S2 = Unify.unify(S1 on newt, S1 on t1)
                          handle Unify.Unify =>
                            raise TypeError(exp,
                              "element and list have different types")
              in S2 on (S1 on t2)
             end
           ...
        ) handle Unify.NotImplemented msg => raise NotImplemented msg
      ...
    end (* TypeChecker *)

We also have to extend the Type functor to meet the enriched TYPE signature. The
easiest way of doing this is:

    functor Type(): TYPE =
    struct
      type tyvar = int
      val freshTyvar =
        let val r = ref 0 in fn () => (r := !r + 1; !r) end
      datatype Type = INT
                    | BOOL
                    | LIST of Type
                    | TYVAR of tyvar
      ...
      fun mkTypeTyvar tv = TYVAR tv
      and unTypeTyvar(TYVAR tv) = tv
        | unTypeTyvar _ = raise Type
      fun mkTypeList(t) = LIST t
      and unTypeList(LIST t) = t
        | unTypeList(_) = raise Type
      type subst = Type -> Type
      fun Id x = x
      fun mkSubst(tv, ty) =
        let fun su(TYVAR tv') = if tv = tv' then ty else TYVAR tv'
              | su(INT) = INT
              | su(BOOL) = BOOL
              | su(LIST ty') = LIST (su ty')
         in su
        end
      fun on(S, t) = S(t)
      fun prType ...
        | prType (LIST ty) = prType ty ^ " list"
        | prType (TYVAR tv) = "'a" ^ makestring tv
    end

Exercise: Extend Version 2 to handle equality. All you have to do is to fill in the
relevant case in the definition of the function tc. See Appendix B about how you get the
source of Version 2.

VERSION 3: A different implementation of types

Version 3 arises from Version 2 by replacing the Type functor by a different implementation
of types. The idea is that instead of having substitutions as functions, we can implement
type variables by references (pointers) and then do substitutions directly by assignments.
In case you have not seen the reserved word withtype before: withtype is used to
declare a type abbreviation locally within a datatype declaration.

    functor ImpType(): TYPE =
    struct
      datatype 'a option = NONE | SOME of 'a
      datatype Type = INT
                    | BOOL
                    | LIST of Type
                    | TYVAR of tyvar
      withtype tyvar = Type option ref
      type tyvar = Type option ref
      fun freshTyvar() = ref NONE
      exception Type
      fun mkTypeInt() = INT
      and unTypeInt(INT) = ()
        | unTypeInt(TYVAR(ref (SOME t))) = unTypeInt t
        | unTypeInt _ = raise Type
      ...
      type subst = unit
      val Id = ()
      exception MkSubst
      fun mkSubst(tv, ty) =
        case tv of
          ref(NONE) => (tv := SOME ty)
        | ref(SOME t) => raise MkSubst
      fun on(S, t) = t
      fun prType ...
        | prType (TYVAR (ref NONE)) = "'a"
        | prType (TYVAR (ref (SOME t))) = prType t
    end
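The idea of doing substitutions by assignment can be illustrated in isolation. The following is a minimal sketch for SML '97, independent of the functor above and of the course's unification functor: the datatype ty and the functions chase, occurs, unify and resolve are names assumed for this example, and option is taken from the Basis Library rather than redeclared.

```sml
(* Type variables as references; all names are illustrative. *)
datatype ty = INT | BOOL | LIST of ty | TYVAR of ty option ref

exception Unify

(* Follow assignments until an unbound variable or a constructor. *)
fun chase (TYVAR (ref (SOME t))) = chase t
  | chase t = t

(* Does the unbound variable r occur in the type? *)
fun occurs (r, t) =
  case chase t of
    LIST u => occurs (r, u)
  | TYVAR r' => r = r'
  | _ => false

(* Unify by side effect: binding a variable is an assignment. *)
fun unify (t1, t2) =
  case (chase t1, chase t2) of
    (INT, INT) => ()
  | (BOOL, BOOL) => ()
  | (LIST u1, LIST u2) => unify (u1, u2)
  | (TYVAR r1, TYVAR r2) => if r1 = r2 then () else r1 := SOME (TYVAR r2)
  | (TYVAR r, t) => if occurs (r, t) then raise Unify else r := SOME t
  | (t, TYVAR r) => if occurs (r, t) then raise Unify else r := SOME t
  | _ => raise Unify

(* Read back a type with all assigned variables replaced. *)
fun resolve t =
  case chase t of
    LIST u => LIST (resolve u)
  | t' => t'

val a = TYVAR (ref NONE)                  (* a fresh type variable *)
val () = unify (LIST a, LIST (LIST INT))  (* assigns int list to a *)
val ta = resolve a                        (* LIST INT *)
```

Note that no substitution value is returned: after unify has run, every type that mentions the variable sees the assignment, which is why the type subst can be unit in this style.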

We can now build two systems at the same time and compare the efficiency of the
two implementations. The nice thing is that we do not have to modify the typechecker
functor at all, nor do we even have to modify the unification functor; we can just extend
the final sequence of structure declarations to use both implementations of types.

Exercise: When I did this, I found to my surprise that the functional version in
some cases was twice as fast as, and never slower than, the imperative variant. The relative
performance of the two varies greatly from expression to expression. Can you find an
expression for which the imperative version really is faster? See Appendix B for how to
get hold of the source of Version 3. Be careful with generating very demanding tasks for
the ML system; you can make it crash.

ML implementors normally opt for the imperative version. In all fairness, the above
comparison ignores that composing substitutions is much easier in the imperative version
than it is in the applicative version; in the fragment of Mini ML considered so far we
have not had to compose substitutions.
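To make the remark about composition concrete, here is a minimal sketch of what composing substitutions looks like when substitutions are functions. It assumes a stand-alone datatype ty and the helper names single and compose, none of which come from the course sources, and is written for SML '97.

```sml
(* Composition of functional substitutions; all names are illustrative. *)
datatype ty = INT | BOOL | LIST of ty | TYVAR of int

type subst = ty -> ty

(* The singleton substitution that maps type variable tv to t. *)
fun single (tv, t) : subst =
  let fun su INT = INT
        | su BOOL = BOOL
        | su (LIST u) = LIST (su u)
        | su (TYVAR tv') = if tv = tv' then t else TYVAR tv'
  in su end

(* compose (s2, s1) is the substitution "first s1, then s2". *)
fun compose (s2 : subst, s1 : subst) : subst = s2 o s1

val s1 = single (1, LIST (TYVAR 2))   (* 'a1 |-> 'a2 list *)
val s2 = single (2, INT)              (* 'a2 |-> int      *)
val t  = compose (s2, s1) (TYVAR 1)   (* LIST INT         *)
```

In this representation the composition has to be built and passed around explicitly, and the order matters: compose (s1, s2) applied to TYVAR 1 gives LIST (TYVAR 2). Each application also traverses the type once per composed substitution. With reference-based type variables the effect of earlier unifications is already visible through the assignments, so there is nothing to compose.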

One should not be too concerned with performance issues at too early a stage. It can
be surprisingly difficult to predict where efficiency is most needed, and it is much more
important at first to get the overall structure of the system right. It was important,
for example, that we did NOT make the constructors of the datatype Type visible in the
signature TYPE and that we wrote the unification algorithm in a way which does not use
the internal structure of Type. Had we not done this, we would not have been able to
switch from one implementation to another that easily, and therefore chances are that we
would have chosen the imperative one, assuming that it was the more efficient one, without
ever trying the obvious applicative implementation.

VERSION 4: Introducing variables and let

We now extend Version 3 by implementing the type checking of let expressions and of
identifiers.

The typechecker function tc now has to take TWO arguments

    tc(TE, e)

where e is an expression and TE is a type environment which maps variables occurring
free in e to type schemes. The definition of what a type scheme is will be given below;
for now it suffices to know that every type can be regarded as a type scheme.

To take an example, if TE maps x to int and y to int, then tc will deduce the type
int for the expression x+y. However, if TE mapped y to bool, there would be a type
error.

The fact that we can bind variables to expressions whose types have been inferred to
contain type variables means that we get type variables in the type environment. For
instance, to type check

let x in x end

we first check [], yielding the type 'a list, say. Then we bind x to the type scheme
∀'a.'a list. Here the binding ∀'a of 'a indicates that when we look up the type
of x in the type environment, we return a type obtained from the type scheme ∀'a.'a
list by instantiating the bound variables (here just 'a) by fresh type variables. In our
example, when we look up x in the type environment during the checking of the body, we
instantiate 'a to a fresh type variable, 'a1 say, yielding the type 'a1 list for x. Thus
we get to unify int list against 'a1 list, yielding the substitution of int for 'a1.

Throughout the body of the let, x will be bound to ∀'a.'a list in the type
environment. Since we take a fresh instance of this type scheme each time we look up x,
we can use x both as an int list and as an int list list, say:

let x in xx end

Exercise: Assuming that you instantiate the bound 'a to 'a2 when you meet the last
occurrence of x, what two types should be unified, and what is the resulting substitution
on 'a2?

The variable x is an example of polymorphism: after x has been declared, an
occurrence of x can potentially be given infinitely many types (int list, bool list, int
list list, and so on), all captured by the type scheme ∀'a.'a list. In ML a type
scheme always takes the form $\forall\alpha_1 \cdots \alpha_n.\tau$, where $\alpha_1, \ldots, \alpha_n$ are type variables
and $\tau$ is a type. In the fragment of Mini ML considered so far it will always be the case
that any type variable occurring in $\tau$ is amongst the $\alpha_1, \ldots, \alpha_n$, but when one introduces
functions and application this is no longer the case.
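The effect of taking a fresh instance at every lookup can be tried out with a small stand-alone sketch. It assumes a simplified datatype ty with integer-numbered type variables and the names scheme, fresh, close and instance, chosen for this example only, and is written for SML '97. As in the fragment of Mini ML considered so far, close quantifies every type variable of the type.

```sml
(* Type schemes with fresh instantiation; all names are illustrative. *)
datatype ty = INT | BOOL | LIST of ty | TYVAR of int
datatype scheme = FORALL of int list * ty

(* A counter supplying fresh type variables 1, 2, 3, ... *)
local val counter = ref 0
in fun fresh () = (counter := !counter + 1; !counter) end

(* Quantify every type variable occurring in the type. *)
fun close t =
  let fun fv INT = []
        | fv BOOL = []
        | fv (LIST u) = fv u
        | fv (TYVAR a) = [a]
  in FORALL (fv t, t) end

(* Replace each bound variable by a fresh one. *)
fun instance (FORALL (bound, t)) =
  let val renaming = map (fn a => (a, fresh ())) bound
      fun rename a =
        case List.find (fn (old, _) => old = a) renaming of
          SOME (_, new) => new
        | NONE => a
      fun inst INT = INT
        | inst BOOL = BOOL
        | inst (LIST u) = LIST (inst u)
        | inst (TYVAR a) = TYVAR (rename a)
  in inst t end

(* The scheme for x after "let x = []": every lookup gives a new variable. *)
val sigma = close (LIST (TYVAR 0))
val first = instance sigma     (* LIST (TYVAR 1) *)
val second = instance sigma    (* LIST (TYVAR 2) *)
```

Because the two instances mention different type variables, one of them can later be unified with int list and the other with int list list without conflict.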

Here is how we implement variables and let. We first extend the TYPE signature:

    signature TYPE =
    sig
      ...
      type TypeScheme
      val instance: TypeScheme -> Type
      val close: Type -> TypeScheme
    end

Version 1 (Appendix A) already contains a signature for environments; find it. It
was actually intended for the practical, where you need it to extend the evaluator, but we
can make use of it to implement type environments. The signature of the typechecker
can be left unchanged, but we need to change the functor that builds the typechecker by
including the environment management among the formal parameters:

    functor TypeChecker
      (structure Ex: EXPRESSION
       structure Ty: TYPE
       structure Unify: UNIFY
         sharing Unify.Type = Ty
       structure TE: ENVIRONMENT
      ) =
    struct
      infix on
      val op on = Ty.on
      structure Exp = Ex
      structure Type = Ty
      exception NotImplemented of string
      exception TypeError of Ex.Expression * string

      fun tc (TE: Ty.TypeScheme TE.Environment, exp: Ex.Expression): Ty.Type =
        (case exp of
           Ex.BOOLexpr b => Ty.mkTypeBool()
         | Ex.NUMBERexpr _ => Ty.mkTypeInt()
         | Ex.SUMexpr(e1, e2) => checkIntBin(TE, e1, e2)
         | Ex.DIFFexpr(e1, e2) => checkIntBin(TE, e1, e2)
         | Ex.PRODexpr(e1, e2) => checkIntBin(TE, e1, e2)
         | Ex.LISTexpr [] =>
             let val new = Ty.freshTyvar ()
              in Ty.mkTypeList(Ty.mkTypeTyvar new)
             end
         | Ex.LISTexpr(e::es) => tc (TE, Ex.CONSexpr(e, Ex.LISTexpr es))
         | Ex.CONSexpr(e1, e2) =>
             let val t1 = tc(TE, e1)
                 val t2 = tc(TE, e2)
                 val new = Ty.freshTyvar ()
                 val newt = Ty.mkTypeTyvar new
                 val t2' = Ty.mkTypeList newt
                 val S1 = Unify.unify(t2, t2')
                          handle Unify.Unify =>
                            raise TypeError(e2, "expected list type")
                 val S2 = Unify.unify(S1 on newt, S1 on t1)
                          handle Unify.Unify =>
                            raise TypeError(exp,
                              "element and list have different types")
              in S2 on (S1 on t2)
             end
         | Ex.EQexpr _ => raise NotImplemented "equality"
         | Ex.CONDexpr _ => raise NotImplemented "conditional"
         | Ex.DECLexpr(x, e1, e2) =>
             let val t1 = tc(TE, e1)
                 val typeScheme = Ty.close(t1)
              in tc(TE.declare(x, typeScheme, TE), e2)
             end
         | Ex.RECDECLexpr _ => raise NotImplemented "rec decl"
         | Ex.IDENTexpr x =>
             (Ty.instance(TE.retrieve(x, TE))
              handle TE.Retrieve _ =>
                raise TypeError(exp, "identifier " ^ x ^ " not declared"))
         | Ex.LAMBDAexpr _ => raise NotImplemented "function"
         | Ex.APPLexpr _ => raise NotImplemented "application"
        ) handle Unify.NotImplemented msg => raise NotImplemented msg

      and checkIntBin(TE, e1, e2) =
        let val t1 = tc(TE, e1)
            val _ = Ty.unTypeInt t1
                    handle Ty.Type => raise TypeError(e1, "expected int")
            val t2 = tc(TE, e2)
            val _ = Ty.unTypeInt t2
                    handle Ty.Type => raise TypeError(e2, "expected int")
         in Ty.mkTypeInt()
        end

      fun typecheck(e) = tc(TE.emptyEnv, e)
    end (* TypeChecker *)

Then we extend the Type functor to match the TYPE signature:

    functor Type(): TYPE =
    struct
      ...
      datatype TypeScheme = FORALL of tyvar list * Type

      fun instance(FORALL(tyvars, ty)) =
        let val oldtonewtyvars = map (fn tv => (tv, freshTyvar())) tyvars
            exception Find
            fun find(tv, []) = raise Find
              | find(tv, (tv', newtv)::rest) =
                  if tv = tv' then newtv else find(tv, rest)
            fun tyinstance INT = INT
              | tyinstance BOOL = BOOL
              | tyinstance (LIST t) = LIST(tyinstance t)
              | tyinstance (TYVAR tv) =
                  TYVAR(find(tv, oldtonewtyvars)
                        handle Find => tv)
         in
            tyinstance ty
        end

      fun close(ty) =
        let fun fv(INT) = []
              | fv(BOOL) = []
              | fv(LIST t) = fv(t)
              | fv(TYVAR tv) = [tv]
         in FORALL(fv ty, ty)
        end
    end

Finally the system is rebuilt as in Version 3, except that we have to provide and link
in an Environment functor which matches ENVIRONMENT.
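One possible shape for such a functor is sketched below, for SML '97. This is an assumption rather than the Environment functor distributed with the course: it represents an environment as an association list, takes the argument of Retrieve to be the name that was not found, and repeats the ENVIRONMENT signature of Appendix A only so that the fragment is self-contained.

```sml
(* A hypothetical association-list environment matching ENVIRONMENT. *)
signature ENVIRONMENT =
sig
  type 'object Environment
  exception Retrieve of string
  val emptyEnv : 'object Environment
  val declare : string * 'object * 'object Environment -> 'object Environment
  val retrieve : string * 'object Environment -> 'object
end

functor Environment () : ENVIRONMENT =
struct
  type 'object Environment = (string * 'object) list
  exception Retrieve of string
  val emptyEnv = []
  (* A new binding is put in front, so it shadows older ones. *)
  fun declare (name, obj, env) = (name, obj) :: env
  fun retrieve (name, []) = raise Retrieve name
    | retrieve (name, (name', obj) :: rest) =
        if name = name' then obj else retrieve (name, rest)
end

structure Env = Environment ()
val env = Env.declare ("x", 1, Env.declare ("y", 2, Env.emptyEnv))
val one = Env.retrieve ("x", env)
```

Since the environment is polymorphic in the objects it stores, the same functor can serve the typechecker (storing type schemes) and the evaluator (storing values).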

Exercise: Extend Version 4 with if then else. This extension has no
subtle implications for the type checking.

Exercise (for the extra keen): Extend Version 4 to cope with lambda abstraction
(fn) and application. First you have to introduce arrow types with constructors and
destructors. Then you have to change the type of close so that it takes two arguments,
namely a type environment and a type. It should return the type scheme that is obtained
by quantifying on all the type variables that occur in the type but do not occur free in
the type environment.

Then you can modify the type checker. When you type check a lambda abstraction,
you just bind the formal parameter to the trivial type scheme which is just a fresh type
variable (no quantified variables). Thus the type environment can now contain type
schemes with free type variables.

An application tc(TE, e) now yields two results, namely a type t and a substitution S;
the idea is that if you apply the substitution S to the type environment TE (which
now can contain free type variables), the expression e has the type t. When an expression
consists of more than one subexpression, the type environment gradually becomes more
and more specific by applying the substitutions produced by the checking of the
subexpressions, one by one. Moreover, the substitution returned from the whole expression is
the composition of these individual substitutions. You have to extend the TYPE signature
and the Type functor with composition of substitutions.

Finally you can extend the unification algorithm to cope with arrow types. This will
also use composition of substitutions.

Acknowledgement

The parser and evaluator and all the signatures related to them are due to Nick Rothwell.

Appendix A: The bare Interpreter

interpsml VERSION the bare interpreter

    signature INTERPRETER =
    sig
      val interpret: string -> string
      val eval: bool ref
      and tc  : bool ref
    end

    (* syntax *)

    signature EXPRESSION =
    sig
      datatype Expression =
          SUMexpr of Expression * Expression
        | DIFFexpr of Expression * Expression
        | PRODexpr of Expression * Expression
        | BOOLexpr of bool
        | EQexpr of Expression * Expression
        | CONDexpr of Expression * Expression * Expression
        | CONSexpr of Expression * Expression
        | LISTexpr of Expression list
        | DECLexpr of string * Expression * Expression
        | RECDECLexpr of string * Expression * Expression
        | IDENTexpr of string
        | LAMBDAexpr of string * Expression
        | APPLexpr of Expression * Expression
        | NUMBERexpr of int
    end

    (* parsing *)

    signature PARSER =
    sig
      structure E: EXPRESSION
      exception Lexical of string
      exception Syntax of string
      val parse: string -> E.Expression
    end

    (* environments *)

    signature ENVIRONMENT =
    sig
      type 'object Environment
      exception Retrieve of string
      val emptyEnv: 'object Environment
      val declare: string * 'object * 'object Environment
                   -> 'object Environment
      val retrieve: string * 'object Environment -> 'object
    end

    (* evaluation *)

    signature VALUE =
    sig
      type Value
      exception Value
      val mkValueNumber: int -> Value
      and unValueNumber: Value -> int
      val mkValueBool: bool -> Value
      and unValueBool: Value -> bool
      val ValueNil: Value
      val mkValueCons: Value * Value -> Value
      and unValueHead: Value -> Value
      and unValueTail: Value -> Value
      val eqValue: Value * Value -> bool
      val printValue: Value -> string
    end

    signature EVALUATOR =
    sig
      structure Exp: EXPRESSION
      structure Val: VALUE
      exception Unimplemented
      val evaluate: Exp.Expression -> Val.Value
    end

    (* type checking *)

    signature TYPE =
    sig
      type Type
      (* constructors and destructors *)
      exception Type
      val mkTypeInt: unit -> Type
      and unTypeInt: Type -> unit
      val mkTypeBool: unit -> Type
      and unTypeBool: Type -> unit
      val prType: Type -> string
    end

    signature TYPECHECKER =
    sig
      structure Exp: EXPRESSION
      structure Type: TYPE
      exception NotImplemented of string
      exception TypeError of Exp.Expression * string
      val typecheck: Exp.Expression -> Type.Type
    end

    (* the interpreter *)

    functor Interpreter
      (structure Ty: TYPE
       structure Value: VALUE
       structure Parser: PARSER
       structure TyCh: TYPECHECKER
       structure Evaluator: EVALUATOR
         sharing Parser.E = TyCh.Exp = Evaluator.Exp
             and TyCh.Type = Ty
             and Evaluator.Val = Value
      ): INTERPRETER =
    struct
      val eval = ref true    (* toggle for evaluation *)
      and tc = ref true      (* toggle for type checking *)

      fun interpret(str) =
        let val abstsyn = Parser.parse str
            val typestr = if !tc then
                            Ty.prType(TyCh.typecheck abstsyn)
                          else "disabled"
            val valuestr = if !eval then
                             Value.printValue(Evaluator.evaluate abstsyn)
                           else "disabled"
         in valuestr ^ " : " ^ typestr
        end
        handle Evaluator.Unimplemented =>
                 "Evaluator not fully implemented"
             | TyCh.NotImplemented msg =>
                 "Typechecker not fully implemented " ^ msg
             | Value.Value => "Run-time error"
             | Parser.Syntax msg => "Syntax Error: " ^ msg
             | Parser.Lexical msg => "Lexical Error: " ^ msg
             | TyCh.TypeError(_, msg) => "Type Error: " ^ msg
    end

    (* the evaluator *)

    functor Evaluator
      (structure Expression: EXPRESSION
       structure Value: VALUE): EVALUATOR =
    struct
      structure Exp = Expression
      structure Val = Value
      exception Unimplemented

      local
        open Expression Value
        fun evaluate exp =
          case exp
            of BOOLexpr b => mkValueBool b
             | NUMBERexpr i => mkValueNumber i
             | SUMexpr(e1, e2) =>
                 let val e1' = evaluate e1
                     val e2' = evaluate e2
                 in
                   mkValueNumber(unValueNumber e1' +
                                 unValueNumber e2')
                 end
             | DIFFexpr(e1, e2) =>
                 let val e1' = evaluate e1
                     val e2' = evaluate e2
                 in
                   mkValueNumber(unValueNumber e1' -
                                 unValueNumber e2')
                 end
             | PRODexpr(e1, e2) =>
                 let val e1' = evaluate e1
                     val e2' = evaluate e2
                 in
                   mkValueNumber(unValueNumber e1' *
                                 unValueNumber e2')
                 end
             | EQexpr _ => raise Unimplemented
             | CONDexpr _ => raise Unimplemented
             | CONSexpr _ => raise Unimplemented
             | LISTexpr _ => raise Unimplemented
             | DECLexpr _ => raise Unimplemented
             | RECDECLexpr _ => raise Unimplemented
             | IDENTexpr _ => raise Unimplemented
             | LAMBDAexpr _ => raise Unimplemented
             | APPLexpr _ => raise Unimplemented
      in
        val evaluate = evaluate
      end
    end

    (* the typechecker *)

    functor TypeChecker
      (structure Ex: EXPRESSION
       structure Ty: TYPE) =
    struct
      structure Exp = Ex
      structure Type = Ty
      exception NotImplemented of string
      exception TypeError of Ex.Expression * string

      fun tc (exp: Ex.Expression): Ty.Type =
        case exp of
          Ex.BOOLexpr b => raise NotImplemented "boolean constants"
        | Ex.NUMBERexpr _ => Ty.mkTypeInt()
        | Ex.SUMexpr(e1, e2) => checkIntBin(e1, e2)
        | Ex.DIFFexpr _ => raise NotImplemented "minus"
        | Ex.PRODexpr _ => raise NotImplemented "product"
        | Ex.LISTexpr _ => raise NotImplemented "lists"
        | Ex.CONSexpr _ => raise NotImplemented "lists"
        | Ex.EQexpr _ => raise NotImplemented "equality"
        | Ex.CONDexpr _ => raise NotImplemented "conditional"
        | Ex.DECLexpr _ => raise NotImplemented "declaration"
        | Ex.RECDECLexpr _ => raise NotImplemented "rec decl"
        | Ex.IDENTexpr _ => raise NotImplemented "identifier"
        | Ex.LAMBDAexpr _ => raise NotImplemented "function"
        | Ex.APPLexpr _ => raise NotImplemented "application"

      and checkIntBin(e1, e2) =
        let val t1 = tc e1
            val _ = Ty.unTypeInt t1
                    handle Ty.Type =>
                      raise TypeError(e1, "expected int")
            val t2 = tc e2
            val _ = Ty.unTypeInt t2
                    handle Ty.Type =>
                      raise TypeError(e2, "expected int")
         in Ty.mkTypeInt()
        end

      val typecheck = tc
    end (* TypeChecker *)

    (* the basics: nullary functors *)

    functor Type(): TYPE =
    struct
      datatype Type = INT
                    | BOOL
      exception Type
      fun mkTypeInt() = INT
      and unTypeInt(INT) = ()
        | unTypeInt(_) = raise Type
      fun mkTypeBool() = BOOL
      and unTypeBool(BOOL) = ()
        | unTypeBool(_) = raise Type
      fun prType INT = "int"
        | prType BOOL = "bool"
    end

    functor Expression(): EXPRESSION =
    struct
      type 'a pair = 'a * 'a
      datatype Expression =
          SUMexpr of Expression pair
        | DIFFexpr of Expression pair
        | PRODexpr of Expression pair
        | BOOLexpr of bool
        | EQexpr of Expression pair
        | CONDexpr of Expression * Expression * Expression
        | CONSexpr of Expression pair
        | LISTexpr of Expression list
        | DECLexpr of string * Expression * Expression
        | RECDECLexpr of string * Expression * Expression
        | IDENTexpr of string
        | LAMBDAexpr of string * Expression
        | APPLexpr of Expression * Expression
        | NUMBERexpr of int
    end

    functor Value(): VALUE =
    struct
      type 'a pair = 'a * 'a
      datatype Value = NUMBERvalue of int
                     | BOOLvalue of bool
                     | NILvalue
                     | CONSvalue of Value pair
      exception Value

      val mkValueNumber = NUMBERvalue
      val mkValueBool = BOOLvalue
      val ValueNil = NILvalue
      val mkValueCons = CONSvalue

      fun unValueNumber(NUMBERvalue(i)) = i
        | unValueNumber(_) = raise Value
      fun unValueBool(BOOLvalue(b)) = b
        | unValueBool(_) = raise Value
      fun unValueHead(CONSvalue(c, _)) = c
        | unValueHead(_) = raise Value
      fun unValueTail(CONSvalue(_, c)) = c
        | unValueTail(_) = raise Value

      fun eqValue(c1, c2) = (c1 = c2)

      (* Pretty-printing *)
      fun printValue(NUMBERvalue(i)) = makestring(i)
        | printValue(BOOLvalue(true)) = "true"
        | printValue(BOOLvalue(false)) = "false"
        | printValue(NILvalue) = "[]"
        | printValue(CONSvalue(cons)) =
            "[" ^ printValueList(cons) ^ "]"
      and printValueList(hd, NILvalue) = printValue(hd)
        | printValueList(hd, CONSvalue(tl)) =
            printValue(hd) ^ ", " ^ printValueList(tl)
        | printValueList(_) = raise Value
    end

Appendix B: Files

The following files are available in the directory usrcheopsmadscourse

interpsml Version as included in App endix A

interpsml interpsml The other versions

buildsml the structure declarations needed to build Version

buildsml buildsml Similarly for the other versions

parsersml The parser functor

To build Version 1, say, you type the following (assuming you have copied the files to
your directory):

use interpsml

use parsersml

use buildsml

Since the parser functor is completely closed, you don't have to include it more than
once in every session, although you will probably want to build your system several times
while you experiment with the extensions.