<<

On the Representation of Datatyp es in Isab elleHOL

N Volker

FernUniversitat Hagen Germany

Abstract

Representation of datatyp es is a necessary prerequisite if one wants to proverat

her than p ostulate the characteristic theorems of datatyp es This pap er intro duces

two notions of representation functions for typ es and shows how representations

of comp osed typ es can b e calculated from representations of their constituents

Together with a representation of basic typ es due to Paulson this provides a

basis for the mechanization of datatyp es in Isab elleHOL

Intro duction

Datatyp es are imp ortant ingredients of many theories mo delling computations We will

b e concerned here with datatyp es which are generated from a numb er of elements and

functions the constructors This means that every element of the typ e can b e written as

a constructor term ie as an application of constructors to constructors Furthermore

our datatyp es will b e freely generated by the constructors ie the constructors are

distinct and injective This implies that every element of the datatyp e is denoted by

a unique constructor term As a consequence suchtyp es enjoy a structural induction

theorem and allow the denition of functions by primitive recursion

In the categorical setting these typ es arise as initial elements in certain categories

of algebras Therefore one also sp eaks of initial algebras Other names used are

recursive or inductivetyp es without laws We will simply call them datatyp es

y destructor functions whose domain is The dual of typ es are characterized b

the carrier of the typ e An example for such a codatatyp e are innite lists Although

parts of our discussion should carry over to codatatyp es we will not consider them

here

Currently the Isab elleHOL system provides a datatyp e denition package which

hasanumb er of shortcomings

The prop erties of the new typ es are p ostulated axiomatically

The declaration of a typ e T maynotcontain applications of typ e op erators to

T This forbids denitions like

a b Tree LEAF a j NODE b a b Tree List

There is no supp ort for mutually recursive datatyp es

We will outline in this pap er an approach which aims to prove the characteristic theo

rems of the new datatyp e This is achieved by representing typ es by sets of a certain

typ e whichwas intro duced byPaulson for exactly this purp ose Our main contri

bution is the derivation of representations of typ e expressions from representations of

its constituents This construction based on a generalization of the map function from

lists to an arbitrary datatyp e

The typ e a b Tree mo dels trees with arbitrary nite branching and elements of

a resp b in the leafs resp no des This example is inspired from a slightly simpler

datatyp e in That article discusses some basic questions concerning automatic

supp ort for datatyp es in another higher order pro of assistant namely the HOL system

We are currently working on an implementation along the lines suggested in this

pap er We stress that this pap er rep orts on work in progress and do es not claim to b e

the nal word on representations of datatyp es in higher order logic

Map for arbitrary datatyp es

Atseveral places of our exp osition we will make use of the fact that for every datatyp e

map whichistheT analogue of the wellknown function T one can dene a function T

map on lists Since the denition of the general mapping function is technically some

what involved we will illustrate it rst by a couple of examples Recall that lists can

b e dened by the datatyp e declaration

a List j a a List

For the typ e T List the function T map agrees with map ie wehave

List map a b a List b List

and

map f List

List map f Cons x xs Cons fx List map fxs

Intuitively List map preserves the structure given by the list constructors and Cons

but changes the values of those constructor arguments whose typ e in is a

For the datatyp e Tree dened ab ove the typ e of the mapping function is

Tree map a b a b a a Tree b b Tree

   

map is dened by primitive recursion The recursive o ccur Again the function Tree

rence of Tree within List is reected by an application of List map to Tree map f f



map f f LEAF a LEAF f a Tree



Tree map f f NODE a ts NODE f a List map Tree map f f ts

    

For ary datatyp es ie unparameterized typ es suchasnat the mapping function

map f is the identity on that typ e Next we will showhow to dene the function T

for the case of an arbitrary one parameter datatyp e aT with m constructors The

declaration of suchatyp e has the form

aT T  j  j C T 

m m

where T is the j th argumenttyp e of constructor C Of course the number of typ e

i j i

arguments can vary from constructor to constructor and can also b e zero Every T is

i j

atyp e expression build up from the typ e variable a recursive o ccurrences of aT and

previously dened datatyp es Note that T is only allowed to o ccur with the parameter

a ie instantiations of T are not allowed

Let f b e some function from a typ e A to another typ e B Then the mapping of f

over T

map f AT BT T

preserves constructors For a constructor C with k  arguments we therefore have

i

T map f C x  x C h T x  h T x

i i i k i i i i k i k

T On ary constructors the function T map f is the identity for certain functions h

i j

As indicated by the notation the function h T dep ends on the typ e T in

i j i j

Ty is dened by induction on the structure of Ty as follows For a typ e expression Ty h

h a f

h aT T map f

h T T D map h T h T

l l

D stands here for an arbitrary previously datatyp e of some arity l

For a general n ary datatyp e T the typ e of the mapping function is

T map a b  a b a  a T b  b T

n n n n

The general denition of T map f f follows exactly the same scheme as for the

n

T stays the same except that case n The denition of the auxiliary functions h

i j

equation is replaced by setting

h a f

i i

for i n

Although it is not usual in higher order logic the pro duct and sum typ e themselves

can b e dened as datatyp es

a b Inl a j Inr b

a b a b

Their mapping op erators are characterized by the following equations

map f g Inl a Inl fa Sum

map f g Inr b Inr gb Sum

Prod map f g a b fa gb

A treatment of the generalized mapping function in the categorical framework can b e found in

Another generic function

For the denition of representing sets we will b e interested in computing the range of

T map f f It turns out that this can b e expressed in terms of another generic

n

function

a set a set a  a T set T

n n

set A  A which can b e dened for an arbitrary n ary datatyp e T Intuitively T

n

will consist of those elements of T which can b e generated from the sets A  A

n

The meaning of generated will b e made precise using inductive denitions

First let us consider the example of the typ e List The function

List set a set a List set

should takeasetA into the set of all lists with elements in A This set is characterized

by the following twointro duction rules

a A l List set A

List set A Cons a l List set A

Note how each rule corresp onds to the typing of one List constructor In fact wecan

deriveevery rule systematically from the typing rule of the corresp onding constructor

by simply replacing the name List by List set and a by A

For a general datatyp e a  a T with m constructors we obtain analogously

n

m intro duction rules for T set A  A from the typing rules of the constructors

n

set and a by A for i n by replacing the name T by T

i i

By the principle of inductive denition a nite numb er of rules such as the ones

ab ove uniquely sp ecies a set cf chapter of and the literature cited there This

set is the intersection of all the sets which comply to those rules

Supp ort for inductive denitions in higher order logic pro of systems has b een des

crib ed in and Both approaches implement inductively dened sets as least xed

p oints

Our interest in T set stems from the equality

range T map f f T set range f range f

n n

set is obviously monotonic We note the fact that T

Representations of typ es

The principal means to add new typ es in Isab elleHOL without risking inconsistencies is

the subtype facility This function is similar to the HOL systems new type definition

and allows to dene a new typ e T which is isomorphic to a nonempty subset S of an

existing typ e RT Isomorphic here means that there exists an injective function T

rep

from the new typ e T onto its representing set S In the subtype package the new

typ e is declared and constants are intro duced for the representation function T

rep

and an inverse abstraction function T The inverse relationship b etween these two

abs

functions and the fact that the representing set S is the image of the representation function are p ostulated as axioms RT T_rep T S

T_abs

Figure Denition of new typ es using subtype

The use of the subtype declaration is the only waytoavoid the intro duction of

non trivial axioms in the denition of a datatyp e T in Isab elleHOL It implies the

following sub division of our task

Construct a representing set S

Dene the new typ e T byasubtype declaration This requires a pro of that S is

not empty

Dene the constructors by using the abstraction and representation functions

Generate and prove the characteristic theorems

In the following we will b e considering not just the representation of a single typ e but

of a whole class C of typ es Such a representation asso ciates every typ e T in C with a

representation function T Obviouslywe require the representation functions to b e

rep

injective

Because we will need to combine representations all the representation functions

should ideally have the same target typ e Rep However there is a problem here

Consider the family A of typ es dened by

A a  a

n n

Clearly a typ e T can only b e emb edded in another typ e T if all typ e variables of T



also o ccur in T This implies that it is not p ossible to emb ed all elements of A in one



typ e For a general class C the b est we can ask for is that a typ e T parameterized by

n dierenttyp es variables a  a has a representation

n

T a  a T a  a Rep

rep n n

Of course rep eated typ e parameters only have to o ccur once in the representation typ e

In particular if all typ e parameters of a typ e are instantiated to one typ e parameter

we can exp ect a representation

T a  a T aRep

rep

Lastly unparameterized typ es T suchasnat should b e representable by functions

T T a Rep rep

Our aim is now to nd representations of typ e expressions built from typ es for which

we already have a representation As an example consider the typ e

a b Tree List

which o ccurs in the denition of typ e TreeIntuitively a representation of a list ts of

trees should b e formed by rst representing all elements of ts and then representing the

resulting list of representations This suggests a dierent notion of a representation of

a parameterized typ e T namely as a function which turns a T structure of represen

tations into a representation Formally this means that for a n parameter datatyp e T

we are lo oking for an injective function

T a Rep a Rep T a Rep

REP

z

n

Given such a function this would allow us to dene

a b Tree List List List map Tree

rep REP rep

More generally a representation of a typ e expression could b e derived from represen

tations of its constituents by setting

T T T T T map T T

n REP n

rep rep

rep

The injectivity of this function follows from the easily proven fact that the mapping

map of injective functions over some typ e Ty is again injective Ty

Can we construct T from a given representation function T Instantiating

REP rep

gives us

T a Rep a Rep a Rep Rep

rep

If we compare this with the typ e in we note that the target typ e should b e a Rep

This suggests that we require an injective function

flat a Rep Rep a Rep Rep

The name is b orrowed from the function flat which attens a list of lists by concate

nation Assuming the existence of such a function Rep flatwe can simply dene the

desired function T by

REP

T Rep flat T

REP rep

The injectivityofT follows of course from the fact that the comp osition of injective

REP

functions is injective

Unfortunately there are typ es Rep whichwe mightwant to use as a representation

typ e but for whichnosuch injection Rep flat exists A prominent example is the

typ e of endomorphic functions

a Fun a a

Its attening function is of typ e

flat a a a a a a Fun

Now for arbitrary typ es T and T the HOL typ e T T contains all set theoretic

 

functions from T to T However it is wellknown in set theory that only oneelement



sets S allow an injection of typ e S S S This follows from the fact that for S

with a cardinality strictly greater than one the cardinalityofS S is always greater

than the cardinalityofS

flat of the typ e required ab ove The Hence there exists no injective function Fun

same argument also shows immediately that the function space op erator has no REP

representation This is in contrast to the fact that it is easy to construct a representation

Fun a b a b Fun

rep

After constructing T from T assumenowconversely that we are given a repre

REP rep

sentation function T of typ e Let

REP

In a a  a

n i i n

b e the i th injection into the n fold sum Further assume the existence of an emb edding

 a a Rep

Then we can dene a representation function T of typ e by setting

rep

T T T map  In  In

rep REP n n n

The two kinds of representation functions are therefore equivalent in the sense that we

can calculate one from the other under suitable assumptions ab out the representation

typ e Rep Note that the two notions coincide for typ es without constants

The ab ove section describ es only some basic requirements of typ e representations

We are currently investigating the usefulness of further prop erties like monad axioms

for Rep cf

Representation of some basic typ es

For the derivation of representing sets for datatyp es we require at least representations

ofanumb er of basic typ es These typ es are the p olymorphic typ e a the unit typ e

unit and the sum and pro duct typ es The necessity to represent these typ es stems from

the fact that semantically the bars and commas in a datatyp e declaration corresp ond

to the sum resp pro duct of typ es The constructor names are simply the tags of the

sum An empty argument list corresp onds to the unit typ e The representation of a

is necessary for the treatmentoftyp e variables

A construction of representations for these four typ es has b een given byLPaulson

in It was implemented by him in the theory Univ which is part of the standard

Isab elle HOL library Rather than listing this theory here we will just summarize

some of its main features Details can b e found in the cited pap er resp in the theory

les

In Univthy the typ e used for representation is called item It corresp onds to

the typ e Rep of the previous section We will not go into the denition of this typ e

itself For our discussion here the main p oint of the typ e item is the existence of three

functions

Leaf a a item

Numb nat a item

a item a item a item infix

which are injective and havedisjoint ranges Hence these functions generate a subset of

a item which is isomorphic to a datatyp e with these three functions as constructors

Function Leaf corresp onds to the function  from ab ove It provides a represen

tation of a More generally n dierenttyp e variables a  a are represented by

n

setting

a Leaf In

i n i

rep

for i n

Function Numb provides a representation of natural numb ers In particular this can

b e used to dene a representation of the unit typ e by

unit Numb

REP

Using the bijection

 fab f fst ab snd ab

between twoparameter functions and functions with pairs as parameters we can trans

form op erator into a REP representation of the pro duct typ e

Prod a item a item a item

REP

Prod  ab  fst ab snd ab

REP

For the denition of a representation of injections

InA Numb A

InA Numb A

are dened Since these functions have disjoint images one obtains the following REP

representation of the sum typ e

Sum a item a item a item

REP

Sum sum case In In

REP

For the representation of typ e expressions it is imp ortantthatwe can dene an injec

tion

item flat a item item a item

Its eect on the subset of a item spanned by the three functions Leaf Numb and is

given by the primitive recursive equalities

flat Leaf a a item

item flat Numb n Numb n

flat M N item flat M item flatN item

For the derivation of map functions on datatyp es an item map function is useful Again

we do not give its denition but only note the following equalities

map f Leaf a Leaf fa item

item map f Numb n Numb n

item map f M N item map fM item map fN

The two functions item flat and item map are at the time of this writing not part of

the standard Isab elle distribution

Because of the injectivity of representation functions the range of T can serveas

rep

a representingsetforatyp e T For the following range calculations note the equality

range f g f range g

It uses the image op erator dened by

S

f A a A ffag

The range of a general typ e instantiation can b e computed as follows

range T T T

n

rep

f equation g

T range T T T map T

REP REP n

rep rep

f equations g

T T set range T range T

REP n

rep rep

This expression can b e evaluated further for concrete T In the case of the sum typ e

wehave

S S

set A B a A b B  fa b g Prod

which implies

range A B

rep

f derivation ab ove g

set range A range B Prod Prod

rep rep REP

f equations simplications g

S S

a range A  b range B fa b g

rep rep

f denition of  below g

range A  range B

rep rep

where the binary op erator  is dened by

S S

X  Y a A  b B  fa b g

Analogouslywe can derive

range A B range A  range B

rep rep

rep

where the binary op erator   is given by

X  Y In X In Y

Note that the two op erators  and  are obviously monotonic

The representing set of a datatyp e

Wewillnow explain how the little representation theory of the preceding sections can

b e used to construct representing sets for datatyp es Instead of showing this abstractly

we will demonstrate it for the case of the typ e Tree and trust that the reader can infer

the general pro cedure Recall the denition of Tree

a b Tree LEAF a j NODE b a b Tree List

As mentioned b efore the bars and the commas in a datatyp e declaration corresp ond

to sum and pro duct of typ es This means that we can interpret this typ e declaration

as an isomorphism of typ es

a b Tree a b a b Tree List

This isomorphism suggests that the range of the representation function of b oth sides

should b e the same ie

a b a b Tree List range a b Tree range

rep rep

Let

R range a b Tree

rep

Using the representation calculus develop ed in the previous sections equation can

b e reduced to

R range Leaf Inl

 range Leaf Inr  List List set R

REP

Because of the monotonicity of the o ccurring op erations on set the abstraction of the

right hand side over R is a monotonous function Hence Tarskis xed p oint theorem

implies the existence and uniqueness of a smallest solution R to this equality cf

or the theory Lfp in the standard Isab elleHOL distribution This set is used as the

representing set of a b Tree

In general the construction of the representing set requires that we already have

representation functions of all the typ es o ccurring in the denition of a datatyp e T

except of course for T itself

Instead of the denition of R as a least xed p oint an equivalentcharacterization

using inductive sets is also p ossible In our example we get the following two rules

rs List set R

In Leaf Inl a R In Leaf Inr b List rs

REP

In general if there are more than two constructors it is necessary to distinguish them

by suitable combinations of In and In

Denition of typ e and constructors

According to our workplan from section wecannow declare the typ e Tree bya

subtype denition using R as representing set The nonemptiness of R follows easily

from equality For a general datatyp e T it requires at least one constructor whose

argumenttyp es do not contain recursive o ccurrences of T

From the subset declaration of Treewe obtain a representation function Tree

rep

and an inverse abstraction function Tree The typing of Tree agrees with the

abs rep

general typing Per construction the ranges of the representing functions of the

twotyp es in are equal Note that nothing is said ab out the equality of the repre

sentation functions themselves

The constructors of a new typ e T are obtained by

representing the arguments

combining the results using and InIn and

abstracting into T

In the case of typ e Tree this results in

LEAF a a b Tree

NODE b a b Tree a b Tree

LEAF a Tree In Leaf Inl a

abs

NODE b ts Tree In Leaf Inr a

abs

List List map Tree ts

REP rep

ary constructors are represented using Numb ie the representation of the only

element of the unit typ e

Theorems

For a datatyp e T the induction theorem expresses the wellknown principle of structural

induction over typ e T For its pro of note that the inductivecharacterization of the

representing set gives us an induction theorem for that set The induction theorem for

T is obtained directly by lifting this theorem to T using T For our example typ e

abs

it reads as follows

a  P LEAF a

set f t  Ptg P NODE b ts b  ts  ts List

t  Pt

Primitive recursive functions are the unique functions solving certain functional equa

lities In our example this means that for a pair of functions

h a c

h b a b Tree List c List c



there exists a unique function f satisfying

f LEAF a h a

map fts f NODE b ts h b ts List 

Theorems justifying primitive recursion can b e proven by an inductive construction of

the graph of the function cf

In a dierent notion of primitive recursion is suggested In the case of typ e

Tree for example it is required that given

h a c

h b a b Tree List d c



h d



h a b Tree a b Tree List c d d



there exists a unique pair of functions f and g satisfying

f LEAF a h a

f NODE b ts h b ts gts



g h



f Cons t ts h t ts ft gts



For the typ e Treewehaveproven that this agrees with our notion of primitive recursive

functions It seems that this also holds for arbitrary datatyp es but wehave not proven

this formally yet

Concluding remarks

Wehave presented an approach to the representation of datatyp es which applies also to

datatyp es T with recursive o ccurrences of T inside of typ e expressions Our approachis

based on two generic functions T map and T set which can b e dened for an arbitrary

datatyp e T The use of these functions has led us also to a dierentformulation of the

induction theorem and of primitive recursive functions than usual

The discussion of representations in the rst four sections applies also to codatatyp es

and to mutual recursivetyp es By suitable adjustments of the material in the last sec

tions an extension of our approach to these typ es should b e p ossible

Acknowledgements

Iwould like to thank M Cieliebak for his comments

References

E Gunter Whywe cant have SMLstyle datatyp e declarations in HOL In LJM

Claesen and MJC Gordon editors Higher Order Logic Theorem Proving and its

Applicationsvolume A of IFIP Transactions pages Leuven Belgium

NorthHolland

J A Goguen J W Thatcher E G Wagner and J B Wright Initial algebra

semantics and continuous algebras Journal of the ACM

John Harrison Inductive denitions automation and application Pro ceedings

of the International Workshop on the HOL theorem proving system and its

applications Asp en Grove Utah To app ear

E Meijer MM Fokkinga and R Paterson with bananas

lenses envelop es and barb ed wire In John Hughes editor Functional Programming

and Computer Architecturevolume of Lecture Notes in

SpringerVerlag

MJC Gordon and TF Melham Introduction to HOL Cambridge University

Press

Larry Paulson Coinduction and corecursion in higherorder logic Technical

Rep ort Computer Lab oratory UniversityofCambridge England

Philip Wadler Monads for functional programming In Lecture notes for Markt

oberdorf Summer School on Program Design Calculi SpringerVerlag

G Winskel The Formal Semantics of Programming Languages An Introduction

Foundations of Computing The MIT Press