Henk: a Typed Intermediate Language
Total Page:16
File Type:pdf, Size:1020Kb
Henk a typed intermediate language Simon Peyton Jones University of Glasgow and Oregon Graduate Institute Erik Meijer University of Utrecht and Oregon Graduate Institute May Abstract accurate compiletime types and the compiler can check its own activity if desired by checking the typecorrectness of the intermediate program We elab orate each of these p oints There is growing interest in the use of richlytyped interme in Section diate languages in sophisticated compilers for higherorder typed source languages These intermediate languages are This pap er describ es the design of a new typed intermediate typically stratied involving terms types and kinds As language Henk designed for compilers for purelyfunctiona l the sophisticati on of the type system increases these three languages It has the following distinctive features levels b egin to lo ok more and more similar so an attractive approach is to use a single syntax and a single data type in Henk is based directly on the lambdacube an expres the compiler to represent all three sive family of typed lambda calculi We have found a shortage of introductory material ab out the lambda The theory of socalled pure type systems makes precisely cub e so we present a tutorial in Section such an identication This pap er describ es Henk a new typed intermediate language based closely on a particular Henk is a smal l language there are only seven con pure type system the lambda cube On the way we give a structors in the data type of expressions Even so it tutorial introduction to the lambda cub e is a real language in the sense that it is rich enough to use as a compiler intermediate language Because of its lambdacub e heritage Henk uses a sin Overview gle syntax for terms types and kinds Better still compilers for Henk can use a single data type to repre Many compilers can b e divided into three main stages The sent all three levels This leads to considerable econ front end translates the source language into an intermedi omy in b oth the syntax of the language and the utility ate language the midd le end transforms the intermediate functions of the compiler itself language into a more ecient form and the back end trans Henk has an explicit concrete syntax Intermediate lan lates the intermediate language into the target language guages are typically expressed only as a data type in In the past intermediate languages for source languages a particular compiler Giving a concrete syntax is an with rich type systems have usually b een untyped The apparently trivial step but we b elieve it is an imp or compiler rst typechecks the source program and then tant one and it is one to which compilerwriters often translates the program to the intermediate language dis pay little attention A compiler front end can pro duce carding all the type information After all the type check Henk to b e consumed by the back end of a dierent ing simply ensured that the program would not go wrong compiler or to b e transformed by some external pro at runtime and once that is checked there is no further use gram b efore b eing fed back into the original compiler for types This pap er introduces no new technical results Rather our Recently however there has b een increasing interest in typed main contribution is to build a bridge b etween recently intermediate languages Peyton Jones Peyton Jones developed type theory and the compiler research commu et al Shao App el Tarditi et al nity There are several motivations for such an approach the compiler may b e able to take advantage of type information to generate b etter co de it may b e desirable to treat types as We are the rst to suggest using the lambda cub e values at runtime in which case it is necessary to maintain as the basis for a typed intermediate language Why b other We see two p ersuasive reasons { It dramatically reduces the number of data types This pap er will b e presented at the June Work and the volume of co de required in the com shop on Types in Compilation piler For example the Glasgow Haskell Compiler GHC has separate data types for terms types and kinds and separate algorithms for parsing functions Gill Launchbury Peyton Jones printing typing and transforming them Col Hu Iwasaki Takeichi Launch lapsing the three levels gets rid of all this dupli bury Sheard cation Traditional static type checking is p erformed com { It is easier to accommo date new developments in pletely at compile time In more sophisticated set the type system of the source language b ecause tings however it may b e useful to p ostp one some type the lambda cub es type language is already as ex checking until run time For example pressive as its term language { Some reasonable programs cannot readily b e ex We give a tutorial on the lambda cub e emphasising pressed in a static type systems notably ones in asp ects relevant to compiler builders Most of the lit volving metaprogramming There are many pro erature is relatively recent p ost and written p osals for incorp orating a type Dynamic in an oth from the p oint of view of theorists erwise staticallytyped language but all involve some runtime check that the type of a value matches an exp ected type Our initial fo cus is on nonstrict languages but we hop e to { Rather than statically sp ecialise a p olymorphic make Henk neutral with resp ect to the strictnonstrict ques function for the types at which it is called one tion That is rather than having two variants of Henk one can pass the type as an explicit argument so that strict one nonstrict we hop e to have a single language the function can b ehave appropriately Harp er that treats b oth styles as rst class citizens and allows free Morrisett mixing of the two in a single program { Tagfree garbage collectors need some runtime We assume familiarity with the lambda calculus in general type information to guide them Tarditi and with the secondorder lambda calculus also called F Tolmach in particular All these applicatio ns require accurate type informa tion to b e available at runtime and hence at compile time to o Background and motivation Debugging a compiler can b e a nightmare The only evidence of an incorrect transformation can b e a seg Typedirected compilation mentation fault in a large program compiled by the compiler Identifying the cause of the crash and trac ing it back to the faulty transformation is a long It has b een recognized for some time that maintaining type slow business If instead the compiler typechecks information right through to co de generation and b eyond the intermediatelanguag e program after every trans can b e b enecial Sp ecicall y formation the huge ma jority of transformation bugs can b e nailed in short order It is quite dicult to write Accurate type information can guide compiler analy an incorrect transformation that is type correct Fur ses and transformations thermore the program cannot crash if it is type cor For example rect so even incorrect transformations can only give an unexp ected result not a crash which is usually much { Strictness analysis over nonat domains has to easier to trace b e guided by type information Peyton Jones We call the Henk typechecker Core Lint If the Partain compiler is correct Core Lint do es nothing useful and { A compiler may b e able to use more ecient rep is switched o by default In our exp erience its ability resentations of data values if it knows their types to lo calise compiler bugs would by itself justify the use For example an integer can b e represented by the of a typed intermediate language integer itself rather than a p ointer to a b ox con taining the integer either in a strict language Compilers that maintain and use type information through or in a lazy one in contexts where the integer out the compiler have come to b e called typedirected is certainly evaluated The price to b e paid is Standard ML of New Jersey has b een typedirected for some that such unboxed integers cannot b e passed to time Shao App el but its internal type system is p olymorphic functions since their representation monomorphic so it cannot track types through p olymorphic diers from the simple p ointer that p olymorphic functions Since the Glasgow Haskell Compiler GHC co de typically exp ects Guided by type informa has used a language based on the secondorder lambda calcu tion one can sp ecialise the p olymorphic function lus as its intermediate language Peyton Jones Pey to the unboxed types at which it is used Pey ton Jones et al ton Jones Launchbury Tarditi et al More recently the TIL compiler uses a considerably more so { Transformations that remove intermediate data phisticated and complicated intermediate language capa structures often called deforestation or fusion ble of expressing intensional p olymorphism includin g recur transformations rely heavily on types to guide sive functions at the level of types Morrisset Tarditi the transformation In some cases the very cor et al TIL uses a stratied type system which sim rectness of the transformation relies on para plies the pro of theory but it do es lead to considerable du metricity a prop erty of welltyped p olymorphic plication in compiler transformations Tarditi We sp eculate that the lambda cub e might provide a theoreti Sp ecicall y Henk go es b eyond the second order lambda cal cally sound way to get the b est of b oth worlds culus in the following ways Shao is also developing a typed intermediate language It is elegantly parameterised Simply by selecting or FLINT with similar goals to TILs Shao