Title: Region-Based Memory Management1 Authors: Mads Tofte and Jean-Pierre Talpin a Liation: Mads Tofte: Department of Computer
Total Page:16
File Type:pdf, Size:1020Kb
Title RegionBased Memory Management Authors Mads Tofte and JeanPierre Talpin Aliation Mads Tofte Department of Computer Science Universityof Cop enhagen JeanPierre Talpin IRISA Campus de Beaulieu France An earlier version of this work was presented at the st ACM SIGPLANSIGACT Sym p osium on Principles of Programming Languages Portland Oregon January Abstract This pap er describ es a memory management disciplin e for programs that p erform dynamic memory allo cation and deallo cation At runtime all values are put into regions The store consists of a stack of regions All p oints of region allo cation and deallo cation are inferred automatically using a typ e and eect based program analysis The scheme do es not assume the presence of a garbage collector The scheme was rst presented byTofte and Talpin subse quently it has b een tested in The ML Kit with Regions a regionbased garbagecollection free implementation of the Standard ML Core language which includes recursive datatyp es higherorder functions and up datable references Birkedal et al Elsman and Hallenb erg This pap er denes a regionbased dynamic semantics for a skeletal pro gramming language extracted from Standard ML We present the inference system which sp ecies where regions can b e allo cated and deallo cated and a detailed pro of that the system is sound with resp ect to a standard se mantics We conclude by giving some advice on how to write programs that run well on a stack of regions based on practical exp erience with the ML Kit Intro duction Computers have nite memory Very often the total of memory allo cated by a program as it is run far exceeds the size of the memory Thus a practical discipline of programming must provide some form of memory recycling One of the key achievements of early work in programming languages is the invention of the notion of blo ck structure and the asso ciated implementation tech nology of stackbased memory management for recycling of memory In blo ck structured languages every p oint of allo cation is matched bya point of deallo ca tion and these p oints can easily b e identied in the source programNaur Dijkstra Prop erly used the stack discipline can result in very ecient use of memory the maximum memory usage b eing b ounded by the depth of the call stack rather than the numb er of memory allo cations The stack discipline has its limitations however as witnessed by restrictions in the typ e systems of blo ckstructured languages For example pro cedures are typically prevented from returning lists or pro cedures as results There are two main reasons for such restrictions First for the stackdiscipline to work the size of a value must b e known at the latest when space for that value is allo cated This allows for example arrays which are lo cal to a pro cedure and have their size determined by the arguments of the pro cedure by contrast it is not in general p ossible to determine howbig a list is going to b ecome when generation of the list b egins alues must com Second for the stackdiscipline to work the lifetime of v ply with the allo cation and deallo cation scheme asso ciated with blo ck structure When pro cedures are values there is a danger that a pro cedure value refers to values whichhave b een deallo cated For example consider the following program let x in fn y x y end This expression is an application of a function denoted by letendtothe number The function has formal parameter y and b o dy x y where stands for rst pro jection fn is pronounced in SML Thus the op erator expression is supp osed to evaluate to fn y x y where x is b ound to the pair so that the whole expression evaluates to the pair However if we regard the letend construct as a blo ck construct rather than just a lexical scop e we see why a stackbased implementation would not work we cannot deallo cate the space for x at the end since the rst comp onentof x is still needed by the function which is returned by the en tire let expression One way to ease the limitations of the stack discipline is to allow program mer controlled allo cation and deallo cation of memory as is done in C C has r r r r Figure The store is a stack of regions every region is uniquely identied bya region name eg r and is depicted bya box in the picture two op erations malloc and free for allo cation and deallo cation resp ectively Unfortunately it is in general very hard for a programmer to know when a blo ck of memory do es not contain any livevalues and may therefore b e freed conse quently this solution very easily leads to socalled spaceleaks ie to programs that use much more memory than exp ected Functional languages like Haskell and Standard ML and some ob jectoriented languages eg JAVA instead let a separate routine in the runtime system the garbage col lectortake care of deallo cation of memory Knuth Baker Lieb erman Allo cation is done by the program often at a very high rate In our example the three expressions fn y x y and x y each allo cate memoryeach time they are evaluated The part of memory used for holding suchvalues is called the heap the role of the garbage collector is to recycle those parts of the heap that hold only dead values ie values which are of no consequence to the rest of the computation Garbage collection can b e very fast provided the computer has enough mem ory Indeed there is a much quoted argument that the amortized cost of copying garbage collection tends to zero as memory tends to innity App el page It is not the case however that languages such as Standard ML frees the programmer completely from having to worry ab out memory management To write ecient SML programs one must understand the p otential dangers of for example accidental copying or survival of large data structures If a program is written without concern for space usage it maywell use much more memory than one would like even if the problem is lo cated using a space proler for example turning a spacewasting program into a spaceecient one may require ma jor changes to the co de The purp ose of the work rep orted in this pap er is to advo cate a compromise between the two extremes completely manual vs completely automatic memory management We prop ose a memory mo del which the user may use when pro gramming memory can b e thought of as a stack of regions see Figure Each region is like a stackofunb ounded size which grows upwards in the picture until the region in its entirety is p opp ed o the region stack For example a typical use of a region is to hold a list A program analysis automatically identies pro gram p oints where entire regions can b e allo cated and deallo cated and decides for eachvaluepro ducing expression into which region the value should b e put More sp ecicallywe translate every welltyp ed source language expression einto a target language expression e which is identical with e except for certain region annotations The evaluation of e corresp onds step for step to the evaluation of eTwo forms of annotations are e at letregion in e end The rst form is used whenever e is an expression which directly pro duces a value Constant expressions abstractions and tuple expressions fall into this category The is a region variable it indicates that the value of e is to b e put in the region b ound to The second form intro duces a region variable with lo cal scop e e At run time rst an unused region identied bya region name r is allo cated and b ound to Then e is evaluated probably using the region named r Finally the region is deallo cated The letregion expression is the only wayofintro ducing and eliminating regions Hence regions are allo cated and deallo cated in a stacklike manner The target program which corresp onds to the ab ove source program is e letregion in letregion in let x at at at in y x y at at end end at end We shall step through the evaluation of this expression in detail in Section Brieyevaluation starts in a region stack with three regions and and and evaluation then allo cates and deallo cates three more regions at the end and contain the nal result The scheme forms the basis of the ML Kit with Regions a compiler for the Standard ML Core language including higherorder functions references and recursive datatyp es The region inference rules we describ e in this pap er address life times only A solution to the other problem handling values of unknown size is addressed in Birkedal et al An imp ortant optimisation turns out to b e to distinguish b etween regions whose size can b e determined statically and those that cannot The former can b e allo cated on a usual stack Using C terminology region analysis infers where to insert calls to malloc and free but b eware that the analysis has only b een develop ed in the context of Standard ML and relies on the fact that SML is rather more strongly typ ed than C For a strongly typ ed imp erative language likeJAVA region inference mightbe useful for freeing memory unlikeCJAVA do es not have free For readers who are interested in co de generation App endix A shows the threeaddress program which the ML Kit pro duces from the ab ove program using b oth region inference and the additional optimisations describ ed in Birkedal et al However this pap er is primarily ab out the semantics of regions not their implementation Exp erience with the Kit is that prop erly used the region scheme is strong enough to execute demanding b enchmarks and to make considerable space sav ings compared to a garbagecollected system