Region Inference for an Object-Oriented Language
Total Page:16
File Type:pdf, Size:1020Kb
Region Inference for an Object-Oriented Language £ £ ¤¡ £ ¥ Wei-Ngan Chin ¢¡ , Florin Craciun , Shengchao Qin , and Martin Rinard Computer Science Programme, Singapore-MIT Alliance £ Department of Computer Science, National University of Singapore ¥ Laboratory for Computer Science, Massachusetts Institute of Technology ¦ chinwn,craciunm,qinsc § @comp.nus.edu.sg, [email protected] ABSTRACT time tests for danglings can be omitted. Region-based memory management offers several important ad- There are two main approaches to prevent unsafe accesses. One vantages over garbage-collected heap, including real-time perfor- approach[12] uses a type-and-effect system to analyse regions that mance, better data locality and efficient use of limited memory. The might be accessed, so that those regions that are not being accessed concept of regions was first introduced for a call-by-value func- can be deallocated earlier, even if there are dangling references tional language by Tofte and Talpin, and has since been advocated to these regions. Another approach[8] prevents all potential dan- for imperative and object-oriented languages. Scope memory, a gling references (from older to younger regions) by ensuring that lexical variant of regions, is now a core feature in a recent proposal each object always point-to other objects in regions with equal or longer lifetimes. The former approach (no-dangling-access) may on Real-Time Specification for Java (RTSJ)[4]. 1 Recent works in region-based programming for Java have fo- yield more precise lifetimes , but the latter approach (no-dangling) cused on region-checking which requires manual effort in choos- is required by RTSJ and for co-existence with garbage collection. ing regions with appropriate lifetimes. In this paper, we propose an In both approaches, region annotation poses considerable mental automatic region-inference type system for a core subset of Java. overheads for programmers, not to mention compatibility problems To provide an inference method that is both precise and practical, with legacy code. In addition, the quality of hand-annotation varies, we support classes and methods that are region-polymorphic; and with sub-optimal outcome for less experienced programmers. with region-polymorphic recursion for methods. One challeng- In this paper, we provide a systematic attempt at formulating a ing aspect of our inference rules is to ensure safe region program- region inference type system for a core subset of Java. We shall ming in the presence of class subtyping, method overriding and adopt the no-dangling approach and make use of lexically scoped downcast operations. Our set of region inference rules can han- regions, where the last region to be created is the first to be deallo- dle these object-oriented features safely, without creating dangling cated. The adoption of stack of scoped regions (with determinable references. lifetimes) is intended to simplify our inference method, and for con- formance with RTSJ. Our main contributions are as follows: 1. INTRODUCTION ¨ To the best of our knowledge, no one has successfully for- In region-based memory management, each new object is added malised a sound and complete region inference method for to a region with a designated lifetime. While objects may be added object-oriented languages. We propose a systematic solution to their respective regions at different times, the entire set of ob- to this problem using a core subset of the Java language. jects in each region are freed simultaneously, when the region is ¨ deleted. Various studies have shown that region-based program- Our region type rules prevent dangling references by requir- ming can provide safe memory management with good real-time ing that all field references outlive the current object. We performance. Data locality is also improved when related objects formalise this explicitly through region lifetime constraints, are placed together in the same region. By classifying objects into without the need for an effect-based typing. regions based on their lifetimes, better utilization of memory can ¨ We support classes and methods with region-polymorphism. be achieved when dead regions are recovered on a timely basis. In addition, region polymorphic recursion is supported for Recently, several works have investigated region-based program- methods but not classes. This provides for an inference that ming for Java-based languages [17, 12, 11, 8]. Most of these works is precise and yet efficient. (e.g. [12, 8]) require programmers to manually annotate their pro- grams with regions. Region type-checking then guarantees that ¨ Our inference scheme has been crafted to support class sub- well-typed programs never access dangling references, so that run- typing and method overriding. Previous work [12] required “phantom regions” to ensure soundness of class inheritance which may cause a loss in lifetime precision. We propose an improved solution without phantom regions, where modular compilation can be guided by a global dependency graph. ¨ We provide a theorem stating the correctness of our inference scheme which always leads to well-typed programs that are 1However, experiments done in [24] indicates no noticeable per- . formance difference between the two approaches. region-safe. Such well-typed programs are guaranteed not to £ £('*) #%$ $ & create dangling references in either the store or stack. def meth £ £/ $ $ & £ . +-, def class +-, extends field meth ¨ $ & We also provide a compile-time analysis which ensures that $ 0 prim int 0 bool void $ & downcast operations are region-safe. Previous proposal for $ 1 a region-checking system (e.g. [6]) requires runtime checks +2,30 prim $ $ & for downcast operations. field 1 f £ '*) $ $ & 1 16587 meth mn( 4 ) ¨ '*) £9' / $ $ & 16587 We describe a prototype implementation of our region infer- . 4 ' '*) $ & ence system. Preliminary experiments suggest that our infer- $ 5 5©= 0 0 (cn) null 0;:<0 f ' ' £ & & 5>= 5 ence is competitive to hand annotation. 5 0 0 0 new cn( ) f ' ' £ £ 5>= 5 5 £ 0 0 0 mn( ) mn( ) ; ' ' 5 £ The remainder of this paper is organised as follows. Sec 2 in- 0 if then else troduces the language, called Core-Java, and its region-annotated target form. Sec 3 presents a set of guidelines for good region an- (a) The Source Language notation. Sec 4 is devoted to region inference. It formalises the £ £('*) $ & region inference rules and describes how method override conflicts #%$ def meth Q £ £/ $ $ & £ . can be rectified, together with correctness theorems. Sec 5 deals def class ca extends ca where rc field meth ¤CB $ & with the region-safety of downcast operations. Sec 6 reports on a $ @A preliminary experiment from our implementation of region infer- +-? cn $ $ & 0 ence. Sec 7 discusses related works, followed by some concluding prim int 0 bool void $ $ & 1 remarks in Sec 8. +-,D0 prim B £ $ $ & 1 t @A $ & 2. CORE-JAVA AND REGION TYPES field $ t f B £ £ '*) $ $ & 5E7 4 We present the syntax of a significant (subset of) Java-like lan- meth t mn @A ( t ) where rc '*) £F' / $ $ & 5E7 . guage named Core-Java in Fig 1(a), which will be adopted as the 4 t ' '*) $ $ & 5>= source language for our region inference process. Core-Java is de- 5 0;:<0 0 0 ( +G? ) null f ' ' £ & & 5 5>= 5 +-? 0 0 signed in the same minimalist spirit as Featherweight Java[26]. It 0 new ( ) f B B ' ' £ £ £ £ 5©= 5 5 £ @A 0 @A 0 supports assignments but is also an expression-oriented language 0 mn ( ) mn ( ) ; ' ' '*) 5 £ 0 0 where statements are expressions with the void type. Loops are if then else letreg A in $ $ & & & 6H £ £ KJ £ A 0IA A 0IA A 0 omitted, as their effects can be captured by tail-recursion. rc A B = = ML £ O O 0 0INE@A A The target language, called region-annotated Core-Java, is given 0 rc rc true QB £ / $ $ & & 7 = = O O in Fig 1(b). This language is extended with region types and region . A Q 4PNE@A rc constraints for each class and method. In addition, local regions with lexical scopes are introduced by the letreg declaration. (b) The Target Language Note that denotes a region variable, while ¡ represents a data £ variable. The suffix notation ¢ denotes a list of zero or more dis- tinct syntactic terms that are separated by appropriate separators, Figure 1: The Syntax of Core-Java ¤ while ¢ represents a list of one or more distinct syntactic terms. ¥§¦ ¡ ¡©¨ The syntactic terms, ¢ , could be , , , field, etc. For example, ¥§¦ ¥§¦ ¦ ! £ ¡ ¡ ¨ R*S ¡©¨ denotes where . stance, is a class annotated with region parameters £ The constraint indicates that the lifetime of region . Parameterization allows programmers to implement a £ is not shorter than that of £ . The constraint denotes that region-polymorphic class whose components can be allocated in £ £ and must be the same region, while denotes the different regions. The first region parameter is special: it refers converse. Note that this is a reserved data variable referring to the to the region in which a specified object of this class will be al- current object, while heap is a reserved region to denote the global located. All other regions should outlive this region via the con- C¥ GX X U ¨ heap with unlimited lifetime, that is for any : heap . straint T(UWV . That is the regions of the components ! Constraint abstraction from [23] of the form q rc (possibly in ) should have longer or equal lifetimes than denotes a parameterized constraint. For uniformity, the region con- the region (namely ) of its object. This condition, called no- straint of every class and method will be captured with a con- dangling requirement, can prevent dangling references completely, straint abstraction each. Region constraint for classes will also as it guarantees that each object can never reference another ob- ¥ be known as class invariant, and be denoted using either inv cn ¨ ject in a younger region. We do not impose region parameters ! or inv.cn , with the latter in constraint abstraction form.