Implementation and Performance Evaluation of a Safe Runtime System in Cyclone
∗ Matthew Fluet Daniel Wang Cornell University Princeton University Department of Computer Science Department of Computer Science 4133 Upson Hall 35 Olden Street Ithaca, NY 14853 Princeton, NJ 08544 fl[email protected] [email protected]
ABSTRACT related security problems found in applications written in In this paper we outline the implementation of a simple C. In addition to ensuring safety, these high-level languages Scheme interpreter and a copying garbage collector that are more convenient and preferable to low-level system lan- manages the memory allocated by the interpreter. The en- guages for building web based applications. One major con- tire system including the garbage collector is implemented venience, that also improves security, is automatic memory in Cyclone [11], a safe dialect of C, which supports safe and management. explicit memory management. We describe the high-level Although the majority of web based applications are writ- design of the system, report preliminary benchmarks, and ten in safe languages, the applications are typically hosted compare our approach to other Scheme systems. Our pre- on servers or web application platforms that are imple- liminary benchmarks demonstrate that one can build a sys- mented with unsafe languages such as C. For example, Perl tem with reasonable performance when compared to other based web applications are hosted on the Apache web server approaches that guarantee safety. More importantly we can by dynamically loading a native code module that imple- significantly reduce the amount of unsafe code needed to ments the Perl interpreter. Since the interpreter is written implement the system. Our benchmarks also identify some in C, we should be concerned that a bug in the module may key algorithmic bottlenecks related to our approach that we introduce a security hole. In fact since Apache itself is writ- hope to address in the future. ten in C, we must be concerned about its immunity to buffer Although most of the theoretical ideas in this work are not overruns. by themselves novel, there are a few interesting variations Some web application platforms such as Jetty [10] are and combinations of the existing literature we will discuss. written completely in Java. Even in this situation, we must However, the primary motivation and goal is to build a re- trust the implementation of the Java Virtual Machine to be alistic working system that puts all the existing theory to free from bugs. The implementation of a JVM contains a test against existing systems which rely on larger amounts significant amount of code written in unsafe languages. of unsafe code in their implementation. Reducing or eliminating code written in unsafe languages from the implementation of a web application server in- creases our confidence that the server is immune to com- 1. INTRODUCTION mon security bugs. There already has been a great deal Network servers written in unsafe languages, such as C, of work demonstrating how to build dynamically extensible are a significant source of known security exploits [19]. For- web servers in safe programming languages [8]. What has tunately, many new web based applications are written in not been addressed is how to implement an interpreter that high-level, safe languages such as C#, Java, Perl, PHP, executes the actual web application itself. Python, or Tcl. Web based applications written in safe lan- Implementing the core of an interpreter in a safe language guages are immune to common buffer overflow and other is not in itself a significant challenge. What is challeng- ∗ ing is implementing the runtime system for the interpreter This research was supported in part by ARDA contract that provides automatic memory management. Currently, NBCHC030106; this work does not necessarily reflect the all systems rely on a trusted runtime system to provide some opinion or policy of the federal government and no official endorsement should be inferred. sort of storage management. Significant advances in type and proof systems have made it possible to implement a run- time system that provides garbage-collection services using programming languages that guarantee safety. Permission to make digital or hard copies of all or part of this work for In this paper we outline the implementation of a sim- personal or classroom use is granted without fee provided that copies are ple Scheme interpreter and a copying garbage collector that not made or distributed for profit or commercial advantage and that copies manages the memory allocated by the interpreter. The en- bear this notice and the full citation on the first page. To copy otherwise, to tire system including the garbage collector is implemented republish, to post on servers or to redistribute to lists, requires prior specific in Cyclone [11], a safe dialect of C, which supports the safe permission and/or a fee. SPACE2004 2004 Venice, Italy and explicit management of memory. In Section 2 we mo- Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00. tivate and describe the high-level design of the system. In Section 3 we report some preliminary benchmarks and com- one cannot cast an integer to a pointer. (Cyclone accepts 0 pare our approach to other Scheme systems. In Section 4 as a legal pointer value, but using NULL (a Cyclone keyword) we compare the amount of unsafe code needed to implement is preferred.) Second, one cannot perform arbitrary pointer our system. Our preliminary benchmarks demonstrate that arithmetic on a * pointer, which would also allow the pro- one can build a system with reasonable performance when grammer to overwrite arbitrary memory locations. Finally, compared to other approaches that guarantee safety. More Cyclone inserts a null check whenever the program deref- importantly we can significantly reduce the amount of un- erences a * pointer. While these may appear to be drastic safe code needed to implement the system. Our benchmarks restrictions, there are many patterns of pointer usage that also identify some key algorithmic bottlenecks related to our are unaffected by them. For example, in our interpreter, approach that we hope to address in the future in order to Scheme values are naturally represented by nullable pointers build a realistic production system. to data-structures, where NULL corresponds to the Scheme value nil. 2. WRITING A SCHEME INTERPRETER A variation on a nullable pointer is a non-null pointer, written t*@notnull. Such a pointer is guaranteed to not IN CYCLONE be NULL, which eliminates the need for null checks at deref- erences. In addition to this performance benefit, non-null 2.1 Why Scheme pointers capture a useful programming invariant which can We have chosen Scheme as our test bed language, primar- be statically checked. For example, in our interpreter, the 1 ily because of the relative ease of implementation and the abstract machine state is represented as a non-null pointer existence of well known benchmarks. Our implementation to a data-structure, which is side-effected by each transition is small, because we rely on an existing Scheme front-end [4] of the abstract machine. to expand the majority of the full Scheme language into a A third kind of pointer available in Cyclone is a fat core Scheme subset. The existing front-end handles parsing pointer, written t*@fat. A fat pointer comes with bounds and macro expansion of full Scheme. We take the resulting information (thus, is “fatter” than a traditional pointer). output and emit Cyclone source code that builds an abstract Each time a fat pointer is dereferenced or its contents are syntax tree, which we compile and link with our interpreter assigned to, Cyclone inserts both a null check and a bounds and runtime system. check.2 The bounds information and these run-time checks Not only is Scheme a well studied and compact language, ensure the safety of pointer arithmetic and array indexing. but it also has many features that make it desirable for web In addition, a fat pointer allows its size to be queried at run programming. In particular first class continuations allows time. In our interpreter, a Scheme vector is represented by for a much more natural structuring of web applications [17]. a fat pointer to an array of values. Scheme seems to be developing a large and useful set of tools for manipulating XML. Given that IEEE Scheme was Regions the original processing language for SGML from which XML and HTML are derived, we see a great future for Scheme as In the description of pointers above, we have demonstrated a web programming language. the ways in which Cyclone prevents programs from access- ing arbitrary memory. Another violation is to dereference a 2.2 Key Features of Cyclone dangling pointer: a pointer to storage that has been deal- Cyclone [5] is a safe dialect of C. Cyclone attempts to located. To prevent such violations, Cyclone adopts a type give programmers significant control over data represen- system for region-based memory management, based on the tation, memory management, and performance (like C), work of Tofte and Talpin [18], and described in detail in while preventing buffer overflows, format-string attacks, and previous work [6]. Here, we discuss only the aspects most dangling-pointer dereferences (unlike C). Cyclone ensures crucial to our implementation. the safety of programs through a combination of compile- Cyclone pointer types have the form t*‘r, where t is the time and run-time checks. Cyclone’s combination of per- type of the pointed to object and ‘r is a region name describ- formance, control, and safety make it a good language for ing the object’s lifetime. The Cyclone type system tracks writing low-level software, like runtime systems. the set of live regions at each program point; dereferencing In this section, we briefly introduce the key features of a pointer of type t*‘r requires that the region ‘r is in the Cyclone that are used in the implementation of our Scheme set. (If ‘r is not in the set, a compile-time error signals interpreter described in Sections 2.3 and 2.4. a possible dangling pointer dereference.) Region polymor- phism lets functions and data-structures abstract over the Pointers region of their arguments and fields. Region parameters in Cyclone are indicated by annotations of the form <‘r::R> As with C programs, Cyclone programs make extensive use on functions and types, where R distinguishes region param- of pointers. However, improper use of pointers can lead eters from other kinds of parameters. to core dumps and buffer overflows. To prevent these er- Cyclone provides a number of different kinds of regions, rors, Cyclone introduces different kinds of pointers, which suitable for different allocation and deallocation patterns. make various tradeoffs between expressiveness and run-time Stack regions correspond to local-declaration blocks: enter- checks. ing a block creates a region and allocates objects, while ex- The type of a (traditional) nullable pointer is written t*, iting the block deallocates the region’s objects. Hence, a as in C. For the most part, such pointers have the same be- stack region has lexical scope, but the number and sizes of havior as their counterparts in C. However, there are some restrictions on the use of nullable pointers in Cyclone. First, 2 ∼ In many cases, Cyclone’s flow analysis determines that the 1Our core interpreter loop is only 500 lines of code. checks are superfluous and removes them. stack allocated objects is fixed at compile time. Lexical re- gions are also created and deallocated according to program Symbols: scoping, but a region handle allows objects to be allocated q ∈ Q into the region throughout the region’s lifetime. Hence, a Booleans: lexical region has lexical scope, but the number and sizes of b ∈ B = {true, false} region allocated objects are not fixed at compile time. Numbers: The Cyclone heap is a special region with the name ‘H. x ∈ N ⊇ C ⊇ R ⊇ Q ⊇ Z All data allocated in the heap is managed by the Boehm- Characters: Demers-Weiser (BDW) conservative garbage collector [3]. c ∈ C Conceptually, the Cyclone heap is just a lexical region with Constants: global scope, and the global variable heap_region is its han- K ∈ Const dle. In order to understand the cost of a safe garbage collec- K ::= q | b | x | c | nil | tor, we use the Cyclone heap to implement a version of our undefined | unspecified Scheme interpreter in which all interpreter data is managed deBruijn Indices: by the (trusted) conservative garbage collector. ij ∈ N × N The lifetimes of stack and lexical regions follow the block Expressions: structure of the program, beginning and ending in a last- e ∈ Exp e ::= K | ij | (e e1 . . . en) | in-first-out (LIFO) discipline. Clearly, such a discipline can n m be too restrictive, as it can not accommodate objects with (lambda e) | (valambda e) | overlapping, non-nested lifetimes. (if e1 e2 e3) | (set! ij e) Recent work has added unique pointers and dynamic re- gions to Cyclone [9]. Unique pointers are based on linear Figure 1: Core Scheme Syntax (Expressions) type systems and provide fine-grained memory management for individual objects. In particular, a unique pointer’s ob- ject can be deallocated at any program point. On the other the region variable is existentially bound. Unpacking this hand, unique pointers cannot be freely copied and there are existential type yields a region variable which does not con- further restrictions on their use in Cyclone programs. A flict with any other region name. This is precisely the be- unique pointer is written t*‘U; conceptually, a pointer into havior we require for a function that creates a fresh region.) a special unique region with the name ‘U. While there is The free_ukey function reclaims the key’s region and the much more to be said concerning Cyclone’s unique pointers, storage for the key. Since the key is unique, it must be used for our purposes it suffices to reiterate that unique pointers in a linear manner; the free_ukey function consumes the are treated as linear objects, where the type system and a key. conventional flow analysis ensures that, at every program To access or allocate data within a dynamic re- point, there is at most one usable copy of a value assigned a gion, a key is presented to a special open primitive: unique-pointer type. We make use of unique pointers both region r = open(k). Within the remainder of the current to manage the garbage collector’s frontier data-structures scope, the region handle r can be used to access k’s re- and to implement our dynamic-region sequences. gion; furthermore, k is temporarily consumed throughout A dynamic region resembles a lexical region in many the scope (preventing deallocation) and becomes accessible ways; the crucial difference is that a dynamic region can be again when control leaves the scope. created and freed at (almost) any point within a program. However, before accessing or allocating data within a 2.3 The Core Scheme Interpreter dynamic region, the region must be opened. Opening a In this section, we introduce the Core Scheme language, dynamic region adds the region to the set of live regions its semantics, and its implementation in Cyclone. While and prevents the region from being freed while it is open. each of these components is relatively straightforward, it is The interface for creating and freeing dynamic regions is useful to have a concrete implementation in preparation for given by the following: the description of the garbage collector. typedef struct DynamicRegion<‘r>*@notnull‘U Formalism uregion_key_t<‘r::R>; The reader familiar with Scheme interpreters may feel free struct NewDynamicRegion { <‘r::R> to proceed directly to the next section, referring back to this uregion_key_t<‘r> key; reference formalism as necessary. Figure 1 describes the syn- }; tax of Core Scheme expressions. Literal constants for various struct NewDynamicRegion new_ukey(); types are provided. Variables are given as deBruijn indices. void free_ukey(uregion_key_t<‘r> k); Hence, procedures elide variable names and are instead an- notated with the number of arguments required for applica- A dynamic region is represented as a unique non-null pointer tion. In the case of a procedure with a variable number of to an abstract struct DynamicRegion<‘r> (which is param- arguments, the annotation indicates the minimum number eterized by the region ‘r and internally contains the handle of required arguments. Likewise, an assignment indicates to the region). This unique pointer is called the key, which the location to be updated by a deBruijn index. Finally, serves as a run-time capability granting access to the region. procedure calls and conditionals are standard. The new_ukey function creates a fresh dynamic region and The Core Scheme expression language corresponds closely returns the unique key for the region. (The <‘r::R> anno- to the primitive expression forms that appear in the Scheme tation in the struct NewDynamicRegion type indicates that standard [12]. As described in the standard, they are suffi- struct Value<‘r::R>; Locations: typedef struct Value<‘r>*‘r value_t<‘r::R>; l ∈ Loc datatype ValueD<‘r> { Values: Const_v(const_t<‘r> K); Primop_v(primop_t p); v ∈ Val Throw_v(stack_t<‘r> S, env_t<‘r> rho); v ::= (const K) | (prim p) | (throw S ρ) | Closure_v(unsigned int n, n m (closure ρ e) | (vaclosure ρ e) | env_t<‘r> rho, exp_t<‘r> e); ∗ (pair l1 l2) | (vector l ) VarArgsClosure_v(unsigned int m, Primitives: env_t<‘r> rho, exp_t<‘r> e); ∗ p ∈ (Heap × Stack × Env × Loc ) * State Pair_v(value_t<‘r> l1, value_t<‘r> l2); Vector_v(value_t<‘r>*@fat‘r ls); Stacks: ∗ }; S ∈ Stack = (Env × Frame) struct Value<‘r::R> { S ::= · | hρ, F i :: S datatype ValueD<‘r> value; Frames: }; F ∈ Frame F ::= (l1 . . . li−1 2 ei+1 en) | Figure 4: Cyclone Implementation (Values) (if 2 e2 e3) | (set! ij 2) Environments: ρ ∗ ∈ Env = Loc sume an initial heap H0 and an initial environment ρ0 in ρ ::= · | l :: ρ which all primitive operations are bound. Results: A primitive operation is a partial map from a heap, stack, r ∈ Res environment, and location sequence to a state. The location r ::= e | l sequence corresponds to the arguments to the primitive op- Heaps: eration. The other components of the state are provided for H ∈ Heap = Loc * Val primitive operations that perform heap allocation or require States: access to the stack and environment. σ ∈ State = Heap × Stack × Env × Res Values correspond to heap allocated data. In addi- σ ::= hH, S, ρ, ri tion to primitive operations, there are three other pro- cedure forms. The throw carries a stack and environ- Figure 2: Core Scheme Syntax (Runtime objects) ment; this is the form of the “escape procedure” passed as an argument to the argument of the primitive opera- tion call-with-current-continuation. The closure and vaclosure forms correspond to the evaluation of lambda and valambda expression-types; the closure forms capture ciently expressive to encode the various derived expression the current environment to be extended at the point of ap- forms that appear in realistic Scheme programs. Hence, we plication. expect an external Scheme program to first have derived ex- The final two value forms, pair and vector, combine lo- pression forms expanded into primitive expression forms and cations into larger data-structures. symbolic variables converted into deBruijn indices, yielding Figure 3 describes the operational semantics of the lan- a Core Scheme expression, before being executed. As ex- 0 guage. The main judgment σ → σ defines a small-step plained in Section 2.1, we use an external Scheme front-end rewriting relation from states to states. Rules in which the to do so, producing a Cyclone function that builds an ab- result is an expression either immediately return a loca- stract syntax tree. tion (possibly having performed an allocation) or push a Figure 2 describes the syntax of run-time objects that ap- frame onto the stack and start evaluating a sub-expression. pear during the evaluation of a Core Scheme expression and Rules in which the result is a location pop a frame, fill the Figure 3 describes the operational semantics of the language. hole, and continue as appropriate. The auxiliary judgments The operational semantics define a small-step rewriting rela- 0 hH, ij @ ρi ⇓ l and hH, ij @ ρ ← li ⇓ H are con- tion from states to states. A state has the form hH, S, ρ, ri, get set cerned with getting and setting deBruijn indexed variables, where H is the heap, S is the stack, ρ is the environment, 0 0 while the judgment hH, hl1, . . . , lnii ⇓ hH , l i allocates and r is the result (expression or value) in the current active listify arguments in a Scheme list. position. A heap is a partial map from an infinite set of locations (Loc) to values. A stack is a finite sequence of environ- Implementation ments and stack frames. A stack frame is a partially eval- We turn now to the implementation of the Core Scheme uated expression with a hole, corresponding to Wright and interpreter in Cyclone. As Cyclone is a C-like language, Felleisen-style semantics [21]. Conditional and assignments values are represented using allocated data-structures and have exactly one hole, while applications enforce a left-to- pointers (see Figure 4). right evaluation order by requiring locations (corresponding Recall from Section 2.2 that region polymorphism lets to heap allocated values) to the left of the hole and expres- data-structures abstract over the region of their fields. sions to the right of the hole. Hence, in Figure 4, struct Value<‘r::R> is a forward dec- An environment is a sequence of locations; each location is laration of a structure type, which is polymorphic in a re- expected to corresponds to a heap allocated vector. The de- gion ‘r. The typedef defines value_t<‘r> to be a (null- th Bruijn index ij denotes the j element of the vector pointed able) pointer to a struct Value<‘r> object in region ‘r; to by the ith location in the environment. We further as- essentially, value_t<‘r> corresponds to locations in the op- hH, ρ @ ij i ⇓get l
0 H(l) = (vector hl1, . . . , lni) hH, ρ @ ij i ⇓get l (1 ≤ j ≤ n) 0 hH, l :: ρ @ 0j i ⇓get lj hH, l :: ρ @ (i + 1)j i ⇓ l
0 hH, ρ @ ij ← li ⇓set H
0 0 H(l) = (vector hl1, . . . , lni) hH, ρ @ ij ← l i ⇓set H 0 0 (1 ≤ j ≤ n) 0 0 hH, l :: ρ @ 0j ← l i ⇓set H[l 7→ (vector hl1, . . . , lj−1, l , lj+1, . . . , lni)] hH, l :: ρ @ (i + 1)j ← l i ⇓set H
0 0 hH, hl1, . . . , lnii ⇓listify hH , l i
0 0 0 00 0 l ∈/ dom(H) hH, hl2, . . . , lnii ⇓listify hH , l i l ∈/ dom(H ) 0 0 0 00 0 00 hH, hii ⇓listify hH[l 7→ (const nil)], l i hH, hl1, . . . , lnii ⇓listify hH [l 7→ (pair l1 l )], l i
σ → σ0
dom const l ∈/ (H) var hH, ρ @ ij i ⇓get l hH, S, ρ, Ki → hH[l 7→ (const K)], S, ρ, li hH, S, ρ, ij i → hH, S, ρ, li apply hH, S, ρ, (e e1 . . . en)i → hH, h(2 e1 . . . en), ρi :: S, ρ, ei dom lambda l ∈/ (H) hH, S, ρ, (lambdan e)i → hH[l 7→ (closuren ρ e)], S, ρ, li
dom valambda l ∈/ (H) hH, S, ρ, (valambdan e)i → hH[l 7→ (vaclosurem ρ e)], S, ρ, li if set! hH, S, ρ, (if e1 e2 e3)i → hH, h(if 2 e2 e3), ρi :: S, ρ, e1i hH, S, ρ, (set! ij e)i → hH, h(set! ij 2), ρi :: S, ρ, ei apply-arg-eval 0 0 0 hH, h(l1 . . . li−1 2 ei+1 . . . en), ρ i :: S, ρ, li → hH, h(l1 . . . li−1 l 2 ei+2 . . . en), ρ i :: S, ρ , ei+1i
apply-prim-eval H(l) = (prim p) 0 hH, h(l l1 . . . ln−1 2), ρ i :: S, ρ, lni → p(H, S, ρ, hl1, . . . , lni)
00 00 apply-throw-eval H(l) = (throw S ρ ) hH, h(l 2), ρ0i :: S, ρ, l0i → hH, S00, ρ00, l0i
n 00 00 00 dom apply-closure-eval H(l) = (closure ρ e ) l ∈/ (H) 0 00 00 00 00 hH, h(l l1 . . . ln−1 2), ρ i :: S, ρ, lni → hH[l 7→ (vector hl1, . . . , lni)], S, l :: ρ , e i
vaclosurem 00 00 0 0 00 dom 0 apply-vaclosure-eval H(l) = ( ρ e ) hH, hlm+1, . . . , lnii ⇓listify hH , l i l ∈/ (H ) 0 0 00 0 00 00 00 (m ≤ n) hH, h(l l1 . . . ln−1 2), ρ i :: S, ρ, lni → hH [l 7→ (vector hl1, . . . , lm, l i)], S, l :: ρ , e i if-true-eval H(l) 6= (const false) if-false-eval H(l) = (const false) 0 0 0 0 hH, h(if 2 e1 e2), ρ i :: S, ρ, li → hH, S, ρ , e1i hH, h(if 2 e1 e2), ρ i :: S, ρ, li → hH, S, ρ , e2i
0 0 0 dom 0 set!-eval hH, ρ @ ij ← li ⇓set H l ∈/ (H ) 0 0 0 0 0 hH, h(set! ij 2), ρ i :: S, ρ, li → hH [l 7→ (const unspecified)], S, ρ , l i
Figure 3: Core Scheme Operational Semantics void scheme(exp_t<‘r> prog<‘r>(region_t<‘r>)) { // load the program into the Cyclone heap From−space exp_t<‘H> e = prog(heap_region); // load the initial environment nil env_t<‘H> env = initial_env(heap_region); // construct the initial state // - an empty stack, 7 9 // - the initial environment // - the expression to the evaluated state_t<‘H> state = State{NULL,env,{.expr = e}}; // take an unbounded number of steps bool done = stepi(-1,heap_region,&state);
}
¡ © §¡§¨ ¥¡¥¦ £¤ ¡ Figure 5: Heap Allocated Interpreter ¢¡¢ 7 9 nil
To−space erational semantics (and NULL corresponds to (const nil)). The concrete representation of an expression is given by the datatype ValueD<‘r> declaration. Cyclone provides root objects object pointer forwarding pointer datatypes as a safe alternative to unions for supporting het- erogenous values. Furthermore, while unions require space proportional to the largest member, a datatype only requires Figure 6: Copying GC Example space for the member being used and is thus more efficient in its use of memory. Note the use of a fat pointer in the Vector_v variant of the ValueD<‘r> datatype, which en- strated in the previous section, we choose to write our inter- sures the saftey of array accesses and supports querying the preter in a trampoline style, rather than in a continuation vector’s length. passing style. This gives rise to a simpler implementation, At the present time, the struct Value declaration serves one more in keeping with a byte-code interpreter. Previ- little purpose, as it simply embeds a datatype ValueD<‘r> ous work requires a continuation passing style, with each as the only member of a structure. In Section 2.4, we will continuation polymorphic in the heap region, complicating see how the garbage collector extends this structure with a the garbage collector interface. Also, as will be explained, forwarding pointer. the use of dynamic-region sequences and the next_rgn type Expressions are immutable – they do not change during constructor gives rise to a more natural typing of forwarding the execution of the program. However, as Cyclone is a C- pointers. like language, an expression is represented using allocated The remainder of this section proceeds in the following data-structures and pointers, in a manner similar to val- manner. First, we briefly review the copying algorithm. ues. Stacks and environments are represented as linked list Next, we consider an intuitive implementation in Cyclone. structures, where NULL corresponds to an empty stack or an Finally, we refine our implementation using our dynamic- empty environment. region sequence abstraction to achieve a complete implemen- We conclude this section by examining a simple Core tation in Cyclone. Scheme interpreter (see Figure 5), in which all data is al- Figure 6 illustrates a simple stop-and-copy collector. The located in the Cyclone heap and managed by a conservative collector stops the user program and begins with (live and garbage collector. The function scheme takes one argument, dead) cells in the From-space and an empty To-space. The pointer to a function that takes a region handle argument collector traverses the data structure in From-space, copying and returns an expression allocated in that region. (Note each live cell into To-space when the cell is first visited. The that the <‘r> annotation universally quantifies the region collector preserves sharing by leaving a forwarding pointer variable.) The remainder of the code is straightforward. in each From-space cell when it is copied. The forwarding First, the function pointer is executed to yield the initial ex- pointer points to the copied cell in To-space. Whenever a cell pression. Next, the initial environment is allocated. The ini- in From-space is visited, the copying function first examines tial expression and environment are combined with an empty the forwarding pointer. If it is non-NULL, then the copy stack to create the initial state. Finally, the interpreter re- 0 function returns the forwarding address. Otherwise, space peatedly makes transitions corresponding to σ → σ , until for the cell is reserved in To-space, the forwarding pointer termination. Note that all region parameters in expression, is set to the address of the reserved space, and the fields of environment, and state types are instantiated with the heap the cell are copied. After all live cells in From-space have region (‘H) and all functions performing allocations take the been copied (depicted in Figure 6), the collector can free heap-region handle (heap_region) as a parameter. (Unlike all memory in From-space and the program can continue the operational semantics given above, the state is implicitly execution, allocating new cells in To-space. side-effected and no state is returned by the stepi function.) In the copying algorithm, the separation of managed memory into a From-space and a To-space suggests a 2.4 The Garbage Collector natural correspondence with Cyclone’s regions. Clearly, the In this section, we describe a safe, copying garbage collec- LIFO discipline of Cyclone’s lexical regions is insufficient for tor for the Core Scheme language described above. Before our copying garbage collector, as we would like the lifetime proceeding, we point out some novelties in comparison to of From-space to end after the beginning but before the other type-preserving garbage collectors [15, 20]. As demon- end of To-space’s lifetime. Hence, we turn our attention to Cyclone’s dynamic regions, where it appears that we have unique pointers provide exactly this linearity. Thus, we sufficient expressiveness to write a simple copying garbage declare collector: struct DynamicRegionGen<‘r::R>; ... typedef struct DynamicRegionGen<‘r>*@notnull‘U // create the to-space’s key uregion_gen_t<‘r>; let NewDynamicRegion {<‘to> to_key} = new_ukey(); state_t<‘to> to_state; where struct DynamicRegionGen<‘r::R> is abstract. // open the from-space’s key Because uregion_gen_t<‘r> is a unique pointer, it serves { region from_r = open(from_key); as a capability; in particular, it will serve as a capability // open the to-space’s key to produce the key for
void scheme(exp_t<‘r> prog<‘r>(region_t<‘r>)) { // construct the initial heap let NewDynamicRegionSeq {<‘r> DynamicRegionSeq{key, gen}} = new_drseq(); // load the program and initial environment into the initial heap exp_t<‘r> e; env_t<‘r> env; { region r = open(key); e = prog(r); env = initial_env(r); } // construct the initial state state_t<‘r> state = State{NULL,env,{.expr = e}}; // construct the initial gc state with // - the dynamic region sequence // - the initial state struct GCState gcs = GCState{{key,gen},state}; while (true) { // unpack the current gc state, naming the existentially bound region (‘s) let GCState{<‘s> DynamicRegionSeq{key, gen}, state} = gcs; { // open the current heap region r = open(key); // take a fixed number of steps bool done = stepi(MAX_STEPS,r,&state); // check for termination if (done) { goto Finished; } } // pack the new gc state and allow a GC gcs = GCState{DynamicRegionSeq{key, gen}, state}; gcs = maybeGC(gcs); } Finished: // unpack the final gc state and free the final region let GCState{<‘r> DynamicRegionSeq{key, gen}, state} = gcs; free_ukey(key); } Figure 8: Dynamic Region Sequence Allocated Interpreter static $(frontier_t<‘r>, value_t
Table 1: Total Execution Time in Milliseconds and with Ratio of Execution Times Normalized to Cyclone Safe GC in Parentheses the current heap size. The other is based on the real mem- tion. While the first numbers give us some insight into the ory in use as reported by the underlying operating system. algorithmic consumption of space, these numbers allow us Unfortunately, there is no easy portable way to measure to understand how those numbers may translate into real the actual heap usage of each system without including the system performance. startup costs associated with each system. It is clear from both measurements that our Cyclone im- The numbers we report include the overhead for start up plementation using the Safe GC has a significantly larger and compilation/parsing of the benchmark programs. This footprint when compared to the equivalent implementation makes SISC and MzScheme look artificially worse since we that uses the BDW collector. This difference is accounted are also measuring the overhead of their entire system, in- for by the use of a copying GC compared to a mark-sweep cluding the infrastructure need to support an interactive GC. REPL. We provide these numbers for completeness, but will Figure 10 show the resident set size as a function of time focus on the comparisons between our two different Cyclone for the various systems during the execution of the nboyer versions. Both these systems have comparably small startup benchmark. The graphs clearly show the difference in the costs. algorithms used, and show that the actual working set size To collect information about heap size, we enable de- at a particular point in time can be significantly smaller bugging information printed at each invocation of the GC. than our maximum set. Regardless of this fact, the extra For our own Cyclone Safe GC collector we simply instru- copying will have a negative effect on overall performance if mented our collector appropriately. For the BDW collector we consider the costs of servicing page faults. In our cur- we set the internal variable GC_print_stats which causes rent experiments, memory is not a scare resource, so most the BDW collector to report the heap size after each GC. page faults can be serviced without going to disk. In a real- For the JVM we simply enabled the -verbosegc option to istic server environment, all others things being equal, our collect information about heap sizes. Table 2 summarizes copying GC will perform worse. the maximum heap size as reported by the various collec- Although a given server typically will have large amounts tors. Note that we do not report some numbers for pro- of memory available, that memory must be shared across grams running under MzScheme because these programs do many different instances of an application. Each applica- not cause a garbage collection to occur. tion has a limited amount of memory. Therefore, it is unre- If we look at the heap sizes for our two Cyclone systems, alistic to assume that memory is cheap or free. We should we see that for small programs our heap sizes are compa- explore how to implement other techniques such as mark- rable, however one can see systematic difference in larger sweep or generational-collection algorithms for our approach programs such as nboyer and traverse. The heap resizing to be competitive with other unsafe approaches. However, policy of our safe copying collector resizes the heap based our overall approach has other system level benefits that on the amount of reachable data that remains in the heap may recover or ameliorate these costs in a web-application after a collection and a tunable “live ratio” parameter. In server. We touch on these benefits in Section 5.1. all experiments we set our live ratio to 4.0. A larger live ra- tio amortizes the cost of the a GC over more of the program 4. SIZE OF UNSAFE CODE execution by consuming more space. While heap sizes is a useful metric in a system with vir- The primary goal of our approach is not to produce a tual memory, a more important metric is the amount of faster system, but to produce a system which we believe is physical memory needed to hold the program’s working set. safer. It is important to compare the amount of unsafe code Table 3 summarizes the size of the maximum resident set in the system. Table 4 summarizes the approximate amount of virtual memory pages as measured by repeated polling of unsafe C code used in the implementation of each system. of the /proc/ file system for each benchmark during execu- The line counts ignore comments and whitespace. The amount of unsafe C code need to implement the ba- Cyclone Cyclone SISC MzScheme Safe GC BDW GC Sun JVM BDW GC cpstak 276 (1.00) 292 (1.06) 3306 (11.98) 1024 (3.71) ctak 282 (1.00) 292 (1.04) 3312 (11.74) 3180 (11.28) deriv 381 (1.00) 392 (1.03) 3319 (8.71) 1024 (2.69) destruct 327 (1.00) 392 (1.20) 3364 (10.29) 1024 (3.13) div-iter 360 (1.00) 392 (1.09) 3316 (9.21) 1024 (2.84) div-rec 390 (1.00) 392 (1.01) 3319 (8.51) 1024 (2.63) fft 794 (1.00) 700 (0.88) 4065 (5.12) N/A nboyer 8913 (1.00) 2232 (0.25) 5522 (0.62) 2012 (0.23) puzzle 1554 (1.00) 700 (0.45) 3438 (2.21) 1024 (0.66) tak 269 (1.00) 292 (1.09) 3291 (12.23) N/A takl 282 (1.00) 392 (1.39) 3301 (11.71) N/A takr 955 (1.00) 524 (0.55) 3510 (3.68) 1024 (1.07) traverse 4984 (1.00) 1672 (0.34) 4227 (0.85) 2012 (0.40)
Table 2: Maximum Heap Size in Kilobytes Observed at GC Points with Ratio of Sizes Normalized to Cyclone Safe GC in Parentheses.
Cyclone Cyclone SISC MzScheme Safe GC BDW GC Sun JVM BDW GC cpstak 298 (1.00) 241 (0.81) 5072 (17.02) 652 (2.19) ctak 301 (1.00) 239 (0.79) 5071 (16.85) 1203 (4.00) deriv 335 (1.00) 276 (0.82) 5076 (15.15) 651 (1.94) destruct 316 (1.00) 276 (0.87) 5097 (16.13) 652 (2.06) div-iter 323 (1.00) 271 (0.84) 5067 (15.69) 650 (2.01) div-rec 301 (1.00) 271 (0.90) 5070 (16.84) 653 (2.17) fft 428 (1.00) 374 (0.87) 5299 (12.38) 627 (1.46) nboyer 3092 (1.00) 844 (0.27) 5657 (1.83) 947 (0.31) puzzle 593 (1.00) 380 (0.64) 5110 (8.62) 628 (1.06) tak 274 (1.00) 238 (0.87) 5068 (18.50) 589 (2.15) takl 292 (1.00) 240 (0.82) 5068 (17.36) 588 (2.01) takr 563 (1.00) 388 (0.69) 5160 (9.17) 628 (1.12) traverse 1828 (1.00) 628 (0.34) 5338 (2.92) 883 (0.48)
Table 3: Maximum Working Set in Virtual Pages with Ratio of Sizes Normalized to Cyclone Safe GC in Parentheses. Cyclone (Safe GC) Cyclone (BDW GC) 100 3090 100 845
80 2472 80 676
60 1854 60 507
40 1236 40 338
20 618 Working Set Size 20 169 Working Set Size % of Max. Working Set % of Max. Working Set 0 0 0 0 Time Time
SISC (Sun JVM) MzScheme (BDW GC) 100 5655 100 945
80 4524 80 756
60 3393 60 567
40 2262 40 378
20 1131 Working Set Size 20 189 Working Set Size % of Max. Working Set % of Max. Working Set 0 0 0 0 Time Time
Figure 10: Working Set Over Time for nboyer
Interpreter Runtime System system, running Java based servers must include a JIT and Cyclone Safe GC 0 1800 bytecode verifier in order to perform dynamic loading. So Cyclone BDW GC 0 9000 including the optimizing JIT compiler as part of the unsafe SISC Sun JVM 0 229,100 code of the system is reasonable. Other approaches can MzScheme BDW GC 31,000 9000 dynamically load native optimized code in a manner that does not require a heavy weight JIT [8]. MzScheme has a relatively compact implementation, but Table 4: Approximate Number of Lines of Unsafe C the total size of the system core is still on the order of 40,000 Code. lines of code. sic region primitives needed for our safe GC is very small 5. CONCLUSION in comparison to other systems. The vast majority of the Compared to other systems, it is clear that our approach code (1200 lines) is associated with a fast implementation of can significantly reduce the amount of unsafe code needed malloc[13], on which the region primitives are built. In the- to implement a system. This increases our confidence in ory one could build the region primitives on a much simpler the safety and security of the entire system. Unfortunately, underlying storage system. our copying-collection technique may incur a performance The BDW system is roughly 9000 lines of code. Despite its penalty for this extra degree of safety. Hopefully, in the wide usage and relatively small size there are several known future we can reduce this performance penalty so that the bugs that can cause the collector to fail in an unsafe way extra safety comes with little or no cost at all. because of compiler optimizations or unsafe pointer manip- ulations [2]. Simple verification of the core 9000 line of code 5.1 Cyclone vs. Java and Other Safe VMs does not guarantee safety of the system so the line count is We should also emphasize that our approach potentially misleadingly on the low side. Arguably, one has to verify has some significant system-level performance advantages, the entire system it is linked with and verify the client does when compared to a pure Java web application framework not violate any invariants needed for the safe operation of or any other safe virtual machine based approach. Typically, the collector. each web application will be allocated a separate thread of Appel and Wang [1] analyze the size of the safety TCB control to handle each web request. These threads are cre- for several Java VMs. We quote their number used for the ated frequently and have short lifetimes. In a pure Java Hotspot JIT compiler, which is now the standard Java VM environment there is no way to bound the allocation of a shipped by Sun. Of that 229,100 lines of code, the majority given Java thread or cheaply reclaim all the storage associ- is the actual optimizing compiler. Since in a web application ated with a thread when it dies. We must rely on a system framework there needs to be a facility to dynamically extend wide GC to efficiently reclaim storage for all our threads. the functionality of a running server without halting the The only other alternative is to spawn a heavy-weight sys- tem process. Technical Report CS-TR-4514, University of Maryland In a pure Cyclone system we can allocate a unique Scheme Department of Computer Science, July 2003. interpreter and garbage collector with its own private heap [10] Jetty Web Server and Servlet Container, 2003. on a per thread basis. We can tune the initial and maxi- http://jetty.mortbay.org/. mum heap sizes as well as the heap resizing policy on a per [11] T. Jim, G. Morrisett, D. Grossman, M. Hicks, thread basis taking into account knowledge about the par- J. Cheney, and Y. Wang. Cyclone: A safe dialect of C. ticular application servicing a request. More importantly, In USENIX Annual Technical Conference, pages we can immediately reclaim all the storage associated with 275–288, Monterey, CA, June 2002. a terminated thread, without requiring a system wide GC. [12] R. Kelsey, W. Clinger, and J. R. (Eds.). Revised5 These per thread policies are simply not possible if one relies report on the algorithmic language Scheme. on a system wide GC. Exploiting these large scale system- Higher-Order and Symbolic Computation, 11(1), level benefits is one area which we would like to explore in August 1998. the future. [13] D. Lea. A memory allocator. In general, if an application can benefit from a customized http://gee.cs.oswego.edu/dl/html/malloc.html. runtime system a pure Cyclone approach, allows the de- [14] S. G. Miller. SISC: A complete scheme interpreter in veloper to customize and deploy a runtime system without java. http://sisc.sourceforge.net/sisc.pdf, 2003. compromising the safety of the system. This level of cus- tomization is not typically available in systems like Java. [15] S. Monnier, B. Saha, and Z. Shao. Principled scavenging. Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Acknowledgements Implementation, pages 81–91, 2001. We benefited from discussions with Greg Morrisett, who also [16] MzScheme, 2003. contributed to the development of the Scheme interpreter http://www.plt-scheme.org/software/mzscheme. and garbage collector. [17] C. Queinnec. The influence of browsers on evaluators or, continuations to program web servers. In Proceedings of the fifth ACM SIGPLAN international 6. REFERENCES conference on Functional programming, pages 23–33. [1] A. W. Appel and D. C. Wang. JVM TCB: ACM Press, 2000. Measurements of the trusted computing base of Java [18] M. Tofte and J.-P. Talpin. Region-based memory virtual machines. Technical Report CS-TR-647-02, management. Information and Computation, Princeton University, Apr. 2002. 132(2):109–176, Feb. 1997. [2] H.-J. Boehm. Simple garbage-collector safety. In [19] D. Wagner, J. S. Foster, E. A. Brewer, and A. Aiken. Proceedings of SIGPLAN’96 Conference on A first step towards automated detection of buffer Programming Languages Design and Implementation, overrun vulnerabilities. In Network and Distributed ACM SIGPLAN Notices, pages 89–98. ACM Press, System Security Symposium, pages 3–17, San Diego, 1996. CA, February 2000. [3] H.-J. Boehm and M. Weiser. Garbage collection in an [20] D. Wang and A. Appel. Type-preserving garbage uncooperative environment. Software – Practice and collectors. Conference Record of the Twenty-Eigth Experience, 18(9):807–820, 1988. Annual ACM Symposium on Principles of [4] Chicken: A Scheme-to-C Compiler, 2003. Programming Languages, pages 166–178, 2001. http://www.call-with-current-continuation.org/ [21] A. K. Wright and M. Felleisen. A syntactic approach chicken.html. to type soundness. Information and Computation, [5] Cyclone user’s manual. Technical Report 2001-1855, 115(1):38–94, 1994. Department of Computer Science, Cornell University, Nov. 2001. Current version at http://www.cs.cornell.edu/projects/cyclone/. [6] D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. Region-based memory management in Cyclone. In Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 282–293, Berlin, Germany, June 2002. [7] C. Hawblitzel, H. Huang, E. Krupski, and E. Wei. Low-level linear memory management. http://www.cs.dartmouth.edu/~hawblitz/publish/ linearmem-draft4.ps, Dec. 2002. [8] M. Hicks, J. T. Moore, and S. Nettles. Dynamic software updating. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation (PLDI’01), June 2001. [9] M. Hicks, G. Morrisett, D. Grossman, and T. Jim. Safe and flexible memory management in Cyclone.