Implementation and Performance Evaluation of a Safe in Cyclone

∗ Matthew Fluet Daniel Wang Cornell University Princeton University Department of Computer Science Department of Computer Science 4133 Upson Hall 35 Olden Street Ithaca, NY 14853 Princeton, NJ 08544 fl[email protected] [email protected]

ABSTRACT related security problems found in applications written in In this paper we outline the implementation of a simple . In addition to ensuring safety, these high-level languages Scheme interpreter and a copying garbage collector that are more convenient and preferable to low-level system lan- manages the memory allocated by the interpreter. The en- guages for building web based applications. One major con- tire system including the garbage collector is implemented venience, that also improves security, is automatic memory in Cyclone [11], a safe dialect of C, which supports safe and management. explicit memory management. We describe the high-level Although the majority of web based applications are writ- design of the system, report preliminary benchmarks, and ten in safe languages, the applications are typically hosted compare our approach to other Scheme systems. Our pre- on servers or web application platforms that are imple- liminary benchmarks demonstrate that one can build a sys- mented with unsafe languages such as C. For example, Perl tem with reasonable performance when compared to other based web applications are hosted on the Apache web server approaches that guarantee safety. More importantly we can by dynamically loading a native code module that imple- significantly reduce the amount of unsafe code needed to ments the Perl interpreter. Since the interpreter is written implement the system. Our benchmarks also identify some in C, we should be concerned that a bug in the module may key algorithmic bottlenecks related to our approach that we introduce a security hole. In fact since Apache itself is writ- hope to address in the future. ten in C, we must be concerned about its immunity to buffer Although most of the theoretical ideas in this work are not overruns. by themselves novel, there are a few interesting variations Some web application platforms such as Jetty [10] are and combinations of the existing literature we will discuss. written completely in Java. Even in this situation, we must However, the primary motivation and goal is to build a re- trust the implementation of the Java Virtual Machine to be alistic working system that puts all the existing theory to free from bugs. The implementation of a JVM contains a test against existing systems which rely on larger amounts significant amount of code written in unsafe languages. of unsafe code in their implementation. Reducing or eliminating code written in unsafe languages from the implementation of a web application server in- creases our confidence that the server is immune to com- 1. INTRODUCTION mon security bugs. There already has been a great deal Network servers written in unsafe languages, such as C, of work demonstrating how to build dynamically extensible are a significant source of known security exploits [19]. For- web servers in safe programming languages [8]. What has tunately, many new web based applications are written in not been addressed is how to implement an interpreter that high-level, safe languages such as C#, Java, Perl, PHP, executes the actual web application itself. Python, or Tcl. Web based applications written in safe lan- Implementing the core of an interpreter in a safe language guages are immune to common buffer overflow and other is not in itself a significant challenge. What is challeng- ∗ ing is implementing the runtime system for the interpreter This research was supported in part by ARDA contract that provides automatic memory management. Currently, NBCHC030106; this work does not necessarily reflect the all systems rely on a trusted runtime system to provide some opinion or policy of the federal government and no official endorsement should be inferred. sort of storage management. Significant advances in type and proof systems have made it possible to implement a run- time system that provides garbage-collection services using programming languages that guarantee safety. Permission to make digital or hard copies of all or part of this work for In this paper we outline the implementation of a sim- personal or classroom use is granted without fee provided that copies are ple Scheme interpreter and a copying garbage collector that not made or distributed for profit or commercial advantage and that copies manages the memory allocated by the interpreter. The en- bear this notice and the full citation on the first page. To copy otherwise, to tire system including the garbage collector is implemented republish, to post on servers or to redistribute to lists, requires prior specific in Cyclone [11], a safe dialect of C, which supports the safe permission and/or a fee. SPACE2004 2004 Venice, Italy and explicit management of memory. In Section 2 we mo- Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00. tivate and describe the high-level design of the system. In Section 3 we report some preliminary benchmarks and com- one cannot cast an integer to a pointer. (Cyclone accepts 0 pare our approach to other Scheme systems. In Section 4 as a legal pointer value, but using NULL (a Cyclone keyword) we compare the amount of unsafe code needed to implement is preferred.) Second, one cannot perform arbitrary pointer our system. Our preliminary benchmarks demonstrate that arithmetic on a * pointer, which would also allow the pro- one can build a system with reasonable performance when grammer to overwrite arbitrary memory locations. Finally, compared to other approaches that guarantee safety. More Cyclone inserts a null check whenever the program deref- importantly we can significantly reduce the amount of un- erences a * pointer. While these may appear to be drastic safe code needed to implement the system. Our benchmarks restrictions, there are many patterns of pointer usage that also identify some key algorithmic bottlenecks related to our are unaffected by them. For example, in our interpreter, approach that we hope to address in the future in order to Scheme values are naturally represented by nullable pointers build a realistic production system. to data-structures, where NULL corresponds to the Scheme value nil. 2. WRITING A SCHEME INTERPRETER A variation on a nullable pointer is a non-null pointer, written t*@notnull. Such a pointer is guaranteed to not IN CYCLONE be NULL, which eliminates the need for null checks at deref- erences. In addition to this performance benefit, non-null 2.1 Why Scheme pointers capture a useful programming invariant which can We have chosen Scheme as our test bed language, primar- be statically checked. For example, in our interpreter, the 1 ily because of the relative ease of implementation and the abstract machine state is represented as a non-null pointer existence of well known benchmarks. Our implementation to a data-structure, which is side-effected by each transition is small, because we rely on an existing Scheme front-end [4] of the abstract machine. to expand the majority of the full Scheme language into a A third kind of pointer available in Cyclone is a fat core Scheme subset. The existing front-end handles parsing pointer, written t*@fat. A fat pointer comes with bounds and macro expansion of full Scheme. We take the resulting information (thus, is “fatter” than a traditional pointer). output and emit Cyclone source code that builds an abstract Each time a fat pointer is dereferenced or its contents are syntax tree, which we compile and link with our interpreter assigned to, Cyclone inserts both a null check and a bounds and runtime system. check.2 The bounds information and these run-time checks Not only is Scheme a well studied and compact language, ensure the safety of pointer arithmetic and array indexing. but it also has many features that make it desirable for web In addition, a fat pointer allows its size to be queried at run programming. In particular first class allows time. In our interpreter, a Scheme vector is represented by for a much more natural structuring of web applications [17]. a fat pointer to an array of values. Scheme seems to be developing a large and useful set of tools for manipulating XML. Given that IEEE Scheme was Regions the original processing language for SGML from which XML and HTML are derived, we see a great future for Scheme as In the description of pointers above, we have demonstrated a web . the ways in which Cyclone prevents programs from access- ing arbitrary memory. Another violation is to dereference a 2.2 Key Features of Cyclone dangling pointer: a pointer to storage that has been deal- Cyclone [5] is a safe dialect of C. Cyclone attempts to located. To prevent such violations, Cyclone adopts a type give programmers significant control over data represen- system for region-based memory management, based on the tation, memory management, and performance (like C), work of Tofte and Talpin [18], and described in detail in while preventing buffer overflows, format-string attacks, and previous work [6]. Here, we discuss only the aspects most dangling-pointer dereferences (unlike C). Cyclone ensures crucial to our implementation. the safety of programs through a combination of compile- Cyclone pointer types have the form t*‘r, where t is the time and run-time checks. Cyclone’s combination of per- type of the pointed to object and ‘r is a region name describ- formance, control, and safety make it a good language for ing the object’s lifetime. The Cyclone type system tracks writing low-level software, like runtime systems. the set of live regions at each program point; dereferencing In this section, we briefly introduce the key features of a pointer of type t*‘r requires that the region ‘r is in the Cyclone that are used in the implementation of our Scheme set. (If ‘r is not in the set, a compile-time error signals interpreter described in Sections 2.3 and 2.4. a possible dangling pointer dereference.) Region polymor- phism lets functions and data-structures abstract over the Pointers region of their arguments and fields. Region parameters in Cyclone are indicated by annotations of the form <‘r::R> As with C programs, Cyclone programs make extensive use on functions and types, where R distinguishes region param- of pointers. However, improper use of pointers can lead eters from other kinds of parameters. to core dumps and buffer overflows. To prevent these er- Cyclone provides a number of different kinds of regions, rors, Cyclone introduces different kinds of pointers, which suitable for different allocation and deallocation patterns. make various tradeoffs between expressiveness and run-time Stack regions correspond to local-declaration blocks: enter- checks. ing a block creates a region and allocates objects, while ex- The type of a (traditional) nullable pointer is written t*, iting the block deallocates the region’s objects. Hence, a as in C. For the most part, such pointers have the same be- stack region has lexical scope, but the number and sizes of havior as their counterparts in C. However, there are some restrictions on the use of nullable pointers in Cyclone. First, 2 ∼ In many cases, Cyclone’s flow analysis determines that the 1Our core interpreter loop is only 500 lines of code. checks are superfluous and removes them. stack allocated objects is fixed at compile time. Lexical re- gions are also created and deallocated according to program Symbols: scoping, but a region handle allows objects to be allocated q ∈ Q into the region throughout the region’s lifetime. Hence, a Booleans: lexical region has lexical scope, but the number and sizes of b ∈ B = {true, false} region allocated objects are not fixed at compile time. Numbers: The Cyclone heap is a special region with the name ‘H. x ∈ N ⊇ C ⊇ R ⊇ Q ⊇ Z All data allocated in the heap is managed by the Boehm- Characters: Demers-Weiser (BDW) conservative garbage collector [3]. c ∈ C Conceptually, the Cyclone heap is just a lexical region with Constants: global scope, and the global variable heap_region is its han- K ∈ Const dle. In order to understand the cost of a safe garbage collec- K ::= q | b | x | c | nil | tor, we use the Cyclone heap to implement a version of our undefined | unspecified Scheme interpreter in which all interpreter data is managed deBruijn Indices: by the (trusted) conservative garbage collector. ij ∈ N × N The lifetimes of stack and lexical regions follow the block Expressions: structure of the program, beginning and ending in a last- e ∈ Exp e ::= K | ij | (e e1 . . . en) | in-first-out (LIFO) discipline. Clearly, such a discipline can n m be too restrictive, as it can not accommodate objects with (lambda e) | (valambda e) | overlapping, non-nested lifetimes. (if e1 e2 e3) | (set! ij e) Recent work has added unique pointers and dynamic re- gions to Cyclone [9]. Unique pointers are based on linear Figure 1: Core Scheme Syntax (Expressions) type systems and provide fine-grained memory management for individual objects. In particular, a unique pointer’s ob- ject can be deallocated at any program point. On the other the region variable is existentially bound. Unpacking this hand, unique pointers cannot be freely copied and there are existential type yields a region variable which does not con- further restrictions on their use in Cyclone programs. A flict with any other region name. This is precisely the be- unique pointer is written t*‘U; conceptually, a pointer into havior we require for a function that creates a fresh region.) a special unique region with the name ‘U. While there is The free_ukey function reclaims the key’s region and the much more to be said concerning Cyclone’s unique pointers, storage for the key. Since the key is unique, it must be used for our purposes it suffices to reiterate that unique pointers in a linear manner; the free_ukey function consumes the are treated as linear objects, where the type system and a key. conventional flow analysis ensures that, at every program To access or allocate data within a dynamic re- point, there is at most one usable copy of a value assigned a gion, a key is presented to a special open primitive: unique-pointer type. We make use of unique pointers both region r = open(k). Within the remainder of the current to manage the garbage collector’s frontier data-structures scope, the region handle r can be used to access k’s re- and to implement our dynamic-region sequences. gion; furthermore, k is temporarily consumed throughout A dynamic region resembles a lexical region in many the scope (preventing deallocation) and becomes accessible ways; the crucial difference is that a dynamic region can be again when control leaves the scope. created and freed at (almost) any point within a program. However, before accessing or allocating data within a 2.3 The Core Scheme Interpreter dynamic region, the region must be opened. Opening a In this section, we introduce the Core Scheme language, dynamic region adds the region to the set of live regions its semantics, and its implementation in Cyclone. While and prevents the region from being freed while it is open. each of these components is relatively straightforward, it is The interface for creating and freeing dynamic regions is useful to have a concrete implementation in preparation for given by the following: the description of the garbage collector. typedef struct DynamicRegion<‘r>*@notnull‘U Formalism uregion_key_t<‘r::R>; The reader familiar with Scheme interpreters may feel free struct NewDynamicRegion { <‘r::R> to proceed directly to the next section, referring back to this uregion_key_t<‘r> key; reference formalism as necessary. Figure 1 describes the syn- }; tax of Core Scheme expressions. Literal constants for various struct NewDynamicRegion new_ukey(); types are provided. Variables are given as deBruijn indices. void free_ukey(uregion_key_t<‘r> k); Hence, procedures elide variable names and are instead an- notated with the number of arguments required for applica- A dynamic region is represented as a unique non-null pointer tion. In the case of a procedure with a variable number of to an abstract struct DynamicRegion<‘r> (which is param- arguments, the annotation indicates the minimum number eterized by the region ‘r and internally contains the handle of required arguments. Likewise, an assignment indicates to the region). This unique pointer is called the key, which the location to be updated by a deBruijn index. Finally, serves as a run-time capability granting access to the region. procedure calls and conditionals are standard. The new_ukey function creates a fresh dynamic region and The Core Scheme expression language corresponds closely returns the unique key for the region. (The <‘r::R> anno- to the primitive expression forms that appear in the Scheme tation in the struct NewDynamicRegion type indicates that standard [12]. As described in the standard, they are suffi- struct Value<‘r::R>; Locations: typedef struct Value<‘r>*‘r value_t<‘r::R>; l ∈ Loc datatype ValueD<‘r> { Values: Const_v(const_t<‘r> K); Primop_v(primop_t p); v ∈ Val Throw_v(stack_t<‘r> S, env_t<‘r> rho); v ::= (const K) | (prim p) | (throw S ρ) | Closure_v(unsigned int n, n m (closure ρ e) | (vaclosure ρ e) | env_t<‘r> rho, exp_t<‘r> e); ∗ (pair l1 l2) | (vector l ) VarArgsClosure_v(unsigned int m, Primitives: env_t<‘r> rho, exp_t<‘r> e); ∗ p ∈ (Heap × Stack × Env × Loc ) * State Pair_v(value_t<‘r> l1, value_t<‘r> l2); Vector_v(value_t<‘r>*@fat‘r ls); Stacks: ∗ }; S ∈ Stack = (Env × Frame) struct Value<‘r::R> { S ::= · | hρ, F i :: S datatype ValueD<‘r> value; Frames: }; F ∈ Frame F ::= (l1 . . . li−1 2 ei+1 en) | Figure 4: Cyclone Implementation (Values) (if 2 e2 e3) | (set! ij 2) Environments: ρ ∗ ∈ Env = Loc sume an initial heap H0 and an initial environment ρ0 in ρ ::= · | l :: ρ which all primitive operations are bound. Results: A primitive operation is a partial map from a heap, stack, r ∈ Res environment, and location sequence to a state. The location r ::= e | l sequence corresponds to the arguments to the primitive op- Heaps: eration. The other components of the state are provided for H ∈ Heap = Loc * Val primitive operations that perform heap allocation or require States: access to the stack and environment. σ ∈ State = Heap × Stack × Env × Res Values correspond to heap allocated data. In addi- σ ::= hH, S, ρ, ri tion to primitive operations, there are three other pro- cedure forms. The throw carries a stack and environ- Figure 2: Core Scheme Syntax (Runtime objects) ment; this is the form of the “escape procedure” passed as an argument to the argument of the primitive opera- tion call-with-current-. The closure and vaclosure forms correspond to the evaluation of lambda and valambda expression-types; the closure forms capture ciently expressive to encode the various derived expression the current environment to be extended at the point of ap- forms that appear in realistic Scheme programs. Hence, we plication. expect an external Scheme program to first have derived ex- The final two value forms, pair and vector, combine lo- pression forms expanded into primitive expression forms and cations into larger data-structures. symbolic variables converted into deBruijn indices, yielding Figure 3 describes the operational semantics of the lan- a Core Scheme expression, before being executed. As ex- 0 guage. The main judgment σ → σ defines a small-step plained in Section 2.1, we use an external Scheme front-end rewriting relation from states to states. Rules in which the to do so, producing a Cyclone function that builds an ab- result is an expression either immediately return a loca- stract syntax tree. tion (possibly having performed an allocation) or push a Figure 2 describes the syntax of run-time objects that ap- frame onto the stack and start evaluating a sub-expression. pear during the evaluation of a Core Scheme expression and Rules in which the result is a location pop a frame, fill the Figure 3 describes the operational semantics of the language. hole, and continue as appropriate. The auxiliary judgments The operational semantics define a small-step rewriting rela- 0 hH, ij @ ρi ⇓ l and hH, ij @ ρ ← li ⇓ H are con- tion from states to states. A state has the form hH, S, ρ, ri, get set cerned with getting and setting deBruijn indexed variables, where H is the heap, S is the stack, ρ is the environment, 0 0 while the judgment hH, hl1, . . . , lnii ⇓ hH , l i allocates and r is the result (expression or value) in the current active listify arguments in a Scheme list. position. A heap is a partial map from an infinite set of locations (Loc) to values. A stack is a finite sequence of environ- Implementation ments and stack frames. A stack frame is a partially eval- We turn now to the implementation of the Core Scheme uated expression with a hole, corresponding to Wright and interpreter in Cyclone. As Cyclone is a C-like language, Felleisen-style semantics [21]. Conditional and assignments values are represented using allocated data-structures and have exactly one hole, while applications enforce a left-to- pointers (see Figure 4). right evaluation order by requiring locations (corresponding Recall from Section 2.2 that region polymorphism lets to heap allocated values) to the left of the hole and expres- data-structures abstract over the region of their fields. sions to the right of the hole. Hence, in Figure 4, struct Value<‘r::R> is a forward dec- An environment is a sequence of locations; each location is laration of a structure type, which is polymorphic in a re- expected to corresponds to a heap allocated vector. The de- gion ‘r. The typedef defines value_t<‘r> to be a (null- th Bruijn index ij denotes the j element of the vector pointed able) pointer to a struct Value<‘r> object in region ‘r; to by the ith location in the environment. We further as- essentially, value_t<‘r> corresponds to locations in the op- hH, ρ @ ij i ⇓get l

0 H(l) = (vector hl1, . . . , lni) hH, ρ @ ij i ⇓get l (1 ≤ j ≤ n) 0 hH, l :: ρ @ 0j i ⇓get lj hH, l :: ρ @ (i + 1)j i ⇓ l

0 hH, ρ @ ij ← li ⇓set H

0 0 H(l) = (vector hl1, . . . , lni) hH, ρ @ ij ← l i ⇓set H 0 0 (1 ≤ j ≤ n) 0 0 hH, l :: ρ @ 0j ← l i ⇓set H[l 7→ (vector hl1, . . . , lj−1, l , lj+1, . . . , lni)] hH, l :: ρ @ (i + 1)j ← l i ⇓set H

0 0 hH, hl1, . . . , lnii ⇓listify hH , l i

0 0 0 00 0 l ∈/ dom(H) hH, hl2, . . . , lnii ⇓listify hH , l i l ∈/ dom(H ) 0 0 0 00 0 00 hH, hii ⇓listify hH[l 7→ (const nil)], l i hH, hl1, . . . , lnii ⇓listify hH [l 7→ (pair l1 l )], l i

σ → σ0

dom const l ∈/ (H) var hH, ρ @ ij i ⇓get l hH, S, ρ, Ki → hH[l 7→ (const K)], S, ρ, li hH, S, ρ, ij i → hH, S, ρ, li apply hH, S, ρ, (e e1 . . . en)i → hH, h(2 e1 . . . en), ρi :: S, ρ, ei dom lambda l ∈/ (H) hH, S, ρ, (lambdan e)i → hH[l 7→ (closuren ρ e)], S, ρ, li

dom valambda l ∈/ (H) hH, S, ρ, (valambdan e)i → hH[l 7→ (vaclosurem ρ e)], S, ρ, li if set! hH, S, ρ, (if e1 e2 e3)i → hH, h(if 2 e2 e3), ρi :: S, ρ, e1i hH, S, ρ, (set! ij e)i → hH, h(set! ij 2), ρi :: S, ρ, ei apply-arg-eval 0 0 0 hH, h(l1 . . . li−1 2 ei+1 . . . en), ρ i :: S, ρ, li → hH, h(l1 . . . li−1 l 2 ei+2 . . . en), ρ i :: S, ρ , ei+1i

apply-prim-eval H(l) = (prim p) 0 hH, h(l l1 . . . ln−1 2), ρ i :: S, ρ, lni → p(H, S, ρ, hl1, . . . , lni)

00 00 apply-throw-eval H(l) = (throw S ρ ) hH, h(l 2), ρ0i :: S, ρ, l0i → hH, S00, ρ00, l0i

n 00 00 00 dom apply-closure-eval H(l) = (closure ρ e ) l ∈/ (H) 0 00 00 00 00 hH, h(l l1 . . . ln−1 2), ρ i :: S, ρ, lni → hH[l 7→ (vector hl1, . . . , lni)], S, l :: ρ , e i

vaclosurem 00 00 0 0 00 dom 0 apply-vaclosure-eval H(l) = ( ρ e ) hH, hlm+1, . . . , lnii ⇓listify hH , l i l ∈/ (H ) 0 0 00 0 00 00 00 (m ≤ n) hH, h(l l1 . . . ln−1 2), ρ i :: S, ρ, lni → hH [l 7→ (vector hl1, . . . , lm, l i)], S, l :: ρ , e i if-true-eval H(l) 6= (const false) if-false-eval H(l) = (const false) 0 0 0 0 hH, h(if 2 e1 e2), ρ i :: S, ρ, li → hH, S, ρ , e1i hH, h(if 2 e1 e2), ρ i :: S, ρ, li → hH, S, ρ , e2i

0 0 0 dom 0 set!-eval hH, ρ @ ij ← li ⇓set H l ∈/ (H ) 0 0 0 0 0 hH, h(set! ij 2), ρ i :: S, ρ, li → hH [l 7→ (const unspecified)], S, ρ , l i

Figure 3: Core Scheme Operational Semantics void scheme(exp_t<‘r> prog<‘r>(region_t<‘r>)) { // load the program into the Cyclone heap From−space exp_t<‘H> e = prog(heap_region); // load the initial environment nil env_t<‘H> env = initial_env(heap_region); // construct the initial state // - an empty stack, 7 9 // - the initial environment // - the expression to the evaluated state_t<‘H> state = State{NULL,env,{.expr = e}}; // take an unbounded number of steps bool done = stepi(-1,heap_region,&state);

}

¡  © §¡§¨ ¥¡¥¦ £¤ ¡ Figure 5: Heap Allocated Interpreter ¢¡¢ 7 9 nil

To−space erational semantics (and NULL corresponds to (const nil)). The concrete representation of an expression is given by the datatype ValueD<‘r> declaration. Cyclone provides root objects object pointer forwarding pointer datatypes as a safe alternative to unions for supporting het- erogenous values. Furthermore, while unions require space proportional to the largest member, a datatype only requires Figure 6: Copying GC Example space for the member being used and is thus more efficient in its use of memory. Note the use of a fat pointer in the Vector_v variant of the ValueD<‘r> datatype, which en- strated in the previous section, we choose to write our inter- sures the saftey of array accesses and supports querying the preter in a trampoline style, rather than in a continuation vector’s length. passing style. This gives rise to a simpler implementation, At the present time, the struct Value declaration serves one more in keeping with a byte-code interpreter. Previ- little purpose, as it simply embeds a datatype ValueD<‘r> ous work requires a continuation passing style, with each as the only member of a structure. In Section 2.4, we will continuation polymorphic in the heap region, complicating see how the garbage collector extends this structure with a the garbage collector interface. Also, as will be explained, forwarding pointer. the use of dynamic-region sequences and the next_rgn type Expressions are immutable – they do not change during constructor gives rise to a more natural typing of forwarding the execution of the program. However, as Cyclone is a C- pointers. like language, an expression is represented using allocated The remainder of this section proceeds in the following data-structures and pointers, in a manner similar to val- manner. First, we briefly review the copying algorithm. ues. Stacks and environments are represented as linked list Next, we consider an intuitive implementation in Cyclone. structures, where NULL corresponds to an empty stack or an Finally, we refine our implementation using our dynamic- empty environment. region sequence abstraction to achieve a complete implemen- We conclude this section by examining a simple Core tation in Cyclone. Scheme interpreter (see Figure 5), in which all data is al- Figure 6 illustrates a simple stop-and-copy collector. The located in the Cyclone heap and managed by a conservative collector stops the user program and begins with (live and garbage collector. The function scheme takes one argument, dead) cells in the From-space and an empty To-space. The pointer to a function that takes a region handle argument collector traverses the data structure in From-space, copying and returns an expression allocated in that region. (Note each live cell into To-space when the cell is first visited. The that the <‘r> annotation universally quantifies the region collector preserves sharing by leaving a forwarding pointer variable.) The remainder of the code is straightforward. in each From-space cell when it is copied. The forwarding First, the function pointer is executed to yield the initial ex- pointer points to the copied cell in To-space. Whenever a cell pression. Next, the initial environment is allocated. The ini- in From-space is visited, the copying function first examines tial expression and environment are combined with an empty the forwarding pointer. If it is non-NULL, then the copy stack to create the initial state. Finally, the interpreter re- 0 function returns the forwarding address. Otherwise, space peatedly makes transitions corresponding to σ → σ , until for the cell is reserved in To-space, the forwarding pointer termination. Note that all region parameters in expression, is set to the address of the reserved space, and the fields of environment, and state types are instantiated with the heap the cell are copied. After all live cells in From-space have region (‘H) and all functions performing allocations take the been copied (depicted in Figure 6), the collector can free heap-region handle (heap_region) as a parameter. (Unlike all memory in From-space and the program can continue the operational semantics given above, the state is implicitly execution, allocating new cells in To-space. side-effected and no state is returned by the stepi function.) In the copying algorithm, the separation of managed memory into a From-space and a To-space suggests a 2.4 The Garbage Collector natural correspondence with Cyclone’s regions. Clearly, the In this section, we describe a safe, copying garbage collec- LIFO discipline of Cyclone’s lexical regions is insufficient for tor for the Core Scheme language described above. Before our copying garbage collector, as we would like the lifetime proceeding, we point out some novelties in comparison to of From-space to end after the beginning but before the other type-preserving garbage collectors [15, 20]. As demon- end of To-space’s lifetime. Hence, we turn our attention to Cyclone’s dynamic regions, where it appears that we have unique pointers provide exactly this linearity. Thus, we sufficient expressiveness to write a simple copying garbage declare collector: struct DynamicRegionGen<‘r::R>; ... typedef struct DynamicRegionGen<‘r>*@notnull‘U // create the to-space’s key uregion_gen_t<‘r>; let NewDynamicRegion {<‘to> to_key} = new_ukey(); state_t<‘to> to_state; where struct DynamicRegionGen<‘r::R> is abstract. // open the from-space’s key Because uregion_gen_t<‘r> is a unique pointer, it serves { region from_r = open(from_key); as a capability; in particular, it will serve as a capability // open the to-space’s key to produce the key for > and the next { region to_r = open(to_key); generator. Taken together, a key and a generator form a to_state = copy_state(to_r, from_state); dynamic-region sequence: } } struct DynamicRegionSeq<‘r> { // free the from-space uregion_key_t<‘r> key; free_ukey(from_key); uregion_gen_t<‘r> gen; ... }; typedef struct DynamicRegionSeq<‘r> drseq_t<‘r>; While this captures the spirit of the garbage collector, a number of details remain. As we described in Section 2.3, Finally, the interface for dynamic-region sequences is given the struct Value<‘r::R> declaration is intended to be by the following: extended with a forwarding pointer. The question is: “what is the type of the forwarding pointer?” The an- struct NewDynamicRegionSeq { <‘r::R> swer is (something along the lines of): “a pointer to a struct DynamicRegionSeq<‘r> drseq; struct Value in To-space, whose forwarding pointer is a }; pointer to a struct Value in To-space’s To-space, whose struct NewDynamicRegionSeq new_drseq(); forwarding pointer . . . .” What we need is a name for all struct DynamicRegionSeq> of the unwindings of the infinite sequence of pointers. We next_drseq(uregion_gen_t<‘r> gen); provide such a name in the form of a type constructor, which maps region names to region names and generates an The new_drseq function creates a fresh dynamic-region se- infinite supply of region names: quence and returns the key to the first region and a genera- tor for the next element in the sequence. (Note that the typedef _::R next_rgn<‘r::R> new_drseq function returns a structure that existentially quantifies the region variable. Unpacking this existential Thus, we give each garbage-collected value a forwarding yields a region variable which does not conflict with any pointer in the following manner: other region name.) The next_drseq function returns the key for next_rgn<‘r> and a new generator. Because the gen- typedef struct Value<‘r>*‘r value_t<‘r::R>; erator is unique and the next_drseq function consumes it, struct Value<‘r::R> { it follows that we can only create one key for next_rgn<‘r>; value_t> forward; hence, the sequence of regions is linear. Furthermore, the datatype ValueD value; key (and region handle) for next_rgn<‘r> need only be cre- }; ated when next_drseq is called; no keys or handles need be pre-allocated. Opening and freeing a key is accomplished Note that although the region names ‘r and next_rgn<‘r> through the dynamic region operations: the open primitive are related, the lifetimes of their corresponding regions are and the free_ukey function. not. In a similar manner, value_t> is a well- Figures 7 and 8 present the Cyclone code implementing formed type anywhere in the scope of ‘r – even if the region the central functions in the Core Scheme garbage collector handle corresponding to next_rgn<‘r> has not been created. and interpreter. The struct GCState existentially binds the The original inspiration for the next_rgn type constructor current region being used as the Core Scheme heap; each comes from Hawblitzel et. al. [7], where type sequences time the interpreter or garbage collector needs to access or (mappings from integers to types) are used to index regions allocate data in this region, the state is unpacked and the by a region number (“epoch”), yielding the connection be- region opened. The doGC function directly implements the tween successive regions in a copying collector. example code given above, modified to pack and unpack the Operationally, we expect dynamic region sequences to state and to generate the To-space from the dynamic-region behave as dynamic regions, with access mediated by a key. sequence. The scheme function is very similar to the one In particular, we expect operations to create, open, and given in Figure 5, modified to create the dynamic-region free dynamic-region sequence keys. In addition, we need sequence and allocate the initial program and environment an operation to produce the key for next_rgn<‘r>. While in the initial dynamic region. The termination check uses a this operation yields an infinite supply of dynamic region goto in order to transfer control from a scope in which the sequence keys, it is not quite enough. We need to ensure current region is opened to a scope where the current region that the supply is linear; i.e., there is exactly one way to can be freed. This ensures that the final heap is reclaimed generate a key for next_rgn<‘r> for any ‘r. Cyclone’s before terminating the Core Scheme interpreter. // the gc state encapsulates // - the current region (existentially bound) // - a dynamic region sequence (containing a key and generator) // - the state (containing roots) typedef struct GCState { <‘r> drseq_t<‘r> key_gen; state_t<‘r> state; } gcstate_t; gcstate_t doGC(gcstate_t gcs) { // unpack the gc state, naming the existentially bound region (‘r) let GCState{<‘r> DynamicRegionSeq {from_key, from_gen}, from_state} = gcs; // generate the to-space (next_rgn<‘r>) let DynamicRegionSeq{to_key, to_gen} = next_drseq(from_gen); state_t> to_state; // open the from-space’s key { region from_r = open(from_key); // open the to-space’s key { region to_r = open(to_key); // copy the state and reachable data to_state = copy_state(to_r, from_state); } // pack the new gc state gcs = GCState{DynamicRegionSeq{to_key, to_gen}, to_state}; } // free the from-space and return the new gc state free_ukey(from_key); return gcs; } Figure 7: Garbage Collector

void scheme(exp_t<‘r> prog<‘r>(region_t<‘r>)) { // construct the initial heap let NewDynamicRegionSeq {<‘r> DynamicRegionSeq{key, gen}} = new_drseq(); // load the program and initial environment into the initial heap exp_t<‘r> e; env_t<‘r> env; { region r = open(key); e = prog(r); env = initial_env(r); } // construct the initial state state_t<‘r> state = State{NULL,env,{.expr = e}}; // construct the initial gc state with // - the dynamic region sequence // - the initial state struct GCState gcs = GCState{{key,gen},state}; while (true) { // unpack the current gc state, naming the existentially bound region (‘s) let GCState{<‘s> DynamicRegionSeq{key, gen}, state} = gcs; { // open the current heap region r = open(key); // take a fixed number of steps bool done = stepi(MAX_STEPS,r,&state); // check for termination if (done) { goto Finished; } } // pack the new gc state and allow a GC gcs = GCState{DynamicRegionSeq{key, gen}, state}; gcs = maybeGC(gcs); } Finished: // unpack the final gc state and free the final region let GCState{<‘r> DynamicRegionSeq{key, gen}, state} = gcs; free_ukey(key); } Figure 8: Dynamic Region Sequence Allocated Interpreter static $(frontier_t<‘r>, value_t>) Scheme value. The &Value{*f,*from_obj_} “label” makes copy_value(frontier_t<‘r> frontier, use of Cyclone’s pattern matching facilities, which pro- region_t> to_r, vide an efficient method of binding parts of large objects value_t<‘r> from_obj) { to new local variables. In particular, this pattern binds switch (from_obj) { case NULL: return $(frontier, NULL); f to the address of the forwarding pointer component of case &Value{*f,*from_obj_}: the Value<‘r> structure and from_obj_ to the address of // if a forwarding pointer is installed, the datatype ValueD<‘r> component. The comparison // return it and an unmodified frontier. *f != NULL checks for an installed forwarding pointer; if if (*f != NULL) return $(frontier,*f); one exists, it is returned with an unmodified frontier. If switch (from_obj_) { no forwarding pointer has been installed, then the value case &Closure_v(n,from_env,from_exp): // allocate a new Closure in to-space, variant is determined, a new object is allocated in To- // extracting the addresses of child pointers space (via rnew(to_r)), the forwarding pointer is installed let to_obj as (*f = to_obj), and child pointers are extracted and added &Value{_,Closure_v(_,*to_envp,*to_expp)} = to the frontier (via add_front). rnew(to_r) Value{NULL,Closure_v(n,NULL,NULL)}; We have also said little about the triggering of a garbage // install the forwarding pointer collection. Currently, the Cyclone region interface allows the *f = to_obj; // add children to the frontier programmer to query the current size of allocated data in a frontier = add_front(frontier,copy_env, region. We use a simple live-ratio method to set a heap limit, from_env,to_envp); which is checked after a fixed number of interpreter steps. frontier = add_front(frontier,copy_exp, It is straightforward to extend the Cyclone region interface from_exp,to_expp); to establish an upper bound on a region’s size, causing an // return new frontier and pointer exception to be raised when an allocation would exceed the return $(frontier, to_obj); case &Pair_v(from_obj1,from_obj2): limit. The only complication is how to resume the compu- // allocate a new Pair in to-space, tation after the exception has occurred. We must structure // extracting the addresses of child pointers parts of our interpreter and primitives to guarantee they can let to_obj as be correctly resumed in the case of such a region exhaustion &Value{_,Pair_v(*to_obj1p,*to_obj2p)} = exception is rasied. rnew(to_r) Value{NULL,Pair_v(NULL,NULL)}; // install the forwarding pointer *f = to_obj; 3. PERFORMANCE EVALUATION // add children to the frontier frontier = add_front(frontier,copy_value, Now that we have outlined the design and implementa- from_obj1,to_obj1p); tion of our safe Scheme interpreter and runtime system, we frontier = add_front(frontier,copy_value, wish to evaluate the performance of the system compared from_obj2,to_obj2p); to various alternatives. Because we have constrained our- // return new frontier and pointer selves to programming in a safe language, some standard return $(frontier,to_obj); ... implementation techniques that are applicable in unsafe im- } plementations are unavailable to us. We want to understand } how much of a performance penalty this causes. For com- } parison we compare our system with a safe garbage collector against three different other Scheme implementations. Figure 9: Copy Function (Values) The Scheme Implementations and Benchmark Pro- grams Thus far, we have said little about the actual copying of We summarize the systems we study below. Core Scheme objects. A careful inspection of the code in Fig- ure 8 reveals that expressions, environments, and stacks are Cyclone Safe GC - Our implementation of a Core Scheme allocated in the Scheme heap, in addition to values. Hence, interpreter with a safe GC also written in Cyclone; objects of each kind must be garbage collected. The copy- essentially, that of Figure 8. ing algorithm needs space to manage the traversal of live data.3 A recursive algorithm that uses the run-time stack Cyclone BDW GC - The same implementation of our to maintain state runs the risk of stack overflow. There- Core Scheme interpreter, but using the Bohem- fore, we must maintain an explicit frontier of objects to be Demers-Weiser conservative collector; essentially that forwarded. Our solution is to use unique pointer allocated of Figure 5. data-structures which are freed at the end of the garbage col- lection. We use unique pointers in order to immediately free SISC Sun JVM - A freely available high-performance im- the data-structure if it must be resized to accommodate a plementation of Scheme written in Java [14]. It larger frontier. During a garbage collection, three regions are supports first-class continuations and is fully tail- used: the From-space (‘r), the To-space (next_rgn<‘r>), recursive. It relies on the native Java VM for it’s run- and the unique region (‘U). time memory management. Figure 9 presents the Cyclone code for copying a Core MzScheme BDW GC - Widely used implementation of 3A Cheney-style copying collector cannot be accommodated Scheme written in C [16]. It also uses the Bohem- in this framework, as we cannot iterate through the objects Demers-Weiser conservative collector for storage man- allocated in a region. agement. The table below summarizes the safety guarantees pro- Total Execution Time vided by each system. Our first experiment simply measures the execution time Interpreter Runtime System for each benchmark. The execution times are measured in Cyclone Safe GC Safe Safe terms of absolute wall clock time and do not include start up overheads for any preprocessing or compilation. Table 1 Cyclone BDW GC Safe Unsafe summarizes the absolute execution times for the various sys- SISC Sun JVM Safe Unsafe tems on the benchmark programs. All times reported are MzScheme BDW GC Unsafe Unsafe in milliseconds on a lightly loaded 993Mhz dual-processor By comparing the performance of the Cyclone Safe GC and x86 with 256KB of L2 cache and 2GB of physical memory, Cyclone BDW GC, we hope to understand the performance running a Linux 2.4 kernel. cost of a safe garbage collector independent of the core in- terpreter. Comparing SISC with both our Cyclone versions The Cost of Safety gives us an idea of how efficient our core interpreter is com- With the exception of ctak, it is clear that MzScheme pared with another realistic safe implementation. Finally, is significantly faster than all of the other systems. The comparing MzScheme with the other systems give us an idea performance difference for ctak is due to the fact that of how much performance can be had if we forgo safety en- MzScheme does not bias the implementation in-order to sup- tirely. port cheap first-class continuations. Some preliminary ex- We report execution times and memory consumption for periments with stack allocation of activation frames in our all these systems on a subset of the Gabriel benchmarks4 Cyclone implementation suggest that the performance dif- translated into Scheme. ference between the systems can not be simply be accounted for because of stack allocation. Being unsafe, MzScheme is Important Caveats able to use more compact and efficient data representations Before we begin, we should highlight some important differ- such as using the lower order bits of a pointer to encode type ences between the various systems in-order to properly in- tags. The other systems implemented in safe languages are terpret the meaning of the benchmarks. The first caveat is forced to use more heavyweight representations. that not all systems implement the same subset of Scheme. The Cyclone-based systems implement the bare minimum Comparison of Safe Interpreters set of primitives in order to properly execute the bench- If we examine the ratio of execution times with respect to the mark applications. The Cyclone-based systems also have a Cyclone Safe GC (presented in parentheses in Table 1), we space leak associated with the storage of symbols created can see the relative costs of safety. Looking at the normal- via string->symbol. This leak occurs because we do not ized execution times it is clear that all three safe systems are have support for weak pointers to properly collect useless roughly comparable. Our Cyclone implementation is faster entries from the symbol table. It is straightforward to ex- in many cases when compared to SISC, but is slower in oth- tend our system to do so. The Cyclone system does not ers cases. Given that a great deal of effort has been placed implement the full numeric tower or multiple return val- in optimizing SISC as well as the Java VM on which it runs, ues. Doing both is straightforward. We do not expect the we are happy with the performance of our core interpreter. lack of these features to significantly skew our results. Like We hope to make the gap between the Cyclone based in- the other implementations our Cyclone implementations are terpreter meet or exceed the performance of SISC in all the fully tail-recursive and support first-class continuations. benchmarks. MzScheme stack allocates activations frames and uses Doing so will probably require changing our rather setjmp/longjmp to implement first-class continuations, straightforward implementation of the Scheme operational while all of the other implementations heap allocate acti- semantics into more optimized code. We have applied many vation frames. This difference is apparent in the one bench- of the key algorithmic optimizations [14] used by SISC into mark (ctak) which heavily relies on first-class continuations. our interpreter, but still believe there is room for optimiza- Our safe GC uses a copying GC, the BDW collector uses a tion of the core implementation as well as library procedures. mark-sweep GC. To the best of our knowledge the JVM uses We also believe having the flexible memory management a hybrid two-generation scheme, where the first generation primitives that Cyclone offers may also provide a significant use a copying GC and the second generation uses a mark- boost in the performance of web application servers. We sweep algorithm. These algorithmic difference makes fair discuss this in more detail in Section 5.1. comparisons tricky to do at best. In performing the benchmarks we do not artificially limit Comparison of Safe vs. Unsafe GC the heap usage of each program. Since each program uses a Examining the execution times of Cyclone Safe GC and Cy- different amount of heap space and space can be traded for clone BDW GC, we can see that our safe collector does not time, it is not fair to simply examine execution times without add any significant run-time penalty. In fact in many cases taking into consideration memory usage. However, since it is it is slightly faster. However, this may be a misleading con- tricky and difficult to limit memory usage to create a “fair” clusion since the underlying GC algorithms are different. playing field we simply report the time and space metrics for Our safe GC uses a two-space copying GC while the com- “out of the box” configurations. Even with these caveats we parable BDW GC is using a mark-sweep algorithm. This can still come to some sound qualitative conclusions. difference is obvious when we look at the memory footprint 4Note nboyer is a corrected version of the original boyer of the various implementations. benchmark that fixes correctness bugs in the original bench- We provide two metrics to help understand memory con- mark. sumption. One is based on the garbage collector’s view of Cyclone Cyclone SISC MzScheme Safe GC BDW GC Sun JVM BDW GC cpstak 248 (1.00) 263 (1.06) 279 (1.12) 93 (0.37) ctak 357 (1.00) 381 (1.07) 412 (1.15) 3590 (10.05) deriv 1160 (1.00) 1145 (0.99) 660 (0.57) 190 (0.16) destruct 1027 (1.00) 1009 (0.98) 1027 (1.00) 351 (0.34) div-iter 466 (1.00) 467 (1.00) 360 (0.77) 139 (0.30) div-rec 448 (1.00) 453 (1.01) 363 (0.81) 133 (0.30) fft 234 (1.00) 223 (0.96) 450 (1.92) 36 (0.16) nboyer 4292 (1.00) 5018 (1.17) 3431 (0.80) 867 (0.20) puzzle 55 (1.00) 62 (1.13) 51 (0.93) 11 (0.20) tak 209 (1.00) 226 (1.08) 375 (1.79) 44 (0.21) takl 2134 (1.00) 2281 (1.07) 2010 (0.94) 434 (0.20) takr 241 (1.00) 258 (1.07) 238 (0.99) 46 (0.19) traverse 19060 (1.00) 23594 (1.24) 16632 (0.87) 4168 (0.22)

Table 1: Total Execution Time in Milliseconds and with Ratio of Execution Times Normalized to Cyclone Safe GC in Parentheses the current heap size. The other is based on the real mem- tion. While the first numbers give us some insight into the ory in use as reported by the underlying . algorithmic consumption of space, these numbers allow us Unfortunately, there is no easy portable way to measure to understand how those numbers may translate into real the actual heap usage of each system without including the system performance. startup costs associated with each system. It is clear from both measurements that our Cyclone im- The numbers we report include the overhead for start up plementation using the Safe GC has a significantly larger and compilation/parsing of the benchmark programs. This footprint when compared to the equivalent implementation makes SISC and MzScheme look artificially worse since we that uses the BDW collector. This difference is accounted are also measuring the overhead of their entire system, in- for by the use of a copying GC compared to a mark-sweep cluding the infrastructure need to support an interactive GC. REPL. We provide these numbers for completeness, but will Figure 10 show the resident set size as a function of time focus on the comparisons between our two different Cyclone for the various systems during the execution of the nboyer versions. Both these systems have comparably small startup benchmark. The graphs clearly show the difference in the costs. algorithms used, and show that the actual working set size To collect information about heap size, we enable de- at a particular point in time can be significantly smaller bugging information printed at each invocation of the GC. than our maximum set. Regardless of this fact, the extra For our own Cyclone Safe GC collector we simply instru- copying will have a negative effect on overall performance if mented our collector appropriately. For the BDW collector we consider the costs of servicing page faults. In our cur- we set the internal variable GC_print_stats which causes rent experiments, memory is not a scare resource, so most the BDW collector to report the heap size after each GC. page faults can be serviced without going to disk. In a real- For the JVM we simply enabled the -verbosegc option to istic server environment, all others things being equal, our collect information about heap sizes. Table 2 summarizes copying GC will perform worse. the maximum heap size as reported by the various collec- Although a given server typically will have large amounts tors. Note that we do not report some numbers for pro- of memory available, that memory must be shared across grams running under MzScheme because these programs do many different instances of an application. Each applica- not cause a garbage collection to occur. tion has a limited amount of memory. Therefore, it is unre- If we look at the heap sizes for our two Cyclone systems, alistic to assume that memory is cheap or free. We should we see that for small programs our heap sizes are compa- explore how to implement other techniques such as mark- rable, however one can see systematic difference in larger sweep or generational-collection algorithms for our approach programs such as nboyer and traverse. The heap resizing to be competitive with other unsafe approaches. However, policy of our safe copying collector resizes the heap based our overall approach has other system level benefits that on the amount of reachable data that remains in the heap may recover or ameliorate these costs in a web-application after a collection and a tunable “live ratio” parameter. In server. We touch on these benefits in Section 5.1. all experiments we set our live ratio to 4.0. A larger live ra- tio amortizes the cost of the a GC over more of the program 4. SIZE OF UNSAFE CODE execution by consuming more space. While heap sizes is a useful metric in a system with vir- The primary goal of our approach is not to produce a tual memory, a more important metric is the amount of faster system, but to produce a system which we believe is physical memory needed to hold the program’s working set. safer. It is important to compare the amount of unsafe code Table 3 summarizes the size of the maximum resident set in the system. Table 4 summarizes the approximate amount of virtual memory pages as measured by repeated polling of unsafe C code used in the implementation of each system. of the /proc/ file system for each benchmark during execu- The line counts ignore comments and whitespace. The amount of unsafe C code need to implement the ba- Cyclone Cyclone SISC MzScheme Safe GC BDW GC Sun JVM BDW GC cpstak 276 (1.00) 292 (1.06) 3306 (11.98) 1024 (3.71) ctak 282 (1.00) 292 (1.04) 3312 (11.74) 3180 (11.28) deriv 381 (1.00) 392 (1.03) 3319 (8.71) 1024 (2.69) destruct 327 (1.00) 392 (1.20) 3364 (10.29) 1024 (3.13) div-iter 360 (1.00) 392 (1.09) 3316 (9.21) 1024 (2.84) div-rec 390 (1.00) 392 (1.01) 3319 (8.51) 1024 (2.63) fft 794 (1.00) 700 (0.88) 4065 (5.12) N/A nboyer 8913 (1.00) 2232 (0.25) 5522 (0.62) 2012 (0.23) puzzle 1554 (1.00) 700 (0.45) 3438 (2.21) 1024 (0.66) tak 269 (1.00) 292 (1.09) 3291 (12.23) N/A takl 282 (1.00) 392 (1.39) 3301 (11.71) N/A takr 955 (1.00) 524 (0.55) 3510 (3.68) 1024 (1.07) traverse 4984 (1.00) 1672 (0.34) 4227 (0.85) 2012 (0.40)

Table 2: Maximum Heap Size in Kilobytes Observed at GC Points with Ratio of Sizes Normalized to Cyclone Safe GC in Parentheses.

Cyclone Cyclone SISC MzScheme Safe GC BDW GC Sun JVM BDW GC cpstak 298 (1.00) 241 (0.81) 5072 (17.02) 652 (2.19) ctak 301 (1.00) 239 (0.79) 5071 (16.85) 1203 (4.00) deriv 335 (1.00) 276 (0.82) 5076 (15.15) 651 (1.94) destruct 316 (1.00) 276 (0.87) 5097 (16.13) 652 (2.06) div-iter 323 (1.00) 271 (0.84) 5067 (15.69) 650 (2.01) div-rec 301 (1.00) 271 (0.90) 5070 (16.84) 653 (2.17) fft 428 (1.00) 374 (0.87) 5299 (12.38) 627 (1.46) nboyer 3092 (1.00) 844 (0.27) 5657 (1.83) 947 (0.31) puzzle 593 (1.00) 380 (0.64) 5110 (8.62) 628 (1.06) tak 274 (1.00) 238 (0.87) 5068 (18.50) 589 (2.15) takl 292 (1.00) 240 (0.82) 5068 (17.36) 588 (2.01) takr 563 (1.00) 388 (0.69) 5160 (9.17) 628 (1.12) traverse 1828 (1.00) 628 (0.34) 5338 (2.92) 883 (0.48)

Table 3: Maximum Working Set in Virtual Pages with Ratio of Sizes Normalized to Cyclone Safe GC in Parentheses. Cyclone (Safe GC) Cyclone (BDW GC) 100 3090 100 845

80 2472 80 676

60 1854 60 507

40 1236 40 338

20 618 Working Set Size 20 169 Working Set Size % of Max. Working Set % of Max. Working Set 0 0 0 0 Time Time

SISC (Sun JVM) MzScheme (BDW GC) 100 5655 100 945

80 4524 80 756

60 3393 60 567

40 2262 40 378

20 1131 Working Set Size 20 189 Working Set Size % of Max. Working Set % of Max. Working Set 0 0 0 0 Time Time

Figure 10: Working Set Over Time for nboyer

Interpreter Runtime System system, running Java based servers must include a JIT and Cyclone Safe GC 0 1800 bytecode verifier in order to perform dynamic loading. So Cyclone BDW GC 0 9000 including the optimizing JIT compiler as part of the unsafe SISC Sun JVM 0 229,100 code of the system is reasonable. Other approaches can MzScheme BDW GC 31,000 9000 dynamically load native optimized code in a manner that does not require a heavy weight JIT [8]. MzScheme has a relatively compact implementation, but Table 4: Approximate Number of Lines of Unsafe C the total size of the system core is still on the order of 40,000 Code. lines of code. sic region primitives needed for our safe GC is very small 5. CONCLUSION in comparison to other systems. The vast majority of the Compared to other systems, it is clear that our approach code (1200 lines) is associated with a fast implementation of can significantly reduce the amount of unsafe code needed malloc[13], on which the region primitives are built. In the- to implement a system. This increases our confidence in ory one could build the region primitives on a much simpler the safety and security of the entire system. Unfortunately, underlying storage system. our copying-collection technique may incur a performance The BDW system is roughly 9000 lines of code. Despite its penalty for this extra degree of safety. Hopefully, in the wide usage and relatively small size there are several known future we can reduce this performance penalty so that the bugs that can cause the collector to fail in an unsafe way extra safety comes with little or no cost at all. because of compiler optimizations or unsafe pointer manip- ulations [2]. Simple verification of the core 9000 line of code 5.1 Cyclone vs. Java and Other Safe VMs does not guarantee safety of the system so the line count is We should also emphasize that our approach potentially misleadingly on the low side. Arguably, one has to verify has some significant system-level performance advantages, the entire system it is linked with and verify the client does when compared to a pure Java web application framework not violate any invariants needed for the safe operation of or any other safe virtual machine based approach. Typically, the collector. each web application will be allocated a separate thread of Appel and Wang [1] analyze the size of the safety TCB control to handle each web request. These threads are cre- for several Java VMs. We quote their number used for the ated frequently and have short lifetimes. In a pure Java Hotspot JIT compiler, which is now the standard Java VM environment there is no way to bound the allocation of a shipped by Sun. Of that 229,100 lines of code, the majority given Java thread or cheaply reclaim all the storage associ- is the actual optimizing compiler. Since in a web application ated with a thread when it dies. We must rely on a system framework there needs to be a facility to dynamically extend wide GC to efficiently reclaim storage for all our threads. the functionality of a running server without halting the The only other alternative is to spawn a heavy-weight sys- tem process. Technical Report CS-TR-4514, University of Maryland In a pure Cyclone system we can allocate a unique Scheme Department of Computer Science, July 2003. interpreter and garbage collector with its own private heap [10] Jetty Web Server and Servlet Container, 2003. on a per thread basis. We can tune the initial and maxi- http://jetty.mortbay.org/. mum heap sizes as well as the heap resizing policy on a per [11] T. Jim, G. Morrisett, D. Grossman, M. Hicks, thread basis taking into account knowledge about the par- J. Cheney, and Y. Wang. Cyclone: A safe dialect of C. ticular application servicing a request. More importantly, In USENIX Annual Technical Conference, pages we can immediately reclaim all the storage associated with 275–288, Monterey, CA, June 2002. a terminated thread, without requiring a system wide GC. [12] R. Kelsey, W. Clinger, and J. R. (Eds.). Revised5 These per thread policies are simply not possible if one relies report on the algorithmic language Scheme. on a system wide GC. Exploiting these large scale system- Higher-Order and Symbolic Computation, 11(1), level benefits is one area which we would like to explore in August 1998. the future. [13] D. Lea. A memory allocator. In general, if an application can benefit from a customized http://gee.cs.oswego.edu/dl/html/malloc.html. runtime system a pure Cyclone approach, allows the de- [14] S. G. Miller. SISC: A complete scheme interpreter in veloper to customize and deploy a runtime system without java. http://sisc.sourceforge.net/sisc.pdf, 2003. compromising the safety of the system. This level of cus- tomization is not typically available in systems like Java. [15] S. Monnier, B. Saha, and Z. Shao. Principled scavenging. Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Acknowledgements Implementation, pages 81–91, 2001. We benefited from discussions with Greg Morrisett, who also [16] MzScheme, 2003. contributed to the development of the Scheme interpreter http://www.plt-scheme.org/software/mzscheme. and garbage collector. [17] C. Queinnec. The influence of browsers on evaluators or, continuations to program web servers. In Proceedings of the fifth ACM SIGPLAN international 6. REFERENCES conference on Functional programming, pages 23–33. [1] A. W. Appel and D. C. Wang. JVM TCB: ACM Press, 2000. Measurements of the trusted computing base of Java [18] M. Tofte and J.-P. Talpin. Region-based memory virtual machines. Technical Report CS-TR-647-02, management. Information and Computation, Princeton University, Apr. 2002. 132(2):109–176, Feb. 1997. [2] H.-J. Boehm. Simple garbage-collector safety. In [19] D. Wagner, J. S. Foster, E. A. Brewer, and A. Aiken. Proceedings of SIGPLAN’96 Conference on A first step towards automated detection of buffer Programming Languages Design and Implementation, overrun vulnerabilities. In Network and Distributed ACM SIGPLAN Notices, pages 89–98. ACM Press, System Security Symposium, pages 3–17, San Diego, 1996. CA, February 2000. [3] H.-J. Boehm and M. Weiser. Garbage collection in an [20] D. Wang and A. Appel. Type-preserving garbage uncooperative environment. Software – Practice and collectors. Conference Record of the Twenty-Eigth Experience, 18(9):807–820, 1988. Annual ACM Symposium on Principles of [4] Chicken: A Scheme-to-C Compiler, 2003. Programming Languages, pages 166–178, 2001. http://www.call-with-current-continuation.org/ [21] A. K. Wright and M. Felleisen. A syntactic approach chicken.html. to type soundness. Information and Computation, [5] Cyclone user’s manual. Technical Report 2001-1855, 115(1):38–94, 1994. Department of Computer Science, Cornell University, Nov. 2001. Current version at http://www.cs.cornell.edu/projects/cyclone/. [6] D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. Region-based memory management in Cyclone. In Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 282–293, Berlin, Germany, June 2002. [7] C. Hawblitzel, H. Huang, E. Krupski, and E. Wei. Low-level linear memory management. http://www.cs.dartmouth.edu/~hawblitz/publish/ linearmem-draft4.ps, Dec. 2002. [8] M. Hicks, J. T. Moore, and S. Nettles. Dynamic software updating. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation (PLDI’01), June 2001. [9] M. Hicks, G. Morrisett, D. Grossman, and T. Jim. Safe and flexible memory management in Cyclone.