Implementation and Performance Evaluation of a Safe Runtime System in Cyclone
Matthew Fluet
Cornell University, Department of Computer Science
4133 Upson Hall, Ithaca, NY 14853
fl[email protected]

Daniel Wang
Princeton University, Department of Computer Science
35 Olden Street, Princeton, NJ 08544
[email protected]

ABSTRACT

In this paper we outline the implementation of a simple Scheme interpreter and a copying garbage collector that manages the memory allocated by the interpreter. The entire system, including the garbage collector, is implemented in Cyclone [11], a safe dialect of C, which supports safe and explicit memory management. We describe the high-level design of the system, report preliminary benchmarks, and compare our approach to other Scheme systems. Our preliminary benchmarks demonstrate that one can build a system with reasonable performance when compared to other approaches that guarantee safety. More importantly, we can significantly reduce the amount of unsafe code needed to implement the system. Our benchmarks also identify some key algorithmic bottlenecks related to our approach that we hope to address in the future.

Although most of the theoretical ideas in this work are not by themselves novel, there are a few interesting variations and combinations of the existing literature we will discuss. However, the primary motivation and goal is to build a realistic working system that puts all the existing theory to the test against existing systems which rely on larger amounts of unsafe code in their implementation.

∗ This research was supported in part by ARDA contract NBCHC030106; this work does not necessarily reflect the opinion or policy of the federal government and no official endorsement should be inferred.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SPACE2004 2004 Venice, Italy
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

1. INTRODUCTION

Network servers written in unsafe languages, such as C, are a significant source of known security exploits [19]. Fortunately, many new web based applications are written in high-level, safe languages such as C#, Java, Perl, PHP, Python, or Tcl. Web based applications written in safe languages are immune to common buffer overflow and other related security problems found in applications written in C. In addition to ensuring safety, these high-level languages are more convenient and preferable to low-level system languages for building web based applications. One major convenience, which also improves security, is automatic memory management.

Although the majority of web based applications are written in safe languages, the applications are typically hosted on servers or web application platforms that are implemented with unsafe languages such as C. For example, Perl based web applications are hosted on the Apache web server by dynamically loading a native code module that implements the Perl interpreter. Since the interpreter is written in C, we should be concerned that a bug in the module may introduce a security hole. In fact, since Apache itself is written in C, we must be concerned about its immunity to buffer overruns.

Some web application platforms such as Jetty [10] are written completely in Java. Even in this situation, we must trust the implementation of the Java Virtual Machine to be free from bugs. The implementation of a JVM contains a significant amount of code written in unsafe languages.

Reducing or eliminating code written in unsafe languages from the implementation of a web application server increases our confidence that the server is immune to common security bugs. There has already been a great deal of work demonstrating how to build dynamically extensible web servers in safe programming languages [8]. What has not been addressed is how to implement an interpreter that executes the actual web application itself.

Implementing the core of an interpreter in a safe language is not in itself a significant challenge. What is challenging is implementing the runtime system for the interpreter that provides automatic memory management. Currently, all systems rely on a trusted runtime system to provide some sort of storage management. Significant advances in type and proof systems have made it possible to implement a runtime system that provides garbage-collection services using programming languages that guarantee safety.

In this paper we outline the implementation of a simple Scheme interpreter and a copying garbage collector that manages the memory allocated by the interpreter. The entire system, including the garbage collector, is implemented in Cyclone [11], a safe dialect of C, which supports the safe and explicit management of memory. In Section 2 we motivate and describe the high-level design of the system. In Section 3 we report some preliminary benchmarks and compare our approach to other Scheme systems. In Section 4 we compare the amount of unsafe code needed to implement our system. Our preliminary benchmarks demonstrate that one can build a system with reasonable performance when compared to other approaches that guarantee safety. More importantly, we can significantly reduce the amount of unsafe code needed to implement the system. Our benchmarks also identify some key algorithmic bottlenecks related to our approach that we hope to address in the future in order to build a realistic production system.

2. WRITING A SCHEME INTERPRETER IN CYCLONE

2.1 Why Scheme

We have chosen Scheme as our test bed language, primarily because of the relative ease of implementation and the existence of well known benchmarks. Our implementation is small, because we rely on an existing Scheme front-end [4] to expand the majority of the full Scheme language into a core Scheme subset. The existing front-end handles parsing and macro expansion of full Scheme. We take the resulting output and emit Cyclone source code that builds an abstract syntax tree, which we compile and link with our interpreter and runtime system.

Not only is Scheme a well studied and compact language, but it also has many features that make it desirable for web programming. In particular, first-class continuations allow for a much more natural structuring of web applications [17]. Scheme seems to be developing a large and useful set of tools for manipulating XML. Given that IEEE Scheme was the original processing language for SGML, from which XML and HTML are derived, we see a great future for Scheme as a web programming language.

2.2 Key Features of Cyclone

Cyclone [5] is a safe dialect of C. Cyclone attempts to give programmers significant control over data representation, memory management, and performance (like C), while preventing buffer overflows, format-string attacks, and dangling-pointer dereferences (unlike C). Cyclone ensures the safety of programs through a combination of compile-time and run-time checks. Cyclone's combination of performance, control, and safety makes it a good language for writing low-level software, like runtime systems.

[...] one cannot cast an integer to a pointer. (Cyclone accepts 0 as a legal pointer value, but using NULL (a Cyclone keyword) is preferred.) Second, one cannot perform arbitrary pointer arithmetic on a * pointer, which would also allow the programmer to overwrite arbitrary memory locations. Finally, Cyclone inserts a null check whenever the program dereferences a * pointer. While these may appear to be drastic restrictions, there are many patterns of pointer usage that are unaffected by them. For example, in our interpreter, Scheme values are naturally represented by nullable pointers to data-structures, where NULL corresponds to the Scheme value nil.

A variation on a nullable pointer is a non-null pointer, written t*@notnull. Such a pointer is guaranteed to not be NULL, which eliminates the need for null checks at dereferences. In addition to this performance benefit, non-null pointers capture a useful programming invariant which can be statically checked. For example, in our interpreter, the abstract machine state is represented as a non-null pointer to a data-structure, which is side-effected by each transition of the abstract machine.

A third kind of pointer available in Cyclone is a fat pointer, written t*@fat. A fat pointer comes with bounds information (thus, it is "fatter" than a traditional pointer). Each time a fat pointer is dereferenced or its contents are assigned to, Cyclone inserts both a null check and a bounds check.² The bounds information and these run-time checks ensure the safety of pointer arithmetic and array indexing. In addition, a fat pointer allows its size to be queried at run time. In our interpreter, a Scheme vector is represented by a fat pointer to an array of values.

Regions

In the description of pointers above, we have demonstrated the ways in which Cyclone prevents programs from accessing arbitrary memory. Another violation is to dereference a dangling pointer: a pointer to storage that has been deallocated. To prevent such violations, Cyclone adopts a type system for region-based memory management, based on the work of Tofte and Talpin [18], and described in detail in previous work [6]. Here, we discuss only the aspects most crucial to our implementation.

Cyclone pointer types have the form t*`r, where t is the type of the pointed-to object and `r is a region name describing the object's lifetime. The Cyclone type system tracks