Cyclone: a Type-Safe Dialect of C∗
Total Page:16
File Type:pdf, Size:1020Kb
Cyclone: A Type-Safe Dialect of C∗ Dan Grossman Michael Hicks Trevor Jim Greg Morrisett If any bug has achieved celebrity status, it is the • In C, an array of structs will be laid out contigu- buffer overflow. It made front-page news as early ously in memory, which is good for cache locality. as 1987, as the enabler of the Morris worm, the first In Java, the decision of how to lay out an array worm to spread through the Internet. In recent years, of objects is made by the compiler, and probably attacks exploiting buffer overflows have become more has indirections. frequent, and more virulent. This year, for exam- ple, the Witty worm was released to the wild less • C has data types that match hardware data than 48 hours after a buffer overflow vulnerability types and operations. Java abstracts from the was publicly announced; in 45 minutes, it infected hardware (“write once, run anywhere”). the entire world-wide population of 12,000 machines running the vulnerable programs. • C has manual memory management, whereas Notably, buffer overflows are a problem only for the Java has garbage collection. Garbage collec- C and C++ languages—Java and other “safe” lan- tion is safe and convenient, but places little con- guages have built-in protection against them. More- trol over performance in the hands of the pro- over, buffer overflows appear in C programs written grammer, and indeed encourages an allocation- by expert programmers who are security concious— intensive style. programs such as OpenSSH, Kerberos, and the com- In short, C programmers can see the costs of their mercial intrusion detection programs that were the programs simply by looking at them, and they can target of Witty. easily change data representations and fundamental This is bad news for C. If security experts have strategies like memory management. It’s easy for a trouble producing overflow-free C programs, then C programmer to tune their code for performance or there is not much hope for ordinary C program- for resource constraints. mers. On the other hand, programming in Java is Cyclone is a dialect of C that retains its trans- no panacea; for certain applications, C has no com- parency and control, but adds the benefits of safety petition. From a programmer’s point of view, all the (i.e., no unchecked run-time errors.) In Cyclone, safe languages are about the same, while C is a very buffer overflows and related bugs are prevented for different beast. all programs, whether written by a security expert or by a novice. The changes required to achieve safety are pervasive, but Cyclone is still recognizably C; in Cyclone fact, a good way to learn 80% of Cyclone is to pick up Kernighan and Ritchie. Cyclone is an effort to bring safety to C, without Safety has a price. Figure 1 shows the performance turning it into another Java. We can sum up why of Cyclone and Java code normalized to the perfor- we prefer C to Java in two words: transparency and mance of C code for most of the micro-benchmarks control. A few examples will make this clear. in the Great Programming Language Shootout (see ∗This is preprint of an article that appeared in the C/C++ http://shootout.alioth.debian.org/.) Though User’s Journal, January 2005. these micro-benchmarks should be taken with a large 1 10 8 gcc cyclone java 6 4 2 normalized elapsed time 0 ackermannecho fibo nestedlooprandomsieve sumcolwc matrixreversefileary3 hash hash2 heapsortmomentslists spellcheckstrcat wordfreqregexmatchprodcons average Figure 1: Great Programming Language Shootout Performance for C, Cyclone, and Java grain of salt, they do give a rough idea of the relative In addition to time, programmers may be worried performance of the different languages. about space. We have found that, again, there are overheads when using Cyclone as compared to C, but The benchmarks were run on a dual 2.8GHz/2GB these overheads are much less than for Java. For Red Hat Enterprise workstation. We used Cyclone instance, the C version of the heapsort benchmark version 0.8.2 with the -O3 flag, Sun’s Java client had a maximum resident working set size of 472 4KB SDK build version 1.4.2 05-b04, and GCC version pages, the Cyclone version used 504 pages, and the 3.2.3 with the -O3 flag. To avoid measuring start- Java version used 2,471. up and compilation time for the Java VM, we mea- There is another cost to achieving safety, namely, sured elapsed time after a warmup run. We also did the cost of porting a program from C to a safe lan- measurements using Sun’s server SDK and the GNU guage. Porting a program to Java essentially involves Java compiler gcj, and the results were essentially a complete rewrite, for anything but the simplest pro- the same as the Sun client SDK results. Each re- grams. In contrast, most of the Shootout benchmarks ported number in Figure 1 is the median of 11 trials; can be ported to Cyclone by touching 5 to 15 percent there was little variance. of the lines of code. To achieve the best performance, The average over all of the benchmarks (plot- programmers may have to provide additional infor- ted on the far right) shows that Cyclone is about mation in the form of extended type qualifiers, which 0.6 times slower than GCC, whereas Sun’s Java is we describe below. Of course, the number of lines about 6.5 times slower. However, the moments and of code that changed tells us little about how hard lists benchmarks are outliers for which the Java it is to make those changes. In all honesty, this can code has very bad performance (15× and 74× slow- still be a time-consuming and frustrating task. Nev- down respectively—so bad that we’ve had to clip the ertheless, it is considerably easier than rewriting the graph.) If we assume these were poorly coded and program from scratch in a new language. remove them from the averages, then we still see a Like Java, Cyclone gives strong safety guarantees. factor of 3 slowdown for the Java code, compared to There are some overheads, both in terms of run-time 0.4 for Cyclone. performance and porting costs, but the overheads of 2 Cyclone compare quite favourably to Java. For the #include <stdio.h> rest of the article, we will describe some features of int main(int argc, char *@fat *@fat argv) { Cyclone that allow us to achieve safety while mini- argc--; argv++; /* skip command name */ mizing these costs. while (argc > 0) { printf(" %s",*argv); argc--; argv++; Pointers } printf("\n"); A common way of preventing buffer overruns in lan- return 0; guages such as Java is to use dynamic bounds checks. } For example, in Java whenever you have an expres- sion like arr[i], the VM may check at run time Figure 2: Echoing command-line arguments whether i is within the bounds of the array arr. While this check may get optimized away, the pro- grammer has no way to be sure of this. Moreover, the programmer has no control over the memory repre- enced or its contents are assigned to, Cyclone inserts sentation of the array, e.g., to interact with hardware- a bounds check. This guarantees that a @fat pointer defined data structures in an operating system. Fi- can never cause a buffer overflow. nally, Java does not allow pointer arithmetic for it- Because of the bounds information contained in erating over the array, which is quite common in C @fat pointers, argc is superfluous: you can get the code. size of argv by writing numelts(argv). We’ve kept Cyclone provides a middle ground between C and argc as an argument of main for backwards com- Java when it comes to pointers. On the one hand, patibility. Because @fat pointers are common, one Cyclone will perform dynamic checks when it cannot can abbreviate * @fat as ? (question mark). So we be sure that dereferences are in bounds, and throw could write char *@fat *@fat as simply char ??. an exception when necessary. On the other hand, Quite often, porting C code to Cyclone merely means the programmer can choose from a variety of pointer changing some *’s to ?’s. qualifiers to have control over when such checks may occur, and how pointers are represented in memory. Cyclone basically supports three kinds of pointers: Thin Pointers fat pointers, thin pointers, and bounded pointers. Many times, there is no need to include bounds in- Fat Pointers formation on a pointer. For example, if we declare A fat pointer is similar to a Java array, in that it might incur a dynamic bounds check. A fat pointer int x = 3; is denoted by writing the qualifier @fat after the *. int *y = &x; For example, Figure 2 shows a program that echoes its command-line arguments. Except for the decla- then it is clear that y is a pointer to a single integer ration of argv, which holds the command-line argu- 3 (the contents of x). Just as in C, y is represented ments, the program looks just like you would write by a memory address (namely, the address of x); this it in C: pointer arithmetic (argv++) is used to move is why we call it a thin pointer. A dereference of a argv to point to each argument in turn, so it can be thin pointer (e.g., with syntax *y) will never incur a printed. The difference is that a @fat pointer comes bounds check.