Link-Time Improvement of Scheme Programs
Saumya Debray, Robert Muth, Scott Watterson
Department of Computer Science
University of Arizona
Tucson, AZ 85721, U.S.A.
{debray, muth, [email protected]}

Technical Report 98-16
December 1998

Abstract

Optimizing compilers typically limit the scope of their analyses and optimizations to individual modules. This has two drawbacks: first, library code cannot be optimized together with its callers, which implies that reusing code through libraries incurs a penalty; and second, the results of analysis and optimization cannot be propagated from an application module written in one language to a module written in another. A possible solution is to carry out additional program optimization at link time. This paper describes our experiences with such optimization using two different optimizing Scheme compilers and several benchmark programs, via alto, a link-time optimizer we have developed for the DEC Alpha architecture. Experiments indicate that significant performance improvements are possible via link-time optimization even when the input programs have already been subjected to high levels of compile-time optimization.

This work was supported in part by the National Science Foundation under grants CCR-9502826 and CCR-9711166.

1 Introduction

The traditional model of compilation usually limits the scope of analysis and optimization to individual procedures, or possibly to modules. This model for code optimization does not take things as far as they could be taken, in two respects. The first is that code involving calls to library routines, or to functions defined in separately compiled modules, cannot be effectively optimized; this is unfortunate, because one expects programmers to rely more and more on code reuse through libraries as the complexity of software systems grows. (There has been some work recently on cross-module code optimization [4, 13]: this works for separately compiled user modules, but not for libraries.) The second problem is that a compiler can only analyze and optimize code written in the language it is designed to compile. Consider an application that investigates the synthesis of chemical compounds, using a top-level Scheme program to direct a heuristic search of a space of various reaction sequences, and Fortran routines¹ to compute reaction rates and yields for individual reactions. With the traditional compiler model, analyses and optimizations will not be able to cross the barrier between program modules written in different languages. For example, it seems unlikely that current compiler technology would allow Fortran routines in such an application to be inlined into the Scheme code, or allow context information to be propagated across language boundaries during interprocedural analysis of a Scheme function calling a Fortran routine.

A possible solution is to carry out program optimization when the entire program, library calls and all, is available for inspection: that is, at link time.
While this makes it possible to address the shortcomings of the traditional compilation model, it gives rise to its own problems, for example:

- Machine code usually has much less semantic information than source code, which makes it much more difficult to discover control flow or data flow information. As an example, even for simple first-order programs, determining the extent of a jump table in an executable file, and hence the possible targets of the code derived from a case or switch statement, can be difficult when dealing with executables; at the source level, by contrast, the corresponding problem is straightforward.

- Compiler analyses are typically carried out on representations of source programs in terms of source language constructs, disregarding "nasty" features such as pointer arithmetic and out-of-bounds array accesses. At the level of executable code, on the other hand, all we have are the nasty features. Nontrivial pointer arithmetic is ubiquitous, both for ordinary address computations and for manipulating tagged pointers. If the number of arguments to a function is large enough, some of the arguments may have to be passed on the stack. In such a case, the arguments passed on the stack will typically reside at the top of the caller's stack frame, and the callee will "reach into" the caller's frame to access them: this is nothing but an out-of-bounds array reference. (A small illustrative sketch follows this list.)

- Executable programs tend to be significantly larger than the source programs they were derived from (e.g., see Figure 2). Coupled with the lack of semantic information present in these programs, this means that sophisticated analyses that are practical at the source level may be overly expensive at the level of executable code because of exorbitant time or space requirements.
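To make the tagged-pointer point concrete, the following C sketch models one common representation, in which small integers and heap pointers share a machine word and are distinguished by low-order tag bits. The tag assignments and names here are illustrative assumptions only, not the scheme used by any particular compiler discussed in this paper; the point is that, at the machine-code level, extracting a field from a pair is a load at an address computed by pointer arithmetic, which is exactly what a link-time optimizer sees.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical 2-bit tagging scheme (illustrative only):
     *   ....00  fixnum (small integer, stored shifted left by 2)
     *   ....01  pair   (pointer to a heap-allocated cons cell, offset by 1)
     */
    typedef uintptr_t value;          /* a Scheme value fits in one machine word */

    #define TAG_MASK   0x3u
    #define FIXNUM_TAG 0x0u
    #define PAIR_TAG   0x1u

    typedef struct pair { value car, cdr; } pair;

    static value    make_fixnum(intptr_t n) { return (value)(n << 2) | FIXNUM_TAG; }
    static intptr_t fixnum_val(value v)     { return (intptr_t)v >> 2; }

    static value cons(value car, value cdr) {
        pair *p = malloc(sizeof *p);
        p->car = car;
        p->cdr = cdr;
        /* Tagging turns an ordinary pointer into "pointer + 1": from the
         * optimizer's point of view this is plain pointer arithmetic. */
        return (value)p | PAIR_TAG;
    }

    static value car_of(value v) {
        /* Untagging: the load below is at address (v - 1), an offset that does
         * not correspond to any declared object boundary in the source. */
        return ((pair *)(v - PAIR_TAG))->car;
    }

    int main(void) {
        value lst = cons(make_fixnum(42), make_fixnum(0));
        printf("%ld\n", (long)fixnum_val(car_of(lst)));   /* prints 42 */
        return 0;
    }

Note how both cons and car_of perform arithmetic directly on addresses; an alias analysis written with C-like source code in mind has little hope of reasoning precisely about such values once they appear in an executable.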
This paper describes our experiences with such optimization on a number of Scheme benchmark programs, using a link-time optimizer, called alto ("a link-time optimizer"), that we have built for the DEC Alpha architecture. Apart from a variety of more or less "conventional" optimizations, alto implements several optimizations, or variations on optimizations, that are geared specifically towards programs that are rich in function calls (in particular, recursion) and indirect jumps resulting both from higher-order constructs and from tail call optimization. Experiments indicate that significant performance improvements are possible using link-time optimization, even for code generated using powerful optimizing compilers.

¹ This is a slightly paraphrased account of a project in the Chemistry department at the University of Arizona a few years ago: the project in question used Prolog rather than Scheme, but that does not change the essential point here.

2 Scheme vs. C: Low-Level Execution Characteristics

Programs in languages such as Scheme differ in many respects from those written in imperative languages such as C, e.g., in their use of higher-order functions and recursion, in control flow optimizations such as tail call optimization, and in their relatively high frequency of function calls. However, it is not a priori clear that, at the level of executable code, the dynamic characteristics of Scheme programs are still significantly different from those of C code. To investigate this, we examined the runtime distributions of different classes of operations for a number of Scheme programs (those considered in Section 7) and compared the results with the corresponding figures for the eight SPEC-95 integer benchmark programs.

The results, shown in Figure 1, reveal some interesting contrasts:

- The proportion of memory operations in Scheme code is 2.5 times larger than that in C code; we believe that this is due to a combination of two factors: the use of dynamic data structures, such as lists, that are harder to keep in registers, and the presence of garbage collection.

- The proportion of conditional branches is significantly higher (by a factor of almost 1.8) in Scheme code than in C code. This is due, at least in part, to runtime dispatch operations on tag bits encoding type information.

- The proportion of indirect jumps in Scheme code is close to three times as high as that in C code. This is due, in great part, to the way tail call optimization is handled (a small sketch at the end of this section illustrates this and the preceding point).

- The proportion of direct and indirect function calls is somewhat smaller in the Scheme programs than in the C code. To a great extent, this is because the Scheme compilers try hard to eliminate function calls wherever possible.

Code generated for programs in dynamically typed languages usually also carries out pointer arithmetic to manipulate tagged pointers. We didn't measure the proportion of instructions devoted to tag manipulation (there didn't seem to be a simple and reliable way to do this in the context of our implementation), but we note that Steenkiste's studies indicate that Lisp programs spend between 11% and 24% of their time on tag checking [17]. We expect that the overall conclusion, namely that programs spend a significant amount of time in tag manipulation, holds for Scheme programs as well.

Most of the prior work on link-time optimization has focused on imperative languages [7, 12, 15, 16, 19]. The differences in runtime characteristics between Scheme and C programs, as discussed above, can have a significant effect on the extent to which systems designed for executables resulting from human-written C programs will be effective on code generated from Scheme programs. The reasons for this are the following:

1. The pointer arithmetic resulting from tag manipulation tends to defeat most alias analysis algorithms developed for languages such as C (see, for example, [20]).

2. The higher proportion of memory operations in Scheme programs can inhibit optimizations because, in the absence of accurate alias information, memory operations greatly limit the optimizer's ability to move code around. The problem is compounded by the fact that the pointer arithmetic resulting from tag manipulation adversely affects the quality of the alias information available.

    Operation               Scheme      C     Scheme/C
    integer ops              44.81    35.70     1.25
    memory ops               34.02    13.63     2.50
    floating point ops        0.58     0.76     0.76
    conditional branches     10.44     5.89     1.77
    indirect jumps            2.89     0.99     2.92
    direct calls              0.29     0.44     0.66
    indirect calls            1.51     1.72     0.88

    Figure 1: Dynamic distributions for classes of common operations
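As a simplified, hypothetical illustration of where the extra conditional branches and indirect jumps in Figure 1 come from, the C sketch below models two patterns that compiled Scheme code exhibits constantly: a type dispatch on tag bits before a primitive operation, and a tail call to an unknown procedure through a code pointer stored in a closure. The representation and names are assumptions for illustration, not the output of any particular Scheme compiler; the point is that each generic operation costs a compare-and-branch on the tag, and each call to a computed procedure becomes an indirect transfer of control whose target a link-time optimizer must work out for itself.

    #include <stdint.h>
    #include <stdio.h>

    typedef uintptr_t value;                 /* one-word tagged value, as in the earlier sketch */
    #define TAG_MASK   0x3u
    #define FIXNUM_TAG 0x0u

    /* A closure pairs a code pointer with its captured environment. */
    typedef struct closure {
        value (*code)(struct closure *self, value arg);   /* entry point */
        value captured;                                    /* one free variable */
    } closure;

    /* Generic "add 1": the tag test compiles to a compare-and-branch, which is
     * where some of the extra conditional branches in Figure 1 come from. */
    static value add1(value v) {
        if ((v & TAG_MASK) == FIXNUM_TAG)    /* runtime type dispatch */
            return v + (1 << 2);             /* fixnums carry their tag; add a tagged 1 */
        fprintf(stderr, "type error: expected fixnum\n");
        return v;
    }

    /* A tail call to an unknown procedure: the callee's address is fetched from
     * the closure and control transfers through it.  With proper tail call
     * optimization this is emitted as an indirect jump rather than a call, so
     * the optimizer sees a jump whose destination is only known at run time. */
    static value apply_tail(closure *k, value arg) {
        return k->code(k, arg);              /* indirect transfer of control */
    }

    static value add_captured(closure *self, value arg) {
        return add1(arg) + self->captured;   /* both operands are tagged fixnums */
    }

    int main(void) {
        closure k = { add_captured, 10 << 2 };   /* captures the fixnum 10 */
        value r = apply_tail(&k, 5 << 2);        /* (k 5) => 5 + 1 + 10 = 16 */
        printf("%ld\n", (long)((intptr_t)r >> 2));
        return 0;
    }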