TIL: A Type-Directed Optimizing Compiler for ML

David Tarditi    Greg Morrisett    Perry Cheng    Chris Stone    Robert Harper    Peter Lee

February 29, 1996
CMU-CS-96-108

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

This paper will appear in the Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, Philadelphia, Pennsylvania, May 21-24, 1996. It is also published as Fox Memorandum CMU-CS-FOX-96-01.

Abstract

We describe a new compiler for Standard ML called TIL, that is based on four technologies: intensional polymorphism, tag-free garbage collection, conventional functional language optimization, and loop optimization. We use intensional polymorphism and tag-free garbage collection to provide specialized representations, even though SML is a polymorphic language. We use conventional functional language optimization to reduce the cost of intensional polymorphism, and loop optimization to generate good code for recursive functions. We present an example of TIL compiling an SML function to machine code, and compare the performance of TIL code against that of a widely used compiler, Standard ML of New Jersey.

This research was sponsored in part by the Advanced Research Projects Agency CSTO under the title "The Fox Project: Advanced Languages for Systems Software", ARPA Order No. C533, issued by ESC/ENS under Contract No. F19628-95-C-0050, and in part by the National Science Foundation under Grant No. CCR-9502674, and in part by the Isaac Newton Institute for Mathematical Sciences, Cambridge, England. David Tarditi was also partly supported by an AT&T Bell Labs PhD Scholarship. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the Advanced Research Projects Agency, the U.S. Government, the National Science Foundation or AT&T.

Keywords: Standard ML, compilation, type theory, polymorphism, tag-free garbage collection, optimization, applicative (functional) programming

1 Introduction

We are investigating a new approach to compiling Standard ML (SML) based on four key technologies: intensional polymorphism [23], nearly tag-free garbage collection [12, 46, 34], conventional functional language optimization, and loop optimization. To explore the practicality of our approach, we have constructed a compiler for SML called TIL, and are thus far encouraged by the results: on DEC ALPHA workstations, programs compiled by TIL are roughly three times faster, do one-fifth the total heap allocation, and use one-half the physical memory of programs compiled by SML of New Jersey (SML/NJ).

However, our results are still preliminary: we have not yet investigated how to improve compile time; TIL takes about eight times longer to compile programs than SML/NJ. Also, we have not yet implemented the full module system of SML, although we do provide support for structures and separate compilation. Finally, we expect the performance of programs compiled by TIL to improve significantly as we tune the compiler and implement more optimizations.

Two key issues in the compilation of advanced languages such as SML are the presence of garbage collection and type variables. Most compilers use a universal representation for values of unknown or variable type.
In particular, values are forced to fit into a tagged machine word; values larger than a machine word are represented as pointers to tagged, heap-allocated objects. This approach supports fast garbage collection and efficient polymorphic functions, but can result in inefficient code when types are known at compile time. Even with recent advances in SML compilation, such as Leroy's representation analysis [28], values must be placed in a universal representation before being stored in updateable data structures (e.g., arrays) or recursive data structures (e.g., lists).

Intensional polymorphism and tag-free garbage collection eliminate the need to use a universal representation when compiling polymorphic languages. TIL uses these technologies to represent many data values "naturally". For example, TIL provides tag-free, unallocated, word-sized integers; aligned, unboxed floating-point arrays; and unallocated multi-argument functions. These natural representations and calling conventions not only improve the performance of SML programs, but also allow them to interoperate with legacy code written in languages such as C and Fortran.

When types are unknown at compile time, TIL may produce machine code which is slower and bigger than that of conventional approaches. This is because types must be constructed and passed to polymorphic functions, and polymorphic functions must examine the types at run time to determine appropriate execution paths. However, when types are known at compile time, no overhead is incurred to support polymorphism or garbage collection.

Because these technologies make polymorphic functions slower, it becomes important to eliminate as many polymorphic functions at compile time as possible. Inlining and uncurrying are well-known techniques for eliminating polymorphic and higher-order functions. We have found that for the benchmarks used here, these techniques eliminate all polymorphic functions and all but a few higher-order functions when programs are compiled as a whole. We have also found that applying traditional loop optimizations to recursive functions, such as common sub-expression elimination and invariant removal, is important. In fact, these optimizations reduce execution time by a median of 39%.

An important property of TIL is that all optimizations and the key transformations are performed on typed intermediate languages (hence the name TIL). Maintaining correct type information throughout optimization is necessary to support both intensional polymorphism and garbage collection, both of which require type information at run time. By using strongly-typed intermediate languages, we ensure that type information is maintained in a principled fashion, instead of relying upon ad hoc invariants. In fact, using the intermediate forms of TIL, an "untrusted" compiler can produce fully optimized intermediate code, and a client can automatically verify the type integrity of the code. We have found that this ability has a strong engineering benefit: type-checking the output of each optimization or transformation helps us identify and eliminate bugs in the compiler.

In the remainder of this paper, we describe the technologies used by TIL in detail, give an overview of the structure of TIL, present a detailed example showing how TIL compiles ML code, and give performance results of code produced by TIL.

2 Overview of the Technologies

This section contains a high-level overview of the technologies we use in TIL.
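To make the contrast concrete before turning to the individual technologies, here is a small sketch in SML. It is our own illustration rather than code from the paper or from TIL, and the type univ and function usub are hypothetical names; it models the universal, tagged representation that conventional compilers fall back on for values of unknown type, which the technologies below let TIL avoid.

    (* Minimal sketch (assumed, not from TIL): every value of unknown type
       is squeezed into one tagged, word-sized form, and anything larger
       than a word is boxed behind a pointer on the heap. *)
    datatype univ =
        Int of int           (* small integers fit in the tagged word *)
      | Real of real ref     (* floats must be boxed and pointed to   *)
      | Ptr of univ array    (* aggregates point to tagged objects    *)

    (* A polymorphic subscript can only return the boxed, tagged form,
       even when the caller statically knows the array holds floats. *)
    fun usub (a : univ array, i : int) : univ = Array.sub (a, i)

With specialized representations, by contrast, a float array can be stored unboxed and subscripted directly, as the typecase example in the next subsection shows.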
2.1 Intensional Polymorphism

Intensional polymorphism [23] eliminates restrictions on data representations due to polymorphism, separate compilation, abstract datatypes, and garbage collection. It also supports efficient calling conventions (multiple arguments passed in registers) and tag-free polymorphic, structural equality.

With intensional polymorphism, types are constructed and passed as values at run time to polymorphic functions, and these functions can branch based on the types. For example, when extracting a value from an array, TIL uses a typecase expression to determine the type of the array and to select the appropriate specialized subscript operation:

    fun sub[α](x: α array, i: int) =
        typecase α of
            int    => intsub(x, i)
          | float  => floatsub(x, i)
          | ptr(τ) => ptrsub(x, i)

If the type of the array can be determined at compile time, then an optimizer can eliminate the typecase:

    sub[float](a, 5)  ↪  floatsub(a, 5)

However, intensional polymorphism comes with two costs. First, we must construct and pass representations of types to polymorphic functions at run time. Furthermore, we must compile polymorphic functions to support any possible representation and insert typecase constructs to select the appropriate code paths. Hence, the code we generate for polymorphic functions is both bigger and slower, and minimizing polymorphism becomes quite important.

Second, in order to use type information at run time, for both intensional polymorphism and tag-free garbage collection, we must propagate types through each stage of compilation. To address this second problem, almost all compilation stages, including optimization and closure conversion, are expressed as type-directed, type-preserving translations to strongly-typed intermediate languages.

The key difficulty with using typed intermediate languages is formulating a type system that is expressive enough to statically type check terms that branch on types at run time, such as sub. The type system used in TIL is based on the approach suggested by Harper and Morrisett [23, 33]. Types themselves are represented as expressions in a simply-typed λ-calculus extended with an inductively generated base kind (the monotypes) and a corresponding induction elimination form. The induction elimination form is essentially a "Typecase" at the type level; this allows us to write type expressions that track the run-time control flow of term-level typecase expressions. Nevertheless, the type system used by TIL remains both sound and decidable. This implies that at any stage during optimization, we can automatically verify the type integrity of the code.

2.2 Conventional and Loop-Oriented Optimizations

Program optimization is crucial to reducing the cost of intensional polymorphism, improving loops and recursive functions, and eliminating higher-order and polymorphic functions.
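As a concrete illustration of the loop-style rewrites meant here, the following hand-made SML sketch (not an example from the paper; sumBefore and sumAfter are made-up names) hoists an expression that is invariant across the iterations of a tail-recursive function, which is the effect of invariant removal; common sub-expression elimination plays the analogous role for repeated computations within an iteration.

    (* Before: Real.fromInt n * c is recomputed on every iteration,
       even though it does not depend on the loop variable i. *)
    fun sumBefore (n, c) =
        let fun loop (i, acc) =
                if i >= n then acc
                else loop (i + 1, acc + Real.fromInt n * c)
        in loop (0, 0.0) end

    (* After invariant removal: the invariant product is computed once,
       outside the recursion, so each iteration does a single addition. *)
    fun sumAfter (n, c) =
        let val k = Real.fromInt n * c       (* hoisted loop invariant *)
            fun loop (i, acc) =
                if i >= n then acc else loop (i + 1, acc + k)
        in loop (0, 0.0) end

Because SML programs express loops as recursive functions like these, applying such optimizations to recursion directly is what the paper credits for the median 39% reduction in execution time mentioned in the introduction.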