Compiler Optimization Effects on Register Collisions

Total Page:16

File Type:pdf, Size:1020Kb

Compiler Optimization Effects on Register Collisions COMPILER OPTIMIZATION EFFECTS ON REGISTER COLLISIONS A Thesis presented to the Faculty of California Polytechnic State University, San Luis Obispo In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science by Jonathan Tan June 2018 c 2018 Jonathan Tan ALL RIGHTS RESERVED ii COMMITTEE MEMBERSHIP TITLE: Compiler Optimization Effects on Register Collisions AUTHOR: Jonathan Tan DATE SUBMITTED: June 2018 COMMITTEE CHAIR: Aaron Keen, Ph.D. Professor of Computer Science COMMITTEE MEMBER: Theresa Migler, Ph.D. Professor of Computer Science COMMITTEE MEMBER: John Seng, Ph.D. Professor of Computer Science iii ABSTRACT Compiler Optimization Effects on Register Collisions Jonathan Tan We often want a compiler to generate executable code that runs as fast as possible. One consideration toward this goal is to keep values in fast registers to limit the number of slower memory accesses that occur. When there are not enough physical registers available for use, values are \spilled" to the runtime stack. The need for spills is discovered during register allocation wherein values in use are mapped to physical registers. One factor in the efficacy of register allocation is the number of values in use at one time (register collisions). Register collision is affected by compiler optimizations that take place before register allocation. Though the main purpose of compiler optimizations is to make the overall code better and faster, some optimiza- tions can actually increase register collisions. This may force the register allocation process to spill. This thesis studies the effects of different compiler optimizations on register collisions. iv ACKNOWLEDGMENTS Thanks to: • My advisor, Aaron Keen, for guiding, encouraging, and mentoring me on the thesis and classes. Words cannot describe the impact you had on me. • My committee members (John Seng and Theresa Migler) for taking time out of their busy schedules to help better my thesis. • My many great professors that taught me so many invaluable lessons. • My Dad, Mom, and Brother (Jeffrey Tan, Shirley Her, and Benjamin Tan) for taking care of me and calling me to check in and see how I am doing. • My roomates (Andrew Kim, Michael Djaja, Dylan Sun, Pierson Yieh) for en- couragement to push through the hard times and the great times. • My entire EPIC family for the continued support and many great laughs and times throughout college. v TABLE OF CONTENTS Page LIST OF FIGURES . viii CHAPTER 1 Introduction . .1 1.1 Register Collisions . .1 1.2 Optimizations . .3 1.3 Motivation . .4 1.4 Contributions . .4 2 Background & Related Works . .5 2.1 Control-Flow Graph . .5 2.2 Live Ranges . .6 2.3 Register Allocation . .8 2.3.1 Interference Graph . .9 2.3.2 Graph Coloring . 10 2.4 Related Works . 11 2.4.1 Heuristics . 11 2.4.2 Graph Coloring Algorithm . 11 3 Setup & Experimental Design . 13 3.1 Clang and LLVM . 13 3.2 Aggregation . 17 3.3 Benchmarks . 18 4 Results & Analysis . 20 4.1 Baseline Graph Anaylsis . 21 4.1.1 Average Register Collisions Across Functions . 21 4.1.2 Maximum Register Collisions Across Functions . 22 4.1.3 Register Collisions Across Registers . 24 4.1.4 Stack Space . 28 4.2 All-Loops Configuration . 28 4.2.1 All-Loops Optimization On . 28 vi 4.2.2 All-Loops Optimization Off . 32 4.3 General Observations . 35 4.3.1 Stack Space On Optimization . 36 4.3.2 Stack Space Off Optimization . 40 4.4 Double Optimizations . 41 4.4.1 All-Loops and Argpromotion Optimization . 42 4.4.2 Licm and All-loops Optimization . 45 4.4.3 Licm and Argpromotion Optimization . 50 4.5 Thumb Architecture . 52 4.5.1 All-Loops Optimization . 52 4.5.2 Jump-threading and Early-cse-memssa Optimization . 56 4.5.3 Spills Between Architectures . 57 5 Future Works & Conclusion . 61 5.1 Future Works . 61 5.1.1 Subject Optimizations . 61 5.1.2 Optimization Combinations . 61 5.1.3 Register Allocator . 61 5.1.4 Timings . 62 5.1.5 Architectures . 62 5.2 Conclusion . 62 BIBLIOGRAPHY . 64 APPENDICES A -O3 Optimization List . 67 B ARM Double Optimization Numbers . 69 C Register Collision Graphs . 70 D Double Optimizations Spill Count . 94 vii LIST OF FIGURES Figure Page 1.1 Example of the loop-unrolling compiler optimization. Left side is original loop. Right side is with loop-unrolling applied. .2 2.1 Example of creating a CFG given a program . .6 2.2 Example of a live range with virtual registers . .7 2.3 Example of Non-SSA form when compared to SSA form . .8 2.4 Example of why SSA may be beneficial . .9 4.1 Average Reg Collisions by Function, Baseline All Subject Optimizations Off, No Minimum Register Collisions Threshold . 21 4.2 Average Reg Collisions by Function, Baseline All Subject Optimizations On, No Minimum Register Collisions Threshold . 22 4.3 Baseline Average Register Collisions by Function . 23 4.4 Baseline Maximum Register Collisions by Function . 25 4.5 Baseline Register Collisions by Register . 26 4.6 Baseline Stack Space . 27 4.7 Average Register Collisions by Function, All-Loops On . 29 4.8 Maximum Register Collisions by Function, All-Loops On . 30 4.9 Register Collisions by Register, All-Loops On . 30 4.10 Stack Space All-Loops On . 31 4.11 Average Register Collisions by Function, All-Loops Off . 33 4.12 Maximum Register Collisions by Function, All-Loops Off . 33 4.13 Register Collisions by Register, All-Loops Off . 34 4.14 Stack Space All-Loops Off . 35 4.15 x86 Average Register Collisions by Function, Collision Statistics (Percentage at or Below Threshold) . 36 4.16 x86 Maximum Register Collisions by Function, Collision Statistics (Percentage at or Below Threshold) . 37 4.17 x86 Register Collisions by Register Statistics (Percentage at or Below Threshold) . 37 4.18 Stack On LICM . 38 viii 4.19 Stack On Tail Call Elimination . 39 4.20 Stack Off LICM . 40 4.21 x86 Double Optimization Average Collisions by Function, Collision Statistics (Percentage at or Below Threshold) . 42 4.22 x86 Double Optimization Maximum Collisions by Function, Collision Statistics (Percentage at or Below Threshold) . 42 4.23 x86 Double Optimization Register Collisions by Registers, Collision Statistics (Percentage at or Below Threshold) . 43 4.24 Register Level On All-Loops and Argpromotion . 44 4.25 Register Collisions by Register Level Off All-Loops and Argpromotion . 45 4.26 Stack On All-Loops and Argpromotion . 46 4.27 Stack Off All-Loops and Argpromotion . 46 4.28 Stack On Licm and All-Loops . 48 4.29 Stack Off, Licm and All-Loops . 49 4.30 Stack On, Licm and Argpromotion . 51 4.31 Stack Off Licm and Argpromotion . 51 4.32 Thumb Average Register Collisions by Function Statistics . 53 4.33 Thumb Maximum Register Collisions by Function Statistics . 53 4.34 Thumb Register Collisions by Register Statistics . 54 4.35 ARM Average Register Collisions by Function, All-loops Off . 55 4.36 ARM Stack Space All-loops Off . 55 4.37 ARM Average Register Collisions by Function, Off Early-CSE-Memssa . 57 4.38 ARM Average Register Collisions by Function, Off Jump-Threading . 58 4.39 ARM Stack Space Off Early-CSE-Memssa . 59 4.40 ARM Stack Space Off Jump-Threading . 59 4.41 Number of Spills in the x86 Architecture . 60 4.42 Number of Spills in the Thumb Architecture . 60 B.1 THUMB Average Register Collisions by Function Statistics . 69 B.2 THUMB Maximum Register Collisions by Function Statistics . 69 B.3 THUMB Register Collisions by Register Statistics . 69 ix C.1 x86 Average Register Collisions by Function, Off Licm . 70 C.2 x86 Average Register Collisions by Function, On Licm . 71 C.3 x86 Maximum Register Collisions by Function, On Licm . 71 C.4 x86 Maximum Register Collisions by Function, Off Licm . 72 C.5 x86 Register Collisions by Register, On Licm . 72 C.6 x86 Register Collisions by Register, Off Licm . 73 C.7 x86 Average Register Collisions by Function, Off Tail Call Elimination . 73 C.8 x86 Average Register Collisions by Function, On Tail Call Elimination . 74 C.9 x86 Maximum Register Collisions by Function, Off Tail Call Elimination . 74 C.10 x86 Maximum Register Collisions by Function, On Tail Call Elimination . 75 C.11 x86 Register Collisions by Register, Off Tail Call Elimination . 75 C.12 x86 Register Collisions by Register, On Tail Call Elimination . 76 C.13 x86 Stack Space, Off Tail Call Elimination . 76 C.14 x86 Average Register Collisions by Function, Off All-Loops and Argpromotion . 77 C.15 x86 Average Register Collisions by Function, On All-Loops and Argpromotion . 77 C.16 x86 Maximum Register Collisions by Function, Off All-Loops and Argpromotion . 78 C.17 x86 Maximum Register Collisions by Function, On All-Loops and Argpromotion . 78 C.18 x86 Average Register Collisions by Function, On Licm and All-Loops . 79 C.19 x86 Average Register Collisions by Function, Off Licm and All-Loops . 79 C.20 x86 Maximum Register Collisions by Function, On Licm and All-Loops . 80 C.21 x86 Maximum Register Collisions by Function, Off Licm and All-Loops . ..
Recommended publications
  • Generalized Jump Threading in Libfirm
    Institut für Programmstrukturen und Datenorganisation (IPD) Lehrstuhl Prof. Dr.-Ing. Snelting Generalized Jump Threading in libFIRM Masterarbeit von Joachim Priesner an der Fakultät für Informatik Erstgutachter: Prof. Dr.-Ing. Gregor Snelting Zweitgutachter: Prof. Dr.-Ing. Jörg Henkel Betreuender Mitarbeiter: Dipl.-Inform. Andreas Zwinkau Bearbeitungszeit: 5. Oktober 2016 – 23. Januar 2017 KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft www.kit.edu Zusammenfassung/Abstract Jump Threading (dt. „Sprünge fädeln“) ist eine Compileroptimierung, die statisch vorhersagbare bedingte Sprünge in unbedingte Sprünge umwandelt. Bei der Ausfüh- rung kann ein Prozessor bedingte Sprünge zunächst nur heuristisch mit Hilfe der Sprungvorhersage auswerten. Sie stellen daher generell ein Performancehindernis dar. Die Umwandlung ist insbesondere auch dann möglich, wenn das Sprungziel nur auf einer Teilmenge der zu dem Sprung führenden Ausführungspfade statisch be- stimmbar ist. In diesem Fall, der den überwiegenden Teil der durch Jump Threading optimierten Sprünge betrifft, muss die Optimierung Grundblöcke duplizieren, um jene Ausführungspfade zu isolieren. Verschiedene aktuelle Compiler enthalten sehr unterschiedliche Implementierungen von Jump Threading. In dieser Masterarbeit wird zunächst ein theoretischer Rahmen für Jump Threading vorgestellt. Sodann wird eine allgemeine Fassung eines Jump- Threading-Algorithmus entwickelt, implementiert und in diverser Hinsicht untersucht, insbesondere auf Wechselwirkungen mit anderen Optimierungen wie If
    [Show full text]
  • Improving the Compilation Process Using Program Annotations
    POLITECNICO DI MILANO Corso di Laurea Magistrale in Ingegneria Informatica Dipartimento di Elettronica, Informazione e Bioingegneria Improving the Compilation process using Program Annotations Relatore: Prof. Giovanni Agosta Correlatore: Prof. Lenore Zuck Tesi di Laurea di: Niko Zarzani, matricola 783452 Anno Accademico 2012-2013 Alla mia famiglia Alla mia ragazza Ai miei amici Ringraziamenti Ringrazio in primis i miei relatori, Prof. Giovanni Agosta e Prof. Lenore Zuck, per la loro disponibilità, i loro preziosi consigli e il loro sostegno. Grazie per avermi seguito sia nel corso della tesi che della mia carriera universitaria. Ringrazio poi tutti coloro con cui ho avuto modo di confrontarmi durante la mia ricerca, Dr. Venkat N. Venkatakrishnan, Dr. Rigel Gjomemo, Dr. Phu H. H. Phung e Giacomo Tagliabure, che mi sono stati accanto sin dall’inizio del mio percorso di tesi. Voglio ringraziare con tutto il cuore la mia famiglia per il prezioso sup- porto in questi anni di studi e Camilla per tutto l’amore che mi ha dato anche nei momenti più critici di questo percorso. Non avrei potuto su- perare questa avventura senza voi al mio fianco. Ringrazio le mie amiche e i miei amici più cari Ilaria, Carolina, Elisa, Riccardo e Marco per la nostra speciale amicizia a distanza e tutte le risate fatte assieme. Infine tutti i miei conquilini, dai più ai meno nerd, per i bei momenti pas- sati assieme. Ricorderò per sempre questi ultimi anni come un’esperienza stupenda che avete reso memorabile. Mi mancherete tutti. Contents 1 Introduction 1 2 Background 3 2.1 Annotated Code . .3 2.2 Sources of annotated code .
    [Show full text]
  • Control Flow Graphs
    CONTROL FLOW GRAPHS PROGRAM ANALYSIS AND OPTIMIZATION – DCC888 Fernando Magno Quintão Pereira [email protected] The material in these slides have been taken from the "Dragon Book", Secs 8.5 – 8.7, by Aho et al. Intermediate Program Representations • Optimizing compilers and human beings do not see the program in the same way. – We are more interested in source code. – But, source code is too different from machine code. – Besides, from an engineering point of view, it is better to have a common way to represent programs in different languages, and target different architectures. Fortran PowerPC COBOL x86 Front Back Optimizer Lisp End End ARM … … Basic Blocks and Flow Graphs • Usually compilers represent programs as control flow graphs (CFG). • A control flow graph is a directed graph. – Nodes are basic blocks. – There is an edge from basic block B1 to basic block B2 if program execution can flow from B1 to B2. • Before defining basic void identity(int** a, int N) { int i, j; block, we will illustrate for (i = 0; i < N; i++) { this notion by showing the for (j = 0; j < N; j++) { a[i][j] = 0; CFG of the function on the } right. } for (i = 0; i < N; i++) { What does a[i][i] = 1; this program } do? } The Low Level Virtual Machine • We will be working with a compilation framework called The Low Level Virtual Machine, or LLVM, for short. • LLVM is today the most used compiler in research. • Additionally, this compiler is used in many important companies: Apple, Cray, Google, etc. The front-end Machine independent Machine dependent that parses C optimizations, such as optimizations, such into bytecodes constant propagation as register allocation ../0 %"&'( *+, ""% !"#$% !"#$)% !"#$)% !"#$- Using LLVM to visualize a CFG • We can use the opt tool, the LLVM machine independent optimizer, to visualize the control flow graph of a given function $> clang -c -emit-llvm identity.c -o identity.bc $> opt –view-cfg identity.bc • We will be able to see the CFG of our target program, as long as we have the tool DOT installed in our system.
    [Show full text]
  • GCC Internals
    GCC Internals Diego Novillo [email protected] Red Hat Canada CGO 2007 San Jose, California March 2007 Outline 1. Overview 2. Source code organization 3. Internal architecture 4. Passes NOTE: Internal information valid for GCC mainline as of 2007-03-02 11 March 2007 GCC Internals - 2 1. Overview ➢ Major features ➢ Brief history ➢ Development model 11 March 2007 GCC Internals - 3 Major Features Availability – Free software (GPL) – Open and distributed development process – System compiler for popular UNIX variants – Large number of platforms (deeply embedded to big iron) – Supports all major languages: C, C++, Java, Fortran 95, Ada, Objective-C, Objective-C++, etc 11 March 2007 GCC Internals - 4 Major Features Code quality – Bootstraps on native platforms – Warning-free – Extensive regression testsuite – Widely deployed in industrial and research projects – Merit-based maintainership appointed by steering committee – Peer review by maintainers – Strict coding standards and patch reversion policy 11 March 2007 GCC Internals - 5 Major Features Analysis/Optimization – SSA-based high-level global optimizer – Constraint-based points-to alias analysis – Data dependency analysis based on chains of recurrences – Feedback directed optimization – Interprocedural optimization – Automatic pointer checking instrumentation – Automatic loop vectorization – OpenMP support 11 March 2007 GCC Internals - 6 1. Overview ➢ Major features ➢ Brief history ➢ Development model 11 March 2007 GCC Internals - 7 Brief History GCC 1 (1987) – Inspired on Pastel compiler
    [Show full text]
  • Compiler Construction
    Compiler construction PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 10 Dec 2011 02:23:02 UTC Contents Articles Introduction 1 Compiler construction 1 Compiler 2 Interpreter 10 History of compiler writing 14 Lexical analysis 22 Lexical analysis 22 Regular expression 26 Regular expression examples 37 Finite-state machine 41 Preprocessor 51 Syntactic analysis 54 Parsing 54 Lookahead 58 Symbol table 61 Abstract syntax 63 Abstract syntax tree 64 Context-free grammar 65 Terminal and nonterminal symbols 77 Left recursion 79 Backus–Naur Form 83 Extended Backus–Naur Form 86 TBNF 91 Top-down parsing 91 Recursive descent parser 93 Tail recursive parser 98 Parsing expression grammar 100 LL parser 106 LR parser 114 Parsing table 123 Simple LR parser 125 Canonical LR parser 127 GLR parser 129 LALR parser 130 Recursive ascent parser 133 Parser combinator 140 Bottom-up parsing 143 Chomsky normal form 148 CYK algorithm 150 Simple precedence grammar 153 Simple precedence parser 154 Operator-precedence grammar 156 Operator-precedence parser 159 Shunting-yard algorithm 163 Chart parser 173 Earley parser 174 The lexer hack 178 Scannerless parsing 180 Semantic analysis 182 Attribute grammar 182 L-attributed grammar 184 LR-attributed grammar 185 S-attributed grammar 185 ECLR-attributed grammar 186 Intermediate language 186 Control flow graph 188 Basic block 190 Call graph 192 Data-flow analysis 195 Use-define chain 201 Live variable analysis 204 Reaching definition 206 Three address
    [Show full text]
  • GCC Internals
    GCC Internals Diego Novillo [email protected] November 2007 Outline 1. Overview Features, history, development model 2. Source code organization Files, building, patch submission 3. Internal architecture Pipeline, representations, data structures, alias analysis, data flow, code generation 4. Passes Adding, debugging Internal information valid for GCC mainline as of 2007-11-20 November 27, 2007 GCC Internals - 2 1. Overview ➢ Major features ➢ Brief history ➢ Development model November 27, 2007 GCC Internals - 3 Major Features Availability – Free software (GPL) – Open and distributed development process – System compiler for popular UNIX variants – Large number of platforms (deeply embedded to big iron) – Supports all major languages: C, C++, Java, Fortran 95, Ada, Objective-C, Objective-C++, etc November 27, 2007 GCC Internals - 4 Major Features Code quality – Bootstraps on native platforms – Warning-free – Extensive regression testsuite – Widely deployed in industrial and research projects – Merit-based maintainership appointed by steering committee – Peer review by maintainers – Strict coding standards and patch reversion policy November 27, 2007 GCC Internals - 5 Major Features Analysis/Optimization – SSA-based high-level global optimizer – Constraint-based points-to alias analysis – Data dependency analysis based on chains of recurrences – Feedback directed optimization – Inter-procedural optimization – Automatic pointer checking instrumentation – Automatic loop vectorization – OpenMP support November 27, 2007 GCC Internals - 6 Overview
    [Show full text]
  • Simple and Efficient SSA Construction
    Simple and Efficient Construction of Static Single Assignment Form Matthias Braun1, Sebastian Buchwald1, Sebastian Hack2, Roland Leißa2, Christoph Mallon2, and Andreas Zwinkau1 1Karlsruhe Institute of Technology, {matthias.braun,buchwald,zwinkau}@kit.edu 2Saarland University {hack,leissa,mallon}@cs.uni-saarland.de Abstract. We present a simple SSA construction algorithm, which al- lows direct translation from an abstract syntax tree or bytecode into an SSA-based intermediate representation. The algorithm requires no prior analysis and ensures that even during construction the intermediate rep- resentation is in SSA form. This allows the application of SSA-based op- timizations during construction. After completion, the intermediate rep- resentation is in minimal and pruned SSA form. In spite of its simplicity, the runtime of our algorithm is on par with Cytron et al.’s algorithm. 1 Introduction Many modern compilers feature intermediate representations (IR) based on the static single assignment form (SSA form). SSA was conceived to make program analyses more efficient by compactly representing use-def chains. Over the last years, it turned out that the SSA form not only helps to make analyses more efficient but also easier to implement, test, and debug. Thus, modern compilers such as the Java HotSpot VM [14], LLVM [2], and libFirm [1] exclusively base their intermediate representation on the SSA form. The first algorithm to efficiently construct the SSA form was introduced by Cytron et al. [10]. One reason, why this algorithm still is very popular, is that it guarantees a form of minimality on the number of placed φ functions. However, for compilers that are entirely SSA-based, this algorithm has a significant draw- back: Its input program has to be represented as a control flow graph (CFG) in non-SSA form.
    [Show full text]
  • A Case for an SC-Preserving Compiler
    A Case for an SC-Preserving Compiler Daniel Marino† Abhayendra Singh∗ Todd Millstein† Madanlal Musuvathi‡ Satish Narayanasamy∗ †University of California, Los Angeles ∗University of Michigan, Ann Arbor ‡Microsoft Research, Redmond Abstract 1. Introduction The most intuitive memory consistency model for shared-memory A memory consistency model (or simply memory model) defines the multi-threaded programming is sequential consistency (SC). How- semantics of a concurrent programming language by specifying the ever, current concurrent programming languages support a re- order in which memory operations performed by one thread become laxed model, as such relaxations are deemed necessary for en- visible to other threads in the program. The most natural memory abling important optimizations. This paper demonstrates that an model is sequential consistency (SC) [31]. Under SC, the individual SC-preserving compiler, one that ensures that every SC behavior of operations of the program appear to have executed in a global a compiler-generated binary is an SC behavior of the source pro- sequential order consistent with the per-thread program order. This gram, retains most of the performance benefits of an optimizing semantics matches the intuition of a concurrent program’s behavior compiler. The key observation is that a large class of optimizations as a set of possible thread interleavings. crucial for performance are either already SC-preserving or can be It is commonly accepted that programming languages must relax modified to preserve SC while retaining much of their effectiveness. the SC semantics of programs in order to allow effective compiler An SC-preserving compiler, obtained by restricting the optimiza- optimizations. This paper challenges that assumption by demon- tion phases in LLVM, a state-of-the-art C/C++ compiler, incurs an strating an optimizing compiler that retains most of the performance average slowdown of 3.8% and a maximum slowdown of 34% on of the generated code while preserving the SC semantics.
    [Show full text]
  • The University of Chicago a Compiler-Based Online
    THE UNIVERSITY OF CHICAGO A COMPILER-BASED ONLINE ADAPTIVE OPTIMIZER A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF COMPUTER SCIENCE BY KAVON FAR FARVARDIN CHICAGO, ILLINOIS DECEMBER 2020 Copyright © 2020 by Kavon Far Farvardin Permission is hereby granted to copy, distribute and/or modify this document under the terms of the Creative Commons Attribution 4.0 International license. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. To my cats Jiji and Pumpkin. Table of Contents List of Figures vii List of Tables ix Acknowledgments x Abstract xi I Introduction 1 1 Motivation 2 1.1 Adaptation . .4 1.2 Trial and Error . .6 1.3 Goals . .8 2 Background 11 2.1 Terminology . 12 2.2 Profile-guided Compilation . 14 2.3 Dynamic Compilation . 15 2.4 Autotuning . 17 2.5 Finding Balance . 19 3 Related Work 21 3.1 By Similarity . 21 3.1.1 Active Harmony . 22 3.1.2 Kistler’s Optimizer . 25 3.1.3 Jikes RVM . 27 3.1.4 ADAPT . 28 3.1.5 ADORE . 30 3.1.6 Suda’s Bayesian Online Autotuner . 31 3.1.7 PEAK . 32 3.1.8 PetaBricks . 33 3.2 By Philosophy . 34 iv 3.2.1 Adaptive Fortran . 34 3.2.2 Dynamic Feedback . 34 3.2.3 Dynamo . 35 3.2.4 CoCo . 36 3.2.5 MATE . 36 3.2.6 AOS . 37 3.2.7 Testarossa . 38 4 Thesis 40 II Halo: Wholly Adaptive LLVM Optimizer 43 5 System Overview 44 5.1 Clang .
    [Show full text]
  • Simple and Efficient Construction of Static Single Assignment Form
    Simple and Efficient Construction of Static Single Assignment Form Matthias Braun1, Sebastian Buchwald1, Sebastian Hack2, Roland Leißa2, Christoph Mallon2, and Andreas Zwinkau1 1 Karlsruhe Institute of Technology {matthias.braun,buchwald,zwinkau}@kit.edu 2 Saarland University {hack,leissa,mallon}@cs.uni-saarland.de Abstract. We present a simple SSA construction algorithm, which al- lows direct translation from an abstract syntax tree or bytecode into an SSA-based intermediate representation. The algorithm requires no prior analysis and ensures that even during construction the intermediate rep- resentation is in SSA form. This allows the application of SSA-based opti- mizations during construction. After completion, the intermediate representation is in minimal and pruned SSA form. In spite of its simplicity, the runtime of our algorithm is on par with Cytron et al.’s algorithm. 1 Introduction Many modern compilers feature intermediate representations (IR) based on the static single assignment form (SSA form). SSA was conceived to make program analyses more efficient by compactly representing use-def chains. Over the last years, it turned out that the SSA form not only helps to make analyses more efficient but also easier to implement, test, and debug. Thus, modern compilers such as the Java HotSpot VM [14], LLVM [2], and libFirm [1] exclusively base their intermediate representation on the SSA form. The first algorithm to efficiently construct the SSA form was introduced by Cytron et al. [10]. One reason, why this algorithm still is very popular, is that it guarantees a form of minimality on the number of placed φ functions. However, for compilers that are entirely SSA-based, this algorithm has a significant draw- back: Its input program has to be represented as a control flow graph (CFG) in non-SSA form.
    [Show full text]
  • GCC—An Architectural Overview, Current Status, and Future Directions
    GCC—An Architectural Overview, Current Status, and Future Directions Diego Novillo Red Hat Canada [email protected] Abstract GCC one of the most popular compilers in use today. It serves as the system compiler for every Linux distribution and it is also fairly The GNU Compiler Collection (GCC) is one popular in academic circles, where it is used of the most popular compilers available and it for compiler research. Despite this popular- is the de facto system compiler for Linux sys- ity, GCC has traditionally proven difficult to tems. Despite its popularity, the internal work- maintain and enhance, to the extreme that some ings of GCC are relatively unknown outside of features were almost impossible to implement. the immediate developer community. And as a result GCC was starting to lag behind This paper provides a high-level architectural the competition. overview of GCC, its internal modules, and Part of the problem is the size of its code base. how it manages to support a wide variety of While GCC is not huge by industry standards, hardware architectures and languages. Special it still is a fairly large project, with a core emphasis is placed on high-level descriptions of about 1.3 MLOC. Including the runtime li- of the different modules to provide a roadmap braries needed for all the language support, to GCC. GCC comes to about 2.2 MLOC.1 Finally, the paper also describes recent techno- logical improvements that have been added to Size is not the only hurdle presented by GCC. GCC and discusses some of the major features Compilers are inherently complex and very de- that the developer community is thinking for manding in terms of the theoretical knowl- future versions of the compiler.
    [Show full text]
  • Optimizing Programs for Fast Verification
    -OVERIFY: Optimizing Programs for Fast Verification Jonas Wagner Volodymyr Kuznetsov George Candea School of Computer and Communication Sciences École polytechnique fédérale de Lausanne (EPFL), Switzerland int wc(unsigned char *str, int any) { Abstract int res = 0; int new_word = 1; Developers rely on automated testing and verification for(unsigned char *p = str; *p; ++p) { tools to gain confidence in their software. The input to if (isspace(*p) || such tools is often generated by compilers that have been (any && !isalpha(*p))) { new_word = 1; designed to generate code that runs fast, not code that } else{ can be verified easily and quickly. This makes the verifi- if (new_word) { cation tool’s task unnecessarily hard. ++res; new_word = 0; We propose that compilers support a new kind of } switch, -OVERIFY, that generates code optimized for the } } needs of verification tools. We implemented this idea for one class of verification (symbolic execution) and found return res; } that, when run on the Coreutils suite of UNIX utilities, it reduces verification time by up to 95×. Listing 1: Count words in a string; they are separated by whitespace or, if any6=0, by non-alphabetic characters. 1 Introduction Motivating example: The example in Listing 1, a Automated program verification tools are essential to function that counts words in a string, illustrates the ef- writing good quality software. They find bugs and are fect of compiler transformations on program analysis. crucial in safety-critical workflows. For example, for This simple function is challenging to analyze for sev- a bounded input size, symbolic execution engines like eral reasons: It contains an input-dependent loop whose KLEE [4] and Cloud9 [3] can verify the absence of bugs number of iterations cannot be statically predicted.
    [Show full text]