Always-Canonical Intermediate Representation Eric Drew Fritz University of Wisconsin-Milwaukee

Total Page:16

File Type:pdf, Size:1020Kb

Always-Canonical Intermediate Representation Eric Drew Fritz University of Wisconsin-Milwaukee University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations December 2018 Waddle - Always-canonical Intermediate Representation Eric Drew Fritz University of Wisconsin-Milwaukee Follow this and additional works at: https://dc.uwm.edu/etd Part of the Computer Sciences Commons Recommended Citation Fritz, Eric Drew, "Waddle - Always-canonical Intermediate Representation" (2018). Theses and Dissertations. 1989. https://dc.uwm.edu/etd/1989 This Dissertation is brought to you for free and open access by UWM Digital Commons. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of UWM Digital Commons. For more information, please contact [email protected]. WADDLE – ALWAYS-CANONICAL INTERMEDIATE REPRESENTATION by Eric Fritz A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Engineering at The University of Wisconsin-Milwaukee December 2018 ABSTRACT WADDLE – ALWAYS-CANONICAL INTERMEDIATE REPRESENTATION by Eric Fritz The University of Wisconsin-Milwaukee, 2018 Under the Supervision of Professor John Boyland Program transformations that are able to rely on the presence of canonical properties of the program undergoing optimization can be written to be more robust and efficient than an equivalent but generalized transformation that also handles non-canonical programs. If a canonical property is required but broken earlier in an earlier transformation, it must be rebuilt (often from scratch). This additional work can be a dominating factor in compilation time when many transformations are applied over large programs. This dissertation intro- duces a methodology for constructing program transformations so that the program remains in an always-canonical form as the program is mutated, making only local changes to restore broken properties. ii © Copyright by Eric Fritz, 2018 All Rights Reserved iii TABLE OF CONTENTS Abstract ii List of Figures x 1 Introduction 2 1.1 Motivation ..................................... 2 1.2 Research Contributions .............................. 4 1.3 Organization ................................... 5 2 Preliminaries 6 2.1 Sequences ..................................... 6 2.2 Multisets ..................................... 6 2.3 Labels ....................................... 7 2.4 Control Flow Graph ............................... 8 2.4.1 Reducibility ................................ 9 2.4.2 Induced Trees ............................... 11 2.4.3 Depth-First Spanning Tree ....................... 12 2.5 Domination .................................... 12 2.5.1 Dominator Tree .............................. 13 2.6 Loops ....................................... 13 iv 2.6.1 Loop Nesting ............................... 15 2.6.2 Loop Nesting Forest ........................... 15 2.6.3 Loop Deconstruction ........................... 17 2.6.4 Identification of Reducible Loops .................... 18 2.6.5 Identification of Irreducible Loops .................... 18 3 Internal Representation 20 3.1 Syntax ....................................... 21 3.1.1 Blocks, Functions, and Programs .................... 21 3.1.2 Values and Expressions ......................... 22 3.1.3 Block Components ............................ 23 3.1.3.1 Block Parameters and Implicit Parameters ......... 24 3.1.3.2 Instructions ........................... 26 3.1.3.3 Terminators .......................... 28 3.2 Semantics ..................................... 29 3.2.1 Function Cloning ............................. 29 3.2.2 Environments ............................... 29 3.2.3 Evaluation ................................ 32 3.3 Type System ................................... 38 Proof Appendix 43 3.A Soundness ..................................... 43 4 Properties 50 4.1 Static Single Assignment Form ......................... 50 4.2 Loop-Closed Static Single Assignment Form .................. 53 4.3 Canonical Form .................................. 54 v 5 Related Work 57 5.1 Dominator Tree Construction .......................... 57 5.1.1 Iterative Algorithms ........................... 57 5.1.2 Lengauer-Tarjan Algorithm ....................... 59 5.1.3 Semi-NCA ................................. 61 5.1.4 Linear Time Algorithms ......................... 62 5.2 Dominator Tree Reconstruction ......................... 64 5.2.1 Ramalingam-Reps Algorithm ...................... 65 5.2.2 Dynamic SNCA Algorithm ....................... 66 5.2.3 Depth-Based Heuristic .......................... 68 5.3 SSA Construction ................................. 68 5.4 SSA Reconstruction ............................... 71 6 Transformations 75 6.1 Notation ...................................... 75 6.2 Theorems ..................................... 76 6.2.1 Symmetric Evaluation .......................... 77 6.2.2 Structural Theorems ........................... 81 Proof Appendix 83 6.A Proof Template for Maintenance of Evaluation ................. 83 6.B Symmetric Evaluation .............................. 90 6.B.1 Symmetric Instructions ......................... 90 6.B.2 Symmetric Function Calls ........................ 93 6.B.3 Symmetric Branch ............................ 94 6.B.4 Symmetric Return ............................ 95 vi 6.C Common Lemmas ................................. 97 7 Canonicalization 99 7.1 SSA Reconstruction ............................... 100 7.2 LCSSA Reconstruction .............................. 105 7.3 Edge Set Splitting ................................ 107 7.4 Repairing Violations ............................... 109 7.4.1 Property 4.3.1 – Unique latch ...................... 109 7.4.2 Property 4.3.2 – Dedicated preheader .................. 110 7.4.3 Property 4.3.3 – Dedicated exits .................... 111 Proof Appendix 114 7.A SSA Reconstruction ............................... 114 7.B LCSSA Reconstruction .............................. 125 7.C Edge Set Splitting ................................ 134 7.D Unique Latch ................................... 142 7.E Dedicated Preheader ............................... 143 7.F Dedicated Exits .................................. 148 8 Operations 150 8.1 Block Ejection .................................. 150 8.2 Edge Deletion ................................... 152 8.2.1 Change in Path Multiplicity ....................... 155 8.2.2 Change in Paths ............................. 156 8.3 Loop Duplication ................................. 158 Proof Appendix 162 vii 8.A Block Ejection .................................. 162 8.B Delete Edge .................................... 167 9 Optimizations 177 9.1 Straightening ................................... 177 9.2 If Simplification .................................. 179 9.3 Jump Simplification ............................... 181 9.4 Function Inlining ................................. 185 9.5 Loop Unswitching ................................. 188 9.6 Loop Unrolling .................................. 192 9.7 Loop Peeling ................................... 195 Proof Appendix 198 9.A Straightening ................................... 198 9.B If Simplification .................................. 206 9.C Jump Simplification ............................... 212 9.D Function Inlining ................................. 218 9.E Loop Unswitching ................................. 234 9.F Loop Unrolling .................................. 243 9.G Loop Peeling ................................... 252 10 Evaluation 261 10.1 Source Programs ................................. 261 10.2 Methodology ................................... 263 10.3 Single Pass .................................... 265 10.4 Pass Sequence ................................... 267 viii 11 Future Directions 270 Bibliography 274 Curriculum Vitae 281 ix LIST OF FIGURES 2.1 An example control flow graph containing a loop and a one-armed conditional. Block content is presented in Section 3.1. .................... 9 2.2 The process of collapsing a reducible graph into its limit graph (incomplete). 10 2.3 The process of collapsing an irreducible graph into its limit graph (incom- plete). Transformation (T3) is applied to block b so that transformations (T1) and (T2) can further reduce the graph (incomplete). .......... 11 2.4 A control flow graph with its dominator tree. An edge (b1; b2) in the dominator tree indicates that idom(b2) = b1. ........................ 14 2.5 A control flow graph containing three loops (one nested) shown with itsloop nesting forest. In this example, the set of abstract loops are lb = (b; fb; c; dg; feg), lc = (c; fc; dg; fbg), and le = (e; fe; fg; ;). .................... 16 2.6 An irreducible graph. ............................... 18 3.1 Syntax of values, expressions, and types. .................... 23 3.2 A graph annotated with implicit parameters (the second set of parameters). 26 3.3 Implicit parameters for the graph in Figure 3.2. ................ 26 3.4 The syntax of instructions. ............................ 27 3.5 The syntax of terminators. ............................ 28 3.6 Syntax of terms, environments, nondeterminsm state, and effects. ...... 30 x 3.7 Evaluation rules for instructions excluding call. ................ 33 3.8 Evaluation rules for instructions evaluating abnormally. ............ 34 3.9 Evaluation rules for the invocation of an intrinsic. ............... 35 3.10 Evaluation rules for the switch terminator. .................. 36 3.11 Evaluation rules for the call instruction and return terminator. ...... 37 3.12 Evaluation rule for invoking a program function externally.
Recommended publications
  • Generalized Jump Threading in Libfirm
    Institut für Programmstrukturen und Datenorganisation (IPD) Lehrstuhl Prof. Dr.-Ing. Snelting Generalized Jump Threading in libFIRM Masterarbeit von Joachim Priesner an der Fakultät für Informatik Erstgutachter: Prof. Dr.-Ing. Gregor Snelting Zweitgutachter: Prof. Dr.-Ing. Jörg Henkel Betreuender Mitarbeiter: Dipl.-Inform. Andreas Zwinkau Bearbeitungszeit: 5. Oktober 2016 – 23. Januar 2017 KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft www.kit.edu Zusammenfassung/Abstract Jump Threading (dt. „Sprünge fädeln“) ist eine Compileroptimierung, die statisch vorhersagbare bedingte Sprünge in unbedingte Sprünge umwandelt. Bei der Ausfüh- rung kann ein Prozessor bedingte Sprünge zunächst nur heuristisch mit Hilfe der Sprungvorhersage auswerten. Sie stellen daher generell ein Performancehindernis dar. Die Umwandlung ist insbesondere auch dann möglich, wenn das Sprungziel nur auf einer Teilmenge der zu dem Sprung führenden Ausführungspfade statisch be- stimmbar ist. In diesem Fall, der den überwiegenden Teil der durch Jump Threading optimierten Sprünge betrifft, muss die Optimierung Grundblöcke duplizieren, um jene Ausführungspfade zu isolieren. Verschiedene aktuelle Compiler enthalten sehr unterschiedliche Implementierungen von Jump Threading. In dieser Masterarbeit wird zunächst ein theoretischer Rahmen für Jump Threading vorgestellt. Sodann wird eine allgemeine Fassung eines Jump- Threading-Algorithmus entwickelt, implementiert und in diverser Hinsicht untersucht, insbesondere auf Wechselwirkungen mit anderen Optimierungen wie If
    [Show full text]
  • Improving the Compilation Process Using Program Annotations
    POLITECNICO DI MILANO Corso di Laurea Magistrale in Ingegneria Informatica Dipartimento di Elettronica, Informazione e Bioingegneria Improving the Compilation process using Program Annotations Relatore: Prof. Giovanni Agosta Correlatore: Prof. Lenore Zuck Tesi di Laurea di: Niko Zarzani, matricola 783452 Anno Accademico 2012-2013 Alla mia famiglia Alla mia ragazza Ai miei amici Ringraziamenti Ringrazio in primis i miei relatori, Prof. Giovanni Agosta e Prof. Lenore Zuck, per la loro disponibilità, i loro preziosi consigli e il loro sostegno. Grazie per avermi seguito sia nel corso della tesi che della mia carriera universitaria. Ringrazio poi tutti coloro con cui ho avuto modo di confrontarmi durante la mia ricerca, Dr. Venkat N. Venkatakrishnan, Dr. Rigel Gjomemo, Dr. Phu H. H. Phung e Giacomo Tagliabure, che mi sono stati accanto sin dall’inizio del mio percorso di tesi. Voglio ringraziare con tutto il cuore la mia famiglia per il prezioso sup- porto in questi anni di studi e Camilla per tutto l’amore che mi ha dato anche nei momenti più critici di questo percorso. Non avrei potuto su- perare questa avventura senza voi al mio fianco. Ringrazio le mie amiche e i miei amici più cari Ilaria, Carolina, Elisa, Riccardo e Marco per la nostra speciale amicizia a distanza e tutte le risate fatte assieme. Infine tutti i miei conquilini, dai più ai meno nerd, per i bei momenti pas- sati assieme. Ricorderò per sempre questi ultimi anni come un’esperienza stupenda che avete reso memorabile. Mi mancherete tutti. Contents 1 Introduction 1 2 Background 3 2.1 Annotated Code . .3 2.2 Sources of annotated code .
    [Show full text]
  • Control Flow Graphs
    CONTROL FLOW GRAPHS PROGRAM ANALYSIS AND OPTIMIZATION – DCC888 Fernando Magno Quintão Pereira [email protected] The material in these slides have been taken from the "Dragon Book", Secs 8.5 – 8.7, by Aho et al. Intermediate Program Representations • Optimizing compilers and human beings do not see the program in the same way. – We are more interested in source code. – But, source code is too different from machine code. – Besides, from an engineering point of view, it is better to have a common way to represent programs in different languages, and target different architectures. Fortran PowerPC COBOL x86 Front Back Optimizer Lisp End End ARM … … Basic Blocks and Flow Graphs • Usually compilers represent programs as control flow graphs (CFG). • A control flow graph is a directed graph. – Nodes are basic blocks. – There is an edge from basic block B1 to basic block B2 if program execution can flow from B1 to B2. • Before defining basic void identity(int** a, int N) { int i, j; block, we will illustrate for (i = 0; i < N; i++) { this notion by showing the for (j = 0; j < N; j++) { a[i][j] = 0; CFG of the function on the } right. } for (i = 0; i < N; i++) { What does a[i][i] = 1; this program } do? } The Low Level Virtual Machine • We will be working with a compilation framework called The Low Level Virtual Machine, or LLVM, for short. • LLVM is today the most used compiler in research. • Additionally, this compiler is used in many important companies: Apple, Cray, Google, etc. The front-end Machine independent Machine dependent that parses C optimizations, such as optimizations, such into bytecodes constant propagation as register allocation ../0 %"&'( *+, ""% !"#$% !"#$)% !"#$)% !"#$- Using LLVM to visualize a CFG • We can use the opt tool, the LLVM machine independent optimizer, to visualize the control flow graph of a given function $> clang -c -emit-llvm identity.c -o identity.bc $> opt –view-cfg identity.bc • We will be able to see the CFG of our target program, as long as we have the tool DOT installed in our system.
    [Show full text]
  • GCC Internals
    GCC Internals Diego Novillo [email protected] Red Hat Canada CGO 2007 San Jose, California March 2007 Outline 1. Overview 2. Source code organization 3. Internal architecture 4. Passes NOTE: Internal information valid for GCC mainline as of 2007-03-02 11 March 2007 GCC Internals - 2 1. Overview ➢ Major features ➢ Brief history ➢ Development model 11 March 2007 GCC Internals - 3 Major Features Availability – Free software (GPL) – Open and distributed development process – System compiler for popular UNIX variants – Large number of platforms (deeply embedded to big iron) – Supports all major languages: C, C++, Java, Fortran 95, Ada, Objective-C, Objective-C++, etc 11 March 2007 GCC Internals - 4 Major Features Code quality – Bootstraps on native platforms – Warning-free – Extensive regression testsuite – Widely deployed in industrial and research projects – Merit-based maintainership appointed by steering committee – Peer review by maintainers – Strict coding standards and patch reversion policy 11 March 2007 GCC Internals - 5 Major Features Analysis/Optimization – SSA-based high-level global optimizer – Constraint-based points-to alias analysis – Data dependency analysis based on chains of recurrences – Feedback directed optimization – Interprocedural optimization – Automatic pointer checking instrumentation – Automatic loop vectorization – OpenMP support 11 March 2007 GCC Internals - 6 1. Overview ➢ Major features ➢ Brief history ➢ Development model 11 March 2007 GCC Internals - 7 Brief History GCC 1 (1987) – Inspired on Pastel compiler
    [Show full text]
  • Compiler Construction
    Compiler construction PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 10 Dec 2011 02:23:02 UTC Contents Articles Introduction 1 Compiler construction 1 Compiler 2 Interpreter 10 History of compiler writing 14 Lexical analysis 22 Lexical analysis 22 Regular expression 26 Regular expression examples 37 Finite-state machine 41 Preprocessor 51 Syntactic analysis 54 Parsing 54 Lookahead 58 Symbol table 61 Abstract syntax 63 Abstract syntax tree 64 Context-free grammar 65 Terminal and nonterminal symbols 77 Left recursion 79 Backus–Naur Form 83 Extended Backus–Naur Form 86 TBNF 91 Top-down parsing 91 Recursive descent parser 93 Tail recursive parser 98 Parsing expression grammar 100 LL parser 106 LR parser 114 Parsing table 123 Simple LR parser 125 Canonical LR parser 127 GLR parser 129 LALR parser 130 Recursive ascent parser 133 Parser combinator 140 Bottom-up parsing 143 Chomsky normal form 148 CYK algorithm 150 Simple precedence grammar 153 Simple precedence parser 154 Operator-precedence grammar 156 Operator-precedence parser 159 Shunting-yard algorithm 163 Chart parser 173 Earley parser 174 The lexer hack 178 Scannerless parsing 180 Semantic analysis 182 Attribute grammar 182 L-attributed grammar 184 LR-attributed grammar 185 S-attributed grammar 185 ECLR-attributed grammar 186 Intermediate language 186 Control flow graph 188 Basic block 190 Call graph 192 Data-flow analysis 195 Use-define chain 201 Live variable analysis 204 Reaching definition 206 Three address
    [Show full text]
  • GCC Internals
    GCC Internals Diego Novillo [email protected] November 2007 Outline 1. Overview Features, history, development model 2. Source code organization Files, building, patch submission 3. Internal architecture Pipeline, representations, data structures, alias analysis, data flow, code generation 4. Passes Adding, debugging Internal information valid for GCC mainline as of 2007-11-20 November 27, 2007 GCC Internals - 2 1. Overview ➢ Major features ➢ Brief history ➢ Development model November 27, 2007 GCC Internals - 3 Major Features Availability – Free software (GPL) – Open and distributed development process – System compiler for popular UNIX variants – Large number of platforms (deeply embedded to big iron) – Supports all major languages: C, C++, Java, Fortran 95, Ada, Objective-C, Objective-C++, etc November 27, 2007 GCC Internals - 4 Major Features Code quality – Bootstraps on native platforms – Warning-free – Extensive regression testsuite – Widely deployed in industrial and research projects – Merit-based maintainership appointed by steering committee – Peer review by maintainers – Strict coding standards and patch reversion policy November 27, 2007 GCC Internals - 5 Major Features Analysis/Optimization – SSA-based high-level global optimizer – Constraint-based points-to alias analysis – Data dependency analysis based on chains of recurrences – Feedback directed optimization – Inter-procedural optimization – Automatic pointer checking instrumentation – Automatic loop vectorization – OpenMP support November 27, 2007 GCC Internals - 6 Overview
    [Show full text]
  • Simple and Efficient SSA Construction
    Simple and Efficient Construction of Static Single Assignment Form Matthias Braun1, Sebastian Buchwald1, Sebastian Hack2, Roland Leißa2, Christoph Mallon2, and Andreas Zwinkau1 1Karlsruhe Institute of Technology, {matthias.braun,buchwald,zwinkau}@kit.edu 2Saarland University {hack,leissa,mallon}@cs.uni-saarland.de Abstract. We present a simple SSA construction algorithm, which al- lows direct translation from an abstract syntax tree or bytecode into an SSA-based intermediate representation. The algorithm requires no prior analysis and ensures that even during construction the intermediate rep- resentation is in SSA form. This allows the application of SSA-based op- timizations during construction. After completion, the intermediate rep- resentation is in minimal and pruned SSA form. In spite of its simplicity, the runtime of our algorithm is on par with Cytron et al.’s algorithm. 1 Introduction Many modern compilers feature intermediate representations (IR) based on the static single assignment form (SSA form). SSA was conceived to make program analyses more efficient by compactly representing use-def chains. Over the last years, it turned out that the SSA form not only helps to make analyses more efficient but also easier to implement, test, and debug. Thus, modern compilers such as the Java HotSpot VM [14], LLVM [2], and libFirm [1] exclusively base their intermediate representation on the SSA form. The first algorithm to efficiently construct the SSA form was introduced by Cytron et al. [10]. One reason, why this algorithm still is very popular, is that it guarantees a form of minimality on the number of placed φ functions. However, for compilers that are entirely SSA-based, this algorithm has a significant draw- back: Its input program has to be represented as a control flow graph (CFG) in non-SSA form.
    [Show full text]
  • A Case for an SC-Preserving Compiler
    A Case for an SC-Preserving Compiler Daniel Marino† Abhayendra Singh∗ Todd Millstein† Madanlal Musuvathi‡ Satish Narayanasamy∗ †University of California, Los Angeles ∗University of Michigan, Ann Arbor ‡Microsoft Research, Redmond Abstract 1. Introduction The most intuitive memory consistency model for shared-memory A memory consistency model (or simply memory model) defines the multi-threaded programming is sequential consistency (SC). How- semantics of a concurrent programming language by specifying the ever, current concurrent programming languages support a re- order in which memory operations performed by one thread become laxed model, as such relaxations are deemed necessary for en- visible to other threads in the program. The most natural memory abling important optimizations. This paper demonstrates that an model is sequential consistency (SC) [31]. Under SC, the individual SC-preserving compiler, one that ensures that every SC behavior of operations of the program appear to have executed in a global a compiler-generated binary is an SC behavior of the source pro- sequential order consistent with the per-thread program order. This gram, retains most of the performance benefits of an optimizing semantics matches the intuition of a concurrent program’s behavior compiler. The key observation is that a large class of optimizations as a set of possible thread interleavings. crucial for performance are either already SC-preserving or can be It is commonly accepted that programming languages must relax modified to preserve SC while retaining much of their effectiveness. the SC semantics of programs in order to allow effective compiler An SC-preserving compiler, obtained by restricting the optimiza- optimizations. This paper challenges that assumption by demon- tion phases in LLVM, a state-of-the-art C/C++ compiler, incurs an strating an optimizing compiler that retains most of the performance average slowdown of 3.8% and a maximum slowdown of 34% on of the generated code while preserving the SC semantics.
    [Show full text]
  • The University of Chicago a Compiler-Based Online
    THE UNIVERSITY OF CHICAGO A COMPILER-BASED ONLINE ADAPTIVE OPTIMIZER A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF COMPUTER SCIENCE BY KAVON FAR FARVARDIN CHICAGO, ILLINOIS DECEMBER 2020 Copyright © 2020 by Kavon Far Farvardin Permission is hereby granted to copy, distribute and/or modify this document under the terms of the Creative Commons Attribution 4.0 International license. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. To my cats Jiji and Pumpkin. Table of Contents List of Figures vii List of Tables ix Acknowledgments x Abstract xi I Introduction 1 1 Motivation 2 1.1 Adaptation . .4 1.2 Trial and Error . .6 1.3 Goals . .8 2 Background 11 2.1 Terminology . 12 2.2 Profile-guided Compilation . 14 2.3 Dynamic Compilation . 15 2.4 Autotuning . 17 2.5 Finding Balance . 19 3 Related Work 21 3.1 By Similarity . 21 3.1.1 Active Harmony . 22 3.1.2 Kistler’s Optimizer . 25 3.1.3 Jikes RVM . 27 3.1.4 ADAPT . 28 3.1.5 ADORE . 30 3.1.6 Suda’s Bayesian Online Autotuner . 31 3.1.7 PEAK . 32 3.1.8 PetaBricks . 33 3.2 By Philosophy . 34 iv 3.2.1 Adaptive Fortran . 34 3.2.2 Dynamic Feedback . 34 3.2.3 Dynamo . 35 3.2.4 CoCo . 36 3.2.5 MATE . 36 3.2.6 AOS . 37 3.2.7 Testarossa . 38 4 Thesis 40 II Halo: Wholly Adaptive LLVM Optimizer 43 5 System Overview 44 5.1 Clang .
    [Show full text]
  • Compiler Optimization Effects on Register Collisions
    COMPILER OPTIMIZATION EFFECTS ON REGISTER COLLISIONS A Thesis presented to the Faculty of California Polytechnic State University, San Luis Obispo In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science by Jonathan Tan June 2018 c 2018 Jonathan Tan ALL RIGHTS RESERVED ii COMMITTEE MEMBERSHIP TITLE: Compiler Optimization Effects on Register Collisions AUTHOR: Jonathan Tan DATE SUBMITTED: June 2018 COMMITTEE CHAIR: Aaron Keen, Ph.D. Professor of Computer Science COMMITTEE MEMBER: Theresa Migler, Ph.D. Professor of Computer Science COMMITTEE MEMBER: John Seng, Ph.D. Professor of Computer Science iii ABSTRACT Compiler Optimization Effects on Register Collisions Jonathan Tan We often want a compiler to generate executable code that runs as fast as possible. One consideration toward this goal is to keep values in fast registers to limit the number of slower memory accesses that occur. When there are not enough physical registers available for use, values are \spilled" to the runtime stack. The need for spills is discovered during register allocation wherein values in use are mapped to physical registers. One factor in the efficacy of register allocation is the number of values in use at one time (register collisions). Register collision is affected by compiler optimizations that take place before register allocation. Though the main purpose of compiler optimizations is to make the overall code better and faster, some optimiza- tions can actually increase register collisions. This may force the register allocation process to spill. This thesis studies the effects of different compiler optimizations on register collisions. iv ACKNOWLEDGMENTS Thanks to: • My advisor, Aaron Keen, for guiding, encouraging, and mentoring me on the thesis and classes.
    [Show full text]
  • Simple and Efficient Construction of Static Single Assignment Form
    Simple and Efficient Construction of Static Single Assignment Form Matthias Braun1, Sebastian Buchwald1, Sebastian Hack2, Roland Leißa2, Christoph Mallon2, and Andreas Zwinkau1 1 Karlsruhe Institute of Technology {matthias.braun,buchwald,zwinkau}@kit.edu 2 Saarland University {hack,leissa,mallon}@cs.uni-saarland.de Abstract. We present a simple SSA construction algorithm, which al- lows direct translation from an abstract syntax tree or bytecode into an SSA-based intermediate representation. The algorithm requires no prior analysis and ensures that even during construction the intermediate rep- resentation is in SSA form. This allows the application of SSA-based opti- mizations during construction. After completion, the intermediate representation is in minimal and pruned SSA form. In spite of its simplicity, the runtime of our algorithm is on par with Cytron et al.’s algorithm. 1 Introduction Many modern compilers feature intermediate representations (IR) based on the static single assignment form (SSA form). SSA was conceived to make program analyses more efficient by compactly representing use-def chains. Over the last years, it turned out that the SSA form not only helps to make analyses more efficient but also easier to implement, test, and debug. Thus, modern compilers such as the Java HotSpot VM [14], LLVM [2], and libFirm [1] exclusively base their intermediate representation on the SSA form. The first algorithm to efficiently construct the SSA form was introduced by Cytron et al. [10]. One reason, why this algorithm still is very popular, is that it guarantees a form of minimality on the number of placed φ functions. However, for compilers that are entirely SSA-based, this algorithm has a significant draw- back: Its input program has to be represented as a control flow graph (CFG) in non-SSA form.
    [Show full text]
  • GCC—An Architectural Overview, Current Status, and Future Directions
    GCC—An Architectural Overview, Current Status, and Future Directions Diego Novillo Red Hat Canada [email protected] Abstract GCC one of the most popular compilers in use today. It serves as the system compiler for every Linux distribution and it is also fairly The GNU Compiler Collection (GCC) is one popular in academic circles, where it is used of the most popular compilers available and it for compiler research. Despite this popular- is the de facto system compiler for Linux sys- ity, GCC has traditionally proven difficult to tems. Despite its popularity, the internal work- maintain and enhance, to the extreme that some ings of GCC are relatively unknown outside of features were almost impossible to implement. the immediate developer community. And as a result GCC was starting to lag behind This paper provides a high-level architectural the competition. overview of GCC, its internal modules, and Part of the problem is the size of its code base. how it manages to support a wide variety of While GCC is not huge by industry standards, hardware architectures and languages. Special it still is a fairly large project, with a core emphasis is placed on high-level descriptions of about 1.3 MLOC. Including the runtime li- of the different modules to provide a roadmap braries needed for all the language support, to GCC. GCC comes to about 2.2 MLOC.1 Finally, the paper also describes recent techno- logical improvements that have been added to Size is not the only hurdle presented by GCC. GCC and discusses some of the major features Compilers are inherently complex and very de- that the developer community is thinking for manding in terms of the theoretical knowl- future versions of the compiler.
    [Show full text]