An Intermediate Representation for Optimizing Compilers Nico Reissmann, Jan Christian Meyer, Magnus Själander

Total Page:16

File Type:pdf, Size:1020Kb

An Intermediate Representation for Optimizing Compilers Nico Reissmann, Jan Christian Meyer, Magnus Själander Thesis for the degree of Philosophiae Doctor Principles, Techniques, and Tools for Explicit and Automatic Parallelization Nico Reissmann January 28, 2019 Norwegian University of Science and Technology Faculty of Information Technology and Electrical Engineering Department of Computer Science Abstract The end of Dennard scaling also brought an end to technology scaling as a means to improve performance. Chip manufacturers had to abandon fre- quency and superscalar scaling as processors became increasingly power con- strained. An architecture’s power budget became the limiting factor to perfor- mance gains, and computations had to be performed more energy-efficiently. Designers turned to chip multiprocessors (CMPs) and developers began to employ specialized architectures, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to further improve performance while meeting the power envelope. The exploitation of parallelism in an energy- efficient manner became the only way forward. Until the end of technology scaling, programs experienced transparent per- formance gains with every new processor generation. However, CMPs, GPUs, and FPGAs rely on the static extraction of parallelism to improve performance, and programs need to be modified to take advantage of these architectures. Thus, performance gains are no longer achieved transparently, and develop- ers and tools are forced to face new, as well as long-neglected challenges in program parallelization. These challenges include the detection and encod- ing of potential parallelism in automatic approaches, application portability issues on GPUs, and performance portability issues on CMPs. It is essential to address these challenges, as the continuous increase in computer perfor- mance now solely relies on the exploitation of parallelism. This thesis consists of three parts, each addressing one of the aforementioned challenges in program parallelization. The first part addresses the detection and encoding of potential parallelism in automatic approaches. It presents the Regionalized Value State Dependence Graph (RVSDG) as an alternative intermediate representation for optimizing and parallelizing compilers. The RVSDG exposes the hierarchical structure of programs and explicitly models the dependencies between computations, permitting the explicit encoding of iii concurrent operations and program structures, such as conditionals, loops, and functions. This helps to expose the inherent parallelism in programs and its structures by employing well-known methods for the extraction of instruc- tion level parallelism. The second part addresses application portability issues on GPUs. A GPU’s specialized architecture is optimized for highly regular data-parallel applica- tions, but compromises program performance for workloads with irregular control flow, potentially leading to redundant code execution. We propose a control flow restructuring method to effectively eliminate repeated code ex- ecution on GPUs and potentially improve performance. The third part addresses performance portability on CMPs. This issue arises as developers overfit their application to a specific architecture, which results in suboptimal performance for different program inputs or different architec- tures. We improve performance analysis for OpenMP programs by address- ing the scalability challenges of the grain graph visualization method. We present an aggregation method for grain graphs that hierarchically groups re- lated nodes into a single node. This aggregated graph can then be navigated by progressively uncovering nodes with performance issues, while hiding un- related regions of the graph. This enhances productivity by enabling devel- opers to understand performance problems of highly-parallel OpenMP pro- grams more easily. The insights and techniques developed by addressing these three challenges may result in improved methods and tools for the exploitation of parallelism. The RVSDG is a promising IR for parallelizing compilers, as it permits the en- coding of concurrent computations. The grain graph offers a familiar struc- tural view to developers along with the performance issues of a particular pro- gram. In the future, it is necessary to cast these ideas into mature tools to make them applicable in practice and foster further research. iv List of Contributions Thesis Articles 1. RVSDG: An Intermediate Representation for Optimizing Compilers Nico Reissmann, Jan Christian Meyer, Magnus Själander. Unpublished Manuscript 2. Perfect Reconstructability of Control Flow from Demand Dependence Graphs Helge Bahmann, Nico Reissmann, Magnus Jahre, and Jan Christian Meyer. In Transactions on Architecture and Code Optimization (TACO), Volume 11 Issue 4, ACM, 2015 3. Efficient Control Flow Restructuring for GPUs Nico Reissmann, Thomas L. Falch, Benjamin A. Björnseth, Helge Bah- mann, Jan Christian Meyer, and Magnus Jahre. In Proceedings of the International Conference on High Performance Com- puting & Simulation (HPCS), IEEE, 2016 Winner of Outstanding Paper Award 4. Diagnosing Highly-Parallel OpenMP Programs With Aggregated Grain Graphs Nico Reissmann and Ananya Muddukrishna. In Proceedings of the International European Conference on Parallel and Distributed Computing (Euro-Par), Springer, 2018 v Other Articles 1. A Study of Energy and Locality Effects using Space-filling Curves Nico Reissmann, Jan Christian Meyer, Magnus Jahre. In Proceedings of the 28th IEEE International Parallel & Distributed Pro- cessing Symposium (IPDPS) and IPDPS Workshops (IPDPSW), 2014. 2. Towards fine-grained dynamic tuning of HPC applications on modern multi-core architectures Mohammed Sourouri, Espen Birker Raknes, Nico Reissmann, Johannes Langguth, Daniel Hackenberg, Robert Schöne, Per Gunnar Kjeldsberg. In Proceedings of the International Conference for High Performance Com- puting, Networking, Storage and Analysis (SC), 2017. 3. Aggregating Large Grain Graphs For Improved OpenMP Productivity Nico Reissmann, Magnus Jahre, Ananya Muddukrishna. In Fourth International Workshop on Visual Performance Analysis (VPA), 2017. 4. RVSDG: An Intermediate Representation for the Multi-Core Era Nico Reissmann, Jan Christian Meyer, Magnus Själander. In 11th Nordic Workshop on Multi-Core Computing (MCC), 2018. 5. Load Balancing Domain Decompositions of a Lattice-Boltzmann Proxy Application Jan Christian Meyer, Janusa Ragunathan, Jørgen Valstad, Nico Reissmann. In 11th Nordic Workshop on Multi-Core Computing (MCC), 2018. vi Acknowledgements First and foremost, my deepest and sincerest thanks to Helge Bahmann and Jan Christian Meyer, for their invaluable guidance over the last twelve years. Both have challenged me to explore new and unfamiliar topics, broadened my perspective, sparked my love for compilers, and helped me to become a better engineer and researcher. My heartfelt thanks also to Magnus Själander for his advice and encouragement over the last few years, as well as for his interest in my work. Sincere thanks also to Gunnar Tufte for his support and the many insightful conversations. Gratitude is also due to my co-supervisors, Lasse Natvig and Per Gunnar Kjeldsberg, for their support throughout the years. Moreover, I am grateful to Anne Berit Dahl, Birgit Sørgård, Berit Hellan, and Ellen Solberg for finding solutions instead of problems, and trying to reduce bureaucracy to a minimum. I would also like to thank my colleagues from the Computing group for making the fourth floor a fun and engaging workplace, and especially for the innumerable and illuminating discussions during lunch time and over payday drinks. They have been an indispensable part of my aca- demic life, and contributed substantially to my social well-being and growth as a researcher. A big round of thanks to my friends from Germany, Gothenburg, and Trond- heim, for helping me to escape from the office and enriching my life. In par- ticular, I would like to thank my long-term friends Carina Walter, René Kaiser, Jens Teuscher, Franziska Holzhauer-Pansa, Oliver Holzhauer, Aleksejs Sen- cenko, Marina Vazhnova, Dragana Laketi´c,Leif Tore Rusten, Elizabeth and Ja- cob Sturdy, David Fallon, and John Naliboff. My deepest gratitude is also due to my parents, Luise and Friedmar Reiss- mann, to my brother’s family, Mario, Yvonne, and Felix Reissmann, and to my close family, Birgit and Joachim Reissmann, as well as Urs, Anja, Nadja, and vii Larissa Weber, for their boundless patience, unwavering support, and endless encouragement. Thanks for always having my back! Last, but not least, I am eternally grateful to my better half, Ramona Enache, for her help throughout in preserving my sanity, providing support in times of need, and her constant love. You make me a better person! Nico Reissmann Trondheim, 4th November 2018 viii Contents A. Overview1 1. Introduction3 2. Motivation and Scope7 2.1. Modern Architectures . 10 2.2. Programming Modern Architectures . 12 2.3. Parallelization Challenges . 17 2.4. Summary . 25 3. Research Contributions 27 3.1. Thesis Articles . 29 3.2. Frameworks and Tools . 31 3.3. Other Articles . 32 4. Background and Related Work 35 4.1. Compiler Intermediate Representations . 35 4.2. Graphics Processing Units . 38 4.3. Grain Graphs . 41 5. Concluding Remarks 47 5.1. FutureWork ............................ 48 5.2. Outlook . 55 B. Regionalized Value State Dependence Graph 59 B1. RVSDG: An Intermediate Representation for Optimizing Compilers 61 B1.1. Introduction . 62 B1.2. Motivation . 65 B1.3. The Regionalized Value State Dependence Graph . 67
Recommended publications
  • Generalized Jump Threading in Libfirm
    Institut für Programmstrukturen und Datenorganisation (IPD) Lehrstuhl Prof. Dr.-Ing. Snelting Generalized Jump Threading in libFIRM Masterarbeit von Joachim Priesner an der Fakultät für Informatik Erstgutachter: Prof. Dr.-Ing. Gregor Snelting Zweitgutachter: Prof. Dr.-Ing. Jörg Henkel Betreuender Mitarbeiter: Dipl.-Inform. Andreas Zwinkau Bearbeitungszeit: 5. Oktober 2016 – 23. Januar 2017 KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft www.kit.edu Zusammenfassung/Abstract Jump Threading (dt. „Sprünge fädeln“) ist eine Compileroptimierung, die statisch vorhersagbare bedingte Sprünge in unbedingte Sprünge umwandelt. Bei der Ausfüh- rung kann ein Prozessor bedingte Sprünge zunächst nur heuristisch mit Hilfe der Sprungvorhersage auswerten. Sie stellen daher generell ein Performancehindernis dar. Die Umwandlung ist insbesondere auch dann möglich, wenn das Sprungziel nur auf einer Teilmenge der zu dem Sprung führenden Ausführungspfade statisch be- stimmbar ist. In diesem Fall, der den überwiegenden Teil der durch Jump Threading optimierten Sprünge betrifft, muss die Optimierung Grundblöcke duplizieren, um jene Ausführungspfade zu isolieren. Verschiedene aktuelle Compiler enthalten sehr unterschiedliche Implementierungen von Jump Threading. In dieser Masterarbeit wird zunächst ein theoretischer Rahmen für Jump Threading vorgestellt. Sodann wird eine allgemeine Fassung eines Jump- Threading-Algorithmus entwickelt, implementiert und in diverser Hinsicht untersucht, insbesondere auf Wechselwirkungen mit anderen Optimierungen wie If
    [Show full text]
  • Improving the Compilation Process Using Program Annotations
    POLITECNICO DI MILANO Corso di Laurea Magistrale in Ingegneria Informatica Dipartimento di Elettronica, Informazione e Bioingegneria Improving the Compilation process using Program Annotations Relatore: Prof. Giovanni Agosta Correlatore: Prof. Lenore Zuck Tesi di Laurea di: Niko Zarzani, matricola 783452 Anno Accademico 2012-2013 Alla mia famiglia Alla mia ragazza Ai miei amici Ringraziamenti Ringrazio in primis i miei relatori, Prof. Giovanni Agosta e Prof. Lenore Zuck, per la loro disponibilità, i loro preziosi consigli e il loro sostegno. Grazie per avermi seguito sia nel corso della tesi che della mia carriera universitaria. Ringrazio poi tutti coloro con cui ho avuto modo di confrontarmi durante la mia ricerca, Dr. Venkat N. Venkatakrishnan, Dr. Rigel Gjomemo, Dr. Phu H. H. Phung e Giacomo Tagliabure, che mi sono stati accanto sin dall’inizio del mio percorso di tesi. Voglio ringraziare con tutto il cuore la mia famiglia per il prezioso sup- porto in questi anni di studi e Camilla per tutto l’amore che mi ha dato anche nei momenti più critici di questo percorso. Non avrei potuto su- perare questa avventura senza voi al mio fianco. Ringrazio le mie amiche e i miei amici più cari Ilaria, Carolina, Elisa, Riccardo e Marco per la nostra speciale amicizia a distanza e tutte le risate fatte assieme. Infine tutti i miei conquilini, dai più ai meno nerd, per i bei momenti pas- sati assieme. Ricorderò per sempre questi ultimi anni come un’esperienza stupenda che avete reso memorabile. Mi mancherete tutti. Contents 1 Introduction 1 2 Background 3 2.1 Annotated Code . .3 2.2 Sources of annotated code .
    [Show full text]
  • Control Flow Graphs
    CONTROL FLOW GRAPHS PROGRAM ANALYSIS AND OPTIMIZATION – DCC888 Fernando Magno Quintão Pereira [email protected] The material in these slides have been taken from the "Dragon Book", Secs 8.5 – 8.7, by Aho et al. Intermediate Program Representations • Optimizing compilers and human beings do not see the program in the same way. – We are more interested in source code. – But, source code is too different from machine code. – Besides, from an engineering point of view, it is better to have a common way to represent programs in different languages, and target different architectures. Fortran PowerPC COBOL x86 Front Back Optimizer Lisp End End ARM … … Basic Blocks and Flow Graphs • Usually compilers represent programs as control flow graphs (CFG). • A control flow graph is a directed graph. – Nodes are basic blocks. – There is an edge from basic block B1 to basic block B2 if program execution can flow from B1 to B2. • Before defining basic void identity(int** a, int N) { int i, j; block, we will illustrate for (i = 0; i < N; i++) { this notion by showing the for (j = 0; j < N; j++) { a[i][j] = 0; CFG of the function on the } right. } for (i = 0; i < N; i++) { What does a[i][i] = 1; this program } do? } The Low Level Virtual Machine • We will be working with a compilation framework called The Low Level Virtual Machine, or LLVM, for short. • LLVM is today the most used compiler in research. • Additionally, this compiler is used in many important companies: Apple, Cray, Google, etc. The front-end Machine independent Machine dependent that parses C optimizations, such as optimizations, such into bytecodes constant propagation as register allocation ../0 %"&'( *+, ""% !"#$% !"#$)% !"#$)% !"#$- Using LLVM to visualize a CFG • We can use the opt tool, the LLVM machine independent optimizer, to visualize the control flow graph of a given function $> clang -c -emit-llvm identity.c -o identity.bc $> opt –view-cfg identity.bc • We will be able to see the CFG of our target program, as long as we have the tool DOT installed in our system.
    [Show full text]
  • GCC Internals
    GCC Internals Diego Novillo [email protected] Red Hat Canada CGO 2007 San Jose, California March 2007 Outline 1. Overview 2. Source code organization 3. Internal architecture 4. Passes NOTE: Internal information valid for GCC mainline as of 2007-03-02 11 March 2007 GCC Internals - 2 1. Overview ➢ Major features ➢ Brief history ➢ Development model 11 March 2007 GCC Internals - 3 Major Features Availability – Free software (GPL) – Open and distributed development process – System compiler for popular UNIX variants – Large number of platforms (deeply embedded to big iron) – Supports all major languages: C, C++, Java, Fortran 95, Ada, Objective-C, Objective-C++, etc 11 March 2007 GCC Internals - 4 Major Features Code quality – Bootstraps on native platforms – Warning-free – Extensive regression testsuite – Widely deployed in industrial and research projects – Merit-based maintainership appointed by steering committee – Peer review by maintainers – Strict coding standards and patch reversion policy 11 March 2007 GCC Internals - 5 Major Features Analysis/Optimization – SSA-based high-level global optimizer – Constraint-based points-to alias analysis – Data dependency analysis based on chains of recurrences – Feedback directed optimization – Interprocedural optimization – Automatic pointer checking instrumentation – Automatic loop vectorization – OpenMP support 11 March 2007 GCC Internals - 6 1. Overview ➢ Major features ➢ Brief history ➢ Development model 11 March 2007 GCC Internals - 7 Brief History GCC 1 (1987) – Inspired on Pastel compiler
    [Show full text]
  • Compiler Construction
    Compiler construction PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 10 Dec 2011 02:23:02 UTC Contents Articles Introduction 1 Compiler construction 1 Compiler 2 Interpreter 10 History of compiler writing 14 Lexical analysis 22 Lexical analysis 22 Regular expression 26 Regular expression examples 37 Finite-state machine 41 Preprocessor 51 Syntactic analysis 54 Parsing 54 Lookahead 58 Symbol table 61 Abstract syntax 63 Abstract syntax tree 64 Context-free grammar 65 Terminal and nonterminal symbols 77 Left recursion 79 Backus–Naur Form 83 Extended Backus–Naur Form 86 TBNF 91 Top-down parsing 91 Recursive descent parser 93 Tail recursive parser 98 Parsing expression grammar 100 LL parser 106 LR parser 114 Parsing table 123 Simple LR parser 125 Canonical LR parser 127 GLR parser 129 LALR parser 130 Recursive ascent parser 133 Parser combinator 140 Bottom-up parsing 143 Chomsky normal form 148 CYK algorithm 150 Simple precedence grammar 153 Simple precedence parser 154 Operator-precedence grammar 156 Operator-precedence parser 159 Shunting-yard algorithm 163 Chart parser 173 Earley parser 174 The lexer hack 178 Scannerless parsing 180 Semantic analysis 182 Attribute grammar 182 L-attributed grammar 184 LR-attributed grammar 185 S-attributed grammar 185 ECLR-attributed grammar 186 Intermediate language 186 Control flow graph 188 Basic block 190 Call graph 192 Data-flow analysis 195 Use-define chain 201 Live variable analysis 204 Reaching definition 206 Three address
    [Show full text]
  • GCC Internals
    GCC Internals Diego Novillo [email protected] November 2007 Outline 1. Overview Features, history, development model 2. Source code organization Files, building, patch submission 3. Internal architecture Pipeline, representations, data structures, alias analysis, data flow, code generation 4. Passes Adding, debugging Internal information valid for GCC mainline as of 2007-11-20 November 27, 2007 GCC Internals - 2 1. Overview ➢ Major features ➢ Brief history ➢ Development model November 27, 2007 GCC Internals - 3 Major Features Availability – Free software (GPL) – Open and distributed development process – System compiler for popular UNIX variants – Large number of platforms (deeply embedded to big iron) – Supports all major languages: C, C++, Java, Fortran 95, Ada, Objective-C, Objective-C++, etc November 27, 2007 GCC Internals - 4 Major Features Code quality – Bootstraps on native platforms – Warning-free – Extensive regression testsuite – Widely deployed in industrial and research projects – Merit-based maintainership appointed by steering committee – Peer review by maintainers – Strict coding standards and patch reversion policy November 27, 2007 GCC Internals - 5 Major Features Analysis/Optimization – SSA-based high-level global optimizer – Constraint-based points-to alias analysis – Data dependency analysis based on chains of recurrences – Feedback directed optimization – Inter-procedural optimization – Automatic pointer checking instrumentation – Automatic loop vectorization – OpenMP support November 27, 2007 GCC Internals - 6 Overview
    [Show full text]
  • Simple and Efficient SSA Construction
    Simple and Efficient Construction of Static Single Assignment Form Matthias Braun1, Sebastian Buchwald1, Sebastian Hack2, Roland Leißa2, Christoph Mallon2, and Andreas Zwinkau1 1Karlsruhe Institute of Technology, {matthias.braun,buchwald,zwinkau}@kit.edu 2Saarland University {hack,leissa,mallon}@cs.uni-saarland.de Abstract. We present a simple SSA construction algorithm, which al- lows direct translation from an abstract syntax tree or bytecode into an SSA-based intermediate representation. The algorithm requires no prior analysis and ensures that even during construction the intermediate rep- resentation is in SSA form. This allows the application of SSA-based op- timizations during construction. After completion, the intermediate rep- resentation is in minimal and pruned SSA form. In spite of its simplicity, the runtime of our algorithm is on par with Cytron et al.’s algorithm. 1 Introduction Many modern compilers feature intermediate representations (IR) based on the static single assignment form (SSA form). SSA was conceived to make program analyses more efficient by compactly representing use-def chains. Over the last years, it turned out that the SSA form not only helps to make analyses more efficient but also easier to implement, test, and debug. Thus, modern compilers such as the Java HotSpot VM [14], LLVM [2], and libFirm [1] exclusively base their intermediate representation on the SSA form. The first algorithm to efficiently construct the SSA form was introduced by Cytron et al. [10]. One reason, why this algorithm still is very popular, is that it guarantees a form of minimality on the number of placed φ functions. However, for compilers that are entirely SSA-based, this algorithm has a significant draw- back: Its input program has to be represented as a control flow graph (CFG) in non-SSA form.
    [Show full text]
  • A Case for an SC-Preserving Compiler
    A Case for an SC-Preserving Compiler Daniel Marino† Abhayendra Singh∗ Todd Millstein† Madanlal Musuvathi‡ Satish Narayanasamy∗ †University of California, Los Angeles ∗University of Michigan, Ann Arbor ‡Microsoft Research, Redmond Abstract 1. Introduction The most intuitive memory consistency model for shared-memory A memory consistency model (or simply memory model) defines the multi-threaded programming is sequential consistency (SC). How- semantics of a concurrent programming language by specifying the ever, current concurrent programming languages support a re- order in which memory operations performed by one thread become laxed model, as such relaxations are deemed necessary for en- visible to other threads in the program. The most natural memory abling important optimizations. This paper demonstrates that an model is sequential consistency (SC) [31]. Under SC, the individual SC-preserving compiler, one that ensures that every SC behavior of operations of the program appear to have executed in a global a compiler-generated binary is an SC behavior of the source pro- sequential order consistent with the per-thread program order. This gram, retains most of the performance benefits of an optimizing semantics matches the intuition of a concurrent program’s behavior compiler. The key observation is that a large class of optimizations as a set of possible thread interleavings. crucial for performance are either already SC-preserving or can be It is commonly accepted that programming languages must relax modified to preserve SC while retaining much of their effectiveness. the SC semantics of programs in order to allow effective compiler An SC-preserving compiler, obtained by restricting the optimiza- optimizations. This paper challenges that assumption by demon- tion phases in LLVM, a state-of-the-art C/C++ compiler, incurs an strating an optimizing compiler that retains most of the performance average slowdown of 3.8% and a maximum slowdown of 34% on of the generated code while preserving the SC semantics.
    [Show full text]
  • The University of Chicago a Compiler-Based Online
    THE UNIVERSITY OF CHICAGO A COMPILER-BASED ONLINE ADAPTIVE OPTIMIZER A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF COMPUTER SCIENCE BY KAVON FAR FARVARDIN CHICAGO, ILLINOIS DECEMBER 2020 Copyright © 2020 by Kavon Far Farvardin Permission is hereby granted to copy, distribute and/or modify this document under the terms of the Creative Commons Attribution 4.0 International license. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. To my cats Jiji and Pumpkin. Table of Contents List of Figures vii List of Tables ix Acknowledgments x Abstract xi I Introduction 1 1 Motivation 2 1.1 Adaptation . .4 1.2 Trial and Error . .6 1.3 Goals . .8 2 Background 11 2.1 Terminology . 12 2.2 Profile-guided Compilation . 14 2.3 Dynamic Compilation . 15 2.4 Autotuning . 17 2.5 Finding Balance . 19 3 Related Work 21 3.1 By Similarity . 21 3.1.1 Active Harmony . 22 3.1.2 Kistler’s Optimizer . 25 3.1.3 Jikes RVM . 27 3.1.4 ADAPT . 28 3.1.5 ADORE . 30 3.1.6 Suda’s Bayesian Online Autotuner . 31 3.1.7 PEAK . 32 3.1.8 PetaBricks . 33 3.2 By Philosophy . 34 iv 3.2.1 Adaptive Fortran . 34 3.2.2 Dynamic Feedback . 34 3.2.3 Dynamo . 35 3.2.4 CoCo . 36 3.2.5 MATE . 36 3.2.6 AOS . 37 3.2.7 Testarossa . 38 4 Thesis 40 II Halo: Wholly Adaptive LLVM Optimizer 43 5 System Overview 44 5.1 Clang .
    [Show full text]
  • Compiler Optimization Effects on Register Collisions
    COMPILER OPTIMIZATION EFFECTS ON REGISTER COLLISIONS A Thesis presented to the Faculty of California Polytechnic State University, San Luis Obispo In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science by Jonathan Tan June 2018 c 2018 Jonathan Tan ALL RIGHTS RESERVED ii COMMITTEE MEMBERSHIP TITLE: Compiler Optimization Effects on Register Collisions AUTHOR: Jonathan Tan DATE SUBMITTED: June 2018 COMMITTEE CHAIR: Aaron Keen, Ph.D. Professor of Computer Science COMMITTEE MEMBER: Theresa Migler, Ph.D. Professor of Computer Science COMMITTEE MEMBER: John Seng, Ph.D. Professor of Computer Science iii ABSTRACT Compiler Optimization Effects on Register Collisions Jonathan Tan We often want a compiler to generate executable code that runs as fast as possible. One consideration toward this goal is to keep values in fast registers to limit the number of slower memory accesses that occur. When there are not enough physical registers available for use, values are \spilled" to the runtime stack. The need for spills is discovered during register allocation wherein values in use are mapped to physical registers. One factor in the efficacy of register allocation is the number of values in use at one time (register collisions). Register collision is affected by compiler optimizations that take place before register allocation. Though the main purpose of compiler optimizations is to make the overall code better and faster, some optimiza- tions can actually increase register collisions. This may force the register allocation process to spill. This thesis studies the effects of different compiler optimizations on register collisions. iv ACKNOWLEDGMENTS Thanks to: • My advisor, Aaron Keen, for guiding, encouraging, and mentoring me on the thesis and classes.
    [Show full text]
  • Simple and Efficient Construction of Static Single Assignment Form
    Simple and Efficient Construction of Static Single Assignment Form Matthias Braun1, Sebastian Buchwald1, Sebastian Hack2, Roland Leißa2, Christoph Mallon2, and Andreas Zwinkau1 1 Karlsruhe Institute of Technology {matthias.braun,buchwald,zwinkau}@kit.edu 2 Saarland University {hack,leissa,mallon}@cs.uni-saarland.de Abstract. We present a simple SSA construction algorithm, which al- lows direct translation from an abstract syntax tree or bytecode into an SSA-based intermediate representation. The algorithm requires no prior analysis and ensures that even during construction the intermediate rep- resentation is in SSA form. This allows the application of SSA-based opti- mizations during construction. After completion, the intermediate representation is in minimal and pruned SSA form. In spite of its simplicity, the runtime of our algorithm is on par with Cytron et al.’s algorithm. 1 Introduction Many modern compilers feature intermediate representations (IR) based on the static single assignment form (SSA form). SSA was conceived to make program analyses more efficient by compactly representing use-def chains. Over the last years, it turned out that the SSA form not only helps to make analyses more efficient but also easier to implement, test, and debug. Thus, modern compilers such as the Java HotSpot VM [14], LLVM [2], and libFirm [1] exclusively base their intermediate representation on the SSA form. The first algorithm to efficiently construct the SSA form was introduced by Cytron et al. [10]. One reason, why this algorithm still is very popular, is that it guarantees a form of minimality on the number of placed φ functions. However, for compilers that are entirely SSA-based, this algorithm has a significant draw- back: Its input program has to be represented as a control flow graph (CFG) in non-SSA form.
    [Show full text]
  • GCC—An Architectural Overview, Current Status, and Future Directions
    GCC—An Architectural Overview, Current Status, and Future Directions Diego Novillo Red Hat Canada [email protected] Abstract GCC one of the most popular compilers in use today. It serves as the system compiler for every Linux distribution and it is also fairly The GNU Compiler Collection (GCC) is one popular in academic circles, where it is used of the most popular compilers available and it for compiler research. Despite this popular- is the de facto system compiler for Linux sys- ity, GCC has traditionally proven difficult to tems. Despite its popularity, the internal work- maintain and enhance, to the extreme that some ings of GCC are relatively unknown outside of features were almost impossible to implement. the immediate developer community. And as a result GCC was starting to lag behind This paper provides a high-level architectural the competition. overview of GCC, its internal modules, and Part of the problem is the size of its code base. how it manages to support a wide variety of While GCC is not huge by industry standards, hardware architectures and languages. Special it still is a fairly large project, with a core emphasis is placed on high-level descriptions of about 1.3 MLOC. Including the runtime li- of the different modules to provide a roadmap braries needed for all the language support, to GCC. GCC comes to about 2.2 MLOC.1 Finally, the paper also describes recent techno- logical improvements that have been added to Size is not the only hurdle presented by GCC. GCC and discusses some of the major features Compilers are inherently complex and very de- that the developer community is thinking for manding in terms of the theoretical knowl- future versions of the compiler.
    [Show full text]