Fighting Register Pressure in GCC


Vladimir N. Makarov
Red Hat
[email protected]

Abstract

The necessity of decreasing register pressure in compilers is discussed. Various approaches to decreasing register pressure in compilers are given, including different algorithms of register live range splitting, register rematerialization, and register pressure sensitive instruction scheduling before register allocation.

Some of the mentioned algorithms were tried and rejected. Implementation of the rest, including region based register live range splitting and rematerialization driven by the register allocator, is in progress and will probably become part of GCC. The effects of the discussed optimizations are reported, and possible directions for improving register allocation in GCC are given.

Introduction

Modern computers have several levels of storage. The faster the storage, the smaller its size. This is the consequence of a trade-off between the computer's speed and its price. The fastest storage units are registers (or hard registers). They are not enough to store the values of operations and directly referenced variables for any serious program.

It is very hard to force any optimization in a compiler (especially in a portable one) to use the hard registers effectively. Therefore most compiler optimizations are written as if there were an infinite number of virtual registers, called pseudo-registers. The optimizations use them to store intermediate values and the values of small variables. There is also an untraditional approach that uses only memory to store the values. For both approaches we need a special pass (or optimization) to map pseudo-registers onto hard registers and memory; for the second approach we need to map memory onto hard registers instead, because most instructions work with hard registers. This pass is called register allocation.

A good register allocator is becoming a very significant component of an optimizing compiler nowadays because the gap between access times to registers and to first-level memory (cache) widens for high-end processors. Many optimizations (especially interprocedural and SSA-based ones) tend to create lots of pseudo-registers. The number of hard registers stays the same because it is part of the architecture. Even processors with new architectures containing more hard registers need a good register allocator (although to a lesser degree), because the programs run on these computers tend to be more complicated too.

The register allocator is one or more compiler components that solve two major tasks (mostly in an integrated way). The first and most interesting one is to decrease register pressure to the level defined by the number of hard registers through different transformations. The second one is to assign hard registers to pseudo-registers effectively.

So what is register pressure? There are two commonly used definitions. The wide one is the number of hard registers needed to store the values of the pseudo-registers at a given program point. The other is the number of live pseudo-registers at a given point.
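To make the definitions concrete, here is a minimal illustration (the function and its temporaries are invented for this example and are not taken from the paper or from GCC); the comments track the live pseudo-registers after each statement:

```c
/* Illustrative only: each local temporary stands for a pseudo-register.
   The comments show which pseudo-registers are still live (needed later)
   right after each statement, i.e. the register pressure at that point.  */
int pressure_example(int a, int b)
{
    int t1 = a + b;   /* live: a, b, t1  -> pressure 3            */
    int t2 = a - b;   /* live: t1, t2    -> pressure 2 (a, b die) */
    int t3 = t1 * t2; /* live: t3        -> pressure 1            */
    return t3 + 1;
}
```

Here the pressure peaks at three, so a target with fewer than three free hard registers would have to spill or split one of these live ranges; the transformations discussed below aim to lower such peaks before or during allocation.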
There are a lot of known transformations that decrease register pressure. Some of these transformations generate code which could, and should, be corrected later. Some transformations are easily and naturally integrated with other transformations, such as the ones decreasing register pressure, assigning hard registers, and fixing the pitfalls of the previous transformations (such as register coalescing in a colouring-based register allocator). Some of them are hard to integrate in one pass.

Currently GCC has two register allocators. The new one was written about two years ago and is described in detail in [Matz03]. It is based on the Chaitin, Briggs, and Appel approaches to register allocation [Chaitin81, Briggs94, Appel96].

The old register allocator (I will call it the original register allocator) has existed since the very first version of GCC. It was written by Richard Stallman. Some of its important components have stayed practically unchanged since the first version. Richard Stallman took the register allocator design from a portable Pastel (an extension of the programming language Pascal) compiler written at Livermore Laboratories [Stallman04]. The design of the Pastel register allocator (which actually was a second version for the Pastel compiler) is very similar to the GCC one [Killian04]: they both have the same separation into a pass assigning hard registers to pseudo-registers and a pass which actually changes the code according to the assignment and, if that is not possible, generates additional instructions to reload the registers.

Despite its lack of many modern optimizations present in the new register allocator, the original register allocator can easily compete with the new one in terms of compiler speed, quality of the generated code, and code size. This was a major reason for me to start work on improving the original register allocator. After thorough investigation, I found that its method of assigning hard registers is very similar to the priority-based colouring register allocator [Chow84, Chow90], although it is more similar to the modifications described in [Sorkin96]. This was confirmed later.

Chow's approach is a real competitor to the Chaitin/Briggs approach. Some advantages of Chow's approach are acknowledged even by Preston Briggs [Briggs89]. Chow's algorithm is used in the SGI Pro64 [Pro64] compiler and derived compilers like Open64 [Open64] and ORC [ORC]. For example, like Briggs' optimistic colouring, Chow's algorithm easily finds hard registers for the diamond conflict graph (see Figure 1).

[Figure 1: Diamond graph. Four pseudo-registers a, b, c, d; a and d each conflict with b and c.]
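The diamond graph is the classic hard case for Chaitin-style simplification: with two hard registers available, every node has degree two, so no node can be removed and the original heuristic would conservatively decide to spill, yet the graph is trivially two-colourable, which is what Briggs' optimistic colouring and Chow's priority-based algorithm both discover. A small self-contained sketch (illustrative only, not GCC code; the adjacency matrix and the greedy loop are invented for this example) makes the point:

```c
#include <stdio.h>

/* Diamond conflict graph from Figure 1: a conflicts with b and c,
   d conflicts with b and c, but b and c do not conflict.  */
enum { A, B, C, D, N };
static const int conflict[N][N] = {
    /*        a  b  c  d */
    /* a */ { 0, 1, 1, 0 },
    /* b */ { 1, 0, 0, 1 },
    /* c */ { 1, 0, 0, 1 },
    /* d */ { 0, 1, 1, 0 },
};

int main(void)
{
    const char *name[N] = { "a", "b", "c", "d" };
    int color[N];

    /* Greedy assignment: give each node the lowest hard register number
       not already used by a conflicting, already-coloured node.  */
    for (int i = 0; i < N; i++) {
        int used[N] = { 0 };
        for (int j = 0; j < i; j++)
            if (conflict[i][j])
                used[color[j]] = 1;
        int c = 0;
        while (used[c])
            c++;
        color[i] = c;
    }

    /* Prints a -> r0, b -> r1, c -> r1, d -> r0: two hard registers
       suffice even though every node has two conflicts.  */
    for (int i = 0; i < N; i++)
        printf("%s -> hard register r%d\n", name[i], color[i]);
    return 0;
}
```

Since b and c do not conflict, they can share one hard register, leaving the other for a and d.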
All of the above was a major motivation to start work on improving the original register allocator. This article is focused on improving the original GCC register allocator. The first section describes the original GCC register allocator. The second section describes a method for decreasing register pressure in the original register allocator based on register live range splitting. The third section describes decreasing register pressure based on the live range shrinking approach. The fourth section describes other possible improvements to the original register allocator. The fifth section gives conclusions from my work.

1 The original register allocator in GCC

The original register allocator contains a lot of passes. Figure 2 describes the major passes and their order.

[Figure 2: The original register allocator. The major passes, in order: regmove (regmove.c), insn scheduler, regclass (regclass.c), local allocator (local-alloc.c), global allocator (global.c) with retry_global, reload (reload1.c, reload.c), and post-reload (postreload.c).]

The regmove pass is usually not considered to be a part of the original register allocator. I included it because the pass solves one task (register coalescing) peculiar to register allocators: it removes some register moves if the registers have the same value and this can be established within a basic block scope. The major task of regmove, however, is to generate move instructions to satisfy two-operand instruction constraints, where the destination and source registers should be the same. The reload pass can solve this task too, but in a less effective manner. If register coalescing and global value numbering (mentioned in Section 4) become part of GCC, we could try to remove register coalescing from this pass.

The instruction scheduler is not a part of the original register allocator. It is present just to show GCC's major passes starting with the regmove pass. The instruction scheduler could, however, help solve the task of decreasing register pressure (see the section "register pressure sensitive instruction scheduling").

Regclass. GCC has a very powerful model for describing the target processor's register file. In this model there is the notion of a register class. A register class is a set of hard registers. You can describe as many register classes as needed; of course, they should reflect the target processor's register file. For example, some instructions can accept only a subset of all registers, in which case you should define a register class for that subset. Any relations are possible between different register classes: they can intersect, or one register class can be a subset of another (there are reserved register classes like NO_REGS, which contains no registers, and ALL_REGS, which contains all registers).

The pass regclass (file regclass.c) mainly finds the preferred and alternative register classes for each pseudo-register. The preferred class is the smallest class containing the union of all register classes which result in the minimal cost of their usage for the given pseudo-register. The alternative class is the smallest class containing the union of all register classes whose usage is still more profitable than memory (the class NO_REGS is used for the alternative if there are no such registers besides the ones in the preferred class).

It is interesting to note that the pass also implicitly does code selection. Regclass works in two passes. On the first pass, it defines the preferred and alternative classes without taking the possible classes of other operands into account. For example, an instruction with two operand pseudo-registers may exist in two variants, one accepting classes A and B, and the other accepting C and D.

[...]

The global allocator also tries to do this in a less general way. The local allocator also performs a simple copy and constant propagation. It is implemented in the update_reg_equiv function. Actually, all hard registers could be assigned in the global allocator. Such division between the local and global allocator has historical roots. In my opinion it is reasonable to remove the local allocator in the future because faster allocation of local pseudo-registers does not
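As a rough sketch of the cost-based selection performed by regclass as described above (this is not the regclass.c implementation: the class names, the cost array, and the selection rule are hypothetical simplifications), the classes for one pseudo-register might be chosen like this:

```c
#include <limits.h>

/* Invented stand-ins for a target's register classes; a real GCC target
   defines its own classes, plus the reserved NO_REGS and ALL_REGS.  */
enum pseudo_class { NO_CLASS, CLASS_A, CLASS_B, GENERAL_CLASS, N_CLASSES };

struct class_choice {
    enum pseudo_class preferred;   /* cheapest class for the pseudo-register   */
    enum pseudo_class alternative; /* still cheaper than memory, else NO_CLASS */
};

/* cost[c] is assumed to have been accumulated over all references of one
   pseudo-register: the cost of keeping its value in a register of class c.
   mem_cost is the cost of keeping the value in memory.  */
struct class_choice choose_classes(const int cost[N_CLASSES], int mem_cost)
{
    struct class_choice choice = { NO_CLASS, NO_CLASS };
    int best = INT_MAX;

    /* Preferred class: the class with the minimal accumulated cost.
       (Simplification: the real pass takes the smallest class containing
       the union of all minimal-cost classes.)  */
    for (int c = CLASS_A; c < N_CLASSES; c++)
        if (cost[c] < best) {
            best = cost[c];
            choice.preferred = (enum pseudo_class) c;
        }

    /* Alternative class: some other class whose cost still beats memory;
       fall back to NO_CLASS when no such class exists.  */
    for (int c = CLASS_A; c < N_CLASSES; c++)
        if (c != (int) choice.preferred && cost[c] < mem_cost) {
            choice.alternative = (enum pseudo_class) c;
            break;
        }

    return choice;
}
```

In this sketch the alternative falls back to NO_CLASS when no class other than the preferred one beats memory, mirroring the NO_REGS fallback described above.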