Compiler Transformations for High-Performance Computing

Total Page:16

File Type:pdf, Size:1020Kb

Compiler Transformations for High-Performance Computing Compiler Transformations for High-Performance Computing DAVID F. BACON, SUSAN L. GRAHAM, AND OLIVER J. SHARP Computer Sc~ence Diviszon, Unwersbty of California, Berkeley, California 94720 In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimization for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimization for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis. This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each traneformation, explain how to determine if it is legal, and give an example of its application. Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimization that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as a bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques. Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming; D.3.4 [Programming Languages]: Processors—compilers; optimization; 1.2.2 [Artificial Intelligence]: Automatic Programming-program transformation General Terms: Languages, Performance Additional Key Words and Phrases: Compilation, dependence analysis, locality, multiprocessors, optimization, parallelism, superscalar processors, vectorization INTRODUCTION As optimizing compilers become more Optimizing compilers have become an es- effective, programmers can become less sential component of modern high-perfor- concerned about the details of the under- mance computer systems. In addition to lying machine architecture and can em- translating the input program into ma- ploy higher-level, more succinct, and chine language, they analyze it and ap- more intuitive programming constructs ply various transformations to reduce its and program organizations. Simultane- running time or its size. ously, hardware designers are able to This research has been sponsored in part by the Defense Advanced Research Projects Agency (DARPA) under contract DABT63-92-C-O026, by NSF grant CDA-8722788, and by an IBM Resident Study Program Fellowship to David Bacon, Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and]or specific permission. @ 01994 0360-0300/94/1200-0345 $03.50 ACM Computing Surveys, Vol. 26, No. 4, December 1994 346 * David F. Bacon et al. CONTENTS imperative languages such as Fortran and C for high-performance architec- tures, including superscalar, vector, and various classes of multiprocessor INTRODUCTION machines. 1. SOURCE LANGUAGE Most of the transformations we de- 2 TRANSFORMATION ISSUES scribe can be applied to anyofthe Algol- 2 1 Correctness 22 Scope family languages; many are applicable to 3 TARGET ARCHITECTURES functional, logic, distributed, and object- 31 Characterizing Performance oriented languages as well. These other 32 Model Architectures languages raise additional optimization 4 COMPILER ORGANIZATION issues that space does not permit us 5 DEPENDENCE ANALYSIS 51 Types of Dependences to cover in this survey. The references 52 Representing Dependences include some starting points for investi- 5,3 Loop Dependence Analysis gation of optimizations for LISP and 5.4 Subscript Analysis functional languages [Appel 1992; Clark 6. TRANSFORMATIONS and Peyton-Jones 1985; Kranz et al. 61 Data-Flow-Based Loop Transformations 6.2 Loop Reordering 1986], object-oriented languages [Cham- 6.3 Loop Restructuring bers and Ungar 1989], and the set-based 6.4 Loop Replacement Transformations language SETL [Freudenberger et al. 6.5 Memory Access Transformations 1983]. 6.6 Partial Evaluation 6.7 Redundancy Ehmmatlon We have also restricted the discussion 68 Procedure Call Transformations to higher-level transformations that re- 7. TRANSFORMATIONS FOR PARALLEL quire some program analysis. Thus we MACHINES exclude peephole optimizations and most 71 Data Layout machine-level optimizations. We use the 72 Exposing Coarse-Grained Parallelism 73 Computation PartltIOnmg term optimization as shorthand for opti- 74 Communication Optimization mizing transformation. 75 SIMD Transformations Finally, because of the richness of the 76 VLIWTransformatlons topic, we have not given a detailed de- 8 TRANSFORMATION FRAMEWORKS scription of intermediate program repre- 8.1 Unified Transformation sentations and analysis techniques. 8.2 Searching the Transformation Space 9. COMPILER EVALUATION We make use of a number of different 9.1 Benchmarks machine models, all based on a hypothet- 9.2 Code Characterlstlcs ical superscalar processor called S-DLX. 9.3 Compiler Effectiveness The Appendix details the machine mod- 94 Instruction-Level Parallehsm CONCLUSION els, presenting a simplified architecture APPENDIX: MACHINE MODELS and instruction set that we use when we A.1 Superscalar DLX need to discuss machine code. While all A 2 Vector DLX assembly language examples are com- A3 Multiprocessors ACKNOWLEDGMENTS mented, the reader will need to refer to REFERENCES the Armendix to understand some details of the’ ~xamples (such as cycle counts). We assume a basic familiarity with ~ro~am compilation issues. Readers un- employ designs that yield greatly im- ~am~liar with program flow analysis or proved performance because they need other basic compilation techniques may only concern themselves with the suit- consult Aho et al. [1986]. ability of the designas a compiler target, not with its suitability as a direct pro- 1. SOURCE LANGUAGE grammer interface. In this survey we describe transforma- All of the high-level examples in this tions that optimize programs written in survey are written in a language similar ACM Computing Surveys, Vol. 26, No. 4, December 1994 Compiler Transformations “ 347 to Fortran 90, with minor variations and Programmers unfamiliar with Fortran extensions; most examples use only fea- should also note that all arguments are tures found in Fortran 77. We have cho- passed by reference. sen to use Fortran because it is the de When describing compilation for vector facto standard of the high-performance machines, we sometimes use array nota- engineering and scientific computing tion when the mapping to hardware reg- community. Fortran has also been the isters is clear. For instance, the loop input language for a number of research doalli=l,64 projects studying parallelization [Allen et a[i] = a[i] + c al. 1988a; Balasundaram et al. 1989; end do all Polychronopoulos et al. 1989]. It was cho- sen by these projects not only because of could be implemented with a scalar-vec- its ubiquity among the user community, tor add instruction, assuming a machine but also because its lack of pointers and with vector registers of length 64. This its static memory model make it more would be written in Fortran 90 array amenable to analysis. It is not yet clear notation as what effect the recent introduction of a[l :64] = a[l :64] + c pointers into Fortran 90 will have on optimization. or in vector machine assembly language The optimizations we have presented as are not specific to Fortran—in fact, many commercial compilers use the same in- LF F2, c(R30) ;Ioad c into register F2 termediate language and optimizer for ADDI R8, R30, #a ;Ioad addr. of a into R8 both C and Fortran. The presence of un- LV VI, R8 ;Ioad vector a[l :64] to restricted pointers in C can reduce oppor- V1 tunities for optimization because it is ADDSV VI, F2, VI ;add scalar to vector Sv VI. R8 ;store vector in a[l :64] impossible to determine which variables may be referenced by a pointer. The pro- cess of determining which references may Array notation will not be used when point to the same storage locations is loop bounds are unknown, because there called alias analysis [Banning 1979; Choi is no longer an obvious correspondence et al. 1993; Cooper and Kennedy 1989; between the source code and the fixed- Landi et al. 1993]. length vector operations that perform the The only changes to Fortran in our computation. To use vector operations, examples are that array subscripting is the compiler must perform the transfor- denoted by square brackets to distin- mation called strip-mining, which is dis- guish it from function calls; and we use cussed in Section 6.2.4. do all loops to indicate textually that all the iterations of a loop may be executed 2. TRANSFORMATION ISSUES concurrently. To make the structure of For a compiler to apply an optimization the computation more explicit, we will to a program, it must do three things: generally express loops
Recommended publications
  • CSE 582 – Compilers
    CSE P 501 – Compilers Optimizing Transformations Hal Perkins Autumn 2011 11/8/2011 © 2002-11 Hal Perkins & UW CSE S-1 Agenda A sampler of typical optimizing transformations Mostly a teaser for later, particularly once we’ve looked at analyzing loops 11/8/2011 © 2002-11 Hal Perkins & UW CSE S-2 Role of Transformations Data-flow analysis discovers opportunities for code improvement Compiler must rewrite the code (IR) to realize these improvements A transformation may reveal additional opportunities for further analysis & transformation May also block opportunities by obscuring information 11/8/2011 © 2002-11 Hal Perkins & UW CSE S-3 Organizing Transformations in a Compiler Typically middle end consists of many individual transformations that filter the IR and produce rewritten IR No formal theory for order to apply them Some rules of thumb and best practices Some transformations can be profitably applied repeatedly, particularly if others transformations expose more opportunities 11/8/2011 © 2002-11 Hal Perkins & UW CSE S-4 A Taxonomy Machine Independent Transformations Realized profitability may actually depend on machine architecture, but are typically implemented without considering this Machine Dependent Transformations Most of the machine dependent code is in instruction selection & scheduling and register allocation Some machine dependent code belongs in the optimizer 11/8/2011 © 2002-11 Hal Perkins & UW CSE S-5 Machine Independent Transformations Dead code elimination Code motion Specialization Strength reduction
    [Show full text]
  • ICS803 Elective – III Multicore Architecture Teacher Name: Ms
    ICS803 Elective – III Multicore Architecture Teacher Name: Ms. Raksha Pandey Course Structure L T P 3 1 0 4 Prerequisite: Course Content: Unit-I: Multi-core Architectures Introduction to multi-core architectures, issues involved into writing code for multi-core architectures, Virtual Memory, VM addressing, VA to PA translation, Page fault, TLB- Parallel computers, Instruction level parallelism (ILP) vs. thread level parallelism (TLP), Performance issues, OpenMP and other message passing libraries, threads, mutex etc. Unit-II: Multi-threaded Architectures Brief introduction to cache hierarchy - Caches: Addressing a Cache, Cache Hierarchy, States of Cache line, Inclusion policy, TLB access, Memory Op latency, MLP, Memory Wall, communication latency, Shared memory multiprocessors, General architectures and the problem of cache coherence, Synchronization primitives: Atomic primitives; locks: TTS, ticket, array; barriers: central and tree; performance implications in shared memory programs; Chip multiprocessors: Why CMP (Moore's law, wire delay); shared L2 vs. tiled CMP; core complexity; power/performance; Snoopy coherence: invalidate vs. update, MSI, MESI, MOESI, MOSI; performance trade-offs; pipelined snoopy bus design; Memory consistency models: SC, PC, TSO, PSO, WO/WC, RC; Chip multiprocessor case studies: Intel Montecito and dual-core, Pentium4, IBM Power4, Sun Niagara Unit-III: Compiler Optimization Issues Code optimizations: Copy Propagation, dead Code elimination , Loop Optimizations-Loop Unrolling, Induction variable Simplification, Loop Jamming, Loop Unswitching, Techniques to improve detection of parallelism: Scalar Processors, Special locality, Temporal locality, Vector machine, Strip mining, Shared memory model, SIMD architecture, Dopar loop, Dosingle loop. Unit-IV: Control Flow analysis Control flow analysis, Flow graph, Loops in Flow graphs, Loop Detection, Approaches to Control Flow Analysis, Reducible Flow Graphs, Node Splitting.
    [Show full text]
  • Fast-Path Loop Unrolling of Non-Counted Loops to Enable Subsequent Compiler Optimizations∗
    Fast-Path Loop Unrolling of Non-Counted Loops to Enable Subsequent Compiler Optimizations∗ David Leopoldseder Roland Schatz Lukas Stadler Johannes Kepler University Linz Oracle Labs Oracle Labs Austria Linz, Austria Linz, Austria [email protected] [email protected] [email protected] Manuel Rigger Thomas Würthinger Hanspeter Mössenböck Johannes Kepler University Linz Oracle Labs Johannes Kepler University Linz Austria Zurich, Switzerland Austria [email protected] [email protected] [email protected] ABSTRACT Non-Counted Loops to Enable Subsequent Compiler Optimizations. In 15th Java programs can contain non-counted loops, that is, loops for International Conference on Managed Languages & Runtimes (ManLang’18), September 12–14, 2018, Linz, Austria. ACM, New York, NY, USA, 13 pages. which the iteration count can neither be determined at compile https://doi.org/10.1145/3237009.3237013 time nor at run time. State-of-the-art compilers do not aggressively optimize them, since unrolling non-counted loops often involves 1 INTRODUCTION duplicating also a loop’s exit condition, which thus only improves run-time performance if subsequent compiler optimizations can Generating fast machine code for loops depends on a selective appli- optimize the unrolled code. cation of different loop optimizations on the main optimizable parts This paper presents an unrolling approach for non-counted loops of a loop: the loop’s exit condition(s), the back edges and the loop that uses simulation at run time to determine whether unrolling body. All these parts must be optimized to generate optimal code such loops enables subsequent compiler optimizations. Simulat- for a loop.
    [Show full text]
  • Compiler Optimization for Configurable Accelerators Betul Buyukkurt Zhi Guo Walid A
    Compiler Optimization for Configurable Accelerators Betul Buyukkurt Zhi Guo Walid A. Najjar University of California Riverside University of California Riverside University of California Riverside Computer Science Department Electrical Engineering Department Computer Science Department Riverside, CA 92521 Riverside, CA 92521 Riverside, CA 92521 [email protected] [email protected] [email protected] ABSTRACT such as constant propagation, constant folding, dead code ROCCC (Riverside Optimizing Configurable Computing eliminations that enables other optimizations. Then, loop Compiler) is an optimizing C to HDL compiler targeting level optimizations such as loop unrolling, loop invariant FPGA and CSOC (Configurable System On a Chip) code motion, loop fusion and other such transforms are architectures. ROCCC system is built on the SUIF- applied. MACHSUIF compiler infrastructure. Our system first This paper is organized as follows. Next section discusses identifies frequently executed kernel loops inside programs on the related work. Section 3 gives an in depth description and then compiles them to VHDL after optimizing the of the ROCCC system. SUIF2 level optimizations are kernels to make best use of FPGA resources. This paper described in Section 4 followed by the compilation done at presents an overview of the ROCCC project as well as the MACHSUIF level in Section 5. Section 6 mentions our optimizations performed inside the ROCCC compiler. early results. Finally Section 7 concludes the paper. 1. INTRODUCTION 2. RELATED WORK FPGAs (Field Programmable Gate Array) can boost There are several projects and publications on translating C software performance due to the large amount of or other high level languages to different HDLs. Streams- parallelism they offer.
    [Show full text]
  • Functional Array Programming in Sac
    Functional Array Programming in SaC Sven-Bodo Scholz Dept of Computer Science, University of Hertfordshire, United Kingdom [email protected] Abstract. These notes present an introduction into array-based pro- gramming from a functional, i.e., side-effect-free perspective. The first part focuses on promoting arrays as predominant, stateless data structure. This leads to a programming style that favors compo- sitions of generic array operations that manipulate entire arrays over specifications that are made in an element-wise fashion. An algebraicly consistent set of such operations is defined and several examples are given demonstrating the expressiveness of the proposed set of operations. The second part shows how such a set of array operations can be defined within the first-order functional array language SaC.Itdoes not only discuss the language design issues involved but it also tackles implementation issues that are crucial for achieving acceptable runtimes from such genericly specified array operations. 1 Introduction Traditionally, binary lists and algebraic data types are the predominant data structures in functional programming languages. They fit nicely into the frame- work of recursive program specifications, lazy evaluation and demand driven garbage collection. Support for arrays in most languages is confined to a very resricted set of basic functionality similar to that found in imperative languages. Even if some languages do support more advanced specificational means such as array comprehensions, these usualy do not provide the same genericity as can be found in array programming languages such as Apl,J,orNial. Besides these specificational issues, typically, there is also a performance issue.
    [Show full text]
  • Loop Transformations and Parallelization
    Loop transformations and parallelization Claude Tadonki LAL/CNRS/IN2P3 University of Paris-Sud [email protected] December 2010 Claude Tadonki Loop transformations and parallelization C. Tadonki – Loop transformations Introduction Most of the time, the most time consuming part of a program is on loops. Thus, loops optimization is critical in high performance computing. Depending on the target architecture, the goal of loops transformations are: improve data reuse and data locality efficient use of memory hierarchy reducing overheads associated with executing loops instructions pipeline maximize parallelism Loop transformations can be performed at different levels by the programmer, the compiler, or specialized tools. At high level, some well known transformations are commonly considered: loop interchange loop (node) splitting loop unswitching loop reversal loop fusion loop inversion loop skewing loop fission loop vectorization loop blocking loop unrolling loop parallelization Claude Tadonki Loop transformations and parallelization C. Tadonki – Loop transformations Dependence analysis Extract and analyze the dependencies of a computation from its polyhedral model is a fundamental step toward loop optimization or scheduling. Definition For a given variable V and given indexes I1, I2, if the computation of X(I1) requires the value of X(I2), then I1 ¡ I2 is called a dependence vector for variable V . Drawing all the dependence vectors within the computation polytope yields the so-called dependencies diagram. Example The dependence vectors are (1; 0); (0; 1); (¡1; 1). Claude Tadonki Loop transformations and parallelization C. Tadonki – Loop transformations Scheduling Definition The computation on the entire domain of a given loop can be performed following any valid schedule.A timing function tV for variable V yields a valid schedule if and only if t(x) > t(x ¡ d); 8d 2 DV ; (1) where DV is the set of all dependence vectors for variable V .
    [Show full text]
  • Compiler-Based Code-Improvement Techniques
    Compiler-Based Code-Improvement Techniques KEITH D. COOPER, KATHRYN S. MCKINLEY, and LINDA TORCZON Since the earliest days of compilation, code quality has been recognized as an important problem [18]. A rich literature has developed around the issue of improving code quality. This paper surveys one part of that literature: code transformations intended to improve the running time of programs on uniprocessor machines. This paper emphasizes transformations intended to improve code quality rather than analysis methods. We describe analytical techniques and specific data-flow problems to the extent that they are necessary to understand the transformations. Other papers provide excellent summaries of the various sub-fields of program analysis. The paper is structured around a simple taxonomy that classifies transformations based on how they change the code. The taxonomy is populated with example transformations drawn from the literature. Each transformation is described at a depth that facilitates broad understanding; detailed references are provided for deeper study of individual transformations. The taxonomy provides the reader with a framework for thinking about code-improving transformations. It also serves as an organizing principle for the paper. Copyright 1998, all rights reserved. You may copy this article for your personal use in Comp 512. Further reproduction or distribution requires written permission from the authors. 1INTRODUCTION This paper presents an overview of compiler-based methods for improving the run-time behavior of programs — often mislabeled code optimization. These techniques have a long history in the literature. For example, Backus makes it quite clear that code quality was a major concern to the implementors of the first Fortran compilers [18].
    [Show full text]
  • Foundations of Scientific Research
    2012 FOUNDATIONS OF SCIENTIFIC RESEARCH N. M. Glazunov National Aviation University 25.11.2012 CONTENTS Preface………………………………………………….…………………….….…3 Introduction……………………………………………….…..........................……4 1. General notions about scientific research (SR)……………….……….....……..6 1.1. Scientific method……………………………….………..……..……9 1.2. Basic research…………………………………………...……….…10 1.3. Information supply of scientific research……………..….………..12 2. Ontologies and upper ontologies……………………………….…..…….…….16 2.1. Concepts of Foundations of Research Activities 2.2. Ontology components 2.3. Ontology for the visualization of a lecture 3. Ontologies of object domains………………………………..………………..19 3.1. Elements of the ontology of spaces and symmetries 3.1.1. Concepts of electrodynamics and classical gauge theory 4. Examples of Research Activity………………….……………………………….21 4.1. Scientific activity in arithmetics, informatics and discrete mathematics 4.2. Algebra of logic and functions of the algebra of logic 4.3. Function of the algebra of logic 5. Some Notions of the Theory of Finite and Discrete Sets…………………………25 6. Algebraic Operations and Algebraic Structures……………………….………….26 7. Elements of the Theory of Graphs and Nets…………………………… 42 8. Scientific activity on the example “Information and its investigation”……….55 9. Scientific research in Artificial Intelligence……………………………………..59 10. Compilers and compilation…………………….......................................……69 11. Objective, Concepts and History of Computer security…….………………..93 12. Methodological and categorical apparatus of scientific research……………114 13. Methodology and methods of scientific research…………………………….116 13.1. Methods of theoretical level of research 13.1.1. Induction 13.1.2. Deduction 13.2. Methods of empirical level of research 14. Scientific idea and significance of scientific research………………………..119 15. Forms of scientific knowledge organization and principles of SR………….121 1 15.1. Forms of scientific knowledge 15.2.
    [Show full text]
  • Survey of Compiler Technology at the IBM Toronto Laboratory
    An (incomplete) Survey of Compiler Technology at the IBM Toronto Laboratory Bob Blainey March 26, 2002 Target systems Sovereign (Sun JDK-based) Just-in-Time (JIT) Compiler zSeries (S/390) OS/390, Linux Resettable, shareable pSeries (PowerPC) AIX 32-bit and 64-bit Linux xSeries (x86 or IA-32) Windows, OS/2, Linux, 4690 (POS) IA-64 (Itanium, McKinley) Windows, Linux C and C++ Compilers zSeries OS/390 pSeries AIX Fortran Compiler pSeries AIX Key Optimizing Compiler Components TOBEY (Toronto Optimizing Back End with Yorktown) Highly optimizing code generator for S/390 and PowerPC targets TPO (Toronto Portable Optimizer) Mostly machine-independent optimizer for Wcode intermediate language Interprocedural analysis, loop transformations, parallelization Sun JDK-based JIT (Sovereign) Best of breed JIT compiler for client and server applications Based very loosely on Sun JDK Inside a Batch Compilation C source C++ source Fortran source Other source C++ Front Fortran C Front End Other Front End Front End Ends Wcode Wcode Wcode++ Wcode Wcode TPO Wcode Scalarizer Wcode TOBEY Wcode Back End Object Code TOBEY Optimizing Back End Project started in 1983 targetting S/370 Later retargetted to ROMP (PC-RT), Power, Power2, PowerPC, SPARC, and ESAME/390 (64 bit) Experimental retargets to i386 and PA-RISC Shipped in over 40 compiler products on 3 different platforms with 8 different source languages Primary vehicle for compiler optimization since the creation of the RS/6000 (pSeries) Implemented in a combination of PL.8 ("80% of PL/I") and C++ on an AIX reference
    [Show full text]
  • Compiler Transformations for High-Performance Computing
    School of Electrical and Computer Engineering N.T.U.A. Embedded System Design High-Level Dimitrios Soudris Transformations for Embedded Computing Άδεια Χρήσης Το παρόν εκπαιδευτικό υλικό υπόκειται σε άδειες χρήσης Creative Commons. Για εκπαιδευτικό υλικό, όπως εικόνες, που υπόκειται σε άδεια χρήσης άλλου τύπου, αυτή πρέπει να αναφέρεται ρητώς. Organization of a hypothetical optimizing compiler 2 DEPENDENCE ANALYSIS Dependence analysis identifies these constraints, which are then used to determine whether a particular transformation can be applied without changing the semantics of the computation. A dependence is a relationship between two computations that places constraints on their execution order. Types dependences: (i) control dependence and (ii) data dependence. Control Dependence Two statements have a Data Dependence if they cannot be executed simultaneously due to conflicting uses of the same variable. 3 Types of Data Dependences (1) Flow dependence (also called true dependence) S1: a = c*10 S2: d = 2*a + c Anti-dependence S1: e = f*4 + g S2: g = 2*h 4 Types of Data Dependences (2) Output dependence both statements write the same variable S1: a = b*c S2: a = d+e Input Dependence when two accesses to the same location memory are both reads Dependence Graph nodes represents statements and arcs dependencies between computations 5 Loop Dependence Analysis To compute dependence information for loops, the key problem is understanding the use of arrays; scalar variables are relatively easy to manage. To track array behavior, the compiler must analyze the subscript expressions in each array reference. To discover whether there is a dependence in the loop nest, it is sufficient to determine whether any of the iterations can write a value that is read or written by any of the other iterations.
    [Show full text]
  • Generalized Index-Set Splitting
    Generalized Index-Set Splitting Christopher Barton1, Arie Tal2, Bob Blainey2, and Jos´e Nelson Amaral1 1 Department of Computing Science, University of Alberta, Edmonton, Canada {cbarton, amaral}@cs.ualberta.ca 2 IBM Toronto Software Laboratory, Toronto, Canada {arietal, blainey}@ca.ibm.com Abstract. This paper introduces Index-Set Splitting (ISS), a technique that splits a loop containing several conditional statements into sev- eral loops with less complex control flow. Contrary to the classic loop unswitching technique, ISS splits loops when the conditional is loop vari- ant. ISS uses an Index Sub-range Tree (IST) to identify the structure of the conditionals in the loop and to select which conditionals should be eliminated. This decision is based on an estimation of the code growth for each splitting: a greedy algorithm spends a pre-determined code growth budget. ISTs separate the decision about which splits to perform from the actual code generation for the split loops. The use of ISS to improve a loop fusion framework is then discussed. ISS opportunity identification in the SPEC2000 benchmark suite and three other suites demonstrate that ISS is a general technique that may benefit other compilers. 1 Introduction This paper describes Index-Set Splitting (ISS), a code transformation motivated by the implementation of loop fusion in the commercially distributed IBM XL Compilers. ISS is an enabling technique that increases the code scope where other optimizations, such as software pipelining, loop unroll-and-jam, unimodu- lar transformations, loop-based common expression elimination, can be applied. A loop that does not contain branch statements is a Single Basic Block Loop (SBBL).
    [Show full text]
  • Compiler Construction
    Compiler construction PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 10 Dec 2011 02:23:02 UTC Contents Articles Introduction 1 Compiler construction 1 Compiler 2 Interpreter 10 History of compiler writing 14 Lexical analysis 22 Lexical analysis 22 Regular expression 26 Regular expression examples 37 Finite-state machine 41 Preprocessor 51 Syntactic analysis 54 Parsing 54 Lookahead 58 Symbol table 61 Abstract syntax 63 Abstract syntax tree 64 Context-free grammar 65 Terminal and nonterminal symbols 77 Left recursion 79 Backus–Naur Form 83 Extended Backus–Naur Form 86 TBNF 91 Top-down parsing 91 Recursive descent parser 93 Tail recursive parser 98 Parsing expression grammar 100 LL parser 106 LR parser 114 Parsing table 123 Simple LR parser 125 Canonical LR parser 127 GLR parser 129 LALR parser 130 Recursive ascent parser 133 Parser combinator 140 Bottom-up parsing 143 Chomsky normal form 148 CYK algorithm 150 Simple precedence grammar 153 Simple precedence parser 154 Operator-precedence grammar 156 Operator-precedence parser 159 Shunting-yard algorithm 163 Chart parser 173 Earley parser 174 The lexer hack 178 Scannerless parsing 180 Semantic analysis 182 Attribute grammar 182 L-attributed grammar 184 LR-attributed grammar 185 S-attributed grammar 185 ECLR-attributed grammar 186 Intermediate language 186 Control flow graph 188 Basic block 190 Call graph 192 Data-flow analysis 195 Use-define chain 201 Live variable analysis 204 Reaching definition 206 Three address
    [Show full text]