CS6013 - Modern Compilers: Theory and Practise
Overview of Different Optimizations


Optimizing compilers
V. Krishna Nandivada, IIT Madras

Copyright (c) 2020 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected].

Compiler structure

The pipeline: the parser turns the token stream into a syntax tree; semantic analysis (e.g., type checking) operates on the syntax tree; the intermediate code generator lowers it to low-level IR (e.g., canonical trees/tuples); the optimizer rewrites low-level IR into better low-level IR; the machine code generator emits machine code.

Potential optimizations at each level:
- Source language (AST): constant bounds in loops/arrays, loop unrolling, suppressing run-time checks, enabling later optimisations.
- Intermediate IR: local and global CSE elimination, live variable analysis, code hoisting, enabling later optimisations.
- Code generation (machine code): register allocation, instruction scheduling, peephole optimization.

Optimization

Goal: produce fast code.
- What is optimality?
- Problems are often hard: many are intractable or even undecidable, many are NP-complete.
- Which optimizations should be used? Many optimizations overlap or interact.

Definition: an optimization is a transformation that is expected to improve the running time of a program or decrease its space requirements.

The point:
- "improved" code, not "optimal" code (forget "optimum")
- sometimes produces worse code
- range of speedup might be from 1.000001 to xxx

Machine-independent transformations

Applicable across a broad range of machines:
- remove redundant computations
- move evaluation to a less frequently executed place
- specialize some general-purpose code
- find useless code and remove it
- expose opportunities for other optimizations

Machine-dependent transformations

Capitalize on machine-specific properties and improve the mapping from IR onto the machine:
- replace multiply with shifts and adds
- hide latency
- replace a sequence of instructions with a more powerful one (use "exotic" instructions)

A classical distinction

The distinction between the two classes is not always clear: replacing a costly operation with a cheaper one may be either machine-independent or machine-dependent.
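As a quick illustration of the "replace multiply with shifts and adds" item above, here is a minimal C sketch; the function name and the constant 10 are our own choices, not from the slides, and a backend would emit such a sequence only when the target's shift/add latency beats its integer multiply.

    #include <stdint.h>

    /* x * 10 == x * 8 + x * 2 == (x << 3) + (x << 1).
       Two shifts and one add replace the multiply; whether this is
       profitable depends entirely on the target machine. */
    static inline uint32_t times10(uint32_t x) {
        return (x << 3) + (x << 1);
    }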
Optimization: desirable properties of an optimizing compiler

- code at least as good as an assembler programmer's
- stable, robust performance (predictability)
- architectural strengths fully exploited
- architectural weaknesses fully hidden
- broad, efficient support for language features
- instantaneous compiles
Unfortunately, modern compilers often drop the ball.

Optimization: good compilers are crafted, not assembled

- consistent philosophy
- careful selection of transformations
- thorough application
- coordinate transformations and data structures
- attention to results (code, time, space)
Compilers are engineered objects: minimize the running time of the compiled code, minimize compile time, and use reasonable compile-time space (a serious problem). Thus, results are sometimes unexpected.

Scope of optimization

- Local (single block): confined to straight-line code; simplest to analyse. Time frame: '60s to present, particularly now.
- Intraprocedural (global): consider the whole procedure. What do we need to optimize an entire procedure? Classical data-flow analysis, dependence analysis. Time frame: '70s to present.
- Interprocedural (whole program): analyse whole programs. What do we need to optimize an entire program? Less information is discernible. Time frame: late '70s to present, particularly now.

Optimization

Three considerations arise in applying a transformation: safety, profitability, and opportunity. We need a clear understanding of these issues; the literature often hides them; every discussion should list them clearly.

Safety

Fundamental question: does the transformation change the results of executing the code?
- yes ⇒ don't do it!
- no ⇒ it is safe

Compile-time analysis:
- may be safe in all cases (loop unrolling)
- analysis may be simple (DAGs and CSEs)
- may require complex reasoning (data-flow analysis)

Profitability

Fundamental question: is there a reasonable expectation that the transformation will be an improvement?
- yes ⇒ do it!
- no ⇒ don't do it

Compile-time estimation:
- always profitable
- heuristic rules
- compute benefit (rare)
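To make the safety and profitability questions concrete, below is a small before/after sketch of local common-subexpression elimination, the kind of transformation whose safety analysis is simple (a DAG over one basic block). The function and variable names are illustrative only, not taken from the slides.

    /* Before: i*stride + j is computed twice in one basic block. */
    int sum_before(const int *a, const int *b, int i, int j, int stride) {
        int x = a[i * stride + j];
        int y = b[i * stride + j];
        return x + y;
    }

    /* After local CSE: the common subexpression is evaluated once.
       Safety is easy to establish because neither i, j, nor stride is
       redefined between the two uses; profitability is one saved
       multiply and one saved add per call. */
    int sum_after(const int *a, const int *b, int i, int j, int stride) {
        int t = i * stride + j;
        return a[t] + b[t];
    }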
Opportunity

Fundamental question: can we efficiently locate sites for applying the transformation?
- yes ⇒ compilation time won't suffer
- no ⇒ better be highly profitable

Issues:
- provides a framework for applying transformations systematically
- find all sites
- update safety information to reflect previous changes
- order of application (hard)

Optimization

Successful optimization requires:
- a test for safety
- profit is local improvement × executions ⇒ focus on loops: loop unrolling, factoring loop invariants, strength reduction
- want to minimize side-effects like code growth

Example: loop unrolling

Idea: reduce loop overhead by creating multiple successive copies of the loop's body and increasing the increment appropriately.
- Safety: always safe.
- Profitability: reduces overhead (subtle secondary effects, e.g., instruction cache blowout).
- Opportunity: loops.
Unrolling is easy to understand and perform.

Example: loop unrolling, matrix-matrix multiply

    do i = 1, n, 1
      do j = 1, n, 1
        c(i,j) = 0
        do k = 1, n, 1
          c(i,j) = c(i,j) + a(i,k) * b(k,j)

2n^3 flops, n^3 loop increments and branches; each iteration does 2 loads and 2 flops. This is the most overstudied example in the literature.

Example: loop unrolling, matrix-matrix multiply (assume a 4-word cache line)

    do i = 1, n, 1
      do j = 1, n, 1
        c(i,j) = 0
        do k = 1, n, 4
          c(i,j) = c(i,j) + a(i,k)   * b(k,j)
          c(i,j) = c(i,j) + a(i,k+1) * b(k+1,j)
          c(i,j) = c(i,j) + a(i,k+2) * b(k+2,j)
          c(i,j) = c(i,j) + a(i,k+3) * b(k+3,j)

2n^3 flops, n^3/4 loop increments and branches; each iteration does 8 loads and 8 flops. Memory traffic is better: c(i,j) is reused (put it in a register), the a(i,k) references are from cache, but b(k,j) is problematic.

Example: loop unrolling, matrix-matrix multiply (to improve traffic on b)

    do j = 1, n, 1
      do i = 1, n, 4
        c(i,j)   = 0
        c(i+1,j) = 0
        c(i+2,j) = 0
        c(i+3,j) = 0
        do k = 1, n, 4
          c(i,j)   = c(i,j)   + a(i,k) * b(k,j)     + a(i,k+1) * b(k+1,j)
                              + a(i,k+2) * b(k+2,j) + a(i,k+3) * b(k+3,j)
          c(i+1,j) = c(i+1,j) + a(i+1,k) * b(k,j)     + a(i+1,k+1) * b(k+1,j)
                              + a(i+1,k+2) * b(k+2,j) + a(i+1,k+3) * b(k+3,j)
          c(i+2,j) = c(i+2,j) + a(i+2,k) * b(k,j)     + a(i+2,k+1) * b(k+1,j)
                              + a(i+2,k+2) * b(k+2,j) + a(i+2,k+3) * b(k+3,j)
          c(i+3,j) = c(i+3,j) + a(i+3,k) * b(k,j)     + a(i+3,k+1) * b(k+1,j)
                              + a(i+3,k+2) * b(k+2,j) + a(i+3,k+3) * b(k+3,j)

Example: loop unrolling, what happened?

- interchanged the i and j loops
- unrolled the i loop
- fused the inner loops

2n^3 flops, n^3/16 loop increments and branches; the first assignment does 8 loads and 8 flops, the 2nd through 4th do 4 loads and 8 flops. Memory traffic is better: c(i,j) is reused (register), the a(i,k) references are from cache, and b(k,j) is reused (register).

Loop transformations

It is not as easy as it looks:
- Safety: loop interchange? loop unrolling? loop fusion?
- Opportunity: find memory-bound loop nests.
- Profitability: machine dependent (mostly).

Summary: chance for large improvement; answering the fundamentals is tough; the resulting code is ugly. Matrix-matrix multiply is everyone's favorite example.
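The Fortran-style versions above assume n is a multiple of the unroll factor. As a hedged C sketch of the same unroll-by-4 idea for an arbitrary trip count (our own illustration, not from the slides), a compiler also emits a cleanup loop:

    /* Sum an array with the loop unrolled by 4: the increment, compare,
       and branch are paid once per four elements instead of once per
       element.  The additions happen in the same order as in the
       original loop, so the result is unchanged and the transformation
       is safe. */
    double sum_unrolled(const double *a, int n) {
        double s = 0.0;
        int i = 0;
        for (; i + 3 < n; i += 4) {
            s += a[i];
            s += a[i + 1];
            s += a[i + 2];
            s += a[i + 3];
        }
        for (; i < n; i++)      /* cleanup loop for the remaining 0-3 elements */
            s += a[i];
        return s;
    }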
Loop optimizations: factoring loop invariants

Loop invariants: expressions that are constant within the loop body.
Relevant variables: those used to compute an expression.

Opportunity:
1. identify the variables defined in the body of the loop (LoopDef)
2. loop invariants have no relevant variables in LoopDef
3. assign each loop invariant to a temporary in the loop header
4. use the temporary in the loop body

Safety: a loop-invariant expression may throw an exception early.
Profitability: the loop may execute 0 times; the loop invariant may not be needed on every path through the loop body.

Example: factoring loop invariants

    foreach i = 1 .. 100 do
      // LoopDef = {i, j, k, A}
      foreach j = 1 .. 100 do
        // LoopDef = {j, k, A}
        foreach k = 1 .. 100 do
          // LoopDef = {k, A}
          A[i,j,k] = i * j * k;
        end
      end
    end

3 million index operations, 2 million multiplications.

Example: factoring loop invariants (cont.) / Strength reduction in loops

Factoring the inner loop:
And the second loop: foreach i=1 .
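The worked continuation is not included above, so the following C sketch is our own illustration (the array layout, bounds, and function names are assumptions): the invariant i*j is hoisted into the header of the k loop, and the remaining multiply is then strength-reduced to repeated addition.

    #define N 100

    /* Invariant factoring: i*j is now computed N*N times instead of
       N*N*N times. */
    void fill_factored(int A[N][N][N]) {
        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++) {
                int t = i * j;                  /* loop invariant of the k loop */
                for (int k = 1; k <= N; k++)
                    A[i - 1][j - 1][k - 1] = t * k;
            }
    }

    /* Strength reduction on top of factoring: t*k becomes a running sum
       that is bumped by t on each iteration, removing the inner multiply. */
    void fill_reduced(int A[N][N][N]) {
        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++) {
                int t = i * j;
                int v = 0;
                for (int k = 1; k <= N; k++) {
                    v += t;                     /* v == i*j*k here */
                    A[i - 1][j - 1][k - 1] = v;
                }
            }
    }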