Introduction 22 : Optimization


CSc 453 Compilers and Systems Software
Introduction 22: Optimization
Department of Computer Science, University of Arizona
[email protected]
Copyright © 2009 Christian Collberg

Compiler Phases

[Figure: the compiler pipeline. The lexer (lexical analysis), parser (syntactic analysis), and semantic analyser produce an AST; the intermediate code generator produces the IR, over which the intermediate code optimizer applies local, global, and inter-procedural transformations, supported by control-flow and data-flow analysis (call graph, gen/kill sets); machine code generation and peephole optimization follow. Optimization can happen over the intermediate code, over the machine code, or both: "We are here! Or how about here? Or maybe here!"]

What do we Optimize?

1. Optimize everything, all the time. The problem is that optimization interferes with debugging. In fact, many (most) compilers don't let you generate an optimized program with debugging information. The problem of debugging optimized code is an important research field. Furthermore, optimization is probably the most time-consuming pass in the compiler. Always optimizing everything (even routines which will never be called!) wastes valuable time.

2. The programmer decides what to optimize. The problem is that the programmer has a local view of the code. When timing a program, programmers are often very surprised to see where most of the time is spent.

3. Turn optimization on when the program is complete. Unfortunately, optimizers aren't perfect, and a program that performed OK with debugging turned on often behaves differently when debugging is off and optimization is on.

4. Optimize inner loops only. Unfortunately, procedure calls can hide inner loops:

        PROCEDURE P(n);
        BEGIN
            FOR k := 1 TO n DO ... END;
        END P;

        FOR i := 1 TO 10000 DO P(i) END;

5. Use profiling information to guide what to optimize. [Figure: the executable produced by the compiler front-end, optimizer, and code generator is run on "carefully chosen input"; the resulting profile data is fed back into the optimizer.]

6. Runtime code generation/optimization. We delay code generation and optimization until execution time. At that time we have more information to guide the optimizations. [Figure: at execution time the compiler front-end has built an intermediate code table for P and Q; on the call P(x,3) the system generates specialized code for P(_,3), optimizes it, and then calls the specialized code.]

Local vs. Global vs. Inter-Procedural Optimization

Some compilers optimize more aggressively than others. An aggressive compiler optimizes over a large piece of code; a simple one only considers a small chunk of code at a time.

    Local Optimization: Consider each basic block by itself. (All compilers.)
    Global Optimization: Consider each procedure by itself. (Most compilers.)
    Inter-Procedural Optimization: Consider the control flow between procedures. (A few compilers do this.)

Local Optimization — Transformations

    Local common subexpression elimination.
    Local copy propagation.
    Local dead-code elimination.
    Algebraic optimization.
    Jump-to-jump removal.
    Reduction in strength.

Peephole Optimization

Can be done at the machine code level or at the intermediate code level.

1. Examine a "window" of instructions.
2. Improve the code in the window.
3. Slide the window.
4. Repeat until "optimal".
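Purely as an illustration of the examine/improve/slide loop (not code from the lecture), here is a minimal C sketch of a peephole pass over a toy instruction list. The Insn struct, the opcode strings, and the single rewrite rule are assumptions made for this example.

    #include <stdio.h>
    #include <string.h>

    typedef struct { char op[8]; char a[8]; char b[8]; } Insn;

    /* One illustrative rule: a store of register R to address M followed
     * immediately by a load from M back into R makes the load redundant. */
    static int redundant_load(const Insn *w) {
        return strcmp(w[0].op, "st") == 0 && strcmp(w[1].op, "ld") == 0 &&
               strcmp(w[0].b, w[1].a) == 0 &&   /* same memory address */
               strcmp(w[0].a, w[1].b) == 0;     /* same register       */
    }

    static int peephole(Insn *code, int n) {
        int changed = 1;
        while (changed) {                        /* 4. repeat until nothing fires */
            changed = 0;
            for (int i = 0; i + 1 < n; i++) {    /* 3. slide the window           */
                if (redundant_load(&code[i])) {  /* 1. examine the window         */
                    memmove(&code[i + 1], &code[i + 2],          /* 2. improve:   */
                            (size_t)(n - i - 2) * sizeof(Insn)); /* drop the load */
                    n--;
                    changed = 1;
                }
            }
        }
        return n;                                /* new instruction count */
    }

    int main(void) {
        Insn code[] = {
            { "st",  "%l1", "[A]" },   /* st  %l1, [A]                      */
            { "ld",  "[A]", "%l1" },   /* ld  [A], %l1  -- redundant reload */
            { "add", "%l1", "1"   },   /* add %l1, 1                        */
        };
        int n = peephole(code, 3);
        for (int i = 0; i < n; i++)
            printf("%-4s %s %s\n", code[i].op, code[i].a, code[i].b);
        return 0;
    }

A real peephole optimizer applies many such rules to actual machine instructions; the redundant-load example that follows shows the same idea at the SPARC assembly level.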
Peephole Optimization — Redundant Loads

A naive code generator will generate the same address or variable several times. For A := A + 1:

    set A, %l0
    set A, %l1
    ld  [%l1], %l1
    add %l1, 1, %l1
    st  %l1, [%l0]

        ⇓

    set A, %l0
    ld  [%l0], %l1
    add %l1, 1, %l1
    st  %l1, [%l0]

Peephole Optimization — Jumps to Jumps

Complicated boolean expressions (with many ands, ors, and nots) can easily produce lots of jumps to jumps:

        if a < b goto L1                  if a < b goto L3
        ...                               ...
    L1: goto L2               ⇒      L1: goto L3
        ...                               ...
    L2: goto L3                       L2: goto L3

Algebraic Simplification

    x := x + 0;    ⇒    (the assignment can be removed)
    x := x - 0;    ⇒
    x := x * 1;    ⇒
    x := 1 * 1;    ⇒    x := 1
    x := x / 1;    ⇒
    x := x ** 2;   ⇒    x := x * x;
    f := f / 2.0;  ⇒    f := f * 0.5;

Note that multiplication is often faster than division. Beware of numerical problems:

    (x * 0.00000001) * 10000000000.0

may produce a different result than

    (x * 1000.0)!

FORTRAN requires that parentheses be honored: (5.0 * x) * (6.0 * y) can't be evaluated as (30.0 * x * y).

Reduction in Strength

SHL(x,y) = shift x left y steps. Multiplications (and divisions) by constants can be replaced by cheaper sequences of shifts and adds:

    x := x * 32    ⇒    x := SHL(x, 5);

    x := x * 100
        ⇓
    x := x * (64 + 32 + 4)
        ⇓
    x := x * 64 + x * 32 + x * 4
        ⇓
    x := SHL(x,6) + SHL(x,5) + SHL(x,2)

Local, Global, Inter-Procedural

Original code:

    FUNCTION P (X, n): INT;
        IF n = 3 THEN RETURN X[1]
        ELSE RETURN X[n];

    CONST R = 1;
    BEGIN
        K := 3; ...
        IF P(X,K) = X[1] THEN
            X[1] := R * (X[1] ** 2)
    END

After local optimization (R * (X[1] ** 2) becomes X[1] * X[1]; P is unchanged):

    BEGIN
        K := 3; ...
        IF P(X,K) = X[1] THEN X[1] := X[1] * X[1]
    END

After global optimization (the constant K = 3 is propagated into the call):

    BEGIN
        IF P(X,3) = X[1] THEN X[1] := X[1] * X[1]
    END

After inter-procedural optimization (P(X,3) always returns X[1], so the test is always true):

    BEGIN
        IF TRUE THEN X[1] := X[1] * X[1]
    END

After another local optimization:

    BEGIN
        X[1] := X[1] * X[1]
    END

Delete P if it isn't used elsewhere. This can maybe be deduced by an inter-procedural analysis.

Global Optimization

Makes use of control-flow and data-flow analysis:

    Dead code elimination.
    Common subexpression elimination (local and global).
    Loop unrolling.
    Code hoisting.
    Induction variables.
    Reduction in strength.
    Copy propagation.
    Live variable analysis.
    Uninitialized variable analysis.

Control Flow Graphs

Perform optimizations over the control flow graph of a procedure. [Figure: a control flow graph with basic blocks B1 through B6 connected by conditional and unconditional jumps.]

Common Sub-Expression Elimination

[Figure: block B2 computes t3 := 4 * j; a later computation t5 := j * 4; A[t5] := 20 is replaced by A[t3] := 20, and in block B5, t9 := 4 * j; x := A[t9] becomes x := A[t3], because j is not changed on any path in between.]

Copy Propagation

Many optimizations produce X := Y. After an assignment X := Y, replace references to X by Y. Remove the assignment if possible. [Figure: after x := t3, the uses A[t4] := x and A[x] := 20 become A[t4] := t3 and A[t3] := 20, since x is not changed in between; x then has no more uses and the copy can be deleted.]

Dead Code Elimination

A piece of code is dead if we can determine at compile time that it will never be executed. [Figure: after x := 23 in B1, with no later changes to x, the test if x > 20 goto B4 is always true, so the fall-through code is dead and can be removed.]
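The copy-propagation and dead-code slides pair naturally: propagation often leaves an assignment with no remaining uses, which dead-code elimination then removes. Below is a minimal C sketch of local copy propagation over straight-line three-address code; the Quad representation, the operator encoding, and the stopping rule are assumptions made for this example, not the lecture's IR.

    #include <stdio.h>
    #include <string.h>

    /* dst := src1 op src2; op == 0 marks a plain copy dst := src1. */
    typedef struct { char dst[8]; char src1[8]; char op; char src2[8]; } Quad;

    static void subst(char *operand, const char *from, const char *to) {
        if (strcmp(operand, from) == 0)
            strcpy(operand, to);
    }

    int main(void) {
        Quad code[] = {
            { "x", "t3",  0,  ""  },   /* x := t3            (the copy)  */
            { "a", "x",  '+', "1" },   /* a := x + 1  ->  a := t3 + 1    */
            { "b", "x",  '*', "x" },   /* b := x * x  ->  b := t3 * t3   */
        };
        int n = (int)(sizeof code / sizeof code[0]);

        for (int i = 0; i < n; i++) {
            if (code[i].op != 0) continue;          /* only propagate copies   */
            for (int j = i + 1; j < n; j++) {
                if (strcmp(code[j].dst, code[i].dst)  == 0 ||
                    strcmp(code[j].dst, code[i].src1) == 0)
                    break;                          /* x or t3 redefined: stop */
                subst(code[j].src1, code[i].dst, code[i].src1);
                subst(code[j].src2, code[i].dst, code[i].src1);
            }
        }
        for (int i = 0; i < n; i++)
            printf("%s := %s %c %s\n", code[i].dst, code[i].src1,
                   code[i].op ? code[i].op : ' ', code[i].src2);
        return 0;
    }

After propagation the copy x := t3 has no remaining uses, which is exactly the situation a liveness-based dead-code pass would then detect and clean up.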
Induction Variables

    REPEAT
        j := j + 1;
        t4 := A[4 * j]
    UNTIL ...;

If i and j are updated simultaneously in a loop, and j = i * c1 + c2 (c1, c2 are constants), we can remove one of them, and/or replace * by +. [Figure: inside the loop, j := j + 1; t1 := 4 * j; t4 := A[t1] is rewritten as j := j + 1; t1 := t1 + 4; t4 := A[t1], replacing the multiplication by an addition.]

Code Hoisting

    IF a < 5 THEN X := A[i+3] + 9;
    ELSE X := A[i+3] * 6 END

Move code that is computed twice in different basic blocks to a common ancestor block. [Figure: t1 := A[i+3] is hoisted into the block before the branch; the two arms become X := t1 + 9 and X := t1 * 6.]

Loop Unrolling – Constant Bounds

    FOR i := 1 TO 5 DO A[i] := i END
        ⇓
    A[1] := 1; A[2] := 2; A[3] := 3; A[4] := 4; A[5] := 5;

Loop unrolling increases code size. How does this affect caching?

Loop Unrolling – Variable Bounds

    FOR i := 1 TO n DO A[i] := i END
        ⇓
    i := 1;
    WHILE i <= (n - 4) DO
        A[i] := i; A[i+1] := i+1; A[i+2] := i+2; A[i+3] := i+3; A[i+4] := i+4;
        i := i + 5;
    END;
    WHILE i <= n DO A[i] := i; i := i + 1; END

Inter-Procedural Optimization

Consider the entire program during optimization. How can this be done for languages that support separately compiled modules?

Transformations:

    Inline expansion: Replace a procedure call with the code of the called procedure.
    Procedure cloning: Create multiple specialized copies of a single procedure.
    Inter-procedural constant propagation: If we know that a particular procedure is always called with a constant parameter with a specific value, we can optimize for this case.

Inline Expansion—Original Code

    FUNCTION Power (n, exp: INT): INT;
        IF exp < 0 THEN result := 0;
        ELSIF exp = 0 THEN result := 1;
        ELSE
            result := n;
            FOR i := 2 TO exp DO
                result := result * n;
            END;
        END;
        RETURN result;
    END Power;

    BEGIN X := 7; PRINT Power(X,2) END;

Inline Expansion—Expanded Code

    BEGIN
        X := 7;
        result := X;
        FOR i := 2 TO 2 DO
            result := result * X;
        END;
        PRINT result;
    END

Inline Expansion—After Copy Propagation

    BEGIN
        X := 7;
        result := 7;
        FOR i := 2 TO 2 DO
            result := result * 7;
        END;
        PRINT result;
    END

Inline Expansion—More Optimization

After loop unrolling:

    BEGIN
        X := 7;
        result := 7;
        result := result * 7;
        PRINT result;
    END

After constant folding:

    BEGIN
        result := 49;
        PRINT result;
    END

Procedure Cloning—Original Code

    FUNCTION Power (n, exp: INT): INT;
        IF exp < 0 THEN result := 0;
        ELSIF exp = 0 THEN result := 1;
        ELSE
            result := n;
            FOR i := 2 TO exp DO
                result := result * n;
            END;
        END;
        RETURN result;
    END Power;

    BEGIN PRINT Power(X,2), Power(X,7) END;

Procedure Cloning—Cloned Routines

    FUNCTION Power0 (n): INT; RETURN 1;
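The slide shows the clone for exp = 0 (Power0). As a hedged C sketch of the same idea, clones specialized for the two call sites Power(X,2) and Power(X,7) might look like the following; the names power_exp2 and power_exp7 and the driver are illustrative assumptions, not the lecture's code.

    #include <stdio.h>

    /* General routine, kept for call sites whose exponent is not a constant. */
    static int power(int n, int exp) {
        if (exp < 0) return 0;
        if (exp == 0) return 1;
        int result = n;
        for (int i = 2; i <= exp; i++)
            result = result * n;
        return result;
    }

    /* Clone specialized for exp = 2: the tests and the loop disappear. */
    static int power_exp2(int n) { return n * n; }

    /* Clone specialized for exp = 7: the loop is fully unrolled. */
    static int power_exp7(int n) {
        int n2 = n * n;
        int n4 = n2 * n2;
        return n4 * n2 * n;        /* n^7 */
    }

    int main(void) {
        int X = 7;
        /* Corresponds to PRINT Power(X,2), Power(X,7) on the slide. */
        printf("%d %d\n", power_exp2(X), power_exp7(X));
        printf("%d\n", power(X, 3));   /* the general version is still available */
        return 0;
    }

Cloning trades code size for the chance to fold the exponent tests and fully unroll the loop in each specialized copy, much as inline expansion did above for the single call Power(X,2).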