
CSc 553: Principles of Compilation

X16: Optimization I

Department of Computer Science, University of Arizona

[email protected]

Copyright 2011 Christian Collberg

Intermediate Code Optimization

[Figure: intermediate code optimization is organized into three phases (local optimization, global optimization, inter-procedural optimization), each applying a sequence of transformations: Transformation 1, Transformation 2, ...]

What do we Optimize?

[Figure: the compilation pipeline: lexical analysis, syntactic analysis, semantic analysis, AST tree transformations, intermediate code generation, and machine code generation, followed by peephole optimization. Call graph construction and data-flow analysis (gen[S]={..}, kill[S]={..}, ...) over the control flow graph feed the optimizing transformations.]

What do we Optimize I?

1. Optimize everything, all the time. The problem is that optimization interferes with debugging. In fact, many (most) compilers don't let you generate an optimized program with debugging information. The problem of debugging optimized code is an important research field. Furthermore, optimization is probably the most time-consuming pass in the compiler. Always optimizing everything (even routines which will never be called!) wastes valuable time.

2. The programmer decides what to optimize. The problem is that the programmer has a local view of the code. When timing a program, programmers are often very surprised to see where most of the time is spent.

What do we Optimize II?

3. Turn optimization on when the program is complete. Unfortunately, optimizers aren't perfect, and a program that performed OK with debugging turned on often behaves differently when debugging is off and optimization is on.

4. Optimize inner loops only. Unfortunately, procedure calls can hide inner loops:

      PROCEDURE P(n);
      BEGIN FOR k := 1 TO n DO ··· END;
      END P;

      FOR i := 1 TO 10000 DO P(i) END;

What do we Optimize III?

5. Use profiling information to guide what to optimize.

[Figure: profile-guided optimization: the compiler front-end, optimizer, and code generator produce a program; running the program on "carefully chosen input" produces profile data that is fed back into the optimizer.]

6. Runtime code generation/optimization. We delay code generation and optimization until execution time. At that time we have more information to guide the optimizations:

[Figure: at execution time the call P(x,3) reaches the code generator via the intermediate-code table holding P, Q, ...; the compiler generates specialized code for P(_,3) into the compiled-code table, optimizes it, and then calls the specialized code.]
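A minimal C sketch of the payoff (hypothetical code: a real system emits the specialized routine at run time, whereas here it is written out by hand to show what the generated code would compute):

   #include <stdio.h>

   /* General routine: loops over exp on every call. */
   int power(int n, int exp) {
       int result = 1;
       for (int i = 0; i < exp; i++)
           result *= n;
       return result;
   }

   /* What a runtime code generator might emit for P(_,3):
      the loop and the exp argument are gone. */
   int power_3(int n) { return n * n * n; }

   int main(void) {
       /* A runtime specializer would spot the constant 3 at this
          call site, generate power_3, optimize it, then call it. */
       printf("%d %d\n", power(5, 3), power_3(5));   /* 125 125 */
       return 0;
   }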

Local vs. Global vs. Inter-procedural Optimization

Local, Global, Inter-Proc. I

Some compilers optimize more aggressively than others. An aggressive compiler optimizes over a large piece of code, a simple one only considers a small chunk of code at a time.

Local Optimization: Consider each basic block by itself. All compilers do this.

Global Optimization: Consider each procedure by itself. Most compilers do this.

Inter-Procedural Optimization: Consider the control flow between procedures. A few compilers do this.

Local Optimization I

Transformations:
- Local common subexpression elimination.
- Local copy propagation.
- Local dead-code elimination.
- Algebraic optimization.
- Jump-to-jump removal.
- Reduction in strength.

Peephole Optimization, on machine and/or intermediate code:
1. Examine a "window" of instructions.
2. Improve the code in the window.
3. Slide the window.
4. Repeat until "optimal".


Local, Global, Inter-Proc. II

Original Code:

   FUNCTION P (X,n): INT;
      IF n=3 THEN RETURN X[1]
      ELSE RETURN X[n];

   CONST R = 1;
   BEGIN
      K := 3; ......
      IF P(X,K) = X[1] THEN
         X[1] := R * (X[1] ** 2)

After Local Opt:

   FUNCTION P (X,n): INT;
      IF n=3 THEN RETURN X[1]
      ELSE RETURN X[n]

   BEGIN
      K := 3; ......
      IF P(X,K) = X[1] THEN
         X[1] := X[1] * X[1]

Local, Global, Inter-Proc. III

After Local Opt (the function P is unchanged):

   BEGIN
      K := 3; ......
      IF P(X,K) = X[1] THEN ...

After Global Opt (constant propagation of K):

   BEGIN
      ......
      IF P(X,3) = X[1] THEN ...

Local, Global, Inter-Proc. IV

After Global Opt:

   FUNCTION P (X,n): INT;
      IF n=3 THEN RETURN X[1]
      ELSE RETURN X[n]

   BEGIN
      IF P(X,3) = X[1] THEN
         X[1] := X[1] * X[1]

After Inter-Procedural Opt (P(X,3) always returns X[1]):

   BEGIN
      IF TRUE THEN
         X[1] := X[1] * X[1]

After Another Local Opt:

   BEGIN
      X[1] := X[1] * X[1]

Delete P if it isn't used elsewhere.

Local Optimization

Redundant Loads

A naive code generator will generate the same address or variable several times. Peephole optimization over the generated code will easily remove these.

   A := A + 1;
   ⇓
   set A, %l0
   set A, %l1
   ld [%l1], %l1
   add %l1, 1, %l1
   st %l1, [%l0]
   ⇓
   set A, %l0
   ld [%l0], %l1
   add %l1, 1, %l1
   st %l1, [%l0]
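A hedged C sketch of such a pass (the instruction representation is invented for illustration): it slides over the instruction list and applies one classic window rule, namely that a load immediately following a store of the same register to the same address is redundant.

   #include <stdio.h>
   #include <string.h>

   typedef struct { const char *op, *addr, *reg; } Instr;

   /* One peephole rule: "st r,[A]; ld [A],r" => "st r,[A]".
      Compacts the code in place and returns the new length. */
   int peephole(Instr *code, int n) {
       int out = 0;
       for (int i = 0; i < n; i++) {
           if (out > 0 && strcmp(code[i].op, "ld") == 0 &&
               strcmp(code[out-1].op, "st") == 0 &&
               strcmp(code[i].addr, code[out-1].addr) == 0 &&
               strcmp(code[i].reg, code[out-1].reg) == 0)
               continue;   /* value is already in the register */
           code[out++] = code[i];
       }
       return out;
   }

   int main(void) {
       Instr prog[] = {
           { "st",  "[%l0]", "%l1" },
           { "ld",  "[%l0]", "%l1" },   /* redundant: removed */
           { "add", "%l1",   "%l1" },
       };
       int n = peephole(prog, 3);
       for (int i = 0; i < n; i++)
           printf("%s %s, %s\n", prog[i].op, prog[i].reg, prog[i].addr);
       return 0;
   }

A real pass would repeat over the window until nothing changes ("repeat until optimal").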

Jumps-to-jumps

Complicated boolean expressions (with many ands, ors, nots) can easily produce lots of jumps to jumps. A peephole optimization pass over the generated code can remove these.

   if a < b goto L1
   ...
   L1: goto L2
   ...
   L2: goto L3
   ⇓
   if a < b goto L3
   ...
   L1: goto L3
   ...
   L2: goto L3

Algebraic Simplification

Beware of numerical problems:
- (x * 0.00000001) * 10000000000.0 may produce a different result than (x * 1000.0)!
- FORTRAN requires that parentheses be honored: (5.0 * x) * (6.0 * y) can't be evaluated as (30.0 * x * y).
- Note that multiplication is often faster than division.

   x := x + 0;   ⇒  (delete)
   x := x - 0;   ⇒  (delete)
   x := x * 1;   ⇒  (delete)
   x := 1 * 1;   ⇒  x := 1
   x := x / 1;   ⇒  (delete)
   x := x ** 2;  ⇒  x := x * x;
   f := f / 2.0; ⇒  f := f * 0.5;
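A small C sketch of the integer identity rules (the representation is invented for illustration): given a statement x := x op k, decide whether it is a no-op that can be deleted outright.

   #include <stdio.h>

   typedef enum { ADD, SUB, MUL, DIV } Op;

   /* Is "x := x op k" a no-op? (x+0, x-0, x*1, x/1) */
   int is_identity(Op op, int k) {
       switch (op) {
       case ADD: case SUB: return k == 0;
       case MUL: case DIV: return k == 1;
       }
       return 0;
   }

   int main(void) {
       printf("%d %d %d\n",
              is_identity(ADD, 0),    /* x := x + 0  -> delete */
              is_identity(DIV, 1),    /* x := x / 1  -> delete */
              is_identity(MUL, 2));   /* x := x * 2  -> keep   */
       return 0;
   }

The floating-point rules need the care described above: f / 2.0 ⇒ f * 0.5 is safe because 2.0 has an exact reciprocal, but reassociating expressions like (x * 0.00000001) * 10000000000.0 is not.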

Reduction in Strength

SHL(x,y) = shift x left y steps.

Multiplication (and division) by a constant is a common operation. It can be replaced by cheaper sequences of shifts and adds.

   x := x * 32;
   ⇓
   x := SHL(x, 5);

   x := x * 100;
   ⇓
   x := x * (64 + 32 + 4)
   ⇓
   x := x * 64 + x * 32 + x * 4
   ⇓
   x := SHL(x,6) + SHL(x,5) + SHL(x,2)
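The decomposition falls out of the binary representation of the constant (100 = 64 + 32 + 4 has set bits 6, 5, and 2). A C sketch of the general rule, one shift-and-add per set bit:

   #include <stdio.h>

   /* x * c as a sum of shifts: one (x << i) per set bit of c.
      This is the sequence the compiler emits instead of a multiply. */
   unsigned mul_const(unsigned x, unsigned c) {
       unsigned sum = 0;
       for (int i = 0; c != 0; i++, c >>= 1)
           if (c & 1)
               sum += x << i;
       return sum;
   }

   int main(void) {
       unsigned x = 7;
       /* x*100 = SHL(x,6) + SHL(x,5) + SHL(x,2) */
       printf("%u %u\n", mul_const(x, 100),
              (x << 6) + (x << 5) + (x << 2));   /* 700 700 */
       return 0;
   }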

Global Optimization

Global Optimization I

Makes use of control-flow and data-flow analysis.

Transformations:
- Dead code elimination.
- Common subexpression elimination (local and global).
- Loop unrolling.
- Code hoisting.
- Induction variables.
- Reduction in strength.
- Copy propagation.
- Live variable analysis.
- Uninitialized variable analysis.

Control Flow Graphs

We perform our optimizations over the control flow graph of a procedure.

[Figure: a control flow graph: basic blocks B1 through B6 linked by edges, with conditional branches such as "if ... goto B2", "if ... goto B3", and "if ... goto B6" selecting between successor blocks.]
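A minimal sketch of the underlying data structure in C (field names are illustrative): each basic block holds a straight-line run of instructions plus edges to its successor blocks, and the analyses propagate data-flow facts along those edges.

   #include <stdio.h>

   #define MAXSUCC 2   /* fall-through and branch target */

   typedef struct BasicBlock {
       int id;
       const char *code;                 /* straight-line instructions */
       struct BasicBlock *succ[MAXSUCC]; /* control-flow edges */
       int nsucc;
   } BasicBlock;

   int main(void) {
       BasicBlock b4 = {4, "...", {0}, 0};
       BasicBlock b3 = {3, "A[t3] := 20", {&b4}, 1};
       BasicBlock b2 = {2, "if x > 20 goto B4", {&b3, &b4}, 2};
       BasicBlock b1 = {1, "x := 23", {&b2}, 1};

       /* Data-flow analysis propagates facts (gen/kill sets,
          liveness, ...) along these edges. */
       printf("B%d -> B%d\n", b1.id, b1.succ[0]->id);
       for (int i = 0; i < b2.nsucc; i++)
           printf("B%d -> B%d\n", b2.id, b2.succ[i]->id);
       return 0;
   }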

Common Sub-Expr. Elimination

[Figure: CFG with blocks B1 through B5. One block computes t3 := 4 * j; a later block computes t5 := j * 4 and A[t5] := 20, with no changes to j on the path in between. The recomputation is eliminated by reusing t3:]

   Before:                After:
   B3: t3 := 4 * j        B3: t3 := 4 * j
       ...                    ...        (no changes to j here)
   B4: t5 := j * 4        B4: A[t3] := 20
       A[t5] := 20

Copy Propagation

Many optimizations produce X := Y.

After an assignment X := Y, replace references to X by Y. Remove the assignment if possible.

   Before:                After:
   x := t3                x := t3
   A[t4] := x             A[t4] := t3
   ...                    ...        (no changes to x)
   A[x] := 20             A[t3] := 20

Since there are no more uses of x, the assignment x := t3 can then be removed.
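A hedged sketch of local copy propagation in C over a toy quadruple representation (invented for illustration): after a copy x := y, later uses of x in the block are renamed to y until x or y is reassigned.

   #include <stdio.h>
   #include <string.h>

   /* dst := src1 op src2; empty src2 means a plain copy dst := src1. */
   typedef struct { char dst[8], src1[8], src2[8]; } Quad;

   void copy_propagate(Quad *q, int n) {
       for (int i = 0; i < n; i++) {
           if (q[i].src2[0] != '\0') continue;   /* not a copy */
           const char *x = q[i].dst, *y = q[i].src1;
           for (int j = i + 1; j < n; j++) {
               if (strcmp(q[j].src1, x) == 0) strcpy(q[j].src1, y);
               if (strcmp(q[j].src2, x) == 0) strcpy(q[j].src2, y);
               if (strcmp(q[j].dst, x) == 0 || strcmp(q[j].dst, y) == 0)
                   break;   /* x or y reassigned: copy no longer valid */
           }
       }
   }

   int main(void) {
       Quad b[] = {
           { "x", "t3", ""  },   /* x := t3              */
           { "m", "x",  ""  },   /* becomes m := t3      */
           { "n", "x",  "k" },   /* becomes n := t3 op k */
       };
       copy_propagate(b, 3);
       printf("%s %s\n", b[1].src1, b[2].src1);   /* t3 t3 */
       return 0;
   }

A dead-code pass can then delete the copy itself once x has no remaining uses.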

Dead Code Elimination

A piece of code is dead if we can determine at compile time that it will never be executed.

[Figure: CFG with B1: x := 23; B2 (no changes to x here): if x > 20 goto B4; B3 (nor here): A[t3] := 20; B4. Since x is still 23 at the test, the branch is always taken: B3 is dead and can be removed.]

Induction Variables

If i and j are updated simultaneously in a loop, and j = i * c1 + c2 (c1, c2 are constants), we can remove one of them, and/or replace * by +.

   REPEAT
      j := j + 1;
      t4 := A[4*j];
   UNTIL ....;

[Figure: before, the loop body B3 is j := j + 1; t1 := 4 * j; t4 := A[t1]; if ... goto B3. After, t1 := 4 * j is computed once in B1 before the loop, and the body becomes j := j + 1; t1 := t1 + 4; t4 := A[t1]: j and t1 are induction variables, and the multiplication has been replaced by an addition.]
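The same rewrite in C (a sketch with invented names): t1 tracks 4*j across iterations, so the multiplication becomes an addition.

   #include <stdio.h>

   #define N 10

   int main(void) {
       int A[4 * N + 4], t4 = 0;
       for (int k = 0; k < 4 * N + 4; k++) A[k] = k;

       /* Before: t1 recomputed with a multiply each iteration. */
       for (int j = 1; j <= N; j++) {
           int t1 = 4 * j;
           t4 = A[t1];
       }
       printf("%d\n", t4);

       /* After: j and t1 are induction variables with t1 = 4*j,
          so the multiply is replaced by an add. */
       int t1 = 0;
       for (int j = 1; j <= N; j++) {
           t1 = t1 + 4;
           t4 = A[t1];
       }
       printf("%d\n", t4);   /* same result */
       return 0;
   }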

Code Hoisting

Move code that is computed twice in different basic blocks to a common ancestor block.

   IF a<5 THEN X := A[i+3] + 9;
   ELSE X := A[i+3] * 6 END

[Figure: before, B1: if a<5 goto B2 branches to B2: X := A[i+3]+9 or B3: X := A[i+3]*6. After, the common expression is hoisted into B1: t1 := A[i+3]; if a<5 goto B2, with B2: X := t1+9 and B3: X := t1*6.]

Loop Unrolling

Constant Bounds:

   FOR i := 1 TO 5 DO A[i] := i END
   ⇓
   A[1] := 1; A[2] := 2; A[3] := 3; A[4] := 4; A[5] := 5;

Variable Bounds:

   FOR i := 1 TO n DO A[i] := i END
   ⇓
   i := 1;
   WHILE i <= (n-4) DO
      A[i] := i; A[i+1] := i+1; A[i+2] := i+2;
      A[i+3] := i+3; A[i+4] := i+4;
      i := i + 5;
   END;
   WHILE i <= n DO A[i] := i; i := i + 1; END

Loop unrolling increases code size. How does this affect the instruction cache?
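A runnable C version of the variable-bounds transformation (function and array names invented): the main loop handles five elements per iteration, and a cleanup loop finishes the leftover iterations.

   #include <stdio.h>

   void fill(int *A, int n) {
       int i = 1;
       /* Unrolled by 5: fewer branches and counter updates. */
       for (; i <= n - 4; i += 5) {
           A[i]   = i;     A[i+1] = i + 1; A[i+2] = i + 2;
           A[i+3] = i + 3; A[i+4] = i + 4;
       }
       for (; i <= n; i++)    /* cleanup: at most 4 iterations */
           A[i] = i;
   }

   int main(void) {
       int A[13], n = 12;
       fill(A, n);
       for (int i = 1; i <= n; i++) printf("%d ", A[i]);
       printf("\n");
       return 0;
   }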

Inter-procedural Optimizations

Inter-procedural Opt.

Consider the entire program during optimization. How can this be done for languages that support separately compiled modules?

Transformations:
- Inline expansion: Replace a procedure call with the code of the called procedure.
- Procedure cloning: Create multiple specialized copies of a single procedure.
- Inter-procedural constant propagation: If we know that a particular procedure is always called with a constant parameter with a specific value, we can optimize for this case.

Inline Expansion I

Original Code:

   FUNCTION Power (n, exp:INT):INT;
      IF exp < 0 THEN result := 0;
      ELSIF exp = 0 THEN result := 1;
      ELSE result := n;
         FOR i := 2 TO exp DO
            result := result * n;
         END;
      END;
      RETURN result;
   END Power;

   BEGIN X := 7; PRINT Power(X,2) END;

Expanded Code:

   BEGIN
      X := 7;
      result := X;
      FOR i := 2 TO 2 DO
         result := result * X;
      END;
      PRINT result;
   END;

Inline Expansion II

After copy propagation:

   X := 7;
   result := 7;
   FOR i := 2 TO 2 DO
      result := result * 7;
   END;
   PRINT result;

After loop unrolling:

   X := 7;
   result := 7;
   result := result * 7;
   PRINT result;

After constant folding:

   result := 49;
   PRINT result;

Procedure Cloning

Original Code:

   FUNCTION Power (n, exp:INT):INT;
      IF exp < 0 THEN result := 0;
      ELSIF exp = 0 THEN result := 1;
      ELSE result := n;
         FOR i := 2 TO exp DO
            result := result * n;
         END;
      END;
      RETURN result;
   END Power;

   BEGIN PRINT Power(X,2), Power(X,7) END;

Cloned Routines:

   FUNCTION Power0 (n):INT; RETURN 1;
   FUNCTION Power2 (n):INT; RETURN n * n;
   FUNCTION Power3 (n):INT; RETURN n * n * n;
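In C terms, cloning amounts to adding specialized definitions and rewriting the call sites; a hand-written sketch of what the compiler would generate:

   #include <stdio.h>

   /* General routine. */
   int power(int n, int exp) {
       int result = 1;
       for (int i = 0; i < exp; i++) result *= n;
       return result;
   }

   /* Clones specialized on constant exponents. */
   int power0(int n) { (void)n; return 1; }
   int power2(int n) { return n * n; }
   int power3(int n) { return n * n * n; }

   int main(void) {
       int X = 7;
       /* Call sites power(X,2) and power(X,3) rewritten to clones: */
       printf("%d %d %d %d\n",
              power(X, 2), power2(X),    /* 49 49   */
              power(X, 3), power3(X));   /* 343 343 */
       return 0;
   }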

Machine Dependent vs. Machine Independent

Machine (In-)Dependent Opt.? I

Optimizations such as inline expansion and loop unrolling seem pretty machine independent. You don't need to know anything special about the machine architecture to implement these optimizations; in fact, both inline expansion and loop unrolling can be applied at the source code level. (May or may not be true for inline expansion, depending on the language.)

However, since both inline expansion and loop unrolling normally increase the code size of the program, these optimizations do, in fact, interact with the hardware.

Machine (In-)Dependent Opt.? II

A loop that previously might have fit in the instruction cache of the machine may overflow the cache once it has been unrolled, and therefore increase the cache miss rate so that the unrolled loop runs slower than the original one. The unrolled loop may even be spread out over more than one virtual memory page and hence affect the paging system adversely.

The same argument holds for inline expansion.

Example I

Example I/a – Loop Invariants

Original code:

   FOR I := 1 TO 100 DO
      FOR J := 1 TO 100 DO
         FOR K := 1 TO 100 DO
            A[I][J][K] := (I*J)*K;
         END;
      END;
   END

Find loop invariants:

   FOR I := 1 TO 100 DO
      T3 := ADR(A[I]);
      FOR J := 1 TO 100 DO
         T1 := ADR(T3[J]);
         T2 := I * J;
         FOR K := 1 TO 100 DO
            T1[K] := T2 * K
         END;
      END;
   END

Example I/b – Strength Reduct.

After strength reduction:

   FOR I := 1 TO 100 DO
      T3 := ADR(A[I]); T4 := I;
      FOR J := 1 TO 100 DO
         T1 := ADR(T3[J]);
         T2 := T4;      (* T4 = I*J *)
         T5 := T2;      (* Init T2*K *)
         FOR K := 1 TO 100 DO
            T1[K] := T5;
            T5 := T5 + T2;
         END;
         T4 := T4 + I;
      END;
   END

Example I/c – Copy Propagation

After Copy Propagation:

   FOR I := 1 TO 100 DO
      T3 := ADR(A[I]); T4 := I;
      FOR J := 1 TO 100 DO
         T1 := ADR(T3[J]); T5 := T4;
         FOR K := 1 TO 100 DO
            T1[K] := T5;
            T5 := T5 + T4;
         END;
         T4 := T4 + I;
      END;
   END

Example I/d – Array Indexing

Expand subscripting operations. Pascal array indexing turns into C-like address manipulation!

Expand Indexing:

   VAR A : ARRAY[1..100,1..100,1..100] OF INT;

   FOR I := 1 TO 100 DO
      T3 := ADR(A) + (10000*I) - 10000; T4 := I;
      FOR J := 1 TO 100 DO
         T1 := T3 + (100*J) - 100; T5 := T4;
         FOR K := 1 TO 100 DO
            (T1+K-1)↑ := T5;
            T5 := T5 + T4;
         END;
         T4 := T4 + I;
      END;
   END

Example I/e – Array Indexing

Strength Reduction + Copy Propagation:

   T6 := ADR(A);
   FOR I := 1 TO 100 DO
      T4 := I; T7 := T6;
      FOR J := 1 TO 100 DO
         T5 := T4; T8 := T7;
         FOR K := 1 TO 100 DO
            T8↑ := T5;
            T5 := T5 + T4;
            T8 := T8 + 1;
         END;
         T4 := T4 + I;
         T7 := T7 + 100;
      END;
      T6 := T6 + 10000;
   END

Example I/f – Loop Unrolling

   T6 := ADR(A);
   FOR I := 1 TO 100 DO
      T4 := I; T7 := T6;
      FOR J := 1 TO 100 DO
         T5 := T4; T8 := T7;
         FOR K := 1 TO 10 DO
            T8↑ := T5; T5 += T4; T8++;
            T8↑ := T5; T5 += T4; T8++;
            T8↑ := T5; T5 += T4; T8++;
            T8↑ := T5; T5 += T4; T8++;
            T8↑ := T5; T5 += T4; T8++;
            T8↑ := T5; T5 += T4; T8++;
            T8↑ := T5; T5 += T4; T8++;
            T8↑ := T5; T5 += T4; T8++;
            T8↑ := T5; T5 += T4; T8++;
            T8↑ := T5; T5 += T4; T8++;
         END;
         T4 := T4 + I;
         T7 := T7 + 100;
      END;
      T6 := T6 + 10000;
   END
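For comparison, a hedged C rendition of where Example I ends up (variable names mirror the temporaries above; 0-based C arrays replace the 1-based Pascal ones): the subscript computation has become pointer arithmetic with additions only.

   #include <stdio.h>

   #define N 100

   static int A[N][N][N];

   int main(void) {
       int *t6 = &A[0][0][0], *t7, *t8;
       for (int i = 1; i <= N; i++) {
           int t4 = i;
           t7 = t6;
           for (int j = 1; j <= N; j++) {
               int t5 = t4;
               t8 = t7;
               for (int k = 1; k <= N; k++) {
                   *t8 = t5;      /* A[i][j][k] = (i*j)*k */
                   t5 += t4;      /* next (i*j)*k         */
                   t8++;          /* next element         */
               }
               t4 += i;           /* next i*j             */
               t7 += N;           /* next row             */
           }
           t6 += N * N;           /* next plane           */
       }
       printf("%d\n", A[1][2][3]);   /* (2*3)*4 = 24 */
       return 0;
   }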

Example II

Example II/a – Inline Expansion

This example is from ftp://cs.washington.edu/pub/pardo. The code has been simplified substantially.

bitblt copies an image region while performing an operation on the moved part. s is the source, d the destination, i the index in the x direction, j the index in the y direction. Every time around the loop we have to execute a switch (case) statement, which is very inefficient.

Here we'll show how bitblt can be optimized by inlining. It's also amenable to run-time (dynamic) code generation, i.e. we include the code generator in the executable and generate code for bitblt when we know what its arguments are.

Example II/b – Inline Expansion

Original Code:

   #define BB_S (0xc)

   bitblt (mask_t m, word s, word d, int op)
   {
      for (j = 0; j < dy; ++j) {
         for (i = nw+1; i > 0; --i) {
            switch (op) {
               case (0) : *d &= ~mask; break;
               case (BB_D & ~BB_S) :
                  *d ^= ((s & *d) & mask); break;
               case (~BB_S) :
                  *d ^= ((~s ^ *d) & mask); break;
               /* Another 12 cases... */
               case (BB_X) : *d |= mask; break;
            };
            d++;
         };
         d++; s++;
      }
   }

Example II/c – Inline Expansion

Expanded Code:

   main () {
      s = src; d = dst;
      for (j = 0; j < dy; ++j) {
         for (i = nw+1; i > 0; --i) {
            switch (BB_S) {
               case (0) : *d &= ~mask; break;
               case (BB_D & ~BB_S) :
                  *d ^= ((s & *d) & mask); break;
               case (~BB_S) :
                  *d ^= ((~s ^ *d) & mask); break;
               /* Another 12 cases... */
               case (BB_X) : *d |= mask; break;
            };
            d++;
         };
         d++; s++;
      }
   }

Example II/d – Inline Expansion

After Dead Code Elimination:

   main () {
      s = src; d = dst;
      for (j = 0; j < dy; ++j) {
         for (i = nw+1; i > 0; --i) {
            *d ^= ((s ^ *d) & mask);
            d++;
         };
         d++; s++;
      }
   }

Summary

Read the Dragon book: 530–532, 585–602. Debugging optimized code: see the Dragon book, pp. 703–711.

Difficult problems:
- Which transformations are actually profitable?
- How do we avoid unsafe optimizations?
- What part of the code should we optimize?
- How do we take machine dependencies (cache size) into account?
- At which level(s) do we optimize (source, intermediate code, machine code)?
- How do we order the different optimizations?