10/28/2013

Optimization

• Next topic: how to generate better code CS 4120 through optimization. Introduction to • This course covers the most valuable Ross Tate and straightforward optimizations – Cornell University much more to learn! • Other sources: Lecture 25: Introduction to Optimization • Muchnick has 10 chapters of optimization techniques • Cooper and Torczon also cover optimization

CS 4120 Introduction to Compilers 2

How fast can you go? Goal of optimization 10000 direct source code interpreters • Help programmers • clean, modular, high-level source code 1000 tokenized program interpreters (BASIC, Tcl) • but compile to assembly-code performance AST interpreters (CS 3110 RCL, Perl 4) 100 • Optimizations are code transformations bytecode interpreters (Java, Perl 5, OCaml) • can’t change meaning of program to behavior call-threaded interpreters 10 pointer-threaded interpreters (FORTH) not allowed by source simple code generation (PA4, JIT) • Different kinds of optimization: • space optimization: reduce memory use 1 naive assembly code local optimization expert assembly code global optimization • time optimization: reduce execution time 0.1 • power optimization: reduce power usage

CS 4120 Introduction to Compilers 3 CS 4120 Introduction to Compilers 4

Why do we need optimization? Where to optimize?

• Programmers may write suboptimal code to • Usual goal: improve time performance make it clearer. • But: many optimizations trade off space versus time. • High-level language may make avoiding • E.g.: replaces loop body with N copies. redundant computation inconvenient or • Increasing code space slows program down a little, impossible speeds up one loop • Frequently executed code with long loops: space/time – a[i][j] = a[i][j] + 1 tradeoff is generally a win • Infrequently executed code: optimize code space at • Exploit patterns inexpressible at source level expense of time, saving instruction cache space • Clean up mess from translation stages • Complex optimizations may never pay off! • Architectural independence. • Ideal focus of optimization: program hot spots – Modern architectures make it hard to hand optimize

CS 4120 Introduction to Compilers 5 CS 4120 Introduction to Compilers 6

1 10/28/2013

Safety Writing fast programs in practice

• Possible opportunity for loop-invariant code motion: 1.Pick the right algorithms and data while (b) { z = y/x; // x, y not assigned in loop structures: design for locality and few … operations } • Transformation: invariant code out of loop: 2.Turn on optimization and profile to figure z = y/x; out program hot spots while (b) { Preserves meaning? … 3.Evaluate whether design works; if so… } Faster? • Three aspects of an optimization: 4.Tweak source code until optimizer does • the code transformation “the right thing” to machine code • safety of transformation • understanding optimizers helps! • performance improvement

CS 4120 Introduction to Compilers 7 CS 4120 Introduction to Compilers 8

Structure of an optimization When to apply optimization

• Optimization is a code transformation AST Inlining HIR Specialization • Applied at some stage of IR – HIR, MIR, LIR Constant propagation • In general requires some analysis: Canonical • safety analysis to determine where MIR IR Loop-invariant code motion Common sub-expression elimination transformation does not change meaning (e.g. ) Abstract Constant folding & propagation Assembly Branch prediction/optimization • cost analysis to determine where it ought to Register allocation speed up code (e.g., which variable to spill) Loop unrolling LIR Assembly Cache optimization Peephole optimizations

CS 4120 Introduction to Compilers 9 CS 4120 Introduction to Compilers 10

Register allocation Constant folding

• Goal: convert abstract assembly (infinite # of registers) • Idea: if operands are known at compile time, into real assembly (6 registers) evaluate at compile time when possible. mov t1, t2 mov rax, rbx int x = (2 + 3)*4*y; ⇒ int x = 5*4*y; add t1, -4(rbp) add rax, -4(rbp) ⇒ int x = 20*y; mov t3, -8(rbp) mov rbx, -8(rbp) • Can perform at every stage of compilation mov t4, t3 cmp t1, t4 cmp rax, rbx • Constant expressions are created by translation and by optimization a[2] ⇒ MEM(MEM(a) + 2*4) • Need to reuse registers aggressively (e.g., rbx) ⇒ MEM(MEM(a) + 8) • Coalesce registers (t3, t4) to eliminate mov’s • May be impossible without spilling some temporaries to stack

CS 4120 Introduction to Compilers 11 CS 4120 Introduction to Compilers 12

2 10/28/2013

Constant folding conditionals Algebraic simplification

• More general form of constant folding: take advantage of if (true) S ⇒ S simplification rules a * 1 ⇒ a b | false ⇒ b if (false) S ⇒ ; a + 0 ⇒ a b & true ⇒ b identities if (true) S else S’ ⇒ S a * 0 ⇒ 0 b & false ⇒ false zeroes (a + 1) + 2 ⇒ a + (1 + 2) ⇒ a+3 reassociation if (false) S else S’ ⇒ S’ a * 4 ⇒ a shl 2 a * 7 ⇒ (a shl 3) − a strength reduction a / 32767 ⇒ a shr 15 + a shr 30 while (false) S ⇒ ; • Must be careful with floating point and with overflow - algebraic while (true) ; ⇒ ; ???? identities may give wrong or less precise answers if (2 > 3) S ⇒ if (false) S ⇒ ; • E.g., (a+b)+c ≠ a+(b+c) in floating point if a,b small

CS 4120 Introduction to Compilers 13 CS 4120 Introduction to Compilers 14

Unreachable-code elimination Inlining • Replace a function call with the body of the function: • Basic blocks not contained by any f(a: int):int = { b:int=1; n:int = 0; trace leading from starting basic while (n

CS 4120 Introduction to Compilers 15 CS 4120 Introduction to Compilers 16

Constant propagation Dead-code elimination • If side effect of a statement can never be • If value of variable is known to be a observed, can eliminate the statement constant, replace use of variable with x = y*y; // dead! constant … // x unused • Value of variable must be propagated x = z*z; x = z*z; forward from point of assignment • Dead variable: if never read after definition int x = 5; int i; int y = x*2; while (m

CS 4120 Introduction to Compilers 17 CS 4120 Introduction to Compilers 18

3 10/28/2013

Copy propagation Redundancy Elimination

• Given assignment x = y, replace • Common-Subexpression Elimination subsequent uses of x with y (CSE) combines redundant computations • May make x a dead variable, result in a(i) = a(i) + 1 dead code ⇒ [[a]+i*4] = [[a]+i*4] + 1 • Need to determine where copies of y ⇒ t1 = [a] + i*4; [t1] = [t1]+1 propagate to • Need to determine that expression always has same value in both places x = y x = y b[j]=a[i]+1; c[k]=a[i] ⇏ t1=a[i]; b[j]=t1+1; c[k]=t1 x = x * f(x - 1) x = y * f(y - 1)

CS 4120 Introduction to Compilers 19 CS 4120 Introduction to Compilers 20

Loops Loop-invariant code motion

• Program hot spots are usually loops • Another form of redundancy elimination (exceptions: OS kernels, compilers) • If result of a statement or expression does • Most execution time in most programs is not change during loop, and it has no spent in loops: 90/10 is typical. externally-visible side effect (!), can hoist its • Loop optimizations are important, effective, computation before loop and numerous • Often useful for array element addressing computations – invariant code not visible at source level • Requires analysis to identify loop-invariant expressions

CS 4120 Introduction to Compilers 21 CS 4120 Introduction to Compilers 22

Loop-invariant code motion Strength reduction • Replace expensive operations (*,/) by cheap ones for (i = 0; i < a.length; i++) { (+,−) via dependent induction variable S // a not assigned in S } hoisted loop-invariant expression for (int i = 0; i < n; i++) { a[i*3] = 1; t1 = a.length; } int j = 0; for (i = 0; i < t1; i++) { for (int i = 0; i < n; i++){ S a[ j ] = 1; j = j+3; } }

CS 4120 Introduction to Compilers 23 CS 4120 Introduction to Compilers 24

4 10/28/2013

Loop unrolling Summary • Many useful optimizations that can transform • Branches are expensive code to make it faster/smaller/... – unroll loop to avoid them: • Whole is greater than sum of parts: optimizations for (i = 0; i

CS 4120 Introduction to Compilers 25 CS 4120 Introduction to Compilers 26

5