Introduction to Optimization • Muchnick Has 10 Chapters of Optimization Techniques • Cooper and Torczon Also Cover Optimization

10/28/2013 Optimization • Next topic: how to generate better code CS 4120 through optimization. Introduction to Compilers • This course covers the most valuable Ross Tate and straightforward optimizations – Cornell University much more to learn! • Other sources: Lecture 25: Introduction to Optimization • Muchnick has 10 chapters of optimization techniques • Cooper and Torczon also cover optimization CS 4120 Introduction to Compilers 2 How fast can you go? Goal of optimization 10000 direct source code interpreters • Help programmers • clean, modular, high-level source code 1000 tokenized program interpreters (BASIC, Tcl) • but compile to assembly-code performance AST interpreters (CS 3110 RCL, Perl 4) 100 • Optimizations are code transformations bytecode interpreters (Java, Perl 5, OCaml) • can’t change meaning of program to behavior call-threaded interpreters 10 pointer-threaded interpreters (FORTH) not allowed by source simple code generation (PA4, JIT) • Different kinds of optimization: • space optimization: reduce memory use 1 register allocation naive assembly code local optimization expert assembly code global optimization • time optimization: reduce execution time 0.1 • power optimization: reduce power usage CS 4120 Introduction to Compilers 3 CS 4120 Introduction to Compilers 4 Why do we need optimization? Where to optimize? • Programmers may write suboptimal code to • Usual goal: improve time performance make it clearer. • But: many optimizations trade off space versus time. • High-level language may make avoiding • E.g.: loop unrolling replaces loop body with N copies. redundant computation inconvenient or • Increasing code space slows program down a little, impossible speeds up one loop • Frequently executed code with long loops: space/time – a[i][j] = a[i][j] + 1 tradeoff is generally a win • Infrequently executed code: optimize code space at • Exploit patterns inexpressible at source level expense of time, saving instruction cache space • Clean up mess from translation stages • Complex optimizations may never pay off! • Architectural independence. • Ideal focus of optimization: program hot spots – Modern architectures make it hard to hand optimize CS 4120 Introduction to Compilers 5 CS 4120 Introduction to Compilers 6 1 10/28/2013 Safety Writing fast programs in practice • Possible opportunity for loop-invariant code motion: 1.Pick the right algorithms and data while (b) { z = y/x; // x, y not assigned in loop structures: design for locality and few … operations } • Transformation: invariant code out of loop: 2.Turn on optimization and profile to figure z = y/x; out program hot spots while (b) { Preserves meaning? … 3.Evaluate whether design works; if so… } Faster? • Three aspects of an optimization: 4.Tweak source code until optimizer does • the code transformation “the right thing” to machine code • safety of transformation • understanding optimizers helps! • performance improvement CS 4120 Introduction to Compilers 7 CS 4120 Introduction to Compilers 8 Structure of an optimization When to apply optimization • Optimization is a code transformation AST Inlining HIR Specialization • Applied at some stage of compiler Constant folding IR – HIR, MIR, LIR Constant propagation Value numbering • In general requires some analysis: Canonical Dead code elimination • safety analysis to determine where MIR IR Loop-invariant code motion Common sub-expression elimination transformation does not change meaning Strength reduction (e.g. live variable analysis) Abstract Constant folding & propagation Assembly Branch prediction/optimization • cost analysis to determine where it ought to Register allocation speed up code (e.g., which variable to spill) Loop unrolling LIR Assembly Cache optimization Peephole optimizations CS 4120 Introduction to Compilers 9 CS 4120 Introduction to Compilers 10 Register allocation Constant folding • Goal: convert abstract assembly (infinite # of registers) • Idea: if operands are known at compile time, into real assembly (6 registers) evaluate at compile time when possible. mov t1, t2 mov rax, rbx int x = (2 + 3)*4*y; ⇒ int x = 5*4*y; add t1, -4(rbp) add rax, -4(rbp) ⇒ int x = 20*y; mov t3, -8(rbp) mov rbx, -8(rbp) • Can perform at every stage of compilation mov t4, t3 cmp t1, t4 cmp rax, rbx • Constant expressions are created by translation and by optimization a[2] ⇒ MEM(MEM(a) + 2*4) • Need to reuse registers aggressively (e.g., rbx) ⇒ MEM(MEM(a) + 8) • Coalesce registers (t3, t4) to eliminate mov’s • May be impossible without spilling some temporaries to stack CS 4120 Introduction to Compilers 11 CS 4120 Introduction to Compilers 12 2 10/28/2013 Constant folding conditionals Algebraic simplification • More general form of constant folding: take advantage of if (true) S ⇒ S simplification rules a * 1 ⇒ a b | false ⇒ b if (false) S ⇒ ; a + 0 ⇒ a b & true ⇒ b identities if (true) S else S’ ⇒ S a * 0 ⇒ 0 b & false ⇒ false zeroes (a + 1) + 2 ⇒ a + (1 + 2) ⇒ a+3 reassociation if (false) S else S’ ⇒ S’ a * 4 ⇒ a shl 2 a * 7 ⇒ (a shl 3) − a strength reduction a / 32767 ⇒ a shr 15 + a shr 30 while (false) S ⇒ ; • Must be careful with floating point and with overflow - algebraic while (true) ; ⇒ ; ???? identities may give wrong or less precise answers if (2 > 3) S ⇒ if (false) S ⇒ ; • E.g., (a+b)+c ≠ a+(b+c) in floating point if a,b small CS 4120 Introduction to Compilers 13 CS 4120 Introduction to Compilers 14 Unreachable-code elimination Inlining • Replace a function call with the body of the function: • Basic blocks not contained by any f(a: int):int = { b:int=1; n:int = 0; trace leading from starting basic while (n<a) (b = 2*b); return b; } g(x: int):int = { return 1+ f(x); } block are unreachable and can be ⇒ g(x:int):int = { fx:int; a:int = x; eliminated { b:int=1; n:int = 0; while (n<a) ( b = 2*b); fx=b; goto finish; } • Performed at canonical-IR or finish: return 1 + fx; } assembly-code levels • Best done on HIR • Can inline methods, but more difficult – there can be only one f. • Reductions in code size improve • May need to rename variables to avoid name capture—consider if cache, TLB performance. f refers to a global variable x CS 4120 Introduction to Compilers 15 CS 4120 Introduction to Compilers 16 Constant propagation Dead-code elimination • If side effect of a statement can never be • If value of variable is known to be a observed, can eliminate the statement constant, replace use of variable with x = y*y; // dead! constant … // x unused • Value of variable must be propagated x = z*z; x = z*z; forward from point of assignment • Dead variable: if never read after definition int x = 5; int i; int y = x*2; while (m<n) ( m++; i = i+1) while (m<n) (m++) int z = a[y]; // = MEM(MEM(a) + y*4) • Other optimizations create dead • Interleave with constant folding! statements/variables CS 4120 Introduction to Compilers 17 CS 4120 Introduction to Compilers 18 3 10/28/2013 Copy propagation Redundancy Elimination • Given assignment x = y, replace • Common-Subexpression Elimination subsequent uses of x with y (CSE) combines redundant computations • May make x a dead variable, result in a(i) = a(i) + 1 dead code ⇒ [[a]+i*4] = [[a]+i*4] + 1 • Need to determine where copies of y ⇒ t1 = [a] + i*4; [t1] = [t1]+1 propagate to • Need to determine that expression always has same value in both places x = y x = y b[j]=a[i]+1; c[k]=a[i] ⇏ t1=a[i]; b[j]=t1+1; c[k]=t1 x = x * f(x - 1) x = y * f(y - 1) CS 4120 Introduction to Compilers 19 CS 4120 Introduction to Compilers 20 Loops Loop-invariant code motion • Program hot spots are usually loops • Another form of redundancy elimination (exceptions: OS kernels, compilers) • If result of a statement or expression does • Most execution time in most programs is not change during loop, and it has no spent in loops: 90/10 is typical. externally-visible side effect (!), can hoist its • Loop optimizations are important, effective, computation before loop and numerous • Often useful for array element addressing computations – invariant code not visible at source level • Requires analysis to identify loop-invariant expressions CS 4120 Introduction to Compilers 21 CS 4120 Introduction to Compilers 22 Loop-invariant code motion Strength reduction • Replace expensive operations (*,/) by cheap ones for (i = 0; i < a.length; i++) { (+,−) via dependent induction variable S // a not assigned in S } hoisted loop-invariant expression for (int i = 0; i < n; i++) { a[i*3] = 1; t1 = a.length; } int j = 0; for (i = 0; i < t1; i++) { for (int i = 0; i < n; i++){ S a[ j ] = 1; j = j+3; } } CS 4120 Introduction to Compilers 23 CS 4120 Introduction to Compilers 24 4 10/28/2013 Loop unrolling Summary • Many useful optimizations that can transform • Branches are expensive code to make it faster/smaller/... – unroll loop to avoid them: • Whole is greater than sum of parts: optimizations for (i = 0; i<n; i++) { S } should be applied together, sometimes more than once, at different levels for (i = 0; i < n−3; i+=4) {S; S; S; S;} • Transformation is relatively easy • The hard problem: when are optimizations are for ( ; i < n; i++) S; safe and when are they effective? – Data-flow analysis • Gets rid of ¾ of conditional branches! – Control-flow analysis • Space-time tradeoff: not a good idea for – Pointer analysis large S or small n. CS 4120 Introduction to Compilers 25 CS 4120 Introduction to Compilers 26 5 .

Load more