CS 4120 Introduction to Compilers
Andrew Myers
Cornell University

Lecture 19: Introduction to Optimization
9 Oct 09

Administration
• Problem Set 3 due Oct. 15 (next Thursday)
• Prelim 1 Oct. 20 (Wednesday after)
• Programming Assignment 4 due Oct. 26

CS 4120 Introduction to Compilers 2

Trivial register allocation
• Can convert abstract assembly to real assembly easily (but generate bad code)
• Allocate every temporary in stack frame rather than to a real register
  – t1 = [ebp-4], t2 = [ebp-8], t3 = [ebp-12], ...
• Every temporary is stored in a different place, so conflicts are impossible
• Up to three registers needed to shuttle data in and out of stack frame (max. # registers used by one instruction): e.g., eax, ecx, edx

Rewriting abstract code
• Given instruction, replace every temporary in instruction with one of three registers e[acd]x
• Add mov instructions before instruction to load registers properly
• Add mov instructions after to put data back into stack frame (as necessary)
    push t1     ⇒  mov eax, [ebp-4]; push eax
    add t2, t3  ⇒  ?

Result
• Simple way to get working code; will be used for Programming Assignment 4
• Code is bigger and slower than necessary
• Refinement: allocate temporaries to registers until registers run out (3 temporaries on IA-32, 20+ on MIPS, PowerPC)
• Code generation technique actually used by some compilers when all optimization is turned off (-O0)

Optimization
• Next topic: how to generate better code through optimization.
• This course covers the most valuable and straightforward optimizations; there is much more to learn!
  – Other sources:
    • Muchnick has 10 chapters of optimization techniques
    • Cooper and Torczon also cover optimization
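The rewriting scheme from the "Rewriting abstract code" slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instruction encoding (an opcode plus a list of operand strings), the helper names `slot`, `is_temp`, and `rewrite`, and the `writes_first` flag are all hypothetical, and operands are assumed to be bare temporaries, registers, or constants (no memory operands that mention temporaries).

```python
# Sketch of trivial register allocation: every temporary lives in the stack
# frame and is shuttled through eax/ecx/edx around each instruction.
# All names and the instruction encoding are illustrative assumptions.
SCRATCH = ["eax", "ecx", "edx"]


def slot(temp):
    """Stack slot of temporary tN: t1 -> [ebp-4], t2 -> [ebp-8], ..."""
    return f"[ebp-{4 * int(temp[1:])}]"


def is_temp(operand):
    """True for operands of the form t<digits>."""
    return operand.startswith("t") and operand[1:].isdigit()


def rewrite(opcode, operands, writes_first=True):
    """Rewrite one abstract instruction into a list of real instructions.

    writes_first says whether the instruction modifies its first operand
    (true for add, false for push); only then is a store-back emitted.
    """
    mapping = {}
    for op in operands:
        if is_temp(op) and op not in mapping:
            mapping[op] = SCRATCH[len(mapping)]
    # Load each temporary from its stack slot into a scratch register.
    out = [f"mov {reg}, {slot(t)}" for t, reg in mapping.items()]
    # Emit the instruction itself with temporaries replaced by registers.
    out.append(f"{opcode} " + ", ".join(mapping.get(op, op) for op in operands))
    # Store the result back into the stack frame if needed.
    if writes_first and operands and is_temp(operands[0]):
        out.append(f"mov {slot(operands[0])}, {mapping[operands[0]]}")
    return out


for line in rewrite("add", ["t2", "t3"]):
    print(line)
```

Under these assumptions, `rewrite("push", ["t1"], writes_first=False)` reproduces the slide's `mov eax, [ebp-4]; push eax`, and the `add t2, t3` case comes out as two loads, the add, and one store back to t2's slot.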


How fast can you go?
[Figure: log scale from 10000 down to 0.1, listing implementation techniques from slowest to fastest: direct source code interpreters; tokenized program interpreters (BASIC, Tcl); AST interpreters (CS 3110 RCL, Perl 4); bytecode interpreters (Java, Perl 5, OCaml); call-threaded interpreters; pointer-threaded interpreters (FORTH); simple code generation (PA4, JIT); register allocation; naive assembly code; local optimization; expert assembly code; global optimization]

Goal of optimization
• Help programmers
  – clean, modular, high-level source code
  – but compile to assembly-code performance
• Optimizations are code transformations
  – cannot change the meaning of the program to behavior not allowed by the source.
• Different kinds of optimization:
  – space optimization: reduce memory use
  – time optimization: reduce execution time
  – power optimization: reduce power usage

Why do we need optimization?
• Programmers may write suboptimal code to make it clearer.
• High-level language may make avoiding redundant computation inconvenient or impossible
    a(i)(j) = a(i)(j) + 1
• Architectural independence.
• Modern architectures assume optimization; hard to optimize by hand.

Where to optimize?
• Usual goal: improve time performance
• Problem: many optimizations trade off space versus time.
• Example: loop unrolling replaces a loop body with N copies.
  – Increasing code space slows program down a little, speeds up one loop
  – Frequently executed code with long loops: space/time tradeoff is generally a win
  – Infrequently executed code: optimize code space at expense of time, saving instruction cache space
  – Complex optimizations may never pay off!
• Focus of optimization: program hot spots


Safety
• Possible opportunity for loop-invariant code motion:
    while (b) {
        z = y/x; // x, y not assigned in loop
        …
    }
• Transformation: move invariant code out of loop:
    z = y/x;
    while (b) {
        …
    }
  Preserves meaning? Faster?
Three aspects of an optimization:
• the code transformation
• safety of transformation
• performance improvement

Writing fast programs in practice
1. Pick the right algorithms and data structures: design for locality and few operations
2. Turn on optimization and profile to figure out program hot spots.
3. Evaluate whether design works; if so…
4. Tweak source code until optimizer does “the right thing” to machine code
   – understanding optimizers helps!

Structure of an optimization
• Optimization is a code transformation
• Applied at some stage of compilation (HIR, MIR, LIR)
• In general requires some analysis:
  – safety analysis to determine where transformation does not change meaning (e.g. …)
  – cost analysis to determine where it ought to speed up code (e.g., which variable to spill)

When to apply optimization
[Figure: compilation pipeline AST → IR → Canonical IR → Abstract Assembly → Assembly, spanning HIR, MIR, and LIR, annotated from top to bottom with: Inlining; Specialization; Constant folding; Constant propagation; Loop-invariant code motion; Common sub-expression elimination; Constant folding & propagation; Branch prediction/optimization; Register allocation; Loop unrolling; Cache optimization; Peephole optimizations]
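To make the safety question from the loop-invariant code motion example concrete, the sketch below models both versions of the loop as Python functions. The function names and the encoding of the loop condition as a list of successive values of `b` are assumptions made for illustration. It shows two ways in which the hoisted version can differ when the loop body never runs: `z` is assigned although the original never assigned it, and the division is executed (and may fault) although the original never executed it.

```python
def loop_original(b_values, x, y):
    """Original loop: z = y / x is computed inside the loop body."""
    z = None
    for b in b_values:      # b_values models successive values of the condition b
        if not b:
            break
        z = y / x           # x, y not assigned in loop
    return z


def loop_hoisted(b_values, x, y):
    """Transformed loop: the invariant computation is hoisted above the loop."""
    z = y / x               # now runs even if the loop body never executes
    for b in b_values:
        if not b:
            break
    return z


# Same result when the body runs at least once and x != 0.
print(loop_original([True, True], 2, 6), loop_hoisted([True, True], 2, 6))

# Different results when the body never runs.
print(loop_original([False], 2, 6), loop_hoisted([False], 2, 6))
```

With `x = 0` and a loop that never executes, `loop_original` returns normally while `loop_hoisted` raises `ZeroDivisionError`, which is the kind of case a safety analysis has to rule out before hoisting.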


Register allocation
• Goal: convert abstract assembly (infinite no. of registers) into real assembly (6 registers)
    mov t1, t2             mov eax, ebx
    add t1, [ebp-4]        add eax, [ebp-4]
    mov t3, [ebp-8]   ⇒    mov ebx, [ebp-8]
    mov t4, t3
    cmp t1, t4             cmp eax, ebx
• Need to reuse registers aggressively (e.g., ebx)
• Coalesce registers (t3, t4) to eliminate mov’s
• May be impossible without spilling some temporaries to stack

Constant folding
• Idea: if operands are known at compile time, evaluate at compile time when possible.
    int x = (2 + 3)*4*y;  ⇒  int x = 5*4*y;  ⇒  int x = 20*y;
• Can perform at every stage of compilation
  – Constant expressions are created by translation and by optimization
    a[2]  ⇒  MEM(MEM(a) + 2*4)  ⇒  MEM(MEM(a) + 8)

Constant folding conditionals
    if (true) S  ⇒  S
    if (false) S  ⇒  ;
    if (true) S else S’  ⇒  S
    if (false) S else S’  ⇒  S’
    while (false) S  ⇒  ;
    if (2 > 3) S  ⇒  if (false) S  ⇒  ;

Algebraic simplification
• More general form of constant folding: take advantage of simplification rules
    identities:          a * 1 ⇒ a    a * 0 ⇒ 0    a + 0 ⇒ a    b | false ⇒ b    b & true ⇒ b
    reassociation:       (a + 1) + 2 ⇒ a + (1 + 2) ⇒ a+3
    strength reduction:  a * 4 ⇒ a shl 2    a * 7 ⇒ (a shl 3) − a
                         a / 32767 ⇒ a shr 15 + a shr 30 (approximate; e.g., off by one at a = 32767)
• Must be careful with floating point and with overflow: algebraic identities may give wrong or less precise answers.
  – E.g., (a+b)+c ≠ a+(b+c) in floating point if a,b small.
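The constant folding, identity, and reassociation rules above can be sketched as one recursive pass over a tiny expression tree. The representation is an assumption made for illustration: an expression is an `int`, a variable name (`str`), or a tuple `(op, left, right)` with `op` being `'+'` or `'*'`; `simplify` is a hypothetical name, and the rules are applied to integer arithmetic without overflow. The last lines illustrate the floating-point caveat with values chosen so that a and b are small relative to c.

```python
def simplify(e):
    """Fold constants and apply a * 1, a * 0, a + 0 and reassociation.

    Constants are moved to the right operand, so 20 * y comes back as
    ('*', 'y', 20).
    """
    if not isinstance(e, tuple):
        return e
    op, left, right = e
    left, right = simplify(left), simplify(right)
    # Constant folding: both operands known at compile time.
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    # Both operators are commutative: keep any constant on the right.
    if isinstance(left, int):
        left, right = right, left
    # Identities.
    if op == "*" and right == 1:
        return left
    if op == "*" and right == 0:
        return 0
    if op == "+" and right == 0:
        return left
    # Reassociation: (a op c1) op c2  =>  a op (c1 op c2).
    if (isinstance(right, int) and isinstance(left, tuple)
            and left[0] == op and isinstance(left[2], int)):
        return simplify((op, left[1], (op, left[2], right)))
    return (op, left, right)


print(simplify(("*", ("*", ("+", 2, 3), 4), "y")))   # ('*', 'y', 20)
print(simplify(("+", ("+", "a", 1), 2)))             # ('+', 'a', 3)

# Floating point: reassociation is not safe in general.
a, b, c = 1.0, 1.0, 1e16
floats_agree = (a + b) + c == a + (b + c)
print(floats_agree)                                  # False
```

A real compiler would apply the same rules to its own IR nodes and would have to account for overflow and for side effects in the operand dropped by `a * 0 ⇒ 0`; this sketch ignores both.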

Unreachable code elimination
• Basic blocks not contained by any trace leading from starting basic block are unreachable and can be eliminated

Inlining
• Replace a function call with the body of the function:
    f(a: int):int = { b:int=1; n:int = 0; while (n …
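Finding the basic blocks that no trace from the starting block can reach is a plain graph reachability problem. The sketch below assumes a control-flow graph given as a dictionary from block label to a list of successor labels; that encoding and the name `eliminate_unreachable` are illustrative choices, not something the slides prescribe.

```python
def eliminate_unreachable(cfg, entry):
    """Return a copy of cfg containing only blocks reachable from entry.

    cfg maps each basic-block label to the list of its successor labels.
    """
    reachable = set()
    worklist = [entry]
    while worklist:
        block = worklist.pop()
        if block in reachable:
            continue
        reachable.add(block)
        worklist.extend(cfg[block])
    return {block: succs for block, succs in cfg.items() if block in reachable}


cfg = {
    "entry": ["loop"],
    "loop": ["loop", "exit"],
    "exit": [],
    "dead": ["exit"],   # no edge leads here, so it can be dropped
}
print(sorted(eliminate_unreachable(cfg, "entry")))   # ['entry', 'exit', 'loop']
```

Folding a conditional such as `if (false) S` removes an edge from the graph, which is what typically makes blocks like `dead` unreachable in the first place.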

Specialization
• Idea: create specialized versions of functions (or methods) that are called from different places w/ different args
    class A implements I { m( ) {…} }
    class B implements I { m( ) {…} }
    f(x: I) { x.m( ); } // don’t know which m
    a = new A(); f(a) // know A.m
    b = new B(); f(b) // know B.m
• Can inline methods when implementation is known
  • Impl. known if only one implementing class
• Can specialize inherited methods (e.g., HotSpot JIT)

Constant propagation
• If value of variable is known to be a constant, replace use of variable with constant
• Value of variable must be propagated forward from point of assignment
    int x = 5;
    int y = x*2;
    int z = a[y]; // = MEM(MEM(a) + y*4)
• Interleave with constant folding!
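The interplay between constant propagation and constant folding in the `x`, `y`, `z` example can be sketched for straight-line code as follows. Everything about the encoding is an assumption for illustration: statements are `(dest, expr)` pairs, expressions are ints, variable names, or `(op, left, right)` tuples with `op` in `'+'`/`'*'`, and `a_base` is a made-up name standing in for `MEM(a)`. Branches and loops would need a dataflow analysis and are not handled.

```python
def fold(expr, known):
    """Substitute known constants into expr and fold where possible."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return known.get(expr, expr)
    op, left, right = expr
    left, right = fold(left, known), fold(right, known)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    return (op, left, right)


def propagate_constants(stmts):
    """Propagate constants forward through a list of (dest, expr) assignments."""
    known = {}              # variables currently known to hold a constant
    out = []
    for dest, expr in stmts:
        expr = fold(expr, known)
        if isinstance(expr, int):
            known[dest] = expr
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
        out.append((dest, expr))
    return out


program = [
    ("x", 5),
    ("y", ("*", "x", 2)),
    ("addr", ("+", "a_base", ("*", "y", 4))),   # address of a[y]
]
for stmt in propagate_constants(program):
    print(stmt)
```

On this input the address computation ends up as `('+', 'a_base', 40)`: propagating `x` enables folding `x*2`, which in turn enables folding `y*4`.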

Dead code elimination
• If side effect of a statement can never be observed, can eliminate the statement
    x = y*y; // dead!
    …        // x unused       ⇒    …
    x = z*z;                         x = z*z;
• Dead variable: if never read after defn.
    int i; while (m …

Copy propagation
• Given assignment x = y, replace subsequent uses of x with y
• May make x a dead variable, result in dead code
• Need to determine where copies of y propagate to
    x = y;                           x = y;
    if (x > 1) {               ⇒    if (y > 1) {
        x = x * f(x - 1)                 x = y * f(y - 1)
    }                                }
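For straight-line code, the dead assignment in the `x = y*y` example can be found with one backward pass that tracks which variables are still going to be read. The sketch below makes several assumptions: each statement is a pure assignment encoded as `(dest, uses)`, where `uses` lists the variables read by the right-hand side; `live_out` names the variables read after the block; and the function name is hypothetical. Statements with observable side effects would have to be kept regardless.

```python
def eliminate_dead_assignments(stmts, live_out):
    """Drop assignments whose result is never read.

    stmts: list of (dest, uses) pairs for side-effect-free assignments.
    live_out: variables that are read after this block.
    """
    live = set(live_out)
    kept = []
    for dest, uses in reversed(stmts):
        if dest in live:
            live.discard(dest)      # this definition satisfies the later reads
            live.update(uses)       # its operands are read here
            kept.append((dest, uses))
        # otherwise dest is overwritten or never read: the assignment is dead
    kept.reverse()
    return kept


block = [
    ("x", ["y"]),   # x = y*y;  dead, x is overwritten before being read
    ("w", ["v"]),   # some statement that does not use x
    ("x", ["z"]),   # x = z*z;
]
print(eliminate_dead_assignments(block, live_out={"x", "w"}))
```

After copy propagation rewrites uses of `x` into uses of `y`, the copy `x = y` itself often becomes dead in exactly this sense, which is why the two optimizations are usually run together.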

Redundancy Elimination
• Common Subexpression Elimination (CSE) combines redundant computations
    a(i) = a(i) + 1
    ⇒ [[a]+i*4] = [[a]+i*4] + 1
    ⇒ t1 = [a] + i*4; [t1] = [t1]+1
• Need to determine that expression always has same value in both places
    b[j]=a[i]+1; c[k]=a[i]  ⇒  t1=a[i]; b[j]=t1+1; c[k]=t1 ?

Loops
• Program hot spots are usually loops (exceptions: OS kernels, compilers)
• Most execution time in most programs is spent in loops: 90/10 is typical.
• Loop optimizations are important, effective, and numerous
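One reason for the question mark on the `b[j]=a[i]+1; c[k]=a[i]` example under Redundancy Elimination is aliasing: if `b` and `a` are the same array and `j == i`, the store to `b[j]` changes `a[i]` between the two reads. The sketch below shows this with Python lists; the function names are made up for illustration.

```python
def without_cse(a, b, c, i, j, k):
    """Original code: a[i] is read twice."""
    b[j] = a[i] + 1
    c[k] = a[i]


def with_cse(a, b, c, i, j, k):
    """After CSE: a[i] is read once into t1 and reused."""
    t1 = a[i]
    b[j] = t1 + 1
    c[k] = t1


# Distinct arrays: both versions store the same values.
c1, c2 = [0], [0]
without_cse([10], [0], c1, 0, 0, 0)
with_cse([10], [0], c2, 0, 0, 0)
print(c1, c2)   # [10] [10]

# b aliases a and j == i: the store to b[j] changes a[i].
a1, a2 = [10], [10]
c1, c2 = [0], [0]
without_cse(a1, a1, c1, 0, 0, 0)
with_cse(a2, a2, c2, 0, 0, 0)
print(c1, c2)   # [11] [10]
```

The transformation is therefore only safe when an analysis can show that the intervening store cannot touch `a[i]`.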


Loop-invariant code motion
• Another form of redundancy elimination
• If result of a statement or expression does not change during loop, and it has no externally-visible side effect (!), can hoist its computation before loop
• Often useful for array element addressing computations: invariant code not visible at source level
• Requires analysis to identify loop-invariant expressions
    for (i = 0; i < a.length; i++) {
        // a not assigned in loop
    }
  hoisted loop-invariant expression:
    t1 = a.length ;
    for (i = 0; i < t1; i++) {
        …
    }

Strength reduction
• Replace expensive operations (*,/) by cheap ones (+, −) via dependent induction variable
    for (int i = 0; i < n; i++) { …

Loop unrolling
• Branches are expensive; unroll loop to avoid them:
    for (i = 0; i …
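Since the slide examples for these two loop optimizations are cut off, here is a small stand-in sketch of each. The functions, the element size of 4, and the unroll factor of 4 are all assumptions chosen for illustration, not taken from the slides. In Python these rewrites show the shape of the transformation only; the performance benefit arises in compiled code.

```python
def addresses_naive(base, n):
    """Element addresses computed with a multiply on every iteration."""
    return [base + i * 4 for i in range(n)]


def addresses_reduced(base, n):
    """Strength-reduced version: addr is kept equal to base + i*4 by addition."""
    out = []
    addr = base             # dependent induction variable
    for _ in range(n):
        out.append(addr)
        addr += 4           # replaces the multiplication i * 4
    return out


def sum_unrolled(xs):
    """Sum with the loop body unrolled 4 times, plus a cleanup loop."""
    total = 0
    n = len(xs)
    i = 0
    while i + 4 <= n:       # one loop test per four elements
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    while i < n:            # leftover iterations when n is not a multiple of 4
        total += xs[i]
        i += 1
    return total


print(addresses_reduced(1000, 3))       # [1000, 1004, 1008]
print(sum_unrolled(list(range(10))))    # 45
```

The cleanup loop is the price of unrolling when the trip count is not known to be a multiple of the unroll factor, and it adds to the code-size cost discussed under "Where to optimize?".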

Summary
• Many useful optimizations that can transform code to make it faster/smaller/...
• Whole is greater than sum of parts: optimizations should be applied together, sometimes more than once, at different levels.
• Problem: when are optimizations safe and when are they effective?
  ⇒ Dataflow analysis
  ⇒ Control flow analysis
  ⇒ …
