Course Logistics

• Want to get off the waitlist? – See me in my office (GHC 7221) after class to discuss – This course is not intended to be your first compiler course

• Let Dominic know if can’t get on Piazza or Blackboard for this course 15-745: Optimizing Compilers for Modern Architectures

• Need to get the book Lecture 1: Introduction What would you get out of this course? Structure of a Compiler Optimization Example

• Let’s run through the course webpage at http://www.cs.cmu.edu/~15745/

Carnegie Mellon Carnegie Mellon Phillip B. Gibbons 15-745: Introduction 1 15-745: Introduction 2 Phillip B. Gibbons

What Do Compilers Do? How Can the Compiler Improve Performance?

Execution time = Operation count * Machine cycles per operation 1. Translate one language into another – e.g., convert C++ into x86 object code • Minimize the number of operations – difficult for “natural” languages, but feasible for computer languages – arithmetic operations, memory accesses • Replace expensive operations with simpler ones – e.g., replace 4-cycle multiplication with 1-cycle shift Processor 2. Improve (i.e. “optimize”) the code • Minimize cache misses – e.g., make the code run 3 times faster – both data and instruction accesses cache • or more energy efficient, more robust, etc. • Perform work in parallel memory – driving force behind modern processor design – within a thread – parallel execution across multiple threads

More accurately, machine cycles per operation must account for instruction overlap

Carnegie Mellon Carnegie Mellon 15-745: Introduction 3 15-745: Introduction 4

1 What Would You Get Out of This Course? II. Structure of a Compiler

• Basic knowledge of existing compiler optimizations Source Code Intermediate Form Object Code

• Hands-on experience in constructing optimizations within a fully functional C x86Alpha research compiler C++ Front Optimizer Back SPARCARM • Basic principles and theory for the development of new optimizations End End Java SPARCx86

Verilog MIPSIA-64

• Optimizations are performed on an “intermediate form” – similar to a generic RISC instruction set • Allows easy portability to multiple source languages, target machines

Carnegie Mellon Carnegie Mellon 15-745: Introduction 5 15-745: Introduction 6

Ingredients in a Compiler Optimization Ingredients in a Compiler Optimization

• Formulate optimization problem • Formulate optimization problem – Identify opportunities of optimization – Identify opportunities of optimization • applicable across many programs • applicable across many programs • affect key parts of the program (loops/recursions) • affect key parts of the program (loops/recursions) • amenable to “efficient enough” algorithm • amenable to “efficient enough” algorithm • Representation • Representation – Must abstract essential details relevant to optimization – Must abstract essential details relevant to optimization Mathematical • Analysis Programs Model – Detect when it is desirable and safe to apply transformation graphs • Code Transformation static statements abstraction matrices dynamic execution integer programs relations • Experimental Evaluation (and repeat process)

generated code solutions

Carnegie Mellon Carnegie Mellon 15-745: Introduction 7 15-745: Introduction 8

2 Representation: Instructions III. Optimization Example

• Three-address code • Bubblesort program that sorts an array A that is allocated in static storage: A := B op C – an element of A requires four bytes of a byte-addressed machine • LHS: name of variable e.g. x, A[t] (address of A + contents of t) – elements of A are numbered 1 through n (n is a variable) • RHS: value – A[j] is in location &A+4*(j-1)

• Typical instructions FOR i := n-1 DOWNTO 1 DO A := B op C FOR j := 1 TO i DO A := unaryop B A := B IF A[j]> A[j+1] THEN BEGIN temp := A[j]; GOTO s A[j] := A[j+1]; IF A relop B GOTO s A[j+1] := temp CALL f END RETURN

Carnegie Mellon Carnegie Mellon 15-745: Introduction 9 15-745: Introduction 10

Translated Code Representation: a Basic Block

i := n-1 t8 :=j-1 • Basic block = a sequence of 3-address statements S5: if i<1 goto s1 t9 := 4*t8 j := 1 temp := A[t9] ;A[j] – only the first statement can be reached from outside the block s4: if j>i goto s2 t10 := j+1 (no branches into middle of block) t1 := j-1 t11:= t10-1 – all the statements are executed consecutively if the first one is t2 := 4*t1 t12 := 4*t11 (no branches out or halts except perhaps at end of block) t3 := A[t2] ;A[j] t13 := A[t12] ;A[j+1] t4 := j+1 t14 := j-1 t5 := t4-1 t15 := 4*t14 • We require basic blocks to be maximal t6 := 4*t5 A[t15] := t13 ;A[j]:=A[j+1] – they cannot be made larger without violating the conditions t7 := A[t6] ;A[j+1] t16 := j+1 if t3<=t7 goto s3 t17 := t16-1 t18 := 4*t17 • Optimizations within a basic block are local optimizations FOR i := n-1 DOWNTO 1 DO A[t18]:=temp ;A[j+1]:=temp s3: j := j+1 FOR j := 1 TO i DO goto S4 IF A[j]> A[j+1] THEN BEGIN S2: i := i-1 temp := A[j]; goto s5 A[j] := A[j+1]; s1: A[j+1] := temp END Carnegie Mellon Carnegie Mellon 15-745: Introduction 11 15-745: Introduction 12

3 Flow Graphs Find the Basic Blocks

• Nodes: basic blocks i := n-1 t8 :=j-1 S5: if i<1 goto s1 t9 := 4*t8 j := 1 temp := A[t9] ;A[j] • Edges: Bi -> Bj, iff Bj can follow Bi immediately in some execution s4: if j>i goto s2 t10 := j+1 – Either first instruction of B is target of a goto at end of B t1 := j-1 t11:= t10-1 j i t2 := 4*t1 t12 := 4*t11 – Or, Bj physically follows Bi, which does not end in an unconditional goto. t3 := A[t2] ;A[j] t13 := A[t12] ;A[j+1] t4 := j+1 t14 := j-1 • The block led by first statement of the program is the start, or entry node. t5 := t4-1 t15 := 4*t14 t6 := 4*t5 A[t15] := t13 ;A[j]:=A[j+1] t7 := A[t6] ;A[j+1] t16 := j+1 if t3<=t7 goto s3 t17 := t16-1 t18 := 4*t17 A[t18]:=temp ;A[j+1]:=temp s3: j := j+1 goto S4 S2: i := i-1 goto s5 s1:

Carnegie Mellon Carnegie Mellon 15-745: Introduction 13 15-745: Introduction 14

Basic Blocks from Example Sources of Optimizations in • Algorithm optimization B1 i := n-1

• Algebraic optimization B2 if i<1 goto out out A := B+0 => A := B

i := i-1 B3 j := 1 B5 goto B2 • Local optimizations – within a basic block -- across instructions if j>i goto B5 • Global optimizations B4 t8 :=j-1 B7 ... – within a flow graph -- across basic blocks A[t18]=temp • Interprocedural analysis t1 := j-1 – within a program -- across procedures (flow graphs) B6 ... if t3<=t7 goto B8 j := j+1 B8 goto B4

Carnegie Mellon Carnegie Mellon 15-745: Introduction 15 15-745: Introduction 16

4 Local Optimizations Example

• Analysis & transformation performed within a basic block i := n-1 t8 :=j-1 S5: if i<1 goto s1 t9 := 4*t8 • No control flow information is considered j := 1 temp := A[t9] ;A[j] • Examples of local optimizations: s4: if j>i goto s2 t10 := j+1 • local common subexpression elimination t1 := j-1 t11:= t10-1 analysis: same expression evaluated more than once in b. t2 := 4*t1 t12 := 4*t11 transformation: replace with single calculation t3 := A[t2] ;A[j] t13 := A[t12] ;A[j+1] t4 := j+1 t14 := j-1 t5 := t4-1 t15 := 4*t14 t6 := 4*t5 A[t15] := t13 ;A[j]:=A[j+1] t7 := A[t6] ;A[j+1] t16 := j+1 • local or elimination if t3<=t7 goto s3 t17 := t16-1 analysis: expression can be evaluated at compile time t18 := 4*t17 transformation: replace by constant, compile-time value A[t18]:=temp ;A[j+1]:=temp s3: j := j+1 goto S4 S2: i := i-1 • goto s5 s1:

Carnegie Mellon Carnegie Mellon 15-745: Introduction 17 15-745: Introduction 18

Example (Intraprocedural) Global Optimizations

B1: i := n-1 B7: t8 :=j-1 • Global versions of local optimizations B2: if i<1 goto out t9 := 4*t8 – global common subexpression elimination B3: j := 1 temp := A[t9] ;temp:=A[j] – global constant propagation B4: if j>i goto B5 t12 := 4*j B6: t1 := j-1 t13 := A[t12] ;A[j+1] – dead code elimination t2 := 4*t1 A[t9]:= t13 ;A[j]:=A[j+1] • Loop optimizations t3 := A[t2] ;A[j] A[t12]:=temp ;A[j+1]:=temp – reduce code to be executed in each iteration t6 := 4*j B8: j := j+1 – code motion t7 := A[t6] ;A[j+1] goto B4 – if t3<=t7 goto B8 B5: i := i-1 induction variable elimination goto B2 • Other control structures out: – Code hoisting: eliminates copies of identical code on parallel paths in a flow graph to reduce code size.

Carnegie Mellon Carnegie Mellon 15-745: Introduction 19 15-745: Introduction 20

5 Example Example (After Global CSE)

B1: i := n-1 B7: t8 :=j-1 B1: i := n-1 B7: A[t2] := t7 B2: if i<1 goto out t9 := 4*t8 B2: if i<1 goto out A[t6] := t3 B3: j := 1 temp := A[t9] ;temp:=A[j] B3: j := 1 B8: j := j+1 B4: if j>i goto B5 t12 := 4*j B4: if j>i goto B5 goto B4 B6: t1 := j-1 t13 := A[t12] ;A[j+1] B6: t1 := j-1 B5: i := i-1 t2 := 4*t1 A[t9]:= t13 ;A[j]:=A[j+1] t2 := 4*t1 goto B2 t3 := A[t2] ;A[j] A[t12]:=temp ;A[j+1]:=temp t3 := A[t2] ;A[j] out: t6 := 4*j B8: j := j+1 t6 := 4*j t7 := A[t6] ;A[j+1] goto B4 t7 := A[t6] ;A[j+1] if t3<=t7 goto B8 B5: i := i-1 if t3<=t7 goto B8 goto B2 out:

Carnegie Mellon Carnegie Mellon 15-745: Introduction 21 15-745: Introduction 22

Induction Variable Elimination Example

• Intuitively B1: i := n-1 B7: A[t2] := t7 B2: if i<1 goto out A[t6] := t3 – Loop indices are induction variables (counting iterations) B3: j := 1 B8: j := j+1 B4: if j>i goto B5 goto B4 – Linear functions of the loop indices are also induction variables B6: t1 := j-1 B5: i := i-1 (for accessing arrays) t2 := 4*t1 goto B2 t3 := A[t2] ;A[j] out: • Analysis: detection of induction variable t6 := 4*j t7 := A[t6] ;A[j+1] • Optimizations if t3<=t7 goto B8 – strength reduction: • replace multiplication by additions – elimination of loop index: • replace termination by tests on other induction variables

Carnegie Mellon Carnegie Mellon 15-745: Introduction 23 15-745: Introduction 24

6 Example (After Induction Variable Elimination) Loop Invariant Code Motion

B1: i := n-1 B7: A[t2] := t7 • Analysis B2: if i<1 goto out A[t6] := t3 – a computation is done within a loop and B3: t2 := 0 B8: t2 := t2+4 – result of the computation is the same as long as we keep going around the t6 := 4 t6 := t6+4 loop B4: t19 := 4*i goto B4 if t6>t19 goto B5 B5: i := i-1 B6: t3 := A[t2] goto B2 • Transformation t7 := A[t6] ;A[j+1] out: – move the computation outside the loop if t3<=t7 goto B8

B4: t19 := 4*i t19 := 4*i if t6>t19 goto B5 B4: if t6>t19 goto B5

Carnegie Mellon Carnegie Mellon 15-745: Introduction 25 15-745: Introduction 26

Machine Dependent Optimizations Wednesday’s Class

• Dominic will present “LLVM Compiler: Getting Started” • Instruction scheduling – part 1 of 2 on LLVM • Memory hierarchy optimizations • etc. • Assignment 1 will be handed out

Carnegie Mellon Carnegie Mellon 15-745: Introduction 27 15-745: Introduction 28 Phillip B. Gibbons

7