[Overview diagram: analysis of the input program (front end) and synthesis of the output program (back end). Passes: character stream → Lexical Analysis → token stream → Syntactic Analysis → abstract syntax tree → Semantic Analysis → annotated AST → Intermediate Code Generation → intermediate form → Optimization → intermediate form → Code Generation → target language.]

Before and after generating machine code, devote one or more passes over the program to "improve" code quality.

The Role of the Optimizer

• The compiler can implement a procedure in many ways
• The optimizer tries to find an implementation that is "better"
  – Speed, code size, data space, …

To accomplish this, it
• Analyzes the code to derive knowledge about run-time behavior
  – Data-flow analysis, pointer disambiguation, …
  – General term is "static analysis"
• Uses that knowledge in an attempt to improve the code
  – Literally hundreds of transformations have been proposed
  – Large amount of overlap between them

Nothing "optimal" about optimization
• Proofs of optimality assume restrictive & unrealistic conditions
• Better goal is to "usually improve"

Optimizations

• Identify inefficiencies in intermediate or target code
• Replace with equivalent but better sequences
  – equivalent = "has the same externally visible behavior"
• Target-independent optimizations best done on IL code
• Target-dependent optimizations best done on target code

Traditional Three-pass Compiler

[Diagram: Source Code → Front End → IR → Middle End → IR → Back End → Machine code, with errors reported from each stage.]

Middle End does Code Improvement (Optimization)
• Analyzes IR and rewrites (or transforms) IR
• Primary goal is to reduce running time of the compiled code
  – May also improve space, power consumption, …
• Must preserve "meaning" of the code
  – Measured by values of named variables
• A course (or two) unto itself

The Optimizer (or Middle End)

[Diagram: IR → Opt 1 → IR → Opt 2 → IR → Opt 3 → IR → … → Opt n → IR, with errors reported from each pass.]

Modern optimizers are structured as a series of passes

Typical Transformations
• Discover & propagate some constant value
• Move a computation to a less frequently executed place
• Specialize some computation based on context
• Discover a redundant computation & remove it
• Remove useless or unreachable code
• Encode an idiom in some particularly efficient form

Kinds of optimizations

Optimizations are characterized by which Transformation over what Scope. Typical scopes are:
• peephole:
  – look at adjacent instructions
  – try to replace adjacent instructions with something faster
• local:
  – look at straight-line sequence of statements
• global (intraprocedural):
  – look at entire procedure
• whole program (interprocedural):
  – look across procedures
Larger scope => better optimization but more cost and complexity

Peephole Optimization

After target code generation, look at adjacent instructions (a "peephole" on the code stream) and try to replace them with something faster.

Example:
  movl %eax, 12(%ebp)
  movl 12(%ebp), %ebx
=>
  movl %eax, 12(%ebp)
  movl %eax, %ebx
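
A minimal peephole-pass sketch of the store-then-reload pattern above (an assumed representation, not from these notes: instructions as (op, src, dst) tuples):

# Replace a store to memory followed by a load from the same location
# with a register-to-register copy, as in the movl example above.
def peephole(instrs):
    out = []
    for ins in instrs:
        op, src, dst = ins
        prev = out[-1] if out else None
        if (prev and op == "movl" and prev[0] == "movl"
                and prev[2] == src and src.endswith(")")):  # reload of what was just stored
            out.append(("movl", prev[1], dst))              # reuse the register instead
        else:
            out.append(ins)
    return out

code = [("movl", "%eax", "12(%ebp)"), ("movl", "12(%ebp)", "%ebx")]
print(peephole(code))
# [('movl', '%eax', '12(%ebp)'), ('movl', '%eax', '%ebx')]

A real peephole optimizer pattern-matches many such windows (redundant loads, jumps to jumps, algebraic cases) over the emitted target code.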

Algebraic Simplification

("constant folding", "strength reduction")
  z = 3 + 4;
  z = x + 0;
  z = x * 1;
  z = x * 2;
  z = x * 8;
  z = x / 8;
  double x, y, z;
  z = (x + y) - y;   (not safe to simplify to z = x for floating point)
Can be done by peephole optimizer, or by code generator

Local Optimizations

Analysis and optimizations within a basic block
• Basic block: straight-line sequence of statements
  – no control flow into or out of middle of sequence
• Better than peephole
• Not too hard to implement
• Machine-independent, if done on intermediate code
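
As a sketch of how a simplifier might treat a few of these cases (assumed expression form: ("op", left, right) tuples with integer constants; only integer-safe rules shown):

def simplify(e):
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if isinstance(a, int) and isinstance(b, int) and op in ("+", "-", "*"):
        return {"+": a + b, "-": a - b, "*": a * b}[op]    # constant folding: 3 + 4 => 7
    if op == "+" and b == 0:
        return a                                           # x + 0 => x
    if op == "*" and b == 1:
        return a                                           # x * 1 => x
    if op == "*" and isinstance(b, int) and b > 0 and (b & (b - 1)) == 0:
        return ("<<", a, b.bit_length() - 1)               # strength reduction: x * 8 => x << 3
    return (op, a, b)

print(simplify(("+", "x", 0)))    # 'x'
print(simplify(("*", "x", 8)))    # ('<<', 'x', 3)
print(simplify(("+", 3, 4)))      # 7

The x / 8 case is deliberately omitted: it is only a right shift when x is known to be non-negative.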

Local Constant Propagation

If variable assigned a constant value, replace downstream uses of the variable with the constant.
Can enable more constant folding

Example:
  final int count = 10;
  ...
  x = count * 5;
  y = x ^ 3;

Unoptimized intermediate code:
  t1 = 10;
  t2 = 5;
  t3 = t1 * t2;
  x = t3;
  t4 = x;
  t5 = 3;
  t6 = exp(t4, t5);
  y = t6;

Intermediate code after constant propagation (and folding):
  t1 = 10;
  t2 = 5;
  t3 = 50;
  x = 50;
  t4 = 50;
  t5 = 3;
  t6 = 125000;
  y = 125000;

Local Dead Assignment (Store) Elimination

If l.h.s. of assignment never referenced again before being overwritten, then can delete assignment.
Primary use: clean-up after previous optimizations!

Example:
  final int count = 10;
  ...
  x = count * 5;
  y = x ^ 3;
  x = 7;

After constant propagation and folding, the assignments to the temporaries t1–t6 and the assignment x = 50 are dead: their values are never read before being overwritten (or at all), so those stores can be deleted.
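
A sketch of both local passes on straight-line three-address code (assumed tuple form ("dst", "op", arg1, arg2), with "copy" for simple assignments and integer constants as arguments; exp() is left unfolded to keep the sketch short):

def const_prop_fold(block):
    consts, out = {}, []
    for dst, op, a, b in block:
        a = consts.get(a, a)                   # constant propagation into uses
        b = consts.get(b, b)
        if op == "copy" and isinstance(a, int):
            consts[dst] = a
        elif op in ("+", "*") and isinstance(a, int) and isinstance(b, int):
            a = a + b if op == "+" else a * b  # constant folding at compile time
            op, b = "copy", None
            consts[dst] = a
        else:
            consts.pop(dst, None)              # dst no longer a known constant
        out.append((dst, op, a, b))
    return out

def dead_store_elim(block, live_out):
    needed, out = set(live_out), []
    for dst, op, a, b in reversed(block):      # walk backwards through the block
        if dst in needed:                      # keep a store only if its value is read later
            out.append((dst, op, a, b))
            needed.discard(dst)
            needed.update(x for x in (a, b) if isinstance(x, str))
    return list(reversed(out))

Running const_prop_fold and then dead_store_elim (with only x and y live on exit) on the example block leaves just t6 = exp(50, 3); y = t6; x = 7, matching the point that dead-store elimination cleans up after propagation.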

Local Common Subexpression Elimination
(AKA Redundancy Elimination)

Avoid repeating the same calculation
• CSE of repeated loads: redundant load elimination
Keep track of available expressions

Source:
  ... a[i] + b[i] ...

Unoptimized intermediate code:
  t1 = *(fp + ioffset);
  t2 = t1 * 4;
  t3 = fp + t2;
  t4 = *(t3 + aoffset);
  t5 = *(fp + ioffset);
  t6 = t5 * 4;
  t7 = fp + t6;
  t8 = *(t7 + boffset);
  t9 = t4 + t8;

Redundancy Elimination Implementation

An expression x+y is redundant if and only if, along every path from the procedure's entry, it has been evaluated, and its constituent subexpressions (x & y) have not been re-defined.

If the compiler can prove that an expression is redundant
• It can preserve the results of earlier evaluations
• It can replace the current evaluation with a reference

Two pieces to the problem
• Proving that x+y is redundant
• Rewriting the code to eliminate the redundant evaluation

One technique for accomplishing both is called value numbering.
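
A sketch of the "keep track of available expressions" idea, keyed by operand names (same assumed tuple form as the earlier sketches; a simplified illustration, not the value numbering algorithm that follows):

def local_cse(block):
    avail, out = {}, []                    # (op, a, b) -> variable holding that value
    for dst, op, a, b in block:
        key = (op, a, b)
        if key in avail and avail[key] != dst:
            out.append((dst, "copy", avail[key], None))    # reuse the earlier result
        else:
            out.append((dst, op, a, b))
        # assigning dst kills expressions that read dst or whose result lived in dst
        avail = {k: v for k, v in avail.items() if dst not in (k[1], k[2], v)}
        if dst not in (a, b):
            avail.setdefault(key, dst)
    return out

On the a[i] + b[i] code this turns t5 = *(fp + ioffset) into t5 = t1, but it misses t6 = t5 * 4, because the table is keyed by names and t5 is only a copy of t1. A value-based table (value numbering, below) catches that case as well.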

Value Numbering (An old idea)

The key notion (Balke 1968 or Ershov 1954)
• Assign an identifying number, V(n), to each expression
  – V(x+y) = V(j) iff x+y and j have the same value ∀ path
  – Use hashing over the value numbers to make it efficient
• Use these numbers to improve the code

Improving the code
• Replace redundant expressions
• Simplify algebraic identities
• Discover constant-valued expressions, fold & propagate them

This technique was invented for low-level, linear IRs
Equivalent methods exist for trees (build a DAG)

Local Value Numbering
The Algorithm

For each operation o = <operator, o1, o2> in the block
1. Get value numbers for operands from hash lookup
2. Hash <operator, VN(o1), VN(o2)> to get a value number for o
3. If o already had a value number, replace o with a reference
4. If o1 & o2 are constant, evaluate it & replace with a load immediate

If hashing behaves, the algorithm runs in linear time

Handling algebraic identities
• Case statement on operator type
• Handle special cases within each operator
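
A compact sketch of these four steps (same assumed tuple IR as before; the extra "holder still valid" check guards against the overwritten-name problem shown in the example that follows):

def local_value_numbering(block):
    vn = {}                     # operand (name or constant) -> value number
    expr = {}                   # (op, vn1, vn2) -> value number
    holder = {}                 # value number -> a name currently holding it
    out, counter = [], [0]

    def number(x):              # step 1: get (or create) value numbers for operands
        if x is None:
            return 0
        if x not in vn:
            counter[0] += 1
            vn[x] = counter[0]
        return vn[x]

    for dst, op, a, b in block:
        key = (op, number(a), number(b))             # step 2: hash <op, VN(a), VN(b)>
        if op == "+" and isinstance(a, int) and isinstance(b, int):
            val = a + b                              # step 4: constant operands -> fold
            out.append((dst, "loadI", val, None))
            vn[dst] = number(val)
        elif key in expr and vn.get(holder.get(expr[key])) == expr[key]:
            vn[dst] = expr[key]                      # step 3: redundant -> reuse the value
            out.append((dst, "copy", holder[expr[key]], None))
        else:
            counter[0] += 1
            vn[dst] = expr[key] = counter[0]
            out.append((dst, op, a, b))
        holder[vn[dst]] = dst                        # dst now holds this value number
    return out

block = [("a", "+", "x", "y"), ("b", "+", "x", "y"),
         ("a", "copy", 17, None), ("c", "+", "x", "y")]
print(local_value_numbering(block))
# [('a', '+', 'x', 'y'), ('b', 'copy', 'a', None), ('a', 'copy', 17, None), ('c', 'copy', 'b', None)]
# c is rewritten as a copy of b, not of a -- the "use c <- b" option from the example below.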

Local Value Numbering
Example

Original Code       With VNs             Rewritten
a ← x + y           a³ ← x¹ + y²         a³ ← x¹ + y²
∗ b ← x + y         ∗ b³ ← x¹ + y²       ∗ b³ ← a³
a ← 17              a⁴ ← 17              a⁴ ← 17
∗ c ← x + y         ∗ c³ ← x¹ + y²       ∗ c³ ← a³  (oops!)

Two redundancies:
• Eliminate stmts with a ∗
• Coalesce results?

Options:
• Use c ← b
• Save a³ in t³
• Rename around it

Local Value Numbering
Example (continued)

Original Code       With VNs             Rewritten
a0 ← x0 + y0        a0³ ← x0¹ + y0²      a0³ ← x0¹ + y0²
∗ b0 ← x0 + y0      ∗ b0³ ← x0¹ + y0²    ∗ b0³ ← a0³
a1 ← 17             a1⁴ ← 17             a1⁴ ← 17
∗ c0 ← x0 + y0      ∗ c0³ ← x0¹ + y0²    ∗ c0³ ← a0³

Renaming:
• Give each value a unique name
• Makes it clear

Notation:
• While complex, the meaning is clear

Result:
• a0³ is still available
• Rewriting just works

Simple Extensions to Value Numbering

Constant folding
• Add a bit that records when a value is constant
• Evaluate constant values at compile-time
• Replace with load immediate or immediate operand
• No stronger local algorithm

Algebraic identities
• Must check (many) special cases
• Replace result with input VN
• Build a decision tree on operation
• Identities (on values, not names):
  x ← y, x + 0, x - 0, x ∗ 1, x ÷ 1, x - x, x ∗ 0, x ÷ x, x ∨ 0, x ∧ 0xFF…FF,
  max(x, MAXINT), min(x, MININT), max(x, x), min(y, y), and so on ...

Intraprocedural (Global) optimizations

• Enlarge scope of analysis to entire procedure
  – more opportunities for optimization
  – have to deal with branches, merges, and loops
• Can do constant propagation, common subexpression elimination, etc. at global level
• Can do new things, e.g. loop optimizations
Optimizing compilers usually work at this level

Intraprocedural (Global) Optimizations

Two data structures are commonly used to help analyze the procedure body.

Control flow graph (CFG) captures flow of control
– nodes are IL statements, or whole basic blocks
– edges represent control flow
– node with multiple successors = branch/switch
– node with multiple predecessors = merge
– loop in graph = loop

Data flow graph (DFG) captures flow of data
A common one is def/use chains:
– nodes are def(inition)s and uses
– edge from def to use
– a def can reach multiple uses
– a use can have multiple reaching defs

Control-flow Graph

Models the transfer of control in the procedure
• Nodes in the graph are basic blocks
  – Can be represented with quads or any other linear representation
• Edges in the graph represent control flow

Example:
        if (x = y)
       /          \
  a ← 2            a ← 3
  b ← 5            b ← 4
       \          /
        c ← a * b
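
A tiny CFG sketch matching the example above (hypothetical block names and statement strings, not a real IR):

class BasicBlock:
    def __init__(self, name, stmts):
        self.name, self.stmts = name, stmts
        self.succs, self.preds = [], []

def add_edge(a, b):
    a.succs.append(b)
    b.preds.append(a)

entry = BasicBlock("entry", ["if (x == y) goto then else goto else_"])
then  = BasicBlock("then",  ["a = 2", "b = 5"])
else_ = BasicBlock("else_", ["a = 3", "b = 4"])
merge = BasicBlock("merge", ["c = a * b"])

for src, dst in [(entry, then), (entry, else_), (then, merge), (else_, merge)]:
    add_edge(src, dst)

# entry has two successors (a branch); merge has two predecessors (a merge point)
print([b.name for b in entry.succs], [b.name for b in merge.preds])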

Data-flow Graph – Use/Def Chains

Models the transfer of data in the procedure
• Nodes in the graph are definitions and uses
• Edges in the graph represent data flow

Example (same code as above): edges run from each definition of a and b in the two branches to their uses in c ← a * b; each of those uses has two reaching definitions.

Analysis and Transformation

Each optimization is made up of
– some number of analyses
– followed by a transformation

Analyze CFG and/or DFG by propagating info forward or backward along CFG and/or DFG edges
– edges called program points
– merges in graph require combining info
– loops in graph require iterative approximation

Perform improving transformations based on info computed
– have to wait until any iterative approximation has converged

Analysis must be conservative/safe/sound so that transformations preserve program behavior

Data-flow Analysis

Data-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values
• Almost always involves building a graph
  – Problems are trivial on a basic block
  – Global problems ⇒ control-flow graph (or derivative)
  – Whole program problems ⇒ call graph (or derivative)
• Usually formulated as a set of simultaneous equations
  – Sets attached to nodes and edges
  – Lattice (or semilattice) to describe values
• Desired result is usually meet over all paths solution
  – "What is true on every path from the entry?"
  – "Can this happen on any path from the entry?"
  – Related to the safety of optimization

Data-flow Analysis

Limitations
1. Precision – "up to symbolic execution"
   – Assume all paths are taken
2. Solution – cannot afford to compute MOP solution
   – Large class of problems where MOP = MFP = LFP
   – Not all problems of interest are in this class
3. Arrays – treated naively in classical analysis
   – Represent whole array with a single fact
4. Pointers – difficult (and expensive) to analyze
   – Imprecision rapidly adds up
   – Need to ask the right questions

Good news:
Simple problems can carry us pretty far

Summary
For scalar values, we can quickly solve simple problems

Data-flow (Partial) Example

Liveness over a four-block CFG (b1 branches to b2 and b3, which both fall into b4):
  b1: a = b
  b2: b = 1
  b3: c = 1
  b4: d = a + b

Block   Def    LiveUse
b1      {a}    {b}
b2      {b}    ∅
b3      {c}    ∅
b4      {d}    {a,b}

Equations:
  LiveOut(b) = ∪ over i ∈ Succ(b) of LiveIn(i)
  LiveIn(b)  = LiveUse(b) ∪ (LiveOut(b) – Def(b))

Block   LiveIn    LiveOut
b1      {b}       {a,b}
b2      {a}       {a,b}
b3      {a,b}     {a,b}
b4      {a,b}     ∅

Example: Constant Propagation, Folding

Can use either the CFG or the DFG

Analysis info: table mapping each variable in scope to one of
• a particular constant
• NonConstant
• Undefined

Transformation: at each instruction:
– if reference a variable that the table maps to a constant, then replace with that constant (constant propagation)
– if r.h.s. expression involves only constants, and has no side-effects, then perform operation at compile-time and replace r.h.s. with constant result (constant folding)

For best analysis, do constant folding as part of analysis, to learn all constants in one pass
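
The LiveIn/LiveOut equations above can be solved by a simple backward iteration; a sketch, run on the same four blocks:

def liveness(blocks, succ, defs, uses):
    live_in  = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for b in reversed(blocks):          # reverse order converges faster for backward problems
            out_b = set().union(*[live_in[s] for s in succ[b]])
            in_b  = uses[b] | (out_b - defs[b])
            if out_b != live_out[b] or in_b != live_in[b]:
                live_out[b], live_in[b], changed = out_b, in_b, True
    return live_in, live_out

blocks = ["b1", "b2", "b3", "b4"]
succ   = {"b1": ["b2", "b3"], "b2": ["b4"], "b3": ["b4"], "b4": []}
defs   = {"b1": {"a"}, "b2": {"b"}, "b3": {"c"}, "b4": {"d"}}
uses   = {"b1": {"b"}, "b2": set(), "b3": set(), "b4": {"a", "b"}}
print(liveness(blocks, succ, defs, uses))
# b1: in {b} out {a,b}; b2: in {a} out {a,b}; b3: in {a,b} out {a,b}; b4: in {a,b} out {}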

Analysis of Loops

How to analyze a loop?

  i = 0;
  x = 10;
  y = 20;
  while (...) {
    // what's true here?
    ...
    i = i + 1;
    y = 30;
  }
  // what's true here?
  ... x ... i ... y ...

A safe but imprecise approach:
• forget everything when we enter or exit a loop
A precise but unsafe approach:
• keep everything when we enter or exit a loop
Can we do better?

Example Program

  x = 3;
  y = x * x;
  v = y - 2;
  if (y > 10) {
    x = 5;
    y = y + 1;
  } else {
    x = 6;
    y = x + 4;
  }
  w = y / v;
  if (v > 20) {
    z = w * w;
    x = x - z;
    y = y - 1;
  }
  System.out.println(x);

Loop Terminology

[Diagram of a loop: a preheader block, the entry edge into the loop head, the loop body, the back edge from the tail to the head, and the exit edge leaving the loop.]

Optimistic Iterative Analysis

1. Assume info at loop head is the same as info at loop entry
2. Then analyze loop body, computing info at back edge
3. Merge infos at loop back edge and loop entry
4. Test if merged info is same as original assumption
   a) If so, then we're done
   b) If not, then replace previous assumption with merged info, and go to step 2
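
A sketch of these four steps for the constants table used earlier (NonConstant is the worst case; analyze_body is a stand-in for analyzing the loop body once, here hard-coded to the loop i = i + 1; y = 30 from the examples):

NONCONST = "NonConstant"

def merge(t1, t2):                        # keep only facts that agree on both paths
    return {v: (t1[v] if t1.get(v) == t2.get(v) else NONCONST)
            for v in set(t1) | set(t2)}

def analyze_loop(entry_info, analyze_body):
    head = dict(entry_info)               # step 1: optimistic assumption at loop head
    while True:
        back_edge = analyze_body(head)    # step 2: analyze the body once
        merged = merge(entry_info, back_edge)   # step 3: merge back edge with entry
        if merged == head:                # step 4: assumption was stable -- done
            return head
        head = merged                     # otherwise retry with the merged info

def body(info):                           # loop body: i = i + 1; y = 30;
    out = dict(info)
    out["i"] = out["i"] + 1 if isinstance(out.get("i"), int) else NONCONST
    out["y"] = 30
    return out

print(analyze_loop({"i": 0, "x": 10, "y": 20}, body))
# -> i and y are NonConstant at the loop head, x is still 10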

Example

  i = 0;
  x = 10;
  y = 20;
  while (...) {
    // what's true here?
    ...
    i = i + 1;
    y = 30;
  }
  // what's true here?
  ... x ... i ... y ...

Why does optimistic iterative analysis work?

Why are the results always conservative?
Because if the algorithm stops, then
– the loop head info is at least as conservative as both the loop entry info and the loop back edge info
– the analysis within the loop body is conservative, given the assumption that the loop head info is conservative

Why does the algorithm terminate?
It might not!
But it does if:
– there are only a finite number of times we could merge values together without reaching the worst case info (e.g. NotConstant)

Loop Optimization - Code Motion

Goal: move loop-invariant calculations out of loops
Can do at source level or at intermediate code level

Source:
  for (i = 0; i < 10; i = i+1) {
    a[i] = a[i] + b[j];
    z = z + 10000;
  }

Transformed source:
  t1 = b[j];
  t2 = 10000;
  for (i = 0; i < 10; i = i+1) {
    a[i] = a[i] + t1;
    z = z + t2;
  }

Loop Optimization - Induction Variable Elimination

For-loop index is induction variable
• incremented each time around loop
• offsets & pointers calculated from it
If used only to index arrays, can rewrite with pointers
• compute initial offsets/pointers before loop
• increment offsets/pointers each time around loop
• no expensive scaling in loop

Source:
  for (i = 0; i < 10; i = i+1) {
    a[i] = a[i] + x;
  }

Transformed source:
  for (p = &a[0]; p < &a[10]; p = p+4) {
    *p = *p + x;
  }
then do loop-invariant code motion
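
A sketch of how a compiler might find candidates for code motion in the first example (assumed: the loop body as ("defined-name", operand...) tuples; real code motion also needs checks for side effects, dominance, and single definitions that are omitted here):

def loop_invariant_stmts(body):
    defined_in_loop = {dst for dst, *_ in body}
    invariant = set()
    changed = True
    while changed:                          # keep growing the set until stable
        changed = False
        for dst, *ops in body:
            if dst in invariant:
                continue
            if all(isinstance(o, int) or o not in defined_in_loop or o in invariant
                   for o in ops):
                invariant.add(dst)
                changed = True
    return invariant

# Loop body of the first example: a[i] = a[i] + b[j]; z = z + 10000; i = i + 1
body = [("t1", "b", "j"),         # t1 = b[j]
        ("a_i", "a", "i", "t1"),  # a[i] = a[i] + t1   (reads i)
        ("z", "z", 10000),        # z = z + 10000
        ("i", "i", 1)]            # i = i + 1
print(loop_invariant_stmts(body))   # {'t1'}  -- b[j] can be hoisted to the preheader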

Inlining

Replace procedure call with body of called procedure

Source:
  double pi = 3.1415927;
  ...
  double circle_area(double radius) {
    return pi * (radius * radius);
  }
  ...
  double r = 5.0;
  ...
  double a = circle_area(r);

After inlining:
  double r = 5.0;
  ...
  double a = pi * r * r;

(Then what?)

Interprocedural ("Whole Program") Optimizations

• Expand scope of analysis to procedures calling each other
• Can do local & intraprocedural optimizations at larger scope
• Can do new optimizations, e.g. inlining

Summary

Enlarging scope of analysis yields better results
– today, most optimizing compilers work at the intraprocedural (global) level

Optimizations organized as collections of passes, each rewriting IL in place into better version

Presence of optimizations makes other parts of compiler (e.g. intermediate and target code generation) easier to write

Additional Material

Another example: live variables

Want the set of live variables at each pt. in program
– live: might be used later in the program
Supports dead assignment elimination

What info computed for each program point?
What is the requirement for this info to be conservative?
How to merge two infos conservatively?
How to analyze an assignment, e.g. X := Y + Z?
– given liveVars before (or after?), what is computed after (or before?)
What is live at procedure entry (or exit?)?

Example

  x := read()
  y := x * 2;
  z := sin(y)
  z := z + 1
  y := x + 10;
  return y
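
One way to answer those questions, sketched for the straight-line example: the info is a set of live variables, merging is set union, and an assignment x := y + z is analyzed backward with live_before = (live_after - {x}) ∪ {y, z}:

def live_before(stmt_def, stmt_uses, live_after):
    return (live_after - {stmt_def}) | set(stmt_uses)

# The example program, as (defined-variable, used-variables) pairs:
program = [("x", []),          # x := read()
           ("y", ["x"]),       # y := x * 2
           ("z", ["y"]),       # z := sin(y)
           ("z", ["z"]),       # z := z + 1
           ("y", ["x"]),       # y := x + 10
           (None, ["y"])]      # return y

live = set()                   # nothing is live after the return
for d, uses in reversed(program):
    dead = d is not None and d not in live      # target never read afterwards
    live = live_before(d, uses, live)
    print(d, uses, "dead" if dead else "", "live before:", sorted(live))
# One pass flags z := z + 1 as dead (its target is not live afterwards and its
# r.h.s. has no side effects); deleting it and re-running also flags z := sin(y).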

Peephole Optimization of Jumps

Eliminate jumps to jumps
Eliminate jumps after conditional branches
"Adjacent" instructions = "adjacent in control flow"

Source code:
  if (a < b) {
    if (c < d) {
      // do nothing
    } else {
      stmt1;
    }
  } else {
    stmt2;
  }

IL: [branching code with conditional jumps and jumps to jumps; listing not recovered]

Global Register Allocation

Try to allocate local variables to registers
If life times of two locals don't overlap, can give to same register
Try to allocate most-frequently-used variables to registers first

Example:
  int foo(int n, int x) {
    int sum; int i; int t;
    sum = x;
    for (i = n; i > 0; i=i-1) {
      sum = sum + i;
    }
    t = sum * sum;
    return t;
  }
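
A sketch of the non-overlapping-lifetimes idea (hypothetical live ranges as (start, end) instruction indices for foo's locals; a simplified linear-scan style assignment that ignores spilling heuristics and use frequency):

def assign_registers(live_ranges, num_regs):
    assignment, reg_free_at = {}, [0] * num_regs     # index at which each register frees up
    for var, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        for r in range(num_regs):
            if reg_free_at[r] <= start:              # lifetimes don't overlap: share the register
                assignment[var] = r
                reg_free_at[r] = end
                break
        else:
            assignment[var] = "spill"                # no free register
    return assignment

# Rough, made-up ranges: i and sum overlap in the loop; t starts after sum's last use.
print(assign_registers({"sum": (1, 7), "i": (2, 6), "t": (7, 8)}, num_regs=2))
# {'sum': 0, 'i': 1, 't': 0}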

Handling Larger Scopes

Extended Basic Blocks
• Initialize table for bi with table from bi-1
• With single-assignment naming, can use scoped hash table

[Figure: a CFG whose extended basic block rooted at b1 contains b2, b3, and b4; b5 and b6 start other EBBs because they have multiple predecessors.]

The Plan:
→ Process b1, b2, b4
→ Pop two levels, process b3 relative to b1
→ Start clean with b5
→ Start clean with b6

Using a scoped table makes doing the full tree of EBBs that share a common header efficient.

Handling Larger Scopes

Otherwise, it is complex
To go further, we must deal with merge points
• Our simple naming scheme falls apart in b4
• We need more powerful analysis tools
• Naming scheme becomes SSA

[Figure: a CFG where b2 and b3 both flow into b4, a merge point.]

This requires global data-flow analysis
"Compile-time reasoning about the run-time flow of values"
1. Build a model of control-flow
2. Pose questions as sets of simultaneous equations
3. Solve the equations
4. Use solution to transform the code

Examples: LIVE, REACHES, AVAIL
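
A sketch of the scoped table mentioned above: one dictionary per block on the current EBB path, pushed on entry and popped when backing out, so facts recorded in b1 are visible in b2 and b4 but are gone when a fresh EBB starts (block names from the figure; the local-value-numbering work per block is reduced to a stand-in):

class ScopedTable:
    def __init__(self):
        self.scopes = [{}]
    def push(self):
        self.scopes.append({})
    def pop(self):
        self.scopes.pop()
    def insert(self, k, v):
        self.scopes[-1][k] = v
    def lookup(self, k):
        for scope in reversed(self.scopes):    # innermost scope wins
            if k in scope:
                return scope[k]
        return None

def process(block, table, children, facts):
    table.push()
    for k, v in facts.get(block, {}).items():  # stand-in for running LVN on the block
        table.insert(k, v)
    print(block, "sees x+y ->", table.lookup(("x", "+", "y")))
    for child in children.get(block, []):
        process(child, table, children, facts)
    table.pop()                                # facts local to this path disappear

facts = {"b1": {("x", "+", "y"): "a"}}         # b1 computes a <- x + y
process("b1", ScopedTable(), {"b1": ["b2", "b3"], "b2": ["b4"]}, facts)
# b1, b2, b4, and b3 all see the fact recorded in b1; a new EBB (e.g. b5) starts clean.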
