Compilers

Optimization

Yannis Smaragdakis, U. Athens (original slides by Sam Guyer @ Tufts)

What we already saw
- Lowering: from language-level constructs to machine-level constructs
- At this point we could generate machine code
  - Output of lowering is a correct translation
- What's left to do?
  - Map from the lower-level IR to the actual ISA
  - Maybe some register management (could be required)
  - Pass off to an assembler
- Why have a separate assembler?
  - Handles "packing the bits"

    Assembly:  addi rt, rs, imm
    Machine:   0010 00ss ssst tttt iiii iiii iiii iiii

But first …
- The compiler "understands" the program
  - IR captures program semantics
  - Lowering: semantics-preserving transformation
  - Why not do other transformations?
- Compiler optimizations
  - Oh great, now my program will be optimal!
  - Sorry, it's a misnomer
- What is an "optimization"?

Optimizations
- What are they?
  - Code transformations
  - Improve some metric
- Metrics
  - Performance: time, instructions, cycles
    (Are these metrics equivalent?)
  - Memory
    - Memory hierarchy (reduce cache misses)
    - Reduce memory usage
  - Code size
  - Energy

Big picture

[Compiler pipeline diagram: chars → Scanner → tokens → Parser → AST → Semantic checker (reports errors) → Checked AST → IR lowering → tuples → Instruction selection → assembly (∞ registers) → Register allocation → assembly (k registers) → Assembler → Machine code. The diagram is then shown again with an Optimizations stage added to the picture.]

Why optimize?
- High-level constructs may make some optimizations difficult or impossible:

    A[i][j] = A[i][j-1] + 1        t = A + i*row + j
                                   s = A + i*row + j - 1
                                   (*t) = (*s) + 1

- High-level code may be more desirable
  - Program at a high level
  - Focus on design; clean, modular implementation
  - Let the compiler worry about the gory details
  - Premature optimization is the root of all evil!

Limitations
- What are optimizers good at?
  - Consistent and thorough
  - Find all opportunities for an optimization
  - Uniformly apply the transformation
- What are they not good at?
  - Asymptotic complexity
  - Compilers can't fix bad algorithms
  - Compilers can't fix bad data structures
- There's no magic

Requirements
- Safety
  - Preserve the semantics of the program
  - What does that mean?
- Profitability
  - Will it help our metric?
  - Do we need a guarantee of improvement?
- Risk
  - How will it interact with other optimizations?
  - How will it affect other stages of compilation?

Example: loop unrolling

    for (i=0; i<100; i++)      for (i=0; i<25; i++) {
      *t++ = *s++;               *t++ = *s++;
                                 *t++ = *s++;
                                 *t++ = *s++;
                                 *t++ = *s++;
                               }

- Safety:
  - Always safe; getting the loop conditions right can be tricky
- Profitability
  - Depends on hardware; usually a win. Why?
- Risk?
  - Increases the size of the code in the loop
  - May not fit in the instruction cache

Optimizations
- Many, many optimizations invented
  - Constant propagation, tail-call elimination, redundancy elimination,
    loop-invariant code motion, loop splitting, loop fusion, array scalarization,
    inlining, cloning, data prefetching, parallelization, etc.
- How do they interact?
  - Optimist: we get the sum of all improvements!
  - Realist: many are in direct opposition

Rough categories
- Traditional optimizations
  - Transform the program to reduce work
  - Don't change the level of abstraction
- Resource allocation
  - Map the program to specific hardware properties
  - Parallelism
  - Data streaming, prefetching
- Enabling transformations
  - Don't necessarily improve code on their own
  - Inlining, loop unrolling

Constant propagation
- Idea
  - If the value of a variable is known to be a constant at compile-time,
    replace the use of the variable with the constant

    n = 10;                        n = 10;
    c = 2;                         c = 2;
    for (i = 0; i < n; i++)        for (i = 0; i < 10; i++)
      ... c ...                      ... 2 ...

Constant folding
- Idea
  - If the operands are known at compile-time, evaluate the expression at compile-time

    r = 3.141 * 10;      =>      r = 31.41;

- What do we need to be careful of?
  - Is the result the same as if it were executed at runtime?
  - Overflow/underflow, rounding and numeric stability (see the sketch below)
- Often repeated throughout the compiler
  - e.g., lowering x = A[2] produces

      t1 = 2*4;
      t2 = A + t1;
      x = *t2;

    where t1 = 2*4 can then be folded to t1 = 8
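A minimal sketch of the "be careful" point, assuming a GCC/Clang-style host compiler that provides __builtin_mul_overflow (the helper name and the IR it operates on are made up):

    #include <stdbool.h>

    /* Fold lhs * rhs at compile time only when the product fits in a long,
       i.e., only when the folded result matches what the runtime would compute.
       __builtin_mul_overflow (GCC/Clang) reports whether the multiply overflows. */
    static bool try_fold_mul(long lhs, long rhs, long *folded) {
        return !__builtin_mul_overflow(lhs, rhs, folded);
    }

For the lowered t1 = 2*4 above, try_fold_mul(2, 4, &t1) succeeds and the tuple becomes t1 = 8.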

Partial evaluation
- Constant propagation and folding together
- Idea:
  - Evaluate as much of the program at compile-time as possible
- More sophisticated schemes:
  - Simulate data structures, arrays
  - Symbolic execution of the code
- Caveat: floating point
  - Preserving the error characteristics of floating-point values

Algebraic simplification
- Idea:
  - Apply the usual algebraic rules to simplify expressions

    a * 1         =>   a
    a / 1         =>   a
    a * 0         =>   0
    a + 0         =>   a
    b || false    =>   b

- Repeatedly apply to complex expressions
- Many, many possible rules
- Associativity and commutativity come into play

Common sub-expression elimination
- Idea:
  - If the program computes the same expression multiple times, reuse the value

    a = b + c;          t = b + c;
    c = b + c;          a = t;
    d = b + c;          c = t;
                        d = b + c;

- Safety:
  - A subexpression can only be reused until its operands are redefined
    (here c is redefined, so d = b + c must still be recomputed)
- Often occurs in address computations
  - Array indexing and struct/field accesses

Dead code elimination
- Idea:
  - If the result of a computation is never used, then we can remove the computation

    x = y + 1;
    y = 1;              y = 1;
    x = 2 * z;          x = 2 * z;

- Safety
  - A variable is dead if it is never used after it is defined
  - Remove code that assigns to dead variables
  - This may, in turn, create more dead code
  - Dead-code elimination usually works transitively

Copy propagation
- Idea:
  - After an assignment x = y, replace any uses of x with y

    x = y;                 x = y;
    if (x > 1)             if (y > 1)
      s = x + f(x);          s = y + f(y);

- Safety:
  - Only apply up to another assignment to x, or
  - ... another assignment to y!
  - What if there was an assignment y = z earlier?
- Apply transitively to all assignments

Unreachable code elimination
- Idea:
  - Eliminate code that can never be executed

    #define DEBUG 0
    . . .
    if (DEBUG)
      print("Current value = ", v);

- Different implementations
  - High-level: look for if (false) or while (false)
  - Low-level: more difficult
    - Code is just labels and gotos
    - Traverse the graph, marking reachable blocks (sketch below)
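A minimal sketch of the low-level approach, assuming a hypothetical basic-block structure with successor pointers: everything reachable from the entry block gets marked; whatever stays unmarked can never execute and may be deleted.

    #define MAX_SUCCS 2

    typedef struct Block {
        struct Block *succ[MAX_SUCCS];   /* successor blocks (may be NULL) */
        int nsucc;
        int reachable;                   /* mark bit, initially 0 */
    } Block;

    /* Depth-first traversal from the entry block. */
    static void mark_reachable(Block *b) {
        if (b == NULL || b->reachable)
            return;
        b->reachable = 1;
        for (int i = 0; i < b->nsucc; i++)
            mark_reachable(b->succ[i]);
    }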

How do these things happen?
- Who would write code with:
  - Dead code
  - Common subexpressions
  - Constant expressions
  - Copies of variables
- Two ways they occur
  - High-level constructs: we've already seen examples
  - Other optimizations
    - Copy propagation often leaves dead code
    - Enabling transformations: inlining, loop unrolling, etc.

Loop optimizations
- Program hot-spots are usually in loops
  - Most programs: 90% of execution time is in loops
  - What are possible exceptions? OS kernels, compilers and interpreters
- Loops are a good place to expend extra effort
- Numerous loop optimizations
  - Often expensive: complex analysis
  - For languages like Fortran, very effective
  - What about C?

Loop-invariant code motion
- Idea:
  - If a computation won't change from one loop iteration to the next,
    move it outside the loop (a fuller example follows below)

    t1 = x*x;
    for (i = 0; i < ...; i++)
      ... t1 ...
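A hypothetical, self-contained instance of the transformation (the array A, bound n, and variable x are illustrative):

    /* Before: x*x is recomputed on every iteration even though x never
       changes inside the loop. */
    void before(int *A, int n, int x) {
        for (int i = 0; i < n; i++)
            A[i] = A[i] + x * x;
    }

    /* After loop-invariant code motion: compute it once, outside the loop. */
    void after(int *A, int n, int x) {
        int t1 = x * x;
        for (int i = 0; i < n; i++)
            A[i] = A[i] + t1;
    }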

Strength reduction
- Idea:
  - Replace expensive operations (multiplication, division) with cheaper ones
    (addition, subtraction, bit shift)
- Traditionally applied to induction variables
  - Variables whose value depends linearly on the loop count
  - Special analysis to find such variables
    (a hypothetical before/after follows below)
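A hypothetical before/after for the induction-variable case (assuming A has at least 2*n elements): the multiply in the index expression is replaced by a running addition.

    /* Before: the index is recomputed with a multiply on every iteration. */
    void before(int *A, int n) {
        for (int i = 0; i < n; i++)
            A[i * 2] = 0;
    }

    /* After strength reduction: v tracks i * 2 using only additions. */
    void after(int *A, int n) {
        int v = 0;
        for (int i = 0; i < n; i++) {
            A[v] = 0;
            v = v + 2;
        }
    }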

Strength reduction
- Can also be applied to simple arithmetic operations:

    x * 2      =>   x + x
    x * 2^c    =>   x << c
    x / 2^c    =>   x >> c

- Typical example of premature optimization
  - Programmers use bit-shifts instead of multiplication
  - "x << 2" is harder to understand
  - Most compilers will get it right automatically

Inlining
- Idea:
  - Replace a function call with the body of the callee
- Safety
  - What about recursion?
- Risk
  - Code size
  - Most compilers use heuristics to decide when
  - Has been cast as a knapsack problem

Inlining
- What are the benefits of inlining?
- Eliminate call/return overhead
- Customize the callee code in the context of the caller
  - Use the actual arguments
  - Push them into the copy of the callee code using constant propagation
  - Apply other optimizations to reduce the code (example below)
- Hardware
  - Eliminate the two jumps
  - Keep the pipeline filled
- Critical for OO languages
  - Methods are often small
  - Encapsulation, modularity force code apart
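A hypothetical C illustration of customizing the callee in the context of the caller: once scale() is inlined at a call site whose second argument is a constant, constant propagation and algebraic simplification finish the job (all names are made up).

    static int scale(int x, int factor) { return x * factor; }

    int caller(int x) {
        return scale(x, 1);        /* original call site */
    }

    /* After inlining scale() with factor = 1 ... */
    int caller_inlined(int x) {
        return x * 1;
    }

    /* ... and after algebraic simplification of x * 1. */
    int caller_optimized(int x) {
        return x;
    }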

Inlining
- In C:
  - At a call-site, decide whether or not to inline
    (often a heuristic about callee/caller size)
  - Look up the callee
  - Replace the call with the body of the callee
- What about Java? What complicates this?
  - Virtual methods
  - Even worse? Dynamic class loading

    class A           { void M() {…} }
    class B extends A { void M() {…} }

    void foo(A x) {
      x.M();    // which M?
    }

Inlining in Java
- With guards:

    void foo(A x) {
      if (x is type A) x.M();    // inline A's M
      if (x is type B) x.M();    // inline B's M
    }

- Specialization
  - At a given call, we may be able to determine the type

      y = new A();   foo(y);
      z = new B();   foo(z);

  - Requires fancy analysis

Big picture
- When do we apply these optimizations?
- High-level:
  - Inlining, cloning
  - Some algebraic simplifications
- Low-level:
  - Everything else
- It's a black art
  - Ordering is often arbitrary
  - Many compilers just repeat the optimization passes over and over

Writing fast programs

In practice:
- Pick the right algorithms and data structures
  - Asymptotic complexity and constants
  - Memory usage, memory layout, data representation
- Turn on optimization and profile
  - Run-time
  - Program counters (e.g., cache misses)
- Evaluate problems
- Tweak source code
  - Help the optimizer do "the right thing"

Anatomy of an optimization

Two big parts:
- Program analysis
  - Pass over the code to:
    - Find opportunities
    - Check safety constraints
- Program transformation
  - Change the code to exploit the opportunity
- Often: rinse and repeat

Dead code elimination
- Idea:
  - Remove a computation if its result is never used

    y = w - 7;        y = w - 7;
    x = y + 1;        y = 1;            y = 1;
    y = 1;            x = 2 * z;        x = 2 * z;
    x = 2 * z;

- Safety
  - A variable is dead if it is never used after it is defined
  - Remove code that assigns to dead variables
  - This may, in turn, create more dead code
  - Dead-code elimination usually works transitively

Dead code
- Another example:

    x = y + 1;
    y = 2 * z;          Which statements
    x = y + z;          can be safely
    z = 1;              removed?
    z = x;

- Conditions:
  - Computations whose value is never used
  - Obvious for straight-line code
  - What about control flow?

Dead code
- With if-then-else:

    x = y + 1;
    y = 2 * z;          Which statements
    if (c)              can be removed?
      x = y + z;
    z = 1;
    z = x;

- Which statements are dead code?
  - What if "c" is false?
  - Dead only on some paths through the code

Dead code
- And a loop:

    while (p) {
      x = y + 1;        Which statements
      y = 2 * z;        can be removed?
      if (c)
        x = y + z;
      z = 1;
    }
    z = x;

- Now which statements are dead code?

Dead code
- And a loop:

    while (p) {
      x = y + 1;        Which statements
      y = 2 * z;        can be removed?
      if (c)
        x = y + z;
      z = 1;
    }
    z = x;

- Statement "x = y + 1" is not dead
- What about "z = 1"?

Low-level IR
- Most optimizations are performed on the low-level IR
  - Labels and jumps
  - No explicit loops
  - Even harder to see possible paths

    label1:
      jumpifnot p label2
      x = y + 1
      y = 2 * z
      jumpifnot c label3
      x = y + z
    label3:
      z = 1
      jump label1
    label2:
      z = x

Optimizations and control flow
- Dead code is flow sensitive
  - Not obvious from the program text
  - Dead code example: are there any possible paths that make use of the value?
  - Must characterize all possible dynamic behavior
  - Must verify conditions at compile-time
- Control flow makes it hard to extract information
  - High-level: different kinds of control structures
  - Low-level: control flow is hard to infer
- Need a unifying data structure

Control flow graph
- Control flow graph (CFG): a graph representation of the program
  - Includes both computation and control flow
  - Easy to check control-flow properties
  - Provides a framework for global optimizations and other compiler passes
- Nodes are basic blocks
  - Consecutive sequences of non-branching statements
- Edges represent control flow
  - From a jump to a label
  - Each block may have multiple incoming/outgoing edges
  - (one possible representation is sketched below)
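One possible (hypothetical) C representation of such a graph; a real compiler stores the same information, usually more compactly:

    typedef struct Instr Instr;             /* a non-branching statement/tuple */

    typedef struct BasicBlock {
        Instr **instrs;                     /* consecutive non-branching instructions */
        int     ninstrs;
        struct BasicBlock **succs;          /* outgoing control-flow edges */
        int     nsuccs;
        struct BasicBlock **preds;          /* incoming control-flow edges */
        int     npreds;
    } BasicBlock;

    typedef struct CFG {
        BasicBlock  *entry;                 /* where execution starts */
        BasicBlock **blocks;                /* all blocks in the function */
        int          nblocks;
    } CFG;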

CFG Example

    Program:                      Control flow graph:

    x = a + b;                    BB1:  x = a + b;
    y = 5;                              y = 5;
    if (c) {                            if (c)
      x = x + 1;                      T /      \ F
      y = y + 1;                  BB2:            BB3:
    } else {                        x = x + 1;      x = x - 1;
      x = x - 1;                    y = y + 1;      y = y - 1;
      y = y - 1;                          \        /
    }                             BB4:   z = x + y;
    z = x + y;

Multiple program executions
- The CFG models all program executions
  (same CFG as above: BB1 branches on c to BB2 or BB3; both reach BB4)
- An actual execution is a path through the graph
- Multiple paths: multiple possible executions
  - How many?

Execution 1
- The CFG models all program executions
- Execution 1:
  - c is true
  - The program executes BB1, BB2, and BB4

Execution 2
- The CFG models all program executions
- Execution 2:
  - c is false
  - The program executes BB1, BB3, and BB4

Basic blocks
- Idea:
  - Once execution enters the sequence, all statements (or instructions) are executed
  - Single-entry, single-exit region
- Details
  - Starts with a label
  - Ends with one or more branches
  - Edges may be labeled with predicates
  - May include special categories of edges:
    - Exception jumps
    - Fall-through edges
    - Computed jumps (jump tables)

Building the CFG

- Two passes
  - First, group instructions into basic blocks (sketched below)
  - Second, analyze jumps and labels
- How to identify basic blocks?
  - Non-branching instructions:
    control cannot flow out of a basic block without a jump
  - Non-label instructions:
    control cannot enter the middle of a block without a label
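A minimal sketch of the first pass over a flat instruction array (the opcode names are made up): an instruction begins a new basic block if it is a label or if it follows a jump.

    typedef enum { OP_LABEL, OP_JUMP, OP_CJUMP, OP_OTHER } Opcode;

    typedef struct { Opcode op; /* ... operands ... */ } Instr;

    /* Mark the instructions that begin a basic block ("leaders").
       A second pass (not shown) cuts the stream at each leader and adds
       edges by matching jumps to their target labels. */
    static void find_leaders(const Instr *code, int n, int *is_leader) {
        for (int i = 0; i < n; i++)
            is_leader[i] = 0;
        if (n > 0)
            is_leader[0] = 1;                     /* function entry */
        for (int i = 0; i < n; i++) {
            if (code[i].op == OP_LABEL)
                is_leader[i] = 1;                 /* control can enter here */
            if ((code[i].op == OP_JUMP || code[i].op == OP_CJUMP) && i + 1 < n)
                is_leader[i + 1] = 1;             /* control leaves after a jump */
        }
    }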

Basic blocks
- A basic block starts:
  - At a label
  - After a jump
- A basic block ends:
  - At a jump
  - Before a label

    label1:
      jumpifnot p label2
      x = y + 1
      y = 2 * z
      jumpifnot c label3
      x = y + z
    label3:
      z = 1
      jump label1
    label2:
      z = x

Basic blocks
- A basic block starts: at a label, or after a jump
- A basic block ends: at a jump, or before a label
- Note: order still matters
  (same IR as above, now cut into blocks)

Add edges
- Unconditional jump
  - Add an edge from the source of the jump to the block containing the target label
- Conditional jump
  - 2 successors
  - One may be the fall-through block
- Fall-through

Two CFGs
- From the high-level
  - Break down the complex constructs
  - Stop at sequences of non-control-flow statements
  - Requires special handling of break, continue, goto
- From the low-level
  - Start with the lowered IR: tuples, or 3-address ops
  - Build up the graph
  - More general algorithm
  - Most compilers use this approach
- Should lead to roughly the same graph

Using the CFG
- Uniform representation of program behavior
  - Shows all possible program behavior
  - Each execution is represented as a path
  - Can reason about potential behavior:
    which paths can happen, which can't
  - Possible paths imply possible values of variables
- Example: liveness information
- Idea:
  - Define program points in the CFG
  - Describe how information flows between points

Program points
- In between instructions
  - Before each instruction
  - After each instruction
- May have multiple successors or predecessors

    •
    x = y + 1
    •
    y = 2*z
    •
    if (c)
    •
    x = y + z
    •
    z = 1
    •

Live variables analysis
- Idea
  - Determine the live range of a variable:
    the region of code between where the variable is assigned and where its value is used
- Specifically:
  - Def: A variable v is live at point p if
    - There is a path through the CFG from p to a use of v, and
    - There are no assignments to v along the path
  - Compute a set of live variables at each point p
- Uses of live variables:
  - Dead-code elimination: find unused computations
  - Also: register allocation, garbage collection

Computing live variables
- How do we compute live variables?
  (Specifically, a set of live variables at each program point)
- What is a straightforward algorithm?
  - Start at the uses of v, search backward through the CFG
  - Add v to the live variable set for each point visited
  - Stop when we hit an assignment to v
- Can we do better?
  - Can we compute liveness for all variables at the same time?
- Idea:
  - Maintain a set of live variables
  - Push the set through the CFG, updating it at each instruction

Flow of information
- Question 1: how does information flow across instructions?
- Question 2: how does information flow between predecessor and successor blocks?

Live variables analysis
- At each program point: which variables contain values computed earlier and needed later?
- For an instruction I:
  - in[I]  : live variables at the program point before I
  - out[I] : live variables at the program point after I
- For a basic block B:
  - in[B]  : live variables at the beginning of B
  - out[B] : live variables at the end of B
- Note: in[I] = in[B] for the first instruction of B,
        out[I] = out[B] for the last instruction of B

Computing liveness
- Answer question 1: for each instruction I, what is the relation between in[I] and out[I]?

      in[I]
        I
      out[I]

- Answer question 2: for each basic block B, with successors B1, …, Bn,
  what is the relationship between out[B] and in[B1] … in[Bn]?

            B
          out[B]
         /      \
     in[B1] …  in[Bn]
       B1   …    Bn

Part 1: Analyze instructions
- Live variables across instructions
- Examples:

    in[I]  = {y,z}       in[I]  = {y,z,t}       in[I]  = {x,t}
    x = y + z            x = y + z              x = x + 1
    out[I] = {x}         out[I] = {x,t,y}       out[I] = {x,t}

- Is there a general rule?

Liveness across instructions
- How is liveness determined?
- All variables that I uses are live before I
  - Called the uses of I

      in[I] = {b}
      a = b + 2

- All variables live after I are also live before I, unless I writes to them
  - The variables that I writes are called the defs of I

      in[I]  = {y,z}
      x = 5
      out[I] = {x,y,z}

- Mathematically (code sketch below):

    in[I] = ( out[I] - def[I] ) ∪ use[I]
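The same equation as a minimal code sketch, representing each set of variables as a bit mask with one bit per variable (the types and names are hypothetical):

    #include <stdint.h>

    typedef uint64_t VarSet;     /* bit i is set  <=>  variable i is in the set */

    typedef struct {
        VarSet use;              /* variables read by the instruction */
        VarSet def;              /* variables written by the instruction */
    } Instr;

    /* in[I] = (out[I] - def[I]) ∪ use[I] */
    static VarSet liveness_in(const Instr *I, VarSet out) {
        return (out & ~I->def) | I->use;
    }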

Example
- Single basic block (obviously: out[I] = in[succ(I)])

    Live1
      x = y+1
    Live2
      y = 2*z
    Live3
      if (d)
    Live4

- Live1 = in[B] = in[I1]
- Live2 = out[I1] = in[I2]
- Live3 = out[I2] = in[I3]
- Live4 = out[I3] = out[B]
- Relation between the live sets:
  - Live1 = (Live2 - {x}) ∪ {y}
  - Live2 = (Live3 - {y}) ∪ {z}
  - Live3 = (Live4 - {}) ∪ {d}

Flow of information
- Equation: in[I] = ( out[I] - def[I] ) ∪ use[I]

    Live1
      x = y+1
    Live2
      y = 2*z
    Live3
      if (d)
    Live4

- Notice: information flows backwards
  - Need the out[] sets to compute the in[] sets
  - Propagate information up
- Many problems are forward
  - Common sub-expressions, constant propagation, others

Part 2: Analyze control flow
- Question 2: for each basic block B, with successors B1, …, Bn, what is the
  relationship between out[B] and in[B1] … in[Bn]?
- Example:

              B        out = { }
            /   \
      in={ }     in={ }
      B1:   …    Bn:
      w=x+z;     q=x+y;

- What's the general rule?

Control flow
- Rule: A variable is live at the end of block B if it is live at the beginning
  of any of B's successors
  - Characterizes all possible executions
  - Conservative: some paths may not actually happen
- Mathematically:

    out[B] = ∪ in[B'],   B' ∈ succ(B)

- Again: information flows backwards

System of equations
- Put the parts together:

    in[I]  = ( out[I] - def[I] ) ∪ use[I]
    out[I] = in[succ(I)]
    out[B] = ∪ in[B'],   B' ∈ succ(B)

  Often called a system of dataflow equations
- Defines a system of equations (or constraints)
  - Consider equation instances for each instruction and each basic block
- What happens with loops?
  - Circular dependences in the constraints
  - Is that a problem?

Solving the problem
- Iterative solution:
  - Start with empty sets of live variables
  - Iteratively apply the constraints
  - Stop when we reach a fixpoint

    For all instructions: in[I] = out[I] = ∅
    Repeat
      For each instruction I
        in[I]  = ( out[I] - def[I] ) ∪ use[I]
        out[I] = in[succ(I)]
      For each basic block B
        out[B] = ∪ in[B'],  B' ∈ succ(B)
    Until no new changes in the sets

  (a C sketch of this iteration appears after the worked example below)

Example
- Steps:
  - Set up live sets for each program point
  - Instantiate the equations
  - Solve the equations

    if (c)
    x = y+1
    y = 2*z
    if (d)
    x = y+z
    z = 1
    z = x

Example
- Program points (L1 … L12, placed before and after each statement in the CFG):

    L1
      if (c)
    L2  L3
      x = y+1
    L4
      y = 2*z
    L5
      if (d)
    L6  L7
      x = y+z
    L8  L9
      z = 1
    L10
    L11
      z = x
    L12

Example

    Equations                          Solution
    L1  = L2 ∪ {c}                     L1  = { x, y, z, c, d }
    L2  = L3 ∪ L11                     L2  = { x, y, z, c, d }
    L3  = (L4 - {x}) ∪ {y}             L3  = { y, z, c, d }
    L4  = (L5 - {y}) ∪ {z}             L4  = { x, z, c, d }
    L5  = L6 ∪ {d}                     L5  = { x, y, z, c, d }
    L6  = L7 ∪ L9                      L6  = { x, y, z, c, d }
    L7  = (L8 - {x}) ∪ {y, z}          L7  = { y, z, c, d }
    L8  = L9                           L8  = { x, y, c, d }
    L9  = L10 - {z}                    L9  = { x, y, c, d }
    L10 = L1                           L10 = { x, y, z, c, d }
    L11 = (L12 - {z}) ∪ {x}            L11 = { x }
    L12 = {}                           L12 = { }
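A compact sketch of the iterative solution in code, reusing the bit-mask representation from the earlier sketch; the successor encoding, use/def sets, and all names are hypothetical.

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_SUCC 2

    typedef uint64_t VarSet;                  /* one bit per variable */

    typedef struct {
        VarSet use, def;                      /* use/def sets of the instruction */
        int    succ[MAX_SUCC];                /* indices of successors, -1 if unused */
        VarSet in, out;                       /* the sets being solved for */
    } Instr;

    /* Iterate  in[I] = (out[I] - def[I]) ∪ use[I],  out[I] = ∪ in[succ(I)]
       until nothing changes (a fixpoint). */
    static void solve_liveness(Instr *code, int n) {
        for (int i = 0; i < n; i++)
            code[i].in = code[i].out = 0;     /* start with empty sets */
        bool changed = true;
        while (changed) {
            changed = false;
            for (int i = n - 1; i >= 0; i--) {           /* backward order converges faster */
                VarSet out = 0;
                for (int s = 0; s < MAX_SUCC; s++)
                    if (code[i].succ[s] >= 0)
                        out |= code[code[i].succ[s]].in; /* union over successors */
                VarSet in = (out & ~code[i].def) | code[i].use;
                if (in != code[i].in || out != code[i].out) {
                    code[i].in = in;
                    code[i].out = out;
                    changed = true;
                }
            }
        }
    }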

Questions
- Does this terminate?
- Does this compute the right answer?
- How could we generalize this scheme to other kinds of analysis?

Generalization
- Dataflow analysis
  - A common framework for such analyses
  - Computes information at each program point
  - Conservative: characterizes all possible program behaviors
- Methodology
  - Describe the information (e.g., live variable sets) using a structure called a lattice
  - Build a system of equations based on:
    - How each statement affects the information
    - How information flows between basic blocks
  - Solve the system of constraints

Parts of live variables analysis
- Live variable sets
  - Called flow values
  - Associated with program points
  - Start "empty", eventually contain the solution
- Effects of instructions
  - Called transfer functions
  - Take a flow value, compute a new flow value that captures the effects
  - One for each instruction (often a schema)
- Handling control flow
  - Called the confluence operator
  - Combines flow values from different paths

Mathematical model
- Flow values
  - Elements of a lattice L = (P, ⊆)
  - Flow value v ∈ P
- Transfer functions
  - A set of functions (one for each instruction)
  - Fi : P → P
- Confluence operator
  - Merges lattice values
  - C : P × P → P
- How does this help us?

Lattices
- Lattice L = (P, ⊆)
- A partial order relation ⊆
  - Reflexive, anti-symmetric, transitive
- Upper and lower bounds
  - Consider a subset S of P
  - Upper bound of S:  u ∈ P such that ∀x∈S, x ⊆ u
  - Lower bound of S:  l ∈ P such that ∀x∈S, l ⊆ x
- Lattices are complete
  - Unique greatest and least elements
  - "Top"     T ∈ P : ∀x∈P, x ⊆ T
  - "Bottom"  ⊥ ∈ P : ∀x∈P, ⊥ ⊆ x

Confluence operator
- Combine flow values
  - "Merge" values on different control-flow paths
  - The result should be a safe over-approximation
  - We use the lattice ⊆ to denote "more safe"
- Example: live variables
  - v1 = {x, y, z} and v2 = {y, w}
  - How do we combine these values?
  - v = v1 ∪ v2 = {w, x, y, z}
  - What is the "⊆" operator? Superset (restated formally below)
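Restating the live-variables instance in the lattice vocabulary (derived from the bullets above; the slides do not spell out top and bottom for liveness):

    \[
    \begin{aligned}
    P &= 2^{V} &&\text{flow values: sets of variables, $V$ = all program variables}\\
    x \sqsubseteq y &\iff x \supseteq y &&\text{``lower'' = larger set = safer}\\
    x \wedge y &= x \cup y &&\text{confluence/meet: union over paths}\\
    \top &= \emptyset, \qquad \bot = V
    \end{aligned}
    \]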

Meet and join
- Goal: combine two values to produce the "best" approximation
- Intuition:
  - Given v1 = {x, y, z} and v2 = {y, w}
  - A safe over-approximation is "all variables live"
  - We want the smallest set
- Greatest lower bound
  - Given x, y ∈ P
  - GLB(x, y) = z such that
    - z ⊆ x and z ⊆ y, and
    - ∀w: (w ⊆ x and w ⊆ y) ⇒ w ⊆ z
- Meet operator: x ∧ y = GLB(x, y)
- Natural "opposite": least upper bound, the join operator

Termination
- Monotonicity
  - Transfer functions F are monotonic if:
    - Given x, y ∈ P
    - If x ⊆ y then F(x) ⊆ F(y)
  - Alternatively: F(x) ⊆ x
- Key idea: iterative dataflow analysis terminates if
  - Transfer functions are monotonic
  - The lattice has finite height
- Intuition: values only go down, and can only go down as far as bottom

Example
- Prove monotonicity of the live variables analysis
- Equation: in[i] = ( out[i] - def[i] ) ∪ use[i]   (for each instruction i)

- As a function: F(x) = (x - def[i]) ∪ use[i]
- Obligation: if x ⊆ y then F(x) ⊆ F(y)
- Prove: x ⊆ y ⇒ (x - def[i]) ∪ use[i] ⊆ (y - def[i]) ∪ use[i]
- Somewhat trivially:
  - x ⊆ y ⇒ x - s ⊆ y - s
  - x ⊆ y ⇒ x ∪ s ⊆ y ∪ s
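Chaining the two facts gives the obligation directly:

    \[
    x \subseteq y
    \;\Rightarrow\; x - \mathit{def}[i] \subseteq y - \mathit{def}[i]
    \;\Rightarrow\; (x - \mathit{def}[i]) \cup \mathit{use}[i] \subseteq (y - \mathit{def}[i]) \cup \mathit{use}[i]
    \;\Rightarrow\; F(x) \subseteq F(y).
    \]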

Dataflow solution
- Question:
  - What is the solution we compute?
  - Start at lattice top, move down
  - Called the greatest fixpoint
- Where does the approximation come from?
  - Confluence of control-flow paths
- Knaster-Tarski theorem
  - Every monotonic function F over a complete lattice L has a unique least
    (and greatest) fixpoint
  - (Actually, the theorem is more general)

Summary
- Dataflow analysis
  - Lattice of flow values
  - Transfer functions (encode program behavior)
  - Iterative fixpoint computation
- Key insight: if our dataflow equations have these properties:
  - Transfer functions are monotonic
  - The lattice has finite height
  - Transfer functions distribute over the meet operator
  then:
  - The fixpoint computation will terminate
  - It will compute the meet-over-all-paths solution
