
CS502: Compiler Design
Code Optimization

Manas Thakur

Fall 2020

Fast. Faster. Fastest?

[Figure: the compiler pipeline. Character stream → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation → Machine-Independent Code Optimizer → intermediate representation → Code Generator → target machine code → Machine-Dependent Code Optimizer → target machine code. The Symbol Table is shared by all phases; the phases up to intermediate code generation form the front end, and the rest form the back end.]

Role of Code Optimizer

● Make the program better – time, memory, energy, ...

● No guarantees in this land!
– Will a particular optimization improve something for sure?
– Will performing an optimization affect something else?
– In what order should the optimizations be performed?
– What “level” to perform a certain optimization at?
– Is the optimizer fast enough?

● Can an optimized program be optimized further?

Full employment theorem for compiler writers

● Statement: There is no fully optimizing compiler.

● Assume it exists:
– such that it transforms any program P to the smallest program Opt(P) that has the same behaviour as P.
– The halting problem comes to the rescue:

● Smallest program that never halts: L1: goto L1
– Thus, a fully optimizing compiler could solve the halting problem: to decide whether a program P never halts, check whether Opt(P) is L1: goto L1!
– But the halting problem is undecidable.
– Hence, a fully optimizing compiler can’t exist!

● Therefore we talk just about an optimizing compiler.
– and keep working without worrying about future prospects!

How to perform optimizations?

● Analysis – Go over the program – Identify some properties

● Potentially useful properties

● Transformation – Use the information computed by the analysis to transform the program

● without affecting the semantics

● An example that we have (not literally) seen: – Compute liveness information – Delete assignments to variables that are dead

Classifying optimizations

● Based on scope: – Local to basic blocks – Intraprocedural – Interprocedural

● Based on positioning:
– High-level (transform the AST or a high-level IR)
– Low-level (transform a mid/low-level IR)

● Based on (in)dependence w.r.t. target machine: – Machine independent (general enough) – Machine dependent (specific to the architecture)

May versus Must information

● Consider the program:

if (c) {
    a = ...
    b = ...
} else {
    a = ...
    c = ...
}

● Which variables may be assigned?
– a, b, c

● Which variables must be assigned?
– a

● May analysis:
– the computed information may hold in at least one execution of the program.

● Must analysis:
– the computed information must hold every time the program is executed.

Many many optimizations

Constant propagation, tail-call elimination, redundancy elimination, loop-invariant code motion, loop fusion, array scalarization, inlining, synchronization elision, cloning, data prefetching, parallelization, . . . etc.

● How do they interact? – Optimist: we get the sum of all improvements. – Realist: many are in direct opposition.

● Let us study some of them!

Constant propagation

● Idea: – If the value of a variable is known to be a constant at compile-time, replace the use of the variable with the constant.

n = 10;                    n = 10;
c = 2;                     c = 2;
for (i = 0; i < n; i++)    for (i = 0; i < 10; i++)
    . . .                      . . .

– Usually a very helpful optimization – e.g., Can we now unroll the loop?

● Why is it good? ● Why could it be bad? – When can we eliminate n and c themselves?

● Now you know how well different optimizations might interact!

Constant folding

● Idea: – If operands are known at compile-time, evaluate expression at compile-time.

r = 3.141 * 10;   →   r = 31.41;

– What if the code was?

PI = 3.141;
r = PI * 10;

● Constant propagation turns this into r = 3.141 * 10; constant folding then turns it into r = 31.41.

– And what now?

PI = 3.141;
r = PI * 10;
d = 2 * r;

– Repeated rounds of propagation and folding like this are called partial evaluation.
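The propagate-then-fold loop above can be sketched in a few lines. This is a minimal sketch over straight-line three-address code; the (op, dest, arg1, arg2) tuple format and the 'copy'/'mul' opcodes are made up for illustration, and only multiplication and addition are folded.

```python
# One pass of constant propagation plus constant folding ("partial
# evaluation") over straight-line three-address code.

def partial_eval(code):
    env = {}                                   # variable -> known constant
    out = []
    for op, dest, a1, a2 in code:
        v1 = env.get(a1, a1)                   # propagate known constants
        v2 = env.get(a2, a2)
        if op == 'copy':
            if isinstance(v1, (int, float)):
                env[dest] = v1                 # dest now holds a constant
            else:
                env.pop(dest, None)
            out.append(('copy', dest, v1, None))
        elif isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
            folded = v1 * v2 if op == 'mul' else v1 + v2   # fold at "compile time"
            env[dest] = folded
            out.append(('copy', dest, folded, None))
        else:
            env.pop(dest, None)                # result no longer a constant
            out.append((op, dest, v1, v2))
    return out

# PI = 3.141; r = PI * 10; d = 2 * r
prog = [('copy', 'PI', 3.141, None),
        ('mul',  'r',  'PI',  10),
        ('mul',  'd',  2,     'r')]
res = partial_eval(prog)
```

On this input every instruction collapses to a constant copy, exactly the d = 2 * r chain from the slide.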

Common sub-expression elimination

● Idea: – If program computes the same value multiple times, reuse the value.

Before:          After:
a = b + c;       t = b + c;
c = b + c;       a = t;
d = b + c;       c = t;
                 d = b + c;

– Subexpressions can be reused until operands are redefined.
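A local (single basic block) version of this can be sketched as below. The (dest, op, a1, a2) tuple format is illustrative, and instead of introducing a fresh temporary t as on the slide, this sketch reuses the first variable that computed the expression, which has the same effect.

```python
# Local common-subexpression elimination within one basic block: reuse an
# available expression until one of its operands (or its holder) is redefined.

def local_cse(block):
    avail = {}                    # (op, a1, a2) -> variable holding its value
    out = []
    for dest, op, a1, a2 in block:
        key = (op, a1, a2)
        if key in avail:
            out.append((dest, 'copy', avail[key], None))   # reuse prior value
            new = None
        else:
            out.append((dest, op, a1, a2))
            new = key
        # dest is redefined: kill expressions that use dest or live in dest.
        avail = {k: v for k, v in avail.items()
                 if dest not in (k[1], k[2]) and v != dest}
        if new is not None and dest not in (a1, a2):
            avail[new] = dest
    return out

block = [('a', '+', 'b', 'c'),
         ('c', '+', 'b', 'c'),    # reuses b+c, then kills it (c redefined)
         ('d', '+', 'b', 'c')]    # must recompute
```

Running `local_cse(block)` reproduces the slide's behaviour: the second b + c becomes a copy, while the third must be recomputed because c was redefined.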

Copy propagation

● Idea: – After an assignment x = y, replace the uses of x with y.

Before:              After:
x = y;               x = y;
if (x > 1)           if (y > 1)
    s = x + f(x);        s = y + f(y);

– Can only apply up to another assignment to x, or ... another assignment to y! – What if there was an assignment y = z earlier?

● Apply transitively to all assignments.
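The transitive case (y = z before x = y) can be sketched by chasing copy chains; the (dest, op, a1, a2) tuple format and the 'copy' opcode are illustrative.

```python
# Copy propagation over straight-line code, applied transitively: after
# y = z and x = y, a use of x becomes a use of z.

def copy_propagate(block):
    copies = {}                                  # x -> y: "x is a copy of y"
    def resolve(v):
        while v in copies:                       # chase copy chains
            v = copies[v]
        return v
    out = []
    for dest, op, a1, a2 in block:
        a1, a2 = resolve(a1), resolve(a2)        # substitute into uses
        out.append((dest, op, a1, a2))
        # dest is redefined: forget any copy relation that mentions it.
        copies = {x: y for x, y in copies.items() if dest not in (x, y)}
        if op == 'copy' and isinstance(a1, str) and a1 != dest:
            copies[dest] = a1
    return out

block = [('y', 'copy', 'z', None),
         ('x', 'copy', 'y', None),
         ('s', '+',    'x', 'x')]
out = copy_propagate(block)
```

The use of x in the last instruction resolves through x → y → z, so s is computed from z directly.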

Dead-code elimination

● Idea: – If the result of a computation is never used, remove the computation.

Before:          After:
x = y + 1;       y = 1;
y = 1;           x = 2 * z;
x = 2 * z;

– Remove code that assigns to dead variables.

● Liveness analysis done before would help! – This may, in turn, create more dead code.

● Dead-code elimination usually works transitively.
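On straight-line code the transitive behaviour falls out of a single backward pass with a running live set, as in this sketch. It assumes statements have no side effects; the tuple format is illustrative.

```python
# Dead-code elimination on straight-line code: walk backward, keep an
# assignment only if its target is live, and update liveness as we go.

def eliminate_dead(block, live_out):
    live = set(live_out)
    kept = []
    for dest, op, a1, a2 in reversed(block):
        if dest in live:
            kept.append((dest, op, a1, a2))
            live.discard(dest)                   # killed by this definition
            live |= {a for a in (a1, a2) if isinstance(a, str)}
        # else: assignment to a dead variable -- dropped
    kept.reverse()
    return kept

block = [('x', '+',    'y', 1),     # dead: x is overwritten before any use
         ('y', 'copy', 1,   None),
         ('x', '*',    2,   'z')]
kept = eliminate_dead(block, live_out={'x', 'y'})
```

This reproduces the slide's example: the first assignment to x disappears, and with it the (now dead) use of y that fed it.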

Unreachable-code elimination

● Idea: – Eliminate code that can never be executed

#define DEBUG 0
if (DEBUG)
    print("Current value = ", v);

– High-level: look for if (false) or while (false)

● perhaps after constant folding! – Low-level: more difficult

● Code is just labels and gotos ● Traverse the CFG, marking reachable blocks

Next class

● Next class: – How to perform the optimizations that we have seen using a dataflow analysis?

● Starting with:
– The full form of CFG!

● Approximately only 10 more classes left. – Hope this course is being successful in making (y)our hectic days a bit more exciting :-)

CS502: Compiler Design
Code Optimization (Cont.)

Manas Thakur

Fall 2020

Recall A2

int a;
if (*) {
    a = 10;
} else {
    // something that doesn’t touch ‘a’
}
x = a;

● Is ‘a’ initialized in this program? – Reality during run-time: Depends – What to tell at compile-time?

● Is this a ‘must’ question or a ‘may’ question? ● Correct answer: No – How do we obtain such answers?

● Need to model the control-flow

Control-Flow Graph (CFG)

● Nodes represent instructions; edges represent flow of control

a = 0
L1: b = a + 1
    c = c + b
    a = b * 2
    if a < N goto L1
    return c

[CFG: a = 0 → b = a + 1 → c = c + b → a = b * 2 → a < N; the true edge goes back to b = a + 1, the false edge to return c]

Some CFG terminology

[The same CFG with numbered nodes: 1: a = 0; 2: b = a + 1; 3: c = c + b; 4: a = b * 2; 5: a < N; 6: return c]

● pred[n] gives the predecessors of n
– pred[1]? pred[4]? pred[2]?

● succ[n] gives the successors of n
– succ[2]? succ[5]?

● def(n) gives the variables defined by n
– def(3) = {c}

● use(n) gives the variables used by n
– use(3) = {b, c}

Live ranges revisited

● A variable is live if its current value may be used in the future.
– Insight:
● work from future to past
● backward over the CFG

[Same numbered CFG: 1: a = 0; 2: b = a + 1; 3: c = c + b; 4: a = b * 2; 5: a < N; 6: return c]

● Live ranges:
– a: {1->2, 4->5->2}
– b: {2->3, 3->4}
– c: all edges except 1->2

Liveness

● A variable v is live on an edge if there is a directed path from that edge to a use of v that does not go through any def of v.

● A variable is live-in at a node if it is live on any of the in-edges of that node.

● A variable is live-out at a node if it is live on any of the out-edges of that node.

● Verify (on the same CFG):
– a: {1->2, 4->5->2}
– b: {2->3, 3->4}

Computation of liveness

● Say live-in of n is in[n], and live-out of n is out[n].

● We can compute in[n] and out[n] for any n as follows:

in[n] = use[n] ∪ (out[n] – def[n])

out[n] = ∪_{s ∈ succ[n]} in[s]

The right-hand sides are called flow functions; the equations themselves are called dataflow equations.

Liveness as an iterative dataflow analysis (IDFA)

for each n:
    in[n] = {}; out[n] = {}                       // initialize
repeat:
    for each n:
        in’[n] = in[n]; out’[n] = out[n]          // save previous values
        in[n] = use[n] ∪ (out[n] – def[n])        // compute new values
        out[n] = ∪_{s ∈ succ[n]} in[s]
until in’[n] == in[n] and out’[n] == out[n] ∀n    // repeat till fixed point
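The iterative algorithm can be made executable directly. This sketch hard-codes the six-node CFG from these slides (1: a=0; 2: b=a+1; 3: c=c+b; 4: a=b*2; 5: a<N; 6: return c), with N treated as a constant.

```python
# Iterative liveness: recompute every node's in/out sets until nothing changes.

succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use  = {1: set(), 2: {'a'}, 3: {'c', 'b'}, 4: {'b'}, 5: {'a'}, 6: {'c'}}
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}

def liveness(nodes, succ, use, defs):
    live_in  = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:                                 # repeat till fixed point
        changed = False
        for n in nodes:
            new_out = set().union(*(live_in[s] for s in succ[n]))
            new_in  = use[n] | (new_out - defs[n])
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out

live_in, live_out = liveness([1, 2, 3, 4, 5, 6], succ, use, defs)
```

At the fixed point, for example, only c is live-in at node 1 and both a and c are live-out at node 4, matching the live ranges on the earlier slide.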

Liveness analysis example

[Work through the equations on the six-node CFG: 1: a = 0; 2: b = a + 1; 3: c = c + b; 4: a = b * 2; 5: a < N; 6: return c]

in[n] = use[n] ∪ (out[n] – def[n])
out[n] = ∪_{s ∈ succ[n]} in[s]

In backward order

[Same CFG, processing the nodes in the order 6, 5, 4, 3, 2, 1]

● Fixed point in only 3 iterations!

● Thus, the order of processing statements is important for efficiency.

in[n] = use[n] ∪ (out[n] – def[n])
out[n] = ∪_{s ∈ succ[n]} in[s]

Complexity of our liveness computation algorithm

● For an input program of size N:
– ≤ N nodes in the CFG, and ≤ N variables
⇒ ≤ N elements per in/out set
⇒ O(N) time per set union
– The for loop performs a constant number of set operations per node
⇒ O(N²) time per iteration of the for loop
– Each iteration of the for loop can only add elements to each set (monotonicity)
– The sizes of all in and out sets sum to at most 2N², bounding the number of iterations of the repeat loop
⇒ worst-case complexity of O(N⁴)
– Much less in practice (usually O(N) or O(N²)) if the nodes are ordered properly.

Least fixed points

● There is often more than one solution to a given dataflow problem.
– Any solution to the dataflow equations is a conservative approximation.

● Conservatively assuming a variable is live does not break the program:
– It just means more registers may be needed.

● Assuming a variable is dead when it is really live will break things.

● Many possible solutions; but we want the smallest: the least fixed point.

● The iterative algorithm computes this least fixed point.

Confused!?

● Is this a theoretical topic or a practical one?

● Recall: – “A sangam of theory and practice.”

● Next class: – We are not leaving a topic as important as IDFA so soon!

CS502: Compiler Design
Code Optimization (Cont.)

Manas Thakur

Fall 2020

Recall our IDFA algorithm

for each n:
    in[n] = ...; out[n] = ...                     // initialize
repeat:
    for each n:
        in’[n] = in[n]; out’[n] = out[n]          // save previous values
        in[n] = ...                               // compute new values
        out[n] = ...
until in’[n] == in[n] and out’[n] == out[n] for all n   // repeat till fixed point

Do we need to process all the nodes in each iteration?

Worklist-based Implementation of IDFA

● Initialize a worklist of statements

● Forward analysis: – Start with the entry node – If OUT(n) changes, then add succ(n) to the worklist

● Backward analysis: – Start with the exit node – If IN(n) changes, then add pred(n) to the worklist

● In both the cases, iterate till fixed point.
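For the backward case, the worklist idea can be sketched as below, reusing the six-node liveness example from these slides; a node's predecessors are re-queued only when its IN set actually changes.

```python
# Worklist-based backward liveness: process a node, and if its IN set
# changed, put its predecessors back on the worklist.
from collections import deque

succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use  = {1: set(), 2: {'a'}, 3: {'c', 'b'}, 4: {'b'}, 5: {'a'}, 6: {'c'}}
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}

def liveness_worklist(nodes, succ, use, defs):
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    live_in  = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    work = deque(reversed(nodes))          # backward: start from the exit end
    while work:
        n = work.popleft()
        live_out[n] = set().union(*(live_in[s] for s in succ[n]))
        new_in = use[n] | (live_out[n] - defs[n])
        if new_in != live_in[n]:
            live_in[n] = new_in
            work.extend(p for p in pred[n] if p not in work)   # re-queue preds
    return live_in, live_out

live_in, live_out = liveness_worklist([1, 2, 3, 4, 5, 6], succ, use, defs)
```

It reaches the same least fixed point as the naive iteration, but typically touches far fewer nodes.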

Writing an IDFA (Cont.)

● Initialization of the IN and OUT sets depends on the analysis:
– empty sets if the information grows
– the universal set if the information shrinks

● Requirement for termination: – unidirectional growth/shrinkage – Called monotonicity

● Confluence/Meet operation (at control-flow merges):
– Union or intersection, depending on the analysis

Live-variable analysis revisited

● Direction: – Backward

● Initialization: – Empty sets

● Flow functions:

– out[n] = ∪_{s ∈ succ[n]} in[s]
– in[n] = use[n] ∪ (out[n] – def[n])

● Confluence operation: – Union

Common sub-expressions revisited

● Idea: – If a program computes the same value multiple times, reuse the value.

Before:          After:
a = b + c;       t = b + c;
c = b + c;       a = t;
d = b + c;       c = t;
                 d = b + c;

– Subexpressions can be reused until operands are redefined.

– Say given a node n, the expressions computed at n are denoted as gen(n) and the ones killed (operands redefined) at n are denoted as kill(n).

Common subexpressions as an IDFA

● Direction: – Forward

● Initialization:
– The universal set (all expressions) at every node except the entry, which starts empty
– (Starting all sets empty would be wrong here: with intersection as the meet, nothing could ever become available.)

● Flow functions:

– in[n] = ∩_{p ∈ pred[n]} out[p]
– out[n] = gen[n] ∪ (in[n] – kill[n])

● Confluence operation: – Intersection
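A forward must-analysis like this looks as follows in executable form; the diamond-shaped CFG and its gen/kill sets are illustrative. Note how the non-entry OUT sets start at the universal set so that intersection can only shrink them.

```python
# Available expressions: forward IDFA with intersection at merges.

def available_exprs(nodes, pred, gen, kill, entry):
    universe = set().union(*gen.values())
    out = {n: (set() if n == entry else set(universe)) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if pred[n]:
                in_n = set(universe)
                for p in pred[n]:
                    in_n &= out[p]          # intersection at merges
            else:
                in_n = set()                # nothing is available at entry
            new_out = gen[n] | (in_n - kill[n])
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return out

# Diamond: 1 -> {2, 3} -> 4.  Node 1 computes b+c; node 3 redefines an
# operand (kills b+c); node 4 joins the two branches.
pred = {1: [], 2: [1], 3: [1], 4: [2, 3]}
gen  = {1: {'b+c'}, 2: set(), 3: set(), 4: set()}
kill = {1: set(), 2: set(), 3: {'b+c'}, 4: set()}
out = available_exprs([1, 2, 3, 4], pred, gen, kill, entry=1)
```

b+c survives along one branch but is killed along the other, so the intersection at the join correctly reports it unavailable at node 4.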

Are we efficient enough?

● When can IDFAs take a lot of time?

● Which operations could be expensive?
– Confluence
– Equality checks

● Compilers may have to perform several IDFAs.

● How can we make an IDFA more efficient (perhaps with some loss of precision)?

repeat:
    for each n:
        in’[n] = in[n]; out’[n] = out[n]
        in[n] = use[n] ∪ (out[n] – def[n])
        out[n] = ∪_{s ∈ succ[n]} in[s]
until in’[n] == in[n] and out’[n] == out[n] ∀n

Basic Blocks

a = 0
L1: b = a + 1
    c = c + b
    a = b * 2
    if a < N goto L1
    return c

[Left: a CFG with each instruction as a node. Right: the same CFG using basic blocks — {a = 0}, {b = a + 1; c = c + b; a = b * 2; a < N}, {return c}]

Basic Blocks (Cont.)

● Idea:
– Once execution enters a basic block, all its statements are executed in sequence.
– Single-entry, single-exit region

● Details: – Starts with a label – Ends with one or more branches – Edges may be labeled with predicates

● True/false ● Exceptions

● Key: Improve efficiency, with reasonable precision.
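Partitioning instructions into basic blocks can be sketched with the classic "leaders" rule: the first instruction, every branch target, and every instruction after a branch start a new block. The (label, text, is_branch) encoding is made up for illustration; the program is the running example from these slides.

```python
# Split three-address code into basic blocks by computing leaders.

def basic_blocks(instrs):
    leaders = {0}                               # the first instruction leads
    for i, (label, _, is_branch) in enumerate(instrs):
        if label is not None:
            leaders.add(i)                      # branch targets lead
        if is_branch and i + 1 < len(instrs):
            leaders.add(i + 1)                  # fall-through after a branch
    order = sorted(leaders)
    return [instrs[a:b] for a, b in zip(order, order[1:] + [len(instrs)])]

prog = [(None, 'a = 0',            False),
        ('L1', 'b = a + 1',        False),
        (None, 'c = c + b',        False),
        (None, 'a = b * 2',        False),
        (None, 'if a < N goto L1', True),
        (None, 'return c',         False)]

blocks = basic_blocks(prog)
```

This yields the three blocks drawn on the previous slide: {a = 0}, the four-instruction loop body, and {return c}.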

Have you got a compiler’s eyes yet?

● What properties can you identify about this program?

S1: y = 1;

S2: y = 2;

S3: x = y;

● What’s the advantage if it was rewritten as follows?

S1: y1 = 1;

S2: y2 = 2;

S3: x = y2;

● Def-use becomes explicit.

Static Single Assignment (SSA)

● A form of IR in which each use can be mapped to a single definition. – Achieved using variable renaming and phi nodes.

Before:              After (SSA):
if (flag)            if (flag)
    x = -1;              x1 = -1;
else                 else
    x = 1;               x2 = 1;
y = x * a;           x3 = Φ(x1, x2);
                     y = x3 * a;

● Many compilers use SSA form in their IRs.

SSA Classwork

● Convert the following program to SSA form:

– (Hint: First convert to 3AC)

x = 0;
for (i = 0; i < N; i++) {
    x = x + i;
    x = x - 1;
}
x = x + i;

● One possible SSA form:

    x1 = 0;
    i1 = 0;
L1: x2 = Φ(x1, x4);
    i2 = Φ(i1, i3);
    if (i2 >= N) goto L2;
    x3 = x2 + i2;
    x4 = x3 - 1;
    i3 = i2 + 1;
    goto L1;
L2: x5 = x2 + i2;

Effect of SSA on liveness!?

● What is the effect of SSA form on liveness?

● What does SSA do? – Breaks a single variable into multiple instances – Instances represent distinct, non-overlapping uses

● Effect: – Breaks up live ranges; often improves register allocation

[Figure: a single long live range for x is split into shorter live ranges for x1 and x2]

Featuring Next in Code Optimization

● Heard of the 80-20 or 90-10 rule? – X% of time is spent in executing y% of the code, where X >> y.

● Which all kinds of code portions tend to form the region ‘y’ in typical programs? – Loops – Methods

● Tomorrow: Loop optimizations

CS502: Compiler Design
Loop Optimizations

Manas Thakur

Fall 2020

Why optimize loops?

for (i = 0; i < N; i++) {
    S1;
    S2;
}

● Form a significant portion of the time spent in executing programs. – If N is just 10000 (not uncommon), we have too many instructions!

● How many instructions in the above loop?
– What if S1/S2 is/are function calls?

● Involve costly instructions in each iteration:
– Comparisons
– Jumps

What is a loop?

● A loop in a CFG is a set of nodes S such that: – There is a designated header node h in S – There is a path from each node in S to h – There is a path from h to each node in S – h is the only node in S with an incoming edge from outside S

Are all these loops?

[Figure: several example CFGs]

What about these?

[Figure: more example CFGs]

Identifying loops using dominators

● A node d dominates a node n if every path from entry to n goes through d.

● Compute dominators of each node:

Flow function for dominators

● Assuming D[i] is the set of dominators of node i:

D[entry] = {entry}

D[n] = {n} ∪ ∩_{p ∈ pred[n]} D[p]
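This flow function can be iterated directly, starting every non-entry node at the universal set. The small loop-shaped CFG (1 → 2 → 3 → 2, and 3 → 4) used below is illustrative.

```python
# Iterative dominator computation: D[entry] = {entry};
# D[n] = {n} ∪ intersection of D[p] over all predecessors p.

def dominators(nodes, pred, entry):
    dom = {n: set(nodes) for n in nodes}        # start from the universal set
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

pred = {1: [], 2: [1, 3], 3: [2], 4: [3]}
dom = dominators([1, 2, 3, 4], pred, entry=1)
# Node 2 dominates node 3, so the edge 3 -> 2 is a back edge: {2, 3} form a loop.
```

Since 2 ∈ dom[3], the edge 3 → 2 is a back edge, which is exactly the test used on the next slide to identify loops.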

Identifying loops using dominators (Cont.)

● First, identify a back edge:
– An edge from a node n to another node h, where h dominates n

● Each back edge leads to a loop:
– The set X of nodes such that for each x ∈ X, h dominates x and there is a path from x to n not containing h
– h is the header

● Verify:

Loop-Invariant Code Motion (LICM)

● Loop-invariant code: – d: t = a OP b, such that:

● a and b are constants; or ● all the definitions of a and b that reach d are outside the loop; or ● only one definition each of a and b reaches d, and that definition is loop-invariant.

● Example:

L0: t = 0
L1: i = i + 1
    t = a * b
    M[i] = t
    if i < N goto L1

LICM: Get ready for code hoisting

● Can we always hoist loop-invariant code?

(a) L0: t = 0
    L1: i = i + 1
        t = a * b
        M[i] = t
        if i < N goto L1

(b) L0: t = 0
    L1: if i >= N goto L2
        i = i + 1
        t = a * b
        M[i] = t
        goto L1
    L2:

(c) L0: t = 0
    L1: M[j] = t
        i = i + 1
        t = a * b
        M[i] = t
        if i < N goto L1

● Criteria for hoisting d: t = a OP b:
– d dominates all loop exits at which t is live-out, and
– there is only one definition of t in the loop, and
– t is not live-out of the loop preheader

● How can we hoist code in the loops that violate these criteria?

Induction-variable optimization

● Induction variables: – Variables whose value depends on iteration variable ● Optimization: – Compute them efficiently, if possible

Before:
s = 0
i = 0
L1: if i >= N goto L2
    j = i * 4
    k = j + a
    x = M[k]
    s = s + x
    i = i + 1
    goto L1
L2:

After:
s = 0
k’ = a
b = N * 4
c = a + b
L1: if k’ >= c goto L2
    x = M[k’]
    s = s + x
    k’ = k’ + 4
    goto L1
L2:

Loop unrolling

● Minimize the number of increments and condition-checks

● Be careful about the increase in code size (I-cache misses!)

[Two unrolled versions of a loop: one that is correct only for an even number of iterations, and one that handles any number of iterations with an extra cleanup check.]
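The any-iteration-count version can be sketched as 2x unrolling with a cleanup loop: the main loop performs one condition check per two elements, and a short epilogue handles a leftover element. Function names are illustrative.

```python
# 2x loop unrolling with a cleanup (epilogue) loop.

def sum_rolled(a):
    s = 0
    for i in range(len(a)):       # one check per element
        s += a[i]
    return s

def sum_unrolled_2x(a):
    s, i, n = 0, 0, len(a)
    while i + 1 < n:              # main loop: two elements per check
        s += a[i]
        s += a[i + 1]
        i += 2
    while i < n:                  # cleanup for an odd trailing element
        s += a[i]
        i += 1
    return s
```

Both versions compute the same sum for every input length, even or odd; only the number of comparisons and jumps differs.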

Loop interchange

● A C/Java programmer starting with MATLAB:

for i=1:1000,
  for j=1:1000,
    a(i) = a(i) + b(i,j)*c(i)
  end
end

● But MATLAB stores matrices in column-major order!

● Implication? – Cache misses (perhaps in each iteration)! ● Solution (interchange the loops!):

for j=1:1000,
  for i=1:1000,
    a(i) = a(i) + b(i,j)*c(i)
  end
end
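Interchange is legal here because the updates in the two loop orders are independent: only the traversal order, and hence the memory-access pattern, changes. The cache benefit depends on the storage order (column-major in MATLAB, row-major in C); Python lists do not model that layout, so this sketch only checks that both orders compute the same values.

```python
# Loop interchange preserves the result when iterations are independent.

def update_ij(a, b, c, n):          # i outer, j inner
    for i in range(n):
        for j in range(n):
            a[i] = a[i] + b[i][j] * c[i]
    return a

def update_ji(a, b, c, n):          # interchanged: j outer, i inner
    for j in range(n):
        for i in range(n):
            a[i] = a[i] + b[i][j] * c[i]
    return a
```

For any matrix b and vectors a, c, both functions leave a with identical contents.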

Many more loop optimizations

● Loop fusion
● Loop fission
● Loop tiling
● . . .
(Some other time!)

● Vectorization
● Parallelization
(Next class!)
