CS502: Compiler Design
Code Optimization

Manas Thakur
Fall 2020

Fast. Faster. Fastest?

[Figure: the phases of a compiler. Character stream → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation → Machine-Independent Code Optimizer → intermediate representation → Code Generator → target machine code → Machine-Dependent Code Optimizer → target machine code. The analysis phases form the front end, the synthesis phases the back end; all phases share the Symbol Table.]
Manas Thakur CS502: Compiler Design 2

Role of Code Optimizer
● Make the program better – time, memory, energy, ...
● No guarantees in this land!
  – Will a particular optimization improve something for sure?
  – Will performing an optimization affect something else?
  – In what order should I perform the optimizations?
  – At what “scope” should a certain optimization be performed?
  – Is the optimizer fast enough?
● Can an optimized program be optimized further?
Full employment theorem for compiler writers
● Statement: There is no fully optimizing compiler.
● Assume it exists:
  – i.e., it transforms any program P into the smallest program Opt(P) that has the same behaviour as P.
● The halting problem comes to the rescue:
  – The smallest program that never halts is: L1: goto L1
  – Thus, a fully optimizing compiler could solve the halting problem by checking whether Opt(P) is L1: goto L1!
  – But the halting problem is undecidable.
  – Hence, a fully optimizing compiler can’t exist!
● Therefore we talk just about an optimizing compiler,
  – and keep working without worrying about future prospects!
How to perform optimizations?
● Analysis:
  – Go over the program
  – Identify some potentially useful properties
● Transformation:
  – Use the information computed by the analysis to transform the program, without affecting its semantics
● An example that we have (not literally) seen:
  – Compute liveness information
  – Delete assignments to variables that are dead
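As a concrete (hedged) illustration of this analyse-then-transform recipe, here is a minimal sketch over straight-line three-address code; the `(target, uses)` tuple representation is my own, not the course's:

```python
# Backward liveness over straight-line code, deleting assignments to
# variables that are dead (never used later). A single pass suffices
# here because there is no control flow.

def eliminate_dead_assignments(stmts, live_out):
    """stmts: list of (target, uses) pairs, executed top to bottom.
    live_out: variables assumed to be live after the last statement."""
    live = set(live_out)
    kept = []
    for target, uses in reversed(stmts):
        if target in live:
            live.discard(target)   # this definition kills the variable
            live.update(uses)      # its operands become live
            kept.append((target, uses))
        # else: assignment to a dead variable -- delete it

    return list(reversed(kept))

# x = y + 1; y = 1; x = 2 * z  with only x live at the end:
prog = [("x", {"y"}), ("y", set()), ("x", {"z"})]
print(eliminate_dead_assignments(prog, live_out={"x"}))
```

Note how deleting the second assignment to x also leaves y = 1 dead, so the backward pass removes both; with branches one needs the full dataflow analysis discussed later.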
Classifying optimizations
● Based on scope:
  – Local to basic blocks
  – Intraprocedural
  – Interprocedural
● Based on positioning:
  – High-level (transform source code or a high-level IR)
  – Low-level (transform a mid/low-level IR)
● Based on (in)dependence w.r.t. the target machine:
  – Machine-independent (general enough)
  – Machine-dependent (specific to the architecture)
May versus Must information
● Consider the program:

      if (c) {
          a = ...
          b = ...
      } else {
          a = ...
          c = ...
      }

● Which variables may be assigned?
  – a, b, c
● Which variables must be assigned?
  – a
● May analysis:
  – the computed information may hold in at least one execution of the program.
● Must analysis:
  – the computed information must hold every time the program is executed.
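A hedged sketch of the distinction: modelling each branch as just the set of variables it assigns (a deliberately tiny model, not the course's machinery), may-information is a union over paths while must-information is an intersection:

```python
def may_assigned(branches):
    """Assigned on at least one path: union over the branches."""
    result = set()
    for b in branches:
        result |= b
    return result

def must_assigned(branches):
    """Assigned on every path: intersection over the branches."""
    result = set(branches[0])
    for b in branches[1:]:
        result &= b
    return result

then_branch = {"a", "b"}   # if (c) { a = ...; b = ... }
else_branch = {"a", "c"}   # else   { a = ...; c = ... }
print(sorted(may_assigned([then_branch, else_branch])))
print(sorted(must_assigned([then_branch, else_branch])))
```

On the slide's example this yields {a, b, c} for "may" and {a} for "must", as stated above.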
Many many optimizations
● Constant folding, constant propagation, tail-call elimination, redundancy elimination, dead-code elimination, loop-invariant code motion, loop splitting, loop fusion, strength reduction, array scalarization, inlining, synchronization elision, cloning, data prefetching, parallelization, etc.
● How do they interact?
  – Optimist: we get the sum of all improvements.
  – Realist: many are in direct opposition.
● Let us study some of them!
Constant propagation
● Idea: – If the value of a variable is known to be a constant at compile-time, replace the use of the variable with the constant.
  – Example (the uses of n and c get replaced with 10 and 2):

        n = 10;                       n = 10;
        c = 2;                        c = 2;
        for (i = 0; i < n; ...)       for (i = 0; i < 10; ...)

  – Usually a very helpful optimization
    ● e.g., can we now unroll the loop?
  – Why is it good? Why could it be bad?
  – When can we eliminate n and c themselves?
● Now you know how well different optimizations might interact!

Constant folding

● Idea:
  – If the operands are known at compile-time, evaluate the expression at compile-time.

        r = 3.141 * 10;    becomes    r = 31.41;

  – What if the code was?

        PI = 3.141;
        r = PI * 10;

    ● Constant propagation (of PI), then constant folding.
  – And what now?

        PI = 3.141;
        r = PI * 10;
        d = 2 * r;

    ● Propagating and folding repeatedly: called partial evaluation.

Common sub-expression elimination

● Idea:
  – If the program computes the same value multiple times, reuse the value.

        a = b + c;                    t = b + c;
        c = b + c;                    a = t;
        d = b + c;                    c = t;
                                      d = b + c;

  – Subexpressions can be reused until their operands are redefined.
    (d = b + c cannot reuse t, because c has been redefined in between.)

Copy propagation

● Idea:
  – After an assignment x = y, replace the uses of x with y.

        x = y;                        x = y;
        if (x > 1)                    if (y > 1)
            s = x + f(x);                 s = y + f(y);

  – Can only apply up to another assignment to x, or ... another assignment to y!
  – What if there was an assignment y = z earlier?
    ● Apply transitively to all assignments.

Dead-code elimination

● Idea:
  – If the result of a computation is never used, remove the computation.

        x = y + 1;                    y = 1;
        y = 1;                        x = 2 * z;
        x = 2 * z;

  – Remove code that assigns to dead variables.
    ● A liveness analysis done beforehand would help!
  – This may, in turn, create more dead code.
    ● Dead-code elimination usually works transitively.

Unreachable-code elimination

● Idea:
  – Eliminate code that can never be executed.

        #define DEBUG 0
        if (DEBUG)
            print("Current value = ", v);

  – High-level: look for if (false) or while (false)
    ● perhaps after constant folding!
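The propagation and folding just described can be sketched as a single pass over straight-line code. The tuple-based three-address form below is my own illustration, not the course's IR:

```python
# Constant propagation + folding over statements (dest, op, arg1, arg2),
# where op is "=" (copy/constant assignment; arg2 ignored), "+" or "*",
# and args are variable names (str) or numeric literals.

def propagate_and_fold(stmts):
    fold = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}
    consts, out = {}, []
    for dest, op, a, b in stmts:
        # Constant propagation: swap in known constant values of operands.
        if isinstance(a, str):
            a = consts.get(a, a)
        if isinstance(b, str):
            b = consts.get(b, b)
        if op == "=":
            val = a
        elif isinstance(a, (int, float)) and isinstance(b, (int, float)):
            val = fold[op](a, b)            # constant folding
        else:
            val = None
        if isinstance(val, (int, float)):
            consts[dest] = val              # dest now holds a known constant
            out.append((dest, "=", val, None))
        else:
            consts.pop(dest, None)          # dest no longer a known constant
            out.append((dest, op, a, b))
    return out

# PI = 3.141; r = PI * 10; d = 2 * r
prog = [("PI", "=", 3.141, None), ("r", "*", "PI", 10), ("d", "*", 2, "r")]
print(propagate_and_fold(prog))
```

On the PI example every statement folds down to a plain constant assignment; a later unreachable-code pass could then spot conditions that have become constant false.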
  – Low-level: more difficult
    ● Code is just labels and gotos.
    ● Traverse the CFG, marking reachable blocks.

Next class

● Next class:
  – How to perform the optimizations that we have seen using a dataflow analysis?
● Starting with:
  – The back-end full form of CFG!
● Approximately only 10 more classes left.
  – Hope this course is being successful in making (y)our hectic days a bit more exciting :-)

CS502: Compiler Design
Code Optimization (Cont.)
Manas Thakur
Fall 2020

Recall A2

    int a;
    if (*) {
        a = 10;
    } else {
        // something that doesn’t touch ‘a’
    }
    x = a;

● Is ‘a’ initialized in this program?
  – Reality during run-time: depends.
  – What to tell at compile-time?
    ● Is this a ‘must’ question or a ‘may’ question?
    ● Correct answer: No.
  – How do we obtain such answers?
    ● We need to model the control-flow.

Control-Flow Graph (CFG)

● Nodes represent instructions; edges represent flow of control.

        a = 0
    L1: b = a + 1
        c = c + b
        a = b * 2
        if a < N goto L1
        return c

  – As a CFG: a = 0 → b = a + 1 → c = c + b → a = b * 2 → a < N, with one edge from a < N back to b = a + 1 and another to return c.

Some CFG terminology

● Number the nodes 1 to 6: 1: a = 0, 2: b = a + 1, 3: c = c + b, 4: a = b * 2, 5: a < N, 6: return c.
● pred[n] gives the predecessors of n.
  – pred[1]? pred[4]? pred[2]?
● succ[n] gives the successors of n.
  – succ[2]? succ[5]?
● def(n) gives the variables defined by n.
  – def(3) = {c}
● use(n) gives the variables used by n.
  – use(3) = {b, c}

Live ranges revisited

● A variable is live if its current value may be used in the future.
  – Insight: work from the future to the past, backward over the CFG.
● Live ranges (on the numbered CFG):
  – a: {1→2, 4→5→2}
  – b: {2→3, 3→4}
  – c: all edges (c is even live-in at node 1: it is used at node 3 before any definition)

Liveness

● A variable v is live on an edge if there is a directed path from that edge to a use of v that does not go through any def of v.
● A variable is live-in at a node if it is live on any of the in-edges of that node.
● A variable is live-out at a node if it is live on any of the out-edges of that node.
● Verify (on the numbered CFG):
  – a: {1→2, 4→5→2}
  – b: {2→4}

Computation of liveness

● Say the live-in of n is in[n], and the live-out of n is out[n].
● We can compute in[n] and out[n] for any n using the flow functions:

    in[n]  = use[n] ∪ (out[n] – def[n])
    out[n] = ∪ { in[s] : s ∈ succ[n] }

  – Taken over all nodes, these are called the dataflow equations.

Liveness as an iterative dataflow analysis (IDFA)

    for each n:
        in[n] = {}; out[n] = {}                       // initialize
    repeat
        for each n:
            in’[n] = in[n]; out’[n] = out[n]          // save previous values
            in[n]  = use[n] ∪ (out[n] – def[n])       // compute new values
            out[n] = ∪ { in[s] : s ∈ succ[n] }
    until in’[n] == in[n] and out’[n] == out[n] ∀n    // repeat till fixed point

Liveness analysis example

● Apply the equations to the numbered CFG (1: a = 0, ..., 6: return c).

In backward order

● Processing the nodes in backward order (6, 5, 4, 3, 2, 1) reaches the fixed point in only 3 iterations!
● Thus, the order of processing statements is important for efficiency.
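The iterative algorithm is easy to run directly; here is a hedged Python sketch on the six-node CFG of the running example (N is treated as a constant loop bound, so it is not tracked as a variable):

```python
# Liveness for the six-node CFG:
#   1: a = 0      2: b = a + 1   3: c = c + b
#   4: a = b * 2  5: a < N       6: return c

succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use  = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}

def liveness(succ, use, defs):
    live_in = {n: set() for n in succ}    # initialize to empty sets
    live_out = {n: set() for n in succ}
    changed = True
    while changed:                        # repeat till fixed point
        changed = False
        for n in sorted(succ, reverse=True):   # backward order converges faster
            out_n = set().union(*(live_in[s] for s in succ[n])) if succ[n] else set()
            in_n = use[n] | (out_n - defs[n])
            if in_n != live_in[n] or out_n != live_out[n]:
                live_in[n], live_out[n], changed = in_n, out_n, True
    return live_in, live_out

live_in, live_out = liveness(succ, use, defs)
for n in sorted(succ):
    print(n, sorted(live_in[n]), sorted(live_out[n]))
```

The computed in[1] = {c} confirms the earlier observation that c is live on entry, i.e., potentially used uninitialized; iterating in any order reaches the same least fixed point, just in more sweeps.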
Complexity of our liveness computation algorithm

● For an input program of size N:
  – ≤ N nodes in the CFG and ≤ N variables
    ⇒ ≤ N elements per in/out set
    ⇒ O(N) time per set union
  – The for loop performs a constant number of set operations per node
    ⇒ O(N²) time for the for loop
  – Each iteration of the for loop can only add to each set (monotonicity)
  – The sizes of all the in and out sets sum to at most 2N², bounding the number of iterations of the repeat loop
    ⇒ a worst-case complexity of O(N⁴)
  – Much less in practice (usually O(N) or O(N²)) if ordered properly.

Least fixed points

● There is often more than one solution for a given dataflow problem.
  – Any solution to the dataflow equations is a conservative approximation.
● Conservatively assuming a variable is live does not break the program:
  – It just means more registers may be needed.
● Assuming a variable is dead when it is really live will break things.
● Among the many possible solutions we want the smallest: the least fixed point.
● The iterative algorithm computes this least fixed point.

Confused!?

● Is compilers a theoretical topic or a practical one?
● Recall:
  – “A sangam of theory and practice.”
● Next class:
  – We are not leaving a topic as important as IDFA so soon!

CS502: Compiler Design
Code Optimization (Cont.)
Manas Thakur
Fall 2020

Recall our IDFA algorithm

    for each n:
        in[n] = ...; out[n] = ...                     // initialize
    repeat
        for each n:
            in’[n] = in[n]; out’[n] = out[n]          // save previous values
            in[n] = ...; out[n] = ...                 // compute new values
    until in’[n] == in[n] and out’[n] == out[n] ∀n    // repeat till fixed point

● Do we need to process all the nodes in each iteration?
Worklist-based implementation of IDFA

● Maintain a worklist of statements.
● Forward analysis:
  – Start with the entry node.
  – If OUT(n) changes, add succ(n) to the worklist.
● Backward analysis:
  – Start with the exit node.
  – If IN(n) changes, add pred(n) to the worklist.
● In both cases, iterate till fixed point.

Writing an IDFA (Cont.)

● Initialization of the IN and OUT sets depends on the analysis:
  – empty, if the information grows
  – everything, if the information shrinks
● Requirement for termination:
  – unidirectional growth/shrinkage, called monotonicity
● Confluence/meet operation (at control-flow merges):
  – union or intersection, depending on the analysis

Live-variable analysis revisited

● Direction: backward
● Initialization: empty sets
● Flow functions:
  – out[n] = ∪ { in[s] : s ∈ succ[n] }
  – in[n] = use[n] ∪ (out[n] – def[n])
● Confluence operation: union

Common sub-expressions revisited

● Idea:
  – If a program computes the same value multiple times, reuse the value.

        a = b + c;                    t = b + c;
        c = b + c;                    a = t;
        d = b + c;                    c = t;
                                      d = b + c;

  – Subexpressions can be reused until their operands are redefined.
  – Given a node n, the expressions computed at n are denoted gen(n), and the ones killed (operands redefined) at n are denoted kill(n).

Common subexpressions as an IDFA

● Direction: forward
● Initialization: empty sets
● Flow functions:
  – in[n] = ∩ { out[p] : p ∈ pred[n] }
  – out[n] = gen[n] ∪ (in[n] – kill[n])
● Confluence operation: intersection

Are we efficient enough?

● When can IDFAs take a lot of time?
● Which operations could be expensive?
  – Confluence
  – Equality checks
● Compilers may have to perform several IDFAs.
● How can we make an IDFA more efficient (perhaps with some loss of precision)?
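The backward worklist recipe above, sketched for liveness on the same toy CFG as before (the representation is my own, not the course's):

```python
from collections import deque

# Worklist-based backward liveness: reprocess a node only when the IN set
# of one of its successors changes, instead of sweeping every node each round.

def worklist_liveness(succ, use, defs):
    pred = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            pred[s].append(n)
    live_in = {n: set() for n in succ}
    live_out = {n: set() for n in succ}
    # Seed every node once, exit nodes (no successors) first.
    work = deque(sorted(succ, key=lambda n: len(succ[n])))
    while work:
        n = work.popleft()
        live_out[n] = set().union(*(live_in[s] for s in succ[n])) if succ[n] else set()
        new_in = use[n] | (live_out[n] - defs[n])
        if new_in != live_in[n]:
            live_in[n] = new_in
            for p in pred[n]:            # IN changed: predecessors must be redone
                if p not in work:
                    work.append(p)
    return live_in, live_out

succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use  = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
live_in, live_out = worklist_liveness(succ, use, defs)
print({n: sorted(live_in[n]) for n in sorted(succ)})
```

It reaches the same least fixed point as the full sweep, but each node is revisited only when something it depends on actually changed.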
Basic Blocks

● Example:

        a = 0
    L1: b = a + 1
        c = c + b
        a = b * 2
        if a < N goto L1
        return c

  – With each instruction as a node, the CFG has six nodes; using basic blocks, it has only three: {a = 0}, {b = a + 1; c = c + b; a = b * 2; a < N}, and {return c}.

Basic Blocks (Cont.)

● Idea:
  – Once execution enters a basic block, all its statements are executed in sequence.
  – A single-entry, single-exit region.
● Details:
  – Starts with a label.
  – Ends with one or more branches.
  – Edges may be labeled with predicates:
    ● true/false
    ● exceptions
● Key: improve efficiency, with reasonable precision.

Have you got a compiler’s eyes yet?

● What properties can you identify about this program?

    S1: y = 1;
    S2: y = 2;
    S3: x = y;

● What’s the advantage if it was rewritten as follows?

    S1: y1 = 1;
    S2: y2 = 2;
    S3: x = y2;

● Def-use becomes explicit.

Static Single Assignment (SSA)

● A form of IR in which each use can be mapped to a single definition.
  – Achieved using variable renaming and phi (Φ) nodes.

        if (flag)                     if (flag)
            x = -1;                       x1 = -1;
        else                          else
            x = 1;                        x2 = 1;
        y = x * a;                    x3 = Φ(x1, x2);
                                      y = x3 * a;

● Many compilers use SSA form in their IRs.

SSA Classwork

● Convert the following program to SSA form:
  – (Hint: first convert to 3AC)

        x = 0;
        for (i = 0; ...) {
            x = x + i;
            ...
        }
        ...

  – Sketch of the answer (Φ nodes at the join points):

        x1 = 0;
        i1 = 0;
    L1: i13 = Φ(i1, i3);
        ...
        i2 = i13 + 1;
        x3 = x2 – 1;
        i3 = i2 + 1;
        goto L1;
        ...
        x4 = Φ(x1, x3);
        x5 = x4 + i13;

Effect of SSA on Register Allocation!?

● What is the effect of SSA form on liveness?
● What does SSA do?
  – It breaks a single variable into multiple instances.
  – The instances represent distinct, non-overlapping uses.
● Effect:
  – It breaks up live ranges, and often improves register allocation.
    (One long live range for x becomes shorter ranges for x1 and x2.)

Featuring next in Code Optimization

● Heard of the 80-20 or 90-10 rule?
  – X% of the time is spent in executing y% of the code, where X >> y.
● Which kinds of code portions tend to form the region ‘y’ in typical programs?
  – Loops
  – Methods
● Tomorrow: loop optimizations

CS502: Compiler Design
Loop Optimizations
Manas Thakur
Fall 2020

Why optimize loops?

    for (i = 0; i < N; i++) {
        S1;
        S2;
    }

● Loops form a significant portion of the time spent in executing programs.
  – If N is just 10000 (not uncommon), we have too many instructions!
    ● How many in the above loop?
  – What if S1/S2 is/are function calls?
● Loops involve costly instructions in each iteration:
  – Comparisons
  – Jumps

What is a loop?

● A loop in a CFG is a set of nodes S such that:
  – There is a designated header node h in S.
  – There is a path from each node in S to h.
  – There is a path from h to each node in S.
  – h is the only node in S with an incoming edge from outside S.

Are all these loops? What about these?
[Figures: candidate CFGs to be classified; omitted.]

Identifying loops using dominators

● A node d dominates a node n if every path from the entry to n goes through d.
● Compute the dominators of each node: [example CFG omitted]

Flow function for computing dominators

● Assuming D[i] is the set of dominators of node i:

    D[entry] = {entry}
    D[n] = {n} ∪ ∩ { D[p] : p ∈ pred[n] }

Identifying loops using dominators (Cont.)
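The dominator flow function above can be solved with the same iterative style, and back edges (an edge n → h with h ∈ D[n]) read off from the result. A hedged sketch on the running six-node CFG; everything beyond the equations is my own scaffolding:

```python
# D[entry] = {entry};  D[n] = {n} ∪ ⋂ D[p] over predecessors p of n.

def dominators(pred, entry):
    nodes = set(pred)
    dom = {n: set(nodes) for n in nodes}   # start from "dominated by all"
    dom[entry] = {entry}
    changed = True
    while changed:                         # iterate till fixed point
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in pred[n]]
            new = ({n} | set.intersection(*ps)) if ps else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
pred = {1: [], 2: [1, 5], 3: [2], 4: [3], 5: [4], 6: [5]}
dom = dominators(pred, entry=1)

# A back edge goes from a node to one of its own dominators.
back_edges = [(n, h) for n in succ for h in succ[n] if h in dom[n]]
print(back_edges)
```

On this example the only back edge is 5 → 2, matching the loop whose header is b = a + 1.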
● First, identify a back edge:
  – an edge from a node n to another node h, where h dominates n.
● Each back edge leads to a loop:
  – the set X of nodes such that for each x ∈ X, h dominates x and there is a path from x to n not containing h;
  – h is the header.
● Verify: [example CFG omitted]

Loop-Invariant Code Motion (LICM)

● Loop-invariant code:
  – d: t = a OP b, such that:
    ● a and b are constants; or
    ● all the definitions of a and b that reach d are outside the loop; or
    ● only one definition each of a and b reaches d, and that definition is loop-invariant.
● Example (t = a * b is loop-invariant):

    L0: t = 0
    L1: i = i + 1
        t = a * b
        M[i] = t
        if i < N goto L1
    L2: ...

LICM: Get ready for code hoisting

● Can we always hoist loop-invariant code? Consider three variants:

    1)  L0: t = 0
        L1: i = i + 1
            t = a * b
            M[i] = t
            if i < N goto L1
        L2: ...

    2)  L0: t = 0
        L1: if i >= N goto L2
            i = i + 1
            t = a * b
            M[i] = t
            goto L1
        L2: ...

    3)  L0: t = 0
        L1: M[j] = t
            i = i + 1
            t = a * b
            M[i] = t
            if i < N goto L1
        L2: ...

● Criteria for hoisting d: t = a OP b:
  – d dominates all loop exits at which t is live-out, and
  – there is only one definition of t in the loop, and
  – t is not live-out of the loop preheader.
● How can we hoist code in the pink and the orange blocks?

Induction-variable optimization

● Induction variables:
  – variables whose values depend on the iteration variable.
● Optimization: compute them efficiently, if possible.

        s = 0                         s = 0
        i = 0                         k’ = a
    L1: if i >= N goto L2             b = N * 4
        j = i * 4                     c = a + b
        k = j + a                 L1: if k’ >= c goto L2
        x = M[k]                      x = M[k’]
        s = s + x                     s = s + x
        i = i + 1                     k’ = k’ + 4
        goto L1                       goto L1
    L2:                           L2:

Loop unrolling

● Minimize the number of increments and condition-checks.
● Be careful about the increase in code size (I-cache misses!).
● [Code examples omitted: one version that is correct only for an even number of iterations, and one for any number of iterations.]

Loop interchange

● A C/Java programmer starting with MATLAB:

    for i=1:1000,
        for j=1:1000,
            a(i) = a(i) + b(i,j)*c(i)
        end
    end

● But MATLAB stores matrices in column-major order!
● Implication?
  – Cache misses (perhaps in each iteration)!
● Solution: interchange the loops!

    for j=1:1000,
        for i=1:1000,
            a(i) = a(i) + b(i,j)*c(i)
        end
    end

Many more loop optimizations

● Some other time:
  – Loop fusion
  – Loop fission
  – Loop inversion
  – Loop tiling
  – ...
● Next class:
  – Vectorization
  – Parallelization