Optimization: Compiler Passes, the Role of the Optimizer, and Optimizations
Compiler Passes

Analysis of the input program (front end); synthesis of the output program (back end):

character stream → Lexical Analysis → token stream → Syntactic Analysis → abstract syntax tree → Semantic Analysis → annotated AST → Intermediate Code Generation → intermediate form → Optimization → intermediate form → Code Generation → target language

Before and after generating machine code, devote one or more passes over the program to "improve" code quality.

The Role of the Optimizer

• The compiler can implement a procedure in many ways
• The optimizer tries to find an implementation that is "better"
  – Speed, code size, data space, …

To accomplish this, it
• Analyzes the code to derive knowledge about run-time behavior
  – Data-flow analysis, pointer disambiguation, …
  – The general term is "static analysis"
• Uses that knowledge in an attempt to improve the code
  – Literally hundreds of transformations have been proposed
  – Large amount of overlap between them

There is nothing "optimal" about optimization
• Proofs of optimality assume restrictive & unrealistic conditions
• A better goal is to "usually improve"

Optimizations

• Identify inefficiencies in intermediate or target code
• Replace them with equivalent but better sequences
  – equivalent = "has the same externally visible behavior"
• Target-independent optimizations are best done on IL code
• Target-dependent optimizations are best done on target code

Traditional Three-pass Compiler

Source → Front End → IR → Middle End → IR → Back End → Machine code
(errors can be reported by each stage)

The Optimizer (or Middle End)

IR → Opt 1 → IR → Opt 2 → IR → Opt 3 → … → Opt n → IR
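As a rough illustration of the pipeline above (the tuple IR encoding and the pass name are invented for this sketch, not from the slides), the middle end can be modeled as a list of IR-to-IR passes applied in sequence:

```python
# Hypothetical sketch: the middle end as a series of IR-to-IR passes.
# The IR here is just a list of (dest, op, arg1, arg2) tuples.

def fold_constants(ir):
    # One sample pass: replace an add of two literals with a constant load.
    out = []
    for dest, op, a, b in ir:
        if op == "+" and isinstance(a, int) and isinstance(b, int):
            out.append((dest, "const", a + b, None))
        else:
            out.append((dest, op, a, b))
    return out

def run_middle_end(ir, passes):
    for p in passes:          # each pass analyzes and rewrites the IR
        ir = p(ir)
    return ir

ir = [("t1", "+", 3, 4)]
print(run_middle_end(ir, [fold_constants]))  # [('t1', 'const', 7, None)]
```

Real optimizers chain many such passes (Opt 1 … Opt n), and pass ordering itself is a design problem.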
The Middle End does code improvement (optimization)
• Analyzes the IR and rewrites (or transforms) the IR
• Primary goal is to reduce the running time of the compiled code
  – May also improve space, power consumption, …
• Must preserve the "meaning" of the code
  – Measured by the values of named variables
  – A course (or two) unto itself

Modern optimizers are structured as a series of passes.

Typical transformations
• Discover & propagate some constant value
• Move a computation to a less frequently executed place
• Specialize some computation based on context
• Discover a redundant computation & remove it
• Remove useless or unreachable code
• Encode an idiom in some particularly efficient form

Kinds of Optimizations

Optimizations are characterized by which transformation is applied over what scope. Typical scopes are:
• peephole:
  – look at adjacent instructions
• local:
  – look at a straight-line sequence of statements
• global (intraprocedural):
  – look at an entire procedure
• whole program (interprocedural):
  – look across procedures

Larger scope => better optimization, but more cost and complexity.

Peephole Optimization

After target code generation, look at adjacent instructions (a "peephole" on the code stream); try to replace adjacent instructions with something faster.

Example:
movl %eax, 12(%ebp)
movl 12(%ebp), %ebx
=>
movl %eax, 12(%ebp)
movl %eax, %ebx

Algebraic Simplification

"constant folding", "strength reduction"
z = 3 + 4;
z = x + 0;
z = x * 1;
z = x * 2;
z = x * 8;
z = x / 8;
double x, y, z;
z = (x + y) - y;

Can be done by a peephole optimizer, or by the code generator.

Local Optimizations

Analysis and optimizations within a basic block
• Basic block: a straight-line sequence of statements
  – no control flow into or out of the middle of the sequence
• Better than peephole
• Not too hard to implement

Machine-independent, if done on intermediate code.
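The store-then-reload peephole rewrite shown above can be sketched as a single pass over an instruction list (the tuple instruction encoding is an assumption of this sketch, not a real assembler format):

```python
# Hypothetical peephole pass: a store to memory followed by a load from
# the same address can reuse the stored register instead of reloading.
# Instructions: ("store", src_reg, addr), ("load", dst_reg, addr),
#               ("move", dst_reg, src_reg)

def peephole(instrs):
    out = []
    for ins in instrs:
        if (out and ins[0] == "load" and out[-1][0] == "store"
                and out[-1][2] == ins[2]):       # load from just-stored addr
            # movl %eax, 12(%ebp); movl 12(%ebp), %ebx
            #   => movl %eax, 12(%ebp); movl %eax, %ebx
            out.append(("move", ins[1], out[-1][1]))
        else:
            out.append(ins)
    return out

code = [("store", "%eax", "12(%ebp)"),
        ("load", "%ebx", "12(%ebp)")]
print(peephole(code))
# [('store', '%eax', '12(%ebp)'), ('move', '%ebx', '%eax')]
```

A production peephole optimizer matches many such patterns, typically over a sliding window of two or three instructions.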
Local Constant Propagation

If a variable is assigned a constant value, replace downstream uses of the variable with the constant. This can enable more constant folding.

Example:
final int count = 10;
...
x = count * 5;
y = x ^ 3;

Unoptimized intermediate code:
t1 = 10;
t2 = 5;
t3 = t1 * t2;
x = t3;
t4 = x;
t5 = 3;
t6 = exp(t4, t5);
y = t6;

Intermediate code after constant propagation:
t1 = 10;
t2 = 5;
t3 = 50;
x = 50;
t4 = 50;
t5 = 3;
t6 = 125000;
y = 125000;

Local Dead Assignment (Store) Elimination

If the l.h.s. of an assignment is never referenced again before being overwritten, the assignment can be deleted. Primary use: clean-up after previous optimizations!

Example:
final int count = 10;
...
x = count * 5;
y = x ^ 3;
x = 7;

After constant propagation, the stores to t1…t6 (and the first store to x, since x is overwritten by x = 7 without an intervening use) are dead and can be deleted.

Local Common Subexpression Elimination
(AKA Redundancy Elimination)

Avoid repeating the same calculation
• CSE of repeated loads: redundant load elimination

Keep track of available expressions.

Source:
... a[i] + b[i] ...
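The two local passes just described can be sketched together, constant propagation followed by dead-assignment elimination, over an invented (dest, op, arg1, arg2) tuple IR (function names and encoding are assumptions of the sketch):

```python
# Hypothetical local constant propagation on a single basic block.
def const_prop(block):
    consts, out = {}, []
    for dest, op, a, b in block:
        a = consts.get(a, a)                  # substitute known constants
        b = consts.get(b, b)
        if op == "copy" and isinstance(a, int):
            consts[dest] = a
            out.append((dest, "copy", a, None))
        elif op == "*" and isinstance(a, int) and isinstance(b, int):
            consts[dest] = a * b              # fold the multiply
            out.append((dest, "copy", a * b, None))
        else:
            consts.pop(dest, None)            # dest no longer a known constant
            out.append((dest, op, a, b))
    return out

# Hypothetical local dead-assignment elimination; assumes no side effects.
def dead_store_elim(block, live_out):
    needed, out = set(live_out), []
    for dest, op, a, b in reversed(block):    # scan backwards
        if dest in needed:
            needed.discard(dest)
            needed.update(x for x in (a, b) if isinstance(x, str))
            out.append((dest, op, a, b))
        # else: value never read again -> drop the assignment
    return list(reversed(out))

block = [("t1", "copy", 10, None),
         ("t2", "copy", 5, None),
         ("t3", "*", "t1", "t2"),
         ("x", "copy", "t3", None)]
block = dead_store_elim(const_prop(block), live_out={"x"})
print(block)  # [('x', 'copy', 50, None)]
```

This mirrors the slides' point that dead-store elimination is mostly clean-up: propagation makes the temporaries dead, elimination removes them.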
Unoptimized intermediate code:
t1 = *(fp + ioffset);
t2 = t1 * 4;
t3 = fp + t2;
t4 = *(t3 + aoffset);
t5 = *(fp + ioffset);
t6 = t5 * 4;
t7 = fp + t6;
t8 = *(t7 + boffset);
t9 = t4 + t8;

Redundancy Elimination Implementation

An expression x+y is redundant if and only if, along every path from the procedure's entry, it has been evaluated and its constituent subexpressions (x & y) have not been re-defined.

If the compiler can prove that an expression is redundant
• It can preserve the results of earlier evaluations
• It can replace the current evaluation with a reference

Two pieces to the problem
• Proving that x+y is redundant
• Rewriting the code to eliminate the redundant evaluation

One technique for accomplishing both is called value numbering.

Value Numbering (An old idea)

The key notion (Balke 1968 or Ershov 1954)
• Assign an identifying number, VN(e), to each expression
  – VN(x+y) = VN(j) iff x+y and j have the same value along every path
  – Use hashing over the value numbers to make it efficient
• Use these numbers to improve the code

Improving the code
• Replace redundant expressions
• Simplify algebraic identities
• Discover constant-valued expressions, fold & propagate them

This technique was invented for low-level, linear IRs. Equivalent methods exist for trees (build a DAG).

Local Value Numbering: The Algorithm

For each operation o = <operator, o1, o2> in the block:
1. Get value numbers for the operands from a hash lookup
2. Hash <operator, VN(o1), VN(o2)> to get a value number for o
3. If o already had a value number, replace o with a reference
4. If o1 & o2 are constant, evaluate the operation & replace it with a load immediate

If hashing behaves, the algorithm runs in linear time.

Handling algebraic identities
• Case statement on operator type
• Handle special cases within each operator

Local Value Numbering Example

Original Code:    With VNs:            Rewritten:
a ← x + y         a³ ← x¹ + y²         a³ ← x + y
b ← x + y         b³ ← x¹ + y² ∗       b³ ← a³
a ← 17            a⁴ ← 17              a⁴ ← 17
c ← x + y         c³ ← x¹ + y² ∗       c³ ← a³  (oops!)

Two redundancies:
• Eliminate stmts with a ∗
• Coalesce results?

Options:
• Use c³ ← b³
• Save a³ in t³
• Rename around it
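The four-step algorithm above can be sketched as follows (tuple IR and helper names are assumptions of the sketch; it deliberately reproduces the "oops!" flaw, reusing the first result name even if that name is later overwritten, which is exactly what renaming fixes):

```python
# Hypothetical local value numbering over (dest, op, arg1, arg2) tuples.

def value_number(block):
    vn = {}      # operand name/constant -> value number
    table = {}   # (op, vn1, vn2) -> (value number, name holding the value)
    out = []
    next_vn = 0

    def number_of(x):
        nonlocal next_vn
        if x not in vn:               # step 1: hash lookup for operands
            vn[x] = next_vn
            next_vn += 1
        return vn[x]

    for dest, op, a, b in block:
        key = (op, number_of(a), number_of(b))   # step 2: hash the operation
        if key in table:                          # step 3: redundant -> reuse
            num, first = table[key]
            # NB: unsafe if `first` was overwritten (the "oops!" case);
            # SSA-style renaming makes this reuse always correct.
            out.append((dest, "copy", first, None))
        else:
            num = next_vn; next_vn += 1
            table[key] = (num, dest)
            out.append((dest, op, a, b))
        vn[dest] = num               # dest now holds this value number
    return out

block = [("a", "+", "x", "y"),
         ("b", "+", "x", "y")]
print(value_number(block))
# [('a', '+', 'x', 'y'), ('b', 'copy', 'a', None)]
```

Step 4 of the algorithm (folding when both operands are constants) is omitted here for brevity; it slots into the `else` branch.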
Example (continued), with renaming:

Original Code:    With VNs:              Rewritten:
a0 ← x0 + y0      a0³ ← x0¹ + y0²        a0³ ← x0 + y0
b0 ← x0 + y0      b0³ ← x0¹ + y0² ∗      b0³ ← a0³
a1 ← 17           a1⁴ ← 17               a1⁴ ← 17
c0 ← x0 + y0      c0³ ← x0¹ + y0² ∗      c0³ ← a0³

Renaming:
• Give each value a unique name
• Makes it clear

Notation:
• While complex, the meaning is clear

Result:
• a0³ is available
• Rewriting just works

Simple Extensions to Value Numbering

Constant folding
• Add a bit that records when a value is constant
• Evaluate constant values at compile time
• Replace with a load immediate or an immediate operand
• No stronger local algorithm

Algebraic identities
• Must check (many) special cases
• Replace the result with the input VN
• Build a decision tree on the operation
• Identities: x ← y, x + 0, x - 0, x ∗ 1, x ÷ 1, x - x, x ∗ 0, x ÷ x, x ∨ 0, x ∧ 0xFF…FF, max(x, MAXINT), min(x, MININT), max(x, x), min(y, y), and so on
• With values, not names

Intraprocedural (Global) Optimizations

• Enlarge the scope of analysis to an entire procedure
  – more opportunities for optimization
  – have to deal with branches, merges, and loops
• Can do constant propagation, common subexpression elimination, etc. at the global level
• Can do new things, e.g. loop optimizations

Optimizing compilers usually work at this level.

Two data structures are commonly used to help analyze the procedure body.
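The case analysis on operators that the identities list above calls for can be sketched like this (hypothetical helper; integer semantics assumed — x ∗ 0 = 0 is unsafe for IEEE floats because of NaN, which is one of the "many special cases"):

```python
# Hypothetical algebraic-identity simplifier: a case analysis on the
# operator, returning either a simplified operand/constant or the
# original operation unchanged.

def simplify(op, a, b):
    if op == "+" and b == 0:  return a        # x + 0 = x
    if op == "-" and b == 0:  return a        # x - 0 = x
    if op == "-" and a == b:  return a - a if isinstance(a, int) else 0  # x - x = 0
    if op == "*" and b == 1:  return a        # x * 1 = x
    if op == "*" and b == 0:  return 0        # x * 0 = 0 (integers only!)
    if op == "/" and b == 1:  return a        # x / 1 = x
    return (op, a, b)                         # no identity applies

print(simplify("+", "x", 0))    # 'x'
print(simplify("*", "x", 0))    # 0
print(simplify("+", "x", "y"))  # ('+', 'x', 'y')
```

As the slide says, a real implementation works on value numbers rather than names, so `a == b` means "same value", not "same variable".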
Control flow graph (CFG) captures the flow of control
– nodes are IL statements, or whole basic blocks
– edges represent control flow
– node with multiple successors = branch/switch
– node with multiple predecessors = merge
– loop in graph = loop

Data flow graph (DFG) captures the flow of data. A common one is def/use chains:
– nodes are def(inition)s and uses
– edge from def to use
– a def can reach multiple uses
– a use can have multiple reaching defs

Control-flow Graph

Models the transfer of control in the procedure
• Nodes in the graph are basic blocks
  – Can be represented with quads or any other linear representation
• Edges in the graph represent control flow

Example:
        if (x = y)
        /         \
   a ← 2           a ← 3
   b ← 5           b ← 4
        \         /
        c ← a * b

Data-flow Graph – Use/Def Chains

Models the transfer of data in the procedure
• Nodes in the graph are definitions and uses
• Edges in the graph represent data flow

Example: in the code above, each definition (a ← 2, a ← 3, b ← 5, b ← 4) has an edge to its use in c ← a * b; the use of a has two reaching defs, and so does the use of b.

Analysis and Transformation

Each optimization is made up of
– some number of analyses
– followed by a transformation

Analyze the CFG and/or DFG by propagating info forward or backward along CFG and/or DFG edges
– edges are called program points
– merges in the graph require combining info
– loops in the graph require iterative approximation

Perform improving transformations based on the info computed
– have to wait until any iterative approximation has converged

Analysis must be conservative/safe/sound so that transformations preserve program behavior.

Data-flow Analysis

Data-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values
• Almost always involves building a graph
  – Problems are trivial on a basic block

Limitations
1. Precision – "up to symbolic execution"
   – Assume all paths are taken
2.
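The iterative approximation described above can be sketched as reaching definitions on the diamond CFG from the earlier example (the block/definition encodings and function name are assumptions of this sketch):

```python
# Hypothetical iterative forward data-flow analysis: reaching definitions
# on the diamond CFG
#   B0: if (x = y)   B1: a<-2; b<-5   B2: a<-3; b<-4   B3: c <- a*b
# A definition is (variable, defining block); a new definition of a
# variable kills all other definitions of that variable.

blocks = {
    "B0": [],
    "B1": [("a", "B1"), ("b", "B1")],
    "B2": [("a", "B2"), ("b", "B2")],
    "B3": [("c", "B3")],
}
preds = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}

def reaching_definitions(blocks, preds):
    out = {n: set() for n in blocks}
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for n in blocks:
            # merge: union info flowing in from all predecessors
            in_n = set().union(*[out[p] for p in preds[n]])
            gen = set(blocks[n])
            killed = {v for v, _ in gen}
            new = gen | {d for d in in_n if d[0] not in killed}
            if new != out[n]:
                out[n], changed = new, True
    return out

rd = reaching_definitions(blocks, preds)
# both defs of a (and of b) reach the merge block B3
print(sorted(rd["B3"]))
```

Note the two points from the slides: the merge at B3 combines info from B1 and B2 with a union, and the outer `while` loop is the iterative approximation that must converge before any transformation may use the results.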