Optimization Compiler Passes Optimizations the Role of The

Total Page:16

File Type:pdf, Size:1020Kb

Optimization Compiler Passes Optimizations the Role of The Analysis Synthesis of input program of output program Compiler (front -end) (back -end) character Passes stream Intermediate Lexical Analysis Code Generation token intermediate Optimization stream form Syntactic Analysis Optimization abstract intermediate Before and after generating machine syntax tree form code, devote one or more passes over the program to “improve” code quality Semantic Analysis Code Generation annotated target AST language 2 The Role of the Optimizer Optimizations • The compiler can implement a procedure in many ways • The optimizer tries to find an implementation that is “better” – Speed, code size, data space, … Identify inefficiencies in intermediate or target code Replace with equivalent but better sequences To accomplish this, it • Analyzes the code to derive knowledge about run-time • equivalent = "has the same externally visible behavior – Data-flow analysis, pointer disambiguation, … behavior" – General term is “static analysis” Target-independent optimizations best done on IL • Uses that knowledge in an attempt to improve the code – Literally hundreds of transformations have been proposed code – Large amount of overlap between them Target-dependent optimizations best done on Nothing “optimal” about optimization target code • Proofs of optimality assume restrictive & unrealistic conditions • Better goal is to “usually improve” 3 4 Traditional Three-pass Compiler The Optimizer (or Middle End) IR Source Front Middle IR Back Machine IROpt IROpt IROpt ... IROpt IR Code End End End code 1 2 3 n Errors Errors Modern optimizers are structured as a series of passes Middle End does Code Improvement (Optimization) • Analyzes IR and rewrites (or transforms ) IR Typical Transformations • Primary goal is to reduce running time of the • Discover & propagate some constant value compiled code • Move a computation to a less frequently executed place – May also improve space, power consumption, … • Specialize some computation based on context • Must preserve “meaning” of the code • Discover a redundant computation & remove it – Measured by values of named variables • Remove useless or unreachable code • Encode an idiom in some particularly efficient form – A course (or two) unto itself 5 6 1 Kinds of optimizations Peephole Optimization Optimizations are characterized by which Transformation over what Scope. Typical scopes are: After target code generation, look at adjacent peephole : instructions (a “peephole” on the code stream) – look at adjacent instructions – try to replace adjacent instructions with something • local : faster – look at straight-line sequence of statements Example: • global (intraprocedural): movl %eax, 12(%ebp) – look at entire procedure movl 12(%ebp), %ebx • whole program (interprocedural): => – look across procedures movl %eax, 12(%ebp) Larger scope => better optimization but more cost and movl %eax , %ebx complexity 7 8 Algebraic Simplification Local Optimizations “constant folding”, “strength reduction” Analysis and optimizations within a basic block z = 3 + 4; z = x + 0; z = x * 1; • Basic block: straight-line sequence of statements – no control flow into or out of middle of sequence z = x * 2; • Better than peephole z = x * 8; • Not too hard to implement z = x / 8; double x, y, z; Machine-independent, if done on intermediate code z = (x + y) - y; Can be done by peephole optimizer, or by code generator 9 10 Local Constant Propagation Local Dead Assignment (Store) Elimination If variable assigned a constant value, replace downstream If l.h.s. of assignment never referenced again before uses of the variable with the constant. being overwritten, then can delete assignment. Example: Can enable more constant folding final int count = 10; Example: ... Primary use: clean-up after final int count = 10; x = count * 5; previous optimizations! ... y = x ^ 3; x = count * 5; x = 7; y = x ^ 3; Intermediate code after constant propagation: Unoptimized intermediate code: t1 = 10; t1 = 10; t2 = 5; t2 = 5; t3 = 50; t3 = t1 * t2; x = 50; x = t3; t4 = 50; t4 = x; t5 = 3; t5 = 3; t6 = 125000; t6 = exp(t4, t5); y = 125000; y = t6; x = 7; 11 12 2 Local Common Subexpression Elimination Redundancy Elimination Implementation (AKA Redundancy Elimination) An expression x+y is redundant if and only if, along every Avoid repeating the same calculation path from the procedure’s entry, it has been evaluated, and its • CSE of repeated loads: redundant load elimination constituent subexpressions (x & y) have not been re-defined. Keep track of available expressions Source: If the compiler can prove that an expression is redundant ... a[i] + b[i] ... • It can preserve the results of earlier evaluations Unoptimized intermediate code: • It can replace the current evaluation with a reference t1 = *(fp + ioffset); t2 = t1 * 4; t3 = fp + t2; Two pieces to the problem t4 = *(t3 + aoffset); • Proving that x+y is redundant t5 = *(fp + ioffset); • Rewriting the code to eliminate the redundant evaluation t6 = t5 * 4; t7 = fp + t6; One technique for accomplishing both is called value numbering t8 = *(t7 + boffset); t9 = t4 + t8; 13 14 Value Numbering (An old idea) Local Value Numbering The Algorithm The key notion ( Balke 1968 or Ershov 1954 ) For each operation o = <operator, o 1, o 2> in the block • Assign an identifying number, V(n), to each expression 1 Get value numbers for operands from hash lookup – V(x+y) = V(j) iff x+y and j have the same value ∀ path 2 Hash <operator,VN(o 1),VN(o 2)> to get a value – Use hashing over the value numbers to make it efficient number for o • Use these numbers to improve the code 3 If o already had a value number, replace o with a reference Improving the code 4 If o 1 & o 2 are constant, evaluate it & replace with a • Replace redundant expressions load • Simplify algebraic identities • Discover constant-valued expressions, fold & If hashing behaves, the algorithm runs in linear time propagate them Handling algebraic identities This technique was invented for low-level, linear IR s • Case statement on operator type Equivalent methods exist for trees ( build a DAG ) 15 • Handle special cases within each operator 16 Local Value Numbering Local Value Numbering Example Example ( continued ) Original Code With VNs Rewritten Original Code With VNs Rewritten ← 3 ← 1 2 3 ← 1 2 ← 3 ← 1 2 3 ← 1 2 a x + y a x + y a x + y a0 x0 + y 0 a0 x0 + y 0 a0 x0 + y 0 ∗ ← ∗ 3 ← 1 2 ∗ 3 ← 3 ∗ ← ∗ 3 ← 1 2 ∗ 3 ← 3 b x + y b x + y b a b0 x0 + y 0 b0 x0 + y 0 b0 a0 ← 4 ← 4 ← ← 4 ← 4 ← a 17 a 17 a 17 a1 17 a1 17 a1 17 ∗ ← ∗ 3 ← 1 2 ∗ 3 ← 3 ∗ ← ∗ 3 ← 1 2 ∗ 3 ← 3 c x + y c x + y c a (oops!) c0 x0 + y 0 c0 x0 + y 0 c0 a0 Two redundancies: Options: Renaming: Notation: Result: 3 3 3 • Eliminate stmts • Use c ← b • Give each value a • While complex, • a0 is available ∗ with a • Save a 3 in t 3 unique name the meaning is • Rewriting just clear • Coalesce results ? • Rename around it • Makes it clear works 17 18 3 Simple Extensions to Value Numbering Intraprocedural (Global) optimizations Constant folding • Add a bit that records when a value is constant • Enlarge scope of analysis to entire procedure • Evaluate constant values at compile-time – more opportunities for optimization • Replace with load immediate or immediate operand – have to deal with branches, merges, and loops • No stronger local algorithm Identities: • Can do constant propagation, common x←←←y, x+0, x-0, x ∗∗∗1, x÷1, x-x, subexpression elimination, etc. at global level x∗∗∗0, x ÷÷÷x, x ∨∨∨0, x ∧∧∧ 0x FF …FF , max(x, MAXINT ), min(x, MININT ), max(x,x), min(y,y), and so on ... • Can do new things, e.g. loop optimizations Algebraic identities • Must check (many) special cases Optimizing compilers usually work at this level • Replace result with input VN With values, not names • Build a decision tree on operation 19 20 Intraprocedural (Global) Optimizations Control-flow Graph Two data structures are commonly used to help analyze the of procedure body. Models the transfer of control in the procedure Control flow graph (CFG) captures flow of control • Nodes in the graph are basic blocks – nodes are IL statements, or whole basic blocks – Can be represented with quads or any other linear – edges represent control flow representation – node with multiple successors = branch/switch – node with multiple predecessors = merge • Edges in the graph represent control flow – loop in graph = loop if (x = y) Data flow graph (DFG) capture flow of data Example A common one is def/use chains: ← ← – nodes are def(inition)s and uses a 2 a 3 b ← 5 b ← 4 – edge from def to use – a def can reach multiple uses ← – a use can have multiple reaching defs c a * b 21 22 Data-flow Graph – Use/Def Chains Analysis and Transformation Models the transfer of data in the procedure Each optimization is made up of – some number of analyses • Nodes in the graph are definitions and uses – followed by a transformation • Edges in the graph represent data flow Analyze CFG and/or DFG by propagating info forward or backward along CFG and/or DFG edges if (x = y) – edges called program points Example – merges in graph require combining info a ← 2 – loops in graph require iterative approximation b ← 5 b Perform improving transformations based on info a ← 3 computed a ← b a b 4 – have to wait until any iterative approximation has converged Analysis must be conservative/safe/sound so that c ← a * b transformations preserve program behavior 23 24 4 Data-flow Analysis Data-flow Analysis Data-flow analysis is a collection of techniques for Limitations compile-time reasoning about the run-time flow of values 1. Precision – “up to symbolic execution” • Almost always involves building a graph – Assume all paths are taken – Problems are trivial on a basic block 2.
Recommended publications
  • Efficient Run Time Optimization with Static Single Assignment
    i Jason W. Kim and Terrance E. Boult EECS Dept. Lehigh University Room 304 Packard Lab. 19 Memorial Dr. W. Bethlehem, PA. 18015 USA ¢ jwk2 ¡ tboult @eecs.lehigh.edu Abstract We introduce a novel optimization engine for META4, a new object oriented language currently under development. It uses Static Single Assignment (henceforth SSA) form coupled with certain reasonable, albeit very uncommon language features not usually found in existing systems. This reduces the code footprint and increased the optimizer’s “reuse” factor. This engine performs the following optimizations; Dead Code Elimination (DCE), Common Subexpression Elimination (CSE) and Constant Propagation (CP) at both runtime and compile time with linear complexity time requirement. CP is essentially free, whether the values are really source-code constants or specific values generated at runtime. CP runs along side with the other optimization passes, thus allowing the efficient runtime specialization of the code during any point of the program’s lifetime. 1. Introduction A recurring theme in this work is that powerful expensive analysis and optimization facilities are not necessary for generating good code. Rather, by using information ignored by previous work, we have built a facility that produces good code with simple linear time algorithms. This report will focus on the optimization parts of the system. More detailed reports on META4?; ? as well as the compiler are under development. Section 0.2 will introduce the scope of the optimization algorithms presented in this work. Section 1. will dicuss some of the important definitions and concepts related to the META4 programming language and the optimizer used by the algorithms presented herein.
    [Show full text]
  • CS153: Compilers Lecture 19: Optimization
    CS153: Compilers Lecture 19: Optimization Stephen Chong https://www.seas.harvard.edu/courses/cs153 Contains content from lecture notes by Steve Zdancewic and Greg Morrisett Announcements •HW5: Oat v.2 out •Due in 2 weeks •HW6 will be released next week •Implementing optimizations! (and more) Stephen Chong, Harvard University 2 Today •Optimizations •Safety •Constant folding •Algebraic simplification • Strength reduction •Constant propagation •Copy propagation •Dead code elimination •Inlining and specialization • Recursive function inlining •Tail call elimination •Common subexpression elimination Stephen Chong, Harvard University 3 Optimizations •The code generated by our OAT compiler so far is pretty inefficient. •Lots of redundant moves. •Lots of unnecessary arithmetic instructions. •Consider this OAT program: int foo(int w) { var x = 3 + 5; var y = x * w; var z = y - 0; return z * 4; } Stephen Chong, Harvard University 4 Unoptimized vs. Optimized Output .globl _foo _foo: •Hand optimized code: pushl %ebp movl %esp, %ebp _foo: subl $64, %esp shlq $5, %rdi __fresh2: movq %rdi, %rax leal -64(%ebp), %eax ret movl %eax, -48(%ebp) movl 8(%ebp), %eax •Function foo may be movl %eax, %ecx movl -48(%ebp), %eax inlined by the compiler, movl %ecx, (%eax) movl $3, %eax so it can be implemented movl %eax, -44(%ebp) movl $5, %eax by just one instruction! movl %eax, %ecx addl %ecx, -44(%ebp) leal -60(%ebp), %eax movl %eax, -40(%ebp) movl -44(%ebp), %eax Stephen Chong,movl Harvard %eax,University %ecx 5 Why do we need optimizations? •To help programmers… •They write modular, clean, high-level programs •Compiler generates efficient, high-performance assembly •Programmers don’t write optimal code •High-level languages make avoiding redundant computation inconvenient or impossible •e.g.
    [Show full text]
  • 9. Optimization
    9. Optimization Marcus Denker Optimization Roadmap > Introduction > Optimizations in the Back-end > The Optimizer > SSA Optimizations > Advanced Optimizations © Marcus Denker 2 Optimization Roadmap > Introduction > Optimizations in the Back-end > The Optimizer > SSA Optimizations > Advanced Optimizations © Marcus Denker 3 Optimization Optimization: The Idea > Transform the program to improve efficiency > Performance: faster execution > Size: smaller executable, smaller memory footprint Tradeoffs: 1) Performance vs. Size 2) Compilation speed and memory © Marcus Denker 4 Optimization No Magic Bullet! > There is no perfect optimizer > Example: optimize for simplicity Opt(P): Smallest Program Q: Program with no output, does not stop Opt(Q)? © Marcus Denker 5 Optimization No Magic Bullet! > There is no perfect optimizer > Example: optimize for simplicity Opt(P): Smallest Program Q: Program with no output, does not stop Opt(Q)? L1 goto L1 © Marcus Denker 6 Optimization No Magic Bullet! > There is no perfect optimizer > Example: optimize for simplicity Opt(P): Smallest ProgramQ: Program with no output, does not stop Opt(Q)? L1 goto L1 Halting problem! © Marcus Denker 7 Optimization Another way to look at it... > Rice (1953): For every compiler there is a modified compiler that generates shorter code. > Proof: Assume there is a compiler U that generates the shortest optimized program Opt(P) for all P. — Assume P to be a program that does not stop and has no output — Opt(P) will be L1 goto L1 — Halting problem. Thus: U does not exist. > There will
    [Show full text]
  • Precise Null Pointer Analysis Through Global Value Numbering
    Precise Null Pointer Analysis Through Global Value Numbering Ankush Das1 and Akash Lal2 1 Carnegie Mellon University, Pittsburgh, PA, USA 2 Microsoft Research, Bangalore, India Abstract. Precise analysis of pointer information plays an important role in many static analysis tools. The precision, however, must be bal- anced against the scalability of the analysis. This paper focusses on improving the precision of standard context and flow insensitive alias analysis algorithms at a low scalability cost. In particular, we present a semantics-preserving program transformation that drastically improves the precision of existing analyses when deciding if a pointer can alias Null. Our program transformation is based on Global Value Number- ing, a scheme inspired from compiler optimization literature. It allows even a flow-insensitive analysis to make use of branch conditions such as checking if a pointer is Null and gain precision. We perform experiments on real-world code and show that the transformation improves precision (in terms of the number of dereferences proved safe) from 86.56% to 98.05%, while incurring a small overhead in the running time. Keywords: Alias Analysis, Global Value Numbering, Static Single As- signment, Null Pointer Analysis 1 Introduction Detecting and eliminating null-pointer exceptions is an important step towards developing reliable systems. Static analysis tools that look for null-pointer ex- ceptions typically employ techniques based on alias analysis to detect possible aliasing between pointers. Two pointer-valued variables are said to alias if they hold the same memory location during runtime. Statically, aliasing can be de- cided in two ways: (a) may-alias [1], where two pointers are said to may-alias if they can point to the same memory location under some possible execution, and (b) must-alias [27], where two pointers are said to must-alias if they always point to the same memory location under all possible executions.
    [Show full text]
  • Control Flow Analysis in Scheme
    Control Flow Analysis in Scheme Olin Shivers Carnegie Mellon University [email protected] Abstract Fortran). Representative texts describing these techniques are [Dragon], and in more detail, [Hecht]. Flow analysis is perhaps Traditional ¯ow analysis techniques, such as the ones typically the chief tool in the optimising compiler writer's bag of tricks; employed by optimising Fortran compilers, do not work for an incomplete list of the problems that can be addressed with Scheme-like languages. This paper presents a ¯ow analysis ¯ow analysis includes global constant subexpression elimina- technique Ð control ¯ow analysis Ð which is applicable to tion, loop invariant detection, redundant assignment detection, Scheme-like languages. As a demonstration application, the dead code elimination, constant propagation, range analysis, information gathered by control ¯ow analysis is used to per- code hoisting, induction variable elimination, copy propaga- form a traditional ¯ow analysis problem, induction variable tion, live variable analysis, loop unrolling, and loop jamming. elimination. Extensions and limitations are discussed. However, these traditional ¯ow analysis techniques have The techniques presented in this paper are backed up by never successfully been applied to the Lisp family of computer working code. They are applicable not only to Scheme, but languages. This is a curious omission. The Lisp community also to related languages, such as Common Lisp and ML. has had suf®cient time to consider the problem. Flow analysis dates back at least to 1960, ([Dragon], pp. 516), and Lisp is 1 The Task one of the oldest computer programming languages currently in use, rivalled only by Fortran and COBOL. Flow analysis is a traditional optimising compiler technique Indeed, the Lisp community has long been concerned with for determining useful information about a program at compile the execution speed of their programs.
    [Show full text]
  • Value Numbering
    SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 0(0), 1–18 (MONTH 1900) Value Numbering PRESTON BRIGGS Tera Computer Company, 2815 Eastlake Avenue East, Seattle, WA 98102 AND KEITH D. COOPER L. TAYLOR SIMPSON Rice University, 6100 Main Street, Mail Stop 41, Houston, TX 77005 SUMMARY Value numbering is a compiler-based program analysis method that allows redundant computations to be removed. This paper compares hash-based approaches derived from the classic local algorithm1 with partitioning approaches based on the work of Alpern, Wegman, and Zadeck2. Historically, the hash-based algorithm has been applied to single basic blocks or extended basic blocks. We have improved the technique to operate over the routine’s dominator tree. The partitioning approach partitions the values in the routine into congruence classes and removes computations when one congruent value dominates another. We have extended this technique to remove computations that define a value in the set of available expressions (AVA IL )3. Also, we are able to apply a version of Morel and Renvoise’s partial redundancy elimination4 to remove even more redundancies. The paper presents a series of hash-based algorithms and a series of refinements to the partitioning technique. Within each series, it can be proved that each method discovers at least as many redundancies as its predecessors. Unfortunately, no such relationship exists between the hash-based and global techniques. On some programs, the hash-based techniques eliminate more redundancies than the partitioning techniques, while on others, partitioning wins. We experimentally compare the improvements made by these techniques when applied to real programs. These results will be useful for commercial compiler writers who wish to assess the potential impact of each technique before implementation.
    [Show full text]
  • Language and Compiler Support for Dynamic Code Generation by Massimiliano A
    Language and Compiler Support for Dynamic Code Generation by Massimiliano A. Poletto S.B., Massachusetts Institute of Technology (1995) M.Eng., Massachusetts Institute of Technology (1995) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 1999 © Massachusetts Institute of Technology 1999. All rights reserved. A u th or ............................................................................ Department of Electrical Engineering and Computer Science June 23, 1999 Certified by...............,. ...*V .,., . .* N . .. .*. *.* . -. *.... M. Frans Kaashoek Associate Pro essor of Electrical Engineering and Computer Science Thesis Supervisor A ccepted by ................ ..... ............ ............................. Arthur C. Smith Chairman, Departmental CommitteA on Graduate Students me 2 Language and Compiler Support for Dynamic Code Generation by Massimiliano A. Poletto Submitted to the Department of Electrical Engineering and Computer Science on June 23, 1999, in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract Dynamic code generation, also called run-time code generation or dynamic compilation, is the cre- ation of executable code for an application while that application is running. Dynamic compilation can significantly improve the performance of software by giving the compiler access to run-time infor- mation that is not available to a traditional static compiler. A well-designed programming interface to dynamic compilation can also simplify the creation of important classes of computer programs. Until recently, however, no system combined efficient dynamic generation of high-performance code with a powerful and portable language interface. This thesis describes a system that meets these requirements, and discusses several applications of dynamic compilation.
    [Show full text]
  • Strength Reduction of Induction Variables and Pointer Analysis – Induction Variable Elimination
    Loop optimizations • Optimize loops – Loop invariant code motion [last time] Loop Optimizations – Strength reduction of induction variables and Pointer Analysis – Induction variable elimination CS 412/413 Spring 2008 Introduction to Compilers 1 CS 412/413 Spring 2008 Introduction to Compilers 2 Strength Reduction Induction Variables • Basic idea: replace expensive operations (multiplications) with • An induction variable is a variable in a loop, cheaper ones (additions) in definitions of induction variables whose value is a function of the loop iteration s = 3*i+1; number v = f(i) while (i<10) { while (i<10) { j = 3*i+1; //<i,3,1> j = s; • In compilers, this a linear function: a[j] = a[j] –2; a[j] = a[j] –2; i = i+2; i = i+2; f(i) = c*i + d } s= s+6; } •Observation:linear combinations of linear • Benefit: cheaper to compute s = s+6 than j = 3*i functions are linear functions – s = s+6 requires an addition – Consequence: linear combinations of induction – j = 3*i requires a multiplication variables are induction variables CS 412/413 Spring 2008 Introduction to Compilers 3 CS 412/413 Spring 2008 Introduction to Compilers 4 1 Families of Induction Variables Representation • Basic induction variable: a variable whose only definition in the • Representation of induction variables in family i by triples: loop body is of the form – Denote basic induction variable i by <i, 1, 0> i = i + c – Denote induction variable k=i*a+b by triple <i, a, b> where c is a loop-invariant value • Derived induction variables: Each basic induction variable i defines
    [Show full text]
  • Global Value Numbering Using Random Interpretation
    Global Value Numbering using Random Interpretation Sumit Gulwani George C. Necula [email protected] [email protected] Department of Electrical Engineering and Computer Science University of California, Berkeley Berkeley, CA 94720-1776 Abstract General Terms We present a polynomial time randomized algorithm for global Algorithms, Theory, Verification value numbering. Our algorithm is complete when conditionals are treated as non-deterministic and all operators are treated as uninter- Keywords preted functions. We are not aware of any complete polynomial- time deterministic algorithm for the same problem. The algorithm Global Value Numbering, Herbrand Equivalences, Random Inter- does not require symbolic manipulations and hence is simpler to pretation, Randomized Algorithm, Uninterpreted Functions implement than the deterministic symbolic algorithms. The price for these benefits is that there is a probability that the algorithm can report a false equality. We prove that this probability can be made 1 Introduction arbitrarily small by controlling various parameters of the algorithm. Detecting equivalence of expressions in a program is a prerequi- Our algorithm is based on the idea of random interpretation, which site for many important optimizations like constant and copy prop- relies on executing a program on a number of random inputs and agation [18], common sub-expression elimination, invariant code discovering relationships from the computed values. The computa- motion [3, 13], induction variable elimination, branch elimination, tions are done by giving random linear interpretations to the opera- branch fusion, and loop jamming [10]. It is also important for dis- tors in the program. Both branches of a conditional are executed. At covering equivalent computations in different programs, for exam- join points, the program states are combined using a random affine ple, plagiarism detection and translation validation [12, 11], where combination.
    [Show full text]
  • Transparent Dynamic Optimization: the Design and Implementation of Dynamo
    Transparent Dynamic Optimization: The Design and Implementation of Dynamo Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia HP Laboratories Cambridge HPL-1999-78 June, 1999 E-mail: [email protected] dynamic Dynamic optimization refers to the runtime optimization optimization, of a native program binary. This report describes the compiler, design and implementation of Dynamo, a prototype trace selection, dynamic optimizer that is capable of optimizing a native binary translation program binary at runtime. Dynamo is a realistic implementation, not a simulation, that is written entirely in user-level software, and runs on a PA-RISC machine under the HPUX operating system. Dynamo does not depend on any special programming language, compiler, operating system or hardware support. Contrary to intuition, we demonstrate that it is possible to use a piece of software to improve the performance of a native, statically optimized program binary, while it is executing. Dynamo not only speeds up real application programs, its performance improvement is often quite significant. For example, the performance of many +O2 optimized SPECint95 binaries running under Dynamo is comparable to the performance of their +O4 optimized version running without Dynamo. Internal Accession Date Only Ó Copyright Hewlett-Packard Company 1999 Contents 1 INTRODUCTION ........................................................................................... 7 2 RELATED WORK ......................................................................................... 9 3 OVERVIEW
    [Show full text]
  • Optimizing for Reduced Code Space Using Genetic Algorithms
    Optimizing for Reduced Code Space using Genetic Algorithms Keith D. Cooper, Philip J. Schielke, and Devika Subramanian Department of Computer Science Rice University Houston, Texas, USA {keith | phisch | devika}@cs.rice.edu Abstract is virtually impossible to select the best set of optimizations to run on a particular piece of code. Historically, compiler Code space is a critical issue facing designers of software for embedded systems. Many traditional compiler optimiza- writers have made one of two assumptions. Either a fixed tions are designed to reduce the execution time of compiled optimization order is “good enough” for all programs, or code, but not necessarily the size of the compiled code. Fur- giving the user a large set of flags that control optimization ther, different results can be achieved by running some opti- is sufficient, because it shifts the burden onto the user. mizations more than once and changing the order in which optimizations are applied. Register allocation only com- Interplay between optimizations occurs frequently. A plicates matters, as the interactions between different op- transformation can create opportunities for other transfor- timizations can cause more spill code to be generated. The mations. Similarly, a transformation can eliminate oppor- compiler for embedded systems, then, must take care to use tunities for other transformations. These interactions also the best sequence of optimizations to minimize code space. Since much of the code for embedded systems is compiled depend on the program being compiled, and they are of- once and then burned into ROM, the software designer will ten difficult to predict. Multiple applications of the same often tolerate much longer compile times in the hope of re- transformation at different points in the optimization se- ducing the size of the compiled code.
    [Show full text]
  • Code Optimization
    Code Optimization Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon’s slides that they prepared for COMP 412 at Rice. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. Traditional Three-Phase Compiler Source Front IR IR Back Machine Optimizer Code End End code Errors Optimization (or Code Improvement) • Analyzes IR and rewrites (or transforms) IR • Primary goal is to reduce running time of the compiled code — May also improve space, power consumption, … • Must preserve “meaning” of the code — Measured by values of named variables — A course (or two) unto itself 1 1 The Optimizer IR Opt IR Opt IR Opt IR... Opt IR 1 2 3 n Errors Modern optimizers are structured as a series of passes Typical Transformations • Discover & propagate some constant value • Move a computation to a less frequently executed place • Specialize some computation based on context • Discover a redundant computation & remove it • Remove useless or unreachable code • Encode an idiom in some particularly efficient form 2 The Role of the Optimizer • The compiler can implement a procedure in many ways • The optimizer tries to find an implementation that is “better” — Speed, code size, data space, … To accomplish this, it • Analyzes the code to derive knowledge
    [Show full text]