Optimizations Correct Program in High IR IR Lowering Program in Low IR

Total Page:16

File Type:pdf, Size:1020Kb

Optimizations Correct Program in High IR IR Lowering Program in Low IR Where We Are Source code (character stream) if (b == 0) a = b; Lexical Analysis Syntax Analysis Errors Semantic Analysis IR Generation Optimizations Correct program In High IR IR Lowering Program In Low IR 1 2 What Next? Optimizations • At this point we could generate assembly code Source code (character stream) from the low-level IR if (b == 0) a = b; Lexical Analysis Syntax Analysis Errors • Better: Semantic Analysis – Optimize the program first IR Generation – Then generate code Correct program Optimize In High IR • If optimization performed at the IR level, then IR Lowering they apply to all target machines Program In Low IR Optimize 3 4 1 What are Optimizations? Why Optimize? • Optimizations = code transformations that • Programmers don’t always write optimal code – improve the program can recognize ways to improve code (e.g., avoid recomputing same expression) • Different kinds • High-level language may make some – space optimizations: improve (reduce) memory use oppptimizations inconvenient or impossible to express – time optimizations: improve (reduce) execution time a[ i ][ j ] = a[ i ][ j ] + 1; • Code transformations must be safe! • High-level unoptimized code may be more – They must preserve the meaning of the program readable: cleaner, modular int square(x) { return x*x; } 5 6 Where to Optimize? Many Possible Optimizations • Usual goal: improve time performance • Many ways to optimize a program • Problem: many optimizations trade off space • Some of the most common optimizations: versus time Function Inlining Function Cloning • Example: loop unrolling Constant folding – Increases code spp,pace, speeds up one loop Constant propagation – Frequently executed code with long loops: Dead code elimination Loop-invariant code motion space/time tradeoff is generally a win Common sub-expression elimination – Infrequently executed code: may want to optimize Strength reduction code space at expense of time Branch prediction/optimization Loop unrolling • Want to optimize program hot spots 7 8 2 Constant Propagation Constant Folding • If value of variable is known to be a constant, replace • Evaluate an expression if operands are known use of variable with constant at compile time (i.e., they are constants) •Example: •Example: n = 10 x = 1.1 * 2; ⇒ x = 2.2; c = 2 for (i=0; i<n; i++) { s = s + i*c; } • Performed at every stage of compilation • Replace n, c: – Constants created by translations or optimizations for (i=0; i<10; i++) { s = s + i*2; } int x = a[2] ⇒ t1 = 2*4 • Each variable must be replaced only when it has known constant value: t2 = a + t1 – Forward from a constant assignment x = *t2 – Until next assignment of the variable 9 10 Algebraic Simplification Copy Propagation • More general form of constant folding: take • After assignment x = y, replace uses of x with y advantage of usual simplification rules • Replace until x is assigned again a * 1 ⇒ a a* 0 ⇒ 0 a / 1 ⇒ a a + 0 ⇒ a x = y; x = y; b || false ⇒ b b && true ⇒ b if (x > 1) ⇒ if (y > 1) s = x * f(x - 1); s = y * f(y - 1); • Repeatedly apply the above rules (y*1+0)/1 ⇒ y*1+0 ⇒ y*1 ⇒ y • What if there was an assignment y = z before? – Transitively apply replacements • Must be careful with floating point! 11 12 3 Common Subexpression Elimination Unreachable Code Elimination • If program computes same expression multiple • Eliminate code that is never executed time, can reuse the computed value •Example: #define debug false •Example: s = 1; a = b+c; a = b+c; ⇒ s = 1; if (debug) cb+c;c = b+c; ⇒ ca;c = a; d = b+c; d = b+c; print(“state = ”, s); • Common subexpressions also occur in low-level code in address calculations for array accesses: • Unreachable code may not be obvious in low IR (or in high-level languages with unstructured a[i] = b[i] + 1; “goto” statements) 13 14 Unreachable Code Elimination Dead Code Elimination • Unreachable code in while/if statements when: • If effect of a statement is never observed, – Loop condition is always false (loop never executed) eliminate the statement – Condition of an if statement is always true or always false (only one branch executed) x = y+1; y = 1; ⇒ y = 1; if (false) S ⇒ ; x=2x = 2z;*z; x = 2*z; if (true) S else S’ ⇒ S • Variable is dead if value is never used after if (false) S else S’ ⇒ S’ definition while (false) S ⇒ ; • Eliminate assignments to dead variables while (2>3) S ⇒ ; • Other optimizations may create dead code 15 16 4 Loop Optimizations Loop-Invariant Code Motion • If result of a statement or expression does not • Program hot spots are usually loops change during loop, and it has no externally- (exceptions: OS kernels, compilers) visible side-effect (!), can hoist its computation • Most execution time in most programs is out of the loop spent in loops: 90/10 is typical • Often useful for array element addressing • Loop optimizations are important , effective , computations – invariant code not visible at and numerous source level • Requires analysis to identify loop-invariant expressions 17 18 Code Motion Example Another Example • Identify invariant expression: • Can also hoist statements out of loops • Assume x not updated in the loop body: for(i=0; i<n; i++) a[i] = a[i] + (x*x)/(y*y); … … while (…) { y = x*x; • HitthHoist the express ion ou t o fthlf the loop: y = x*x; ⇒ while ( …) { … … c = (x*x)/(y*y); } } for(i=0; i<n; i++) … … a[i] = a[i] + c; •… Is it safe? 19 20 5 Strength Reduction Strength Reduction • Replaces expensive operations (multiplies, divides) by cheap ones (adds, subtracts) • Can apply strength reduction to • Strength reduction more effective in loops computation other than induction variables: • Induction variable = loop variable whose value is depends linearly on the iteration number x * 2 ⇒ x + x • AlApply strength hd reducti on to i idinduction vari iblables i*2i * 2c ⇒ i << c s = 0; s = 0; v = -4; i / 2c ⇒ i >> c for (i = 0; i < n; i++) { for (i = 0; i < n; i++) { v = 4*i; ⇒ v = v+4; s = s + v; s = s + v; } } 21 22 Induction Variable Elimination Loop Unrolling • If there are multiple induction variables in a loop, can • Execute loop body multiple times at each eliminate the ones that are used only in the test iteration condition • Need to rewrite test using the other induction variables •Example: • Usually applied after strength reduction for (i = 0; i< n; i++) { S } s = 0; v=-4; s = 0; v = -4; • Unroll loop four times: for (i = 0; i < n; i++) { for (; v < (4*n-4);) { for (i = 0; i < n-3; i+=4) { S; S; S; S; } v = v+4; ⇒ v = v+4; for ( ; i < n; i++) S; s = s + v; s = s + v; } } • Gets rid of ¾ of conditional branches! • Space-time tradeoff: program size increases 23 24 6 Function Inlining Function Cloning • Replace a function call with the body of the function: • Create specialized versions of functions that are called from different call sites with different arguments int g(int x) { return f(x)-1; } int f(int n) { int b=1; while (n--) { b = 2*b }; return b; } void f(int x[], int n, int m) { for(int i=0; i<n; i++) { x[i] = x[i] + i*m; } int g(int x) { int r; } int n = x; { int b =1; while (n--) { b = 2*b }; r = b } • For a call f(a, 10, 1), create a specialized version of f: return r – 1; } void f1(int x[]) { for(int i=0; i<10; i++) { x[i] = x[i] + i; } • Can inline methods, but more difficult } • … how about recursive procedures? • For another call f(b, p, 0), create another version f2(…) 25 26 When to Apply Optimizations Summary • Many useful optimizations that can transform Function inlining code to make it faster High IR Function cloning Constant folding Constant propagation Value numbering • Whole is greater than sum of parts: Dead code elimination oppppgtimizations should be applied together, Loop-iitdtiinvariant code motion LIRLow IR Common sub-expression elimination sometimes more than once, at different levels Strength reduction Constant folding & propagation Branch prediction/optimization • Problem: when are optimizations are safe? Loop unrolling Assembly Register allocation Cache optimization 27 28 7.
Recommended publications
  • Loop Pipelining with Resource and Timing Constraints
    UNIVERSITAT POLITÈCNICA DE CATALUNYA LOOP PIPELINING WITH RESOURCE AND TIMING CONSTRAINTS Autor: Fermín Sánchez October, 1995 1 1 1 1 1 1 1 1 2 1• SOFTWARE PIPELINING 1 1 1 2.1 INTRODUCTION 1 Software pipelining [ChaSl] comprises a family of techniques aimed at finding a pipelined schedule of the execution of loop iterations. The pipelined schedule represents a new loop body which may contain instructions belonging to different iterations. The sequential execution of the schedule takes less time than the sequential execution of the iterations of the loop as they were initially written. 1 In general, a pipelined loop schedule has the following characteristics1: • All the iterations (of the new loop) are executed in the same fashion. • The initiation interval (ÍÍ) between the issuing of two consecutive iterations is always the same. 1 Figure 2.1 shows an example of software pipelining a loop. The DDG representing the loop body to pipeline is presented in Figure 2.1(a). The loop must be executed ten times (with an iteration index i G [0,9]). Let us assume that all instructions in the loop are executed in a single cycle 1 and a new iteration may start every cycle (otherwise the dependence between instruction A from 1 We will assume here that the schedule contains a unique iteration of the loop. 1 15 1 1 I I 16 CHAPTER 2 I I time B,, Prol AO Q, BO A, I Q, B, A2 3<i<9 I Steady State D4 B7 I Epilogue Cy I D9 (a) (b) (c) I Figure 2.1 Software pipelining a loop (a) DDG representing a loop body (b) Parallel execution of the loop (c) New parallel loop body I iterations i and i+ 1 will not be honored).- With this assumption,-the loop can be executed in a I more parallel fashion than the sequential one, as shown in Figure 2.1(b).
    [Show full text]
  • A Methodology for Assessing Javascript Software Protections
    A methodology for Assessing JavaScript Software Protections Pedro Fortuna A methodology for Assessing JavaScript Software Protections Pedro Fortuna About me Pedro Fortuna Co-Founder & CTO @ JSCRAMBLER OWASP Member SECURITY, JAVASCRIPT @pedrofortuna 2 A methodology for Assessing JavaScript Software Protections Pedro Fortuna Agenda 1 4 7 What is Code Protection? Testing Resilience 2 5 Code Obfuscation Metrics Conclusions 3 6 JS Software Protections Q & A Checklist 3 What is Code Protection Part 1 A methodology for Assessing JavaScript Software Protections Pedro Fortuna Intellectual Property Protection Legal or Technical Protection? Alice Bob Software Developer Reverse Engineer Sells her software over the Internet Wants algorithms and data structures Does not need to revert back to original source code 5 A methodology for Assessing JavaScript Software Protections Pedro Fortuna Intellectual Property IP Protection Protection Legal Technical Encryption ? Trusted Computing Server-Side Execution Obfuscation 6 A methodology for Assessing JavaScript Software Protections Pedro Fortuna Code Obfuscation Obfuscation “transforms a program into a form that is more difficult for an adversary to understand or change than the original code” [1] More Difficult “requires more human time, more money, or more computing power to analyze than the original program.” [1] in Collberg, C., and Nagra, J., “Surreptitious software: obfuscation, watermarking, and tamperproofing for software protection.”, Addison- Wesley Professional, 2010. 7 A methodology for Assessing
    [Show full text]
  • Polyhedral Compilation As a Design Pattern for Compiler Construction
    Polyhedral Compilation as a Design Pattern for Compiler Construction PLISS, May 19-24, 2019 [email protected] Polyhedra? Example: Tiles Polyhedra? Example: Tiles How many of you read “Design Pattern”? → Tiles Everywhere 1. Hardware Example: Google Cloud TPU Architectural Scalability With Tiling Tiles Everywhere 1. Hardware Google Edge TPU Edge computing zoo Tiles Everywhere 1. Hardware 2. Data Layout Example: XLA compiler, Tiled data layout Repeated/Hierarchical Tiling e.g., BF16 (bfloat16) on Cloud TPU (should be 8x128 then 2x1) Tiles Everywhere Tiling in Halide 1. Hardware 2. Data Layout Tiled schedule: strip-mine (a.k.a. split) 3. Control Flow permute (a.k.a. reorder) 4. Data Flow 5. Data Parallelism Vectorized schedule: strip-mine Example: Halide for image processing pipelines vectorize inner loop https://halide-lang.org Meta-programming API and domain-specific language (DSL) for loop transformations, numerical computing kernels Non-divisible bounds/extent: strip-mine shift left/up redundant computation (also forward substitute/inline operand) Tiles Everywhere TVM example: scan cell (RNN) m = tvm.var("m") n = tvm.var("n") 1. Hardware X = tvm.placeholder((m,n), name="X") s_state = tvm.placeholder((m,n)) 2. Data Layout s_init = tvm.compute((1,n), lambda _,i: X[0,i]) s_do = tvm.compute((m,n), lambda t,i: s_state[t-1,i] + X[t,i]) 3. Control Flow s_scan = tvm.scan(s_init, s_do, s_state, inputs=[X]) s = tvm.create_schedule(s_scan.op) 4. Data Flow // Schedule to run the scan cell on a CUDA device block_x = tvm.thread_axis("blockIdx.x") 5. Data Parallelism thread_x = tvm.thread_axis("threadIdx.x") xo,xi = s[s_init].split(s_init.op.axis[1], factor=num_thread) s[s_init].bind(xo, block_x) Example: Halide for image processing pipelines s[s_init].bind(xi, thread_x) xo,xi = s[s_do].split(s_do.op.axis[1], factor=num_thread) https://halide-lang.org s[s_do].bind(xo, block_x) s[s_do].bind(xi, thread_x) print(tvm.lower(s, [X, s_scan], simple_mode=True)) And also TVM for neural networks https://tvm.ai Tiling and Beyond 1.
    [Show full text]
  • Attacking Client-Side JIT Compilers.Key
    Attacking Client-Side JIT Compilers Samuel Groß (@5aelo) !1 A JavaScript Engine Parser JIT Compiler Interpreter Runtime Garbage Collector !2 A JavaScript Engine • Parser: entrypoint for script execution, usually emits custom Parser bytecode JIT Compiler • Bytecode then consumed by interpreter or JIT compiler • Executing code interacts with the Interpreter runtime which defines the Runtime representation of various data structures, provides builtin functions and objects, etc. Garbage • Garbage collector required to Collector deallocate memory !3 A JavaScript Engine • Parser: entrypoint for script execution, usually emits custom Parser bytecode JIT Compiler • Bytecode then consumed by interpreter or JIT compiler • Executing code interacts with the Interpreter runtime which defines the Runtime representation of various data structures, provides builtin functions and objects, etc. Garbage • Garbage collector required to Collector deallocate memory !4 A JavaScript Engine • Parser: entrypoint for script execution, usually emits custom Parser bytecode JIT Compiler • Bytecode then consumed by interpreter or JIT compiler • Executing code interacts with the Interpreter runtime which defines the Runtime representation of various data structures, provides builtin functions and objects, etc. Garbage • Garbage collector required to Collector deallocate memory !5 A JavaScript Engine • Parser: entrypoint for script execution, usually emits custom Parser bytecode JIT Compiler • Bytecode then consumed by interpreter or JIT compiler • Executing code interacts with the Interpreter runtime which defines the Runtime representation of various data structures, provides builtin functions and objects, etc. Garbage • Garbage collector required to Collector deallocate memory !6 Agenda 1. Background: Runtime Parser • Object representation and Builtins JIT Compiler 2. JIT Compiler Internals • Problem: missing type information • Solution: "speculative" JIT Interpreter 3.
    [Show full text]
  • Research of Register Pressure Aware Loop Unrolling Optimizations for Compiler
    MATEC Web of Conferences 228, 03008 (2018) https://doi.org/10.1051/matecconf/201822803008 CAS 2018 Research of Register Pressure Aware Loop Unrolling Optimizations for Compiler Xuehua Liu1,2, Liping Ding1,3 , Yanfeng Li1,2 , Guangxuan Chen1,4 , Jin Du5 1Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing, China 2University of Chinese Academy of Sciences, Beijing, China 3Digital Forensics Lab, Institute of Software Application Technology, Guangzhou & Chinese Academy of Sciences, Guangzhou, China 4Zhejiang Police College, Hangzhou, China 5Yunnan Police College, Kunming, China Abstract. Register pressure problem has been a known problem for compiler because of the mismatch between the infinite number of pseudo registers and the finite number of hard registers. Too heavy register pressure may results in register spilling and then leads to performance degradation. There are a lot of optimizations, especially loop optimizations suffer from register spilling in compiler. In order to fight register pressure and therefore improve the effectiveness of compiler, this research takes the register pressure into account to improve loop unrolling optimization during the transformation process. In addition, a register pressure aware transformation is able to reduce the performance overhead of some fine-grained randomization transformations which can be used to defend against ROP attacks. Experiments showed a peak improvement of about 3.6% and an average improvement of about 1% for SPEC CPU 2006 benchmarks and a peak improvement of about 3% and an average improvement of about 1% for the LINPACK benchmark. 1 Introduction fundamental transformations, because it is not only performed individually at least twice during the Register pressure is the number of hard registers needed compilation process, it is also an important part of to store values in the pseudo registers at given program optimizations like vectorization, module scheduling and point [1] during the compilation process.
    [Show full text]
  • Equality Saturation: a New Approach to Optimization
    Logical Methods in Computer Science Vol. 7 (1:10) 2011, pp. 1–37 Submitted Oct. 12, 2009 www.lmcs-online.org Published Mar. 28, 2011 EQUALITY SATURATION: A NEW APPROACH TO OPTIMIZATION ROSS TATE, MICHAEL STEPP, ZACHARY TATLOCK, AND SORIN LERNER Department of Computer Science and Engineering, University of California, San Diego e-mail address: {rtate,mstepp,ztatlock,lerner}@cs.ucsd.edu Abstract. Optimizations in a traditional compiler are applied sequentially, with each optimization destructively modifying the program to produce a transformed program that is then passed to the next optimization. We present a new approach for structuring the optimization phase of a compiler. In our approach, optimizations take the form of equality analyses that add equality information to a common intermediate representation. The op- timizer works by repeatedly applying these analyses to infer equivalences between program fragments, thus saturating the intermediate representation with equalities. Once saturated, the intermediate representation encodes multiple optimized versions of the input program. At this point, a profitability heuristic picks the final optimized program from the various programs represented in the saturated representation. Our proposed way of structuring optimizers has a variety of benefits over previous approaches: our approach obviates the need to worry about optimization ordering, enables the use of a global optimization heuris- tic that selects among fully optimized programs, and can be used to perform translation validation, even on compilers other than our own. We present our approach, formalize it, and describe our choice of intermediate representation. We also present experimental results showing that our approach is practical in terms of time and space overhead, is effective at discovering intricate optimization opportunities, and is effective at performing translation validation for a realistic optimizer.
    [Show full text]
  • CS153: Compilers Lecture 19: Optimization
    CS153: Compilers Lecture 19: Optimization Stephen Chong https://www.seas.harvard.edu/courses/cs153 Contains content from lecture notes by Steve Zdancewic and Greg Morrisett Announcements •HW5: Oat v.2 out •Due in 2 weeks •HW6 will be released next week •Implementing optimizations! (and more) Stephen Chong, Harvard University 2 Today •Optimizations •Safety •Constant folding •Algebraic simplification • Strength reduction •Constant propagation •Copy propagation •Dead code elimination •Inlining and specialization • Recursive function inlining •Tail call elimination •Common subexpression elimination Stephen Chong, Harvard University 3 Optimizations •The code generated by our OAT compiler so far is pretty inefficient. •Lots of redundant moves. •Lots of unnecessary arithmetic instructions. •Consider this OAT program: int foo(int w) { var x = 3 + 5; var y = x * w; var z = y - 0; return z * 4; } Stephen Chong, Harvard University 4 Unoptimized vs. Optimized Output .globl _foo _foo: •Hand optimized code: pushl %ebp movl %esp, %ebp _foo: subl $64, %esp shlq $5, %rdi __fresh2: movq %rdi, %rax leal -64(%ebp), %eax ret movl %eax, -48(%ebp) movl 8(%ebp), %eax •Function foo may be movl %eax, %ecx movl -48(%ebp), %eax inlined by the compiler, movl %ecx, (%eax) movl $3, %eax so it can be implemented movl %eax, -44(%ebp) movl $5, %eax by just one instruction! movl %eax, %ecx addl %ecx, -44(%ebp) leal -60(%ebp), %eax movl %eax, -40(%ebp) movl -44(%ebp), %eax Stephen Chong,movl Harvard %eax,University %ecx 5 Why do we need optimizations? •To help programmers… •They write modular, clean, high-level programs •Compiler generates efficient, high-performance assembly •Programmers don’t write optimal code •High-level languages make avoiding redundant computation inconvenient or impossible •e.g.
    [Show full text]
  • Quaxe, Infinity and Beyond
    Quaxe, infinity and beyond Daniel Glazman — WWX 2015 /usr/bin/whoami Primary architect and developer of the leading Web and Ebook editors Nvu and BlueGriffon Former member of the Netscape CSS and Editor engineering teams Involved in Internet and Web Standards since 1990 Currently co-chair of CSS Working Group at W3C New-comer in the Haxe ecosystem Desktop Frameworks Visual Studio (Windows only) Xcode (OS X only) Qt wxWidgets XUL Adobe Air Mobile Frameworks Adobe PhoneGap/Air Xcode (iOS only) Qt Mobile AppCelerator Visual Studio Two solutions but many issues Fragmentation desktop/mobile Heavy runtimes Can’t easily reuse existing c++ libraries Complex to have native-like UI Qt/QtMobile still require c++ Qt’s QML is a weak and convoluted UI language Haxe 9 years success of Multiplatform OSS language Strong affinity to gaming Wide and vibrant community Some press recognition Dead code elimination Compiles to native on all But no native GUI… platforms through c++ and java Best of all worlds Haxe + Qt/QtMobile Multiplatform Native apps, native performance through c++/Java C++/Java lib reusability Introducing Quaxe Native apps w/o c++ complexity Highly dynamic applications on desktop and mobile Native-like UI through Qt HTML5-based UI, CSS-based styling Benefits from Haxe and Qt communities Going from HTML5 to native GUI completeness DOM dynamism in native UI var b: Element = document.getElementById("thirdButton"); var t: Element = document.createElement("input"); t.setAttribute("type", "text"); t.setAttribute("value", "a text field"); b.parentNode.insertBefore(t,
    [Show full text]
  • Scalable Conditional Induction Variables (CIV) Analysis
    Scalable Conditional Induction Variables (CIV) Analysis ifact Cosmin E. Oancea Lawrence Rauchwerger rt * * Comple A t te n * A te s W i E * s e n l C l o D C O Department of Computer Science Department of Computer Science and Engineering o * * c u e G m s E u e C e n R t v e o d t * y * s E University of Copenhagen Texas A & M University a a l d u e a [email protected] [email protected] t Abstract k = k0 Ind. k = k0 DO i = 1, N DO i =1,N Var. DO i = 1, N IF(cond(b(i)))THEN Subscripts using induction variables that cannot be ex- k = k+2 ) a(k0+2*i)=.. civ = civ+1 )? pressed as a formula in terms of the enclosing-loop indices a(k)=.. Sub. ENDDO a(civ) = ... appear in the low-level implementation of common pro- ENDDO k=k0+MAX(2N,0) ENDIF ENDDO gramming abstractions such as filter, or stack operations and (a) (b) (c) pose significant challenges to automatic parallelization. Be- Figure 1. Loops with affine and CIV array accesses. cause the complexity of such induction variables is often due to their conditional evaluation across the iteration space of its closed-form equivalent k0+2*i, which enables its in- loops we name them Conditional Induction Variables (CIV). dependent evaluation by all iterations. More importantly, This paper presents a flow-sensitive technique that sum- the resulted code, shown in Figure 1(b), allows the com- marizes both such CIV-based and affine subscripts to pro- piler to verify that the set of points written by any dis- gram level, using the same representation.
    [Show full text]
  • Induction Variable Analysis with Delayed Abstractions1
    Induction Variable Analysis with Delayed Abstractions1 SEBASTIAN POP, and GEORGES-ANDRE´ SILBER CRI, Mines Paris, France and ALBERT COHEN ALCHEMY group, INRIA Futurs, Orsay, France We present the design of an induction variable analyzer suitable for the analysis of typed, low-level, three address representations in SSA form. At the heart of our analyzer stands a new algorithm that recognizes scalar evolutions. We define a representation called trees of recurrences that is able to capture different levels of abstractions: from the finer level that is a subset of the SSA representation restricted to arithmetic operations on scalar variables, to the coarser levels such as the evolution envelopes that abstract sets of possible evolutions in loops. Unlike previous work, our algorithm tracks induction variables without prior classification of a few evolution patterns: different levels of abstraction can be obtained on demand. The low complexity of the algorithm fits the constraints of a production compiler as illustrated by the evaluation of our implementation on standard benchmark programs. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors—compilers, interpreters, optimization, retargetable compilers; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages—partial evaluation, program analysis General Terms: Compilers Additional Key Words and Phrases: Scalar evolutions, static analysis, static single assignment representation, assessing compilers heuristics regressions. 1Extension of Conference Paper:
    [Show full text]
  • Fast-Path Loop Unrolling of Non-Counted Loops to Enable Subsequent Compiler Optimizations∗
    Fast-Path Loop Unrolling of Non-Counted Loops to Enable Subsequent Compiler Optimizations∗ David Leopoldseder Roland Schatz Lukas Stadler Johannes Kepler University Linz Oracle Labs Oracle Labs Austria Linz, Austria Linz, Austria [email protected] [email protected] [email protected] Manuel Rigger Thomas Würthinger Hanspeter Mössenböck Johannes Kepler University Linz Oracle Labs Johannes Kepler University Linz Austria Zurich, Switzerland Austria [email protected] [email protected] [email protected] ABSTRACT Non-Counted Loops to Enable Subsequent Compiler Optimizations. In 15th Java programs can contain non-counted loops, that is, loops for International Conference on Managed Languages & Runtimes (ManLang’18), September 12–14, 2018, Linz, Austria. ACM, New York, NY, USA, 13 pages. which the iteration count can neither be determined at compile https://doi.org/10.1145/3237009.3237013 time nor at run time. State-of-the-art compilers do not aggressively optimize them, since unrolling non-counted loops often involves 1 INTRODUCTION duplicating also a loop’s exit condition, which thus only improves run-time performance if subsequent compiler optimizations can Generating fast machine code for loops depends on a selective appli- optimize the unrolled code. cation of different loop optimizations on the main optimizable parts This paper presents an unrolling approach for non-counted loops of a loop: the loop’s exit condition(s), the back edges and the loop that uses simulation at run time to determine whether unrolling body. All these parts must be optimized to generate optimal code such loops enables subsequent compiler optimizations. Simulat- for a loop.
    [Show full text]
  • Cross-Platform Language Design
    Cross-Platform Language Design THIS IS A TEMPORARY TITLE PAGE It will be replaced for the final print by a version provided by the service academique. Thèse n. 1234 2011 présentée le 11 Juin 2018 à la Faculté Informatique et Communications Laboratoire de Méthodes de Programmation 1 programme doctoral en Informatique et Communications École Polytechnique Fédérale de Lausanne pour l’obtention du grade de Docteur ès Sciences par Sébastien Doeraene acceptée sur proposition du jury: Prof James Larus, président du jury Prof Martin Odersky, directeur de thèse Prof Edouard Bugnion, rapporteur Dr Andreas Rossberg, rapporteur Prof Peter Van Roy, rapporteur Lausanne, EPFL, 2018 It is better to repent a sin than regret the loss of a pleasure. — Oscar Wilde Acknowledgments Although there is only one name written in a large font on the front page, there are many people without which this thesis would never have happened, or would not have been quite the same. Five years is a long time, during which I had the privilege to work, discuss, sing, learn and have fun with many people. I am afraid to make a list, for I am sure I will forget some. Nevertheless, I will try my best. First, I would like to thank my advisor, Martin Odersky, for giving me the opportunity to fulfill a dream, that of being part of the design and development team of my favorite programming language. Many thanks for letting me explore the design of Scala.js in my own way, while at the same time always being there when I needed him.
    [Show full text]