Compilers Optimization


Yannis Smaragdakis, U. Athens (original slides by Sam Guyer @ Tufts)

What we already saw

- Lowering: from language-level constructs to machine-level constructs
- At this point we could generate machine code
  - Output of lowering is a correct translation
- What's left to do?
  - Map from the lower-level IR to the actual ISA
  - Maybe some register management (could be required)
  - Pass off to assembler
- Why have a separate assembler?
  - Handles "packing the bits":

    Assembly:  addi <target>, <source>, <value>
    Machine:   0010 00ss ssst tttt iiii iiii iiii iiii

But first…

- The compiler "understands" the program
  - The IR captures program semantics
  - Lowering: a semantics-preserving transformation
- Why not do other semantics-preserving transformations?
- Compiler optimizations
  - Oh great, now my program will be optimal!
  - Sorry, it's a misnomer
  - What is an "optimization"?

Optimizations

- What are they?
  - Code transformations
  - Improve some metric
- Metrics
  - Performance: time, instructions, cycles (are these metrics equivalent?)
  - Memory: memory hierarchy (reduce cache misses), reduce memory usage
  - Code size
  - Energy

Big picture

    chars → Scanner → tokens → Parser → AST → Semantic checker → Checked AST
          → IR lowering → tuples → Instruction selection → assembly (∞ registers)
          → Register allocation → assembly (k registers) → Assembler → Machine code

- The scanner, parser, and semantic checker report errors
- Optimizations sit in the middle of this pipeline: they transform the checked AST and the IR tuples before instruction selection

Why optimize?

- High-level constructs may make some optimizations difficult or impossible. At the source level we see

    A[i][j] = A[i][j-1] + 1

  while the lowered form exposes the redundant address arithmetic:

    t = A + i*row + j
    s = A + i*row + j - 1
    (*t) = (*s) + 1

- High-level code may be more desirable
  - Program at a high level
  - Focus on design; clean, modular implementation
  - Let the compiler worry about the gory details
  - Premature optimization is the root of all evil!

Limitations

- What are optimizers good at?
  - Consistent and thorough
  - Find all opportunities for an optimization
  - Uniformly apply the transformation
- What are they not good at?
  - Asymptotic complexity
  - Compilers can't fix bad algorithms
  - Compilers can't fix bad data structures
- There's no magic

Requirements

- Safety
  - Preserve the semantics of the program
  - What does that mean?
- Profitability
  - Will it help our metric?
  - Do we need a guarantee of improvement?
- Risk
  - How will it interact with other optimizations?
  - How will it affect other stages of compilation?

Example: loop unrolling

    for (i=0; i<100; i++)          for (i=0; i<25; i++) {
      *t++ = *s++;                   *t++ = *s++;
                                     *t++ = *s++;
                                     *t++ = *s++;
                                     *t++ = *s++;
                                   }

- Safety: always safe; getting the loop conditions right can be tricky (see the sketch below)
- Profitability: depends on hardware – usually a win – why?
- Risk: increases the size of the code in the loop; it may not fit in the instruction cache
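The tricky loop conditions deserve a concrete sketch. Below is a minimal hand-unrolled copy loop in C, assuming an arbitrary trip count n; the function name and the 4-way unroll factor are illustrative, not from the slides. The separate remainder loop is exactly the detail that is easy to get wrong when n is not a multiple of the unroll factor.

    #include <stddef.h>

    /* 4-way unrolled copy: a sketch, not the slides' exact code. */
    void copy_unrolled(int *t, const int *s, size_t n) {
        size_t i = 0;
        /* Main unrolled body: four copies per iteration,
           one loop-condition test instead of four. */
        for (; i + 4 <= n; i += 4) {
            t[i]     = s[i];
            t[i + 1] = s[i + 1];
            t[i + 2] = s[i + 2];
            t[i + 3] = s[i + 3];
        }
        /* Remainder loop: picks up the last n % 4 elements. */
        for (; i < n; i++)
            t[i] = s[i];
    }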
Optimizations

- Many, many optimizations have been invented
  - Constant folding, constant propagation, tail-call elimination, redundancy elimination, dead code elimination, loop-invariant code motion, loop splitting, loop fusion, strength reduction, array scalarization, inlining, cloning, data prefetching, parallelization, etc.
- How do they interact?
  - Optimist: we get the sum of all improvements!
  - Realist: many are in direct opposition

Rough categories

- Traditional optimizations
  - Transform the program to reduce work
  - Don't change the level of abstraction
- Resource allocation
  - Map the program to specific hardware properties
  - Register allocation
  - Instruction scheduling, parallelism
  - Data streaming, prefetching
- Enabling transformations
  - Don't necessarily improve code on their own
  - Inlining, loop unrolling

Constant propagation

- Idea: if the value of a variable is known to be a constant at compile time, replace the use of the variable with the constant

    n = 10;                    n = 10;
    c = 2;                     c = 2;
    for (i=0;i<n;i++)          for (i=0;i<10;i++)
      s = s + i*c;               s = s + i*2;

- Safety: prove the value is constant
- Notice: may interact favorably with other optimizations, like loop unrolling – now we know the trip count

Constant folding

- Idea: if the operands are known at compile time, evaluate the expression at compile time

    r = 3.141 * 10;    becomes    r = 31.41;

- What do we need to be careful of?
  - Is the result the same as if executed at runtime?
  - Overflow/underflow, rounding and numeric stability
- Often repeated throughout the compiler, e.g. on the address arithmetic that lowering produces:

    x = A[2];    lowers to    t1 = 2*4;
                              t2 = A + t1;
                              x = *t2;

  where t1 = 2*4 folds to t1 = 8

Partial evaluation

- Constant propagation and folding together
- Idea: evaluate as much of the program at compile time as possible
- More sophisticated schemes:
  - Simulate data structures, arrays
  - Symbolic execution of the code
- Caveat: floating point
  - Preserving the error characteristics of floating-point values

Algebraic simplification

- Idea: apply the usual algebraic rules to simplify expressions

    a * 1       ⇒  a
    a / 1       ⇒  a
    a * 0       ⇒  0
    a + 0       ⇒  a
    b || false  ⇒  b

- Repeatedly apply to complex expressions
- Many, many possible rules
- Associativity and commutativity come into play

Common sub-expression elimination

- Idea: if the program computes the same expression multiple times, reuse the value

    a = b + c;          t = b + c;
    c = b + c;    →     a = t;
    d = b + c;          c = t;
                        d = b + c;

- Safety: a subexpression can only be reused until its operands are redefined (d = b + c cannot reuse t because c has changed)
- Often occurs in address computations
  - Array indexing and struct/field accesses

Dead code elimination

- Idea: if the result of a computation is never used, then we can remove the computation

    x = y + 1;          y = 1;
    y = 1;        →     x = 2 * z;
    x = 2 * z;

- Safety
  - A variable is dead if it is never used after it is defined
  - Remove code that assigns to dead variables
- This may, in turn, create more dead code
  - Dead-code elimination usually works transitively

Copy propagation

- Idea: after an assignment x = y, replace any uses of x with y

    x = y;              x = y;
    if (x>1)      →     if (y>1)
      s = x+f(x);         s = y+f(y);

- Safety:
  - Only apply up to another assignment to x, or
  - …another assignment to y!
  - What if there was an assignment y = z earlier?
- Apply transitively to all assignments

Unreachable code elimination

- Idea: eliminate code that can never be executed

    #define DEBUG 0
    ...
    if (DEBUG)
      print("Current value = ", v);

- Different implementations
  - High-level: look for if (false) or while (false)
  - Low-level: more difficult
    - Code is just labels and gotos
    - Traverse the graph, marking reachable blocks (a sketch follows below)
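For the low-level case, a minimal marking pass might look like the following C sketch; the Block type with at most two successors and the function name are assumptions for illustration, not code from the slides.

    #include <stdbool.h>
    #include <stddef.h>

    /* A basic block in the control-flow graph, with up to two
       successors: fall-through and branch target (NULL if absent). */
    typedef struct Block {
        struct Block *succ[2];
        bool reachable;
    } Block;

    /* Depth-first traversal from the entry block; any block left
       unmarked afterwards can never execute and can be deleted. */
    void mark_reachable(Block *b) {
        if (b == NULL || b->reachable)
            return;
        b->reachable = true;
        mark_reachable(b->succ[0]);
        mark_reachable(b->succ[1]);
    }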
How do these things happen?

- Who would write code with:
  - Dead code
  - Common subexpressions
  - Constant expressions
  - Copies of variables
- Two ways they occur
  - High-level constructs – we've already seen examples
  - Other optimizations
    - Copy propagation often leaves dead code
    - Enabling transformations: inlining, loop unrolling, etc.

Loop optimizations

- Program hot-spots are usually in loops
  - Most programs: 90% of execution time is in loops
  - What are possible exceptions? OS kernels, compilers and interpreters
- Loops are a good place to expend extra effort
- Numerous loop optimizations
  - Often expensive – complex analysis
  - For languages like Fortran, very effective
  - What about C?

Loop-invariant code motion

- Idea: if a computation won't change from one loop iteration to the next, move it outside the loop

    for (i=0;i<N;i++)          t1 = x*x;
      A[i] = A[i] + x*x;   →   for (i=0;i<N;i++)
                                 A[i] = A[i] + t1;

- Safety:
  - Determine when expressions are invariant
  - Just check for variables with no assignments?
- Useful for array address computations
  - Not visible at the source level

Strength reduction

- Idea: replace expensive operations (multiplication, division) with cheaper ones (addition, subtraction, bit shift)
- Traditionally applied to induction variables
  - Variables whose value depends linearly on the loop count
  - Special analysis to find such variables

    for (i=0;i<N;i++) {          v = 0;
      v = 4*i;               →   for (i=0;i<N;i++) {
      A[v] = ...;                  A[v] = ...;
    }                              v = v + 4;
                                 }

Strength reduction (continued)

- Can also be applied to simple arithmetic operations:

    x * 2    ⇒  x + x
    x * 2^c  ⇒  x << c
    x / 2^c  ⇒  x >> c

- A typical example of premature optimization
  - Programmers use bit-shift instead of multiplication
  - "x<<2" is harder to understand
  - Most compilers will get it right automatically

Inlining

- Idea: replace a function call with the body of the callee
- Safety
  - What about recursion?
- Risk
  - Code size
  - Most compilers use heuristics to decide when
  - Has been cast as a knapsack problem

Inlining (continued)

- What are the benefits of inlining?
- Eliminate call/return overhead
- Customize the callee code in the context of the caller
  - Use the actual arguments
  - Push them into the copy of the callee code using constant propagation
  - Apply other optimizations to reduce the code
- Hardware
  - Eliminate the two jumps
  - Keep the pipeline filled
- Critical for OO languages
  - Methods are often small
  - Encapsulation and modularity force code apart

Inlining (continued)

- In C:
  - At a call site, decide whether or not to inline (often a heuristic about callee/caller size)
  - Look up the callee
  - Replace the call with the body of the callee
- What about Java?

    class A { void M() {…} }
    class B extends A { void M() {…} }

    void foo(A x) { x.M(); }

- What complicates this? Virtual methods: at the call x.M() the callee depends on the runtime class of x
- Even worse? Dynamic class loading
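To see why inlining is an enabling transformation, consider a C sketch of the "customize the callee code in the context of the caller" benefit listed above; the function names and the constant 8 are illustrative assumptions, not code from the slides.

    /* The callee: nothing can be folded here, because factor
       is unknown until a call site is chosen. */
    static int scale(int x, int factor) {
        return x * factor;
    }

    /* Before inlining: a call with a constant actual argument. */
    int before(int x) {
        return scale(x, 8);
    }

    /* After inlining, constant propagation substitutes factor = 8
       in the copied body, and strength reduction turns x * 8 into
       a shift; the call/return overhead disappears entirely. */
    int after(int x) {
        return x << 3;
    }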