K-Loops: Loop Transformations for Reconfigurable Architectures


K-Loops: Loop Transformations for Reconfigurable Architectures

Ozana Silvia Dragomir

PROEFSCHRIFT

Dissertation for the purpose of obtaining the degree of doctor at Delft University of Technology, by authority of the Rector Magnificus, Prof.ir. K.C.A.M. Luyben, chair of the Board for Doctorates, to be defended publicly on Thursday 16 June 2011 at 15:00, by Ozana Silvia DRAGOMIR, Master of Science, Politehnica University of Bucharest, born in Ploiești, Romania.

This dissertation has been approved by the promotor, Prof.dr.ir. H.J. Sips, and the copromotor, Dr. K.L.M. Bertels.

Composition of the doctoral committee: Rector Magnificus (chair); Prof.dr.ir. H.J. Sips, Delft University of Technology, promotor; Dr. K.L.M. Bertels, Delft University of Technology, copromotor; Prof.dr. A.P. de Vries, Delft University of Technology; Prof.dr. B.H.H. Juurlink, Technische Universität Berlin, Germany; Prof.dr. F. Catthoor, Katholieke Universiteit Leuven, Belgium; Prof.dr.ir. K. De Bosschere, Universiteit Gent, Belgium; Dr. J.M.P. Cardoso, Universidade do Porto, Portugal; Prof.dr. C.I.M. Beenakker, Delft University of Technology, reserve member.

Ozana Silvia Dragomir, K-loops: Loop Transformations for Reconfigurable Architectures. Delft: TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science. With a summary in Dutch. ISBN 978-90-72298-00-3. Subject headings: compiler optimizations, loop transformations, reconfigurable computing. Cover image by Sven Geier.

Copyright © 2011 Ozana Silvia Dragomir. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without permission of the author. Printed by Wöhrmann Print Service (www.wps.nl), The Netherlands.

To my beloved husband, to our wonderful son.
Abstract

Reconfigurable computing is becoming increasingly popular, as it is a middle ground between dedicated hardware, which gives the best performance but has the highest production cost and longest time to market, and conventional microprocessors, which are very flexible but give lower performance. Reconfigurable architectures combine the advantages of high performance (given by the hardware), great flexibility (given by the software), and fast time to market.

Signal processing represents an important category of applications nowadays. In such applications, it is common to find large, computation-intensive parts of code (the application kernels) inside loop nests. Examples are the JPEG and MJPEG image compression algorithms, the H.264 video compression algorithm, the Sobel edge detection, etc. One way of increasing the performance of these applications is to map the kernels onto reconfigurable fabric.

The focus of this dissertation is on kernel loops (K-loops), which are loop nests that contain hardware-mapped kernels in the loop body. In this thesis, we propose methods for improving the performance of such K-loops by using standard loop transformations to expose and exploit the coarse-grained loop-level parallelism. We target a reconfigurable architecture that is a heterogeneous system consisting of a general purpose processor and a field programmable gate array (FPGA).

Research projects targeting reconfigurable architectures are trying to answer several questions: how to partition the application (decide which parts to accelerate on the FPGA), how to optimize these parts (the kernels), and what the performance gain is. However, only a few try to exploit the coarse-grained loop-level parallelism. This work goes towards automatically deciding the number of kernel instances to place into the reconfigurable hardware, in a flexible way that can balance between area and performance.
In this dissertation, we propose a general framework that helps determine the optimal degree of parallelism for each hardware-mapped kernel within a K-loop, taking into account area, memory size and bandwidth, and performance considerations. In the future, it can also take power into account. Furthermore, we present algorithms and mathematical models for several loop transformations in the context of K-loops. The algorithms are used to determine the best degree of parallelism for a given K-loop, while the mathematical models are used to determine the corresponding performance improvement. The algorithms are validated with experimental results. The loop transformations that we analyze in this thesis are loop unrolling, loop shifting, K-pipelining, loop distribution, and loop skewing. An algorithm that decides which transformations to use for a given K-loop is also provided. Finally, we present an analysis of possible situations, with justifications of when and why the loop transformations do or do not have a significant impact on the K-loop performance.

Acknowledgements

The fact that I came to Delft to do my PhD is the result of a series of both fortunate and unfortunate events. I would like to thank the main people that made it possible: Prof. Irina Athanasiu, Prof. Stamatis Vassiliadis, Vlad Sima, and Elena Panainte. Irina Athanasiu was my first mentor at the Politehnica University of Bucharest, and she taught many of us not only about compilers, but also valuable lessons of life. Stamatis Vassiliadis was the one that gave me the chance to do this PhD, and accepted me in his group. He was a great spirit and a great leader, and the group has never been the same without him. Vlad Sima has been near me in many crucial moments of my life. I thank him for all his patience and support throughout the years and, although our lives went on different paths, I will always think of him fondly.
I am grateful to Elena Panainte and to her husband Viorel for welcoming me into their home, and for sharing their knowledge about life here, which made it easier for me to make the decision to come and live here. Next, I would like to thank my advisor, Dr. Koen Bertels, who believed in me more than I did. If it wasn't for him, I might not have found the strength to complete the PhD. I am also grateful to Stephan Wong and to Todor Stefanov, with whom I worked on my first papers. They critically reviewed my work and gave me helpful suggestions to improve it. Many thanks also to Carlo Galuzzi, who reviewed the text of my dissertation.

Many people from the CE group made my life here in Delft easier and nicer, and I would like to thank all of them for the fun nights and social events, for the help when relocating (too many times already), for the time to chat, for the nice pictures, and for everything else. Many thanks to all the Romanians, to my past and present office mates, to Kamana, Yana, Filipa, Behnaz, Lu Yi, Demid, Roel, Dimitris, Sebastian, Stefan (Bugs), Cor, Ricardo, and to the CE staff (Lidwina, Monique, Erik, and Eef).

Special thanks go to Eva and Vašek, for opening up my mind regarding the terrifying subject of child rearing; to Marjan Stolk, for introducing us to the Dutch language and culture in a pleasant and fun way and for being more than an instructor; to Martijn, for being the 'best man' and a true friend; and to Sean Halle, for insightful discussions, for believing in me, and for becoming an (almost) regular presence in our family.

Last but not least, I would like to thank my family. My parents have always been there for me and encouraged me to follow my dreams. I thank them for understanding that I cannot be with them now, and I miss them. I thank my husband, Arnaldo, for his love and patience, and for our wonderful son, Alex. With them I am a complete person and, I hope, a better person.
Ozana Dragomir
Delft, The Netherlands, June 2011

Table of Contents

Abstract
Acknowledgements
List of Tables
List of Figures
List of Acronyms and Symbols

1 Introducing K-Loops
  1.1 Research Context
  1.2 Problem Overview
  1.3 Thesis Contributions
  1.4 Thesis Organization

2 Related Work
  2.1 Levels of Parallelism
  2.2 Loop Transformations Overview
    2.2.1 Loop Unrolling
    2.2.2 Software Pipelining
    2.2.3 Loop Fusion
    2.2.4 Loop Fission
    2.2.5 Loop Scheduling Transformations
    2.2.6 Enabler Transformations
    2.2.7 Other Loop Transformations
  2.3 Loop Optimizations in Reconfigurable Computing
    2.3.1 Fine-Grained Reconfigurable Architectures
    2.3.2 Coarse-Grained Reconfigurable Architectures
  2.4 LLP Exploitation
  2.5 Open Issues

3 Methodology
  3.1 Introduction
    3.1.1 The Definition of K-loop
    3.1.2 Assumptions Regarding the K-loops Framework
  3.2 Framework
  3.3 Theoretical Background
    3.3.1 General Notations
    3.3.2 Area Considerations
    3.3.3 Memory Considerations
    3.3.4 Performance Considerations
    3.3.5 Maximum Speedup by Amdahl's Law
  3.4 The Random Test Generator
  3.5 Overview of the Proposed Loop Transformations
  3.6 Conclusions

4 Loop Unrolling for K-loops
  4.1 Introduction
    4.1.1 Example
  4.2 Mathematical Model
  4.3 Experimental Results
    4.3.1 Selected Kernels
    4.3.2 Analysis of the Results
  4.4 Conclusions

5 Loop Shifting for K-loops
  5.1 Introduction
    5.1.1 Example
  5.2 Mathematical Model
  5.3 Experimental Results
  5.4 Conclusions

6 K-Pipelining
  6.1 Introduction
    6.1.1 Example
  6.2 Mathematical Model
  6.3 Experimental Results
Recommended publications
  • Generalizing Loop-Invariant Code Motion in a Real-World Compiler
    Imperial College London Department of Computing Generalizing loop-invariant code motion in a real-world compiler Author: Supervisor: Paul Colea Fabio Luporini Co-supervisor: Prof. Paul H. J. Kelly MEng Computing Individual Project June 2015 Abstract Motivated by the perpetual goal of automatically generating efficient code from high-level programming abstractions, compiler optimization has developed into an area of intense research. Apart from general-purpose transformations which are applicable to all or most programs, many highly domain-specific optimizations have also been developed. In this project, we extend such a domain-specific compiler optimization, initially described and implemented in the context of finite element analysis, to one that is suitable for arbitrary applications. Our optimization is a generalization of loop-invariant code motion, a technique which moves invariant statements out of program loops. The novelty of the transformation is due to its ability to avoid more redundant recomputation than normal code motion, at the cost of additional storage space. This project provides a theoretical description of the above technique which is fit for general programs, together with an implementation in LLVM, one of the most successful open-source compiler frameworks. We introduce a simple heuristic-driven profitability model which manages to successfully safeguard against potential performance regressions, at the cost of missing some speedup opportunities. We evaluate the functional correctness of our implementation using the comprehensive LLVM test suite, passing all of its 497 whole program tests. The results of our performance evaluation using the same set of tests reveal that generalized code motion is applicable to many programs, but that consistent performance gains depend on an accurate cost model.
  • Loop Transformations and Parallelization
    Loop transformations and parallelization. Claude Tadonki, LAL/CNRS/IN2P3, University of Paris-Sud, [email protected], December 2010.

    Introduction. Most of the time, the most time-consuming part of a program is its loops, so loop optimization is critical in high performance computing. Depending on the target architecture, the goals of loop transformations are: improving data reuse and data locality; efficient use of the memory hierarchy; reducing the overheads associated with executing loops; instruction pipelining; maximizing parallelism. Loop transformations can be performed at different levels by the programmer, the compiler, or specialized tools. At a high level, some well-known transformations are commonly considered: loop interchange, loop (node) splitting, loop unswitching, loop reversal, loop fusion, loop inversion, loop skewing, loop fission, loop vectorization, loop blocking, loop unrolling, loop parallelization.

    Dependence analysis. Extracting and analyzing the dependencies of a computation from its polyhedral model is a fundamental step toward loop optimization or scheduling. Definition: for a given variable V and given indexes I1, I2, if the computation of X(I1) requires the value of X(I2), then I1 − I2 is called a dependence vector for variable V. Drawing all the dependence vectors within the computation polytope yields the so-called dependency diagram. Example: the dependence vectors are (1, 0), (0, 1), (−1, 1).

    Scheduling. Definition: the computation on the entire domain of a given loop can be performed following any valid schedule. A timing function tV for variable V yields a valid schedule if and only if

        t(x) > t(x − d),  ∀d ∈ DV,   (1)

    where DV is the set of all dependence vectors for variable V.
  • Foundations of Scientific Research
    2012. FOUNDATIONS OF SCIENTIFIC RESEARCH. N. M. Glazunov, National Aviation University, 25.11.2012. Contents: Preface; Introduction; 1. General notions about scientific research (SR) (1.1 Scientific method; 1.2 Basic research; 1.3 Information supply of scientific research); 2. Ontologies and upper ontologies (2.1 Concepts of foundations of research activities; 2.2 Ontology components; 2.3 Ontology for the visualization of a lecture); 3. Ontologies of object domains (3.1 Elements of the ontology of spaces and symmetries; 3.1.1 Concepts of electrodynamics and classical gauge theory); 4. Examples of research activity (4.1 Scientific activity in arithmetics, informatics and discrete mathematics; 4.2 Algebra of logic and functions of the algebra of logic; 4.3 Functions of the algebra of logic); 5. Some notions of the theory of finite and discrete sets; 6. Algebraic operations and algebraic structures; 7. Elements of the theory of graphs and nets; 8. Scientific activity on the example "Information and its investigation"; 9. Scientific research in artificial intelligence; 10. Compilers and compilation; 11. Objective, concepts and history of computer security; 12. Methodological and categorical apparatus of scientific research; 13. Methodology and methods of scientific research (13.1 Methods of theoretical level of research: 13.1.1 Induction; 13.1.2 Deduction; 13.2 Methods of empirical level of research); 14. Scientific idea and significance of scientific research; 15. Forms of scientific knowledge organization and principles of SR (15.1 Forms of scientific knowledge).
  • Compiler Optimizations
    Compiler Optimizations. CS 498: Compiler Optimizations, Fall 2006, University of Illinois at Urbana-Champaign. A note about the name "optimization": it is a misnomer, since there is no guarantee of optimality. We could call the operation code improvement, but this is not quite true either, since compiler transformations are not guaranteed to improve the performance of the generated code. Outline: assignment statement optimizations; loop body optimizations; procedure optimizations; register allocation; instruction scheduling; control flow optimizations; cache optimizations; vectorization and parallelization. Reference: Advanced Compiler Design and Implementation, Steven S. Muchnick, Morgan Kaufmann Publishers, 1997, Chapters 12-19. Historical origins of compiler optimization: "It was our belief that if FORTRAN, during its first months, were to translate any reasonable 'scientific' source program into an object program only half as fast as its hand coded counterpart, then acceptance of our system would be in serious danger. This belief caused us to regard the design of the translator as the real challenge, not the simple task of designing the language. ... To this day I believe that our emphasis on object program efficiency rather than on language design was basically correct. I believe that had we failed to produce efficient programs, the widespread use of languages like FORTRAN would have been seriously delayed." John Backus, "The History of FORTRAN I, II, and III", Annals of the History of Computing, Vol. 1, No. 1, July 1979. Compilers are complex systems: "Like most of the early hardware and software systems, Fortran was late in delivery, and didn't really work when it was delivered. At first people thought it would never be done."
  • Compiler Transformations for High-Performance Computing
    Compiler Transformations for High-Performance Computing. DAVID F. BACON, SUSAN L. GRAHAM, AND OLIVER J. SHARP, Computer Science Division, University of California, Berkeley, California 94720. In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program, using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis. This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application. Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as a bibliographic reference for the details of each optimization.
  • Compiler Construction
    Compiler construction. Contents: Introduction (Compiler construction, Compiler, Interpreter, History of compiler writing); Lexical analysis (Lexical analysis, Regular expression, Regular expression examples, Finite-state machine, Preprocessor); Syntactic analysis (Parsing, Lookahead, Symbol table, Abstract syntax, Abstract syntax tree, Context-free grammar, Terminal and nonterminal symbols, Left recursion, Backus–Naur Form, Extended Backus–Naur Form, TBNF, Top-down parsing, Recursive descent parser, Tail recursive parser, Parsing expression grammar, LL parser, LR parser, Parsing table, Simple LR parser, Canonical LR parser, GLR parser, LALR parser, Recursive ascent parser, Parser combinator, Bottom-up parsing, Chomsky normal form, CYK algorithm, Simple precedence grammar, Simple precedence parser, Operator-precedence grammar, Operator-precedence parser, Shunting-yard algorithm, Chart parser, Earley parser, The lexer hack, Scannerless parsing); Semantic analysis (Attribute grammar, L-attributed grammar, LR-attributed grammar, S-attributed grammar, ECLR-attributed grammar, Intermediate language, Control flow graph, Basic block, Call graph, Data-flow analysis, Use-define chain, Live variable analysis, Reaching definition, Three address
  • Codegeneratoren
    Codegeneratoren. Sebastian Geiger, Twitter: @lanoxx, Email: [email protected], January 26, 2015. Abstract: This is a very short summary of the topics that are taught in "Codegeneratoren". Code generators are the backend part of a compiler that deal with the conversion of the machine-independent intermediate representation into machine-dependent instructions, and with the various optimizations on the resulting code. The first two sections deal with prerequisites that are required for the later parts. Section 1 discusses computer architectures in order to give a better understanding of how processors work and why different optimizations are possible. Section 2 discusses some of the higher-level optimizations that are done before the code generator does its work. Section 3 discusses instruction selection, which concerns the actual conversion of the intermediate representation into machine-dependent instructions. The later chapters all deal with different kinds of optimizations that are possible on the resulting code. I have written this document to help myself study for this subject. If you find this document useful, or if you find errors or mistakes, then please send me an email. Contents: 1. Computer architectures (1.1 Microarchitectures; 1.2 Instruction Set Architectures); 2. Optimizations (2.1 Static Single Assignment; 2.2 Graph structures and memory dependencies; 2.3 Machine-independent optimizations); 3. Instruction selection; 4. Register allocation; 5. Coalescing; 6. Optimal register allocation using integer linear programming; 7. Interprocedural register allocation (7.1 Global register allocation at link time (annotation based); 7.2 Register allocation across procedure and module boundaries (web)); 8. Instruction scheduling (8.1 Phase ordering problem
  • Study Topics Test 3
    Printed by David Whalley, Apr 22, 20, 9:44, studytopics_test3.txt. Topics to Study for Exam 3 register allocation array padding scalar replacement Chapter 6 loop interchange prefetching intermediate code representations control flow optimizations static versus dynamic type checking branch chaining equivalence of type expressions reversing branches coercions, overloading, and polymorphism code positioning translation of boolean expressions loop inversion backpatching and translation of flow−of−control constructs useless jump elimination translation of record declarations unreachable code elimination translation of switch statements data flow optimizations translation of array references common subexpression elimination partial redundancy elimination dead assignment elimination Chapter 5 evaluation order determination recurrence elimination syntax directed definitions machine specific optimizations synthesized and inherited attributes instruction scheduling dependency graphs filling delay slots L−attributed definitions exploiting instruction−level parallelism translation schemes peephole optimization syntax−directed construction of syntax trees and DAGs constructing a control flow graph syntax−directed translation with YACC optimizations before and after code generation instruction selection optimization Chapter 7 Chapter 10 activation records call graphs instruction pipelining typical actions during calls and returns data dependences storage allocation strategies true dependences static,
  • Faculty of Engineering, Technology and Management Sciences
    FACULTY OF ENGINEERING, TECHNOLOGY AND MANAGEMENT SCIENCES (V.M.K.V. Engineering College, Salem and Aarupadai Veedu Institute of Technology, Paiyanoor, Chennai). Department of Computer Science and Engineering. B.E. – Computer Science and Engineering (Part Time) syllabus, 2015 onwards. Each course is listed with lecture hours / practical hours / credits (L/P/C).
    Semester I (4+1): Engineering Mathematics (3/0/3), Programming in C (3/0/3), Environmental Science in Engineering (3/0/3), Data Structures (3/0/3); practical: Data Structures Lab (0/3/2). Total: 14.
    Semester II (5+1): Advanced Engineering Mathematics (3/0/3), Database Management Systems (3/0/3), Computer Organization (3/0/3), Electron Devices (3/0/3), Object Oriented Programming (3/0/3); practical: Object Oriented Programming Lab (3/0/2). Total: 17.
    Semester III (4+1): Artificial Intelligence (3/0/3), System Software (3/0/3), Microprocessors and Microcontrollers (3/0/3), Operating Systems (3/0/3); practical: System Programming Laboratory (0/3/2). Total: 14.
    Semester IV (4+1): Probability and Queuing Theory (3/0/3), Data Communication and Networks (3/0/3), Software Engineering (3/0/3), Object Oriented Analysis and Design (3/0/3); practical: Computer Networks Laboratory (0/3/2). Total: 14.
    Semester V (4+1): Mobile Computing (3/0/3), Web Technology (3/0/3), Graphics and Multimedia (3/0/3), Elective I (3/0/3); practical: Web Technology Laboratory (0/3/2). Total: 14.
    Semester VI (4+2): Total Quality Management (3/0/3), Compiler Design (3/0/3), Security in Computing (3/0/3), Elective II (3/0/3); practical: Compiler Design Lab (0/3/2), Mini Project (0/3/3). Total: 17.
    Semester VII (4+1)
  • Translation Validation of Optimizing Compilers
    Translation Validation of Optimizing Compilers, by Yi Fang. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, December 2005. Advisor: Amir Pnueli. Abstract: There is a growing awareness, both in industry and academia, of the crucial role of formally verifying the translation from high-level source code into low-level object code that is typically performed by an optimizing compiler. Formally verifying an optimizing compiler, as one would verify any other large program, is not feasible due to its size, ongoing evolution and modification, and possibly, proprietary considerations. Translation validation is a novel approach that offers an alternative to the verification of translators in general and compilers in particular: rather than verifying the compiler itself, one constructs a validation tool which, after every run of the compiler, formally confirms that the target code produced in the run is a correct translation of the source program. This thesis work takes an important step towards ensuring an extremely high level of confidence in compilers targeted at EPIC architectures. The dissertation focuses on the translation validation of structure-preserving optimizations, i.e., transformations that do not modify programs' structure in a major way, which include most of the global optimizations performed by compilers. The first part of the dissertation develops the theory of a correct translation, which provides a precise definition of the notion of a target program being a correct translation of a source program, and the method that formally establishes the correctness of structure-preserving transformations based on computational induction.
  • Denotational Translation Validation
    Denotational Translation Validation. Citation: Govereau, Paul. 2012. Denotational Translation Validation. Doctoral dissertation, Harvard University. Citable link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:10121982. © 2012 Paul Govereau. All rights reserved. Thesis advisor: Greg Morrisett. Abstract: In this dissertation we present a simple and scalable system for validating the correctness of low-level program transformations. Proving that program transformations are correct is crucial to the development of security-critical software tools. We achieve a simple and scalable design by compiling sequential low-level programs to synchronous data-flow programs. These data-flow programs are a denotation of the original programs, representing all of the relevant aspects of the program semantics. We then check that the two denotations are equivalent, which implies that the program transformation is semantics-preserving. Our denotations are computed by means of symbolic analysis. In order to achieve our design, we have extended symbolic analysis to arbitrary control-flow graphs. To this end, we have designed an intermediate language called Synchronous Value Graphs (SVG), which is capable of representing our denotations for arbitrary control-flow graphs; we have built an algorithm for computing SVG from normal assembly language; and we have given a formal model of SVG which allows us to simplify and compare denotations.
  • Handling Irreducible Loops: Optimized Node Splitting Vs. DJ-Graphs
    Handling Irreducible Loops: Optimized Node Splitting vs. DJ-Graphs. Sebastian Unger (DResearch Digital Media Systems, Otto-Schmirgal-Str. 3, 10319 Berlin, Germany) and Frank Mueller (North Carolina State University, CS Dept., Box 8206, Raleigh, NC 27695-7534, fax: +1.919.515.7925, [email protected]). Abstract: This paper addresses the question of how to handle irreducible regions during optimization, which has become even more relevant for contemporary processors since recent VLIW-like architectures rely heavily on instruction scheduling. The contributions of this paper are twofold. First, a method of optimized node splitting to transform irreducible regions of control flow into reducible regions is derived. This method is superior to previously published approaches since it reduces the number of replicated nodes by comparison. Second, three methods that handle regions of irreducible control flow are evaluated with respect to their impact on compiler optimizations: traditional and optimized node splitting, as well as loop analysis through DJ graphs. Measurements show improvements of 1-40% for these methods of handling irreducible loops over the unoptimized case. 1 Introduction. Compilers heavily rely on recognizing loops for optimizations. Most loop optimizations have only been formulated for natural loops with a single entry point (header), the sink of the backedge(s) of such a loop. Multiple-entry loops cause irreducible regions of control flow, typically not recognized as loops by traditional algorithms. These regions may result from goto statements or from optimizations that modify the control flow. As a result, loop transformations and optimizations to exploit instruction-level parallelism cannot be applied to such regions, so opportunities for code improvements may be missed.