Parallelization in Co-Compilation for Configurable Accelerators
Recommended publications
-
Switching Theory for Logic Synthesis / Ad Hoc Wireless Networking / Communications Satellite Handbook
COMMUNICATION ENGINEERING & SIGNAL PROCESSING

Switching Theory for Logic Synthesis — Tsutomu Sasao. Contents: 1. Mathematical Foundation. 2. Lattice and Boolean Algebra. 3. Logic Functions and their Representations. 4. Optimization of AND-OR Two-level Logic Networks. 5. Logic Functions with Various Properties. 6. Sequential Networks. 7. Optimization of Sequential Networks. 8. Delay and Asynchronous Behavior. 9. Multi-valued Input Two-valued Output Functions. 10. Heuristic Optimization of Two-level Networks. 11. Multi-level Logic Synthesis. 12. Logic Design Using Modules. 13. Logic Design Using EXORs. 14. Complexity of Logic Networks. Rpt. 2011, 379 pp, ISBN 978-81-84898-02-6, BSPSPR, Rs. 595.00

Ad Hoc Wireless Networking — Cheng. Contents: 1. Introduction. 2. Related Work. 3. Formulation of Power-aware Routing. 4. Online Power-aware Routing with max-min zPmin. 5. Hierarchical Routing with max-min zPmin. 6. Distributed Routing with max-min zPmin. Rpt. 2011, 630 pp, ISBN 978-81-84898-48-4, BSPSPR, Rs. 650.00

Communications Satellite Handbook — Walter L. Morgan, Gary D. Gordon. Contents: 1. Obtaining Access to the Satellite 2. TELETRAFFIC 3. Interfaces Between Terrestrial and Satellite Systems 4. Telecommunications Systems 5. COMMUNICATIONS SATELLITE SYSTEMS 6. System Modeling 7. Overall System Calculations 8. MULTIPLE-ACCESS TECHNIQUES 9. Frequency Domain Multiple Access 10. Time Domain Multiple Access 11. Space Domain Multiple Access 12. Code Domain Multiple Access 13. Random Multiple Access 14. SPACECRAFT TECHNOLOGY 15. Space Configuration and Subsystems 16. Telemetry, Tracking, and Command 17. Solar Arrays 18. Attitude Control 19. Thermal Control 20. SATELLITE ORBITS 21. Direction of Orbit Normals and of Sun 22.

For new arrivals visit www.bspbooks.net / www.bspublications.net
-
Reconfigurable Computing
Reconfigurable Computing: A Survey of Systems and Software
KATHERINE COMPTON, Northwestern University, and SCOTT HAUCK, University of Washington
Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey, we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which reuse the configurable hardware during program execution.
Categories and Subject Descriptors: A.1 [Introductory and Survey]; B.6.1 [Logic Design]: Design Style—logic arrays; B.6.3 [Logic Design]: Design Aids; B.7.1 [Integrated Circuits]: Types and Design Styles—gate arrays
General Terms: Design, Performance
Additional Key Words and Phrases: Automatic design, field-programmable, FPGA, manual design, reconfigurable architectures, reconfigurable computing, reconfigurable systems
This research was supported in part by Motorola, Inc., DARPA, and NSF. K. Compton was supported by an NSF fellowship. S. Hauck was supported in part by an NSF CAREER award and a Sloan Research Fellowship.
1. INTRODUCTION
There are two primary methods in conventional computing for the execution of algorithms. The first is to use hard-wired technology, either an Application Specific Integrated Circuit (ASIC) or a group of individual components forming a
-
Reconfigurable Computing
Reconfigurable computing: architectures and design methods
T.J. Todman, G.A. Constantinides, S.J.E. Wilton, O. Mencer, W. Luk and P.Y.K. Cheung
Abstract: Reconfigurable computing is becoming increasingly attractive for many applications. This survey covers two aspects of reconfigurable computing: architectures and design methods. The paper includes recent advances in reconfigurable architectures, such as the Altera Stratix II and Xilinx Virtex-4 FPGA devices. The authors identify major trends in general-purpose and special-purpose design methods. It is shown that reconfigurable computing designs are capable of achieving up to 500 times speedup and 70% energy savings over microprocessor implementations for specific applications.
1 Introduction
Reconfigurable computing is rapidly establishing itself as a major discipline that covers various subjects of learning, including both computing science and electronic engineering. Reconfigurable computing involves the use of reconfigurable devices, such as field programmable gate arrays (FPGAs), for computing purposes. Reconfigurable computing is also known as configurable computing or custom computing, since many of the design techniques can be seen as customising a computational fabric for specific applications [1]. Recent research suggests that it is a trend rather than a one-off for a wide variety of applications: from image processing [3] to floating-point operations [4]. Sheer speed, while important, is not the only strength of reconfigurable computing. Another compelling advantage is reduced energy and power consumption. In a reconfigurable system, the circuitry is optimised for the application, such that the power consumption will tend to be much lower than that for a general-purpose processor. A recent study [5] reports that moving critical software loops to reconfigurable hardware results in average energy savings of 35% to 70%
-
The Paramountcy of Reconfigurable Computing
Energy Efficient Distributed Computing Systems, edited by Albert Y. Zomaya, Young Choon Lee. ISBN 978-0-471-90875-4. Copyright © 2012 Wiley, Inc.
Chapter 18: The Paramountcy of Reconfigurable Computing
Reiner Hartenstein
Abstract. Computers are very important for all of us. But brute-force disruptive architectural developments in industry, and the threat of unaffordable operating costs from excessive power consumption, are a massive future survival problem for our existing cyber infrastructures, which we must not surrender. The progress of performance in high performance computing (HPC) has stalled because of the "programming wall" caused by the lacking scalability of parallelism. This chapter shows that Reconfigurable Computing is the silver bullet to obtain massively better energy efficiency as well as much better performance, also by the upcoming methodology of HPRC (high performance reconfigurable computing). We need a massive campaign for the migration of software over to configware. Also because of the multicore parallelism dilemma, we anyway need to redefine programmer education. The impact is a fascinating challenge to reach new horizons of research in computer science. We need a new generation of talented innovative scientists and engineers to start the beginning second history of computing. This paper introduces a new world model.
18.1 Introduction
In Reconfigurable Computing, e.g. by FPGA (Table 15), practically everything can be implemented which runs on traditional computing platforms. For instance, recently the historical Cray 1 supercomputer has been reproduced, cycle-accurate and binary-compatible, using a single Xilinx Spartan-3E 1600 development board running at 33 MHz (the original Cray ran at 80 MHz).
-
Generalizing Loop-Invariant Code Motion in a Real-World Compiler
Imperial College London, Department of Computing
Generalizing loop-invariant code motion in a real-world compiler
Author: Paul Colea. Supervisor: Fabio Luporini. Co-supervisor: Prof. Paul H. J. Kelly
MEng Computing Individual Project, June 2015
Abstract. Motivated by the perpetual goal of automatically generating efficient code from high-level programming abstractions, compiler optimization has developed into an area of intense research. Apart from general-purpose transformations which are applicable to all or most programs, many highly domain-specific optimizations have also been developed. In this project, we extend such a domain-specific compiler optimization, initially described and implemented in the context of finite element analysis, to one that is suitable for arbitrary applications. Our optimization is a generalization of loop-invariant code motion, a technique which moves invariant statements out of program loops. The novelty of the transformation is due to its ability to avoid more redundant recomputation than normal code motion, at the cost of additional storage space. This project provides a theoretical description of the above technique which is fit for general programs, together with an implementation in LLVM, one of the most successful open-source compiler frameworks. We introduce a simple heuristic-driven profitability model which manages to successfully safeguard against potential performance regressions, at the cost of missing some speedup opportunities. We evaluate the functional correctness of our implementation using the comprehensive LLVM test suite, passing all of its 497 whole-program tests. The results of our performance evaluation using the same set of tests reveal that generalized code motion is applicable to many programs, but that consistent performance gains depend on an accurate cost model.
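To make the excerpt concrete, here is a hedged sketch in C (with illustrative function and variable names; this is not the thesis's actual LLVM implementation) of how generalized code motion trades storage for recomputation: b[j]*c[j] varies within the inner j-loop, so classic loop-invariant code motion cannot hoist it, yet it is recomputed identically on every iteration of the outer i-loop.

    /* Before: b[j]*c[j] is not invariant in the inner loop, so
     * classic LICM cannot hoist it, but it is redundantly
     * recomputed for every value of i (n*m multiplies total). */
    void kernel_before(int n, int m, double a[n][m],
                       const double b[m], const double c[m],
                       const double d[n]) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                a[i][j] = b[j] * c[j] + d[i];
    }

    /* After generalized code motion: precompute into a temporary
     * array, spending O(m) extra storage to save (n-1)*m multiplies. */
    void kernel_after(int n, int m, double a[n][m],
                      const double b[m], const double c[m],
                      const double d[n]) {
        double t[m];                     /* the added storage (C99 VLA) */
        for (int j = 0; j < m; j++)
            t[j] = b[j] * c[j];          /* m multiplies, done once */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                a[i][j] = t[j] + d[i];
    }

The temporary array t is exactly the "additional storage space" the abstract refers to; whether it pays off is what the thesis's profitability model has to decide.
-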
A Tiling Perspective for Register Optimization — Fabrice Rastello, Sadayappan Ponnuswany, Duco van Amstel
A Tiling Perspective for Register Optimization
Łukasz Domagała, Fabrice Rastello, Sadayappan Ponnuswany, Duco van Amstel
To cite this version: Fabrice Rastello, Sadayappan Ponnuswany, Duco van Amstel. A Tiling Perspective for Register Optimization. [Research Report] RR-8541, Inria, 2014. hal-00998915, https://hal.inria.fr/hal-00998915, submitted on 3 Jun 2014.
Research Report N° 8541, May 2014, Project-Teams GCG, ISSN 0249-6399, ISRN INRIA/RR--8541--FR+ENG
Abstract: Register allocation is a much studied problem. A particularly important context for optimizing register allocation is within loops, since a significant fraction of the execution time of programs is often spent inside loop code. A variety of algorithms have been proposed in the past for register allocation, but the complexity of the problem has resulted in a decoupling of several important aspects, including loop unrolling, register promotion, and instruction reordering.
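For intuition about the loop-scoped register optimizations the report couples together, here is a hedged, illustrative C sketch of register promotion (one of the aspects named above; this is not the paper's tiling algorithm, and the names are made up): values loaded in one iteration of a 3-point stencil are rotated through scalars instead of being reloaded.

    /* Before: each iteration reloads a[i-1] and a[i], which were
     * already loaded by the previous iteration (3 loads/iteration). */
    void smooth_naive(int n, const double *a, double *out) {
        for (int i = 1; i < n - 1; i++)
            out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0;
    }

    /* After register promotion: the three live values rotate through
     * scalars the compiler can keep in registers (1 load/iteration). */
    void smooth_promoted(int n, const double *a, double *out) {
        if (n < 3) return;
        double x0 = a[0], x1 = a[1];
        for (int i = 1; i < n - 1; i++) {
            double x2 = a[i + 1];        /* the only memory load */
            out[i] = (x0 + x1 + x2) / 3.0;
            x0 = x1;                     /* rotate the register window */
            x1 = x2;
        }
    }

Promotion like this raises register pressure, which is why the report treats it jointly with unrolling and reordering rather than in isolation.
-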
Efficient Symbolic Analysis for Optimizing Compilers
Efficient Symbolic Analysis for Optimizing Compilers
Robert A. van Engelen, Dept. of Computer Science, Florida State University, Tallahassee, FL 32306-4530, [email protected]
Abstract. Because most of the execution time of a program is typically spent in loops, loop optimization is the main target of optimizing and restructuring compilers. An accurate determination of induction variables and dependencies in loops is of paramount importance to many loop optimization and parallelization techniques, such as generalized loop strength reduction, loop parallelization by induction variable substitution, and loop-invariant expression elimination. In this paper we present a new method for induction variable recognition. Existing methods are either ad hoc and not powerful enough to recognize some types of induction variables, or they are powerful but not safe. The most powerful method known is the symbolic differencing method as demonstrated by the Parafrase-2 compiler on parallelizing the Perfect Benchmarks(R). However, symbolic differencing is inherently unsafe, and a compiler that uses this method may produce incorrectly transformed programs without issuing a warning. In contrast, our method is safe, simpler to implement in a compiler, better adaptable for controlling loop transformations, and recognizes a larger class of induction variables.
1 Introduction
It is well known that the optimization and parallelization of scientific applications by restructuring compilers requires extensive analysis of induction variables and dependencies
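As a hedged illustration of what induction variable recognition enables (this is textbook strength reduction in C, not van Engelen's own analysis method; the names are illustrative):

    /* Before: k is a derived induction variable of i (k = 4*i + 3),
     * recomputed with a multiply on every iteration. */
    void fill_before(int n, int *a) {
        for (int i = 0; i < n; i++) {
            int k = 4 * i + 3;
            a[k] = i;
        }
    }

    /* After recognition and strength reduction: the multiply becomes
     * an add. The recognized closed form 4*i + 3 is also what
     * dependence tests and induction variable substitution consume. */
    void fill_after(int n, int *a) {
        int k = 3;                  /* initial value: 4*0 + 3 */
        for (int i = 0; i < n; i++) {
            a[k] = i;
            k += 4;                 /* step of the recurrence */
        }
    }

The safety issue the abstract raises is about recognizing such variables correctly in the first place: a recognizer that guesses the closed form wrongly silently miscompiles loops like this.
-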
Using Static Single Assignment (SSA) Form
Using Static Single Assignment (SSA) Form
Last Time
• Basic definition, and why it is useful
• How to build it
A diamond-shaped CFG illustrates the conversion: B1 branches on a condition to B2 (x ← 5) or B3 (x ← 3), which join at B4 (y ← x); in SSA form the assignments become x0 ← 5 and x1 ← 3, and B4 merges them with x2 ← φ(x0, x1), y ← x2.
Today
• Loop Optimizations
– Induction variables (standard vs. SSA)
– Loop Invariant Code Motion (SSA based)
Loop Optimization
Loops are important, they execute often
• typically, some regular access pattern: regularity ⇒ opportunity for improvement
• repetition ⇒ savings are multiplied
• assumption: loop bodies execute 10^depth times
Classical Loop Optimizations
• Loop Invariant Code Motion
• Induction Variable Recognition
• Strength Reduction
• Linear Test Replacement
• Loop Unrolling
Other Loop Optimizations
• Scalar replacement
• Loop Interchange
• Loop Fusion
• Loop Distribution
• Loop Skewing
• Loop Reversal
Loop Invariant Code Motion
• Build the SSA graph (semi-pruned)
• Need insertion of φ-nodes: If two non-null paths x →+ z and y →+ z converge at node z, and nodes x and y contain assignments to t (in the original program), then a φ-node for t must be inserted at z (in the new program), and t must be live across some basic block.
• Simple test: If, for a statement s ≡ [x ← y ⊗ z], none of the operands y, z refer to a φ-node or a definition inside the loop, then transform: assign the invariant computation a new temporary name, t ← y ⊗ z, move it to the loop pre-header, and assign x ← t.
Loop Invariant Code
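A hedged, source-level rendering of the slides' simple test (the lecture works on an SSA intermediate representation; this C version and its names are illustrative only): a * b refers to no φ-node result and no value defined inside the loop, so it passes the test and moves to the pre-header.

    /* Before: a * b is recomputed on every iteration, even though
     * neither operand is defined in the loop or by a φ-node. */
    double sum_scaled(int n, const double *v, double a, double b) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a * b * v[i];       /* (a*b) is loop-invariant */
        return s;
    }

    /* After motion: the invariant computation gets a new temporary
     * t in the loop pre-header, and the body uses t instead. */
    double sum_scaled_licm(int n, const double *v, double a, double b) {
        double s = 0.0;
        const double t = a * b;      /* hoisted to the pre-header */
        for (int i = 0; i < n; i++)
            s += t * v[i];
        return s;
    }

Because C's * is left-associative, hoisting (a*b) here changes no floating-point result; the SSA property is what makes the "defined inside the loop" check a simple use-def lookup.
-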
Loop Transformations and Parallelization
Loop transformations and parallelization
Claude Tadonki, LAL/CNRS/IN2P3, University of Paris-Sud, [email protected], December 2010
Introduction
Most of the time, the most time-consuming part of a program is its loops. Thus, loop optimization is critical in high performance computing. Depending on the target architecture, the goals of loop transformations are:
• improve data reuse and data locality
• efficient use of the memory hierarchy
• reducing the overheads associated with executing loops
• instruction pipelining
• maximize parallelism
Loop transformations can be performed at different levels by the programmer, the compiler, or specialized tools. At a high level, some well-known transformations are commonly considered: loop interchange, loop (node) splitting, loop unswitching, loop reversal, loop fusion, loop inversion, loop skewing, loop fission, loop vectorization, loop blocking, loop unrolling, loop parallelization.
Dependence analysis
Extracting and analyzing the dependencies of a computation from its polyhedral model is a fundamental step toward loop optimization or scheduling.
Definition. For a given variable V and given indexes I1, I2, if the computation of X(I1) requires the value of X(I2), then I1 − I2 is called a dependence vector for variable V. Drawing all the dependence vectors within the computation polytope yields the so-called dependence diagram.
Example. The dependence vectors are (1, 0), (0, 1), (−1, 1).
Scheduling
Definition. The computation on the entire domain of a given loop can be performed following any valid schedule. A timing function tV for variable V yields a valid schedule if and only if
t(x) > t(x − d), ∀d ∈ DV,   (1)
where DV is the set of all dependence vectors for variable V.
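As a hedged sketch of one transformation from the list above, here is loop interchange in C (illustrative code, assuming a row-major array and fully independent iterations, so every dependence vector remains lexicographically positive after the permutation and condition (1) stays satisfiable):

    /* Before: column-first traversal of a row-major array touches
     * memory with a stride of m doubles between consecutive accesses. */
    void scale_cols_first(int n, int m, double a[n][m], double c) {
        for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
                a[i][j] *= c;
    }

    /* After interchange: the inner loop becomes unit-stride, which
     * improves data locality and cache reuse. Legal here because the
     * iterations carry no dependences, so no dependence vector is
     * reversed by swapping the loops. */
    void scale_rows_first(int n, int m, double a[n][m], double c) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                a[i][j] *= c;
    }

When dependences do exist, the legality check is exactly the one formalized above: the permuted dependence vectors must still admit a valid timing function.
-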
Compiler-Based Code-Improvement Techniques
Compiler-Based Code-Improvement Techniques
KEITH D. COOPER, KATHRYN S. MCKINLEY, and LINDA TORCZON
Since the earliest days of compilation, code quality has been recognized as an important problem [18]. A rich literature has developed around the issue of improving code quality. This paper surveys one part of that literature: code transformations intended to improve the running time of programs on uniprocessor machines. This paper emphasizes transformations intended to improve code quality rather than analysis methods. We describe analytical techniques and specific data-flow problems to the extent that they are necessary to understand the transformations. Other papers provide excellent summaries of the various sub-fields of program analysis. The paper is structured around a simple taxonomy that classifies transformations based on how they change the code. The taxonomy is populated with example transformations drawn from the literature. Each transformation is described at a depth that facilitates broad understanding; detailed references are provided for deeper study of individual transformations. The taxonomy provides the reader with a framework for thinking about code-improving transformations. It also serves as an organizing principle for the paper. Copyright 1998, all rights reserved. You may copy this article for your personal use in Comp 512. Further reproduction or distribution requires written permission from the authors.
1 INTRODUCTION
This paper presents an overview of compiler-based methods for improving the run-time behavior of programs — often mislabeled code optimization. These techniques have a long history in the literature. For example, Backus makes it quite clear that code quality was a major concern to the implementors of the first Fortran compilers [18].
-
eFPGAs: Architectural Explorations, System Integration & a Visionary Industrial Survey of Programmable Technologies — Syed Zahid Ahmed
eFPGAs: Architectural Explorations, System Integration & a Visionary Industrial Survey of Programmable Technologies
Syed Zahid Ahmed
To cite this version: Syed Zahid Ahmed. eFPGAs: Architectural Explorations, System Integration & a Visionary Industrial Survey of Programmable Technologies. Micro and nanotechnologies/Microelectronics. Université Montpellier II - Sciences et Techniques du Languedoc, 2011. English. tel-00624418, https://tel.archives-ouvertes.fr/tel-00624418, submitted on 16 Sep 2011.
Université Montpellier 2 (UM2), École Doctorale I2S, LIRMM (Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier). Domain: Microelectronics. PhD thesis report for partial fulfillment of the requirements of the Doctorate degree of UM2. Thesis conducted in the French industrial PhD (CIFRE) framework between Menta & the LIRMM lab (Dec. 2007 – Feb. 2011) in Montpellier, FRANCE.
French title: "eFPGAs : Explorations architecturales, intégration système, et une enquête visionnaire industrielle des technologies programmables"
by Syed Zahid AHMED. Presented and defended publicly on 22 June 2011.
Jury: Mr. Guy GOGNIAT, Prof. at STICC/UBS (Lorient, FRANCE), President; Mr. Habib MEHREZ, Prof. at LIP6/UPMC (Paris, FRANCE), Reviewer; Mr.
-
Data-Stream-based Computing: The Antimatter of Core Computer Science
Reiner Hartenstein, University of Kaiserslautern, Germany, [email protected], http://hartenstein.de — viewgraph downloading: link found at http://kressarray.de
Data-Stream-based Computing: The Antimatter of Core Computer Science (Data-Stream-based Computing: Antimaterie der Kern-Informatik)
Xputer Lab, University of Kaiserslautern. Festkolloquium "60 Semester Informatik", Universität Dortmund, 18–19 July 2002. © 2002, [email protected], http://KressArray.de
Criticism of practical computer science (in teaching):
• it abuses its two-thirds majority
• it keeps the formative phase strictly "procedural-only"
• graduates are therefore completely unprepared for the near future
– where >90% of applications will be implemented for embedded systems
– as predicted for 2010
• only a few percent of the curriculum would have to change
• my mission: to win you over for this
Criticism of computer engineering (of the classical kind):
• it is still widespread
• it offers no preparation for today's working world
• telltale signs: terms such as "Rechnerorganisation" (computer organization), "Rechnerstrukturen" (computer structures), "Rechnerarchitektur" (computer architecture)
• von-Neumann-only: everything else is consistently passed over in silence
• prime example:
Core computer science: young? dynamic?
After more than 10 technology generations — 1st 4004, 2nd 8008, 3rd 8086, 4th 80286, 5th 80386, 6th 80486, 7th P5 (Pentium) — the von Neumann paradigm, the vN microprocessor, is still a Methuselah: the prevailing doctrine, the steam engine of the silicon age of microelectronics.