ADVERTIMENT. Lʼaccés Als Continguts Dʼaquesta Tesi Queda Condicionat a Lʼacceptació De Les Condicions Dʼús Establertes Pe

Total Page:16

File Type:pdf, Size:1020Kb

ADVERTIMENT. Lʼaccés Als Continguts Dʼaquesta Tesi Queda Condicionat a Lʼacceptació De Les Condicions Dʼús Establertes Pe ADVERTIMENT. Lʼaccés als continguts dʼaquesta tesi queda condicionat a lʼacceptació de les condicions dʼús establertes per la següent llicència Creative Commons: http://cat.creativecommons.org/?page_id=184 ADVERTENCIA. El acceso a los contenidos de esta tesis queda condicionado a la aceptación de las condiciones de uso establecidas por la siguiente licencia Creative Commons: http://es.creativecommons.org/blog/licencias/ WARNING. The access to the contents of this doctoral thesis it is limited to the acceptance of the use conditions set by the following Creative Commons license: https://creativecommons.org/licenses/?lang=en David Castells Rufas Scalable Parallel Architectures on Reconfigurable Platforms Ph.D. Thesis, David Castells Rufas Department of Microelectronics and Electronic Systems Universitat Autònoma de Barcelona December 2015 Scalable Parallel Architectures on Reconfigurable Platforms Ph.D. Thesis Dissertation Author: Advisor: David Castells i Rufas Jordi Carrabina i Bordoll Universitat Autònoma de Barcelona December 2015 Bellaterra, Catalonia I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Dr. Jordi Carrabina i Bordoll This work was carried out at the Microelectronics and Electronic Systems Department of the Universitat Autònoma de Barcelona In fulfillment of the requirements for the degree of Doctorat en Informàtica © 2015 David Castells i Rufas Some Rights Reserved. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. ─ 3 ─ ─ 4 ─ Abstract The latest advances on reconfigurable systems resulted in FPGA devices with enough resources to implement multiprocessing systems with more than 100 soft-cores on a single FPGA. In this thesis I present how to implement solutions based on such systems and two scenarios where those kind of systems would be valuable: hard real-time systems and energy efficient HPC systems. FPGAs are increasingly being used in HPC computing as they can bring higher energy efficiency than other alternative architectures such as CPUs or GPGPUs. However, the programmability barrier imposed by HDL languages is a serious drawback that limits their adoption. The recent introduction of OpenCL toolchains for FPGAs is a step towards lowering the barrier, but still produces application-specific designs that cannot be reused for other general-purpose computing unless the platform is reconfigured. Building many-soft-cores solutions using simple hardware accelerators can bring performance and energy efficiency and still provide a general purpose system to be used for different applications. To make it possible the current FPGA toolchains has been complemented to add support for fast many-soft-core platform description and verification (especially at higher levels of abstraction), to provide convenient parallel programming models and efficient methods to optimize the designs. On the other hand, I show how the same methods can be applied to implement systems using a task isolation approach to meet the requirements of a hard real-time safety critical system. Finally, the application of these technologies and optimization methods are used to efficiently implement both industrial and computational research problems. ─ 5 ─ Resum Els últims avenços en els sistemes reconfigurables han fet possible que les darreres famílies de FPGAs tinguin prou recursos per permetre la implementació de sistemes multiprocessadors utilitzant més de 100 nuclis (soft-core) en una única FPGA. En aquesta tesis presento com implementar solucions basades en aquests sistemes i dos escenaris on aquests sistemes tenen interès: els sistemes de temps real dur i els sistemes d’altes prestacions eficients energèticament. Les FPGAs s’utilitzen cada vegada més en la computació d’altes prestacions ja que poden arribar a oferir més eficiència energètica que altres arquitectures com ara CPUs o GPGPUs. Tanmateix, la baixa programabilitat d’aquests sistemes, fruit de l’ús de llenguatges HDL és un inconvenient que limita la seva adopció. La introducció recent de fluxos de disseny per FPGA basats en OpenCL és un avenç per reduir la dificultat de programació, però malgrat tot, produeix arquitectures de propòsit específic que no poden reutilitzar-se per altres aplicacions a no ser que es reconfiguri la plataforma. Construint many-soft-cores que utilitzin acceleradors simples, es poden aconseguir prestacions i eficiència energètica i tot i així, encara disposar d’un sistema útil per executar altres aplicacions. Per fer-ho possible les eines de disseny actual per FPGA s’han complementat per afegir suport per construir arquitectures many-soft- core, proporcionar models de programació paral·lels i mètodes eficients per analitzar i optimitzar els dissenys. Per altra banda, es mostra com els mateixos mètodes es poden utilitzar per desenvolupar un sistema de temps real i alts nivells de seguretat mitjançant l’aïllament de tasques en els diferents nuclis del sistema. ─ 6 ─ Enthusiasm is followed by disappointment and even depression, and then by renewed enthusiasm. Murray Gell-Mann ─ 7 ─ ─ 8 ─ Acknowledgments When I started my particular journey to Ithaca I prayed that the road would be long. Unfortunately, it looks as if gods were listening me… I want express my deepest gratitude to all the people that encouraged me, that helped me, or contributed (no matter how) to make this long journey enjoyable and fun. To my daughters Anna and Carla. For being lovely…most of the time. Anyway, I love you… all the time. I am happy to finish before you get to College! I will play more with you from now on! To my ex-wife Mar. I probably devoted too much time to this, I’m sorry…and thanks for the many times you’ve expected a “thanks” coming out of my mouth that has never arrived. To my parents, my brother and his family. Thank you for being there whenever I need help. Quite often lately. To Txell. Thank you for the good times we have shared. I swear I am going to reach my toes one day. To Disposats: Àngels, Carles, Marga, Miquel, Paquito, Montse, and Rita. I miss those dinners and adventures. To my advisor Jordi Carrabina. Thanks for encouraging me to start, and not falling into despair during this last sprint. I hope you never have another Ph.D. student like me! …and this is not a self-compliment! To Eloi. Thank you for the coffees, discussions, and everything that helped me think that I was not travelling a lonely road. I hope our roads can cross again in the future. To Carme. I wish you much luck in your [shorter] journey! To Albert. For not quitting smoking for some time, and for being always ready to roll up your sleeves if needed. Lab is empty without you, man! I wish we can climb together again soon! To Jaume Joven. I had much fun working together. ─ 9 ─ To Eduard. For liking disguising! All the best with you growing family! To Joan Plana. …for our discussions about the meaning of life and for, too often, paying my lunch. Man, I will be the one inviting next time! To Josep Mesado and Martí Bonamusa. For continually asking me about some Drivers… I do not know what you are talking about! I sincerely wish that you have the success you deserve. All the best! To colleagues from UAB, especially Lluis Terés, Toni Espinosa, Juan Carlos Moure, and Emilio Luque. Thanks for sharing wisdom, thoughts and experiences. To Toni Portero and David Novo. It’s long time since we worked together now. But I’m still convinced that you are great people to have around. To David Marin. I admire your resolution and British style. To Josep Besora. Be prepared! I am going to beat you in the Tennis court someday. To my “Flama’s fathers” friends, especially Pep Claret and Xavier Montanyà. Sharing drinks and laughs with you has been one of the most effective medicines to combat disappointment and depression. To Chit Yeoh Cheng. Thanks for your help and for sharing your Chinese wisdom. To Jose Duato, Greg Yeric, Rob Aitken, and Francky Catthoor, for your comments and valuable insight. To Takehiko Iida and David Womble, for answering my mails asking for estimations of the power consumption of former #1 top500 machines. To the Catalan, Spanish, and European taxpayers. Thanks for supporting my research. To the anonymous reviewers of my papers. You were always right. To the people that smile to strangers in the street. The world is better with you in it. ─ 10 ─ Contents Abstract ......................................................................................................................................... 5 Resum ............................................................................................................................................ 6 Acknowledgments ......................................................................................................................... 9 Contents....................................................................................................................................... 11 List of Acronyms ......................................................................................................................... 15 List of Figures ............................................................................................................................. 17 List of Tables ............................................................................................................................... 23 1. Introduction ........................................................................................................................
Recommended publications
  • VOLUME V INFORMATIQUE NON AMERICAINE Première Partie Par L' Ingénieur Général De L'armement BOUCHER Henri TABLE
    VOLUME V INFORMATIQUE NON AMERICAINE Première partie par l' Ingénieur Général de l'Armement BOUCHER Henri TABLE DES MATIERES INFORMATIQUE NON AMERICAINE Première partie 731 Informatique européenne (statistiques, exemples) 122 700 Histoire de l'Informatique allemande 1 701 Petits constructeurs 5 702 Les facturières de Kienzle Data System 16 703 Les minis de gestion de Nixdorf 18 704 Siemens & Halske AG 23 705 Systèmes informatiques d'origine allemande 38 706 Histoire de l'informatique britannique 40 707 Industriels anglais de l'informatique 42 708 Travaux des Laboratoires d' Etat 60 709 Travaux universitaires 63 710 Les coeurs synthétisables d' ARM 68 711 Computer Technology 70 712 Elliott Brothers et Elliott Automation 71 713 Les machines d' English Electric Company 74 714 Les calculateurs de Ferranti 76 715 Les études de General Electric Company 83 716 La patiente construction de ICL 85 Catalogue informatique – Volume E - Ingénieur Général de l'Armement Henri Boucher Page : 1/333 717 La série 29 d' ICL 89 718 Autres produits d' ICL, et fin 94 719 Marconi Company 101 720 Plessey 103 721 Systèmes en Grande-Bretagne 105 722 Histoire de l'informatique australienne 107 723 Informatique en Autriche 109 724 Informatique belge 110 725 Informatique canadienne 111 726 Informatique chinoise 116 727 Informatique en Corée du Sud 118 728 Informatique à Cuba 119 729 Informatique danoise 119 730 Informatique espagnole 121 732 Informatique finlandaise 128 733 Histoire de l'informatique française 130 734 La période héroïque : la SEA 140 735 La Compagnie
    [Show full text]
  • Increasing Memory Miss Tolerance for SIMD Cores
    Increasing Memory Miss Tolerance for SIMD Cores ∗ David Tarjan, Jiayuan Meng and Kevin Skadron Department of Computer Science University of Virginia, Charlottesville, VA 22904 {dtarjan, jm6dg,skadron}@cs.virginia.edu ABSTRACT that use a single instruction multiple data (SIMD) organi- Manycore processors with wide SIMD cores are becoming a zation can amortize the area and power overhead of a single popular choice for the next generation of throughput ori- frontend over a large number of execution backends. For ented architectures. We introduce a hardware technique example, we estimate that a 32-wide SIMD core requires called “diverge on miss” that allows SIMD cores to better about one fifth the area of 32 individual scalar cores. Note tolerate memory latency for workloads with non-contiguous that this estimate does not include the area of any intercon- memory access patterns. Individual threads within a SIMD nection network among the MIMD cores, which often grows “warp” are allowed to slip behind other threads in the same supra-linearly with the number of cores [18]. warp, letting the warp continue execution even if a subset of To better tolerate memory and pipeline latencies, many- core processors typically use fine-grained multi-threading, threads are waiting on memory. Diverge on miss can either 1 increase the performance of a given design by up to a factor switching among multiple warps, so that active warps can of 3.14 for a single warp per core, or reduce the number of mask stalls in other warps waiting on long-latency events. warps per core needed to sustain a given level of performance The drawback of this approach is that the size of the regis- from 16 to 2 warps, reducing the area per core by 35%.
    [Show full text]
  • SIMD Extensions
    SIMD Extensions PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC Contents Articles SIMD 1 MMX (instruction set) 6 3DNow! 8 Streaming SIMD Extensions 12 SSE2 16 SSE3 18 SSSE3 20 SSE4 22 SSE5 26 Advanced Vector Extensions 28 CVT16 instruction set 31 XOP instruction set 31 References Article Sources and Contributors 33 Image Sources, Licenses and Contributors 34 Article Licenses License 35 SIMD 1 SIMD Single instruction Multiple instruction Single data SISD MISD Multiple data SIMD MIMD Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism. History The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1] The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel.
    [Show full text]
  • Reconfigurable Computing
    Reconfigurable Computing: A Survey of Systems and Software KATHERINE COMPTON Northwestern University AND SCOTT HAUCK University of Washington Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey, we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which reuse the configurable hardware during program execution. Categories and Subject Descriptors: A.1 [Introductory and Survey]; B.6.1 [Logic Design]: Design Style—logic arrays; B.6.3 [Logic Design]: Design Aids; B.7.1 [Integrated Circuits]: Types and Design Styles—gate arrays General Terms: Design, Performance Additional Key Words and Phrases: Automatic design, field-programmable, FPGA, manual design, reconfigurable architectures, reconfigurable computing, reconfigurable systems 1. INTRODUCTION of algorithms. The first is to use hard- wired technology, either an Application There are two primary methods in con- Specific Integrated Circuit (ASIC) or a ventional computing for the execution group of individual components forming a This research was supported in part by Motorola, Inc., DARPA, and NSF. K. Compton was supported by an NSF fellowship. S. Hauck was supported in part by an NSF CAREER award and a Sloan Research Fellowship.
    [Show full text]
  • Computer Hardware
    Computer Hardware MJ Rutter mjr19@cam Michaelmas 2014 Typeset by FoilTEX c 2014 MJ Rutter Contents History 4 The CPU 10 instructions ....................................... ............................................. 17 pipelines .......................................... ........................................... 18 vectorcomputers.................................... .............................................. 36 performancemeasures . ............................................... 38 Memory 42 DRAM .................................................. .................................... 43 caches............................................. .......................................... 54 Memory Access Patterns in Practice 82 matrixmultiplication. ................................................. 82 matrixtransposition . ................................................107 Memory Management 118 virtualaddressing .................................. ...............................................119 pagingtodisk ....................................... ............................................128 memorysegments ..................................... ............................................137 Compilers & Optimisation 158 optimisation....................................... .............................................159 thepitfallsofF90 ................................... ..............................................183 I/O, Libraries, Disks & Fileservers 196 librariesandkernels . ................................................197
    [Show full text]
  • Syllabus: EEL4930/5934 Reconfigurable Computing
    EEL4720/5721 Reconfigurable Computing (dual-listed course) Department of Electrical and Computer Engineering University of Florida Spring Semester 2019 Catalog Description: Prereq: EEL4712C or EEL5764 or consent of instructor. Fundamental concepts at advanced undergraduate level (EEL4720) and introductory graduate level (EEL5721) in reconfigurable computing (RC) based upon advanced technologies in field-programmable logic devices. Topics include general RC concepts, device architectures, design tools, metrics and kernels, system architectures, and application case studies. Credit Hours: 3 Prerequisites by Topic: Fundamentals of digital design including device technologies, design methodology and techniques, and design environments and tools; fundamentals of computer organization and architecture, including datapath and control structures, data formats, instruction-set principles, pipelining, instruction-level parallelism, memory hierarchy, and interconnects and interfacing. Instructor: Dr. Herman Lam Office: Benton Hall, Room 313 Office hours: TBA Telephone: (352) 392-2689 Email: [email protected] Teaching Assistant: Seyed Hashemi Office hours: TBA Email: [email protected] Class lectures: MWF 4th period, Larsen Hall 239 Required textbook: none References: . Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, edited by Scott Hauck and Andre DeHon, Elsevier, Inc. (Morgan Kaufmann Publishers), Amsterdam, 2008. ISBN: 978-0-12-370522-8 . C. Maxfield, The Design Warrior's Guide to FPGAs, Newnes, 2004, ISBN: 978-0750676045.
    [Show full text]
  • A MATLAB Compiler for Distributed, Heterogeneous, Reconfigurable
    A MATLAB Compiler For Distributed, Heterogeneous, Reconfigurable Computing Systems P. Banerjee, N. Shenoy, A. Choudhary, S. Hauck, C. Bachmann, M. Haldar, P. Joisha, A. Jones, A. Kanhare A. Nayak, S. Periyacheri, M. Walkden, D. Zaretsky Electrical and Computer Engineering Northwestern University 2145 Sheridan Road Evanston, IL-60208 [email protected] Abstract capabilities and are coordinated to perform an application whose subtasks have diverse execution requirements. One Recently, high-level languages such as MATLAB have can visualize such systems to consist of embedded proces- become popular in prototyping algorithms in domains such sors, digital signal processors, specialized chips, and field- as signal and image processing. Many of these applications programmable gate arrays (FPGA) interconnected through whose subtasks have diverse execution requirements, often a high-speed interconnection network; several such systems employ distributed, heterogeneous, reconfigurable systems. have been described in [9]. These systems consist of an interconnected set of heteroge- A key question that needs to be addressed is how to map neous processing resources that provide a variety of archi- a given computation on such a heterogeneous architecture tectural capabilities. The objective of the MATCH (MATlab without expecting the application programmer to get into Compiler for Heterogeneous computing systems) compiler the low level details of the architecture or forcing him/her to project at Northwestern University is to make it easier for understand
    [Show full text]
  • Validated Products List, 1993 No. 2
    mmmmm NJST PUBLICATIONS NISTIR 5167 (Supersedes NISTIR 5103) Validated Products List 1993 No. 2 Programming Languages Database Language SQL Graphics GOSIP POSIX Judy B. Kailey Computer Security Editor U.S. DEPARTMENT OF COMMERCE Technology Administration National Institute of Standards and Technology Computer Systems Laboratory Software Standards Validation Group Gaithersburg, MD 20899 April 1993 (Supersedes January 1993 issue) —QC 100 NIST . U56 #5167 1993 NISTIR 5167 (Supersedes NISTIR 5103) Validated Products List 1993 No. 2 Programming Languages Database Language SQL Graphics GOSIP POSIX Judy B. Kailey Computer Security Editor U.S. DEPARTMENT OF COMMERCE Technology Administration National Institute of Standards and Technology Computer Systems Laboratory Software Standards Validation Group Gaithersburg, MD 20899 April 1993 (Supersedes January 1993 issue) U.S. DEPARTMENT OF COMMERCE Ronald H. Brown, Secretary NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY Raymond Kammer, Acting Director FOREWORD The Validated Products List is a collection of registers describing implementations of Federal Information Processing Standards (FIPS) that have been validated for conformance to FTPS. The Validated Products List also contains information about the organizations, test methods and procedures that support the validation programs for the FIPS identified in this document. The Validated Products List is updated quarterly. iii iv TABLE OF CONTENTS 1. INTRODUCTION 1 1.1 Purpose 1 1.2 Document Organization 2 1.2.1 Programming Languages 2 1.2.2 Database
    [Show full text]
  • Tousimojarad, Ashkan (2016) GPRM: a High Performance Programming Framework for Manycore Processors. Phd Thesis
    Tousimojarad, Ashkan (2016) GPRM: a high performance programming framework for manycore processors. PhD thesis. http://theses.gla.ac.uk/7312/ Copyright and moral rights for this thesis are retained by the author A copy can be downloaded for personal non-commercial research or study This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the Author The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the Author When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given Glasgow Theses Service http://theses.gla.ac.uk/ [email protected] GPRM: A HIGH PERFORMANCE PROGRAMMING FRAMEWORK FOR MANYCORE PROCESSORS ASHKAN TOUSIMOJARAD SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy SCHOOL OF COMPUTING SCIENCE COLLEGE OF SCIENCE AND ENGINEERING UNIVERSITY OF GLASGOW NOVEMBER 2015 c ASHKAN TOUSIMOJARAD Abstract Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards in- creased parallelism. However, increased parallelism does not necessarily lead to better per- formance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general- purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM).
    [Show full text]
  • Openpiton: an Open Source Manycore Research Framework
    OpenPiton: An Open Source Manycore Research Framework Jonathan Balkind Michael McKeown Yaosheng Fu Tri Nguyen Yanqi Zhou Alexey Lavrov Mohammad Shahrad Adi Fuchs Samuel Payne ∗ Xiaohua Liang Matthew Matl David Wentzlaff Princeton University fjbalkind,mmckeown,yfu,trin,yanqiz,alavrov,mshahrad,[email protected], [email protected], fxiaohua,mmatl,[email protected] Abstract chipset Industry is building larger, more complex, manycore proces- sors on the back of strong institutional knowledge, but aca- demic projects face difficulties in replicating that scale. To Tile alleviate these difficulties and to develop and share knowl- edge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which Chip support extensibility, scalability, and configurability, along- side an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework for building scalable architecture research proto- types from 1 core to 500 million cores. OpenPiton is the world’s first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton leverages the industry hardened OpenSPARC T1 core with modifica- Figure 1: OpenPiton Architecture. Multiple manycore chips tions and builds upon it with a scratch-built, scalable uncore are connected together with chipset logic and networks to creating a flexible, modern manycore design. In addition, build large scalable manycore systems. OpenPiton’s cache OpenPiton provides synthesis and backend scripts for ASIC coherence protocol extends off chip. and FPGA to enable other researchers to bring their designs to implementation. OpenPiton provides a complete verifica- tion infrastructure of over 8000 tests, is supported by mature software tools, runs full-stack multiuser Debian Linux, and has been widespread across the industry with manycore pro- is written in industry standard Verilog.
    [Show full text]
  • Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems »
    THESE DE DOCTORAT DE L’UNIVERSITÉ BRETAGNE SUD COMUE UNIVERSITE BRETAGNE LOIRE ÉCOLE DOCTORALE N° 601 Mathématiques et Sciences et Technologies de l'Information et de la Communication Spécialité : Électronique Par « Satyajit DAS » « Architecture and Programming Model Support For Reconfigurable Accelerators in Multi-Core Embedded Systems » Thèse présentée et soutenue à Lorient, le 4 juin 2018 Unité de recherche : Lab-STICC Thèse N° : 491 Rapporteurs avant soutenance : Composition du Jury : Michael HÜBNER Professeur, Ruhr-Universität François PÊCHEUX Professeur, Sorbonne Université Bochum Président (à préciser après la soutenance) Jari NURMI Professeur, Tampere University of Angeliki KRITIKAKOU Maître de conférences, Université Technology Rennes 1 Davide ROSSI Assistant professor, Université de Bologna Kevin MARTIN Maître de conférences, Université Bretagne Sud Directeur de thèse Philippe COUSSY Professeur, Université Bretagne Sud Co-directeur de thèse Luca BENINI Professeur, Université de Bologna Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems Satyajit Das 2018 iii ABSTRACT Emerging trends in embedded systems and applications need high throughput and low power consumption. Due to the increasing demand for low power computing, and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy efficient hardware accelerators. The main drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low as they perform one specific function and increasing the number of the accelerators in a system onchip (SoC) causes scalability issues. Programmable accelerators provide flexibility and solve the scalability issues. Coarse-Grained Reconfigurable Array (CGRA) architecture consisting several processing elements with word level granularity is a promising choice for programmable accelerator.
    [Show full text]
  • Up Cores T Est Folder Opencores Name Status Author Style / Clone
    _uP_cores_t opencores style / data inst repor com LUTs blk F tool MIPS clks/ KIPS src # src tool fltg Ha max max byte # # pipe start last status author FPGA top file doc reference note worthy comments date LUT? est folder name clone size size ter ment ALUT mults ram max ver /clk inst /LUT code files chain pt vd data inst adrs inst reg len year revis cray1 alpha Christopher Fenton cray1 64 16 kintex-7-3 James Brakefield13463 6 19 10 127 ## 14.7 6.00 1.0 56.6 verilog 46 cray_sys_topyes yes Y N 4M 4M N 512 2010 CRAY data sheets homebrew Cray1 www.chrisfenton.com/homebrew-cray-1a/ fpgammix stable Tommy Thorn MMIX 64 32 arria-2 James Brakefield11605 A 8 10 94 ## q13.1 1.50 4.0 3.0 system verilog2 core yes yes Y Y 4G 4G Y 256 288 2006 2008 clone of Knuth's MMIX micro-coded s1_core S1 Core stable Fabrizio Fazzino etal SPARC 64 32 kintex-7-3 James Brakefield52845 6 8 59 56 ## v14.1 2.00 1.0 2.1 verilog 136 s1_top yes yes Y N 4G 4G Y 32 2007 2012 SPARC data sheets reduced version of OpenSPARC T1 Vivado run microblaze proprietaryXilinx uBlaze 32 32 kintex-7 Xilinx 546 6 1 320 1.03 1.0 603.7 not avail yes yes opt 4G 4G Y 86 32 3 2002 www.xilinx.com/tools/microblaze.htmMicroBlaze MCS, smallest configuration70 configuration options, MMU optional ARM_Cortex_A9 ASIC ARM ARM a9 32 16 arria V altera 4500 A 1050 2.50 1.0 583.3 asic yes yes Y 4G 4G Y 80 16 10 2012 altera data sheets uses pro-rated LC area dual issue, includes fltg-pt & MMU & caches nios2 proprietaryAltera Nios II 32 32 stratix-5 Altera 895 A 310 1.13 1.0 389.7 not avail yes yes opt 4G 4G
    [Show full text]