FPGA Tools Study Process

COMPARATIVE STUDY OF TOOL-FLOWS FOR RAPID PROTOTYPING OF SOFTWARE-DEFINED RADIO DIGITAL SIGNAL PROCESSING

A dissertation submitted to the Department of Electrical Engineering, UNIVERSITY OF CAPE TOWN, in fulfilment of the requirements for the degree of Master of Science at the University of Cape Town

by KHOBATHA SETETEMELA

Supervised by: DR SIMON WINBERG

© University of Cape Town
February 8, 2019

The copyright of this thesis vests in the author. No quotation from it or information derived from it is to be published without full acknowledgement of the source. The thesis is to be used for private study or non-commercial research purposes only.

Published by the University of Cape Town (UCT) in terms of the non-exclusive license granted to UCT by the author.

Declaration

I know the meaning of plagiarism and declare that all the work in this dissertation, save for that which is properly acknowledged and referenced, is my own. It is being submitted for the degree of Master of Science in Electrical Engineering at the University of Cape Town. This work has not been submitted before for any other degree or examination in any other university.

Signature of Author: ............................
University of Cape Town
Cape Town
February 8, 2019

ABSTRACT

This dissertation is a comparative study of tool-flows for rapid prototyping of SDR DSP operations on programmable hardware platforms. The study is divided into two parts, focusing on high-level tool-flows for implementing SDR DSP operations on FPGA and GPU platforms respectively. In this dissertation, the term ‘tool-flow’ refers to a tool or a chain of tools that facilitate the mapping of an application description specified in a programming language into one or more programmable hardware platforms.
High-level tool-flows use different techniques, such as high-level synthesis, to allow the designer to specify the application from a high level of abstraction and achieve improved productivity without significant degradation in the design’s performance. SDR is an emerging communications technology that is driven, among other factors, by increasing demands for high-speed, interoperable and versatile communications systems. The key idea in SDR is to implement as many as possible of the radio functions that were traditionally defined in fixed hardware in software running on programmable hardware processors instead. The most commonly used processors are based on complex parallel computing architectures in order to support the high-speed processing demands of SDR applications, and they include FPGAs, GPUs, multicore general-purpose processors (GPPs) and DSPs. The architectural complexity of these processors results in a corresponding increase in programming methodologies, which in turn impedes their wider adoption in suitable application domains, including SDR DSP. In an effort to address this, a plethora of different high-level tool-flows have been developed. Several comparative studies of these tool-flows have been done to help designers, among other benefits, choose which high-level tools to use. However, there are few studies that focus on SDR DSP operations, and most existing comparative studies are not based on well-defined comparison criteria. The approach implemented in this dissertation is to use a system engineering design process, firstly, to define the qualitative comparison criteria in the form of a specification for an ideal high-level SDR DSP tool-flow and, secondly, to implement a FIR filter case study in each of the tool-flows to enable a quantitative comparison in terms of programming effort and performance.
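As a rough illustration of the kind of kernel a FIR filter case study refers to, the following Python sketch computes a direct-form FIR response. It is not taken from the dissertation: the function name `fir_filter`, the delay-line layout and the coefficient values are assumptions chosen purely for illustration.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    The name and signature are hypothetical, not from the dissertation.
    """
    taps = [0] * len(coeffs)  # delay line, newest sample first
    out = []
    for x in samples:
        # Shift the new sample in and drop the oldest one.
        taps = [x] + taps[:-1]
        # Multiply-accumulate across all taps.
        out.append(sum(c * t for c, t in zip(coeffs, taps)))
    return out


# Illustrative coefficients only; an impulse input reproduces them in order.
example_coeffs = [1, 2, 3, 4]
impulse = [1, 0, 0, 0, 0, 0]
impulse_response = fir_filter(impulse, example_coeffs)
```

A behavioural model of this sort could serve as a reference against which the output of an FPGA or GPU implementation is checked, since the impulse response of a FIR filter is simply its coefficient sequence followed by zeros.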
The study considers Migen- and MyHDL-based open-source tool-flows for FPGA targets, and CUDA and Open Computing Language (OpenCL) for GPU targets. The ideal high-level SDR DSP tool-flow specification was defined and used to conduct a comparative study of the tools across three main design categories: high-level modelling, verification and implementation. For tool-flows targeting GPU platforms, the FIR case study was implemented using each of the tools; it was compiled, executed on a GPU server consisting of 2 GTX Titan-X GPUs and an Intel Core i7 GPP, and lastly profiled. The tools were then compared in terms of programming effort, memory transfer cost and overall operation time. With regard to tool-flows with FPGA targets, the FIR case study was developed using each tool, then implemented on a Xilinx 7 FPGA and compared in terms of programming effort, logic utilization and timing performance.

ACKNOWLEDGEMENTS

I would like to acknowledge the NRF and the South African Radio Astronomy Observatory (SARAO) for affording me the funding opportunity to complete my postgraduate studies.

To my supervisors Dr Simon Winberg and Prof Michael Inggs, who believed in me from the beginning of this dissertation and who never ceased to encourage, guide and kindly nudge me forward when I could not see the finish line, I am heartily indebted. This dissertation would not have been possible without your emotional and technical support.

To my fellow students in the SDRG and RRSG research groups at UCT, Valerie, Shaun and Matthew, for always being willing to discuss ideas and lend a hand, thank you.

To my friends, Sibonelo Madlopha, Dr Lerato Mohapi and Potjo Nthama, for your words of encouragement in due season which kept me going, thank you.

To my parents, Make and Babe Dlamini and ‘Me’ Mpho, who supported me emotionally, spiritually and financially throughout this work, I shall forever be indebted to you.
To my loving wife and best friend, Lomaswati Setetemela, thank you for the kindness, the loving care and the patience that you gave me. I cannot thank you enough.

Now unto Him who is able to keep me from falling, and to present me faultless before the presence of His glory with exceeding joy, To the only wise God my Saviour, be glory and majesty, dominion and power, both now and ever. Amen.

CONTENTS

Abstract
Acknowledgements
Contents
List of Figures
List of Tables
List of Abbreviations

1 Introduction
  1.1 Background
    1.1.1 Software Defined Radios
    1.1.2 Field Programmable Gate Arrays
      1.1.2.1 FPGA Hardware
      1.1.2.2 FPGA Configuration
      1.1.2.3 HDL Design Methodology for FPGAs
      1.1.2.4 High-Level Design Methodology for FPGAs
    1.1.3 Graphics Processing Units
      1.1.3.1 GPU Hardware
      1.1.3.2 High-Level Design Methodology for GPUs
  1.2 Problem Statement
  1.3 Objectives
  1.4 Dissemination Strategy
    1.4.1 Objectives and Approach
    1.4.2 Target Audiences and Plan
  1.5 Dissertation Overview
2 Literature Review
  2.1 Software Defined Radio
    2.1.1 Brief History of SDR Technology
    2.1.2 SDR Hardware Architecture
      2.1.2.1 The Radio Frequency Front-End Subsystem
      2.1.2.2 Analog-to-Digital Converters (ADCs)
      2.1.2.3 Digital-to-Analog Converters (DACs)
      2.1.2.4 Programmable Digital Signal Processing
    2.1.3 SDR DSP Waveforms Functionality Characteristics
      2.1.3.1 Computational Complexity
      2.1.3.2 Parallelism
      2.1.3.3 Data Independency
      2.1.3.4 Arithmetic Requirements
    2.1.4 Programmable DSP Hardware for SDR
      2.1.4.1 GPP
      2.1.4.2 DSP Processor
      2.1.4.3 FPGA
      2.1.4.4 GPU
  2.2 Field Programmable Gate Array
    2.2.1 Programming Architecture
      2.2.1.1 SRAM-based Programming Technology
      2.2.1.2 Flash Programming Technology
      2.2.1.3 Antifuse Programming Technology
    2.2.2 Routing Architectures
      2.2.2.1 Island-Style Routing Architecture
      2.2.2.2 Hierarchical Routing Architecture
    2.2.3 FPGA-based SDR DSP Platforms
      2.2.3.1 RHINO
      2.2.3.2 ROACH
      2.2.3.3 Ettus USRP Platform
    2.2.4 High-Level Programming Tools
      2.2.4.1 MyHDL
      2.2.4.2 Migen
  2.3 Graphics Processing Units
    2.3.1 Hardware Architecture
    2.3.2 GPU-based SDR DSP Platforms
      2.3.2.1 GPU with GPP
      2.3.2.2 GPU with FPGA and GPP
    2.3.3 High-Level Programming Tools
      2.3.3.1 CUDA
      2.3.3.2 OpenCL
  2.4 High-Level Tool-Flows Techniques
    2.4.1 Modelling
      2.4.1.1 Model-based
      2.4.1.2 Algorithmic
      2.4.1.3 System Level
    2.4.2 Code generation
      2.4.2.1 High-Level Synthesis
      2.4.2.2 Electronic System-Level Synthesis
  2.5 High-Level Tool-Flows in SDR
    2.5.1 Existing Tools
      2.5.1.1 GNU Radio
      2.5.1.2 OptiSDR
    2.5.2 Past Tool-Flow Surveys
      2.5.2.1 Performance Measurement
      2.5.2.2 Design Productivity Measurement ...