Review of Parallel Computing Methods and Tools for FPGA Technology


Review of parallel computing methods and tools for FPGA technology

Radoslaw Cieszewski, Maciej Linczuk, Krzysztof Pozniak, Ryszard Romaniuk
Institute of Electronic Systems, Warsaw University of Technology
Nowowiejska 15/19, 00-665 Warsaw, Poland

May 20, 2013

ABSTRACT

Parallel computing is emerging as an important area of research in computer architectures and software systems. Many algorithms can be greatly accelerated using parallel computing techniques. Specialized parallel computer architectures are used for accelerating specific tasks. Measuring systems of High-Energy Physics Experiments often use FPGAs for fine-grained computation. An FPGA combines many benefits of both software and ASIC implementations. Like software, the mapped circuit is flexible and can be reconfigured over the lifetime of the system. FPGAs therefore have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute cycle of traditional processors, and possibly exploiting a greater level of parallelism. Creating parallel programs implemented in FPGAs is not trivial. This paper presents existing methods and tools for fine-grained computation implemented in FPGAs using behavioral description and high-level programming languages.

Keywords: Parallel Computing, Algorithmic Synthesis, High-Level Synthesis, Behavioral Synthesis, Electronic System Level Synthesis, FPGA, ASIC, DSP, High-Energy Physics Experiment, KX1, Fine-Grained Parallelism, Coarse-Grained Parallelism

1. INTRODUCTION

Parallel computing is a form of computation in which many arithmetical, logical and input/output operations are processed simultaneously. A parallel computer is based on the principle that large problems can be divided into subtasks, which are then solved concurrently. There are many classes of parallel computers:

- Symmetric Multiprocessing (SMP),
- Multicore Computing,
- Grid Computing,
- Cluster Computing,
- Parallel Computing on Graphics Processing Units (GPU),
- Reconfigurable Computing on FPGAs,
- Parallel Computing on ASICs.

An FPGA has a set of hardware resources, including logic and routing, whose function is controlled by on-chip configuration SRAM. Programming the SRAM, either at the start of an application or during execution, allows the hardware functionality to be configured and reconfigured. This approach makes it possible to implement different algorithms and applications on the same hardware. The principal difference, compared to traditional microprocessors, is the ability to make substantial changes to logic and routing. This technique is termed "time-multiplexed hardware" or "run-time reconfigured hardware". In High-Energy Physics Experiments, FPGAs are widely used for accelerating diagnostic algorithms [26, 28, 30, 32-35, 45-48]. The main goal of this article is to review efficient FPGA development tools with high-level programming languages for the High-Energy Physics Experiment domain.

Outline. The remainder of this article is organized as follows. Section 2 gives a theoretical base for parallel computing. The comparison of different tools and methods is presented in Section 3. Finally, Section 4 gives the conclusions.
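As a concrete illustration of the decomposition principle mentioned in the introduction, the following minimal C sketch divides a large summation into subtasks that are executed concurrently. It is not taken from the paper: the array size, the fill values and the use of OpenMP on a multicore CPU are assumptions made purely for demonstration.

/*
 * Minimal sketch (not from the paper): a large problem, summing N values,
 * is divided into subtasks that are solved concurrently by several threads.
 * Array size, fill values and use of OpenMP are illustrative assumptions.
 * Compile with: gcc -std=c99 -fopenmp sum.c -o sum
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 1000000L

int main(void)
{
    double *data = malloc(N * sizeof(double));
    double sum = 0.0;

    for (long i = 0; i < N; i++)
        data[i] = 1.0;               /* fill with dummy values */

    /* Each thread sums a chunk of the array; partial sums are combined. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += data[i];

    printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
    free(data);
    return 0;
}

On an FPGA the same decomposition is expressed spatially, as replicated or pipelined hardware units, rather than as threads scheduled on a processor.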
2. BACKGROUND

Traditionally, computer software has been written for serial computation. To solve a problem, an algorithm is constructed and implemented as a serial stream of instructions, which are executed on the central processing unit of one computer. Only one instruction may be executed at a time; after that instruction is finished, the next one is executed.

Parallel computing uses multiple resources simultaneously to solve a problem. This is accomplished by mapping the algorithm into independent parts so that each part of the algorithm can be executed simultaneously with the others. Such parts of a parallel program are often called fibers, threads, processes or tasks. If the tasks of an algorithm communicate many times per second, the algorithm exhibits fine-grained parallelism; if the subtasks communicate fewer times per second, it exhibits coarse-grained parallelism. FPGAs are very flexible and can be programmed to accelerate both fine-grained and coarse-grained applications. FPGAs are also scalable: sometimes an algorithm is so complex that it cannot fit into one FPGA, and a multi-FPGA system is then used. From the high-level view shown in Figure 1, the FPGA looks much like a network of processors [49].

Figure 1. High-level FPGA abstraction: an array of processing units connected by an interconnect, with input/output at the boundary.

An FPGA, however, differs from a conventional multiprocessor in several ways:

- granularity: FPGAs have single-bit processing elements, each of which is controlled independently;
- instruction control: FPGAs are configured with a single instruction resident per processing element;
- interconnect: the FPGA interconnect is dynamic and can be reconfigured over time.

2.1 Amdahl's law and Gustafson's law

Amdahl's law defines the maximum possible speed-up of an algorithm on a parallel computer. Gustafson's law defines the speed-up achievable with P processors:

    S(P) = 1 / (α + (1 − α)/P),    lim_{P→∞} S(P) = 1/α    (1)

    S(P) = P − α(P − 1) = α + P(1 − α)    (2)

where α is the fraction of running time a program spends on non-parallelizable parts and P is the number of processors. These two formulas show that the achievable acceleration depends on the algorithm. Because the sequential parts of the code cannot be parallelized, Amdahl's speed-up is bounded by 1/α regardless of the number of processors. Gustafson's law assumes that the parallel portion of the code scales linearly with the number of processors.
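The behavior of equations (1) and (2) can be checked with a short numerical sketch. The C program below is an illustrative example, not part of the original paper; the chosen serial fraction α and the sample processor counts are assumptions made for demonstration.

/*
 * Illustrative sketch (not from the paper): evaluate Amdahl's law (1) and
 * Gustafson's law (2) for a serial fraction alpha and processor count P.
 * The sample values of alpha and P are assumptions chosen for demonstration.
 */
#include <stdio.h>

/* Amdahl: S(P) = 1 / (alpha + (1 - alpha) / P) */
static double amdahl(double alpha, double p)
{
    return 1.0 / (alpha + (1.0 - alpha) / p);
}

/* Gustafson: S(P) = alpha + P * (1 - alpha) */
static double gustafson(double alpha, double p)
{
    return alpha + p * (1.0 - alpha);
}

int main(void)
{
    const double alpha = 0.05;               /* assume 5% of the runtime is serial */
    const int procs[] = { 2, 8, 64, 1024 };  /* sample processor counts            */

    for (unsigned i = 0; i < sizeof procs / sizeof procs[0]; i++) {
        double p = procs[i];
        printf("P = %4.0f  Amdahl S = %6.2f  Gustafson S = %7.2f\n",
               p, amdahl(alpha, p), gustafson(alpha, p));
    }
    /* As P grows, Amdahl's speed-up approaches 1/alpha = 20,
       while Gustafson's speed-up keeps growing linearly with P. */
    return 0;
}

For α = 0.05 the Amdahl speed-up saturates near 20 as P grows, while the Gustafson speed-up keeps increasing with P; which model applies depends on whether the problem size is fixed or scales with the machine.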
2.2 Parallelism level

FPGA-based computers can process computation:

- at the bit level of parallelism,
- using pipeline techniques,
- at the instruction level of parallelism,
- at the data level of parallelism,
- at the task level of parallelism.

The main advantage over conventional microprocessors is the ability to perform calculations at the lowest level of parallelism, the bit level. According to Flynn's taxonomy, FPGAs are classified as Multiple Instruction Multiple Data (MIMD) machines.

2.3 Memory and communication architectures

Parallel computing defines two main memory models:

- Shared Memory,
- Distributed Memory.

Shared memory refers to a block of memory that can be accessed by several different processing units of a parallel computer. This memory may be accessed simultaneously by multiple tasks, with the intent of providing communication among them or avoiding redundant copies, and it is an efficient means of passing data between tasks. In distributed memory systems, each processing unit has its own local memory and computations are done on local data. If remote data is required, the subtask must communicate with remote processing units, which means that remote processors are engaged in the operation and some overhead is added. Hybrid architectures combine these two main approaches: each processing element has its own local memory as well as access to the memory of remote processors. The system architecture influences the choice of the parallel programming models and tools presented in Section 3.

2.4 Recent trends

2.4.1 Recent trend of FPGA technologies

FPGA technology has improved dramatically since the first devices appeared in the mid-1980s. Recent trends in FPGA technology are presented in Figure 2 [45].

Figure 2. Recent trend of FPGA technologies for Altera, Xilinx, Actel and Lattice device families (2001-2013): number of logic cells, number of DSP blocks, SRAM capacity, I/O block (LVDS) speed, and SERDES speed over the years.

During the last few years, the number of CLBs, the number of DSP blocks and the SRAM capacity have increased enormously.