Reconfigurable Architectures for General-Purpose Computing


MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ARTIFICIAL INTELLIGENCE LABORATORY
A.I. Technical Report No. 1586, October 1996

Reconfigurable Architectures for General-Purpose Computing
Andre DeHon, [email protected]

Abstract: General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focused on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special-purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task-dependent dataflow between operations. We can characterize RP-space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space, and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine-grained computing device which adds small, on-chip instruction memories to FPGAs, is one such design point. For typical logic applications and finite-state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time-switched interconnect, achieves circuit densities close to the DPGA while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures.
To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register-file building blocks interconnected via a byte-wide network. With today's silicon, a single-chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20× the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures.

Acknowledgements: This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research is supported by the Advanced Research Projects Agency of the Department of Defense under Rome Labs contract number F30602-94-C-0252.

Acknowledgments

This effort grew out of the intellectual backdrop of the Transit and Abacus projects. Years prototyping Transit machines with Tom Simon and his specialization philosophy set the stage for my initial interest in FPGAs for computing. The initial ideas for the DPGA grew out of dialogs with Mike Bolotski in which we tried to reconcile Abacus, his SIMD architecture which he described as "a bunch of one-bit processors," with FPGAs, which looked to me like "a bunch of one-bit processors." Tom Knight has been my research advisor since I was a junior. He has always encouraged me to focus on the big idea and has been supportive as I explored sometimes radical points of view. He gave me plenty of freedom to do the right thing, and hopefully, I have lived up to the confidence and trust implied by that autonomy.
The efforts of Jeremy Brown, Derrick Chen, Ian Eslick, Ethan Mirsky, and Edward Tau during and after 6.371 made the DPGA prototype possible. Ian's perseverance in finalizing the layout and verification was particularly responsible for the completion of that effort. Ed and Ian both helped see the DPGA prototype through its final postmortem. TSFPGA and MATRIX were both possible only because of Derrick Chen and Ethan Mirsky, the Master of Engineering students who respectively took ownership of the microarchitecture and VLSI portions of those designs. We were largely able to complement each other's efforts in our attempts to understand and develop these architectures. Discussions with Rich Lethin, Russ Tessier, and Jonathan Babb at MIT were useful in focusing on the key issues which needed addressing. Regular interaction with the emerging reconfigurable computing community was valuable for encouragement and for identifying key problems and issues. Notably, discussions with Brad Taylor, Mike Butts, Brad Hutchings, Bill Mangione-Smith, John Villasenor, Phil Kuekes, Steve Trimberger, Mike Smith, and Carl Ebeling have been helpful in identifying the questions which need answers and cleaning up ideas for presentation. Thomas McDermott provided valuable feedback on the early chapters of this work. The availability of high-quality, experimental CAD tools in source form from universities made the experimental mapping work done here feasible. University of Toronto's Chortle provided a clean basis for several early experiments in DPGA synthesis. UC Berkeley's SIS was used for standard, technology-independent circuit mapping. UC Berkeley's mustang was the workhorse behind multicontext FSM mapping.
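As a rough illustration of the abstract's computational-density comparison, the arithmetic can be sketched in a few lines of Python. All concrete numbers below (die areas, the conventional processor's 8-bit throughput) are invented assumptions for illustration, not figures from the report:

```python
# Back-of-envelope sketch of the report's functional-density metric:
# 8-bit operations per second per unit of silicon area.
# All numbers here are illustrative assumptions, not measured figures.

def functional_density(ops_per_sec, area_mm2):
    """8-bit operations per second per mm^2 of silicon."""
    return ops_per_sec / area_mm2

# Assumed: a MATRIX-style array delivering 10 Gop/s on a 100 mm^2 die.
matrix_density = functional_density(10e9, 100.0)

# Assumed: a conventional processor doing 0.5 G 8-bit ops/s on 100 mm^2.
cpu_density = functional_density(0.5e9, 100.0)

ratio = matrix_density / cpu_density
print(f"density advantage: {ratio:.0f}x")  # 20x under these assumptions
```

With these invented inputs the ratio lands in the 10-20× range the abstract cites; the point is only that density normalizes throughput by area, so a slower but much smaller building block can still win.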
Contents

Part I: Introduction and Background

1 Overview and Synopsis
  1.1 Evolution of General-Purpose Computing with VLSI Technology
  1.2 This Thesis
  1.3 Reconfigurable Device Characteristics
  1.4 Configurable, Programmable, and Fixed-Function Devices
  1.5 Key Relations
  1.6 New General-Purpose Architectures
  1.7 Prognosis for the Future
2 Basics and Terminology
  2.1 General-Purpose Computing
  2.2 General-Purpose Computing Issues
    2.2.1 Interconnect
    2.2.2 Instructions
  2.3 Programmables and Configurables
  2.4 FPGA Introduction
  2.5 Regular and Irregular Computing Tasks
  2.6 Metrics: Density, Diversity, and Capacity
    2.6.1 Functional Density
    2.6.2 Functional Diversity - Instruction Density
    2.6.3 Data Density
3 Reconfigurable Computing Background
  3.1 Successes of Reconfigurable Computing
    3.1.1 Programmable Active Memories
    3.1.2 Splash
    3.1.3 PRISM
    3.1.4 Logic Emulation
  3.2 Lineage
  3.3 Technological Enablers

Part II: Empirical Review

4 Empirical Review of General-Purpose Computing Architectures in the Age of MOS VLSI
  4.1 Processors
  4.2 VLIW Processors
  4.3 Digital Signal Processors (DSPs)
  4.4 Memories
  4.5 Field-Programmable Gate Arrays (FPGAs)
  4.6 Vector and SIMD Processors
  4.7 Multimedia Processors
  4.8 Multiple Context FPGAs
  4.9 MIMD Processors
  4.10 Reconfigurable ALUs
  4.11 Summary
5 Case Study: Multiply
  5.1 Custom Multipliers
  5.2 Semicustom Multipliers
  5.3 General-Purpose Multiply Implementations
  5.4 Hardwired Functional Units in "General-Purpose Devices"
  5.5 Multiplication Granularity
  5.6 Specialized Multiplication
  5.7 Summary
6 High Diversity on Reconfigurables

Part III: Structure and Composition of Reconfigurable Computing Devices

7 Interconnect
  7.1 Dominant Area and Delay
    7.1.1 Fixed Area
    7.1.2 Interconnect and Configuration Area
    7.1.3 Delay
  7.2 Problems with "Simple" Networks
    7.2.1 Crossbars
    7.2.2 Multistage Networks
    7.2.3 Mesh Interconnect
  7.3 Issues in Reconfigurable Network Design
  7.4 Conventional Interconnect
  7.5 Switch Requirements for FPGAs with 100-1000 LUTs
  7.6 Channel and Wire Growth
    7.6.1 Rent's Rule Based Hierarchical Interconnect Model
    7.6.2 Wire Growth in Rent Hierarchy Model
    7.6.3 Switch Growth in Rent Hierarchy Model
  7.7 Network Utilization Efficiency
  7.8 Interconnect Description
    7.8.1 Weak Upper Bound
    7.8.2 Structure-Based Estimates
    7.8.3 Significance and Impact
    7.8.4 Instruction Growth versus Interconnect Growth
  7.9 Effects of Interconnect Granularity
    7.9.1 Wiring
    7.9.2 Switches
  7.10 Summary
8 Instructions
  8.1 General Case Example
  8.2 Bits per Instruction
  8.3 Compressing Instruction Stream Requirements
    8.3.1 Wide Word Architectures
    8.3.2 Broadcast Single Instruction to Multiple Compute Units
    8.3.3 Locally Configure Instruction
    8.3.4 Broadcast Instruction Identifier, Lookup in Local Store
    8.3.5 Encode Length by Likelihood
    8.3.6 Mode Bits for Early Bound Information
    8.3.7 Themes
  8.4 Compressibility
  8.5 Control Streams
  8.6 Instruction Stream Taxonomy
  8.7 Summary
9 RP-space Area Model
  9.1 Model and Assumptions
  9.2 Peak Performance Density
  9.3 Granularity
  9.4 Contexts
  9.5 Composition
  9.6 Summary

Part IV: New Architectures

10 Dynamically Programmable Gate Arrays
  10.1 DPGA Introduction
  10.2 Related Architectures
  10.3 Realm of Application
    10.3.1 Limited Throughput Requirements
    10.3.2 Latency Limited Designs
    10.3.3 Temporally Varying or Data Dependent Functional Requirements
    10.3.4 Multicontext versus Monolithic and Partial Reconfiguration
  10.4 A Prototype DPGA
    10.4.1 Architecture
    10.4.2 Implementation
    10.4.3 Component Operation
    10.4.4 Prototype Context Area Model
    10.4.5 Prototype Conclusions
  10.5 Circuit Evaluation
    10.5.1
Recommended publications
  • Validated Products List, 1995 No. 3: Programming Languages, Database
    NISTIR 5693 (Supersedes NISTIR 5629). VALIDATED PRODUCTS LIST, Volume 1, 1995 No. 3: Programming Languages, Database Language SQL, Graphics, POSIX, Computer Security, Product Data - IGES. Judy B. Kailey, Editor. U.S. DEPARTMENT OF COMMERCE, Technology Administration, National Institute of Standards and Technology, Computer Systems Laboratory, Software Standards Validation Group, Gaithersburg, MD 20899, July 1995 (Supersedes April 1995 issue). U.S. DEPARTMENT OF COMMERCE: Ronald H. Brown, Secretary. TECHNOLOGY ADMINISTRATION: Mary L. Good, Under Secretary for Technology. NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY: Arati Prabhakar, Director. FOREWORD: The Validated Products List (VPL) identifies information technology products that have been tested for conformance to Federal Information Processing Standards (FIPS) in accordance with Computer Systems Laboratory (CSL) conformance testing procedures, and have a current validation certificate or registered test report. The VPL also contains information about the organizations, test methods and procedures that support the validation programs for the FIPS identified in this document. The VPL includes computer language processors for programming languages COBOL, Fortran, Ada, Pascal, C, M[UMPS], and database language SQL; computer graphic implementations for GKS, COM, PHIGS, and Raster Graphics; operating system implementations for POSIX; Open Systems Interconnection implementations; and computer security implementations for DES, MAC and Key Management.
  • Reconfigurable Computing
    Reconfigurable Computing: A Survey of Systems and Software. KATHERINE COMPTON, Northwestern University, and SCOTT HAUCK, University of Washington. Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey, we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which reuse the configurable hardware during program execution. Categories and Subject Descriptors: A.1 [Introductory and Survey]; B.6.1 [Logic Design]: Design Style - logic arrays; B.6.3 [Logic Design]: Design Aids; B.7.1 [Integrated Circuits]: Types and Design Styles - gate arrays. General Terms: Design, Performance. Additional Key Words and Phrases: Automatic design, field-programmable, FPGA, manual design, reconfigurable architectures, reconfigurable computing, reconfigurable systems. 1. INTRODUCTION: There are two primary methods in conventional computing for the execution of algorithms. The first is to use hardwired technology, either an Application Specific Integrated Circuit (ASIC) or a group of individual components forming a This research was supported in part by Motorola, Inc., DARPA, and NSF. K. Compton was supported by an NSF fellowship. S. Hauck was supported in part by an NSF CAREER award and a Sloan Research Fellowship.
  • SPARC64-III User's Guide
    SPARC64-III User's Guide. HAL Computer Systems, Inc., Campbell, California, May 1998. Copyright © 1998 HAL Computer Systems, Inc. All rights reserved. This product and related documentation are protected by copyright and distributed under licenses restricting their use, copying, distribution, and decompilation. No part of this product or related documentation may be reproduced in any form by any means without prior written authorization of HAL Computer Systems, Inc., and its licensors, if any. Portions of this product may be derived from the UNIX and Berkeley 4.3 BSD Systems, licensed from UNIX System Laboratories, Inc., a wholly owned subsidiary of Novell, Inc., and the University of California, respectively. RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the United States Government is subject to the restrictions set forth in DFARS 252.227-7013 (c)(1)(ii), FAR 52.227-19, and NASA FAR Supplement. The product described in this book may be protected by one or more U.S. patents, foreign patents, or pending applications. TRADEMARKS: HAL, the HAL logo, HyperScalar, and OLIAS are registered trademarks of HAL Computer Systems, Inc. HALstation 300 and Ishmail are trademarks of HAL Computer Systems, Inc. SPARC64 and SPARC64/OS are trademarks of SPARC International, Inc., licensed by SPARC International, Inc., to HAL Computer Systems, Inc. Fujitsu and the Fujitsu logo are trademarks of Fujitsu Limited. All SPARC trademarks, including the SCD Compliant Logo, are trademarks or registered trademarks of SPARC International, Inc. SPARCstation, SPARCserver, SPARCengine, SPARCstorage, SPARCware, SPARCcenter, SPARCclassic, SPARCcluster, SPARCdesign, SPARC811, SPARCprinter, UltraSPARC, microSPARC, SPARCworks, and SPARCompiler are licensed exclusively to Sun Microsystems, Inc.
  • Syllabus: EEL4930/5934 Reconfigurable Computing
    EEL4720/5721 Reconfigurable Computing (dual-listed course), Department of Electrical and Computer Engineering, University of Florida, Spring Semester 2019. Catalog Description: Prereq: EEL4712C or EEL5764 or consent of instructor. Fundamental concepts at the advanced undergraduate level (EEL4720) and introductory graduate level (EEL5721) in reconfigurable computing (RC) based upon advanced technologies in field-programmable logic devices. Topics include general RC concepts, device architectures, design tools, metrics and kernels, system architectures, and application case studies. Credit Hours: 3. Prerequisites by Topic: fundamentals of digital design, including device technologies, design methodology and techniques, and design environments and tools; fundamentals of computer organization and architecture, including datapath and control structures, data formats, instruction-set principles, pipelining, instruction-level parallelism, memory hierarchy, and interconnects and interfacing. Instructor: Dr. Herman Lam, Office: Benton Hall, Room 313, Office hours: TBA, Telephone: (352) 392-2689, Email: [email protected] Teaching Assistant: Seyed Hashemi, Office hours: TBA, Email: [email protected] Class lectures: MWF 4th period, Larsen Hall 239. Required textbook: none. References: Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, edited by Scott Hauck and Andre DeHon, Elsevier, Inc. (Morgan Kaufmann Publishers), Amsterdam, 2008, ISBN: 978-0-12-370522-8; C. Maxfield, The Design Warrior's Guide to FPGAs, Newnes, 2004, ISBN: 978-0750676045.
  • The Conference Program Booklet
    Conference Program, SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis. Austin Convention Center, Austin, TX. http://sc15.supercomputing.org/ Conference Dates: Nov. 15-20, 2015; Exhibition Dates: Nov. 16-19, 2015. Table of Contents (excerpt): Welcome from the Chair; General Information; Registration and Conference Store Hours; Exhibit Hall Hours; SC15 Information Booth/Hours; SC16 Preview Booth/Hours; Social Events; Registration Pass Access; SCinet; Convention Center Maps; Daily Schedules; Papers; Research Posters; ACM Student Research Competition Posters; Scientific Visualization/Data Analytics Showcase; Experiencing HPC for Undergraduates; Mentor-Protégé Program; Student Cluster Competition Kickoff; Student-Postdoc Job & Opportunity Fair
  • A MATLAB Compiler for Distributed, Heterogeneous, Reconfigurable
    A MATLAB Compiler For Distributed, Heterogeneous, Reconfigurable Computing Systems. P. Banerjee, N. Shenoy, A. Choudhary, S. Hauck, C. Bachmann, M. Haldar, P. Joisha, A. Jones, A. Kanhare, A. Nayak, S. Periyacheri, M. Walkden, D. Zaretsky. Electrical and Computer Engineering, Northwestern University, 2145 Sheridan Road, Evanston, IL-60208. [email protected] Abstract: Recently, high-level languages such as MATLAB have become popular in prototyping algorithms in domains such as signal and image processing. Many of these applications, whose subtasks have diverse execution requirements, often employ distributed, heterogeneous, reconfigurable systems. These systems consist of an interconnected set of heterogeneous processing resources that provide a variety of architectural capabilities. The objective of the MATCH (MATlab Compiler for Heterogeneous computing systems) compiler project at Northwestern University is to make it easier for [...] capabilities and are coordinated to perform an application whose subtasks have diverse execution requirements. One can visualize such systems to consist of embedded processors, digital signal processors, specialized chips, and field-programmable gate arrays (FPGA) interconnected through a high-speed interconnection network; several such systems have been described in [9]. A key question that needs to be addressed is how to map a given computation on such a heterogeneous architecture without expecting the application programmer to get into the low level details of the architecture or forcing him/her to understand
  • Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems »
    DOCTORAL THESIS OF THE UNIVERSITÉ BRETAGNE SUD, COMUE UNIVERSITÉ BRETAGNE LOIRE. Doctoral School No. 601: Mathematics and Information and Communication Sciences and Technologies. Specialty: Electronics. By Satyajit DAS: "Architecture and Programming Model Support For Reconfigurable Accelerators in Multi-Core Embedded Systems". Thesis presented and defended in Lorient on 4 June 2018. Research unit: Lab-STICC. Thesis No. 491. Reviewers: Michael HÜBNER, Professor, Ruhr-Universität Bochum; Jari NURMI, Professor, Tampere University of Technology. Jury: François PÊCHEUX, Professor, Sorbonne Université, President (to be confirmed after the defense); Angeliki KRITIKAKOU, Maître de conférences, Université Rennes 1; Davide ROSSI, Assistant Professor, University of Bologna; Kevin MARTIN, Maître de conférences, Université Bretagne Sud, Thesis advisor; Philippe COUSSY, Professor, Université Bretagne Sud, Thesis co-advisor; Luca BENINI, Professor, University of Bologna. ABSTRACT: Emerging trends in embedded systems and applications need high throughput and low power consumption. Due to the increasing demand for low power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy efficient hardware accelerators. The main drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low, as they perform one specific function, and increasing the number of accelerators in a system on chip (SoC) causes scalability issues. Programmable accelerators provide flexibility and solve the scalability issues. A Coarse-Grained Reconfigurable Array (CGRA) architecture consisting of several processing elements with word-level granularity is a promising choice for a programmable accelerator.
  • Hardware Platforms for Embedded Computing Graphics: © Alexandra Nolte, Gesine Marwedel, 2003 Universität Dortmund Importance of Energy Efficiency
    Universität Dortmund: Hardware Platforms for Embedded Computing. Graphics: © Alexandra Nolte, Gesine Marwedel, 2003. Importance of Energy Efficiency: efficient software design is needed; otherwise, the price for software flexibility cannot be paid. © Hugo De Man (IMEC), Philips, 2007. Embedded vs. general-purpose processors: embedded processors may be optimized for a category of applications; customization may be narrow or broad. We may judge embedded processors using different metrics: code size, memory system performance, predictability. Disappearing distinction: embedded processors everywhere. Microcontroller Architectures: [diagram contrasting the Von Neumann architecture, in which a single address bus and data bus serve both program and data in one memory, with the Harvard architecture, which uses separate fetch and data buses for program memory and data memory]. RISC processors: RISC generally means highly pipelinable, one instruction per cycle. Pipelines of embedded RISC processors have grown over time: the ARM7 has a 3-stage pipeline, the ARM9 a 5-stage pipeline, and the ARM11 an eight-stage pipeline [ARM05]. ARM Cortex: based on the ARMv7 architecture and Thumb-2 ISA. ARM Cortex A Series: application CPUs focused on the execution of complex OS and user applications; first product: Cortex-A8; executes ARM, Thumb-2 instructions. ARM Cortex R Series: deeply embedded processors focused on real-time environments; first product: Cortex-R4(F); executes ARM, Thumb-2 instructions. ARM Cortex M
  • Reconfigurable Computing
    Reconfigurable Computing. David Boland (1), Chung-Kuan Cheng (2), Andrew B. Kahng (2), Philip H.W. Leong (1). (1) School of Electrical and Information Engineering, The University of Sydney, Australia 2006. (2) Dept. of Computer Science and Engineering, University of California, La Jolla, California. Abstract: Reconfigurable computing is the application of adaptable fabrics to solve computational problems, often taking advantage of the flexibility available to produce problem-specific architectures that achieve high performance because of customization. Reconfigurable computing has been successfully applied to fields as diverse as digital signal processing, cryptography, bioinformatics, logic emulation, CAD tool acceleration, scientific computing, and rapid prototyping. Although Estrin first proposed the idea of a reconfigurable system, in the form of a fixed-plus-variable structure computer, in 1960 [1], it has only been in recent years that reconfigurable fabrics, in the form of field-programmable gate arrays (FPGAs), have reached sufficient density to make them a compelling implementation platform for high performance applications and embedded systems. In this article, intended for the non-specialist, we describe some of the basic concepts, tools and architectures associated with reconfigurable computing. Keywords: reconfigurable computing; adaptable fabrics; application integrated circuits; field programmable gate arrays (FPGAs); system architecture; runtime. 1 Introduction: Although reconfigurable fabrics can in principle be constructed from any type of technology, in practice most contemporary designs are made using commercial field programmable gate arrays (FPGAs). An FPGA is an integrated circuit containing an array of logic gates in which the connections can be configured by downloading a bitstream to its memory. FPGAs can also be embedded in integrated circuits as intellectual property cores.
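The excerpt's definition of an FPGA, an array of logic whose function is set by a downloaded bitstream, can be illustrated with a toy lookup-table (LUT) model in Python. The class and configuration format below are illustrative assumptions, not any vendor's actual bitstream encoding:

```python
# Minimal model of an FPGA lookup table (LUT): the "bitstream" is the
# truth table loaded into the LUT's configuration memory. A k-input LUT
# can implement any boolean function of its k inputs.

class LUT:
    def __init__(self, k, config_bits):
        # One configuration bit per input pattern: 2^k bits total.
        assert len(config_bits) == 2 ** k, "need one bit per input pattern"
        self.k = k
        self.config = config_bits  # the "downloaded" configuration memory

    def evaluate(self, inputs):
        # The input pattern forms an index that selects one config bit.
        assert len(inputs) == self.k
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return self.config[index]

# "Configure" a 2-LUT as XOR: truth table for inputs 00,01,10,11 -> 0,1,1,0.
xor_lut = LUT(2, [0, 1, 1, 0])
print(xor_lut.evaluate([1, 0]))  # 1
```

Reloading `config_bits` changes the function without changing the "hardware," which is the essence of configuration by bitstream; real FPGAs add configurable routing between many such LUTs.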
  • Computer Architectures an Overview
    Computer Architectures: An Overview. Contents: Microarchitecture; x86; PowerPC; IBM POWER; MIPS architecture; SPARC; ARM architecture; DEC Alpha; AlphaStation; AlphaServer; Very long instruction word; Instruction-level parallelism; Explicitly parallel instruction computing. Microarchitecture: In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture: The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats, among other things. [Figure: the Intel Core microarchitecture] The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers to complete arithmetic logic units (ALUs) and even larger elements.
  • A Reconfigurable Convolutional Neural Network-Accelerated
    electronics Article: A Reconfigurable Convolutional Neural Network-Accelerated Coprocessor Based on RISC-V Instruction Set. Ning Wu *, Tao Jiang, Lei Zhang, Fang Zhou and Fen Ge. College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; [email protected] (T.J.); [email protected] (L.Z.); [email protected] (F.Z.); [email protected] (F.G.) * Correspondence: [email protected]; Tel.: +86-139-5189-3307. Received: 5 May 2020; Accepted: 8 June 2020; Published: 16 June 2020. Abstract: As a typical artificial intelligence algorithm, the convolutional neural network (CNN) is widely used in the Internet of Things (IoT) system. In order to improve the computing ability of an IoT CPU, this paper designs a reconfigurable CNN-accelerated coprocessor based on the RISC-V instruction set. The interconnection structure of the acceleration chain designed by the predecessors is optimized, and the accelerator is connected to the RISC-V CPU core in the form of a coprocessor. The corresponding instructions of the coprocessor are designed and the instruction compiling environment is established. Through inline assembly in the C language, the coprocessor instructions are called, coprocessor acceleration library functions are established, and common algorithms in the IoT system are implemented on the coprocessor. Finally, resource consumption evaluation and performance analysis of the coprocessor are completed on a Xilinx FPGA. The evaluation results show that the reconfigurable CNN-accelerated coprocessor consumes only 8534 LUTs, accounting for 47.6% of the total SoC system. The number of instruction cycles required to implement functions such as convolution and pooling with the designed coprocessor instructions is lower than with the standard instruction set, and the acceleration ratio of convolution is 6.27 times that of the standard instruction set.
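For reference, the convolution operation that such a coprocessor accelerates can be sketched in plain Python. This is an illustrative scalar implementation of 2-D "valid" convolution (strictly, cross-correlation, as commonly used in CNNs), not the paper's coprocessor code; the function name and toy data are assumptions:

```python
# Reference 2-D "valid" convolution (no padding, stride 1): the same
# multiply-accumulate pattern a CNN coprocessor executes in hardware.

def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0  # the accumulator a hardware MAC unit would hold
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, 1]]  # toy 2x2 kernel
print(conv2d_valid(img, k))  # [[6, 8], [12, 14]]
```

The four nested loops make the instruction-count argument concrete: each output element costs kh*kw multiply-accumulates on a general-purpose core, which is exactly the work a dedicated coprocessor instruction can collapse.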
  • Reconfigurable Computing Brittany Ransbottom Benjamin Wheeler What Is Reconfigurable Computing?
    Reconfigurable Computing. Brittany Ransbottom, Benjamin Wheeler. What is Reconfigurable Computing? A computer architecture style which tries to find a happy medium between general-purpose processors and ASICs through the use of the flexibility of software and the speed of hardware. von Neumann vs. Reconfigurable: fixed resources vs. variable resources; a von Neumann architecture is designed prior to use as a processor, while a reconfigurable architecture, though also designed prior to use, can be adjusted or reconfigured to use different resources; execution in software vs. execution in hardware; temporal vs. spatial computation. The Reconfigurable Computing Paradox: migration from software to configware yields speed-up factors and electricity consumption reductions of about four orders of magnitude, even though the clock frequency and other specifications of FPGAs are behind microprocessors by about four orders of magnitude. Current State: no designs merge GPPs and ASICs enough to be marketable; they are either too specialized (a glorified ASIC) or priced too high to justify replacing GPPs, and coarse-grained designs do not have the necessary software support. What is an FPGA? Reprogrammable hardware, generally configured in VHDL or Verilog, which can execute processes spatially as opposed to temporally. The FPGA Coprocessor Option: + reprogrammable logic; - requires a pre-existing hardware design; - needs to be specific processing, or have an intensive controller/compiler. What is coarse-grained computing? Functional units (addition, subtraction, multiplication: word-level operations) interconnected in a mesh style. Coarse-Grained: + short reconfiguration
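The slides' spatial-versus-temporal contrast can be sketched in Python: a processor reuses one functional unit across sequential time steps, while an FPGA instantiates one unit per operation in space. The toy dataflow (a + b) * c and all names below are illustrative assumptions, not from the slides:

```python
# Toy contrast of temporal vs. spatial execution of the dataflow (a+b)*c.

def temporal(a, b, c):
    # One shared "ALU": operations happen one per step, reusing the unit.
    steps = 0
    t0 = a + b          # step 1 on the shared ALU
    steps += 1
    t1 = t0 * c         # step 2 on the same ALU
    steps += 1
    return t1, steps    # result, and sequential steps consumed

def spatial(a, b, c):
    # Each operation gets its own "hardware" unit laid out in space;
    # once the pipeline is full, a new result emerges every cycle.
    adder = lambda x, y: x + y
    multiplier = lambda x, y: x * y
    return multiplier(adder(a, b), c), 1  # one steady-state cycle per result

print(temporal(2, 3, 4))  # (20, 2)
print(spatial(2, 3, 4))   # (20, 1)
```

Both compute the same value; the difference is whether the two operations share one unit over time or occupy two units at once, which is the trade the slides summarize as temporal versus spatial hardware.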