Energy-Efficient RISC-V Processors in 28nm FDSOI


Borivoje Nikolić
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
[email protected]
26 September 2017

Our 28FDSOI Adventures
  [Timeline figure, 2011 to 2017: Raven-1, Raven-2, Raven-3, Raven-4, LDPC, BTLE + SNOW testchip (Leti)]
  • Ten chips designed in 28nm FDSOI
  • 9 tested and functional, 6 published
  • 1 in fab

Berkeley RISC-V ISA (www.riscv.org)
  • A new, completely open ISA: free to use and extend
  • Has complete software support (GCC, Linux, LLVM, simulators…)
  • RV32, RV64, and RV128 variants defined for 32b, 64b, and 128b address spaces
  • Base ISA has only 40 integer instructions, but supports compiler, linker, OS, etc.
  • Extensions provide full general-purpose ISA, including IEEE-754/2008 floating-point
  • Comparable ISA-level metrics to other RISCs
  • Designed for extension, customization

RISC-V Foundation Members (60+)
  [Figure: member logos by tier (Platinum; Gold, Silver, Auditors), including Rumble Development]

“Rocket Chip” SoC Generator
  [Figure: example output]

“Rocket Chip” SoC Specialization
  1. Change Parameters
  2. Develop New Accelerators
  3. Develop Own RISC-V Core
  4. Develop Own Device

28nm FDSOI RISC-V Processor SoCs
  • RISC-V with vector accelerator and integrated DC-DC converters: 34 GFLOPS/W
  • RISC-V with vector accelerator and integrated DC-DC converters, back-bias, power management: 54 GFLOPS/W

Raven-3 Processor
  [Die photo labels: Vector RF, VI$, DC-DC, Rocket/Hwacha Tile, D$, I$, BIST, Uncore]
  • Process: ST 28nm FDSOI
  • Runs Linux
  • 0.45V-1V, including cache
  • 34 GFLOPS/W running DP matrix-matrix multiplication
  B. Zimmer, VLSI’15; B. Zimmer, JSSC 4/15

Raven-3 Processor
  • 28nm FDSOI offers high energy efficiency
  • 4 DC-DC converter modes cover wide operating range

Raven-4 RISC-V Processor SoC
  [Block diagram]
  • Voltage and clock generation (0.4 mm²): 48 switched-capacitor DC-DC unit cells (1.8V and 1.0V inputs, Vout); DC-DC controller (comparator against Vref, FSM, DCDC toggle); adaptive clock generator (to core clk)
  • Power management (0.1 mm²): Z-scale PMU; 8KB scratchpad; back-bias generator (NWELL, PWELL); set body bias; set DC-DC Vout
  • SRAM BIST
  • Integrated measurement: toggle counter; clock counter; programmable current mirror (Iload, Iref); Vout waveform reconstruction; to scope
  • Core (1.07 mm²): Rocket core (branch prediction, scalar RF, FPU); vector accelerator (vector issue unit; 16KB vector RF using eight custom 8T SRAM macros; crossbar; int functional units with 64-bit Int. Mul. and SP/DP FMA; vector memory unit); 16KB scalar inst. cache, 32KB shared data cache, 8KB vector inst. cache (all custom 8T SRAM macros); arbiter
  • Async. FIFO/level shifters between domains
  • Uncore (1.0V): digital IO pads to wire-bonded chip-on-board; to/from off-chip FPGA FSB and DRAM

  [Block diagram repeated, highlighting: Power Management, Back Bias, Integrated Voltage Regulation, RISC-V Rocket Processor, Adaptive Clock Generation]

  [Block diagram repeated, highlighting: Integrated Voltage Regulation]

Simultaneous Switching DC-DCs
  • Traditional interleaving: charge sharing losses
  • Simultaneous switching: no charge sharing!
  • Clock frequency adapts to track the voltage ripple
  Zimmer et al, JSSC’16

Reconfigurable SC Converters
  [Figure labels: Vout, Vref, comparator, FSM, toggle, supply]
  • Four operating modes
  • 0.5–1V core voltage
  • 2GHz comparator
  Zimmer et al, JSSC’16

  [Block diagram repeated, highlighting: RISC-V Rocket Processor]

RISC-V Rocket Processor
  • Five-stage in-order RISC-V core
  • Similar in performance to ARM Cortex-A5
  • Single/double precision floating point unit
  • Memory-management unit allows full operating system support
  http://www.riscv.org

Vector Coprocessor
  • Energy-efficient acceleration of common kernels
  • Decoupled Vector Accelerator
  [Figure labels: Rocket control processor; Scalar Unit with Scalar Execution Unit (SXU) and Scalar Memory Unit (SMU); Master Sequencer; Vector Runahead Unit (VRU); Vector Lanes 0 to N, each with a Vector Execution Unit (VXU), Sequencer/Expander, and Vector Memory Unit (VMU); queues VCMDQ, VRCMDQ, FPREQQ, FPRESPQ; 4 KB L1 VI$; L1-to-L2 TileLink Crossbar]
  Y. Lee et al, “The Hwacha Microarchitecture Manual, Version 3.8.1,” 2015

Custom 8T Cell SRAM Macro
  • 8T cell for 1R1W ports, low voltage operation with single P-Well
  • 4KB macro, 2:1 physical interleaving
  • 512x72 bits (4KB+ECC)
  • 1V: 380ps (C->Q), 7pJ/read; 0.6V: 1.37ns (C->Q), 2.3pJ/read
  • Single-ended read increases speed 30% and decreases energy 30%
  • SRAM in 28nm FDSOI functional from 1V to 0.45V
  Thomas et al, SOI12’16; Thomas et al, IEDM’14; Zimmer, Ph.D. Dis, 2015; Keller et al, JSSC’17

  [Block diagram repeated, highlighting: Power Management]

Measuring Power
  [Figure labels: Rippling Supply Voltage, Vref, DC-DC Toggle Clock, Low power consumption, High power consumption]
  • Rippling voltage supply makes it easy to measure power consumption of the core
  • Each DC-DC toggle is a fixed amount of energy
  • Counters measure the DC-DC toggle clock frequency to determine core energy
  Cochet et al, A-SSCC’16

Power Management Unit (PMU)
  • Tiny 32-bit RISC-V core
  • Fully programmable; can access all control registers

  [Block diagram repeated, highlighting: Back Bias]
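
The toggle-counting idea from the Measuring Power slides can be sketched in a few lines. This is a minimal illustration, assuming each DC-DC toggle delivers one fixed, known quantity of energy, so that average power is the toggle count times the per-toggle energy divided by the counting window. The function name `core_power_watts`, the 1 nJ per-toggle energy, the 1 ms window, and the counter values are hypothetical and do not come from the slides.

```python
def core_power_watts(toggle_count: int, window_s: float,
                     energy_per_toggle_j: float) -> float:
    """Estimate average core power from a DC-DC toggle counter.

    Assumes every toggle transfers the same energy, so
    power = toggle_count * energy_per_toggle / measurement window.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return toggle_count * energy_per_toggle_j / window_s


# Hypothetical values, chosen only for illustration.
ENERGY_PER_TOGGLE_J = 1e-9  # assumed energy per DC-DC toggle (1 nJ)
WINDOW_S = 1e-3             # assumed counting window (1 ms)

# A lightly loaded core toggles the converter less often than a busy one.
low = core_power_watts(20_000, WINDOW_S, ENERGY_PER_TOGGLE_J)
high = core_power_watts(100_000, WINDOW_S, ENERGY_PER_TOGGLE_J)

print(f"low-activity window:  {low * 1e3:.1f} mW")
print(f"high-activity window: {high * 1e3:.1f} mW")
```

Under this assumption the counter value scales linearly with power, which is why a higher toggle clock frequency corresponds to higher core power consumption.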