An Improved Instruction-Level Power and Energy Model for RISC Microprocessors

Total Pages: 16

File Type: PDF, Size: 1020 KB

An Improved Instruction-Level Power and Energy Model for RISC Microprocessors

UNIVERSITY OF SOUTHAMPTON
FACULTY OF PHYSICAL SCIENCES AND ENGINEERING
Electronics and Computer Science
Thesis for the degree of Doctor of Philosophy
by Wei Wang
March 2017

ABSTRACT

Recently, the power and energy consumed by a chip have become a primary design constraint for embedded systems, and they are largely affected by software. Because aims vary with the application domain, the best program is sometimes the most power- or energy-efficient one rather than the fastest. However, there is a gap between software and hardware that makes it hard to predict which code consumes the least power without measurement. It is therefore vital to discover which factors affect a program's power and energy consumption.

In this thesis we present an instruction-level model to estimate the power and energy consumed by a program. Firstly, instead of studying the different instructions individually, we cluster instructions into three groups: ALU, load and store. The power is affected by the percentage of each group in the program. Secondly, the power is affected by the instructions per cycle (IPC) of the program, since IPC reflects how fast the processor runs.

This method has three advantages. The first is conciseness: it does not treat the overhead energy as an independent factor, nor does it consider the operand Hamming distance of two consecutive instructions. The second is accuracy: the errors of our method across different benchmarks, with different processors on development boards, are all less than 10%. The last and most important advantage is that the method applies to different processors, such as the OpenRISC processor, ARM11, ARM Cortex-A8, and a dual-core ARM Cortex-A9 processor. We have demonstrated that the previous instruction-level power/energy model cannot be extended to superscalar and multi-core processors. (A sketch of a model of this form is given after the contents listing below.)

Contents

Acknowledgements
Abbreviations
Nomenclature

1 Introduction
  1.1 Motivation and Objects
  1.2 Contribution
  1.3 Thesis Organization
  1.4 List of Publications

2 Literature Review
  2.1 Basic Model
    2.1.1 The Base Power/Energy Cost
    2.1.2 The effect of two adjacent instructions
    2.1.3 The Basic Instruction Level Model
    2.1.4 The extension of the basic model
  2.2 Nop Model
  2.3 Clustering Instructions Model
  2.4 Linear Regression Method to Analyze the Power/Energy
    2.4.1 Introduction to linear regression
    2.4.2 Linear Regression Model
    2.4.3 Data Dependent Model
      2.4.3.1 The Effect of Data
      2.4.3.2 Different Data Dependent Models
    2.4.4 Cycle-accurate Model
  2.5 Functional-level Power Model
  2.6 Architecture-level Estimation
  2.7 System-level Estimation
  2.8 Some Other Related Research
    2.8.1 The Effect of Cache
    2.8.2 The Effect of The Different Hazards
    2.8.3 The Effect of Memory and The Various Address Models
  2.9 Conclusion
    2.9.1 Summary of The Previous Work
    2.9.2 Trends and Changes

3 OpenRISC
  3.1 Target Processor
  3.2 Experimental Tools
    3.2.1 Synthesis tool: Design Compiler
    3.2.2 CMOS power dissipation and power analysis tool: Primetime
      3.2.2.1 CMOS power dissipation
      3.2.2.2 Power analysis tool: Primetime
  3.3 Experimental Methodology
  3.4 Power Analysis of Basic Test
    3.4.1 Design Of The Tests
    3.4.2 Test Results
  3.5 Instruction Level Modeling
  3.6 Estimation And Analysis
    3.6.1 Design of the tests
    3.6.2 Test Result
  3.7 Comparison
  3.8 Conclusions
  3.9 Limitation of the work

4 ARM11
  4.1 Target Processor
  4.2 Experimental Methodology
  4.3 Basic Power Consumption of Different Instructions
  4.4 The Power Consumption of Different Hamming Distances
  4.5 The Overhead Power Cost
  4.6 Instruction level power analysis and modeling
  4.7 Validation
  4.8 Energy model
    4.8.1 Comparison with Previous Work
    4.8.2 Comparison with the Basic Energy Model
    4.8.3 Discussion: Low Energy Software
  4.9 Conclusions

5 ARM Cortex-A8
  5.1 Target Processor
  5.2 Experimental Methodology
  5.3 The Power Consumption of Different Instructions When Dual-Issue
    5.3.1 The IPC for Each Dual-Issue Test
    5.3.2 The Power Consumption of Arithmetic and Logic Instructions
    5.3.3 The Power Consumption of Load&Store and MUL
  5.4 The Power Consumption of Dual-issue Restrictions
    5.4.1 The IPC of the Dual-issue Restriction Tests
    5.4.2 The Power Consumption of Dual-issue Restrictions
  5.5 The Power Consumption with Different Hamming Distances
  5.6 The Overhead Power Cost
  5.7 The Power Consumption of Data Cache
    5.7.1 Design of the experiment
    5.7.2 The Power Consumption And IPC of The Data Cache Experiment
  5.8 Instruction Level Power Analysis and Modeling
    5.8.1 The Power Analysis of the Combined Tests
    5.8.2 Instruction Level Model
  5.9 Validation of the Power Model
  5.10 Comparison and Discussion
  5.11 Discussion: Low Energy Software
  5.12 Conclusion

6 ARM Cortex-A9 Dual-core Processor
  6.1 Target Processor
  6.2 Experimental Methodology
  6.3 Instruction Level Power Model Analysis for a Dual-core
  6.4 Experimental Design
  6.5 Experimental Results
    6.5.1 The Test Results of the Best Case
    6.5.2 The Test Result of the Worst Case
    6.5.3 Summary of the Experimental Results
  6.6 Instruction Level Power Modeling
  6.7 Validation
  6.8 Discussion: Energy Consumption and Performance
    6.8.1 Discussion: EPI vs IPC for a Dual-core Processor
    6.8.2 Discussion: the Energy of a Single Thread Program vs a Multi Thread Program
  6.9 Conclusion

7 How To Apply The Model To New Processors
  7.1 The limitation of the method
  7.2 How to apply the model to new processors
  7.3 Conclusion

8 Conclusion and Future Work
  8.1 Conclusion
  8.2 Future Work
    8.2.1 Apply to more complex systems
    8.2.2 Reduce the energy consumption of the system
    8.2.3 Static program analysis

A Design Codes and Benchmarks
  A.1 The Test Code of Chapter 3
    A.1.1 The code of Section 3.3
    A.1.2 The code of Section 3.4
    A.1.3 The code of Section 3.6
    A.1.4 The code of Section 3.7
  A.2 The Test Code of Chapter 4
    A.2.1 The code of Section 4.3
    A.2.2 The code of Section 4.4
    A.2.3 The code of Section 4.5
    A.2.4 The code of Section 4.6
    A.2.5 The code of Section 4.7
  A.3 The Test Code of Chapter 5
    A.3.1 The code of Section 5.4
    A.3.2 The code of Section 5.7
    A.3.3 The code of Section 5.8
    A.3.4 The code of Section 5.9
  A.4 The Test Code of Chapter 6
    A.4.1 The code of Section
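The abstract above describes power as a function of the ALU/load/store instruction mix and the program's IPC. The sketch below shows one plausible way such a model could be fitted from board measurements and then used to estimate energy; the linear form, coefficient layout, and function names are illustrative assumptions, not the exact model defined in the thesis.

```python
# A minimal sketch of an instruction-level power/energy model of the kind the
# abstract describes. The linear form and the names here are assumptions for
# illustration only; the thesis defines the actual model.
import numpy as np

def fit_power_model(frac_alu, frac_load, frac_store, ipc, measured_power):
    """Least-squares fit of P ~ c0 + c1*frac_alu + c2*frac_load + c3*frac_store + c4*ipc,
    given per-benchmark measurement vectors."""
    X = np.column_stack([np.ones_like(ipc), frac_alu, frac_load, frac_store, ipc])
    coeffs, *_ = np.linalg.lstsq(X, measured_power, rcond=None)
    return coeffs

def estimate_power(coeffs, frac_alu, frac_load, frac_store, ipc):
    """Apply the fitted model to a new program's instruction mix and IPC."""
    c0, c1, c2, c3, c4 = coeffs
    return c0 + c1 * frac_alu + c2 * frac_load + c3 * frac_store + c4 * ipc

def estimate_energy(power_watts, cycle_count, clock_hz):
    """Energy = power * execution time, with execution time = cycles / clock frequency."""
    return power_watts * cycle_count / clock_hz
```

In this sketch, each benchmark contributes one row of instruction-group fractions and measured IPC, so the fit needs at least five measured programs to determine the five coefficients.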
Recommended publications
  • Debugging System for OpenRISC 1000-Based Systems
    Debugging System for OpenRISC 1000-based Systems. Nathan Yawn, [email protected], 05/12/09. Copyright (C) 2008 Nathan Yawn. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license should be included with this document. If not, the license may be obtained from www.gnu.org, or by writing to the Free Software Foundation. This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. History: Rev 1.0, 20/7/2008, Nathan Yawn, initial version. Contents: 1. Introduction (1.1 Overview, 1.2 Versions, 1.3 Stub-based methods); 2. System Components.
  • Implementation, Verification and Validation of an OpenRISC-1200 Soft-core Processor on FPGA
    (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 10, No. 1, 2019. Implementation, Verification and Validation of an OpenRISC-1200 Soft-core Processor on FPGA. Abdul Rafay Khatri, Department of Electronic Engineering, QUEST, NawabShah, Pakistan. Abstract—An embedded system is a dedicated computer system in which hardware and software are combined to perform some specific tasks. Recent advancements in Field Programmable Gate Array (FPGA) technology make it possible to implement the complete embedded system on a single FPGA chip. The fundamental component of an embedded system is a microprocessor. Soft-core processors are written in hardware description languages and are functionally equivalent to an ordinary microprocessor. These soft-core processors are synthesized and implemented on FPGA devices. In this paper, the OpenRISC 1200 processor is used, which is a 32-bit soft-core processor written in the Verilog HDL. Xilinx ISE tools perform synthesis, design implementation and configure/program the FPGA. For verification and debugging purposes, a software toolchain from GNU is configured and installed. The software is written in C and Assembly languages. The communication between the host computer and the FPGA board is carried out through the serial RS-232 port. Keywords—FPGA Design; HDLs; Hw-Sw Co-design; OpenRISC 1200. (Fig. 1: General block diagram of embedded systems.) … (RISC) processor. This processor consists of all the necessary components which are available in any other microprocessor, connected through the Wishbone bus. In this work, the OR1200 processor is used to implement system-on-chip technology on a Virtex-5 FPGA board from Xilinx.
  • Design and VHDL Implementation of an Application-Specific Instruction Set Processor
    Design and VHDL Implementation of an Application-Specific Instruction Set Processor. Lauri Isola, School of Electrical Engineering. Thesis submitted for examination for the degree of Master of Science in Technology. Espoo 19.12.2019. Supervisor: Prof. Jussi Ryynänen. Advisor: D.Sc. (Tech.) Marko Kosunen. Copyright © 2019 Lauri Isola. Aalto University, P.O. Box 11000, 00076 AALTO, www.aalto.fi. Abstract of the master's thesis. Author: Lauri Isola. Title: Design and VHDL Implementation of an Application-Specific Instruction Set Processor. Degree programme: Computer, Communication and Information Sciences. Major: Signal, Speech and Language Processing. Code of major: ELEC3031. Date: 19.12.2019. Number of pages: 66+45. Language: English. Abstract: Open source processors are becoming more popular. They are a cost-effective option in hardware designs, because using the processor does not require an expensive license. However, only a limited number of open source processors are available. This is especially true for Application-Specific Instruction Set Processors (ASIPs). In this work, an ASIP was designed and implemented in the VHDL hardware description language. The design was based on goals that make the processor easily customizable and give it low resource consumption in a System-on-Chip (SoC) design. Finally, the processor was implemented on an FPGA circuit, where it was tested with a specially designed VGA graphics controller. Necessary software tools, such as an assembler, were also implemented for the processor. The assembler was used to write comprehensive test programs for testing and verifying the functionality of the processor. This work also examined some future upgrades of the designed processor.
  • OpenPiton: An Open Source Manycore Research Framework
    OpenPiton: An Open Source Manycore Research Framework. Jonathan Balkind, Michael McKeown, Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Alexey Lavrov, Mohammad Shahrad, Adi Fuchs, Samuel Payne, Xiaohua Liang, Matthew Matl, David Wentzlaff. Princeton University. Abstract: Industry is building larger, more complex, manycore processors on the back of strong institutional knowledge, but academic projects face difficulties in replicating that scale. To alleviate these difficulties and to develop and share knowledge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which support extensibility, scalability, and configurability, alongside an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework for building scalable architecture research prototypes from 1 core to 500 million cores. OpenPiton is the world's first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton leverages the industry-hardened OpenSPARC T1 core with modifications and builds upon it with a scratch-built, scalable uncore, creating a flexible, modern manycore design. In addition, OpenPiton provides synthesis and backend scripts for ASIC and FPGA to enable other researchers to bring their designs to implementation. OpenPiton provides a complete verification infrastructure of over 8000 tests, is supported by mature software tools, runs full-stack multiuser Debian Linux, and is written in industry-standard Verilog. (Figure 1: OpenPiton Architecture. Multiple manycore chips are connected together with chipset logic and networks to build large scalable manycore systems. OpenPiton's cache coherence protocol extends off chip.)
  • Comparative Architectures
    Comparative Architectures. CST Part II, 16 lectures, Lent Term 2006. David Greaves, [email protected]. Slides Lectures 1-13 (C) 2006 IAP + DJG.
    Course outline: 1. Comparing Implementations • developments in fabrication technology • cost, power, performance, compatibility • benchmarking. 2. Instruction Set Architecture (ISA) • classic CISC and RISC traits • ISA evolution. 3. Microarchitecture • pipelining • super-scalar (static & out-of-order) • multi-threading • effects of ISA on µarchitecture and vice versa. 4. Memory System Architecture • memory hierarchy. 5. Multi-processor systems • cache coherent and message passing • understanding design tradeoffs.
    Reading material: OHP slides, articles • recommended book: John Hennessy & David Patterson, Computer Architecture: A Quantitative Approach (3rd ed.), Morgan Kaufmann, 2002 • MIT OpenCourseWare: 6.823 Computer System Architecture, by Krste Asanovic • the Web: http://bwrc.eecs.berkeley.edu/CIC/ http://www.chip-architect.com/ http://www.geek.com/procspec/procspec.htm http://www.realworldtech.com/ http://www.anandtech.com/ http://www.arstechnica.com/ http://open.specbench.org/ • the comp.arch news group.
    Further reading and reference: M. Johnson, Superscalar Microprocessor Design, Prentice-Hall, 1991 • P. Markstein, IA-64 and Elementary Functions, Prentice-Hall, 2000 • A. Tanenbaum, Structured Computer Organization (2nd ed.), Prentice-Hall, 1990 • A. van Someren & C. Atack, The ARM RISC Chip, Addison-Wesley, 1994 • R. Sites, Alpha Architecture Reference Manual, Digital Press, 1992 • G. Kane & J. Heinrich, MIPS RISC Architecture.
  • CS152: Computer Systems Architecture Pipelining
    CS152: Computer Systems Architecture, Pipelining. Sang-Woo Jun, Winter 2021. Large amount of material adapted from MIT 6.004 "Computation Structures", Morgan Kaufmann "Computer Organization and Design: The Hardware/Software Interface: RISC-V Edition", and CS 152 slides by Isaac Scherson.
    Eight great ideas: design for Moore's Law; use abstraction to simplify design; make the common case fast; performance via parallelism; performance via pipelining; performance via prediction; hierarchy of memories; dependability via redundancy.
    Performance measures: two metrics when designing a system. 1. Latency: the delay from when an input enters the system until its associated output is produced. 2. Throughput: the rate at which inputs or outputs are processed. The metric to prioritize depends on the application. Embedded system for airbag deployment? Latency. General-purpose processor? Throughput.
    Performance of combinational circuits: for combinational logic, latency = tPD and throughput = 1/tPD. (Diagram: combinational blocks F and G feed block H.) While H computes, F and G are not doing work, just holding output data. Is this an efficient way of using hardware? (Source: MIT 6.004 2019 L12.)
    Pipelined circuits: pipelining by adding registers to hold F and G's output. Now F & G can be working on input Xi+1 while H is performing computation on Xi: a 2-stage pipeline. For input X during clock cycle j, the corresponding output is emitted during clock j+2, assuming ideal registers. Assuming stage latencies of 15 (F), 20 (G) and 25 (H): unpipelined latency 20+25 = 45, throughput 1/45; 2-stage pipelined latency 25+25 = 50 (worse!), throughput 1/25 (better!). (Source: MIT 6.004 2019 L12.)
    Pipeline conventions. Definition: a well-formed K-stage pipeline ("K-pipeline") is an acyclic circuit having exactly K registers on every path from an input to an output.
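    To make the quoted numbers concrete, here is a small worked computation of the latency and throughput figures above, assuming (as the slide's diagram suggests) that F and G operate in parallel and both feed H; the variable names are illustrative.

```python
# Worked example of the latency/throughput numbers quoted above, assuming F and G
# run in parallel and both feed H (as in the MIT 6.004 pipelining example).
t_F, t_G, t_H = 15, 20, 25  # propagation delays of the three combinational blocks

# Unpipelined: one pass through the whole circuit.
latency_unpipelined = max(t_F, t_G) + t_H          # 20 + 25 = 45
throughput_unpipelined = 1 / latency_unpipelined   # 1/45

# 2-stage pipeline: registers after F and G; the clock must fit the slowest stage.
clock_period = max(t_F, t_G, t_H)                  # 25
stages = 2
latency_pipelined = stages * clock_period          # 2 * 25 = 50 (worse)
throughput_pipelined = 1 / clock_period            # 1/25 (better)

print(latency_unpipelined, throughput_unpipelined)
print(latency_pipelined, throughput_pipelined)
```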
  • Quantum Chemical Computational Methods Have Proved to Be An
    K. Immanuvel Arokia James et al. / Elixir Comp. Sci. & Engg. 85 (2015) 34671-34676. Available online at www.elixirpublishers.com (Elixir International Journal), Computer Science and Engineering.
    Outerloop pipelining in FFT using single dimension software pipelining. K. Immanuvel Arokia James (Department of EEE, VEL Tech Multi Tech Dr. RR Dr. SR Engg College, Chennai, India) and M. J. Joyce Kiruba (Department of ECE, Dr. MGR University, Chennai, India).
    Article history: received 19 May 2012; received in revised form 22 August 2015; accepted 29 August 2015. Keywords: FFT, pipelining, VHDL, nested loop, FPGA, Integer Linear Programming (ILP).
    Abstract: The aim of the paper is to produce faster results when pipelining above the innermost loop. The concept of outerloop pipelining is tested here with the Fast Fourier Transform. Reduction in the number of cycles spent flushing and filling the pipeline and the potential for data reuse is another advantage of outerloop pipelining. In this work we extend and adapt the existing SSP approach to better suit the generation of schedules for hardware, specifically FPGAs. We also introduce a search scheme to find the shortest schedule available within the pipelining framework to maximize the gains in pipelining above the innermost loop. Hardware compilers apply loop pipelining to increase the parallelism achieved, but pipelining is restricted to only the innermost level in a nested loop. In this work we extend and adapt an existing outer loop pipelining approach known as Single Dimension Software Pipelining to generate schedules for FPGA hardware coprocessors.
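    As a rough illustration of why pipelining above the innermost loop can reduce the cycles spent filling and flushing the pipeline, the following cycle-count sketch compares an inner-loop-only pipeline (which refills on every outer iteration) with an idealized outer-loop pipeline that fills once; the formulas and names are simplifying assumptions, not the SSP scheduling model from the paper.

```python
# Idealized cycle-count comparison for a loop nest of n_outer x n_inner iterations
# executed on a pipeline with `depth` stages (one result per cycle once full).
# These formulas are illustrative assumptions, not the paper's SSP schedule model.

def cycles_inner_only(n_outer, n_inner, depth):
    # The pipeline is filled and drained once per outer iteration.
    return n_outer * (n_inner + depth - 1)

def cycles_outer_pipelined(n_outer, n_inner, depth):
    # Outer-loop pipelining keeps the pipeline full across outer iterations,
    # so the fill/drain cost is paid only once for the whole nest.
    return n_outer * n_inner + depth - 1

if __name__ == "__main__":
    n_outer, n_inner, depth = 64, 16, 8
    print(cycles_inner_only(n_outer, n_inner, depth))       # 64 * (16 + 7) = 1472
    print(cycles_outer_pipelined(n_outer, n_inner, depth))  # 64 * 16 + 7  = 1031
```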
  • Design and Implementation of a Multithreaded Associative SIMD Processor
    DESIGN AND IMPLEMENTATION OF A MULTITHREADED ASSOCIATIVE SIMD PROCESSOR. A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy, by Kevin Schaffer, December 2011. Dissertation written by Kevin Schaffer: B.S., Kent State University, 2001; M.S., Kent State University, 2003; Ph.D., Kent State University, 2011. Approved by: Robert A. Walker, Chair, Doctoral Dissertation Committee; Johnnie W. Baker, Kenneth E. Batcher, Eugene C. Gartland, members, Doctoral Dissertation Committee. Accepted by: John R. D. Stalvey, Administrator, Department of Computer Science; Timothy Moerland, Dean, College of Arts and Sciences.
    Table of contents: List of Figures; List of Tables; Chapter 1, Introduction: 1.1 Architectural Trends; 1.1.1 Wide-Issue Superscalar Processors; 1.1.2 Chip Multiprocessors (CMPs); 1.2 An Alternative Approach: SIMD; 1.3 MTASC Processor.
  • Notes-Cso-Unit-5
    Unit-05/Lecture-01: Pipeline Processing. In computing, a pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements. Computer-related pipelines include: instruction pipelines, such as the classic RISC pipeline, which are used in central processing units (CPUs) to allow overlapping execution of multiple instructions with the same circuitry; the circuitry is usually divided into stages, including instruction decoding, arithmetic, and register fetching stages, wherein each stage processes one instruction at a time. Graphics pipelines, found in most graphics processing units (GPUs), consist of multiple arithmetic units, or complete CPUs, that implement the various stages of common rendering operations (perspective projection, window clipping, color and light calculation, rendering, etc.). Software pipelines, where commands can be written so that the output of one operation is automatically fed to the next operation; the Unix system call pipe is a classic example of this concept, although other operating systems support pipes as well (a minimal sketch follows below). Pipelining is a natural concept in everyday life, e.g. on an assembly line. Consider the assembly of a car: assume that certain steps in the assembly line are to install the engine, install the hood, and install the wheels (in that order, with arbitrary interstitial steps). A car on the assembly line can have only one of the three steps done at once.
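    A minimal Python sketch of the software-pipeline idea described above, with each stage consuming the previous stage's output; the stage names and the file path are illustrative, not taken from the notes.

```python
# Minimal illustration of a software pipeline: a chain of processing elements
# where the output of each stage is the input of the next (names are illustrative).

def read_lines(path):
    """Stage 1: produce raw lines from a file."""
    with open(path) as f:
        for line in f:
            yield line

def strip_comments(lines):
    """Stage 2: drop comment lines, pass everything else downstream."""
    for line in lines:
        if not line.lstrip().startswith("#"):
            yield line

def count_words(lines):
    """Stage 3: consume lines and reduce them to a single number."""
    return sum(len(line.split()) for line in lines)

# Stages compose like a Unix pipe: read_lines | strip_comments | count_words.
# total = count_words(strip_comments(read_lines("input.txt")))
```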
  • Small Soft Core uP Inventory ©2019 James Brakefield: OpenCores and Other Soft Core Processors
    Small soft core uP Inventory ©2019 James Brakefield: OpenCores and other soft core processors. A spreadsheet inventory listing, for each core: toolchain, data and instruction widths, target FPGA, LUT count, block RAMs, multipliers, Fmax, MIPS, clocks per instruction, source language, number of source files, floating-point support, address range, start year and last revision, plus project links and comments. Entries include: reverse-u16 (https://github.com/programmerby/ReVerSE-U16, stable, by A.T., Z80-style, 8-bit, Cyclone IV, VHDL; a 2015 SoC project using the T80 core and an HDMI generator, a retro Z80 based on T80 by Daniel Wallner); copyblaze (https://opencores.org/project,copyblaze, stable, by Abdallah El Ibrahimi, PicoBlaze-style, 8-bit data / 18-bit instructions, Kintex-7, VHDL, 2011-2016, Wishbone extras); sap (https://opencores.org/project,sap, stable, by Ahmed Shahein, accumulator-style, 8-bit, Kintex-7, VHDL, 2012-2017; the "Simple As Possible" computer from Malvino & Brown, "Digital Computer Electronics", https://shirishkoirala.blogspot.com/2017/01/sap-1simple-as-possible-1-computer.html, https://www.youtube.com/watch?v=prpyEFxZCMw); blue (https://opencores.org/project,blue, stable, by Al Williams, accumulator-style, 16-bit, Spartan-3, Verilog, 2009).
  • Evaluation of Synthesizable CPU Cores
    Evaluation of synthesizable CPU cores. Daniel Mattsson, Marcus Christensson. Master's Thesis, Computer Science and Engineering Program, CHALMERS UNIVERSITY OF TECHNOLOGY, Department of Computer Engineering, Gothenburg 2004. All rights reserved. This publication is protected by law in accordance with "Lagen om Upphovsrätt, 1960:729". No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the authors. Daniel Mattsson and Marcus Christensson, Gothenburg 2004.
    Abstract: The three synthesizable processors LEON2 from Gaisler Research, MicroBlaze from Xilinx, and OpenRISC 1200 from OpenCores are evaluated and discussed. Performance in terms of benchmark results and area resource usage is measured. Different aspects like usability and configurability are also reviewed. Three configurations for each of the processors are defined and evaluated: the comparable configuration, the performance-optimized configuration and the area-optimized configuration. For each of the configurations three benchmarks are executed: the Dhrystone 2.1 benchmark, the Stanford benchmark suite and a typical control application run as a benchmark. A detailed analysis of the three processors and their development tools is presented. The three benchmarks are described and motivated. Conclusions and results in terms of benchmark results, performance per clock cycle and performance per area unit are discussed and presented.
    Sammanfattning (Swedish summary, translated): The three synthesizable processors LEON2 from Gaisler Research, MicroBlaze from Xilinx and OpenRISC 1200 from OpenCores are evaluated and discussed.
  • An Evaluation of Soft Processors As a Reliable Computing Platform
    Brigham Young University, BYU ScholarsArchive, Theses and Dissertations, 2015-07-01. An Evaluation of Soft Processors as a Reliable Computing Platform. Michael Robert Gardiner, Brigham Young University - Provo. Follow this and additional works at: https://scholarsarchive.byu.edu/etd. Part of the Electrical and Computer Engineering Commons. BYU ScholarsArchive Citation: Gardiner, Michael Robert, "An Evaluation of Soft Processors as a Reliable Computing Platform" (2015). Theses and Dissertations. 5509. https://scholarsarchive.byu.edu/etd/5509. This thesis is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected], [email protected].
    An Evaluation of Soft Processors as a Reliable Computing Platform. Michael Robert Gardiner. A thesis submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Master of Science. Michael J. Wirthlin, Chair; Brad L. Hutchings; Brent E. Nelson. Department of Electrical and Computer Engineering, Brigham Young University, July 2015. Copyright © 2015 Michael Robert Gardiner. All Rights Reserved.
    ABSTRACT: This study evaluates the benefits and limitations of soft processors operating in a radiation-hardened FPGA, focusing primarily on the performance and reliability of these systems. FPGA designs for four popular soft processors, the MicroBlaze, LEON3, Cortex-M0 DesignStart, and OpenRISC 1200, are developed for a Virtex-5 FPGA. The performance of these soft processor designs is then compared on ten widely-used benchmark programs.