Register Pointer Architecture for Efficient Embedded Processors
Recommended publications
Analyzing the Benefits of Scratchpad Memories for Scientific Matrix Computations
The Pennsylvania State University, The Graduate School, Department of Computer Science and Engineering. A thesis in Computer Science and Engineering by Bryan Alan Cover, submitted in partial fulfillment of the requirements for the degree of Master of Science, May 2008. Thesis co-advisors: Mary Jane Irwin (Evan Pugh Professor of Computer Science and Engineering) and Padma Raghavan (Professor of Computer Science and Engineering).

Abstract: Scratchpad memories (SPMs) have been shown to be more energy efficient, to have faster access times, and to take up less area than traditional hardware-managed caches. This, coupled with the predictability of data presence and reduced thermal properties, makes SPMs an attractive alternative to caches for many scientific applications. In this work, SPM-based systems are considered for a variety of different functions. The first study analyzes the thermal and area properties of SPMs on a conventional RISC processor, exploring six performance-optimized architectural variants to evaluate the impact of having an SPM in the on-chip memory hierarchy. The work also examines increasing the performance and energy efficiency of both dense and sparse matrix-vector multiplication on a chip multiprocessor. Efficient utilization of the SPM is ensured by profiling the application for the data structures that do not perform well in a traditional cache.
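As a sketch of the kind of data placement such profiling enables, the following C fragment pins a heavily reused vector into a scratchpad region. The ".spm" section name and the linker script that maps it onto the SPM address range are assumptions for illustration, not details from the thesis:

```c
#include <stddef.h>

#define N 256

/* Assumption: the linker script maps the ".spm" output section onto
 * the on-chip scratchpad's address range. */
#define SPM __attribute__((section(".spm")))

/* Profiling shows x is reused on every row of the product, so it is
 * pinned in the scratchpad; the matrix streams through memory once
 * and can stay in DRAM. */
SPM static double x[N];
static double a[N][N];
static double y[N];

void dense_mv(void)
{
    for (size_t i = 0; i < N; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < N; j++)
            sum += a[i][j] * x[j];   /* every x[j] access hits the SPM */
        y[i] = sum;
    }
}
```

The matrix gains little from fast storage because each element is touched only once per product, so only the reused vector competes for the small SPM; this is exactly the kind of decision the profiling step informs.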
Memory Architectures (Chapter 13), Preeti Ranjan Panda
Abstract: In this chapter we discuss the topic of memory organization in embedded systems and Systems-on-Chips (SoCs). We start with the simplest hardware-based systems needing registers for storage and proceed to hardware/software codesigned systems with several standard structures such as Static Random-Access Memory (SRAM) and Dynamic Random-Access Memory (DRAM). In the process, we touch upon concepts such as caches and Scratchpad Memories (SPMs). In general, the emphasis is on concepts that are more generally found in SoCs and less on general-purpose computing systems, although this distinction is not very clearly defined with respect to the memory subsystem. We touch upon implementations of these ideas in modern research and commercial scenarios, and we also point out issues arising in the context of the memory architectures that become exported as problems to be addressed by the compiler and system designer.

Acronyms: ALU (Arithmetic-Logic Unit); ARM (Advanced RISC Machines); CGRA (Coarse-Grained Reconfigurable Architecture); CPU (Central Processing Unit); DMA (Direct Memory Access); DRAM (Dynamic Random-Access Memory); GPGPU (General-Purpose computing on Graphics Processing Units); GPU (Graphics Processing Unit); HDL (Hardware Description Language); PCM (Phase Change Memory); SoC (System-on-Chip); SPM (Scratchpad Memory); SRAM (Static Random-Access Memory); STT-RAM (Spin-Transfer Torque Random-Access Memory); SWC (Software Cache).

P. R. Panda, Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India. In: S. Ha and J. Teich (eds.), Handbook of Hardware/Software Codesign, Springer Science+Business Media Dordrecht, 2017, DOI 10.1007/978-94-017-7267-9_14. Contents: 13.1 Motivating the Significance of Memory …
Computer Organization and Architecture: Designing for Performance, Ninth Edition
William Stallings. Pearson (Prentice Hall), Boston. Copyright © 2013, 2010, 2006 by Pearson Education, Inc. All rights reserved. Manufactured in the United States of America.
Introduction to the Intel® Nios® II Soft Processor
For Quartus® Prime 18.1.

1 Introduction. This tutorial presents an introduction to the Intel® Nios® II processor, a soft processor that can be instantiated on an Intel FPGA device. It describes the basic architecture of Nios II and its instruction set. The Nios II processor and its associated memory and peripheral components are easily instantiated by using Intel's SOPC Builder or Platform Designer tool in conjunction with the Quartus® Prime software. A full description of the Nios II processor is provided in the Nios II Processor Reference Handbook, which is available in the literature section of the Intel web site. Introductions to the SOPC Builder and Platform Designer tools are given in the tutorials Introduction to the Intel SOPC Builder and Introduction to the Intel Platform Designer Tool, respectively; both can be found in the University Program section of the web site.

Contents:
• Nios II System
• Overview of Nios II Processor Features
• Register Structure
• Accessing Memory and I/O Devices
• Addressing
• Instruction Set
• Assembler Directives
• Example Program
• Exception Processing
• Cache Memory
• Tightly Coupled Memory

2 Background. Intel's Nios II is a soft processor, defined in a hardware description language, which can be implemented in Intel's FPGA devices by using the Quartus Prime CAD system. This tutorial provides a basic introduction to the Nios II processor, intended for a user who wishes to implement a Nios II based system on an Intel Development and Education board.
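To give a flavor of the "Accessing Memory and I/O Devices" topic, here is a minimal C sketch of memory-mapped I/O on a Nios II system. The PIO base address below is a placeholder assumption; in a real system the address is assigned in Platform Designer and exported to software by the generated system header:

```c
#include <stdint.h>

/* Assumed base address of an output PIO driving LEDs; the actual value
 * comes from the Platform Designer system configuration. */
#define LED_PIO ((volatile uint32_t *)0x10000010)

int main(void)
{
    uint32_t pattern = 0x01;
    for (;;) {
        *LED_PIO = pattern;                         /* drive the LEDs */
        pattern = ((pattern << 1) | (pattern >> 7)) /* rotate within  */
                  & 0xFFu;                          /* an 8-bit field */
        for (volatile int i = 0; i < 100000; i++)
            ;                                       /* crude software delay */
    }
}
```

The volatile qualifier keeps the compiler from caching the device register in a CPU register, which is the essential idiom for memory-mapped I/O.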
Computer Organization & Architecture EIE 411
Course Lecturer: Engr. Banji Adedayo, Reg. COREN. The characteristics of different computers vary considerably from category to category: computers for data-processing activities have different features from those built for scientific work, and even computers configured within the same application area show variations in design. Computer architecture is the science of integrating components to achieve a desired level of functionality and performance; it is the logical organization, or design, of the hardware that makes up the computer system. The internal organization of a digital system is defined by the sequence of micro-operations it performs on the data stored in its registers. The internal structure of a microprocessor is called its architecture and includes the number, layout, and functionality of registers, memory cells, decoders, controllers, and clocks.

HISTORY OF COMPUTER HARDWARE. The first recorded use of the word "computer" was in 1613, referring to a person who carried out calculations or computations. A brief history: the computer as we know it today had its beginning with the 19th-century English mathematics professor Charles Babbage, who designed the Analytical Engine; it is on this design that the basic framework of today's computers is based. First generation, 1937-1946: the first electronic digital computer, the Atanasoff-Berry Computer (ABC), was built by Dr. John V. Atanasoff and Clifford Berry. In 1943 an electronic computer named Colossus was built for the military, and in 1946 the first general-purpose digital computer, the Electronic Numerical Integrator and Computer (ENIAC), was built; it weighed 30 tons and had 18,000 vacuum tubes, which were used for processing.
GPGPUs: Overview, Pedagogically Precursor Concepts (University of Illinois at Urbana-Champaign)
© 2018 L. V. Kale, University of Illinois Urbana-Champaign.

Precursor concepts, pedagogical: architectural elements that, at least pedagogically, are precursors to understanding GPGPUs:
• SIMD and vector units (we have already seen those)
• Large-scale "hyperthreading" for latency tolerance
• Scratchpad memories
• High-bandwidth memory

Tera MTA and Sun UltraSPARC T1 (Niagara). Tera Computer Company was founded in 1987 with Burton Smith as a co-founder; its precursor was the HEP processor (Denelcor Inc., 1982). Its first machine was the MTA (MTA-1, MTA-2, MTA-3). The basic idea: support a huge number of hardware threads (128), each with its own registers, and no cache; switch among threads on every cycle, thus tolerating DRAM latency. These threads could be running different processes. Such processors are called "barrel processors" in the literature, but because they always switch to the "next" thread, a given thread's turn is always 127 clocks away. The design was especially good for highly irregular accesses.

Scratchpad memory. Caches are complex and can have an unpredictable impact on performance. Scratchpad memories are made from on-chip SRAM and occupy a separate part of the address space. They are as fast as or faster than caches, because there is no tag matching or associative search, but explicit instructions are needed to bring data into the scratchpad from memory; load/store instructions exist between registers and the scratchpad. Example: the IBM/Toshiba/Sony Cell processor used in the PS3, with 1 PPE and 8 SPE cores, where each SPE core has a 256 KiB scratchpad and DMA is the mechanism for moving data between the scratchpad and external DRAM.

High-bandwidth memory. As the compute capacity of processor chips increases, the bandwidth to DRAM comes under pressure. Many past improvements (e.g. …
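The following C sketch illustrates the explicit data movement a scratchpad demands, in the double-buffered style used on the Cell SPEs. The dma_get/dma_wait primitives are hypothetical stand-ins for a platform's real DMA interface (on the Cell, mfc_get and the tag-status calls); here they fall back to a synchronous memcpy so the sketch runs on a plain host:

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 1024                 /* elements per scratchpad buffer */

/* Two buffers that would live in the SPE's 256 KiB local store. */
static float buf[2][CHUNK];

/* Hypothetical DMA primitives. The fallback bodies make the "DMA" a
 * synchronous memcpy so the code compiles and runs anywhere. */
static void dma_get(void *spm_dst, const float *dram_src,
                    size_t bytes, int tag)
{
    (void)tag;
    memcpy(spm_dst, dram_src, bytes);
}
static void dma_wait(int tag) { (void)tag; }

/* Sum n_chunks * CHUNK floats from DRAM, prefetching chunk c+1 into
 * one buffer while computing on chunk c in the other. */
float sum_stream(const float *dram_data, size_t n_chunks)
{
    float total = 0.0f;
    int cur = 0;

    dma_get(buf[cur], dram_data, sizeof buf[cur], cur);
    for (size_t c = 0; c < n_chunks; c++) {
        int next = cur ^ 1;
        if (c + 1 < n_chunks)
            dma_get(buf[next], dram_data + (c + 1) * CHUNK,
                    sizeof buf[next], next);
        dma_wait(cur);             /* buffer cur now holds chunk c */
        for (int i = 0; i < CHUNK; i++)
            total += buf[cur][i];
        cur = next;
    }
    return total;
}
```

With a real asynchronous DMA engine, the transfer of chunk c+1 overlaps the summation of chunk c, which is how a scratchpad hides DRAM latency without any cache.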
A Modular Soft Processor Core in VHDL
Jack Whitham, 2002-2003. This is a third-year project submitted for the degree of MEng in the Department of Computer Science at the University of York. The project will attempt to demonstrate that a modular soft processor core can be designed and implemented on an FPGA, and that the core can be optimised to run a particular embedded application using a minimal amount of FPGA space. The word count of the project (as counted by the Unix wc command after detex was run on the LaTeX source) is 33,647 words; this includes all text in the main report and Appendices A, B and C. Excluding source code, the project is 70 pages in length.

Contents: I. Introduction. 1. Background and Literature: 1.1 Soft Processor Cores; 1.2 A Field Programmable Gate Array; 1.3 VHSIC Hardware Description Language (VHDL); 1.4 The Motorola 68020. II. High-level Project Decisions. 2. Should the design be based on an existing one? 3. Which processor should the soft core be based upon? 4. Which processor should be chosen? 5. Restating the aims of the project in terms of the chosen processor. III. Modular Processor Design Decisions. 6. Processor Design: 6.1 Alternatives to a complete processor implementation; 6.2 A real processor; 6.3 Instruction Decoder and Control Logic; 6.4 Arithmetic and Logic Unit (ALU); 6.5 Register File; 6.6 Links between Components …
Assignment Solutions
Week 1: Assignment Solutions

1. Which of the following statements are true?
a. The ENIAC computer was built using mechanical relays.
b. The Harvard Mark 1 computer was built using mechanical relays.
c. The PASCALINE computer could multiply and divide numbers by repeated addition and subtraction.
d. Charles Babbage built his automatic computing engine in the 19th century.
Solution: (b) and (c). ENIAC was built using vacuum tubes, and Charles Babbage designed his automatic computing engine but could not build it.

2. Which of the following statements are true for Moore's law?
a. Moore's law predicts that power dissipation will double every 18 months.
b. Moore's law predicts that the number of transistors per chip will double every 18 months.
c. Moore's law predicts that the speed of VLSI circuits will double every 18 months.
d. None of the above.
Solution: (b). Moore's law only predicts that the number of transistors per chip will double every 18 months.

3. Which of the following generates the necessary signals required to execute an instruction in a computer?
a. Arithmetic and Logic Unit
b. Memory Unit
c. Control Unit
d. Input/Output Unit
Solution: (c). The control unit acts as the nerve center of a computer and generates the necessary control signals required to execute an instruction.

4. An instruction ADD R1, A is stored at memory location 4004H. R1 is a processor register and A is a memory location with address 400CH. Each instruction is 32 bits long. What will be the values of PC, IR and MAR during execution of the instruction?
a. …
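A plausible worked answer to question 4, assuming byte-addressable memory and a single 4-byte instruction word: after the instruction is fetched, PC holds the address of the next instruction, 4004H + 4 = 4008H; IR holds the fetched 32-bit encoding of ADD R1, A itself; and MAR holds 400CH while the operand A is read from memory (during instruction fetch it briefly held 4004H).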
Hardware and Software Optimizations for GPU Resource Management
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy by Vishwesh Jatala, Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, India, December 2018.

Abstract: Graphics processing units (GPUs) are widely adopted across various domains due to their massive thread-level parallelism (TLP). The TLP that is present in GPUs is limited by the number of resident threads, which in turn depends on the available resources in the GPUs, such as registers and scratchpad memory. Recent GPUs aim to improve TLP, and consequently throughput, by increasing the number of resources. Further, improvements in semiconductor fabrication enable smaller feature sizes; however, at smaller feature sizes, leakage power becomes a significant part of total power consumption. In this thesis, we provide hardware and software solutions that aim at two problems of GPU design: improving throughput and reducing leakage energy. In the first work of the thesis, we focus on improving the performance of GPUs through effective resource management. In GPUs, resources (registers and scratchpad memory) are allocated at thread-block granularity; as a result, some of the resources may not be used up completely and are hence wasted. We propose an approach that shares the resources of an SM to utilize these wasted resources by launching more thread blocks in each SM, and we show the effectiveness of the approach for two resources: registers and scratchpad memory (shared memory). On evaluating our approach experimentally with 19 kernels from several benchmark suites, we observed that kernels that underutilize the register resource show an average improvement of 11% with register sharing.
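A back-of-the-envelope illustration of the waste being targeted, with capacities that are assumptions for the sake of arithmetic rather than the thesis's measured configuration: if an SM offers 48 KB of scratchpad memory and each thread block of a kernel needs 20 KB, then only floor(48/20) = 2 blocks can be resident, 8 KB of scratchpad sits idle, and TLP is capped at two blocks' worth of threads. Resource sharing of the kind proposed in the thesis aims to reclaim such otherwise-wasted fragments so that additional thread blocks can be launched.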
Register Are Used to Quickly Accept, Store, and Transfer Data And
Registers are used to quickly accept, store, and transfer data and instructions that are being used immediately by the CPU, and there are various types of registers, each used for a particular purpose. Among the most commonly used registers are the AC (accumulator), the DR (data register), the AR (address register), the program counter (PC), the memory data register (MDR), the index register, and the memory buffer register. These registers are used by the CPU while it performs operations: when we give some input to the system, the input is stored in registers, and when the system gives us results after processing, the results also come from registers.

Registers take part in:
1) Fetch: the fetch operation takes the instructions given by the user; the instructions stored in main memory are fetched by using registers.
2) Decode: the decode operation interprets the instructions, meaning the CPU finds out which operation is to be performed on the instructions.
3) Execute: the execute operation is performed by the CPU, and the results produced by the CPU are then stored in memory and displayed on the user's screen.

Types of registers are as follows:
1. MAR, the memory address register: this register holds the memory addresses of data and instructions …
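A toy C sketch of how PC, MAR, MDR, IR and the accumulator interact across the fetch-decode-execute cycle; the three-instruction ISA is invented purely for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* Invented toy ISA: top byte is the opcode, low 24 bits an address. */
enum { OP_LOAD = 1, OP_ADD = 2, OP_HALT = 3 };

static uint32_t memory[8] = {
    (OP_LOAD << 24) | 4,   /* 0: AC <- mem[4]      */
    (OP_ADD  << 24) | 5,   /* 1: AC <- AC + mem[5] */
    (OP_HALT << 24),       /* 2: stop              */
    0,
    30,                    /* 4: operand           */
    12,                    /* 5: operand           */
};

int main(void)
{
    uint32_t pc = 0, ir, mar, mdr, ac = 0;

    for (;;) {
        /* Fetch: MAR takes the instruction address, MDR receives the
         * word from memory, IR latches it, and PC advances. */
        mar = pc;
        mdr = memory[mar];
        ir  = mdr;
        pc += 1;                        /* word-addressed toy memory */

        /* Decode: split the instruction into opcode and address. */
        uint32_t op   = ir >> 24;
        uint32_t addr = ir & 0xFFFFFFu;

        /* Execute: operand accesses also pass through MAR and MDR. */
        if (op == OP_LOAD)     { mar = addr; mdr = memory[mar]; ac = mdr;  }
        else if (op == OP_ADD) { mar = addr; mdr = memory[mar]; ac += mdr; }
        else                   break;   /* OP_HALT */
    }
    printf("AC = %u\n", (unsigned)ac);  /* prints AC = 42 */
    return 0;
}
```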
Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture
Joseph Gebis, Leonid Oliker, John Shalf, Samuel Williams, Katherine Yelick. CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, and CS Division, University of California at Berkeley, Berkeley, CA 94720.

Abstract: The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software-controlled scratchpad memories, such as the Cell local store, attempt to ameliorate this discrepancy by enabling precise control over memory movement; however, scratchpad technology confronts the programmer and compiler with an unfamiliar and difficult programming model. In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective and practical approach to latency hiding. ViVA requires minimal changes to the core design and could thus be easily integrated with conventional processor cores. To validate our approach, we implemented ViVA on the Mambo cycle-accurate full-system simulator, which was carefully calibrated to match the performance of our underlying PowerPC Apple G5 architecture. Results show that ViVA is able to deliver significant performance benefits over scalar techniques for a variety of memory access patterns as well as two important memory-bound compact kernels, corner turn and sparse matrix-vector multiplication, achieving 2x-13x improvement compared to the scalar version. Overall, our preliminary ViVA exploration points to a promising approach for improving application performance on leading microprocessors with minimal design and complexity costs, in a power-efficient manner.
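For context, the sparse matrix-vector multiplication kernel mentioned above is memory-bound because of its indirect accesses. A standard compressed-sparse-row (CSR) formulation in C, not the paper's ViVA-accelerated version, looks like this:

```c
#include <stddef.h>

/* y = A*x for a sparse matrix A stored in CSR form. The indirect load
 * x[colidx[j]] defeats hardware prefetchers and caches, and is the
 * access pattern a vector-style scratchpad such as ViVA's would stage
 * explicitly. */
void spmv_csr(size_t nrows,
              const size_t *rowptr,   /* nrows + 1 entries */
              const size_t *colidx,   /* one per nonzero   */
              const double *vals,     /* one per nonzero   */
              const double *x,
              double *y)
{
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (size_t j = rowptr[i]; j < rowptr[i + 1]; j++)
            sum += vals[j] * x[colidx[j]];
        y[i] = sum;
    }
}
```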
IAR C/C++ Compiler Reference Guide for V850
Ninth edition, May 2013. Part number: CV850-9. This guide applies to version 4.x of IAR Embedded Workbench® for the Renesas V850 microcontroller family. © 1998-2013 IAR Systems AB. IAR Systems, IAR Embedded Workbench, C-SPY, and the IAR Systems logotype are trademarks or registered trademarks owned by IAR Systems AB; V850 is a trademark of Renesas Electronics Corporation.