CPU Architecture Slides
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A 1024-Core 70GFLOPS/W Floating Point Manycore Microprocessor
A 1024-core 70GFLOPS/W Floating Point Manycore Microprocessor Andreas Olofsson, Roman Trogan, Oleg Raikhman Adapteva, Lexington, MA The Past, Present, & Future of Computing SIMD MIMD PE PE PE PE MINI MINI MINI CPU CPU CPU PE PE PE PE MINI MINI MINI CPU CPU CPU PE PE PE PE MINI MINI MINI CPU CPU CPU MINI MINI MINI BIG BIG CPU CPU CPU CPU CPU BIG BIG BIG BIG CPU CPU CPU CPU PAST PRESENT FUTURE 2 Adapteva’s Manycore Architecture C/C++ Programmable Incredibly Scalable 70 GFLOPS/W 3 Routing Architecture 4 E64G400 Specifications (Jan-2012) • 64-Core Microprocessor • 100 GFLOPS performance • 800 MHz Operation • 8GB/sec IO bandwidth • 1.6 TB/sec on chip memory BW • 0.8 TB/sec network on chip BW • 64 Billion Messages/sec IO Pads Core • 2 Watt total chip power • 2MB on chip memory Link Logic • 10 mm2 total silicon area • 324 ball 15x15mm plastic BGA 5 Lab Measurements 80 Energy Efficiency 70 60 50 GFLOPS/W 40 30 20 10 0 0 200 400 600 800 1000 1200 MHz ENERGY EFFICIENCY ENERGY EFFICIENCY (28nm) 6 Epiphany Performance Scaling 16,384 G 4,096 F 1,024 L 256 O 64 4096 P 1024 S 16 256 64 4 16 1 # Cores On‐demand scaling from 0.25W to 64 Watt 7 Hold on...the title said 1024 cores! • We can build it any time! • Waiting for customer • LEGO approach to design • No global timinga paths • Guaranteed by design • Generate any array in 1 day • ~130 mm2 silicon area 1024 Cores 1Core 8 What about 64-bit Floating Point? Single Precision Double Precision 2 FLOPS/CYCLE 2 FLOPS/CYCLE 64KB SRAM 64KB SRAM 0.215mm^2 0.237mm^2 700MHz 600MHz 9 Epiphany Latency Specifications -
Comparison of 116 Open Spec, Hacker Friendly Single Board Computers -- June 2018
Comparison of 116 Open Spec, Hacker Friendly Single Board Computers -- June 2018 Click on the product names to get more product information. In most cases these links go to LinuxGizmos.com articles with detailed product descriptions plus market analysis. HDMI or DP- USB Product Price ($) Vendor Processor Cores 3D GPU MCU RAM Storage LAN Wireless out ports Expansion OSes 86Duino Zero / Zero Plus 39, 54 DMP Vortex86EX 1x x86 @ 300MHz no no2 128MB no3 Fast no4 no5 1 headers Linux Opt. 4GB eMMC; A20-OLinuXino-Lime2 53 or 65 Olimex Allwinner A20 2x A7 @ 1GHz Mali-400 no 1GB Fast no yes 3 other Linux, Android SATA A20-OLinuXino-Micro 65 or 77 Olimex Allwinner A20 2x A7 @ 1GHz Mali-400 no 1GB opt. 4GB NAND Fast no yes 3 other Linux, Android Debian Linux A33-OLinuXino 42 or 52 Olimex Allwinner A33 4x A7 @ 1.2GHz Mali-400 no 1GB opt. 4GB NAND no no no 1 dual 40-pin 3.4.39, Android 4.4 4GB (opt. 16GB A64-OLinuXino 47 to 88 Olimex Allwinner A64 4x A53 @ 1.2GHz Mali-400 MP2 no 1GB GbE WiFi, BT yes 1 40-pin custom Linux eMMC) Banana Pi BPI-M2 Berry 36 SinoVoip Allwinner V40 4x A7 Mali-400 MP2 no 1GB SATA GbE WiFi, BT yes 4 Pi 40 Linux, Android 8GB eMMC (opt. up Banana Pi BPI-M2 Magic 21 SinoVoip Allwinner A33 4x A7 Mali-400 MP2 no 512MB no Wifi, BT no 2 Pi 40 Linux, Android to 64GB) 8GB to 64GB eMMC; Banana Pi BPI-M2 Ultra 56 SinoVoip Allwinner R40 4x A7 Mali-400 MP2 no 2GB GbE WiFi, BT yes 4 Pi 40 Linux, Android SATA Banana Pi BPI-M2 Zero 21 SinoVoip Allwinner H2+ 4x A7 @ 1.2GHz Mali-400 MP2 no 512MB no no WiFi, BT yes 1 Pi 40 Linux, Android Banana -
MIPS Architecture • MIPS (Microprocessor Without Interlocked Pipeline Stages) • MIPS Computer Systems Inc
Spring 2011 Prof. Hyesoon Kim MIPS Architecture • MIPS (Microprocessor without interlocked pipeline stages) • MIPS Computer Systems Inc. • Developed from Stanford • MIPS architecture usages • 1990’s – R2000, R3000, R4000, Motorola 68000 family • Playstation, Playstation 2, Sony PSP handheld, Nintendo 64 console • Android • Shift to SOC http://en.wikipedia.org/wiki/MIPS_architecture • MIPS R4000 CPU core • Floating point and vector floating point co-processors • 3D-CG extended instruction sets • Graphics – 3D curved surface and other 3D functionality – Hardware clipping, compressed texture handling • R4300 (embedded version) – Nintendo-64 http://www.digitaltrends.com/gaming/sony- announces-playstation-portable-specs/ Not Yet out • Google TV: an Android-based software service that lets users switch between their TV content and Web applications such as Netflix and Amazon Video on Demand • GoogleTV : search capabilities. • High stream data? • Internet accesses? • Multi-threading, SMP design • High graphics processors • Several CODEC – Hardware vs. Software • Displaying frame buffer e.g) 1080p resolution: 1920 (H) x 1080 (V) color depth: 4 bytes/pixel 4*1920*1080 ~= 8.3MB 8.3MB * 60Hz=498MB/sec • Started from 32-bit • Later 64-bit • microMIPS: 16-bit compression version (similar to ARM thumb) • SIMD additions-64 bit floating points • User Defined Instructions (UDIs) coprocessors • All self-modified code • Allow unaligned accesses http://www.spiritus-temporis.com/mips-architecture/ • 32 64-bit general purpose registers (GPRs) • A pair of special-purpose registers to hold the results of integer multiply, divide, and multiply-accumulate operations (HI and LO) – HI—Multiply and Divide register higher result – LO—Multiply and Divide register lower result • a special-purpose program counter (PC), • A MIPS64 processor always produces a 64-bit result • 32 floating point registers (FPRs). -
Developer Guide
Red Hat Enterprise Linux 6 Developer Guide An introduction to application development tools in Red Hat Enterprise Linux 6 Last Updated: 2017-10-20 Red Hat Enterprise Linux 6 Developer Guide An introduction to application development tools in Red Hat Enterprise Linux 6 Robert Krátký Red Hat Customer Content Services [email protected] Don Domingo Red Hat Customer Content Services Jacquelynn East Red Hat Customer Content Services Legal Notice Copyright © 2016 Red Hat, Inc. and others. This document is licensed by Red Hat under the Creative Commons Attribution-ShareAlike 3.0 Unported License. If you distribute this document, or a modified version of it, you must provide attribution to Red Hat, Inc. and provide a link to the original. If the document is modified, all Red Hat trademarks must be removed. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux ® is the registered trademark of Linus Torvalds in the United States and other countries. Java ® is a registered trademark of Oracle and/or its affiliates. XFS ® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries. MySQL ® is a registered trademark of MySQL AB in the United States, the European Union and other countries. Node.js ® is an official trademark of Joyent. -
Superh RISC Engine SH-1/SH-2
SuperH RISC Engine SH-1/SH-2 Programming Manual September 3, 1996 Hitachi America Ltd. Notice When using this document, keep the following in mind: 1. This document may, wholly or partially, be subject to change without notice. 2. All rights are reserved: No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without Hitachi’s permission. 3. Hitachi will not be held responsible for any damage to the user that may result from accidents or any other reasons during operation of the user’s unit according to this document. 4. Circuitry and other examples described herein are meant merely to indicate the characteristics and performance of Hitachi’s semiconductor products. Hitachi assumes no responsibility for any intellectual property claims or other problems that may result from applications based on the examples described herein. 5. No license is granted by implication or otherwise under any patents or other rights of any third party or Hitachi, Ltd. 6. MEDICAL APPLICATIONS: Hitachi’s products are not authorized for use in MEDICAL APPLICATIONS without the written consent of the appropriate officer of Hitachi’s sales company. Such use includes, but is not limited to, use in life support systems. Buyers of Hitachi’s products are requested to notify the relevant Hitachi sales offices when planning to use the products in MEDICAL APPLICATIONS. Introduction The SuperH RISC engine family incorporates a RISC (Reduced Instruction Set Computer) type CPU. A basic instruction can be executed in one clock cycle, realizing high performance operation. A built-in multiplier can execute multiplication and addition as quickly as DSP. -
Survey and Benchmarking of Machine Learning Accelerators
1 Survey and Benchmarking of Machine Learning Accelerators Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, and Jeremy Kepner MIT Lincoln Laboratory Supercomputing Center Lexington, MA, USA freuther,pmichaleas,michael.jones,vijayg,sid,[email protected] Abstract—Advances in multicore processors and accelerators components play a major role in the success or failure of an have opened the flood gates to greater exploration and application AI system. of machine learning techniques to a variety of applications. These advances, along with breakdowns of several trends including Moore’s Law, have prompted an explosion of processors and accelerators that promise even greater computational and ma- chine learning capabilities. These processors and accelerators are coming in many forms, from CPUs and GPUs to ASICs, FPGAs, and dataflow accelerators. This paper surveys the current state of these processors and accelerators that have been publicly announced with performance and power consumption numbers. The performance and power values are plotted on a scatter graph and a number of dimensions and observations from the trends on this plot are discussed and analyzed. For instance, there are interesting trends in the plot regarding power consumption, numerical precision, and inference versus training. We then select and benchmark two commercially- available low size, weight, and power (SWaP) accelerators as these processors are the most interesting for embedded and Fig. 1. Canonical AI architecture consists of sensors, data conditioning, mobile machine learning inference applications that are most algorithms, modern computing, robust AI, human-machine teaming, and users (missions). Each step is critical in developing end-to-end AI applications and applicable to the DoD and other SWaP constrained users. -
Analysing CMS Software Performance Using Igprof, Oprofile and Callgrind
Analysing CMS software performance using IgProf, OPro¯le and callgrind L Tuura1, V Innocente2, G Eulisse1 1 Northeastern University, Boston, MA, USA 2 CERN, Geneva, Switzerland Abstract. The CMS experiment at LHC has a very large body of software of its own and uses extensively software from outside the experiment. Understanding the performance of such a complex system is a very challenging task, not the least because there are extremely few developer tools capable of pro¯ling software systems of this scale, or producing useful reports. CMS has mainly used IgProf, valgrind, callgrind and OPro¯le for analysing the performance and memory usage patterns of our software. We describe the challenges, at times rather extreme ones, faced as we've analysed the performance of our software and how we've developed an understanding of the performance features. We outline the key lessons learnt so far and the actions taken to make improvements. We describe why an in-house general pro¯ler tool still ends up besting a number of renowned open-source tools, and the improvements we've made to it in the recent year. 1. The performance optimisation issue The Compact Muon Solenoid (CMS) experiment on the Large Hadron Collider at CERN [1] is expected to start running within a year. The computing requirements for the experiment are immense. For the ¯rst year of operation, 2008, we budget 32.5M SI2k (SPECint 2000) computing capacity. This corresponds to some 2650 servers if translated into the most capable computers currently available to us.1 It is critical CMS on the one hand acquires the resources needed for the planned physics analyses, and on the other hand optimises the software to complete the task with the resources available. -
SPORK: a Summarization Pipeline for Online Repositories of Knowledge
SPORK: A SUMMARIZATION PIPELINE FOR ONLINE REPOSITORIES OF KNOWLEDGE A Thesis presented to the Faculty of California Polytechnic State University San Luis Obispo In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science by Steffen Lyngbaek June 2013 c 2013 Steffen Lyngbaek ALL RIGHTS RESERVED ii COMMITTEE MEMBERSHIP TITLE: SPORK: A Summarization Pipeline for Online Repositories of Knowledge AUTHOR: Steffen Lyngbaek DATE SUBMITTED: June 2013 COMMITTEE CHAIR: Professor Alexander Dekhtyar, Ph.D., De- parment of Computer Science COMMITTEE MEMBER: Professor Franz Kurfess, Ph.D., Depar- ment of Computer Science COMMITTEE MEMBER: Professor Foaad Khosmood, Ph.D., Depar- ment of Computer Science iii Abstract SPORK: A Summarization Pipeline for Online Repositories of Knowledge Steffen Lyngbaek The web 2.0 era has ushered an unprecedented amount of interactivity on the Internet resulting in a flood of user-generated content. This content is of- ten unstructured and comes in the form of blog posts and comment discussions. Users can no longer keep up with the amount of content available, which causes developers to start relying on natural language techniques to help mitigate the problem. Although many natural language processing techniques have been em- ployed for years, automatic text summarization, in particular, has recently gained traction. This research proposes a graph-based, extractive text summarization system called SPORK (Summarization Pipeline for Online Repositories of Knowl- edge). The goal of SPORK is to be able to identify important key topics presented in multi-document texts, such as online comment threads. While most other automatic summarization systems simply focus on finding the top sentences rep- resented in the text, SPORK separates the text into clusters, and identifies dif- ferent topics and opinions presented in the text. -
Master's Thesis: Adaptive Core Assignment for Adapteva Epiphany
Adaptive core assignment for Adapteva Epiphany Master of Science Thesis in Embedded Electronic System Design Erik Alveflo Chalmers University of Technology Department of Computer Science and Engineering G¨oteborg, Sweden 2015 The Author grants to Chalmers University of Technology and University of Gothenburg the non-exclusive right to publish the Work electronically and in a non-commercial pur- pose make it accessible on the Internet. The Author warrants that he/she is the author to the Work, and warrants that the Work does not contain text, pictures or other mate- rial that violates copyright law. The Author shall, when transferring the rights of the Work to a third party (for example a publisher or a company), acknowledge the third party about this agreement. If the Author has signed a copyright agreement with a third party regarding the Work, the Author warrants hereby that he/she has obtained any necessary permission from this third party to let Chalmers University of Technology and University of Gothenburg store the Work electronically and make it accessible on the Internet. Adaptive core assignment for Adapteva Epiphany Erik Alveflo, c Erik Alveflo, 2015. Examiner: Per Larsson-Edefors Chalmers University of Technology Department of Computer Science and Engineering SE-412 96 G¨oteborg Sweden Telephone + 46 (0)31-772 1000 Department of Computer Science and Engineering G¨oteborg, Sweden 2015 Adaptive core assignment for Adapteva Epiphany Erik Alveflo Department of Computer Science and Engineering Chalmers University of Technology Abstract The number of cores in many-core processors is ever increasing, and so is the number of defects due to manufacturing variations and wear-out mechanisms. -
The 32-Bit PA-RISC Run- Time Architecture Document
The 32-bit PA-RISC Run- time Architecture Document HP-UX 10.20 Version 3.0 (c) Copyright 1985-1997 HEWLETT-PACKARD COMPANY. The information contained in this document is subject to change without notice. HEWLETT-PACKARD MAKES NO WARRANTY OF ANY KIND WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OFMERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Hewlett-Packard shall not be liable for errors contained herein or for incidental or consequential damages in connection with furnishing, performance, or use of this material. Hewlett-Packard assumes no responsibility for the use or reliability of its software on equipment that is not furnished by Hewlett-Packard. This document contains proprietary information which is protected by copyright. All rights are reserved. No part of this document may be photocopied, reproduced, or translated to another language without the prior written consent of Hewlett- Packard Company. CSO/STG/STD/CLO Hewlett-Packard Company 11000 Wolfe Road Cupertino, California 95014 By The Run-time Architecture Team CHAPTER 1 Introduction This document describes the runtime architecture for PA-RISC systems running either the HP-UX or the MPE/iX operating system. Other operating systems running on PA-RISC may also use this runtime architecture or a variant of it. The runtime architecture defines all the conventions and formats necessary to compile, link, and execute a program on one of these operating systems. Its purpose is to ensure that object modules produced by many different compilers can be linked together into a single application, and to specify the interfaces between compilers and linker, and between linker and operating system. -
Design of the RISC-V Instruction Set Architecture
Design of the RISC-V Instruction Set Architecture Andrew Waterman Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2016-1 http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-1.html January 3, 2016 Copyright © 2016, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor David Patterson, Chair Professor Krste Asanovi´c Associate Professor Per-Olof Persson Spring 2016 Design of the RISC-V Instruction Set Architecture Copyright 2016 by Andrew Shell Waterman 1 Abstract Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman Doctor of Philosophy in Computer Science University of California, Berkeley Professor David Patterson, Chair The hardware-software interface, embodied in the instruction set architecture (ISA), is arguably the most important interface in a computer system. Yet, in contrast to nearly all other interfaces in a modern computer system, all commercially popular ISAs are proprietary. -
Understanding the Linux Kernel, 3Rd Edition by Daniel P
1 Understanding the Linux Kernel, 3rd Edition By Daniel P. Bovet, Marco Cesati ............................................... Publisher: O'Reilly Pub Date: November 2005 ISBN: 0-596-00565-2 Pages: 942 Table of Contents | Index In order to thoroughly understand what makes Linux tick and why it works so well on a wide variety of systems, you need to delve deep into the heart of the kernel. The kernel handles all interactions between the CPU and the external world, and determines which programs will share processor time, in what order. It manages limited memory so well that hundreds of processes can share the system efficiently, and expertly organizes data transfers so that the CPU isn't kept waiting any longer than necessary for the relatively slow disks. The third edition of Understanding the Linux Kernel takes you on a guided tour of the most significant data structures, algorithms, and programming tricks used in the kernel. Probing beyond superficial features, the authors offer valuable insights to people who want to know how things really work inside their machine. Important Intel-specific features are discussed. Relevant segments of code are dissected line by line. But the book covers more than just the functioning of the code; it explains the theoretical underpinnings of why Linux does things the way it does. This edition of the book covers Version 2.6, which has seen significant changes to nearly every kernel subsystem, particularly in the areas of memory management and block devices. The book focuses on the following topics: • Memory management, including file buffering, process swapping, and Direct memory Access (DMA) • The Virtual Filesystem layer and the Second and Third Extended Filesystems • Process creation and scheduling • Signals, interrupts, and the essential interfaces to device drivers • Timing • Synchronization within the kernel • Interprocess Communication (IPC) • Program execution Understanding the Linux Kernel will acquaint you with all the inner workings of Linux, but it's more than just an academic exercise.