Lessons from the ARM Architecture
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance
White Paper Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance Ofri Wechsler Intel Fellow, Mobility Group Director, Mobility Microprocessor Architecture Intel Corporation White Paper Inside Intel®Core™ Microarchitecture Introduction Introduction 2 The Intel® Core™ microarchitecture is a new foundation for Intel®Core™ Microarchitecture Design Goals 3 Intel® architecture-based desktop, mobile, and mainstream server multi-core processors. This state-of-the-art multi-core optimized Delivering Energy-Efficient Performance 4 and power-efficient microarchitecture is designed to deliver Intel®Core™ Microarchitecture Innovations 5 increased performance and performance-per-watt—thus increasing Intel® Wide Dynamic Execution 6 overall energy efficiency. This new microarchitecture extends the energy efficient philosophy first delivered in Intel's mobile Intel® Intelligent Power Capability 8 microarchitecture found in the Intel® Pentium® M processor, and Intel® Advanced Smart Cache 8 greatly enhances it with many new and leading edge microar- Intel® Smart Memory Access 9 chitectural innovations as well as existing Intel NetBurst® microarchitecture features. What’s more, it incorporates many Intel® Advanced Digital Media Boost 10 new and significant innovations designed to optimize the Intel®Core™ Microarchitecture and Software 11 power, performance, and scalability of multi-core processors. Summary 12 The Intel Core microarchitecture shows Intel’s continued Learn More 12 innovation by delivering both greater energy efficiency Author Biographies 12 and compute capability required for the new workloads and usage models now making their way across computing. With its higher performance and low power, the new Intel Core microarchitecture will be the basis for many new solutions and form factors. In the home, these include higher performing, ultra-quiet, sleek and low-power computer designs, and new advances in more sophisticated, user-friendly entertainment systems. -
POWER-AWARE MICROARCHITECTURE: Design and Modeling Challenges for Next-Generation Microprocessors
POWER-AWARE MICROARCHITECTURE: Design and Modeling Challenges for Next-Generation Microprocessors THE ABILITY TO ESTIMATE POWER CONSUMPTION DURING EARLY-STAGE DEFINITION AND TRADE-OFF STUDIES IS A KEY NEW METHODOLOGY ENHANCEMENT. OPPORTUNITIES FOR SAVING POWER CAN BE EXPOSED VIA MICROARCHITECTURE-LEVEL MODELING, PARTICULARLY THROUGH CLOCK- GATING AND DYNAMIC ADAPTATION. Power dissipation limits have Thus far, most of the work done in the area David M. Brooks emerged as a major constraint in the design of high-level power estimation has been focused of microprocessors. At the low end of the per- at the register-transfer-level (RTL) description Pradip Bose formance spectrum, namely in the world of in the processor design flow. Only recently have handheld and portable devices or systems, we seen a surge of interest in estimating power Stanley E. Schuster power has always dominated over perfor- at the microarchitecture definition stage, and mance (execution time) as the primary design specific work on power-efficient microarchi- Hans Jacobson issue. Battery life and system cost constraints tecture design has been reported.2-8 drive the design team to consider power over Here, we describe the approach of using Prabhakar N. Kudva performance in such a scenario. energy-enabled performance simulators in Increasingly, however, power is also a key early design. We examine some of the emerg- Alper Buyuktosunoglu design issue in the workstation and server mar- ing paradigms in processor design and com- kets (see Gowan et al.)1 In this high-end arena ment on their inherent power-performance John-David Wellman the increasing microarchitectural complexities, characteristics. clock frequencies, and die sizes push the chip- Victor Zyuban level—and hence the system-level—power Power-performance efficiency consumption to such levels that traditionally See the “Power-performance fundamentals” Manish Gupta air-cooled multiprocessor server boxes may box. -
ARM Architecture
ARM Architecture Comppgzuter Organization and Assembly ygg Languages Yung-Yu Chuang with slides by Peng-Sheng Chen, Ville Pietikainen ARM history • 1983 developed by Acorn computers – To replace 6502 in BBC computers – 4-man VLSI design team – Its simp lic ity comes from the inexper ience team – Match the needs for generalized SoC for reasonable power, performance and die size – The first commercial RISC implemenation • 1990 ARM (Advanced RISC Mac hine ), owned by Acorn, Apple and VLSI ARM Ltd Design and license ARM core design but not fabricate Why ARM? • One of the most licensed and thus widespread processor cores in the world – Used in PDA, cell phones, multimedia players, handheld game console, digital TV and cameras – ARM7: GBA, iPod – ARM9: NDS, PSP, Sony Ericsson, BenQ – ARM11: Apple iPhone, Nokia N93, N800 – 90% of 32-bit embedded RISC processors till 2009 • Used especially in portable devices due to its low power consumption and reasonable performance ARM powered products ARM processors • A simple but powerful design • A whlhole filfamily of didesigns shiharing siilimilar didesign principles and a common instruction set Naming ARM •ARMxyzTDMIEJFS – x: series – y: MMU – z: cache – T: Thumb – D: debugger – M: Multiplier – I: EmbeddedICE (built-in debugger hardware) – E: Enhanced instruction – J: Jazell e (JVM) – F: Floating-point – S: SthiiblSynthesizible version (source code version for EDA tools) Popular ARM architectures •ARM7TDMI – 3 pipe line stages (ft(fetc h/deco de /execu te ) – High code density/low power consumption – One of the most used ARM-version (for low-end systems) – All ARM cores after ARM7TDMI include TDMI even if they do not include TDMI in their labels • ARM9TDMI – Compatible with ARM7 – 5 stages (fe tc h/deco de /execu te /memory /wr ite ) – Separate instruction and data cache •ARM11 ARM family comparison year 1995 1997 1999 2003 ARM is a RISC • RISC: simple but powerful instructions that execute within a single cycle at high clock speed. -
Hardware Architecture
Hardware Architecture Components Computing Infrastructure Components Servers Clients LAN & WLAN Internet Connectivity Computation Software Storage Backup Integration is the Key ! Security Data Network Management Computer Today’s Computer Computer Model: Von Neumann Architecture Computer Model Input: keyboard, mouse, scanner, punch cards Processing: CPU executes the computer program Output: monitor, printer, fax machine Storage: hard drive, optical media, diskettes, magnetic tape Von Neumann architecture - Wiki Article (15 min YouTube Video) Components Computer Components Components Computer Components CPU Memory Hard Disk Mother Board CD/DVD Drives Adaptors Power Supply Display Keyboard Mouse Network Interface I/O ports CPU CPU CPU – Central Processing Unit (Microprocessor) consists of three parts: Control Unit • Execute programs/instructions: the machine language • Move data from one memory location to another • Communicate between other parts of a PC Arithmetic Logic Unit • Arithmetic operations: add, subtract, multiply, divide • Logic operations: and, or, xor • Floating point operations: real number manipulation Registers CPU Processor Architecture See How the CPU Works In One Lesson (20 min YouTube Video) CPU CPU CPU speed is influenced by several factors: Chip Manufacturing Technology: nm (2002: 130 nm, 2004: 90nm, 2006: 65 nm, 2008: 45nm, 2010:32nm, Latest is 22nm) Clock speed: Gigahertz (Typical : 2 – 3 GHz, Maximum 5.5 GHz) Front Side Bus: MHz (Typical: 1333MHz , 1666MHz) Word size : 32-bit or 64-bit word sizes Cache: Level 1 (64 KB per core), Level 2 (256 KB per core) caches on die. Now Level 3 (2 MB to 8 MB shared) cache also on die Instruction set size: X86 (CISC), RISC Microarchitecture: CPU Internal Architecture (Ivy Bridge, Haswell) Single Core/Multi Core Multi Threading Hyper Threading vs. -
Microcontroller Serial Interfaces
Microcontroller Serial Interfaces Dr. Francesco Conti [email protected] Microcontroller System Architecture Each MCU (micro-controller unit) is characterized by: • Microprocessor • 8,16,32 bit architecture • Usually “simple” in-order microarchitecture, no FPU Example: STM32F101 MCU Microcontroller System Architecture Each MCU (micro-controller unit) is characterized by: • Microprocessor • 8,16,32 bit architecture • Usually “simple” in-order microarchitecture, no FPU • Memory • RAM (from 512B to 256kB) • FLASH (from 512B to 1MB) Example: STM32F101 MCU Microcontroller System Architecture Each MCU (micro-controller unit) is characterized by: • Microprocessor • 8,16,32 bit architecture • Usually “simple” in-order microarchitecture, no FPU • Memory • RAM (from 512B to 256kB) • FLASH (from 512B to 1MB) • Peripherals • DMA • Timer • Interfaces • Digital Interfaces • Analog Timer DMAs Example: STM32F101 MCU Microcontroller System Architecture Each MCU (micro-controller unit) is characterized by: • Microprocessor • 8,16,32 bit architecture • Usually “simple” in-order microarchitecture, no FPU • Memory • RAM (from 512B to 256kB) • FLASH (from 512B to 1MB) • Peripherals • DMA • Timer • Interfaces • Digital • Analog • Interconnect Example: STM32F101 MCU • AHB system bus (ARM-based MCUs) • APB peripheral bus (ARM-based MCUs) Microcontroller System Architecture Each MCU (micro-controller unit) is characterized by: • Microprocessor • 8,16,32 bit architecture • Usually “simple” in-order microarchitecture, no FPU • Memory • RAM (from 512B to 256kB) • FLASH -
Realtime Capabilities of Low-End Powerpc and ARM Boards for Embedded Systems
Realtime capabilities of low-end PowerPC and ARM boards for embedded systems Alexander Bauer PHYTEC Messtechnik Gmbh Robert-Koch-Str.39, 55129 Mainz, Germany [email protected] Abstract With the stepwise integration of the Realtime Preemption Patches (RT-Preempt) into the Mainline Linux kernel and their support for architectures other than Intel and AMD, there are now a number of choices which board to use for a particular embedded realtime project running Mainline Linux. In order to select the appropriate processor and clock frequency, it would be desirable to have some generally applicable ranges of worst-case latencies that can be obtained using the various processor types and conditions. We, therefore, determined the internal worst-case latency of PowerPC and ARM boards running Linux 2.6.20 and above patched with RT-Preempt. The PowerPC-board (Phytec phyCORE-MPC5200B) was running at 266 and 400 MHz, the ARM board (Phytec phyCORE-PXA270) was running at 266 and 520 MHz. This article will provide the details of the various measurement set-ups, present the results and discuss them with respect to the design differences between PowerPC and ARM. 1 Introduction This paper presents the results of the latency tests and discusses the results with respect of the different In the embedded market there is a wide range of processor designs. processors to choose from. A processor is typically selected for a customer design because of it features, e.g. video interface and peripherals, and the clock 2 Latency Tests frequency. With the growing importance of Linux and especially realtime Linux for customer designs For the latency tests based on MPC5200 we used in the embedded market, it is also essential to choose the PHYTEC phyCORE MPC5200 board with 400 the right processor that will cope with the realtime MHz as a reference platform. -
Reverse Engineering X86 Processor Microcode
Reverse Engineering x86 Processor Microcode Philipp Koppe, Benjamin Kollenda, Marc Fyrbiak, Christian Kison, Robert Gawlik, Christof Paar, and Thorsten Holz, Ruhr-University Bochum https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/koppe This paper is included in the Proceedings of the 26th USENIX Security Symposium August 16–18, 2017 • Vancouver, BC, Canada ISBN 978-1-931971-40-9 Open access to the Proceedings of the 26th USENIX Security Symposium is sponsored by USENIX Reverse Engineering x86 Processor Microcode Philipp Koppe, Benjamin Kollenda, Marc Fyrbiak, Christian Kison, Robert Gawlik, Christof Paar, and Thorsten Holz Ruhr-Universitat¨ Bochum Abstract hardware modifications [48]. Dedicated hardware units to counter bugs are imperfect [36, 49] and involve non- Microcode is an abstraction layer on top of the phys- negligible hardware costs [8]. The infamous Pentium fdiv ical components of a CPU and present in most general- bug [62] illustrated a clear economic need for field up- purpose CPUs today. In addition to facilitate complex and dates after deployment in order to turn off defective parts vast instruction sets, it also provides an update mechanism and patch erroneous behavior. Note that the implementa- that allows CPUs to be patched in-place without requiring tion of a modern processor involves millions of lines of any special hardware. While it is well-known that CPUs HDL code [55] and verification of functional correctness are regularly updated with this mechanism, very little is for such processors is still an unsolved problem [4, 29]. known about its inner workings given that microcode and the update mechanism are proprietary and have not been Since the 1970s, x86 processor manufacturers have throughly analyzed yet. -
RISC-V Geneology
RISC-V Geneology Tony Chen David A. Patterson Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2016-6 http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-6.html January 24, 2016 Copyright © 2016, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Introduction RISC-V is an open instruction set designed along RISC principles developed originally at UC Berkeley1 and is now set to become an open industry standard under the governance of the RISC-V Foundation (www.riscv.org). Since the instruction set architecture (ISA) is unrestricted, organizations can share implementations as well as open source compilers and operating systems. Designed for use in custom systems on a chip, RISC-V consists of a base set of instructions called RV32I along with optional extensions for multiply and divide (RV32M), atomic operations (RV32A), single-precision floating point (RV32F), and double-precision floating point (RV32D). The base and these four extensions are collectively called RV32G. This report discusses the historical precedents of RV32G. We look at 18 prior instruction set architectures, chosen primarily from earlier UC Berkeley RISC architectures and major proprietary RISC instruction sets. Among the 122 instructions in RV32G: ● 6 instructions do not have precedents among the selected instruction sets, ● 98 instructions of the 116 with precedents appear in at least three different instruction sets. -
Intel(R) Software Guard Extensions Developer Guide
Intel® Software Guard Extensions (Intel® SGX) Developer Guide Intel(R) Software Guard Extensions Developer Guide Legal Information No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, sched- ule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current char- acterized errata are available on request. Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at Intel.com, or from the OEM or retailer. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.in- tel.com/design/literature.htm. Intel, the Intel logo, Xeon, and Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel micro- processors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. -
Itanium® 2 Processor Microarchitecture Overview
Itanium® 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy Hot Chips 14, August 2002 Itanium® 2 Processor Overview 16KB16KB L1L1 I-cacheI-cache BranchBranch PredictPredict 2 bundles (128 bits each) InstrInstr 22 InstrInstr 11 InstrInstr 00 TemplateTemplate Issue up to 6 instructions to the 11 pipes FF FF M/AM/A M/AM/A M/AM/A M/AM/A I/AI/A I/AI/A BB BB BB 22 FMACsFMACs 66 ALUsALUs BranchBranch UnitUnit 6 Opnds 2 Preds 16KB16KB L1L1 D-cacheD-cache 12 Opnds 2 Results 4 Preds 6 Results 2 Loads, 2 Stores 6 Predicates 128 FP GPRs 128 Int GPRs Consumed 1 Predicate 3 Predicates 128 FP GPRs 4 Predicates 128 Int GPRs 12 Predicates Produced Block Diagram 64 Predicate Registers 4 Loads 64 Predicate Registers 2 Stores 256KB256KB L2L2 CacheCache 3MB3MB L3L3 CacheCache BusBus InterfaceInterface Hot Chips 14 Itanium® 2 Processor Overview EXEEXE DETDET WRBWRB IPGIPG ROTROT EXPEXP RENREN REGREG FP1FP1 FP2FP2 FP3FP3 FP4FP4 WRBWRB IPG: Instruction Pointer Generate, Instruction address to L1 I-cache ROT: Present 2 Instruction Bundles from L1 I-cache to dispersal hardware EXP: Disperse up to 6 instruction syllables from the 2 instruction bundles REN: Rename (or convert) virtual register IDs to physical register IDs REG: Register file read, or bypass results in flight as operands EXE: Execute integer instructions; generate results and predicates DET: Detect exceptions, traps, etc. FP1-4: Execute floating point instructions; generate results and predicates Main Execution Unit Pipeline WRB: Write back results to the register file -
Design of the RISC-V Instruction Set Architecture
Design of the RISC-V Instruction Set Architecture Andrew Waterman Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2016-1 http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-1.html January 3, 2016 Copyright © 2016, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor David Patterson, Chair Professor Krste Asanovi´c Associate Professor Per-Olof Persson Spring 2016 Design of the RISC-V Instruction Set Architecture Copyright 2016 by Andrew Shell Waterman 1 Abstract Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman Doctor of Philosophy in Computer Science University of California, Berkeley Professor David Patterson, Chair The hardware-software interface, embodied in the instruction set architecture (ISA), is arguably the most important interface in a computer system. Yet, in contrast to nearly all other interfaces in a modern computer system, all commercially popular ISAs are proprietary. -
Microarchitecture-Level Soc Design 27 Young-Hwan Park, Amin Khajeh, Jun Yong Shin, Fadi Kurdahi, Ahmed Eltawil, and Nikil Dutt
Microarchitecture-Level SoC Design 27 Young-Hwan Park, Amin Khajeh, Jun Yong Shin, Fadi Kurdahi, Ahmed Eltawil, and Nikil Dutt Abstract In this chapter we consider the issues related to integrating microarchitectural IP blocks into complex SoCs while satisfying performance, power, thermal, and reliability constraints. We first review different abstraction levels for SoC design that promote IP reuse, and which enable fast simulation for early functional validation of the SoC platform. Since SoCs must satisfy a multitude of interrelated constraints, we then present high-level power, thermal, and reliability models for predicting these constraints. These constraints are not unrelated and their interactions must be considered, modeled and evaluated. Once constraints are modeled, we must explore the design space trading off performance, power and reliability. Several case studies are presented illustrating how the design space can be explored across layers, and what modifications could be applied at design time and/or runtime to deal with reliability issues that may arise. Acronyms AHB Advanced High-performance Bus APB Advanced Peripheral Bus ASIC Application-Specific Integrated Circuit BER Bit Error Rate BLB Bit Lock Block Y.-H. Park Digital Media and Communications R&D Center, Samsung Electronics, Seoul, Korea e-mail: [email protected] A. Khajeh Broadcom Corp., San Jose, CA, USA e-mail: [email protected] J. Yong Shin • F. Kurdahi () • A. Eltawil • N. Dutt Center for Embedded and Cyber-Physical Systems, University of California Irvine, Irvine, CA, USA e-mail: [email protected]; [email protected]; [email protected]; [email protected] © Springer Science+Business Media Dordrecht 2017 867 S.