Embedded Processors, Part One. January 11, 2002. By: Jim Turley
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
- Average Memory Access Time: Reducing Misses
Review: cache performance. From 332 Advanced Computer Architecture, Chapter 2: Caches and Memory Systems (January 2007, Paul H J Kelly). These lecture notes are partly based on the course text, Hennessy and Patterson's Computer Architecture: A Quantitative Approach (3rd and 4th eds), and on the lecture slides of David Patterson and John Kubiatowicz's Berkeley course.

Miss-oriented approach to memory access:

CPUtime = IC × (CPI_Execution + MemAccess/Inst × MissRate × MissPenalty) × CycleTime
CPUtime = IC × (CPI_Execution + MemMisses/Inst × MissPenalty) × CycleTime

where CPI_Execution includes ALU and memory instructions. Separating out the memory component entirely, with AMAT = average memory access time and CPI_AluOps excluding memory instructions:

CPUtime = IC × (AluOps/Inst × CPI_AluOps + MemAccess/Inst × AMAT) × CycleTime
AMAT = HitTime + MissRate × MissPenalty
     = (HitTime_Inst + MissRate_Inst × MissPenalty_Inst) + (HitTime_Data + MissRate_Data × MissPenalty_Data)

There are three ways to improve cache performance: 1. reduce the miss rate, 2. reduce the miss penalty, 3. reduce the hit time.

Classifying misses: the 3 Cs. Compulsory: the first access to a block cannot find it in the cache, so the block must be brought in; also called cold-start misses or first-reference misses (these occur even in an infinite cache). Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur due to blocks being discarded and later retrieved (misses in a fully associative cache of size X). Conflict: if the block-placement strategy is set-associative or direct-mapped, conflict misses (in addition to compulsory and capacity misses) occur when too many blocks map to the same set.
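A quick numeric check makes these formulas concrete. The sketch below is not from the lecture notes; all parameter values (miss rate, penalty, instruction mix, clock) are made-up illustrative numbers plugged into the AMAT and CPU-time equations above:

```c
#include <stdio.h>

/* Numeric sanity check of the AMAT and CPU-time formulas.
 * All parameter values are hypothetical, for illustration only. */
int main(void) {
    double hit_time     = 1.0;   /* cycles on a cache hit        */
    double miss_rate    = 0.05;  /* 5% of accesses miss          */
    double miss_penalty = 50.0;  /* cycles to service a miss     */

    /* AMAT = HitTime + MissRate * MissPenalty */
    double amat = hit_time + miss_rate * miss_penalty;  /* 3.5 cycles */
    printf("AMAT = %.2f cycles\n", amat);

    /* CPUtime = IC * (AluOps/Inst * CPI_AluOps + MemAccess/Inst * AMAT) * CycleTime */
    double ic           = 1e9;   /* instructions executed               */
    double cpi_alu      = 1.0;   /* CPI of non-memory instructions      */
    double alu_per_inst = 0.7;   /* ALU operations per instruction      */
    double mem_per_inst = 1.3;   /* 1 fetch + 0.3 data refs per instr.  */
    double cycle_time   = 1e-9;  /* 1 GHz clock                         */

    double cpu_time = ic * (alu_per_inst * cpi_alu
                          + mem_per_inst * amat) * cycle_time;
    printf("CPU time = %.2f s\n", cpu_time);  /* 5.25 s */
    return 0;
}
```

With a 5% miss rate and a 50-cycle penalty, a nominally 1-cycle cache costs 3.5 cycles per access on average, which is why reducing misses dominates the chapter.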
- Ece585 Lec2.Pdf

ECE 485/585 Microprocessor System Design. Lecture 2: Memory Addressing, 8086 Basics and Bus Timing, Asynchronous I/O Signaling. Zeshan Chishti, Electrical and Computer Engineering Dept, Maseeh College of Engineering and Computer Science. Source: lecture based on materials provided by Mark F. Basic I/O, Part I.

Outline for the next few lectures: a simple model of computation; memory addressing (alignment, byte order); the 8088/8086 bus; asynchronous I/O signaling; and a review of basic I/O: how I/O is performed (dedicated/isolated/direct I/O ports versus memory-mapped I/O), how we tell when an I/O device is ready or a command is complete (polling, interrupts), and how we transfer data (programmed I/O, DMA).

Simplified model of a computer: a microprocessor (fetch, decode, execute) connected by control, address, and data paths to memory and I/O devices (keyboard, mouse, video display, printer, hard disk drive, audio card, Ethernet, WiFi, CD R/W, DVD).

Memory addressing. Operand sizes: bytes, words, long/double words, quadwords; a 16-bit half word is an Intel word, a 32-bit word is an Intel doubleword (dword), and a 64-bit double word is an Intel quadword (qword). Note that the names are non-standard: a Sun SPARC word is 32 bits and a double is 64 bits. Alignment: can multi-byte operands begin at any byte address? If yes, they are non-aligned; if no, they are aligned, and the low-order address bit(s) will be zero. (Diagram: byte addresses 0x100-0x107 showing aligned and unaligned word, doubleword, and quadword operands in Intel IA terms, where word = 16 bits = 2 bytes; an aligned word address ends in binary 0, an aligned doubleword address in 00, an aligned quadword address in 000.) Why do we care? Unaligned memory references can cause multiple memory bus cycles for a single operand and may also span cache lines, requiring multiple evictions and multiple cache-line fills, which complicates memory-system and cache-controller design. Some architectures restrict addresses to be aligned. Even in architectures without alignment restrictions (e.g. ...)
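The alignment rule in the outline ("low-order address bits are zero") reduces to a bit test: a 2^k-byte operand is aligned exactly when the k low-order address bits are zero. A minimal sketch, with addresses chosen to match the 0x100-0x107 diagram above:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* An address is aligned for a power-of-two operand size when the
 * low-order bits selected by (size - 1) are all zero. */
static int is_aligned(uintptr_t addr, size_t size) {
    return (addr & (size - 1)) == 0;   /* size must be a power of two */
}

int main(void) {
    uintptr_t a = 0x100, b = 0x103;    /* sample byte addresses */

    printf("0x%lx word-aligned  (2B)? %d\n", (unsigned long)a, is_aligned(a, 2)); /* 1 */
    printf("0x%lx dword-aligned (4B)? %d\n", (unsigned long)a, is_aligned(a, 4)); /* 1 */
    printf("0x%lx qword-aligned (8B)? %d\n", (unsigned long)a, is_aligned(a, 8)); /* 1 */
    printf("0x%lx word-aligned  (2B)? %d\n", (unsigned long)b, is_aligned(b, 2)); /* 0 */
    return 0;
}
```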
- Mips 16 Bit Instruction Set

Instruction set architecture: MIPS. Designer: MIPS Technologies, Imagination Technologies. Bits: 64-bit (32 → 64). Introduced: 1985 (35 years ago). Version: MIPS32/64 Release 6 (2014). Design: RISC. Type: register-register. Encoding: fixed. Branching: compare and branch. Endianness: bi-endian. Page size: 4 KB. Extensions: MDMX, MIPS-3D. Open: partly; the R12000 has been on the market for more than 20 years and therefore cannot be subject to patent claims, so the R12000 and older processors are completely open. Registers: 32 general-purpose, 32 floating-point.

MIPS (Microprocessor without Interlocked Pipeline Stages) is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by MIPS Computer Systems, currently based in the United States. There are several versions of MIPS, including MIPS I, II, III, IV, and V, and five releases of MIPS32/64 (for 32- and 64-bit implementations, respectively). The early MIPS architectures were 32-bit only; the 64-bit versions were developed later. As of April 2017, the current version is MIPS32/64 Release 6. MIPS32/64 differs from MIPS I-V primarily in defining the privileged kernel-mode System Control Coprocessor in addition to the user-mode architecture. The MIPS architecture has several optional extensions: MIPS-3D, a simple set of floating-point SIMD instructions dedicated to common 3D tasks; MDMX (MaDMaX), a more extensive SIMD instruction set using the 64-bit floating-point registers; MIPS16e, which compresses the instruction stream so that programs take up less space; and MIPS MT, which adds multithreading capability. Computer architecture courses in universities and technical schools often study the MIPS architecture, which has had a major influence on later RISC architectures such as Alpha.
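The "fixed" encoding noted in the infobox means every classic 32-bit MIPS instruction carries its fields at fixed bit positions, so decoding is a handful of shifts and masks. A minimal sketch of R-type field extraction using the standard MIPS I layout (the example word encodes add $t0, $t1, $t2):

```c
#include <stdio.h>
#include <stdint.h>

/* Decode the fixed fields of a 32-bit MIPS R-type instruction word. */
int main(void) {
    uint32_t insn = 0x012A4020;  /* add $t0, $t1, $t2 */

    uint32_t opcode = (insn >> 26) & 0x3F;  /* bits 31..26: 0 for R-type */
    uint32_t rs     = (insn >> 21) & 0x1F;  /* bits 25..21: source reg    */
    uint32_t rt     = (insn >> 16) & 0x1F;  /* bits 20..16: source reg    */
    uint32_t rd     = (insn >> 11) & 0x1F;  /* bits 15..11: dest reg      */
    uint32_t shamt  = (insn >>  6) & 0x1F;  /* bits 10..6:  shift amount  */
    uint32_t funct  =  insn        & 0x3F;  /* bits 5..0:   0x20 = add    */

    printf("opcode=%u rs=%u rt=%u rd=%u shamt=%u funct=0x%x\n",
           opcode, rs, rt, rd, shamt, funct);
    return 0;
}
```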
- Avaya Aura® Communication Manager Hardware Description and Reference

Avaya Aura® Communication Manager Hardware Description and Reference, Release 7.0.1. Document 555-245-207, Issue 2, May 2016. © 2015-2016, Avaya, Inc. All Rights Reserved. The front matter carries Avaya's standard legal text. Notice: while reasonable efforts have been made to ensure that the information in the document is complete and accurate at the time of printing, Avaya assumes no liability for any errors and reserves the right to make changes and corrections without the obligation to notify any person or organization. Link disclaimer: Avaya is not responsible for the contents or reliability of any linked websites referenced within the site or documentation, is not responsible for the accuracy of any information, statement, or content provided on those sites, does not necessarily endorse the products, services, or information described or offered within them, and does not guarantee that the links will work all the time or have control over the availability of the linked pages. Licenses: the software license terms available on the Avaya website, HTTPS://SUPPORT.AVAYA.COM/LICENSEINFO, under the link "AVAYA SOFTWARE LICENSE TERMS (Avaya Products)" or such successor site as designated by Avaya, are applicable to anyone who downloads, uses, and/or installs Avaya software purchased from Avaya Inc., any Avaya affiliate, or an Avaya channel partner (as applicable) under a commercial agreement with Avaya or an Avaya channel partner. Warranty: Avaya provides a limited warranty on Avaya hardware and software; refer to your sales agreement for the terms. Avaya's standard warranty language, and information regarding support for this product while under warranty, is available to Avaya customers and other parties through the Avaya Support website: https://support.avaya.com/helpcenter/getGenericDetails?detailId=C20091120112456651010 under the link "Warranty & Product Lifecycle" or such successor site as designated by Avaya.
- A Programming Model and Processor Architecture for Heterogeneous Multicore Computers

A dissertation submitted to the Department of Electrical Engineering and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Michael D. Linderman, February 2009. © Copyright by Michael D. Linderman 2009. All rights reserved. Reading committee: Professor Teresa H. Meng (principal adviser), Professor Mark Horowitz, and Professor Krishna V. Shenoy, each certifying the dissertation as fully adequate in scope and quality; approved for the University Committee on Graduate Studies.

Abstract: Heterogeneous multicore computers, those systems that integrate specialized accelerators into and alongside multicore general-purpose processors (GPPs), provide the scalable performance needed by computationally demanding information processing (informatics) applications. However, these systems often feature instruction sets and functionality that significantly differ from GPPs and for which there is often little or no sophisticated compiler support. Consequently, developing applications for these systems is difficult and developer productivity is low. This thesis presents Merge, a general-purpose programming model for heterogeneous multicore systems. The Merge programming model enables the programmer to leverage different processor-specific or application domain-specific toolchains to create software modules specialized for different hardware configurations, and provides language mechanisms to enable the automatic mapping of processor-agnostic applications to these processor-specific modules.
- Reverse Engineering X86 Processor Microcode

Philipp Koppe, Benjamin Kollenda, Marc Fyrbiak, Christian Kison, Robert Gawlik, Christof Paar, and Thorsten Holz, Ruhr-Universität Bochum. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/koppe. Included in the Proceedings of the 26th USENIX Security Symposium, August 16-18, 2017, Vancouver, BC, Canada. ISBN 978-1-931971-40-9. Open access to the proceedings is sponsored by USENIX.

Abstract: Microcode is an abstraction layer on top of the physical components of a CPU and present in most general-purpose CPUs today. In addition to facilitating complex and vast instruction sets, it also provides an update mechanism that allows CPUs to be patched in place without requiring any special hardware. While it is well known that CPUs are regularly updated with this mechanism, very little is known about its inner workings, given that microcode and the update mechanism are proprietary and have not been thoroughly analyzed yet.

From the introduction: ... hardware modifications [48]. Dedicated hardware units to counter bugs are imperfect [36, 49] and involve non-negligible hardware costs [8]. The infamous Pentium fdiv bug [62] illustrated a clear economic need for field updates after deployment in order to turn off defective parts and patch erroneous behavior. Note that the implementation of a modern processor involves millions of lines of HDL code [55], and verification of functional correctness for such processors is still an unsolved problem [4, 29]. Since the 1970s, x86 processor manufacturers have ...
- Computer Architectures An Overview

PDF generated using the open source mwlib toolkit (see http://code.pediapress.com/ for more information), Sat, 25 Feb 2012 22:35:32 UTC. Contents: Microarchitecture; x86; PowerPC; IBM POWER; MIPS architecture; SPARC; ARM architecture; DEC Alpha; AlphaStation; AlphaServer; Very long instruction word; Instruction-level parallelism; Explicitly parallel instruction computing. References: Article Sources and Contributors; Image Sources, Licenses and Contributors; Article Licenses.

Microarchitecture: In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design.

Relation to instruction set architecture: The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, and address and data formats, among other things. (Figure: the Intel Core microarchitecture.) The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers to complete arithmetic logic units (ALUs) and even larger elements.
- Modern Processor Design: Fundamentals of Superscalar

Modern Processor Design: Fundamentals of Superscalar Processors. John Paul Shen, Intel Corporation; Mikko H. Lipasti, University of Wisconsin. Waveland Press, Inc., Long Grove, Illinois. Dedication: to our parents, Paul and Sue Shen, Tarja and Simo Lipasti; our spouses, Amy C. Shen and Erica Ann Lipasti; our children, Priscilla S. Shen, Rachael S. Shen, and Valentia C. Shen, Emma Kristiina Lipasti and Elias Joel Lipasti. For information about this book, contact: Waveland Press, Inc., 4180 IL Route 83, Suite 101, Long Grove, IL 60047-9580, (847) 634-0081, info@waveland.com, www.waveland.com. Copyright © 2005 by John Paul Shen and Mikko H. Lipasti; reissued 2013 by Waveland Press, Inc. 10-digit ISBN 1-4786-0783-1, 13-digit ISBN 978-1-4786-0783-0. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without permission in writing from the publisher. Printed in the United States of America.

Table of contents (opening chapters): Preface; About the Authors; 1 Processor Design: 1.1 The Evolution of Microprocessors; 1.2 Instruction Set Processor Design (1.2.1 Digital Systems Design; 1.2.2 Architecture, Implementation, and Realization; 1.2.3 Instruction Set Architecture; 1.2.4 Dynamic-Static Interface); 1.3 Principles of Processor Performance (1.3.1 Processor Performance Equation; 1.3.2 Processor Performance Optimizations; 1.3.3 Performance Evaluation Method); 1.4 Instruction-Level Parallel Processing (1.4.1 From Scalar to Superscalar; 1.4.2 Limits of Instruction-Level Parallelism; 1.4.3 Machines for Instruction-Level ...); 1.5 Summary.
- CHERI Concentrate: Practical Compressed Capabilities

Jonathan Woodruff, Alexandre Joannou, Hongyan Xia, Anthony Fox, Robert Norton, Thomas Bauereiss, David Chisnall, Brooks Davis, Khilan Gudka, Nathaniel W. Filardo, A. Theodore Markettos, Michael Roe, Peter G. Neumann, Robert N. M. Watson, Simon W. Moore.

Abstract: We present CHERI Concentrate, a new fat-pointer compression scheme applied to CHERI, the most developed capability-pointer system at present. Capability fat pointers are a primary candidate to enforce fine-grained and non-bypassable security properties in future computer systems, although increased pointer size can severely affect performance. Several proposals for capability compression have been suggested elsewhere, but they do not support legacy instruction sets, ignore features critical to the existing software base, and introduce design inefficiencies into RISC-style processor pipelines. CHERI Concentrate improves on the state-of-the-art region-encoding efficiency, solves important pipeline problems, and eases semantic restrictions of compressed encoding, allowing it to protect a full legacy software stack. We present the first quantitative analysis of compiled capability code, which we use to guide the design of the encoding format. We analyze and extend logic from the open-source CHERI prototype processor design on FPGA to demonstrate encoding efficiency, minimize delay of pointer arithmetic, and eliminate additional load-to-use delay. To verify correctness of our proposed high-performance logic, we present a HOL4 machine-checked proof of the decode and pointer-modify operations. Finally, we measure a 50% to 75% reduction in L2 misses for many compiled C-language benchmarks running under a commodity operating system using compressed 128-bit and 64-bit formats, demonstrating both compatibility with and increased performance over the uncompressed, 256-bit format.
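To see what the paper is compressing away, consider what an uncompressed capability must carry. The struct below is an illustrative fat-pointer sketch only; the field names and widths are assumptions for exposition and are not the CHERI or CHERI Concentrate bit layout:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Illustrative fat-pointer capability, NOT the CHERI Concentrate format.
 * Carrying full 64-bit base and length alongside the address is what
 * pushes an uncompressed capability toward 256 bits. */
typedef struct {
    uint64_t address;  /* the pointer itself (the "cursor")      */
    uint64_t base;     /* lowest address the pointer may touch   */
    uint64_t length;   /* size of the authorized region          */
    uint32_t perms;    /* load/store/execute permission bits     */
    bool     tag;      /* validity bit; held out of band in CHERI */
} capability_t;

/* Bounds check of the kind performed on every dereference. */
static bool cap_in_bounds(const capability_t *c, uint64_t nbytes) {
    return c->tag
        && c->address >= c->base
        && c->address + nbytes <= c->base + c->length;
}

int main(void) {
    capability_t c = { .address = 0x1000, .base = 0x1000,
                       .length = 256, .perms = 0x3, .tag = true };
    printf("8-byte load at base in bounds? %d\n", cap_in_bounds(&c, 8));  /* 1 */
    c.address = 0x10F8;
    printf("16-byte load at end in bounds? %d\n", cap_in_bounds(&c, 16)); /* 0 */
    return 0;
}
```

Compressed formats avoid storing base and length in full by encoding bounds relative to the address, which is how the 128-bit and 64-bit variants described in the abstract become possible.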
- Production Cluster Visualization: Experiences and Challenges

Workshop on Parallel Visualization and Graphics. Randall Frank, VIEWS Visualization Project Lead, Lawrence Livermore National Laboratory. UCRL-PRES-200237.

Why COTS distributed visualization clusters? The realities of extreme dataset sizes (10 TB+): data is stored with the compute platform, copying it cannot be afforded, so visualization must be co-resident. Clusters also track compute-platform trends (distributed infrastructure, commodity hardware trends, cost-effective solutions) and the migration of graphics leadership to the PC (gamers), desktops, and display technologies (HDR, resolution, stereo, tiled, etc.).

Production visualization requirements: unique, aggressive I/O requirements (access patterns and performance); generation of graphical primitives (graphics computation, primitive extraction/computation, dataset decomposition, e.g. slabs vs. chunks); rendering of primitives (aggregation of multiple rendering engines); video displays (routing of digital or video tiles to displays over distance); interactivity, since this is not a render farm (real-time imagery; interaction devices and a human in the loop, raising latency and prediction issues); and scheduling (of both systems and people).

Visualization environment architecture: an archive and a simulation platform feed data through an analog video switch and a GigE switch to a visualization engine and a data manipulation engine, driving a PowerWall and offices. Raw data stays on platform disks and archive systems; the data manipulation ...
- Introduction to Playstation®2 Architecture

Introduction to PlayStation®2 Architecture. James Russell, Software Engineer, SCEE Technology Group. In this presentation: a company overview, a PlayStation 2 architecture overview, PS2 game development, and differences between PS2 and PC.

1) Sony Computer Entertainment overview: SCE America, SCE Europe (includes Australia, New Zealand, the Middle East, and Southern Africa), and SCE Japan. Sales: 40 million units sold worldwide since launch, in Japan since March 2000 and in Europe/US since November 2000, with new markets in the Middle East, India, Korea, and China; the long-term aim is 100 million within 5 years of launch; production facilities can produce 2 million per month.

Design considerations: over 5 years, 100,000,000 PS2s will be made, so design is very important; the console must be inexpensive (or should become that way) while the technology stays ahead of the curve: high performance at low price. How to achieve this? Processor yield: a high CPU clock speed means lower yields, so the solution is a low CPU clock speed with high parallelism. With nothing readily available, SCE designs custom chips.

2) Technical aspects of PlayStation 2: a 128-bit CPU core ("Emotion Engine") plus 2 independent vector units, an image processing unit (for MPEG), the GS ("Graphics Synthesizer") GPU, the SPU2 sound processing unit, and an I/O processor (CD/DVD, USB, i.Link). Emotion Engine specifications: 128-bit CPU core; 300 MHz system clock; 3.2 GB/sec bus bandwidth; 32 MB main memory (Direct Rambus); 6.2 GFLOPS floating-point performance; 66 million polygons/sec 3D geometry performance.
- R00456--FM Getting up to Speed

Getting Up to Speed: The Future of Supercomputing. Susan L. Graham, Marc Snir, and Cynthia A. Patterson, editors. Committee on the Future of Supercomputing, Computer Science and Telecommunications Board, Division on Engineering and Physical Sciences. The National Academies Press, 500 Fifth Street, N.W., Washington, DC 20001; www.nap.edu.

Notice: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance. Support for the project was provided by the Department of Energy under Sponsor Award No. DE-AT01-03NA00106. Any opinions, findings, conclusions, or recommendations expressed in the publication are those of the authors and do not necessarily reflect the views of the organizations that provided support for the project. International Standard Book Number 0-309-09502-6 (book), 0-309-54679-6 (PDF). Library of Congress Catalog Card Number 2004118086.

Cover designed by Jennifer Bishop. Cover images (clockwise from top right, front to back): 1. Exploding star, Scientific Discovery through Advanced Computing (SciDAC) Center for Supernova Research, U.S. Department of Energy, Office of Science. 2. Hurricane Frances, September 5, 2004, taken by GOES-12 satellite, 1 km visible imagery, U.S. National Oceanographic and Atmospheric Administration. 3. Large-eddy simulation of a Rayleigh-Taylor instability run on the Lawrence Livermore National Laboratory MCR Linux cluster in July 2003.