Using Dynamic Binary Instrumentation to Generate Multi-Platform Simpoints

Total Page:16

File Type:pdf, Size:1020Kb

Using Dynamic Binary Instrumentation to Generate Multi-Platform Simpoints Using Dynamic Binary Instrumentation to Generate Multi-Platform SimPoints Vincent M. Weaver and Sally A. McKee Cornell University 29 January 2008 Breakdown of Benchmark Methodology Used in Major Conferences HPCA, ISCA, MICRO ISCA (1995-2004) (2006-2007) Yi et al HPCA 2005 Run Complete (simulator) 18% 19% Run Multiple SimPoints 0% 4% Run SMARTS 0% 11% Run One SimPoint 0% 30% Fast Forward X, Run Z 27% 30% Run Reduced (train, MinneSPEC) 19% 22% Run Z 23% 4% Fusion 1 Phase Behavior | twolf 5 4 3 2 1 L1 D-Cache 0 Miss Rate (%) 2 1 CPI 0 0 1000 2000 3000 Instruction Interval (100M) 300.twolf default Fusion 2 Phase Behavior | mcf 30 20 10 L1 D-Cache 0 Miss Rate (%) 8 6 4 CPI 2 0 0 200 400 600 Instruction Interval (100M) 181.mcf default Fusion 3 Phase Behavior | gcc.200 30 20 10 L1 D-Cache 0 Miss Rate (%) 10 CPI 5 0 0 200 400 600 Instruction Interval (100M) 176.gcc 200 Fusion 4 SimPoint Basic Block Vectors • Instrument every Basic Block • Increment unique counter on entry to each BB • Save list of BBs and frequencies periodically These BBVs are used by SimPoint utility to determine phase behavior Fusion 5 Dynamic Binary Instrumentation • Instruments each BB at first execution • Caches instrumented code • Runs faster than simulation, slower than native Fusion 6 Previous Tools for Generating BBVs • Atom | only Alpha with Tru64 UNIX • Pin | only Intel architectures (IA32, Intel 64, IA64, XScale) • Simulator mods | only simulator specific (SLOW) Fusion 7 Goal { Expand Applicability of SimPoints • Support more architectures • Use existing DBI tools • Validate generated BBVs • Generate cross-platform BBV files Fusion 8 Solution: Valgrind and Qemu Pin Valgrind itanium x86 x86_64 arm ppc alpha sh4 m68k sparc mips hppa cris Qemu Fusion 9 Pin • Intel proprietary DBI tool: IA32, IA64, Intel 64, Xscale (Windows, Linux, OSX) • Plugins are written in C++ • Average slowdown of 15x on SPEC 2000 Fusion 10 Qemu • Open Source • Full-system simulation or syscall-by-proxy • Translates from arbitrary ISAs via DBI • Runs ARM, IA32, Intel 64, MIPS, PPC, SPARC, HPPA, Etrax CRIS, Alpha, sh4, m68k • Average slowdown of 28x on SPEC 2000 Fusion 11 Valgrind • Open Source, external plugin interface • Originally designed to find memory access violations • Translates to own IR, instruments, retranslates back to native ISA • Runs on IA32, Intel 64 and PPC platforms (Linux and AIX) • Average slowdown of 39x on SPEC 2000 Fusion 12 Validation Systems machine processor memory L1 I/D L2/L3 Cache nestle 400MHz Pentium II 256MB 16KB/16KB 512KB spruengli 550MHz Pentium III 512MB 16KB/16KB 512KB itanium 800MHz Itanium (x86 mode) 1GB 16KB/16KB 96KB/3MB chocovic 1.66GHz Core Duo 1GB 32KB/32KB 1MB milka 1.733MHz Athlon MP 512MB 64KB/64KB 256KB gallais 1.8GHz Pentium 4 256MB 12Ku/16KB 256KB jennifer 2GHz Athlon64 X2 1GB 64KB/64KB 512KB sampaka12 2.8GHz Pentium 4 2GB 12Ku/16KB 512KB domori25 3.46GHz Pentium D 4GB 12Ku/16KB 2MB Fusion 13 SPEC CPU 2000 40 46.7% 50.1% 40.6% 44.9% 82.6% 50.8% 69.1% 55.3% 54.4% 54.7% 43.1% 46.2% 40.3% 30 20 CPI Error (%) 10 0 nestle spruengli itanium chocovic milka gallais jennifer sampaka12 domori25 Pentium II Pentium III Itanium Core Duo Athlon Pentium 4 Athlon 64 Pentium 4 Pentium D First 100M Pin, one SimPoint Pin, up to 10 SimPoints Pin, up to 20 SimPoints Ffwd 1B, 100M Qemu, one SimPoint Qemu, up to 10 SimPoints Qemu, up to 20 SimPoints Valgrind, one SimPoint Valgrind, up to 10 SimPoints Valgrind, up to 20 SimPoints Fusion 14 Integer Results, up to 20 SimPoints, Pentium D Pin Qemu Valgrind 10 11.8 19.8 19.3 12.9 22.8 19.9 10.7 5 0 SPEC CPU 2000 Breakdown { Pentium D CPI Error (%) -5 -10 -109 eon.cook gcc.expr gcc.166gcc.200 gzip.log vortex.1vortex.2vortex.3vpr.routevpr.place eon.kajiyagap.default gcc.scilab gzip.sourcemcf.default bzip2.graphicbzip2.sourcecrafty.default gcc.integrate gzip.graphicgzip.programgzip.random parser.default perlbmk.535perlbmk.704perlbmk.957perlbmk.850twolf.default bzip2.program eon.rushmeier perlbmk.diffmailperlbmk.perfect perlbmk.makerand FP Results, up to 20 SimPoints, Pentium D Pin Qemu Valgrind 10 17.5 10.8 5 0 CPI Error (%) -5 -10 -15.5 art.110 art.470 apsi.default ammp.defaultapplu.default fma3d.defaultgalgel.defaultlucas.defaultmesa.defaultmgrid.default swim.default equake.defaultfacerec.default sixtrack.default wupwise.default Fusion 15 SPEC CPU 2006 40 63.7% 110.1% 42.6% 124.7% 110.0% 30 20 CPI Error (%) 10 0 chocovic - Core Duo sampaka12 - Pentium 4 domori25 - Pentium D jennifer - Athlon 64 First 100M Pin, one SimPoint Pin, up to 10 SimPoints Pin, up to 20 SimPoints Ffwd 1 B, 100M Qemu, one SimPoint Qemu, up to 10 SimPoints Qemu, up to 20 SimPoints Valgrind, one SimPoint Valgrind, up to 10 SimPoints Valgrind, up to 20 SimPoints machine processor memory L1 I/D L2 Cache chocovic 1.66GHz Core Duo 1GB 32KB/32KB 1MB jennifer 2GHz Athlon64 X2 1GB 64KB/64KB 512KB sampaka12 2.8GHz Pentium 4 2GB 12Ku/16KB 512KB domori25 3.46GHz Pentium D 4GB 12Ku/16KB 2MB Fusion 16 Integer Results, up to 20 SimPoints, Pentium D Pin Qemu Valgrind 10 12.1 13.0 11.3 10.711.5 13.1 15.6 5 0 SPEC CPU 2006 Breakdown { Pentium D CPI Error (%) -5 -10 -11.4 -12.6 -36.3 -12.8 mcf sjeng gcc.166gcc.200 gcc.expr gcc.g23gcc.s04 omnetpp astar.rivers bzip2.html gcc.cp-declgcc.expr2 gcc.scilab libquantum xalancbmk bzip2.sourcebzip2.chickenbzip2.liberty gcc.c-typeck gobmk.13x13gobmk.nngs hmmer.nph3hmmer.retro astar.BigLakes bzip2.programbzip2.combined gobmk.score2gobmk.trevorcgobmk.trevord h264ref.fore_baseh264ref.fore_mainh264ref.sss_main perlbench.diffmailperlbench.splitmail perlbench.checkspam FP Results, up to 20 SimPoints, Pentium D Pin Qemu Valgrind 10 5 0 n/a CPI Error (%) -5 -10 -20.5 -33.25 lbm wrf dealII milc namd tonto bwaves calculix leslie3d povray sphinx3 gromacs soplex.ref zeusmp cactusADM GemsFDTD gamess.h2ocu2 soplex.pds-50 gamess.cytosinegamess.triazolium Fusion 17 Results { Average CPI error • SPEC2000: 10 SimPoints (<0.4% of reference inputs) ◦ Pin 5.32% ◦ Qemu 5.04% ◦ Valgrind 5.38% • SPEC2006: 10 SimPoints (<0.06% of reference inputs) ◦ Pin 5.58% ◦ Qemu 5.30% ◦ Valgrind 5.28% Fusion 18 Future Work • Generating multi-platform results { MIPS BBV file from IA32 Qemu • Running non-Linux binaries (Solaris, IRIX) • Generating OS-aware SimPoints { Qemu runs full OS Fusion 19 Tools Available for Download All code is available from: http://fusion.csl.cornell.edu/tools/ Fusion 20 Questions? Feedback? All code is available from: http://fusion.csl.cornell.edu/tools/ [email protected] | http://www.csl.cornell.edu/~vince [email protected] | http://www.csl.cornell.edu/~sam Fusion 21.
Recommended publications
  • Comparison of Contemporary Real Time Operating Systems
    ISSN (Online) 2278-1021 IJARCCE ISSN (Print) 2319 5940 International Journal of Advanced Research in Computer and Communication Engineering Vol. 4, Issue 11, November 2015 Comparison of Contemporary Real Time Operating Systems Mr. Sagar Jape1, Mr. Mihir Kulkarni2, Prof.Dipti Pawade3 Student, Bachelors of Engineering, Department of Information Technology, K J Somaiya College of Engineering, Mumbai1,2 Assistant Professor, Department of Information Technology, K J Somaiya College of Engineering, Mumbai3 Abstract: With the advancement in embedded area, importance of real time operating system (RTOS) has been increased to greater extent. Now days for every embedded application low latency, efficient memory utilization and effective scheduling techniques are the basic requirements. Thus in this paper we have attempted to compare some of the real time operating systems. The systems (viz. VxWorks, QNX, Ecos, RTLinux, Windows CE and FreeRTOS) have been selected according to the highest user base criterion. We enlist the peculiar features of the systems with respect to the parameters like scheduling policies, licensing, memory management techniques, etc. and further, compare the selected systems over these parameters. Our effort to formulate the often confused, complex and contradictory pieces of information on contemporary RTOSs into simple, analytical organized structure will provide decisive insights to the reader on the selection process of an RTOS as per his requirements. Keywords:RTOS, VxWorks, QNX, eCOS, RTLinux,Windows CE, FreeRTOS I. INTRODUCTION An operating system (OS) is a set of software that handles designed known as Real Time Operating System (RTOS). computer hardware. Basically it acts as an interface The motive behind RTOS development is to process data between user program and computer hardware.
    [Show full text]
  • SIMD Extensions
    SIMD Extensions PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC Contents Articles SIMD 1 MMX (instruction set) 6 3DNow! 8 Streaming SIMD Extensions 12 SSE2 16 SSE3 18 SSSE3 20 SSE4 22 SSE5 26 Advanced Vector Extensions 28 CVT16 instruction set 31 XOP instruction set 31 References Article Sources and Contributors 33 Image Sources, Licenses and Contributors 34 Article Licenses License 35 SIMD 1 SIMD Single instruction Multiple instruction Single data SISD MISD Multiple data SIMD MIMD Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism. History The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1] The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel.
    [Show full text]
  • Software De Sistemas Introduccion
    SOFTWARE DE SISTEMAS INTRODUCCION • EL CONCEPTO DE SOFTWARE VA MAS ALLA DE LOS PROGRAMAS DE COMPUTACION EN SUS DISTINTOS ESTADOS QUE ABARCA TODO LO INTANGIBLE RELACIONADO Algodesl.wordpress.com t i ¿QUE ES? p o s d e s o f t w a r e . c o m • Parte esencial para clasificar sistemas operativos • Conocido como software base • Conjunto de programas de software que permite interactuar con el hardware y operarlo junto con configuraciones y funciones de entrada y salida ¿Que hace? • Permite utilizar sistema operativo e informático, incluyendo herramientas de diagnostico • Su propósito es aislar un programa o hardware de aplicaciones tanto como sea posible proyectoova.webcindario.com Ejemplos de programas de software de sistema a d • Potenciales ejemplos: r ia n ▪ cargadores a l is s e ▪ Enlazadores t .b lo ▪ Utilidad de software g s p o ▪ Interfaz grafica t ▪ Celdas ▪ Bios ▪ Hipervisores ▪ Gestores de arranque losejemplos.com *si el software del sistema se almacena en memoria volátil se denomina firware Tipos • Como tal no existen varios tipos de software de sistema pero se pueden dividir en 3; O k • Sistema operativo h o s • t Controladores i n g • . Programas de utilería c o m Tipos; SISTEMA OPERATIVO • Parte que se encarga de administración de hardware como los componentes de computadora y encargado de que todos se unan para funcionar en 1 solo objetivo m i n d 4 2 . c o m SISTEMAS OPERATIVOS • MICROSOFT WINDOWS ▪ Núcleo: monolítico (versiones basadas en MS­DOS) e hibrido (versiones basadas en Windows NT) ▪ Plataformas: ARM, arquitectura Intel, MIPS, Alpha, Power PC SISTEMAS OPERATIVOS • MAC OS ▪ Núcleo: XNU basado en mach y BSD, es tipo hibrido ▪ Plataformas; power PC SISTEMAS OPERATIVOS • LINUX: ▪ Núcleo: núcleo Linux ▪ Plataformas; DEC Alpha, ARM, POWER PC, superH, SPARC, ETRAX CRIS, MIPS, MN103… etc.
    [Show full text]
  • Generic Pipelined Processor Modeling and High Performance
    Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation Mehrdad Reshadi, Nikil Dutt Center for Embedded Computer Systems (CECS), Donald Bren School of Information and Computer Science, University of California Irvine, CA 92697, USA. {reshadi, dutt}@cecs.uci.edu simulators were more limited or slower than their manually generated Abstract counterparts. Detailed modeling of processors and high performance cycle- Colored Petri Net (CPN) [1] is a very powerful and flexible accurate simulators are essential for today’s hardware and software modeling technique and has been successfully used for describing design. These problems are challenging enough by themselves and parallelism, resource sharing and synchronization. It can naturally have seen many previous research efforts. Addressing both capture most of the behavioral elements of instruction flow in a simultaneously is even more challenging, with many existing processor. However, CPN models of realistic processors are very approaches focusing on one over another. In this paper, we propose complex mostly due to incompatibility of a token-based mechanism for the Reduced Colored Petri Net (RCPN) model that has two capturing data hazards. Such complexity reduces the productivity and advantages: first, it offers a very simple and intuitive way of modeling results in very slow simulators. In this paper, we present Reduced pipelined processors; second, it can generate high performance cycle- Colored Petri Net (RCPN), a generic modeling approach for accurate simulators. RCPN benefits from all the useful features of generating fast cycle-accurate simulators for pipelined processors. Colored Petri Nets without suffering from their exponential growth in RCPN is based on CPN and reduces the modeling complexity by complexity.
    [Show full text]
  • Improving Performance of Virtual Machines by Virtio Bridge Bypass
    www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 6 Issue 4 April 2017, Page No. 20931-20937 Index Copernicus value (2015): 58.10 DOI: 10.18535/ijecs/v6i4.24 Improving performance of Virtual Machines by Virtio bridge Bypass for PCI devices 1Shirley Kotian, 2Kirti Menon, 3Kirti Menon, 4Utsav Mundada, 5Neeraj Vilas Auti 1234PICT, 5PICT [CS] ABSTRACT Inspired by the Virtio module of virtualization, we propose an alternate method to directly communicate with PCI devices such as NIC without the use of any kernel modules. This method uses a specialized module written by us which will avoid the mechanism of bridges like the ones used in Virtio that increase latency. This module will be present in the userspace of the guest OS and we are specifically targeting the e1000 device for this purpose and later plan to make it generic for all PCI devices. Our motivation is to avoid unnecessary communication with the kernel which slows down the system. For the first step. we do resource mapping to map the PCI device memory into userspace. Then, we expose PCI configuration space through a userspace module using ACPI cables. Thus, we create a userspace PCI driver which will decrease the latency in access time and increase speed of execution. The applications in the Guest OS that request communication with the PCI devices will be redirected to our application. This will take some load off the kernel and reduce its overhead. Finally, we boot a VM that actually talks to our PCI device emulator. GENERAL TERMS Virtio, UPCI, e1000, QEMU, virt-manager, UIO.
    [Show full text]
  • Virtualization for Embedded Software
    JETS VIRTUALIZATION FOR EMBEDDED SOFTWARE Seasoned aerospace organizations have already weathered the VIRTUALIZATION WITH JETS: change from bespoke systems and proprietary languages to COTS A KEY ENABLER OF DEVOPS TRANSFORMATION and open-source, as well as paradigm shifts from high-overhead FOR AEROSPACE AND DEFENSE waterfall and spiral development models to lightweight and agile methodologies and are considering moving to scalable agile Like all companies in the technology industry, aerospace and defense methods. As with previous seismic shifts in the industry, most suppliers are facing increased pressure to reduce cost and increase defense and aerospace firms will have to follow, especially as best-of- productivity, while meeting their market’s demand for higher breed tools, talent and processes are all refactored to take advantage overall quality and rapid adoption of new platforms, development of DevOps’ unique productivity and quality improvements. DevOps paradigms and user experiences. is a collective term for a range of modern development practices that combines loosely-coupled architectures and process automation In recent years, the rapid pace of innovation for consumer tech, with changes to the structure of development, IT and product teams. together with a new generation of ‘digital native’ users that demand Adoption of DevOps (See Figure 1). more capability than previous generations, has left many in the industry playing constant catch-up. Adoption of DevOps includes a shift from long, structured release cycles to continuous delivery, and relies on heavy use of automation to improve productivity and quality. This can be complicated by The commercial software world is shifting again, several factors common to the defense and aerospace industry.
    [Show full text]
  • Demystifying Internet of Things Security Successful Iot Device/Edge and Platform Security Deployment — Sunil Cheruvu Anil Kumar Ned Smith David M
    Demystifying Internet of Things Security Successful IoT Device/Edge and Platform Security Deployment — Sunil Cheruvu Anil Kumar Ned Smith David M. Wheeler Demystifying Internet of Things Security Successful IoT Device/Edge and Platform Security Deployment Sunil Cheruvu Anil Kumar Ned Smith David M. Wheeler Demystifying Internet of Things Security: Successful IoT Device/Edge and Platform Security Deployment Sunil Cheruvu Anil Kumar Chandler, AZ, USA Chandler, AZ, USA Ned Smith David M. Wheeler Beaverton, OR, USA Gilbert, AZ, USA ISBN-13 (pbk): 978-1-4842-2895-1 ISBN-13 (electronic): 978-1-4842-2896-8 https://doi.org/10.1007/978-1-4842-2896-8 Copyright © 2020 by The Editor(s) (if applicable) and The Author(s) This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book’s Creative Commons license, unless indicated otherwise in a credit line to the material.
    [Show full text]
  • IXP43X Product Line of Network Processors Specification Update December 2008 2 Order Number: 316847; Revision: 005US Contents
    Intel® IXP43X Product Line of Network Processors Specification Update December 2008 Order Number: 316847; Revision: 005US INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
    [Show full text]
  • Intel® Itanium® Architecture Assembly Language Reference Guide
    Intel® Itanium® Architecture Assembly Language Reference Guide Copyright © 2000 - 2003 Intel Corporation. All rights reserved. Order Number: 248801-004 World Wide Web: http://developer.intel.com Intel(R) Itanium(R) Architecture Assembly Lanuage Reference Guide Page 2 Disclaimer and Legal Information Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications. This Intel® Itanium® Architecture Assembly Language Reference Guide as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Intel® 80219 General Purpose PCI Processor Based on Intel Xscale® Microarchitecture Adds Performance and Feature Integration at a Low Cost
    Product Brief Intel® 80219 General Purpose PCI Processor Based on Intel XScale® Microarchitecture Adds Performance and Feature Integration at a Low Cost Product Highlights I 32-bit high-performance CPU (400, 600 MHz) based upon Intel XScale® microarchitecture I Integrated 64-bit PCI/PCI-X interface (PCI 2.2, PCI-X 1.0a), 32-bit PCI supported I 200 MHz DDR SDRAM with ECC with support for up to 1 GB of 64-bit memory (32-bit mode supported) I Intel® Superpipelined RISC Technology (7-stage integer, 8-stage memory) Single-Chip General Purpose I 32 KB data cache, 32 KB instruction cache PCI Processor Based on ® I 2 KB mini-data cache Intel XScale Technology I ARM* Version 5TE compliant Intel® 80219 General Purpose PCI Processor I Programmable speed 32-bit local bus with Intel XScale microarchitecture is a single- (33, 66, or 100 MHz)/Flash I/F chip solution that opens the door to cost-sensitive, I 200 MHz internal bus delivers high-performance embedded application use. ~1.6 GB/s bandwidth The 64-bit PCI-X interface is one of several I 2-channel DMA controller (1024-byte registers) enticing features. The 80219 processor’s 64-bit I 2 I2C ports PCI-X interface is fully backward compatible to I Watchdog timer, 2 programmable timers PCI 2.2 standards enabling easy interface to (Auto-reload, programmed duration, lower-cost PCI components today, and the PCI-X selectable prescaling) 1.0a compatibility makes it simple to upgrade to I 4096-byte ATU buffers PCI-X in the future.
    [Show full text]
  • Network Processors: Building Block for Programmable Networks
    NetworkNetwork Processors:Processors: BuildingBuilding BlockBlock forfor programmableprogrammable networksnetworks Raj Yavatkar Chief Software Architect Intel® Internet Exchange Architecture [email protected] 1 Page 1 Raj Yavatkar OutlineOutline y IXP 2xxx hardware architecture y IXA software architecture y Usage questions y Research questions Page 2 Raj Yavatkar IXPIXP NetworkNetwork ProcessorsProcessors Control Processor y Microengines – RISC processors optimized for packet processing Media/Fabric StrongARM – Hardware support for Interface – Hardware support for multi-threading y Embedded ME 1 ME 2 ME n StrongARM/Xscale – Runs embedded OS and handles exception tasks SRAM DRAM Page 3 Raj Yavatkar IXP:IXP: AA BuildingBuilding BlockBlock forfor NetworkNetwork SystemsSystems y Example: IXP2800 – 16 micro-engines + XScale core Multi-threaded (x8) – Up to 1.4 Ghz ME speed RDRAM Microengine Array Media – 8 HW threads/ME Controller – 4K control store per ME Switch MEv2 MEv2 MEv2 MEv2 Fabric – Multi-level memory hierarchy 1 2 3 4 I/F – Multiple inter-processor communication channels MEv2 MEv2 MEv2 MEv2 Intel® 8 7 6 5 y NPU vs. GPU tradeoffs PCI XScale™ Core MEv2 MEv2 MEv2 MEv2 – Reduce core complexity 9 10 11 12 – No hardware caching – Simpler instructions Î shallow MEv2 MEv2 MEv2 MEv2 Scratch pipelines QDR SRAM 16 15 14 13 Memory – Multiple cores with HW multi- Controller Hash Per-Engine threading per chip Unit Memory, CAM, Signals Interconnect Page 4 Raj Yavatkar IXPIXP 24002400 BlockBlock DiagramDiagram Page 5 Raj Yavatkar XScaleXScale
    [Show full text]