Processor Pipelines and Static Worst-Case Execution Time Analysis

Total Page:16

File Type:pdf, Size:1020Kb

Processor Pipelines and Static Worst-Case Execution Time Analysis Uppsala Dissertations from the Faculty of Science and Technology 36 __________________________________________________________________________________________ JAKOB ENGBLOM Processor Pipelines and Static Worst-Case Execution Time Analysis ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2002 Dissertation for the Degree of Doctor of Philosophy in Computer Systems pre- sented at Uppsala University, April 19, 2002. ABSTRACT Engblom, J. 2002: Processor Pipelines and Static Worst-Case Execution Time Analy- sis. Acta Universitatis Upsaliensis. Uppsala dissertations from the Faculty of Science and Technology 36. 130 pp. Uppsala. ISBN 91-554-5228-0. Worst-Case Execution Time (WCET) estimates for programs are necessary when building real-time systems. They are used to ensure timely responses from interrupts, to guarantee the throughput of cyclic tasks, as input to scheduling and schedule anal- ysis algorithms, and in many other circumstances. Traditionally, such estimates have been obtained either by measurements or labor-intensive manual analysis, which is both time consuming and error-prone. Static worst-case execution time analysis is a family of techniques that promise to quickly provide safe execution time estimates for real-time programs, simultaneously increasing system quality and decreasing the development cost. This thesis presents several contributions to the state-of-the-art in WCET analysis. We present an overall architecture for WCET analysis tools that provides a frame- work for implementing modules. Within the stable interfaces provided, modules can be independently replaced, making it easy to customize a tool for a particular target and perform performance-precision trade-offs. We have developed concrete techniques for analyzing and representing the timing behavior of programs running on pipelined processors. The representation and anal- ysis is more powerful than previous approaches in that pipeline timing effects across more than pairs of instructions can be handled, and in that no assumptions are made about the program structure. The analysis algorithm relies on a trace-driven processor simulator instead of a special-purpose processor model. This allows us to use existing simulators to adapt the analysis to a new target platform, reducing the retargeting effort. We have defined a formal mathematical model of processor pipelines, which we use to investigate the properties of pipelines and WCET analysis. We prove several interesting properties of processors with in-order issue, such as the freedom from timing anomalies and the fundamental safety of WCET analysis for certain classes of pipelines. We have also constructed a number of examples that demonstrate that tight and safe WCET analysis for pipelined processors might not be as easy as once believed. Considering the link between the analysis methods and the real world, we discuss how to build accurate software models of processor hardware, and the conditions under which accuracy is achievable. Jakob Engblom, Department of Information Technology, Uppsala University, Box 337, SE-751 05 Uppsala, Sweden, E-mail: [email protected]. Also at IAR Systems AB, Box 23051, SE-750 23, Uppsala, Sweden, E-mail: [email protected] c Jakob Engblom 2002 ISSN 1104-2516 ISBN 91-554-5228-0 Printed in Sweden by Elanders Gotab, Stockholm 2002 Distributor: Uppsala University Library, Box 510, SE-751 20 Uppsala, Sweden www.uu.se; e-mail: [email protected] Acknowledgements First of all, I would like to thank Professor Bengt Jonsson, my supervisor. We have both worked hard during the writing of this thesis, and without him and his keen eye for muddled thinking and hand-waving arguments, the quality of this work would have been much lower. This thesis work was performed as an “industry PhD Student” within the Advanced Software Technology (ASTEC) competence centre (www.astec.uu.se) at Uppsala University funded in part by NUTEK/VINNOVA. The Department of Computer Systems (DoCS, www.docs.uu.se)atUp- psala University represented the academic side and provided half of my fi- nancing. DoCS is now part of the Department of Information Technology (www.it.uu.se). The industrial partner was IAR Systems (www.iar.com). I would like to thank IAR Systems in general, and Olle Landstr¨om and Anders Berg in partic- ular, for giving me the opportunity to do an industry PhD. Taking on a PhD student is a big step for a small company like IAR was in 1997. Being an industry PhD, I have traded teaching work for development work and technical training work at IAR, which I believe has been a very succesful formula. I have felt equally at home at IAR and at the university, and hopefully I have helped increase the flow of information and ideas between industry and academia. For me personally, this duality has been very inspiring and rewarding. Despite the principle of no teaching, I have taken the chance to get a little involved in undergraduate teaching anyway, giving guest lectures for a number of real-time and computer architecture courses. Andreas “Ebbe” Ermedahl has been my team-mate in WCET analysis re- search since I started working in the project in 1997. Together, we have achieved much more than any one of us could have done by himself. I thank Ebbe for years of intense and inspiring cooperation and discussion. Friedhelm Stappert at C-Lab in Paderborn, Germany, joined our project in 1999, adding fresh perspectives and implementation manpower. Despite the i ii geographical distribution, Ebbe, Friedhelm, and I have managed to produce a prototype tool and a number of papers together. Professor Hans Hansson got me into real-time systems and WCET research via my Master’s thesis project back in 1997, and has been a great help in writing papers and discussing ideas ever since. Jan Gustafsson has provided valuable and constructive feedback on this thesis and earlier papers, and has been a valuable partner in the research for a long time. Carl von Platen at IAR has been team leader for me and the other industry PhD students at IAR (Jan Sj¨odin and Johan Runesson) for the past few years, and I thank him for many invigorating discussions. Some of our research in WCET analysis and its applications has been per- formed with the help of hard-working Master’s Thesis students: Sven Mont`an, Martin Carlsson,andMagnus Nilsson. Thanks to them, several interesting ideas have been explored that we would not have had time for otherwise. Jan Lindblad at Enea OSE Systems (www.ose.com)andJ¨orgen Hansson at CC- Systems (www.cc-systems.se) have been kind enough to help finance some of the students and have provided industrial input to our research. Over the years, I have had many interesting discussions with people in in- dustry and academia (not all on WCET analysis). I cannot list them all, but some of them are Professor David Whalley at Florida State University, Iain Bate in York, Professor Sang Lyul Min at Seoul National University, Stefan Petters in M¨unchen, Peter Altenbernd at C-Lab in Paderborn, Thomas Lundqvist at Chalmers, Professor Erik Hagersten here in Uppsala, and Raimund Kirner and Pavel Atanassov at TU Wien. Bj¨orn Victor helped me set up the LATEXdocuments, and Johan Bengtsson and I helped each other figure out how to publish our theses and how and where to print them. I would also like to thank everyone at IAR in Uppsala and DoCS for all the fun we have had together! Anna-Maria Lundins Stipendiefond at Sm˚alands Nation has provided gen- erous grants of travel money. If you are planning on getting a PhD in Uppsala, Sm˚alands is the best nation! The ARTES Network (www.artes.uu.se) has provided travel funding and some nice summer schools. Of course, life in Uppsala would have been much less fun without all of my wonderful friends. I would also like thank my parents,Lars-Ake˚ and Christina, and my brother Samuel and sister Cecilia. They have always been there for me! And a special thanks goes to that very special person who has brightened my life in the past few years and who supported me in the stressful spurt of my thesis writing, Eva. Contents 1 Introduction 1 1.1Real-TimeSystems.......................... 1 1.2EmbeddedSystems.......................... 2 1.3 Execution Time and Real-Time Systems .............. 3 1.4 Execution Time Estimates . ..................... 4 1.5UsesofWCET............................ 5 1.6 Obtaining WCET Estimates ..................... 6 1.7ProcessorPipelines.......................... 7 1.7.1 Simple Scalar Pipelines ................... 9 1.7.2 Scalar Pipelines . ..................... 10 1.7.3 Superscalar In-Order Pipelines ............... 10 1.7.4 VLIW (Very Long Instruction Word) ............ 11 1.7.5 Superscalar Out-of-Order Pipelines ............. 11 1.8PropertiesofEmbeddedHardware................. 12 1.9PropertiesofEmbeddedSoftware.................. 14 1.9.1 ProgrammingStyle...................... 14 1.9.2 Algorithmic Timing Behavior ................ 15 1.10 Industrial Practice and Attitudes .................. 16 1.11 Contributions of This Thesis .................... 17 1.12 Outline ................................ 19 2 WCET Overview and Previous Work 21 2.1ComponentsofWCETAnalysis................... 21 2.2 Flow Analysis ............................. 22 2.2.1 Flow Determination ..................... 22 2.2.2 Flow Representation ..................... 24 2.2.3 PreparationforCalculation................. 25 2.2.4 The Mapping Problem .................... 25 2.3 Low-Level Analysis .......................... 26 2.3.1 Global Low-Level Analysis ................. 26 2.3.2 Local Low-Level Analysis .................
Recommended publications
  • Pipelining and Vector Processing
    Chapter 8 Pipelining and Vector Processing 8–1 If the pipeline stages are heterogeneous, the slowest stage determines the flow rate of the entire pipeline. This leads to other stages idling. 8–2 Pipeline stalls can be caused by three types of hazards: resource, data, and control hazards. Re- source hazards result when two or more instructions in the pipeline want to use the same resource. Such resource conflicts can result in serialized execution, reducing the scope for overlapped exe- cution. Data hazards are caused by data dependencies among the instructions in the pipeline. As a simple example, suppose that the result produced by instruction I1 is needed as an input to instruction I2. We have to stall the pipeline until I1 has written the result so that I2 reads the correct input. If the pipeline is not designed properly, data hazards can produce wrong results by using incorrect operands. Therefore, we have to worry about the correctness first. Control hazards are caused by control dependencies. As an example, consider the flow control altered by a branch instruction. If the branch is not taken, we can proceed with the instructions in the pipeline. But, if the branch is taken, we have to throw away all the instructions that are in the pipeline and fill the pipeline with instructions at the branch target. 8–3 Prefetching is a technique used to handle resource conflicts. Pipelining typically uses just-in- time mechanism so that only a simple buffer is needed between the stages. We can minimize the performance impact if we relax this constraint by allowing a queue instead of a single buffer.
    [Show full text]
  • Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance
    White Paper Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance Ofri Wechsler Intel Fellow, Mobility Group Director, Mobility Microprocessor Architecture Intel Corporation White Paper Inside Intel®Core™ Microarchitecture Introduction Introduction 2 The Intel® Core™ microarchitecture is a new foundation for Intel®Core™ Microarchitecture Design Goals 3 Intel® architecture-based desktop, mobile, and mainstream server multi-core processors. This state-of-the-art multi-core optimized Delivering Energy-Efficient Performance 4 and power-efficient microarchitecture is designed to deliver Intel®Core™ Microarchitecture Innovations 5 increased performance and performance-per-watt—thus increasing Intel® Wide Dynamic Execution 6 overall energy efficiency. This new microarchitecture extends the energy efficient philosophy first delivered in Intel's mobile Intel® Intelligent Power Capability 8 microarchitecture found in the Intel® Pentium® M processor, and Intel® Advanced Smart Cache 8 greatly enhances it with many new and leading edge microar- Intel® Smart Memory Access 9 chitectural innovations as well as existing Intel NetBurst® microarchitecture features. What’s more, it incorporates many Intel® Advanced Digital Media Boost 10 new and significant innovations designed to optimize the Intel®Core™ Microarchitecture and Software 11 power, performance, and scalability of multi-core processors. Summary 12 The Intel Core microarchitecture shows Intel’s continued Learn More 12 innovation by delivering both greater energy efficiency Author Biographies 12 and compute capability required for the new workloads and usage models now making their way across computing. With its higher performance and low power, the new Intel Core microarchitecture will be the basis for many new solutions and form factors. In the home, these include higher performing, ultra-quiet, sleek and low-power computer designs, and new advances in more sophisticated, user-friendly entertainment systems.
    [Show full text]
  • Diploma Thesis
    Faculty of Computer Science Chair for Real Time Systems Diploma Thesis Timing Analysis in Software Development Author: Martin Däumler Supervisors: Jun.-Prof. Dr.-Ing. Robert Baumgartl Dr.-Ing. Andreas Zagler Date of Submission: March 31, 2008 Martin Däumler Timing Analysis in Software Development Diploma Thesis, Chemnitz University of Technology, 2008 Abstract Rapid development processes and higher customer requirements lead to increasing inte- gration of software solutions in the automotive industry’s products. Today, several elec- tronic control units communicate by bus systems like CAN and provide computation of complex algorithms. This increasingly requires a controlled timing behavior. The following diploma thesis investigates how the timing analysis tool SymTA/S can be used in the software development process of the ZF Friedrichshafen AG. Within the scope of several scenarios, the benefits of using it, the difficulties in using it and the questions that can not be answered by the timing analysis tool are examined. Contents List of Figures iv List of Tables vi 1 Introduction 1 2 Execution Time Analysis 3 2.1 Preface . 3 2.2 Dynamic WCET Analysis . 4 2.2.1 Methods . 4 2.2.2 Problems . 4 2.3 Static WCET Analysis . 6 2.3.1 Methods . 6 2.3.2 Problems . 7 2.4 Hybrid WCET Analysis . 9 2.5 Survey of Tools: State of the Art . 9 2.5.1 aiT . 9 2.5.2 Bound-T . 11 2.5.3 Chronos . 12 2.5.4 MTime . 13 2.5.5 Tessy . 14 2.5.6 Further Tools . 15 2.6 Examination of Methods . 16 2.6.1 Software Description .
    [Show full text]
  • Computer Science 246 Computer Architecture Spring 2010 Harvard University
    Computer Science 246 Computer Architecture Spring 2010 Harvard University Instructor: Prof. David Brooks [email protected] Dynamic Branch Prediction, Speculation, and Multiple Issue Computer Science 246 David Brooks Lecture Outline • Tomasulo’s Algorithm Review (3.1-3.3) • Pointer-Based Renaming (MIPS R10000) • Dynamic Branch Prediction (3.4) • Other Front-end Optimizations (3.5) – Branch Target Buffers/Return Address Stack Computer Science 246 David Brooks Tomasulo Review • Reservation Stations – Distribute RAW hazard detection – Renaming eliminates WAW hazards – Buffering values in Reservation Stations removes WARs – Tag match in CDB requires many associative compares • Common Data Bus – Achilles heal of Tomasulo – Multiple writebacks (multiple CDBs) expensive • Load/Store reordering – Load address compared with store address in store buffer Computer Science 246 David Brooks Tomasulo Organization From Mem FP Op FP Registers Queue Load Buffers Load1 Load2 Load3 Load4 Load5 Store Load6 Buffers Add1 Add2 Mult1 Add3 Mult2 Reservation To Mem Stations FP adders FP multipliers Common Data Bus (CDB) Tomasulo Review 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 LD F0, 0(R1) Iss M1 M2 M3 M4 M5 M6 M7 M8 Wb MUL F4, F0, F2 Iss Iss Iss Iss Iss Iss Iss Iss Iss Ex Ex Ex Ex Wb SD 0(R1), F0 Iss Iss Iss Iss Iss Iss Iss Iss Iss Iss Iss Iss Iss M1 M2 M3 Wb SUBI R1, R1, 8 Iss Ex Wb BNEZ R1, Loop Iss Ex Wb LD F0, 0(R1) Iss Iss Iss Iss M Wb MUL F4, F0, F2 Iss Iss Iss Iss Iss Ex Ex Ex Ex Wb SD 0(R1), F0 Iss Iss Iss Iss Iss Iss Iss Iss Iss M1 M2
    [Show full text]
  • Computer Organization and Architecture Designing for Performance Ninth Edition
    COMPUTER ORGANIZATION AND ARCHITECTURE DESIGNING FOR PERFORMANCE NINTH EDITION William Stallings Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo Editorial Director: Marcia Horton Designer: Bruce Kenselaar Executive Editor: Tracy Dunkelberger Manager, Visual Research: Karen Sanatar Associate Editor: Carole Snyder Manager, Rights and Permissions: Mike Joyce Director of Marketing: Patrice Jones Text Permission Coordinator: Jen Roach Marketing Manager: Yez Alayan Cover Art: Charles Bowman/Robert Harding Marketing Coordinator: Kathryn Ferranti Lead Media Project Manager: Daniel Sandin Marketing Assistant: Emma Snider Full-Service Project Management: Shiny Rajesh/ Director of Production: Vince O’Brien Integra Software Services Pvt. Ltd. Managing Editor: Jeff Holcomb Composition: Integra Software Services Pvt. Ltd. Production Project Manager: Kayla Smith-Tarbox Printer/Binder: Edward Brothers Production Editor: Pat Brown Cover Printer: Lehigh-Phoenix Color/Hagerstown Manufacturing Buyer: Pat Brown Text Font: Times Ten-Roman Creative Director: Jayne Conte Credits: Figure 2.14: reprinted with permission from The Computer Language Company, Inc. Figure 17.10: Buyya, Rajkumar, High-Performance Cluster Computing: Architectures and Systems, Vol I, 1st edition, ©1999. Reprinted and Electronically reproduced by permission of Pearson Education, Inc. Upper Saddle River, New Jersey, Figure 17.11: Reprinted with permission from Ethernet Alliance. Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within text. Copyright © 2013, 2010, 2006 by Pearson Education, Inc., publishing as Prentice Hall. All rights reserved. Manufactured in the United States of America.
    [Show full text]
  • Branch-Directed and Pointer-Based Data Cache Prefetching
    Branch-directed and Pointer-based Data Cache Prefetching Yue Liu Mona Dimitri David R. Kaeli Department of Electrical and Computer Engineering Northeastern University Boston, MA Abstract The design of the on-chip cache memory and branch prediction logic has become an integral part of a microprocessor implementation. Branch predictors reduce the effects of control hazards on pipeline performance. Branch prediction implementations have been proposed which eliminate a majority of the pipeline stalls associated with branches. Caches are commonly used to reduce the performance gap between microprocessor and memory technology. Caches can only provide this benefit if the requested address is in the cache when requested. To increase the probability that addresses are present when requested, prefetching can be employed. Prefetching attempts to prime the cache with instructions and data which will be accessed in the near future. The work presented here describes two new prefetching algorithms. The first ties data cache prefetching to branches in the instruction stream. History of the data stream is incorporated into a Branch Target Buffer (BTB). Since branch behavior determines program control flow, data access patterns are also dependent upon this behavior. Results indicate that combining this strategy with tagged prefetching can significantly improve cache hit ratios. While improving cache hit rates is important, the resulting memory bus traffic introduced can be an issue. Our prefetching technique can also significantly reduce the overall memory bus traffic. The second prefetching scheme dynamically identifies pointer-based data references in the data stream, and records these references in order to successfully prime the cache with the next element in the linked data structure in advance of its access.
    [Show full text]
  • POWER-AWARE MICROARCHITECTURE: Design and Modeling Challenges for Next-Generation Microprocessors
    POWER-AWARE MICROARCHITECTURE: Design and Modeling Challenges for Next-Generation Microprocessors THE ABILITY TO ESTIMATE POWER CONSUMPTION DURING EARLY-STAGE DEFINITION AND TRADE-OFF STUDIES IS A KEY NEW METHODOLOGY ENHANCEMENT. OPPORTUNITIES FOR SAVING POWER CAN BE EXPOSED VIA MICROARCHITECTURE-LEVEL MODELING, PARTICULARLY THROUGH CLOCK- GATING AND DYNAMIC ADAPTATION. Power dissipation limits have Thus far, most of the work done in the area David M. Brooks emerged as a major constraint in the design of high-level power estimation has been focused of microprocessors. At the low end of the per- at the register-transfer-level (RTL) description Pradip Bose formance spectrum, namely in the world of in the processor design flow. Only recently have handheld and portable devices or systems, we seen a surge of interest in estimating power Stanley E. Schuster power has always dominated over perfor- at the microarchitecture definition stage, and mance (execution time) as the primary design specific work on power-efficient microarchi- Hans Jacobson issue. Battery life and system cost constraints tecture design has been reported.2-8 drive the design team to consider power over Here, we describe the approach of using Prabhakar N. Kudva performance in such a scenario. energy-enabled performance simulators in Increasingly, however, power is also a key early design. We examine some of the emerg- Alper Buyuktosunoglu design issue in the workstation and server mar- ing paradigms in processor design and com- kets (see Gowan et al.)1 In this high-end arena ment on their inherent power-performance John-David Wellman the increasing microarchitectural complexities, characteristics. clock frequencies, and die sizes push the chip- Victor Zyuban level—and hence the system-level—power Power-performance efficiency consumption to such levels that traditionally See the “Power-performance fundamentals” Manish Gupta air-cooled multiprocessor server boxes may box.
    [Show full text]
  • 45-Year CPU Evolution: One Law and Two Equations
    45-year CPU evolution: one law and two equations Daniel Etiemble LRI-CNRS University Paris Sud Orsay, France [email protected] Abstract— Moore’s law and two equations allow to explain the a) IC is the instruction count. main trends of CPU evolution since MOS technologies have been b) CPI is the clock cycles per instruction and IPC = 1/CPI is the used to implement microprocessors. Instruction count per clock cycle. c) Tc is the clock cycle time and F=1/Tc is the clock frequency. Keywords—Moore’s law, execution time, CM0S power dissipation. The Power dissipation of CMOS circuits is the second I. INTRODUCTION equation (2). CMOS power dissipation is decomposed into static and dynamic powers. For dynamic power, Vdd is the power A new era started when MOS technologies were used to supply, F is the clock frequency, ΣCi is the sum of gate and build microprocessors. After pMOS (Intel 4004 in 1971) and interconnection capacitances and α is the average percentage of nMOS (Intel 8080 in 1974), CMOS became quickly the leading switching capacitances: α is the activity factor of the overall technology, used by Intel since 1985 with 80386 CPU. circuit MOS technologies obey an empirical law, stated in 1965 and 2 Pd = Pdstatic + α x ΣCi x Vdd x F (2) known as Moore’s law: the number of transistors integrated on a chip doubles every N months. Fig. 1 presents the evolution for II. CONSEQUENCES OF MOORE LAW DRAM memories, processors (MPU) and three types of read- only memories [1]. The growth rate decreases with years, from A.
    [Show full text]
  • Cuda C Best Practices Guide
    CUDA C BEST PRACTICES GUIDE DG-05603-001_v9.0 | June 2018 Design Guide TABLE OF CONTENTS Preface............................................................................................................ vii What Is This Document?..................................................................................... vii Who Should Read This Guide?...............................................................................vii Assess, Parallelize, Optimize, Deploy.....................................................................viii Assess........................................................................................................ viii Parallelize.................................................................................................... ix Optimize...................................................................................................... ix Deploy.........................................................................................................ix Recommendations and Best Practices.......................................................................x Chapter 1. Assessing Your Application.......................................................................1 Chapter 2. Heterogeneous Computing.......................................................................2 2.1. Differences between Host and Device................................................................ 2 2.2. What Runs on a CUDA-Enabled Device?...............................................................3 Chapter 3. Application Profiling.............................................................................
    [Show full text]
  • Introduction to the Intel® Nios® II Soft Processor
    Introduction to the Intel® Nios® II Soft Processor For Quartus® Prime 18.1 1 Introduction This tutorial presents an introduction the Intel® Nios® II processor, which is a soft processor that can be instantiated on an Intel FPGA device. It describes the basic architecture of Nios II and its instruction set. The Nios II processor and its associated memory and peripheral components are easily instantiated by using Intel’s SOPC Builder or Platform Designer tool in conjunction with the Quartus® Prime software. A full description of the Nios II processor is provided in the Nios II Processor Reference Handbook, which is available in the literature section of the Intel web site. Introductions to the SOPC Builder and Platform Designer tools are given in the tutorials Introduction to the Intel SOPC Builder and Introduction to the Intel Platform Designer Tool, respectively. Both can be found in the University Program section of the web site. Contents: • Nios II System • Overview of Nios II Processor Features • Register Structure • Accessing Memory and I/O Devices • Addressing • Instruction Set • Assembler Directives • Example Program • Exception Processing • Cache Memory • Tightly Coupled Memory Intel Corporation - FPGA University Program 1 March 2019 ® ® INTRODUCTION TO THE INTEL NIOS II SOFT PROCESSOR For Quartus® Prime 18.1 2 Background Intel’s Nios II is a soft processor, defined in a hardware description language, which can be implemented in Intel’s FPGA devices by using the Quartus Prime CAD system. This tutorial provides a basic introduction to the Nios II processor, intended for a user who wishes to implement a Nios II based system on an Intel Development and Education board.
    [Show full text]
  • MIPS Architecture • MIPS (Microprocessor Without Interlocked Pipeline Stages) • MIPS Computer Systems Inc
    Spring 2011 Prof. Hyesoon Kim MIPS Architecture • MIPS (Microprocessor without interlocked pipeline stages) • MIPS Computer Systems Inc. • Developed from Stanford • MIPS architecture usages • 1990’s – R2000, R3000, R4000, Motorola 68000 family • Playstation, Playstation 2, Sony PSP handheld, Nintendo 64 console • Android • Shift to SOC http://en.wikipedia.org/wiki/MIPS_architecture • MIPS R4000 CPU core • Floating point and vector floating point co-processors • 3D-CG extended instruction sets • Graphics – 3D curved surface and other 3D functionality – Hardware clipping, compressed texture handling • R4300 (embedded version) – Nintendo-64 http://www.digitaltrends.com/gaming/sony- announces-playstation-portable-specs/ Not Yet out • Google TV: an Android-based software service that lets users switch between their TV content and Web applications such as Netflix and Amazon Video on Demand • GoogleTV : search capabilities. • High stream data? • Internet accesses? • Multi-threading, SMP design • High graphics processors • Several CODEC – Hardware vs. Software • Displaying frame buffer e.g) 1080p resolution: 1920 (H) x 1080 (V) color depth: 4 bytes/pixel 4*1920*1080 ~= 8.3MB 8.3MB * 60Hz=498MB/sec • Started from 32-bit • Later 64-bit • microMIPS: 16-bit compression version (similar to ARM thumb) • SIMD additions-64 bit floating points • User Defined Instructions (UDIs) coprocessors • All self-modified code • Allow unaligned accesses http://www.spiritus-temporis.com/mips-architecture/ • 32 64-bit general purpose registers (GPRs) • A pair of special-purpose registers to hold the results of integer multiply, divide, and multiply-accumulate operations (HI and LO) – HI—Multiply and Divide register higher result – LO—Multiply and Divide register lower result • a special-purpose program counter (PC), • A MIPS64 processor always produces a 64-bit result • 32 floating point registers (FPRs).
    [Show full text]
  • Micro-Circuits for High Energy Physics*
    MICRO-CIRCUITS FOR HIGH ENERGY PHYSICS* Paul F. Kunz Stanford Linear Accelerator Center Stanford University, Stanford, California, U.S.A. ABSTRACT Microprogramming is an inherently elegant method for implementing many digital systems. It is a mixture of hardware and software techniques with the logic subsystems controlled by "instructions" stored Figure 1: Basic TTL Gate in a memory. In the past, designing microprogrammed systems was difficult, tedious, and expensive because the available components were capable of only limited number of functions. Today, however, large blocks of microprogrammed systems have been incorporated into a A input B input C output single I.e., thus microprogramming has become a simple, practical method. false false true false true true true false true true true false 1. INTRODUCTION 1.1 BRIEF HISTORY OF MICROCIRCUITS Figure 2: Truth Table for NAND Gate. The first question which arises when one talks about microcircuits is: What is a microcircuit? The answer is simple: a complete circuit within a single integrated-circuit (I.e.) package or chip. The next question one might ask is: What circuits are available? The answer to this question is also simple: it depends. It depends on the economics of the circuit for the semiconductor manufacturer, which depends on the technology he uses, which in turn changes as a function of time. Thus to understand Figure 3: Logical NOT Circuit. what microcircuits are available today and what makes them different from those of yesterday it is interesting to look into the economics of producing microcircuits. The basic element in a logic circuit is a gate, which is a circuit with a number of inputs and one output and it performs a basic logical function such as AND, OR, or NOT.
    [Show full text]