Microprocessor Systems I

Total Page:16

File Type:pdf, Size:1020Kb

Microprocessor Systems I TALLINN UNIVERSITY of TECHNOLOGY Department of Computer Engineering MICROPROCESSOR SYSTEMS I Microprocessor Systems Architecture IAY 0021 Lecture Notes Arvo Toomsalu Tallinn 1 © Arvo Toomsalu All rights reserved, except as permitted under the Estonian Copyright Act. Contents 1. Microprocessor System's Architecture 3 2. Performance Measuring 16 3. Amdahl's Law 28 4. Microprocessors Taxonomy 31 5. Memory-Storage Devices 39 6. Associative Memory 55 7. Virtual Memory System 61 8. Cache Memory System 71 9. Input-output System 90 10. Input-output System Controllers 112 10.1. Graphics Processor** 116 10.2. Transputer** 123 11. Microprocessor System's Performance 126 12. Instruction Fetch and Execution Principles 132 13. Instruction Pipelining 139 14. Superscalar Processor Fundamentals 172 15. RISC Architecture** 220 16. Modern Superscalar Architecture 231 17. VLIW Architecture 249 18. Processor Architecture Development Trends 266 19. Development Trends in Technology 272 20. Dual- and Multicore Processors 282 21. Computer Architecture Formulas 286 22. Glossary 287 23. Acronyms 293 2 MICROPROCESSOR SYSTEM ARCHITECTURE System Description (AIR Principle) System Description SYSTEM Levels ( MPS) I Architecture [ A] Level WHAT ? System extraordinariness, singularity (some quality of a thing by which it is distinguished from all, or most, others). Specific software/hardware interface , which includes: Instruction set; Memory management and protection; Interrupts; Floating-point standard, etc. Architecture development life-cycle Development Speed V A1 A2 Time 0 t ABC A1, A2 – different architectures of a system 3 Forces on the microprocessor system (computer) architecture are: 1. Applications; 2. Technology; 3. Operating systems; 4. Programming languages; 5. History. Architecture is the iterative searching of the possible designs at all levels of microprocessor system. II Implementation [I] Level HOW ? Logical structure , organization or microarchitecture . Number and location functions; Pipeline configuration; I/O system organization; Location and configuration of caches, etc. ► Microprocessor Families III Realization [R] Level WHICH and WHERE ? Architecture versus Microarchitecture Architecture is used to describe an abstract requirement specification . It refers to the instruction set, registers, and memory data-resident structures that are public to a programmer . Microarchitecture is used to describe the specific composition of elements that realizes architecture in a specific design . It refers to an implementation of (micro)processor architecture in silicon. 4 Intel EPIC IXA Architecture Explicitly Instruction set definition Internet Parallel IA-32 and compatibility Exchange Instruction Architecture Computing Microarchitecture Hardware implementation maintaining instruction set P5 P6 NetBurst Mobile compatibility with high-level architecture Pentium Pro Pentium 4 Realization Pentium Pentium II Pentium D Pentium M Implementation of microarchitecture in microprocessor Pentium III Xeon AIR principle uses only three system description levels, but for more precise description there is used more description levels, as for: 1. Processor-memory level , at which architecture of a microprocessor system is described . 2. Instruction set level , at which the function of each instruction is described . 3. Register transfer level ( RTL ), at which the system’s hardware structure is described in details. 4. Logic gate level , at which the hardware components are described as standard logic elements and flip-flops. 5. Circuit level, at which the hardware components internal structure is opened and they described by active (transistors) and passive elements. 6. Integrated circuit masks level, at which the silicon structures and their layout , that used to implements the system components, are shown. Moving from the first description level to the last, we can see how the behavior of the microprocessor system is transformed into the real hardware and software structure. 5 Architecture Taxonomies Classical Architectures Princeton or von Neumann Architecture Characteristic features : 1. Integral processing unit; 2. United data and instructions memory; 3. United bus between CPU and memory; 4. Centralized control; 5. Low-level programming languages; 6. Memory linear addressing. Harvard Architecture 6 Characteristic features : 1. Instruction and data memories occupy different address spaces; 2. Instruction and data memories have separate buses to the CPU ( A); 3. Instruction and data memories can be implemented in different ways. Modified Harvard Architecture and Flynn`s Classification A pure Harvard architecture has the disadvantage that mechanisms must be provided to separately load the program to be executed into instruction memory and data to data memory. Modern Harvard architecture computers often use a read-only technology for the instruction memory and read-write technology for the data memory . This allows the computer to begin execution of a pre-loaded program as soon as power is applied. The Modified Harvard Architecture is relegated to niche applications – microcontrollers and digital signal processors. The principal advantage of the Harvard architecture – simultaneous access to instruction and data memories has been nullified by modern cache systems. Multi-microprocessor system - a system that use more than one microprocessor to perform a desired application. Multiprocessor systems were designed for at least one of two reasons : 1. Speed-up program; 2. Fault tolerance. The classification of computer architectures based on notions of instruction and data streams or Michael Flynn`s (1972)) classification. Flynn`s classification stresses the architectural relationship at the memory-processor level . Other architectural levels are overlooked. Single Data Stream SISD MISD Multiple Data SIMD => Data Level Parallelism Streams SIMD MIMD MIMD => Thread Level Parallelism Single Multiple Instruction Instruction Stream Streams Flynn-Johnson Classification 7 Instruction Set Architecture In the past, the term computer architecture referred only to instruction set design. The term instruction set architecture ( ISA ) refers to the actual programmer-visible instruction set . ISA enables different implementations of the same architecture but it may prevent using new innovations . Classification instruction set architectures (by J. L. Hennessy & D. A. Patterson) 1. Register-memory architecture 2. Register-register or load-store architecture 3. Accumulator architecture 4. Stack architecture (5). Memory-memory architecture RGF RGF from memory ALU ALU Register-memory architecture Register-register/load architecture Stack top RGF RGF from memory ALU ALU Accumulator architecture Stack architecture 8 Architectures Taxonomy by Milutinovic (1989) 1. Control -driven ( control-flow ) architectures The instruction sequence guides the processing activity. a. Reduced instruction set computers ( RISC ); b. Complex instruction set computers ( CISC ); c. High-level language architectures ( HLL ). 2. Data-driven ( data-flow ) architectures The processing activity is controlled by the readiness of data (experimental). 3. Demand-driven ( reduction ) architectures An instruction is enabled for execution when its results are required as operands for another instruction that has been already enabled for execution (experimental). Architecture Standardization and Open-system Concept An open-system is a collection interfaces , protocols , and data formats that are based on the commonly available, universally accepted standards providing software portability, system interaction, and scalability. o Portability is a property of a source code program to be executed on different hardware platforms under different operating systems. o System interaction is the ability of systems to exchange information automatically, recognizing the format and semantics of data. o Scalability is a property of a program to be executed using different resources with the efficiency rate being proportional to the resources available. Summary Comparison Different ISA Structures Number of Maximum number memory accesses of operands allowed Architecture Examples 0 3 register-register Alpha, Power PC 1 2 register-memory IBM 360/370, Intel 80x86 2 2 memory-memory VAX 3 3 memory-memory VAX 9 Advantages and Disadvantages Different ISA Structures Architecture Advantages Disadvantages 1. Simple, fixed-length 1. Higher instruction count than instruction decoding. architectures with memory register-register 2. Simple code generation references in instructions. (0,3)* model. 2. More instructions, lower 3. Instructions take similar instruction density - larger number of clocks to execute. programs. 1. Operands are not equivalent 1. Data can be accessed since a source operand in a without a separate load binary operation is destroyed. register-memory instruction first. 2. Encoding a register number (1,2) 2. Instruction format tends to and a memory address in each be easy to encode and offers instruction may restrict the good density. number of registers. 3. Clock per instruction varies by operand location. 1. Large variation in instruction Most compact, and doesn't size. memory-memory waste registers for 2. Large variation in work per (2,2) or (3,3) temporaries. instruction. 3. Memory accesses create memory bottleneck. 4. Not used. • In the notation ( m,n ) means m memory operands and n total operands. 10 Operating System Management Functions Operating system (OS) controls the execution of programs on a processor and manages
Recommended publications
  • Antikernel: a Decentralized Secure Hardware-Software Operating
    Antikernel A Decentralized Secure Hardware-Software Operating System Andrew Zonenberg (@azonenberg) Senior Security Consultant, IOActive Bülent Yener Professor, Rensselaer Polytechnic Institute This work is based on Zonenberg’s 2015 doctoral dissertation, advised by Yener. IOActive, Inc. Copyright ©2016. All Rights Reserved. Kernel mode = full access to all state • What OS code needs this level of access? – Memory manager only needs heap metadata – Scheduler only needs run queue – Drivers only need their peripheral – Nothing needs access to state of user-mode apps • No single subsystem that needs access to all state • Any code with ring 0 privs is incompatible with LRP! IOActive, Inc. Copyright ©2016. All Rights Reserved. Monolithic kernel, microkernel, … Huge Huge attack surface Small Better 0 code size - Ring Can we get here? None Monolithic Micro ??? IOActive, Inc. Copyright ©2016. All Rights Reserved. Exokernel (MIT, 1995) • OS abstractions can often hurt performance – You don’t need a full FS to store temporary data on disk • Split protection / segmentation from abstraction Word proc FS Cache Disk driver DHCP server IOActive, Inc. Copyright ©2016. All Rights Reserved. Exokernel (MIT, 1995) • OS does very little: – Divide resources into blocks (CPU time quanta, RAM pages…) – Provide controlled access to them IOActive, Inc. Copyright ©2016. All Rights Reserved. But wait, there’s more… • By removing non-security abstractions from the kernel, we shrink the TCB and thus the attack surface! IOActive, Inc. Copyright ©2016. All Rights Reserved. So what does the kernel have to do? • Well, obviously a few things… – Share CPU time between multiple processes – Allow processes to talk to hardware/drivers – Allow processes to talk to each other – Page-level RAM allocation/access control IOActive, Inc.
    [Show full text]
  • Mali-400 MP: a Scalable GPU for Mobile Devices
    Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson Director, Graphics Research, ARM Outline . ARM and Mobile Graphics . Design Constraints for Mobile GPUs . Mali Architecture Overview . Multicore Scaling in Mali-400 MP . Results 2 About ARM . World’s leading supplier of semiconductor IP . Processor Architectures and Implementations . Related IP: buses, caches, debug & trace, physical IP . Software tools and infrastructure . Business Model . License fees . Per-chip royalties . Graphics at ARM . Acquired Falanx in 2006 . ARM Mali is now the world’s most widely licensed GPU family 3 Challenges for Mobile GPUs . Size . Power . Memory Bandwidth 4 More Challenges . Graphics is going into “anything that has a screen” . Mobile . Navigation . Set Top Box/DTV . Automotive . Video telephony . Cameras . Printers . Huge range of form factors, screen sizes, power budgets, and performance requirements . In some applications, a huge difference between peak and average performance requirements 5 Solution: Scalability . Address a wide variety of performance points and applications with a single IP and a single software stack. Need static scalability to adapt to different peak requirements in different platforms / markets . Need dynamic scalability to reduce power when peak performance isn’t needed 6 Options for Scalability . Fine-grained: Multiple pipes, wide SIMD, etc . Proven approach, efficient and effective . But, adding pipes / lanes is invasive . Hard for IP licensees to do on their own . And, hard to partition to provide dynamic scalability . Coarse-grained: Multicore . Easy for licensees to select desired performance . Putting cores on separate power islands allows dynamic scaling 7 Mali 400-MP Top Level Architecture Asynch Mali-400 MP Top-Level APB Geometry Pixel Processor Pixel Processor Pixel Processor Pixel Processor Processor #1 #2 #3 #4 CLKs MaliMMUs RESETs IRQs IDLEs MaliL2 AXI .
    [Show full text]
  • Nyami: a Synthesizable GPU Architectural Model for General-Purpose and Graphics-Specific Workloads
    Nyami: A Synthesizable GPU Architectural Model for General-Purpose and Graphics-Specific Workloads Jeff Bush Philip Dexter†, Timothy N. Miller†, and Aaron Carpenter⇤ San Jose, California †Dept. of Computer Science [email protected] ⇤ Dept. of Electrical & Computer Engineering Binghamton University {pdexter1, millerti, carpente}@binghamton.edu Abstract tempt to bridge this gap, Intel developed Larrabee (now called Graphics processing units (GPUs) continue to grow in pop- Xeon Phi) [25]. Larrabee is architected around small in-order ularity for general-purpose, highly parallel, high-throughput cores with wide vector ALUs to facilitate graphics rendering and systems. This has forced GPU vendors to increase their fo- multi-threading to hide instruction latencies. The use of small, cus on general purpose workloads, sometimes at the expense simple processor cores allows many cores to be packed onto a of the graphics-specific workloads. Using GPUs for general- single die and into a limited power envelope. purpose computation is a departure from the driving forces be- Although GPUs were originally designed to render images for hind programmable GPUs that were focused on a narrow subset visual displays, today they are used frequently for more general- of graphics rendering operations. Rather than focus on purely purpose applications. However, they must still efficiently per- graphics-related or general-purpose use, we have designed and form what would be considered traditional graphics tasks (i.e. modeled an architecture that optimizes for both simultaneously rendering images onto a screen). GPUs optimized for general- to efficiently handle all GPU workloads. purpose computing may downplay graphics-specific optimiza- In this paper, we present Nyami, a co-optimized GPU archi- tions, even going so far as to offload them to software.
    [Show full text]
  • Dynamic Task Scheduling and Binding for Many-Core Systems Through Stream Rewriting
    Dynamic Task Scheduling and Binding for Many-Core Systems through Stream Rewriting Dissertation zur Erlangung des akademischen Grades Doktor-Ingenieur (Dr.-Ing.) der Fakultät für Informatik und Elektrotechnik der Universität Rostock vorgelegt von Lars Middendorf, geb. am 21.09.1982 in Iserlohn aus Rostock Rostock, 03.12.2014 Gutachter Prof. Dr.-Ing. habil. Christian Haubelt Lehrstuhl "Eingebettete Systeme" Institut für Angewandte Mikroelektronik und Datentechnik Universität Rostock Prof. Dr.-Ing. habil. Heidrun Schumann Lehrstuhl Computergraphik Institut für Informatik Universität Rostock Prof. Dr.-Ing. Michael Hübner Lehrstuhl für Eingebettete Systeme der Informationstechnik Fakultät für Elektrotechnik und Informationstechnik Ruhr-Universität Bochum Datum der Abgabe: 03.12.2014 Datum der Verteidigung: 05.03.2015 Acknowledgements First of all, I would like to thank my supervisor Prof. Dr. Christian Haubelt for his guidance during the years, the scientific assistance to write this thesis, and the chance to research on a very individual topic. In addition, I thank my colleagues for interesting discussions and a pleasant working environment. Finally, I would like to thank my family for supporting and understanding me. Contents 1 INTRODUCTION........................................................................................................................1 1.1 STREAM REWRITING ...................................................................................................................5 1.2 RELATED WORK .........................................................................................................................7
    [Show full text]
  • An Overview of MIPS Multi-Threading White Paper
    Public Imagination Technologies An Overview of MIPS Multi-Threading White Paper Copyright © Imagination Technologies Limited. All Rights Reserved. This document is Public. This publication contains proprietary information which is subject to change without notice and is supplied ‘as is’, without any warranty of any kind. Filename : Overview_of_MIPS_Multi_Threading.docx Version : 1.0.3 Issue Date : 19 Dec 2016 Author : Imagination Technologies 1 Revision 1.0.3 Imagination Technologies Public Contents 1. Motivations for Multi-threading ................................................................................................. 3 2. Performance Gains from Multi-threading ................................................................................. 4 3. Types of Multi-threading ............................................................................................................ 4 3.1. Coarse-Grained MT ............................................................................................................ 4 3.2. Fine-Grained MT ................................................................................................................ 5 3.3. Simultaneous MT ................................................................................................................ 6 4. MIPS Multi-threading .................................................................................................................. 6 5. R6 Definition of MT: Virtual Processors ..................................................................................
    [Show full text]
  • C for a Tiny System Implementing C for a Tiny System and Making the Architecture More Suitable for C
    Abstract We have implemented support for Padauk microcontrollers, tiny 8-Bit devices with 60 B to 256 B of RAM, in the Small Device C Compiler (SDCC), showing that the use of (mostly) standard C to program such minimal devices is feasible. We report on our experience and on the difficulties in supporting the hardware multithreading present on some of these devices. To make the devices a better target for C, we propose various enhancements of the architecture, and empirically evaluated their impact on code size. arXiv:2010.04633v1 [cs.PL] 9 Oct 2020 1 C for a tiny system Implementing C for a tiny system and making the architecture more suitable for C Philipp Klaus Krause, Nicolas Lesser October 12, 2020 1 The architecture Padauk microcontrollers use a Harvard architecture with an OTP or Flash pro- gram memory and an 8 bit wide RAM data memory. There also is a third address space for input / output registers. These three memories are accessed using separate instructions. Figure 1 shows the 4 architecture variants, which are commonly called pdk13, pdk14, pdk15 and pdk16 (these names are different from the internal names found in files from the manufacturer-provided IDE) by the width of their program memory. Each instruction is exactly one word in program memory. Most instructions execute in a single cycle, the few excep- tions take 2 cycles. Most instructions use either implicit addressing or direct addressing; the latter usually use the accumulator and one memory operand and write their result into the accumulator or memory. On the pdk13, pdk14 and pdk15, the bit set, reset and test instructions, which use direct addressing, can only access the lower half of the data address space.
    [Show full text]
  • RISC Architecture
    REDUCED INSTRUCTION SET COMPUTERS Prof. Vojin G. Oklobdzija Integration Berkeley, CA 94708 Keywords: IBM 801; RISC; computer architecture; Load/Store Architecture; instruction sets; pipelining; super-scalar machines; super-pipeline machines; optimizing compiler; Branch and Execute; Delayed Branch; Cache; Harvard Architecture; Delayed Load; Super-Scalar; Super-Pipelined. Fall 1999 1. ARCHITECTURE The term Computer Architecture was first defined in the paper by Amdahl, Blaauw and Brooks of International Business Machines (IBM) Corporation announcing IBM System/360 computer family on April 7, 1964 [1,17]. On that day IBM Corporation introduced, in the words of IBM spokesman, "the most important product announcement that this corporation has made in its history". Computer architecture was defined as the attributes of a computer seen by the machine language programmer as described in the Principles of Operation. IBM referred to the Principles of Operation as a definition of the machine which enables machine language programmer to write functionally correct, time independent programs that would run across a number of implementations of that particular architecture. The architecture specification covers: all functions of the machine that are observable by the program [2]. On the other hand Principles of Operation. are used to define the functions that the implementation should provide. In order to be functionally correct it is necessary that the implementation conforms to the Principles of Operation. Principles of Operation document defines computer architecture which includes: • Instruction set • Instruction format • Operation codes • Addressing modes • All registers and memory locations that may be directly manipulated or tested by a machine language program • Formats for data representation Machine Implementation was defined as the actual system organization and hardware structure encompassing the major functional units, data paths, and control.
    [Show full text]
  • Antikernel: a Decentralized Secure Hardware-Software Operating System Architecture
    Antikernel: A Decentralized Secure Hardware-Software Operating System Architecture Andrew Zonenberg1 and B¨ulent Yener2 1 IOActive Inc., Seattle WA 98105, USA, [email protected] 2 Rensselaer Polytechnic Institute, Troy NY 12180, USA, [email protected] Abstract. The \kernel" model has been part of operating system ar- chitecture for decades, but upon closer inspection it clearly violates the principle of least required privilege. The kernel is a single entity which provides many services (memory management, interfacing to drivers, context switching, IPC) having no real relation to each other, and has the ability to observe or tamper with all state of the system. This work presents Antikernel, a novel operating system architecture consisting of both hardware and software components and designed to be fundamen- tally more secure than the state of the art. To make formal verification easier, and improve parallelism, the Antikernel system is highly modular and consists of many independent hardware state machines (one or more of which may be a general-purpose CPU running application or systems software) connected by a packet-switched network-on-chip (NoC). We create and verify an FPGA-based prototype of the system. Keywords: network on chip · system on chip · security · operating sys- tems · hardware accelerators 1 Introduction The Antikernel architecture is intended to be more, yet less, than simply a \ker- nel in hardware". By breaking up functionality and decentralizing as much as possible we aim to create a platform that allows applications to pick and choose the OS features they wish to use, thus reducing their attack surface dramati- cally compared to a conventional OS (and potentially experiencing significant performance gains, as in an exokernel).3 Antikernel is a decentralized architecture with no system calls; all OS func- tionality is accessed through message passing directly to the relevant service.
    [Show full text]
  • The Multiple Precision Integers and Rationals Library Edition 1.0
    MPIR The Multiple Precision Integers and Rationals Library Edition 1.0 8 April 2009 This manual describes how to install and use MPIR, the Multiple Precision Integers and Ratio- nals library, version 1.0. Copyright 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc. Copyright 2008 William Hart Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being \A GNU Manual", and with the Back-Cover Texts being \You have freedom to copy and modify this GNU Manual, like GNU software". A copy of the license is included in Appendix C [GNU Free Documentation License], page 121. i Table of Contents MPIR Copying Conditions ................................. 1 1 Introduction to MPIR .................................. 2 1.1 How to use this Manual ........................................................ 2 2 Installing MPIR........................................ 3 2.1 Build Options ................................................................. 3 2.2 ABI and ISA.................................................................. 8 2.3 Notes for Package Builds ...................................................... 11 2.4 Notes for Particular Systems .................................................. 12 2.5 Known Build Problems ....................................................... 14 2.6
    [Show full text]
  • On the Exploration of the DRISC Architecture
    UvA-DARE (Digital Academic Repository) On the exploration of the DRISC architecture Yang, Q. Publication date 2014 Link to publication Citation for published version (APA): Yang, Q. (2014). On the exploration of the DRISC architecture. General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl) Download date:30 Sep 2021 Chapter 1 Introduction Contents 1.1 Background . 8 1.2 Multithreading . 8 1.3 Multiple cores . 10 1.4 What’s next . 11 1.5 Case study . 12 1.6 What have we done . 14 1.7 Overview and organization . 16 7 8 CHAPTER 1. INTRODUCTION 1.1 Background Moore’s law, first raised in 1965 [M 65] but later widely accepted as being “the number of transistors on integrated circuits+ doubles approximately every two year” (fig.
    [Show full text]
  • Programming the Cray XMT at PNNL
    21‐May‐12 What is parallel computing? Using multiple computing elements to solve a problem faster Multiple systems Multiple processors Multiple nodes Multiple cores Parallel computing is becoming ubiquitous due to power constraints Multiple cores rather than faster clock speeds Programming the Cray XMT Since cores share memory and programming cores as separate computing elements is too heavy weight, shared memory programming John Feo will become ubiquitous Director Center for Adaptive Supercomputing Software Intel 48‐core x86 processor AMD Athlon X2 6400+ dual‐core en.wikipedia.org/wiki/Multi‐core_processor www.pcper.com/reviews/Processors/Intel‐Shows‐48‐core‐x86‐Processor‐Single‐chip‐Cloud‐Computer May 21, 2012 1 Shared memory Hiding memory latencies The Good Memory hierarchy Read and write any data item C1 . Cn Reduce latency by storing some data nearby No partitioning No message passing Vectors L1 L1 Reads and write performed by Amortize latency by fetching N words at a time L2 L2 hardware Little overhead lower latency, Parallelism higher bandwidth Hide latency by switching tasks The Bad L3 Can also hide other forms of latencies Read and write any data item Race conditions Memory 4 1 21‐May‐12 Barrel processor Multithreading Many threads per processor core Hide latencies via parallelism Thread-level context switch at every instruction cycle Maintain multiple active threads per processor, so that gaps introduced by long latency operations in one thread are filled by instructions in other threads registers ALU “stream” program counter
    [Show full text]
  • Abstract PRET Machines
    Abstract PRET Machines Invited TCRTS award paper Edward A. Lee (award recipient) Jan Reineke Michael Zimmer UC Berkeley Saarland University Swarm64 AS Berkeley, CA 94720 Saarland Informatics Campus Berlin, Germany Email: [email protected] Saarbrucken,¨ Germany Email: [email protected] Email: [email protected] Abstract—Prior work has shown that it is possible to design stability is maintained if events occur within some specified microarchitectures called PRET machines that deliver precise latency after some stimulus. and repeatable timing of software execution without sacrificing No engineered system is perfect. No matter what specifica- performance. That prior work provides specific designs for PRET microarchitectures and compares them against conventional de- tions we use for what a “correct behavior” of the system is, signs. This paper defines a class of microarchitectures called there will always be the possibility that the realized system will abstract PRET machines (APMs) that capture the essential deviate from that behavior in the field. The goal of engineering, temporal properties of PRET machines. We show that APMs therefore, needs to be to clearly define what is a correct deliver deterministic timing with no loss of performance for a behavior, to design a system that realizes that behavior with family of real-time problems consisting of sporadic event streams with deadlines equal to periods. On the other hand, we observe high probability, to provide detectors for violations, and to a tradeoff between deterministic timing and the ability to meet provide safe fallback behaviors when violations occur. deadlines for sporadic event streams with constrained deadlines. A straightforward way to define correct behavior is to specify what properties the output of a system must exhibit for each possible input.
    [Show full text]