Scaling to the End of Silicon with EDGE Architectures


COVER FEATURE

The TRIPS architecture is the first instantiation of an EDGE instruction set, a new, post-RISC class of instruction set architectures intended to match semiconductor technology evolution over the next decade, scaling to new levels of power efficiency and high performance.

Doug Burger, Stephen W. Keckler, Kathryn S. McKinley, Mike Dahlin, Lizy K. John, Calvin Lin, Charles R. Moore, James Burrill, Robert G. McDonald, William Yoder, and the TRIPS Team
The University of Texas at Austin

Instruction set architectures have long lifetimes because introducing a new ISA is tremendously disruptive to all aspects of a computer system. However, slowly evolving ISAs eventually become a poor match to the rapidly changing underlying fabrication technology. When that gap eventually grows too large, the benefits gained by renormalizing the architecture to match the underlying technology make the pain of switching ISAs well worthwhile.

Microprocessor designs are on the verge of a post-RISC era in which companies must introduce new ISAs to address the challenges that modern CMOS technologies pose while also exploiting the massive levels of integration now possible. To meet these challenges, we have developed a new class of ISAs, called Explicit Data Graph Execution (EDGE), that will match the characteristics of semiconductor technology over the next decade.

TIME FOR A NEW ARCHITECTURAL MODEL?

The "Architecture Comparisons" sidebar provides a detailed view of how previous architectures evolved to match the driving technology at the time they were defined.

In the 1970s, memory was expensive, so CISC architectures minimized state with dense instruction encoding, variable-length instructions, and small numbers of registers. In the 1980s, the number of devices that could fit on a single chip replaced absolute transistor count as the key limiting resource. With a reduced number of instructions and modes as well as simplified control logic, an entire RISC processor could fit on a single chip. By moving to a register-register architecture, RISC ISAs supported aggressive pipelining and, with careful compiler scheduling, RISC processors attained high performance despite their reduction in complexity.

Since RISC architectures, including the x86 equivalent with µops, are explicitly designed to support pipelining, the unprecedented acceleration of clock rates—40 percent per year for well over a decade—has permitted performance scaling for the past 20 years with only small ISA changes. For example, Intel has scaled its x86 line from 33 MHz in 1990 to 3.4 GHz today—more than a 100-fold increase over 14 years. Approximately half of that increase has come from designing deeper pipelines. However, recent studies show that pipeline scaling is nearly exhausted,1 indicating that processor design now requires innovations beyond pipeline-centric ISAs. Intel's recent cancellation of its high-frequency Pentium 4 successors is further evidence of this imminent shift.
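Those figures are internally consistent; a quick back-of-the-envelope check (ours, for illustration, not from the article) confirms that roughly 40 percent annual growth compounds to the quoted 100-fold increase:

```python
# Back-of-the-envelope check of the clock-scaling figures above.
overall = 3400 / 33           # 33 MHz (1990) to 3.4 GHz (2004): ~103x
annual = overall ** (1 / 14)  # 14th root: ~1.39, about 40 percent a year
print(f"{overall:.0f}x overall, {(annual - 1) * 100:.0f}% per year")
```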
Future architectures must support four major emerging technology characteristics:

• Pipeline depth limits mean that architects must rely on other fine-grained concurrency mechanisms to improve performance.
• The extreme acceleration of clock speeds has hastened power limits; in each market, future architectures will be constrained to obtain as much performance as possible given a hard power ceiling; thus, they must support power-efficient performance.
• Increasing resistive delays through global on-chip wires means that future ISAs must be amenable to on-chip communication-dominated execution.
• Design and mask costs will make running many application types across a single design desirable; future ISAs should support polymorphism—the ability to use their execution and memory units in different ways and modes to run diverse applications.

Sidebar: Architecture Comparisons

Comparing EDGE with historic architectures illustrates that architectures are never designed in a vacuum—they borrow frequently from previous architectures and can have many similar attributes.

• VLIW: A TRIPS block resembles a 3D VLIW instruction, with instructions filling fixed slots in a rigid structure. However, the execution semantics differ markedly. A VLIW instruction is statically scheduled—the compiler guarantees when it will execute in relation to all other instructions.1 All instructions in a VLIW packet must be independent. The TRIPS processor is a static placement, dynamic issue (SPDI) architecture, whereas a VLIW machine is a static placement, static issue (SPSI) architecture. The static issue model makes VLIW architectures a poor match for highly communication-dominated future technologies. Intel's family of EPIC architectures is a VLIW variant with similar limitations.
• Superscalar: An out-of-order RISC or x86 superscalar processor and an EDGE processor traverse similar dataflow graphs. However, the superscalar graph traversal involves following renamed pointers in a centralized issue window, which the hardware constructs—adding instructions individually—at great cost to power and scalability. Despite attempts at partitioned variants,2 the scheduling scope of superscalar hardware is too constrained to place instructions dynamically and effectively. In our scheduling taxonomy, a superscalar processor is a dynamic placement, dynamic issue (DPDI) machine. (The sketch following this sidebar illustrates the renaming step.)
• CMP: Many researchers have remarked that a TRIPS-like architecture resembles a chip multiprocessor (CMP) with 16 lightweight processors. In an EDGE architecture, the global control maps irregular blocks of code to all 16 ALUs at once, and the instructions execute at will based on dataflow order. In a 16-tile CMP, a separate program counter exists at each tile, and tiles can communicate only through memory. EDGE architectures are finer-grained than both CMPs and proposed speculatively threaded processors.3
• RAW processors: At first glance, the TRIPS microarchitecture bears similarities to the RAW architecture. A RAW processor is effectively a 2D, tiled static machine, with shades of a highly partitioned VLIW architecture. The main difference between a RAW processor and the TRIPS architecture is that RAW uses compiler-determined issue order (including the inter-ALU routers), whereas the TRIPS architecture uses dynamic issue, to tolerate variable latencies. However, since the RAW ISA supports direct communication of a producer instruction in one tile to a consuming instruction's ALU on another tile, it could be viewed as a statically scheduled variant of an EDGE architecture, whereas TRIPS is a dynamically scheduled EDGE ISA.
• Dataflow machines: Classic dataflow machines, researched heavily at MIT in the 1970s and 1980s,4 bear considerable resemblances to intrablock execution in EDGE architectures. Dataflow machines were originally targeted at functional programming languages, a suitable match because they have ample concurrency and have side-effect free, "write-once" memory semantics. However, in the context of a contemporary imperative language, a dataflow architecture would need to store and process many unnecessary instructions because of complex control-flow conditions. EDGE architectures use control flow between blocks and support conventional memory semantics within and across blocks, permitting them to run traditional imperative languages such as C or Java, while gaining many of the benefits of more traditional dataflow architectures.
• Systolic processors: Systolic arrays are a special historical class of multidimensional array processors5 that continually process the inputs fed to them. In contrast, EDGE processors issue the set of instructions dynamically in mapped blocks, which makes them considerably more general purpose.

References

1. J.A. Fisher et al., "Parallel Processing: A Smart Compiler and a Dumb Machine," Proc. 1984 SIGPLAN Symp. Compiler Construction, ACM Press, 1984, pp. 37-47.
2. K.I. Farkas et al., "The Multicluster Architecture: Reducing Cycle Time Through Partitioning," Proc. 30th Ann. IEEE/ACM Int'l Symp. Microarchitecture (MICRO-30), IEEE CS Press, 1997, pp. 149-159.
3. G.S. Sohi, S.E. Breach, and T.N. Vijaykumar, "Multiscalar Processors," Proc. 22nd Int'l Symp. Computer Architecture (ISCA 95), IEEE CS Press, 1995, pp. 414-425.
4. Arvind, "Data Flow Languages and Architecture," Proc. 8th Int'l Symp. Computer Architecture (ISCA 81), IEEE CS Press, 1981, p. 1.
5. H.T. Kung, "Why Systolic Architectures?" Computer, Jan. 1982, pp. 37-46.
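To see what following renamed pointers involves, here is a minimal sketch (ours, purely illustrative, not modeled on any particular pipeline) of the renaming step by which out-of-order hardware rebuilds a block's dataflow graph at run time, one instruction at a time; an EDGE compiler instead encodes these edges once, statically:

```python
# Minimal sketch of superscalar register renaming (illustrative only).
# The hardware recovers the dataflow graph by mapping each architectural
# register to the instruction that last produced it.
rename = {}   # architectural register -> producing instruction id
window = []   # centralized issue window, built one instruction at a time
code = [("i0", "add", "r1", ["r2", "r3"]),   # (id, op, dest, sources)
        ("i1", "mul", "r4", ["r1", "r1"]),
        ("i2", "sub", "r1", ["r4", "r2"])]
for iid, op, dst, srcs in code:
    deps = sorted({rename[s] for s in srcs if s in rename})  # pointer chase
    window.append((iid, op, deps))
    rename[dst] = iid   # later readers of dst now depend on this instruction
print(window)  # [('i0','add',[]), ('i1','mul',['i0']), ('i2','sub',['i1'])]
```

A real window performs this bookkeeping, plus an associative wakeup search, every cycle for every in-flight instruction, which is the power and scalability cost described above.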
An EDGE ISA provides a richer interface between the compiler and the microarchitecture. The ISA in an EDGE architecture supports one main characteristic: direct instruction communication. Direct instruction communication …

TRIPS: AN EDGE ARCHITECTURE

Just as MIPS was an early example of a RISC ISA, the TRIPS architecture is an instantiation of an EDGE ISA. While other implementations of EDGE ISAs are certainly possible, the TRIPS architecture couples compiler-driven placement of instructions with hardware-determined issue order to obtain high performance with good power efficiency. At the University of Texas at Austin, we are building both a prototype processor and compiler implementing the TRIPS architecture, to address the above four technology-driven challenges as detailed below:

• To increase concurrency, the TRIPS ISA includes an array of concurrently executing arithmetic logic units (ALUs) that provide both …
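To make direct instruction communication concrete, the sketch below executes one small block under a hypothetical EDGE-style encoding (ours; the actual TRIPS instruction format differs). Each instruction names the consumers of its result, as (target, operand slot) pairs, rather than a destination register, and it fires as soon as all of its operand slots are filled:

```python
# Hypothetical EDGE-style block computing (r2 + r3) squared; illustrative
# only. Results flow directly to consumer slots -- no shared register file.
block = {
    # id: (operation, operand count, (target id, slot) pairs for the result)
    "i0": ("read_r2", 0, [("i2", 0)]),             # register read feeds i2
    "i1": ("read_r3", 0, [("i2", 1)]),
    "i2": ("add",     2, [("i3", 0), ("i3", 1)]),  # fan the sum out to i3
    "i3": ("mul",     2, []),                      # block output
}
ops = {"read_r2": lambda: 2, "read_r3": lambda: 3,
       "add": lambda a, b: a + b, "mul": lambda a, b: a * b}

slots = {i: {} for i in block}                 # operands that have arrived
ready = [i for i, (_, n, _) in block.items() if n == 0]
while ready:                                   # dataflow firing rule
    iid = ready.pop()
    op, n, targets = block[iid]
    result = ops[op](*(slots[iid][k] for k in range(n)))
    for tid, slot in targets:                  # direct communication
        slots[tid][slot] = result
        if len(slots[tid]) == block[tid][1]:   # all operands present
            ready.append(tid)
print(result)                                  # (2 + 3) * (2 + 3) = 25
```

Because every producer-consumer edge is explicit in the encoding, the hardware needs neither renaming nor an associative issue window to discover it; issue order still falls out dynamically, operand arrival by operand arrival.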