Hardware Design I Chap. 10 Design of Microprocessor

Hardware Design I, Chap. 10: Design of Microprocessor
Computing Architecture Lab., Hajime Shimada
E-mail: [email protected]

Outline
- What is a microprocessor?
- The microprocessor from a sequential machine viewpoint
  - Microprocessor and Neumann computer
  - Memory hierarchy
  - Instruction set architecture
- Microarchitecture of the microprocessor
  - Microarchitecture with sequential processing
  - Microarchitecture with pipelined processing

What is a microprocessor?
- An LSI that processes data at the center of the computer
  - Also called a "processor"
- There are several types of microprocessors:
  - Central Processing Unit (CPU)
  - Microcontroller
  - Graphic accelerator
  - Other accelerators

Central Processing Unit (CPU)
- The nucleus of a Neumann computer (details in a later slide)
  - Sometimes the word "microprocessor" denotes the CPU specifically
- By combining a CPU with memories, disks, and I/O, we can build a PC or a server
- Examples: Intel Core i7, Fujitsu SPARC64 VII, AMD Opteron, ...

Microcontroller
- A processor used for control of electronic devices, optimized for that use
  - e.g., output pins with high current drive capability so they can drive LEDs directly
- Many of them can form a complete computer on one chip, including the memory hierarchy
- Examples: Renesas H8, Atmel AT91, Zilog Z80, ...; a great many companies provide them

Graphic accelerator
- A processor used for graphics processing, also called a Graphics Processing Unit (GPU)
- Implements a large number of ALUs to exploit parallelism
  - In graphics processing, each pixel can usually be processed independently
  - It is also used for other highly parallel arithmetic
- Examples: NVIDIA GeForce, AMD (ATI) Radeon, ...

Other accelerators
- Several processors exist to accelerate data processing that is not well suited to a CPU or GPU
  - Recently, however, GPUs have been encroaching on this area
- They usually implement many ALUs to supply high arithmetic performance
- Examples: ClearSpeed CSX, Ageia PhysX, ...
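To make the pixel-level parallelism mentioned above concrete, here is a minimal C sketch (an illustration, not part of the slides): every loop iteration touches only its own pixel, so the iterations are independent and could be spread across the many ALUs of a GPU. The function name, the flat 8-bit image representation, and the brightness operation are assumptions made for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Each output pixel depends only on the corresponding input pixel,
 * so every iteration is independent and can run in parallel.       */
void brighten(const uint8_t *in, uint8_t *out, size_t num_pixels, uint8_t gain)
{
    for (size_t i = 0; i < num_pixels; i++) {
        unsigned v = (unsigned)in[i] + gain;   /* simple per-pixel operation */
        out[i] = (v > 255) ? 255 : (uint8_t)v; /* saturate to the 8-bit range */
    }
}
```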
Microprocessor from a sequential circuit viewpoint
- We can abstract the microprocessor as the following sequential machine:
  - Inputs: programs (= instructions) and the data to be processed
  - Outputs: processed data
  - State: registers (and the register file)
- (Figure: combinational logic plus clocked registers, with programs and data as inputs and processed data as output)

Neumann computer
- The Neumann computer is the dominant organization today
- Key points of the Neumann computer:
  - Instructions and data are placed in main memory
  - Instructions manipulate the data held in flip-flops and memories
  - We can apply a different manipulation simply by supplying a different instruction
- (Figure: in a Neumann computer, the hardware applies the operation notated in each instruction fetched from main memory to the data; in a non-Neumann computer, the hardware applies a fixed operation to the data)

Advantages and disadvantages of the Neumann computer
- Advantages:
  - We do not have to modify the hardware to change the data processing
    - EDSAC (1949) is one of the earliest Neumann computers
    - ENIAC had to have its wiring changed whenever the processing changed
  - We can execute complicated processing with a sequence of instructions
- Disadvantages (the Neumann bottleneck):
  - Communication between processor and memory increases
  - Slow memory drags down processor performance
- What about non-Neumann computers?
  - They remain in some specific uses (e.g., video codecs)
  - They are starting to regain ground with reconfigurable hardware -> Chap. 9

Neumann computer and microprocessor
- The microprocessor is the hardware that carries out the data processing in a Neumann computer
  - Also called the "processor" or "Central Processing Unit (CPU)"
  - It includes a part of main memory (see memory hierarchy)
- (Figure: within the Neumann computer diagram, the hardware that applies the operation notated in each instruction, i.e. the combinational logic plus registers, is the microprocessor)
- The hardware organization of a microprocessor differs with its purpose
  - For servers, for PCs, for high-performance embedded, for embedded, ...

States in the microprocessor
- How do we define the states of the sequential machine in the processor?
  - Usually we call them registers
- There are many types of registers:
  - Special purpose registers (SPR)
    - Program counter (PC): holds the position of the instruction currently being executed
    - Flag register: records carry generation, overflow, ...
  - General purpose registers (GPR)
    - Hold data before/after processing, working as a part of main memory (the relationship between the GPR and the memory hierarchy is shown later)
    - Also hold intermediate data during arithmetic
- The register organization differs between instruction set architectures (how to define it efficiently is introduced later)

Inputs for the microprocessor
- The sequential machine in the processor has two inputs:
  - Instructions: must be defined when we design the sequential machine
  - Data: does not have to be defined in advance
- What is an instruction?
  - A series of bits, e.g. 00000000100001010100000000010000
  - Usually we use an assembly language to represent it
    - A programming language with a one-to-one relationship to the instructions
    - It basically defines operations between registers and main memory
    - e.g. add R8, R4, R5 (GPR #8 = GPR #4 + GPR #5) <-> 00000000100001010100000000010000
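To see the "processor as sequential machine" abstraction in code, the following is a minimal C sketch (an illustration, not the lecture's design): the program counter and register file are the state, the instruction memory is the input, and one state transition fetches and decodes a single R-type add. The field layout follows the MIPS-style format introduced later in these slides; the word-addressed instruction memory and the single supported operation are simplifying assumptions.

```c
#include <stdint.h>

#define NUM_REGS 32

/* State of the sequential machine: the PC plus the register file. */
static uint32_t pc;
static uint32_t gpr[NUM_REGS];

/* One state transition: fetch the instruction the PC points at,
 * decode its fields, and update the state (registers and PC).    */
void step(const uint32_t *instr_mem)
{
    uint32_t inst = instr_mem[pc / 4];    /* fetch                          */
    uint32_t op   = inst >> 26;           /* bits 31-26: operation type     */
    uint32_t rs   = (inst >> 21) & 0x1f;  /* bits 25-21: source operand 1   */
    uint32_t rt   = (inst >> 16) & 0x1f;  /* bits 20-16: source operand 2   */
    uint32_t rd   = (inst >> 11) & 0x1f;  /* bits 15-11: destination        */

    if (op == 0)                          /* R-type: only add is sketched   */
        gpr[rd] = gpr[rs] + gpr[rt];

    pc += 4;                              /* advance to the next instruction */
}
```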
GPR and memory hierarchy (1/2)
- In recent processors the GPR works as a part of main memory:
  - First the processor moves data from main memory into a register
  - The processor applies operations to the data in the register
  - After the operations, it writes the data back to main memory
- This organization effectively reduces the load on main memory, assuming we apply multiple operations to the data
- (Figure: data is transmitted from main memory to the registers inside the processor, operated on, and transmitted back)

GPR and memory hierarchy (2/2)
- We call this layered organization the "memory hierarchy" (including the disk)
  - It also reduces the performance degradation caused by slow devices
- Recently the number of levels has been increasing, because the speed gap between devices keeps growing
- Traditional hierarchy: register - main memory - disk
- Recent hierarchy: register (FF or SRAM) - L1 cache (SRAM) - L2 cache (SRAM) - main memory (DRAM) - disk (HDD or SSD)

Instruction set architecture (ISA)
- To create the sequential machine we have to define the format of its inputs and its internal state
  - Internal state: denoted by the registers
  - Inputs: the instructions
- We usually call this definition the Instruction Set Architecture (ISA)
  - It includes a systematic method for constructing instructions
- By defining the ISA carefully you can reduce
  - the states (registers)
  - the combinational logic

Instruction encoding
- An instruction is encoded into a chunk of binary according to the ISA definition
  - e.g. add R8, R4, R5 (GPR #8 = GPR #4 + GPR #5) <-> 00000000100001010100000000010000
- In a usual encoding, meaning is given to particular chunks of bits:
  - R type: op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shift (10-6) | func (5-0)
  - I type: op (31-26) | rs (25-21) | rt (20-16) | addr (15-0)

Example of instruction encoding (1/3)
- Example: the instruction encoding of MIPS; total length is 32 bits
- Meaning is assigned to several chunks of bits:
  - op: type of operation (arithmetic, load, store, branch, ...)
  - rs: source operand 1 for arithmetic
  - rt: source operand 2 for arithmetic
  - rd: destination operand for arithmetic (stores the arithmetic result)
  - shift: amount of shift
  - func: type of arithmetic (supplements op)
  - addr: immediate value for arithmetic

Example of instruction encoding (2/3)
- add R8, R4, R5
  - Encoding: 000000 | 00100 | 00101 | 01000 | (unused) | 010000
  - Operation: R8 = R4 + R5 (add: addition)
- sub R8, R4, R5
  - Encoding: 000000 | 00100 | 00101 | 01000 | (unused) | 010010
  - Operation: R8 = R4 - R5 (sub: subtraction)
- Only the func field differs; this difference selects the different arithmetic operation

Example of instruction encoding (3/3)
- lw R8, 8(R4)

Short Exercise
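To tie the encoding example to something executable, here is a small C sketch (an illustration, not part of the slides) that packs the R-type fields in the layout above. The func values are the ones used in the lecture's add/sub example, so the add case reproduces the bit pattern 00000000100001010100000000010000 (0x00854010) shown earlier.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack the R-type fields used in the slides:
 * op (6) | rs (5) | rt (5) | rd (5) | shift (5) | func (6). */
static uint32_t encode_rtype(uint32_t op, uint32_t rs, uint32_t rt,
                             uint32_t rd, uint32_t shift, uint32_t func)
{
    return (op    & 0x3f) << 26 |
           (rs    & 0x1f) << 21 |
           (rt    & 0x1f) << 16 |
           (rd    & 0x1f) << 11 |
           (shift & 0x1f) <<  6 |
           (func  & 0x3f);
}

int main(void)
{
    /* add R8, R4, R5 with the func value from the slides (010000). */
    uint32_t add = encode_rtype(0, 4, 5, 8, 0, 0x10);
    /* sub R8, R4, R5 differs only in the func field (010010).      */
    uint32_t sub = encode_rtype(0, 4, 5, 8, 0, 0x12);

    printf("add R8, R4, R5 -> 0x%08x\n", add); /* prints 0x00854010 */
    printf("sub R8, R4, R5 -> 0x%08x\n", sub); /* prints 0x00854012 */
    return 0;
}
```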