COPE-09 Architecture

Total pages: 16

File type: PDF, size: 1020 KB

Computer Organisation & Program Execution 2021
Chapter 9: Architecture
Uwe R. Zimmer, The Australian National University

References for this chapter:
[Patterson17] David A. Patterson & John L. Hennessy: Computer Organization and Design – The Hardware/Software Interface, Chapter 4 "The Processor", Chapter 6 "Parallel Processors from Client to Cloud", ARM edition, Morgan Kaufmann, 2017.

© 2021 Uwe R. Zimmer, The Australian National University, pages 468–487 of 490 (chapter 9: "Architecture").

Definition: Processor – Hardware origins

18th century machines:
L'Ecrivain (1770) by Pierre Jaquet-Droz, Henri-Louis Jaquet-Droz & Jean-Frédéric Leschot: programmable, yet not a computer in today's definition (not Turing complete).

Definition: Processor – Digital Computers

(Photos: Konrad Zuse with the Z1, replica of the 1937 computer, © Dr. Horst Zuse; ENIAC 1946, Glen Beck (background), Betty Jennings (foreground))

• Patents by Konrad Zuse (Germany), 1936.
• First digital computer: Z1 (Germany), 1937: relays, programmable via punch tape, clock: 1 Hz, 64 words of memory à 22 bit, 2 registers, floating point unit, weight: 1 t.
• First freely programmable (Turing complete) relay computer: Z3 (Germany), 1941: 5.3 Hz.
• Atanasoff-Berry Computer (US), 1942: vacuum tubes (not Turing complete).
• Colossus Mark 1 (UK), 1944: vacuum tubes (not Turing complete).
• "First Draft of a Report on the EDVAC" (Electronic Discrete Variable Automatic Computer) by John von Neumann (US), 1945: influential article about the core elements of a computer: arithmetic unit, control unit (sequencer), memory (holding data and program), and I/O.
• First high-level programming language: Plankalkül ("Plan Calculus") by Konrad Zuse, 1945.
• ENIAC (Electronic Numerical Integrator And Computer) (US), 1946: first Turing complete vacuum-tube based computer, programmed by plugboard, clock: 100 kHz, weight: 27 t on 167 m².

Computer Architectures – Harvard Architecture

Program memory and data memory are separate; the control unit and ALU connect to them over distinct address, data, control and status lines.
• Control unit: concurrently addresses program and data memory and fetches the next instruction; controls the next ALU operations and determines the next instruction (based on the ALU status).
• Arithmetic Logic Unit (ALU): fetches data from memory, executes the arithmetic/logic operation, writes data to memory.
• Input/Output
• Program memory
• Data memory

Computer Architectures – von Neumann Architecture

• Control unit: sequentially addresses program and data memory and fetches the next instruction; controls the next ALU operations and determines the next instruction (based on the ALU status).
• Arithmetic Logic Unit (ALU): fetches data from memory, executes the arithmetic/logic operation, writes data to memory.
• Input/Output
• Memory: program and data are not distinguished → programs can change themselves.

Computer Architectures – A simple processor (CPU)

• Decoder/Sequencer: can be a machine in itself, which breaks CPU instructions into concurrent micro code.
• Execution Unit / Arithmetic Logic Unit (ALU): a collection of transformational logic.
• Memory
• Registers: instruction pointer (IP), stack pointer (SP), general purpose and specialized registers.
• Flags: indicate the states of the latest calculations.
• Code/Data management: fetching, caching, storing.

Processor Architectures – Pipeline

Some CPU actions are naturally sequential (e.g. instructions need to be first loaded, then decoded, before they can be executed). More fine-grained sequences can be introduced by breaking CPU instructions into micro code.
→ Overlapping those sequences in time leads to the concept of pipelines.
→ Same latency, yet higher throughput.
→ (Conditional) branches might break the pipelines → branch predictors become essential.

Processor Architectures – Parallel pipelines

Filling parallel pipelines (by alternating incoming commands between pipelines) may employ multiple ALUs.
→ (Conditional) branches might again break the pipelines.
→ Interdependencies might limit the degree of concurrency.
→ Same latency, yet even higher throughput.
→ Compilers need to be aware of the options.

Processor Architectures – Pipeline hazards

• Structural hazard: lack of hardware to run operations in parallel, e.g. loading a new instruction and loading new data in parallel.
• Control hazard: a decision depends on the previous instruction, e.g. a conditional branch based on the ALU flags from the previous instruction.
• Data hazard: needed data is not yet available, e.g. the result of an arithmetic operation is needed in the next instruction.

Processor Architectures – Out of order execution

Breaking the sequence inside each pipeline leads to "out of order" CPU designs.
→ Replace pipelines with a hardware scheduler.
→ Results need to be "re-sequentialized" or possibly discarded.
→ "Conditional branch prediction" executes the most likely branch or multiple branches.
→ Works better if the presented code sequence has more independent instructions and fewer conditional branches.
→ This hardware will require (extensive) code optimization to be fully utilized.

Processor Architectures – SIMD ALU units

Provide the facility to apply the same instruction to multiple data concurrently; also referred to as "vector units". Examples: AltiVec, MMX, SSE[2|3|4], …
→ Requires specialized compilers or programming languages with implicit concurrency.

GPU processing: the graphics processor used as a vector unit.
→ Unifying architecture languages are used (OpenCL, CUDA, GPGPU).
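The SIMD idea above – one instruction applied to many data elements – can be sketched in software. This is an illustrative model only, not vendor intrinsics; the 4-lane width is an assumption mirroring, for example, a 128-bit SSE register holding four 32-bit values:

```python
# Illustrative sketch (not from the slides): scalar vs. SIMD-style addition.
# In the scalar model, one add instruction handles one element; in the
# vector model, one "instruction" handles a whole group of lanes per step.

def scalar_add(a, b):
    # Scalar model: one add per element, executed sequentially.
    result = []
    for x, y in zip(a, b):
        result.append(x + y)
    return result

def simd_add(a, b, lanes=4):
    # Vector model: the same add is applied to `lanes` elements per step,
    # mimicking a 4-lane vector unit (an assumed width).
    assert len(a) == len(b) and len(a) % lanes == 0
    result = []
    for i in range(0, len(a), lanes):
        # One "vector instruction": all lanes computed in the same step.
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
    return result

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
assert scalar_add(a, b) == simd_add(a, b) == [11, 22, 33, 44, 55, 66, 77, 88]
```

A real vector unit computes all lanes in parallel hardware; the sketch only models the grouping of elements per instruction, which is what the specialized compilers mentioned above must discover or be told about.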
Processor Architectures – Hyper-threading

Emulates multiple virtual CPU cores by means of replication of:
• Register sets
• Sequencer
• Flags
• Interrupt logic
while keeping the "expensive" resources like the ALU central, yet accessible by multiple hyper-threads concurrently.
→ Requires programming languages with implicit or explicit concurrency.
Examples: Intel Pentium 4, Core i5/i7, Xeon, Atom; Sun UltraSPARC T2 (8 threads per core).

Processor Architectures – Multi-core CPUs

Full replication of multiple CPU cores on the same chip package.
• Often combined with hyper-threading and/or multiple other means (as introduced above) on each core.
• Cleanest and most explicit implementation of concurrency on the CPU level.
→ Requires synchronized atomic operations.
→ Requires programming languages with implicit or explicit concurrency.
Historically, the introduction of multi-core CPUs ended the "GHz race" in the early 2000s.

Processor Architectures – Virtual memory

Translates logical memory addresses into physical memory addresses and provides memory protection features.
• Does not introduce concurrency by itself.
→ Is still essential for concurrent programming as hardware memory protection …
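The logical-to-physical translation described above can be sketched with a single-level page table. The page size, table contents and function name here are illustrative assumptions, not the lecture's:

```python
# Illustrative sketch (assumed names and sizes): translating a logical
# address into a physical address via a single-level page table, with a
# protection fault for unmapped pages.

PAGE_SIZE = 4096  # assumed 4 KiB pages

# Hypothetical page table: logical page number -> physical frame number.
page_table = {0: 7, 1: 3, 2: 9}

def translate(logical_address):
    page = logical_address // PAGE_SIZE    # which logical page
    offset = logical_address % PAGE_SIZE   # position within the page
    if page not in page_table:
        # A real MMU would raise a page/protection fault here,
        # handled by the operating system.
        raise MemoryError(f"protection fault: page {page} not mapped")
    return page_table[page] * PAGE_SIZE + offset

assert translate(0) == 7 * PAGE_SIZE
assert translate(PAGE_SIZE + 42) == 3 * PAGE_SIZE + 42
```

Real MMUs perform this lookup in hardware (with TLB caching, multi-level tables and per-page permission bits); the fault path is exactly the memory-protection mechanism that makes virtual memory essential for isolating concurrent programs.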