CS152 Computer Architecture and Engineering, Lecture 21: Buses and I/O


CS152 Computer Architecture and Engineering
Lecture 21: Buses and I/O
November 10, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

Recap: Levels of the Memory Hierarchy (upper levels are smaller, faster, and costlier per bit; lower levels are larger and slower)
° Registers: 100s Bytes, <10s ns; instruction operands moved by the program/compiler, 1-8 bytes at a time
° Cache: K Bytes, 10-100 ns, $.01-.001/bit; blocks moved by the cache controller, 8-128 bytes at a time
° Main Memory: M Bytes, 100ns-1us, $.01-.001; pages moved by the OS, 512-4K bytes at a time
° Disk: G Bytes, ms, 10^-3 to 10^-4 cents; files moved by the user/operator, MBytes at a time
° Tape: infinite capacity, sec-min access, 10^-6 cents

Recap: What is virtual memory?
° Virtual memory => treat memory as a cache for the disk
° Terminology: blocks in this cache are called "Pages"; typical size of a page: 1K — 8K
° Page table maps virtual page numbers to physical frames
° Figure: the virtual page number of a virtual address (page number + offset) indexes, via the Page Table Base Reg, into a page table located in physical memory; each entry holds a valid bit, access rights, and a physical address, producing the physical address (physical page number + offset)

Recap: Three Advantages of Virtual Memory
° Translation:
  • Program can be given a consistent view of memory, even though physical memory is scrambled
  • Makes multithreading reasonable (now used a lot!)
  • Only the most important part of the program (the "Working Set") must be in physical memory
  • Contiguous structures (like stacks) use only as much physical memory as necessary, yet can still grow later
° Protection:
  • Different threads (or processes) are protected from each other
  • Different pages can be given special behavior (Read Only, Invisible to user programs, etc.)
  • Kernel data is protected from User programs
  • Very important for protection from malicious programs => far more "viruses" under Microsoft Windows
° Sharing:
  • Can map the same physical page to multiple users ("Shared memory")

Recap: Making address translation practical: TLB
° Translation Look-aside Buffer (TLB) is a cache of recent translations
° Speeds up the translation process "most of the time"
° TLB is typically a fully-associative lookup-table
° Figure: on a hit, the TLB maps the virtual page number directly to a physical frame, bypassing the page table

Recap: TLB organization: include protection

Virtual Address  Physical Address  Dirty  Ref  Valid  Access  ASID
0xFA00           0x0003            Y      N    Y      R/W     34
0x0040           0x0010            N      Y    Y      R       0
0x0041           0x0011            N      Y    Y      R       0

° TLB usually organized as a fully-associative cache
  • Lookup is by Virtual Address
  • Returns Physical Address + other info
° Dirty => Page modified (Y/N)?   Ref => Page touched (Y/N)?   Valid => TLB entry valid (Y/N)?
  Access => Read? Write?   ASID => Which User?
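As a concrete illustration of the TLB entries above, here is a minimal C sketch of a fully-associative lookup. The struct layout, 4 KB page size, 64-entry table, and function names are illustrative assumptions rather than the format of any particular machine; a miss would be handled by a page-table walk or software refill handler that is not shown.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT  12             /* assume 4 KB pages                   */
#define TLB_ENTRIES 64             /* assume a 64-entry fully-assoc. TLB  */

typedef struct {
    bool     valid;
    bool     dirty;                /* page modified?                       */
    bool     ref;                  /* page touched?                        */
    bool     can_write;            /* write permitted? (R/W vs R above)    */
    uint16_t asid;                 /* which user/address space?            */
    uint32_t vpn;                  /* virtual page number (the tag)        */
    uint32_t pfn;                  /* physical frame number                */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Fully-associative lookup: on a hit, splice the cached frame number
 * onto the untranslated page offset; on a miss, a page-table walk or
 * software fault handler (not shown) would refill an entry.           */
static bool tlb_translate(uint32_t vaddr, uint16_t asid, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

    for (int i = 0; i < TLB_ENTRIES; i++) {          /* check every entry */
        if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == asid) {
            tlb[i].ref = true;                       /* mark page touched */
            *paddr = (tlb[i].pfn << PAGE_SHIFT) | offset;
            return true;                             /* TLB hit           */
        }
    }
    return false;                                    /* TLB miss          */
}

int main(void)
{
    /* install one translation, mirroring the first row of the table:
     * virtual page 0xFA00 -> physical frame 0x0003, R/W, ASID 34       */
    tlb[0] = (tlb_entry_t){ .valid = true, .can_write = true, .asid = 34,
                            .vpn = 0xFA00, .pfn = 0x0003 };

    uint32_t pa;
    if (tlb_translate((0xFA00u << PAGE_SHIFT) | 0x123, 34, &pa))
        printf("hit: physical address 0x%08x\n", (unsigned)pa);  /* 0x00003123 */
    return 0;
}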
Recap: MIPS R3000 pipelining of TLB
° Figure: MIPS R3000 pipeline stages Inst Fetch | Dcd/Reg | ALU/E.A | Memory | Write Reg, with the TLB, I-Cache, and RF in fetch, the Operation and E.A. TLB lookup in the ALU stage, the D-Cache in the memory stage, and WB at the end
° TLB: 64 entries, on-chip, fully associative, software TLB fault handler
° Virtual Address Space: ASID (6 bits) | V. Page Number (20 bits) | Offset (12 bits)
  • 0xx: User segment (caching based on PT/TLB entry)
  • 100: Kernel physical space, cached
  • 101: Kernel physical space, uncached
  • 11x: Kernel virtual space
° Allows context switching among 64 user processes without a TLB flush

Reducing Translation Time I: Overlapped Access
° Figure: virtual address (for 4K pages): V page no. (20 bits) | offset (12 bits); the TLB lookup returns V, Access Rights, and PA, yielding the physical address: P page no. | offset (12 bits)
° Machines with TLBs overlap the TLB lookup with the cache access
  • Works because the lower bits of the result (the offset) are available early

Overlapped TLB & Cache Access
° Figure: a 32-entry associative TLB and a 4K cache (1K lines of 4 bytes; 10-bit index + 2-bit byte select) are accessed in parallel: the 20-bit page number goes to the TLB while the 12-bit displacement indexes the cache, and the frame number (FN) from the TLB is compared against the FN tag from the cache to produce hit/miss and data
° Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation
° With this technique, the size of the cache can be up to the same size as the pages
  ⇒ What if we want a larger cache???

Problems With Overlapped TLB Access
° If we do this in parallel, we have to be careful, however. Example: suppose everything is the same except that the cache is increased to 8 K bytes instead of 4 K
° Figure: with an 8K cache the index grows to 11 bits (+ 2-bit byte select), so one index bit now comes from the virtual page number: this bit is changed by VA translation, but is needed for the cache lookup
° Solutions:
  ⇒ Go to 8K byte page sizes;
  ⇒ Go to 2 way set associative cache; or
  ⇒ SW guarantee VA[13]=PA[13]
° Figure: a 1K, 2-way set associative cache keeps the index bits within the page offset
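The index-bit constraint from the two slides above can be checked with a few lines of C. This is only an illustrative sketch under the slides' assumptions (4 KB pages, direct-mapped cache); the helper names are mine, not the lecture's.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                       /* 4 KB pages, as in the slides */

static int log2u(uint32_t x)                /* integer log2 for powers of two */
{
    int n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* A direct-mapped cache of cache_bytes uses address bits
 * [0 .. log2(cache_bytes)-1] for block offset + index.  Overlapped
 * TLB/cache access requires all of those bits to lie inside the page
 * offset, i.e. to be unchanged by address translation.               */
static void check_overlap(uint32_t cache_bytes)
{
    int top_index_bit = log2u(cache_bytes) - 1;

    if (top_index_bit < PAGE_SHIFT)
        printf("%5u-byte cache: uses bits 0..%d (all page-offset bits), overlap works\n",
               (unsigned)cache_bytes, top_index_bit);
    else
        printf("%5u-byte cache: needs bit %d, which is translated; "
               "use bigger pages, more associativity, or a SW VA=PA guarantee on that bit\n",
               (unsigned)cache_bytes, top_index_bit);
}

int main(void)
{
    check_overlap(4096);   /* 4 KB direct-mapped cache: index fits in the 4 KB page offset */
    check_overlap(8192);   /* 8 KB direct-mapped cache: the extra index bit is a VPN bit   */
    return 0;
}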
Reduced Translation Time II: Virtually Addressed Cache
° Figure: the CPU sends the virtual address (VA) straight to the cache; translation to a physical address (PA) sits between the cache and main memory, so a hit returns data without any translation
° Only require address translation on a cache miss!
  • Very fast as a result (as fast as a cache lookup)
  • No restrictions on cache organization
° Synonym problem: two different virtual addresses map to the same physical address ⇒ two cache entries holding data for the same physical address!
° Solutions:
  • Provide associative lookup on physical tags during a cache miss to enforce a single copy in the cache (potentially expensive)
  • Make the operating system enforce one copy per cache set by selecting virtual⇒physical mappings carefully; this only works for direct-mapped caches
° Virtually addressed caches are currently out of favor because of synonym complexities

Survey
° R4000
  • 32 bit virtual, 36 bit physical
  • variable page size (4KB to 16 MB)
  • 48 entries mapping page pairs (128 bit)
° MPC601 (32 bit implementation of 64 bit PowerPC arch)
  • 52 bit virtual, 32 bit physical, 16 segment registers
  • 4KB page, 256MB segment
  • 4 entry instruction TLB
  • 256 entry, 2-way TLB (and variable sized block xlate)
  • overlapped lookup into 8-way 32KB L1 cache
  • hardware table search through hashed page tables
° Alpha 21064
  • architecture is 64 bit virtual; implementations subset it to 43, 47, 51, or 55 bits
  • 8, 16, 32, or 64KB pages (3 level page table)
  • 12 entry ITLB, 32 entry DTLB
  • 43 bit virtual, 28 bit physical octword address

Alpha VM Mapping
° "64-bit" address divided into 3 segments
  • seg0 (bit 63 = 0): user code/heap
  • seg1 (bit 63 = 1, bit 62 = 1): user stack
  • kseg (bit 63 = 1, bit 62 = 0): kernel segment for OS
° 3 level page table, each level one page
  • Alpha uses only 43 unique bits of VA
  • (future minimum page size up to 64KB => 55 bits of VA)
° PTE bits: valid, kernel & user read & write enable (no reference, use, or dirty bit)

Administrivia
° Important: Lab 7. Design for Test
  • You should be testing from the very start of your design
  • Consider adding special monitor modules at various points in the design => I have asked you to label trace output from these modules with the current clock cycle #
  • The time to understand how components of your design should work is while you are designing!
° Question: Oral reports on 12/6?
  • Proposal: 10 — 12 am and 2 — 4 pm
° Pending schedule:
  • Sunday 11/14: Review session 7:00 in 306 Soda
  • Monday 11/15: Guest lecture by Bob Broderson
  • Tuesday 11/16: Lab 7 breakdowns and Web description
  • Wednesday 11/17: Midterm I
  • Monday 11/29: no class? Possibly
  • Monday 12/1: Last class (wrap up, evaluations, etc.)
  • Monday 12/6: final project reports due after oral report
  • Friday 12/10: grades should be posted

Administrivia II
° Major organizational options:
  • 2-way superscalar (18 points)
  • 2-way multithreading (20 points)
  • 2-way multiprocessor (18 points)
  • out-of-order execution (22 points)
  • Deep Pipelined (12 points)
° Test programs will include multiprocessor versions
° Both multiprocessor and multithreaded must implement a synchronizing "Test and Set" instruction (a software model of its semantics is sketched after the last slide below):
  • Normal load instruction, with a special address range:
    - Addresses from 0xFFFFFFF0 to 0xFFFFFFFF
    - Only need to implement 16 synchronizing locations
  • Reads and returns the old value of the memory location at the specified address, while setting the value to one (stall the memory stage for one extra cycle)
  • For multiprocessor, this instruction must make sure that all updates to this address are suspended during the operation
  • For multithreaded, switch to the other thread if the value is already non-zero (like a cache miss)

Computers in the News: Sony Playstation 2000
° (as reported in Microprocessor Report, Vol 13, No. 5)
° Emotion Engine: 6.2 GFLOPS, 75 million polygons per second
° Graphics Synthesizer: 2.4 Billion pixels per second
° Claim: Toy Story realism brought to games!

Playstation 2000 Continued
° Emotion Engine:
  • Superscalar MIPS core
  • Vector Coprocessor Pipelines
  • RAMBUS DRAM interface
° Sample Vector Unit:
  • 2-wide VLIW
  • Includes Microcode Memory
  • High-level instructions like matrix-multiply

What is a bus?
° A Bus Is: a shared communication link, a single set of wires used to connect multiple subsystems
° Figure: the bus ties together the Processor (Control + Datapath), Memory, Input, and Output
° A Bus is also a fundamental tool for composing large, complex systems
  • systematic means of abstraction

Buses
° Figure: a Processor, Memory, and several I/O Devices all attached to a single shared bus

Advantages of Buses
° Versatility:
  • New devices can be added easily
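The synchronizing "Test and Set" load described under Administrivia II above can be modeled in software roughly as follows. This is only a behavioral sketch of the stated semantics (return the old value of the location while setting it to one); the array, function names, and the spin-lock wrapper are illustrative rather than part of the project specification, and real hardware would perform the read-modify-write atomically in the stalled memory stage.

#include <stdint.h>

#define NUM_SYNC_LOCATIONS 16      /* 16 locations; in the project they sit at 0xFFFFFFF0-0xFFFFFFFF */

/* stand-in for the memory-mapped synchronizing locations */
static volatile uint32_t sync_mem[NUM_SYNC_LOCATIONS];

/* Behavioral model of the special load: return the old value of the
 * location and set it to one.  In hardware this happens atomically in
 * the (stalled) memory stage; here it is just sequential code.        */
static uint32_t test_and_set(unsigned loc)
{
    uint32_t old = sync_mem[loc];
    sync_mem[loc] = 1;
    return old;
}

/* A spin lock built on top of it: keep issuing test-and-set until the
 * old value comes back zero, meaning we are the one who set it.       */
static void acquire(unsigned loc)
{
    while (test_and_set(loc) != 0)
        ;   /* multiprocessor: spin; multithreaded: hardware switches threads instead */
}

static void release(unsigned loc)
{
    sync_mem[loc] = 0;             /* an ordinary store releases the lock */
}

int main(void)
{
    acquire(0);                    /* location 0 stands in for address 0xFFFFFFF0 */
    /* ... critical section ... */
    release(0);
    return 0;
}

A processor acquires a lock by looping on the special load until it reads back zero, and releases it with an ordinary store of zero.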