Teaching Computer Architecture/Organisation Using Simulators

Herbert Grünbacher
Vienna University of Technology
Treitlstrasse 3/182-2, A-1040 Vienna / Austria
E-mail [email protected]

Abstract

Teaching the dynamics of pipelines and caches is rather difficult if done on a paper-and-pencil basis. In our experience students find it difficult to understand the principles and complications of pipelines and, to a lesser extent, of caches. To support teaching and give students an environment to experiment, we developed several pipeline simulators and a cache simulator.

My experience is that students appreciate using simulators and, by using them, are easily introduced to the subject. Based on the knowledge gained from using the simulators they are motivated to study the subject further using books.

Almost all of our students have their own PCs and most of them run Windows 95/NT. This was the main reason why we developed the simulators to run under MS Windows. It turned out that students particularly like to work at home and that they are usually well prepared to ask questions in class.

Introduction

Experience shows that many students, especially those with little hardware background, encounter difficulties in understanding the consequences and even the concepts of conventional instruction pipelining; superscalar instruction processing is even more complicated and harder to understand. It is particularly difficult to statically teach the concept of a pipeline. Therefore we developed software to simulate and dynamically visualize the processing of instructions by pipelined and superscalar processors. Three simulators have been developed:

• WinDLX is based on Hennessy/Patterson's DLX architecture and is modeled at the architecture level; therefore very little processor-internal information is given.
• MIPSim is based on the MIPS processor of Patterson/Hennessy's book and is modeled at the computer organization level; functional units like register file, pipeline registers and multiplexers are visible, and MIPSim displays the content and dynamic behavior of such units.
• MkSim is based on the MIPS R10000 architecture and models the instruction decode and dispatch unit, the branch unit, the instruction queues and the functional units (address calculation, both ALUs, floating-point adder, floating-point multiply/divide/square-root unit). Concepts like register renaming, branch history table, branch resume buffer and out-of-order execution can be explained easily using the simulator.

Teaching cache organization is an easier task; nevertheless, visualising cache activities helps in understanding the dynamics of a cache memory. Xcache is a simulator which displays the interactions between instruction memory and instruction cache, and between data memory and data cache, respectively.

The simulators are available for free download from http://www.vlsivie.tuwien.ac.at/CompArch

WinDLX

WinDLX is an MS-Windows (16 bit) based pipeline simulator for the DLX processor as described in [1]. DLX is modeled at the architecture level; very little about the underlying computer organization is known at that level.

After loading a symbolic DLX assembler program, most of the information relevant to the CPU (pipeline, registers, I/O, memory, …) can be viewed and modified while executing the code step by step or continuously. WinDLX offers statistics about pipeline behavior over time.

WinDLX works with several configurations: the structure (number of floating-point functional units) and the latency of the floating-point units can be changed. Forwarding can be enabled or disabled and the memory size can be modified. Extensive online help is available to explain the simulator and the internals of DLX.

The "Register", "Code", "Pipeline", "Clock Cycle Diagram", "Statistics" and "Breakpoints" windows show internals of the pipeline. Further explanation is given below.

Figure 1: Main Window with open Code Window

Code Window

The code window displays a three-column representation of the memory: the address (symbolic or in hex), the machine code in hex and the assembler command. Figure 1 shows the main simulation window with a code segment in the open Code Window. Color coding is consistent across the different simulation windows, e.g. WB (Write Back) is colored blue. Double-clicking on an instruction in any of the simulation windows displays pipeline status information in text form, giving details about internal registers, operations, stalling and forwarding status.

Pipeline Window

The pipeline window shows the inner structure of the DLX processor: its five pipeline stages and the floating-point units (addition/subtraction, multiplication and division).

Clock Cycle Diagram Window

Figure 2, the cycle diagram window, shows the timing behavior of the pipeline. The simulation shown is in the 4th cycle: the first command is in the MEM stage, the second in intEX and the fourth in IF. The third command, however, is denoted as "aborted". This is because the second command, jal, is an unconditional branch. This is known only after the 3rd cycle, when jal has been decoded. During this cycle the command movi2fp (which follows jal) has already been fetched, but the next command to be executed is at another address. Therefore the execution of movi2fp must be aborted, leaving a "bubble" in the pipeline. The branch address of jal is named "InputUnsigned". Clicking Memory/Symbols in the main window shows the correspondence between the symbols used and the actual addresses.

Figure 2: Clock Cycle Diagram

Breakpoint, Register and Statistics Window

Setting breakpoints stops the simulation at user-defined points.

The register window shows all registers, not just the register file, and their content in hex.

The statistics window provides information about general aspects (e.g. number of simulation cycles), the hardware configuration used in the simulation, stalls and their causes, conditional branches, load/store instructions, floating-point stage instructions and traps. Usually the absolute count of events and the percentage are given, e.g. "RAW stalls: 17 (7.91 % of all cycles)".

The statistics window is very useful for comparing the effects of changes in the pipeline configuration.

MIPSim

MIPSim is a pipeline simulator for the MIPS processor as described in [2]. MIPS is modeled at the computer organization level. Functional units like register files, pipeline registers, ALU and multiplexers, as well as data and control flow, are visible.

The user can write small programs (currently only a subset of the MIPS instruction set is implemented) and watch the pipeline doing its work, modify the program and the content of data memory and register file 'on the fly', and go on simulating to see the effects.

At present MIPSim models a rather simple pipeline without hazard detection and forwarding units.

Assembler Program / Instruction Memory Content

The leftmost window in Figure 3 shows the program code. The program can be executed in single-step or running mode. By setting the pointer (in essence the program counter) to a particular address, manual jumps in the program can be accomplished. Double-clicking on the Instr. box opens a window in which the instruction memory content (the program) can be modified.

Data Memory Content

Double-clicking on the Data box opens a window in which the data memory content can be modified (overwritten) interactively.

Modifying the content of instruction and data memory is very valuable for experimenting with the pipeline, e.g. to show data hazards.

Control/Data Flow Signals

After executing the program code, data path and control signals can be displayed by clicking on them. The instruction content of the different pipeline stages is displayed on top of each stage.

Extensive help as well as an introductory tutorial is available online.

MKSim

The R10000 is a dynamic superscalar microprocessor which implements the 64-bit MIPS Instruction Set Architecture [3], [4]. It fetches and decodes four instructions per cycle and dynamically issues them to five fully pipelined low-latency execution units. Instructions can be fetched and executed speculatively beyond branches. Instructions graduate in order upon completion. Although execution is aggressively out-of-order, the processor still provides sequential memory consistency and precise exception handling.

Model of the R10000

Our R10000 model concentrates on the most important issues of a superscalar architecture, and we wanted an easy-to-learn, not too complex user interface. The following parts of the processor are modelled:

The instruction decode and dispatch unit is responsible for instruction fetching, instruction decoding, register renaming and finally dispatching the instruction to the appropriate queue. The dispatcher works together with the branch unit when predicting the outcome of conditional branches. During this process they need to access the branch history table and the branch resume buffer, which are therefore also simulated. As soon as instructions are dispatched to the queues they are also given an entry in the active list, which is also part of our simulation.

All of the R10000's instruction queues, namely an address queue, an integer queue and a floating-point queue, are included in the simulation. To determine which operands are ready, the queues access the busy table, which is also simulated.

The remaining parts of the simulation are the five functional execution units: the address calculation unit, both ALUs, the floating-point adder unit and the floating-point multiply/divide/square-root unit.

Data is read from and written to memory, which can be viewed and modified during the simulation. The memory is simplified and is assumed to be accessible without any delay.
Exception handling is not implemented.
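
As a rough illustration of how the modelled structures cooperate, the following Python sketch renames registers at dispatch, tracks pending results in a busy table and retires instructions in order from an active list. It is a minimal sketch under stated assumptions, not the simulator's implementation: the class name `Renamer`, its method names and the tiny register counts are hypothetical and chosen only to keep the example short.

```python
from collections import deque


class Renamer:
    """Toy register renaming with a map table, free list, busy table and active list.

    All names and sizes here are illustrative assumptions, not taken from the simulator.
    """

    def __init__(self, num_logical=4, num_physical=8):
        # Initially logical register i is held in physical register i.
        self.map_table = list(range(num_logical))
        self.free_list = deque(range(num_logical, num_physical))
        self.busy = [False] * num_physical  # True while a result is still pending
        self.active_list = deque()          # in-flight instructions in program order

    def dispatch(self, dest, srcs):
        """Rename 'dest <- op(srcs)' and return (physical dest, physical sources)."""
        if not self.free_list:
            raise RuntimeError("no free physical register: dispatch would stall")
        # Sources must be looked up before the destination mapping is overwritten.
        phys_srcs = [self.map_table[s] for s in srcs]
        new_phys = self.free_list.popleft()
        old_phys = self.map_table[dest]
        self.map_table[dest] = new_phys
        self.busy[new_phys] = True
        self.active_list.append(
            {"dest": dest, "new": new_phys, "old": old_phys, "done": False}
        )
        return new_phys, phys_srcs

    def operands_ready(self, phys_srcs):
        """An instruction may issue once none of its sources is marked busy."""
        return not any(self.busy[p] for p in phys_srcs)

    def complete(self, phys_dest):
        """A functional unit delivered a result, possibly out of program order."""
        self.busy[phys_dest] = False
        for entry in self.active_list:
            if entry["new"] == phys_dest:
                entry["done"] = True

    def graduate(self):
        """Retire finished instructions from the head of the active list, in order."""
        retired = []
        while self.active_list and self.active_list[0]["done"]:
            entry = self.active_list.popleft()
            self.free_list.append(entry["old"])  # the previous mapping is now dead
            retired.append(entry["dest"])
        return retired
```

In this sketch an instruction that completes early stays in the active list until every older instruction has completed, which mirrors the in-order graduation described above; calling `graduate()` while the oldest instruction is still pending returns an empty list.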