A Highly Productive Implementation of an Out-Of-Order Processor Generator

Total Page:16

File Type:pdf, Size:1020Kb

A Highly Productive Implementation of an Out-Of-Order Processor Generator A Highly Productive Implementation of an Out-of-Order Processor Generator Christopher Celio Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2018-151 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-151.html December 1, 2018 Copyright © 2018, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. A Highly Productive Implementation of an Out-of-Order Processor Generator by Christopher Patrick Celio A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Emeritus David A. Patterson, Chair Professor Krste Asanovi´c,Co-chair Assistant Professor Zachary Pardos Fall 2017 A Highly Productive Implementation of an Out-of-Order Processor Generator Copyright 2017 by Christopher Patrick Celio 1 Abstract A Highly Productive Implementation of an Out-of-Order Processor Generator by Christopher Patrick Celio Doctor of Philosophy in Computer Science University of California, Berkeley Professor Emeritus David A. Patterson, Chair Professor Krste Asanovi´c,Co-chair General-purpose serial-thread performance gains have become more difficult for industry to realize due to the slowing down of process improvements. In this new regime of poor process scaling, continued performance improvement relies on a number of small-scale micro- architectural enhancements. However, software simulator-based models, which computer architecture research has largely relied upon, may not be well-suited for evaluating ideas at the necessary fidelity. To facilitate architecture research during this fallow period of Moore's Law, we propose using processor simulators built from synthesizable processor designs. This thesis describes the design of a synthesizable, industry-competitive processor built on recent advancements in open-source hardware: we leverage the new open-source RISC-V instruction set architecture, the new Chisel hardware construction language, and the Rocket-chip processor generator. Our processor generator is called BOOM, and it designed for use in education, research, and industry. Like most contemporary high-performance cores, BOOM is superscalar (able to execute multiple instructions per cycle) and out-of-order (able to execute instructions as their dependencies are resolved and not restricted to their program order). The BOOM generator was implemented using the Chisel hardware construction language, allowing for the rapid implementation of parameterized designs. The Chisel description generates synthesizable implementations of BOOM that can target both FPGAs and ASIC tool-flows. The BOOM effort culminated in a test chip that was fabricated in the TSMC 28 nm HPM process (high performance mobile) using the foundry-provided standard-cell library and memory compiler. This thesis highlights two aspects of the BOOM design: its industry-competitive branch prediction and its configurable execution datapath. The remainder of the thesis discusses the BOOM tape-out, which was performed by two graduate students and demonstrated the ability to quickly adapt the design to the physical design issues that arose. i To my parents, for the many opportunities that they gave me. ii Contents Contents ii List of Figures vi List of Tables viii 1 Introduction1 1.1 Leveraging New Infrastructure..........................1 1.2 Contributions...................................2 1.3 Thesis Outline...................................3 2 Background4 2.1 Motivation.....................................4 2.2 Technology and Research Trends........................5 2.2.1 Impact of Performance Slowdown on Architecture Research...... 10 2.3 Computer Architecture Research Trends.................... 12 2.3.1 Architectural Simulators......................... 12 2.3.2 RTL Implementations.......................... 15 2.3.3 Energy Modeling Methodologies..................... 18 2.4 Processor Microarchitecture........................... 20 2.5 Out-of-order Processor Microarchitectures................... 23 2.5.1 The Data-in-ROB Design (Implicit Renaming)............. 23 2.5.2 The Physical Register File Design (Explicit Renaming)........ 25 2.5.3 The Differences Between the Two Styles................ 25 2.6 History of Out-of-order Processors........................ 27 2.7 The Value of Out-of-order Execution...................... 28 2.8 Conclusion..................................... 29 3 BOOM Overview 31 3.1 The RISC-V Instruction Set Architecture.................... 31 3.2 The BOOM Microarchitecture.......................... 33 3.2.1 The BOOM Pipeline........................... 33 iii 3.2.2 Instruction Fetch and Branch Prediction................ 35 3.2.2.1 Branch Target Buffer (BTB)................. 35 3.2.2.2 Return Address Stack (RAS)................. 35 3.2.2.3 Conditional Branch Predictor (BPD)............. 35 3.2.3 The Decode Stage and Resource Allocation............... 36 3.2.4 The Register Rename Stage....................... 36 3.2.5 The Reorder Buffer (ROB) and Exception Handling.......... 36 3.2.6 The Issue Unit.............................. 38 3.2.7 The Register File and Bypass Network................. 38 3.2.8 The Execution Pipeline.......................... 38 3.2.9 The Load/Store Unit (LSU)....................... 38 3.2.10 The Memory System........................... 39 3.3 Design Methodology............................... 39 3.3.1 RTL Design................................ 39 3.3.2 RTL Verification............................. 40 3.3.3 The Chisel Hardware Construction Language.............. 42 3.3.4 The Rocket-chip System-on-a-Chip Generator............. 44 3.4 FPGA Implementation.............................. 44 3.5 ASIC Implementation.............................. 45 3.5.1 SRAM Generation............................ 50 3.5.2 Custom Bit Array Register File..................... 50 3.5.3 Timing Analysis.............................. 52 3.6 Parameters.................................... 53 3.7 Performance Evaluation............................. 53 4 Branch Prediction 59 4.1 Background.................................... 60 4.1.1 Research Methodology.......................... 60 4.1.2 Deficiencies of Trace-based, Unpipelined Models............ 61 4.1.3 The State-of-the-art TAGE Predictor.................. 62 4.2 The BOOM RTL Implementation........................ 64 4.2.1 The Frontend Organization....................... 64 4.2.2 Providing a Branch Predictor Framework................ 66 4.2.3 Managing the Global History Register................. 66 4.2.4 The Two-bit Counter Tables....................... 69 4.2.5 Superscalar Predictions.......................... 70 4.2.6 The BOOM GShare Predictor...................... 71 4.2.7 The BOOM TAGE Predictor...................... 71 4.2.7.1 TAGE Global History and the Circular Shift Registers (CSRs) 73 4.2.7.2 Usefulness Counters (u-bits).................. 73 4.2.7.3 TAGE Snapshot State..................... 74 4.3 Addressing the Gaps Between RTL and Models................ 74 iv 4.3.1 Superscalar Prediction.......................... 74 4.3.2 Delayed History Update......................... 75 4.3.3 Delayed Predictor Update........................ 76 4.3.4 Accurate Cost Measurements...................... 76 4.3.5 Implementation Realism......................... 77 4.4 Proposed Improvements for Software Model Evaluations........... 77 4.5 Conclusion..................................... 78 5 Describing an Out-of-order Execution Pipeline Generator 80 5.1 Goals and Challenges............................... 81 5.2 The BOOM Execution Pipeline......................... 83 5.2.1 Branch Speculation............................ 83 5.2.2 Execution Units.............................. 83 5.2.3 Functional Units............................. 86 5.2.3.1 Pipelined Functional Units................... 86 5.2.3.2 Iterative Functional Units................... 89 5.2.4 The Load/Store Unit........................... 89 5.3 Case Study: Adding Floating-point Support.................. 89 5.3.1 Register File and Register Renaming.................. 91 5.3.2 Issue Window............................... 91 5.3.3 Floating-point Control and Status Register (fcsr)........... 91 5.3.4 Hardfloat and Low-level Instantiations................. 92 5.3.5 Pipelined Functional Unit Wrapper................... 92 5.3.6 Adding the FPU to an Execution Unit................. 94 5.3.7 Results................................... 94 5.4 Case Study: Adding a Binary Manipulation Instruction............ 97 5.4.1 Decode, Rename, and Instruction Steering............... 97 5.4.2 The Popcount Unit Implementation................... 97 5.5 Limitations.................................... 100 5.6 Conclusion..................................... 102 6 VLSI Implementation Effort 104 6.1 Background.................................... 105 6.2 BOOMv1..................................... 105 6.3 BOOMv2..................................... 109 6.3.1 Frontend (Instruction Fetch)....................... 109 6.3.2 Distributed Issue Windows........................ 112 6.3.3 Register File Design........................... 112
Recommended publications
  • Review Memory Disambiguation Review Explicit Register Renaming
    5HYLHZ5HRUGHU%XIIHU 52% &6 *UDGXDWH&RPSXWHU$UFKLWHFWXUH 8VHRIUHRUGHUEXIIHU /HFWXUH ² ,QRUGHULVVXH2XWRIRUGHUH[HFXWLRQ,QRUGHUFRPPLW ² +ROGVUHVXOWVXQWLOWKH\FDQEHFRPPLWWHGLQRUGHU ,QVWUXFWLRQ/HYHO3DUDOOHOLVP ª 6HUYHVDVVRXUFHRIYDOXHVXQWLOLQVWUXFWLRQVFRPPLWWHG ² 3URYLGHVVXSSRUWIRUSUHFLVHH[FHSWLRQV6SHFXODWLRQVLPSO\WKURZRXW *HWWLQJWKH&3, LQVWUXFWLRQVODWHUWKDQH[FHSWHGLQVWUXFWLRQ ² &RPPLWVXVHUYLVLEOHVWDWHLQLQVWUXFWLRQRUGHU ² 6WRUHVVHQWWRPHPRU\V\VWHPRQO\ZKHQWKH\UHDFKKHDGRIEXIIHU 6HSWHPEHU ,Q2UGHU&RPPLW LVLPSRUWDQWEHFDXVH 3URI-RKQ.XELDWRZLF] ² $OORZVWKHJHQHUDWLRQRISUHFLVHH[FHSWLRQV ² $OORZVVSHFXODWLRQDFURVVEUDQFKHV &6.XELDWRZLF] &6.XELDWRZLF] /HF /HF 5HYLHZ0HPRU\'LVDPELJXDWLRQ 5HYLHZ([SOLFLW5HJLVWHU5HQDPLQJ 4XHVWLRQ*LYHQDORDGWKDWIROORZVDVWRUHLQSURJUDP 0DNHXVHRIDSK\VLFDO UHJLVWHUILOHWKDWLVODUJHUWKDQ RUGHUDUHWKHWZRUHODWHG" QXPEHURIUHJLVWHUVVSHFLILHGE\,6$ ² 7U\LQJWRGHWHFW5$:KD]DUGVWKURXJKPHPRU\ .H\LQVLJKW$OORFDWHDQHZSK\VLFDOGHVWLQDWLRQUHJLVWHU ² 6WRUHVFRPPLWLQRUGHU 52% VRQR:$5:$:PHPRU\KD]DUGV IRUHYHU\LQVWUXFWLRQWKDWZULWHV ,PSOHPHQWDWLRQ ² 5HPRYHVDOOFKDQFHRI:$5RU:$:KD]DUGV ² .HHSTXHXHRIVWRUHVLQSURJRUGHU ² 6LPLODUWRFRPSLOHUWUDQVIRUPDWLRQFDOOHG6WDWLF6LQJOH$VVLJQPHQW ² :DWFKIRUSRVLWLRQRIQHZORDGVUHODWLYHWRH[LVWLQJVWRUHV ª /LNHKDUGZDUHEDVHGG\QDPLFFRPSLODWLRQ" :KHQKDYHDGGUHVVIRUORDGFKHFNVWRUHTXHXH 0HFKDQLVP".HHSDWUDQVODWLRQWDEOH ² ,IDQ\ VWRUHSULRUWRORDGLVZDLWLQJIRULWVDGGUHVVVWDOOORDG ² ,6$UHJLVWHU⇒ SK\VLFDOUHJLVWHUPDSSLQJ ² ,IORDGDGGUHVVPDWFKHVHDUOLHUVWRUHDGGUHVV DVVRFLDWLYHORRNXS ² :KHQUHJLVWHUZULWWHQUHSODFHHQWU\ZLWKQHZUHJLVWHUIURPIUHHOLVW WKHQZHKDYHDPHPRU\LQGXFHG
    [Show full text]
  • Development of Systemc Modules from HDL for System-On-Chip Applications
    University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Masters Theses Graduate School 8-2004 Development of SystemC Modules from HDL for System-on-Chip Applications Siddhartha Devalapalli University of Tennessee - Knoxville Follow this and additional works at: https://trace.tennessee.edu/utk_gradthes Part of the Electrical and Computer Engineering Commons Recommended Citation Devalapalli, Siddhartha, "Development of SystemC Modules from HDL for System-on-Chip Applications. " Master's Thesis, University of Tennessee, 2004. https://trace.tennessee.edu/utk_gradthes/2119 This Thesis is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Masters Theses by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a thesis written by Siddhartha Devalapalli entitled "Development of SystemC Modules from HDL for System-on-Chip Applications." I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the equirr ements for the degree of Master of Science, with a major in Electrical Engineering. Dr. Donald W. Bouldin, Major Professor We have read this thesis and recommend its acceptance: Dr. Gregory D. Peterson, Dr. Chandra Tan Accepted for the Council: Carolyn R. Hodges Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) To the Graduate Council: I am submitting herewith a thesis written by Siddhartha Devalapalli entitled "Development of SystemC Modules from HDL for System-on-Chip Applications".
    [Show full text]
  • A Fedora Electronic Lab Presentation
    Chitlesh GOORAH Design & Verification Club Bristol 2010 FUDConBrussels 2007 - [email protected] [ Free Electronic Lab ] (formerly Fedora Electronic Lab) An opensource Design and Simulation platform for Micro-Electronics A one-stop linux distribution for hardware design Marketing means for opensource EDA developers (Networking) From SPEC, Model, Frontend Design, Backend, Development boards to embedded software. FUDConBrussels 2007 - [email protected] Electronic Designers Problems Approx. 6 month design development cycle Tackling Design Complexity Lower Power, Lower Cost and Smaller Space Semiconductor Industry's neck squeezed in 2008 Management (digital/analog) IP Portfolio FUDConBrussels 2007 - [email protected] FUDConBrussels 2007 - [email protected] A basic Design Flow FUDConBrussels 2007 - [email protected] TIP: Use verilator to lint your verilog files. Most of the Veripool tools are available under FEL. They are in sync with Wilson Snyder's releases. FUDConBrussels 2007 - [email protected] FUDConBrussels 2007 - [email protected] GTKWaveGTKWave Don'tDon't forgetforget itsits TCLTCL backendbackend WidelyWidely usedused togethertogether withwith SystemCSystemC FUDConBrussels 2007 - [email protected] Tools Standard Cell libraries FUDConBrussels 2007 - [email protected] BackendBackend designdesign Open Circuit Design, Electric FUDConBrussels 2007 - [email protected], Toped gEDA/gafgEDA/gaf Well known and famous. A very good example of opensource
    [Show full text]
  • Simulator for the RV32-Versat Architecture
    Simulator for the RV32-Versat Architecture João César Martins Moutoso Ratinho Thesis to obtain the Master of Science Degree in Electrical and Computer Engineering Supervisor(s): Prof. José João Henriques Teixeira de Sousa Examination Committee Chairperson: Prof. Francisco André Corrêa Alegria Supervisor: Prof. José João Henriques Teixeira de Sousa Member of the Committee: Prof. Marcelino Bicho dos Santos November 2019 ii Declaration I declare that this document is an original work of my own authorship and that it fulfills all the require- ments of the Code of Conduct and Good Practices of the Universidade de Lisboa. iii iv Acknowledgments I want to thank my supervisor, Professor Jose´ Teixeira de Sousa, for the opportunity to develop this work and for his guidance and support during that process. His help was fundamental to overcome the multiple obstacles that I faced during this work. I also want to acknowledge Professor Horacio´ Neto for providing a simple Convolutional Neural Net- work application, used as a basis for the application developed for the RV32-Versat architecture. A special acknowledgement goes to my friends, for their continuous support, and Valter,´ that is developing a multi-layer architecture for RV32-Versat. When everything seemed to be doomed he always had a miraculous solution. Finally, I want to express my sincere gratitude to my family for giving me all the support and encour- agement that I needed throughout my years of study and through the process of researching and writing this thesis. They are also part of this work. Thank you. v vi Resumo Esta tese apresenta um novo ambiente de simulac¸ao˜ para a arquitectura RV32-Versat baseado na ferramenta de simulac¸ao˜ Verilator.
    [Show full text]
  • ARM Cortex-A* Brian Eccles, Riley Larkins, Kevin Mee, Fred Silberberg, Alex Solomon, Mitchell Wills
    ARM Cortex-A* Brian Eccles, Riley Larkins, Kevin Mee, Fred Silberberg, Alex Solomon, Mitchell Wills The ARM Cortex­A product line has changed significantly since the introduction of the Cortex­A8 in 2005. ARM’s next major leap came with the Cortex­A9 which was the first design to incorporate multiple cores. The next advance was the development of the big.LITTLE architecture, which incorporates both high performance (A15) and high efficiency(A7) cores. Most recently the A57 and A53 have added 64­bit support to the product line. The ARM Cortex series cores are all made up of the main processing unit, a L1 instruction cache, a L1 data cache, an advanced SIMD core and a floating point core. Each processor then has an additional L2 cache shared between all cores (if there are multiple), debug support and an interface bus for communicating with the rest of the system. Multi­core processors (such as the A53 and A57) also include additional hardware to facilitate coherency between cores. The ARM Cortex­A57 is a 64­bit processor that supports 1 to 4 cores. The instruction pipeline in each core supports fetching up to three instructions per cycle to send down the pipeline. The instruction pipeline is made up of a 12 stage in order pipeline and a collection of parallel pipelines that range in size from 3 to 15 stages as seen below. The ARM Cortex­A53 is similar to the A57, but is designed to be more power efficient at the cost of processing power. The A57 in order pipeline is made up of 5 stages of instruction fetch and 7 stages of instruction decode and register renaming.
    [Show full text]
  • Alphaserver GS1280 Overview
    QuickSpecs HP AlphaServer GS1280 Systems Overview HP AlphaServer GS1280 Models 8 and 16 1. Cable/DSL HUB 2. PCI/PCI-X I/O Expansion Drawer (optional) 3. PCI/PCI-X I/O Master Drawer (mandatory option) 4. System Building Block Drawer (Model 8 Includes 1 drawer, Model 16 includes 2 drawers) 5. 48-volt DC Power shelves, 3 power supplies per shelf (Model 8 includes 1 shelf, Model 16 includes 2 shelves) 6. AC input controller(s) (mandatory option) DA - 11921 Worldwide — Version 24 — July 23, 2007 Page 1 QuickSpecs HP AlphaServer GS1280 Systems Overview HP AlphaServer GS1280 Model 32 1. Cable/DSL HUB 2. I/O Expansion Drawer (optional) 3. I/O Master Drawer (mandatory option) 4. System Building Block Drawer (Model 32 includes four drawers) 5. Power supply, dual AC input is standard 6. AC input controllers (two mandatory options for Model 32) DA - 11921 Worldwide — Version 24 — July 23, 2007 Page 2 QuickSpecs HP AlphaServer GS1280 Systems Overview HP AlphaServer GS1280 Model 64 1. Cable/DSL HUB 2. I/O Expansion Drawer (optional) 3. I/O Master Drawer (one drawer is optional) 4. System Building Block Drawer (Model 64 includes eight drawers) 5. Power supply, dual AC input is standard (two included for Model 64) 6. AC input controllers (two mandatory options for Model 64) DA - 11921 Worldwide — Version 24 — July 23, 2007 Page 3 QuickSpecs HP AlphaServer GS1280 Systems Overview At A Glance AlphaServer GS1280 Systems Up to 64 Alpha 21364 EV7 processors at 1300 MHz and 1150 MHz with advanced on-chip memory controllers and switch logic capable of providing
    [Show full text]
  • Out-Of-Order Execution & Register Renaming
    Asanovic/Devadas Spring 2002 6.823 Out-of-Order Execution & Register Renaming Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Asanovic/Devadas Spring 2002 6.823 Scoreboard for In-order Issue Busy[unit#] : a bit-vector to indicate unit’s availability. (unit = Int, Add, Mult, Div) These bits are hardwired to FU's. WP[reg#] : a bit-vector to record the registers for which writes are pending Issue checks the instruction (opcode dest src1 src2) against the scoreboard (Busy & WP) to dispatch FU available? not Busy[FU#] RAW? WP[src1] or WP[src2] WAR? cannot arise WAW? WP[dest] Asanovic/Devadas Spring 2002 Out-of-Order Dispatch 6.823 ALU Mem IF ID Issue WB Fadd Fmul • Issue stage buffer holds multiple instructions waiting to issue. • Decode adds next instruction to buffer if there is space and the instruction does not cause a WAR or WAW hazard. • Any instruction in buffer whose RAW hazards are satisfied can be dispatched (for now, at most one dispatch per cycle). On a write back (WB), new instructions may get enabled. Asanovic/Devadas Spring 2002 6.823 Out-of-Order Issue: an example latency 1 LD F2, 34(R2) 1 1 2 2 LD F4, 45(R3) long 3 MULTD F6, F4, F2 3 4 3 4 SUBD F8, F2, F2 1 5 5 DIVD F4, F2, F8 4 6 ADDD F10, F6, F4 1 6 In-order: 1 (2,1) . 2 3 4 4 3 5 . 5 6 6 Out-of-order: 1 (2,1) 4 4 . 2 3 . 3 5 .
    [Show full text]
  • Dynamic Register Renaming Through Virtual-Physical Registers
    Dynamic Register Renaming Through Virtual-Physical Registers † Teresa Monreal [email protected] Antonio González* [email protected] Mateo Valero* [email protected] †† José González [email protected] † Víctor Viñals [email protected] †Departamento de Informática e Ing. de Sistemas. Centro Politécnico Superior - Univ. de Zaragoza, Zaragoza, Spain. *Departament d’ Arquitectura de Computadors. Universitat Politècnica de Catalunya, Barcelona, Spain. ††Departamento de Ingeniería y Tecnología de Computadores. Universidad de Murcia, Murcia, Spain. Abstract Register file access time represents one of the critical delays of current microprocessors, and it is expected to become more critical as future processors increase the instruction window size and the issue width. This paper present a novel dynamic register renaming scheme that delays the allocation of physical registers until a late stage in the pipeline. We show that it can provide important savings in number of physical registers so it can significantly shorter the register file access time. Delaying the allocation of physical registers requires some artifact to keep track of dependences. This is achieved by introducing the concept of virtual-physical registers, which are tags that do not require any storage location. The proposed renaming scheme shortens the average number of cycles that each physical register is allocated, and allows for an early execution of instructions since they can obtain a physical register for its destination earlier than with the conventional scheme. Early execution is especially beneficial for branches and memory operations, since the former can be resolved earlier and the latter can prefetch their data in advance. 1. Introduction Dynamically-scheduled superscalar processors exploit instruction-level parallelism (ILP) by overlapping the execution of instructions in an instruction window.
    [Show full text]
  • CMSC 411: Computer Architecture
    CMSC 411: Computer Architecture Spring 2019 Jason Tang "1 About Your Friendly Instructor • Jason Tang (just call me Jason!)! • UMBC adjunct faculty member since 2012! • Taught CMSC 104, 202, 421, and 411! • Work full-time at a nearby mega-corporation as a software engineer "2 Contact Information • Email me at [email protected]! • O$ce in ITE 201C! • Tuesday / Thursday, 7:00 pm - 8:00 pm, right after class! • Teaching Assistant:! • TBA! • "3 Am I in the Right Class? • Prerequisites are:! • CMSC 313, or! • CMPE 212 + CMPE 310! • Must be able to read hexadecimal notation! • Should already by familiar with C/C++ and some assembly code! • This does not mean Java, Python, or other scripting language "4 Required Programming Knowledge • Know how (or research on Stack Overflow) to do these things:! • Read the very fantastic man pages! • Call a function and pass values in and out! • Di%erence between an in parameter, out parameter, and in/out parameter! • Know what a C++ reference technically is! • Understand basic boolean logic "5 Topics Covered • Instruction Sets! • Performance Measurements! • Machine Arithmetic! • Processor Design! • Memory Systems! • I/O Design! • Computer Buses "6 Course Information • http://www.csee.umbc.edu/~jtang/cs411.s19! • Grades will be posted on Blackboard! • Discussion forums are also on Blackboard! • All assignments submitted via submit system at linux.gl.umbc.edu! • Ensure you have a way to transfer files between your development machine and UMBC server (scp, PuTTy, Cyberduck, or equivalent)! • Using the clipboard to
    [Show full text]
  • FPGA-Accelerated Evaluation and Verification of RTL Designs
    FPGA-Accelerated Evaluation and Verification of RTL Designs Donggyu Kim Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2019-57 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-57.html May 17, 2019 Copyright © 2019, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. FPGA-Accelerated Evaluation and Verification of RTL Designs by Donggyu Kim A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Krste Asanovi´c,Chair Adjunct Assistant Professor Jonathan Bachrach Professor Rhonda Righter Spring 2019 FPGA-Accelerated Evaluation and Verification of RTL Designs Copyright c 2019 by Donggyu Kim 1 Abstract FPGA-Accelerated Evaluation and Verification of RTL Designs by Donggyu Kim Doctor of Philosophy in Computer Science University of California, Berkeley Professor Krste Asanovi´c,Chair This thesis presents fast and accurate RTL simulation methodologies for performance, power, and energy evaluation as well as verification and debugging using FPGAs in the hardware/software co-design flow. Cycle-level microarchitectural software simulation is the bottleneck of the hard- ware/software co-design cycle due to its slow speed and the difficulty of simulator validation.
    [Show full text]
  • Geekbench 3 License Keygen 26
    1 / 5 Geekbench 3 License Keygen 26 Geekbench 3 License Keygen 26 > http://fancli.com/19xvdd 1a8c34a149 Geekbench 3 is Primate Labs' cross-platform processor benchmark, .... Call of duty 4 Serial key Included For Pc Only free · [P3D] - Beech B200 ... full hd Don 2 movies free download 720p torrent ... geekbench 3 license keygen 26. 5960x cpu cache voltage, Intel Core i7-5960X - Benchmark, Geekbench 5, Cinebench R20, ... The Intel XScale processor supports a range of frequencies and voltages in order to allow the user to save power [3]. Instead ... Serial number to imei converter free ... Vcenter 7 keygen ... Jobsmart 26 gallon air compressor specs.. Geekbench 3 License Keygen 26 DOWNLOAD: http://cinurl.com/1ff2qd geekbench keygen, geekbench 4 keygen, geekbench 3 keygen 608fcfdb5b 3 Crack 61 .... ... work with our system. 3. Ready for kiosk ... II gaming platform. Novomatic(Gaminator) offline casino system ... geekbench 3 license keygen 26. Coverage setting aviation 4 audio hijack pro keygen. ... Download Crack Archicad 19 4006 Ita: Audio hijack pro 2.10.5 keygen ... Geekbench 3.. I told them that I wasn't expecting that and have adobe cs5 keygen mac 2020 reason to do so given that CS3 is serving me. ... To or adobe mindjet 26 adobe. ... These Geekbench 3 benchmarks are in bit mode and are for a single processor .... Crack Data Glitch 2 0 1. 3 Juin 2020 0. data glitch, data ... Rowbyte Data Glitch 2.0; Data Glitch 2 Serial Serial Numbers. Con . ... geekbench 3 license keygen 26 Increase benchmark scores (Antutu, Geekbench, Quadrant) Use App ... PlayerPro Music Player v4.2 APK Is Here [LATEST] Driver Magician 5.0 Keygen Is Here! ..
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]