Instruction Sets Should Be Free: the Case for RISC-V
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  A Superscalar Out-Of-Order X86 Soft Processor for FPGAA Superscalar Out-of-Order x86 Soft Processor for FPGA Henry Wong University of Toronto, Intel [email protected] June 5, 2019 Stanford University EE380 1 Hi! ● CPU architect, Intel Hillsboro ● Ph.D., University of Toronto ● Today: x86 OoO processor for FPGA (Ph.D. work) – Motivation – High-level design and results – Microarchitecture details and some circuits 2 FPGA: Field-Programmable Gate Array ● Is a digital circuit (logic gates and wires) ● Is field-programmable (at power-on, not in the fab) ● Pre-fab everything you’ll ever need – 20x area, 20x delay cost – Circuit building blocks are somewhat bigger than logic gates 6-LUT6-LUT 6-LUT6-LUT 3 6-LUT 6-LUT FPGA: Field-Programmable Gate Array ● Is a digital circuit (logic gates and wires) ● Is field-programmable (at power-on, not in the fab) ● Pre-fab everything you’ll ever need – 20x area, 20x delay cost – Circuit building blocks are somewhat bigger than logic gates 6-LUT 6-LUT 6-LUT 6-LUT 4 6-LUT 6-LUT FPGA Soft Processors ● FPGA systems often have software components – Often running on a soft processor ● Need more performance? – Parallel code and hardware accelerators need effort – Less effort if soft processors got faster 5 FPGA Soft Processors ● FPGA systems often have software components – Often running on a soft processor ● Need more performance? – Parallel code and hardware accelerators need effort – Less effort if soft processors got faster 6 FPGA Soft Processors ● FPGA systems often have software components – Often running on a soft processor ● Need more performance? – Parallel
- 
												  Robust Architectural Support for Transactional Memory in the Power ArchitectureRobust Architectural Support for Transactional Memory in the Power Architecture Harold W. Cain∗ Brad Frey Derek Williams IBM Research IBM STG IBM STG Yorktown Heights, NY, USA Austin, TX, USA Austin, TX, USA [email protected] [email protected] [email protected] Maged M. Michael Cathy May Hung Le IBM Research IBM Research (retired) IBM STG Yorktown Heights, NY, USA Yorktown Heights, NY, USA Austin, TX, USA [email protected] [email protected] [email protected] ABSTRACT in current p795 systems, with 8 TB of DRAM), as well as On the twentieth anniversary of the original publication [10], strengths in RAS that differentiate it in the market, adding following ten years of intense activity in the research lit- TM must not compromise any of these virtues. A robust erature, hardware support for transactional memory (TM) system is one that is sturdy in construction, a trait that has finally become a commercial reality, with HTM-enabled does not usually come to mind in respect to HTM systems. chips currently or soon-to-be available from many hardware We structured TM to work in harmony with features that vendors. In this paper we describe architectural support for support the architecture's scalability. Our goal has been to TM provide a comprehensive programming environment includ- TM added to a future version of the Power ISA . Two im- ing support for simple system calls and debug aids, while peratives drove the development: the desire to complement providing a robust (in the sense of "no surprises") execu- our weakly-consistent memory model with a more friendly tion environment with reasonably consistent performance interface to simplify the development and porting of multi- and without unexpected transaction failures.2 TM must be threaded applications, and the need for robustness beyond usable throughout the system stack: in hypervisors, oper- that of some early implementations.
- 
												  Implementation, Verification and Validation of an Openrisc-1200(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 10, No. 1, 2019 Implementation, Verification and Validation of an OpenRISC-1200 Soft-core Processor on FPGA Abdul Rafay Khatri Department of Electronic Engineering, QUEST, NawabShah, Pakistan Abstract—An embedded system is a dedicated computer system in which hardware and software are combined to per- form some specific tasks. Recent advancements in the Field Programmable Gate Array (FPGA) technology make it possible to implement the complete embedded system on a single FPGA chip. The fundamental component of an embedded system is a microprocessor. Soft-core processors are written in hardware description languages and functionally equivalent to an ordinary microprocessor. These soft-core processors are synthesized and implemented on the FPGA devices. In this paper, the OpenRISC 1200 processor is used, which is a 32-bit soft-core processor and Fig. 1. General block diagram of embedded systems. written in the Verilog HDL. Xilinx ISE tools perform synthesis, design implementation and configure/program the FPGA. For verification and debugging purpose, a software toolchain from (RISC) processor. This processor consists of all necessary GNU is configured and installed. The software is written in C components which are available in any other microproces- and Assembly languages. The communication between the host computer and FPGA board is carried out through the serial RS- sor. These components are connected through a bus called 232 port. Wishbone bus. In this work, the OR1200 processor is used to implement the system on a chip technology on a Virtex-5 Keywords—FPGA Design; HDLs; Hw-Sw Co-design; Open- FPGA board from Xilinx.
- 
												  Softcores for FPGA: the Free and Open Source AlternativesSoftcores for FPGA: The Free and Open Source Alternatives Jeremy Bennett and Simon Cook, Embecosm Abstract Open source soft cores have reached a degree of maturity. You can find them in products as diverse as a Samsung set-top box, NXP's ZigBee chips and NASA's TechEdSat. In this article we'll look at some of the more widely used: the OpenRISC 1000 from OpenCores, Gaisler's LEON family, Lattice Semiconductor's LM32 and Oracle's OpenSPARC, as well as more bleeding edge research designs such as BERI and CHERI from Cambridge University Computer Laboratory. We'll consider the technology, the business case, the engineering risks, and the licensing challenges of using such designs. What do we mean by “free” “Free” is an overloaded term in English. It can mean something you don't pay for, as in “this beer is free”. But it can also mean freedom, as in “you are free to share this article”. It is this latter sense that is central when we talk about free software or hardware designs. Of course such software or hardware designs may also be free, in the sense that you don't have to pay for them, but that is secondary. “Free” software in this sense has been around for a very long time. Explicitly since 1993 and Richard Stallman's GNU Manifesto, but implicitly since the first software was written and shared with colleagues and friends. That freedom is achieved by making the source code available, so others can modify the program, hence the more recent alternative name “open source”, but it is freedom that remains the key concept.
- 
												  Openpiton: an Open Source Manycore Research FrameworkOpenPiton: An Open Source Manycore Research Framework Jonathan Balkind Michael McKeown Yaosheng Fu Tri Nguyen Yanqi Zhou Alexey Lavrov Mohammad Shahrad Adi Fuchs Samuel Payne ∗ Xiaohua Liang Matthew Matl David Wentzlaff Princeton University fjbalkind,mmckeown,yfu,trin,yanqiz,alavrov,mshahrad,[email protected], [email protected], fxiaohua,mmatl,[email protected] Abstract chipset Industry is building larger, more complex, manycore proces- sors on the back of strong institutional knowledge, but aca- demic projects face difficulties in replicating that scale. To Tile alleviate these difficulties and to develop and share knowl- edge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which Chip support extensibility, scalability, and configurability, along- side an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework for building scalable architecture research proto- types from 1 core to 500 million cores. OpenPiton is the world’s first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton leverages the industry hardened OpenSPARC T1 core with modifica- Figure 1: OpenPiton Architecture. Multiple manycore chips tions and builds upon it with a scratch-built, scalable uncore are connected together with chipset logic and networks to creating a flexible, modern manycore design. In addition, build large scalable manycore systems. OpenPiton’s cache OpenPiton provides synthesis and backend scripts for ASIC coherence protocol extends off chip. and FPGA to enable other researchers to bring their designs to implementation. OpenPiton provides a complete verifica- tion infrastructure of over 8000 tests, is supported by mature software tools, runs full-stack multiuser Debian Linux, and has been widespread across the industry with manycore pro- is written in industry standard Verilog.
- 
												  Modelling the Armv8 Architecture, Operationally: Concurrency and ISAModelling the ARMv8 Architecture, Operationally: Concurrency and ISA Shaked Flur1 Kathryn E. Gray1 Christopher Pulte1 Susmit Sarkar2 Ali Sezgin1 Luc Maranget3 Will Deacon4 Peter Sewell1 1 University of Cambridge, UK 2 University of St Andrews, UK 3 INRIA, France 4 ARM Ltd., UK [email protected] [email protected] [email protected] [email protected] Abstract Keywords Relaxed Memory Models, semantics, ISA In this paper we develop semantics for key aspects of the ARMv8 multiprocessor architecture: the concurrency model and much of the 64-bit application-level instruction set (ISA). Our goal is to 1. Introduction clarify what the range of architecturally allowable behaviour is, and The ARM architecture is the specification of a wide range of pro- thereby to support future work on formal verification, analysis, and cessors: cores designed by ARM that are integrated into devices testing of concurrent ARM software and hardware. produced by many other vendors, and cores designed ab initio by Establishing such models with high confidence is intrinsically ARM architecture partners, such as Nvidia and Qualcomm. The difficult: it involves capturing the vendor’s architectural intent, as- architecture defines the properties on which software can rely on, pects of which (especially for concurrency) have not previously identifying an envelope of behaviour that all these processors are been precisely defined. We therefore first develop a concurrency supposed to conform to. It is thus a central interface in the industry, model with a microarchitectural flavour, abstracting from many between those hardware vendors and software developers. It is also hardware implementation concerns but still close to hardware- a desirable target for software verification and analysis: software designer intuition.
- 
												  Small Soft Core up Inventory ©2019 James Brakefield Opencore and Other Soft Core Processors Reverse-U16 A.Ttool pip _uP_all_soft opencores or style / data inst repor com LUTs blk F tool MIPS clks/ KIPS ven src #src fltg max max byte adr # start last secondary web status author FPGA top file chai e note worthy comments doc SOC date LUT? # inst # folder prmary link clone size size ter ents ALUT mults ram max ver /inst inst /LUT dor code files pt Hav'd dat inst adrs mod reg year revis link n len Small soft core uP Inventory ©2019 James Brakefield Opencore and other soft core processors reverse-u16 https://github.com/programmerby/ReVerSE-U16stable A.T. Z80 8 8 cylcone-4 James Brakefield11224 4 60 ## 14.7 0.33 4.0 X Y vhdl 29 zxpoly Y yes N N 64K 64K Y 2015 SOC project using T80, HDMI generatorretro Z80 based on T80 by Daniel Wallner copyblaze https://opencores.org/project,copyblazestable Abdallah ElIbrahimi picoBlaze 8 18 kintex-7-3 James Brakefieldmissing block622 ROM6 217 ## 14.7 0.33 2.0 57.5 IX vhdl 16 cp_copyblazeY asm N 256 2K Y 2011 2016 wishbone extras sap https://opencores.org/project,sapstable Ahmed Shahein accum 8 8 kintex-7-3 James Brakefieldno LUT RAM48 or block6 RAM 200 ## 14.7 0.10 4.0 104.2 X vhdl 15 mp_struct N 16 16 Y 5 2012 2017 https://shirishkoirala.blogspot.com/2017/01/sap-1simple-as-possible-1-computer.htmlSimple as Possible Computer from Malvinohttps://www.youtube.com/watch?v=prpyEFxZCMw & Brown "Digital computer electronics" blue https://opencores.org/project,bluestable Al Williams accum 16 16 spartan-3-5 James Brakefieldremoved clock1025 constraint4 63 ## 14.7 0.67 1.0 41.1 X verilog 16 topbox web N 4K 4K N 16 2 2009
- 
												  Design of the RISC-V Instruction Set ArchitectureDesign of the RISC-V Instruction Set Architecture Andrew Waterman Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2016-1 http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-1.html January 3, 2016 Copyright © 2016, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor David Patterson, Chair Professor Krste Asanovi´c Associate Professor Per-Olof Persson Spring 2016 Design of the RISC-V Instruction Set Architecture Copyright 2016 by Andrew Shell Waterman 1 Abstract Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman Doctor of Philosophy in Computer Science University of California, Berkeley Professor David Patterson, Chair The hardware-software interface, embodied in the instruction set architecture (ISA), is arguably the most important interface in a computer system. Yet, in contrast to nearly all other interfaces in a modern computer system, all commercially popular ISAs are proprietary.
- 
												  Computer Architectures an OverviewComputer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
- 
												  Evaluation of Synthesizable CPU CoresEvaluation of synthesizable CPU cores DANIEL MATTSSON MARCUS CHRISTENSSON Maste r ' s Thesis Com p u t e r Science an d Eng i n ee r i n g Pro g r a m CHALMERS UNIVERSITY OF TECHNOLOGY Depart men t of Computer Engineering Gothe n bu r g 20 0 4 All rights reserved. This publication is protected by law in accordance with “Lagen om Upphovsrätt, 1960:729”. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the authors. Daniel Mattsson and Marcus Christensson, Gothenburg 2004. Evaluation of synthesizable CPU cores Abstract The three synthesizable processors: LEON2 from Gaisler Research, MicroBlaze from Xilinx, and OpenRISC 1200 from OpenCores are evaluated and discussed. Performance in terms of benchmark results and area resource usage is measured. Different aspects like usability and configurability are also reviewed. Three configurations for each of the processors are defined and evaluated: the comparable configuration, the performance optimized configuration and the area optimized configuration. For each of the configurations three benchmarks are executed: the Dhrystone 2.1 benchmark, the Stanford benchmark suite and a typical control application run as a benchmark. A detailed analysis of the three processors and their development tools is presented. The three benchmarks are described and motivated. Conclusions and results in terms of benchmark results, performance per clock cycle and performance per area unit are discussed and presented. Sammanfattning De tre syntetiserbara processorerna: LEON2 från Gaisler Research, MicroBlaze från Xilinx och OpenRISC 1200 från OpenCores utvärderas och diskuteras.
- 
												  Small Soft Core up Inventory ©2014 James Brakefield Opencore and Other Soft Core Processors Only Cores in the "Usable" Category IncludedSmall soft core uP Inventory ©2014 James Brakefield Opencore and other soft core processors Only cores in the "usable" category included Most Prolific Authors ©2014 James Brakefield John Kent micro8a, micro16b, system05, system09, system11, system68 6 Daniel Wallner ax8, ppx16, t65, t80 4 Ulrich Riedel 68hc05, 68hc08, tiny64, tiny8 4 Jose Ruiz ion, light52, light8080 3 Lazaridis Dimitris mips_fault_tolerant, mipsr2000, mips_enhanced 3 Shawn Tan ae18, aeMB, k68 3 Most FPGA results (e.g. easy to compile, place & route on any FPGA family) ©2014 James Brakefield eco32 cyclone-4, arria-2, spartan-3, spartan-6, kintex-7 5 navre cyclone-4, arria-2, cyclone-5, spartan-6, kintex-7 5 leros cyclone-4, spartan-3, spartan-6, kintex-7 4 openmsp430 cyclone-4, stratix-3, spartan-6, virtex-6 4 Most Clones ©2014 James Brakefield ion, minimips, mips_fault_tolerant, misp32r1, misp789, mipsr2000, plasma, ucore, yacc, MIPS 10 m1_core 6502 ag_6502, cpu6502_true_cycle, free6502, lattice6502, m65c02, t65, t6507lp, m65 8 PIC16 free_risc8, lwrisc, minirisc, p16c5x, ppx16, recore54, risc16f84, risc5x 8 microblaze aeMB, mblite, microblaze, myblaze, openfire_core, secretblaze 6 6800 hd63701, system68, system05, 68hc05, 68hc08 5 8051 dalton_8051, light52, mc8051, t51, turbo8051 5 avr avr_core, avr_hp, avr8, navre, riscmcu 5 z80 nextz80, t80, tv80, wb_z80, y80e 5 openrisc altor32, minsoc, or1k, or1200_hp 4 6809 6809_6309, system09, mc6809e 3 8080 cpu8080, light8080, t80 3 68000 ao68000, tg68, v1_coldfire 3 PDP-8 pdp8, pdp8l, pdp8verilog 3 picoblaze copyblaze, pacoblaze,
- 
												  Outline What Makes a Good ISA? Programmability ImplementabilityOutline What Makes a Good ISA? • Instruction Sets in General • Programmability • MIPS Assembly Programming • Easy to express programs efficiently? • Other Instruction Sets • Implementability • Goals of ISA Design • Easy to design high-performance implementations (i.e., microarchitectures)? • RISC vs. CISC • Intel x86 (IA-32) • Compatibility • Easy to maintain programmability as languages and programs evolve? • Easy to maintain implementability as technology evolves? © 2012 Daniel J. Sorin 66 © 2012 Daniel J. Sorin 67 from Roth and Lebeck from Roth and Lebeck Programmability Implementability • Easy to express programs efficiently? • Every ISA can be implemented • For whom? • But not every ISA can be implemented well • Human • Bad ISA bad microarchitecture (slow, power-hungry, etc.) • Want high-level coarse-grain instructions • As similar to HLL as possible • We’d like to use some of these high-performance • This is the way ISAs were pre-1985 implementation techniques • Compilers were terrible, most code was hand-assembled • Pipelining, parallel execution, out-of-order execution • Compiler • We’ll discuss these later in the semester • Want low-level fine-grain instructions • Compiler can’t tell if two high-level idioms match exactly or not • Certain ISA features make these difficult • This is the way most post-1985 ISAs are • Variable length instructions • Optimizing compilers generate much better code than humans • Implicit state (e.g., condition codes) • ICQ: Why are compilers better than humans? • Wide variety of instruction formats © 2012 Daniel J. Sorin 68 © 2012 Daniel J. Sorin 69 from Roth and Lebeck from Roth and Lebeck Compatibility Compatibility in the Age of VMs • Few people buy new hardware … if it means they have to • Virtual machine (VM) : piece of software that emulates buy new software, too behavior of hardware platform • Intel was the first company to realize this • Examples: VMWare, Xen, Simics • ISA must stay stable, no matter what (microarch.