Rejuvenating Research (and the whole semiconductor industry) with Open-Source Hardware

Krste Asanovic
UC Berkeley, RISC-V Foundation, & SiFive Inc.

Keynote, MICRO-52, Columbus, Ohio, October 14, 2019

What did Moore predict in his famous Law?
At lowest cost/component, number of components/die doubles every 12 months.

Moore’s Law: Worse than dead, now irrelevant!

. Very few potential products in advanced nodes are constrained by marginal cost/transistor

CMOS Today
. Wonderful, almost magical, technology
. Fabricate billions of transistors
. Reliably connect them with billions of wires
. Clock at a few GHz
. Dissipate <100W
. Near 100% yield, cost a few $/die
. Scaling continuing, if economically challenged
. Manufacturing is least of our problems

Custom Chip Costs

. NRE: Non-Recurring Engineering costs
  - Design + tooling for production
. Marginal manufacturing cost/chip
  - Cost of each chip made once in production
  - Silicon manufacture plus package & test

Overall Cost/Project (grossly simplified)
[Figure: total cost versus total #transistors shipped. The intercept is set by NRE; the expected lifetime cost is read off at the expected lifetime shipments.]

Chip Development Cost [Source: IBS]
At $0.5B cost, not many $B markets in which to make a profit
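The Overall Cost/Project picture (an intercept set by NRE plus a slope set by marginal manufacturing cost) can be written as a small linear model. The sketch below is illustrative only: the function names, the $10 marginal cost, and the $30 selling price are assumptions introduced for the example; only the $0.5B development cost comes from the slide.

```python
def total_cost(nre, marginal_cost_per_chip, chips_shipped):
    """Total project cost: fixed NRE intercept plus a per-chip slope."""
    return nre + marginal_cost_per_chip * chips_shipped


def break_even_volume(nre, marginal_cost_per_chip, price_per_chip):
    """Number of chips that must ship before revenue covers total cost.

    Returns None when the price does not exceed the marginal cost,
    because then no volume ever recovers the NRE.
    """
    margin = price_per_chip - marginal_cost_per_chip
    if margin <= 0:
        return None
    return nre / margin


# Hypothetical example: $0.5B NRE (the slide's figure), with an ASSUMED
# $10 marginal cost and an ASSUMED $30 selling price per chip.
nre = 0.5e9
volume = break_even_volume(nre, marginal_cost_per_chip=10.0, price_per_chip=30.0)
print(f"break-even volume: {volume:,.0f} chips")
```

Under these assumed numbers the project needs tens of millions of units just to recover NRE, which is the point of the slide: at high NRE, only very large markets can justify a leading-node chip.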

Caveat: figure assumes all IP is developed from scratch, and only leading projects do this, but a >$100M return is still needed to justify <28nm tech

Overall Cost/Project
[Figure: total cost versus total #transistors, with one line for an older node and one for newer nodes; labels: older node, newer nodes, <28nm.]
End of Moore’s Law, Flat Design Productivity: NRE doubling each generation, but slope not changing, or increasing, each generation!

“Moore’s Law” Business Model
. Design standard part for the “killer socket”
. Killer socket was PC, now smartphone, next is ???
. Sell 100s of millions of parts
. Ideally, ~O(1) product, ~O(∞) volume
. At infinite volume, profit set by marginal price/cost

What Moore’s Law is about

New Vertical Semiconductor Business Model
Instead of standard chip products, chip customers want their own differentiated chip designs:
. Apple, , for phones
. Google, Amazon, Microsoft, Alibaba for client/cloud
. Tesla for cars
. End-system value justifies chip NRE
But what about non-huge firms & startups, with great ideas but can’t afford NRE?

Semiconductor Industry Perfect Storm
“Cold air”: Custom chip design too costly, slow, risky
“Hot air”: New applications need capabilities from custom chips: Cloud AI, Edge Learning, Wearables, IoT, Autonomous Vehicles, …

How did turn into a $1B acquisition with only 13 employees?*
* https://www.sigarch.org/open-source-hardware-stone-soups-and-not-stone-statues-please
COPYRIGHT 2018 SIFIVE. ALL RIGHTS RESERVED.

Tech Stack

Open-Source Technology

from techstacks.io

Infrastructure

Talk Outline
. History of open-source hardware
. The RISC-V story
. Reinvigorating the hardware industry
. How architecture research can benefit and help

Early Days in Open-Source Hardware
. SPICE released by Berkeley in 1973
. Mead & Conway VLSI revolution, late 70s, simplified design rules, spawned MOSIS fabrication service, and a generation of open-source tools
  - Magic/espresso/MIS/SIS/Lager (Berkeley), irsim (Stanford), chipmunk (Caltech)
. Early 80s, university RISC projects (Stanford, Berkeley) used and helped build open-source CAD tools
. Together with and BSD , helped create the RISC industry (Sun, MIPS, SGI, …) and the ECAD industry (Cadence, Mentor, Synopsys)

Ring Array Processor (ICSI, 1989)
(Nelson Morgan, Jim Beck, Phil Kohn, Jeff Bilmes)

. RAP Machine built for fast training of “big dumb” neural networks for speech recognition
. Ring of TMS320C30 floating-point DSPs
  - Each DSP providing 32MFLOPS (32-bit FP)
  - Four DSPs/board, up to 10 boards connected at once (>1GFLOP/s peak, 640MB DRAM)
  - Neural net training rate of >100MCUPS (million connection updates per second) on 10 boards
  - FPGA ring connection used for systolic all-all communication during training/inference
. Fast, flexible, but expensive
  - ~$100,000 each

Realization Group, ICSI, 1990
New naïve grad student joins Morgan’s group to build custom ANN VLSI for speech training

This is a cool ANN architecture for which we need custom silicon!

Unary-encoded inputs to avoid multiply, 12-bit weights
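One plausible reading of "unary-encoded inputs to avoid multiply" is that each input value is spread over several one-bit lines, so every product w*x is either w or 0 and the weighted sum reduces to select-and-add. The sketch below illustrates that reading only; the function names, the thermometer-style encoding, and the signed 12-bit weight range are assumptions, not details confirmed by the slides.

```python
def unary_encode(value, levels):
    """Thermometer (unary) code: the first `value` of `levels` lines are 1."""
    if not 0 <= value <= levels:
        raise ValueError("value out of range")
    return [1] * value + [0] * (levels - value)


def weighted_sum_no_multiply(lines, weights):
    """Accumulate the weights of active lines; no multiplier is needed."""
    acc = 0
    for bit, w in zip(lines, weights):
        # ASSUMPTION: weights are signed 12-bit integers.
        if not -2048 <= w <= 2047:
            raise ValueError("weight does not fit in 12 bits")
        if bit:
            acc += w  # the product w*bit is w when bit == 1, else 0
    return acc


lines = unary_encode(3, levels=5)       # three active lines out of five
weights = [100, -20, 7, 500, -2048]     # hypothetical weights
print(weighted_sum_no_multiply(lines, weights))
```

The result equals the ordinary dot product of the bit vector with the weights, but the datapath only needs an adder and a one-bit select per input.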

HiPNeT-1 (Highly Pipelined Network Trainer, 1990)
Krste Asanovic, Brian Kingsbury, Nelson Morgan, John Wawrzynek

. Custom architecture for neural algorithm
. Ignores pipeline RAW hazards (net trains around them)
. Predicted 200MCUPS in 16mm² of 2µm CMOS running at 20MHz

The first few chips…
. Used Magic, Irsim, plus other open-source tools
. MOSIS had a “TinyChip” program
  - $500 to fab a 2.2mm x 2.2mm chip in 2µm CMOS
[Figure: chip photos labeled Sigmoid unit (Brian), Multiplier (Pawan Sinha), JTAG latches (Krste), Regfile (Bertrand), 24b Adder (Krste), 8-bit datapath (Brian).]

Meanwhile, back at the speech ranch…

There’s this even cooler ANN architecture for which we need custom silicon!

And it doesn’t look much like the last one. Can you build a different chip?

Time for a programmable architecture…

Torrent-0 (T0): A Vector Microprocessor
. Vector supercomputers (like Crays) are very successful in scientific computing and have a clean programming model

T0 idea: Add a vector coprocessor to a standard RISC scalar processor, all on one chip
  - Primary motivation was software support effort

System Design Choices
. Which standard RISC?
  - Considered SPARC, HP PA, PowerPC, and Alpha
  - Chose MIPS because: simplest, good software tools, Unix desktop for development, and a 64-bit extension path
. Buy or build a MIPS core?
  - Commercial MIPS R3000 chips had coprocessor interface
  - No MIPS soft cores (no Verilog or synthesis tools yet)
  - MIPS asked for $2M for architectural licence (1992 dollars!)
  - Decided to roll our own MIPS-compatible core
    • vector coprocessor would have played havoc with caches
    • coprocessor interface too inefficient
    • commercial chip plus glue logic would blow our size and power budgets (to fit inside workstation)
    • couldn’t simulate whole system in our environment

T0 Block Diagram
. 32-bit datapaths
. 16b*16b multiplies
. up to 96x32b ops/cycle
. 40MHz, 1.0µm CMOS
. 16.7 x 16.7 mm²
[Figure: T0 block diagram. Blocks: MIPS-II CPU, 1 KB I-Cache, Vector Registers, Vector Memory Pipeline (VMP), vector arithmetic pipelines VP0 and VP1 (stage labels: Conditional Move, Clip, Shift Right, Add, Multiply, Shift Left, Logic), TSIP, Scan Chains. Buses: Address Bus, Scalar Bus, Data Bus.]

T0-based SPERT-II Accelerator Board (1995)

[Figure: SPERT-II board; labels: (40 MHz), (100 MHz), (60 MHz).]

. 35 boards shipped to 9 international sites
. Used as production research platform for nine years
  - last time powered up for work in 2004!

Microprocessor ISAs in 1990s
. From mid-1980s, Cambrian explosion of RISC ISAs
  - SPARC, MIPS, ARM, Precision, Clipper, AMD 29000, i860, IBM POWER, DEC Alpha, …
  - “Brainiac” versus “Speed Demon” design style wars
  - ACE/ARC initiative, Windows NT on RISC desktops
. Extinction event: Intel
  - Run clunky CISC ISA fast by breaking into micro-ops and using dynamic scheduling of micro-ops
  - Use advanced manufacturing made possible by revenue from PC
  - Both a Brainiac and a Speed Demon, but compatible with whole software ecosystem
  - RISC ISAs disappeared from workstations, and ultimately from servers
  - Down in the undergrowth, though, small creatures survived…

Mobile Computing Takes Off
. For Newton project, Apple selects Acorn RISC Machine, and creates new company “Advanced RISC Machines” in 1990
. In 90s, ARM develops new business model licensing soft cores, not selling chips
. ARM becomes dominant in mobile & embedded computing

Closing up of hardware design
. Late 90s transition from 0.25µm to 0.18µm is when open-source CAD tools stopped being competitive
  - Scalable CMOS design rules left too much on table
  - Foundry rules were under NDA
  - Magic had tough time scaling to 6 metal layers
  - Complex physical effects needed better tools (logic synthesis from Verilog, 3D extraction, power analysis, …)

Millennial Dark Ages Descend

Only two dominant application ISAs (x86, ARM)

Only big companies design silicon, rampant consolidation in semiconductor industry

Little open-source hardware or tools activity

Why Instruction Set Architecture matters
. Why can’t Intel sell mobile chips?
  - 99%+ of mobile phones/tablets based on ARM v7/v8 ISA
. Why can’t ARM partners sell servers?
  - 99%+ of laptops/desktops/servers based on AMD64 ISA (95%+ built by Intel)
. How can IBM still sell mainframes?
  - IBM 360, oldest surviving ISA (50+ years)

ISA is most important interface in computer system where software meets hardware

Open Interfaces Work for Software!

Field        Open Standard      Free, Open Implement.   Proprietary Implement.
Networking   Ethernet, TCP/IP   Many                    Many
OS           Posix              Linux, FreeBSD          M/S Windows
Compilers    C                  gcc, LLVM               Intel icc, ARMcc
Databases    SQL                MySQL, PostgreSQL       Oracle 12C, IBM DB2
Graphics     OpenGL             Mesa3D                  M/S DirectX
ISA          ??????             ------                  x86, ARM, IBM360

. Why not successful free & open standards and free & open implementations, like other fields?

Companies and their ISAs Come and Go
Proprietary ISA fortunes tied to business fortunes and whims
. Digital Equipment Corporation
  - PDP-11, VAX, Alpha
. Intel
  - i960, i860,
. MIPS
  - Sold to Imagination, then bought by Wave AI startup, now opening R6?
. SPARC
  - Was opened by Sun, acquired by Oracle, now closed down
. ARM
  - Sold to Softbank at >40% premium
  - Now 25% sold off to Abu Dhabi investment fund

Today, many ISAs on one SoC

▪ Applications processor (usually ARM)
▪ Graphics processors
▪ Image processors
▪ Radio DSPs
▪ Audio DSPs
▪ Security processors
▪ Power-management processor
▪ > dozen ISAs on some SoCs, each with unique software stack
Why?
▪ Apps processor ISA too big, inflexible for accelerators
▪ IP bought from different places, each proprietary ISA
▪ Engineers build home-grown ISA cores

Do we need all these different ISAs?
Must they be proprietary?
Must they keep disappearing?

What if there was one stable free and open ISA everyone could use for everything?

RISC-V Background

. In 2010, after many years and many research projects using MIPS, SPARC, and x86, time for architecture group at UC Berkeley to choose ISA for next set of projects
. Obvious choices: x86 and ARM
  - x86 impossible: too complex, IP issues
  - ARM mostly impossible: complex, no 64-bit in 2010, IP issues
. So we started “3-month project” during summer 2010 to develop clean-slate ISA
  - Principal designers: Andrew Waterman, Yunsup Lee, David Patterson, Krste Asanovic
. Four years later, May 2014, released frozen base user spec
  - many tapeouts and several research publications along the way
. Name RISC-V (pronounced “risk-five”) represents fifth major Berkeley RISC ISA
[Figure: chip photos of RISC-I (1981), RISC-II (1983), SOAR aka RISC-III (1984), SPUR aka RISC-IV (1988), and the first RISC-V (Raven-1, 28nm FDSOI, 2011).]

What’s Different about RISC-V?
. Simple
  - Far smaller than other commercial ISAs
. Clean-slate design
  - Clear separation between user and privileged ISA
  - Avoids µarchitecture or technology-dependent features
. Modular ISA designed for extensibility/specialization
  - Small standard base ISA, with multiple standard extensions
  - Sparse & variable-length instruction encoding for vast opcode space
. Stable
  - Base and first standard extensions are frozen
  - Additions via optional extensions, not new versions
. Community designed
  - Developed with leading industry/academic experts and software developers

Open-Source RISC-V Rocket Chip Generator

1. Change Parameters
2. Develop New Accelerators
3. Develop Own RISC-V Core
4. Develop Own Device

RISC-V SoCs Designed in Berkeley
[Figure: tapeout timeline spanning 2012 to 2018. Chips: Raven-1, Raven-2, Raven-3, Raven-4, Hurricane-1, Hurricane-2, CraftP1, BROOM, EOS14, EOS16, EOS18, EOS20, EOS22, EOS24, CRAFT-0, Craft-FFT2, SWERVE, Eagle.]
In IBM 45nm, ST 28nm FDSOI, TSMC 28nm and 16nm FF, GF 14nm

Chisel: Constructing Hardware in a Scala Embedded Language
. FIRRTL (Flexible Intermediate Representation for RTL)
. Can support different front-end languages, Verilog in progress
[Figure: layered stack. Layers: projects (BOOM, Rocket, Hwacha), libraries (rocketchip, utils), language (Chisel Frontend, FIRRTL), platforms.]

Hot Chips 2014

RISC-V Foundation (2015- )

The RISC-V Foundation is a non-profit entity serving members and the industry

Our mission is to accelerate RISC-V adoption with shared benefit to the entire community of stakeholders.
  - Drive progression of ratified specs, compliance suite, and other technical deliverables
  - Grow the overall ecosystem / membership, promoting diversity while preventing fragmentation
  - Deepen community engagement and visibility

• RISC-V is the open-source hardware Instruction Set Architecture (ISA)
• Frozen base user spec released in 2014, contributed, ratified, and openly published by the RISC-V Foundation

RISC-V Foundation
More than 250 RISC-V Members in 28 Countries Around the World

RISC-V Foundation Growth History, September 2015 to May 2019
[Figure: membership count per quarter, Q3 2015 to Q2 2019, on a scale of 0 to 300.]
Member breakdown:
  - 13 Universities
  - 29 Consulting; Research
  - 23 Development Tools; SW and Cloud
  - 104 Individual RISC-V developers and advocates
  - 51 Machine Learning/AI; Commercial Chip Vendors; FPGA; Broad Market; Networking; Application Processors, Graphics
  - 45 Semiconductor IP; IP and Design Services; Foundry Services

RISC-V Ecosystem

Software
. Open-source software: Gcc, binutils, glibc, Linux, BSD, LLVM, QEMU, FreeRTOS, ZephyrOS, LiteOS, SylixOS, …
. Commercial software: Lauterbach, Segger, IAR, Micrium, ExpressLogic, Ashling, Imperas, AntMicro, …

Foundation
. ISA specification
. Golden Model
. Compliance

Hardware
. Open-source cores: Rocket, BOOM, RI5CY, Ariane, PicoRV32, Piccolo, SCR1, Swerv, Hummingbird, …
. Commercial core providers: Andes, Bluespec, Cloudbear, Codasip, Cortus, C-Sky, Nuclei, SiFive, Syntacore, …
. Inhouse cores: Nvidia, WDC, Alibaba, +others

Debian Port Progress

RISC-V and Security
Security is one of biggest challenges in contemporary computer architecture, so which to trust?
. Simple free ISA with open implementations and publicly scrutinized security systems
. Baroque proprietary ISAs with complex unauditable implementations of NDA-only security systems
RISC-V already the center of security architecture research
. Small set of hardware primitives supports everything from embedded security to remote cloud enclaves

Modest RISC-V Project Goal
Become the industry-standard ISA for all computing devices

Industry Adoption Status
. Large companies adopting RISC-V
  - 2016 NVIDIA announced all future GPUs will use RISC-V
  - 2017 Western Digital announced transition of all billion cores/year to RISC-V
  - 2019 Gigadevice announces RISC-V microcontrollers
  - 2019 Alibaba announces 16-core, 3-issue OoO, 2.5GHz RISC-V chip in 12nm
  - Others waiting in the wings

Government Adoption

. India has adopted RISC-V
. US DARPA mandated RISC-V in recent security call for proposals
. Israel Innovation Authority creating GenPro incubator around RISC-V
. Municipal Govt supporting RISC-V companies
. Municipal Govt supporting RISC-V RIOS lab (Patterson)
. Other governments at various stages of investigation

If your country wishes to control security of its own information infrastructure, and promote its indigenous semiconductor industry, support RISC-V

RISC-V: An Everyday Design Choice
. For embedded/IoT, RISC-V is already a strong competitor, and other areas are adopting RISC-V too
. Production ramp starting, expect “millions” of SoCs to ship with RISC-V cores in 2019
. SiFive announced >100 RISC-V IP design wins
. Andes announced 21 wins in 2018, 60 in 2019

. Message: You won’t get fired for choosing RISC-V!

Why is RISC-V so popular?
• Engineers sometimes “don’t see forest for the trees”
  - The movement is not happening because some benchmark ran 10% faster, or some implementation was 30% lower power
• The movement is happening because new business model changes everything
  - Pick ISA first, then pick vendor or build own core
  - Add your own extension without getting permission
  - Open model, no export control
• Implementation features/PPA will follow
  - Whatever is broken/missing in RISC-V will get fixed

RISC-V in Education

Books available now! In multiple languages

RISC-V spreading quickly throughout curricula of top schools

RISC-V: Completing the Innovation Cycle

Research

Open ecosystem is key to keeping the virtuous cycle going

Industry Education

New Age of Open-Source Hardware
. On back of RISC-V, new age of open-source hardware initiatives, significant industry support
  - Chips Alliance (Alibaba, AntMicro, Codasip, Esperanto, Google, Imperas, SiFive, Western Digital, …)
  - FOSSi
  - OpenHW group (Alibaba, Bluespec, Embecosm, ETH Zurich, Greenwaves, Huawei, Imperas, Metrics, Mythic, NXP, OneSpin, Silicon Labs, Thales)
  - China Open Instruction Ecosystem (ICT-CAS, Tsinghua Univ., Peking Univ., Baidu, Alibaba, SMIC and )

Open-Source Tools
. Surge of interest, mostly centered on FPGA development
  - Yosys, LiteDRAM, FuseSoC, …
. Also, work on silicon tools/frameworks
  - Berkeley Chisel/FIRRTL Chipyard, Princeton OpenPiton, Cornell PyMTL, UW Basejump, …
. New DARPA funding: POSH, IDEA, RTML programs
  - Helping fund open-source CAD and IP
. Fully open flow to silicon there but only 180nm
  - OnChip, eFabless
. Long way before competitive 7nm open-source flow

Call to Action
. Industry interface groups need to open specifications
  - Open spec means I can download PDF without registering or paying anything
. Community help document public-domain architecture ideas
  - Hardware more susceptible to patent suits than software
  - Help PTO reject bad patents, help protect community against trolls (big and small)
  - Would be great for teaching/learning!
. Other open architectures and tools needed
  - Open GPU, open FPGA, open PCI controller, USB controller, etc.

Summary
. Design cost is biggest challenge in hardware today
  - Software is big part of hardware design cost, which is why standard open ISA so critical
. Open-source IP / tools reduce cost, help everyone improve
. Open-source is not just cost, it enables innovation
. Open-source does not mean non-commercial
  - Crucial to have industry support and contributions
. Open-source aids frictionless industry-academia collaboration
  - Academia can see real problems, industry can quickly absorb solutions
. Start the next wave of computing innovation!

Thanks!

. This research was funded in part by DoE Award DE-SC0003624, by Microsoft (Award #024263) and Intel (Award #024894) funding, by matching funding from U.C. Discovery (Award #DIG07-10227), and also by DARPA PERFECT (Award HR0011-12-2-0016), DARPA POEM (Award HR0011-11-C-0100), DARPA CRAFT (HR0011-16-C-0052), Intel iSTC, ASPIRE Affiliates, ADEPT affiliates, BWRC members, TSMC, and ST Microelectronics.