Rejuvenating Computer Architecture Research (And the Whole Semiconductor Industry) with Open-Source Hardware

Krste Asanovic
UC Berkeley, RISC-V Foundation, & SiFive Inc.
[email protected]
Keynote, MICRO-52, Columbus, Ohio
October 14, 2019

What did Moore predict in his famous Law?
At lowest cost/component, number of components/die doubles every 12 months.

Moore’s Law: Worse than dead, now irrelevant!
▪ Very few potential products in advanced nodes are constrained by marginal cost/transistor

CMOS Today
▪ Wonderful, almost magical, technology
▪ Fabricate billions of transistors
▪ Reliably connect them with billions of wires
▪ Clock at a few GHz
▪ Dissipate <100W
▪ Near 100% yield, cost a few $/die
▪ Scaling continuing, if economically challenged
▪ Manufacturing is least of our problems

Custom Chip Costs
▪ NRE: Non-Recurring Engineering costs
  – Design + tooling for production
▪ Marginal manufacturing cost/chip
  – Cost of each chip made once in production
  – Silicon manufacture plus package & test

Overall Cost/Project (grossly simplified)
[Figure: total cost versus total #transistors shipped. The intercept is set by NRE; expected lifetime shipments determine expected lifetime cost.]

Chip Development Cost [Source: IBS]
▪ At $0.5B cost, not many $B markets in which to make a profit
▪ Caveat: figure assumes all IP developed from scratch and only leading projects do this, but do need >$100M return to justify <28nm tech

Overall Cost/Project <28nm
[Figure: total cost versus total #transistors, comparing an older node with newer nodes.]
End of Moore’s Law, Flat Design Productivity: NRE doubling each generation, but slope not changing, or increasing, each generation!

“Moore’s Law” Business Model
▪ Design standard part for the “killer socket”
▪ Killer socket was PC, now smartphone, next is ???
▪ Sell 100s of millions of parts
▪ Ideally, ~O(1) product, ~O(∞) volume
▪ At infinite volume, profit set by marginal price/cost (what Moore’s Law is about)

New Vertical Semiconductor Business Model
Instead of standard chip products, chip customers want their own differentiated chip designs:
▪ Apple, Samsung, Huawei for phones
▪ Google, Amazon, Microsoft, Alibaba for client/cloud
▪ Tesla for cars
▪ End-system value justifies chip NRE
But what about non-huge firms & startups, with great ideas but can’t afford NRE?

Semiconductor Industry Perfect Storm
▪ “Cold air”: Custom chip design too costly, slow, risky
▪ “Hot air”: New applications need capabilities from custom chips: Cloud AI, Edge Learning, Wearables, IoT, Autonomous Vehicles, …

How did [company logo not preserved] turn into a $1B acquisition with only 13 employees?*
* https://www.sigarch.org/open-source-hardware-stone-soups-and-not-stone-statues-please
COPYRIGHT 2018 SIFIVE. ALL RIGHTS RESERVED.

Tech Stack
[Figure: tech stack from techstacks.io, labeled “Open-Source Technology” and “Infrastructure”.]

Talk Outline
▪ History of open-source hardware
▪ The RISC-V story
▪ Reinvigorating the hardware industry
▪ How architecture research can benefit and help

Early Days in Open-Source Hardware
▪ SPICE released by Berkeley in 1973
▪ Mead & Conway VLSI revolution, late 70s, simplified design rules, spawned MOSIS fabrication service, and a generation of open-source tools
  – Magic/espresso/MIS/SIS/Lager (Berkeley), irsim (Stanford), chipmunk (Caltech)
▪ Early 80s, university RISC projects (Stanford, Berkeley) used and helped build open-source CAD tools
▪ Together with C and BSD Unix, helped create the RISC workstation industry (Sun, MIPS, SGI, …) and the ECAD industry (Cadence, Mentor, Synopsys)

Ring Array Processor (ICSI, 1989)
(Nelson Morgan, Jim Beck, Phil Kohn, Jeff Bilmes)
▪ RAP Machine built for fast training of “big dumb” neural networks for speech recognition
▪ Ring of TMS320C30 floating-point DSPs
  – Each DSP providing 32MFLOPS (32-bit FP)
  – Four DSPs/board, up to 10 boards connected at once (>1GFLOP/s peak, 640MB DRAM)
  – Neural net training rate of >100MCUPS (million connection updates per second) on 10 boards
  – FPGA ring connection used for systolic all-all communication during training/inference
▪ Fast, flexible, but expensive
  – ~$100,000 each

Realization Group, ICSI, 1990
New naïve grad student joins Morgan’s group to build custom ANN VLSI for speech training
“This is a cool ANN architecture for which we need custom silicon!”
Unary-encoded inputs to avoid multiply, 12-bit weights

HiPNeT-1 (Highly Pipelined Network Trainer, 1990)
Krste Asanovic, Brian Kingsbury, Nelson Morgan, John Wawrzynek
▪ Custom architecture for neural algorithm
▪ Ignores pipeline RAW hazards (net trains around them)
▪ Predicted 200MCUPS in 16 mm² of 2µm CMOS running at 20MHz

The first few chips…
▪ Used Magic, Irsim, plus other open-source tools
▪ MOSIS had a “TinyChip” program
  – $500 to fab a 2.2 mm × 2.2 mm chip in 2µm CMOS
[Figure: test-chip photos labeled Sigmoid unit (Brian), Multiplier (Pawan Sinha), JTAG latches (Krste), Regfile (Bertrand), 24b Adder (Krste), 8-bit datapath (Brian).]

Meanwhile, back at the speech ranch…
“There’s this even cooler ANN architecture for which we need custom silicon!”
“And it doesn’t look much like the last one. Can you build a different chip?”
Time for a programmable architecture…

Torrent-0 (T0): A Vector Microprocessor
▪ Vector supercomputers (like Crays) are very successful in scientific computing and have a clean programming model
▪ T0 idea: Add a vector coprocessor to a standard RISC scalar processor, all on one chip
  – Primary motivation was software support effort

System Design Choices
▪ Which standard RISC?
  – Considered SPARC, HP PA, PowerPC, and Alpha
  – Chose MIPS because: simplest, good software tools, Unix desktop workstations for development, and a 64-bit extension path
▪ Buy or build a MIPS core?
  – Commercial MIPS R3000 chips had coprocessor interface
  – No MIPS soft cores (no Verilog or synthesis tools yet)
  – MIPS asked for $2M for architectural licence (1992 dollars!)
  – Decided to roll our own MIPS-compatible core
    • vector coprocessor would have played havoc with caches
    • coprocessor interface too inefficient
    • commercial chip plus glue logic would blow our size and power budgets (to fit inside workstation)
    • couldn’t simulate whole system in our environment

T0 Block Diagram
▪ 32-bit datapaths
▪ 16b*16b multiplies
▪ up to 96x32b ops/cycle
▪ 40MHz, 1.0µm CMOS
▪ 16.7 × 16.7 mm²
[Figure: T0 block diagram. Labels: MIPS-II CPU, 1 KB I-Cache, Vector Registers, Vector Memory Pipeline (VMP), VP0, VP1, Conditional Move, Clip, Shift Right, Add, Multiply, Shift Left, Logic, Scan Chains, TSIP, Address Bus, Scalar Bus, Data Bus.]

T0-based SPERT-II Accelerator Board (1995)
[Figure: board photo with parts labeled 40 MHz, 100 MHz, and 60 MHz.]
▪ 35 boards shipped to 9 international sites
▪ Used as production research platform for nine years
  – last time powered up for work in 2004!

Microprocessor ISAs in 1990s
▪ From mid-1980s, Cambrian explosion of RISC ISAs
  – SPARC, MIPS, ARM, Precision, Clipper, AMD 29000, Intel i860, IBM POWER, DEC Alpha, …
  – “Brainiac” versus “Speed Demon” design style wars
  – ACE/ARC initiative, Windows NT on RISC desktops
▪ Extinction event: Intel Pentium P6
  – Run clunky CISC ISA fast by breaking into micro-ops and using dynamic scheduling of micro-ops
  – Use advanced manufacturing made possible by revenue from PC
  – Both a Brainiac and a Speed Demon, but compatible with whole x86 software ecosystem
  – RISC ISAs disappeared from workstations, and ultimately from servers
  – Down in the undergrowth, though, small creatures survived…

Mobile Computing Takes Off
▪ For Newton project, Apple selects Acorn RISC Machine, and creates new company “Advanced RISC Machines” in 1990
▪ In 90s, ARM develops new business model licensing soft cores, not selling chips
▪ ARM becomes dominant in mobile & embedded computing

Closing up of hardware design
▪ Late 90s transition from 0.25µm to 0.18µm is when open-source CAD tools stopped being competitive
  – Scalable CMOS design rules left too much on table
  – Foundry rules were under NDA
  – Magic had tough time scaling to 6 metal layers
  – Complex physical effects needed better tools (logic synthesis from Verilog, 3D extraction, power analysis, …)

Millennial Dark Ages Descend
▪ Only two dominant application ISAs (x86, ARM)
▪ Only big companies design silicon, rampant consolidation in semiconductor industry
▪ Little open-source hardware or tools activity

Why Instruction Set Architecture matters
▪ Why can’t Intel sell mobile chips?
  – 99%+ of mobile phones/tablets based on ARM v7/v8 ISA
▪ Why can’t ARM partners sell servers?
  – 99%+ of laptops/desktops/servers based on AMD64 ISA (over 95% built by Intel)
▪ How can IBM still sell mainframes?
  – IBM 360, oldest surviving ISA (50+ years)
ISA is most important interface in computer system where software meets hardware

Open Interfaces Work for Software!

Field        Open Standard      Free, Open Implement.   Proprietary Implement.
Networking   Ethernet, TCP/IP   Many                    Many
OS           Posix              Linux, FreeBSD          M/S Windows
Compilers    C                  gcc, LLVM               Intel icc, ARMcc
Databases    SQL                MySQL, PostgreSQL       Oracle 12C, IBM DB2
Graphics     OpenGL             Mesa3D                  M/S DirectX
ISA          ??????             -----------             x86, ARM, IBM360

▪ Why not successful free & open standards and free & open implementations, like other fields?

Companies and their ISAs Come and Go
Proprietary ISA fortunes tied to business fortunes and whims
▪ Digital Equipment Corporation
  – PDP-11, VAX, Alpha
▪ Intel
  – i960, i860, Itanium
▪ MIPS
  – Sold to Imagination, then bought by Wave AI startup, now opening R6?
▪ SPARC
  – Was opened by Sun, acquired by Oracle, now closed down
▪ ARM
  – Sold to Softbank at >40% premium
  – Now 25% sold off to Abu Dhabi investment fund

Today, many ISAs on one SoC
▪ Applications processor (usually ARM)
▪ Graphics processors
▪ Image processors
▪ Radio DSPs
▪ Audio DSPs
▪ Security processors
▪ Power-management processor
▪ > dozen ISAs on some SoCs, each with unique software stack
[Figure: NVIDIA Tegra SoC.]
Why?
▪ Apps processor ISA too big, inflexible for accelerators
▪ IP bought from different places, each proprietary ISA
▪ Engineers build home-grown ISA cores

Do we need all these different ISAs?
Must they be proprietary?
Must they keep disappearing?
What if there was one stable free and open ISA everyone could use for everything?

RISC-V Background
▪ In 2010, after many years and many research projects using MIPS, SPARC, and x86, time for architecture group at UC Berkeley to choose ISA for next set of projects
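
Worked example for the “Overall Cost/Project” slides: the talk's grossly simplified model is a straight line, with NRE as the intercept and marginal manufacturing cost as the slope. The sketch below is only an illustration of that line. The function names and every dollar figure in it are assumptions made for the example and do not come from the talk.

```python
# Sketch of the straight-line cost model from the "Overall Cost/Project" slides.
# All names and dollar figures here are hypothetical, chosen for illustration.

def total_cost(nre, marginal_cost, units):
    """Lifetime project cost: NRE is the intercept, marginal cost is the slope."""
    return nre + marginal_cost * units


def break_even_units(nre_old, mc_old, nre_new, mc_new):
    """Volume above which the newer node is the cheaper choice.

    Assumes nre_new > nre_old (NRE grows each generation). Returns None when
    the newer node's slope is not lower, since the extra NRE is never recovered.
    """
    if mc_new >= mc_old:
        return None
    return (nre_new - nre_old) / (mc_old - mc_new)


if __name__ == "__main__":
    # Hypothetical: NRE doubles from $250M to $500M, per-unit cost drops $5 -> $4.
    print(break_even_units(250e6, 5.0, 500e6, 4.0))
    # Hypothetical: NRE doubles but the slope stays flat.
    print(break_even_units(250e6, 5.0, 500e6, 5.0))
```

With a flat slope the function returns None, which matches the slide's point: if NRE doubles each generation while the slope does not improve, no shipment volume makes the newer node cheaper.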
