Z-Scale: Tiny 32-Bit RISC-V Systems with Updates to the Rocket Chip Generator

Z-Scale: Tiny 32-Bit RISC-V Systems with Updates to the Rocket Chip Generator

Z-scale: Tiny 32-bit RISC-V Systems With Updates to the Rocket Chip Generator Yunsup Lee, Albert Ou, Albert Magyar 2nd RISC-V Workshop [email protected] 6/30/2015 1 What is Z-scale? Why? § Z-scale is a tiny 32-bit RISC-V core generator suited for microcontrollers and embedded systems § Over the past months, external users have expressed great interest in small RISC-V cores - So we have listened to your feedback! § Z-scale is designed to talk to AHB-Lite buses - plug-compatible with ARM Cortex-M series § Z-scale generator also generates the interconnect between core and devices - Includes buses, slave muxes, and crossbars 2 Berkeley’s RISC-V Core Generators § Z-scale: Family of Tiny Cores - Similar in spirit to ARM Cortex M0/M0+/M3/M4 - Integrates with AHB-Lite interconnect § Rocket: Family of In-order Cores - Currently 64-bit single-issue only - Plans to work on dual-issue, 32-bit options - Similar in spirit to ARM Cortex A5/A7/A53 - Will integrate with AXI4 interconnect § BOOM: Family of Out-of-Order Cores - Supports 64-bit single-, dual-, quad-issue - Similar in spirit to ARM Cortex A9/A15/A57 - Will integrate with AXI4 interconnect - BOOM talk right after this one 3 Z-scale Pipeline WB PC DE IF MEM Gen. EX MUL I-Bus I-Bus D-Bus D-Bus Request Response Request Response § 32-bit 3-stage single-issue in-order pipe § Executes RV32IM ISA, has M/U privilege modes § I-bus and D-bus are AHB-Lite and 32-bits wide § Interrupts are supported § Will publish a “microarchitecture specification” 4 ARM Cortex-M0 vs. Z-scale Category ARM Cortex-M0 RISC-V Zscale ISA 32-bit ARM v6 32-bit RISC-V (RV32IM) Architecture Single-Issue In-Order 3-stage Single-Issue In-Order 3-stage Performance 0.87 DMIPS/MHz 1.35 DMIPS/MHz Process TSMC 40LP TSMC 40GPLUS Area w/o Caches 0.0070 mm2 0.0098 mm2 Area Efficiency 124 DMIPS/MHz/mm2 138 DMIPS/MHz/mm2 Frequency ≤50 MHz ~500 MHz Voltage (RTV) 1.1 V 0.99 V Dynamic Power 5.1 µW/MHz 1.8 µW/MHz § Note: numbers are very likely to change in the future as we tune the design and add things to the core. 5 RV32E § New base integer instruction set - Reduced version of RV32I designed for embedded systems § Cut number of integer registers to 16 § Remove counters that are mandatory in RV32I - Counter instructions (rdcycle[h], rdtime[h], rdinstret[h]) are not mandatory 6 Building a Z-scale System AHB-Lite JTAG Z-Scale Debugger Core APB D Devices J-Bus I-Bus D-Bus AHB-Lite P Peripherals Crossbar S-Bus NOR Boot SRAM Flash ROM AHB DRAM D D D APB § Working on P-Bus “platform specification” P P P P P 7 Z-scale Generator is Written in Chisel § Chisel is a new HDL embedded in Scala - Rely on good software engineering practices such as abstraction, reuse, object oriented programming, functional programming - Build hardware like software § 604 Unique LOC written in Chisel - Control: 274 lines - Datapath: 267 lines (99 lines could be generalized) - Top-level: 63 lines § 983 LOC in Chisel borrowed from Rocket § Reuse and parameterization in Chisel and Rocket chip actually works! 8 Functional Programming 101 § (1, 2, 3) map { n => n+1 } - (2, 3, 4) § (1, 2, 3) map { _+1 } § (1, 2, 3) zip (a, b, c) - ((1, a), (2, b), (3, c)) § (1, a)._1 - 1 § ((1, a), (2, b), (3, c)) map { _._1} § ((1, a), (2, b), (3, c)) map { n => n._1} - (1, 2, 3) § (0 until 3) foreach { println(_) } - 0/1/2 9 Functional Programming Example Used in AHB-Lite Crossbar masters(0) masters(1) AHB-Lite Crossbar masters(0,1) master AHB-Lite Bus AHB-Lite Bus AHB-Lite Bus AHB-Lite Crossbar slaves(0,1,2) ins(0,1) slaves(0,1,2) AHB-Lite Arbiter Arbiter Arbiter Slave Mux Arbiter Slave Mux Slave Mux Slave Mux out slaves(0) slaves(1) slaves(2) class AHBXbar(n: Int, amap: Seq[UInt=>Bool]) extends Module { val io = new Bundle { val masters = Vec.fill(n){new AHBMasterIO}.flip val slaves = Vec.fill(amap.size){new AHBSlaveIO}.flip } val buses = io.masters map { m => Module(new AHBBus(amap)).io } val muxes = io.slaves map { s => Module(new AHBSlaveMux(n)).io } (buses.map(_.master) zip io.masters) foreach { case (b, m) => b <> m } (0 until n) foreach { m => (0 until amap.size) foreach { s => buses(m).slaves(s) <> muxes(s).ins(m) } } (io.slaves zip muxes.map(_.out)) foreach { case (s, x) => s <> x } } Z-scale in Verilog § Talked to many external users, and perhaps the #1 reason why they can’t use our stuff is because it’s written in Chisel - So we have listened to your feedback! § We have implemented the same Z-scale core in Verilog § 1215 LOC § No more excuses for adoption! - If there still is any reason why you can’t use RISC-V, please do let us know 11 Z-scale FPGA DEMO System 0x8000_0800 JTAG Z-Scale CORE RESET Debugger Core (1KB) 0x8000_0400 GPIO LED J-Bus I-Bus D-Bus (1KB) 0x8000_0000 AHB-Lite Empty Crossbar 0x2400_4000 SPI FLASH (16KB) S-Bus 0x2400_0000 Boot DRAM ROM SPI AHB (64MB) DRAM FLASH APB 0x2000_0000 P-Bus Empty 0x0000_4000 Boot ROM AHB-Lite (16KB) GPIO CORE 0x0000_0000 APB LED RESET 12 Z-scale FPGA DEMO System Mapped to Xilinx Spartan6 LX9 § Avnet LX9 Microboard - $89 - Xilinx Spartan6 LX9 - 64MB LPDDR RAM - 16MB SPI FLASH Resource Used Percentage - 10/100 Ethernet Registers 2,329 20% - USB-to-UART LUTs 4,328 75% - USB-to-JTAG RAM16 8 25% - 2x Pmod headers RAM8 0 0% - 4x LEDs Test program is stored in bootrom. - 4x DIP switches It is a memory test program, which - RESET/PROG buttons writes 32-bit words generated from an LFSR to 64MB of DRAM, and § 4 boards for raffle! checks it by reading 64MB of data, and toggles LED if it succeeds. 13 Z-scale Use Cases § Microcontrollers - Implement your simple control loops - If code density matters § Embedded Systems - Build your system around Z-scale § Validation of Tiny 32-bit RISC-V Systems - You don’t need to use our code, just consider Z- scale as an existence proof and implement your own RV32I core § Both Chisel and Verilog versions of Z-scale is open-sourced under the BSD license - https://github.com/ucb-bar/zscale - https://github.com/ucb-bar/fpga-spartan6 14 What is the Rocket Chip Generator? Tile Tile HTIF § Rocket Rocket HTIFIO Parameterized SoC Core Core ROCC ROCCIO Accel. generator written in FPU FPU HostIO ROCC Chisel Accel. L1 Inst L1 Inst L1 Data L1 Data § Generates n Tiles sets, sets, sets, ways ways ways - (Rocket) Core client client client client client - RoCC Accelerator - TileLink L1 I$ L1 Network arb - L1 D$ TileLinkIO TileLinkIO TileLinkIO Coherence Manager § Generates Uncore L2Cache L2Cache L2Cache L2Cache L2Cache mngr mngr mngr mngr mngr - L1 Crossbar sets, sets, sets, sets, sets, - Coherence Manager ways ways ways ways ways - Shared L2$ with client client client client client TileLink directory bits mngr TileLinkIO TileLink / MemIO Converter - Exports a simple MemIO memory interface 15 Rocket Chip Generator Updates Since the 1st RISC-V Workshop Tile Tile HTIF Rocket Rocket HTIFIO § Implemented L2$ with Core Core ROCC ROCCIO Accel. directory bits FPU FPU HostIO ROCC § RoCC coprocessor has Accel. L1 Inst L1 Inst L1 Data L1 Data sets, sets, sets, a memory port ways ways ways client client client client client directly into the L2$ TileLink § Main development will L1 Network arb TileLinkIO TileLinkIO happen on the rocket- TileLinkIO Coherence Manager L2Cache L2Cache L2Cache L2Cache L2Cache chip repository mngr mngr mngr mngr mngr sets, sets, sets, sets, sets, § Moving towards ways ways ways ways ways standardized memory client client client client client TileLink interfaces mngr TileLinkIO TileLink / MemIO Converter MemIO 16 Important Memory Interfaces § TileLink - Our cache-coherent interconnect - For more details, watch my talk from last workshop § NASTI (pronounced nasty) - Not A STandard Interface - Our implementation of the AXI4 standard § HASTI (pronounced hasty) - Highly Advanced System Transport Interface - Our implementation of the AHB-Lite standard § POCI (pronounced pokey) - Peripheral Oriented Connection Interface - Our implementation of the APB standard 17 Rocket Chip Generator Grand Plan with Z-scale RocketTile RocketTile RoCC Rocket L1I$ Rocket L1I$ Accel. JTAG RoCC Debug CSR CSR Accel. File L1D$ File L1D$ L1 Network Z-scale L2$ Bank Cache- L2$ Bank Cache- L2$ Bank Coherent L2$ Bank DeviceCoherent AHB-Lite Bus Device Low- Scratch SCR Speed L2 Network Pad File IO Device TileLink/AXI4 TileLink/AXI4 AXI4/AHB AHB/APB Bridge Bridge Bridge Bridge AXI4 Crossbar APB Bus High- High- DRAM DRAM Speed Speed IO Peripheral Peripheral Peripheral Controller Controller IO Device Device 18 Conclusion, Future Work, and Raffle § Z-scale is a RISC-V tiny core generator suited for microcontrollers and embedded systems § Z-scale - Microarchitecture document will be released first - Improve performance - Implement “C” extension as an option - Add MMU option to boot Linux - More devices on the LX9 board to come § Rocket Chip Generator - JTAG debug interface (get rid of HTIF) - Move to standardized interfaces (NASTI/HASTI/POCI) - Add Z-scale option § Raffle time! 19 .

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    19 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us