Advanced RISC-V Architectures

PULP PLATFORM
Open Source Hardware, the way it should be!

Working with RISC-V
Part 2 of 4: Advanced RISC-V Architectures

Luca Benini <[email protected]>
Frank K. Gürkaynak <[email protected]>
http://pulp-platform.org
@pulp_platform
https://www.youtube.com/pulp_platform

ACACES 2020 - July 2020

Summary
▪ Part 1 – Introduction to RISC-V ISA
▪ Part 2 – Advanced RISC-V Architectures
  ▪ Going 64 bit
  ▪ Bottlenecks
  ▪ Safety/Security
  ▪ Vector units
▪ Part 3 – PULP concepts
▪ Part 4 – PULP based chips

PULP RISC-V Cores from Tiny to App-level
▪ 32 bit: Low Cost Core
  ▪ Zero-riscy: RV32-ICM
  ▪ Micro-riscy: RV32-CE
▪ 32 bit: DSP Enhanced Core
  ▪ RI5CY: RV32-ICMFX
  ▪ SIMD
  ▪ HW loops
  ▪ Bit manipulation
  ▪ Fixed point
▪ 32 bit: Streaming Compute Core
  ▪ Snitch: RV32-ICMDFX
▪ 64 bit: Linux capable Core
  ▪ Ariane: RV64-IC(MA)
  ▪ Full privileged specification
▪ ARM reference points shown on the slide: ARM Cortex-M0+, ARM Cortex-M4, ARM Cortex-A5

From IoT to HPC
▪ For the first 4 years of PULP, we used only 32-bit cores
  ▪ Most IoT near-sensor applications work well with 32-bit cores.
  ▪ A 64-bit memory space is not affordable in an MCU-class device
▪ But times change:
  ▪ Large datasets, high-precision numerical calculations (e.g. double precision FP) at the IoT edge (gateways) and cloud
  ▪ Software infrastructure (OS – typically Linux) with virtual memory assumes 64 bit
  ▪ High-performance computing, being hot again, requires 64 bit
▪ Research question – is pJ/OP possible on a 64-bit data + address space? How?

An application class processor
▪ Virtual Memory
  ▪ Multi-program environment
  ▪ Efficient sharing and protection
▪ Operating System
  ▪ Highly sequential code
  ▪ Increase frequency to gain performance
▪ Large software infrastructure
  ▪ Drivers for hardware (PCIe, Ethernet)
  ▪ Application SW (e.g. TensorFlow, …)
▪ Larger address space (64-bit)
▪ Requires more hardware support
  ▪ MMU (TLBs, PTW)
  ▪ Privilege Levels
  ▪ More Exceptions (page fault, illegal access)
→ Ariane: an application class processor
NOT an ARM Cortex-A killer! “Controller” core with must-have features for 64-bit OSes

ARIANE: Linux Capable 64-bit core
▪ Application class processor
▪ Linux Capable
  ▪ M, S and U privilege modes
  ▪ TLB
  ▪ Tightly integrated D$ and I$
  ▪ Hardware PTW
▪ Optimized for 1+ GHz clock speed
  ▪ Frequency: 1+ GHz (22 FDX)
  ▪ Area: 100s kGE (200-400)
  ▪ Critical path: ~ 25-30 logic levels
▪ 6-stage pipeline
  ▪ In-order issue
  ▪ Out-of-order write-back
  ▪ In-order commit
▪ Branch-prediction
  ▪ RAS
  ▪ Branch Target Buffer
  ▪ Branch History Table
▪ Scoreboarding
▪ Designed for extendibility

Absolute minimum necessary to boot Linux?
▪ Hardware
  ▪ 64 or 32 bit Integer Extension
  ▪ Atomic Extension
  ▪ Privilege levels U, S and M
  ▪ MMU
  ▪ FD Extension or out-of-tree Kernel patch
  ▪ 16 MB RAM
  ▪ Interrupts
    ▪ Core local interrupts (CLINT) like timer and inter processor interrupts
  ▪ Serial
▪ Software
  ▪ Zero Stage Bootloader
    ▪ Device Tree Specification (DTS)
    ▪ RAM preparation (zeroing)
  ▪ Second stage bootloader
    ▪ BBL
    ▪ Uboot
    ▪ …
  ▪ Linux Kernel
  ▪ User-space applications (e.g. Busybox) or distro

Ariane μARC
1. PC Gen
   • Select PC
2. Instr. Fetch
   • TLB
   • Query I$
3. Instr. Decode
   • Re-align
   • De-compress
   • Decode
4. Issue
   • Select FU
   • Issue
5. Execute
6. Commit
   • Write state

Frequency-IPC trade-off
▪ Frequency:
  ▪ Increase frequency through pipelining
  ▪ Modern Intel CPUs have around 10 - 20 pipeline stages
▪ Increased bubbles due to:
  ▪ Data Hazards ➔ Forwarding
  ▪ Structural Hazards ➔ Scoreboard
  ▪ Control Hazards ➔ Branch Prediction
▪ Adds significant complexity on the cache interfaces

Data Hazards - Forwarding
(figure-only slide)

Scoreboarding
▪ Hide latency of multi-cycle instructions
▪ Clean and modular interface to functional units ➔ scalability (FPU)
▪ Add issue port: Dual-Issue implementation
▪ Split execution into four steps:
  ▪ Issue: Relatively complex issue logic (extra pipeline-stage)
  ▪ Read Operands: From register file or forwarded
  ▪ Execute
  ▪ Write Back: Mitigate structural hazards on write-back port
▪ Implemented as a circular buffer

Scoreboard
Decode stream (the same on every scoreboard slide):
  ld   x1 ← x2(0)
  mul  x2 ← x2, x2
  addi x3 ← x0, 16
  addi x2 ← x3, x1
  ld   x1 ← x2(128)
Blocks drawn around the scoreboard: Issue, Commit, ALU, LSU, MUL, Regfile

V  FU   OP   rd  src 1  src 2  result  exception
0
0
0
0

Scoreboard – 3 cycle instruction (LD)
V  FU   OP   rd  src 1  src 2  result  exception
0  LSU  LD   x1  x2     x0     0       No
0
0
0

Scoreboard – 2 cycle instruction (MUL)
V  FU   OP   rd  src 1  src 2  result  exception
0  LSU  LD   x1  x2     x0     0       No
0  MUL  MUL  x2  x2     x2     0       No
0
0

Scoreboard - single cycle instruction (ALU)
V  FU   OP   rd  src 1  src 2  result  exception
0  LSU  LD   x1  x2     x0     0       No
0  MUL  MUL  x2  x2     x2     0       No
0  ALU  ADD  x3  x0     x0     16      No
0

Scoreboard – Multiple Write Back
V  FU   OP   rd  src 1  src 2  result  exception
1  LSU  LD   x1  x2     x0     0x5555  No
1  MUL  MUL  x2  x2     x2     0xAAAA  No
1  ALU  ADD  x3  x0     x0     16      No
0  ALU  ADD  x2  x3     x1     0       No         (forward)

Scoreboard - Commit
V  FU   OP   rd  src 1  src 2  result  exception
0  LSU  LD   x1  x2     x0     128     No
1  MUL  MUL  x2  x2     x2     0xAAAA  No
1  ALU  ADD  x3  x0     x0     16      No
1  ALU  ADD  x2  x3     x1     0xAABA  No

Scoreboard – Exception
V  FU   OP   rd  src 1  src 2  result  exception
1  LSU  LD   x1  x2     x0     128     Yes
0
1  ALU  ADD  x3  x0     x0     16      No
1  ALU  ADD  x2  x3     x1     0xAABA  No

Scoreboard - Commit
V  FU   OP   rd  src 1  src 2  result  exception
1  LSU  LD   x1  x2     x0     128     Yes
0
0
1  ALU  ADD  x2  x3     x1     0xAABA  No

Scoreboard - Commit
V  FU   OP   rd  src 1  src 2  result  exception
1  LSU  LD   x1  x2     x0     128     Yes
0
0
0

Scoreboard - Commit
V  FU   OP   rd  src 1  src 2  result  exception
0
0
0
0
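
The scoreboard walkthrough above can be approximated in a few lines of Python. This is only a behavioural sketch under stated assumptions, not Ariane's RTL: the names `Entry`, `Scoreboard`, `issue`, `write_back`, `read_operand` and `commit` are hypothetical, one instruction is issued or committed per call, and an exception is assumed to be taken at commit and to flush all younger entries. It illustrates in-order issue into a circular buffer, out-of-order write-back, forwarding of not-yet-committed results, and in-order commit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    op: str                  # mnemonic, for illustration only
    rd: int                  # destination register index
    valid: bool = False      # True once the functional unit has written back
    result: int = 0
    exception: bool = False


class Scoreboard:
    """Circular buffer: entries enter at issue_ptr and leave at commit_ptr."""

    def __init__(self, size: int = 4) -> None:
        self.slots: List[Optional[Entry]] = [None] * size
        self.issue_ptr = 0
        self.commit_ptr = 0
        self.count = 0

    def issue(self, op: str, rd: int) -> Optional[int]:
        """In-order issue; returns the slot index, or None if the buffer is full."""
        if self.count == len(self.slots):
            return None
        idx = self.issue_ptr
        self.slots[idx] = Entry(op, rd)
        self.issue_ptr = (idx + 1) % len(self.slots)
        self.count += 1
        return idx

    def write_back(self, idx: int, result: int, exception: bool = False) -> None:
        """Functional units may write back in any order."""
        entry = self.slots[idx]
        entry.valid, entry.result, entry.exception = True, result, exception

    def read_operand(self, reg: int, regfile: List[int]) -> Optional[int]:
        """Forward from the youngest in-flight writer of reg; None means stall."""
        if reg == 0:
            return 0
        n = len(self.slots)
        for k in range(1, self.count + 1):           # youngest to oldest
            entry = self.slots[(self.issue_ptr - k) % n]
            if entry.rd == reg:
                return entry.result if entry.valid else None
        return regfile[reg]

    def commit(self, regfile: List[int]) -> Optional[Entry]:
        """In-order commit of the oldest entry, if it has written back."""
        entry = self.slots[self.commit_ptr]
        if entry is None or not entry.valid:
            return None
        if entry.exception:                          # assumed policy: flush everything
            self.slots = [None] * len(self.slots)
            self.issue_ptr = self.commit_ptr
            self.count = 0
            return entry
        if entry.rd != 0:
            regfile[entry.rd] = entry.result
        self.slots[self.commit_ptr] = None
        self.commit_ptr = (self.commit_ptr + 1) % len(self.slots)
        self.count -= 1
        return entry


regs = [0] * 32
sb = Scoreboard(4)
ld = sb.issue("ld", 1)
mul = sb.issue("mul", 2)
addi = sb.issue("addi", 3)
sb.write_back(addi, 16)            # single-cycle ALU result arrives first
early = sb.commit(regs)            # oldest entry (ld) not done yet
sb.write_back(mul, 0xAAAA)
sb.write_back(ld, 0x5555)
fwd = sb.read_operand(3, regs)     # forwarded before x3 reaches the regfile
order = []
while True:
    done = sb.commit(regs)
    if done is None:
        break
    order.append(done.op)
```

Although the `addi` result is available first, nothing commits until the older load has written back, which is the property that keeps exceptions precise in this model.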

Branch Prediction
▪ Branch Target Buffer
▪ Branch History Table
  ▪ 2-bit saturation counter
  ▪ Anti-aliasing bits

Caches
▪ Caches are a necessity for larger systems
▪ SRAMs (cache memories) are slow compared to regular logic
▪ Private L1 caches
  ▪ I$ (16 kByte, 32 entries, 16 byte cache line, 4 way) - 1.41% miss rate (Linux Boot)
  ▪ D$ (32 kByte, 32 entries, 16 byte cache line, 8 way) - 3.17% miss rate (Linux Boot)
▪ Virtually indexed, physically tagged data cache
▪ L2 cache (outside core domain)

Memory Interfaces
▪ Loads and stores are very common in RISC architectures
▪ Caches add (costly) tag-comparison
▪ Address translation adds to this already critical path
▪ A fast CPU design needs to account for these effects as much as possible
  ▪ Virtually indexed, physically tagged caches
  ▪ De-skewing

Memory Management Unit (MMU)
▪ Essential for supporting Linux
▪ Ariane implements 39-bit page based address translation (SV39)
▪ SV39 supports three levels of page tables
  ▪ 1st level: 1 GiB gigapages
  ▪ 2nd level: 2 MiB megapages
  ▪ 3rd level: (regular) 4 KiB pages
▪ Configurable number of TLB entries
▪ Hardware page table walker allows for efficient TLB miss management
[Figure: Virtual Address → MMU (TLB; TLB HIT / TLB MISS → PTW) → Physical Address]

Hardware Page Table Walker (HPTW)
Virtual address fields: Virtual Page Number 2 | Virtual Page Number 1 | Virtual Page Number 0
Root table located via: Physical Page Number Base (CSR)

V  R  W  X  A  D  G  PPN
1  0  0  0  0  0  0  0xDEADBEEF     Pointer to next level PTE
⠇  ⠇  ⠇  ⠇  ⠇  ⠇  ⠇  ⠇
1  1  0  0  0  0  0  0xC0FFEEBABE   Leaf Node (1 Gigapage)

Second Level Page Table
Virtual address fields: Virtual Page Number 2 | Virtual Page Number 1 | Virtual Page Number 0

V  R  W  X  A  D  G  PPN
1  0  0  0  0  0  0  0xDEADBEEF
⠇  ⠇  ⠇  ⠇  ⠇  ⠇  ⠇  ⠇
1  1  0  0  0  0  0  0xC0FFEEBABE

V  R  W  X  A  D  G  PPN
1  0  0  0  0  0  0  0xAAAAAAAAA    Pointer to next level PTE
⠇  ⠇  ⠇  ⠇  ⠇  ⠇  ⠇  ⠇
1  1  0  0  0  0  0  0xBBBBBBBBBB   Leaf Node (2 Megapage)

Analogous for the third level (and for a fourth level, where the translation scheme has one)
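
The page table walk shown on the slides above can be sketched in software. The following is a minimal illustration, assuming the Sv39 layout of the RISC-V privileged specification (three 9-bit VPN fields, a 12-bit page offset, 8-byte PTEs with the PPN starting at bit 10, and V/R/W/X in bits 0 to 3). Physical memory is modelled as a Python dict from address to PTE value; `walk`, `split_va`, `make_pte` and `PageFault` are hypothetical names, and permission, U-bit and A/D-bit handling is deliberately left out. It is not a model of Ariane's hardware walker.

```python
PAGE_SHIFT = 12          # 4 KiB base pages
VPN_BITS = 9             # 512 entries per table
PTE_SIZE = 8             # bytes per page table entry
PTE_V, PTE_R, PTE_W, PTE_X = 1 << 0, 1 << 1, 1 << 2, 1 << 3


class PageFault(Exception):
    pass


def make_pte(ppn, flags):
    """Pack a physical page number and flag bits into a PTE."""
    return (ppn << 10) | flags


def split_va(va):
    """Return ([VPN0, VPN1, VPN2], page offset) of a 39-bit virtual address."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    vpn = [(va >> (PAGE_SHIFT + VPN_BITS * i)) & 0x1FF for i in range(3)]
    return vpn, offset


def walk(memory, root_ppn, va):
    """Translate va by walking up to three table levels, starting at root_ppn."""
    vpn, _ = split_va(va)
    table = root_ppn << PAGE_SHIFT
    for level in (2, 1, 0):
        pte = memory.get(table + vpn[level] * PTE_SIZE, 0)
        if not (pte & PTE_V) or ((pte & PTE_W) and not (pte & PTE_R)):
            raise PageFault("invalid PTE at level %d" % level)
        ppn = (pte >> 10) & ((1 << 44) - 1)
        if pte & (PTE_R | PTE_X):                     # leaf entry
            span = PAGE_SHIFT + VPN_BITS * level      # 12, 21 or 30 bits
            if ppn & ((1 << (VPN_BITS * level)) - 1):
                raise PageFault("misaligned superpage")
            return (ppn << PAGE_SHIFT) | (va & ((1 << span) - 1))
        table = ppn << PAGE_SHIFT                     # pointer to next level
    raise PageFault("no leaf PTE found")


ROOT_PPN = 0x80000                                    # root table at 0x8000_0000
mem = {
    # VPN2 = 1 -> next-level table, VPN1 = 2 -> next-level table, VPN0 = 3 -> 4 KiB leaf
    0x80000000 + 1 * PTE_SIZE: make_pte(0x80001, PTE_V),
    0x80001000 + 2 * PTE_SIZE: make_pte(0x80002, PTE_V),
    0x80002000 + 3 * PTE_SIZE: make_pte(0x12345, PTE_V | PTE_R),
    # VPN2 = 2 -> leaf in the first level: a 1 GiB gigapage
    0x80000000 + 2 * PTE_SIZE: make_pte(0xC0000, PTE_V | PTE_R | PTE_X),
}

pa_small = walk(mem, ROOT_PPN, 0x40403234)            # three-level walk
pa_giga = walk(mem, ROOT_PPN, 0x80123456)             # stops at the first level
```

A leaf found at the first level ends the walk after a single memory access, which is why gigapages and megapages reduce the cost of a TLB miss as well as the pressure on the TLB.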

Full Debug support
▪ JTAG interface
▪ OpenOCD support
▪ Debug Bridge to communicate with hardware
▪ Allows for:
  ▪ run-control
  ▪ single-step
  ▪ inspection
▪ 16 performance counters
