Lecture 04 RISC-V ISA

ITSC 3181 Intro to Computer Architecture

Department of Computer Science Yonghong Yan [email protected] https://passlab.github.io/yanyh/

1 Acknowledgement

• Slides adapted from – Computer Science 152: Computer Architecture and Engineering, Spring 2016 by Dr. George Michelogiannakis from UCB

2 Review: ISA Principles -- Iron-code Summary

• Section A.2—Use general-purpose registers with a load-store architecture. • Section A.3—Support these addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect. • Section A.4—Support these data sizes and types: 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers. – Now we see 16-bit FP for deep learning in GPU • http://www.nextplatform.com/2016/09/13/nvidia-pushes-deep-learning-inference- new-pascal-gpus/ • Section A.5—Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register- register, and shift. • Section A.6—Compare equal, compare not equal, compare less, branch (with a PC- relative address at least 8 bits long), jump, call, and return. • Section A.7—Use fixed instruction encoding if interested in performance, and use variable instruction encoding if interested in code size. • Section A.8—Provide at least 16 general-purpose registers, be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist IS – Often use separate floating-point registers. – The justification is to increase the total number of registers without raising problems in the instruction format or in the speed of the general-purpose register file. This compromise, however, is not orthogonal.

3 What is RISC-V

• RISC-V (pronounced "risk-five”) is a ISA standard – An open source implementation of a reduced instruction set computing (RISC) based instruction set architecture (ISA) – There was RISC-I, II, III, IV before • Most ISAs: X86, ARM, Power, MIPS, SPARC – Commercially protected by patents – Preventing practical efforts to reproduce the computer systems. • RISC-V is open – Permitting any person or group to construct compatible computers – Use associated software • Originated in 2010 by researchers at UC Berkeley – Krste Asanović, David Patterson and students • 2017 version 2 of the userspace ISA is fixed – User-Level ISA Specification v2.2 – Draft Compressed ISA Specification v1.79 https://riscv.org/ – Draft Privileged ISA Specification v1.10 https://en.wikipedia.org/wiki/RISC-V

4 Goals in Defining RISC-V

• A completely open ISA that is freely available to academia and industry • A real ISA suitable for direct native hardware implementation, not just simulation or binary translation • An ISA that avoids "over-architecting" for – a particular microarchitecture style (e.g., microcoded, in-order, decoupled, out-of- order) or – implementation technology (e.g., full-custom, ASIC, FPGA), but which allows efficient implementation in any of these • RISC-V ISA includes – A small base integer ISA, usable by itself as a base for customized accelerators or for educational purposes, and – Optional standard extensions, to support general-purpose software development – Optional customer extensions • Support for the revised 2008 IEEE-754 floating-point standard

5 RISC-V ISA Principles

• Generally kept very simple and extendable • Separated into multiple specifications – User-Level ISA spec (compute instructions) – Compressed ISA spec (16-bit instructions) – Privileged ISA spec (supervisor-mode instructions) – More …

• ISA support is given by RV + word-width + extensions supported – E.g. RV32I means 32-bit RISC-V with support for the I(nteger) instruction set

6 User Level ISA

• Defines the normal instructions needed for computation – A mandatory Base integer ISA • I: Integer instructions: – ALU – Branches/jumps – Loads/stores – Standard Extensions • M: Integer Multiplication and Division • A: Atomic Instructions • F: Single-Precision Floating-Point • D: Double-Precision Floating-Point • C: Compressed Instructions (16 bit)

• G = IMAFD: Integer base + four standard extensions – Optional extensions

7 RISC-V ISA

• Both 32-bit and 64-bit address space variants – RV32 and RV64 • Easy to subset/extend for education/research – RV32IM, RV32IMA, RV32IMAFD, RV32G

• SPEC on the website – www.riscv.org

8 RV32/64 Processor State

• Program counter (pc) • 32 32/64-bit integer registers (x0-x31) – x0 always contains a 0 – x1 to hold the return address on a call. • 32 floating-point (FP) registers (f0-f31) – Each can contain a single- or double-precision FP value (32-bit or 64-bit IEEE FP) • FP status register (fsr), used for FP rounding mode & exception reporting

9 RV64G In One Table

10 Load/Store Instructions

11 ALU Instructions

12 Control Flow Instructions

13 RISC-V Dynamic Instruction Mix for SPECint2006

14 RISC-V Hybrid Instruction Encoding

• 16, 32, 48, 64 … bits length encoding • Base instruction set (RV32) always has fixed 32-bit

instructions lowest two bits = 112 • All branches and jumps have targets at 16-bit granularity (even in base ISA where all instructions are fixed 32 bits

15 Four Core RISC-V Instruction Formats

https://github.com/riscv/riscv-opcodes/blob/master/opcodes Additional opcode Additional opcode bits 7-bit opcode field bits/immediate (but low 2 bits =112)

Reg. Source 2 Reg. Source 1 Destination Reg.

Aligned on a four-byte boundary in memory. There are variants! Sign bit of immediates always on bit 31 of instruction. Register

fields never move. 16 With Variants

Additional opcode Additional opcode bits 7-bit opcode field bits/immediate (but low 2 bits =112)

Reg. Source 2 Reg. Source 1 Destination Reg.

Based on the handling of the immediates

17 RISC-V Encoding Summary Immediate Encoding Variants

• 32-bit Immediate produced by each base instruction format – Instruction bit: inst[y]

19 RISC-V Addressing Summary

, i.e., displacement addressing R-Format Encoding Example

funct7 rs2 rs1 funct3 rd opcode 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits add x6, x10, x6 0 6 10 0 6 51

0000000 00110 01010 000 00110 0110011

0000 0000 0110 0101 0000 0011 0011 0011two =

0065033316 RISC-V I-Format Instructions

immediate rs1 funct3 rd opcode 12 bits 5 bits 3 bits 5 bits 7 bits • Immediate arithmetic and load instructions – rs1: source or base address register number – immediate: constant operand, or offset added to base address • 2s-complement, sign extended • Design Principle: Good design demands good compromises – Different formats complicate decoding, but allow 32-bit instructions uniformly – Keep formats as similar as possible RISC-V S-Format Instructions

imm[11:5] rs2 rs1 funct3 imm[4:0] opcode 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits • Different immediate format for store instructions – rs1: base address register number – rs2: source operand register number – immediate: offset added to base address • Split so that rs1 and rs2 fields always in the same place Integer Computational Instructions (ALU)

• I-type (Immediate), all immediates in all instructions are sign extended – ADDI: adds sign extended 12-bit immediate to rs1 – SLTI(U): set less than immediate – ANDI/ORI/XORI: Logical operations I-type instructions end with I – SLLI/SRLI/SRAI: Shifts by constants

24 Integer Computational Instructions (ALU)

• I-type (Immediate), all immediates in all instructions are sign extended – LUI/AUIPC: load upper immediate/add upper immediate to pc I-type instructions end with I

• Writes 20-bit immediate to top of destination register. • Used to build large immediates. • 12-bit immediates are signed, so have to account for sign when building 32-bit immediates in 2-instruction sequence (LUI high- 20b, ADDI low-12b)

25 Integer Computational Instructions

• R-type (Register) – rs1 and rs2 are the source registers. rd the destination – ADD/SUB: – SLT, SLTU: set less than – SRL, SLL, SRA: shift logical or arithmetic left or right

ADDI x0, x0, 0

26 Control Transfer Instructions

NO architecturally visible delay slots • Unconditional Jumps: PC+offset target – JAL: Jump and link, also writes PC+4 to x1, UJ-type • Offset scaled by 1-bit left shift – can jump to 16-bit instruction boundary (Same for branches) – JALR: Jump and link register where Imm (12 bits) + rd1 = target

27 Control Transfer Instructions

NO architecturally visible delay slots • Conditional Branches: SB-type and PC+offset target

12-bit signed immediate split across two fields

Branches, compare two registers, PC+(immediate<<1) target (Signed offset in multiples of two).Branches do not have delay slot

28 Loads and Stores

• Store instructions (S-type) – MEM(rs1+imm) = rs2 • Loads (I-type) – Rd = MEM(rs1 + imm)

29 Specifications and Software From riscv.org and github.com/riscv • Specification from RISC-V website – https://riscv.org/specifications/

• RISC-V software includes – GNU Compiler Collection (GCC) toolchain (with GDB, the debugger) • https://github.com/riscv/riscv-tools – LLVM toolchain – A simulator ("Spike") • https://github.com/riscv/riscv-isa-sim – Standard simulator QEMU • https://github.com/riscv/riscv-qemu • Operating systems support exists for Linux – https://github.com/riscv/riscv-linux • A JavaScript ISA simulator to run a RISC-V Linux system on a web browser – https://github.com/riscv/riscv-angel

30 RISC-V Implementations

• For RISC-V implementation, the UCB created Chisel, an open-source hardware construction language that is a specialized dialect of Scala. – Chisel: Constructing Hardware In a Scala Embedded Language – https://chisel.eecs.berkeley.edu/ • In-order Rocket core and chip generator – https://github.com/freechipsproject/rocket-chip • Out-of-order BOOM core – https://github.com/ucb-bar/riscv-boom • UCB Sodor cores for education (single cycle, and 1-5 stages pipeline) – https://github.com/ucb-bar/riscv-sodor

31 RISC-V Implementations

• A list from – https://riscv.org/risc-v-cores/ • The Indian IIT-Madras is developing six RISC-V open-source CPU designs (SHAKTI) for six distinct usages – https://shaktiproject.bitbucket.io/index.html

• SiFive HiFive Unleashed – First Linux RISC-V Board • First shipment: June 2018 – https://www.sifive.com/ – https://github.com/sifive/freedom

32 Additional Information

33 Calling Convention

• C Datatypes and Alignment – RV32 employs an ILP32 integer model, while RV64 is LP64 – Floating-point types are IEEE 754-2008 compatible – All of the data types are keeped naturally aligned when stored in memory – char is implicitly unsigned – In RV64, 32-bit types, such as int, are stored in integer registers as proper sign extensions of their 32-bit values; that is, bits 63..31 are all equal • This restriction holds even for unsigned 32-bit types

34 Calling Convention

• RVG Calling Convention – If the arguments to a function are conceptualized as fields of a C struct, each with pointer alignment, the argument registers are a shadow of the first eight pointer- words of that struct • Floating-point arguments that are part of unions or array fields of structures are passed in integer registers • Floating-point arguments to variadic functions (except those that are explicitly named in the parameter list) are passed in integer registers – The portion of the conceptual struct that is not passed in argument registers is passed on the stack • The stack pointer sp points to the first argument not passed in a register – Arguments smaller than a pointer-word are passed in the least-significant bits of argument registers – When primitive arguments twice the size of a pointer-word are passed on the stack, they are naturally aligned • When they are passed in the integer registers, they reside in an aligned even-odd register pair, with the even register holding the least-significant bits – Arguments more than twice the size of a pointer-word are passed by reference

35 Calling Convention

• The stack grows downward and the stack pointer is always kept 16-byte aligned • Values are returned from functions in integer registers v0 and v1 and floating-point registers fv0 and fv1 – Floating-point values are returned in floating-point registers only if they are primitives or members of a struct consisting of only one or two floating-point values – Other return values that fit into two pointer-words are returned in v0 and v1 – Larger return values are passed entirely in memory; the caller allocates this memory region and passes a pointer to it as an implicit first parameter to the callee

36 Memory Model

• RISC-V: Relaxed memory model

37 Control and Status Register (CSR) Instructions

• CSR Instructions

• Timer and counters

38 Data Formats and Memory Addresses

Data formats: 8-b Bytes, 16-b Half words, 32-b words and 64-b double words Some issues Most Significant Least Significant • Byte addressing Byte Byte Little Endian (RISC-V) 3 2 1 0

Big Endian 0 1 2 3

• Word alignment Byte Addresses Suppose the memory is organized in 32-bit words. Can a word address begin only at 0, 4, 8, .... ?

0 1 2 3 4 5 6 7

39 ISA Design

• RISC-V has 32 integer registers and can have 32 floating-point registers – Register number 0 is a constant 0 – Register number 1 is the return address (link register) • The memory is addressed by 8-bit bytes • The instructions must be aligned to 32-bit addresses • Like many RISC designs, it is a "load-store" machine – The only instructions that access main memory are loads and stores – All arithmetic and logic operations occur between registers • RISC-V can load and store 8 and 16-bit items, but it lacks 8 and 16-bit arithmetic, including comparison-and-branch instructions • The 64-bit instruction set includes 32-bit arithmetic

40 ISA Design for Performance

• Features to increase a computer's speed, while reducing its cost and power usage – placing most-significant bits at a fixed location to speed sign-extension, and a bit- arrangement designed to reduce the number of multiplexers in a CPU

41 ISA Design

• Intentionally lacks condition codes, and even lacks a carry bit – To simplify CPU designs by minimizing interactions between instructions • Builds comparison operations into its conditional-jumps

42 ISA Design

• The lack of a carry bit complicates multiple-precision arithmetic – GMP, MPFR • Does not detect or flag most arithmetic errors, including overflow, underflow and divide by zero – No special instruction set support for overflow checks on integer arithmetic operations. • Most popular programming languages do not support checks for integer overflow, partly because most architectures impose a significant runtime penalty to check for overflow on integer arithmetic and partly because modulo arithmetic is sometimes the desired behavior – Floating-Point Control and Status Register

43 ISA Design

• Lacks the "count leading zero" and bit-field operations normally used to speed software floating-point in a pure-integer processor • No branch delay slot, a position after a branch instruction that can be filled with an instruction which is executed regardless of whether the branch is taken or not – This feature can improve performance of pipelined processors, – Omitted in RISC-V because it complicates both multicycle CPUs and superscalar CPUs • Lacks address-modes that "write back" to the registers – For example, it does not do auto-incrementing

44 ISA Design

• A load or store can add a twelve-bit signed offset to a register that contains an address. A further 20 bits (yielding a 32-bit address) can be generated at an absolute address – RISC-V was designed to permit position-independent code. It has a special instruction to generate 20 upper address bits that are relative to the program counter. The lower twelve bits are provided by normal loads, stores and jumps

– LUI (load upper immediate) places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros – AUIPC (add upper immediate to pc) is used to build pc-relative addresses, forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd