Pipelining and Hazards Instructor: Steven Ho Great Idea #4: Parallelism Software Hardware • Parallel Requests Warehouse Smart Assigned to Computer Scale Phone E.G

Pipelining and Hazards Instructor: Steven Ho Great Idea #4: Parallelism Software Hardware • Parallel Requests Warehouse Smart Assigned to computer Scale Phone e.g. search “Garcia” Computer Leverage • Parallel Threads Parallelism & Assigned to core Achieve High e.g. lookup, ads Performance Computer • Parallel Instructions Core … Core > 1 instruction @ one time Memory e.g. 5 pipelined instructions Input/Output • Parallel Data Core > 1 data item @ one time Instruction Unit(s) Functional e.g. add of 4 pairs of words Unit(s) A +B A +B A +B A +B • Hardware descriptions 0 0 1 1 2 2 3 3 All gates functioning in Cache Memory Logic Gates parallel at same time 7/12/2017 CS61C Su18 - Lecture 13 2 Review of Last Lecture • Implementing controller for your datapath – Take decoded signals from instruction and generate control signals • Pipelining improves performance by exploiting Instruction Level Parallelism – 5-stage pipeline for RISC-V: IF, ID, EX, MEM, WB – Executes multiple instructions in parallel – Each instruction has the same latency – What can go wrong??? 7/12/2017 CS61C Su18 - Lecture 13 3 Agenda • RISC-V Pipeline • Hazards – Structural – Data • R-type instructions • Load – Control • Superscalar processors 7/12/2017 CS61C Su18 - Lecture 13 4 Recap: Pipelining with RISC-V tinstruction instruction sequence add t0, t1, t2 or t3, t4, t5 sll t6, t0, t3 tcycle Single Cycle Pipelining Timing tstep = 100 … 200 ps tcycle = 200 ps Register access only 100 ps All cycles same length Instruction time, tinstruction = tcycle = 800 ps 1000 ps Clock rate, fs 1/800 ps = 1.25 GHz 1/200 ps = 5 GHz Relative speed 1 x 4 x 7/11/2018 5 RISC-V Pipeline t = 1000 ps instruction Resource use in a particular time slot add t0, t1, t2 Resource use of or t3, t4, t5 instruction sequence instruction over time slt t6, t0, t3 sw t0, 4(t3) lw t0, 8(t3) tcycle = 200 ps addi t2, t2, 1 7/11/2018 CS61C Su18 - Lecture 13 6 Single-Cycle RISC-V RV32I Datapath pc+4 pc +4 Reg[] alu wb 1 DataD 1 Reg[rs1] ALU 2 alu 0 DMEM inst[11:7] 1 pc AddrD 0 IMEM Reg[rs2] Addr wb pc+4 Branch DataR 0 inst[19:15] AddrA DataA 0 Comp. DataW mem inst[24:20] AddrB DataB 1 inst[31:7] Imm. imm[31:0] Gen PCSel inst[31:0] ImmSel RegWEn BrUnBrEq BrLT BSel ASel ALUSel MemRW WBSel 7/11/2018 7 Pipelining RISC-V RV32I Datapath pc+4 pc +4 Reg[] alu wb 1 DataD 1 Reg[rs1] ALU 2 alu 0 DMEM inst[11:7] 1 pc AddrD 0 IMEM Reg[rs2] Addr wb pc+4 Branch DataR 0 inst[19:15] AddrA DataA 0 Comp. DataW mem inst[24:20] AddrB DataB 1 inst[31:7] Imm. imm[31:0] Gen Instruction Instruction Fetch ALU Execute Memory Access Write Back (F) Decode/Register Read (X) (M) (W) (D) 7/11/2018 8 Pipelined RISC-V RV32I Datapath Recalculate PC+4 in M stage to avoid sending both PC and PC+4 down pipeline pcM pcX pcF pcD +4 +4 Reg[] rs1X DataD 1 ALU aluX DMEM 0 AddrD Addr pc +4 Branch aluM DataR F IMEM DataA AddrA Comp. DataW DataB AddrB inst D rs2 rs2 X imm M Imm. X instX instM instW Must pipeline instruction along with data, so 7/11/2018 control operates correctly in each stage 9 Each stage operates on different instruction lw t0, 8(t3) sw t0, 4(t3) slt t6, t0, t3 or t3, t4, t5 add t0, t1, t2 pcM pcX pcF pcD +4 +4 Reg[] rs1X DataD 1 ALU aluX DMEM 0 AddrD Addr pc +4 Branch aluM DataR F IMEM DataA AddrA Comp. DataW DataB AddrB inst D rs2 rs2 X imm M Imm. X instX instM instW 7/11/2018 Pipeline registers separate stages, hold data for each instruction in flight 10 Pipelined Control • Control signals derived from instruction − As in single-cycle implementation − Information is stored in pipeline registers for use by later stages 7/11/2018 11 Graphical Pipeline Representation • RegFile: right half is read, left half is write Time (clock cycles) I ALU n I$ Reg D$ Reg s Load ALU t I$ Reg D$ Reg Add r ALU I$ Reg D$ Reg Store O ALU r I$ Reg D$ Reg Sub d ALU I$ Reg D$ Reg e Or r 7/12/2017 CS61C Su18 - Lecture 13 12 Question: Which of the following signals (buses or control signals) for RISC-V does NOT need to be passed into the EX pipeline stage for a beq instruction? beq t0 t1 Label (A) BrUn IF ID EX Mem WB (B) MemWr ALU I$ Reg D$ Reg (C) RegWr (D) WBSel 13 Hazards Agenda Ahead! • RISC-V Pipeline • Hazards – Structural – Data • R-type instructions • Load – Control • Superscalar processors 7/12/2017 CS61C Su18 - Lecture 13 14 Pipelining Hazards A hazard is a situation that prevents starting the next instruction in the next clock cycle 1) Structural hazard – A required resource is busy (e.g. needed in multiple stages) 2) Data hazard – Data dependency between instructions – Need to wait for previous instruction to complete its data write 3) Control hazard – Flow of execution depends on previous instruction 7/12/2017 CS61C Su18 - Lecture 13 15 Agenda • RISC-V Pipeline • Hazards – Structural – Data • R-type instructions • Load – Control • Superscalar processors 7/12/2017 CS61C Su18 - Lecture 13 16 Structural Hazard • Problem: Two or more instructions in the pipeline compete for access to a single physical resource • Solution 1: Instructions take it in turns to use resource, some instructions have to stall • Solution 2: Add more hardware to machine • Can always solve a structural hazard by adding more hardware 7/11/2018 CS61C Su18 - Lecture 13 17 1. Structural Hazards • Conflict for use of a resource Time (clock cycles) I ALU n I$ Reg D$ Reg s Load Can we read and t ALU I$ Reg D$ Reg write to registers r Instr 1 simultaneously? ALU I$ Reg D$ Reg O Instr 2 r ALU I$ Reg D$ Reg d Instr 3 e ALU I$ Reg D$ Reg r Instr 4 7/12/2017 CS61C Su18 - Lecture 13 18 Regfile Structural Hazards • Each instruction: − can read up to two operands in decode stage − can write one value in writeback stage • Avoid structural hazard by having separate “ports” − two independent read ports and one independent write port • Three accesses per cycle can happen simultaneously 7/11/2018 CS61C Su18 - Lecture 13 19 Regfile Structural Hazards • Two alternate solutions: 1) Build RegFile with independent read and write ports (what you will do in the project; good for single-stage) 2) Double Pumping: split RegFile access in two! Prepare to write during 1st half, write on falling edge, read during 2nd half of each clock cycle • Will save us a cycle later... • Possible because RegFile access is VERY fast (takes less than half the time of ALU stage) • Conclusion: Read and Write to registers during same clock cycle is okay 7/12/2017 CS61C Su18 - Lecture 13 20 Memory Structural Hazards • Conflict for use of a resource Time (clock cycles) I Trying to read ALU n (and maybe I$ Reg D$ Reg s Load write) same t ALU I$ Reg D$ Reg memory twice r Instr 1 in same clock ALU I$ Reg D$ Reg cycle O Instr 2 r ALU I$ Reg D$ Reg d Instr 3 e ALU I$ Reg D$ Reg r Instr 4 7/12/2017 CS61C Su18 - Lecture 13 21 Instruction and Data Caches Processor Memory (DRAM) Control Instruction Cache Program Datapath PC Bytes Data Registers Cache Arithmetic & Logic Unit Data (ALU) Caches: small and fast “buffer” memories 7/11/2018 CS61C Su18 - Lecture 13 22 Structural Hazards – Summary • Conflict for use of a resource • In RISC-V pipeline with a single memory − Load/store requires data access − Without separate memories, instruction fetch would have to stall for that cycle ▪ All other operations in pipeline would have to wait • Pipelined datapaths require separate instruction/data memories − Or separate instruction/data caches • RISC ISAs (including RISC-V) designed to avoid structural hazards − e.g. at most one memory access/instruction CS61C Su18 - Lecture 13 23 Administrivia • Proj2-2 due 7/13, HW3/4 due 7/16 – 2-2 autograder being run • Guerilla Session Tonight! 4-6pm • HW0-2 grades should now be accurate on glookup • Project 3 released tomorrow night! • Supplementary review sessions starting – First one this Sat. (7/14) 12-2p, Cory 540AB 7/12/2017 CS61C Su18 - Lecture 13 24 Agenda • RISC-V Pipeline • Hazards – Structural – Data • R-type instructions • Load – Control • Superscalar processors 7/12/2017 CS61C Su18 - Lecture 13 25 2. Data Hazards (1/2) • Consider the following sequence of instructions: add t0, t1, t2 sub t4, t0, t3 and t5, t0, t6 or t7, t0, t8 xor t9, t0, t10 Stored Read during WB during ID 7/12/2017 CS61C Su18 - Lecture 13 26 2. Data Hazards (2/2) • Data-flow backward in time are hazards Time (clock cycles) IF ID/RF EX MEM WB I ALU add t0, t1, t2 I$ Reg D$ Reg Hazard if no double pumping n s ALU sub t4, t0, t3 I$ Reg D$ Reg t r ALU I$ D$ Reg and t5, t0, t6 Reg O ALU I$ Reg D$ Reg r or t7, t0, t8 d ALU e I$ Reg D$ Reg xor t9, t0, t10 r 7/12/2017 CS61C Su18 - Lecture 13 27 Solution 1: Stalling • Problem: Instruction depends on result from previous instruction − add t0, t1, t2 sub t4, t0, t3 • Bubble: − effectively NOP: affected pipeline stages do “nothing” Stalls and Performance • Stalls reduce performance − But stalls are required to get correct results • Compiler can arrange code to avoid hazards and stalls − Requires knowledge of the pipeline structure 7/11/2018 29 Data Hazard Solution: Forwarding • Forward result as soon as it is available – OK that it’s not stored in RegFile yet Arithmetic result IF ID/RF EX MEM WB available in EX ALU add t0, t1, t2 I$ Reg D$ Reg ALU sub t4, t0, t3 I$ Reg D$ Reg ALU I$ D$ Reg and t5, t0, t6 Reg ALU I$ Reg D$ Reg or t7, t0, t8 ALU I$ Reg D$ Reg xor t9, t0, t10 Forwarding: grab operand from pipeline stage, 7/12/2017 rather than register file 30 Forwarding (aka Bypassing) • Use result when it is computed − Don’t wait for it to be stored in a register − Requires extra connections in the datapath 7/11/2018 CS61C Su18 - Lecture 13 31 Detect Need for Forwarding (example) Compare destination of older instructions in D X M W pipeline with sources of new instruction in decode stage.

Load more