ECE 361 Computer Architecture Lecture 13: Designing a Pipeline Processor

ECE 361 Computer Architecture Lecture 13: Designing a Pipeline Processor 361 hazards.1 Review: A Pipelined Datapath Clk Ifetch Reg/Dec Exec Mem Wr RegWr ExtOp ALUOp Branch 1 0 P PC+4 P C PC+4 C + Imm16 4 M Imm16 E I I x e D F Rs / Zero Data busA m M / / A I E Ra / D e Mem W I x busB m U R R Exec r n Rb RA Do R e e R M 1 i g t Rt g Unit e WA e i i u RFile g s g s x t i t i s e Di e s t r t Rw r Di e Rt e r I 0 r 0 Rd 1 RegDst ALUSrc MemWr MemtoReg 361 hazards.2 1 Review: Pipeline Control “Data Stationary Control” ° The Main Control generates the control signals during Reg/Dec • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later • Control signals for Mem (MemWr Branch) are used 2 cycles later • Control signals for Wr (MemtoReg MemWr) are used 3 cycles later Reg/Dec Exec Mem Wr ExtOp ExtOp ALUSrc ALUSrc E M x I I / e D F ALUOp ALUOp M m / / I E e / D Main W m RegDst x RegDst R R Control r R e R e e g MemWr g MemWr MemWr g e i i s g i s s t t i t e s Branch e Branch Branch e r t r r e MemtoReg MemtoReg MemtoReg r MemtoReg RegWr RegWr RegWr RegWr 361 hazards.3 Review: Pipeline Summary ° Pipeline Processor: • Natural enhancement of the multiple clock cycle processor • Each functional unit can only be used once per instruction • If a instruction is going to use a functional unit: - it must use it at the same stage as all other instructions • Pipeline Control: - Each stage’s control signal depends ONLY on the instruction that is currently in that stage 361 hazards.4 2 Outline of Today’s Lecture ° Recap and Introduction ° Introduction to Hazards ° Forwarding ° 1 cycle Load Delay ° 1 cycle Branch Delay ° What makes pipelining hard ° Summary 361 hazards.5 Its not that easy for computers ° Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle • structural hazards: HW cannot support this combination of instructions • data hazards: instruction depends on result of prior instruction still in the pipeline • control hazards: pipelining of branches & other instructions that change the PC ° Common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline 361 hazards.6 3 Single Memory is a Structural Hazard Time (clock cycles) A L I Mem Reg U Mem Reg n Load s A L t Instr 1 Mem Reg U Mem Reg r. A L Mem Reg U Mem Reg O Instr 2 r A L d Mem Reg U Mem Reg e Instr 3 A r L Instr 4 Mem Reg U Mem Reg 361 hazards.7 Option 1: Stall to resolve Memory Structural Hazard Time (clock cycles) A L I Mem Reg U Mem Reg n Load s A L t Instr 1 Mem Reg U Mem Reg r. A L Mem Reg U Mem Reg O Instr 2 r A d L bubble Mem Reg U Mem Reg e Instr 3(stall) r A L Instr 4 Mem Reg U Mem Reg 361 hazards.8 4 Option 2: Duplicate to Resolve Structural Hazard • Separate Instruction Cache (Im) & Data Cache (Dm) Time (clock cycles) A L I Im Reg U Dm Reg n Load s A L t Instr 1 Im Reg U Dm Reg r. A L Im Reg U Dm Reg O Instr 2 r A L d Im Reg U Dm Reg e Instr 3 A r L Instr 4 Im Reg U Dm Reg 361 hazards.9 Data Hazard on r1 add r1 ,r2,r3 sub r4, r1 ,r3 and r6, r1 ,r7 or r8, r1 ,r9 xor r10, r1 ,r11 361 hazards.10 5 Data Hazard on r1: (Figure 6.30, page 397, P&H) • Dependencies backwards in time are hazards Time (clock cycles) IF ID/RF EX MEM WB A L Im Reg U Dm Reg I add r1,r2,r3 n A L s sub r4,r1,r3 Im Reg U Dm Reg t A r. L and r6,r1,r7 Im Reg U Dm Reg O A L r Im Reg U Dm Reg d or r8,r1,r9 e A L r xor r10,r1,r11 Im Reg U Dm Reg 361 hazards.11 Option1: HW Stalls to Resolve Data Hazard • Dependencies backwards in time are hazards Time (clock cycles) IF ID/RF EX MEM WB A L Im Reg U Dm Reg I add r1,r2,r3 n A L s sub r4, r1,r3 Im bubble bubble bubble Reg U Dm Reg t r. A L and r6,r1,r7 Im Reg U Dm O r A L d or r8,r1,r9 Im Reg U e r xor r10,r1,r11 Im Reg 361 hazards.12 6 But recall use of “Data Stationary Control” ° The Main Control generates the control signals during Reg/Dec • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later • Control signals for Mem (MemWr Branch) are used 2 cycles later • Control signals for Wr (MemtoReg MemWr) are used 3 cycles later Reg/Dec Exec Mem Wr ExtOp ExtOp ALUSrc ALUSrc E M x I I / e D F ALUOp ALUOp M m / / I E e / D Main W m RegDst x RegDst R R Control r R e R e e g MemWr g MemWr MemWr g e i i s g i s s t t i t e s Branch e Branch Branch e r t r r e MemtoReg MemtoReg MemtoReg r MemtoReg RegWr RegWr RegWr RegWr 361 hazards.13 Option 1: How HW really stalls pipeline • HW doesn’t change PC => keeps fetching same instruction & sets control signals to benign values (0) Time (clock cycles) IF ID/RF MEM WB EA X L Im Reg U Dm Reg I add r1,r2,r3 n s Im bubble bubble bubble bubble t stall r. stall Im bubble bubble bubble bubble O r Im bubble bubble bubble bubble d stall e A L r sub r4,r1,r3 Im Reg U Dm Reg A L and r6,r1,r7 Im Reg U Dm 361 hazards.14 7 Option 2: SW inserts indepdendent instructions • Worst case inserts NOP instructions Time (clock cycles) IF ID/RF E MEM WB A L X Im Reg U Dm Reg I add r1,r2,r3 n A L s Im Reg U Dm Reg t nop A r. L nop Im Reg U Dm Reg O A L r Im Reg U Dm Reg d nop e A L r sub r4,r1,r3 Im Reg U Dm Reg A L and r6,r1,r7 Im Reg U Dm 361 hazards.15 Questions and Administrative Matters 361 hazards.16 8 Option 3 Insight: Data is available! ) • Pipeline registers already contain needed data Time (clock cycles) IF ID/RF E MEM WB A L X Im Reg U Dm Reg I add r1,r2,r3 n A L s sub r4,r1,r3 Im Reg U Dm Reg t A r. L and r6,r1,r7 Im Reg U Dm Reg O A L r Im Reg U Dm Reg d or r8,r1,r9 e A L r xor r10,r1,r11 Im Reg U Dm Reg 361 hazards.17 HW Change for “Forwarding” (Bypassing):) • Increase multiplexors to add paths from pipeline registers • Assumes register read during write gets new value (otherwise more results to be forwarded) 361 hazards.18 9 From Last Lecture: The Delay Load Phenomenon Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Clock I0: Load Ifetch Reg/Dec Exec Mem Wr Plus 1 Ifetch Reg/Dec Exec Mem Wr Plus 2 Ifetch Reg/Dec Exec Mem Wr Plus 3 Ifetch Reg/Dec Exec Mem Wr Plus 4 Ifetch Reg/Dec Exec Mem Wr ° Although Load is fetched during Cycle 1: • The data is NOT written into the Reg File until the end of Cycle 5 • We cannot read this value from the Reg File until Cycle 6 • 3-instruction delay before the load take effect 361 hazards.19 Forwarding reduces Data Hazard to 1 cycle: Time (clock cycles) IF ID/RF EX MEM WB A L Im Reg U Dm Reg I lw r1, 0(r2) n A L s sub r4,r1,r6 Im Reg U Dm Reg t A r. L and r6,r1,r7 Im Reg U Dm Reg O A L r Im Reg U Dm Reg d or r8,r1,r9 e r 361 hazards.20 10 Option1: HW Stalls to Resolve Data Hazard • “Interlock”: checks for hazard & stalls Time (clock cycles) IF ID/RF EX MEM WB A L Im Reg U Dm Reg I lw r1, 0(r2) n s Im bubble bubble bubble bubble t stall r. A L sub r4,r1,r3 Im Reg U Dm Reg O A r L Im Reg U Dm Reg d and r6,r1,r7 e A L r or r8,r1,r9 Im Reg U Dm Reg 361 hazards.21 Option 2: SW inserts independent instructions • Worst case inserts NOP instructions • MIPS I solution: No HW checking Time (clock cycles) IF ID/RF EX MEM WB A L Im Reg U Dm Reg I lw r1, 0(r2) n A L s Im Reg U Dm Reg t nop r.

ECE 361 Computer Architecture Lecture 13: Designing a Pipeline Processor

Pipelining and Vector Processing

Diploma Thesis

How Data Hazards Can Be Removed Effectively

Pipelining: Basic Concepts and Approaches

Powerpc 601 RISC Microprocessor Users Manual

Improving UNIX Kernel Performance Using Profile Based Optimization

Analysis of Body Bias Control Using Overhead Conditions for Real Time Systems: a Practical Approach∗

Lecture Topics RAW Hazards Stall Penalty

UM0434 E200z3 Powerpc Core Reference Manual

Instruction Pipelining Review

Pipelined Instruction Executionhazards, Stages

Elastic Pipeline: Addressing GPU On-Chip Shared Memory Bank Conﬂicts