3/12/2015
Lecture 19, 20 & 21: Processor Pipelining – Part I
Prof. R. Iris Bahar
March 9, 11, 13, 2015
Reading: Chapter 4, Patterson & Hennessy textbook

Homework #2 and Lab #4
Homework #2 has been posted
 Due Friday, March 20
 Pipeline hazards covered today and next week
Lab #4 has been posted: design a single-cycle processor
 First due date: Friday, March 20
  Demo the first set of instructions executing through your processor design
 Second due date: Friday, April 3
  Demo the full set of instructions executing correctly on your processor
  Final report due at this time
 Before starting on the lab, go through the TimeQuest and PLL tutorials.
 This lab is to be completed INDIVIDUALLY.

© 2015 R.I. Bahar. Portions of these slides taken from Professors S. Reda and D. Patterson.
Single-Cycle MIPS Processor
1. Fetch instruction @ PC
2. Decode instruction
3. Fetch operands
4. Execute instruction
5. Store result
6. Update PC

Complete Single Cycle Processor
[Figure: datapath and control; datapath for all instructions except jump]
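To make the six steps concrete, here is a minimal Python sketch of a single-cycle interpreter for a small MIPS subset (add, sub, lw, sw, beq). Each loop iteration stands in for one clock cycle and carries one instruction through every step. The tuple instruction format and the name `run_single_cycle` are hypothetical illustrations, not the lab's required design; real hardware decodes 32-bit instruction words.

```python
def run_single_cycle(program, regs, mem, max_steps=1000):
    """Run a tiny MIPS subset, one instruction per loop iteration (one clock).

    Instructions are hypothetical tuples rather than encoded machine words:
      ("add", rd, rs, rt)      ("sub", rd, rs, rt)
      ("lw", rt, offset, rs)   ("sw", rt, offset, rs)
      ("beq", rs, rt, offset)  # offset counted in instructions
    regs is a list of 32 ints; mem is a dict mapping byte address -> word.
    Python ints are unbounded, so 32-bit overflow is not modeled.
    """
    pc = 0
    for _ in range(max_steps):
        if pc // 4 >= len(program):
            break                                     # ran off the end: stop
        op, a, b, c = program[pc // 4]                # fetch @ PC, decode
        next_pc = pc + 4                              # default PC update
        if op in ("add", "sub"):
            x, y = regs[b], regs[c]                   # fetch operands
            result = x + y if op == "add" else x - y  # execute
            if a != 0:                                # $0 is hard-wired to 0
                regs[a] = result                      # store result
        elif op == "lw":
            if a != 0:
                regs[a] = mem.get(regs[c] + b, 0)     # address = rs + offset
        elif op == "sw":
            mem[regs[c] + b] = regs[a]
        elif op == "beq":
            if regs[a] == regs[b]:                    # execute: compare
                next_pc = pc + 4 + 4 * c              # branch target
        else:
            raise ValueError(f"unsupported op {op!r}")
        pc = next_pc                                  # update PC
    return regs, mem

regs = [0] * 32
regs[1] = 100                                         # base address in $1
mem = {100: 7, 104: 5}
prog = [("lw", 2, 0, 1), ("lw", 3, 4, 1), ("add", 4, 2, 3), ("sw", 4, 8, 1)]
run_single_cycle(prog, regs, mem)
print(regs[4], mem[108])                              # 12 12
```

Every instruction, however simple, gets a whole loop iteration here, which mirrors why the single-cycle clock must be long enough for the slowest instruction.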
Single Cycle processor with Control
[Figure: datapath and control, without jumps]

Datapath and control with jumps
Performance Issues
Longest delay determines clock period
 Critical path: load instruction
 Instruction memory → register file → ALU → data memory → register file
Not feasible to vary period for different instructions
Violates design principle: making the common case fast
We will improve performance by pipelining

Pipelining Analogy
Pipelined laundry: overlapping execution
 Parallelism improves performance
MIPS stages
Five stages, one step per stage:
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

MIPS datapath pipeline stages
Need registers between stages
 Hold information produced in the previous cycle
Note that the register file is written in the first half of the cycle and read in the second half.
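As a rough illustration of "one step per stage", this Python sketch prints a pipeline diagram for an ideal pipeline in which a new instruction enters every cycle and nothing stalls. The names `STAGES`, `pipeline_diagram`, and `print_diagram` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(instrs):
    """One row per instruction, one column per cycle: the stage the
    instruction occupies in that cycle, or "" when it is not in the pipeline.
    Assumes an ideal pipeline: one instruction enters per cycle, no stalls."""
    n_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i in range(len(instrs)):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows

def print_diagram(instrs):
    rows = pipeline_diagram(instrs)
    print(" " * 18 + "".join(f"{c:>5}" for c in range(1, len(rows[0]) + 1)))
    for text, row in zip(instrs, rows):
        print(f"{text:<18}" + "".join(f"{st:>5}" for st in row))

print_diagram(["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4",
               "lw $13, 24($1)", "add $14, $5, $6"])
```

In cycle 5 the column reads WB, MEM, EX, ID, IF from top to bottom: all five stages are busy, each with a different instruction, which is why registers are needed between the stages.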
Pipeline datapath abstraction

Multi-cycle datapath pipeline diagram
[Figures: one form showing optimal resource usage, and the traditional form]
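Tracing lw in its journey: 1st cycle
Tracing lw in its journey: 2nd cycle
Tracing lw in its journey: 3rd cycle
Tracing lw in its journey: 4th cycle
Tracing lw in its journey: 5th cycle
 Wrong register number: in the 5th cycle, the write-register number comes from the IF/ID register, which by then holds a later instruction.

Corrected pipeline datapath for lw
 The destination register number is passed along through the ID/EX, EX/MEM, and MEM/WB registers so that it reaches WB together with the loaded data.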
Pipeline state in 5th cycle
    lw  $10, 20($1)
    sub $11, $2, $3
    add $12, $3, $4
    lw  $13, 24($1)
    add $14, $5, $6

Pipeline Performance
Assume time for stages is
 100ps for register read or write
 200ps for other stages
Compare pipelined datapath with single-cycle datapath

Instr      Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw         200ps        100ps          200ps   200ps          100ps           800ps
sw         200ps        100ps          200ps   200ps                          700ps
R-format   200ps        100ps          200ps                  100ps           600ps
beq        200ps        100ps          200ps                                  500ps
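The totals in the table, and the two clock periods compared next, follow from two rules: a single-cycle clock must fit the slowest instruction, while a pipelined clock must fit the slowest stage. A short Python check under those rules (the dictionary and function names are illustrative):

```python
# Stage latencies (ps) from the table above.
STAGE_PS = {"IF": 200, "RegRead": 100, "ALU": 200, "Mem": 200, "RegWrite": 100}

# Stages each instruction class actually uses (blank table cells skipped).
USES = {
    "lw":       ["IF", "RegRead", "ALU", "Mem", "RegWrite"],
    "sw":       ["IF", "RegRead", "ALU", "Mem"],
    "R-format": ["IF", "RegRead", "ALU", "RegWrite"],
    "beq":      ["IF", "RegRead", "ALU"],
}

def total_time(instr):
    """Single-cycle path delay: sum of the stages the instruction uses."""
    return sum(STAGE_PS[s] for s in USES[instr])

totals = {instr: total_time(instr) for instr in USES}
single_cycle_tc = max(totals.values())  # clock must fit the slowest instruction
pipelined_tc = max(STAGE_PS.values())   # clock must fit the slowest stage

print(totals)                         # {'lw': 800, 'sw': 700, 'R-format': 600, 'beq': 500}
print(single_cycle_tc, pipelined_tc)  # 800 200
```

In the pipeline, the 100ps register read and write steps are stretched to the 200ps clock, so the stages are not perfectly balanced.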
Single-cycle vs. pipeline performance
Single-cycle (Tc = 800ps)
Pipelined (Tc = 200ps)

Pipeline Speedup
If all stages are balanced, i.e., all take the same time:
$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$$
If not balanced, speedup is less
Speedup is due to increased throughput
Latency (time for each instruction) does not decrease
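The speedup claims can be checked numerically. This sketch assumes the 800ps and 200ps clock periods from the previous slide, a five-stage pipeline, and no hazards; `single_cycle_time` and `pipelined_time` are hypothetical helper names.

```python
SINGLE_CYCLE_TC_PS = 800  # slowest instruction (lw)
PIPELINED_TC_PS = 200     # slowest stage
N_STAGES = 5

def single_cycle_time(n):
    """n instructions, one 800ps cycle each."""
    return n * SINGLE_CYCLE_TC_PS

def pipelined_time(n):
    """The first instruction needs N_STAGES cycles to fill the pipeline;
    after that one instruction completes per cycle (no hazards assumed)."""
    return (N_STAGES + n - 1) * PIPELINED_TC_PS

for n in (1, 3, 1_000_000):
    s, p = single_cycle_time(n), pipelined_time(n)
    print(f"n={n}: single-cycle {s}ps, pipelined {p}ps, speedup {s / p:.3f}")
```

For 3 instructions the times are 2400ps and 1400ps. As n grows the speedup approaches 800/200 = 4 rather than 5, because the stages are not balanced. A single instruction takes 1000ps in the pipeline versus 800ps in the single-cycle design, so latency does not decrease.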
Pipeline datapath summary
How many cycles does it take to execute this code?

Reminder of single-cycle control
ALU Control
Assume 2-bit ALUOp derived from opcode
Combinational logic derives ALU control

opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
R-type  10     subtract          100010  subtract          0110
R-type  10     AND               100100  AND               0000
R-type  10     OR                100101  OR                0001
R-type  10     set-on-less-than  101010  set-on-less-than  0111

Main decoder
Define additional ALU control encodings to expand its functionality

Instruction  Op[5:0]  RegWrite  RegDst  ALUSrc  Branch  MemRead  MemWrite  MemtoReg  ALUOp[1:0]
R-type       000000   1         1       0       0       0        0         0         10
lw           100011   1         0       1       0       1        0         1         00
sw           101011   0         X       1       0       0        1         X         00
beq          000100   0         X       0       1       0        0         X         01
addi         001000   1         0       1       0       0        0         0         00
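Both tables are pure combinational lookups, so they can be modeled as dictionaries. This Python sketch encodes the main decoder rows (with `None` standing for X) and the two-level ALU control. `MAIN_DECODER`, `alu_control`, and `decode` are hypothetical names. The bit-field extraction assumes the standard MIPS layout, with the opcode in bits 31:26 and funct in bits 5:0.

```python
# Main decoder rows from the table; None stands for X (don't care).
MAIN_DECODER = {
    0b000000: dict(RegWrite=1, RegDst=1, ALUSrc=0, Branch=0, MemRead=0,
                   MemWrite=0, MemtoReg=0, ALUOp=0b10),     # R-type
    0b100011: dict(RegWrite=1, RegDst=0, ALUSrc=1, Branch=0, MemRead=1,
                   MemWrite=0, MemtoReg=1, ALUOp=0b00),     # lw
    0b101011: dict(RegWrite=0, RegDst=None, ALUSrc=1, Branch=0, MemRead=0,
                   MemWrite=1, MemtoReg=None, ALUOp=0b00),  # sw
    0b000100: dict(RegWrite=0, RegDst=None, ALUSrc=0, Branch=1, MemRead=0,
                   MemWrite=0, MemtoReg=None, ALUOp=0b01),  # beq
    0b001000: dict(RegWrite=1, RegDst=0, ALUSrc=1, Branch=0, MemRead=0,
                   MemWrite=0, MemtoReg=0, ALUOp=0b00),     # addi
}

# ALU control for R-type instructions, selected by the funct field.
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}

def alu_control(alu_op, funct):
    """Second-level decode: ALUOp from the main decoder plus funct."""
    if alu_op == 0b00:
        return 0b0010                   # lw, sw, addi: add
    if alu_op == 0b01:
        return 0b0110                   # beq: subtract to compare
    return FUNCT_TO_ALU_CONTROL[funct]  # R-type: look at funct

def decode(word):
    """Decode a 32-bit instruction word (opcode in bits 31:26, funct in 5:0)."""
    opcode = (word >> 26) & 0x3F
    funct = word & 0x3F
    ctrl = dict(MAIN_DECODER[opcode])   # copy so the table is not mutated
    ctrl["ALUControl"] = alu_control(ctrl["ALUOp"], funct)
    return ctrl

print(decode(0x8D090000))  # lw $t1, 0($t0)
```

Splitting the decode this way keeps the main decoder small: it only produces the 2-bit ALUOp, and the funct field is examined only for R-type instructions.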
Control signals

Modifications to pipeline control
Control signals are derived from instructions
 Same as in single-cycle implementation
Control is carried over to the proper pipeline stage
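One way to picture "carried over to the proper pipeline stage" is the sketch below. It groups the decoder outputs by the stage that uses them (EX: RegDst, ALUSrc, ALUOp; MEM: Branch, MemRead, MemWrite; WB: RegWrite, MemtoReg). This grouping follows the usual Patterson & Hennessy treatment and is an assumption here. Each group is dropped once its stage has consumed it. `clock` and the signal tuples are hypothetical names.

```python
EX_SIGNALS = ("RegDst", "ALUSrc", "ALUOp")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("RegWrite", "MemtoReg")

def clock(regs, decoded):
    """Advance the control fields by one clock edge.
    regs maps "ID/EX", "EX/MEM", "MEM/WB" to the control dict stored there
    (None for an empty slot); decoded is the decoder output for the
    instruction currently in ID, or None."""
    id_ex, ex_mem = regs["ID/EX"], regs["EX/MEM"]
    return {
        # ID/EX gets every signal that a later stage will use
        "ID/EX": None if decoded is None else
                 {k: decoded[k] for k in EX_SIGNALS + MEM_SIGNALS + WB_SIGNALS},
        # EX signals were consumed in EX; pass the MEM and WB signals on
        "EX/MEM": None if id_ex is None else
                  {k: id_ex[k] for k in MEM_SIGNALS + WB_SIGNALS},
        # MEM signals were consumed in MEM; only the WB signals remain
        "MEM/WB": None if ex_mem is None else
                  {k: ex_mem[k] for k in WB_SIGNALS},
    }

# Control for lw, taken from the main decoder table.
LW = dict(RegWrite=1, RegDst=0, ALUSrc=1, Branch=0, MemRead=1,
          MemWrite=0, MemtoReg=1, ALUOp=0b00)

regs = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}
for decoded in (LW, None, None):  # lw decoded in ID, then two more clock edges
    regs = clock(regs, decoded)
    print(regs)
```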
Pipelined datapath + control
Example: Cycles 1 through 9
[Figures: state of the pipelined datapath and control in each of cycles 1 to 9]
Pipelining Hazards
Hazards are situations that prevent starting the next instruction in the next cycle
1. Structural hazards
 A required resource is busy
2. Data hazards
 Need to wait for previous instruction to complete its data read/write
3. Control hazards
 Deciding on control action depends on previous instruction

1. Structural Hazards
Conflict for use of a resource
What if the MIPS pipeline had a single memory for instruction and data?
 Load/store requires data access
 Instruction fetch would have to stall for that cycle
 Would cause a pipeline "bubble"
Hence, pipelined datapaths require separate instruction/data memories
 Or separate instruction/data caches
What about having only one adder in the MIPS pipeline?
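The cost of a single shared memory can be estimated with a small cycle-by-cycle model. This Python sketch makes three assumptions: the memory serves one request per cycle; a lw or sw in its MEM stage (three cycles after its fetch) has priority; and the blocked fetch simply waits. `cycles_with_unified_memory` is a hypothetical helper, and all other hazards are ignored.

```python
def cycles_with_unified_memory(program):
    """program: list of opcodes issued in order. Models only the structural
    hazard: one shared memory, one access per cycle. An instruction fetched in
    cycle c does its MEM access in cycle c + 3 (0-indexed); if that
    instruction is lw/sw, the fetch scheduled for that cycle waits instead."""
    busy = set()  # cycles in which a lw/sw is using the memory for data
    cycle = 0
    last_wb = -1  # cycle in which the most recently fetched instr is in WB
    for op in program:
        while cycle in busy:
            cycle += 1           # fetch stalls: a bubble enters the pipeline
        if op in ("lw", "sw"):
            busy.add(cycle + 3)  # its data access, three cycles after IF
        last_wb = cycle + 4
        cycle += 1
    return last_wb + 1           # total cycles (0-indexed, so add 1)

print(cycles_with_unified_memory(["add"] * 5))                         # 9
print(cycles_with_unified_memory(["lw", "add", "add", "add", "add"]))  # 10
```

Each load or store that collides with a fetch adds one bubble. Separate instruction and data memories (or caches) remove the collision entirely.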
2. Data Hazards: compute-use
[Figure: pipeline diagram over cycles 1 to 8, stages IM, RF, ALU, DM, RF]
    add $s0, $s2, $s3
    and $t0, $s0, $s1
    or  $t1, $s4, $s0
    sub $t2, $s0, $s5

2. Data Hazard: load-use
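Whether a dependence is actually a hazard depends on the distance between producer and consumer. The register file is written in the first half of a cycle and read in the second half. A consumer three or more instructions after its producer therefore reads the new value, but closer consumers do not, absent forwarding. This Python sketch flags such pairs. The (text, dest, sources) instruction format and `raw_hazards` are hypothetical, and $zero is treated as never carrying a dependence.

```python
def raw_hazards(program, min_distance=3):
    """program: list of (text, dest, sources) tuples; dest is None when the
    instruction writes no register. Returns (producer, consumer, register)
    for every read that happens before the producer's write-back would be
    visible, assuming no forwarding."""
    hazards = []
    for j, (text_j, _, sources) in enumerate(program):
        for reg in sources:
            # find the nearest earlier instruction that writes reg
            for i in range(j - 1, -1, -1):
                text_i, dest, _ = program[i]
                if dest == reg:
                    if reg != "$zero" and j - i < min_distance:
                        hazards.append((text_i, text_j, reg))
                    break
    return hazards

program = [
    ("add $s0, $s2, $s3", "$s0", ("$s2", "$s3")),
    ("and $t0, $s0, $s1", "$t0", ("$s0", "$s1")),
    ("or $t1, $s4, $s0",  "$t1", ("$s4", "$s0")),
    ("sub $t2, $s0, $s5", "$t2", ("$s0", "$s5")),
]
for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer!r} reads {reg} too early after {producer!r}")
```

For the sequence on the slide, `and` and `or` are flagged, while `sub` is far enough behind `add` to read $s0 from the register file.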
Handling data hazards
A. Compile-time techniques
B. Stall the processor at run time
C. Forward data at run time

A. Compile Time Technique: Code Scheduling
Reorder code to avoid use of load result in the next instruction
C code for A = B + E; C = B + F;
Compiler must be aware of pipeline structure

Original order (13 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    stall
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    stall
    add $t5, $t1, $t4
    sw  $t5, 16($t0)

Reordered (11 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
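The 13- and 11-cycle counts can be reproduced with a simple counting rule. This Python sketch assumes full forwarding (technique C), so the only remaining stall is one bubble when an instruction uses the result of the load immediately before it. The instruction tuples and `count_cycles` are hypothetical.

```python
def count_cycles(program, n_stages=5):
    """program: list of (op, dest, sources). Assumes full forwarding, so the
    only stall is one bubble when an instruction reads the destination of
    the lw immediately before it. No branches or other hazards are modeled."""
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, cur_sources) in zip(program, program[1:]):
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    # pipeline fill (n_stages - 1 cycles) + one cycle per instruction + bubbles
    return len(program) + (n_stages - 1) + stalls

original = [
    ("lw", "$t1", ("$t0",)), ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")), ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)), ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]
print(count_cycles(original), count_cycles(reordered))  # 13 11
```

Moving `lw $t4` up fills both load-delay slots with useful work. That is why the compiler needs to know the pipeline structure.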
A. Compile Time Technique: Insert NOPs
Insert enough NOPs until result is ready (wastes cycles)
Doesn't require HW to detect hazards
[Figure: pipeline diagram over cycles 1 to 10, stages IM, RF, ALU, DM, RF]
    add $s0, $s2, $s3
    nop
    nop
    and $t0, $s0, $s1
    or  $t1, $s4, $s0
    sub $t2, $s0, $s5

B. Run time technique: Stall pipeline
Detect dependency at run time and insert "bubbles"
Prevent new instruction from advancing in pipeline
[Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, with bubbles inserted]
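Compile-time NOP insertion can be automated with the same distance rule used for hazard detection: without forwarding, each consumer must sit at least three slots after its producer. The Python sketch below (hypothetical `insert_nops`, using a (text, dest, sources) format) reproduces the two NOPs shown for the add/and sequence.

```python
def insert_nops(program, min_distance=3):
    """program: list of (text, dest, sources). Returns the instruction texts
    with "nop" padding so every consumer is at least min_distance slots after
    the instruction that produces its operand (no forwarding assumed)."""
    out = []
    last_write = {}  # register -> slot of its most recent writer in out
    for text, dest, sources in program:
        needed = 0
        for reg in sources:
            if reg != "$zero" and reg in last_write:
                gap = len(out) - last_write[reg]
                needed = max(needed, min_distance - gap)
        out.extend(["nop"] * needed)
        if dest is not None:
            last_write[dest] = len(out)  # this instruction's slot
        out.append(text)
    return out

seq = [
    ("add $s0, $s2, $s3", "$s0", ("$s2", "$s3")),
    ("and $t0, $s0, $s1", "$t0", ("$s0", "$s1")),
    ("or $t1, $s4, $s0",  "$t1", ("$s4", "$s0")),
    ("sub $t2, $s0, $s5", "$t2", ("$s0", "$s5")),
]
print("\n".join(insert_nops(seq)))
```

The padded sequence needs 10 cycles instead of 8. These are the wasted cycles the slide mentions, and they are the same cost a run-time stall would pay.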
How do we stall the pipeline? Inserting a bubble
Do not update PC or IF/ID
 Instruction in ID stage is decoded again; instruction in IF stage is fetched again
Force control values in ID/EX register to 0
 Essentially passes on a NOP instruction to the EX stage
Inserting a 2-cycle stall allows results to be written to the register file before they are read in the ID stage.

C. Data forwarding during runtime
Don't wait for the result to be stored in a register
 Forward the results from wherever they happen to be
Requires extra connections in the datapath
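The bubble mechanism can be sketched as one clock edge for the front of the pipeline. In this Python model, which is a simplification and not a complete pipeline, asserting `stall` holds the PC and IF/ID and writes an all-zero control word into ID/EX. The `IDEX` fields, `BUBBLE`, and `clock_front_end` are hypothetical, and the hazard-detection logic that would drive `stall` is not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IDEX:
    """Hypothetical subset of the ID/EX register: instruction text plus a
    few control bits."""
    instr: str
    RegWrite: int = 0
    MemRead: int = 0
    MemWrite: int = 0
    Branch: int = 0

BUBBLE = IDEX("nop")  # every control value forced to 0

def clock_front_end(pc, if_id, decoded, fetched, stall):
    """One clock edge for the PC, IF/ID and ID/EX registers.
    decoded: what the ID stage produced this cycle (to be written to ID/EX);
    fetched: the instruction read at pc this cycle;
    stall: output of a hazard-detection unit (assumed, not modeled here).
    Returns the new (pc, if_id, id_ex)."""
    if stall:
        # PC and IF/ID keep their values, so the ID instruction is decoded
        # again and the IF instruction is fetched again; EX gets a NOP.
        return pc, if_id, BUBBLE
    return pc + 4, fetched, decoded

sub = IDEX("sub $t2, $s0, $t3", RegWrite=1)
print(clock_front_end(8, "sub $t2, $s0, $t3", sub, "and $t4, $s0, $t5", True))
```

A 2-cycle stall corresponds to asserting `stall` on two consecutive clock edges, after which the held instruction finally advances into EX.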
Dependencies and forwarding
Circuitry for forwarding
One more MUX for immediates

When should data be forwarded?
EX/MEM.RegWrite and/or MEM/WB.RegWrite are true, and
the destination register(s) are equal to the source registers of the next 1 to 2 instructions. That is,
    EX/MEM.RegisterRd == ID/EX.RegisterRs
    EX/MEM.RegisterRd == ID/EX.RegisterRt
    MEM/WB.RegisterRd == ID/EX.RegisterRs
    MEM/WB.RegisterRd == ID/EX.RegisterRt
The destination register in EX/MEM and/or MEM/WB is not $0:
    EX/MEM.RegisterRd ≠ 0
    MEM/WB.RegisterRd ≠ 0
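The forwarding conditions translate directly into a small selection function. In this Python sketch, each ALU operand comes from EX/MEM, MEM/WB, or the register file (`"REG"`). When both stages match the same source, EX/MEM is checked first, on the assumption (the usual rule) that the more recent result should win. `forward_select` and its string return values are hypothetical; hardware would produce mux select bits instead.

```python
def forward_select(rs, rt, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Return (ForwardA, ForwardB): the source of each ALU operand for the
    instruction in EX. rs/rt are ID/EX.RegisterRs/Rt; the other arguments are
    the RegWrite bit and RegisterRd field of EX/MEM and MEM/WB."""
    def pick(src):
        # EX/MEM checked first: it holds the more recent result (assumption)
        if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
            return "EX/MEM"
        if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
            return "MEM/WB"
        return "REG"  # no match: use the value read from the register file
    return pick(rs), pick(rt)

# add $s0, $s2, $s3 is in MEM (EX/MEM); and $t0, $s0, $s1 is in EX.
# MIPS numbering: $s0 = 16, $s1 = 17.
print(forward_select(16, 17, True, 16, False, 0))  # ('EX/MEM', 'REG')
```

The `!= 0` checks implement the $0 rule. An instruction that "writes" $0 must never have its result forwarded, because $0 always reads as zero.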