361 Computer Architecture Lecture 12: Designing a Pipeline Processor

Home , Instruction unit, Pipeline (computing)

pipeline.1

Overview of a Multiple Cycle Implementation

° The root of the single cycle processor’s problems: • The cycle time has to be long enough for the slowest instruction

° Solution: • Break the instruction into smaller steps • Execute each step (instead of the entire instruction) in one cycle - Cycle time: time it takes to execute the longest step - Keep all the steps to have similar length • This is the essence of the multiple cycle processor

° The advantages of the multiple cycle processor: • Cycle time is much shorter • Different instructions take different number of cycles to complete - Load takes five cycles - Jump only takes three cycles • Allows a functional unit to be used more than once per instruction pipeline.2

1 Multiple Cycle Processor

° MCP: A functional unit to be used more than once per instruction

PCWr PCWrCond PCSrc BrWr Zero

IorD MemWr IRWr RegDst RegWr ALUSelA M 1 Target 32 u

32 x PC 0

M 0

32 I Zero

n Rs u s M 0 Ra x t r 32 RAdr 5 32 A u u Rt L x c 32 Rb busA 1 t 32 U Ideal i o 5 Reg File 32 M 1 Memory n Rt 0 4 0

R u Rw 32

WrAdr 32 x e 1 32 Din Dout g Rd busW busB 32 32 1 2 32 ALU 1 Mux 0 3 << 2 Control

Imm Extend 16 32 ALUOp ExtOp MemtoReg ALUSelB

pipeline.3

Outline of Today’s Lecture

° Recap and Introduction

° Introduction to the Concept of Pipelined Processor

° Pipelined Datapath and Pipelined Control

° How to Avoid Race Condition in a Pipeline Design?

° Pipeline Example: Instructions Interaction

° Summary

pipeline.4

2 Pipelining is Natural!

° Laundry Example

° Sammy, Marc, Griffy, Albert A B C D each have one load of clothes to wash, dry, and fold

° Washer takes 30 minutes

° Dryer takes 30 minutes

° “Folder” takes 30 minutes

° “Stasher” takes 30 minutes to put clothes into drawers

pipeline.5

Sequential Laundry

6 PM 7 8 9 10 11 12 1 2 AM

T 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 Time a A s k B C O D r d e r ° Sequential laundry takes 8 hours for 4 loads ° If they learned pipelining, how long would laundry take?

pipeline.6

3 Pipelined Laundry: Start work ASAP

6 PM 7 8 9 10 11 12 1 2 AM

Time T 30 30 30 30 30 30 30 a A s k B C O D r d e r ° Pipelined laundry takes 3.5 hours for 4 loads!

pipeline.7

Pipelining Lessons

° Pipelining doesn’t help latency 6 PM of single task, it helps 7 8 9 throughput of entire workload Time T ° Multiple tasks operating simultaneously using different a 30 30 30 30 30 30 30 resources s A ° Potential speedup = Number k pipe stages B ° Pipeline rate limited by slowest pipeline stage O C r ° Unbalanced lengths of pipe D stages reduces speedup d ° Time to “fill” pipeline and time to e “drain” it reduces speedup r ° Stall for Dependences

pipeline.8

4 Why Pipeline?

° Suppose we execute 100 instructions

° Single Cycle Machine • 45 ns/cycle x 1 CPI x 100 inst = 4500 ns

° Multicycle Machine • 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns

° Ideal pipelined machine • 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns

pipeline.9

Timing Diagram of a Load Instruction Instruction Fetch Instr Decode / Address Data Memory Reg Wr Reg. Fetch Clk Clk-to-Q PC Old Value New Value Instruction Memory Access Time Rs, Rt, Rd, Old Value New Value Op, Func Delay through Control Logic ALUctr Old Value New Value

ExtOp Old Value New Value

ALUSrc Old Value New Value R e

RegWr Old Value New Value g i s t

busA Old Value New Value F i l e

Delay through Extender & Mux W

busB Old Value New Value r i t ALU Delay e

Address Old Value New Value i m

Data Memory Access Time e busW Old Value New pipeline.10

5 The Five Stages of Load

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

Load Ifetch Reg/Dec Exec Mem Wr

° Ifetch: Instruction Fetch • Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: Calculate the memory address

° Mem: Read the data from the Data Memory

° Wr: Write the data back to the register file

pipeline.11

Pipelining the Load Instruction

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Clock

1st lw Ifetch Reg/Dec Exec Mem Wr

2nd lw Ifetch Reg/Dec Exec Mem Wr

3rd lw Ifetch Reg/Dec Exec Mem Wr

° The five independent functional units in the pipeline datapath are: • Instruction Memory for the Ifetch stage • Register File’s Read ports (bus A and busB) for the Reg/Dec stage • ALU for the Exec stage • Data Memory for the Mem stage • Register File’s Write port (bus W) for the Wr stage

° One instruction enters the pipeline every cycle • One instruction comes out of the pipeline (complete) every cycle • The “Effective” Cycles per Instruction (CPI) is 1 pipeline.12

6 Conventional Pipelined Execution Representation

Time

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WB Program Flow IFetch Dcd Exec Mem WB

pipeline.13

Single Cycle, Multiple Cycle, vs. Pipeline Cycle 1 Cycle 2 Clk Single Cycle Implementation: Load Store Waste

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk

Multiple Cycle Implementation: Load Store R-type Ifetch Reg Exec Mem Wr Ifetch Reg Exec Mem Ifetch

Pipeline Implementation: Load Ifetch Reg Exec Mem Wr

Store Ifetch Reg Exec Mem Wr

R-type Ifetch Reg Exec Mem Wr

pipeline.14

7 Why Pipeline? Because the resources are there!

Time (clock cycles) A

I L

Im Reg U Dm Reg n Inst 0

s A L t Inst 1 Im Reg U Dm Reg r. A L

O Inst 2 Im Reg U Dm Reg r A

d Inst 3 L

Im Reg U Dm Reg e r A

Inst 4 L Im Reg U Dm Reg

pipeline.15

Can pipelining get us into trouble?

° Yes: Pipeline Hazards • structural hazards: attempt to use the same resource two different ways at the same time - E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV) • data hazards: attempt to use item before it is ready - E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer - instruction depends on result of prior instruction still in the pipeline • control hazards: attempt to make a decision before condition is evaulated - E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in - branch instructions

° Can always resolve hazards by waiting • pipeline control must detect the hazard • take action (or delay action) to resolve hazards pipeline.16

8 Single Memory is a Structural Hazard

Time (clock cycles) A

I L Mem Reg U Mem Reg n Load s A L t Instr 1 Mem Reg U Mem Reg r. A L

Mem Reg U Mem Reg O Instr 2

r A L d Mem Reg U Mem Reg e Instr 3 A

r L Instr 4 Mem Reg U Mem Reg

Detection is easy in this case! (right half highlight means read, left half write) pipeline.17

Structural Hazards limit performance

° Example: if 1.3 memory accesses per instruction and only one memory access per cycle then • average CPI Ǝ 1.3 • otherwise resource is more than 100% utilized • More on Hazards later

pipeline.18

9 Pipelining the R-type and Load Instruction Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Clock

R-type Ifetch Reg/Dec Exec Wr Ops! We have a problem!

R-type Ifetch Reg/Dec Exec Wr

Load Ifetch Reg/Dec Exec Mem Wr

R-type Ifetch Reg/Dec Exec Wr

° We have a problem: • Two instructions try to write to the register file at the same time!

pipeline.19

The Four Stages of R-type

Cycle 1 Cycle 2 Cycle 3 Cycle 4

R-type Ifetch Reg/Dec Exec Wr

° Ifetch: Instruction Fetch • Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: ALU operates on the two register operands

° Wr: Write the ALU output back to the register file

pipeline.20

10 Important Observation

° Each functional unit can only be used once per instruction

° Each functional unit must be used at the same stage for all instructions: • Load uses Register File’s Write Port during its 5th stage 1 2 3 4 5 Load Ifetch Reg/Dec Exec Mem Wr

• R-type uses Register File’s Write Port during its 4th stage 1 2 3 4 R-type Ifetch Reg/Dec Exec Wr

pipeline.21

Solution 1: Insert “Bubble” into the Pipeline

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Clock Ifetch Reg/Dec Exec Wr Load Ifetch Reg/Dec Exec Mem Wr

R-type Ifetch Reg/Dec Exec Wr R-type Ifetch Reg/Dec Pipeline Exec Wr R-type Ifetch Bubble Reg/Dec Exec Wr Ifetch Reg/Dec Exec

° Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle • The control logic can be complex

° No instruction is completed during Cycle 5: • The “Effective” CPI for load is >1

pipeline.22

11 Solution 2: Delay R-type’s Write by One Cycle

° Delay R-type’s register write by one cycle: • Now R-type instructions also use Reg File’s write port at Stage 5 • Mem stage is a NOOP stage: nothing is being done 1 2 3 4 5 R-type Ifetch Reg/Dec Exec Mem Wr

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Clock

R-type Ifetch Reg/Dec Mem Exec Wr

Load Ifetch Reg/Dec Exec Mem Wr

R-type Ifetch Reg/Dec Mem Exec Wr

pipeline.23

The Four Stages of Store

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Store Ifetch Reg/Dec Exec Mem Wr

° Ifetch: Instruction Fetch • Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: Calculate the memory address

° Mem: Write the data into the Data Memory

pipeline.24

12 The Four Stages of Beq

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Beq Ifetch Reg/Dec Exec Mem Wr

° Ifetch: Instruction Fetch • Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: ALU compares the two register operands • Adder calculates the branch target address

° Mem: If the registers we compared in the Exec stage are the same, • Write the branch target address into the PC

pipeline.25

A Pipelined Datapath

Clk

Ifetch Reg/Dec Exec Mem Wr

RegWr ExtOp ALUOp Branch

1 0

P PC+4 P C PC+4 C + Imm16 4 M Imm16 E I I x e D F Rs / Zero Data busA m M / /

A I E

Ra / D e Mem W I

x busB m U

R R Exec r n Rb RA Do

R e e R M 1 i g t

Rt g

Unit e WA e i i u

RFile g s g s x t i t i s e Di e s t r t Rw r

Di e

Rt e r I 0 r 0 Rd 1

RegDst ALUSrc MemWr MemtoReg

pipeline.26

13 The Instruction Fetch Stage ° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

You are here!

Clk Ifetch Reg/Dec Exec Mem

RegWr ExtOp ALUOp Branch

1 0

P PC+4 P I

C PC+4 F C + /

Imm16 4 I = M E

D Imm16

I 1 x e : D 4

Rs /

busA Zero Data m M l / A w E

Ra / e W

Mem I

x busB $ m U

1 r R Exec n

, Rb RA Do

R M e 1 i 1

t Rt g e 0 Unit WA e u i RFile g g 0 s x i i t

s ( Di s e t $ t

Rw r

Di e

Rt e 2 r

0 r 0 I ) Rd 1

RegDst ALUSrc MemWr MemtoReg

pipeline.27

A Detail View of the Instruction Unit

° Location 10: lw $1, 0x100($2)

You are here! Clk

Ifetch Reg/Dec

1 0

“4” A d P I d F C e /

I r = D

1 10 : 4

l w

$ 1 ,

Address 1 0 0

Instruction ( $

Memory 2 Instruction )

pipeline.28

14 The Decode / Register Fetch Stage ° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100] You are here!

Clk Ifetch Reg/Dec Exec Mem

RegWr ExtOp ALUOp Branch

1 0 I P

PC+4 I F P D C PC+4 / C I + / E D Imm16 4

M Imm16 E x : x : e

Rs / Zero Data m R busA A M e

Ra / e W

g Mem I busB m U .

r 2 Exec n Rb RA Do

R M 1 i &

t Rt

Unit e WA e u

RFile g g 0 x i i x s Di s 1 t

Rw t

Di e

Rt e 0 r

0 r 0 I 0 Rd 1

RegDst ALUSrc MemWr MemtoReg

pipeline.29

Load’s Address Calculation Stage ° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

You are here!

Clk Ifetch Reg/Dec Exec Mem ALUOp=Add RegWr ExtOp=1 Branch

1 0 I E P

P PC+4 F x C PC+4 C / / I + M

D Imm16 4 Imm16 M e : I m e Rs D busA Zero Data m : A /

Ra / L

Mem W I

x busB o U

a R Exec r n Rb RA Do

d R e M 1 i ’ t

Rt g e Unit s WA i RFile u

g s A x t i

Di s e d t Rw Di r e Rt d

0 r 0 I r e

Rd s

1 s

RegDst=0 ALUSrc=1 MemWr MemtoReg

pipeline.30

15 A Detail View of the Execution Unit You are here!

Clk

Exec Mem E x

<< 2 / A Target M d e d m e r :

I 32

D PC+4 L o / E 32 a x busA d

’ R Zero s

e A 32 M g L i e s U m t ALUout

e busB o r 0 M 32 r 32 y E u

A x x d t e imm16 ALUctr d n

3 r d 1 e s e s 16 r 32 ALU Control

ExtOp=1 ALUSrc=1 3 ALUOp=Add

pipeline.31

Load’s Memory Access Stage ° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100] You are here!

Clk Ifetch Reg/Dec Exec Mem

RegWr ExtOp ALUOp Branch=0

1 0 I P

P PC+4 F M

C PC+4 C / I + e

D Imm16 4 m Imm16 E : I x / D W Rs busA / Zero Data M

A / E Ra r e Mem : I

x busB

m U L

R Exec n

Rb RA Do o R e M 1 a i t

Rt g e Unit WA d i u

RFile g ’ s s x i t s

Di e D t

Rw r Rt Di e a 0 r 0

I t a Rd 1

RegDst ALUSrc MemWr=0 MemtoReg

pipeline.32

16 Load’s Write Back Stage ° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

You are somewhere out there!

Clk Ifetch Reg/Dec Exec Mem Wr

RegWr=1 ExtOp ALUOp Branch

1 0 I P

P PC+4 F

C PC+4 C / I +

D Imm16 4 M Imm16 E : I x e D

Rs / busA Zero Data m M

A / E

Ra / e Mem W I

x busB m U

R Exec r n Rb RA Do

R R e M 1 i t

Rt g e

Unit WA e i u

RFile g g s x i t i s

Di s e t t

Rw r

Di e

Rt e r I 0 r 0 Rd 1

RegDst ALUSrc MemWr MemtoReg=1

pipeline.33

How About Control Signals? ° Key Observation: Control Signals at Stage N = Func (Instr. at Stage N) • N = Exec, Mem, or Wr

° Example: Controls Signals at Exec Stage = Func(Load’s Exec) Ifetch Reg/Dec Exec Mem Wr ALUOp=Add RegWr ExtOp=1 Branch

1 0 I E

P PC+4 F x C PC+4 / / I + P M

D Imm16 4 M C Imm16 e : I m e Rs D busA Zero Data m : A /

Ra / L

Mem W I

x busB o U

a R Exec r n Rb RA Do

d R e M 1 i ’ t

Rt g e Unit s WA i RFile u

g s A x t i

Di s e d t Rw Di r e Rt d

0 r 0 I r e

Rd s

1 s

RegDst=0 ALUSrc=1 MemWr MemtoReg

pipeline.34

17 Pipeline Control

° The Main Control generates the control signals during Reg/Dec • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later • Control signals for Mem (MemWr Branch) are used 2 cycles later • Control signals for Wr (MemtoReg MemWr) are used 3 cycles later

Reg/Dec Exec Mem Wr

ExtOp ExtOp

ALUSrc ALUSrc E M x I I e / D F ALUOp ALUOp M m / / I E e /

D Main W m

RegDst x RegDst

r Control R R

e R e e g MemWr g MemWr MemWr g e i i g s i s s t i t t e s

Branch e Branch Branch e t r r r e MemtoReg MemtoReg MemtoReg r MemtoReg RegWr RegWr RegWr RegWr

pipeline.35

Beginning of the Wr’s Stage: A Real World Problem

Clk Clk

RegAdr WrAdr

RegWr MemWr RegWr’s Clk-to-Q MemWr’s Clk-to-Q

RegAdr’s Clk-to-Q WrAdr’s Clk-to-Q M

RegWr E MemWr x e / m M / W e m r RegAdr Reg WrAdr Data Data File Data Memory

° At the beginning of the Wr stage, we have a problem if: • RegAdr’s (Rd or Rt) Clk-to-Q > RegWr’s Clk-to-Q

° Similarly, at the beginning of the Mem stage, we have a problem if: • WrAdr’s Clk-to-Q > MemWr’s Clk-to-Q

° We have a race condition between Address and Write Enable! pipeline.36

18 The Pipeline Problem

° Multiple Cycle design prevents race condition between Addr and WrEn: • Make sure Address is stable by the end of Cycle N • Asserts WrEn during Cycle N + 1

° This approach can NOT be used in the pipeline design because: • Must be able to write the register file every cycle • Must be able write the data memory every cycle

Clock

Store Ifetch Reg/Dec Exec Mem Wr

R-type Ifetch Reg/Dec Exec Mem Wr

pipeline.37

Synchronize Register File & Synchronize Memory

° Solution: And the Write Enable signal with the Clock • This is the ONLY place where gating the clock is used • MUST consult circuit expert to ensure no timing violation: - Example: Clock High Time > Write Access Delay

Clk Synchronize Memory and Register File Address, Data, and WrEn must be stable I_Addr at least 1 set-up time before the Clk edge I_WrEn Write occurs at the cycle following C_WrEn the clock edge that captures the signals

WrEn WrEn C_WrEn

I_WrEn Address Data Reg File Address I_Addr Reg File or Data I_Data or Clk Memory Memory

pipeline.38

19 A More Extensive Pipelining Example Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Clock

0: Load Ifetch Reg/Dec Exec Mem Wr

4: R-type Ifetch Reg/Dec Exec Mem Wr

8: Store Ifetch Reg/Dec Exec Mem Wr

12: Beq (target is 1000) Ifetch Reg/Dec Exec Mem Wr

End of End of End of End of Cycle 4 Cycle 5 Cycle 6 Cycle 7 ° End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch

° End of Cycle 5: Load’s Wr, R-type’s Mem, Store’s Exec, Beq’s Reg

° End of Cycle 6: R-type’s Wr, Store’s Mem, Beq’s Exec

° End of Cycle 7: Store’s Wr, Beq’s Mem

pipeline.39

Pipelining Example: End of Cycle 4 ° 0: Load’s Mem 4: R-type’s Exec 8: Store’s Reg 12: Beq’s Ifetch

8: Store’s Reg 4: R-type’s Exec 0: Load’s Mem

12: Beq’s Ifet ALUOp=R-type RegWr=0 ExtOp=x Branch=0

Clk 1 0 E I

P PC+4 I D M F x C P PC+4 / / + / C M e E I Imm16 4 m D

x Imm16 = e : : / m

W 1 Rs S B busA Zero Data 6 : t

A e r

Ra o R q Mem : I r busB

- U L e I t ’ n y

Exec o n Rb RA Do s s M p a 1 i

t Rt b d e

r Unit WA u u RFile ’ ’ u s s x s c

Di A R D t

i Rw Di

Rt e o o &

0 s 0 n I u u

t B

Rd l 1 t

RegDst=1 ALUSrc=0 Clk MemtoReg=x MemWr=0 pipeline.40

20 Pipelining Example: End of Cycle 5 ° 0: Lw’s Wr 4: R’s Mem 8: Store’s Exec 12: Beq’s Reg 16: R’s Ifetch

12: Beq’s Reg 8: Store’s Exec 4: R-type’s Mem 0: Load’s Wr 16: R’s Ifet ALUOp=Add RegWr=1 ExtOp=1 Branch=0

Clk 1 0 E M

P PC+4 I I x C P

F PC+4 D e / + C m / M /

I Imm16 4 E

Imm16 / = e W x : m

2 Rs

I Zero Data

busA r B 0 : n :

Ra S e s

Mem R I t q busB t r U o - ’ u t s Exec r n

Rb RA Do y

c e M

b 1 i p t ’

t Rt u i s

Unit WA e o RFile u

s ’ A n x A Di s

d @ R Rw

Di & Rt d e

0 r 0 1 s I e B u 6 Rd s l 1 s t

RegDst=x ALUSrc=1 Clk MemtoReg=1 MemWr=0 pipeline.41

Pipelining Example: End of Cycle 6 ° 4: R’s Wr 8: Store’s Mem 12: Beq’s Exec 16: R’s Reg 20: R’s Ifet

16: R-type’s Reg 12: Beq’s Exec 8: Store’s Mem 20: 4: R-type’s Wr R-type’s Ifet ALUOp=Sub RegWr=1 ExtOp=1 Branch=0

Clk 1 0 I M

P PC+4 D I E C P PC+4 F / e x + C E / m / I Imm16 4 M x

D Imm16 = / : W : e R

2 Rs m I busA Zero Data - 4 r n

A t : :

Ra y s

Mem I N t p busB B U r e o e u Exec n ’ RA Do q

Rb t c s M h

i 1 ’ t

t Rt s i i b Unit WA n o u RFile u R n g x s

e A f @

Rw s Rt Di o u

r &

0 0 l I 2

t S 0

Rd B 1 t

RegDst=x ALUSrc=0 Clk MemtoReg=0 MemWr=1 pipeline.42

21 Pipelining Example: End of Cycle 7 ° 8: Store’s Wr 12: Beq’s Mem 16: R’s Exec 20: R’s Reg 24: R’s Ifet

20: R-type’s Reg 16: R-type’s Exec 12: Beq’s Mem 24: 8: Store’s Wr R-type’s Ifet ALUOp=R-type RegWr=0 ExtOp=x Branch=1

Clk 1 0 M I E P

P PC+4 D I x C PC+4 F e C / / m + E / M

I Imm16 4 = x D Imm16 / e W

: 1 : m R

0 Rs

I Zero Data busA r - 0 : n : A t

0 N

Ra y s R Mem I t p busB o U r t e y t u Exec h n ’ RA Do Rb p c s M i i 1 e t

n t Rt i b Unit ’ WA o g RFile u s u n

x f R s Di

o A @

Rw e Rt Di r

0 0 B u

I 2 l e 4

Rd B q

1 s

RegDst=1 ALUSrc=0 Clk MemtoReg=x MemWr=0 pipeline.43

The Delay Branch Phenomenon Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Cycle 11 Clk

12: Beq Ifetch Reg/Dec Exec Mem Wr (target is 1000) 16: R-type Ifetch Reg/Dec Exec Mem Wr 20: R-type Ifetch Reg/Dec Exec Mem Wr

24: R-type Ifetch Reg/Dec Exec Mem Wr

1000: Target of Br Ifetch Reg/Dec Exec Mem Wr

° Although Beq is fetched during Cycle 4: • Target address is NOT written into the PC until the end of Cycle 7 • Branch’s target is NOT fetched until Cycle 8 • 3-instruction delay before the branch take effect

° This is referred to as Branch Hazard: • Clever design techniques can reduce the delay to ONE instruction

pipeline.44

22 The Delay Load Phenomenon

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Clock

I0: Load Ifetch Reg/Dec Exec Mem Wr

Plus 1 Ifetch Reg/Dec Exec Mem Wr

Plus 2 Ifetch Reg/Dec Exec Mem Wr

Plus 3 Ifetch Reg/Dec Exec Mem Wr

Plus 4 Ifetch Reg/Dec Exec Mem Wr

° Although Load is fetched during Cycle 1: • The data is NOT written into the Reg File until the end of Cycle 5 • We cannot read this value from the Reg File until Cycle 6 • 3-instruction delay before the load take effect

° This is referred to as Data Hazard: • Clever design techniques can reduce the delay to ONE instruction

pipeline.45

Summary

° Disadvantages of the Single Cycle Processor • Long cycle time • Cycle time is too long for all instructions except the Load

° Multiple Clock Cycle Processor: • Divide the instructions into smaller steps • Execute each step (instead of the entire instruction) in one cycle

° Pipeline Processor: • Natural enhancement of the multiple clock cycle processor • Each functional unit can only be used once per instruction • If a instruction is going to use a functional unit: - it must use it at the same stage as all other instructions • Pipeline Control: - Each stage’s control signal depends ONLY on the instruction that is currently in that stage

pipeline.46