3/2/15

Single-Cycle CPU Control

1

Review: 5 steps Step 1: Analyze instrucon set to determine datapath requirements – Meaning of each instrucon is given by register transfers – Datapath must include storage element for ISA registers – Datapath must support each register transfer Step 2: Select set of datapath components & establish clock methodology Step 3: Assemble datapath components that meet the requirements Step 4: Analyze implementaon of each instrucon to determine seng of control points that realizes the register transfer Step 5: Assemble the control logic 2

1 3/2/15

Register-Register Timing: One Complete Cycle (Add/Sub) Clk

PC Old Value New Value Rs, Rt, Rd, Instrucon Memory Access Time Op, Func Old Value New Value Delay through Control Logic ALUctr Old Value New Value

RegWr Old Value New Value Access Time busA, B Old Value New Value ALU Delay busW Old Value New Value RegWr Rd Rs Rt ALUctr 5 5 5 busA 32 Register Write busW Rw Ra Rb Occurs Here ALU 32 RegFile busB 32 clk 3

Register-Register Timing: One Complete Cycle Clk

PC Old Value New Value Rs, Rt, Rd, Instrucon Memory Access Time Op, Func Old Value New Value Delay through Control Logic ALUctr Old Value New Value

RegWr Old Value New Value Register File Access Time busA, B Old Value New Value ALU Delay busW Old Value New Value RegWr Rd Rs Rt ALUctr 5 5 5 busA 32 Register Write busW Rw Ra Rb Occurs Here ALU 32 RegFile busB 32 clk 4

2 3/2/15

Pung it All Together:A Single Cycle Datapath <21:25> <16:20> <11:15> Inst <0:15> Instruction<31:0> Memory Adr Rs Rt Rd Imm16 nPC_sel RegDst Rd Rt Equal ALUctr MemtoReg 1 0 MemWr Rs Rt

Adder RegWr 4 5 5 5 32 Rw Ra Rb busA = 00

busW ALU

Mux 32 PC 32 RegFile busB 0

Adder 0 32 PC Ext clk Extender 32 WrEn Adr clk imm16 1 Data In Data 1 16 32 Memory clk imm16 ExtOp ALUSrc

Datapath Control Signals • ExtOp: “zero”, “sign” • MemWr: 1 ⇒ write memory • ALUsrc: 0 ⇒ regB; • MemtoReg: 0 ⇒ ALU; 1 ⇒ Mem 1 ⇒ immed • RegDst: 0 ⇒ “rt”; 1 ⇒ “rd” • ALUctr: “ADD”, “SUB”, “OR” • RegWr: 1 ⇒ write register

RegDst Rd Rt ALUctr MemtoReg Inst Address 1 0 MemWr

nPC_sel & Equal RegWr Rs Rt 4 5 5 5 Adder busA 32 busW Rw Ra Rb ALU 32 0 00 RegFile busB Mux 32 0 PC 0 32 Adder Extender clk 32 1 WrEn Adr PC Ext clk imm16 1 Data In Data 1 16 32 Memory clk ExtOp ALUSrc 6 imm16

3 3/2/15

Control Signals

Instrucon<31:0> <0:5> <21:25> <16:20> <11:15> <26:31> Inst <0:15> Memory Adr Op Fun Rt Rs Rd Imm16

Control nPC_sel RegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg

DATA PATH

7

P&H Figure 4.17

8

4 3/2/15

Summary of the Control Signals (1/2) inst Register Transfer add R[rd] ← R[rs] + R[rt]; PC ← PC + 4 ALUsrc=RegB, ALUctr=“ADD”, RegDst=rd, RegWr, nPC_sel=“+4” sub R[rd] ← R[rs] – R[rt]; PC ← PC + 4 ALUsrc=RegB, ALUctr=“SUB”, RegDst=rd, RegWr, nPC_sel=“+4” ori R[rt] ← R[rs] + zero_ext(Imm16); PC ← PC + 4 ALUsrc=Im, Extop=“Z”, ALUctr=“OR”, RegDst=rt,RegWr, nPC_sel=“+4” lw R[rt] ← MEM[ R[rs] + sign_ext(Imm16)]; PC ← PC + 4 ALUsrc=Im, Extop=“sn”, ALUctr=“ADD”, MemtoReg, RegDst=rt, RegWr, nPC_sel = “+4” sw MEM[ R[rs] + sign_ext(Imm16)] ← R[rs]; PC ← PC + 4 ALUsrc=Im, Extop=“sn”, ALUctr = “ADD”, MemWr, nPC_sel = “+4” beq if (R[rs] == R[rt]) then PC ← PC + sign_ext(Imm16)] || 00 else PC ← PC + 4 nPC_sel = “br”, ALUctr = “SUB”

9

Summary of the Control Signals

See func 10 0000 10 0010 We Don’t Care :-) op 00 0000 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010 add sub ori lw sw beq jump RegDst 1 1 0 0 x x x ALUSrc 0 0 1 1 1 0 x MemtoReg 0 0 0 1 x x x RegWrite 1 1 1 1 0 0 0 MemWrite 0 0 0 0 1 0 0 nPCsel 0 0 0 0 0 1 ? Jump 0 0 0 0 0 0 1 ExtOp x x 0 1 1 x x ALUctr<2:0> Add Subtract Or Add Add Subtract x

31 26 21 16 11 6 0 R-type op rs rt rd shamt funct add, sub

I-type op rs rt immediate ori, lw, sw, beq

J-type op target address jump 10

5 3/2/15

Boolean Expressions for Controller

RegDst = add + sub ALUSrc = ori + lw + sw MemtoReg = lw RegWrite = add + sub + ori + lw MemWrite = sw nPCsel = beq Jump = jump ExtOp = lw + sw ALUctr[0] = sub + beq (assume ALUctr is 00 ADD, 01 SUB, 10 OR) ALUctr[1] = or Where: rtype = ~op5 • ~op4 • ~op3 • ~op2 • ~op1 • ~op0, ori = ~op5 • ~op4 • op3 • op2 • ~op1 • op0 How do we lw = op5 • ~op4 • ~op3 • ~op2 • op1 • op0 implement this in sw = op ~op op ~op op op 5 • 4 • 3 • 2 • 1 • 0 gates? beq = ~op5 • ~op4 • ~op3 • op2 • ~op1 • ~op0 jump = ~op5 • ~op4 • ~op3 • ~op2 • op1 • ~op0 add = rtype • func5 • ~func4 • ~func3 • ~func2 • ~func1 • ~func0 sub = rtype • func5 • ~func4 • ~func3 • ~func2 • func1 • ~func0 11

Controller Implementaon

opcode func

RegDst add ALUSrc sub MemtoReg ori RegWrite “AND” logic lw “OR” logic MemWrite nPCsel sw Jump beq ExtOp jump ALUctr[0] ALUctr[1]

12

6 3/2/15

Where Do Control Signals Come From? Instrucon<31:0> <0:5> <21:25> <16:20> <11:15> <26:31> Inst <0:15> Memory Adr Op Fun Rt Rs Rd Imm16

Control

nPC_sel RegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg

DATA PATH

13

Boolean Exprs for Controller

Instrucon<31:0> <0:5> <21:25> <16:20> <11:15> <26:31> Inst Op 0-5 are<0:15> really Instruction bits 26-31 Memory Adr Func 0-5 are really Instruction bits 0-5 Op Fun Rt Rs Rd Imm16 rtype = ~op5 • ~op4 • ~op3 • ~op2 • ~op1 • ~op0, ori = ~op5 • ~op4 • op3 • op2 • ~op1 • op0 lw = op5 • ~op4 • ~op3 • ~op2 • op1 • op0 sw = op5 • ~op4 • op3 • ~op2 • op1 • op0 beq = ~op5 • ~op4 • ~op3 • op2 • ~op1 • ~op0 jump = ~op5 • ~op4 • ~op3 • ~op2 • op1 • ~op0 add = rtype • func5 • ~func4 • ~func3 • ~func2 • ~func1 • ~func0 sub = rtype • func5 • ~func4 • ~func3 • ~func2 • func1 • ~func0

How doFall we 2011 implement -- Lecture #30 this in gates? 14

7 3/2/15

Boolean Exprs for Controller

RegDst = add + sub ALUSrc = ori + lw + sw MemtoReg = lw RegWrite = add + sub + ori + lw MemWrite = sw nPCsel = beq Jump = jump ExtOp = lw + sw ALUctr[0] = sub + beq ALUctr[1] = ori (assume ALUctr is 00 ADD, 01 SUB, 10 OR) How do we implement this in gates?

15

Controller Implementaon

opcode func

RegDst add ALUSrc sub MemtoReg ori RegWrite “AND” logic lw “OR” logic MemWrite nPCsel sw Jump beq ExtOp jump ALUctr[0] ALUctr[1]

16

8 3/2/15

Review: Single-cycle Processor

• Five steps to design a processor: 1. Analyze instrucon set à Processor Input datapath requirements Control 2. Select set of datapath Memory components & establish clock methodology Datapath Output 3. Assemble datapath meeng the requirements 4. Analyze implementaon of each instrucon to determine seng of control points that effects the register transfer. 5. Assemble the control logic • Formulate Logic Equaons • Design Circuits 17

Single Cycle Performance • Assume me for acons are – 100ps for register read or write; 200ps for other events • is? Instr Instr fetch Register ALU op Memory Register Total time read access write lw 200ps 100 ps 200ps 200ps 100 ps 800ps sw 200ps 100 ps 200ps 200ps 700ps R-format 200ps 100 ps 200ps 100 ps 600ps beq 200ps 100 ps 200ps 500ps

• What can we do to improve clock rate? • Will this improve performance as well? Want increased clock rate to mean faster programs

18

9 3/2/15

Single Cycle Performance • Assume me for acons are – 100ps for register read or write; 200ps for other events • Clock rate is? Instr Instr fetch Register ALU op Memory Register Total time read access write lw 200ps 100 ps 200ps 200ps 100 ps 800ps sw 200ps 100 ps 200ps 200ps 700ps R-format 200ps 100 ps 200ps 100 ps 600ps beq 200ps 100 ps 200ps 500ps

• What can we do to improve clock rate? • Will this improve performance as well? Want increased clock rate to mean faster programs

19

Goa Do Laundry • Ann, Brian, Cathy, Dave each have one load of clothes to A B C D wash, dry, fold, and put away – Washer takes 30 minutes

– Dryer takes 30 minutes

– “Folder” takes 30 minutes

– “Stasher” takes 30 minutes to put clothes into drawers

10 3/2/15

Sequenal Laundry 6 PM 7 8 9 10 11 12 1 2 AM

T 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 Time a A s k B C O r D d e r • Sequenal laundry takes 8 hours for 4 loads

Pipelined Laundry 6 PM 7 8 9 10 11 12 1 2 AM

Time T 30 30 30 30 30 30 30 a A s k B C O D r d e r • Pipelined laundry takes 3.5 hours for 4 loads!

11 3/2/15

Pipelining Lessons (1/2) 6 PM 7 8 9 • Pipelining doesn’t help latency Time of single task, it helps T throughput of enre workload a 30 30 30 30 30 30 30 • Mulple tasks operang s A simultaneously using different k resources B • Potenal speedup = Number O C r pipe stages d D • Time to “fill” and me e to “drain” it reduces speedup: r 2.3X v. 4X in this example

Pipelining Lessons (2/2) 6 PM 7 8 9 • Suppose new Washer T Time takes 20 minutes, new a 30 30 30 30 30 30 30 Stasher takes 20 s A minutes. How much k faster is pipeline? B • Pipeline rate limited by O C r slowest pipeline stage d D • Unbalanced lengths of e pipe stages reduces r speedup

12 3/2/15

Steps in Execung MIPS 1) IFtch: Instrucon Fetch, Increment PC 2) Dcd: Instrucon Decode, Read Registers 3) Exec: Mem-ref: Calculate Address Arith-log: Perform Operaon 4) Mem: Load: Read Data from Memory Store: Write Data to Memory 5) WB: Write Data Back to Register

Single Cycle Datapath

rd

PC rs ALU rt registers Data memory instruction memory +4 imm

1. Instruction 5. Write 2. Decode/ 3. Execute 4. Memory Fetch Register Read Back

13 3/2/15

Pipeline registers

rd

PC rs ALU rt registers Data memory instruction memory +4 imm

1. Instruction 5. Write 2. Decode/ 3. Execute 4. Memory Fetch Register Read Back • Need registers between stages – To hold informaon produced in previous cycle

More Detailed Pipeline

14 3/2/15

IF for Load, Store, …

ID for Load, Store, …

15 3/2/15

EX for Load

MEM for Load

16 3/2/15

WB for Load – Oops!

Wrong register number

Corrected Datapath for Load

17 3/2/15

So, in conclusion • You now know how to implement the control logic for the single-cycle CPU. – (actually, you already knew it!) • Pipelining improves performance by increasing instrucon throughput: exploits ILP – Executes mulple instrucons in parallel – Each instrucon has the same latency • Next: hazards in pipelining: – Structure, data, control

35

18