3/2/15
Single-Cycle CPU Datapath Control
Review: Processor Design 5 steps
Step 1: Analyze instruction set to determine datapath requirements
– Meaning of each instruction is given by register transfers
– Datapath must include storage element for ISA registers
– Datapath must support each register transfer
Step 2: Select set of datapath components & establish clock methodology
Step 3: Assemble datapath components that meet the requirements
Step 4: Analyze implementation of each instruction to determine setting of control points that realizes the register transfer
Step 5: Assemble the control logic
Register-Register Timing: One Complete Cycle (Add/Sub)
[Timing diagram: within one Clk cycle, each signal changes from Old Value to New Value after the delay listed]
• PC: new value at the start of the cycle
• Rs, Rt, Rd, Op, Func: after Instruction Memory Access Time
• ALUctr, RegWr: after Delay through Control Logic
• busA, busB: after Register File Access Time
• busW: after ALU Delay
• Register Write Occurs Here: at the end of the cycle
[Figure: RegFile (ports Rw, Ra, Rb fed by Rd, Rs, Rt, 5 bits each; RegWr; clk) drives busA and busB (32 bits) into the ALU (ALUctr); the ALU result returns to the RegFile on busW (32 bits)]
Putting it All Together: A Single Cycle Datapath
[Figure: single-cycle datapath. Instruction Memory (Adr from PC) outputs Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Components: PC, two Adders (+4 and branch offset through PC Ext), next-PC Mux; RegFile (Rw, Ra, Rb, busA, busB, busW, clk); Extender (imm16, 16 to 32 bits); ALU with Equal output; Data Memory (WrEn, Adr, Data In, clk). Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]
Datapath Control Signals
• ExtOp: “zero”, “sign”
• ALUsrc: 0 ⇒ regB; 1 ⇒ immed
• ALUctr: “ADD”, “SUB”, “OR”
• MemWr: 1 ⇒ write memory
• MemtoReg: 0 ⇒ ALU; 1 ⇒ Mem
• RegDst: 0 ⇒ “rt”; 1 ⇒ “rd”
• RegWr: 1 ⇒ write register
[Figure: the single-cycle datapath annotated with these control signals; the next-PC mux is driven by nPC_sel & Equal]
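Control Signals
[Figure: Instruction Memory (Adr) outputs Instruction<31:0>, split into Op <26:31>, Fun <0:5>, Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Op and Fun feed the Control block, which drives the DATA PATH signals nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]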
P&H Figure 4.17
Summary of the Control Signals (1/2)
inst  Register Transfer
add   R[rd] ← R[rs] + R[rt]; PC ← PC + 4
      ALUsrc=RegB, ALUctr=“ADD”, RegDst=rd, RegWr, nPC_sel=“+4”
sub   R[rd] ← R[rs] - R[rt]; PC ← PC + 4
      ALUsrc=RegB, ALUctr=“SUB”, RegDst=rd, RegWr, nPC_sel=“+4”
ori   R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
      ALUsrc=Im, Extop=“Z”, ALUctr=“OR”, RegDst=rt, RegWr, nPC_sel=“+4”
lw    R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
      ALUsrc=Im, Extop=“sn”, ALUctr=“ADD”, MemtoReg, RegDst=rt, RegWr, nPC_sel=“+4”
sw    MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
      ALUsrc=Im, Extop=“sn”, ALUctr=“ADD”, MemWr, nPC_sel=“+4”
beq   if (R[rs] == R[rt]) then PC ← PC + 4 + (sign_ext(Imm16) || 00) else PC ← PC + 4
      nPC_sel=“br”, ALUctr=“SUB”
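The register transfers above can be read as a tiny interpreter. The following is a minimal sketch, not the lecture's own code: the tuple layout `(name, rs, rt, rd, imm16)`, the helper names, and the dict-based memory are all assumptions made for illustration. It uses standard MIPS semantics (sw stores R[rt]; the branch target is relative to PC + 4).

```python
MASK32 = 0xFFFFFFFF

def sign_ext16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def step(inst, regs, mem, pc):
    """Apply one instruction's register transfer and return the next PC.

    inst is a hypothetical tuple (name, rs, rt, rd, imm16); unused fields are 0.
    regs is a list of 32 ints; mem maps a byte address to a word.
    """
    name, rs, rt, rd, imm16 = inst
    next_pc = pc + 4                      # default: nPC_sel = "+4"
    if name == "add":
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif name == "sub":
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif name == "ori":
        regs[rt] = regs[rs] | (imm16 & 0xFFFF)          # zero extension
    elif name == "lw":
        regs[rt] = mem.get((regs[rs] + sign_ext16(imm16)) & MASK32, 0)
    elif name == "sw":
        mem[(regs[rs] + sign_ext16(imm16)) & MASK32] = regs[rt]
    elif name == "beq":
        if regs[rs] == regs[rt]:          # nPC_sel = "br" and Equal
            next_pc = (pc + 4 + (sign_ext16(imm16) << 2)) & MASK32
    else:
        raise ValueError(f"unsupported instruction: {name}")
    regs[0] = 0                           # register 0 always reads as zero
    return next_pc

def run(program):
    """Run a list of instruction tuples starting at PC = 0."""
    regs, mem, pc = [0] * 32, {}, 0
    while pc // 4 < len(program):
        pc = step(program[pc // 4], regs, mem, pc)
    return regs, mem, pc

PROGRAM = [
    ("ori", 0, 1, 0, 5),   # R[1] = 5
    ("ori", 0, 2, 0, 7),   # R[2] = 7
    ("add", 1, 2, 3, 0),   # R[3] = R[1] + R[2]
    ("sw",  0, 3, 0, 8),   # MEM[8] = R[3]
    ("lw",  0, 4, 0, 8),   # R[4] = MEM[8]
    ("beq", 3, 4, 0, 2),   # taken: PC = 20 + 4 + (2 << 2)
]

regs, mem, pc = run(PROGRAM)
print(regs[3], regs[4], mem[8], pc)
```

Each branch of `step` is one row of the table; the control signals listed under each row are what steer a real datapath down the same path.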
Summary of the Control Signals
              add       sub       ori       lw        sw        beq       jump
func          10 0000   10 0010   We Don’t Care :-)
op            00 0000   00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
RegDst        1         1         0         0         x         x         x
ALUSrc        0         0         1         1         1         0         x
MemtoReg      0         0         0         1         x         x         x
RegWrite      1         1         1         1         0         0         0
MemWrite      0         0         0         0         1         0         0
nPCsel        0         0         0         0         0         1         ?
Jump          0         0         0         0         0         0         1
ExtOp         x         x         0         1         1         x         x
ALUctr<2:0>   Add       Subtract  Or        Add       Add       Subtract  x

Instruction formats (field boundaries at bits 31, 26, 21, 16, 11, 6, 0):
R-type: op <31:26> | rs <25:21> | rt <20:16> | rd <15:11> | shamt <10:6> | funct <5:0>   (add, sub)
I-type: op <31:26> | rs <25:21> | rt <20:16> | immediate <15:0>   (ori, lw, sw, beq)
J-type: op <31:26> | target address <25:0>   (jump)
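The field boundaries in the three formats translate directly into shifts and masks. This is a small sketch; the function name `decode` and the dictionary layout are illustrative choices, not part of the lecture.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into every field of the three formats.

    All fields are extracted unconditionally; which ones are meaningful
    depends on the format selected by op.
    """
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31..26
        "rs":     (instr >> 21) & 0x1F,   # bits 25..21
        "rt":     (instr >> 16) & 0x1F,   # bits 20..16
        "rd":     (instr >> 11) & 0x1F,   # bits 15..11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10..6  (R-type)
        "funct":  instr & 0x3F,           # bits 5..0   (R-type)
        "imm16":  instr & 0xFFFF,         # bits 15..0  (I-type)
        "target": instr & 0x3FFFFFF,      # bits 25..0  (J-type)
    }

# add $3, $1, $2 encodes as 0x00221820; lw $4, 8($0) encodes as 0x8C040008.
add_fields = decode(0x00221820)
lw_fields = decode(0x8C040008)
print(add_fields["op"], add_fields["rs"], add_fields["rt"], add_fields["rd"], bin(add_fields["funct"]))
print(bin(lw_fields["op"]), lw_fields["rt"], lw_fields["imm16"])
```

In hardware no computation is involved: each field is just a fixed group of wires taken from the instruction bus.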
Boolean Expressions for Controller
RegDst = add + sub
ALUSrc = ori + lw + sw
MemtoReg = lw
RegWrite = add + sub + ori + lw
MemWrite = sw
nPCsel = beq
Jump = jump
ExtOp = lw + sw
ALUctr[0] = sub + beq (assume ALUctr is 00 ADD, 01 SUB, 10 OR)
ALUctr[1] = ori
Where:
rtype = ~op5 • ~op4 • ~op3 • ~op2 • ~op1 • ~op0
ori = ~op5 • ~op4 • op3 • op2 • ~op1 • op0
lw = op5 • ~op4 • ~op3 • ~op2 • op1 • op0
sw = op5 • ~op4 • op3 • ~op2 • op1 • op0
beq = ~op5 • ~op4 • ~op3 • op2 • ~op1 • ~op0
jump = ~op5 • ~op4 • ~op3 • ~op2 • op1 • ~op0
add = rtype • func5 • ~func4 • ~func3 • ~func2 • ~func1 • ~func0
sub = rtype • func5 • ~func4 • ~func3 • ~func2 • func1 • ~func0
How do we implement this in gates?
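One way to check the equations before drawing gates is to evaluate them in software. The sketch below mirrors the two levels, an AND level that recognizes each instruction and an OR level that forms each control signal. The function name `control` and the returned dictionary are assumptions for illustration; don't-care entries in the truth table come out as 0 here.

```python
def control(op, func):
    """Evaluate the controller equations for a 6-bit op and 6-bit func."""
    o = [(op >> i) & 1 for i in range(6)]     # o[i] is bit op<i>
    f = [(func >> i) & 1 for i in range(6)]   # f[i] is bit func<i>

    def inv(bit):
        return 1 - bit

    # "AND" level: one product term per instruction.
    rtype = inv(o[5]) & inv(o[4]) & inv(o[3]) & inv(o[2]) & inv(o[1]) & inv(o[0])
    ori   = inv(o[5]) & inv(o[4]) & o[3] & o[2] & inv(o[1]) & o[0]
    lw    = o[5] & inv(o[4]) & inv(o[3]) & inv(o[2]) & o[1] & o[0]
    sw    = o[5] & inv(o[4]) & o[3] & inv(o[2]) & o[1] & o[0]
    beq   = inv(o[5]) & inv(o[4]) & inv(o[3]) & o[2] & inv(o[1]) & inv(o[0])
    jump  = inv(o[5]) & inv(o[4]) & inv(o[3]) & inv(o[2]) & o[1] & inv(o[0])
    add   = rtype & f[5] & inv(f[4]) & inv(f[3]) & inv(f[2]) & inv(f[1]) & inv(f[0])
    sub   = rtype & f[5] & inv(f[4]) & inv(f[3]) & inv(f[2]) & f[1] & inv(f[0])

    # "OR" level: each control signal is a sum of instruction terms.
    return {
        "RegDst":   add | sub,
        "ALUSrc":   ori | lw | sw,
        "MemtoReg": lw,
        "RegWrite": add | sub | ori | lw,
        "MemWrite": sw,
        "nPCsel":   beq,
        "Jump":     jump,
        "ExtOp":    lw | sw,
        "ALUctr":   (ori << 1) | (sub | beq),   # 00 ADD, 01 SUB, 10 OR
    }

print(control(0b100011, 0))          # lw
print(control(0b000000, 0b100010))   # sub
```

Because every signal is a plain sum of products, the same structure maps onto one AND gate per instruction followed by one OR gate per signal.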
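Controller Implementation
[Figure: opcode and func feed the “AND” logic, which produces one line per instruction (add, sub, ori, lw, sw, beq, jump); these lines feed the “OR” logic, which produces RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, nPCsel, Jump, ExtOp, ALUctr[0], ALUctr[1].]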
Where Do Control Signals Come From?
[Figure: Instruction Memory (Adr) outputs Instruction<31:0>, split into Op <26:31>, Fun <0:5>, Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Op and Fun feed the Control block, which drives the DATA PATH signals nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]
Boolean Exprs for Controller
[Figure: Instruction Memory outputs Instruction<31:0>, split into Op <26:31>, Fun <0:5>, Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>]
Op 0-5 are really Instruction bits 26-31
Func 0-5 are really Instruction bits 0-5
rtype = ~op5 • ~op4 • ~op3 • ~op2 • ~op1 • ~op0
ori = ~op5 • ~op4 • op3 • op2 • ~op1 • op0
lw = op5 • ~op4 • ~op3 • ~op2 • op1 • op0
sw = op5 • ~op4 • op3 • ~op2 • op1 • op0
beq = ~op5 • ~op4 • ~op3 • op2 • ~op1 • ~op0
jump = ~op5 • ~op4 • ~op3 • ~op2 • op1 • ~op0
add = rtype • func5 • ~func4 • ~func3 • ~func2 • ~func1 • ~func0
sub = rtype • func5 • ~func4 • ~func3 • ~func2 • func1 • ~func0
How do we implement this in gates?
Fall 2011 -- Lecture #30
Review: Single-cycle Processor
• Five steps to design a processor:
1. Analyze instruction set → datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
5. Assemble the control logic
• Formulate Logic Equations
• Design Circuits
[Figure: Processor (Control, Datapath), Memory, Input, Output]
Single Cycle Performance
• Assume the times for actions are
– 100ps for register read or write; 200ps for other events
• Clock rate is?
Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
lw         200ps         100ps           200ps    200ps           100ps            800ps
sw         200ps         100ps           200ps    200ps                            700ps
R-format   200ps         100ps           200ps                    100ps            600ps
beq        200ps         100ps           200ps                                     500ps
• What can we do to improve clock rate?
• Will this improve performance as well? Want increased clock rate to mean faster programs
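The "Clock rate is?" question can be worked from the table: in a single-cycle design every instruction takes one clock period, so the period has to cover the slowest instruction. A short sketch of that arithmetic follows; the dictionary names are illustrative.

```python
# Delay of each action in picoseconds, as given on the slide.
ACTION_PS = {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100}

# Which actions each instruction class performs (rows of the table).
USES = {
    "lw":       ["fetch", "reg_read", "alu", "mem", "reg_write"],
    "sw":       ["fetch", "reg_read", "alu", "mem"],
    "R-format": ["fetch", "reg_read", "alu", "reg_write"],
    "beq":      ["fetch", "reg_read", "alu"],
}

totals_ps = {instr: sum(ACTION_PS[a] for a in actions) for instr, actions in USES.items()}

# Single-cycle: the clock period must fit the slowest instruction (lw).
clock_period_ps = max(totals_ps.values())
clock_rate_ghz = 1000 / clock_period_ps   # 1000 ps per ns, so 1000/period gives GHz

print(totals_ps)
print(clock_period_ps, "ps ->", clock_rate_ghz, "GHz")
```

The result is an 800ps period, that is 1.25 GHz, even though beq only needs 500ps of work.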
Gotta Do Laundry
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
– Washer takes 30 minutes
– Dryer takes 30 minutes
– “Folder” takes 30 minutes
– “Stasher” takes 30 minutes to put clothes into drawers
Sequential Laundry
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, each taking four consecutive 30-minute steps, one load after another]
• Sequential laundry takes 8 hours for 4 loads
Pipelined Laundry
[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, each starting 30 minutes after the previous one, seven 30-minute slots in total]
• Pipelined laundry takes 3.5 hours for 4 loads!
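The 8-hour and 3.5-hour figures follow from a simple count of 30-minute slots. A minimal sketch, with function names chosen here for illustration and all stages assumed to take the same time:

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Each load finishes all of its stages before the next load starts."""
    return loads * stages * stage_minutes

def pipelined_minutes(loads, stages, stage_minutes):
    """The first load takes `stages` slots; each later load adds one more slot."""
    return (stages + loads - 1) * stage_minutes

seq = sequential_minutes(loads=4, stages=4, stage_minutes=30)
pipe = pipelined_minutes(loads=4, stages=4, stage_minutes=30)
speedup = seq / pipe

print(seq / 60, "hours sequential")
print(pipe / 60, "hours pipelined")
print(round(speedup, 1), "x speedup")
```

With only four loads the fill and drain slots are a large share of the total, which is why the speedup is about 2.3 rather than 4.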
Pipelining Lessons (1/2)
• Pipelining doesn’t help latency of single task, it helps throughput of entire workload
• Multiple tasks operating simultaneously using different resources
• Potential speedup = Number pipe stages
• Time to “fill” pipeline and time to “drain” it reduces speedup: 2.3X v. 4X in this example
[Figure: pipelined laundry timeline from 6 PM; loads A, B, C, D in task order, seven 30-minute slots]
Pipelining Lessons (2/2)
• Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline?
• Pipeline rate limited by slowest pipeline stage
• Unbalanced lengths of pipe stages reduces speedup
[Figure: pipelined laundry timeline from 6 PM; loads A, B, C, D in task order, seven 30-minute slots]
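The question about the faster washer and stasher can be explored with a small simulation. This is a sketch under stated assumptions, not the lecture's own answer: every load is ready at time 0, each machine handles one load at a time, and a load may wait between machines for as long as needed. The function name `finish_times` is illustrative.

```python
def finish_times(stage_minutes, loads):
    """Return the time (in minutes) at which each load leaves the last stage."""
    stage_free = [0] * len(stage_minutes)   # when each machine is next available
    done = []
    for _ in range(loads):
        t = 0                               # time this load is ready for its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(t, stage_free[s])   # wait for the load and for the machine
            t = start + minutes
            stage_free[s] = t
        done.append(t)
    return done

balanced = finish_times([30, 30, 30, 30], loads=4)
unbalanced = finish_times([20, 30, 30, 20], loads=4)

print(balanced)     # completion time of loads A..D with the original machines
print(unbalanced)   # completion times with the 20-minute washer and stasher
```

Under these assumptions the last load finishes at 190 minutes instead of 210, but loads still complete 30 minutes apart: the 30-minute dryer and folder set the rate.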
Steps in Executing MIPS
1) IFtch: Instruction Fetch, Increment PC
2) Dcd: Instruction Decode, Read Registers
3) Exec: Mem-ref: Calculate Address
         Arith-log: Perform Operation
4) Mem: Load: Read Data from Memory
        Store: Write Data to Memory
5) WB: Write Data Back to Register
Single Cycle Datapath
[Figure: PC and +4 adder, instruction memory, registers (rs, rt, rd), imm, ALU, Data memory, divided into five phases: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]
Pipeline registers
[Figure: PC and +4 adder, instruction memory, registers (rs, rt, rd), imm, ALU, Data memory, divided into five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]
• Need registers between stages
– To hold information produced in previous cycle
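A rough way to picture what the registers between stages do: on each clock edge, whatever each stage produced is latched and handed to the next stage. The sketch below models only which instruction occupies each stage in each cycle. The stage names follow the figure, while `pipeline_trace` and the list-shifting model are simplifications assumed here (no hazards, no stalls).

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_trace(instructions):
    """Return one snapshot per clock cycle, mapping stage name to instruction."""
    pipe = [None] * len(STAGES)     # pipe[i] is the instruction in STAGES[i]
    pending = list(instructions)
    trace = []
    # Keep clocking while anything is waiting to enter or has not reached WB.
    while pending or any(slot is not None for slot in pipe[:-1]):
        incoming = pending.pop(0) if pending else None
        pipe = [incoming] + pipe[:-1]   # clock edge: every instruction moves one stage
        trace.append(dict(zip(STAGES, pipe)))
    return trace

trace = pipeline_trace(["lw", "sw", "add"])
for cycle, snapshot in enumerate(trace, start=1):
    print(cycle, snapshot)
```

Three instructions through five stages take 5 + 3 - 1 = 7 cycles, the same fill-and-drain count as in the laundry example.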
More Detailed Pipeline
IF for Load, Store, …
ID for Load, Store, …
EX for Load
MEM for Load
WB for Load – Oops!
Wrong register number
Corrected Datapath for Load
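The slides only label the problem ("Wrong register number") and show the corrected figure. The sketch below assumes the usual textbook explanation: if the write register number is taken from the instruction currently in decode, a load in WB writes into the destination of a later instruction, and the fix is to carry the destination number along with the load through the pipeline registers. The function name and the cycle numbering are assumptions made for illustration.

```python
def wb_write_registers(dest_regs, carry_dest):
    """Return (intended, written) register numbers for each instruction at WB.

    dest_regs[i] is the destination register of instruction i. Instruction i
    is assumed to be in ID during cycle i + 1 and in WB during cycle i + 4,
    so instruction i + 3 is in ID while instruction i is in WB.
    """
    results = []
    for i, intended in enumerate(dest_regs):
        if carry_dest:
            # Corrected datapath: the number travelled with the instruction.
            written = intended
        else:
            # Uncorrected datapath: the number comes from whatever is in ID now.
            j = i + 3
            written = dest_regs[j] if j < len(dest_regs) else None
        results.append((intended, written))
    return results

dests = [8, 9, 10, 11, 12]          # five back-to-back loads into $8..$12
print(wb_write_registers(dests, carry_dest=False))
print(wb_write_registers(dests, carry_dest=True))
```

In the uncorrected case the first load's data lands in register 11 instead of register 8; `None` marks cycles where no later instruction is in decode.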
So, in conclusion
• You now know how to implement the control logic for the single-cycle CPU.
– (actually, you already knew it!)
• Pipelining improves performance by increasing instruction throughput: exploits ILP
– Executes multiple instructions in parallel
– Each instruction has the same latency
• Next: hazards in pipelining:
– Structure, data, control