Processor Design 5 Steps

3/2/15 Single-Cycle CPU Datapath Control 1 Review: Processor Design 5 steps Step 1: Analyze instruc>on set to determine datapath requirements – Meaning of each instruc>on is given by register transfers – Datapath must include storage element for ISA registers – Datapath must support each register transfer Step 2: Select set of datapath components & establish clock methodology Step 3: Assemble datapath components that meet the requirements Step 4: Analyze implementaon of each instruc>on to determine seng of control points that realizes the register transfer Step 5: Assemble the control logic 2 1 3/2/15 Register-Register Timing: One Complete Cycle (Add/Sub) Clk PC Old Value New Value Rs, Rt, Rd, Instruc>on Memory Access Time Op, Func Old Value New Value Delay through Control Logic ALUctr Old Value New Value RegWr Old Value New Value Register File Access Time busA, B Old Value New Value ALU Delay busW Old Value New Value RegWr Rd Rs Rt ALUctr 5 5 5 busA 32 Register Write busW Rw Ra Rb Occurs Here ALU 32 RegFile busB 32 clk 3 Register-Register Timing: One Complete Cycle Clk PC Old Value New Value Rs, Rt, Rd, Instruc>on Memory Access Time Op, Func Old Value New Value Delay through Control Logic ALUctr Old Value New Value RegWr Old Value New Value Register File Access Time busA, B Old Value New Value ALU Delay busW Old Value New Value RegWr Rd Rs Rt ALUctr 5 5 5 busA 32 Register Write busW Rw Ra Rb Occurs Here ALU 32 RegFile busB 32 clk 4 2 3/2/15 Pung it All Together:A Single Cycle Datapath <21:25> <16:20> <11:15> Inst <0:15> Instruction<31:0> Memory Adr Rs Rt Rd Imm16 nPC_sel RegDst Rd Rt Equal ALUctr MemtoReg 1 0 MemWr Rs Rt Adder RegWr 4 5 5 5 32 Rw Ra Rb busA = 00 busW ALU Mux 32 PC 32 RegFile busB 0 Adder 0 32 PC Ext clk Extender 32 WrEn Adr clk imm16 1 Data In Data 1 16 32 Memory clk imm16 ExtOp ALUSrc Datapath Control Signals • ExtOp: “zero”, “sign” • MemWr: 1 ⇒ write memory • ALUsrc: 0 ⇒ regB; • MemtoReg: 0 ⇒ ALU; 1 ⇒ Mem 1 ⇒ immed • RegDst: 0 ⇒ “rt”; 1 ⇒ “rd” • ALUctr: “ADD”, “SUB”, “OR” • RegWr: 1 ⇒ write register RegDst Rd Rt ALUctr MemtoReg Inst Address 1 0 MemWr nPC_sel & Equal RegWr Rs Rt 4 5 5 5 Adder busA 32 busW Rw Ra Rb ALU 32 0 00 RegFile busB Mux 32 0 PC 0 32 Adder Extender clk 32 1 WrEn Adr PC Ext clk imm16 1 Data In Data 1 16 32 Memory clk ExtOp ALUSrc 6 imm16 3 3/2/15 Control Signals Instruc>on<31:0> <0:5> <21:25> <16:20> <11:15> <26:31> Inst <0:15> Memory Adr Op Fun Rt Rs Rd Imm16 Control nPC_sel RegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg DATA PATH 7 P&H Figure 4.17 8 4 3/2/15 Summary of the Control Signals (1/2) inst Register Transfer add R[rd] ← R[rs] + R[rt]; PC ← PC + 4 ALUsrc=RegB, ALUctr=“ADD”, RegDst=rd, RegWr, nPC_sel=“+4” sub R[rd] ← R[rs] – R[rt]; PC ← PC + 4 ALUsrc=RegB, ALUctr=“SUB”, RegDst=rd, RegWr, nPC_sel=“+4” ori R[rt] ← R[rs] + zero_ext(Imm16); PC ← PC + 4 ALUsrc=Im, Extop=“Z”, ALUctr=“OR”, RegDst=rt,RegWr, nPC_sel=“+4” lw R[rt] ← MEM[ R[rs] + sign_ext(Imm16)]; PC ← PC + 4 ALUsrc=Im, Extop=“sn”, ALUctr=“ADD”, MemtoReg, RegDst=rt, RegWr, nPC_sel = “+4” sw MEM[ R[rs] + sign_ext(Imm16)] ← R[rs]; PC ← PC + 4 ALUsrc=Im, Extop=“sn”, ALUctr = “ADD”, MemWr, nPC_sel = “+4” beq if (R[rs] == R[rt]) then PC ← PC + sign_ext(Imm16)] || 00 else PC ← PC + 4 nPC_sel = “br”, ALUctr = “SUB” 9 Summary of the Control Signals See func 10 0000 10 0010 We Don’t Care :-) op 00 0000 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010 add sub ori lw sw beq jump RegDst 1 1 0 0 x x x ALUSrc 0 0 1 1 1 0 x MemtoReg 0 0 0 1 x x x RegWrite 1 1 1 1 0 0 0 MemWrite 0 0 0 0 1 0 0 nPCsel 0 0 0 0 0 1 ? Jump 0 0 0 0 0 0 1 ExtOp x x 0 1 1 x x ALUctr<2:0> Add Subtract Or Add Add Subtract x 31 26 21 16 11 6 0 R-type op rs rt rd shamt funct add, sub I-type op rs rt immediate ori, lw, sw, beq J-type op target address jump 10 5 3/2/15 Boolean Expressions for Controller RegDst = add + sub ALUSrc = ori + lw + sw MemtoReg = lw RegWrite = add + sub + ori + lw MemWrite = sw nPCsel = beq Jump = jump ExtOp = lw + sw ALUctr[0] = sub + beq (assume ALUctr is 00 ADD, 01 SUB, 10 OR) ALUctr[1] = or Where: rtype = ~op5 • ~op4 • ~op3 • ~op2 • ~op1 • ~op0, ori = ~op5 • ~op4 • op3 • op2 • ~op1 • op0 How do we lw = op5 • ~op4 • ~op3 • ~op2 • op1 • op0 implement this in sw = op ~op op ~op op op 5 • 4 • 3 • 2 • 1 • 0 gates? beq = ~op5 • ~op4 • ~op3 • op2 • ~op1 • ~op0 jump = ~op5 • ~op4 • ~op3 • ~op2 • op1 • ~op0 add = rtype • func5 • ~func4 • ~func3 • ~func2 • ~func1 • ~func0 sub = rtype • func5 • ~func4 • ~func3 • ~func2 • func1 • ~func0 11 Controller Implementaon opcode func RegDst add ALUSrc sub MemtoReg ori RegWrite “AND” logic lw “OR” logic MemWrite nPCsel sw Jump beq ExtOp jump ALUctr[0] ALUctr[1] 12 6 3/2/15 Where Do Control Signals Come From? Instruc>on<31:0> <0:5> <21:25> <16:20> <11:15> <26:31> Inst <0:15> Memory Adr Op Fun Rt Rs Rd Imm16 Control nPC_sel RegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg DATA PATH 13 Boolean Exprs for Controller Instruc>on<31:0> <0:5> <21:25> <16:20> <11:15> <26:31> Inst Op 0-5 are<0:15> really Instruction bits 26-31 Memory Adr Func 0-5 are really Instruction bits 0-5 Op Fun Rt Rs Rd Imm16 rtype = ~op5 • ~op4 • ~op3 • ~op2 • ~op1 • ~op0, ori = ~op5 • ~op4 • op3 • op2 • ~op1 • op0 lw = op5 • ~op4 • ~op3 • ~op2 • op1 • op0 sw = op5 • ~op4 • op3 • ~op2 • op1 • op0 beq = ~op5 • ~op4 • ~op3 • op2 • ~op1 • ~op0 jump = ~op5 • ~op4 • ~op3 • ~op2 • op1 • ~op0 add = rtype • func5 • ~func4 • ~func3 • ~func2 • ~func1 • ~func0 sub = rtype • func5 • ~func4 • ~func3 • ~func2 • func1 • ~func0 How doFall we 2011 implement -- Lecture #30 this in gates? 14 7 3/2/15 Boolean Exprs for Controller RegDst = add + sub ALUSrc = ori + lw + sw MemtoReg = lw RegWrite = add + sub + ori + lw MemWrite = sw nPCsel = beq Jump = jump ExtOp = lw + sw ALUctr[0] = sub + beq ALUctr[1] = ori (assume ALUctr is 00 ADD, 01 SUB, 10 OR) How do we implement this in gates? 15 Controller Implementaon opcode func RegDst add ALUSrc sub MemtoReg ori RegWrite “AND” logic lw “OR” logic MemWrite nPCsel sw Jump beq ExtOp jump ALUctr[0] ALUctr[1] 16 8 3/2/15 Review: Single-cycle Processor • Five steps to design a processor: 1. Analyze instruc>on set à Processor Input datapath requirements Control 2. Select set of datapath Memory components & establish clock methodology Datapath Output 3. Assemble datapath meeng the requirements 4. Analyze implementaon of each instruc>on to determine seng of control points that effects the register transfer. 5. Assemble the control logic • Formulate Logic Equaons • Design Circuits 17 Single Cycle Performance • Assume >me for ac>ons are – 100ps for register read or write; 200ps for other events • Clock rate is? Instr Instr fetch Register ALU op Memory Register Total time read access write lw 200ps 100 ps 200ps 200ps 100 ps 800ps sw 200ps 100 ps 200ps 200ps 700ps R-format 200ps 100 ps 200ps 100 ps 600ps beq 200ps 100 ps 200ps 500ps • What can we do to improve clock rate? • Will this improve performance as well? Want increased clock rate to mean faster programs 18 9 3/2/15 Single Cycle Performance • Assume >me for ac>ons are – 100ps for register read or write; 200ps for other events • Clock rate is? Instr Instr fetch Register ALU op Memory Register Total time read access write lw 200ps 100 ps 200ps 200ps 100 ps 800ps sw 200ps 100 ps 200ps 200ps 700ps R-format 200ps 100 ps 200ps 100 ps 600ps beq 200ps 100 ps 200ps 500ps • What can we do to improve clock rate? • Will this improve performance as well? Want increased clock rate to mean faster programs 19 Goma Do Laundry • Ann, Brian, Cathy, Dave each have one load of clothes to A B C D wash, dry, fold, and put away – Washer takes 30 minutes – Dryer takes 30 minutes – “Folder” takes 30 minutes – “Stasher” takes 30 minutes to put clothes into drawers 10 3/2/15 Sequen>al Laundry 6 PM 7 8 9 10 11 12 1 2 AM T 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 Time a A s k B C O r D d e r • Sequen>al laundry takes 8 hours for 4 loads Pipelined Laundry 6 PM 7 8 9 10 11 12 1 2 AM Time T 30 30 30 30 30 30 30 a A s k B C O D r d e r • Pipelined laundry takes 3.5 hours for 4 loads! 11 3/2/15 Pipelining Lessons (1/2) 6 PM 7 8 9 • Pipelining doesn’t help latency Time of single task, it helps T throughput of en>re workload a 30 30 30 30 30 30 30 • Mul>ple tasks operang s A simultaneously using different k resources B • Poten>al speedup = Number O C r pipe stages d D • Time to “fill” pipeline and >me e to “drain” it reduces speedup: r 2.3X v.

Processor Design 5 Steps

Efficient Checker Processor Design

Three-Dimensional Integrated Circuit Design: EDA, Design And

Understanding Performance Numbers in Integrated Circuit Design Oprecomp Summer School 2019, Perugia Italy 5 September 2019

CSE 141L: Design Your Own Processor What You'll

Processor Architecture Design Using 3D Integration Technology

Processor Design：System-On-Chip Computing For

Design Software & Development Kit Selector Guide

Physical Design of a 3D-Stacked Heterogeneous Multi-Core Processor

Modern Processor Design: Fundamentals of Superscalar

Designing Digital Circuits a Modern Approach

Academic Definitions

Ultrafast Design Methodology Guide for the Vivado Design Suite