18-447 Lecture 6: Microprogrammed Multi-Cycle Implementation

18‐447 Lecture 6: Microprogrammed Multi‐Cycle Implementation James C. Hoe Department of ECE Carnegie Mellon University 18‐447‐S21‐L06‐S1, James C. Hoe, CMU/ECE/CALCM, ©2021 Housekeeping •Your goal today – understand why VAX was possible and reasonable •Notices –Lab 1, Part A, due this week –Lab 1, Part B, due next week –HW1,due Monday 2/22 • Readings –P&H Appendix C –Start reading the rest of P&H Ch 4 18‐447‐S21‐L06‐S2, James C. Hoe, CMU/ECE/CALCM, ©2021 “Single‐Cycle” Datapath: Is it any good? PCSrc1=Jump Instruction [25– 0] Shift Jump address [31– 0] left 2 26 28 0 1 PC+4 [31– 28] M M u u x x ALU Add result 10 Add RegDst Shift PCSrc2=Br Taken Jump left 2 4 Branch MemRead Instruction [31– 26] Control MemtoReg ALUOp MemWrite ALUSrc RegWrite Instruction [25– 21] Read Read register 1 PC address Read data 1 Instruction [20– 16] Read register 2 bcondZero Instruction 0 Registers Read ALU [31– 0] 0 ALU M Write data 2 result Address Read 1 Instruction register M data u M memory x u Instruction [15– 11] Write x u 1 Data x data 1 memory 0 Write data 16 32 Instruction [15– 0] Sign extend ALU ALU operation control Neither fast nor cheap, Instruction [5– 0] and not even simplest 18‐447‐S21‐L06‐S3, James C. Hoe, CMU/ECE/CALCM, ©2021 [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Go Fast(er)!! 18‐447‐S21‐L06‐S4, James C. Hoe, CMU/ECE/CALCM, ©2021 Iron Law of Processor Performance • time/program = (inst/program) (cyc/inst) (time/cyc) 1/IPC 1/MIPS 1/GHz note workload dependence • Contributing factors –time/cyc: architecture and implementation – cyc/inst: architecture, implementation, instruction mix – inst/program: architecture, nature and quality of prgm • **Note**: cyc/inst is a workload average potentially large instantaneous variations due to instruction type and sequence 18‐447‐S21‐L06‐S5, James C. Hoe, CMU/ECE/CALCM, ©2021 Worst‐Case Critical Path PCSrc1=Jump Instruction [25– 0] Shift Jump address [31– 0] left 2 26 28 0 1 PC+4 [31– 28] M M u u x x ALU Add result 10 Add RegDst Shift PCSrc2=Br Taken Jump left 2 4 Branch MemRead Instruction [31– 26] Control MemtoReg ALUOp MemWrite ALUSrc RegWrite Instruction [25– 21] Read Read register 1 PC address Read data 1 Instruction [20– 16] Read register 2 bcondZero Instruction 0 Registers Read ALU [31– 0] 0 ALU M Write data 2 result Address Read 1 Instruction register M data u M memory x u Instruction [15– 11] Write x u 1 Data x data 1 memory 0 Write data 16 32 Instruction [15– 0] Sign extend ALU ALU operation control Instruction [5– 0] 18‐447‐S21‐L06‐S6, James C. Hoe, CMU/ECE/CALCM, ©2021 [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Single‐Cycle Datapath Analysis • Assume (numbers from P&H) –memory units (read or write): 200 ps –ALU and adders: 100 ps –register file (read or write): 50 ps –other combinational logic: 0 ps steps IF ID EX MEM WB Delay resources mem RF ALU mem RF R‐type 200 50 100 50 400 I‐type 200 50 100 50 400 LW 200 50 100 200 50 600 SW 200 50 100 200 550 Bxx 200 50 100 350 JALR 200 50 100 50 350 JAL 200 100 50 300 18‐447‐S21‐L06‐S7, James C. Hoe, CMU/ECE/CALCM, ©2021 Single‐Cycle Implementations • Good match for the sequential and atomic semantics of ISAs – instantiate programmer‐visible state one‐for‐one –map instructions to combinational next‐state logic •But, contrived and inefficient 1. all instructions run as slow as slowest instruction 2. must provide worst‐case combinational resource in parallel as required by any one instruction 3. what about CISC ISAs? polyf? Not the fastest, cheapest or even the simplest way 18‐447‐S21‐L06‐S8, James C. Hoe, CMU/ECE/CALCM, ©2021 Multi‐cycle Implementation: Ver 1.0 •Each instruction type take only as much time as needed –run a 50 psec clock –each instruction type take as many 50‐psec clock cycles as needed •Add “MasterEnable” signal so architectural state ignores clock edges until after enough time –an instruction’s effect is still purely combinational from state to state – all other control signal unaffected 18‐447‐S21‐L06‐S9, James C. Hoe, CMU/ECE/CALCM, ©2021 Multi‐Cycle Datapath: Ver 1.0 PCSrc1=Jump Instruction [25– 0] Shift Jump address [31– 0] left 2 26 28 0 1 PC+4 [31– 28] M M u u x x ALU Add result 10 Add RegDst Shift PCSrc =Br Taken Jump left 2 2 4 Branch MemRead Instruction [31– 26] Control MemtoReg ALUOp MemWrite ALUSrc RegWrite Instruction [25– 21] Read Read register 1 PC address Read data 1 Instruction [20– 16] Read register 2 bcondZero Instruction 0 Registers Read ALU [31– 0] 0 ALU M Write data 2 result Address Read 1 Instruction register M data u M memory x u Instruction [15– 11] Write x u 1 Data x data 1 memory 0 Write data 16 32 Instruction [15– 0] Sign extend ALU control Instruction [5– 0] MasterEn 18‐447‐S21‐L06‐S10, James C. Hoe, CMU/ECE/CALCM, ©2021 [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Sequential Control: Ver 1.0 IF1 IF2 IF3 IF4 ID JAL EX1 Bxx or JAL or JALR / EX2 MasterEn=1 WB LW or SW I‐type or R‐type LW MEM MEM MEM MEM SW / 4 3 2 1 MasterEn=1 18‐447‐S21‐L06‐S11, James C. Hoe, CMU/ECE/CALCM, ©2021 Performance Analysis •Iron Law: time/program = (inst/program) (cyc/inst) (time/cyc) •For same ISA, inst/program is the same; okay to compare MIPS = IPC ×fclk in MHz frequency in MHz million instructions instructions per cycle per second 18‐447‐S21‐L06‐S12, James C. Hoe, CMU/ECE/CALCM, ©2021 Performance Analysis •Single‐Cycle Implementation 1 × 1,667MHz = 1667 MIPS •Multi‐Cycle Implementation IPCavg × 20,000 MHz = 2178 MIPS what is IPCaverage? • Assume: 25% LW, 15% SW, 40% ALU, 13.3% Branch, 6.7% Jumps [Agerwala and Cocke, 1987] –weighted arithmetic mean of CPI 9.18 –weighted harmonic mean of IPC 0.109 –weighted arithmetic mean of IPC 0.115 18‐447‐S21‐L06‐S13, James C. Hoe, CMU/ECE/CALCM, ©2021 MIPS = IPC ×fclk Microsequencer: Ver 1.0 •ROM as a combinational logic lookup table ** ROM size grows as O(2n) as the literally Combinational control logic MasterEnDatapath controlnumber outputs of inputs holds the 2n rows truth table OutputsOutputsm ** ROM size grows as O(m) as the number of outputs addressn / inputInputs Next state Inputs from instruction State register register opcode field 18‐447‐S21‐L06‐S14, James C. Hoe, CMU/ECE/CALCM, ©2021 [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Microcoding: Ver 0 (note: this is only about counting clock ticks) state cntrl conditional targets label flow R/I‐type LW SW Bxx JALR JAL IF1 next ‐‐‐‐‐‐ IF2 next ‐‐‐‐‐‐ IF3 next ‐‐‐‐‐‐ IF4 goto ID ID ID ID ID EX1 ID next ‐‐‐‐‐‐ EX1 next ‐‐‐‐‐‐ EX2 goto WB MEM1 MEM1 IF1 IF1 IF1 MEM1 next ‐‐‐‐‐‐ MEM2 next ‐‐‐‐‐‐ MEM3 next ‐‐‐‐‐‐ MEM4 goto ‐ WB IF1 ‐‐‐ WB goto IF1 IF1 ‐‐‐‐ CPI 8 12 11 7 7 6 A systematic approach to FSM sequencing/control 18‐447‐S21‐L06‐S15, James C. Hoe, CMU/ECE/CALCM, ©2021 Microcontroller/Microsequencer •A stripped‐down “processor” for sequencing and control – control states are like PC – PC indexed into a program ROM to select an instruction Microcode – program state and storage Datapath Outputs control well‐formed control‐flow outputs support (branch, jump) Input 1 – fields in the instruction Sequencing Microprogram counter control maps to control signals Adder Address select logic •Very elaborate controllers Inputs from instruction have been built register opcode field [Based on original figure from P&H CO&D, COPYRIGHT 18‐447‐S21‐L06‐S16, James C. Hoe, CMU/ECE/CALCM, ©2021 2004 Elsevier. ALL RIGHTS RESERVED.] Go Cheap!! (And More Capable) 18‐447‐S21‐L06‐S17, James C. Hoe, CMU/ECE/CALCM, ©2021 Reducing Datapath by Resource Reuse How to reuse same adder for two additions in one instruction Add 4 ALU operatio Read 3 Read register 1 PC address Read data 1 Read register 2 Zero Instruction Registers ALU ALU Write result register Instruction Read memory Write data 2 data RegWrite ALUSrc 16 32 Sign isItype extend “Single‐cycle” reused same adder for different instructions 18‐447‐S21‐L06‐S18, James C. Hoe, CMU/ECE/CALCM, ©2021 [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Reducing Datapath by Sequential Reuse Add 4 ALU control Read 3 Read register 1 PC address Read Read data 1 register 2 Zero Instruction Registers ALU ALU IR Write result Instruction register Read memory Write data 2 data 4 RegWrite to IR or not to IR? 18‐447‐S21‐L06‐S19, James C. Hoe, CMU/ECE/CALCM, ©2021 [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Removing Redundancies rd rs1 rs2 immed •Latch Enables: PC, IR, MDR, A, B, ALUOut, RegWr, MemWr • Steering: ALUSrc1{RF,PC}, ALUSrc2{RF, immed}, MAddrSrc{PC, ALUOut}, RFDataSrc{ALUOut, MDR} Could also reduce down to a single register read‐write port! 18‐447‐S21‐L06‐S20, James C. Hoe, CMU/ECE/CALCM, ©2021 [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier.

Load more