18-741 Advanced Computer Architecture Lecture 1: Intro And

18-447 Computer Architecture Lecture 6: Multi-cycle Microarchitectures Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/28/2013 Reminder: Homeworks Homework 1 Due today, midnight Turn in via AFS (hand-in directories) Homework 2 Will be assigned later today. Stay tuned… ISA concepts, ISA vs. microarchitecture, microcoded machines 2 Reminder: Lab Assignment 1 Due this Friday (Feb 1), at the end of Friday lab A functional C-level simulator for a subset of the MIPS ISA Study the MIPS ISA Tutorial TAs will continue to cover this in Lab Sessions this week 3 Lookahead: Lab Assignment 2 Lab Assignment 1.5 Verilog practice Not to be turned in Lab Assignment 2 Due Feb 15 Single-cycle MIPS implementation in Verilog All labs are individual assignments No collaboration; please respect the honor code 4 Lookahead: Extra Credit for Lab Assignment 2 Complete your normal (single-cycle) implementation first, and get it checked off in lab. Then, implement the MIPS core using a microcoded approach similar to what we will discuss in class. We are not specifying any particular details of the microcode format or the microarchitecture; you can be creative. For the extra credit, the microcoded implementation should execute the same programs that your ordinary implementation does, and you should demo it by the normal lab deadline. 5 Readings for Today P&P, Revised Appendix C Microarchitecture of the LC-3b Appendix A (LC-3b ISA) will be useful in following this P&H, Appendix D Mapping Control to Hardware Optional Maurice Wilkes, “The Best Way to Design an Automatic Calculating Machine,” Manchester Univ. Computer Inaugural Conf., 1951. 6 Lookahead: Readings for A Next Lecture Pipelining P&H Chapter 4.5-4.8 7 Review of Last Lecture: Single-Cycle Uarch What phases of the instruction processing cycle does the MIPS JAL instruction exercise? How many cycles does it take to process an instruction in the single-cycle microarchitecture? What determines the clock cycle time? What is the difference between datapath and control logic? What about combinational vs. sequential control? What is the semantics of a delayed branch? Why this is so will become clear when we cover pipelining 8 Review: Instruction Processing “Cycle” Instructions are processed under the direction of a “control unit” step by step. Instruction cycle: Sequence of steps to process an instruction Fundamentally, there are six phases: Fetch Decode Evaluate Address Fetch Operands Execute Store Result Not all instructions require all six stages (see P&P Ch. 4) 9 Review: Datapath vs. Control Logic Instructions transform Data (AS) to Data’ (AS’) This transformation is done by functional units Units that “operate” on data These units need to be told what to do to the data An instruction processing engine consists of two components Datapath: Consists of hardware elements that deal with and transform data signals functional units that operate on data hardware structures (e.g. wires and muxes) that enable the flow of data into the functional units and registers storage units that store data (e.g., registers) Control logic: Consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data 10 A Note: How to Make the Best Out of 447? Do the readings P&P Appendixes A and C Wilkes 1951 paper Today’s lecture will be easy to understand if you read these And, you can ask more in-depth questions and learn more Do the assignments early You can do things for extra credit if you finish early We will describe what to do for extra credit Study the material and buzzwords daily Lecture notes, videos Buzzwords take notes during class 11 Today’s Agenda Finish single-cycle microarchitectures Critical path Microarchitecture design principles Performance evaluation primer Multi-cycle microarchitectures Microprogrammed control 12 Review: The Full Single-Cycle Datapath PCSrc1=Jump Ins tru ctio n [2 5 – 0 ] S h ift Ju m p a d dres s [31 – 0] left 2 26 2 8 0 1 M M P C +4 [31 – 2 8 ] u u x x A LU A d d 1 0 re su lt A d d R e g D st S h ift PCSrc2=Br Taken Ju m p le ft 2 4 B ra n ch M e m R e a d Ins truc tio n [3 1 – 2 6] C o ntro l M e m to R eg A L U O p M e m W rite A L U S rc R e g W rite Ins truc tio n [2 5 – 2 1] R ea d R e a d P C re giste r 1 ad d re ss R e a d da ta 1 Ins truc tio n [2 0 – 1 6] R ea d re giste r 2 bcondZ e ro In struc tio n 0 R eg iste rs R e a d A L U A L U [3 1– 0] 0 R e a d M W rite da ta 2 re su lt A d d re ss 1 In s truc tio n re giste r M d ata u M u m e m ory x u Ins truc tio n [1 5 – 1 1] W rite x 1 D ata x da ta 1 m e m o ry 0 W rite d ata 1 6 3 2 Ins truc tio n [1 5 – 0 ] S ig n e xte n d A L U ALU operation co n tro l In stru ction [5– 0 ] **Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. 13 ALL RIGHTS RESERVED.] JAL, JR, JALR omitted Single-Cycle Control Logic 14 Single-Cycle Hardwired Control As combinational function of Inst=MEM[PC] 31 26 21 16 11 6 0 0 rs rt rd shamt funct R-type 6-bit 5-bit 5-bit 5-bit 5-bit 6-bit 31 26 21 16 0 opcode rs rt immediate I-type 6-bit 5-bit 5-bit 16-bit 31 26 0 opcode immediate J-type 6-bit 26-bit Consider All R-type and I-type ALU instructions LW and SW BEQ, BNE, BLEZ, BGTZ J, JR, JAL, JALR 15 Single-Bit Control Signals When De-asserted When asserted Equation GPR write select GPR write select opcode==0 RegDest according to rt, i.e., according to rd, i.e., inst[20:16] inst[15:11] 2nd ALU input from 2nd 2nd ALU input from sign- (opcode!=0) && ALUSrc GPR read port extended 16-bit (opcode!=BEQ) && immediate (opcode!=BNE) Steer ALU result to GPR steer memory load to opcode==LW MemtoReg write port GPR wr. port GPR write disabled GPR write enabled (opcode!=SW) && (opcode!=Bxx) && RegWrite (opcode!=J) && (opcode!=JR)) JAL and JALR require additional RegDest and MemtoReg options16 Single-Bit Control Signals When De-asserted When asserted Equation Memory read disabled Memory read port opcode==LW MemRead return load value Memory write disabled Memory write enabled opcode==SW MemWrite According to PCSrc2 next PC is based on 26- (opcode==J) || PCSrc1 bit immediate jump (opcode==JAL) target next PC = PC + 4 next PC is based on 16- (opcode==Bxx) && PCSrc2 bit immediate branch “bcond is satisfied” target JR and JALR require additional PCSrc options17 ALU Control case opcode ‘0’ select operation according to funct ‘ALUi’ selection operation according to opcode ‘LW’ select addition ‘SW’ select addition ‘Bxx’ select bcond generation function __ don’t care Example ALU operations ADD, SUB, AND, OR, XOR, NOR, etc. bcond on equal, not equal, LE zero, GT zero, etc. 18 R-Type ALU Ins tru ctio n [2 5 – 0 ] S h ift Ju m p a d dres s [31 – 0] left 2 PCSrc =Jump 26 2 8 0 1 1 M M P C +4 [31 – 2 8 ] u u x x A LU A d d 1 0 re su lt A d d R e g D st S h ift Ju m p le ft 2 4 B ra n ch PCSrc2=Br Taken M e m R e a d Ins truc tio n [3 1 – 2 6] C o ntro l M e m to R eg A L U O p M e m W rite A L U S rc R e g W rite Ins truc tio n [2 5 – 2 1] R ea d R e a d P C re giste r 1 ad d re ss 1 R e a d da ta 1 Ins truc tio n [2 0 – 1 6] R ea d re giste r 2 Z e ro In struc tio n 0 0 R eg iste rs R e a d A L U A L U [3 1– 0] 0 R e a d M W rite da ta 2 bcondre su lt A d d re ss 1 In s truc tio n re giste r M d ata u M u m e m ory x u Ins truc tio n [1 5 – 1 1] W rite x 1 D ata x da ta 1 m e m o ry 0 W rite d ata 1 6 3 2 Ins truc tio n [1 5 – 0 ] S ig n e xte n d A L U co n tro l ALU operation In stru ction [5– 0 ] funct 0 **Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] 19 I-Type ALU Ins tru ctio n [2 5 – 0 ] S h ift Ju m p a d dres s [31 – 0] left 2 PCSrc =Jump 26 2 8 0 1 1 M M P C +4 [31 – 2 8 ] u u x x A LU A d d 1 0 re su lt A d d R e g D st S h ift Ju m p le ft 2 4 B ra n ch PCSrc2=Br Taken M e m R e a d Ins truc tio n [3 1 – 2 6] C o ntro l M e m to R eg A L U O p M e m W rite A L U S rc R e g W rite Ins truc tio n [2 5 – 2 1] R ea d R e a d P C re giste r 1 ad d re ss 1 R e a d da ta 1 Ins truc tio n [2 0 – 1 6] R ea d re giste r 2 Z e ro In struc tio n 0 0 R eg iste rs R e a d A L U A L U [3 1– 0] 0 R e a d M W rite da ta 2 bcondre su lt A d d re ss 1 In s truc tio n re giste r M d ata u M u m e m ory x u Ins truc tio n [1 5 – 1 1] W rite x 1 D ata x da ta 1 m e m o ry 0 W rite d ata 1 6 3 2 Ins truc tio n [1 5 – 0 ] S ig n e xte n d A L U co n tro l ALU operation In stru ction [5– 0 ] opcode 0 **Based on original figure from [P&H CO&D, COPYRIGHT 2004 20 Elsevier.

18-741 Advanced Computer Architecture Lecture 1: Intro And

Fill Your Boots: Enhanced Embedded Bootloader Exploits Via Fault Injection and Binary Analysis

Computer Organization and Architecture Designing for Performance Ninth Edition

ARM Instruction Set

3.2 the CORDIC Algorithm

Consider an Instruction Cycle Consisting of Fetch, Operators Fetch (Immediate/Direct/Indirect), Execute and Interrupt Cycles

Akukwe Michael Lotachukwu 18/Sci01/014 Computer Science

Parallel Computing

Instruction Cycle

VLIW Architectures Lisa Wu, Krste Asanovic

Introduction to Cpu

Fpga-Based Digital Phase-Locked Loop Analysis and Implementation

The Datasheetarchive