
Outline
• Combinational & sequential logic
• Single-cycle CPU
• Multi-cycle CPU

Combinational Element
• Output determined entirely by input
• Contains no storage element
[Figure: combinational logic block with an n-bit input and an n-bit output]

Examples of Combinational Elements
• Multiplexor selects one out of 2^n inputs
• ALU performs arithmetic & logic operations
  – AND: 000
  – OR: 001
  – add: 010
  – subtract: 110
  – set on less than: 111
  – other 3 combinations unused
[Figure: multiplexor (MUX); ALU with two 64-bit inputs, a 3-bit control input, a 64-bit result and a zero output]

State (Sequential) Element
• State element has storage (i.e., memory)
• State defined by storage content
• Output depends on input and the storage state
• Write lead controls storage update
• Clock lead determines time of update
• Examples: main memory, registers, PC
[Figure: state element with input, output, write lead and clock lead]

Clocking Methodology
• Needed to prevent simultaneous read/write to state elements
• Edge-triggered methodology: state elements updated at rising clock edge
[Figure: state element 1 -> combinational logic -> state element 2, with clock input]

Input/Output of Elements
• Combinational elements take input from one state element at a clock edge and output to another state element at the next clock edge.
• Within a clock cycle, state elements are not updated and their stable state is available as input to combinational elements.
• Output can be derived from a state element at the edge of one cycle and input into the same state element at the next.
[Figure: state element 1 -> combinational logic -> state element 2; state element -> combinational logic -> same state element]

Register File
• Register file is the structure that contains the processor's 32 registers
• Any register can be read or written by specifying the register number
• Register file's I/O structure
  – 3 inputs derived from the current instruction to specify register operands (2 for read and 1 for write)
  – 1 input to write data into a register
  – 2 outputs carrying contents of the specified registers
• Register file's outputs are always available on the output lines
• Register write is controlled by the RegWrite lead
[Figure: register file with inputs Read reg 1, Read reg 2, Write reg (5-bit register numbers) and Write data (64 bits), control input RegWrite, and outputs Read data 1, Read data 2 (64 bits each)]

MIPS64 Instruction Formats
• I-Type: opcode (6) | rs (5) | rt (5) | immediate (16)
• R-Type: opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | func (6)
• J-Type: opcode (6) | offset added to PC (26)
• Note the regularity of instruction encoding. This is important for implementing an efficient pipelined CPU.

Common Steps in Instruction Execution
• Execution of every instruction requires the following steps
  – send PC to memory and fetch the instruction stored at the location specified by PC
  – read 0-2 registers, using fields specifying the registers in the instruction
• All instructions use ALU functionality
  – data transfer instructions: compute address
  – ALU instructions: execute ALU operations
  – branch instructions: comparison & address computation

Differences in Instruction Execution
• Data transfer (strictly load/store ISA)
  – load: access memory for read data {ld R1, 0(R2)}
  – store: access memory for write data {sd R1, 0(R2)}
• ALU instruction
  – no memory access for operands
  – access a register for write of result {add R1, R2, R3}
• Branch instruction
  – change PC content based on comparison {bnez R1, Loop}

Summary
Instruction            Fetch  Decode  Read       Compute  Access  Write
                                      Registers           Memory  Registers
add/sub                  X      X        X          X                X
load                     X      X        X          X       X        X
store                    X      X        X          X       X
conditional branch       X      X        X          X
unconditional branch     X      X                   X

Data Path & Control path
• Datapath is the signal path through which data in the CPU flows, including the functional elements
• Elements of Datapath
  – combinational elements
  – state (sequential) elements
• Control path
  – the signal path from the controller to the Datapath elements
  – exercises timing & control over Datapath elements

What Should be in the Datapath
• At a minimum we need combinational and sequential logic elements in the datapath to support the following functions
  – fetch instructions and data from memory
  – read registers
  – decode instructions and dispatch them to the execution unit
  – execute arithmetic & logic operations
  – update state elements (registers and memory)

Datapath Schematic
[Figure: PC supplies the address to Instruction Memory, which outputs the instruction; the instruction supplies three register numbers to the Registers block; register data feeds the ALU; the ALU output feeds the address input of Data Memory, which also has a data port. One element is annotated "What is this for?"]

Datapath Building Blocks: Instruction Access
• Program Counter (PC)
  – a register that points to the next instruction to be fetched
  – it is incremented each clock cycle
• Content of PC is input to Instruction Memory
• The instruction is fetched and supplied to upstream datapath elements
• Adder is used to increment PC by 4 in preparation for the next instruction (why 4?)
• Adder: an ALU with control input hardwired to perform add instruction only
• For reasons that will become clear later, we assume separate memory units for instructions & data
[Figure: PC drives the Read address input of Instruction Memory and one input of an Adder whose other input is the constant 4; Instruction Memory outputs a 32-bit instruction]

Datapath Building Blocks: R-Type Instruction
R-Type format: opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | func (6)
• Used for arithmetic & logic operations
• Read two registers, rs and rt
• ALU operates on the registers' content
• Write result to register rd
• Example: add R1, R2, R3
  – rs=R2, rt=R3, rd=R1
• Controls
  – RegWrite is asserted to enable write at clock edge
  – ALUop to control operation
[Figure: instruction fields drive Read reg 1, Read reg 2 and Write reg of the Register File; Read data 1 and Read data 2 feed the ALU (control ALUop, output zero); the ALU result returns to Write data; RegWrite enables the write. Question on the slide: "How wide is this in MIPS 64?"]

I-Type Instruction: load/store
I-Type format: opcode (6) | rs (5) | rt (5) | immediate (16)
• rs specifies the base register for the displacement address mode
• rt specifies the register
  – to load from memory for load
  – to write to memory for store
• Immediate contains the address offset
• To compute the memory address, we must
  – sign-extend the 16-bit immediate to 64 bits
  – add it to the base in rs
• Examples: LW R2, 232(R1); SW R5, -88(R4)
[Figure: 16-bit immediate -> sign extend -> 64 bits]

Required Datapath Elements for load/store
• Register file
  – load: registers to read for base address & to write for data
  – store: registers to read for base address & for data
• Sign extender
  – to sign-extend and condition the immediate field for 2's complement addition of the address offset using the 64-bit ALU
• ALU
  – to add base address and sign-extended immediate field
• Data memory to load/store data:
  – memory address; data input for store; data output for load
  – control inputs: MemRead, MemWrite, clock

Datapath Building Blocks: load/store
I-Type format: opcode (6) | rs (5) | rt (5) | immediate (16)
[Figure: instruction fields drive Read reg 1, Read reg 2 and Write reg of the Registers block (control RegWrite); Read data 1 and the sign-extended immediate (16 -> 64 bits) feed the ALU (control ALUop, output zero); the ALU result is the Address of Data Memory; Read data 2 is the memory Write data; memory Read data returns to the register Write data; memory controls are MemRead and MemWrite]

I-Type Instruction: bne
I-Type format: opcode (6) | rs (5) | rt (5) | immediate (16)
• Branch datapath must compute branch condition & branch address
• rs and rt refer to registers to be compared for branch condition
• if Reg[rs] != Reg[rt]
  – PC = PC + (Imm << 2). Note that at this point PC is already incremented, so in effect PC_current = (PC_previous + 4) + (Imm << 2)
• else if Reg[rs] == Reg[rt]
  – PC remains unchanged: PC_current = (PC_previous + 4)
  – the next sequential instruction is taken
• Required functional elements
  – RegFile, sign extender, adder, shifter
• Example: bne R1, R2, Imm

Sign Extend & Shift Operations
• Sign extension is required because
  – the 16-bit offset must be expanded to 64 bits in order to be used in the 64-bit adder
  – we are using 2's complement arithmetic
• Shift by 2 is required because
  – instructions are 32 bits wide and are aligned on a word (4 bytes) boundary
  – in effect we are using an 18-bit offset instead of 16
[Figure: 16 bits -> sign extend -> 64 bits -> shift left 2 -> 64 bits. Example values: 0xb123 (-20189) -> 0xffffb123 (-20189) -> 0xfffec48c (-80756)]

Datapath Building Blocks: bne
I-Type format: opcode (6) | rs (5) | rt (5) | immediate (16)
[Figure: Read data 1 and Read data 2 of the Registers block feed the ALU with ALUop = subtract; the zero output goes to the branch control logic; the 16-bit immediate is sign-extended to 64 bits, shifted left by 2, and added in a separate Adder to PC+4 from the instruction datapath to produce the branch target]

Computing Address & Branch Condition
• The register operands of bne are compared in the same ALU we use for load/store/arithmetic/logic instructions
  – the ALU provides a ZERO output signal to indicate condition
  – the ZERO signal controls what instruction will be fetched next depending on whether the branch is taken or not
• We also need to compute the address
  – we may not be able to use the ALU if it is being used to compute the branch condition (more on this later)
  – need an additional ADDER (an ALU hardwired to add only) to compute branch address

Putting it All Together
• Combine datapath building blocks to build the full datapath
  – now we must decide some specifics of implementation
• Single-cycle CPU
  – each instruction executes in one clock cycle
  – CPI=1 for all instructions
• Multi-cycle CPU
  – instructions execute in multiples of a shorter clock cycle
  – different instructions have different CPI

Single-Cycle CPU
• One clock cycle for all instructions
• No datapath resource can be used more than once per clock cycle
  – results in resource duplication for elements that must be used more than once
  – examples: separate memory units for instruction

The Processor: Datapath & Control
• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
  – memory-reference instructions: lw, sw
  – arithmetic-logical instructions: add, sub, and, or, slt
  – control flow instructions: beq, j
• Generic Implementation:
  – use the program counter (PC) to supply instruction address
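
As a rough illustration of the ALU control codes listed under "Examples of Combinational Elements", the Python sketch below models the ALU as a pure function of its inputs, which is what makes it a combinational element. The names alu, MASK64 and to_signed64 are made up for this example, and treating set on less than as a signed comparison is an assumption, not something the slides state.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper: keep values within 64 bits


def to_signed64(x):
    """Interpret a 64-bit pattern as a 2's complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def alu(op, a, b):
    """Return (result, zero) for a 3-bit control code, as listed on the slide."""
    a &= MASK64
    b &= MASK64
    if op == 0b000:      # AND
        result = a & b
    elif op == 0b001:    # OR
        result = a | b
    elif op == 0b010:    # add (wraps at 64 bits)
        result = (a + b) & MASK64
    elif op == 0b110:    # subtract (wraps at 64 bits)
        result = (a - b) & MASK64
    elif op == 0b111:    # set on less than (signed compare is an assumption)
        result = 1 if to_signed64(a) < to_signed64(b) else 0
    else:                # the other 3 combinations are unused
        raise ValueError("unused ALU control code")
    # The zero output is asserted whenever the result is 0.
    return result, result == 0
```

Subtracting two equal operands asserts the zero output, which is the property the bne datapath relies on when it sets ALUop to subtract.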
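
The register file and the edge-triggered clocking methodology described earlier can be sketched in a few lines of Python. This is a simplified behavioural model rather than a hardware description: the class name RegisterFile and the method names read and clock_edge are hypothetical, and the hard-wired zero register of MIPS is not modeled.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper: registers hold 64-bit values


class RegisterFile:
    """Behavioural sketch of a 32-entry, 64-bit register file."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Reads are always available: no control signal is needed.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        # State changes only at the clock edge, and only if RegWrite is asserted.
        if reg_write:
            self.regs[write_reg] = write_data & MASK64


rf = RegisterFile()
rf.clock_edge(write_reg=1, write_data=42, reg_write=False)  # ignored
before = rf.read(1, 2)
rf.clock_edge(write_reg=1, write_data=42, reg_write=True)   # committed
after = rf.read(1, 2)
```

Separating read from clock_edge mirrors the slides: values read during a cycle stay stable, and the new value becomes visible only after the next edge.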
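
The sign-extend and shift steps used for the bne branch target can be checked with a short Python sketch. It reproduces the example values from the "Sign Extend & Shift Operations" slide; the function names sign_extend16 and next_pc are invented for this illustration, and Python integers stand in for 64-bit 2's complement values.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a 2's complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc, imm16, taken):
    """Next PC for a branch: PC+4, plus the shifted offset if taken."""
    incremented = pc + 4                      # PC is already incremented
    if taken:
        return incremented + (sign_extend16(imm16) << 2)
    return incremented


offset = sign_extend16(0xB123)   # -20189, as on the slide
shifted = offset << 2            # -80756, as on the slide
```

Masking shifted with 0xFFFFFFFF gives 0xfffec48c, the bit pattern shown on the slide.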
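
The regularity of the instruction formats means the register fields can be extracted with the same shifts and masks regardless of instruction type. The Python sketch below assumes the fields are packed from the most significant bit in the order the slides list them; the function names are hypothetical, and the func value used in the usage lines is an arbitrary placeholder.

```python
def decode_r_type(word):
    """Split a 32-bit word into R-Type fields (6, 5, 5, 5, 5, 6 bits)."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "func": word & 0x3F,
    }


def decode_i_type(word):
    """Split a 32-bit word into I-Type fields (6, 5, 5, 16 bits)."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }


def decode_j_type(word):
    """Split a 32-bit word into J-Type fields (6, 26 bits)."""
    return {"opcode": (word >> 26) & 0x3F, "offset": word & 0x3FFFFFF}


# Usage: pack the fields of "add R1, R2, R3" (rs=R2, rt=R3, rd=R1), then decode.
word = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | (0 << 6) | 0x2A
fields = decode_r_type(word)
```

Because opcode, rs and rt sit at the same bit positions in both the R-Type and I-Type formats, the register file can start reading before the instruction type is fully known.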