CS/COE1541: Introduction to Computer Architecture
Datapath and Control Review
Sangyeun Cho
Computer Science Department
University of Pittsburgh

Datapath elements
• Arithmetic logic unit (ALU)
  - Combinational logic (= a function)
  - Input: a, b, ALU operation (carry-in is hidden)
  - Output: result, zero, overflow, carry-out
• Adders
  - For PC incrementing, branch target calculation, …
• Mux
  - We need a lot of these
• Registers
  - Register file, PC, … (architecturally visible registers)
  - Temporary registers to keep intermediate values
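The sketch below models the ALU's interface as a pure combinational function in Python. It assumes 32-bit operands and reuses the 3-bit ALU control encoding from the ALU control slide (000 AND, 001 OR, 010 ADD, 110 SUB, 111 set-if-less-than). The name `alu` and the exact flag logic are illustrative, not the course's reference design. Subtraction is computed as a + ~b + 1, and that +1 is the hidden carry-in.

```python
MASK32 = 0xFFFFFFFF

def alu(a: int, b: int, op: int):
    """Return (result, zero, overflow, carryout) for 32-bit operands.

    op is a 3-bit ALU control value: 0b000 AND, 0b001 OR, 0b010 ADD,
    0b110 SUB, 0b111 SLT (set-if-less-than, computed like SUB).
    """
    a &= MASK32
    b &= MASK32
    overflow = False
    carryout = False
    if op == 0b000:
        result = a & b
    elif op == 0b001:
        result = a | b
    elif op in (0b010, 0b110, 0b111):
        # ADD feeds b with carry-in 0; SUB/SLT feed ~b with the hidden carry-in 1
        b_in = b if op == 0b010 else (~b & MASK32)
        carry_in = 0 if op == 0b010 else 1
        full = a + b_in + carry_in
        total = full & MASK32
        carryout = full > MASK32
        sign_a, sign_b, sign_r = a >> 31, b_in >> 31, total >> 31
        # Signed overflow: adder inputs share a sign that the sum does not have
        overflow = sign_a == sign_b and sign_r != sign_a
        if op == 0b111:
            # a < b (signed) is the true sign of a - b, i.e. sign bit XOR overflow
            result = int(sign_r ^ overflow)
        else:
            result = total
    else:
        raise ValueError(f"unsupported ALU control {op:03b}")
    return result, result == 0, overflow, carryout
```

For example, `alu(3, 3, 0b110)` produces a zero result with `zero` set, which is what a beq comparison relies on.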
CS/CoE1541: Intro. to Computer Architecture University of Pittsburgh
Register file
• Interface: read port, write port, clock, control signal

Register file
[Figure: register file example in which the value 0x11223344 appears on the data ports]
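A register file with two read ports and one clocked write port can be modeled as a small class. This is a hypothetical model with these assumptions: 32 registers of 32 bits each, combinational reads, and a write that takes effect only on a clock edge when a RegWrite-style control signal is asserted. Register 0 stays at zero, as in MIPS.

```python
class RegisterFile:
    """Hypothetical model: 32 x 32-bit registers, two read ports, one write port."""

    def __init__(self, num_regs: int = 32):
        self.regs = [0] * num_regs

    def read(self, read_reg1: int, read_reg2: int):
        # Two independent read ports: both values are available in the same cycle
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, reg_write: bool, write_reg: int, write_data: int):
        # The write port updates state only at the clock edge, and only if RegWrite = 1;
        # register 0 is hard-wired to zero, so writes to it are ignored
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF
```

Writing 0x11223344 to register 8 with the control signal asserted and then reading it through either port returns 0x11223344. With the signal deasserted, the stored value is unchanged.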
Processor building blocks

Abstract implementation
Analyzing instruction execution
• lw (load word)
  - Fetch instruction
  - Read a base register
  - Sign-extend the immediate offset
  - Add the two numbers made available in the above two steps
  - Access data memory with the address computed in the above step
  - Store the value from memory in the target register specified in the instruction

Analyzing instruction execution
• add (add)
  - Fetch instruction
  - Read from two source registers
  - Add the two numbers made available in the above step
  - Store the result in the target register specified in the instruction
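The two step lists translate almost line for line into code. The sketch below uses a Python list `regs` as the register file and a dict `mem` mapping byte addresses to 32-bit words; both are hypothetical stand-ins. It skips the fetch step to focus on where lw and add differ.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to 32 bits."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def execute_lw(regs, mem, rt, base, offset):
    # Read the base register, sign-extend the offset, and add the two numbers
    address = (regs[base] + sign_extend16(offset)) & MASK32
    # Access data memory and store the loaded value in the target register
    regs[rt] = mem[address]

def execute_add(regs, rd, rs, rt):
    # Read two source registers, add them, and store the sum in the target register
    regs[rd] = (regs[rs] + regs[rt]) & MASK32
```

Take r18 = 1,000 and M[1032] = 0x11223344. Then `execute_lw(regs, mem, 8, 18, 32)` leaves 0x11223344 in r8, which matches the lw example later in the deck.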
Analyzing instruction execution
• j (jump)
  - Fetch instruction
  - Extend the 26-bit immediate field
    - Shift left by 2 bits (28 bits now)
    - Extract the most significant 4 bits of the current PC (strictly, of PC+4) and concatenate them to form a 32-bit value
  - Assign this value to PC

Common steps in inst. execution
• Fetching the instruction word from the instruction memory
• Decoding the instruction and reading from the register file
  - Or preparing a value from the immediate field (and PC)
• Performing an ALU operation
• Accessing the data memory (if needed)
• Making a jump (assigning a computed value to PC) (if needed)
• Writing to the register file
• Designing the control logic is based on our (more formal) analysis of instruction execution
  - Consider all instructions
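The jump-target steps reduce to bit manipulation. The minimal sketch below assumes `pc` already holds PC+4 (the PC's value after the fetch step) and `instr` is the 32-bit j instruction word. The function name is illustrative.

```python
def jump_target(pc: int, instr: int) -> int:
    """Compute the j target {PC[31:28], IR[25:0], 2'b00}; `pc` is assumed to be PC+4."""
    target26 = instr & 0x03FFFFFF   # the 26-bit immediate field
    shifted = target26 << 2         # shift left by 2 bits -> 28 bits
    upper4 = pc & 0xF0000000        # most significant 4 bits of the PC
    return upper4 | shifted         # concatenate into a 32-bit value
```

For instance, with PC+4 = 0x00400004 and a 26-bit field of 0x100000, the target is 0x00400000.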
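Fetching an instruction
[Figure: instruction fetch datapath. Annotations: PC keeps the current memory address from which the instruction is fetched; Instruction width is 4 bytes!; Instruction memory is read-only!; For branches!]

Fetching operands
[Figure: reading operands from the register file. Annotation: Two reads at a time!]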
Handling memory access
[Figure: datapath for memory access. Annotations: Imm. offset for address; Data to store!; Load data from memory; Data to be written to a register!]

Datapath so far
[Figure: combined datapath. Annotation: j instruction not considered so far!]
Revisiting MIPS inst. format

More elaborate datapath
[Figure: more elaborate datapath. Annotations: Write register # selection; ALU control bits from Inst[5:0]]
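Both annotations come down to picking bit fields out of the 32-bit instruction word. The write register comes from Inst[20:16] or Inst[15:11], and ALU control uses Inst[5:0]. The Python sketch below splits a word into fields. `decode` and `encode_i` are hypothetical helpers, and the field boundaries follow the standard MIPS R/I/J layouts.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its R/I/J-format fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # Inst[31:26]
        "rs": (instr >> 21) & 0x1F,       # Inst[25:21]
        "rt": (instr >> 16) & 0x1F,       # Inst[20:16]: destination for lw (RegDst = 0)
        "rd": (instr >> 11) & 0x1F,       # Inst[15:11]: destination for R-type (RegDst = 1)
        "shamt": (instr >> 6) & 0x1F,     # Inst[10:6]
        "funct": instr & 0x3F,            # Inst[5:0]: feeds ALU control for R-type
        "imm": instr & 0xFFFF,            # Inst[15:0]: I-type immediate
        "target": instr & 0x03FFFFFF,     # Inst[25:0]: J-type target
    }

def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Assemble an I-format word (used here only to build example inputs)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)
```

Encoding lw r8, 32(r18) with opcode 35 gives 0x8E480020. Decoding that word returns rs = 18, rt = 8, and imm = 32.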
First look at control

Control signals overview
• RegDst: which instruction field to use for the destination register specifier?
  - Inst[20:16] vs. Inst[15:11]
• ALUSrc: which one to use for ALU source 2?
  - Immediate vs. register read port 2
• MemtoReg: is it a memory load (write the memory value, not the ALU result, to the register)?
• RegWrite: update a register?
• MemRead: read memory?
• MemWrite: write to memory?
• Branch: is it a branch?
• ALUop: what type of ALU operation?
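The signals above can be collected into a lookup from opcode to control values. The sketch below takes its values from the common textbook single-cycle MIPS design for R-type, lw, sw, and beq. They agree with the lw example in this deck (RegDst = 0, ALUSrc = 1, MemtoReg = 1, Branch = 0), but treat the table as an assumption rather than the slides' own. `None` marks a don't-care. j is omitted because it needs an extra jump signal that is not in the list above.

```python
# Opcodes for the subset; lw = 35 matches the lw example in this deck
R_TYPE, LW, SW, BEQ = 0, 35, 43, 4

X = None  # don't-care: the value does not affect the outcome

CONTROL_TABLE = {
    R_TYPE: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    LW:     dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    SW:     dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0, MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    BEQ:    dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0, MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def main_control(opcode: int) -> dict:
    """Single-cycle main control: map an opcode to its control signal values."""
    if opcode not in CONTROL_TABLE:
        raise ValueError(f"opcode {opcode} is not in this subset")
    return CONTROL_TABLE[opcode]
```

A lookup table makes the decoder's behavior easy to read. In hardware the same mapping is usually built as combinational logic.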
Example: lw r8, 32(r18)
• Let's assume r18 holds 1,000
• Let's assume M[1032] holds 0x11223344
[Figure: single-cycle datapath executing lw r8, 32(r18), over two slides. Labeled values: opcode 35; rs = 18, which reads 1000; rt = 8; offset 32; the ALU computes 1000 + 32 = 1032; data memory returns 0x11223344, which is written to r8; the PC is updated to PC+4. Control: RegDst = 0, ALUSrc = 1, MemtoReg = 1, RegWrite, MemRead, Branch = 0.]

Control signals in a table

ALU control
• Depending on the instruction, we perform a different ALU operation
• Example
  - lw or sw: ADD
  - and: AND
  - beq: SUB
• ALU control input (3 bits)
  - 000: AND
  - 001: OR
  - 010: ADD
  - 110: SUB
  - 111: SET-IF-LESS-THAN (similar to SUB)
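ALU control is typically a second, small decoder. Main control produces a 2-bit ALUop, and for arithmetic instructions the funct field Inst[5:0] selects the exact operation. This sketch assumes the ALUop encoding listed on the second ALU control slide (00 lw/sw, 01 beq, 10 arithmetic, 11 jump) and the standard MIPS funct codes for add, sub, and, or, and slt. The names are illustrative.

```python
# 3-bit ALU control inputs listed above
AND, OR, ADD, SUB, SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# R-type funct field (Inst[5:0]) -> ALU control, for the arithmetic/logic subset
FUNCT_TO_ALU = {0b100000: ADD, 0b100010: SUB, 0b100100: AND, 0b100101: OR, 0b101010: SLT}

def alu_control(alu_op: int, funct: int):
    """Combine the 2-bit ALUop from main control with the funct field."""
    if alu_op == 0b00:   # lw/sw: address calculation
        return ADD
    if alu_op == 0b01:   # beq: compare by subtracting
        return SUB
    if alu_op == 0b10:   # arithmetic (R-type): funct decides
        return FUNCT_TO_ALU[funct]
    return None          # 0b11 (jump): the ALU result is unused in this sketch
```

Splitting control into two levels keeps the main decoder small, because only R-type instructions need to look at the funct field.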
ALU control
• ALUop
  - 00: lw/sw, 01: beq, 10: arithmetic, 11: jump

Supporting “j” instruction
Resource usage

Single-cycle execution timing
[Figure: single-cycle execution timing per instruction (in picoseconds)]
Single-cycle execution problem
• The cycle time depends on the most time-consuming instruction
  - What happens if we implement a more complex instruction, e.g., a floating-point multiplication?
  - All resources are simultaneously active; there is no sharing of resources
• We'll adopt a multi-cycle solution, which allows us to
  - Use a faster clock;
  - Adopt a different number of clock cycles per instruction; and
  - Reduce physical resources

Multi-cycle implementation
• Reusing functional units
  - Break up instruction execution into smaller steps
  - Each functional unit is used for a specific purpose in any given cycle
  - The ALU is used for additional functions: calculation and PC increment
  - Memory is used for both instructions and data
• At the end of a cycle, keep results in registers
  - Additional registers
• Now, control signals are NOT solely determined by the instruction bits
• Controls will be generated by an FSM!
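A little arithmetic makes the cycle-time argument concrete. The latencies below are hypothetical round numbers chosen for illustration, not the values from the timing figure. The step sets are a simplified view of which units each instruction class uses.

```python
# Hypothetical per-step latencies in picoseconds (illustrative only, not measurements)
STEP_PS = {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100}

# Simplified view of the steps each instruction class uses
USES = {
    "R-type": ["fetch", "reg_read", "alu", "reg_write"],
    "lw":     ["fetch", "reg_read", "alu", "mem", "reg_write"],
    "sw":     ["fetch", "reg_read", "alu", "mem"],
    "beq":    ["fetch", "reg_read", "alu"],
}

def single_cycle_period(step_ps, uses):
    """Every instruction takes one clock, so the period must fit the slowest instruction."""
    return max(sum(step_ps[s] for s in steps) for steps in uses.values())

def multi_cycle_period(step_ps):
    """Each clock performs one step, so the period must fit the slowest step."""
    return max(step_ps.values())
```

With these numbers, the single-cycle clock must be 800 ps because of lw, while a multi-cycle clock only has to fit a 200 ps step. A 3-cycle beq then finishes in 600 ps, but a 5-cycle lw takes 1,000 ps. The overall gain therefore depends on the instruction mix, and this comparison ignores the delay added by the extra registers.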
Five instruction execution steps
• Instruction fetch
• Instruction decode and register read
• Execution, memory address calculation, or branch completion
• Memory access or R-type instruction completion
• Write-back
• Instruction execution takes 3 to 5 cycles!

Step 1: instruction fetch
• Access memory with the PC to fetch the instruction and store it in the Instruction Register (IR)
• Increment the PC by 4 using the ALU and put the result back in the PC
  - We can do this because the ALU is not busy in this cycle
  - The actual PC update happens at the next rising clock edge
Step 2: decode & operand fetch
• Read registers rs and rt
  - We read both of them regardless of whether they are needed
  - Store the two values in temporary registers A and B
• Compute the branch address using the ALU in case the instruction is a branch
  - We can do this because the ALU is not busy
  - ALUOut <= PC + (sign-extend(IR[15:0]) << 2); // PC already holds PC+4 here
  - ALUOut will keep the target address
• We have not set any control signals based on the instruction type yet
  - The instruction is being decoded in the control logic right now!

Step 3: actions, actions, actions
• The ALU performs one of three functions based on the instruction type (a jump does not need the ALU)
• Memory reference:
  - ALUOut <= A + sign-extend(IR[15:0]);
• R-type:
  - ALUOut <= A op B;
• Branch:
  - if (A == B) PC <= ALUOut;
• Jump:
  - PC <= {PC[31:28], IR[25:0], 2'b00}; // Verilog notation
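For beq, steps 2 and 3 take only a few lines. The target is computed before the control logic knows the instruction is a branch, and step 3 only decides whether to use it. The sketch assumes `pc_plus4` is the PC value after step 1 and `ir` is the instruction register. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm: int) -> int:
    """Sign-extend the low 16 bits to 32 bits."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def step2_branch_target(pc_plus4: int, ir: int) -> int:
    """Step 2: ALUOut <= PC + (sign-extend(IR[15:0]) << 2), computed speculatively."""
    return (pc_plus4 + (sign_extend16(ir) << 2)) & MASK32

def step3_beq(pc_plus4: int, a: int, b: int, alu_out: int) -> int:
    """Step 3 for beq: if (A == B) PC <= ALUOut; otherwise the PC keeps PC+4."""
    return alu_out if a == b else pc_plus4
```

With pc_plus4 = 0x00400008 and an offset field of 3, the target is 0x00400014. beq selects it only when A equals B.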
Step 4: memory access
• If the instruction is a memory reference
  - MDR <= Memory[ALUOut]; // if it is a load
  - Memory[ALUOut] <= B; // if it is a store
  - The store is complete!
• If the instruction is R-type
  - Reg[IR[15:11]] <= ALUOut;
  - Now the instruction is complete!

Step 5: register write-back
• Only the memory load instruction reaches this step
  - Reg[IR[20:16]] <= MDR;
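Steps 1 through 5 together define an instruction-dependent path through the control FSM. The sketch below walks a simplified version of that FSM and counts the states visited. The state names are hypothetical labels, and only the instruction classes discussed in this deck are handled.

```python
from typing import List

def state_sequence(op: str) -> List[str]:
    """List the states a simplified multi-cycle control FSM visits for one instruction."""
    states = ["fetch", "decode"]                 # steps 1 and 2: identical for all instructions
    if op in ("lw", "sw"):
        states.append("mem_addr")                # step 3: ALUOut <= A + sign-extend(IR[15:0])
        if op == "lw":
            states += ["mem_read", "writeback"]  # step 4: MDR <= Memory[ALUOut]; step 5: Reg[IR[20:16]] <= MDR
        else:
            states.append("mem_write")           # step 4: Memory[ALUOut] <= B (store complete)
    elif op == "R-type":
        states += ["execute", "r_complete"]      # step 3: ALUOut <= A op B; step 4: Reg[IR[15:11]] <= ALUOut
    elif op == "beq":
        states.append("branch_complete")         # step 3: if (A == B) PC <= ALUOut
    elif op == "j":
        states.append("jump_complete")           # step 3: PC <= {PC[31:28], IR[25:0], 2'b00}
    else:
        raise ValueError(f"unsupported instruction class: {op}")
    return states                                # the FSM then returns to "fetch"

CYCLES = {op: len(state_sequence(op)) for op in ("lw", "sw", "R-type", "beq", "j")}
```

The resulting counts are lw 5, sw 4, R-type 4, beq 3, and j 3, which fall within the 3 to 5 cycle range. For a made-up mix of 25% lw, 10% sw, 45% R-type, 15% beq, and 5% j, the average CPI would be 0.25 × 5 + 0.10 × 4 + 0.45 × 4 + 0.15 × 3 + 0.05 × 3 = 4.05.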
Multi-cycle datapath & control

Multi-cycle control design

Example: lw, 1st cycle

Example: lw, 2nd cycle
[Figures: multi-cycle datapath with the control signal values asserted in the 1st and 2nd cycles of lw]
Example: lw, 3rd cycle

Example: lw, 4th cycle
[Figures: multi-cycle datapath with the control signal values asserted in the 3rd and 4th cycles of lw]
Example: lw, 5th cycle

Example: j, 1st cycle
[Figures: multi-cycle datapath with the control signal values asserted in the 5th cycle of lw and the 1st cycle of j]
Example: j, 2nd cycle

Example: j, 3rd cycle
[Figures: multi-cycle datapath with the control signal values asserted in the 2nd and 3rd cycles of j]
To wrap up
• From a number of building blocks, we constructed a datapath for a subset of the MIPS instruction set
• First, we analyzed instructions for their functional requirements
• Second, we connected building blocks in a way that accommodates the instructions
• Third, we kept refining the datapath

To wrap up
• We looked at how an instruction is executed on the datapath in a pictorial way
• Control signals were connected to functional blocks in the datapath
• We analyzed how the execution sequence of an instruction changes the control signals
• We looked at the multi-cycle control scheme in some detail
  - Multi-cycle control can be implemented using an FSM