CS/COE1541: Introduction to Computer Architecture
Datapath and Control Review

Sangyeun Cho
Computer Science Department
University of Pittsburgh

Datapath elements

. Arithmetic logic unit (ALU)
• (=function)
• Input: a, b, ALU operation (carryin is hidden)
• Output: result, zero, overflow, carryout
. Adders
• For PC incrementing, branch target calculation, …
. Mux
• We need a lot of these
. Registers
• Register file, PC, … (architecturally visible registers)
• Temporary registers to keep intermediate values

CS/CoE1541: Intro. to Computer Architecture University of Pittsburgh

Register file

. Interface: read port, write port, clock, control signal

[Figure: register file write and read example using the value 0x11223344]
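The register file interface can be sketched in a few lines of Python. This is a toy model under stated assumptions: 32 registers of 32 bits, two read ports, and one write port gated by a write-enable control signal. The slide only names the interface, so the port counts, the `RegisterFile` class, and its method names are illustrative choices.

```python
class RegisterFile:
    """Toy model: 32 x 32-bit registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs, rt):
        # Two read ports: both values are available without waiting for a clock.
        return self.regs[rs], self.regs[rt]

    def clock(self, reg_write, write_reg, write_data):
        # Write port: updates only on a (simulated) clock edge with the
        # write-enable control signal set. Register 0 is hardwired to zero
        # in MIPS, so writes to it are ignored.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock(reg_write=1, write_reg=8, write_data=0x11223344)   # stored
rf.clock(reg_write=0, write_reg=9, write_data=0xDEADBEEF)   # ignored
print([hex(v) for v in rf.read(8, 9)])
```

Reads are modeled as combinational and writes as clocked, which is why the second call leaves register 9 untouched.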

Building blocks

Abstract implementation


Analyzing instruction execution

. lw (load word)
• Fetch instruction
• Read a base register
• Sign-extend the immediate offset
• Add the two numbers made available in the above two steps
• Access data memory with the address computed in the above step
• Store the value from the memory to the target register specified in the instruction

. add (add)
• Fetch instruction
• Read from two source registers
• Add the two numbers made available in the above step
• Store the result to the target register specified in the instruction

Analyzing instruction execution

. j (jump)
• Fetch instruction
• Extend the 26-bit immediate field
  • Shift left by 2 bits (28 bits now)
  • Extract the most significant 4 bits from the current PC and concatenate to form a 32-bit value
• Assign this value to PC

Common steps in inst. execution

. Fetching the instruction word from the instruction memory
. Decoding the instruction and reading from the register file
• Or prepare a value from the immediate value (and PC)
. Performing an ALU operation
. Accessing the data memory (if needed)
. Making a jump (assigning a computed value to PC) (if needed)
. Writing to the register file

. Designing a control logic is based on our (more formal) analysis of instruction execution
• Consider all instructions
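The jump-address steps listed for the j instruction (shift the 26-bit field left by 2, then prepend the top 4 bits of the PC) can be written out as bit operations. A minimal sketch; the function name `jump_target` and the example PC and immediate values are made up for illustration.

```python
def jump_target(pc, instruction):
    """Form the 32-bit jump address from the current PC and a j instruction."""
    imm26 = instruction & 0x03FFFFFF   # 26-bit immediate field, Inst[25:0]
    low28 = imm26 << 2                 # shift left by 2 bits (28 bits now)
    high4 = pc & 0xF0000000            # most significant 4 bits of the PC
    return high4 | low28               # concatenate to form a 32-bit value


# Hypothetical example: opcode 2 (j) with immediate field 0x100.
j_inst = (2 << 26) | 0x100
print(hex(jump_target(0x40000008, j_inst)))
```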


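Fetching an instruction

[Figure: instruction fetch datapath, annotated "Instruction width is 4 bytes!", "PC keeps the current memory address from which instruction is fetched", and "Instruction memory here is read-only!"]

Fetching operands

[Figure: operand fetch datapath, annotated "For branches!" and "Two reads at a time!"]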

Handling memory access

[Figure: memory access datapath, annotated "Data to store!", "Imm. offset for address", "Load data from memory", and "Data to be in a register!"]

Datapath so far

[Figure: combined datapath, annotated "j instruction not considered so far!"]


Revisiting MIPS inst. format

More elaborate datapath

[Figure: datapath, annotated "Write register # selection" and "ALU control bits from Inst[5:0]"]

First look at control

Control signals overview

. RegDst: which instr. field to use for dst. register specifier?
• Inst[20:16] vs. Inst[15:11]
. ALUSrc: which one to use for ALU src 2?
• Immediate vs. register read port 2
. MemtoReg: is it memory load?
. RegWrite: update register?
. MemRead: read memory?
. MemWrite: write to memory?
. Branch: is it a branch?
. ALUop: what type of ALU operation?
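One way to make the signal list concrete is a lookup table keyed by instruction class. The sketch below uses the conventional settings for this textbook-style single-cycle datapath. The lw row agrees with the lw example in these slides; the R-type, sw, and beq rows and the names `CONTROL`, `OPCODES`, and `control_signals` are assumptions added for illustration. `None` marks a don't-care value.

```python
# Conventional single-cycle control settings; None marks a don't-care.
CONTROL = {
    "R-type": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUop=0b10),
    "lw":     dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUop=0b00),
    "sw":     dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUop=0b00),
    "beq":    dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUop=0b01),
}

# MIPS opcodes (Inst[31:26]) for the four instruction classes above.
OPCODES = {0: "R-type", 35: "lw", 43: "sw", 4: "beq"}


def control_signals(opcode):
    """Look up the control signal settings for a 6-bit opcode."""
    return CONTROL[OPCODES[opcode]]


for name, value in control_signals(35).items():
    print(f"{name}={value}")
```

Because the single-cycle control depends only on the opcode, a plain table is enough here; no state is involved.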


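Example: lw r8, 32(r18)

. Let's assume r18 has 1,000
. Let's assume M[1032] has 0x11223344

[Figure: single-cycle datapath annotated for lw r8, 32(r18). Instruction fields: opcode 35, rs = 18, rt = 8, immediate = 32. The ALU computes 1000 + 32 = 1032, memory returns 0x11223344, and that value is written to r8. Control signals: Branch=0, RegDst=0, ALUSrc=1, MemtoReg=1, with RegWrite and MemRead asserted. The next PC is PC+4.]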
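The lw r8, 32(r18) example can be replayed in Python using the same values (r18 = 1000, M[1032] = 0x11223344). This is a sketch of the data flow only; the dictionary used as memory and the helper name `decode_i_type` are assumptions for illustration.

```python
def decode_i_type(word):
    """Split a 32-bit I-type instruction into opcode, rs, rt, signed immediate."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:              # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm


regs = [0] * 32
regs[18] = 1000                   # "Let's assume r18 has 1,000"
memory = {1032: 0x11223344}       # "Let's assume M[1032] has 0x11223344"

word = (35 << 26) | (18 << 21) | (8 << 16) | 32   # lw r8, 32(r18)
opcode, rs, rt, imm = decode_i_type(word)

address = regs[rs] + imm          # ALUSrc=1: second ALU input is the immediate
loaded = memory[address]          # MemRead asserted
regs[rt] = loaded                 # RegDst=0 selects rt; MemtoReg=1; RegWrite

print(hex(word), address, hex(regs[8]))
```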

Control signals in a table

ALU control

. Depending on the instruction, we perform a different ALU operation
. Example
• lw or sw: ADD
• and: AND
• beq: SUB
. ALU control input (3 bits)
• 000: AND
• 001: OR
• 010: ADD
• 110: SUB
• 111: SET-IF-LESS-THAN (similar to SUB)
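The five ALU control codes listed above map directly onto a small function. A minimal sketch, assuming 32-bit operands and a signed comparison for set-if-less-than; it returns only the result and the zero flag, leaving out the overflow and carryout outputs. The names `alu` and `to_signed` are illustrative.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the 3-bit ALU control codes."""
    a &= MASK
    b &= MASK
    if control == 0b000:
        result = a & b                              # AND
    elif control == 0b001:
        result = a | b                              # OR
    elif control == 0b010:
        result = (a + b) & MASK                     # ADD
    elif control == 0b110:
        result = (a - b) & MASK                     # SUB
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0   # SET-IF-LESS-THAN
    else:
        raise ValueError(f"unsupported ALU control code: {control:03b}")
    return result, int(result == 0)


print(alu(0b010, 1000, 32))   # address calculation for lw/sw
print(alu(0b110, 5, 5))       # beq compares by subtracting and checking zero
```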


ALU control

. ALUop
• 00: lw/sw, 01: beq, 10: arithmetic, 11: jump

Supporting "j" instruction
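The 2-bit ALUop and the function field Inst[5:0] together select the 3-bit ALU control input. The sketch below assumes the standard MIPS function codes for add, sub, and, or, and slt; treating ALUop 11 (jump) as a don't-care that returns `None` is a choice made here, not something the slides specify.

```python
# Standard MIPS funct codes (Inst[5:0]) mapped to 3-bit ALU control codes.
FUNCT_TO_ALU = {
    0b100000: 0b010,   # add -> ADD
    0b100010: 0b110,   # sub -> SUB
    0b100100: 0b000,   # and -> AND
    0b100101: 0b001,   # or  -> OR
    0b101010: 0b111,   # slt -> SET-IF-LESS-THAN
}


def alu_control(alu_op, funct):
    """Derive the ALU control input from ALUop and the funct field."""
    if alu_op == 0b00:          # lw/sw: add base register and offset
        return 0b010
    if alu_op == 0b01:          # beq: subtract and test the zero output
        return 0b110
    if alu_op == 0b10:          # arithmetic: decided by the funct field
        return FUNCT_TO_ALU[funct & 0x3F]
    return None                 # 11 (jump): ALU result unused in this sketch


print(format(alu_control(0b10, 0b101010), "03b"))
```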

Resource usage

Single-cycle execution timing (in picoseconds)


Single-cycle execution problem

. The cycle time depends on the most time-consuming instruction
• What happens if we implement a more complex instruction, e.g., a floating-point multiplication?
• All resources are simultaneously active; there is no sharing of resources
. We'll adopt a multi-cycle solution which allows us to
• Use a faster clock;
• Adopt a different number of clock cycles for each instruction; and
• Reduce physical resources

Multi-cycle implementation

. Reusing functional units
• Break up instruction execution into smaller steps
• Each functional unit is used for a specific purpose in any cycle
• ALU is used for additional functions: calculation and PC increment
• Memory used for instructions and data
. At the end of a cycle, keep results in registers
• Additional registers
. Now, control signals are NOT solely determined by the instruction bits
. Controls will be generated by a FSM!
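The claim that the single-cycle clock is set by the slowest instruction can be checked with a short calculation. The unit latencies below are made-up round numbers chosen for illustration; they are not the values from the timing slide, whose table did not survive. The unit names and the `USES` mapping are likewise assumptions.

```python
# Hypothetical unit latencies in picoseconds (illustrative values only).
LATENCY_PS = {"imem": 200, "reg_read": 100, "alu": 200,
              "dmem": 200, "reg_write": 100}

# Which units each instruction class passes through, in order.
USES = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "lw":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":     ["imem", "reg_read", "alu", "dmem"],
    "beq":    ["imem", "reg_read", "alu"],
}


def path_delay(inst):
    """Total delay of the units one instruction class uses."""
    return sum(LATENCY_PS[unit] for unit in USES[inst])


# Single-cycle: one clock period must fit the slowest instruction.
single_cycle_clock = max(path_delay(inst) for inst in USES)

# Multi-cycle: one clock period only has to fit the slowest single step.
multi_cycle_clock = max(LATENCY_PS.values())

for inst in USES:
    print(inst, path_delay(inst))
print("single-cycle clock:", single_cycle_clock)
print("multi-cycle clock:", multi_cycle_clock)
```

With these assumed numbers every instruction pays the lw delay in the single-cycle design, while the multi-cycle clock only has to cover one unit at a time.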

Five instruction execution steps

. Instruction fetch
. Instruction decode and register read
. Execution, memory address calculation, or branch completion
. Memory access or R-type instruction completion
. Write-back

. Instruction execution takes 3 to 5 cycles!

Step 1: instruction fetch

. Access memory w/ PC to fetch instruction and store it in the instruction register (IR)
. Increment PC by 4 using ALU and put the result back in the PC
• We can do this because ALU is not busy in this cycle
• Actual PC update is done at the next rising clock edge
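Step 1 amounts to two register transfers that take effect together at the clock edge. A minimal sketch, assuming a dictionary as word-addressed memory and a `state` dictionary holding PC and IR; both are illustrative modeling choices, as is the starting address.

```python
def fetch_step(state, memory):
    """Step 1: IR <= Memory[PC]; PC <= PC + 4."""
    ir_next = memory[state["PC"]]
    pc_next = (state["PC"] + 4) & 0xFFFFFFFF   # done by the otherwise idle ALU
    # Commit both values together to mimic the rising clock edge.
    state["IR"], state["PC"] = ir_next, pc_next


lw_word = (35 << 26) | (18 << 21) | (8 << 16) | 32   # lw r8, 32(r18)
state = {"PC": 0x00400000, "IR": 0}
memory = {0x00400000: lw_word}

fetch_step(state, memory)
print(hex(state["IR"]), hex(state["PC"]))
```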


Step 2: decode & operand fetch

. Read registers rs and rt
• We read both of them regardless of necessity
• Store two values in temporary register A and B
. Compute the branch address using ALU in case the instruction is a branch
• We can do this because ALU is not busy
• ALUOut will keep the target address
. We have not set any control signals based on the instruction type yet
• Instruction is being decoded now in the control logic!

Step 3: actions, actions, actions

. ALU performs one of three functions based on instruction type
. Memory reference
• ALUOut <= A + sign-extend(IR[15:0]);
. R-type
• ALUOut <= A op B;
. Branch:
• if (A==B) PC <= ALUOut;
. Jump:
• PC <= {PC[31:28],IR[25:0],2'b00}; // verilog notation
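Steps 2 and 3 can be mirrored as register transfers in Python. This is a sketch under assumptions: the `state` dictionary stands in for PC, IR, A, B, and ALUOut, and the branch target is taken to be the already incremented PC plus the sign-extended offset shifted left by 2, which is the usual MIPS rule but is not spelled out on the slide.

```python
MASK = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_step(state, regs):
    """Step 2: latch A and B, and precompute the branch target in ALUOut."""
    ir = state["IR"]
    state["A"] = regs[(ir >> 21) & 0x1F]      # rs
    state["B"] = regs[(ir >> 16) & 0x1F]      # rt
    # Assumed rule: target = PC (already PC+4) + (sign-extended offset << 2).
    state["ALUOut"] = (state["PC"] + (sign_extend16(ir) << 2)) & MASK


def execute_step(state, kind, op=None):
    """Step 3: kind is 'mem', 'rtype', 'branch', or 'jump'."""
    ir = state["IR"]
    if kind == "mem":
        state["ALUOut"] = (state["A"] + sign_extend16(ir)) & MASK
    elif kind == "rtype":
        state["ALUOut"] = op(state["A"], state["B"]) & MASK
    elif kind == "branch":
        if state["A"] == state["B"]:
            state["PC"] = state["ALUOut"]
    elif kind == "jump":
        state["PC"] = (state["PC"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    else:
        raise ValueError(kind)


# Hypothetical beq r1, r2 with offset 3; PC has already been incremented.
regs = [0] * 32
regs[1] = regs[2] = 7
state = {"PC": 0x104, "IR": (4 << 26) | (1 << 21) | (2 << 16) | 3}
decode_step(state, regs)
branch_target = state["ALUOut"]
execute_step(state, "branch")
print(hex(branch_target), hex(state["PC"]))
```

Computing the branch target in step 2, before the instruction type is known, costs nothing because the ALU would otherwise sit idle; step 3 simply ignores ALUOut for non-branches.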

Step 4: memory access

. If the instruction is memory reference
• MDR <= Memory[ALUOut]; // if it is a load
• Memory[ALUOut] <= B; // if it is a store
• Store is complete!
. If the instruction is R-type
• Reg[IR[15:11]] <= ALUOut;
• Now the instruction is complete!

Step 5: register write-back

. Only memory load instruction reaches this step
• Reg[IR[20:16]] <= MDR;
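The register transfers of steps 4 and 5 translate almost line for line into Python. A minimal sketch: the `state` dictionary (IR, B, ALUOut, MDR), the dictionary used as memory, and the convention that `memory_step` returns `True` when the instruction is complete are all assumptions made for this illustration.

```python
def memory_step(state, memory, regs, kind):
    """Step 4. kind is 'load', 'store', or 'rtype'.

    Returns True when the instruction completes in this step.
    """
    ir = state["IR"]
    if kind == "load":
        state["MDR"] = memory[state["ALUOut"]]      # MDR <= Memory[ALUOut]
        return False                                # a load needs step 5
    if kind == "store":
        memory[state["ALUOut"]] = state["B"]        # Memory[ALUOut] <= B
        return True
    if kind == "rtype":
        regs[(ir >> 11) & 0x1F] = state["ALUOut"]   # Reg[IR[15:11]] <= ALUOut
        return True
    raise ValueError(kind)


def writeback_step(state, regs):
    """Step 5: Reg[IR[20:16]] <= MDR (loads only)."""
    regs[(state["IR"] >> 16) & 0x1F] = state["MDR"]


# lw r8, 32(r18) after step 3: the address 1032 is already in ALUOut.
regs = [0] * 32
memory = {1032: 0x11223344}
state = {"IR": (35 << 26) | (18 << 21) | (8 << 16) | 32,
         "B": 0, "ALUOut": 1032, "MDR": 0}

done = memory_step(state, memory, regs, "load")
if not done:
    writeback_step(state, regs)
print(hex(regs[8]))
```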


Multi-cycle datapath & control

Multi-cycle control design
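Since the multi-cycle controls come from an FSM, the five execution steps can be sketched as a small state machine. This is a coarse illustration with one state per step; the state names and the instruction class labels are hypothetical, and a real controller would typically split steps 3 to 5 into separate states per instruction type so that each state fixes its own control outputs.

```python
def next_state(state, kind):
    """Next FSM state for instruction class kind: lw, sw, rtype, beq, or j."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return "EXECUTE"
    if state == "EXECUTE":
        # Branches and jumps complete in step 3.
        return "FETCH" if kind in ("beq", "j") else "MEMORY"
    if state == "MEMORY":
        # Stores and R-type instructions complete in step 4; loads go on.
        return "WRITEBACK" if kind == "lw" else "FETCH"
    if state == "WRITEBACK":
        return "FETCH"
    raise ValueError(state)


def cycles(kind):
    """Count clock cycles from one FETCH state to the next."""
    state, count = "FETCH", 0
    while True:
        state = next_state(state, kind)
        count += 1
        if state == "FETCH":
            return count


for kind in ("lw", "sw", "rtype", "beq", "j"):
    print(kind, cycles(kind))
```

The counts fall between 3 and 5 cycles, matching the range stated for the five execution steps.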

Example: lw, 1st cycle

[Figure: multi-cycle datapath with the control signal values for this cycle]

Example: lw, 2nd cycle

[Figure: multi-cycle datapath with the control signal values for this cycle; registers 18 and 8 are read]


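Example: lw, 3rd cycle

[Figure: multi-cycle datapath with the control signal values for this cycle]

Example: lw, 4th cycle

[Figure: multi-cycle datapath with the control signal values for this cycle]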

Example: lw, 5th cycle

[Figure: multi-cycle datapath with the control signal values for this cycle]

Example: j, 1st cycle

[Figure: multi-cycle datapath with the control signal values for this cycle]


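Example: j, 2nd cycle

[Figure: multi-cycle datapath with the control signal values for this cycle]

Example: j, 3rd cycle

[Figure: multi-cycle datapath with the control signal values for this cycle]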

To wrap up

. From a number of building blocks, we constructed a datapath for a subset of the MIPS instruction set
. First, we analyzed instructions for functional requirements
. Second, we connected building blocks in a way that accommodates instructions
. Third, we kept refining the datapath

To wrap up

. We looked at how an instruction is executed on the datapath in a pictorial way
. Control signals were connected to functional blocks in the datapath
. We analyzed how the execution sequence of an instruction changes the control signals
. We looked at the multi-cycle control scheme in some detail
• Multi-cycle control can be implemented using FSM
