Computer organization

¢ Computer design – an application of digital logic design procedures ¢ Computer = processing unit + memory system ¢ Processing unit = control + ¢ Control = finite state machine ß inputs = machine instruction, datapath conditions ß outputs = register transfer control signals, ALU operation codes ß instruction interpretation = instruction fetch, decode, execute ¢ Datapath = functional units + registers ß functional units = ALU, multipliers, dividers, etc. ß registers = , shifters, storage registers

CSE370 - XI - Computer Organization 1

Structure of a computer

¢ Block diagram view address

Processor read/write Memory System central processing data unit (CPU)

control signals Control Data Path data conditions

instruction unit œ instruction fetch and œ functional units interpretation FSM and registers CSE370 - XI - Computer Organization 2 Registers

¢ Selectively loaded – EN or LD input ¢ Output enable – OE input ¢ Multiple registers – group 4 or 8 in parallel

LD OE D7 Q7 OE asserted causes FF state to be D6 Q6 connected to output pins; otherwise they D5 Q5 are left unconnected (high impedance) D4 Q4 D3 Q3 D2 Q2 LD asserted during a lo-to-hi clock D1 Q1 transition loads new data into FFs D0 CLK Q0

CSE370 - XI - Computer Organization 3

Register transfer

¢ Point-to-point connection MUX MUX MUX MUX ß dedicated wires ß muxes on inputs of each register rs rt rd R4

¢ Common input from ß load enables rs rt rd R4 for each register ß control signals MUX for multiplexer

¢ Common with output enables ß output enables and load rs rt rd R4 enables for each register BUS

CSE370 - XI - Computer Organization 4 Register files

¢ Collections of registers in one package ß two-dimensional array of FFs ß address used as index to a particular word ß can have separate read and write addresses so can do both at same time ¢ 4 by 4 ß 16 D-FFs ß organized as four words of four bits each ß write-enable (load) RE RB ß read-enable (output enable) RA WE Q3 WB Q2 WA Q1 Q0 D3 D2 D1 D0

CSE370 - XI - Computer Organization 5

Memories

¢ Larger collections of storage elements ß implemented not as FFs but as much more efficient latches ß high-density memories use 1 to 5 switches (transitors) per memory bit ¢ Static RAM – 1024 words each 4 bits wide ß once written, memory holds forever (not true for denser dynamic RAM) ß address lines to select word (10 lines for 1024 words) ß read enable RD ¢ same as output enable WR ¢ often called chip select ¢ permits connection of many A9 A8 chips into larger array IO3 A7 IO2 A6 ß write enable (same as load enable) IO1 A5 IO0 A4 ß bi-directional data lines A3 ¢ output when reading, input when writing A2 A2 A1 A0 CSE370 - XI - Computer Organization 6 Instruction sequencing

¢ Example – an instruction to add the contents of two registers (Rx and Ry) and place result in a third register (Rz) ¢ Step 1: get the ADD instruction from memory into an ¢ Step 2: decode instruction ß instruction in IR has the code of an ADD instruction ß register indices used to generate output enables for registers Rx and Ry ß register index used to generate load signal for register Rz ¢ Step 3: execute instruction ß enable Rx and Ry output and direct to ALU ß setup ALU to perform ADD operation ß direct result to Rz so that it can be loaded into register

CSE370 - XI - Computer Organization 7

Instruction types

¢ Data manipulation ß add, subtract ß increment, decrement ß multiply ß shift, rotate ß immediate operands ¢ Data staging ß load/store data to/from memory ß register-to-register move ¢ Control ß conditional/unconditional branches in program flow ß subroutine call and return

CSE370 - XI - Computer Organization 8 Elements of the (akainstruction unit)

¢ Standard FSM elements ß state register ß next-state logic ß output logic (datapath/control signalling) ß Moore or synchronous Mealy machine to avoid loops unbroken by FF ¢ Plus additional "control" registers ß instruction register (IR) ß program counter (PC) ¢ Inputs/outputs ß outputs control elements of data path ß inputs from data path used to alter flow of program (test if zero)

CSE370 - XI - Computer Organization 9

Instruction execution

¢ Control state diagram (for each diagram) Reset ß reset ß fetch instruction Init Initialize ß decode Machine ß execute

¢ Instructions partitioned into three classes Fetch Instr. ß branch ß load/store ß register-to-register Load/ XEQ ¢ Different sequence through Branch Store Instr. Register- diagram for each to-Register instruction type Branch Branch Taken Not Taken Incr. PC

CSE370 - XI - Computer Organization 10 Data path (hierarchy)

Cin ¢ Arithmetic circuits constructed in hierarchical and modular fashion Ain FA Sum ß each bit in datapath Bin is functionally identical ß 4-bit, 8-bit, 16-bit, 32-bit Cout

Ain Sum HA Bin Cout HA Cin

CSE370 - XI - Computer Organization 11

Data path (ALU)

¢ ALU block diagram ß input: data and operation to perform ß output: result of operation and status information

A B 16 16

Operation

16

N S Z

CSE370 - XI - Computer Organization 12 Data path (ALU + registers)

¢ Accumulator ß special register ß one of the inputs to ALU ß output of ALU stored back in accumulator ¢ One-address instructions

ß operation and address of one operand 16 ß other operand and destination is accumulator register REG AC ß AC <– AC op Mem[addr] 16 16 ß "single address instructions” (AC implicit operand) OP ¢ Multiple registers ß part of instruction used N 16 to choose register operands Z

CSE370 - XI - Computer Organization 13

Data path (bit-slice)

¢ Bit-slice concept – replicate to build n-bit wide datapaths

CO ALU CI CO ALU ALU CI

AC AC AC

R0 R0 R0

rs rs rs

rt rt rt

rd rd rd

from from from memory memory memory 1 bit wide 2 bits wide CSE370 - XI - Computer Organization 14 Instruction path

¢ Program counter (PC) ß keeps track of program execution ß address of next instruction to read from memory ß may have auto-increment feature or use ALU ¢ Instruction register (IR) ß current instruction ß includes ALU operation and address of operand ß also holds target of jump instruction ß immediate operands ¢ Relationship to data path ß PC may be incremented through ALU ß contents of IR may also be required as input to ALU – immediate operands

CSE370 - XI - Computer Organization 15

Data path (memory interface)

¢ Memory ß separate data and instruction memory () ¢ two address busses, two data busses ß single combined memory (Princeton architecture) ¢ single address bus, single data bus ¢ Separate memory ß ALU output goes to data memory input ß register input from data memory output ß data memory address from instruction register ß instruction register from instruction memory output ß instruction memory address from program counter ¢ Single memory ß address from PC or IR ß memory output to instruction and data registers ß memory input from ALU output

CSE370 - XI - Computer Organization 16 Block diagram of (Harvard)

¢ Register transfer view of Harvard architecture ß which register outputs are connected to which register inputs ß arrows represent data-flow, other are control signals from control FSM

ß two MARs (PC and IR) load ß two MBRs (REG and IR) 16 path ß load control for each register REG AC rd wr 16 16 store path data Data Memory OP (16-bit words) addr N 16 Z

Control 16 FSM

IR PC data 16 16 Inst Memory (8-bit words) OP addr

16

CSE370 - XI - Computer Organization 17

Block diagram of processor (Princeton)

¢ Register transfer view of Princeton architecture ß which register outputs are connected to which register inputs ß arrows represent data-flow, other are control signals from control FSM ß MAR may be a simple multiplexer rather than separate register ß MBR is split in two (REG and IR) load 16 path ß load control for each register REG AC rd wr 16 16 store path data Data Memory OP (16-bit words) addr N 8 Z Control MAR FSM 16

IR PC 16 16

OP

16 CSE370 - XI - Computer Organization 18 A simplified processor data-path and memory

¢ Princeton architecture ¢ Register file ¢ Instruction register ¢ PC incremented through ALU ¢ Modeled after MIPS rt000 (used in 378 textbook by Patterson & Hennessy) ß really a 32-bit machine ß we’ll do a 16-bit version

CSE370 - XI - Computer Organization 19

Processor control

¢ Synchronous Mealy or Moore machine ¢ Multiple

CSE370 - XI - Computer Organization 20 Processor instructions

¢ Three principal types (16 bits in each instruction) type op rs rt rd funct R(egister) 3 3 3 3 4 I(mmediate) 3 3 3 7 J(ump) 3 13 ¢ Some of the instructions

add 0 rs rt rd 0 rd = rs + rt R sub 0 rs rt rd 1 rd = rs - rt and 0 rs rt rd 2 rd = rs & rt or 0 rs rt rd 3 rd = rs | rt slt 0 rs rt rd 4 rd = (rs < rt) lw 1 rs rt offset rt = mem[rs + offset] I sw 2 rs rt offset mem[rs + offset] = rt beq 3 rs rt offset pc = pc + offset, if (rs == rt) addi 4 rs rt offset rt = rs + offset J j 5 target address pc = target address halt 7 - stop execution until reset

CSE370 - XI - Computer Organization 21

Tracing an instruction's execution

¢ Instruction: r3 = r1 + r2 R 0 rs=r1 rt=r2 rd=r3 funct=0 ¢ 1. instruction fetch ß move instruction address from PC to memory address bus ß assert memory read ß move data from memory data bus into IR ß configure ALU to add 1 to PC ß configure PC to store new value from ALUout ¢ 2. instruction decode ß op-code bits of IR are input to control FSM ß rest of IR bits encode the operand addresses (rs and rt) ¢ these go to register file

CSE370 - XI - Computer Organization 22 Tracing an instruction's execution (cont’d)

¢ Instruction: r3 = r1 + r2 R 0 rs=r1 rt=r2 rd=r3 funct=0 ¢ 3. instruction execute ß set up ALU inputs ß configure ALU to perform ADD operation ß configure register file to store ALU result (rd)

CSE370 - XI - Computer Organization 23

Tracing an instruction's execution (cont’d)

¢ Step 1

CSE370 - XI - Computer Organization 24 Tracing an instruction's execution (cont’d)

¢ Step 2

CSE370 - XI - Computer Organization 25 to controller

Tracing an instruction's execution (cont’d)

¢ Step 3

CSE370 - XI - Computer Organization 26 Register-transfer-level description

¢ Control ß transfer data between registers by asserting appropriate control signals ¢ Register transfer notation - work from register to register ß instruction fetch: mabus ← PC; – move PC to memory address bus (PCmaEN, ALUmaEN) memory read; – assert memory read signal (mr, RegBmdEN) IR ← memory; – load IR from memory data bus (IRld) op ← add – send PC into A input, 1 into B input, add (srcA, srcB0, scrB1, op) PC ← ALUout – load result of incrementing in ALU into PC (PCld, PCsel) ß instruction decode: IR to controller values of A and B read from register file (rs, rt) ß instruction execution: op ← add – send regA into A input, regB into B input, add (srcA, srcB0, scrB1, op) rd ← ALUout – store result of add into destination register (regWrite, wrDataSel, wrRegSel)

CSE370 - XI - Computer Organization 27

Register-transfer-level description (cont’d)

¢ How many states are needed to accomplish these transfers? ß data dependencies (where do values that are needed come from?) ß resource conflicts (ALU, busses, etc.) ¢ In our case, it takes three cycles ß one for each step ß all operation within a cycle occur between rising edges of the clock ¢ How do we set all of the control signals to be output by the state machine? ß depends on the type of machine (Mealy, Moore, synchronous Mealy)

CSE370 - XI - Computer Organization 28 Review of FSM timing

fetch decode execute

step 1 step 2 step 3 IR ← mem[PC]; A ← rs rd ← A + B PC ← PC + 1; B ← rt

to configure the data-path to do this here, when do we set the control signals?

CSE370 - XI - Computer Organization 29

FSM controller for CPU (skeletal Moore FSM)

¢ First pass at deriving the state diagram (Moore machine) ß these will be further refined into sub-states

reset instruction fetch

instruction decode

SW ADD J instruction LW execution

CSE370 - XI - Computer Organization 30 FSM controller for CPU (reset and inst. fetch)

¢ Assume Moore machine ß outputs associated with states rather than arcs ¢ Reset state and instruction fetch sequence ¢ On reset (go to Fetch state) ß start fetching instructions ß PC will set itself to zero reset

mabus ← PC; Fetch instruction memory read; fetch IR ← memory data bus; PC ← PC + 1;

CSE370 - XI - Computer Organization 31

FSM controller for CPU (decode)

¢ Operation decode state ß next state branch based on operation code in instruction ß read two operands out of register file ¢ what if the instruction doesn’t have two operands?

Decode instruction branch based on value of decode Inst[15:13] and Inst[3:0]

add

CSE370 - XI - Computer Organization 32 FSM controller for CPU (instruction execution)

¢ For add instruction ß configure ALU and store result in register

rd ← A + B

ß other instructions may require multiple cycles

instruction add execution

CSE370 - XI - Computer Organization 33

FSM controller for CPU (add instruction)

¢ Putting it all together and closing the loop ß the famous reset instruction fetch instruction Fetch fetch decode execute cycle instruction Decode decode

instruction add execution

CSE370 - XI - Computer Organization 34 FSM controller for CPU

¢ Now we need to repeat this for all the instructions of our processor ß fetch and decode states stay the same ß different execution states for each instruction ¢ some may require multiple states if available register transfer paths require sequencing of steps

CSE370 - XI - Computer Organization 35