Computer Organization Structure of a Computer

❚ Computer design as an application of digital logic design procedures ❚ Block diagram view address

❚ Computer = processing unit + memory system read/write Memory System central processing data ❚ Processing unit = control + unit (CPU) ❚ Control = finite state machine ❙ Inputs = machine instruction, datapath conditions ❙ Outputs = register transfer control signals, ALU operation control signals codes Control Data Path ❙ Instruction interpretation = instruction fetch, decode, data conditions execute ❚ Datapath = functional units + registers ❙ Functional units = ALU, multipliers, dividers, etc. – instruction fetch and – functional units interpretation FSM ❙ Registers = program , shifters, storage registers and registers

CS 150 - Spring 2004 – Lec 11: Computer Org I - 1 CS 150 - Spring 2004 – Lec 11: Computer Org I - 2

Registers Register Transfer

❚ Selectively loaded – EN or LD input ❚ Point-to-point connection MUX MUX MUX MUX ❚ Output enable – OE input ❙ Dedicated wires ❙ Muxes on inputs of rs rt rd R4 ❚ Multiple registers – group 4 or 8 in parallel each register ❚ Common input from ❙ Load enables LD OE for each register rs rt rd R4 D7 Q7 OE asserted causes FF state to be D6 Q6 connected to output pins; otherwise they ❙ Control signals D5 Q5 are left unconnected (high impedance) MUX D4 Q4 for multiplexer D3 Q3 D2 Q2 LD asserted during a lo-to-hi clock D1 Q1 transition loads new data into FFs ❚ Common with output enables D0 CLK Q0 ❙ Output enables and load enables for each register rs rt rd R4

BUS

CS 150 - Spring 2004 – Lec 11: Computer Org I - 3 CS 150 - Spring 2004 – Lec 11: Computer Org I - 4

Register Files Memories

❚ Larger Collections of Storage Elements ❚ Collections of registers in one package ❙ Implemented not as FFs but as much more efficient latches ❙ Two-dimensional array of FFs ❙ High-density memories use 1-5 (transitors) per bit ❙ Address used as index to a particular word ❙ Separate read and write addresses so can do both at same time ❚ Static RAM – 1024 words each 4 bits wide ❙ Once written, memory holds forever (not true for denser ❚ 4 by 4 dynamic RAM) ❙ 16 D-FFs ❙ Address lines to select word (10 lines for 1024 words) ❙ Organized as four words of four bits each ❙ Read enable RD RE ❘ Same as output enable WR ❙ Write-enable (load) RB RA ❘ Often called chip select A9 ❙ Read-enable (output enable) A8 WE Q3 ❘ Permits connection of many IO3 A7 IO2 WB Q2 chips into larger array A6 WA Q1 IO1 A5 IO0 Q0 ❙ Write enable (same as load enable) A4 D3 ❙ Bi-directional data lines A3 D2 A2 D1 ❘ output when reading, input when writing A2 D0 A1 A0 CS 150 - Spring 2004 – Lec 11: Computer Org I - 5 CS 150 - Spring 2004 – Lec 11: Computer Org I - 6 Instruction Sequencing Instruction Types

❚ Example – an instruction to add the contents of two ❚ Data Manipulation registers (Rx and Ry) and place result in a third ❙ Add, subtract register (Rz) ❙ Increment, decrement ❚ Step 1: Get the ADD instruction from memory into an ❙ Multiply ❙ Shift, rotate ❙ Immediate operands ❚ Step 2: Decode instruction ❙ Instruction in IR has the code of an ADD instruction ❚ Data Staging ❙ Register indices used to generate output enables for ❙ Load/store data to/from memory registers Rx and Ry ❙ Register-to-register move ❙ Register index used to generate load signal for register Rz ❚ Control ❚ Step 3: execute instruction ❙ Conditional/unconditional branches in program flow ❙ Enable Rx and Ry output and direct to ALU ❙ Subroutine call and return ❙ Setup ALU to perform ADD operation ❙ Direct result to Rz so that it can be loaded into register CS 150 - Spring 2004 – Lec 11: Computer Org I - 7 CS 150 - Spring 2004 – Lec 11: Computer Org I - 8

Elements of the (aka Instruction Unit) Instruction Execution

❚ Control State Diagram (for each diagram) ❚ Standard FSM Elements Reset ❙ Reset ❙ State register ❙ Fetch instruction Init ❙ Next-state logic Initialize ❙ Decode Machine ❙ Output logic (datapath/control signaling) ❙ Execute ❙ Moore or synchronous Mealy machine to avoid loops unbroken Fetch by FF ❚ Instructions partitioned Instr. into three classes ❚ Plus Additional ”Control" Registers ❙ Branch ❙ Instruction register (IR) Load/ XEQ ❙ Load/store Branch Store Instr. ❙ (PC) Register- ❙ Register-to-register to-Register ❚ Inputs/Outputs Branch Branch ❚ Different sequence Taken Not Taken ❙ Outputs control elements of data path Incr. through diagram for PC ❙ Inputs from data path used to alter flow of program (test if each instruction type zero)

CS 150 - Spring 2004 – Lec 11: Computer Org I - 9 CS 150 - Spring 2004 – Lec 11: Computer Org I - 10

Data Path (Hierarchy) Data Path (ALU)

❚ Arithmetic circuits constructed in hierarchical and ❚ ALU Block Diagram Cin iterative fashion ❙ Input: data and operation to perform ❙ each bit in datapath is ❙ Output: result of operation and status information Ain FA Sum functionally identical Bin ❙ 4-bit, 8-bit, 16-bit, 32-bit Cout AB 16 16 Ain Sum HA Bin Cout HA Operation Cin

16

N SZ

CS 150 - Spring 2004 – Lec 11: Computer Org I - 11 CS 150 - Spring 2004 – Lec 11: Computer Org I - 12 Data Path (ALU + Registers) Data Path (Bit-slice)

❚ Accumulator ❚ Bit-slice concept: iterate to build n-bit wide datapaths ❙ Special register ❙ One of the inputs to ALU ❙ Output of ALU stored back in accumulator CO ALU CI CO ALU ALU CI ❚ One-address instructions

❙ Operation and address of one operand 16 AC AC AC ❙ Other operand and destination is accumulator register REG AC R0 R0 R0

❙ AC <– AC op Mem[addr] 16 16 rs rs rs ❙ ”Single address instructions” OP rt rt rt (AC implicit operand) rd rd rd ❚ Multiple registers N 16 from from from ❙ Part of instruction used Z memory memory memory to choose register operands 1 bit wide 2 bits wide

CS 150 - Spring 2004 – Lec 11: Computer Org I - 13 CS 150 - Spring 2004 – Lec 11: Computer Org I - 14

Instruction Path Data Path (Memory Interface)

❚ Memory ❚ Program Counter ❙ Separate data and instruction memory () ❙ Keeps track of program execution ❘ Two address busses, two data busses ❙ Address of next instruction to read from memory ❙ Single combined memory (Princeton architecture) ❙ May have auto-increment feature or use ALU ❘ Single address bus, single data bus ❚ Instruction Register ❚ Separate memory ❙ Current instruction ❙ ALU output goes to data memory input ❙ Includes ALU operation and address of operand ❙ Register input from data memory output ❙ Data memory address from instruction register ❙ Also holds target of jump instruction ❙ Instruction register from instruction memory output ❙ Immediate operands ❙ Instruction memory address from program counter ❚ Relationship to Data Path ❚ Single memory ❙ PC may be incremented through ALU ❙ Address from PC or IR ❙ Contents of IR may also be required as input to ALU ❙ Memory output to instruction and data registers ❙ Memory input from ALU output CS 150 - Spring 2004 – Lec 11: Computer Org I - 15 CS 150 - Spring 2004 – Lec 11: Computer Org I - 16

Block Diagram of Processor Block Diagram of Processor

❚ Register Transfer View of Princeton Architecture ❚ Register transfer view of Harvard architecture ❙ Which register outputs are connected to which register inputs ❙ Which register outputs are connected to which register inputs ❙ Arrows represent data-flow, other are control signals from ❙ Arrows represent data-flow, other are control signals from control FSM load control FSM load 16 path 16 path ❙ MAR may be a simple multiplexer ❙ Two MARs (PC and IR) rather than separate register REG AC rd wr REG AC rd wr 16 16 store ❙ Two MBRs (REG and IR) 16 16 store ❙ MBR is split in two path data path data OP Data Memory ❙ Load control for each register OP Data Memory (REG and IR) (16-bit words) (16-bit words) addr addr ❙ Load control N 8 N 16 Z Z for each register Control MAR FSM Control 16 FSM 16 IR PC IR PC data 16 16 16 16 Inst Memory (8-bit words) OP OP addr

16 16

CS 150 - Spring 2004 – Lec 11: Computer Org I - 17 CS 150 - Spring 2004 – Lec 11: Computer Org I - 18 A simplified Processor Data-path and Memory Processor Control

❚ Princeton architecture memory has only 255 words ❚ Synchronous Mealy machine with a display on the last one ❚ Register file ❚ Multiple ❚ Instruction register ❚ PC incremented through ALU ❚ Modeled after MIPS rt000 (used in 61C textbook by Patterson & Hennessy) ❙ Really a 32 bit machine ❙ We’ll do a 16 bit

version CS 150 - Spring 2004 – Lec 11: Computer Org I - 19 CS 150 - Spring 2004 – Lec 11: Computer Org I - 20

Processor Instructions Tracing an Instruction's Execution

❚ Three principal types (16 bits in each instruction) ❚ Instruction: r3 = r1 + r2 type op rs rt rd funct R 0 rs=r1 rt=r2 rd=r3 funct=0 R(egister) 33334 I(mmediate) 3337 ❚ 1. Instruction fetch J(ump) 313 ❙ Move instruction address from PC to memory address bus ❚ Some of the instructions ❙ Assert memory read add 0 rs rt rd 0 rd = rs + rt ❙ Move data from memory data bus into IR R sub 0 rs rt rd 1 rd = rs - rt ❙ Configure ALU to add 1 to PC and 0 rs rt rd 2 rd = rs & rt or 0 rs rt rd 3 rd = rs | rt ❙ Configure PC to store new value from ALUout slt 0 rs rt rd 4 rd = (rs < rt) lw 1 rs rt offset rt = mem[rs + offset] ❚ 2. Instruction decode I sw 2 rs rt offset mem[rs + offset] = rt ❙ Op-code bits of IR are input to control FSM beq 3 rs rt offset pc = pc + offset, if (rs == rt) addi 4 rs rt offset rt = rs + offset ❙ Rest of IR bits encode the operand addresses (rs and rt) J j 5 target address pc = target address ❘ These go to register file halt 7 - stop execution until reset

CS 150 - Spring 2004 – Lec 11: Computer Org I - 21 CS 150 - Spring 2004 – Lec 11: Computer Org I - 22

Tracing an Instruction's Execution Tracing an Instruction's Execution (cont’d) (cont’d)

❚ Instruction: r3 = r1 + r2 ❚ Step 1 R 0 rs=r1 rt=r2 rd=r3 funct=0 ❚ 3. Instruction execute ❙ Set up ALU inputs ❙ Configure ALU to perform ADD operation ❙ Configure register file to store ALU result (rd)

CS 150 - Spring 2004 – Lec 11: Computer Org I - 23 CS 150 - Spring 2004 – Lec 11: Computer Org I - 24 Tracing an Instruction's Execution Tracing an Instruction's Execution (cont’d) (cont’d)

❚ Step 2 ❚ Step 3

CS 150 - Spring 2004 – Lec 11: Computer Org I - 25 to controller CS 150 - Spring 2004 – Lec 11: Computer Org I - 26

Register-Transfer-Level Description Register-Transfer-Level Description (cont’d)

❚ Control ❚ How many states are needed to accomplish these ❙ Transfer data btwn registers by asserting appropriate control signals transfers? ❚ Register transfer notation: work from register to register ❙ Data dependencies (where do values that are needed come from?) ❙ Instruction fetch: mabus ← PC; – move PC to memory address bus (PCmaEN, ALUmaEN) ❙ Resource conflicts (ALU, busses, etc.) memory read; – assert memory read signal (mr, RegBmdEN) IR ← memory; – load IR from memory data bus (IRld) ❚ In our case, it takes three cycles op ← add – send PC into A input, 1 into B input, add ❙ One for each step (srcA, srcB0, scrB1, op) PC ← ALUout – load result of incrementing in ALU into PC (PCld, PCsel) ❙ All operation within a cycle occur between rising edges of the clock ❙ Instruction decode: ❚ How do we set all of the control signals to be output by the IR to controller values of A and B read from register file (rs, rt) state machine? ❙ Instruction execution: ❙ Depends on the type of machine (Mealy, Moore, synchronous Mealy) op ← add – send regA into A input, regB into B input, add (srcA, srcB0, scrB1, op) rd ← ALUout – store result of add into destination register (regWrite, wrDataSel, wrRegSel)

CS 150 - Spring 2004 – Lec 11: Computer Org I - 27 CS 150 - Spring 2004 – Lec 11: Computer Org I - 28

FSM Controller for CPU (skeletal Moore Review of FSM Timing FSM)

❚ First pass at deriving the state diagram (Moore machine) ❙ These will be further refined into sub-states reset decode execute instruction fetch fetch

step 1 step 2 step 3 IR ← mem[PC]; A ← rs rd ← A + B instruction PC ← PC + 1; B ← rt decode

SW LW ADD J instruction to configure the data-path to do this here, execution when do we set the control signals?

CS 150 - Spring 2004 – Lec 11: Computer Org I - 29 CS 150 - Spring 2004 – Lec 11: Computer Org I - 30 FSM Controller for CPU (reset and instruction fetch) FSM Controller for CPU (decode)

❚ Assume Moore machine ❚ Operation Decode State ❙ Outputs associated with states rather than arcs ❙ Next state branch based on operation code in instruction ❚ Reset state and instruction fetch sequence ❙ Read two operands out of register file ❘ What if the instruction doesn’t have two operands? ❚ On reset (go to Fetch state) ❙ Start fetching instructions ❙ PC will set itself to zero reset instruction Decode decode Fetch instruction branch based on value of mabus ← PC; fetch Inst[15:13] and Inst[3:0] memory read; IR ← memory data bus; PC ← PC + 1; add

CS 150 - Spring 2004 – Lec 11: Computer Org I - 31 CS 150 - Spring 2004 – Lec 11: Computer Org I - 32

FSM Controller for CPU (Instruction FSM Controller for CPU (Add Execution) Instruction)

❚ For add instruction ❚ Putting it all together ❙ Configure ALU and store result in register and closing the loop ❙ the famous reset rd ← A + B instruction instruction fetch Fetch fetch ❙ Other instructions may require multiple cycles decode execute cycle Decode instruction instruction decode add execution

instruction add execution

CS 150 - Spring 2004 – Lec 11: Computer Org I - 33 CS 150 - Spring 2004 – Lec 11: Computer Org I - 34

FSM Controller for CPU

❚ Now we need to repeat this for all the instructions of our processor ❙ Fetch and decode states stay the same ❙ Different execution states for each instruction ❘ Some may require multiple states if available register transfer paths require sequencing of steps

CS 150 - Spring 2004 – Lec 11: Computer Org I - 35