Second Instruction: addi $rt, $rs, imm Third Instruction: lw $rt, imm($rs)

sign extension (sx) unit + + 4 4

a ? P Insn Register rs P Insn Register ? C Mem File C Mem File ? dMem ? s1 s2 d s1 s2 d ? S S X Extended(imm) X ?

II--typetype Op(6)Op(6) rs(5)rs(5) rt(5)rt(5) Immed(16) II--typetype Op(6)Op(6) rs(5)rs(5) rt(5)rt(5) Immed(16)

• Destination register can now be either rd or rt • Add data memory, address is ALU output (rs+imm) • Add sign extension unit and mux into second ALU input • Add register write data mux to select memory output or ALU output

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 22 from Roth ECE152 23

Fourth Instruction: sw $rt, imm($rs) Fifth Instruction: beq $1,$2,target

<< 2 + + 4 4

a a P Insn Register Data P Insn Register z Data C Mem File ? dMem C Mem File dMem s1 s2 d s1 s2 d S S X X

II--typetype Op(6)Op(6) rs(5)rs(5) rt(5)rt(5) Immed(16) II--typetype Op(6)Op(6) rs(5)rs(5) rt(5)rt(5) Immed(16)

• Add path from second input register to data memory data input • Add left shift unit (why?) and to compute PC-relative branch target • Disable RegFile’s WE signal • Add mux to do what?

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 24 from Roth ECE152 25 Sixth Instruction: j Seventh, Eight, Ninth Instructions

• Are these the paths we would need for all instructions? << 2 sll $1,$2,4 // shift left logical << + 2 • Like an arithmetic operation, but need a shifter too 4 slt $1,$2,$3 // set less than (slt) • Like subtract, but need to write the condition bits, not the result a P Insn Register Data • Need zero extension unit for condition bits C Mem File dMem • Need additional input to register write data mux s1 s2 d jal absolute_target // jump and link S X • Like a jump, but also need to write PC+4 into $ra ($31) • Need path from PC+4 adder to register write data mux

JJ--typetype Op(6)Op(6) Immed(26) • Need to be able to specify $31 as an implicit destination jr $31 // jump register • Add shifter to compute left shift of 26-bit immediate • Like a jump, but need path from register read to PC write mux • Add additional PC input mux for jump target

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 26 from Roth ECE152 27

Clock Timing This Unit: Design

• Must deliver clock(s) to avoid races Application • components and timing OS • Can’t write and read same value at same clock edge • Registers and register files Firmware • Particularly a problem for RegFile and Memory • Memories (RAMs)

• May create multiple clock edges (from single input clock) CPU I/O • Clocking strategies by using buffers (to delay clock) and inverters Memory • Mapping an ISA to a datapath • Control Digital Circuits • Exceptions Gates & Transistors

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 28 from Roth ECE152 29 What Is Control? Example: Control for add

BR BR=0 << << 2 2 << JP << JP=0 + 2 + 2 4 4

a a P Insn Register Data P Insn Register Data C Mem File dMem Rwd C Mem File dMem Rwd=0 s1 s2 d s1 s2 d Rwe S ALUop S ALUop=0 X DMwe Rwe=1 X DMwe=0 Rdst ALUinB Rdst=1 ALUinB=0 • 9 signals control flow of data through this datapath • MUX selectors, or register/memory write enable signals • Datapath of current has 100s of control signals

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 30 from Roth ECE152 31

Example: Control for sw Example: Control for beq $1,$2,target

BR=0 BR=1 << << 2 2 << JP=0 << JP=0 + 2 + 2 4 4

a a P Insn Register Data P Insn Register Data C Mem File dMem Rwd=X C Mem File dMem Rwd=X s1 s2 d s1 s2 d S ALUop=0 S ALUop=1 Rwe=0 X DMwe=1 Rwe=0 X DMwe=0 Rdst=X ALUinB=1 Rdst=X ALUinB=0 • Difference between a sw and an add is 5 signals • Difference between a store and a branch is only 4 signals • 3 if you don’t count the X (“don’t care”) signals

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 32 from Roth ECE152 33 How Is Control Implemented? Implementing Control

BR • Each instruction has a unique set of control signals << 2 • Most signals are function of opcode << JP + 2 • Some may be encoded in the instruction itself 4 • E.g., the ALUop signal is some portion of the MIPS Func field + Simplifies controller implementation a P Insn Register Data – Requires careful ISA design C Mem File dMem Rwd s1 s2 d • Options for implementing control Rwe S ALUop X DMwe 1. Use instruction type to look up control signals in a table Rdst ALUinB 2. Design FSM whose outputs are control signals • Either way, goal is same: turn instruction into control signals Control?

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 34 from Roth ECE152 35

Control Implementation: ROM ROM vs.

• ROM (read only memory) : like a RAM but unwritable • A control ROM is fine for 6 insns and 9 control signals • Bits in data words are control signals • A real machine has 100+ insns and 300+ control signals • Lines indexed by • Even “RISC”s have lots of instructions • 30,000+ control bits (~4KB) • Example: ROM control for our simple datapath – Not huge, but hard to make fast • Control must be faster than datapath

BR JP ALUinB ALUop DMwe Rwe Rdst Rwd add 0 0 0 0 0 1 1 0 • Alternative: combinational logic addi 0 0 1 0 0 1 1 0 • ECE 52 strikes back! opcode lw 0 0 1 0 0 1 0 1 • Exploits observation: many signals have few 1s or few 0s sw 0 0 1 0 1 0 0 0 beq 1 0 0 1 0 0 0 0 j 0 1 0 0 0 0 0 0

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 36 from Roth ECE152 37 Control Implementation: Combinational Logic Datapath and Control Timing

• Example: combinational logic control for our simple + datapath 4

a P Insn Register Data opcode add C Mem File dMem addi s1 s2 d lw S sw X beq j Control (ROM or combinational logic)

BR JP DMwe Rwe Rwd Rdst ALUop ALUinB Read IMem Read Registers Read DMEM Write DMEM (Read Control ROM) Write Registers Write PC

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 38 from Roth ECE152 39

This Unit: Exceptions

Application • Datapath components and timing • Exceptions and interrupts OS • Registers and register files • Infrequent (exceptional!) events Compiler Firmware • Memories (RAMs) • I/O, divide-by-0, illegal instruction, page fault, protection fault, ctrl-C, ctrl-Z, timer CPU I/O • Clocking strategies • Mapping an ISA to a datapath Memory • Handling requires intervention from operating system • Control Digital Circuits • End program: divide-by-0, protection fault, illegal insn, ^C • Exceptions • Fix and restart program: I/O, page fault, ^Z, timer Gates & Transistors

• Handling should be transparent to application code • Don’t want to (can’t) constantly check for these using insns • Want “Fix and restart” equivalent to “never happened”

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 40 from Roth ECE152 41 Exception Handling MIPS Exception Handling

• What does exception handling look like to software? • MIPS uses registers to hold during exception handling • When exception happens… • These registers live on “ 0” • $14 : EPC (holds PC of user program during exception handling) • Control transfers to OS at pre-specified exception handler address • $13 : exception type (SYSCALL, overflow, etc.) • OS has privileged access to registers user processes do not see • $8 : virtual address (that produced page/protection fault) • These registers hold information about exception • $12 : exception mask (which exceptions trigger OS) • Cause of exception (e.g., page fault, arithmetic overflow) • Exception registers accessed using two privileged • Other exception info (e.g., address that caused page fault) instructions mfc0 , mtc0 • PC of application insn to return to after exception is fixed • Privileged = user can’t execute them • OS uses privileged (and non-privileged) registers to do its “thing” • mfc0: move (register) from coprocessor 0 (to user reg) • OS returns control to user application • mtc0: move (register) to coprocessor 0 (from user reg) • Privileged instruction rfe restores user mode • Same mechanism available programmatically via SYSCALL • Kernel executes this instruction to restore user program

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 42 from Roth ECE152 43

Implementing Exceptions Datapath with Support for Exceptions

• Why do architects care about exceptions? PSRs • Because we use datapath and control to implement them P << Co-procesor PCwC • More precisely… to implement aspects of exception handling S 2 << R • Recognition of exceptions + 2 4 • Transfer of control to OS PSRr CRwd CRwe • Privileged OS mode A a D P Insn I Register Data ALUinAC O C Mem R File dMem s1 s2 d B S X

• Co-processor register (CR) file needn’t be implemented as RF • Independent registers connected directly to pertinent muxes • PSR (processor ) : in privileged mode? © 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 44 from Roth ECE152 45 Summary “Single-Cycle” Performance

• We now know how to build a fully functional processor • Useful metric: (CPI) • But … + Easy to calculate for single-cycle processor: CPI = 1 • We’re still treating memory as a black box (actually two green • Seconds/program = (insns/program) * 1 CPI * (N seconds/cycle) boxes, to be precise) • ICQ: How many cycles/second in 3.8 GHz processor? • Our fully functional processor is slow. Really, really slow. – Slow! • Clock period must be elongated to accommodate longest operation • In our datapath: lw • Goes through five structures in series: insn mem, register file (read), ALU, data mem, register file again (write) • No one will buy a machine with a slow clock • Not even your grandparents!

© 2012 Daniel J. Sorin © 2012 Daniel J. Sorin from Roth ECE152 46 from Roth ECE152 47

This Unit: Processor Design

Application • Datapath components and timing OS • Registers and register files Compiler Firmware • Memories (RAMs)

CPU I/O • Clocking strategies Memory • Mapping an ISA to a datapath • Control Digital Circuits Gates & Transistors

Next up: Pipelining

© 2012 Daniel J. Sorin from Roth ECE152 48