EE577b Register File

By Joong-Seok Moon

Register File

• A set of registers that store data
• Consists of a small array of static memory cells
• Smallest size and fastest access time in the memory hierarchy (Register File – On-chip Cache – Off-chip Cache – Main Memory – Disk)
• Frequently used by microprocessors and DSPs
• Permits multiple read and write ports
  – 2-read/1-write: Scalar (e.g. DLX)
  – 8-read/4-write: Super-scalar microprocessor (often more than that), VLIW
  – 1-read/1-write: DSP data/coefficient memory

Register File Single-ended Read/Write

• Single-ended 2-read/1-write ports (slow write)
• Fully static, no precharge required
• NMOS of I1 should be sized bigger because node A will be at Vdd-Vth during write operation
• I2 should be weak (N1-N2 change the data)
• I3: buffer for the storage node

Register File Cell Single-ended Read/Dual-ended Write

[Figure: cell schematic. Write bitlines bitWR and its complement, read bitlines bitRD1 and bitRD2, read wordlines wordRD1 and wordRD2, write enable wrEN, inverters I1 and I2, transistors N1-N6, storage nodes A and B]

• Dual-ended write: either A or B pulled low
  – But actually single-ended operation (this is OK: write is usually much faster than read)
• Precharge required for read
  – B=1: discharge bitRD (slow read for large bitline cap)
  – B=0: hold precharge value
• No buffer inside cell
  – Sense amplifier or skewed inverter to amplify slow discharge
• Two write bitline drivers
  – bitWR and its complement

Register File Cell Single-ended Read/Dual-ended Write

• Further optimization
  – Only one write bitline driver
• bitWR=1
  – N4, N6 on: Node A pulled down
  – N5 on: Node B pulled up
  – True dual-ended write
• bitWR=0
  – N5 on: Node B pulled down
  – One transistor on pull-down path
  – Single-ended write with enhanced speed

Write Operation Static

[Figure: static decoder. Gates on address inputs A0, A1, A2 driving wordline0, wordline1, ..., wordlineN-1]

• Static N to 2^N decoder
  – wordline0 = A0b·A1b·A2b…A(N-1)b
  – More than 32 registers: multi-level decoder is desired
  – Works well with edge-triggered flip-flops for address inputs
  – Can we connect decoder output directly to drive wordline?
    • Extremely dangerous, why?
    • Glitches
    • Read might be ok, but write can be problematic
    • Put latches at the decoder output
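
The glitch hazard of an unlatched static decoder can be sketched behaviourally. The Python below is only an illustration: the function names are hypothetical, and the skew model (differing address bits arrive one at a time, lowest bit first) is an assumption chosen to make the effect visible.

```python
def decode(address, n_bits):
    """Static N-to-2^N decoder: exactly one wordline is high."""
    return [1 if address == k else 0 for k in range(2 ** n_bits)]


def wordlines_during_transition(old, new, n_bits):
    """Wordlines that may be selected while the address changes.

    Assumption (illustrative only): bits that differ between the old and
    new address switch one at a time, in ascending bit order.
    """
    selected = {old}
    current = old
    for bit in range(n_bits):
        mask = 1 << bit
        if (old ^ new) & mask:
            current ^= mask          # this address bit has now arrived
            selected.add(current)    # the decoder briefly sees this address
    return selected


print(decode(0b011, 3))
print(sorted(wordlines_during_transition(0b011, 0b100, 3)))
```

Under this skew model, a change from address 3 to address 4 also selects wordlines 2 and 0 on the way. A spurious read is harmless, but a spurious write would corrupt those registers, which is the case for latching the decoder outputs.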

[Figure: four gates, each taking address inputs AN-1 and AN-2]

Address Decoder Dynamic

• Dynamic N to 2^N decoder
  – Domino N-input AND gate
  – Charge sharing problem for large N
  – Keeper may be required
  – Long NMOS chain for large N
  – No glitch at the output
  – Need qualified address input
    • Two-phase latch
    • Dynamic flops

Address Decoder Dynamic (Revised)

• Revised dynamic N to 2^N decoder
  – Make NMOS half size
  – Reverse input sequence
  – Same active strength
  – Charge sharing reduced

[Figure: Word[N-1] driven by two parallel NMOS stacks of width W/2, one with inputs ordered A0, A1, A2, A3 and the other ordered A3, A2, A1, A0, plus a wordEN input]
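
A first-order estimate of why the revised decoder reduces charge sharing, as a Python sketch. Everything numeric here is an assumption: internal stack nodes start at 0 V, each internal node's capacitance scales with transistor width, there is no keeper, and the capacitance values are hypothetical. The helper names are illustrative.

```python
from itertools import product


def exposed_nodes(order, inputs):
    """Internal nodes of one NMOS stack that connect to the output node.

    `order` lists inputs from the output node downward. The node below a
    transistor is exposed only if that transistor and all above it are on.
    """
    count = 0
    for name in order[:-1]:      # below the last device is the foot, not counted
        if not inputs[name]:
            break
        count += 1
    return count


def output_voltage(vdd, c_out, c_node, stacks, inputs):
    """Output voltage after charge sharing with the exposed internal nodes."""
    shared = sum(exposed_nodes(order, inputs) * c_node * width
                 for order, width in stacks)
    return vdd * c_out / (c_out + shared)


def worst_case(vdd, c_out, c_node, stacks, names):
    """Lowest output voltage over all unselected input patterns."""
    worst = vdd
    for bits in product([0, 1], repeat=len(names)):
        if all(bits):
            continue             # selected wordline: discharge is intended
        inputs = dict(zip(names, bits))
        worst = min(worst, output_voltage(vdd, c_out, c_node, stacks, inputs))
    return worst


names = ["A0", "A1", "A2", "A3"]
original = [(names, 1.0)]                       # one stack, width W
revised = [(names, 0.5), (names[::-1], 0.5)]    # two W/2 stacks, reversed order

v_original = worst_case(1.0, 4.0, 1.0, original, names)
v_revised = worst_case(1.0, 4.0, 1.0, revised, names)
print(round(v_original, 3), round(v_revised, 3))
```

With these assumed values the worst-case output droops to about 0.57 Vdd for the single stack and about 0.73 Vdd for the two reversed half-width stacks. In this model, a pattern that exposes many nodes in one stack exposes few in the other, and each exposed node has half the capacitance.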

Write Driver

• Tri-state buffer
  – Write operation requires full-swing bitline

Read-Out Circuitry

• Small bitline capacitance
• Single-ended sensing
• May not need sense amplifier
  – Skewed buffer is fine for precharged scheme
  – Sensing value only when bitline goes to 0
  – Latching old value (latch and sensing)

Read-Out Circuitry

• Complete static circuit
  – Data is sensed by I1
  – During read
    • Nl is off
    • Pf is on only if the bitline is at Vdd-Vth (read 1)
    • Pf charges the bitline back to Vdd
    • I1 must be sized with higher beta
  – After read
    • RE=0, Nl is on
    • Latch is formed through I1 and I2

Architectural Consideration

• Pipelined

    Add R1,R2,R3    F D E M W
                      F D E M W
                        F D E M W
    Sub R4,R1,R2          F D E M W

• In the same cycle, read value just written
  – DLX assumes write in high-phase of clock and read in low-phase of clock: implicit bypassing
  – But only half of the clock cycle is allowed for read
  – Explicit bypassing: compare read and write addresses
    • If same: bypass write data to read output directly without read or discard read value
    • If different: normal read

Architectural Consideration
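
Explicit bypassing, as described on the previous slide, can be sketched as a behavioural model. This is a minimal Python illustration, not a circuit description: the class name, method name, and register values are hypothetical.

```python
class BypassedRegisterFile:
    """Behavioural 2-read/1-write register file with write-to-read bypass."""

    def __init__(self, n_regs=32):
        self.regs = [0] * n_regs

    def cycle(self, read_addrs, write_addr=None, write_data=None):
        """Return the read values for one cycle, then commit the write."""
        out = []
        for addr in read_addrs:
            if write_addr is not None and addr == write_addr:
                out.append(write_data)         # addresses match: forward write data
            else:
                out.append(self.regs[addr])    # addresses differ: normal read
        if write_addr is not None:
            self.regs[write_addr] = write_data
        return out


rf = BypassedRegisterFile()
rf.regs[2], rf.regs[3] = 7, 5
# Add R1,R2,R3 writes back 12 in the same cycle that Sub R4,R1,R2 reads R1 and R2.
operands = rf.cycle([1, 2], write_addr=1, write_data=12)
print(operands)
```

Here the read of R1 returns the value being written (12) rather than the stale array content, while R2 is read normally. The address comparison per read port is the comparator cost the slide refers to.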

• Read caching

    Add R1,R2,R3
    Sub R4,R1,R2

  – Compare read addresses
  – If same, skip the read and use the cached value
  – As with write-read bypass, comparators are required
  – Makes sense only if comparators consume less power than the register file
• Precharge for 0 or 1 value?
  – In DSP, quantitative study shows that values contain more 0s than 1s
  – For precharged register file design,
    • Value in memory = 0: preserve precharge
    • Value in memory = 1: discharge precharged value in bitlines

Some comments
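
On the precharge-polarity question from the previous slide: the choice can be compared by counting bitline discharge events over a set of stored words. The Python sketch below is illustrative only. The function name is hypothetical, and the sample words are made-up small 16-bit values, not data from the study the slide mentions.

```python
def bitline_discharges(words, width, preserved_value):
    """Count bitline discharges when every word is read once.

    `preserved_value` is the stored bit (0 or 1) that leaves the precharged
    bitline untouched; the opposite bit value discharges it.
    """
    total = 0
    mask = (1 << width) - 1
    for word in words:
        ones = bin(word & mask).count("1")
        zeros = width - ones
        total += ones if preserved_value == 0 else zeros
    return total


sample = [3, 12, 0, 7, 1]        # hypothetical 16-bit data values
preserve_zero = bitline_discharges(sample, 16, 0)
preserve_one = bitline_discharges(sample, 16, 1)
print(preserve_zero, preserve_one)
```

For this sample, preserving the precharge on a stored 0 costs 8 discharges against 72 for the opposite polarity. The real ratio depends entirely on the bit statistics of the workload.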

• Many designers choose precharged design over pure static design
  – Skewed inverter for read-out circuit burns lots of power (slow slew rate, reduced voltage level)
  – Precharge time and reading time should not overlap, to avoid short-circuit currents
  – Precharge on -> request read -> precharge off -> ack read -> request precharge -> read off -> …
  – Asynchronous concepts are widely used in register file design
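
The handshake order above can be stepped through in a small Python sketch that checks precharge and read never overlap. The event names are copied from the slide. The signal model is an assumption: precharge is taken to be active between "precharge on" and "precharge off", and the read between "ack read" and "read off".

```python
from itertools import cycle, islice

# Event order as listed on the slide; the loop repeats indefinitely.
SEQUENCE = [
    "precharge on",
    "request read",
    "precharge off",
    "ack read",
    "request precharge",
    "read off",
]


def run(n_events):
    """Step through the handshake and record (event, precharge, read)."""
    precharge = read = False
    trace = []
    for event in islice(cycle(SEQUENCE), n_events):
        if event == "precharge on":
            precharge = True
        elif event == "precharge off":
            precharge = False
        elif event == "ack read":
            read = True          # assumed: read is active from its ack
        elif event == "read off":
            read = False
        assert not (precharge and read), "precharge and read overlap"
        trace.append((event, precharge, read))
    return trace


trace = run(12)
for step in trace[:6]:
    print(step)
```

Each request only changes state once the other phase has been switched off, so the two phases are interlocked by the event order rather than by a shared clock edge.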