EE577b Register File

By Joong-Seok Moon

Register File
• A set of registers that store data
• Consists of a small array of static memory cells
• Smallest size and fastest access time in the memory hierarchy (Register File – On-chip Cache – Off-chip Cache – Main Memory – Disk)
• Frequently used by microprocessors and DSPs
• Permits multiple read and write ports
  – 2-read/1-write: scalar microprocessor (e.g. DLX)
  – 8-read/4-write: superscalar microprocessor (often more than that), VLIW
  – 1-read/1-write: DSP data/coefficient memory

Register File Cell: Single-ended Read/Write
• Single-ended 2-read/1-write ports (slow write)
• Fully static, no precharge required
• NMOS of I1 should be sized bigger because node A only reaches Vdd-Vth during a write operation
• I2 should be weak (N1-N2 change the data)
• I3: buffer for the storage node

Register File Cell: Single-ended Read/Dual-ended Write
[Figure: cell schematic with storage nodes A and B, inverters I1 and I2, transistors N1 to N6, and signals bitWR, complemented bitWR, bitRD1, bitRD2, wordRD1, wordRD2, wrEN]
• Dual-ended write: either A or B is pulled low
  – But actually a single-ended operation (that is OK: write is usually much faster than read)
• Precharge required for read
  – B=1: discharge bitRD (slow read for large bitline capacitance)
  – B=0: hold precharge value
• No buffer inside the cell
  – Sense amplifier or skewed inverter to amplify the slow discharge
• Two write bitline drivers
  – bitWR and its complement

Register File Cell: Single-ended Read/Dual-ended Write
• Further optimization
  – Only one write bitline driver
• bitWR=1
  – N4, N6 on: node A pulled down
  – N5 on: node B pulled up
  – True dual-ended write
• bitWR=0
  – N5 on: node B pulled down
  – One transistor on the pull-down path
  – Single-ended write with enhanced speed

Write Operation

Address Decoder: Static
[Figure: static decoder, one AND gate per wordline on the address inputs A0, A1, A2, ..., A(N-1) and their complements]
• Static N-to-2^N decoder
  – wordline0 = A0b A1b A2b … A(N-1)b (AND of the complemented address bits)
  – More than 32 registers: a multi-level decoder is desirable
  – Works well with edge-triggered flip-flops for address inputs
  – Can we connect the decoder output directly to drive the wordline?
    • Extremely dangerous, why?
    • Glitches
    • Read might be OK, but write can be problematic
    • Put latches at the decoder output

Address Decoder: Dynamic
• Dynamic N-to-2^N decoder
  – Domino N-input AND gate
  – Charge-sharing problem for large N
  – Gate keeper may be required
  – Long NMOS chain for large N
  – No glitch at the output
  – Needs qualified address inputs
    • Two-phase latch
    • Dynamic flops

Address Decoder: Dynamic (Revised)
• Revised dynamic N-to-2^N decoder
  – Make each NMOS half size
  – Reverse the input sequence
  – Same active strength
  – Charge sharing reduced
[Figure: two parallel NMOS stacks of width W/2 driving Word[N-1], with inputs ordered A0, A1, A2, A3 in one stack and A3, A2, A1, A0 in the other, enabled by wordEN]

Write Driver
• Tri-state buffer
  – Write operation requires a full-swing bitline

Read-Out Circuitry
• Small bitline capacitance
• Single-ended sensing
• May not need a sense amplifier
  – Skewed buffer is fine for a precharged scheme
  – Sense the value only when the bitline goes to 0
  – Latch the old value otherwise (latch and sensing)

Read-Out Circuitry
• Complete static circuit
  – Data is sensed by I1
  – During read
    • Nl is off
    • Pf is on only if the bitline is at Vdd-Vth (read 1)
    • Pf charges the bitline back to Vdd
    • I1 must be sized with higher beta
  – After read
    • RE=0, Nl is on
    • Latch is formed through I1 and I2

Architectural Consideration
• Pipelined processor
    Add R1,R2,R3    F D E M W
                      F D E M W
                        F D E M W
    Sub R4,R1,R2          F D E M W
• In the same cycle, read the value just written
  – DLX assumes write in the high phase of the clock and read in the low phase: implicit bypassing
  – But only half of the clock cycle is available for the read
  – Explicit bypassing: compare read and write addresses
    • If same: bypass write data directly to the read output, either skipping the read or discarding the read value
    • If different: normal read

Architectural Consideration
• Read caching
    Add R1,R2,R3
    Sub R4,R1,R2
  – Compare read addresses
  – If same, skip the read and forward the cached value
  – As with write-read bypass, comparators are required
  – Makes sense only if the comparators consume less power than the register file
• Precharge for 0 or 1 value?
  – In DSP, quantitative studies show that data values contain more 0s than 1s
  – For a precharged register file design:
    • Value in memory = 0: preserve the precharge
    • Value in memory = 1: discharge the precharged value on the bitlines

Some Comments
• Many designers choose a precharged design over a pure static design
  – Skewed inverter for the read-out circuit burns a lot of power (slow slew rate, reduced voltage level)
  – Precharge time and reading time should not overlap, to avoid short-circuit currents
  – Precharge on -> request read -> precharge off -> ack read -> request precharge -> read off -> …
  – Asynchronous concepts are widely used in register file design
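
The explicit bypassing described under Architectural Consideration can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not the circuit from the slides: the names `RegisterFile`, `decode`, `cycle`, and `array_reads` are hypothetical. The model assumes that reads and the write are issued in the same clock cycle and that, without bypass, a read returns the array contents from before that cycle's write.

```python
def decode(addr_bits):
    """N-to-2^N decode (hypothetical helper).

    addr_bits[k] is address bit Ak. Wordline i is the AND of every
    address bit or its complement, so exactly one wordline is 1.
    """
    n = len(addr_bits)
    return [
        int(all(addr_bits[k] == ((i >> k) & 1) for k in range(n)))
        for i in range(1 << n)
    ]


class RegisterFile:
    """Behavioral 2-read/1-write register file (illustrative only)."""

    def __init__(self, num_regs=32, bypass=True):
        self.regs = [0] * num_regs
        self.bypass = bypass
        self.array_reads = 0  # number of accesses to the storage array

    def cycle(self, rd_addrs, wr_en=False, wr_addr=0, wr_data=0):
        """Model one clock cycle with simultaneous reads and one write."""
        out = []
        for ra in rd_addrs:
            if self.bypass and wr_en and ra == wr_addr:
                # Addresses match: forward the write data, skip the array read.
                out.append(wr_data)
            else:
                # Addresses differ (or bypass disabled): normal read.
                self.array_reads += 1
                out.append(self.regs[ra])
        if wr_en:
            # The write lands after the reads of this cycle were served.
            self.regs[wr_addr] = wr_data
        return out


rf = RegisterFile()
rf.cycle([], wr_en=True, wr_addr=2, wr_data=7)   # R2 = 7
rf.cycle([], wr_en=True, wr_addr=3, wr_data=5)   # R3 = 5
# Add R1,R2,R3 writes back R1 = 12 in the cycle where Sub R4,R1,R2 reads R1 and R2.
print(rf.cycle([1, 2], wr_en=True, wr_addr=1, wr_data=12))  # [12, 7]
```

With `bypass=False` the same cycle would return the stale value of R1, which is the hazard that the address comparators remove. The `array_reads` counter gives a rough view of how many array accesses the bypass saves; it says nothing about the power of the comparators themselves.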
