<<

Lecture 14: Memory Elements

Mark McDermott Electrical and Engineering The University of Texas at Austin

10/23/18 VLSI-1 Class Notes Memory Elements

§ Memory arrays § SRAMs § Serial Memories § Dynamic memories

Pentium-4 (Willamette) Yellow boxes are memory arrays

10/23/18 VLSI-1 Class Notes Page 2 Memory Arrays

Memory Arrays

Random Access Memory Serial Access Memory Content Addressable Memory (CAM)

Read/Write Memory Read Only Memory (RAM) (ROM) Shift Registers Queues (Volatile) (Nonvolatile)

Serial In Parallel In First In Last In Static RAM Dynamic RAM Parallel Out Serial Out First Out First Out (SRAM) (DRAM) (SIPO) (PISO) (FIFO) (LIFO)

Mask ROM Programmable Erasable Electrically Flash ROM ROM Programmable Erasable (PROM) ROM Programmable (EPROM) ROM (EEPROM)

10/23/18 VLSI-1 Class Notes Page 3 1D Memory Architecture

m bits m bits S S 0 Word 0 0 Word 0 S S 1 Word 1 1 Word 1 S S 2 Word 2 Storage A0 2 Word 2 Storage S S 3 Cell A1 3 Cell

Ak-1

Decoder

n words

S S n-2 Word n-2 n-2 Word n-2 S S n-1 Word n-1 n-1 Word n-1

Input/Output Input/Output n words ® n select Decoder reduces # of inputs

k = log2 n

10/23/18 VLSI-1 Class Notes Page 4 2D Memory Architecture

2k-j bit line word line Aj Aj+1 storage

Ak-1 (RAM) cell

Row Address Row

Row DecoderRow

m2j A 0 selects appropriate word A1 Column Decoder from memory row

Column Address Aj-1 Sense Amplifiers amplifies bit line swing

Read/Write Circuits

Input/Output (m bits)

10/23/18 VLSI-1 Class Notes Page 5 Array Architecture

§ 2n words of 2m bits each § If n >> m, fold by 2k into fewer rows of more columns

bitline conditioning wordlines bitlines row decoder

memory cells: 2n-k rows x 2m+k columns

n-k k column circuitry n column decoder 2m bits § Good regularity – easy to design § Very high density if good cells are used

10/23/18 VLSI-1 Class Notes Page 6 SRAM Block Diagram

Precharge Precharge

2N Addr Address Row Wordline WL Memory Memory Buffer Decoder Driver Cell Cell

Bitlines

2M M Column YSEL# <0> Column <2 -1> Column Chip Select Decoder Select Select Differential Column Mux

Sense Amplifier saout Input Output Write Enable Buffer Buffer

Data_in Data_out

10/23/18 VLSI-1 Class Notes Page 7 SRAM Simulation Cross Section

BLPC#

WL<0> WL Drivers Keepers WL<2 N-1>

Row Decoder Bitcell

Column Decoder Col<2> Col<1> Col<0> SAPC#

Data_out Wrt Driver Dout

<0 SenseAmp

wsel > WE*CS SAE

Data_in

10/23/18 VLSI-1 Class Notes Page 8 6T SRAM Cell

§ Cell size accounts for most of array size – Reduce cell size at expense of complexity § 6T SRAM Cell PG: pass gate – Used in most commercial chips PU: pull-up – Data stored in cross-coupled inverters PD: pull-down § Read: – Precharge bit, bit_b Wordline – Raise wordline PU PU § Write: – Drive data onto bit, bit_b PG PG – Raise wordline PD PD

Bit Bit#

10/23/18 VLSI-1 Class Notes Page 9 Moore’s Law Scaling for Memory

Source: Kelin Kuhn, Intel

10/23/18 VLSI-1 Class Notes Page 10 SRAM Improvements

10/23/18 VLSI-1 Class Notes Page 11 12T SRAM Cell

§ Basic building block: SRAM Cell – Holds one bit of information, like a latch – Must be read and written § 12- (12T) SRAM cell – Use a simple latch connected to bitline – 46 x 75 l unit cell bit write

write_b

read

read_b

10/23/18 VLSI-1 Class Notes Page 12 SRAM Read

§ Improve performance when bit-line capacitance is high § Precharge both bitlines high bit bit_b § Then turn on wordline word P1 P2 § One of the two bitlines N2 N4 will be pulled down by the cell A A_b N1 N3

§ Ex: A = 0, A_b = 1 – bit discharges, bit_b stays high A_b bit_b

– But A bumps up slightly 1.5

1.0 bit § Read stability word – A must not flip 0.5 A

– N1 >> N2 0.0 0 100 200 300 400 500 600 time (ps)

10/23/18 VLSI-1 Class Notes Page 13 Read Timing

phi2 CLK phi1

WL BL/BL# DV differential BL equalized

SAE Dt tsu YSEL#

BLPC# BL precharge

SAPC# SA precharge SAOUT DATA_OUT evaluate/latch time

10/23/18 VLSI-1 Class Notes Read Requirements

§ Pre-charge & equalize bit-lines from previous cycle § Minimum “Design Margin” before next READ begins § requirement to allow sufficient bit-line development BL BL# Degradation Due to & Restore Spec Quiet WL(s) + coupling VDD

SA Offset BL# Design Margin Requirement

BL VSS

Write Restore Read Dt delay

§ NOTE: The Dt delay can be generated by a chain of inverter delays or by replica “dummy” row and column composed of bitcells.

10/23/18 VLSI-1 Class Notes SRAM Write

§ Drive one bitline high, the other low § Then turn on wordline bit bit_b § Bitlines overpower cell with new word value P1 P2 N2 N4 § Ex: A = 0, A_b = 1, A A_b N1 N3 bit = 1, bit_b = 0 – Force A_b low,

then A rises high A_b

1.5 A

§ Writability bit_b – Must overpower 1.0 inverter – N4 >> P2 0.5 word

– N2 >> P1 (symmetry) 0.0 0 100 200 300 400 500 600 700 time (ps)

10/23/18 VLSI-1 Class Notes Page 16 Write Timing

write mode from latch read mode WREN

phi2 CLK phi1

WL

WSEL write pulsewidth

BL/BL# BL equalized

BLPC# BL precharge

DATA data arrives early from previous cycle

DATA_OUT holds state from previous READ time

10/23/18 VLSI-1 Class Notes Write Requirements

§ Pre-charge & equalize bit-lines from previous cycle § WRITE can begin as soon as word-line is available § Must guarantee minimum write pulse-width, data valid time and write recovery; internal “high node” reaches say 90% of VDD

BL Equalization WRITE Recovery Design Margin & Restore Spec VDD 90% of VDD

BL#

data#

data and data# are the 6T bitcell state nodes. WL 50% of VDD

WRITE Pulse Width BL

internal cell nodes cross

VSS

TTW data § NOTE: Write pulse-width margin increases with lower

10/23/18 VLSI-1 Class Notes SRAM Sizing

§ High bitlines must not overpower inverters during reads § But low bitlines must write new value into cell

bit bit_b word weak med med A A_b strong

Wdn > Wpass for read stability Wpass > Wup to enable writes

10/23/18 VLSI-1 Class Notes Page 19 6-Transistor SRAM Cell Layout

transistor LEFT RIGHT PG 1.00 1.00 PD 1.38 1.38 PU 0.44 0.44 PFET NFET BL #BL

0.07/.055 0.07/.055 55 0.16/.055 0.16/.055 160 70 120 (m3) poly 70 360 220 0.22/.055 0.22/.055 70 55 160 (m3) 110 WL 1040 PASS Units are in nanometers {nm} GATE In 45nm CMOS, a typical µ 6T bit-cell area = 0.38 µm2 1.04 m (45nm)

VLSI-1 Class Notes 10/23/18 Decoders

§ n:2n decoder consists of 2n n-input AND gates – One needed for each row of memory – Build AND from NAND or NOR gates

Static CMOS Pseudo-nMOS

A1 A0 A1 A0

word0 1 1 8 word0 1/2 4 16 word word A1 1 4 A0 A1 2 8 word1 word1 1 1 A0 1 word2 word2

word3 word3

10/23/18 VLSI-1 Class Notes Page 21 Decoder Layout

§ Decoders must be pitch-matched to SRAM cell – Requires very skinny gates

A3 A3 A2 A2 A1 A1 A0 A0

VDD

word

GND

NAND gate buffer inverter

10/23/18 VLSI-1 Class Notes Page 22 Large Decoders

§ For n > 4, NAND gates become slow – Break large gates into multiple smaller gates

A3 A2 A1 A0

word0

word1

word2

word3

word15

10/23/18 VLSI-1 Class Notes Page 23 Predecoding

§ Many of these gates are redundant

– Factor out common A3 gates into pre-decoder – Saves area A2 – Same path effort A1

A0

predecoders

1 of 4 hot predecoded lines

word0

word1

word2

word3

word15

10/23/18 VLSI-1 Class Notes Page 24 Column Circuitry

§ Some circuitry is required for each column – Bitline conditioning – Sense amplifiers – Column multiplexing

bitline conditioning wordlines bitlines row decoder

memory cells: 2n-k rows x 2m+k columns

n-k k column circuitry n column decoder 2m bits

10/23/18 VLSI-1 Class Notes Page 25 Bitline Conditioning

§ Precharge bitlines high before reads f bit bit_b

§ Equalize bitlines to minimize voltage f difference when using sense amplifiers bit bit_b

10/23/18 VLSI-1 Class Notes Page 26 Sense Amplifiers

§ Bitlines have many cells attached – Ex: 32-kbit SRAM has 256 rows x 128 cols – 128 cells on each bitline

§ tpd µ (C/I) DV – Even with shared diffusion contacts, 64C of diffusion capacitance (big C) – Discharged slowly through small (small I) § Sense amplifiers are triggered on small voltage swing (reduce DV) BL Equalization BL# Degradation Due to & Restore Spec Quiet WL(s) + coupling VDD

Sense Amp BL# Design Margin Offset requirement

BL VSS

Write Restore Read

Dt delay

10/23/18 VLSI-1 Class Notes Page 27 Differential Pair Amp

§ Differential pair requires no clock § But always dissipates static

P1 P2 sense_b sense bit N1 N2 bit_b

N3

10/23/18 VLSI-1 Class Notes Page 28 Clocked Sense Amp

§ Clocked sense amp saves power § Requires timing the sense_clk to arrive after enough bitline swing § Isolation transistors cut off large bitline capacitance bit bit_b

sense_clk isolation transistors

regenerative feedback

sense sense_b

10/23/18 VLSI-1 Class Notes Page 29 De-coupled Sense Amplifier

§ With SAE low; Data and Data# are pre-charge high § When SAE goes high; source-coupled pair acts as § Cross coupled inverters amplify and latch any voltage difference

Bit Bit#

Data# Data Out Out#

SAE

10/23/18 VLSI-1 Class Notes Page 30 Twisted Bitlines

§ Sense amplifiers also amplify – Coupling noise is severe in modern processes – Try to couple equally onto bit and bit_b (common mode) – Done by twisting bitlines

b0 b0_b b1 b1_b b2 b2_b b3 b3_b

10/23/18 VLSI-1 Class Notes Page 31 Column Multiplexing

§ Recall that array may be folded for good aspect ratio § Ex: 2 kword x 16 folded into 256 rows x 128 columns – Must select 16 output bits from the 128 columns

– Requires 16 8:1 column multiplexers Precharge Precharge

2N Addr Address Row Wordline WL Memory Memory Buffer Decoder Driver Cell Cell

Bitlines

M 2 <0> <2M-1> Column YSEL# Column Column Chip Select Decoder Select Select

Differential Column Mux

Sense Amplifier saout Input Output Write Enable Buffer Buffer

Data_in Data_out

10/23/18 VLSI-1 Class Notes Page 32 Tree Decoder Mux

§ Column mux can use pass transistors – Use nMOS only, precharge outputs § One design is to use k series transistors for 2k:1 mux – No external decoder logic needed

B0 B1 B2 B3 B4 B5 B6 B7 B0 B1 B2 B3 B4 B5 B6 B7 A0 A0

A1 A1

A2 A2

Y Y to sense amps and write circuits

10/23/18 VLSI-1 Class Notes Page 33 Single Pass-Gate Mux

§ Or eliminate series transistors with separate decoder A1 A0

B0 B1 B2 B3

Y

10/23/18 VLSI-1 Class Notes Page 34 Ex: 2-way Muxed SRAM

f2

More More Cells Cells word_q1

A0 A0

write0_q1 f2 write1_q1

data_v1

10/23/18 VLSI-1 Class Notes Page 35 Multiple Ports

§ We have considered single-ported SRAM – One read or one write on each cycle § Multiported SRAM are needed for register files § Examples: – Multicycle MIPS must read two sources or write a result on some cycles – Pipelined MIPS must read two sources and write a third result each cycle – Superscalar MIPS must read and write many sources and results each cycle

10/23/18 VLSI-1 Class Notes Page 36 Dual-Ported SRAM

§ Simple dual-ported SRAM – Two independent single-ended reads – Or one differential write

WordB WordA

BitB BitA BitA# BitB#

§ Do two reads and one write by time multiplexing – Read during ph1, write during ph2

10/23/18 VLSI-1 Class Notes Page 37 Multi-Ported SRAM

§ Adding more access transistors hurts read stability § Multi-ported SRAM isolates reads from state node § Single-ended design minimizes number of bitlines

bA bB bC bD bE bF bG wordA wordB wordC wordD wordE wordF wordG

write circuits

read circuits

10/23/18 VLSI-1 Class Notes Page 38 BASIC ARRAY LAYOUT

Columns

Precharge

WordDecoder Line Cell WordLine

Cell Cell BitLine Rows Decode

Pre Pre Column Decoder

- Bit-line Sense Amplifiers Write Buffers

Address Write Data Read Data

VLSI-1 Class Notes 10/23/18 SRAM Layout Using a Memory Compiler

10/23/18 VLSI-1 Class Notes CAMs

§ Extension of ordinary memory (e.g. SRAM) – Read and write memory as usual – Also match to see which words contain a key

adr data/key

read CAM match write

10/23/18 VLSI-1 Class Notes Page 41 10T CAM Cell

§ Add four match transistors to 6T SRAM – 56 x 43 l unit cell

bit bit_b word cell cell_b

match

10/23/18 VLSI-1 Class Notes Page 42 CAM Cell Operation

§ Read and write like ordinary SRAM § For matching: CAM cell – Leave wordline low clk weak – Precharge matchlines row decoder miss – Place key on bitlines address match0 – Matchlines evaluate § Miss line match1 – Pseudo-nMOS NOR of match lines match2 – Goes high if no words match match3 read/write column circuitry data

10/23/18 VLSI-1 Class Notes Page 43 4x4 DRAM Memory

read 2 bit words precharge bit line precharge enable WL[0] BL A 1 WL[1]

A2 WL[2]

Row Decoder WL[3]

sense amplifiers clocking, control, and BL[0] BL[1] BL[2] BL[3] write circuitry refresh A0 Column Decoder

10/23/18 VLSI-1 Class Notes Page 44 3-Transistor DRAM Cell

WWL WWL write RWL Vdd M3 BL1

X M1 M2 X Vdd-Vt C s RWL read

BL2 Vdd-Vt DV BL1 BL2

No constraints on device sizes (ratioless) Reads are non-destructive

Value stored at node X when writing a “1” is VWWL - Vtn

10/23/18 VLSI-1 Class Notes Page 45 1-Transistor DRAM Cell

WL write read WL “1” “1”

M1 X X Vdd-Vt Cs CBL

BL Vdd BL Vdd/2 sensing

Write: Cs is charged (or discharged) by asserting WL and BL

Read: Charge redistribution occurs between CBL and Cs

Read is destructive, so must refresh after read

10/23/18 VLSI-1 Class Notes Page 46 1-T DRAM Cell

Capacitor

Metal word line M1 word line SiO2 poly Field Oxide n+ n+

Inversion layer poly Diffused induced by bit line plate bias Polysilicon Polysilicon plate (a) Cross-section gate (b) Layout

Used Polysilicon-Diffusion Capacitance Expensive in Area

10/23/18 VLSI-1 Class Notes Page 47 Dense 1T DRAM Cell

Cell Plate Si

Refilling Poly Insulator

Storage Node Poly Si Substrate 2nd Field Oxide

Trench Cell

10/23/18 VLSI-1 Class Notes Page 48 DRAM Cell Observations

§ DRAM memory cells are single ended (complicates the design of the sense amp) § 1T cell requires a sense amp for each bit line due to charge redistribution read § 1T cell read is destructive; refresh must follow to restore data § 1T cell requires an extra capacitor that must be explicitly included in the design § A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than Vdd)

10/23/18 VLSI-1 Class Notes Page 49 Serial Access Memories

§ Serial access memories do not use an address – Shift Registers – Tapped Delay Lines – Serial In Parallel Out (SIPO) – Parallel In Serial Out (PISO)

10/23/18 VLSI-1 Class Notes Page 50 Shift Register

§ Shift registers store and delay data § Simple design: cascade of registers – Watch your hold times!

clk

Din Dout 8

10/23/18 VLSI-1 Class Notes Page 51 Denser Shift Registers

§ Flip-flops aren’t very area-efficient § For large shift registers, keep data in SRAM instead § Move R/W pointers to RAM rather than data – Initialize read address to first entry, write to last – Increment address on each cycle Din clk

counter readaddr 00...00 dual-ported counter SRAM writeaddr 11...11

reset Dout

10/23/18 VLSI-1 Class Notes Page 52 Tapped Delay Line

§ A tapped delay line is a shift register with a programmable number of stages § Set number of stages with delay controls to mux – Ex: 0 – 63 stages of delay

clk SR32 SR16 SR8 SR4 SR2 SR1 Din Dout

delay5 delay4 delay3 delay2 delay1 delay0

10/23/18 VLSI-1 Class Notes Page 53 Serial In Parallel Out

§ 1-bit shift register reads in serial data – After N steps, presents N-bit parallel output

clk

Sin

P0 P1 P2 P3

10/23/18 VLSI-1 Class Notes Page 54 Parallel In Serial Out

§ Load all N bits in parallel when shift = 0 – Then shift one bit out per cycle

P0 P1 P2 P3 shift/load clk

Sout

10/23/18 VLSI-1 Class Notes Page 55 SOFT ERROR RATE (SER)

10/23/18 VLSI-1 Class Notes Page 56 BACKGROUND § There are 2 categories of system failure: – hard failure (permanent failures that require replacement) – soft failure (non-permanent random system failures) § Cause of failures could be noise, power glitches, design margins, etc § In large memory systems, soft errors are mostly due to radiation § In 1978, May & Woods[5] (Intel) found radioactive materials in memory packages emitting alpha particles which can generate sufficient charge to the state of stored charge in DRAMs § Minute traces of radioactive elements can be found in alumina- based ceramics, zirconia & silica fillers used in packaging § Another potential source of alpha particles is from cosmic radiation – High energy particles from cosmic rays can have energies greater than 1GeV – Alpha particle energies typically range from 0.1 to 10 MeV

10/23/18 VLSI-1 Class Notes Page 57 ALPHA PARTICLES

§ An alpha-particle is a doubly charged helium nucleus (2 protons, 2 neutrons) that is generated during radioactive decay of high-Z atoms § More than 300 known alpha-emitting nuclides: – Uranium(238), Thorium(232) can be found in package materials for semiconductors – Radioactive decay of U238 à Th234 + He4 until it decays to a stable Pb206 (8 alphas are generated) – Thorium generates 6 alphas as it decays from Th232 to stable Pb208 § Alpha particles interact with silicon to generate an ionization trail of electron-hole pairs § The amount of electron-hole pairs generated depends on the particles initial energy (~3.6eV per e-h pair)

10/23/18 VLSI-1 Class Notes Page 58 α Particle Energy for Silicon

Electron-Hole Pair Generation versus alpha energy for Silicon

1.00E+06

1.00E+05 Alpha particles generated from Th232 and U238 will 1.00E+04 have energies ranging from 3.95MeV to 8.78MeV 1.00E+03

1.00E+02 Total E-H Pairs Generated Pairs E-H Total 1.00E+01

1.00E+00 0.01 0.1 1 10 alpha Particle Energy (MeV)

10/23/18 VLSI-1 Class Notes Page 59 FUNNEL EFFECT

He++ Alpha track Depletion region results in enhanced charge J = Angle of Incidence collection by the hit node

SiO2 _ N + _ + _ Drift current causes the + _ storage node to discharge Funnel of charge due to E-H + pair generation P _ + If the total charge exceeds Qcrit, then state of bitcell or latch will flip

Qcrit = number of electrons which differentiates between a 1 and 0 (May & Woods) [1]

10/23/18 VLSI-1 Class Notes Page 60 SOFT ERROR RATE

6-transistor SRAM cell WL

1

BL #BL

a particle hit

a-particle hit can be modeled as a sub- nanosecond current pulse as described by Chenming Hu

10/23/18 VLSI-1 Class Notes Page 61 Memory Array Redundancy

10/23/18 VLSI-1 Class Notes Page 62 BASIC ARRAY LAYOUT

Columns

Precharge Cell WordLine

Decoder Cell Cell BitLine

Rows Pre Dec Bitline Receivers

- Write Buffers

Address Write Data Read Data

10/23/18 VLSI-1 Class Notes Page 63 ARRAY REDUNDANT ELEMENTS

Columns Account for area Redundant Precharge overhead if Wordline & redundancy is used Driver Cell for repair WordLine

Decoder Cell Cell BitLine

Rows

Redundant Pre Address & Dec Bitline Receivers Redundant Column

enable - Write Buffers & Bitslice

Address Write Data Read Data

10/23/18 VLSI-1 Class Notes Page 64 SRAM Cell Stability Analysis

10/23/18 VLSI-1 Class Notes Page 65 SRAM Cell Stability Analysis

§ Bitlines are precharged to VCC à this is the critical situation because the nmos access device shunts the pmos load device; thereby reducing the of the inverters § Static noise voltage sources are inserted into cross-couple path between inverters

vn

vn § SNM is defined by the maximum value of Vn that can be tolerated before changing state; sweep noise voltage from 0V to the point where differential collapses to zero § Include temperature, voltage and process variation using a Monte Carlo simulator

10/23/18 VLSI-1 Class Notes Page 66 SRAM Cell Stability Analysis

Large diagonal due to access   Retention Curve devices being OFF   3.5 Standby or Data Retention state when wordline is not 3 selected à Cell is stable 2.5 2 1.5 1 The maximum square is approximately VCC/2 0.5 0 0 0.5 1 1.5 2 2.5 3 3.5

10/23/18 VLSI-1 Class Notes Page 67 SRAM Cell Stability Analysis

Diagonal is reduced since access device is ON Stability Curve 3.5 The maximum square 3 v represents SNM of the bitcell

2.5

SNM 2

1.5

1 Transform coordinates 450 1 1 0.5 x = --- u + ---- v 0 2 2 0 0.5 1 1.5 2 2.5 3 3.5 1 1 u y = - --- u + ---- v 2 2

10/23/18 VLSI-1 Class Notes Page 68 Backup

10/23/18 VLSI-1 Class Notes Page 69 4-Transistor Dynamic RAM Cell

Remove the two p-channel transistors from static RAM cell, to get a four-transistor dynamic RAM cell

Data stored as charge on gate (complementary nodes) Data must be refreshed regularly Dynamic cells must be designed very carefully

10/23/18 VLSI-1 Class Notes Page 70 3-Transistor Dynamic RAM Cell

Data stored on the gate of a transistor Need two additional transistors, one for write and the other for read control

10/23/18 VLSI-1 Class Notes Page 71 1-Transistor Dynamic RAM Cell

Value of CB must be chosen very carefully; otherwise, voltage on bit-line will be affected by charge sharing

Cannot get any smaller than this: data stored on a (trench) capacitor C, need a transistor to control data Bit line normally precharged to ½ VDD (need a sense amplifier)

10/23/18 VLSI-1 Class Notes Page 72 SRAM Layout

§ Cell size is critical: 26 x 45 l (even smaller in industry)

§ Tile cells sharing VDD, GND, bitline contacts

GND BIT BIT_B GND

VDD

WORD

Cell boundary

10/23/18 VLSI-1 Class Notes Page 73