Lecture 14: Memory Elements
Mark McDermott Electrical and Computer Engineering The University of Texas at Austin
10/23/18 VLSI-1 Class Notes Memory Elements
§ Memory arrays § SRAMs § Serial Memories § Dynamic memories
Pentium-4 (Willamette) Yellow boxes are memory arrays
10/23/18 VLSI-1 Class Notes Page 2 Memory Arrays
Memory Arrays
Random Access Memory Serial Access Memory Content Addressable Memory (CAM)
Read/Write Memory Read Only Memory (RAM) (ROM) Shift Registers Queues (Volatile) (Nonvolatile)
Serial In Parallel In First In Last In Static RAM Dynamic RAM Parallel Out Serial Out First Out First Out (SRAM) (DRAM) (SIPO) (PISO) (FIFO) (LIFO)
Mask ROM Programmable Erasable Electrically Flash ROM ROM Programmable Erasable (PROM) ROM Programmable (EPROM) ROM (EEPROM)
10/23/18 VLSI-1 Class Notes Page 3 1D Memory Architecture
m bits m bits S S 0 Word 0 0 Word 0 S S 1 Word 1 1 Word 1 S S 2 Word 2 Storage A0 2 Word 2 Storage S S 3 Cell A1 3 Cell
Ak-1
Decoder
n words
S S n-2 Word n-2 n-2 Word n-2 S S n-1 Word n-1 n-1 Word n-1
Input/Output Input/Output n words ® n select signals Decoder reduces # of inputs
k = log2 n
10/23/18 VLSI-1 Class Notes Page 4 2D Memory Architecture
2k-j bit line word line Aj Aj+1 storage
Ak-1 (RAM) cell
Row Address Row
Row DecoderRow
m2j A 0 selects appropriate word A1 Column Decoder from memory row
Column Address Aj-1 Sense Amplifiers amplifies bit line swing
Read/Write Circuits
Input/Output (m bits)
10/23/18 VLSI-1 Class Notes Page 5 Array Architecture
§ 2n words of 2m bits each § If n >> m, fold by 2k into fewer rows of more columns
bitline conditioning wordlines bitlines row decoder
memory cells: 2n-k rows x 2m+k columns
n-k k column circuitry n column decoder 2m bits § Good regularity – easy to design § Very high density if good cells are used
10/23/18 VLSI-1 Class Notes Page 6 SRAM Block Diagram
Precharge Precharge
2N Addr Address Row Wordline WL Memory Memory
Bitlines
2M M Column YSEL# <0> Column <2 -1> Column Chip Select Decoder Select Select Differential Column Mux
Sense Amplifier saout Input Output Write Enable Buffer Buffer
Data_in Data_out
10/23/18 VLSI-1 Class Notes Page 7 SRAM Simulation Cross Section
BLPC#
WL<0> WL Drivers Keepers WL<2 N-1>
Row Decoder Bitcell
Column Decoder Col<2> Col<1> Col<0> SAPC#
Data_out Wrt Driver Dout
<0 SenseAmp
wsel > WE*CS SAE
Data_in
10/23/18 VLSI-1 Class Notes Page 8 6T SRAM Cell
§ Cell size accounts for most of array size – Reduce cell size at expense of complexity § 6T SRAM Cell PG: pass gate – Used in most commercial chips PU: pull-up – Data stored in cross-coupled inverters PD: pull-down § Read: – Precharge bit, bit_b Wordline – Raise wordline PU PU § Write: – Drive data onto bit, bit_b PG PG – Raise wordline PD PD
Bit Bit#
10/23/18 VLSI-1 Class Notes Page 9 Moore’s Law Scaling for Memory
Source: Kelin Kuhn, Intel
10/23/18 VLSI-1 Class Notes Page 10 SRAM Memory Cell Improvements
10/23/18 VLSI-1 Class Notes Page 11 12T SRAM Cell
§ Basic building block: SRAM Cell – Holds one bit of information, like a latch – Must be read and written § 12-transistor (12T) SRAM cell – Use a simple latch connected to bitline – 46 x 75 l unit cell bit write
write_b
read
read_b
10/23/18 VLSI-1 Class Notes Page 12 SRAM Read
§ Improve performance when bit-line capacitance is high § Precharge both bitlines high bit bit_b § Then turn on wordline word P1 P2 § One of the two bitlines N2 N4 will be pulled down by the cell A A_b N1 N3
§ Ex: A = 0, A_b = 1 – bit discharges, bit_b stays high A_b bit_b
– But A bumps up slightly 1.5
1.0 bit § Read stability word – A must not flip 0.5 A
– N1 >> N2 0.0 0 100 200 300 400 500 600 time (ps)
10/23/18 VLSI-1 Class Notes Page 13 Read Timing
phi2 CLK phi1
WL BL/BL# DV differential BL equalized
SAE Dt tsu YSEL#
BLPC# BL precharge
SAPC# SA precharge SAOUT DATA_OUT evaluate/latch time
10/23/18 VLSI-1 Class Notes Read Requirements
§ Pre-charge & equalize bit-lines from previous cycle § Minimum “Design Margin” before next READ begins § Delay requirement to allow sufficient bit-line voltage development BL Equalization BL# Degradation Due to & Restore Spec Quiet WL(s) + coupling VDD
SA Offset BL# Design Margin Requirement
BL VSS
Write Restore Read Dt delay
§ NOTE: The Dt delay can be generated by a chain of inverter delays or by replica “dummy” row and column composed of bitcells.
10/23/18 VLSI-1 Class Notes SRAM Write
§ Drive one bitline high, the other low § Then turn on wordline bit bit_b § Bitlines overpower cell with new word value P1 P2 N2 N4 § Ex: A = 0, A_b = 1, A A_b N1 N3 bit = 1, bit_b = 0 – Force A_b low,
then A rises high A_b
1.5 A
§ Writability bit_b – Must overpower 1.0 feedback inverter – N4 >> P2 0.5 word
– N2 >> P1 (symmetry) 0.0 0 100 200 300 400 500 600 700 time (ps)
10/23/18 VLSI-1 Class Notes Page 16 Write Timing
write mode from latch read mode WREN
phi2 CLK phi1
WL
WSEL write pulsewidth
BL/BL# BL equalized
BLPC# BL precharge
DATA data arrives early from previous cycle
DATA_OUT holds state from previous READ time
10/23/18 VLSI-1 Class Notes Write Requirements
§ Pre-charge & equalize bit-lines from previous cycle § WRITE can begin as soon as word-line is available § Must guarantee minimum write pulse-width, data valid time and write recovery; internal “high node” reaches say 90% of VDD
BL Equalization WRITE Recovery Design Margin & Restore Spec VDD 90% of VDD
BL#
data#
data and data# are the 6T bitcell state nodes. WL 50% of VDD
WRITE Pulse Width BL
internal cell nodes cross
VSS
TTW data § NOTE: Write pulse-width margin increases with lower frequency
10/23/18 VLSI-1 Class Notes SRAM Sizing
§ High bitlines must not overpower inverters during reads § But low bitlines must write new value into cell
bit bit_b word weak med med A A_b strong
Wdn > Wpass for read stability Wpass > Wup to enable writes
10/23/18 VLSI-1 Class Notes Page 19 6-Transistor SRAM Cell Layout
transistor LEFT RIGHT PG 1.00 1.00 PD 1.38 1.38 PU 0.44 0.44 PFET NFET BL #BL
0.07/.055 0.07/.055 55 0.16/.055 0.16/.055 160 70 120 (m3) poly 70 360 220 0.22/.055 0.22/.055 70 55 160 (m3) 110 WL 1040 PASS Units are in nanometers {nm} GATE In 45nm CMOS, a typical µ 6T bit-cell area = 0.38 µm2 1.04 m (45nm)
VLSI-1 Class Notes 10/23/18 Decoders
§ n:2n decoder consists of 2n n-input AND gates – One needed for each row of memory – Build AND from NAND or NOR gates
Static CMOS Pseudo-nMOS
A1 A0 A1 A0
word0 1 1 8 word0 1/2 4 16 word word A1 1 4 A0 A1 2 8 word1 word1 1 1 A0 1 word2 word2
word3 word3
10/23/18 VLSI-1 Class Notes Page 21 Decoder Layout
§ Decoders must be pitch-matched to SRAM cell – Requires very skinny gates
A3 A3 A2 A2 A1 A1 A0 A0
VDD
word
GND
NAND gate buffer inverter
10/23/18 VLSI-1 Class Notes Page 22 Large Decoders
§ For n > 4, NAND gates become slow – Break large gates into multiple smaller gates
A3 A2 A1 A0
word0
word1
word2
word3
word15
10/23/18 VLSI-1 Class Notes Page 23 Predecoding
§ Many of these gates are redundant
– Factor out common A3 gates into pre-decoder – Saves area A2 – Same path effort A1
A0
predecoders
1 of 4 hot predecoded lines
word0
word1
word2
word3
word15
10/23/18 VLSI-1 Class Notes Page 24 Column Circuitry
§ Some circuitry is required for each column – Bitline conditioning – Sense amplifiers – Column multiplexing
bitline conditioning wordlines bitlines row decoder
memory cells: 2n-k rows x 2m+k columns
n-k k column circuitry n column decoder 2m bits
10/23/18 VLSI-1 Class Notes Page 25 Bitline Conditioning
§ Precharge bitlines high before reads f bit bit_b
§ Equalize bitlines to minimize voltage f difference when using sense amplifiers bit bit_b
10/23/18 VLSI-1 Class Notes Page 26 Sense Amplifiers
§ Bitlines have many cells attached – Ex: 32-kbit SRAM has 256 rows x 128 cols – 128 cells on each bitline
§ tpd µ (C/I) DV – Even with shared diffusion contacts, 64C of diffusion capacitance (big C) – Discharged slowly through small transistors (small I) § Sense amplifiers are triggered on small voltage swing (reduce DV) BL Equalization BL# Degradation Due to & Restore Spec Quiet WL(s) + coupling VDD
Sense Amp BL# Design Margin Offset requirement
BL VSS
Write Restore Read
Dt delay
10/23/18 VLSI-1 Class Notes Page 27 Differential Pair Amp
§ Differential pair requires no clock § But always dissipates static power
P1 P2 sense_b sense bit N1 N2 bit_b
N3
10/23/18 VLSI-1 Class Notes Page 28 Clocked Sense Amp
§ Clocked sense amp saves power § Requires timing the sense_clk signal to arrive after enough bitline swing § Isolation transistors cut off large bitline capacitance bit bit_b
sense_clk isolation transistors
regenerative feedback
sense sense_b
10/23/18 VLSI-1 Class Notes Page 29 De-coupled Sense Amplifier
§ With SAE low; Data and Data# are pre-charge high § When SAE goes high; source-coupled pair acts as differential amplifier § Cross coupled inverters amplify and latch any voltage difference
Bit Bit#
Data# Data Out Out#
SAE
10/23/18 VLSI-1 Class Notes Page 30 Twisted Bitlines
§ Sense amplifiers also amplify noise – Coupling noise is severe in modern processes – Try to couple equally onto bit and bit_b (common mode) – Done by twisting bitlines
b0 b0_b b1 b1_b b2 b2_b b3 b3_b
10/23/18 VLSI-1 Class Notes Page 31 Column Multiplexing
§ Recall that array may be folded for good aspect ratio § Ex: 2 kword x 16 folded into 256 rows x 128 columns – Must select 16 output bits from the 128 columns
– Requires 16 8:1 column multiplexers Precharge Precharge
2N Addr Address Row Wordline WL Memory Memory
Bitlines
M 2 <0> <2M-1> Column YSEL# Column Column Chip Select Decoder Select Select
Differential Column Mux
Sense Amplifier saout Input Output Write Enable Buffer Buffer
Data_in Data_out
10/23/18 VLSI-1 Class Notes Page 32 Tree Decoder Mux
§ Column mux can use pass transistors – Use nMOS only, precharge outputs § One design is to use k series transistors for 2k:1 mux – No external decoder logic needed
B0 B1 B2 B3 B4 B5 B6 B7 B0 B1 B2 B3 B4 B5 B6 B7 A0 A0
A1 A1
A2 A2
Y Y to sense amps and write circuits
10/23/18 VLSI-1 Class Notes Page 33 Single Pass-Gate Mux
§ Or eliminate series transistors with separate decoder A1 A0
B0 B1 B2 B3
Y
10/23/18 VLSI-1 Class Notes Page 34 Ex: 2-way Muxed SRAM
f2
More More Cells Cells word_q1
A0 A0
write0_q1 f2 write1_q1
data_v1
10/23/18 VLSI-1 Class Notes Page 35 Multiple Ports
§ We have considered single-ported SRAM – One read or one write on each cycle § Multiported SRAM are needed for register files § Examples: – Multicycle MIPS must read two sources or write a result on some cycles – Pipelined MIPS must read two sources and write a third result each cycle – Superscalar MIPS must read and write many sources and results each cycle
10/23/18 VLSI-1 Class Notes Page 36 Dual-Ported SRAM
§ Simple dual-ported SRAM – Two independent single-ended reads – Or one differential write
WordB WordA
BitB BitA BitA# BitB#
§ Do two reads and one write by time multiplexing – Read during ph1, write during ph2
10/23/18 VLSI-1 Class Notes Page 37 Multi-Ported SRAM
§ Adding more access transistors hurts read stability § Multi-ported SRAM isolates reads from state node § Single-ended design minimizes number of bitlines
bA bB bC bD bE bF bG wordA wordB wordC wordD wordE wordF wordG
write circuits
read circuits
10/23/18 VLSI-1 Class Notes Page 38 BASIC ARRAY LAYOUT
Columns
Precharge
WordDecoder Line Cell WordLine
Cell Cell BitLine Rows Decode
Pre Pre Column Decoder
- Bit-line Sense Amplifiers Write Buffers
Address Write Data Read Data
VLSI-1 Class Notes 10/23/18 SRAM Layout Using a Memory Compiler
10/23/18 VLSI-1 Class Notes CAMs
§ Extension of ordinary memory (e.g. SRAM) – Read and write memory as usual – Also match to see which words contain a key
adr data/key
read CAM match write
10/23/18 VLSI-1 Class Notes Page 41 10T CAM Cell
§ Add four match transistors to 6T SRAM – 56 x 43 l unit cell
bit bit_b word cell cell_b
match
10/23/18 VLSI-1 Class Notes Page 42 CAM Cell Operation
§ Read and write like ordinary SRAM § For matching: CAM cell – Leave wordline low clk weak – Precharge matchlines row decoder miss – Place key on bitlines address match0 – Matchlines evaluate § Miss line match1 – Pseudo-nMOS NOR of match lines match2 – Goes high if no words match match3 read/write column circuitry data
10/23/18 VLSI-1 Class Notes Page 43 4x4 DRAM Memory
read 2 bit words precharge bit line precharge enable WL[0] BL A 1 WL[1]
A2 WL[2]
Row Decoder WL[3]
sense amplifiers clocking, control, and BL[0] BL[1] BL[2] BL[3] write circuitry refresh A0 Column Decoder
10/23/18 VLSI-1 Class Notes Page 44 3-Transistor DRAM Cell
WWL WWL write RWL Vdd M3 BL1
X M1 M2 X Vdd-Vt C s RWL read
BL2 Vdd-Vt DV BL1 BL2
No constraints on device sizes (ratioless) Reads are non-destructive
Value stored at node X when writing a “1” is VWWL - Vtn
10/23/18 VLSI-1 Class Notes Page 45 1-Transistor DRAM Cell
WL write read WL “1” “1”
M1 X X Vdd-Vt Cs CBL
BL Vdd BL Vdd/2 sensing
Write: Cs is charged (or discharged) by asserting WL and BL
Read: Charge redistribution occurs between CBL and Cs
Read is destructive, so must refresh after read
10/23/18 VLSI-1 Class Notes Page 46 1-T DRAM Cell
Capacitor
Metal word line M1 word line SiO2 poly Field Oxide n+ n+
Inversion layer poly Diffused induced by bit line plate bias Polysilicon Polysilicon plate (a) Cross-section gate (b) Layout
Used Polysilicon-Diffusion Capacitance Expensive in Area
10/23/18 VLSI-1 Class Notes Page 47 Dense 1T DRAM Cell
Cell Plate Si
Refilling Poly Capacitor Insulator
Storage Node Poly Si Substrate 2nd Field Oxide
Trench Cell
10/23/18 VLSI-1 Class Notes Page 48 DRAM Cell Observations
§ DRAM memory cells are single ended (complicates the design of the sense amp) § 1T cell requires a sense amp for each bit line due to charge redistribution read § 1T cell read is destructive; refresh must follow to restore data § 1T cell requires an extra capacitor that must be explicitly included in the design § A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than Vdd)
10/23/18 VLSI-1 Class Notes Page 49 Serial Access Memories
§ Serial access memories do not use an address – Shift Registers – Tapped Delay Lines – Serial In Parallel Out (SIPO) – Parallel In Serial Out (PISO)
10/23/18 VLSI-1 Class Notes Page 50 Shift Register
§ Shift registers store and delay data § Simple design: cascade of registers – Watch your hold times!
clk
Din Dout 8
10/23/18 VLSI-1 Class Notes Page 51 Denser Shift Registers
§ Flip-flops aren’t very area-efficient § For large shift registers, keep data in SRAM instead § Move R/W pointers to RAM rather than data – Initialize read address to first entry, write to last – Increment address on each cycle Din clk
counter readaddr 00...00 dual-ported counter SRAM writeaddr 11...11
reset Dout
10/23/18 VLSI-1 Class Notes Page 52 Tapped Delay Line
§ A tapped delay line is a shift register with a programmable number of stages § Set number of stages with delay controls to mux – Ex: 0 – 63 stages of delay
clk SR32 SR16 SR8 SR4 SR2 SR1 Din Dout
delay5 delay4 delay3 delay2 delay1 delay0
10/23/18 VLSI-1 Class Notes Page 53 Serial In Parallel Out
§ 1-bit shift register reads in serial data – After N steps, presents N-bit parallel output
clk
Sin
P0 P1 P2 P3
10/23/18 VLSI-1 Class Notes Page 54 Parallel In Serial Out
§ Load all N bits in parallel when shift = 0 – Then shift one bit out per cycle
P0 P1 P2 P3 shift/load clk
Sout
10/23/18 VLSI-1 Class Notes Page 55 SOFT ERROR RATE (SER)
10/23/18 VLSI-1 Class Notes Page 56 BACKGROUND § There are 2 categories of system failure: – hard failure (permanent failures that require replacement) – soft failure (non-permanent random system failures) § Cause of failures could be noise, power glitches, design margins, etc § In large memory systems, soft errors are mostly due to radiation § In 1978, May & Woods[5] (Intel) found radioactive materials in memory packages emitting alpha particles which can generate sufficient charge to switch the state of stored charge in DRAMs § Minute traces of radioactive elements can be found in alumina- based ceramics, zirconia & silica fillers used in packaging § Another potential source of alpha particles is from cosmic radiation – High energy particles from cosmic rays can have energies greater than 1GeV – Alpha particle energies typically range from 0.1 to 10 MeV
10/23/18 VLSI-1 Class Notes Page 57 ALPHA PARTICLES
§ An alpha-particle is a doubly charged helium nucleus (2 protons, 2 neutrons) that is generated during radioactive decay of high-Z atoms § More than 300 known alpha-emitting nuclides: – Uranium(238), Thorium(232) can be found in package materials for semiconductors – Radioactive decay of U238 à Th234 + He4 until it decays to a stable Pb206 (8 alphas are generated) – Thorium generates 6 alphas as it decays from Th232 to stable Pb208 § Alpha particles interact with silicon to generate an ionization trail of electron-hole pairs § The amount of electron-hole pairs generated depends on the particles initial energy (~3.6eV per e-h pair)
10/23/18 VLSI-1 Class Notes Page 58 α Particle Energy for Silicon
Electron-Hole Pair Generation versus alpha energy for Silicon
1.00E+06
1.00E+05 Alpha particles generated from Th232 and U238 will 1.00E+04 have energies ranging from 3.95MeV to 8.78MeV 1.00E+03
1.00E+02 Total E-H Pairs Generated Pairs E-H Total 1.00E+01
1.00E+00 0.01 0.1 1 10 alpha Particle Energy (MeV)
10/23/18 VLSI-1 Class Notes Page 59 FUNNEL EFFECT
He++ Alpha track Depletion region distortion results in enhanced charge J = Angle of Incidence collection by the hit node
SiO2 _ N + _ + _ Drift current causes the + _ storage node to discharge Funnel of charge due to E-H + pair generation P _ + If the total charge exceeds Qcrit, then state of bitcell or latch will flip
Qcrit = number of electrons which differentiates between a 1 and 0 (May & Woods) [1]
10/23/18 VLSI-1 Class Notes Page 60 SOFT ERROR RATE
6-transistor SRAM cell WL
1
BL #BL
a particle hit
a-particle hit can be modeled as a sub- nanosecond current pulse as described by Chenming Hu
10/23/18 VLSI-1 Class Notes Page 61 Memory Array Redundancy
10/23/18 VLSI-1 Class Notes Page 62 BASIC ARRAY LAYOUT
Columns
Precharge Cell WordLine
Decoder Cell Cell BitLine
Rows Pre Dec Bitline Receivers
- Write Buffers
Address Write Data Read Data
10/23/18 VLSI-1 Class Notes Page 63 ARRAY REDUNDANT ELEMENTS
Columns Account for area Redundant Precharge overhead if Wordline & redundancy is used Driver Cell for repair WordLine
Decoder Cell Cell BitLine
Rows
Redundant Pre Address & Dec Bitline Receivers Redundant Column
enable - Write Buffers & Bitslice
Address Write Data Read Data
10/23/18 VLSI-1 Class Notes Page 64 SRAM Cell Stability Analysis
10/23/18 VLSI-1 Class Notes Page 65 SRAM Cell Stability Analysis
§ Bitlines are precharged to VCC à this is the critical situation because the nmos access device shunts the pmos load device; thereby reducing the gain of the inverters § Static noise voltage sources are inserted into cross-couple path between inverters
vn
vn § SNM is defined by the maximum value of Vn that can be tolerated before changing state; sweep noise voltage from 0V to the point where differential collapses to zero § Include temperature, voltage and process variation using a Monte Carlo simulator
10/23/18 VLSI-1 Class Notes Page 66 SRAM Cell Stability Analysis
Large diagonal due to access Retention Curve devices being OFF 3.5 Standby or Data Retention state when wordline is not 3 selected à Cell is stable 2.5 2 1.5 1 The maximum square is approximately VCC/2 0.5 0 0 0.5 1 1.5 2 2.5 3 3.5
10/23/18 VLSI-1 Class Notes Page 67 SRAM Cell Stability Analysis
Diagonal is reduced since access device is ON Stability Curve 3.5 The maximum square 3 v represents SNM of the bitcell
2.5
SNM 2
1.5
1 Transform coordinates 450 1 1 0.5 x = --- u + ---- v 0 2 2 0 0.5 1 1.5 2 2.5 3 3.5 1 1 u y = - --- u + ---- v 2 2
10/23/18 VLSI-1 Class Notes Page 68 Backup
10/23/18 VLSI-1 Class Notes Page 69 4-Transistor Dynamic RAM Cell
Remove the two p-channel transistors from static RAM cell, to get a four-transistor dynamic RAM cell
Data stored as charge on gate capacitors (complementary nodes) Data must be refreshed regularly Dynamic cells must be designed very carefully
10/23/18 VLSI-1 Class Notes Page 70 3-Transistor Dynamic RAM Cell
Data stored on the gate of a transistor Need two additional transistors, one for write and the other for read control
10/23/18 VLSI-1 Class Notes Page 71 1-Transistor Dynamic RAM Cell
Value of CB must be chosen very carefully; otherwise, voltage on bit-line will be affected by charge sharing
Cannot get any smaller than this: data stored on a (trench) capacitor C, need a transistor to control data Bit line normally precharged to ½ VDD (need a sense amplifier)
10/23/18 VLSI-1 Class Notes Page 72 SRAM Layout
§ Cell size is critical: 26 x 45 l (even smaller in industry)
§ Tile cells sharing VDD, GND, bitline contacts
GND BIT BIT_B GND
VDD
WORD
Cell boundary
10/23/18 VLSI-1 Class Notes Page 73