Semiconductor Memories: An Introduction

Talk Overview
Memory Trend
Memory Classification
Memory Architectures
The Memory Core
Periphery
Reliability

Semiconductor Memory Trends (up to the 90’s)
Memory size as a function of time: x4 every three years.

Semiconductor Memory Trends (Updated Further Beyond)

Trends in Memory Cell Area

Growth in DRAM Chip Capacity
[Figure: DRAM chip capacity in Kbit (log scale, 10 to 1,000,000) versus year of introduction (1980 to 2000), with labeled points at 64, 256, 1,000, 4,000, 16,000, 64,000, and 256,000 Kbit.]
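The x4-every-three-years rule can be checked with a few lines of Python. This is only a sketch: the starting point of 64 Kbit in 1980 is an assumption read off the capacity chart for illustration, and `projected_capacity_kbit` is a hypothetical helper name.

```python
def projected_capacity_kbit(year, base_year=1980, base_kbit=64,
                            factor=4, period_years=3):
    """Capacity predicted by a `factor` increase every `period_years` years.

    base_year and base_kbit are assumed values, not data from the slides.
    """
    return base_kbit * factor ** ((year - base_year) / period_years)


# Tabulate the projection at three-year intervals.
for year in range(1980, 1999, 3):
    print(year, round(projected_capacity_kbit(year)), "Kbit")
```

Under these assumptions the projection passes through the same 64, 256, 1,024, ... Kbit steps that label the chart.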
Semiconductor Memory Classification
Read-Write Memory
  Random Access: SRAM, DRAM
  Non-Random Access: FIFO, LIFO, Shift Register, CAM
Non-Volatile Read-Write Memory: EPROM, E²PROM, FLASH, MRAM, PRAM, RRAM
Read-Only Memory: Mask-Programmed, Programmable (PROM)

Memory Timing: Definitions
[Figure: timing diagram of the READ, WRITE, and DATA signals, marking the read cycle, read access time, write cycle, write access time, and the points where data is valid and data is written.]

Memory Architecture: Decoders

Intuitive architecture for an n x m memory: n words (Word 0 to Word n-1) of m bits each, with one select signal (S0 to Sn-1) per word.
Too many select signals: n words means n select signals.
A decoder reduces the number of inputs: k address bits (A0 to Ak-1) select one of the n words.
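k = log2 n

Array-Structured Memory Architecture

[Figure: array of storage (RAM) cells with 2^(k-j) rows (word lines) and m x 2^j columns (bit lines). Address bits Aj to Ak-1 select the row; address bits A0 to Aj-1 drive the column decoder; sense amplifiers and read/write circuits sit between the array and the m-bit input/output.]
Column decoder: selects the appropriate word from the memory row.
Sense amplifiers: amplify the bit line swing.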
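A minimal Python sketch of how a k-bit address splits between the row and column decoders in the array-structured organization. The function names `split_address` and `array_shape` are hypothetical, and treating A0 to Aj-1 as the low-order bits is an assumption based on the labels in the figure.

```python
def split_address(address, n_words, column_bits):
    """Split a word address into (row, column).

    The low `column_bits` bits (A0..Aj-1) go to the column decoder and the
    remaining k - j bits (Aj..Ak-1) go to the row decoder.
    """
    if n_words <= 0 or n_words & (n_words - 1):
        raise ValueError("n_words must be a power of two")
    k = n_words.bit_length() - 1          # k = log2(n)
    if not 0 <= column_bits <= k:
        raise ValueError("column_bits must be between 0 and k")
    if not 0 <= address < n_words:
        raise ValueError("address out of range")
    row = address >> column_bits
    column = address & ((1 << column_bits) - 1)
    return row, column


def array_shape(n_words, word_bits, column_bits):
    """Return (rows, columns) of the cell array: 2^(k-j) by m * 2^j."""
    k = n_words.bit_length() - 1
    return 2 ** (k - column_bits), word_bits * 2 ** column_bits


print(split_address(0b101101, n_words=64, column_bits=2))
print(array_shape(n_words=1024, word_bits=8, column_bits=3))
```

Choosing j trades rows against columns; the total number of cells stays n x m.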
Hierarchical Memory Architecture

[Figure: memory partitioned into blocks 0 to P-1, fed by the row address, column address, and block address; a block selector, control circuitry, a global data bus, and a global amplifier/driver connect the blocks to the I/O.]
Advantages:
1. Shorter wires within blocks, for reduced local transit times.
2. The block address activates only one block, for power savings.

Block Diagram of 4 Mbit SRAM
[Figure: block diagram of a 4 Mbit SRAM, showing the clock generator; X-, Y-, and Z-address buffers; predecoder and block selector; bit line loads; global, sub-global, and local row decoders; transfer gates; column decoders; sense amplifiers and write drivers; the CS/WE buffer; the I/O buffer; and the x1/x4 controller.]

Memory Timing: Approaches
DRAM Timing: Multiplexed Addressing
[Figure: the address bus carries the row address and then the column address, with RAS and CAS marking each phase (RAS-CAS timing).]
SRAM Timing: Self-timed
[Figure: an address transition on the address bus initiates the memory operation.]

Read-Only Memory Cells
[Figure: ROM cells storing a 1 (top row) and a 0 (bottom row), each drawn with its word line WL and bit line BL: diode ROM, MOS ROM 1, and MOS ROM 2.]

MOS OR ROM
[Figure: 4 x 4 MOS OR ROM array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], VDD supply lines, and pull-down loads biased by Vbias.]

MOS NOR ROM
[Figure: 4 x 4 MOS NOR ROM array with pull-up devices connected to VDD, word lines WL[0] to WL[3], GND lines, and bit lines BL[0] to BL[3].]

MOS NAND ROM
[Figure: 4 x 4 MOS NAND ROM array with pull-up devices connected to VDD, bit lines BL[0] to BL[3], and word lines WL[0] to WL[3].]
All word lines are high by default, with the exception of the selected row.

Equivalent Transient Model for MOS NOR ROM
[Figure: transient model with the word line WL as a distributed line (rword, cword) and the bit line BL loaded by Cbit.]
Word line parasitics:
  Wire capacitance and gate capacitance
  Wire resistance (polysilicon)
Bit line parasitics:
  Resistance not dominant (metal)
  Drain junction and gate-drain overlap capacitance

Equivalent Transient Model for MOS NAND ROM
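[Figure: transient model with the word line WL as a distributed line (rword, cword), the bit line BL as a distributed line (rbit, cbit), and a load capacitance CL.]
Word line parasitics:
  Similar to NOR ROM
Bit line parasitics:
  Resistance of cascaded transistors dominates
  Drain/source and complete gate capacitances

Decreasing Word Line Delay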
Drive the word line from both sides.
[Figure: polysilicon word line WL with a driver at each end, alongside a metal word line.]
Use a metal bypass.
[Figure: polysilicon word line WL with a parallel metal bypass.]
Use silicides.

Precharged MOS NOR ROM
[Figure: 4 x 4 precharged MOS NOR ROM array with precharge devices connected to VDD and clocked by φpre, word lines WL[0] to WL[3], GND lines, and bit lines BL[0] to BL[3].]
The PMOS precharge device can be made as large as necessary, but the clock driver becomes harder to design.

Non-Volatile Memories: The Floating-Gate Transistor (FAMOS)
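[Figure: device cross-section showing the gate and the floating gate, separated by oxide layers of thickness tox, above n+ source and drain regions in a p substrate; schematic symbol with terminals G, D, and S.]

Floating-Gate Transistor Programming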
[Figure: three stages of floating-gate programming, drawn with the applied terminal voltages (gate at 20 V, 0 V, and 5 V respectively).]
1. Avalanche injection.
2. Removing the programming voltage leaves charge trapped.
3. Programming results in a higher VT.

A “Programmable-Threshold” Transistor
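[Figure: ID versus VGS for the “0”-state and the “1”-state, separated by a threshold shift ΔVT; at the word line voltage VWL one state is “ON” and the other is “OFF”.]

Floating-Gate Tunneling Oxide (FLOTOX) EEPROM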
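[Figure: FLOTOX transistor cross-section with gate and floating gate over n+ source and drain regions in a p substrate; the oxide is 20–30 nm thick, thinned to 10 nm for tunneling. Fowler-Nordheim I-V characteristic: current I versus VGD, with -10 V and 10 V marked on the voltage axis.]

EEPROM Cell
[Figure: two-transistor EEPROM cell connected to BL, WL, and VDD.]
Absolute threshold control is hard, and a non-programmed transistor might be in depletion.
⇒ 2-transistor cell (one serving as the access transistor)

Flash EEPROM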
[Figure: flash cell cross-section with control gate and floating gate over a thin tunneling oxide, n+ source and n+ drain in a p-substrate; arrows mark erasure and programming.]
Erasure using Fowler-Nordheim tunneling is performed in bulk, for the complete chip or for a subsection of the memory.

Cross-sections of NVM cells
[Figure: cross-sections of a Flash cell and an EPROM cell. Courtesy Intel.]

Basic Operations in a NOR Flash Memory: Erase
[Figure: cell array during erase: word lines WL0 and WL1 at 0 V, source at 12 V, bit lines BL0 and BL1 open.]

Basic Operations in a NOR Flash Memory: Write
[Figure: cell array during write: WL0 at 12 V, WL1 at 0 V, BL0 at 6 V, BL1 at 0 V, source at 0 V.]

Basic Operations in a NOR Flash Memory: Read
[Figure: cell array during read: WL0 at 5 V, WL1 at 0 V, BL0 at 1 V, BL1 at 0 V, source at 0 V.]

NAND Flash Memory
[Figure: NAND flash cell structure showing the word line (poly), unit cell, gate, ONO layer, floating gate (FG), gate oxide, and source line (diffusion layer). Courtesy Toshiba.]

NAND Flash Memory
[Figure: NAND flash array showing the select transistor, word lines, active area, STI, bit line contact, and source line contact.]

Read-Write Memories (RAM)
STATIC (SRAM)
  Data stored as long as the power supply is applied
  Large (6 transistors/cell)
  Fast
  Differential
DYNAMIC (DRAM)
  Periodic refresh required
  Small (1-3 transistors/cell)
  Slower
  Single-ended

6-transistor CMOS SRAM Cell
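[Figure: 6-transistor CMOS SRAM cell: transistors M1 to M4 form the cross-coupled inverters between VDD and ground, holding Q and its complement; access transistors M5 and M6, gated by WL, connect the storage nodes to BL and its complement.]

CMOS SRAM Analysis (Read)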
[Figure: simplified cell during a read, with WL asserted, one storage node at 0 and the other at 1, and both bit lines (each loaded by Cbit) precharged to VDD.]

CMOS SRAM Analysis (Read)
[Figure: voltage rise ΔV (V), 0 to 1.2, versus cell ratio (CR), 0 to 3.]

CMOS SRAM Analysis (Write)
[Figure: simplified cell during a write, with WL asserted, storage nodes at 0 and 1, and the bit lines driven to BL = 1 and BL = 0.]

CMOS SRAM Analysis (Write)
Pull-up ratio: $PR = \frac{W_4 / L_4}{W_6 / L_6}$

Resistance-load SRAM Cell
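[Figure: resistance-load SRAM cell with load resistors RL to VDD, storage nodes Q and its complement, transistors M1 to M4, word line WL, and bit lines BL and its complement.]
Static power dissipation: want RL large.
Bit lines are precharged to VDD to address the tp problem.

3-Transistor DRAM Cell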
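[Figure: 3-transistor DRAM cell with write word line WWL, read word line RWL, bit lines BL1 and BL2, transistors M1 to M3, and storage capacitance CS at node X. Waveforms: WWL, RWL, X (reaching VDD - VT), BL1 (VDD), and BL2 (dropping by ΔV from VDD - VT).]
No constraints on device ratios.
Reads are non-destructive.
Value stored at node X when writing a “1” = VWWL - VTn

1-Transistor DRAM Cell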
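[Figure: 1-transistor DRAM cell: access transistor M1, gated by WL, connects BL (capacitance CBL) to the storage capacitor CS at node X. Waveforms for writing a 1 and reading a 1: WL, X (between GND and VDD - VT), and BL (precharged to VDD/2 before sensing).]
Write: CS is charged or discharged by asserting WL and BL.
Read: charge redistribution takes place between the bit line and the storage capacitance.
Voltage swing is small, typically around 250 mV.
$\Delta V = V_{BL} - V_{PRE} = (V_X - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$

DRAM Cell Observations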
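To get a feel for the charge-redistribution read-out of the 1T cell, the sketch below evaluates ΔV = (V_X - V_PRE) · C_S / (C_S + C_BL) for a few cases. All component values (2.5 V supply, 0.5 V threshold, 50 fF storage capacitor, 200 fF bit line) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def bitline_swing(v_x, v_pre, c_s, c_bl):
    """Bit line voltage change after charge sharing with the storage node."""
    return (v_x - v_pre) * c_s / (c_s + c_bl)


# Hypothetical example values, not taken from the slides.
VDD, VT = 2.5, 0.5
C_S, C_BL = 50e-15, 200e-15
V_PRE = VDD / 2

dv_one = bitline_swing(VDD - VT, V_PRE, C_S, C_BL)   # stored 1 at VDD - VT
dv_one_full = bitline_swing(VDD, V_PRE, C_S, C_BL)   # stored 1 at full VDD
dv_zero = bitline_swing(0.0, V_PRE, C_S, C_BL)       # stored 0

for label, dv in (("1 at VDD - VT", dv_one),
                  ("1 at VDD", dv_one_full),
                  ("0", dv_zero)):
    print(f"reading {label}: {dv * 1e3:+.0f} mV")
```

With these numbers the swing is a few hundred millivolts at most, and a stored 1 that only reaches VDD - VT produces a smaller signal than a stored 0.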
1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
DRAM memory cells are single-ended, in contrast to SRAM cells.
Unlike the 3T cell, the 1T cell requires the presence of an extra capacitance that must be explicitly included in the design.
When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.

Sense Amplifier Operation
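[Figure: bit line voltage VBL versus time t, starting at VPRE. After the word line is activated the voltage shifts by ΔV(1); after the sense amp is activated it is driven to V(1) or V(0).]

1-T DRAM Cell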
[Figure: cross-section and layout of a 1-T DRAM cell: transistor M1 with polysilicon gate and metal word line, diffused bit line (n+), SiO2, field oxide, and a polysilicon plate whose bias induces an inversion layer that forms the capacitor.]
Uses polysilicon-diffusion capacitance.
Expensive in area.

Advanced 1T DRAM Cells
[Figure: trench cell and stacked cell cross-sections. Labels: word line, capacitor dielectric layer, insulating layer, cell plate, cell plate Si, transfer gate, isolation, refilling poly, capacitor insulator, storage electrode, storage node poly, Si substrate, 2nd field oxide.]

Row Decoders
Collection of 2^M complex logic gates, organized in a regular and dense fashion.
(N)AND decoder
NOR decoder

Hierarchical Decoders
Multi-stage implementation improves performance
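As an illustration of a multi-stage decoder, the sketch below decodes a 4-bit address in two stages: two 2-input predecoders (one for A0, A1 and one for A2, A3) followed by one 2-input gate per word line. It models only the logic function with active-high signals; the function names are hypothetical, and no claim is made about the gate-level circuit of any particular memory.

```python
def predecode(bit_low, bit_high):
    """One-hot decode of a 2-bit group; output index = 2 * bit_high + bit_low."""
    selected = 2 * bit_high + bit_low
    return [int(i == selected) for i in range(4)]


def decode_word_lines(address):
    """Two-stage decode of a 4-bit address into 16 one-hot word lines."""
    if not 0 <= address < 16:
        raise ValueError("address must fit in 4 bits")
    a = [(address >> i) & 1 for i in range(4)]
    low = predecode(a[0], a[1])     # predecoded A0, A1
    high = predecode(a[2], a[3])    # predecoded A2, A3
    # Final stage: each word line combines one output from each predecoder.
    return [low[i % 4] & high[i // 4] for i in range(16)]


print(decode_word_lines(6))
```

In this arrangement each final gate has two inputs rather than four, and the eight predecoder outputs are shared by all sixteen word lines.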
[Figure: NAND decoder using 2-input pre-decoders: the pre-decoded combinations of A0, A1 and of A2, A3 (true and complemented) feed one gate per word line (WL0, WL1, ...).]

Dynamic Decoders
[Figure: dynamic 2-input NOR decoder and 2-input NAND decoder, each with precharge devices clocked by φ, address inputs A0 and A1 (true and complemented), and word lines WL0 to WL3.]

4-to-1 tree based column decoder
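[Figure: bit lines BL0 to BL3 routed through pass transistors controlled by A0 and its complement, then by A1 and its complement, to the data line D.]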
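The tree structure in the figure can be sketched behaviorally in Python. This assumes a full binary tree with the A0 level next to the bit lines, as in the 4-to-1 case; `tree_column_select` and `tree_device_count` are hypothetical names, and the device count simply adds one pass transistor for every line entering a level.

```python
def tree_column_select(bit_lines, address_bits):
    """Route one of 2**k bit lines to the output through k levels.

    address_bits is [A0, A1, ..., Ak-1]; each level keeps one line
    out of every adjacent pair.
    """
    if len(bit_lines) != 2 ** len(address_bits):
        raise ValueError("need 2**k bit lines for k address bits")
    level = list(bit_lines)
    for a in address_bits:
        level = [level[i + a] for i in range(0, len(level), 2)]
    return level[0]


def tree_device_count(k):
    """Pass transistors in a full tree: 2^k + 2^(k-1) + ... + 2."""
    return sum(2 ** i for i in range(1, k + 1))


print(tree_column_select(["BL0", "BL1", "BL2", "BL3"], [1, 0]))
print(tree_device_count(2), "pass transistors,", 2, "in series")
```

The selected signal passes through k transistors in series, one per address bit, which is the chain whose delay grows with the number of sections.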
Number of devices drastically reduced.
Delay increases quadratically with the number of sections; prohibitive for large decoders.
Solutions:
  buffers
  progressive sizing
  combination of tree and pass transistor approaches

Sense Amplifiers
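$t_p = \frac{C \cdot \Delta V}{I_{av}}$
C is large and I_av is small, so make ΔV as small as possible (make the SA as sensitive as possible).
Idea: use a sense amplifier.
[Figure: a small input transition enters the sense amplifier (s.a.), which drives the output.]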
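A short numeric sketch of t_p = C · ΔV / I_av shows why a small required swing matters. The 1 pF bit line capacitance and 50 µA average cell current are hypothetical values picked for illustration.

```python
def transition_time(capacitance, delta_v, i_avg):
    """Time for a current i_avg to move a capacitance by delta_v."""
    return capacitance * delta_v / i_avg


# Hypothetical example values, not taken from the slides.
C_BIT = 1e-12    # bit line capacitance, 1 pF
I_AV = 50e-6     # average cell current, 50 uA

full_swing = transition_time(C_BIT, 2.5, I_AV)    # wait for a 2.5 V swing
small_swing = transition_time(C_BIT, 0.25, I_AV)  # sense after 250 mV

print(f"full swing:  {full_swing * 1e9:.1f} ns")
print(f"small swing: {small_swing * 1e9:.1f} ns")
```

Because t_p is proportional to ΔV, sensing at one tenth of the swing cuts this part of the delay by the same factor, under the assumption that C and I_av stay fixed.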
Differential Sense Amplifier
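[Figure: differential sense amplifier with input transistors M1 and M2 connected to bit and its complement, load transistors M3 and M4 to VDD, transistor M5 enabled by SE, internal node y, and output Out.]
Directly applicable to SRAMs.

Differential Sensing: SRAM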
[Figure: (a) SRAM sensing scheme: bit lines precharged to VDD through PC and equalized by EQ, SRAM cell i selected by WLi, and a differential sense amplifier enabled by SE producing the output; (b) two-stage differential amplifier built from transistors M1 to M5, with internal nodes x and y and enable SE.]

Latch-Based Sense Amplifier (DRAM)
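[Figure: latch-based sense amplifier between BL and its complement, with equalization by EQ and enable signals SE and its complement toward VDD and ground.]
Initialized in its meta-stable point with EQ.
Once an adequate voltage gap is created, the sense amp is enabled with SE.
Positive feedback quickly forces the output to a stable operating point.

Charge-Redistribution Amplifier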
[Figure: concept: a large capacitance Clarge at node VL and a small capacitance Csmall at node VS, separated by transistor M1 gated by Vref, with transistors M2 and M3. Transient response: Vin, VS, and VL (0 to 2.5 V) versus time (0 to 3 ns), with Vref = 3 V.]

Charge-Redistribution Amplifier: EPROM
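[Figure: EPROM read path from VDD: load M4 enabled by SE driving Out (Cout), cascode device M3 biased by Vcasc, column decoder transistor M2 selected by WLC (Ccol), and cell M1 in the EPROM array on WL and BL (CBL).]

Single-to-Differential Conversion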
[Figure: a cell on WL and BL feeds input x of a differential sense amplifier; the other input is tied to Vref; the amplifier drives the output.]
Vref: reference voltage

Open bitline architecture with dummy cells
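[Figure: open bitline architecture: left and right bit line halves BLL and BLR on either side of a sense amplifier (EQ, SE, VDD); word lines L0, L1, ... and R0, R1, ... select storage cells CS; each half has a dummy cell with its own word line.]

DRAM Read Process with Dummy Cell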
[Figure: simulated waveforms, 0 to 3 V over 0 to 3 ns, in three panels: BL and its complement when reading 0; BL and its complement when reading 1; control signals EQ, WL, and SE.]

Voltage Regulator
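[Figure: voltage regulator with drive transistor Mdrive between VDD and the output VDL, reference VREF, and bias Vbias. Equivalent model: an amplifier with inputs VREF (-) and VDL (+) driving Mdrive.]

Charge Pump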
[Figure: charge pump with transistors M1 and M2, pump capacitor Cpump driven by CLK between nodes A and B, and load capacitor Cload; waveforms of VB and Vload.]

DRAM Timing

Reliability and Yield
Semiconductor memories trade off noise margin for density and performance.
Thus, they are highly sensitive to noise (crosstalk, supply noise, etc.).
High density and large die size cause yield problems.
$\text{Yield} = 100 \times \frac{\text{number of good chips per wafer}}{\text{number of chips per wafer}}$
$Y = \left[ \frac{1 - e^{-A D}}{A D} \right]^2$, where A is the chip area and D is the defect density.
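The second yield expression is easy to evaluate numerically. The sketch below is a direct transcription of that formula; the defect density of 1 defect per cm² and the chip areas are hypothetical inputs, and `yield_fraction` is a hypothetical name.

```python
import math


def yield_fraction(chip_area, defect_density):
    """Y = [(1 - exp(-A * D)) / (A * D)]^2, returned as a fraction of 1."""
    x = chip_area * defect_density
    if x == 0:
        return 1.0          # limit of the expression as A * D goes to 0
    return ((1 - math.exp(-x)) / x) ** 2


# Hypothetical inputs: areas in cm^2, 1 defect per cm^2.
for area_cm2 in (0.5, 1.0, 2.0):
    print(f"area {area_cm2} cm^2: yield {yield_fraction(area_cm2, 1.0):.3f}")
```

Yield falls quickly as the product of area and defect density grows, which is the effect the statement about large die size describes.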
Increase yield using error correction and redundancy.

Noise Sources in DRAM
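[Figure: noise sources acting on a DRAM cell: word-line-to-bit-line coupling (CWBL), cross coupling to the adjacent bit line (Ccross), the substrate, leakage from the storage capacitor CS, the capacitor electrode, and α-particles.]

Open Bit-line Architecture: Cross Coupling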
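[Figure: open bit-line architecture: BL and its complement on opposite sides of the sense amplifier, each with its own word lines (WL0, WL1, dummy WLD), bit line capacitance CBL, cell capacitances C, and word-line-to-bit-line coupling CWBL; EQ equalizes the two halves.]

Folded-Bitline Architecture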
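[Figure: folded-bitline architecture: BL and its complement run side by side to the same sense amplifier; word lines WL0, WL1, and dummy WLD cross both bit lines with coupling CWBL; cells C attach at nodes x and y; EQ equalizes the pair; each bit line has capacitance CBL.]

Transposed-Bitline Architecture
[Figure: (a) straightforward bit-line routing, with cross coupling Ccross between BL and the neighboring bit lines BL′ and BL″ at each SA; (b) transposed bit-line architecture for the same bit lines.]

Alpha-particles (or Neutrons)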
[Figure: an α-particle strikes a cell (WL, BL, VDD, SiO2 over an n+ region) and generates positive and negative charge carriers along its track.]
1 particle ~ 1 million carriers

Yield
Yield curves at different stages of process maturity

Redundancy
[Figure: memory array with redundant rows and redundant columns, row decoder, column decoder, row address, column address, and fuse bank.]

Error-Correcting Codes
Example: Hamming Codes
e.g. B3 wrong: the parity checks evaluate to 1, 1, 0, which reads as 3 and identifies bit 3.
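The B3 example can be reproduced with a small Hamming(7,4) sketch. The slide's own parity equations did not survive, so the layout used here is an assumption: the standard arrangement with parity bits at positions 1, 2, and 4 and data bits at positions 3, 5, 6, and 7. All function names are hypothetical.

```python
def encode(d3, d5, d6, d7):
    """Place data bits at positions 3, 5, 6, 7 and add parity at 1, 2, 4."""
    p1 = d3 ^ d5 ^ d7    # check over positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7    # check over positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7    # check over positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]


def syndrome(word):
    """XOR of the positions (1..7) of all set bits; 0 means no error seen."""
    s = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            s ^= position
    return s


def correct(word):
    """Flip the bit the syndrome points at (single-error assumption)."""
    s = syndrome(word)
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s


stored = encode(1, 0, 1, 1)
read_back = list(stored)
read_back[2] ^= 1                 # B3 (position 3) is read back wrong
fixed, s = correct(read_back)
print(s, fixed == stored)         # prints: 3 True
```

The syndrome is the binary number formed by the failing checks, so an error in position 3 fails the first two checks and passes the third.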
Redundancy and Error Correction

Data Retention in SRAM
[Figure: SRAM leakage current Ileakage (A), 100 nA to 1.3 µA, versus VDD, 0 to 1.8 V, for 0.18 µm CMOS and 0.13 µm CMOS; a factor of 7 is marked between the two curves.]
SRAM leakage increases with technology scaling.

Suppressing Leakage in SRAM
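[Figure: two schemes applied to a column of SRAM cells. Inserting extra resistance: sleep-controlled transistors (one labeled as a low-threshold transistor) between VDD and the internal rail VDD,int and at the internal rail VSS,int. Reducing the supply voltage: a sleep signal selects between VDD and a lower VDDL for VDD,int.]

Conclusions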
The memory architecture has a major impact on the ease of use of the memory, its performance, power consumption, reliability, and yield.
The memory cell should be designed so that a maximum signal is obtained with a minimum area. While the cell design is mostly dominated by technological considerations, a clever circuit design can help maximize the signal value and transient response.
The peripheral circuitry is essential to operate the memory in a reliable way and with reasonable performance, given the weak signals from the cells. Issues related to decoders, sense amplifiers, I/O buffers, and voltage generators are all critical.

The memory must operate correctly over a variety of operating and manufacturing conditions.
To increase the integration density, memory designers often give up signal-to-noise ratio. This makes the design vulnerable to a whole range of noise signals that are normally less of an issue in logic design.
Identifying the potential sources of malfunction and providing an appropriate model is the first requirement when addressing memory reliability.
Circuit precautions to deal with potential malfunctions include redundancy and error correction.

The field of memory design is a dynamic and exciting specialty:
Coordinated efforts from marketing and planning, process design, device design, circuit design, test and production engineering, and software engineering are all needed.
Many innovative approaches are possible in every design stage.
The market competition is fierce, but the winner is awarded a big prize.
The leading memory technologies are being pioneered by domestic companies/engineers.

Think flexible and think “big.”
Succeed as an engineer.
Engineer =
  sElf-motivated
  eNergetic
  self-manaGed
  Insightful
  iNnovative
  Eye on data
  Execute
  Rewards