Semiconductor Memories: an Introduction
Talk Overview
  Memory Trend
  Memory Classification
  Memory Architectures
  The Memory Core
  Periphery
  Reliability

Semiconductor Memory Trends (up to the 90’s)
Memory size as a function of time: x4 every three years

Semiconductor Memory Trends (Updated Further Beyond)
Trends in Memory Cell Area

Growth in DRAM Chip Capacity
[Figure: DRAM chip capacity in Kbit (log scale, 10 to 1,000,000) versus year of introduction (1980 to 2000). Plotted generations: 64, 256, 1,000, 4,000, 16,000, 64,000 and 256,000 Kbit.]

Semiconductor Memory Classification
Read-Write Memory
  Random Access: DRAM, SRAM
  Non-Random Access: FIFO, LIFO, Shift Register, CAM
Non-Volatile Read-Write Memory
  EPROM, E²PROM, FLASH, MRAM, PRAM, RRAM
Read-Only Memory
  Mask-Programmed
  Programmable (PROM)

Memory Timing: Definitions
[Figure: timing diagram of the READ, WRITE and DATA signals, marking the read cycle, read access, write cycle, write access, data valid and data written.]

Memory Architecture: Decoders
Intuitive architecture for an n x m memory: too many select signals (n words require n select signals, S0 to Sn-1).
A decoder reduces the number of inputs to k = log2 n address lines (A0 to Ak-1).
[Figure: two arrays of n words (Word 0 to Word n-1) of m bits of storage cells with input/output, one driven directly by select signals S0 to Sn-1, the other through a decoder with address inputs A0 to Ak-1.]

Array-Structured Memory Architecture
[Figure: array of storage (RAM) cells with 2^(k-j) word lines and m x 2^j bit lines; address bits Aj to Ak-1 on the row side, address bits A0 to Aj-1 on the column decoder.]
Column decoder: selects appropriate word from memory row
Sense amplifiers: amplify bit line swing
Read/write circuits
Input/output (m bits)

Hierarchical Memory Architecture
[Figure: Block 0 ... Block i ... Block P-1, with row address, column address, block address, global data bus, control circuitry, block selector, global amplifier/driver and I/O.]
Advantages:
1. Shorter wires within blocks for reduced local transit times
2. Block address activates only 1 block for power savings

Block Diagram of 4 Mbit SRAM
[Figure: clock generator, Z-address buffer, X-address buffer, predecoder and block selector, bit line load, global row decoder, sub-global row decoder, local row decoder, transfer gate, column decoder, sense amplifier and write driver, CS/WE buffer, I/O buffer, x1/x4 controller, Y-address buffer, X-address buffer.]

Memory Timing: Approaches
DRAM timing: multiplexed addressing. The address bus carries the row address, then the column address (RAS-CAS timing).
SRAM timing: self-timed. An address transition initiates the memory operation.

Read-Only Memory Cells
[Figure: cells storing a 1 and a 0 in three styles: Diode ROM, MOS ROM 1, MOS ROM 2. Labels: BL, WL, VDD, GND.]

MOS OR ROM
[Figure: array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], VDD rails, and pull-down loads biased by Vbias.]

MOS NOR ROM
[Figure: array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], GND rails, and pull-up devices to VDD.]

MOS NAND ROM
[Figure: array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], and pull-up devices to VDD.]
All word lines are high by default, with the exception of the selected row.

Equivalent Transient Model for MOS NOR ROM
[Figure: model with VDD, BL, WL, r_word, c_word and C_bit.]
Word line parasitics:
  Wire capacitance and gate capacitance
  Wire resistance (polysilicon)
Bit line parasitics:
  Resistance not dominant (metal)
  Drain junction and gate-drain overlap capacitance

Equivalent Transient Model for MOS NAND ROM
[Figure: model with VDD, BL, WL, C_L, r_bit, c_bit, r_word and c_word.]
Word line parasitics:
  Similar to NOR ROM
Bit line parasitics:
  Resistance of cascaded transistors dominates.
  Drain/source and complete gate capacitances

Decreasing Word Line Delay
Drive the word line from both sides
[Figure: polysilicon word line WL with a driver at each end, fed by a metal word line.]
Use a metal bypass
[Figure: polysilicon word line WL strapped by a metal bypass.]
Use silicides

Precharged MOS NOR ROM
[Figure: array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], GND rails, and precharge devices to VDD clocked by φpre.]
PMOS precharge device can be made as large as necessary, but clock driver becomes harder to design.

Non-Volatile Memories: The Floating-Gate Transistor (FAMOS)
[Figure: device cross-section (floating gate, gate, source, drain, oxide thickness tox, n+ regions, p substrate) and schematic symbol (G, D, S).]

Floating-Gate Transistor Programming
[Figure: three panels with terminal voltage annotations. Captions: "Avalanche injection"; "Removing programming voltage leaves charge trapped"; "Programming results in higher VT".]

A “Programmable-Threshold” Transistor
[Figure: ID versus VGS for the “0”-state and the “1”-state, separated by ΔVT; the word line voltage VWL lies between the two curves, with regions marked “ON” and “OFF”.]

Floating-Gate Tunneling Oxide (FLOTOX) EEPROM
[Figure: FLOTOX transistor cross-section (floating gate, gate, source, drain, n+ regions, p substrate; oxide dimensions of 20–30 nm and 10 nm) and Fowler-Nordheim I-V characteristic (I versus VGD, from -10 V to 10 V).]

EEPROM Cell
[Figure: cell schematic with BL, WL and VDD.]
Absolute threshold control is hard, and a non-programmed transistor might be in depletion.
⇒ 2-transistor cell (one serving as the access transistor)

Flash EEPROM
[Figure: cross-section with control gate, floating gate, thin tunneling oxide, n+ source, n+ drain and p-substrate; erasure and programming paths marked.]
Erasure using Fowler-Nordheim tunneling is performed in bulk for the complete chip or in a sub-section of the memory.

Cross-sections of NVM cells
[Figure: Flash and EPROM cell cross-sections. Courtesy Intel.]

Basic Operations in a NOR Flash Memory: Erase
[Figure: cell array with WL0, WL1, BL0, BL1 and terminals G, S, D; annotations of 12 V, 0 V and open.]

Basic Operations in a NOR Flash Memory: Write
[Figure: cell array with WL0, WL1, BL0, BL1 and terminals G, S, D; annotations of 12 V, 6 V and 0 V.]

Basic Operations in a NOR Flash Memory: Read
[Figure: cell array with WL0, WL1, BL0, BL1 and terminals G, S, D; annotations of 5 V, 1 V and 0 V.]

NAND Flash Memory
[Figure: unit cell; labels: word line (poly), gate, ONO, FG, gate oxide, source line (diffusion layer). Courtesy Toshiba.]

NAND Flash Memory
[Figure: labels: select transistor, word lines, active area, STI, bit line contact, source line contact.]

Read-Write Memories (RAM)
STATIC (SRAM)
  Data stored as long as power supply is applied
  Large (6 transistors/cell)
  Fast
  Differential
DYNAMIC (DRAM)
  Periodic refresh required
  Small (1-3 transistors/cell)
  Slower
  Single-ended

6-transistor CMOS SRAM Cell
[Figure: cell schematic with transistors M1 to M6, storage nodes Q and its complement, WL, VDD, and the bit line pair.]

CMOS SRAM Analysis (Read)
[Figure: read circuit with M1, M4, M5, M6, WL, VDD, storage nodes at 0 and 1, both bit lines at VDD, and bit line capacitances Cbit.]

CMOS SRAM Analysis (Read)
[Figure: voltage rise ΔV (V), 0 to 1.2, versus cell ratio (CR), 0 to 3.]

CMOS SRAM Analysis (Write)
[Figure: write circuit with M1, M4, M5, M6, WL, VDD, storage nodes Q = 0 and Q = 1, and bit lines driven to 1 and 0.]

CMOS SRAM Analysis (Write)
$$PR = \frac{W_4 / L_4}{W_6 / L_6}$$

Resistance-load SRAM Cell
[Figure: cell schematic with load resistors RL, transistors M1 to M4, storage nodes Q and its complement, WL, VDD, and the bit line pair.]
Static power dissipation: want RL large
Bit lines precharged to VDD to address the tp problem

3-Transistor DRAM Cell
[Figure: cell schematic (M1, M2, M3, CS, storage node X, BL1, BL2, WWL, RWL) and waveforms for WWL, RWL, X (VDD - VT), BL1 (VDD) and BL2 (VDD - VT, with swing ΔV).]
No constraints on device ratios
Reads are non-destructive.
Value stored at node X when writing a “1” = VWWL - VTn

1-Transistor DRAM Cell
[Figure: cell schematic (M1, CS, CBL, storage node X, BL, WL) and waveforms for "Write 1" and "Read 1": WL, X (GND to VDD - VT), and BL (VDD, VDD/2 during sensing).]
Write: CS is charged or discharged by asserting WL and BL.
Read: Charge redistribution takes place between bit line and storage capacitance.
Voltage swing is small, typically around 250 mV.
$$\Delta V = V_{BL} - V_{PRE} = (V_X - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$$

DRAM Cell Observations
1T DRAM requires a sense amplifier for each bit line due to charge redistribution read-out.
The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
DRAM memory cells are single-ended in contrast to SRAM cells.
Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design.
When writing a “1” into a DRAM cell, a threshold voltage is lost.
This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.

Sense Amplifier Operation
[Figure: bit line voltage VBL versus time t, starting at VPRE and diverging by ΔV(1) toward V(1) or V(0); markers for "word line activated" and "sense amp activated".]

1-T DRAM Cell
[Figure: cross-section and layout. Labels: capacitor, M1 word line, metal word line, SiO2, poly, n+ regions, field oxide, diffused bit line, inversion layer induced by plate bias, polysilicon gate, polysilicon plate.]
Uses polysilicon-diffusion capacitance
Expensive in area

Advanced 1T DRAM Cells
[Figure: Trench Cell (cell plate Si, capacitor insulator, refilling poly, storage node poly, Si substrate, 2nd field oxide) and Stacked Cell (word line, capacitor dielectric layer, insulating layer, cell plate, transfer gate, isolation, storage electrode).]

Row Decoders
Collection of 2^M complex logic gates
Organized in regular and dense fashion
(N)AND Decoder
NOR Decoder

Hierarchical Decoders
Multi-stage implementation improves performance
[Figure: NAND decoder using 2-input pre-decoders; address inputs A0 to A3 and their complements are combined in pairs (A0, A1 and A2, A3) to drive WL0, WL1, ...]

Dynamic Decoders
[Figure: 2-input NOR decoder and 2-input NAND decoder with precharge devices clocked by φ, address inputs A0, A1 and their complements, word lines WL0 to WL3, VDD and GND.]

4-to-1 tree based column decoder
[Figure: bit lines BL0 to BL3 routed to output D through pass transistors controlled by A0, A1 and their complements.]
Number of devices drastically reduced
Delay increases quadratically with # of sections; prohibitive for large decoders
Solutions:
  buffers
  progressive sizing
  combination of tree and pass transistor approaches

Sense Amplifiers
$$t_p = \frac{C \cdot \Delta V}{I_{av}}$$
C is large and Iav is small, so make ΔV as small as possible (make the SA as sensitive as possible).
Idea: Use Sense Amplifier
[Figure labels: small transition, s.a.]
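
The slides' two relations, the 1T DRAM charge-redistribution swing ΔV = (VX - VPRE) · CS / (CS + CBL) and the delay tp = C · ΔV / Iav, can be combined in a short numeric sketch. This is only an illustration under assumed values: the supply voltage, capacitances and average current below are not taken from the slides, and the function names `bitline_swing` and `transition_delay` are hypothetical helpers introduced here.

```python
def bitline_swing(v_x, v_pre, c_s, c_bl):
    """Bit line swing after charge redistribution:
    dV = (V_X - V_PRE) * C_S / (C_S + C_BL)."""
    return (v_x - v_pre) * c_s / (c_s + c_bl)


def transition_delay(c, delta_v, i_av):
    """Delay t_p = C * |dV| / I_av to move capacitance c by delta_v."""
    return c * abs(delta_v) / i_av


# Assumed illustrative values (not from the slides).
VDD = 2.5          # supply voltage, volts
V_PRE = VDD / 2    # bit line precharge level, volts
C_S = 30e-15       # storage capacitance, farads
C_BL = 270e-15     # bit line capacitance, farads
I_AV = 50e-6       # average (dis)charge current, amperes

# Reading a stored "0" (V_X = 0 V) pulls the bit line slightly below V_PRE.
dv_zero = bitline_swing(0.0, V_PRE, C_S, C_BL)

# Time to develop only the small swing versus a full-rail swing on the bit line.
t_small = transition_delay(C_BL, dv_zero, I_AV)
t_full = transition_delay(C_BL, VDD, I_AV)

print(f"bit line swing for a stored 0: {dv_zero * 1e3:.1f} mV")
print(f"delay for small swing: {t_small * 1e9:.3f} ns")
print(f"delay for full swing:  {t_full * 1e9:.3f} ns")
```

With these assumed numbers the bit line only has to move by a fraction of the supply before the sense amplifier takes over, which is why keeping ΔV small shortens tp when C is large and Iav is small.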