Mobile Memory Technology Team
Emerging Memories and Pathfinding for the Era of sub-10nm System-on-Chip
Seung Kang
Qualcomm Technologies, Inc.
IEEE Solid-State Circuits Society Seminar
San Diego, CA
August 8, 2019

Memory Is Big Business
>> $100 billion*
https://www.dw.com/
http://www.icinsights.com/news/bulletins/Total-Memory-Market-Forecast-To-Increase-10-In-2017/
* Not including embedded memories for AP, SOC, and MCU

Memory Subsystem
Hierarchical memory layers
[Figure: memory hierarchy between Computing and Data: RF, On-chip Cache, Off-chip Cache, Main Memory, Local Storage, Remote Storage; axis label: Bit Cost]

Memory Subsystem
There is no such thing as a universal memory
[Figure: technologies by layer: RF; SRAM; "Embedded" DRAM; DRAM; Flash (SSD), HDD; Flash (SSD), HDD, Tape. Attribute labels: Speed; Endurance and Density; Retention]

Problem Statement 1
"Memory Wall"
Overall system performance & power are governed more by the memory subsystem than by the CPU subsystem
[Figure labels: Computing-centric, Data-centric; SOC, AP; MCU; CPU; GPU; RF; L1; L2; L3 Cache; Custom; SRAM; ROM; OTP/MTP; eFlash; External Flash; Embedded Memory; Standalone: DRAM, Flash Storage/SSD, HDD; axis label: Cost]

Problem Statement 2
Many-Core Processors
Increasing SRAM area & leakage power overhead
[Figure: Intel Broadwell-E (14nm node) with Shared L3 Cache: 25 Mbytes of L3 cache (60 Mbytes for 24 cores)]
• Datacenter applications projecting 120 Mbytes (960 Mb) L3 cache at 10nm and beyond.
• More expensive at advanced nodes (6T-SRAM: 550 F² at 7 nm vs. 150 F² at 40 nm)
• High standby/leakage power (worse at high T)

Problem Statement 3
IOT & Embedded System
Inherent drawbacks caused by memory limitations
• Energy-hungry
• Poor form factor
• High cost
• Security vulnerability
“The IOT is an NVM problem.” Greg Yeric, ARM (2015 IEDM Plenary Talk)

A New Perspective on Energy Efficiency
New Demand and Criteria for Wearable and Bioelectronic Devices
Critical Challenge: Battery Life (Energy Efficiency)

A New Perspective on Security & Privacy
Demand for secure memory and HW primitives (e.g. PUF)
Endpoint, Gateway, Cloud: ? ? ?
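The 6T-SRAM cell sizes quoted under Problem Statement 2 (550 F² at 7 nm vs. 150 F² at 40 nm) are normalized to the feature size F, so it can help to convert them to physical area. The following is a minimal sketch, not a figure from the talk. It assumes that F equals the node name in nanometers (a rough proxy at advanced nodes), that 1 Mb means 2^20 bits, and that only raw bitcell area counts (no periphery, redundancy, or array overhead). The function names are hypothetical.

```python
# Sketch: convert normalized bitcell sizes (in F^2) into physical area.
# Assumptions (not from the slides): F is the node name in nm, 1 Mb = 2**20 bits,
# and only raw bitcell area is counted (no sense amps, decoders, or redundancy).

BITS_PER_MBIT = 2**20


def cell_area_um2(cell_size_f2: float, f_nm: float) -> float:
    """Area of one bitcell in square micrometers."""
    area_nm2 = cell_size_f2 * f_nm**2
    return area_nm2 / 1e6  # 1 um^2 = 1e6 nm^2


def array_area_mm2(capacity_mbit: float, cell_size_f2: float, f_nm: float) -> float:
    """Raw bitcell area of an array in square millimeters."""
    bits = capacity_mbit * BITS_PER_MBIT
    return bits * cell_area_um2(cell_size_f2, f_nm) / 1e6  # 1 mm^2 = 1e6 um^2


if __name__ == "__main__":
    a7 = cell_area_um2(550, 7)    # 6T-SRAM at 7 nm
    a40 = cell_area_um2(150, 40)  # 6T-SRAM at 40 nm
    print(f"cell area at 7 nm : {a7:.5f} um^2")
    print(f"cell area at 40 nm: {a40:.5f} um^2")
    print(f"actual cell shrink: {a40 / a7:.1f}x")
    print(f"ideal F^2 shrink  : {(40 / 7) ** 2:.1f}x")
    print(f"960 Mb at 7 nm    : {array_area_mm2(960, 550, 7):.1f} mm^2 (bitcells only)")
```

Under these assumptions the cell shrinks from 0.24 µm² to about 0.027 µm², roughly 8.9x, while F² itself shrinks by about 32.7x. That gap is one way to read "more expensive at advanced nodes": the cell grows in units of F² even as it gets smaller in absolute terms. The projected 960 Mb L3 cache would need about 27 mm² of bitcells alone at 7 nm.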
Problems, new requirements, and opportunities demand advanced memories…

Memory Classification
Device Type
• Volatile Memory: SRAM, DRAM
• Nonvolatile Memory
  o Charge Modulation: Flash (2D/3D NAND, NOR), FRAM
  o Resistance Modulation: PCM; MRAM (STT-MRAM, SOT/SHE, Field MRAM); RRAM (Ox-RAM, CB-RAM, VMCO, CNT, Mott Transition)
Legend: Mature (mainstream or commoditized); Emerging (currently in small markets)

Phase Change Memory
PCM
PC-RAM
PRAM

PCM: Early History
Neale, Nelson, & Moore, Electronics, 1970
“Nonvolatile and reprogrammable, the read-mostly memory is here”
• Density: 256 bits
• Die Size: 122-by-131-mil (10.3 mm²)
• Read: 2.5 mA, < 5 V
• Set: 5 mA, 25 V, 10 ms
• Reset: < 200 mA, 25 V, 5 µs

PCM: Basic Concept
Phase-change Element: Amorphous (High R), Crystalline (Low R)
Source: Samsung (2006)
• Chalcogenide alloy (e.g. Ge-Sb-Te/GST)
• Programming: Joule heating followed by natural cooling
• Relatively simple physics!
[Figure labels: T > melting point; T > crystallization T]

PCM: Cell and Array Architecture
Cell = Access Device + Phase-change Element
1BJT-1R, 1FET-1R, 1Diode-1R
The required characteristics of the access FET, diode, or BJT are largely governed by the upper limit of the reset current (to drive localized melting) at a target cell size.
Cross-bar Array

PCM: Evolution of Cell Configuration
Improve thermal isolation
Source: H.-L. Lung (ITRS ERD, 2014)
>90% of heat is wasted during reset
Lower reset current/power
Improved endurance & retention

PCM: Reliability
Cycling Endurance
Chen et al. (Macronix-IBM, IMW, 2009)
Undoped GST: 0 cycles, 10 cycles, 10K cycles, 1M cycles
Doped GST: 0 cycles, 1K cycles, 100M cycles, 1B cycles

PCM: Reliability
Retention
Shih et al. (Macronix-IBM, IEDM, 2008)

PCM: Prototype
Samsung 8Gb PCM (ISSCC, 2012)
4.2F²

PCM: Evolution to 3D
• PCMS
• Phase-change memory (PCM) coupled with a selector (OTS)
• OTS: Ovonic Threshold Switch
• 64 Mb
• Endurance: 10⁶ cycles
Kau et al.
(Intel & Numonyx, IEDM, 2009)

3D XPoint (Intel & Micron, 2016)
• 20nm node
• 128 Gb
• SLC
[Figure labels: Selector, Memory]

Intel Optane Memory Series (2017)
Chip Density                  16 GB (128 Gb)   32 GB
Read Latency                  7 µs             9 µs
Write Latency                 18 µs            30 µs
Random Read                   190K IOPS        240K IOPS
Random Write                  35K IOPS         65K IOPS
Sequential Read               900 MB/s         1350 MB/s
Sequential Write              145 MB/s         290 MB/s
Power (Active/Idle)           3.5 W / 1 W
Endurance (Lifetime Writes)   182.5 TB
Source: Intel.com

3D XPoint as Storage Class Memory
It does not replace DRAM or NAND storage; it adds a new layer to improve the subsystem
Source: Intel-Micron, 2015

Magnetoresistive RAM
MRAM
Spin-transfer-torque MRAM
STT-MRAM
ST-MRAM
STT-RAM

A Building Block: Magnetic Tunnel Junction
Multiple flavors, but perpendicular MTJ
Stack: Free Layer / Tunnel Barrier / Pinned Layer
Electrical resistance varied by relative electron spin alignment: Magnetoresistance (MR)
Parallel: Low Resistance (RP)
Antiparallel: High Resistance (RAP)
Relatively small read window
Electrical switching, not magnetic switching

MRAM Snapshot
A new class of memory: Nonvolatile RAM
• Fast NVM
• High endurance
• 3 additional masks over baseline logic
• Low voltage (no charge pump)
• Scalable
Operation voltage on MTJ
Read: 0.1 V
Write: 0.3 − 0.5 V
Lu et al. (Qualcomm & TDK) Park et al. (Qualcomm & Applied Mat.)
IEDM, 2015 IEDM, 2015

MRAM Array Architecture
[Figure: array block diagram. Blocks: Write Driver, MTJ Array, Reference Generator, Read SA, MUX, SLDP (local data path). Labels: Rref, Ref; bit lines and source lines BL0/SL0, BL1/SL1, BL31/SL31; word lines wl<0> to wl<511>; Ref MTJ array; Data MTJ array; 2IOs+Ref]
Use the same bitcell for both data and reference array
• Small read window → Design for robust read (sensing) is critical
• Balancing switching asymmetry and source generation

Challenges for MRAM Design and Reliability
Narrow design window for deeply scaled nodes
Prevent read error
▪ Low VRead (0.1V)
▪ High TMR
▪ Fast fall off of RDR slope
Prevent write error
▪ Low VWrite
▪ Fast fall off of WER slope
Improve barrier reliability
▪ High VBD
▪ Contain TDDB

MRAM Device Scalability: Ic
Most important bitcell and design parameter
[Figure: Critical Switching Current (µA) vs. MTJ Diameter (nm)]
Kang, VLSI Symp., 2014
Saida et al., VLSI Symp., 2016
At small dimensions, dynamic current consumption becoming comparable with that of SRAM cell current

MRAM Device Scalability: Endurance
Practically unlimited endurance for cache applications
Kan et al., IEDM, 2016
[Figure: three plots. Axis names: Endurance Requirement vs. Millions of accesses per core per second; Breakdown Cycles (cycles) and Time to Breakdown (sec) vs. MTJ Voltage (V); Resistance (Ohms) vs. MTJ Voltage (V). Legend entries: L2 SRAM (256 KB), L2 MRAM (1024 KB), L3 SRAM (1.5 MB), L3 MRAM (6 MB); 10 years, 50% Duty Cycle; (-) AP-P and (+) P-AP polarity, 50 ns pulse; (-) 1 ppm, (+) 1 ppm; 25 nm, 45 nm]
Intrinsically solid
Better with MTJ scaling
In real life, subjected to design robustness & defect control

MRAM: Prototypes
Samsung (IEDM, 2016 / 7th MRAM
Global Innovation Forum)
SK Hynix-Toshiba (IEDM, 2016 / ISSCC, 2017): 4Gb, 9F² (30nm)

MRAM: Qualcomm Demo System
MRAM integrated along with PSRAM and NOR Flash for performance and power benchmarking
Integrated into a demo tablet
350X faster than Flash
3X faster than PSRAM
Kang, IMW, 2016
MRAM can unify PSRAM (volatile RAM) and NOR (nonvolatile storage) with PPAC advantages

MRAM In Production

MRAM for Processing-in-Memory
CNN Accelerator
A single-chip solution for Mobile and IOT applications
From Gyrfalcon Technologies (2018)
• 22nm eMRAM (40 MB)
• 9.9 TOPS/W

Resistive RAM
RRAM
ReRAM
Conductive Bridge RAM
CB-RAM

RRAM: Materials
Two-terminal resistive switching elements (excluding PCM and MRAM).
Found in numerous combinations of materials.
Source: P. Wong (Stanford, 2011)

RRAM: Common Classification
Different materials & switching characteristics
• Oxide RRAM (Ox-RAM), also Transition Metal Oxide RRAM. Stack: Top Electrode / Metal Oxide / Bottom Electrode
• Conductive Bridge RRAM (CB-RAM), also Programmable Metallization Cell (PMC). Stack: Top Electrode / Metal Ion Reservoir / Solid Electrolyte / Bottom Electrode
• Conductive Metal Oxide RRAM, also Vacancy Modulated Conductive Oxide RRAM (VMCO RRAM). Stack: Top Electrode / Tunnel Barrier / Conductive Metal Oxide / Bottom Electrode
Switching types shown: Interfacial Switching (2D); Filamentary Switching (1D); Uniform Switching (No forming)

RRAM: Switching
Initial State (Very High R) → Forming (Low R) → Reset (High R) → Set (Low R)
[Figure: Top Electrode / Metal Oxide / Bottom Electrode stack in each state, with bias polarity marked (+/-)]
Observation of a filament: Kwon et al. Nature Nanotechnology (2010)
[Figure: Current vs. Voltage curves for Bipolar Switching and Unipolar Switching]

RRAM: Cell and Array Architecture
P. Wong (Stanford)
1T-1R
1D-1R (Diode selector for unipolar RRAM)
Sheu et al. (VLSI Symp., 2008)
1D-1R/1S-1R (Stacked Cross Point Array)
Lee et al. (IEDM, 2007)
Yoon et al.
(VLSI Symp., 2009)
3D Vertical Cross Point RRAM

RRAM: Variability
Temporal and spatial variability
Sills et al. (VLSI Symp., 2014)
Jurczak (ITRS ERD, 2014)
[Figures: Resistance Variation vs. Switching Current; Write Speed vs. Read Margin]

RRAM: Reliability
Endurance & Retention
Sills et al. (VLSI Symp., 2014)
Wei et al. (IEDM, 2011)
256 Kbit array baked at 150°C for 1000 hours

RRAM: Prototypes
SanDisk-Toshiba RRAM (ISSCC, 2013)
T.-Y. Liu et al. (JSSC, 2014)
• By far the highest-density RRAM test chip
• Relatively slow performance (NAND Flash alternative)

RRAM: Prototype
Micron-Sony CB-RAM (ISSCC, 2014)
• Target application: storage class memory
• Endurance target: >10⁶ cycles
• Raw BER
  o Endurance: <3×10⁻⁵ at 10⁶ cycles
  o Retention: <2×10⁻⁴ at 10 years, 70°C, 10⁴ cycles
  o Read disturb: <2×10⁻⁵ at 10⁶ reads
Acceptable for SCM?

Memristor
Nature v.453, p.80 (2008) L.O.