Storage Class Memory Towards a Disruptively Low-Cost Solid-State Non-Volatile Memory
Total Page:16
File Type:pdf, Size:1020Kb
Storage Class Memory Towards a disruptively low-cost solid-state non-volatile memory Science & Technology Almaden Research Center January 2013 Storage Class Memory Power & space in the server room The cache/memory/storage hierarchy is rapidly becoming the bottleneck for large systems. We know how to create MIPS & MFLOPS cheaply and in abundance, but feeding them with data has become the performance-limiting and most-expensive part of a system (in both $ and Watts). Extrapolation to 2020 (at 70% CGR need 2 GIOP/sec) • 5 million HDD . 16,500 sq. ft. !! . 22 Megawatts R. Freitas and W. Wilcke, Storage Class Memory: the next storage system technology –"Storage Technologies & Systems" special issue of the IBM Journal of R&D (2008) 2 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory …yet critical applications are also undergoing a paradigm shift Compute-centric Data-centric paradigm paradigm Main Focus: Solve differential equations Analyze petabytes of data Bottleneck: CPU / Memory Storage & I/O Typical Examples: Computational Fluid Dynamics Search and Mining Finite Element Analysis Analyses of social/terrorist networks Multi-body Simulations Sensor network processing Digital media creation/transmission Environmental & economic modeling Extrapolation (at 90% CGR need 1.7 PB/sec) (at 90% CGR need 8.4G SIO/sec) to 2020 • 5.6 million HDD • 21 million HDD . 19,000 sq. ft. !! . 70,000 sq. ft. !! [Freitas:2008] . 25 Megawatts . 93 Megawatts 3 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory Problem (& opportunity): The access-time gap between memory & storage Access time... ...(in human perspective) 1980 (in ns) (T x 109) 1 CPU operations (1ns) second Decreasing ON-chip CPU memory 10 Get data from L2 cache (<5ns) co$t Get data from DRAM/SCM (60ns) minute OFF-chip 100 memory RAM 103 hour ON-line 104 storage 105 day OFF-line 106 week storage month 107 Read or write to DISK (5ms) year DISK 108 decade 109 century 1010 Get data from TAPE (40s) millenium TAPE • Modern computer systems have long had to be designed around hiding the access gap between memory and storage caching, threads, predictive branching, etc. • “Human perspective” – if a CPU instruction is analogous to a 1-second decision by a human, retrieval of data from off-line tape represents an analogous delay of 1250 years 4 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory Problem (& opportunity): The access-time gap between memory & storage Access time... (in ns) 1980 Today 1 CPU operations (1ns) Decreasing ON-chip memory 10 Get data from L2 cache (<5ns) CPU CPU co$t Get data from DRAM/SCM (60ns) OFF-chip 100 memory 103 Memory/storage gap RAM RAM ON-line 104 storage Read a FLASH device (20 us) 105 FLASH SSD OFF-line 106 Write to FLASH, random (1ms) storage 107 Read or write to DISK (5ms) 108 DISK DISK 109 1010 Get data from TAPE (40s) TAPE TAPE •Today, Solid-State Disks based on NAND Flash can offer fast ON-line storage, and storage capacities are increasing as devices scale down to smaller dimensions… …but while prices are dropping, the performance gap between memory and storage remains significant, and the already-poor device endurance of Flash is getting worse. 5 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory Problem (& opportunity): The access-time gap between memory & storage Access time... (in ns) Near-future 1 CPU operations (1ns) Decreasing ON-chip memory 10 Get data from L2 cache (<5ns) CPU co$t Get data from DRAM/SCM (60ns) OFF-chip 100 memory 103 Memory/storage gap RAM ON-line 104 storage Read a FLASH device (20 us) 105 SCM OFF-line 106 Write to FLASH, random (1ms) storage Read or write to DISK (5ms) 107 DISK 108 109 1010 Get data from TAPE (40s) TAPE Research into new solid-state non-volatile memory candidates – originally motivated by finding a “successor” for NAND Flash – has opened up several interesting ways to change the memory/storage hierarchy… 1) Embedded Non-Volatile Memory – low-density, fast ON-chip NVM 2) Embedded Storage – low density, slower ON-chip storage 3) M-type Storage Class Memory – high-density, fast OFF- (or ON*)-chip NVM 4) S-type Storage Class Memory – high-density, very-near-ON-line storage * ON-chip using 3-D packaging 6 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory Storage-type vs. memory-type Storage Class Memory 100s Speed (Latency & Bandwidth) NAND Storage-type Memory-type 10s uses uses Power! (Write) Cost/bit 1s Endurance 4F2 Read Latency F F 100ns DRAM Cell size [F2] 10ns 2 4 6 8 10 low co$t The cost basis of semiconductor processing is well understood – the paths to higher density are 1) shrinking the minimum lithographic pitch F, and 2) storing more bits PER 4F2 7 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory M-type: Synchronous S-type vs. M-type SCM • Hardware managed •Low overhead • Processor waits Internal • New NVM not Flash CPU DRAM • Cached or pooled memory • Persistence (data survives despite Memory component failure or loss of power) requires Controller redundancy in system architecture I/O SCM Controller ~1us read latency SCM S-type: Asynchronous • Software managed • High overhead SCM • Processor doesn’t wait, Storage (process-, thread-switching) Controller • Flash or new NVM Disk • Paging or storage External • Persistence RAID 8 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory Competitive Outlook among emerging NVMs Future NAND applications (consumer devices, etc.) Future NOR applications • 3-D NAND (but crossover to succeed 20nm (program code, etc.) conventional NAND may require >50 layers!) • PCM (but market disappearing) • PCM?/RRAM? High Speed Embedded Storage (low density, slower ON-chip storage) S-type Storage Class Memory (high-density, very-near-ON-line storage) • NAND? (but complicated process) 1) PCM?/RRAM? • RRAM?/PCM? 2) Racetrack? (future?) M-type Storage Class Memory (high-density, fast OFF- (or ON*)-chip NVM) • CBRAM? STT-RAM? Embedded Non-Volatile Memory • PCM?/RRAM? (low-density, fast ON-chip NVM) • Racetrack? (future?) • STT-RAM? CBRAM? Low co$t * ON-chip using 3-D packaging 9 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory Device Paths towards SCM Availability Co$t Capital investment Applications 3-D NAND NAND Future NAND applications (consumer devices, etc.) unlikely, but possible path 1-10us Embedded Storage S-type SCM emerging NVM (low density, (high-density, RRAM? PCM? slower ON-chip storage) near-ON-line storage) CBRAM? ¿1us Embedded M-type SCM emerging NVM (high-density, Non-Volatile Memory fast OFF-(or ON*) STT-RAM? CBRAM? (low-density, fast ON-chip NVM) -chip NVM) PCM??/RRAM?? * ON-chip using 3-D packaging DRAM Future DRAM (working memory, etc.) 10 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory NVM candidates for SCM 1) NVM element • Improved FLASH • Magnetic Spin Torque Transfer STT-RAM Magnetic Racetrack • Phase Change RAM • Resistive RAM 2) High-density access device (A.D.) • 2-D – silicon transistor or diode • 3-D higher density per 4F2 NVM memory element • polysilicon diode (but <400oC processing?) plus access device • MIEC A.D. (Mixed Ionic-Electronic Conduction) • OTS A.D. (Ovonic Threshold Switch) Generic SCM Array • Conductive oxide tunnel barrier A.D. 11 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory Limitations of Flash 100000 52000 Asymmetric performance 17000 10000 Writes much slower than reads 10000 2000 3000 1000 Program/erase cycle IOPS Block-based, no write-in-place 100 49 Data retention and Non-volatility 10 USB disk LapTop Enterprise Retention gets worse as Flash scales down Maximum Random Read IOPs Maximum Random Write IOPs Endurance 5 • Single level cell (SLC) 10 writes/cell 1000 • Multi level cell (MLC) 104 writes/cell 200 100 100 60 • Triple level cell (TLC) ~300 writes/cell 40 17 MB/s Future outlook 10 7 • Scaling focussed solely on density • 3-D schemes exist but are complex 1 USB disk LapTop Enterprise Sustained Read Bandwidth Sustained Write Bandwidth 12 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory STT (Spin-Torque-Transfer) RAM • Controlled switching of free magnetic layer in a Plate Line magnetic tunnel junction using current, leading to two distinct resistance states Strengths Word Line • Inherently very fast almost as fast as DRAM • Much better endurance than Flash or PCM • Radiation-tolerant •Materials are Back-End-Of-the-Line compatible • Simple cell structure reduced processing costs Bit Line Weaknesses • Achieving low switching current/power is not easy • Resistance contrast is quite low (2-3x) achieving tight distributions is ultra-critical • High-temperature retention strongly affected by scaling below F~50nm • Tradeoff between fastest switching and switching reliability Outlook: Strong outlook for an Embedded Non-Volatile Memory to replace/augment DRAM. While near-term prospects for high-density SCM with STT-RAM may seem dim, Racetrack Memory offers hope for using STT concepts to create vertical “shift-register” of domain walls potential densities of 10-100 bits/F2 13 Science & Technology – IBM Almaden Research Center Jan 2013 Storage Class Memory word line Phase-change RAM Phase Change Material • Switching between low-resistance crystalline, and high-resistance amorphous phases, controlled through power & duration of electrical pulses insulator Strengths ‘heater’ wire • Very