Using MRAM in an Intelligent Memory Hierarchy (IMH)

Using MRAM in an Intelligent Memory Hierarchy (IMH) Shinobu Fujita Senior Fellow Toshiba Corporation MRAM Developer Day 2018 Santa Clara, CA 1 Our previous works(1) Embedded Memory Integration (31nm MTJ in 65nm CMOS) 1Mbit 4Mb(65nm) 1Gb(28nm) MTJ Last Process for embedded Logics 2T-2MTJ Ikegami(Toshiba), IEDM 2014.2015 Access time < 4ns 2 Noguchi(Toshiba), VLSI circuit symposium, 2013, 2014 Our previous works(2) Demonstration: Last Level Cache Active Energy Reduction Measured cache energy Cache energy is reduced further by over 90% of SRAM-based one. Cache-ARM CPU STT-MRAM Last Level Cache demonstration H. Noguchi et. al, Toshiba , ISSCC 2016 ISSCC 2016 Outline . Introduction : Key Points of Intelligent Memory Hierarchy (IMH) . IMH case study 1: Last level cache (LLC) memories with eMRAM . Rethink requirement of eMRAM for LLC . IMH case study 2: Persistent memories with eMRAM . Summary 4 STT-MRAM potential in memory hierarchy Fast write speed Near- Memory High endurance CPU core Far- Memory L1 L2/3 LLC SRAM Storage time Class STT-MRAM Near Memory Memory DRAM Far Memory Storage SSD SCM SSD/HDD 5 https://www.i-micronews.com/manufacturing-report/product/emerging-non-volatile-memory-2017.html Low speed Before after High speed & High density (>40ns) 2019 2020 (<20ns) TAM ~$ 1B TAM ~ $ 4B or more Simply NVM replacement is not Intelligent Memory Hierarchy… Traditional trend HW/SW co-design based of memory hierarchy new memory hierarchy with I/F Power supply CPU CPU core core Volatile ~nsec resistor Volatile Architecture cache Paradigm shift ~sec Intelligent Main memory On/Off Judgment Nonvolatile Volatile ~msec File storage Nonvolatile OS/API I/F sensor I/F network I/F Application oriented NVM adaptation (FeRAM, MRAM, Ultra-fast STTRAM) Intelligent Memory Hierarchy (IMH) should have - Nonvolatile/Volatile Hybrid IMH - Intelligent Power Management Point 1 7 (Breakeven Time aware) IMH Intelligent Memory Hierarchy (IMH) should be Point 2 changed from applications to applications. IMH There should be “only one storage (Long retention Point 3 NVM) at the bottom of IMH”. S.Fujita, SSDM 2016 9 Outline . Introduction . Intelligent Memory Hierarchy (IMH) for near-future applications from IoT edge to Cloud . IMH case study 1: Last level cache (LLC) with eSTT-MRAM . Rethink requirement of LLC-MRAM . IMH case study 2: Persistent memories with eSTT-MRAM . Summary 10 Trend of cache memory for processors Capacity of Cache Memory in CPU is increasing, which increases standby power of processors! L1/L2 SRAM L1/L2/L3 SRAM L1/L2/L3/L4 SRAM (+eDRAM) <Background> More cache, •Increase performance by increasing cache capacity. •Multi-core. Larger Area… More Leakage Power.. 11 ICICDT 2016, Tutorial 2 eSTT-MRAM / High-density and Scaling..then to LLC Mature process and technology has been presented. 1Mb- Full Function@40nm CMOS (Qualcomm) 8Mb- Full Function@28nm CMOS (Samsung) 4Gb- Full Function@29nm DRAM-Tr. (SKH&Toshiba) 256Mb@40nm DDR3/4 shipping, 1Gb sample (Everspin) 30nm MTJ demonstration for Last Level Cache- MRAM (TDK Headway) TSMC, GF, Samsung…. eMRAM manufacturing on FDSOI http://semimd.com/blog/2018/07/18/emerging-memory-types-headed-for-volumes/ Break even time for utilizing nonvolatile memory S. Fujita, IMW2015 13 Intelligent Memory Hierarchy (IMH) for HP-Processor Conventional All Nonvolatile IMH = Volatile/Nonvolatile Hybrid !! 14 Outline . Introduction . Intelligent Memory Hierarchy (IMH) for near-future applications from IoT edge to Cloud . IMH case study 1: Last level cache (LLC) with eSTT-MRAM . Rethink requirement of LLC-MRAM . IMH case study 2: Persistent memories with eSTT-MRAM . Summary 15 Ex. of SRAM LLC (last level cache) several MB ・Read Write random access time：~2ns ・Power（Write power）: ~10fJ/bit ・Retention: N.A. ・Area: 200 to 400 f2 MRAM should not compete with SRAM! Let’s RETHINK requirement of MRAM for Last Level Cache！・Read Write Speed ・Power（Write power）・Endurance ・Retention ・Area Shinobu Fujita Toshiba ・Error Rate 16 Access frequency of cache memory and main memory Cache Cache Cache miss rate Miss rate Miss rate CPU L2 ~10% Once L1 ~10% ~10% Core 20 ns Cache 30~40 ns 100~500 ns per Cache Last Level Cache 3 ns (LLC) （should be MRAM） Base clock Main Memory 1GHz （DRAM） Time Time Time Higher WER than that of conv. SRAM is acceptable for MRAM-LLC, since main memory DRAM can cover it. CPU performance simulation for comparison between SRAM-LLC and MRAM-LLC 1.2 1 0.8 0.6 0.4 性能(高いほど高性能) 0.2 0 gcc lbm mcf milc omnetpp soplex sphinx3 xalancbmk average GemsFDTD libquantum L3-1MB-2ns/2ns-WB10E L3-1MB-10ns/2ns-WB0E L3-1MB-10ns/10ns-WB0E L3-1MB-10ns/10ns-WB10E L3-1MB-10ns/35ns-WB0E L3-1MB-10ns/35ns-WB10E L3-1MB-10ns/35ns-WB100E CPU performance is affected by read latency of Last Level Cache. Relative CPU performance (Instruction per second, a.u.) second, per (Instruction CPU performanceRelative CPU performance is not largely affected by write latency (<~20ns), since write process is not on the CPU pipeline and write latency(<~20ns) is also covered by write buffer.18 Influence of MRAM read access time on CPU performance 4 MRAM 3 SRAM < 3% 2 @5ns 1 0 0 10 20 30 Average Average Instructionper (ns) CPU Time Read Access Time (ns) S. Fujita et al., ASP-DAC 2015 Comparison of Read latency between SRAM and MRAM having Mb capacity 10 9 8 7 6 5 SRAM MRAM 4 3 2 Read (ns)Read latency 1 0 1 10 100 1000 Memory capacity (Mb) S. Fujita et al, NVMSA2017 Let’s RETHINK requirement of MRAM for Last Level Cache！・Read Write Speed ・Power（Write power）・Endurance ・Retention ・Area ・Error Rate 21 How to reduce Write Energy with STT-MRAM?(1) (2) e-STT-MRAM Write Pulse Time Power Write Write Power (1) Conventional SRAM/eDRAM Power Time RAM active power Power RAM static power Time Time Energy = “Write pulse time x Write power” should decreased! How to reduce Write Energy with STT-MRAM?(2) Static/Active Static/Active duty ratio=1:50 duty ratio=1:100 MRAM Higher Power MRAM than SRAM SRAM=1 128MB Lower Power than SRAM Relative Average Power for L2 Cache Memory CacheL2 for Power Average Relative Access Time of MRAM (ns) S. Fujita, NVMSA2017 Conventional PG to Normally-off / Instant-On with STT-MRAM-LLC Power Conventional Power Gating Active Power On Active Frequent CPU stop Power gating Leakage Power gating Leakage Leakage (SRAM) Deep-Power-Down-State Time Power Gating with STT-MRAM Power Reduced Power Instant On Better Performance Frequent CPU stop Power gating Deep-Power-Down-State Power gating Reduced Leakage Power Time (Normally-off) 24 Real measurement of cache Active/Static state; “CPU cores very frequently STOP shortly even while the application is running! “ STOP Real time monitoring of 8 CPU cores. 3D graphics application is running. 25 S. Fujita, ICICDT2016 CPU cores very frequently STOP shortly even while the application is running! Mobile Benchmark Software Average Real time CPU/cache Measurement Data Real time Measurement Data (Movie software) 1: CPU Active Sum of CPU state time 2: CPU Standby 3: CPU Sleep/Deep-Sleep 26 Let’s RETHINK requirement of MRAM for Last Level Cache！・Read Write Speed ・Power（Write power）・Endurance ・Retention ・Area ・Error Rate 27 CPU simulations (Total Write times) Max: ~3 x 1010/ Year 4 core CPU, 3GHz, 8MB LLC, CPU simulator (Gem5) 28 Trade off : High Endurance > 1012 & High-speed write<20ns 1018 1015 1012 109 Endurance 106 H. Noguchi, eSTT-MRAM ISSCC 2015 103 0 10 20 30 40 Write Pulse (ns) Write pulse>40ns : practically unlimited endurance Write pulse<20ns : limited endurance affect LLC/CPU long term reliability Let’s RETHINK requirement of MRAM for Last Level Cache！・Read Write Speed ・Power（Write power）・Endurance ・Retention ・Area ・Error Rate 30 Retention is controllable for each application. 110 Long retention 100 90 80 [2] 70 60 Short retention/ Lower Iw 50 40 D 30 [1] 20 Toshiba [3] 10 0 0 50 100 MTJ diameter [nm] Retention requirement for LLC 1.E+00 80 Chip failure : 1000FIT 70 1.E-01 80% D 60 1.E-02 50% 50 1.E-03 Necessary 10MB, 80C Average lifetime[s] 40 256KB, 80C 1.E-04 256KB, 80C, SECDED 30 1.5L2 cache 2.5L3 cache 3.5 1.E-01 1.E+01 1.E+03 1.E+05 1.E+07 1.E+09 (LLC) Measured average cache data life for various workloads. Red line shows reference data distribution for 1MB L2 cache A. Jog et Required Data Retention [s] al., DAC2012, pp. 243-252. Relatively low is acceptable for LLC. D 32 Let’s RETHINK requirement of MRAM for Last Level Cache！・Read Write Speed ・Power（Write power）・Endurance ・Retention ・Area ・Error Rate 33 Expected cell area reduction with e-STT-MRAM in 22nm CMOS and beyond 6T SRAM 0.092um2 (Intel, ISSCC2012) eRAM Cell are comparison (22nm/16nm CMOS) 190F2@F=22nm SRAM @22nm 2T-2MTJ STTMRAM @22nm Slow Fast MTJ >5ns 0.037um2 = 76F2@F=22nm MTJφ=35nm Iw=50uA 6T SRAM 0.092um2 eDRAM 0.029um2 Ra=6Ωum2 (Intel, ISSCC2012) (Intel, ISSCC2013) DRAM@22nm Expected area reduction of SRAM with MRAM x 50% or less with 2T-2MTJ (Fast), x 25% or less with 1T-1MTJ (Slow) Low write current is needed for small memory cell 16nm LP-CMOS K. Ikegami, K. Abe, S. Fujita et al., IEDM2015 MTJ should be scaled with Metal pitch. MTJ scaling for eMRAM on CMOS tech. node 100 80 60 40 20 0 28 nm 20 nm 16 nm 7 nm 5 nm ~30nm MTJ for 5nm CMOS TSMC, VLSI symposium 2018 M1 pitch MTJ size (x nm MTJ is Not needed!!) Samsung, IEDM2016 GF, VLSI symposium 2018 Let’s RETHINK requirement of MRAM for Last Level Cache！・Read Write Speed ・Power（Write power）・Endurance ・Retention ・Area ・Error Rate 37 Improve Error tolerance of L2 cache TAG Index TAG Array Data Array K.

Load more