
Using MRAM in an Intelligent Memory Hierarchy (IMH)

Shinobu Fujita Senior Fellow Toshiba Corporation

MRAM Developer Day 2018, Santa Clara, CA

Our previous works (1): Embedded memory integration (31nm MTJ in 65nm CMOS)

1Mbit, 4Mb (65nm), 1Gb (28nm)

MTJ-last process for embedded logic

2T-2MTJ cell, access time < 4ns

Ikegami (Toshiba), IEDM 2014, 2015
Noguchi (Toshiba), VLSI Circuit Symposium, 2013, 2014

Our previous works (2): Demonstration of last level cache active energy reduction

Measured cache energy: cache energy is reduced by over 90% compared with the SRAM-based one.

ARM CPU with STT-MRAM last level cache demonstration
H. Noguchi et al., Toshiba, ISSCC 2016

Outline

. Introduction : Key Points of Intelligent Memory Hierarchy (IMH)

. IMH case study 1: Last level cache (LLC) memories with eMRAM

. Rethink requirement of eMRAM for LLC

. IMH case study 2: Persistent memories with eMRAM

. Summary

STT-MRAM potential in memory hierarchy

STT-MRAM features: fast write speed, high endurance.

[Figure: memory hierarchy from the CPU core down to storage. Labels: L1, L2/3, LLC, SRAM, STT-MRAM, Near Memory, DRAM, Far Memory, Storage Class Memory (SCM), SSD, SSD/HDD, Storage.]

https://www.i-micronews.com/manufacturing-report/product/emerging-non-volatile-memory-2017.html

Before 2019: low speed (>40ns), TAM ~$1B
After 2020: high speed and high density (<20ns), TAM ~$4B or more

Simple NVM replacement is not an Intelligent Memory Hierarchy…

[Figure: paradigm shift from the traditional trend of memory hierarchy to a new memory hierarchy based on HW/SW co-design. Labels: power supply, CPU core, register, cache, main memory, file storage, volatile/nonvolatile markings, time scales (~nsec, ~msec, ~sec), intelligent on/off judgment, architecture, OS/API, sensor I/F, network I/F.]

Application-oriented NVM adaptation (FeRAM, MRAM, ultra-fast STT-RAM)

IMH Point 1: An Intelligent Memory Hierarchy (IMH) should have
- a nonvolatile/volatile hybrid structure
- intelligent power management (break-even time aware)

IMH Point 2: The Intelligent Memory Hierarchy (IMH) should be changed from application to application.

IMH Point 3: There should be “only one storage (long retention NVM)” at the bottom of the IMH.

S. Fujita, SSDM 2016

Outline

. Introduction

. Intelligent Memory Hierarchy (IMH) for near-future applications from IoT edge to Cloud

. IMH case study 1: Last level cache (LLC) with eSTT-MRAM

. Rethink requirement of LLC-MRAM

. IMH case study 2: Persistent memories with eSTT-MRAM

. Summary

Trend of cache memory for processors

The capacity of cache memory in CPUs is increasing, which increases the standby power of processors!

L1/L2 SRAM

L1/L2/L3 SRAM

L1/L2/L3/L4 SRAM (+eDRAM)

More cache:
・Increase performance by increasing cache capacity.
・Multi-core.
Larger area, more leakage power.

ICICDT 2016, Tutorial 2

eSTT-MRAM: high density and scaling, then to LLC

Mature process and technology have been presented.
・1Mb, full function @40nm CMOS (Qualcomm)
・8Mb, full function @28nm CMOS (Samsung)
・4Gb, full function @29nm DRAM-Tr. (SKH & Toshiba)
・256Mb @40nm DDR3/4 shipping, 1Gb sample (Everspin)
・30nm MTJ demonstration for last level cache MRAM (TDK Headway)
・TSMC, GF, Samsung…

eMRAM manufacturing on FDSOI

http://semimd.com/blog/2018/07/18/emerging-memory-types-headed-for-volumes/

Break even time for utilizing nonvolatile memory

S. Fujita, IMW2015

Intelligent Memory Hierarchy (IMH) for HP-

[Figure: three memory hierarchies compared: Conventional, All Nonvolatile, and IMH.]

IMH = Volatile/Nonvolatile Hybrid !!
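The break-even time idea behind the volatile/nonvolatile hybrid can be sketched in a few lines. This is only an illustration under stated assumptions: break-even time is taken here as the overhead energy of one power-off/power-on cycle divided by the leakage power that is saved while off, and the function names and numbers are hypothetical, not values from the talk.

```python
def break_even_time_s(overhead_energy_j, leakage_power_w):
    """Idle time at which the energy saved by cutting leakage equals the
    extra energy spent on entering and leaving the power-off state."""
    if leakage_power_w <= 0:
        raise ValueError("leakage power must be positive")
    return overhead_energy_j / leakage_power_w


def should_power_off(expected_idle_s, overhead_energy_j, leakage_power_w):
    """Power off only when the expected idle period exceeds the break-even time."""
    return expected_idle_s > break_even_time_s(overhead_energy_j, leakage_power_w)


# Hypothetical numbers, chosen only for illustration.
OVERHEAD_J = 1e-6   # assumed energy cost of one power-off/power-on cycle
LEAKAGE_W = 1e-2    # assumed leakage power of the volatile memory

bet = break_even_time_s(OVERHEAD_J, LEAKAGE_W)
print(f"break-even time: {bet * 1e6:.1f} us")
print("idle 1 ms  -> power off:", should_power_off(1e-3, OVERHEAD_J, LEAKAGE_W))
print("idle 10 us -> power off:", should_power_off(1e-5, OVERHEAD_J, LEAKAGE_W))
```

Under this simple model, a memory level whose typical idle time is shorter than its break-even time is better left volatile, which is one way to read the hybrid argument.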


Example of SRAM LLC (last level cache), several MB
・Read/write random access time: ~2ns
・Power (write power): ~10fJ/bit
・Retention: N.A.
・Area: 200 to 400 F2

MRAM should not compete with SRAM!

Let’s RETHINK the requirements of MRAM for Last Level Cache!
・Read/write speed
・Power (write power)
・Endurance
・Retention
・Area
・Error rate

Access frequency of cache memory and main memory

[Figure: access frequency across the hierarchy over time. Levels: CPU core (base clock 1GHz), L1 cache, L2 cache, last level cache (LLC, should be MRAM), main memory (DRAM). Each cache level is labeled with a miss rate of ~10%. Access interval labels: once per 3 ns, 20 ns, 30~40 ns, 100~500 ns.]

A higher write error rate (WER) than that of conventional SRAM is acceptable for MRAM-LLC, since main memory (DRAM) can cover it.

CPU performance simulation for comparison between SRAM-LLC and MRAM-LLC

[Figure: relative CPU performance (instructions per second, a.u., 0 to 1.2; higher is better) for gcc, lbm, mcf, milc, omnetpp, soplex, sphinx3, xalancbmk, GemsFDTD, libquantum and the average. Configurations: L3-1MB-2ns/2ns-WB10E, L3-1MB-10ns/2ns-WB0E, L3-1MB-10ns/10ns-WB0E, L3-1MB-10ns/10ns-WB10E, L3-1MB-10ns/35ns-WB0E, L3-1MB-10ns/35ns-WB10E, L3-1MB-10ns/35ns-WB100E.]

CPU performance is affected by the read latency of the last level cache.

CPU performance is not largely affected by write latency (<~20ns), since the write process is not on the CPU pipeline and write latency (<~20ns) is also covered by the write buffer.

Influence of MRAM read access time on CPU performance

[Figure: average CPU time per instruction (ns, 0 to 4) versus average read access time (ns, 0 to 30) for SRAM and MRAM. Annotation: < 3% @5ns.]

S. Fujita et al., ASP-DAC 2015

Comparison of read latency between SRAM and MRAM having Mb capacity

[Figure: read latency (ns, 0 to 10) versus memory capacity (Mb, 1 to 1000, log scale) for SRAM and MRAM.]

S. Fujita et al., NVMSA2017
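Why a few extra nanoseconds of LLC read latency cost little can be illustrated with the textbook average memory access time (AMAT) model. This is a rough sketch, not the simulation behind the figures above: the latencies are assumptions, and only the ~10% miss rates follow the access frequency slide.

```python
def amat_ns(levels, memory_latency_ns):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_latency_ns, miss_rate) tuples, ordered from L1 outward.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for hit_latency_ns, miss_rate in levels:
        total += reach * hit_latency_ns
        reach *= miss_rate
    return total + reach * memory_latency_ns


# Assumed latencies (ns); miss rates of ~10% per level as on the slide.
L1 = (1.0, 0.1)
L2 = (2.0, 0.1)
DRAM_NS = 100.0

sram_llc = amat_ns([L1, L2, (2.0, 0.1)], DRAM_NS)   # LLC read in 2 ns
mram_llc = amat_ns([L1, L2, (5.0, 0.1)], DRAM_NS)   # LLC read in 5 ns

print(f"AMAT with SRAM LLC: {sram_llc:.3f} ns")
print(f"AMAT with MRAM LLC: {mram_llc:.3f} ns")
print(f"relative increase:  {(mram_llc / sram_llc - 1) * 100:.2f} %")
```

Because only about 1% of accesses reach the LLC in this toy setup, its read latency is heavily discounted in the average. Real CPU time per instruction also depends on overlap and out-of-order execution, which this model ignores.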

How to reduce write energy with STT-MRAM? (1)

[Figure: power versus time. (1) Conventional SRAM/eDRAM: RAM active power on top of RAM static power. (2) e-STT-MRAM: write power during the write pulse time.]

Energy = “write pulse time x write power” should be decreased!

How to reduce write energy with STT-MRAM? (2)

[Figure: relative average power for L2 cache memory (SRAM = 1) versus access time of MRAM (ns), for static/active duty ratios of 1:50 and 1:100. Regions are marked where MRAM has higher power than SRAM and where it has lower power than SRAM. Label: 128MB.]

S. Fujita, NVMSA2017

Conventional PG to Normally-off / Instant-On with STT-MRAM-LLC

[Figure: power versus time under frequent CPU stops. Conventional power gating: active power, power gating, leakage (SRAM), deep-power-down state. Power gating with STT-MRAM (normally-off): reduced power, instant on, better performance, reduced leakage power.]

Real measurement of cache active/static state: “CPU cores very frequently STOP for short periods even while the application is running!”

[Figure: real-time monitoring of 8 CPU cores while a 3D graphics application is running, with STOP periods marked.]

S. Fujita, ICICDT2016

CPU cores very frequently STOP for short periods even while the application is running!

[Figure: real-time CPU/cache measurement data for mobile benchmark software (movie software), showing the average sum of CPU state time. States: 1: CPU active, 2: CPU standby, 3: CPU sleep/deep-sleep.]

CPU simulations (total write times)

Max: ~3 x 10^10 / year

4 core CPU, 3GHz, 8MB LLC, CPU simulator (Gem5)

Trade-off: high endurance > 10^12 and high-speed write < 20ns

[Figure: endurance (10^3 to 10^18, log scale) versus write pulse (ns, 0 to 40). H. Noguchi, eSTT-MRAM, ISSCC 2015]

Write pulse > 40ns: practically unlimited endurance
Write pulse < 20ns: limited endurance affects LLC/CPU long-term reliability
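The link between the simulated write count and the endurance target can be checked with simple arithmetic. The sketch below assumes that the ~3 x 10^10 writes per year is the worst-case count seen by a single cell and that wear-out is reached exactly at the rated endurance; both are simplifying assumptions, and the function name is hypothetical.

```python
def years_to_wear_out(endurance_cycles, writes_per_year):
    """Years until the most heavily written cell reaches its rated endurance."""
    if writes_per_year <= 0:
        raise ValueError("writes_per_year must be positive")
    return endurance_cycles / writes_per_year


MAX_WRITES_PER_YEAR = 3e10  # from the CPU simulation slide

for endurance in (1e9, 1e12, 1e15):
    years = years_to_wear_out(endurance, MAX_WRITES_PER_YEAR)
    print(f"endurance {endurance:.0e}: about {years:.3g} years")
```

With an endurance of 10^12 the worst-case cell lasts on the order of tens of years under this assumption, which is consistent with the > 10^12 target on the slide.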

Retention is controllable for each application.

[Figure: Δ (0 to 110) versus MTJ diameter (nm, 0 to 100), with data from Toshiba and references [1] to [3]. Annotations: long retention at high Δ; short retention and lower Iw at low Δ.]

Retention requirement for LLC

[Figure, left: necessary Δ (30 to 80) versus required retention time [s] (1.E-01 to 1.E+09) for a chip failure rate of 1000 FIT. Curves: 10MB at 80C; 256KB at 80C; 256KB at 80C with SECDED.]

[Figure, right: measured average cache data lifetime [s] (1.E-04 to 1.E+00) for L2 cache and L3 cache (LLC) over various workloads, with 50% and 80% markers. The red line shows the reference data distribution for a 1MB L2 cache, A. Jog et al., DAC2012, pp. 243-252.]

Relatively low Δ is acceptable for LLC.
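The trade between retention and the thermal stability factor Δ can be made concrete with the Néel-Arrhenius relation, retention ≈ τ0·exp(Δ). This is a single-bit sketch only: the attempt time τ0 = 1 ns is a commonly used assumption rather than a value from the slides, and the chip-level 1000 FIT criterion in the figure needs a larger Δ than this mean value suggests.

```python
import math

TAU0_S = 1e-9  # assumed attempt time (1 ns)


def retention_s(delta, tau0_s=TAU0_S):
    """Mean retention time of a single bit for thermal stability factor delta."""
    return tau0_s * math.exp(delta)


def delta_for_retention(target_s, tau0_s=TAU0_S):
    """Thermal stability factor giving the target mean retention of a single bit."""
    if target_s <= 0:
        raise ValueError("target_s must be positive")
    return math.log(target_s / tau0_s)


for delta in (30, 40, 60):
    print(f"delta = {delta}: mean retention about {retention_s(delta):.3g} s")

print(f"delta for 1 minute: {delta_for_retention(60.0):.1f}")
```

The exponential dependence is why relaxing retention from years to minutes allows a noticeably lower Δ, and with it a lower write current.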

Expected cell area reduction with e-STT-MRAM in 22nm CMOS and beyond

eRAM cell area comparison (22nm/16nm CMOS)
・6T SRAM @22nm: 0.092um2 = 190F2 @F=22nm (Intel, ISSCC2012)
・2T-2MTJ STT-MRAM @22nm: 0.037um2 = 76F2 @F=22nm (MTJφ=35nm, Iw=50uA, Ra=6Ωum2)
・eDRAM @22nm: 0.029um2 (Intel, ISSCC2013)

Figure labels: Slow, Fast, MTJ >5ns.

Expected area reduction of SRAM with MRAM: x 50% or less with 2T-2MTJ (fast), x 25% or less with 1T-1MTJ (slow).

Low write current is needed for small

16nm LP-CMOS

K. Ikegami, K. Abe, S. Fujita et al., IEDM2015

MTJ should be scaled with metal pitch.

[Figure: MTJ scaling for eMRAM on CMOS technology nodes. M1 pitch and MTJ size (nm, 0 to 100) for the 28 nm, 20 nm, 16 nm, 7 nm and 5 nm nodes. Sources: TSMC, VLSI Symposium 2018; Samsung, IEDM2016; GF, VLSI Symposium 2018.]

~30nm MTJ for 5nm CMOS (x nm MTJ is not needed!!)
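The F2 figures quoted for the 22nm cells can be reproduced by dividing the cell area by the square of the feature size. A small sketch follows; the function name is hypothetical, and the area values are those printed on the cell area slide.

```python
def area_in_f2(area_um2, feature_nm):
    """Cell area expressed in units of F^2, where F is the feature size."""
    feature_um = feature_nm / 1000.0
    return area_um2 / (feature_um ** 2)


CELLS_22NM = {
    "6T SRAM": 0.092,          # um^2, Intel, ISSCC2012
    "2T-2MTJ STT-MRAM": 0.037, # um^2
    "eDRAM": 0.029,            # um^2, Intel, ISSCC2013
}

for name, area in CELLS_22NM.items():
    print(f"{name}: {area} um2 = {area_in_f2(area, 22):.0f} F2")

ratio = CELLS_22NM["2T-2MTJ STT-MRAM"] / CELLS_22NM["6T SRAM"]
print(f"2T-2MTJ area relative to 6T SRAM: {ratio:.2f}")
```

The 2T-2MTJ to SRAM ratio of about 0.4 is in line with the "x 50% or less" statement for the fast cell.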

Improve error tolerance of L2 cache

[Figure: cache lookup with TAG and Index. TAG array (SRAM) and data array (eMRAM) with ECC (e.g. SEC-DED). Correctable: cache hit, data goes to the CPU. Uncorrectable: cache miss, access goes to main memory (DRAM).]

K. Ikegami, H. Noguchi, S. Fujita et al., IEDM2015

RER/WER ~1e-9 in eMRAM is acceptable for a case study.

Requirement for eMRAM for LLC

Item: Requirement
Read/write speed: < 10ns / 20ns (write pulse)
Endurance: > 1x10^12
Active energy (write energy): < 50fJ
Retention: > order of minutes @1ppm
Area: less than half of SRAM area
Error rate: < 1x10^-9

Read/write speed, endurance and active energy, these 3 realized at the same time: most challenging!
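The error-tolerance scheme on the slide "Improve error tolerance of L2 cache" (a correctable read is served as a hit, an uncorrectable read is handled like a miss) can be sketched as a small read path. This is a hypothetical illustration, not the circuit from the IEDM2015 paper: the names are invented, the ECC decoder is assumed to have already produced a status and corrected data, and the line is assumed to be clean so that the copy in DRAM is valid.

```python
from enum import Enum


class EccStatus(Enum):
    CLEAN = 0          # no error detected
    CORRECTED = 1      # error detected and corrected by the ECC decoder
    UNCORRECTABLE = 2  # error detected but beyond the code's correction ability


def read_line(tag_hit, ecc_status, cached_data, fetch_from_dram):
    """Return (data, source) for one cache read.

    An uncorrectable ECC result on a tag hit is treated as a miss, so the
    data is fetched from main memory instead of failing the access.
    """
    if tag_hit and ecc_status is not EccStatus.UNCORRECTABLE:
        return cached_data, "cache"
    return fetch_from_dram(), "dram"


def dram_copy():
    # Stand-in for a main memory access (hypothetical).
    return b"line-from-dram"


print(read_line(True, EccStatus.CORRECTED, b"line-from-mram", dram_copy))
print(read_line(True, EccStatus.UNCORRECTABLE, b"corrupted", dram_copy))
```

A dirty line with an uncorrectable error has no valid copy in DRAM, so a real design needs an additional policy for that case, which this sketch leaves out.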

How to meet requirement for LLC? (1)

[Figure: write current (A, 1.0E-05 to 1.0E-02) versus write time (s, 1.E-10 to 1.E-07). General MTJ and p-MTJ (references [1] to [4], [6], [7]) have higher energy compared with SRAM. An SRAM compatible zone is marked, with advanced p-MTJ points labeled Toshiba 2012, Toshiba 2014 and Toshiba 2016.]

Advanced p-MTJ (Toshiba):
・Kitagawa (Toshiba), IEDM2012: 28~35nm
・Saida (Toshiba), Intermag2014: 22nm
・Saida (Toshiba), VLSI 2016, 2017: 16nm to 1xnm…

Mature MTJ tech. for small MTJs (30~40nm): QUALCOMM, TDK-Headway & TSMC, Samsung, GF

・Improvement of MgO breakdown voltage
・Reduction in RA of MTJ
・Improvement of Iw/Δ
・Small MTJ to reduce Iw with low WER

[1] Sony Corp., IEDM (2005)
[2] New York Univ., Applied Physics Letters 97, 242510 (2010)
[3] Cornell Univ., Applied Physics Letters 95, 012506 (2009)
[4] Minnesota Univ., J. Phys. D: Appl. Phys. 45, 025001 (2012)
[6] IBM Corp., Appl. Phys. Lett. 98, 022501 (2011)
[7] TDK-Headway, Applied Physics Express 5, 093008 (2012)

How to meet requirement for LLC? (2)

New architecture: NAND-flash-like MRAM; Voltage-Control Spintronics MRAM (VoCSM)

[Figure: endurance (10^3 to 10^18, log scale) versus write pulse (ns, 0 to 40). VoCSM: high-speed write and high endurance.]

H. Yoda et al., IEDM 2016

VoCSM can also reduce write energy ( = Iw x tw x Vdd )

[Figure: programming current (mA, 1.0E-05 to 1.0E-02) versus programming time (nsec, 1.E-10 to 1.E-07) for general MTJ and p-MTJ (references [1] to [4], [6], [7]) and the measured Iw of VoCSM, with the SRAM compatible zone marked. Arrows: power down, higher speed.]

Y. Ohsawa et al., EDTMC, 2018
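The write energy expression Iw x tw x Vdd can be evaluated directly. The sketch below uses Iw = 50uA from the 22nm cell slide and tw = 20ns from the LLC requirement table; Vdd = 1.0 V is an assumption, since no supply voltage is given, and the function name is hypothetical.

```python
def write_energy_j(iw_a, tw_s, vdd_v):
    """Write energy per bit as write current x write pulse width x supply voltage."""
    return iw_a * tw_s * vdd_v


IW_A = 50e-6    # write current from the 22nm 2T-2MTJ slide
TW_S = 20e-9    # write pulse upper bound from the requirement table
VDD_V = 1.0     # assumed supply voltage (not stated in the slides)
TARGET_J = 50e-15  # < 50fJ requirement

energy = write_energy_j(IW_A, TW_S, VDD_V)
print(f"write energy: {energy * 1e15:.0f} fJ (target < {TARGET_J * 1e15:.0f} fJ)")

# Write current that would meet the target at the same pulse width and Vdd.
iw_needed = TARGET_J / (TW_S * VDD_V)
print(f"Iw needed at tw = 20 ns: {iw_needed * 1e6:.1f} uA")
```

Under these assumptions the energy is well above the 50fJ target, which illustrates why both Iw and tw have to come down together.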

Conventional persistent memory with NVDIMM

[Figure: conventional persistent memory configurations. Labels: DRAM=NVM, DRAM<NVM (3D-Xpoint), DRAM=0 / NVM only (3D-Xpoint), NVDIMM with battery. Annotations: high cost, large.]

A new persistent memory with eSTT-MRAM

[Figure: processor with eMRAM (1Gb), DRAM main memory and SSD, with capacity labels 64GB and 1TB. Dirty line data (page) flows from the CPU last level cache to main memory and to eMRAM; the OS handles memory/disk addresses; eMRAM connects to NAND (SSD) over PCIe.]

<Control 1> Write back: ALL dirty lines are replicated into MRAM and written into main memory.
<Control 2> Backup into SSD (background/PCIe).
<Control 3> When an unexpected power down occurs, data in MRAM can be used.

All write back data simulated
• Single core: 3.2GHz, 3-way OoO
• Cache memory: 32KB L1D / 32KB L1I, 1MB LLC
• Main memory: 16GB DDR4

[Figure: amount of write back data (x 64B, 0.00E+00 to 1.00E+06).]

1Gb e-MRAM can persist data in 16GB DRAM.
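The claim that 1Gb of eMRAM is enough can be checked with capacity arithmetic. The sketch assumes that 1Gb means 2^30 bits, that the figure axis counts 64B cache lines, and that its top value of 1.0E+06 is the worst case to be held; these are readings of the extracted slide, not confirmed values.

```python
LINE_BYTES = 64  # cache line size from the simulation setup


def dirty_line_capacity(mram_bits, line_bytes=LINE_BYTES):
    """Number of whole cache lines that fit in an MRAM of the given size."""
    return (mram_bits // 8) // line_bytes


def fits(dirty_lines, mram_bits, line_bytes=LINE_BYTES):
    """True when all dirty-line replicas fit in the MRAM."""
    return dirty_lines <= dirty_line_capacity(mram_bits, line_bytes)


MRAM_BITS = 2 ** 30          # 1Gb, assuming binary prefixes
PEAK_DIRTY_LINES = 1_000_000 # top of the figure axis (assumed worst case)

print("lines that fit in 1Gb:", dirty_line_capacity(MRAM_BITS))
print("peak write back data (MB):", PEAK_DIRTY_LINES * LINE_BYTES / 1e6)
print("fits in 1Gb eMRAM:", fits(PEAK_DIRTY_LINES, MRAM_BITS))
```

About two million 64B lines fit in 1Gb, so a peak of one million line replicas uses roughly half of the eMRAM under these assumptions.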

Interval of write back simulated

[Figure: ratio (%, 0 to 20) versus interval of write back (ns, 0 to 90) for GemsFDTD, gcc, lbm, libquantum, mcf, milc, omnetpp, soplex, sphinx3 and xalancbmk.]

Less than 30ns write latency is required. (MRAM is the only NVM solution.)

Write buffer can compensate write latency (~100ns)

[Figure: write buffer capacity (B, 0 to 2500), with a marker at 2KB.]

eMRAM requirement for persistent memory

Item: Requirement
Write latency: < 100ns
Read latency: < 1us
Endurance: > 1x10^10
Retention: ~1 week
Memory capacity: > x Gb (challenging!)

Comparison of persistent memory

Columns: persistent memory; cost; virtual memory capacity; latency for persistency; frequency; performance (e.g. in-memory database); processor organization.
・eMRAM/DRAM/NAND 1TB (proposed): cost low; 16GB; ~100ns; DDR compatible
・DRAM 16GB/3DXp 1TB (conventional 1): cost very high; 1TB; ~1us; ~1us; enhanced
・DRAM 16GB/NAND 1TB (conventional 2): cost high; 16GB; ~ms; DDR compatible

Summary

3 Key Points of Intelligent Memory Hierarchy (IMH)
・Nonvolatile/volatile hybrid and intelligent power management.
・IMH should be changed from application to application.
・There should be “only one storage (long retention NVM)” at the bottom of IMH.

IMH case study 1: Last level cache (LLC) memories with eMRAM
・MRAM/SRAM hybrid cache memory hierarchy considering break-even-time-aware designs

Rethinking requirement of eMRAM for LLC
・Based on our in-depth analysis, the requirements have been clarified.
・High-speed write, high endurance and low Iw together are the most difficult to realize.
・Largely improved STT-MRAM or the new architecture VoCSM is expected.

IMH case study 2: Persistent memories with eMRAM
・Several-Gb eMRAM can be proposed as a novel persistent memory.
・STT-MRAM can cover the requirements, but Gb density is challenging.
・Lower cost and higher performance compared with conventional NVDIMM-based ones.

Thank you for your kind attention!

Acknowledgements

This work was partly supported by the ImPACT Program of the Council for Science, Technology and Innovation (Cabinet Office, Government of Japan).

Co-workers (Toshiba R&D center)

Kazutaka Ikegami, Susumu Takeda, Satoshi Shirotori, Naoharu Shimomura, Hiroaki Yoda, Tomoaki Inokuchi, Katsuhiko Koi, Hideyuki Sugiyama, Yuichi Ohsawa, and Atsushi Kurobe
