Emerging Memories and Pathfinding for the Era of sub-10nm System-on-Chip

Seung Kang Qualcomm Technologies, Inc.

IEEE Solid-State Circuits Society Seminar San Diego, CA August 8, 2019

1 Memory Is Big Business >> $100 Billion*

https://www.dw.com/ http://www.icinsights.com/news/bulletins/Total-Memory-Market-Forecast-To-Increase-10-In-2017/

* Not including embedded memories for AP, SOC, and MCU

2 Memory Subsystem Hierarchical memory layers

[Figure: memory hierarchy. Layers: RF, Cache, Main Memory, Local Storage, Remote Storage. Labels: Computing, On-chip, Off-chip, Cost.]

3 Memory Subsystem There is no such thing like a

RF Speed; SRAM Endurance

“Embedded” DRAM

DRAM Density; Flash (SSD), HDD Retention

Flash (SSD), HDD, Tape

4 Problem Statement 1 "Memory Wall" Overall system performance & power governed more by memory subsystem than by CPU subsystem

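[Figure: memory in computing-centric and data-centric systems, with a cost axis.
Embedded memory (computing-centric): SOC/AP with CPU (RF, L1, L2, L3 cache), GPU, custom SRAM, ROM, and OTP/MTP; MCU with CPU, SRAM, ROM, OTP/MTP, eFlash, and external Flash.
Standalone memory (data-centric): DRAM, Flash storage/SSD, HDD.]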

5 Problem Statement 2 Many-Core Processors Increasing SRAM area & leakage power overhead

Shared 25 Mbytes of L3 cache (60 Mbytes for 24 cores)

Intel Broadwell-E (14nm node)

• Datacenter applications projecting 120 Mbytes (960 Mb) L3 cache at 10nm and beyond
• More expensive at advanced nodes (6T-SRAM: 550 F² at 7 nm vs. 150 F² at 40 nm)
• High standby/leakage power (worse at high T)
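The F² figures above can be turned into physical cell areas. The following is a minimal sketch, assuming F equals the node number in nanometers (as a later slide states), 1 Mb = 2^20 bits, and no array overhead such as sense amplifiers, decoders, or redundancy. The function names are illustrative only.

```python
def cell_area_um2(f2_factor: float, node_nm: float) -> float:
    """Cell area in um^2 for a cell of `f2_factor` F^2, taking F = node_nm."""
    return f2_factor * node_nm ** 2 / 1e6  # nm^2 -> um^2


def raw_array_area_mm2(megabits: float, f2_factor: float, node_nm: float) -> float:
    """Raw bitcell area in mm^2 for `megabits` Mb, ignoring all periphery."""
    bits = megabits * 2 ** 20  # assumption: 1 Mb = 2**20 bits
    return bits * cell_area_um2(f2_factor, node_nm) / 1e6  # um^2 -> mm^2


area_40 = cell_area_um2(150, 40)  # 6T-SRAM at 40 nm
area_7 = cell_area_um2(550, 7)    # 6T-SRAM at 7 nm
array_7 = raw_array_area_mm2(960, 550, 7)

print(f"40 nm cell: {area_40:.4f} um^2")
print(f" 7 nm cell: {area_7:.4f} um^2")
print(f"960 Mb of raw bitcells at 7 nm: {array_7:.1f} mm^2")
```

Under these assumptions F² shrinks by roughly 33X from 40 nm to 7 nm, while the cell itself shrinks by only about 9X, which is one way to read "more expensive at advanced nodes".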

6 Problem Statement 3 IOT & Embedded System Inherent drawbacks caused by memory limitations

• Energy-hungry
• Poor form factor
• High cost
• Security vulnerability

“The IOT is an NVM problem.” Greg Yeric, ARM (2015 IEDM Plenary Talk)

7 A New Perspective on Energy Efficiency New Demand and Criteria for Wearable and Bioelectronic Devices

Critical Challenge: Battery Life (Energy Efficiency)

8 A New Perspective on Security & Privacy Demand for secure memory and HW primitives (e.g. PUF)

Endpoint Gateway Cloud

? ?

?

9 Problems, new requirements, and opportunities demand advanced memories…

10 Memory Classification

Device Type

Volatile Memory
  SRAM
  DRAM

Nonvolatile Memory
  Charge Modulation
    Flash: 2D/3D NAND, NOR
    FRAM
  Resistance Modulation
    PCM
    MRAM: STT-MRAM, SOT/SHE, Field MRAM
    RRAM: Ox-RAM, CB-RAM, VMCO, CNT, Mott Transition

Legend: Mature (mainstream or commoditized); Emerging (currently in small markets)

11 Phase Change Memory PCM PC-RAM PRAM

12 PCM: Early History

Neale, Nelson, & Moore, Electronics, 1970

“Nonvolatile and reprogrammable, the read-mostly memory is here”
• Density: 256
• Die Size: 122-by-131-mil (10.3 mm²)
• Read: 2.5 mA, < 5 V
• Set: 5 mA, 25 V, 10 ms
• Reset: < 200 mA, 25 V, 5 µs
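The numbers above allow a quick sanity check of the die size and a rough bound on programming energy. This is a sketch only: it assumes rectangular pulses with the quoted current and voltage applied for the full duration, so the energies are upper-bound estimates rather than values reported in the 1970 article.

```python
MIL_TO_MM = 0.0254  # 1 mil = 0.001 inch = 0.0254 mm


def die_area_mm2(width_mil: float, height_mil: float) -> float:
    """Die area in mm^2 from dimensions given in mils."""
    return (width_mil * MIL_TO_MM) * (height_mil * MIL_TO_MM)


def pulse_energy_j(current_a: float, voltage_v: float, duration_s: float) -> float:
    """Energy of an idealized rectangular pulse: E = I * V * t."""
    return current_a * voltage_v * duration_s


die = die_area_mm2(122, 131)
set_energy = pulse_energy_j(5e-3, 25, 10e-3)        # Set: 5 mA, 25 V, 10 ms
reset_energy_max = pulse_energy_j(200e-3, 25, 5e-6)  # Reset: < 200 mA, 25 V, 5 us

print(f"Die area: {die:.1f} mm^2")
print(f"Set energy (upper bound): {set_energy * 1e3:.2f} mJ")
print(f"Reset energy (upper bound): {reset_energy_max * 1e6:.0f} uJ")
```

The die area reproduces the 10.3 mm² on the slide. Under the rectangular-pulse assumption, the long set pulse costs far more energy than the short, high-current reset pulse.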

13 PCM: Basic Concept Phase-change Element

Amorphous Crystalline

High R Source: (2006) Low R

• Chalcogenide alloy (e.g. Ge-Sb-Te/GST)
• Programming: Joule heating followed by natural cooling
• Relatively simple physics!

Figure labels: T > melting point; T > crystallization T

14 PCM: Cell and Array Architecture Cell = Access Device + Phase-change Element

1BJT-1R

1FET-1R 1Diode-1R

The required characteristics of access FET, diode, or BJT are largely governed by the upper limit of the reset current (to drive localized melting) at a target cell size.

Cross-bar Array

15 PCM: Evolution of Cell Configuration Improve thermal isolation

Source: H.-L. Lung (ITRS ERD, 2014)

>90% of heat is wasted during reset
• Lower reset current/power
• Improved endurance & retention

16 PCM: Reliability Cycling Endurance Chen et al. (Macronix-IBM, IMW, 2009)

0 cycles 10 cycles 10K cycles 1M cycles

Undoped GST

0 cycles 1K cycles 100M cycles 1B cycles

Doped GST

17 PCM: Reliability Retention

Shih et al. (Macronix-IBM, IEDM, 2008)

18 PCM: Prototype Samsung 8Gb PCM (ISSCC, 2012)

4.2F²

19 PCM: Evolution to 3D

Kau et al. (Intel & Numonyx, IEDM, 2009)
• PCMS: phase-change memory (PCM) coupled with a selector (OTS)
• OTS: Ovonic Threshold Switch
• 64 Mb
• Endurance: 10⁶ cycles

3D XPoint (Intel & Micron, 2016)
• 20nm node
• 128 Gb
• SLC
[Figure label: Selector]

Intel Optane Memory Series (2017)

Chip Density                  16 GB (128 Gb)   32 GB
Read Latency                  7 µs             9 µs
Write Latency                 18 µs            30 µs
Random Read                   190K IOPS        240K IOPS
Random Write                  35K IOPS         65K IOPS
Sequential Read               900 MB/s         1350 MB/s
Sequential Write              145 MB/s         290 MB/s
Memory Power (Active/Idle)    3.5 W / 1 W
Endurance (Lifetime Writes)   182.5 TB
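The lifetime-writes rating can be related to device capacity. This is a rough sketch, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), that the 182.5 TB rating applies to both capacities, and a hypothetical five-year service life that is not stated on the slide.

```python
def full_drive_writes(lifetime_writes_tb: float, capacity_gb: float) -> float:
    """Number of times the whole device can be overwritten within its rating."""
    return lifetime_writes_tb * 1e12 / (capacity_gb * 1e9)


def gb_per_day(lifetime_writes_tb: float, years: float) -> float:
    """Average daily write budget in GB over an assumed service life."""
    return lifetime_writes_tb * 1000 / (years * 365)


writes_16 = full_drive_writes(182.5, 16)
writes_32 = full_drive_writes(182.5, 32)
daily = gb_per_day(182.5, years=5)  # assumption: 5-year service life

print(f"16 GB part: about {writes_16:.0f} full-device writes")
print(f"32 GB part: about {writes_32:.0f} full-device writes")
print(f"Daily budget over 5 years: {daily:.0f} GB/day")
```

Read this way, the rating corresponds to on the order of ten thousand full-device overwrites for the smaller part, which is a system-level figure and not the same thing as the per-cell cycling endurance.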

Source: Intel.com

20 3D XPoint as Storage Class Memory

It does not replace DRAM, or NAND storage, but it adds a new layer to improve the subsystem

Source: Intel-Micron, 2015

21 Magnetoresistive RAM MRAM Spin-transfer-torque MRAM STT-MRAM ST-MRAM STT-RAM

22 A Building : Magnetic Tunnel Junction Multiple flavors, but perpendicular MTJ

Free Layer
Tunnel Barrier
Pinned Layer

Electrical resistance varied by relative spin alignment: magnetoresistance (MR)

Parallel Antiparallel

Low Resistance (RP) High Resistance (RAP)

Relatively small read window

Electrical switching, not magnetic switching

23 MRAM Snapshot A new class of memory: Nonvolatile RAM

• Fast NVM
• High endurance
• 3 additional masks over baseline logic
• Low voltage (no )
• Scalable

Operation voltage on MTJ Read: 0.1 V Write: 0.3 − 0.5 V
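The size of the read window can be expressed through the tunnel magnetoresistance ratio, TMR = (RAP − RP) / RP. That is the usual definition and is not spelled out on the slide. The sketch below uses the 0.1 V read voltage quoted above; the two resistance values are hypothetical and chosen only for illustration, and the bias dependence of the resistance is ignored.

```python
def tmr(r_p: float, r_ap: float) -> float:
    """Tunnel magnetoresistance ratio, (R_AP - R_P) / R_P."""
    return (r_ap - r_p) / r_p


def read_currents_ua(v_read: float, r_p: float, r_ap: float) -> tuple:
    """Read currents in uA for the parallel and antiparallel states."""
    return v_read / r_p * 1e6, v_read / r_ap * 1e6


V_READ = 0.1     # V, read voltage from the slide
R_P = 5_000.0    # ohms, hypothetical parallel-state resistance
R_AP = 10_000.0  # ohms, hypothetical antiparallel-state resistance

ratio = tmr(R_P, R_AP)
i_p, i_ap = read_currents_ua(V_READ, R_P, R_AP)

print(f"TMR = {ratio:.0%}")
print(f"I(P) = {i_p:.1f} uA, I(AP) = {i_ap:.1f} uA, difference = {i_p - i_ap:.1f} uA")
```

With these illustrative values the sense amplifier must resolve a difference of about ten microamps, which is why a robust reference and sensing scheme matter so much for MRAM.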

Lu et al. (Qualcomm & TDK), IEDM, 2015; Park et al. (Qualcomm & Applied Mat.), IEDM, 2015

24 MRAM Array Architecture

[Figure: MRAM array schematic. Blocks: Write Driver, MUX, Reference Generator, Read SA, Data MTJ array, Ref MTJ array (Rref). Lines: bit/source lines BL0/SL0 through BL31/SL31, word lines wl<0> through wl<511>. SLDP: local data path.]

2IOs+Ref

Use the same bitcell for both data and reference array
• Small read window → Design for robust read (sensing) is critical
• Balancing switching asymmetry and source generation

25 Challenges for MRAM Design and Reliability Narrow design window for deeply scaled nodes

Prevent read error
▪ Low VRead (0.1V)
▪ High TMR
▪ Fast fall off of RDR slope

Prevent write error
▪ Low VWrite
▪ Fast fall off of WER slope

Improve barrier reliability
▪ High VBD
▪ Contain TDDB

26 MRAM Device Scalability: Ic

Most important bitcell and design parameter

[Figure: critical switching current (µA) vs. MTJ diameter (nm)]

Kang, VLSI Symp., 2014; Saida et al., VLSI Symp., 2016

At small dimensions, dynamic current consumption becomes comparable with SRAM cell current

27 MRAM Device Scalability: Endurance Practically unlimited endurance for cache applications

Kan et al., IEDM, 2016

[Figure, three panels.
(1) Endurance requirement (cycles) vs. millions of accesses per core per second, for 10 years at 50% duty cycle: L2 SRAM (256 KB), L2 MRAM (1024 KB), L3 SRAM (1.5 MB), L3 MRAM (6 MB).
(2) Breakdown cycles vs. MTJ voltage (V), 50 ns pulse, (+) P-AP and (-) AP-P polarity, 1 ppm levels marked. Intrinsically solid.
(3) Time to breakdown (sec) and resistance (Ohms) vs. MTJ voltage (V), 25 nm and 45 nm. Better with MTJ scaling.]

In real life, subject to design robustness & defect control

28 MRAM: Prototypes Samsung (IEDM, 2016 / 7th MRAM Global Innovation Forum)

SK Hynix- (IEDM, 2016 / ISSCC, 2017)

4Gb, 9F² (30nm)

29 MRAM: Qualcomm Demo System

MRAM integrated along with PSRAM and NOR Flash for performance and power benchmarking

Integrated into a demo tablet
• 350X faster than Flash
• 3X faster than PSRAM

Kang, IMW, 2016

MRAM can unify PSRAM (volatile RAM) and NOR (nonvolatile storage) with PPAC advantages

30 MRAM In Production

31 MRAM for Processing-in-Memory CNN Accelerator A single-chip solution for Mobile and IOT applications

From Gyrfalcon Technologies (2018)

• 22nm eMRAM (40 MB) • 9.9 TOPS/W

32 Resistive RAM RRAM ReRAM Conductive Bridge RAM CB-RAM

33 RRAM: Materials Two-terminal resistive switching elements (excluding PCM and MRAM). Found in numerous combinations of materials.

Source: P. Wong (Stanford, 2011)

34 RRAM: Common Classification Different materials & switching characteristics

Oxide RRAM (Ox-RAM) / Transition Metal Oxide RRAM
Stack: Top Electrode, Metal Oxide, Bottom Electrode

Conductive Bridge RRAM (CB-RAM) / Programmable Metallization Cell (PMC)
Stack: Top Electrode, Metal Ion Reservoir, Solid Electrolyte, Bottom Electrode

Conductive Metal Oxide RRAM / Vacancy Modulated Conductive Oxide RRAM (VMCO RRAM)
Stack: Top Electrode, Tunnel Barrier, Conductive Metal Oxide, Bottom Electrode

Switching types: Interfacial Switching (2D), Filamentary Switching (1D), Uniform Switching (No forming)

35 RRAM: Switching

Initial State (Very High R) → Forming (Low R) → Reset (High R) → Set (Low R)

[Figure: Top Electrode / Metal Oxide / Bottom Electrode stack in each state, with the polarity of the applied bias]

Observation of a filament: Kwon et al., Nature Nanotechnology (2010)

[Figure: current vs. voltage for Bipolar Switching and Unipolar Switching]

36 RRAM: Cell and Array Architecture

P. Wong (Stanford) 1T-1R Sheu et al. 1D-1R (VLSI Symp., 2008) (Diode selector for unipolar RRAM)

1D-1R/1S-1R Lee et al. (IEDM, 2007) (Stacked Cross Point Array) Yoon et al. (VLSI Symp., 2009) 3D Vertical Cross Point RRAM

37 RRAM: Variability Temporal and spatial variability

Sills et al. (VLSI Symp., 2014) Jurczak (ITRS ERD, 2014)

Resistance Variation vs. Switching Current Write Speed vs. Read Margin

38 RRAM: Reliability Endurance & Retention

Sills et al. (VLSI Symp., 2014)

Wei et al. (IEDM, 2011)
256 Kbit array baked at 150°C for 1000 hours
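High-temperature bakes of this kind are commonly extrapolated to use conditions with an Arrhenius model. The sketch below works under that assumption; the model choice, the 1.0 eV activation energy, and the 85 °C use temperature are all hypothetical and are not taken from the cited work.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Acceleration factor of a bake at t_stress_c relative to use at t_use_c."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / K_B_EV * (1.0 / t_use_k - 1.0 / t_stress_k))


EA_EV = 1.0     # hypothetical activation energy
T_USE_C = 85.0  # hypothetical use temperature

af = arrhenius_acceleration(EA_EV, T_USE_C, 150.0)
equivalent_hours = 1000 * af  # 1000-hour bake from the slide

print(f"Acceleration factor: {af:.0f}")
print(f"Equivalent time at {T_USE_C:.0f} C: {equivalent_hours / 8760:.1f} years")
```

The result is very sensitive to the assumed activation energy, so a retention claim derived this way is only as good as the extracted Ea for the specific failure mechanism.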

39 RRAM: Prototypes SanDisk-Toshiba RRAM (ISSCC, 2013)

T.-Y. Liu et al. (JSSC, 2014)
• By far the largest-density RRAM test chip
• Relatively slow performance (NAND Flash alternative)

40 RRAM: Prototype Micron- CB-RAM (ISSCC, 2014)

• Target application: storage class memory
• Endurance target: >10⁶ cycles
• Raw BER
  o Endurance: <3×10⁻⁵ at 10⁶ cycles
  o Retention: <2×10⁻⁴ at 10 years, 70°C, 10⁴ cycles
  o Read disturb: <2×10⁻⁵ at 10⁶ reads

Acceptable for SCM?
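Whether raw bit error rates like these are acceptable depends on the error correction that sits behind them. The sketch below assumes independent bit errors, a hypothetical 4096-bit codeword, and a hypothetical code that corrects up to t bit errors; none of these ECC parameters come from the cited paper.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of exactly k errors in n bits."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def codeword_failure_prob(raw_ber: float, n_bits: int, t: int) -> float:
    """Probability that a codeword has more than t bit errors (uncorrectable)."""
    return sum(math.exp(log_binom_pmf(k, n_bits, raw_ber))
               for k in range(t + 1, n_bits + 1))


N_BITS = 4096  # hypothetical codeword length
for label, ber in [("endurance", 3e-5), ("retention", 2e-4), ("read disturb", 2e-5)]:
    probs = [codeword_failure_prob(ber, N_BITS, t) for t in (0, 1, 2, 4)]
    print(label, " ".join(f"{p:.2e}" for p in probs))
```

Summing the tail in log space avoids the precision loss of computing one minus a probability close to one. Under these assumptions the uncorrectable rate falls steeply with each additional correctable bit, so the answer to the slide's question is mostly a question of ECC budget.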

41

Nature v.453, p.80 (2008)

L.O. Chua, IEEE Trans. Circuit Theory 18, p.507 (1971)

RRAM (and also MRAM and PCM) may show memristive behaviors (analog memory characteristics)

42 Ferroelectric Memory FRAM FeRAM Ferroelectric FET FeFET

43 Conventional FRAM Perovskite crystals (PZT, SBT) Internal electric dipole reversibly switchable by electric field

PZT: lead zirconate titanate, Pb(Zrₓ, Ti₁₋ₓ)O₃
SBT: strontium bismuth tantalate

1T-1C (C=FeCAP)

Kim et al. (IEDM, 2005); Ramtron (2012)

44 Conventional FRAM In production, but not scaling beyond 130nm Fundamental scaling limit requires 3D FeCAP (very challenging)

U. Bottger and S.R. Summerfelt, Ferroelectric RAM, from and Information Technology (ed. by R. Waser) Koo et al. (IEDM, 2006)

45 FeFET Polarization of ferroelectric layer over the Si channel modulates the threshold voltage (Vth) → 1T FRAM

On (“1”) Off (“0”)

Challenges
• Required perovskite configuration difficult to integrate
• Data retention (→ FeDRAM?)
  o Depolarization field
  o Leakage

Ma (IMW, 2014)

46 FeFET: Renewed Hope for Scaling

Orthorhombic phase of HfO2

SBT: SrBi₂Ta₂O₉ (perovskite)

Cheng & Chin (EDL, 2014) Muller et al. (VLSI Symp. 2012)

High endurance at limited retention (<10³ sec)
Good retention at limited endurance (<10⁵)

47 Hype? Promise? Reality? Opportunities?

PCM… MRAM… RRAM… FRAM…

48 Emerging Memory Reality Check There is no universal memory Opportunities in tunability (system differentiation, user experiences)

[Figure: radar charts comparing "Ideally" (Universal Memory) vs. "In Reality". Axes: Density (Cell Size), Performance, Endurance, Switching Energy, Retention (Energy Barrier). Ideal values shown: 4F², 1ns, 10¹⁵ cycles, >10⁵h. Ranges shown in reality: 4F² to 100F², 1ns to 1000ns, 1h to 10⁵h.]

49 Positioning Emerging Memory Need to understand the application space

Lee, Kan, and Kang, ISLPED, 2014

High performance & good endurance Lowest cost per bit & high density

Low cost, high density, and intermediate performance

Low cost & long battery life (fast cycle & low leakage)

Anti-tampering & atomic operation

Low cost & good reliability

50 Emerging Memory Pathfinding for Sub-10nm CMOS

MRAM as an example because of its NV-RAM attributes and recent advances at major IC manufacturers

5th CIES Forum

51 CMOS Logic Scaling Intrinsic FinFET scaling is limited Logic scaling is about standard cell architecture innovation

Source: A. Steegen, 2018 ITF Belgium

52 Parasitic R & C Impact

MOL and BEOL parasitic R & C causing more delays than intrinsic delay

Negatively impacting essentially all types of resistance-based memory designs (MRAM, RRAM, PCM)

53 SRAM Scaling Relatively more expensive at advanced nodes FinFET SRAM near the end of scaling

High-density 6T SRAM: 550F² at 7nm
Expect 1000F² at 5nm
F: node number

Kang & Park, IEDM 2017

54 MRAM Pathfinding as an SRAM Alternative Reduce last-level-cache area and energy consumption

A 22nm case study by Toshiba

55 Cell Design Challenge: Supply Current CMOS supply current much smaller at advanced nodes → Requiring low switching current and low MTJ resistance

28nm 7nm

Jc = 4.41 MA/cm²

Park et al., VLSI Symp. 2018
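A critical current density translates into a switching current once the MTJ area is fixed. The following is a minimal sketch, assuming a circular MTJ and that the quoted Jc holds unchanged at the chosen diameter; the 30 nm and 35 nm diameters are illustrative inputs.

```python
import math


def switching_current_ua(jc_ma_per_cm2: float, diameter_nm: float) -> float:
    """Switching current in uA for a circular MTJ: I = Jc * pi * d^2 / 4."""
    diameter_cm = diameter_nm * 1e-7  # 1 nm = 1e-7 cm
    area_cm2 = math.pi * diameter_cm ** 2 / 4
    current_a = jc_ma_per_cm2 * 1e6 * area_cm2  # MA/cm^2 -> A/cm^2
    return current_a * 1e6  # A -> uA


JC = 4.41  # MA/cm^2, from the slide
i_30 = switching_current_ua(JC, 30)
i_35 = switching_current_ua(JC, 35)

print(f"d = 30 nm: {i_30:.1f} uA")
print(f"d = 35 nm: {i_35:.1f} uA")
```

Because the current scales with the square of the diameter, a few nanometers of MTJ size variation shift the required drive current noticeably, which matters when the access transistor's supply current is the limiting factor.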

56 Reliability Challenge: Endurance Intrinsic endurance practically unlimited However, endurance sensitive to switching voltage & MgO TDDB

Need TDDB test

Common memory applications < 10¹²

Kan et al., IEDM 2016 & TED 2017

Smaller MTJ → Higher Vbd

57 Cell Architecture Pathfinding for 7nm

[Figure: 1T-1MTJ and 2T-1MTJ bitcell layouts with fin, PO, BL, SL, and MTJ labels; dimensions marked as 2Pfin, 3Pfin, 2PM1, 2CPP]

MTJ pitch → 85-90 nm
MTJ CD → 30-35 nm

1T-1MTJ
• Bitcell (X,Y): (2PM1, 2Pfin)
• Area (2-fin cell): 140 F²
• MRAM:SRAM → 0.25X (for area)

2T-1MTJ
• Bitcell (X,Y): (2CPP, 3Pfin)
• Area (6-fin cell): 210 F²
• MRAM:SRAM → 0.35X (for performance)
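The area ratios can be cross-checked against the 550 F² high-density 6T SRAM cell quoted earlier for 7nm. A small sketch follows; the implied reference size for the 0.35X case is an inference from the slide's numbers, not a stated value.

```python
SRAM_HD_F2 = 550.0  # high-density 6T SRAM at 7nm, from an earlier slide


def area_ratio(mram_f2: float, sram_f2: float) -> float:
    """MRAM cell area as a fraction of the SRAM cell area."""
    return mram_f2 / sram_f2


def implied_sram_f2(mram_f2: float, ratio: float) -> float:
    """SRAM cell size that would produce a given MRAM:SRAM ratio."""
    return mram_f2 / ratio


ratio_2fin = area_ratio(140, SRAM_HD_F2)
ratio_6fin = area_ratio(210, SRAM_HD_F2)
implied = implied_sram_f2(210, 0.35)

print(f"2-fin cell vs. 550 F^2 SRAM: {ratio_2fin:.2f}X")
print(f"6-fin cell vs. 550 F^2 SRAM: {ratio_6fin:.2f}X")
print(f"SRAM size implied by 0.35X: {implied:.0f} F^2")
```

The 2-fin cell reproduces the 0.25X figure directly. The 6-fin cell comes out closer to 0.38X against the high-density cell, so the 0.35X figure presumably compares against a somewhat larger, performance-oriented SRAM cell.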

58 Prospect

[Figure: MRAM application roadmap. Applications: IOT, Wearables, Security, Automotive, Mobile AP, ML/AI, Datacenter. Memory roles: eNVM (code & data), Last-level Cache, High-density SRAM, High-BW RAM. Legend: In Production, Research. Nodes: 28nm, 22nm, 7nm, 5nm.]

Kang, 2014 VLSI Symp. & 2019 CIES Tech Forum

59 From Research to Commercialization

Semiconductor devices typically require >10 years of R&D (e.g. FinFET)

Can you stay in the game?

[Figure: stages from research to commercialization]
• Groundbreaking Technology Discovery in physics, materials science
• Laboratory Demonstration: Functional device
• Circuit IP Design and Prototyping: Functional array (fully functional and reliable over P-V-T)
• Product Design / System Integration: IP qualification, Product pathfinding & qualification
• Pilot Production / Early Adopter: Early market
• Production: Facing “the Chasm”

Any fundamental showstopper?

60 Thank You.

For questions and feedback, contact [email protected]
