Can MRAM Be a Factor for HPC ?

Outline

1. Introduction
2. Can MRAM help ?
3. Which MRAM ?

IC Power Consumption

[Figure: ITRS roadmap, power density in W/cm²]

Logic is the major issue !

Forum ORaP, 2 April 2015, JP. Nozieres

High Performance Computing

Current HPC → Petaflops (10¹⁵ flop/s)

- > MW overall operating power consumption + same amount for cooling
- > 100 m² area

Towards Exaflop (10¹⁸ flop/s) requires a drastic increase of compactness and energy efficiency !

Memory Wall

Memory vs. CPU speed mismatch : logic keeps waiting !

VOLATILE
  CPU
    Processor
    Registers
  CACHE MEMORY
    L1 Cache (SRAM)                 4 to 32 KB      ~ns
    L2 Cache (SRAM)                 Up to 512 KB    ~10 ns
    L3 Cache (SRAM / eDRAM)         4 to 8 MB       ~30 ns
  WORKING MEMORY
    Random Access Memory (DRAM)     > GB            ~100 ns
NON VOLATILE
  SOLID STATE MEMORY
    Non Volatile (Flash)
  VIRTUAL MEMORY
    Storage memory (HDD)

Logic issue is becoming a memory issue …
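The petaflop-to-exaflop jump can be turned into an energy-per-operation budget with a few lines of arithmetic. The sketch below is only illustrative: the slides say "> MW" without giving a figure, so the 1 MW value is an assumed round number, and the function name is hypothetical.

```python
def energy_per_flop(power_watts: float, flops_per_second: float) -> float:
    """Average energy spent per floating-point operation, in joules."""
    if flops_per_second <= 0:
        raise ValueError("flops_per_second must be positive")
    return power_watts / flops_per_second


# ASSUMPTION: the slides only state "> MW"; 1 MW is a round illustrative value.
ASSUMED_POWER_W = 1e6

petaflop_energy = energy_per_flop(ASSUMED_POWER_W, 1e15)  # 10^15 flop/s
exaflop_energy = energy_per_flop(ASSUMED_POWER_W, 1e18)   # 10^18 flop/s

print(f"petaflop/s at 1 MW: {petaflop_energy:.1e} J/flop")
print(f"exaflop/s  at 1 MW: {exaflop_energy:.1e} J/flop")
print(f"efficiency gain needed at constant power: "
      f"{petaflop_energy / exaflop_energy:.0f}x")
```

Whatever the actual power figure, keeping it constant while going from 10¹⁵ to 10¹⁸ flop/s requires the same factor of 1000 in energy per operation.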

Logic Power Losses

Static Loss : Current leakage
- Gate-Channel tunneling
- Source-Drain leakage (direct tunneling)

Dynamic Loss : Interconnects capacitance and Joule heating

Technology     90nm     65nm     45nm      32nm      22nm
#              10⁷      10⁸      10⁹       3·10⁹     10¹⁰
Wire length    ~10km    ~30km    ~100km    ~300km    …

Can MRAM be of any help ?

The (Memory) Holy Grail

Non-volatile      like Flash          10+ years retention
Dense             like DRAM           10F², small overheads
Fast              like SRAM           ~10 ns in normal mode
High endurance    like SRAM / DRAM    10¹² cycles, up to 10¹⁶

Why MRAM ?

- Non-volatile to save data while logic is OFF
- Low active power
- Fast enough to match logic speed
- "Infinitely" endurant to act as cache
- Easy to embed within logic
- With minimal wire length to minimize dynamic (RC) loss
- Single technology to answer multiple needs (RAM, ROM, Store)

MRAM is not the best but …
- Can replace SRAM at 1/6th of size, zero leakage
- Can replace e-Flash at >10 x speed, lower power
- Can replace DRAM (if running out of steam)

An easy to embed memory

- "End-of-back-end" process
- Cell R compatible with CMOS (~ kΩ)
- Vdd driven switching
- No charge pumps required
- No trade-off with logic process
- Cheap (only 3 add-masks)

[Figure: MRAM (above-IC) cell on top of CMOS Logic]

MRAM Cache

Option 1 : DRAM & L2, L3 cache replacement @ same overall architecture

Level                 Conventional                   With MRAM
CPU                   Processor, Registers           Processor, NV-Registers
CACHE MEMORY          L1 Cache (SRAM)                L1 (SRAM/MRAM)
                      L2 Cache (SRAM)                L2 Cache (MRAM)
                      L3 Cache (SRAM / eDRAM)        Level 3 Cache (MRAM)
WORKING MEMORY        Random Access Memory (DRAM)    Random Access Memory (MRAM)
SOLID STATE MEMORY    Non Volatile (Flash)           Non Volatile (Flash)
VIRTUAL MEMORY        Storage memory (HDD)           Storage memory (HDD)

(The diagram marks the VOLATILE / NON VOLATILE boundary on each hierarchy.)

- Reduced static power (NV cache, no DRAM refresh)
- High data resilience

"Janus" architecture

Option 2 : Memory blocks distributed within (above) logic core(s)

Logic-In-Memory concept, first introduced in 1969

[Figure: Processor Chip]

• Reduced silicon footprint
• Multiple / short interconnects
• Distributed memory within logic

→ Faster memory-logic communication
- Reduced dynamic power
- Advanced power management

Hybrid CMOS-MRAM logic

Option 3 : Non-volatility inside logic blocks (NV-Flip-flop, NV-latch, …)

[Figure: memory hierarchy with NV-Registers, L1 (SRAM/MRAM), L2 Cache (MRAM), Level 3 Cache (MRAM), Random Access Memory (MRAM), Non Volatile (Flash), Storage memory (HDD)]

- Near-zero standby
- Fast save / restore of logic states
→ "Normally-OFF / Instant-ON" Computing

eVaderis STT-based MCU for the IoT

[Figure: Connected Object with Battery, Wireless, Controller, Sensor and CPU; labels: 10%, NVRM, Energy "everywhere", Energy]

- Performances, Intelligence (computing, amount of data)
- Autonomy (battery life, CO2)
- Off-Chip Processing-Storage
- On-Chip Processing-Storage
- Non-volatile data-centric control processor

10 to 100X less energy translates into extended lifetime, more intelligence, less CO2

eVaderis STT-based MCU

User Case : 19 Mb MRAM + 1Mb SRAM
• 10⁶ scalar measures
• 32 uncompressed 320x240 grayscale pictures (security standard)

Relative energy, by operation :

Operation    Relative energy                                  Comment
Send         Divide by 3 to 100 (depends on data profile)     Balance of RF gain and on-chip process cost
Store        Divide by 10⁵                                    >1000 X faster than Flash
Wake-up      Divide by 10 to 100                              10 to 100 X faster (parallelized boot from distributed memory blocks)
Sleep        Divide by 10 to 100                              Instant on ! Full shutoff (no more sleep / deep sleep states)
Standby      Near zero

NV Flip-Flop

5 Jan 2009

Rad-Hard NV Look-up Table

- DRAM-based configuration memory
- Periodic refresh of DRAM using MRAM content (scrubbing)
- Advantages :
  - High density (DRAM)
  - No redundancy required
  - Shadowed reconfiguration
  - Low power (non-volatile)
- Implemented on hybrid TowerJazz 130nm CMOS / Crocus MRAM process

There Are Many MRAMs !

- Field-driven : Toggle, Thermally Assisted (TAS)
- STT (SPRAM) : Planar, Perpendicular, STT-TAS
- OST (Precessional)
- SOT (Spin-Orbit Torque)
- DW motion

Which MRAM ?

Technology       Capacity    Year
Ferrite core     1Kb         (1965)
Bubble memory    1Mb         (1980)
AMR              16Kb        (1984)
Toggle           4Mb         (2004)
STT              64Mb        (2010)

Companies named : Control Data Corp, Honeywell, Everspin, Hynix

The Magnetic Tunnel Junction (MTJ)

MTJ stack (top to bottom) :

Hard mask          Ta
Etch stop layer    Ru
Capping layer      Ta
Storage layer      CoFeB
Tunnel barrier     MgO
Reference layer    CoFeB
Spacer             Ta
Pinning layer      (Pt/Co)n / Ru / (Pt/Co)n
Seed layer         Pt
Smoothing layer    Ta/CuN/Ta
Base electrode

Giant (Tunneling) Magnetoresistance : acting on current through magnetization

- Parallel "0" : Low R
- Antiparallel "1" : High R

$$\mathrm{TMR} = \frac{R_{\uparrow\downarrow} - R_{\uparrow\uparrow}}{R_{\uparrow\uparrow}}$$
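As a small illustration of the TMR definition on this slide, TMR = (R↑↓ − R↑↑) / R↑↑, the sketch below computes the ratio and decodes a stored bit from a measured resistance. The resistance values are assumed for illustration only (the slides just say the cell resistance is of the order of kΩ), and the function names are hypothetical.

```python
def tmr_ratio(r_parallel: float, r_antiparallel: float) -> float:
    """Tunnel magnetoresistance ratio: (R_AP - R_P) / R_P."""
    if r_parallel <= 0:
        raise ValueError("r_parallel must be positive")
    return (r_antiparallel - r_parallel) / r_parallel


def read_bit(resistance: float, r_reference: float) -> int:
    """Parallel (low R) encodes 0, antiparallel (high R) encodes 1."""
    return 1 if resistance > r_reference else 0


# ASSUMED example values, not taken from the slides.
R_P = 2000.0   # ohms, parallel state
R_AP = 4000.0  # ohms, antiparallel state
R_REF = (R_P + R_AP) / 2  # midpoint reference, an assumption

print(f"TMR = {tmr_ratio(R_P, R_AP):.0%}")
print("bit read from R_P :", read_bit(R_P, R_REF))
print("bit read from R_AP:", read_bit(R_AP, R_REF))
```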

Why STT MRAM ?

STT MRAM compared with field-driven MRAM :
- Smallest cell size
- Lowest current
- Full scalability

MRAM (Read)

- 1T-1R architecture
- Logic state = Magnetization (resistance) state

[Figure: read path with Address, Data ref, Data out (Rhigh) and Data out (Rlow); normalised count (10⁰ down to 10⁻⁶) vs. resistance (0 to 1000) for the Rmin and Rmax distributions, annotated ">25σ" and "2X"]

Figure of merit is ΔR/σ not ΔR
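To see why ΔR/σ rather than ΔR is the figure of merit for reading, the sketch below compares two hypothetical cell populations with the same ΔR but different resistance spread. It rests on assumptions that are not in the slides: Gaussian resistance distributions, the same σ for both states, an ideal reference placed midway between the two means, and made-up resistance values.

```python
import math


def separation_in_sigma(r_low: float, r_high: float, sigma: float) -> float:
    """Distance between the two resistance distributions, in units of sigma."""
    return (r_high - r_low) / sigma


def misread_probability(r_low: float, r_high: float, sigma: float) -> float:
    """Probability that one cell falls on the wrong side of a midpoint
    reference, assuming Gaussian spread (an assumption of this sketch)."""
    margin = (r_high - r_low) / (2.0 * sigma)  # distance to reference, in sigma
    return 0.5 * math.erfc(margin / math.sqrt(2.0))


# Two ASSUMED populations with identical delta-R = 400 (arbitrary units).
tight = dict(r_low=400.0, r_high=800.0, sigma=16.0)
loose = dict(r_low=400.0, r_high=800.0, sigma=80.0)

for name, cells in (("tight", tight), ("loose", loose)):
    print(name,
          f"separation = {separation_in_sigma(**cells):.1f} sigma,",
          f"misread probability = {misread_probability(**cells):.3e}")
```

Both populations have the same ΔR, yet only the tight one reaches the 25σ separation quoted on the slide; the loose one would misread a noticeable fraction of bits.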

Field-Driven MRAM

Use current pulses to generate overlapping magnetic fields at word/bit line crosspoint

- Large cell size (30F²)
- High power (2x16 mA / bit)
- Low speed (35ns R/W)
- Not scalable

What is STT ?

Spin Transfer Torque (STT) : acting on magnetization through current

Current-only switching (no field)

$$\frac{d\mathbf{M}}{dt} = -\gamma\,\mathbf{M}\times\left(\mathbf{H}_{eff} + b\,I\,\mathbf{M}_p\right) + \gamma\,a\,I\,\mathbf{M}\times\left(\mathbf{M}\times\mathbf{M}_p\right) + \alpha\,\mathbf{M}\times\frac{d\mathbf{M}}{dt}$$

- Zeeman : field term H_eff (precession)
- Field-torque : b I M_p
- Spin-torque : a I M × (M × M_p) (antidamping)
- Gilbert damping : α M × dM/dt

STT-MRAM is now becoming real !

[Figures: announcements dated Q3, 2014 ; Q3, 2014 ; Q4, 2014]

Why perpendicular ?

Thermal activation, switching probability (K_eff V is the barrier to switching) :

$$P_{\pm} = 1 - e^{-t/\tau_{\pm}}, \qquad \tau = \tau_0 \exp\!\left(\frac{K_{eff} V}{k_B T}\right), \qquad \tau_0 = 10^{-9}\ \mathrm{s}$$

Thermal stability factor :

$$\Delta = \frac{K_{eff} V}{k_B T}, \qquad K_{eff}^{plan} = K_v - 2\pi M_s^2, \qquad K_{eff}^{perp} = \frac{K_{s1} + K_{s2}}{t} + K_v$$

Critical current :

$$j_c^{perp} = \left(\frac{4e}{\hbar}\right)\frac{\alpha\,k_B T}{g(0)\,p\,A}\,\Delta, \qquad j_c^{plan} = \left(\frac{4e}{\hbar}\right)\frac{\alpha\,k_B T}{g(0)\,p\,A}\left(\Delta + \frac{\pi M_s^2 V}{k_B T}\right)$$

α = damping, p = polarization, A = area, g(0) ~ 1
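The thermal activation relations on this slide, P = 1 − exp(−t/τ) with τ = τ0·exp(Δ) and τ0 = 10⁻⁹ s, link the thermal stability factor Δ to data retention. The sketch below is a minimal single-bit estimate at zero current; the 10-year target echoes the retention figure quoted earlier in the deck, while the failure probability of 10⁻⁶ and the function names are assumptions made for illustration.

```python
import math

TAU_0 = 1e-9  # attempt time in seconds, as given on the slide


def relaxation_time(delta: float, tau_0: float = TAU_0) -> float:
    """tau = tau_0 * exp(delta), with delta the thermal stability factor."""
    return tau_0 * math.exp(delta)


def switching_probability(t: float, delta: float, tau_0: float = TAU_0) -> float:
    """P = 1 - exp(-t / tau): chance that one bit flips thermally within t."""
    return -math.expm1(-t / relaxation_time(delta, tau_0))


def required_delta(t: float, p_fail: float, tau_0: float = TAU_0) -> float:
    """Invert P = 1 - exp(-t / tau) for delta at a target failure probability."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return math.log(t / (tau_0 * -math.log1p(-p_fail)))


TEN_YEARS_S = 10 * 365.25 * 24 * 3600
P_FAIL = 1e-6  # ASSUMED per-bit failure target, not from the slides

delta_needed = required_delta(TEN_YEARS_S, P_FAIL)
print(f"delta needed for 10 years at P = {P_FAIL:g}: {delta_needed:.1f}")
print(f"check: P(10 years) = {switching_probability(TEN_YEARS_S, delta_needed):.2e}")
```

This ignores the current dependence implied by the ± subscript on the slide and treats a single isolated bit; a large array would need a stricter per-bit target.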

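P-STT MRAM Demo

[Figure: panels labelled 50ns and 10ns]

How Fast Can STT-MRAM Be ?

Short pulses :

$$\frac{1}{\tau} \propto I_c - I_{c0}$$

Long pulses (thermally activated) :

$$I_c = I_{c0}\left(1 - \frac{k_B T}{E}\,\ln\frac{\tau}{\tau_0}\right)$$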
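The two pulse-width regimes for the switching current can be sketched numerically: for long pulses, Ic = Ic0·(1 − (kB·T/E)·ln(τ/τ0)), and for short pulses, 1/τ ∝ Ic − Ic0. The code below is a sketch under stated assumptions: the barrier E/(kB·T) = 60 and the proportionality constant of the short-pulse law are hypothetical values chosen only to show the trends, and the thermally activated expression is only meaningful for pulses much longer than τ0.

```python
import math

TAU_0 = 1e-9  # attempt time in seconds (tau_0 = 1e-9 s elsewhere in the deck)


def thermal_current_ratio(pulse_s: float, barrier_in_kt: float,
                          tau_0: float = TAU_0) -> float:
    """Ic / Ic0 = 1 - (kB*T / E) * ln(tau / tau_0), long-pulse regime."""
    if pulse_s < tau_0:
        raise ValueError("thermally activated law needs pulse_s >= tau_0")
    return 1.0 - math.log(pulse_s / tau_0) / barrier_in_kt


def precessional_current_ratio(pulse_s: float, time_constant_s: float) -> float:
    """Ic / Ic0 = 1 + c / tau, one way to write 1/tau ∝ Ic - Ic0.
    time_constant_s (c) is a HYPOTHETICAL fitting constant."""
    return 1.0 + time_constant_s / pulse_s


ASSUMED_BARRIER = 60.0  # E / (kB*T), illustrative only
ASSUMED_C = 1e-9        # seconds, illustrative only

for pulse in (1e-6, 50e-9, 10e-9):
    print(f"thermal regime, tau = {pulse:.0e} s: "
          f"Ic/Ic0 = {thermal_current_ratio(pulse, ASSUMED_BARRIER):.3f}")
for pulse in (5e-9, 1e-9, 0.5e-9):
    print(f"precessional regime, tau = {pulse:.1e} s: "
          f"Ic/Ic0 = {precessional_current_ratio(pulse, ASSUMED_C):.1f}")
```

The trend is the point here: below a few nanoseconds the required current rises steeply as the pulse shortens, whereas in the thermally activated regime it only decreases logarithmically with longer pulses.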

Thermally Activated Switching

• Stochastic reversal
• Incubation time preceding a large thermal fluctuation

Devolder et al., Phys. Rev. Lett. vol 100 (2008)

Precessional STT demo

Marins de Castro Sousa et al., Journal of Applied Physics 111 (2012) 07C912
Liu et al., APL 97, 242510 (2010)

$$\vec{\Gamma} = a_j\,\vec{M}\times(\vec{P}\times\vec{M})$$

[Figure: switching probability (0.0 to 1.0) vs. pulse width (0 to 10 ns) at 500mV, 562mV, 631mV centered bias, 631mV AP bias, 631mV P bias, 708mV and 794mV; transmitted voltage (mV) vs. time after pulse (ns)]

→ In-plane precession
→ Ultrafast deterministic switching
→ Ultra low power (switching with 90 fJ)

Precessional STT (Orthogonal Spin Torque – OST)

Stack : Reference layer A / MgO / Free layer / MgO / Perpendicular polarizer P

2 STT contributions :

$$\frac{d\vec{M}}{dt} = -\gamma_0\,\vec{M}\times\vec{H}_{eff} + \frac{\alpha}{M_S}\,\vec{M}\times\frac{d\vec{M}}{dt} + a_{jA}\,\vec{M}\times(\vec{A}\times\vec{M}) + a_{jP}\,\vec{M}\times(\vec{P}\times\vec{M})$$

→ STT from reference layer A : bipolar switching of free layer magnetization
→ STT from perpendicular polarizer P : precession of free layer magnetization

- If a_jP >> a_jA (perpendicular polarizer dominates) → steady precessions (e.g. RF devices)
- If a_jA >> a_jP (in-plane analyzer dominates) → bipolar non-oscillatory switching (e.g. memory)

Spin Orbit Torque (SOT) MRAM

[Figure: Pt 3 nm / Co 0.5 nm / AlOx 2 nm stack, with M, H_R and H_EFF ~ M × H_R]

- 3 terminals
- Infinite endurance / reliability
- Independent read and write paths
  o Adjustable impedance
  o Maximized TMR, fast read
  o No read disturb
- High speed ?

SOT Fast Switching

Moore’s Law

SRAM cell size

Technology node    Year    SRAM cell size
90nm               2003    1 µm²
65nm               2005    0.6 µm²
45nm               2007    0.35 µm²
32nm               2009    0.17 µm²
22nm               2011    0.1 µm²
14nm               2014    0.06 µm²

[Figure: Core technology, Models, Tools, IDM, Foundries, End-Users]

Memory Hierarchy

(Almost) All Non-Volatile Data Path !

Technology enabler : The STT-RAM

MRAM performances may be tuned by shape / size (same core technology)
→ Replace simultaneously multiple memory instances

[Figure: energy (J/bit, 10⁻¹⁶ to 10⁻⁹) vs. write speed (10⁻¹⁰ to 10⁻³ sec); endurance (# cycles, 10³ to 10¹⁸) vs. write speed (10⁻¹⁰ to 10⁻³ sec)]

MCU / SoC Implementation

NV LUT silicon demonstrator

- Hybrid TowerJazz 130nm CMOS / Crocus MRAM process
- Digital test at Spintec
- MRAM programming and input transferred to DRAM
- All input combinations tested and corresponding output checked

Speed

Process

Power

Retention
