Can MRAM Be a Factor for HPC ?
JP. Nozieres, Forum ORaP, 2 April 2015

1. Introduction
2. Can MRAM help ?
3. Which MRAM ?

IC Power Consumption
[Figure: ITRS roadmap, IC power density (W/cm²)]
Logic is the major issue !

High Performance Computing
Current HPC → Petaflops (10^15 flop/s)
- > MW overall operating power consumption, plus the same amount for cooling
- > 100 m² area
Towards Exaflop (10^18 flop/s) requires a drastic increase in compactness and energy efficiency !

Memory Wall
Memory vs. CPU speed mismatch: logic keeps waiting for data !

Memory hierarchy
                      Level                          Size           Access time
VOLATILE
  CPU                 Processor, Registers
  CACHE MEMORY        L1 Cache (SRAM)                4 to 32 KB     ~ns
                      L2 Cache (SRAM)                Up to 512 KB   ~10 ns
                      L3 Cache (SRAM / eDRAM)        4 to 8 MB      ~30 ns
  WORKING MEMORY      Random Access Memory (DRAM)    > GB           ~100 ns
NON VOLATILE
  SOLID STATE MEMORY  Non Volatile (Flash)
  VIRTUAL MEMORY      Storage memory (HDD)

Logic issue is becoming a memory issue …

Logic Power Losses
- Static loss: current leakage
  - Gate-channel tunneling
  - Source-drain leakage (direct tunneling)
- Dynamic loss: interconnect capacitance and Joule heating

Technology       90nm    65nm    45nm     32nm     22nm
# transistors    10^7    10^8    10^9     3·10^9   10^10
Wire length      ~10km   ~30km   ~100km   ~300km   …

Can MRAM be of any help ?

The (Memory) Holy Grail
Why MRAM ?
Property         Comparable to      Target
Non-volatile     like Flash         10+ years retention
Dense            like DRAM          10F², small overheads
Fast             like SRAM          ~10 ns in normal mode
High endurance   like SRAM / DRAM   10^12 cycles, up to 10^16

- Non-volatile, to save data while logic is OFF
- Low active power
- Fast enough to match logic speed
- "Infinite" endurance, to act as cache
- Easy to embed within logic
- Minimal wire length, to minimize dynamic (RC) loss
- Single technology to answer multiple needs (RAM, ROM, storage)

MRAM is not the best, but …
- Can replace SRAM at 1/6th of the size, with zero leakage
- Can replace e-Flash at > 10x the speed and lower power
- Can replace DRAM (if DRAM runs out of steam)

An easy-to-embed memory
[Figure: MRAM (above-IC) cell on top of CMOS logic]
- « End-of-back-end » process
- Cell resistance compatible with CMOS (~ kΩ)
- Vdd-driven switching
- No charge pumps required
- No trade-off with the logic process
- Cheap (only 3 additional masks)

“Janus” architecture
Option 2: Memory blocks distributed within (above) logic core(s)
Logic-In-Memory concept, first introduced in 1969
[Figure label: Processor Chip]
• Reduced silicon footprint
• Multiple / short interconnects
• Distributed memory within logic

MRAM Cache
Option 1: DRAM and L2, L3 cache replacement, with the same overall architecture
(The figure marks the VOLATILE and NON VOLATILE regions of each hierarchy.)

Level                 Current                        With MRAM
CPU                   Processor, Registers           Processor, NV-Registers
CACHE MEMORY          L1 Cache (SRAM)                L1 (SRAM/MRAM)
                      L2 Cache (SRAM)                L2 Cache (MRAM)
                      L3 Cache (SRAM / eDRAM)        Level 3 Cache (MRAM)
WORKING MEMORY        Random Access Memory (DRAM)    Random Access Memory (MRAM)
SOLID STATE MEMORY    Non Volatile (Flash)           Non Volatile (Flash)
VIRTUAL MEMORY
                      Storage memory (HDD)           Storage memory (HDD)

→ Faster memory-logic communication
- Reduced dynamic power
- Advanced power management
- Reduced static power (NV cache, no DRAM refresh)
- High data resilience

eVaderis STT-based MCU for the IoT
[Figure labels: Connected Object; Battery; Wireless; 10%; Controller; Sensor; Off-Chip Processing-Storage; On-Chip Processing-Storage; NVRM; Energy “everywhere”; Energy]
- Trade-off between performance and intelligence (computing, amount of data) and autonomy (battery life, CO2)
- Non-volatile data-centric control processor
- 10 to 100x less energy translates into extended lifetime, more intelligence, less CO2

Hybrid CMOS-MRAM logic
Option 3: Non-volatility inside logic blocks (NV flip-flop, NV latch, …)
(The figure marks the VOLATILE and NON VOLATILE regions of the hierarchy.)

CPU                   Processor, NV-Registers
CACHE MEMORY          L1 (SRAM/MRAM)
                      L2 Cache (MRAM)
                      Level 3 Cache (MRAM)
WORKING MEMORY        Random Access Memory (MRAM)
SOLID STATE MEMORY    Non Volatile (Flash)
VIRTUAL MEMORY        Storage memory (HDD)

- Near-zero standby
- Fast save / restore of logic states
→ « Normally-OFF / Instant-ON » computing

NV Flip-Flop
[Figure label: 5 Jan 2009]

eVaderis STT-based MCU
User case: 19 Mb MRAM + 1 Mb SRAM
• 10^6 scalar measures
• 32 uncompressed 320x240 grayscale pictures (security standard)

[Figure: relative energy, with the following annotations]
- Divide by 3 to 100 (depends on data profile): balance of RF gain and on-chip process cost
- Divide by 10^5: > 1000x faster than Flash
- Divide by 10 to 100: 10 to 100x faster (parallelized boot from distributed memory blocks)
- Divide by 10 to 100
- Instant on
→ Full shutoff (no more sleep / deep-sleep states)
- Near zero
[Figure axis categories: Send, Store, Wake-up, Sleep, Standby]

Rad-Hard NV Look-up Table
- DRAM-based configuration memory
- Periodic refresh of DRAM using MRAM content (scrubbing)
- Advantages:
  - High density (DRAM)
  - No redundancy required
  - Shadowed reconfiguration
  - Low power (non-volatile)
- Implemented on a hybrid TowerJazz 130 nm CMOS / Crocus MRAM process

There Are Many MRAMs !
- Field-driven
- Toggle
- Thermally Assisted (TAS)
- STT (SPRAM): planar, perpendicular
- STT-TAS
- OST (Precessional)
- SOT (Spin-Orbit Torque)
- DW motion

Which MRAM ?

Technology       Company              Year    Capacity
Ferrite core     Control Data Corp    1965    1 Kb
Bubble memory    Intel                1980    1 Mb
AMR              Honeywell            1984    16 Kb
Toggle           Everspin             2004    4 Mb
STT              Hynix                2010    64 Mb

The Magnetic Tunnel Junction (MTJ)
Giant (Tunneling) Magnetoresistance: acting on current through magnetization
- Parallel “0”: low R
- Antiparallel “1”: high R

$$ \mathrm{TMR} = \frac{R_{\uparrow\downarrow} - R_{\uparrow\uparrow}}{R_{\uparrow\uparrow}} $$

Stack, top to bottom:
Layer              Material
Hard mask          Ta
Etch stop layer    Ru
Capping layer      Ta
Storage layer      CoFeB
Tunnel barrier     MgO
Reference layer    CoFeB
Spacer             Ta
Pinning layer      (Pt/Co)n / Ru / (Pt/Co)n
Seed layer         Pt
Smoothing layer    Ta/CuN/Ta
Base electrode

MRAM (Read)
Logic state = magnetization (resistance) state
[Figure: normalised count (10^0 down to 10^-6) vs resistance (0 to 1000), with the Rmin and Rmax distributions separated by > 25σ and a 2X resistance ratio; read path with Address, Data ref, Data out (Rhigh), Data out (Rlow)]
Figure of merit is ΔR/σ, not ΔR

Why STT MRAM ?
[Figure: field-driven MRAM vs STT MRAM cell]
- 1T-1R architecture
- Smallest cell size
- Lowest current
- Full scalability

Field-Driven MRAM
What is STT ?
Spin Transfer Torque (STT): acting on magnetization through current
- Current-only switching (no field)
[Figure: magnetization precession with the field term (precession), Gilbert damping and spin torque (antidamping)]

$$ \frac{d\mathbf{M}}{dt} = -\gamma\,\mathbf{M}\times\left(\mathbf{H}_{eff} + b\,I\,\mathbf{M}_p\right) + \gamma\,a\,I\,\mathbf{M}\times\left(\mathbf{M}\times\mathbf{M}_p\right) + \alpha\,\mathbf{M}\times\frac{d\mathbf{M}}{dt} $$

Terms, in order: Zeeman ($\mathbf{H}_{eff}$), field-torque ($b\,I\,\mathbf{M}_p$), spin-torque ($a\,I$), Gilbert damping ($\alpha$).

Field-driven writing: use current pulses to generate overlapping magnetic fields at the word/bit line crosspoint (transistor OFF)
- Large cell size (30F²)
- High power (2 x 16 mA / bit)
- Low speed (35 ns R/W)
- Not scalable

Why perpendicular ?
Switching probability under thermal activation, with $\Delta E$ the barrier to switching:

$$ P_\pm = 1 - e^{-t/\tau_\pm}, \qquad \tau = \tau_0\, e^{\Delta E / k_B T}, \qquad \tau_0 = 10^{-9}\ \mathrm{s} $$

Thermal stability factor:

$$ \Delta = \frac{K_{eff} V}{k_B T} $$

$K_{eff}^{plan}$ and $K_{eff}^{perp}$ are the effective anisotropies for in-plane and perpendicular magnetization, built from the volume anisotropy $K_v$, the interfacial anisotropy $(K_{s1}+K_{s2})/t$ and the demagnetizing term $2\pi M_s^2$.

Critical current:

$$ j_c^{perp} = \left(\frac{4e}{\hbar}\right)\frac{\alpha\,k_B T}{g(0)\,p\,A}\,\Delta, \qquad j_c^{plan} = \left(\frac{4e}{\hbar}\right)\frac{\alpha\,k_B T}{g(0)\,p\,A}\left(\Delta + \frac{\pi M_s^2 V}{k_B T}\right) $$

with α = damping, p = polarization, A = area, g(0) ~ 1.

STT-MRAM is now becoming real !
[Figures dated Q3 2014, Q3 2014 and Q4 2014]

P-STT MRAM Demo

How Fast Can STT-MRAM Be ?
Precessional regime:

$$ I_c - I_{c0} \propto \frac{1}{\tau} $$

Thermally activated regime, with $E$ the switching barrier (written $\Delta E$ above):

$$ I_c = I_{c0}\left(1 - \frac{k_B T}{E}\,\ln\frac{\tau}{\tau_0}\right) $$

[Figure labels: 50 ns, 10 ns]

Thermally Activated Switching
Devolder et al., Phys. Rev. Lett. vol 100 (2008)
• Stochastic reversal
• Incubation time preceding a large thermal fluctuation
[Figure: transmitted voltage (mV) vs time after pulse (ns)]

Precessional STT demo
Marins de Castro Sousa et al., Journal of Applied Physics 111 (2012) 07C912
Liu et al., APL 97, 242510 (2010)

$$ \boldsymbol{\Gamma} = a_j\,\mathbf{M}\times(\mathbf{P}\times\mathbf{M}) $$

[Figure: switching probability (0 to 1) vs pulse width (0 to 10 ns) at 500 mV, 562 mV, 631 mV (centered, AP and P bias), 708 mV and 794 mV]
→ In-plane precession
→ Ultrafast deterministic switching
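As a rough numerical companion to the thermal-activation relations of the "Why perpendicular ?" slide, the sketch below evaluates P = 1 − exp(−t/τ) with τ = τ0·exp(Δ) and τ0 = 10^-9 s. It assumes the barrier in the exponent equals K_eff·V, so that the exponent is the thermal stability factor Δ itself. The function names, the ten-year window and the Δ values are illustrative choices, not figures from the talk.

```python
import math

TAU0 = 1e-9  # attempt time tau_0 in seconds, as quoted on the slide


def mean_switching_time(delta: float, tau0: float = TAU0) -> float:
    """Mean thermally activated switching time, tau = tau0 * exp(delta)."""
    return tau0 * math.exp(delta)


def switching_probability(t: float, delta: float, tau0: float = TAU0) -> float:
    """Probability that a bit has flipped after time t: P = 1 - exp(-t / tau)."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x
    return -math.expm1(-t / mean_switching_time(delta, tau0))


def delta_for_retention(t: float, p_fail: float, tau0: float = TAU0) -> float:
    """Smallest delta that keeps the flip probability within p_fail over time t."""
    tau_needed = -t / math.log1p(-p_fail)
    return math.log(tau_needed / tau0)


TEN_YEARS = 10 * 365.25 * 24 * 3600  # retention window in seconds (illustrative)

if __name__ == "__main__":
    for delta in (40, 60, 80):  # illustrative stability factors
        p = switching_probability(TEN_YEARS, delta)
        print(f"delta = {delta}: P(flip within 10 years) = {p:.3e}")
    needed = delta_for_retention(TEN_YEARS, 1e-9)
    print(f"delta needed for a 1e-9 flip probability over 10 years: {needed:.1f}")
```

Because the retention time grows exponentially with Δ, a modest change in K_eff·V/k_BT moves the flip probability by many orders of magnitude, which is why the slide tracks Δ rather than the barrier alone.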
→ Ultra-low power (switching with 90 fJ)

Spin Orbit Torque (SOT) MRAM
[Figure: AlOx 2 nm / Co 0.5 nm / Pt 3 nm stack, with $\mathbf{H}_{EFF} \sim \mathbf{M}\times\mathbf{H}_R$]
- 3 terminals
- Infinite endurance / reliability
- Independent read and write paths
  o Adjustable impedance
  o Maximized TMR
  o Fast read

Precessional STT (Orthogonal Spin Torque – OST)
[Figure: stack with reference layer A / MgO / free layer / MgO / perpendicular polarizer P]
2 STT contributions:

$$ \frac{d\mathbf{M}}{dt} = -\gamma_0\,\mathbf{M}\times\mathbf{H}_{eff} + \frac{\alpha}{M_S}\,\mathbf{M}\times\frac{d\mathbf{M}}{dt} + a_{jA}\,\mathbf{M}\times(\mathbf{A}\times\mathbf{M}) + a_{jP}\,\mathbf{M}\times(\mathbf{P}\times\mathbf{M}) $$

→ STT from reference layer A: bipolar switching of free layer magnetization
→ STT from perpendicular polarizer P: precession of free layer magnetization
- If $a_{jP} \gg a_{jA}$ (perpendicular polarizer dominates) → Steady precessions (e.g.