Persistent Memory for Artificial Intelligence

Persistent Memory for Artificial Intelligence Bill Gervasi Principal Systems Architect [email protected] Santa Clara, CA August 2018 1 Demand Outpacing Capacity In-Memory Computing Artificial Intelligence Machine Learning Deep Learning Memory Demand DRAM Capacity Santa Clara, CA August 2018 2 Driving New Capacity Models Non-volatile memories Industry successfully snuggling large memories to the processors… Memory Demand DRAM Capacity …but we can do oh! so much more Santa Clara, CA August 2018 3 My Three Talks at FMS NVDIMM Analysis Memory Class Storage Artificial Intelligence Santa Clara, CA August 2018 4 History of Architectures Let’s go back in time… Santa Clara, CA August 2018 5 Historical Trends in Computing Edge Co- Computing Processing Power Failure Santa Clara, CA August 2018 Data Loss 6 Some Moments in History Central Distributed Processing Processing Shared Processor Processor per user Dumb terminals Peer-to-peer networks Santa Clara, CA August 2018 7 Some Moments in History Central Distributed Processing Processing “Native Signal Processing” Hercules graphics Main CPU drivers Sound Blaster audio Cheap analog I/O Rockwell modem Ethernet DSP Tightly-coupled coprocessing Santa Clara, CA August 2018 8 The Lone Survivor… Integrated graphics Graphics add-in cards …survived the NSP war Santa Clara, CA August 2018 9 Some Moments in History Central Distributed Processing Processing Phone providers Phone apps provide controlled all local services data processing Edge computing reduces latency Santa Clara, CA August 2018 10 When the Playing Field Changes The speed of networking directly impacts the pendulum swing from centralized to distributed A faster network favors distributed computing Santa Clara, CA August 2018 11 Winners and Losers Often, the maturity of the software development environment determined who won and who lost Santa Clara, CA August 2018 12 Maintaining an Edge See how great it is??? Mediocre software Coprocessor Oops!!! CPU Time L Santa Clara, CA August 2018 13 The Tail Wagging the Dog I won’t say “It’s the Software, Stupid” because I know you’re not stupid however To succeed, AI needs GREAT software infrastructure Driving some companies to design hardware to the software instead of software to the hardware Santa Clara, CA August 2018 14 Wild Array of Programmer Options Santa Clara, CA August 2018 15 AI on Traditional Server No magic Disk AI applications are like any other Data processing done on main CPU CPU Downside is main CPU is overkill in floating point, and weak in parallelism Santa Clara, CA August 2018 16 AI Evolution Addition of AI Accelerator Disk Offloads main CPU for AI tasks AI CPU Accelerator Santa Clara, CA August 2018 17 AI Evolution AI Accelerator Characteristics Proc Wide array of simple Proc processing elements Goesinta Proc Goesouta Reduced floating point precision Distribute Proc Recombine Proc Tuned for matrix operations Proc Santa Clara, CA August 2018 18 In-Memory Computing In-memory computing lets the AI accelerator control the memory Disk directly AI Accelerator CPU Also great for encryption Santa Clara, CA August 2018 19 Data Processing Paradigms Traditional database Data mining Data Access Inferencing Data Reliability Fuzzy logic Recognition etc Santa Clara, CA August 2018 20 The Actualization Gap Research projects Deployments Santa Clara, CA August 2018 21 The “Research” Projects Many interconnects between storage elements and processing elements Weighted calculations produce parallel possible results R R R R R R R R Focus for a number of R R R R startup companies Santa Clara, CA August 2018 22 What Most People Mostly Building Pipes for Dense matrix memory networking for highest storage capacity Shared memory controller for many execution units Santa Clara, CA August 2018 23 Practical I/O Connection Limits Any-to-any would Toroid is a more be awesome practicable solution Limits how quickly data can flow in and out Santa Clara, CA August 2018 24 Network Utilization RESET Fill processors Fill caches with with code model seed data Send new Process input data Retrieve results Input data through model Power Fail No Time to Checkpoint Model? xxx Consumes I/O Yes Retrieve updated model data Santa Clara, CA August 2018 25 Lossless Versus Lossy Persistent data: reload needed Transient data: reload, restart calculations Accumulated data: modified models are expensive to rebuild Time to reload is always an issue Santa Clara, CA August 2018 26 Recovering From Power Fail Disk Data pulled from main memory …or worse… CPU Backing store Data requires Not uncommon for multiple hops data reload to take through the 3 minutes or more interconnects Before recalculation Santa Clara, CA August 2018 can begin! 27 Distributed Memory Complications R R R R R R R R R R R R This may help explain the gap Distributed cells complicate between research projects and download time into the arrays actual deployments Santa Clara, CA August 2018 28 Persistent Main Memory NVDIMMs are moving Disk data persistence to the main memory bus and in some cases increasing memory capacity CPU See my other talk later this week Santa Clara, CA August 2018 29 Cost of Power Failure Statistics vary but all agree… downtime costs a LOT Santa Clara, CA August 2018 30 Persistent Memory DRAM Persistent Memory Loses data Holds data Must be refreshed forever, even Can’t lose power on power fail Santa Clara, CA August 2018 31 Nantero NRAM™ DDR4 DDR5 …………….. Nantero NRAM is a persistent memory ………………………….... using carbon nanotubes to build resistive arrays which can be arranged in a DRAM compatible device HBM See my other talks Santa Clara, CA later this week August 2018 32 Classes of Persistent Memory PCM / DRAM NRAM MRAM ReRAM FeRAM Flash 3DXpoint Non-volatility No Yes Yes Yes Yes Yes Yes Endurance No limit No limit Limited Limited Limited No limit 103 Read Time 10 ns 10 ns X X X X 50M ns Write Time 10 ns 10 ns X X X X 25M ns Memory & Storage Class Memory Storage Memory Class Storage See my other talks Santa Clara, CA later this week August 2018 33 Applying Persistent Memory Replace DRAM with Next generation Persistent Memory persistent memory will Completely eliminates the target SRAM, need to reload on too Power fail Persistent shadow registers aren’t such a bad Santa Clara, CA idea, either August 2018 34 NRAM for Main Memory NRAM replaces DDR4, DDR5 for main memory CPU Santa Clara, CA August 2018 35 Enables the New Architectures NRAM cells in the array Permanent storage through power fail Programmed once during manufacturing, no reload Santa Clara, CA August 2018 36 NRAM Everywhere Soon we will look back and say “Remember when data was lost when power went out?” and laugh Santa Clara, CA August 2018 37 Full Disclosure My first home computer had an 8” floppy disk I earned my gray hair Santa Clara, CA August 2018 38 Summary Centralized versus distributed computing is a long term cycle Quality of software infrastructure typically determines the winner Artificial intelligence accelerators are a recent co-processing addition Data loss on power failure is worsened by AI architectures Persistent memory in AI device solves major problems Nantero NRAM addresses many usages of PM in AI systems If you remember 8” floppies, you probably can’t read this screen Santa Clara, CA August 2018 39 Questions? Bill Gervasi Principal Systems Architect [email protected] Santa Clara, CA August 2018 40 .

Persistent Memory for Artificial Intelligence

Accenture AI Inferencing in Action

GPU Developments 2018

AI Accelerator Latencies in Hybrid Vehicular Simulation

Unified Inference and Training at the Edge

Low-Power Ultra-Small Edge AI Accelerators for Image Recog- Nition with Convolution Neural Networks: Analysis and Future Directions

Survey and Benchmarking of Machine Learning Accelerators

Efficient Management of Scratch-Pad Memories in Deep Learning

Infiltration of AI/ML in Particle Physics Aspects of Data Handling

Ai Accelerator Ecosystem: an Overview

MYTHIC MULTIPLIES in a FLASH Analog In-Memory Computing Eliminates DRAM Read/Write Cycles

Driving Intelligence at the Edge: Axiomtek's Edge AI Solutions

Low-Power Ultra-Small Edge AI Accelerators for Image Recognition with Convolution Neural Networks: Analysis and Future Directions