Intelligent Silicon for AI at Scale
Samir Mittal
Corporate Vice-President, 3DXP Systems Solutions Engineering
Micron Technology
November 18, 2020
Speaker Biography
Key experience in Deep Learning:
• Cognitive Brain for investment fund marketing (Capital Group / American Funds)
• Machine learning infrastructure development (with leading hyperscale customers)
Most recently:
• CEO and Founder, SCUTI AI – startup in Scaled Deep Learning
• SanDisk, VP of Enterprise Engineering – all flash data center
• Led Pliant, Smart Storage, Fusion-io, Flashsoft and Schooner teams
Background:
• PhD in Signal Processing and Control Systems from The Ohio State University
• Subsequently at Seagate Enterprise (storage), Qumu (enterprise video)
Complex Decisions are being increasingly powered by AI
Data → Insights → Decisions → Value
helping us to create new transformative value for society-scale impact
AI for complex systems is the new frontier
Tomorrow Today
“Systems” “Classic” World AI AI
“Systems” AI: autonomous driving, program synthesis, mortgage decisions
• Mission critical
• Correct & explainable
• New possibilities
“Classic” AI: speech recognition, video analytics, language translation
• <100% accurate
• Inform & assist
• Incremental value
Emerging trends in “Systems AI”
• Mission critical deployments in ever-changing environments
  ◦ Constant adaptation
  ◦ Reinforcement learning
• Make AI accessible – operate with more intelligence at higher levels of abstraction
  ◦ From generative models to generative agents
  ◦ Supervised training to unsupervised learning
  ◦ Incorporate domain semantics
  ◦ Learn rich representations of the environment
  ◦ Generalize well
Case Study: Systems AI in Semiconductor (SSD) Manufacturing
Today (traditional approach): Quality via test
• Process audit
• Production audit
• In-situ measurements
Tomorrow: Quality through design
• Design knowledge
• Deep & Machine Learning
Challenges
• Data gravity & compliance
• Robust decision making
• Reliable real-time performance
Deep economic potential in traditional industries at the intersection of rich data, prediction & control
• Manufacturing: Prediction & optimization
• Surveillance & security: Detection & action
• Infrastructure: Structure safety & traffic mgmt.
However, Enterprise adoption is challenged
• High complexity: Requires 100’s of engineers
• Slow turn-around: Poor solution scalability
• Diminishing value with increasing scale
“Systems AI” results get better with more data, bigger models & real-time learning
Compute used in the largest AI training runs doubles roughly every 3.5 months… (source: https://openai.com/blog/ai-and-compute/)
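To make the doubling rate concrete, the short sketch below converts a doubling time into a growth factor over a longer period. It takes the 3.5-month figure from the slide as its default; the function name `growth_factor` is illustrative and not from the talk.

```python
def growth_factor(months: float, doubling_months: float = 3.5) -> float:
    """Multiplicative growth after `months`, given one doubling per `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Print the growth after one doubling period, one year, and two years.
    for months in (3.5, 12, 24):
        print(f"{months:>4} months: x{growth_factor(months):,.1f}")
```

At that rate, a single year corresponds to a little more than a tenfold increase.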
Micron Confidential
Insufficient memory and memory bandwidth are limiting code performance
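One common way to reason about this limit is the roofline model. It is brought in here as an illustration and is not something the slide itself states: attainable throughput is the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved). The hardware figures in the usage lines are hypothetical placeholders, not measurements.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline bound: min(peak compute, memory bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def is_memory_bound(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> bool:
    """True when the memory system, not the compute units, sets the ceiling."""
    return bandwidth_gbs * flops_per_byte < peak_gflops


if __name__ == "__main__":
    PEAK, BANDWIDTH = 10_000.0, 500.0  # hypothetical GFLOP/s and GB/s
    for intensity in (2.0, 50.0):      # FLOPs performed per byte moved
        bound = "memory" if is_memory_bound(PEAK, BANDWIDTH, intensity) else "compute"
        print(intensity, attainable_gflops(PEAK, BANDWIDTH, intensity), bound)
```

With these placeholder numbers, a kernel doing 2 FLOPs per byte is capped by bandwidth at a tenth of the peak, which is the situation the slide title describes.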
New directions with Intelligent Silicon
1. Predictive orchestration for parallelism
2. Performance to power efficiency
3. Domain specific optimizations
4. CPU and GPU offload
5. Storage Class Memory with significant cost-to-performance improvement
Micron’s 3DXP Storage Class Memory technology
DRAM, 3DXP, Flash
• Non-volatile
• Fast read, fast write
• High density
• Low voltage
• Logic integrate-able
• Byte addressable
3DXP is a revolutionary technology with full stack implications
Today’s Computing Paradigm
Layers: Application, CPU, Memory, SSD / HDD
Data forms: object stream, byte stream, file stream, with translation overhead between them
1. Memory constrained
2. Difficult to scale
3. Over-provisioned & expensive infrastructure
Vision with 3DXP
Layers: Application, CPU, 3DXP Memory, SSD / HDD
Data forms: zero translation, unified semantics
1. Memory un-bound
2. Memory & storage convergence
3. Memory hierarchies for infrastructure virtualization
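The contrast between translated and byte-addressable access can be sketched in a few lines. The first function serializes an object to a byte stream and a file and reads it back; the second maps a file and overwrites a single field in place. This is only an analogy: a memory-mapped ordinary file still goes through the operating system's page cache and is not 3DXP memory. The record layout and function names are assumptions made for illustration.

```python
import mmap
import os
import pickle
import struct
import tempfile

LAYOUT = struct.Struct("<qd")  # hypothetical record: 8-byte integer id, 8-byte float value


def roundtrip_via_file(record, path):
    """Object -> byte stream -> file, then back again (two translations)."""
    with open(path, "wb") as f:
        pickle.dump(record, f)
    with open(path, "rb") as f:
        return pickle.load(f)


def update_in_place(path, new_value):
    """Map the file and overwrite only the 8 bytes that hold the value field."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
        struct.pack_into("<d", view, 8, new_value)  # value starts at byte offset 8
        return LAYOUT.unpack_from(view, 0)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        restored = roundtrip_via_file({"id": 7, "value": 3.5}, os.path.join(tmp, "rec.pkl"))
        binary_path = os.path.join(tmp, "rec.bin")
        with open(binary_path, "wb") as f:
            f.write(LAYOUT.pack(7, 3.5))
        updated = update_in_place(binary_path, 4.25)
    return restored, updated
```

The second path touches 8 bytes without rebuilding the object, which is the kind of access pattern the "zero translation" vision points toward.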
Micron X100: Primary focus has been to improve I/O performance
• ULTRA-FAST LOCAL STORAGE: 4X-7X FASTER THAN CONVENTIONAL SSD
• ULTRA-LOW LATENCY: 6X – 10X IMPROVEMENT
• HIGH PERFORMANCE IN SMALL PARTITIONS
Diagram: Micron 3DXP enabled server (CPU complex, GPU complex, accelerators, server network, GPU network, commodity SSD, Micron 3DXP)
User Experience improvement in key applications
• DATABASE
• DATA WAREHOUSING
• BIG DATA SPARK ANALYTICS
• DEEP LEARNING
• VECTOR SEARCH
Diagram: Micron 3DXP enabled server (CPU complex, GPU complex, accelerators, server network, GPU network, commodity SSD, Micron 3DXP)
Azure Ignite Announcement in 2019
Nov 7th 2019 – Azure CTO Mark Russinovich endorsed Micron X100 performance advantages for Azure
Mark showcased X100 in 2 live demos
• 9.5 GB/s of throughput on the X100-based VM was >4X better than NAND
• TPC-H on Microsoft SQL showed >10X better average per-transfer latency and >3X better overall run completion times for X100 vs NAND
3DXP Solutions Transform the Compute Server
Roadmap diagram (2020-2021, 2022, 2023): CPU & GPU with HBM, cache, DRAM and network; cache line access (128B - 4K) to byte memory and 4K access to storage; 3DXP storage and 3DXP memory; tiered memory and memory zones; ultra-fast storage (3DXP SSD) and fast storage (NAND SSD)
3DXP storage:
• Storage performance at near-memory speed
• Eliminates data striping on NAND SSD
• CPU offload of storage controllers
• Max virtualization performance
• Maximizes CPU performance
3DXP memory:
• 3DXP memory unifies all data domains
• Minimizes storage-to-memory data motion
• Offloads CPU page migrations
• Improves economics with memory virtualization
• Enable fungible infrastructure for SKU reduction
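As a rough illustration of the tiered memory idea on this roadmap, the sketch below places the most frequently accessed pages in the fastest tier and spills the rest into slower ones. The policy, the tier capacities and the function name `place_pages` are assumptions for illustration; the slides do not describe Micron's actual placement or page-migration logic.

```python
def place_pages(access_counts, tiers):
    """Assign pages to tiers, hottest pages first.

    access_counts: dict mapping page id -> number of accesses
    tiers: list of (tier_name, capacity_in_pages), ordered fastest to slowest
    """
    # Sort by descending access count; break ties by page id for determinism.
    ranked = sorted(access_counts, key=lambda page: (-access_counts[page], page))
    placement, start = {}, 0
    for name, capacity in tiers:
        for page in ranked[start:start + capacity]:
            placement[page] = name
        start += capacity
    if start < len(ranked):
        raise ValueError("total tier capacity is smaller than the number of pages")
    return placement


if __name__ == "__main__":
    counts = {"a": 90, "b": 5, "c": 40, "d": 1}          # hypothetical access counts
    tiers = [("DRAM", 1), ("3DXP", 2), ("NAND SSD", 4)]  # hypothetical capacities
    print(place_pages(counts, tiers))
```

A real system would migrate pages as access patterns change; this static version only shows the ordering principle.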
Data Center Evolution with Software Defined Servers (Russinovich, 2020)
Software Defined Server for “Systems AI” at the Edge
• Fungible multi-tenant platform
• Hardware offloaded virtualization of compute, memory, storage and network
• Unseen symmetric performance
• Best-fit for high ingest, transient local data
• Transparent migration of existing workloads
• Optimized for real-time workloads
Micron Enabled Edge
Real-time Data Pipelines at the Edge
• Ingest: Log & Sensor Data
• Analyze: HiveQL Queries
• Store & Query: HDFS
VM1, VM2, VM3: each with its own storage and memory, running on a hypervisor over 3DXP
Managed Edge Appliance
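The ingest, analyze, and store & query stages can be sketched in miniature as follows. This is a toy stand-in under stated assumptions: an in-memory SQLite database replaces HDFS, a plain SQL aggregate replaces a HiveQL query, and the `sensor,value` log format is invented for the example.

```python
import sqlite3


def ingest(lines):
    """Parse hypothetical 'sensor,value' log lines, skipping malformed ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        try:
            yield parts[0], float(parts[1])
        except ValueError:
            continue


def store(rows):
    """Load parsed rows into an in-memory table (stand-in for HDFS)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    db.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return db


def average_per_sensor(db):
    """Analysis step expressed as a SQL aggregate (stand-in for HiveQL)."""
    cursor = db.execute(
        "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    log = ["t1,20.0", "t2,5.5", "bad line", "t1,22.0"]
    print(average_per_sensor(store(ingest(log))))
```

In this sketch the analysis runs as a query over stored data; a streaming design could equally aggregate before storing.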
A new model for “Systems AI” at the Edge
Scalable use cases, deployments, & customizations
• Economical: Highest performance with commodity components
• Simplicity: One-click config & operation
• Composable: for complex data pipelines
Micron 3DXP enables us to infuse Domain Knowledge to resolve canonical issues in the AI stack
AI stack diagram:
• User ML Apps: multi-domain network (Image DL, Machine learning, NLP)
• Machine Learning Framework
• Parallelization framework(s)
• Hardware abstraction layer
• Micron 3DXP enabled server: pushing infrastructure to its limits
Human engineering intensive mgmt.:
• Problem decomposition complexity
• Distributed data management
• Model convergence & tune-up
Robust AI:
• Model characterization: computation characteristics, data dependencies, convergence properties
• Intelligent model partitioning
• Model policies
• Workload specifications, run-time optimizer, hardware attributes
Infrastructure issues:
• Data mobility & locality issues
• Poor use of domain & HW knowledge
• Weak parallelization
Workflow acceleration:
• Optimized partitioning
• Predictive data movement
• Intelligent scale-out
Thank you
[email protected]