3Rd Gen Intel Xeon Scalable Processors Press Deck
Total Page:16
File Type:pdf, Size:1020Kb
INDUSTRY INFLECTIONS ARE FUELING THE GROWTH OF DATA 5G Network Transformation, Artificial Intelligence, Intelligent Edge, Cloudification AI & ANALYTICS ARE THE DEFINING WORKLOADS OF THE NEXT DECADE UNMATCHED PORTFOLIO BREADTH AND ECOSYSTEM SUPPORT Intel delivers a silicon & software foundation designed for the diverse range of use cases from the cloud to the edge To create world-changing technology that enriches the lives of every person on earth 5G Network Artificial Intelligent Transformation Intelligence Edge Cloudification MOVE STORE PROCESS ETHERNET SILICON PHOTONICS SOFTWARE & SYSTEM LEVEL Pandemic Response Technology Initiative DIAGNOSE & TREAT RESEARCH & VACCINES LOCAL COMMUNITY International Red Cross SERVE CRITICALLY ILL COVID PATIENTS ILLUMINATE VIRUS AND SUPPORT GLOBAL THROUGH VIRTUAL CARE DNA REPLICATION TASKS RELIEF EFFORTS RAPIDLY EXPAND REMOTE ICUs SPEED DISCOVERY OF SANITIZE ROOMS AND EQUIPMENT TO 100 US HOSPITALS NEW PHARMACEUTICALS WITH AUTONOMOUS ROBOTS PREDICT PATIENTS WHO WILL DEVELOP ACCELERATE GENOMIC ANALYSIS LEARNING SOLUTION FOR STUDENTS ACUTE RESPIRATORY DISTRESS SYNDROME AND DECODE COVID-19 WITHOUT COMPUTERS OR INTERNET Ecosystem BUILD THRIVING DELIVER INNOVATE & INVEST ECOSYSTEM SOLUTIONS IN THE FUTURE Software OPTIMIZED EMPOWER UNIFY SOFTWARE DEVELOPERS APIS Hardware INFUSE AI LEADERSHIP DISCRETE OPTIMIZE AT THE INTO THE CPU ACCELERATORS PLATFORM LEVEL Intel Optane Persistent Memory 200 Series 3rd Gen Intel Gen 3 Habana Intel Xeon Scalable GPU Stratix 10 NX Intel Movidius VPU Gaudi & Goya IN DEVELOPMENT EARLY ACCESS LIMITED SAMPLING Intel SSD D7-P5500 Intel SSD P5600 CPU GPU FPGA SPECIALIZED ACCELERATORS WORKLOAD BREADTH AI SPECIFIC LAUNCHING BUILT-IN AI ACCELERATION Intel Deep Learning Boost NEW: bfloat16* up to AVERAGE HIGHER DATABASE PERFORMANCE GAIN PERFORMANCE vs 5-YEAR-OLD PLATFORM vs 5-YEAR-OLD PLATFORM BREAKTHROUGH MEMORY FLEXIBILITY Intel Optane Persistent Memory Enhanced 200 series Intel Speed Select Technology TARGETED FOR 4S-8S SYSTEMS Performance results are based on testing as of dates in configuration and may not reflect all publicly available security updates. See backup for configuration details. *Available on select 3rd Gen Intel Xeon Scalable processors For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. AI ON 1ST GEN INTEL XEON SCALABLE 1ST GEN Intel Xeon Scalable Intel AVX-512 FP32 OPTIMIZED LIBRARIES & FRAMEWORKS OPTIMIZED TOPOLOGIES ON XEON INTEL DL BOOST ADOPTION 2ND GEN Intel Xeon Scalable Intel Deep Learning Boost INT8 OPTIMIZED LIBRARIES & FRAMEWORKS OPTIMIZED TOPOLOGIES ON XEON INTRODUCING Minimal software changes 3RD GEN SPEED INT8 Intel Xeon Scalable INFERENCE ONLY Similar accuracy to FP32 S 7 bit mantissa Intel Deep Learning Boost NEW: BF16 S 8 bit exp 7 bit mantissa FP32 TRAINING & INFERENCE S 8 bit exp 23 bit mantissa ACCURACY Performance results are based on testing as of dates in configuration and may not reflect all publicly available security updates. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. INTRODUCING Minimal software changes SPEED INT8 INFERENCE ONLY Similar accuracy to FP32 3RD GEN S 7 bit Intel Xeon Scalable mantissa up to 7 bit Intel S 8 bit exp HIGHER TRAINING Deep Learning Boost mantissa PERFORMANCE NEW: BF16 FP32 vs PRIOR GEN FP32 TRAINING & INFERENCE up to S 8 bit exp 23 bit mantissa HIGHER INFERENCE PERFORMANCE OPTIMIZED TOPOLOGIES ACCURACY vs PRIOR GEN FP32 ON XEON OPTIMIZED LIBRARIES & FRAMEWORKS Performance results are based on testing as of dates in configuration and may not reflect all publicly available security updates. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. INTRODUCING HIGHER FASTER FASTER FASTER THROUGHPUT INFERENCE TRAINING INFERENCE NLP - TEXTCNN NLP - BERT VIDEO ANALYSIS VIDEO ANALYSIS 3RD GEN Intel Xeon Scalable MEDICAL AI RESEARCH Intel Deep Learning Boost HIGHER HIGHER TRAINING HIGHER PROCESSING HIGHER PROCESSING NEW: BF16 THROUGHPUT PERFORMANCE THROUGHPUT THROUGHPUT BIOMETRICS IMAGE CLASSIFICATION VISUAL MEDIA SEARCH MEDICAL IMAGES HIGHER HIGHER FASTER THROUGHPUT THROUGHPUT INFERENCE SEARCH ENGINE TTS - WAVERNN TTS - PARALLEL WAVENET Performance results are based on testing as of dates in configuration and may not reflect all publicly available security updates. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. AI TRAINING PERFORMANCE AI INFERENCE PERFORMANCE OPTIMIZED RESNET-50 THROUGHPUT OPTIMIZED RESNET-50 THROUGHPUT INT8 INT8 BF16 INTEL BF16 DL BOOST INT8 INTEL INTEL DL BOOST DL BOOST FP32 FP32 FP32 FP32 FP32 FP32 1.0 INTEL INTEL INTEL 1.0 FP32 AVX-512 AVX-512 AVX-512 INTEL INTEL INTEL FP32 AVX-512 AVX-512 AVX-512 Intel Xeon E7 v4 1st Gen Intel 2nd Gen Intel 3rd Gen Intel Intel Xeon E7 v4 1st Gen Intel 2nd Gen Intel 3rd Gen Intel Xeon Scalable Xeon Scalable Xeon Scalable Xeon Scalable Xeon Scalable Xeon Scalable Performance results are based on testing as of dates in configuration and may not reflect all publicly available security updates. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. 2ND GEN 3RD GEN NEXT GEN Intel Xeon Scalable Intel Xeon Scalable Intel Xeon Scalable 1-8 4-8 1-8 SOCKETS SOCKETS SOCKETS Cooper Lake CEDAR ISLAND PLATFORM Cascade Lake LAUNCHING PURLEY PLATFORM Sapphire Rapids EAGLE STREAM PLATFORM 1-2 SOCKETS Ice Lake WHITLEY PLATFORM NEXT GEN DL BOOST: AMX COMING LATER THIS YEAR SILICON POWERED ON compute cache in-package memory memory DRAM CAPACITY GAP STORAGE PERFORMANCE GAP storage 3D NAND COST-PERFORMANCE GAP HDD-tape compute cache in-package memory memory DRAM PERSISTENCE & INCREASED CAPACITY storage IMPROVED SSD PERFORMANCE EFFICIENT STORAGE HDD-tape CUSTOMER TRACTION SINCE LAUNCH FORTUNE 500 POC TO SALE CONVERSION PRODUCTION WINS TCO SAVINGS INCREASED THROUGHPUT FASTER TIME TO INSIGHTS 1.3X improvement ~3X improvement 8X faster StudioCloud SOLVER RUN COMPARED TO IN TCO (REDIS) IN JOBS PER PHYSICAL HOST RATIO LUSTRE FILESYSTEM 30% reduction 2.78X increase 80% LATENCY REDUCTION & IN RECOMMENDATION SYSTEM IN GAMES HOSTED ON A 3X ACCELERATED INDEXING & REDIS SERVICE SINGLE SERVER (ELASTICSEARCH) 15X reduction 1.1X increase 15X faster IN MEMORY FOR IMAGE IN CPU UTILIZATION DATABASE DATA LOAD STARTUP PROCESSING VS. DRAM-ONLY (SAP HANA) ~2X 22.5%-48% improvement VM INSTANTIATION FOR 5G 13.7X accelerated IN TCO (REDIS) MULTI-ACCESS EDGE (REDIS) DATABASE STARTUP (SAP HANA) ~40% more up to 17X faster 41% reduction VMS & CONTAINERS WITHIN STORAGE APPLICATIONS ON INFRASTRUCTURE COST SAME BUDGET (REDIS) (ROCKSDB, MONGODB, MYSQL) Performance results are based on testing as of dates in configuration and may not reflect all publicly available security updates. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. LAUNCHING average up to TOTAL HIGHER MEMORY MEMORY BANDWIDTH PER SOCKET vs PRIOR GEN over REDUCE I/O BOTTLENECKS TO BOOST APPLICATION ANALYZE DATA FASTER PERFORMANCE FASTER ACCESS TO DATA vs MAINSTREAM NAND SSD Performance results are based on testing as of dates in configuration and may not reflect all publicly available security updates. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. LAUNCHING up up to to LOWER MORE LATENCY DELIVERING AN OPTIMAL BALANCE PERFORMANCE OF PERFORMANCE AND CAPACITY vs PRIOR GEN vs PRIOR GEN Accelerates all-flash arrays Advanced IT efficiency & data security features Most advanced TLC 3D NAND Performance results are based on testing as of dates in configuration and may not reflect all publicly available security updates. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. ARTIFICIAL INTELLIGENCE ANALYTICS HPC HCI / STORAGE MICROSOFT SQL SERVER AI INFERENCING (WINDOWS SERVER, LINUX) SIMULATION & MODELING VMWARE VSAN BIGDL ON APACHE SPARK SAP HANA SIMULATION & VISUALIZATION VMWARE HORIZON VDI ON vSAN PINGCAP TIDB GENOMICS ANALYTICS MICROSOFT AZURE STACK HCI HPC & AI CONVERGED CLUSTERS GBASE (MAGPIE, UNIVA) NUTANIX HCI MEDIA ANALYTICS XSKY TRANSWARP ARGO DB Optimized AI Integration AI Demand Increasing FPGAS DELIVER HARDWARE Rapid CUSTOMIZATION WITH Innovation INTEGRATED AI CUSTOM WORKLOADS INCREASINGLY REQUIRE AI FPGAS ENABLE HW CUSTOM WORKLOADS Ecosystem Software Intel® FPGA Intel® MKL Deep Learning Accelerator Suite Hardware Model Complexity # of parameters AI COMPUTE REQUIREMENT IS DOUBLING EVERY 3.5 MONTHS 8 B 1.5 B 340 M 25 M Time AlexNet ResNet YOLO GNMT BERT-LG GPT-2 GPT-2LG Today Source: https://openai.com/blog/ai-and-compute/ DISCLOSING ABUNDANT NEAR-COMPUTE MEMORY HIGH PERFORMANCE AI TENSOR BLOCKS • Embedded and customizable memory hierarchy • Up to 15X more INT8 compute performance than for model persistence today’s Stratix 10 MX for AI workloads • Integrated HBM for high memory bandwidth • Hardware programmable for AI with customized workloads EXTENSIBLE HIGH BANDWIDTH NETWORKING • Chiplets enable easier interface customization and ASIC extensions • Up to 57.8G PAM4 transceivers and hard Intel Ethernet blocks for high efficiency • Flexible and customizable interconnect