Future of Compute Is Big Data
Mark Seager
Intel Fellow, CTO for HPC Ecosystem
Intel Corporation

Traditional HPC is scientific simulation
First ever Nobel prize for HPC takes the experiment into cyberspace

Chemical reactions occur at lightning speed; electrons jump between atoms hidden from the prying eyes of scientists. The Nobel Laureates in Chemistry 2013 have made it possible to map the mysterious ways of chemistry by using computers. Detailed knowledge of chemical processes makes it possible to optimize catalysts, drugs and solar cells.

Spanning multiple scales by combining quantum mechanics and molecular dynamics

The Nobel Prize in Chemistry 2013 was awarded jointly to Martin Karplus, Michael Levitt and Arieh Warshel "for the development of multi-scale models for complex chemical systems".

Other emerging technical computing usage models are driven by Big Data

• Government & Research (Basic Science)
  – “My goal is simple. It is complete understanding of the universe, why it is as it is, and why it exists at all” (Stephen Hawking)
• Commerce & Industry (Business Transformation)
  – Better Products
  – Digital Twin
  – Faster Time to Market
  – Reduced R&D
  – New Business Models
  – Data Services
• New Users & New Uses (Data-Driven Discovery)
  – From Diagnosis to personalized treatments quickly
  – Genomics
  – Clinical Information

Transform data into useful knowledge

Source: IDC: Worldwide Technical Computing Server 2013–2017 Forecast. Other brands, names, and images are the property of their respective owners.
Data-Driven Discovery

• Hypothesis Formation; Modeling & Prediction
• Drug Discovery, Treatment Optimization, Particle Physics, Astronomy, Trend Analysis, Public Policy
• Genome Data, Medical Imaging, Clinical Trials, Sensor Data, Sim Data, Census Data, Text, A/V, Images, Surveys
• Life Sciences, Physical Sciences, Social Sciences

Data-Driven Discovery in Science

• CERN
  – 600 million collisions / sec
  – Detecting 1 in 1 trillion events to help find the Higgs Boson
• NextBio
  – 1 human genome = 1 petabyte
  – Finding patterns in clinical and genome data at scale can cure cancer and other diseases

Project Moonshot for Cancer: Predictive Oncology Pilot 2
Source: Rick Stevens, Argonne Nat Lab

Convergence of Driving Forces

• Intelligent Devices: 19B connected devices by 2016 (1)
• Cloud: $200B cloud services in 2016 (2)
• HPC: 2X annual growth in supercomputing FLOPS (3)
• Big Data: 15PB data collected in 1 year at CERN (4)

1 Source: Cisco® Visual Networking Index (VNI) Forecast (2011-2016)
2 Source: Gartner Worldwide IT Spending Forecast, 2Q12 Update
3 Source: Top 500 list: Top 10 change from November 2007 to November 2012
4 Source: CERN

Intelligent Devices - New Era of Computing
Enabling an Industry of Pervasive Computing

Focused Segments
• Smart: Health, Mfg, Gaming, Mil/Aero, Imaging, DSS, Energy
• Automotive Sector: Infotainment, ADAS, Autonomous
• Retail Sector: Visual Retailing, Transactional Retailing, Retailing Services

CONNECTED, MANAGED, SECURED, DATA ANALYTICS, SAFETY

Example: CIA Mission Drives The Cycle
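To give a feel for the scale behind the CERN and supercomputing figures quoted above, here is a minimal back-of-the-envelope sketch. It is not from the original slides. It assumes decimal petabytes (10^15 bytes), a 365-day year, that the "1 in 1 trillion events" are counted against the 600 million collisions per second, and that the 2X annual FLOPS growth compounds evenly over the five years between the two Top 500 lists cited. The function names are hypothetical.

```python
# Back-of-the-envelope scale estimates from figures quoted on the slides.
# Assumptions (not stated in the deck): decimal PB, 365-day year,
# "events" == collisions, and evenly compounding 2X annual growth.

SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_bytes_per_s(total_bytes: float, seconds: float) -> float:
    """Sustained average data rate needed to collect total_bytes in seconds."""
    return total_bytes / seconds


def seconds_between_rare_events(events_per_s: float, one_in: float) -> float:
    """Mean time between events that occur once per `one_in` occurrences."""
    return one_in / events_per_s


def compound_growth(factor_per_year: float, years: int) -> float:
    """Total growth after compounding factor_per_year for `years` years."""
    return factor_per_year ** years


# 15 PB collected in one year at CERN
cern_rate = average_rate_bytes_per_s(15e15, SECONDS_PER_YEAR)

# 1-in-1-trillion events at 600 million collisions per second
rare_gap_s = seconds_between_rare_events(600e6, 1e12)

# 2X per year from November 2007 to November 2012 (5 years)
flops_growth = compound_growth(2.0, 5)

if __name__ == "__main__":
    print(f"Average CERN data rate: {cern_rate / 1e6:.0f} MB/s")
    print(f"Mean gap between rare events: {rare_gap_s / 60:.1f} min")
    print(f"Compounded FLOPS growth over 5 years: {flops_growth:.0f}x")
```

Under these assumptions the numbers work out to roughly 476 MB/s of sustained collection, one candidate event about every 28 minutes, and a 32x increase in FLOPS over five years.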
Intel® Scalable System Framework
A Holistic Solution for All HPC Needs

• Small Clusters Through Peta and Exascale
• Compute and Data-Centric Computing
• Standards-Based Programmability
• IA and HPC Ecosystem Enabling
• On-Premise and Cloud-Based

• Compute: Intel® Xeon® Processors; Intel® Xeon Phi™ Processors; Intel® FPGAs and Server Solutions
• Memory / Storage: Intel® Solutions for Lustre*; Intel® Optane™ Technology; 3D XPoint™ Technology; Intel® SSDs
• Fabric: Intel® Omni-Path Architecture; Intel® Silicon Photonics; Intel® Ethernet
• Software: Intel® HPC Orchestrator; Intel® Software Tools; Intel® Cluster Ready Program; Intel Supported SDVis

SSF: Enabling Configurability & Scalability from components to racks to clusters
SSF Path To Exascale / SSF for Scalable Clusters

• “Chassis”: Simple, dense, & configurable
• “Memory”: Enough capacity to support apps
• “I/O”: Adaptive & configurable
• “Compute”: “Right sized” & configurable
• “Rack” and “Cluster”:
  – Xeon or Xeon-Phi, based on workload needs
  – I/O Topologies for best performance
  – Compute flexibly aggregated
  – Configurable I/O bandwidth director switch
  – Lowest latency compute to compute
  – Burst buffer to decouple storage from I/O interconnect

Intel® SSF (Memory): Tighter System-Level Integration
Innovative Memory-Storage Hierarchy

Today (from the processor outward):
• Processor: Compute, Caches
• Local Memory (on the Memory Bus)
• I/O Node
• Remote Storage: Parallel File System (Hard Drive Storage)

Future (from the processor outward):
• Compute, Caches
• On-Package High Bandwidth Memory*: local memory is now faster & in processor package
• Local Memory, Intel® DIMMs based on 3D XPoint™ Technology: much larger memory capacities keep data in local memory
• Compute Node Local Storage, Intel® Optane™ Technology SSDs: I/O Node storage moves to compute node
• I/O Node, SSD Storage / Burst Buffer Node with Intel® Optane™ Technology SSDs: some remote data moves onto I/O node
• Remote Storage: Parallel File System (Hard Drive Storage)

[Diagram axis: Higher Bandwidth, Lower Latency and Capacity toward the processor]
*cache, memory or hybrid mode

Intel® SSF (Memory): Bringing Memory Back Into Balance
High Bandwidth, On-Package Memory

• Up to 16GB with Knights Landing
• 5x the Bandwidth vs DDR4 (1), >400 GB/s (1)
• >5x More Energy Efficient vs GDDR5 (2)
• >3x More Dense vs GDDR5 (2)

3 Modes of Operation
• Flat Mode: Acts as Memory
• Cache Mode: Acts as Cache
• Hybrid Mode: Mix of Cache and Flat

1 Projected result based on internal Intel analysis of STREAM benchmark using a Knights Landing processor with 16GB of ultra high-bandwidth memory versus DDR4 memory with all channels populated.
2 Projected result based on internal Intel analysis comparison of 16GB of ultra high-bandwidth memory to 16GB of GDDR5 memory used in the Intel® Xeon Phi™ coprocessor 7120P.

NAND Flash and 3D XPoint™ Technology

• 3D MLC and TLC NAND: Enabling highest capacity SSDs at the lowest price
• 3D XPoint™ Technology: Enabling highest performance SSDs and expanding use cases

[Diagram: 3D XPoint™ Technology, showing Selector and Memory Cell]

Intel® SSF (Fabric): CPU-Fabric Integration with the Intel® Omni-Path Architecture

KEY VALUE VECTORS: Performance, Density, Cost, Power, Reliability

[Chart: PERFORMANCE versus TIME, in four steps]
• Intel® OPA HFI Card: Intel® Xeon® processor E5-2600 v3
• Multi-chip Package Integration: Intel® OPA with Intel® Xeon Phi™ 7220 (KNL) processor; Future Intel® Xeon® processor (14nm)
• Tighter Integration: Intel® OPA with Next generation Intel® Xeon Phi™ processor; Next generation Intel® Xeon® processor
• Future Generations: Additional integration, improvements, and features

Intel® SSF (Fabric): OmniPath is Optimized for Scalability

• FEWER SWITCHES REQUIRED (1)
  [Chart: SWITCH CHIPS REQUIRED versus NODES, comparing InfiniBand* 36-port switch with Intel® OPA 48-port switch]
• More Servers, Same Budget (1)
  [Chart: Servers; Mellanox EDR 750, Intel® OPA 932, up to 24% more servers]

1 Configuration assumes a 750-node cluster, and number of switch chips required is based on a full bisectional bandwidth (FBB) Fat-Tree configuration. Intel® OPA uses one fully-populated 768-port director switch, and Mellanox EDR solution uses a combination of 648-port director switches and 36-port edge switches. Intel and Mellanox component pricing from www.kernelsoftware.com, with prices as of May 5, 2016. Compute node pricing based on Dell PowerEdge R730 server from www.dell.com, with prices as of November 3, 2015. Intel® OPA pricing based on estimated reseller pricing based on projected Intel MSRP pricing at time of launch.
* Other names and brands may be claimed as property of others.

Intel® SSF (Software): New storage paradigm for data intensive systems

Tier   Media           Bandwidth      Capacity
HOT    Crystal Ridge   ~200 TB/s      ~15 PB
WARM   3D NAND         ~10-50 TB/s    ~150 PB
COLD   SSD+HDD         ~1-2 TB/s      ~250 PB

• SSF Enables HPC+HPDA workloads
  – System components can be configured to match workload requirements
  – Enables new access methodologies (DAOS) to create new generation applications
  – Incremental improvements to Lustre to provide enhanced performance for existing applications

[Diagram: CN, ION, MDS, OSS and GW nodes connected through the Compute Fabric, IO Fabric, SAN and switches]

Distributed Asynchronous Object Storage

[Diagram: Applications, Tools and Top-level APIs over DAOS-Cache/Tiering, DAOS-Sharding/Resilience, DAOS-Persistent Memory and DAOS-M; Persistent Memory (Load/Store), Integrated Fabric (Get/Put, LNET) and Block Storage; labels: High file create, Extreme BW+LL, High BW+IOPS]

Benefit from Intel’s long-standing investments

• Energy Efficient Performance
• Systems Architecture
• Manufacturing Leadership
• Global Software Ecosystem
• Security

Legal Information
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”.
NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance