Intel® Nervana™ AI Summit, Moscow 2017
Amir Khosrowshahi, CTO, Intel AI Products Group

Agenda: Why now · Inside AI · Make it real

AI is transformative
Examples by sector (early adoption; source: Intel forecast):
• Consumer: smart assistants, chatbots, search, personalization, augmented reality, robots
• Health: enhanced diagnostics, drug discovery, patient care, research, sensory aids
• Finance: algorithmic trading, fraud detection, research, personal finance, risk mitigation
• Retail: support, experience, marketing, merchandising, loyalty, supply chain, security
• Government: defense, data insights, safety & security, resident engagement, smarter cities
• Energy: oil & gas exploration, smart grid, operational improvement, conservation
• Transport: autonomous cars, automated trucking, aerospace, shipping, search & rescue
• Industrial: factory automation, predictive maintenance, precision agriculture, field automation
• Other: advertising, education, gaming, professional & IT services, telco/media, sports

Automotive example (end-to-end): artificial intelligence for automated driving
[Diagram: the vehicle handles real-time driving functions (sensor processing & fusion, object ID & classification, real-time environment modeling, localization, driving policy, path selection & maneuvering, model inference); a 5G network carries captured sensor data up and OTA software/firmware and real-time HD map updates down; the cloud provides data storage and management, neural network design for target driving functions, anomaly detection, trajectory enumeration and path planning, model training with traceability, single- and multi-node optimized frameworks, faster-than-real-time simulation & verification, and compressed model formatting.]

AI is possible
• A data deluge, a compute breakthrough, and an innovation surge: from mainframes to standards-based servers to cloud computing to artificial intelligence
• AI compute cycles will grow 12X by 2020 (source: Intel forecast)

Intel AI Products Group: the home for AI at Intel

AI is breakthrough: deep learning
[Charts: image-recognition error fell from roughly 30% in 2010 to below the human error rate today; speech-recognition error shows the same trend from 2000 to the present; callouts of 97% ("person") and 99% ("play song").] Deep learning is enabling intelligent applications.

Agenda: Why now · Inside AI · Make it real

AI is achieved when methods converge: AI encompasses machine learning, machine learning encompasses neural networks, and neural networks encompass deep learning.

Problem: AI is difficult to build and scale
Today's workflow: pre-process training data → augment data → design model → perform hyperparameter search → model training.
• Enormous compute requirements
• Buy and set up infrastructure
• Team of deep learning data scientists
• Manual, time-consuming processes
The result is 1 to 2 years of time-to-model and millions of dollars in investment. (A minimal sketch of the hyperparameter-search step follows.)
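To make the cost of that workflow concrete, here is a minimal sketch in Python of the "perform hyperparameter search" step. Everything in it (the search space, the trial count, and the train_and_evaluate stand-in) is hypothetical and illustrative rather than anything from the deck; the point is that every trial repeats a full, expensive training run, which is why this stage dominates time-to-model.

import random

# Hypothetical search space; real spaces are larger and often continuous.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
    "hidden_units": [256, 512, 1024],
}

def train_and_evaluate(params):
    """Hypothetical stand-in for a full training job that returns validation
    accuracy; in practice each call costs hours to days of compute."""
    return random.random()  # replace with a real train/validate cycle

def random_search(num_trials=20):
    """Try num_trials random configurations and keep the best one."""
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = {name: random.choice(choices)
                  for name, choices in SEARCH_SPACE.items()}
        score = train_and_evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

if __name__ == "__main__":
    params, score = random_search()
    print(f"best params: {params}, validation accuracy: {score:.3f}")

Even this toy version makes the slide's economics visible: 20 trials means 20 end-to-end training runs, so any infrastructure that shortens a single run shortens the whole search proportionally.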
Solution: Intel Nervana

Intel Nervana: the full stack for AI
• Solutions
• Frameworks: BigDL
• Platforms / tools: Intel® Nervana™ Cloud, Intel® Nervana™ Deep Learning Platform, Intel® Nervana™ Graph
• Libraries: Intel® MKL, Intel® MKL-DNN, Intel® DAAL, Intel® Distribution
• Hardware: compute, memory/storage, fabric, and more

Intel® Nervana™ Graph: a high-level execution graph for neural networks
Customer solutions, models, and algorithms on one side and neon solutions, models, and deep learning functions on the other all feed the hardware-agnostic Intel® Nervana™ Graph compiler, which hands finished graphs to hardware-specific transformers. The graph enables optimizations that are applicable across multiple hardware targets (a toy illustration follows the hardware slides below):
• Efficient buffer allocation
• Training vs. inference optimizations
• Efficient scaling across multiple nodes
• Efficient partitioning of subgraphs
• Compounding of ops

Intel® Nervana™ AI hardware: a common architecture for AI
• Intel® Xeon® processors: the most widely deployed machine learning platform (>97%)
• Intel® Xeon Phi™ processors: higher-performance, general-purpose machine learning
• Intel® Xeon® processor + FPGA: higher perf/watt inference, programmable; targeted acceleration
• Crest family: best-in-class deep learning performance

Intel® Xeon® processor family: the most widely deployed machine learning platform
Optimized for a wide variety of data center workloads, enabling flexible infrastructure.
• Lowest TCO with superior infrastructure flexibility: standard server infrastructure; open standards, libraries, and frameworks; optimized to run a wide variety of data center workloads
• Server-class reliability: industry-standard server features such as high reliability and hardware-enhanced security
• Leadership throughput: industry-leading inference performance

Intel® Xeon Phi™ processor family: enables shorter time to train using general purpose infrastructure
Ideal for HPC and enterprise customers running scale-out, highly parallel, memory-intensive applications.
• Removing I/O and memory barriers: integrated Intel® Omni-Path fabric increases price/performance and reduces communication latency; direct access to up to 400 GB of memory with no PCIe performance lag (vs. 16 GB on a GPU)
• Breakthrough highly parallel performance: near-linear scaling with a 31X reduction in time to train when scaling to 32 nodes (sketched below); up to 400X performance on existing hardware via Intel software optimizations; up to 4X deep learning performance increase estimated (Knights Mill, 2017)
• Easier programmability: binary-compatible with Intel® Xeon® processors; open standards, libraries, and frameworks

Configuration details are on slides 30-31. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions; any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance. Source: Intel, measured as of November 2016. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804.
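As promised under the Intel® Nervana™ Graph slide above, here is a toy, self-contained Python illustration of that design. This is not the actual Nervana Graph API; the class and function names are invented for illustration. It shows the one idea the slide leans on: front ends emit a hardware-agnostic op graph, and a hardware-specific transformer rewrites it, here by compounding a matmul/add/relu chain into a single fused kernel.

from dataclasses import dataclass, field

@dataclass
class Op:
    """A node in a hardware-agnostic op graph (illustrative only)."""
    name: str                        # e.g. "matmul", "add", "relu"
    inputs: list = field(default_factory=list)

def linear_relu(x, w, b):
    """A front end emits generic ops; nothing here is hardware-specific."""
    return Op("relu", [Op("add", [Op("matmul", [x, w]), b])])

def compound_ops(op):
    """Toy transformer pass: fuse relu(add(matmul(x, w), b)) into one op,
    the kind of 'compounding of ops' a hardware-specific backend performs."""
    if (op.name == "relu" and op.inputs
            and op.inputs[0].name == "add"
            and op.inputs[0].inputs[0].name == "matmul"):
        matmul, bias = op.inputs[0].inputs
        return Op("fused_linear_relu", matmul.inputs + [bias])
    return op

graph = linear_relu(Op("x"), Op("w"), Op("b"))
print(compound_ops(graph).name)      # -> fused_linear_relu

Because the fusion decision lives in the transformer rather than in the front end, the same customer or neon model can be retargeted to Xeon, Xeon Phi, FPGA, or Crest hardware without being rewritten, which is the portability claim the slide is making.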
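The Xeon Phi slide above quotes near-linear scaling with a 31X reduction in time to train at 32 nodes. The standard technique behind numbers like that is synchronous data-parallel training: every node computes gradients on its own shard of the data, an allreduce sums them across the fabric, and each node applies the identical averaged update. Below is a minimal sketch using mpi4py and NumPy; compute_gradients is a hypothetical stand-in for real backpropagation, and none of this is Intel's actual training stack.

# Run with, e.g.: mpirun -n 32 python data_parallel.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, world = comm.Get_rank(), comm.Get_size()

weights = np.zeros(1_000_000, dtype=np.float32)  # model replica on every node
lr = 0.01

def compute_gradients(weights, rank, step):
    """Hypothetical stand-in: the gradient of the loss on this node's data
    shard (here just seeded noise so the sketch runs anywhere)."""
    rng = np.random.default_rng(seed=step * world + rank)
    return rng.standard_normal(weights.shape).astype(np.float32)

for step in range(100):
    local_grad = compute_gradients(weights, rank, step)
    global_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)  # sum across all nodes
    weights -= lr * (global_grad / world)  # identical update on every replica

The communication step is why the slide emphasizes the integrated Intel® Omni-Path fabric: with one allreduce per training step, interconnect latency and bandwidth, not just per-node FLOPS, determine how close to linear the scaling curve can get.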
Coming soon: Canyon Vista
Superior inference capabilities: high perf/watt inference with low latency and flexible precision.
• Excellent energy efficiency: up to 25 images/sec/watt inference on Caffe/AlexNet
• Reconfigurable accelerator that can be used for a variety of data center workloads
• An FPGA integrated with an Intel® Xeon® processor fits in standard server infrastructure, or a discrete FPGA fits in a PCIe card and embedded applications*
*"Xeon with integrated FPGA" refers to the Broadwell proof of concept. Configuration details are on slides 30-31; the performance-test and Optimization Notice disclaimers above apply here as well.

Intel Nervana "Crest family": deep learning by design
Offers unprecedented compute density with a high-bandwidth interconnect.
• Hardware for deep learning workloads: custom-designed for deep learning, with unprecedented compute density and more raw computing power than today's state-of-the-art GPUs
• Blazingly fast data access: 32 GB of in-package memory via HBM2 technology; 8 terabits/s of memory access speed
• High-speed scalability: 12 bi-directional high-bandwidth links; seamless data transfer via interconnects

Agenda: Why now · Inside AI · Make it real

Deep learning applications
• Natural language processing
• Image recognition
• Speech recognition
• Video activity detection
• Tabular and time-series data applications

Intel® Nervana™ applied
• Healthcare: tumor detection (positive/negative image examples)
• Agricultural robotics
• Oil & gas
• Automotive: speech interfaces
• Finance: time-series analysis
• Proteomics: sequence analysis (query and results examples)
• Customer: medical imaging company focused on developing