Intel® Xeon® Processor Platform Performance Hardware Plus Optimized Software INFERENCE THROUGHPUT TRAINING THROUGHPUT
HPC Goes Mainstream with AI & Intel®
Seetha Nookala, Director, Intel APJ HPC & AI

Seetha Rama Krishna (Seetha)
A researcher turned techno-entrepreneur.
• Director: DCG, E&G – HPC & AI at Intel for APJ (South)
• HPC & Enterprise Research and Solutions:
  • Over 30 years of HPC experience contributing to R&D and product development, and applying HPC to enterprise, cloud, research, e-governance, and data centers.
  • Formed and led advanced computing solutions teams at C-DAC, TATA-CRL, TCS, and Intel.
  • Started career as a researcher and quickly moved on to systems and solutions.
  • Chief Architect of the world's 4th fastest supercomputer, "eKA", and Intel's first top-10 computer, at TATA-CRL.
  • Member of the PARAM Series HPC architecture teams at C-DAC.
• Recognitions:
  • Global Innovation Award winner in 2009 from TATA Sons. Felicitated by Shri Ratan Tata in the presence of 100-plus CEOs and CFOs of the TATA group.
  • Recipient of PARAM technology awards from C-DAC; IEEE and CSI recognitions for promotion of HPC in India.

Software FUELS Hardware

Software Optimization Success Stories
Review more success stories: Intel® Parallel Studio XE Case Studies deck
• Science & research: Up to 35X faster application performance. NERSC (National Energy Research Scientific Computing Center) - see case study
• Finance: Up to 2.7X improved performance** compared to NVIDIA Tesla K80*. Monte Carlo European Options Benchmark*
• Life Science: Simulations ran up to 7.6X faster with 9X energy efficiency**. LAMMPS code - Sandia National Laboratories
• Visualization: Up to 5.17X performance improvement** compared to NVIDIA Titan X*. Intel Embree v2.9.0
intel.com/content/www/us/en/high-performance-computing/hpc-xeon-phi-technology-brief.html
**Intel® Xeon Phi™ Processor (codenamed Knights Landing) Software Ecosystem Momentum Guide

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations & functions. Any change to any of those factors may cause the results to vary. You should consult other information & performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804.

Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.

Software and Services Group
Chief Technical Officer
• Open Source Technology Center
• Developer Products Division
• Datacenter Software Division
• Intel Services Division
• Developer Relations Division
• SSG Operations
• SSG Group Marketing
• System Technologies and Optimization Division
• Windows* OS Division

Affordable & Fast AI on Intel - Software tools + Hardware
Intel Portfolio to accelerate time-to-solution – HPC to AI
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Intel® Xeon® processor scalable family
Scalable performance for widest variety of AI & other datacenter workloads – including deep learning
The AI you need, on the chip you know
• Built-in ROI: Begin your AI journey today using existing, familiar infrastructure
• Potent performance: DL training in hours instead of days, with up to 113X² perf vs. prior gen (2.2x excluding optimized SW¹)
• Production-ready: Robust support for full range of AI deployments
¹,² Configuration details on slides 4, 5, 6
Source: Intel measured as of November 2016

Intel® Xeon® processor Platform Performance
Hardware plus optimized software (optimized frameworks, optimized Intel® MKL libraries)
• INFERENCE THROUGHPUT: Up to 198x higher inference throughput. Intel® Xeon® Platinum 8180 Processor, Intel optimized Caffe GoogleNet v1 with Intel® MKL, compared to Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe
• TRAINING THROUGHPUT: Up to 127x higher training throughput. Intel® Xeon® Platinum 8180 Processor, Intel optimized Caffe AlexNet with Intel® MKL, compared to Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe
Inference and training throughput uses FP32 instructions
Deliver significant AI performance with hardware and software optimizations on Intel® Xeon® Scalable Family
• Up to 191X higher inference throughput. Intel® Xeon® Platinum 8180 Processor, Intel optimized Caffe Resnet50 with Intel® MKL, compared to Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe
• Up to 93X higher training throughput. Intel® Xeon® Platinum 8180 Processor, Intel optimized Caffe Resnet50 with Intel® MKL, compared to Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe

Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system.
For more complete information visit: http://www.intel.com/performance
Source: Intel measured as of June 8 2017. Configurations: See the last slide in this presentation.

End-to-end Intel software value
Things And Devices | Connectivity And Network | Cloud And Data Center
• Intel® Computer Vision SDK
• Wind River® Simics
• Neon
• BigDL on Spark
• Software Readiness Qualification (SRQ)
• X-Platform Security
• Intel® Deep Learning SDK
• Developer Zone
• Intel® CoFluent™ Technology
• Ecosystem Enabling
• Android Things OS
…among others…

Tools for High Performance Implementation
Decision flow:
• Cluster scalable? N: Tune MPI. Y: next question.
• Effective threading? N: Thread. Y: next question.
• Memory bandwidth sensitive? Y: Optimize bandwidth. N: Vectorize.
Tools:
• Intel® Parallel Studio
• Intel® MPI Library
• Intel® MPI Benchmarks
• Intel® Compiler
• Intel® Math Kernel Library
• Intel® IPP – Media & Data Library
• Intel® Data Analytics Library
• Intel® OpenMP*
• Intel® TBB – Threading Library
Copyright © 2018, Intel Corporation. All rights reserved.

AI & HPC Convergence
Enable HPC to shift up the Analytics Maturity curve.
Data preparation accounts for ~80% of the work of data scientists.
Analytics maturity stages, from Operational Analytics to Advanced Analytics:
• Descriptive Analytics (Hindsight): What Happened
• Diagnostic Analytics (Insight): What Happened and Why
• Predictive Analytics (Foresight): What Will Happen, When, and Why
• Prescriptive Analytics: Simulation-Driven Analysis and Decision-Making
• Cognitive Analytics: Self-Learning and Completely Automated; Computerized Human Thought Simulation and Actions; Towards Autonomic Enterprise
Other figure labels: Mature Data Lake; Drive AI HPC Convergence
Copyright© 2014,