GridKa School 2019

KIT, Campus North, FTU

Edmund Preiss

Business Development Manager, EMEA Computing Performance and Developer Products (CPDP)

Agenda

Intel Xeon Family – Latest Status
Intel Software Development Tools
- Intel Parallel Studio XE (IPS XE) Tool Suites
- News on the IPS XE 2020 Edition
- Upcoming oneAPI development tools concept
Intel optimized AI Solutions and Directions

Evolution of Intel® Xeon® Platform

- Haswell (HSX), 2015, Grantley platform: up to 18 cores, Intel® AVX2 (256 bit)
- Broadwell (BDX), 2016, Grantley platform: up to 24 cores, Intel® AVX2 (256 bit)
- Skylake (SKX) SP, 2017, Purley platform: up to 28 cores, Intel® AVX-512 (512 bit), FP32, INT8, …
- Cascade Lake (CLX) SP/AP, 2019, Purley platform: up to 28 cores (SP) / up to 56 cores (AP), Intel® Deep Learning Boost, Intel® AVX-512 VNNI (512 bit), FP32, VNNI INT8, …

Agenda

Intel Xeon Family Update
Intel Software Development Tools
- Intel Parallel Studio XE (IPS XE) Tool Suites
- News on the IPS XE 2020 Edition
- Upcoming oneAPI development tools concept
Intel optimized AI Solutions and Directions

Code Modernization

Stage 1: Use Optimized Libraries

Stage 2: Compile with Architecture-specific Optimizations

Stage 3: Analysis and Tuning

Stage 4: Check Correctness
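
As a small illustration of stage 1 in Python (a sketch; the arrays and sizes below are made up for the example, not taken from the talk), the biggest first win usually comes from replacing interpreted loops with calls into an optimized library such as Intel® MKL, reached here through NumPy:

```python
# Stage 1 sketch: let an optimized library (MKL-backed NumPy in the Intel
# Distribution for Python) do the work instead of an interpreted Python loop.
import numpy as np

np.random.seed(0)
n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# Naive triple loop (pure Python, very slow):
#   c[i, j] = sum(a[i, k] * b[k, j] for k in range(n))
# Replaced by one call that dispatches to the BLAS (Intel MKL if NumPy is
# built against it):
c = a @ b
print(c.shape)
```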

What's Inside Intel® Parallel Studio XE
Comprehensive Software Development Tool Suite

BUILD – Compilers & Libraries (Composer Edition)
• Intel® C/C++ and Fortran optimizing compilers
• Intel® Integrated Performance Primitives – image, signal & data processing
• Intel® Threading Building Blocks – C++ threading library
• Intel® Data Analytics Acceleration Library

ANALYZE – Analysis Tools (added in the Professional Edition)
• Intel® VTune™ Amplifier – performance profiler
• Intel® Inspector – memory & thread debugger
• Intel® Advisor – vectorization optimization & thread prototyping

SCALE – Cluster Tools (added in the Cluster Edition)
• Intel® MPI Library – Message Passing Interface library
• Intel® Trace Analyzer & Collector – MPI tuning & analysis
• Intel® Cluster Checker – cluster diagnostic expert system

Intel® Distribution for Python* – High Performance Scripting

Intel® Architecture Platforms

Operating Systems: Windows*, Linux*, macOS1* (1 macOS: available only in Intel® Parallel Studio Composer Edition)

More Power for Your Code – software.intel.com/intel-parallel-studio-xe

What's Inside Intel® Math Kernel Library

Linear Algebra: BLAS, LAPACK, ScaLAPACK, Sparse BLAS, iterative sparse solvers, PARDISO*
FFTs: multidimensional, FFTW interfaces, Fast Poisson Solver
Vector RNGs: Congruential, Wichmann-Hill, Mersenne Twister, Sobol, Niederreiter, non-deterministic
Summary Statistics: kurtosis, variation coefficient, order statistics, min/max, variance-covariance
Vector Math: trigonometric, hyperbolic, exponential, log, power, root
& More: splines, interpolation, Trust Region
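
A hedged sketch of how several of these MKL domains are reached from Python: with the Intel® Distribution for Python, NumPy routes these calls to MKL, while stock NumPy uses whatever BLAS/FFT backend it was built with. The matrix sizes are arbitrary.

```python
# BLAS, LAPACK, FFT and RNG functionality exercised through NumPy.
import numpy as np

np.random.seed(42)
x = np.random.randn(1024, 1024)                    # random number generation
u, s, vt = np.linalg.svd(x, full_matrices=False)   # LAPACK: SVD
y = x @ x.T                                        # BLAS: GEMM
spectrum = np.fft.fft2(x)                          # FFT: 2-D transform
print(s[:3], spectrum.shape)
```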

Application Performance Snapshot (VTune Amplifier)

Find Effective Optimization Strategies: Cache-Aware Roofline Analysis
• Roofs show platform limits – memory, cache & compute limits
• Dots are loops – bigger, red dots take more time, so optimization has a bigger impact; dots farther from a roof have more room for improvement
• Higher dot = higher GFLOPs/sec – optimization moves dots up; algorithmic changes move dots horizontally
• Which loops should we optimize? A and G are the best candidates; B has room to improve, but will have less impact; E, C, D, and H are poor candidates
• Roofline tutorial video

Analyze & Tune Application Performance & Scalability with Intel® VTune™ Amplifier—Performance Profiler

Fast, Scalable Code, Faster
• Accurately profile C, C++, Java*, Python*, Go*, or any mix
• Optimize CPU/GPU, threading, memory, cache, storage & more
• Save time: rich analysis leads to insight

What's New in the 2019 Release (Highlights)
• Simplified workflow for easier tuning
• I/O Analysis—tune SPDK storage & DPDK network performance
• New Platform Profiler—get insights into platform-level performance, identify memory & storage bottlenecks & imbalances

Optimize Vectorization & Threading with Intel® Advisor

Performance Increases Scale with Each New Hardware Generation
"Automatic" vectorization is not enough – explicit pragmas and optimization are often required.

Modern performant code is:
• Vectorized for Intel® Advanced Vector Extensions (Intel® AVX-512 & Intel® AVX)
• Efficient in memory access
• Threaded

Intel® Advisor capabilities:
• Adds & optimizes vectorization
• Analyzes memory patterns
• Quickly prototypes threading

[Chart: Binomial Options benchmark – speedup of vectorized & threaded code over serial code (up to 130x) across Intel® Xeon® generations from the 2010 X5680 (Westmere) through the 2017 Platinum 81xx (Skylake Server).]

Benchmark: Binomial Options Pricing Model – software.intel.com/en-us/articles/binomial-options-pricing-model-code-for-intel-xeon-phi-coprocessor. Performance results are based on testing by Intel as of August 2017 and may not reflect all publicly available security updates; see the configuration disclosures in the backup slides. Learn more: intel.ly/advisor

Get Breakthrough Vectorization Performance
Intel® Advisor—Vectorization Advisor

Faster Vectorization Optimization
• Vectorize where it will pay off most
• Quickly ID what is blocking vectorization
• Tips for effective vectorization
• Safely force compiler vectorization
• Optimize memory stride

Data & Guidance You Need
• Compiler diagnostics + performance data + SIMD efficiency
• Detect problems & recommend fixes
• Loop-carried dependency analysis
• Memory access patterns analysis


Optimize for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with or without access to Intel AVX-512 hardware

Agenda

Intel Xeon Family Update
Intel Software Development Tools
- Intel Parallel Studio XE (IPS XE) Tool Suites
- News on the IPS XE 2020 Edition
- Upcoming oneAPI development tools concept
Intel optimized AI Solutions and Directions

Key Updates for Intel® Parallel Studio XE 2020
• Speed artificial intelligence inferencing – compiler and analyzer support for Vector Neural Network Instructions (VNNI) on the Cascade Lake/AP platform
• 512 GB DIMMs with persistent memory – identify, optimize & tune platforms for Intel® Optane™ persistent memory with Intel® VTune™ Amplifier
• Extended coarse-grain profiling – platform-level collection and analysis in Intel® VTune™ Amplifier
• Cache simulation insights for vectorization – roofline analysis for L1, L2, L3 and DRAM in Intel® Advisor
• Expanded standards support – more Fortran 2018 features & expanded C++17 support with initial C++20 support
• Latest processor support – Intel® Xeon® Scalable processors (codenamed Cascade Lake / Cascade Lake AP)
• New OS support – Clear Linux* & Amazon Linux 2*

* Supported features of tools and libraries may vary by instance and configuration

Intel® VTune™ Profiler Server Architecture (2020 Preview Feature)
Just launch your browser and go

[Diagram: users' web browsers connect to a VTune Profiler server, which runs the collectors on the target systems.]

Easier profiling
• Access with a web browser – no install required by users
• Share results – all results available to all users with server access
• Profile any system on the network – the server installs the collector on the target

What's Coming in Intel® Parallel Studio XE 2020 (Coming Q3'2019)

• Intel® Math Kernel Library
  • Increased AVX-512 optimizations for complex vector math functions
  • Strided vector math API
• Intel® Data Analytics Acceleration Library
  • Performance and feature improvements such as:
    • Gradient boosted trees for large-dimensional data sets
    • Extended Z-score support for the PCA algorithm
    • XGBoost accelerated with DAAL

Agenda

Intel Xeon Family Update
Intel Software Development Tools
- Intel Parallel Studio XE (IPS XE) Tool Suites
- News on the IPS XE 2020 Edition
- Upcoming oneAPI development tools concept
Intel optimized AI Solutions and Directions

Diverse Workloads Require Diverse Architectures (CPU, GPU, AI, FPGA)

The future is a diverse mix of scalar, vector, matrix, and spatial (SVMS) architectures deployed in CPU, GPU, AI, FPGA and other accelerators.

The Programming Challenge (CPU, GPU, AI, FPGA)

Diverse set of data-centric hardware

No common programming language or APIs

Inconsistent tool support across platforms

Each platform requires unique software investment

Intel's oneAPI Core Concept

Project oneAPI delivers a unified programming model to simplify development across diverse architectures:
• Common developer experience across scalar, vector, matrix and spatial (SVMS) architectures
• Unified and simplified language and libraries for expressing parallelism
• Uncompromised native high-level language performance
• Support for CPU, GPU, AI and FPGA
• Based on industry standards and open specifications

[Stack: optimized applications → optimized middleware / frameworks → oneAPI tools, language & libraries → CPU, GPU, AI, FPGA]

One API for cross-architecture performance

One API Product
• Optimized Applications
• Optimized Middleware & Frameworks
• Direct Programming: Data Parallel C++ (DPC++)
• API-Based Programming: libraries for math, threading, data analytics/ML, DNN, ML communication, video processing and rendering
• Analysis & Debug Tools: VTune™, Advisor, debugger, porting tool
• Target architectures: CPU, GPU, AI, FPGA (scalar, vector, matrix, spatial)

Some capabilities may differ per architecture.

Agenda

Intel Xeon Family Update
Intel Software Development Tools
- Intel Parallel Studio XE (IPS XE) Tool Suites
- News on the IPS XE 2020 Edition
- Upcoming oneAPI development tools concept
Intel optimized AI Solutions and Directions

Artificial Intelligence

Artificial intelligence is the ability of machines to learn from experience, without explicit programming, in order to perform cognitive functions associated with the human mind.

Machine learning: algorithms whose performance improves as they are exposed to more data over time.

Deep learning: a subset of machine learning in which multi-layered neural networks learn from vast amounts of data.

Artificial Intelligence – Intel AI Portfolio

Solutions (solution architects): AI Solutions Catalog (public & internal); solution platforms for finance, healthcare, energy, industrial, transport, retail, home and more.

Toolkits (app developers), deep learning deployment:
• OpenVINO™ toolkit† – Open Visual Inference & Neural Network Optimization toolkit for inference deployment on CPU, processor graphics, FPGA & VPU using TensorFlow*, Caffe* & MXNet*
• Intel® Movidius™ SDK – optimized inference deployment for all Intel® Movidius™ VPUs using TensorFlow* & Caffe*
• Intel® Deep Learning Studio‡ – open-source tool to compress the deep learning development cycle

Libraries (data scientists):
• Machine learning libraries: Python (scikit-learn, NumPy), R (Cart, Random Forest, e1071), distributed (MlLib on Spark, Mahout)
• Deep learning frameworks now optimized for CPU: TensorFlow*, MXNet*, Caffe*, BigDL/Spark*; optimizations in progress: Caffe2*, PyTorch*, PaddlePaddle*

Foundation (library developers), analytics, machine & deep learning primitives:
• Intel® Distribution for Python (optimized for machine learning)
• Intel® Data Analytics Acceleration Library (DAAL)
• MKL-DNN and clDNN – open-source deep neural network functions for CPU and processor graphics
• Intel® nGraph™ Compiler (Alpha) – open-sourced compiler for deep learning model computations, optimized for multiple devices (CPU, GPU, NNP) using multiple frameworks (TF, MXNet, ONNX)

Hardware (system architects): AI foundation and deep learning accelerators for data center, edge and device, including NNP L-1000.

† Formerly the Intel® Computer Vision SDK. *Other names and brands may be claimed as the property of others. All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.

Components Comparison: Intel MKL-DNN vs Intel MKL

MKL-DNN (open source)
• Convolution
• ReLU
• Normalization
• Pooling
• Inner Product

MKL (Math Kernel Library)

Linear Algebra: BLAS, LAPACK, ScaLAPACK, Sparse BLAS, sparse solvers, iterative, PARDISO*, Cluster Sparse Solver
Fast Fourier Transforms: multidimensional, FFTW interfaces, Cluster FFT
Vector Math: trigonometric, hyperbolic, exponential, log, power, root
Summary Statistics: kurtosis, variation coefficient, order statistics, min/max, variance-covariance
And More: splines, interpolation, Trust Region, Fast Poisson Solver, vector RNGs

Data Transformation & Analysis Algorithms
Intel® Data Analytics Acceleration Library

Basic statistics for datasets: low order moments, quantiles, order statistics
Correlation & dependence: cosine distance, correlation distance, variance-covariance matrix
Matrix factorizations: SVD, QR, Cholesky
Dimensionality reduction: PCA
Outlier detection: univariate, multivariate
Also: association rule mining (Apriori), optimization solvers (SGD, AdaGrad, lBFGS), math functions (exp, log, …)

Algorithms supporting batch processing

Algorithms supporting batch, online and/or distributed processing
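
As a sketch of how these batch algorithms are called from Python, here is a basic-statistics example using daal4py, the Python interface to DAAL (the function and attribute names follow the daal4py documentation of the time; check them against your installed version, and the input data is synthetic):

```python
# Basic statistics for a dataset with daal4py (Intel DAAL from Python).
import numpy as np
import daal4py as d4p

np.random.seed(1)
data = np.random.randn(10000, 5)      # synthetic dataset, 5 features

algo = d4p.low_order_moments()        # batch processing by default
result = algo.compute(data)

print("mean     :", result.mean)
print("variance :", result.variance)
print("min / max:", result.minimum, result.maximum)
```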

Machine Learning Algorithms
Intel® Data Analytics Acceleration Library

Supervised learning
• Regression: linear regression, ridge regression, logistic regression
• Classification: decision forest, decision tree, boosting (Ada, Brown, Logit), Naïve Bayes, weak learner, k-NN, neural networks, Support Vector Machine

Unsupervised learning
• Clustering: K-Means, EM for GMM

Collaborative filtering
• Alternating Least Squares

Algorithms supporting batch processing
Algorithms supporting batch, online and/or distributed processing
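
A minimal K-Means sketch with daal4py to show how the clustering algorithms above are driven from Python (parameter and attribute names as in the daal4py documentation; verify against your installed version, and the data is synthetic):

```python
# K-Means clustering with daal4py: centroid initialization, then Lloyd iterations.
import numpy as np
import daal4py as d4p

np.random.seed(2)
data = np.random.randn(5000, 4)

initial = d4p.kmeans_init(nClusters=3, method="plusPlusDense").compute(data)
result = d4p.kmeans(nClusters=3, maxIterations=50).compute(data, initial.centroids)

print(result.centroids)
print("iterations run:", result.nIterations)
```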

Faster Python* with Intel® Distribution for Python

Up to 440X speedup versus stock NumPy from pip

Learn More: software.intel.com/distribution-for-python
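
A quick way to see whether your NumPy is the MKL-backed build from the Intel® Distribution for Python, and to time one kernel on your own machine (a sketch; the matrix size is arbitrary and the actual speedup over stock pip NumPy depends on the operation and the hardware):

```python
# Check the BLAS backend NumPy was built against and time a matrix multiply.
import time
import numpy as np

np.show_config()                 # look for "mkl" in the BLAS/LAPACK sections

a = np.random.rand(4000, 4000)
t0 = time.perf_counter()
b = a @ a
print(f"4000x4000 matmul took {time.perf_counter() - t0:.2f} s")
```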

Popular DL Frameworks are now optimized for CPU!

Now optimized for CPU: TensorFlow*, MXNet*, Caffe*, BigDL (for Spark*)

See installation guides at ai.intel.com/framework-optimizations/

More under optimization: Caffe2*, PyTorch*, PaddlePaddle*

SEE ALSO: Machine Learning Libraries for Python (Scikit-learn, Pandas, NumPy), R (Cart, randomForest, e1071), Distributed (MlLib on Spark, Mahout). *Limited availability today.
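
For the CPU-optimized (MKL-DNN) builds of these frameworks, Intel's published guidance is to set the OpenMP and TensorFlow threading knobs to match the machine. Below is a sketch with the TensorFlow 1.x API that was current at the time of this talk; the thread counts are placeholders to be replaced with your core count and NUMA layout:

```python
# CPU threading knobs for MKL-DNN builds of TensorFlow (1.x API).
import os

# Set OpenMP/KMP variables before TensorFlow is imported.
os.environ["OMP_NUM_THREADS"] = "28"   # e.g. physical cores per socket
os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

import tensorflow as tf

config = tf.ConfigProto(
    intra_op_parallelism_threads=28,   # threads used inside one op (e.g. a conv)
    inter_op_parallelism_threads=2,    # independent ops executed in parallel
)
sess = tf.Session(config=config)
```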

Intel® Xeon® Processor Platform Performance – Hardware Plus Optimized Software

Inference throughput: up to 198x higher – Intel® Xeon® Platinum 8180 processor, Intel optimized Caffe GoogleNet v1 with Intel® MKL, compared to Intel® Xeon® processor E5-2699 v3 with BVLC-Caffe.
Training throughput: up to 127x higher – Intel® Xeon® Platinum 8180 processor, Intel optimized Caffe AlexNet with Intel® MKL, compared to Intel® Xeon® processor E5-2699 v3 with BVLC-Caffe.
Optimized frameworks and optimized Intel® MKL libraries; inference and training throughput use FP32 instructions.

Deliver significant AI performance with hardware and software optimizations on Intel® Xeon® Scalable Family

ResNet-50 inference throughput: up to 191x higher – Intel® Xeon® Platinum 8180 processor, Intel optimized Caffe ResNet-50 with Intel® MKL, compared to Intel® Xeon® processor E5-2699 v3 with BVLC-Caffe.
ResNet-50 training throughput: up to 93x higher – same configuration and comparison.

Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown"; implementation of these updates may make these results inapplicable to your device or system. Source: Intel measured as of June 2017; configurations: see the last slide in this presentation. For more complete information visit http://www.intel.com/performance.

Use your Intel Architecture based HPC Infrastructure ---> Shorten Deep Learning Training Time (HPC + AI)


AI (ML & DL) Software Stack for Intel® Processors – Distributed DL Training

Horovod: Uber's open source distributed training framework for TensorFlow

Intel MKL | Intel MKL-DNN | Intel MPI | Intel MLSL

Intel Processors

More details on Intel and AI at GridKa School
Join the Intel hands-on training: Thursday afternoon at 13:15
Title: Enhance Machine Learning Performance with Intel® Software Tools

1st Session:

• Introduction of Intel tools for ML and DL

• DL Boost and VNNI instructions; Intel® Distribution for Python

2nd session:

• Classical ML

• NumPy and MKL; K-Means clustering with DAAL4PY; distributed ML algorithms

3rd session:

• Intel MKL for deep neural networks (MKL-DNN); Intel-optimized frameworks and TensorFlow

• Distributed Tensorflow with Horovod

Legal Disclaimer & Optimization Notice

Performance results are based on testing as of August 2017 to September 2018 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Copyright © 2018, Intel Corporation. All rights reserved. Intel, the Intel logo, Pentium, Xeon, Core, VTune, and OpenVINO are trademarks of Intel Corporation or its subsidiaries in the U.S. and other countries.

Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804


The Best of Both Worlds with Intel® Optane™ DC Persistent Memory: DRAM attributes + NAND SSD attributes

Performance comparable to DRAM at low latencies1; data persistence with higher capacity than DRAM2

Optane™ Media

1. “Fast performance comparable to DRAM” - Intel persistent memory is expected to perform at latencies near DDR4 DRAM. Benchmarks and proof points forthcoming. “low latencies” - Data transferred across the memory bus causes latencies to be orders of magnitude lower when compared to transferring data across PCIe or I/O bus’ to NAND/Hard Disk. Benchmarks and proof points forthcoming. 2. Intel persistent memory offers 3 different capacities – 128GB, 256GB, 512GB. Individual DIMMs of DDR4 DRAM max out at 256GB.

Latency Estimates for Different Memory and Storage Devices

Distributed TensorFlow™ – Compare Distributed TensorFlow With and Without a Parameter Server

With Parameter Server

The parameter server model for distributed training jobs can be configured with different ratios of parameter servers to workers, each with different performance profiles.

No Parameter Server

Horovod: Uber's open source distributed training framework for TensorFlow

The ring all-reduce algorithm allows worker nodes to average gradients and disperse them to all nodes without the need for a parameter server. Source: https://eng.uber.com/horovod/
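
A minimal sketch of Horovod-style data-parallel training with the TensorFlow 1.x API of that time: every worker computes gradients on its own batch and the ring all-reduce averages them, so no parameter server is needed. The toy regression model and hyperparameters are placeholders for a real workload.

```python
# Horovod + TensorFlow 1.x: gradient averaging via ring all-reduce.
import numpy as np
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()                                        # one process per rank

x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(x, 1)                      # toy model
loss = tf.losses.mean_squared_error(y, pred)

opt = tf.train.AdamOptimizer(0.001 * hvd.size())  # scale LR with worker count
opt = hvd.DistributedOptimizer(opt)               # wraps allreduce around gradients
step = tf.train.get_or_create_global_step()
train_op = opt.minimize(loss, global_step=step)

hooks = [hvd.BroadcastGlobalVariablesHook(0),     # rank 0 broadcasts initial weights
         tf.train.StopAtStepHook(last_step=100)]

with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
    while not sess.should_stop():
        xb = np.random.rand(32, 10).astype(np.float32)
        yb = xb.sum(axis=1, keepdims=True)
        sess.run(train_op, feed_dict={x: xb, y: yb})
```

Such a script is typically launched with one process per core group or node, for example via horovodrun or mpirun.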

Unsupervised Learning Example: Machine Learning (Clustering)

An "unsupervised learning" example: K-Means clustering groups data points plotted by revenue versus purchasing power into clusters, without labeled training data.

[Sidebar: AI approaches range from machine learning (regression, classification, clustering, decision trees, data generation) to deep learning (image processing, speech processing, natural language processing, recommender systems, adversarial networks) and deep reinforcement learning.]

Choose the right AI approach for your challenge
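
A tiny clustering sketch matching this example, using scikit-learn's K-Means (accelerated in the Intel® Distribution for Python). The two features stand in for the revenue and purchasing-power axes of the slide; the data is synthetic.

```python
# K-Means on synthetic "revenue vs. purchasing power" points.
import numpy as np
from sklearn.cluster import KMeans

np.random.seed(0)
points = np.vstack([
    np.random.normal(loc=[20, 30], scale=5, size=(100, 2)),
    np.random.normal(loc=[60, 80], scale=5, size=(100, 2)),
    np.random.normal(loc=[90, 40], scale=5, size=(100, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)    # one centre per discovered segment
print(kmeans.labels_[:10])        # cluster assignment of the first points
```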

Supervised Learning Example: Deep Learning (Image Recognition)

A "supervised learning" example: during training, lots of tagged data (images labeled human, bicycle, strawberry, …) is pushed through the network. The forward pass produces a prediction ("strawberry"), the error against the label ("bicycle") is computed, and the backward pass updates the model weights. During inference, only the forward pass runs on new data to produce a prediction ("bicycle"?).

Choose the right AI approach for your challenge
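
The forward / error / backward / weight-update loop described above, written as a minimal NumPy logistic-regression sketch rather than a deep network, just to make the steps concrete; data, labels and learning rate are synthetic.

```python
# Supervised training loop: forward pass, error, backward pass, weight update.
import numpy as np

np.random.seed(0)
X = np.random.randn(200, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(float)        # the "tagged" training labels
w = np.zeros(5)

for step in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))             # forward pass: prediction
    err = p - y                                  # error vs. the labels
    grad = X.T @ err / len(y)                    # backward pass: gradient
    w -= 0.5 * grad                              # weight update

print("trained weights:", w)
x_new = np.random.randn(5)                       # inference: forward pass only
print("prediction:", 1.0 / (1.0 + np.exp(-x_new @ w)) > 0.5)
```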

Reinforcement Learning Example: Deep Learning (Reinforcement Learning)

A "deep reinforcement learning" example: an agent takes actions in an environment and learns from the feedback it receives.

Choose the right AI approach for your challenge

Intel-Optimized Frameworks: How To Get?

Check out the framework optimizations page at intel.ai

Installation example for optimized TensorFlow / Keras for a medical sample application: https://docs.google.com/document/d/1AAsWLSfYBx-pYfvzFQxOYi4Z1zx3TAEB5534s61Lf8w/edit?usp=sharing
Topology examples: https://www.intel.ai/framework-optimizations
