Introduction to Artificial Intelligence Using the Intel® Hardware Platform

Dr. Fabio Baruffa, Sr. Technical Consulting Engineer, Intel IAGS

Navigating the AI Performance Package

Introduction to AI: Overview of Deep Learning Software

Intel® Xeon® Scalable processors: first and second generation (Skylake / Cascade Lake)

Intel® Deep Learning Boost: Intel® AVX-512 Vector Neural Network Instructions (VNNI)

Navigating the AI Performance Package

Introduction to AI: Overview of Deep Learning Software

• What are AI, Machine Learning, and Deep Learning?

• Deep Learning Software breakdown

• Popular AI Neural Networks and their uses

• Intel’s AI software tools

What is AI?

Artificial intelligence is the ability of machines to learn from experience, without explicit programming, in order to perform cognitive functions associated with the human mind.

Approaches include regression, classification, clustering, decision trees, data generation, image processing, speech processing, natural language processing, recommender systems, and adversarial networks.

There is no one-size-fits-all approach to AI.

Example: Machine Learning (Clustering)

An "unsupervised learning" example: K-Means clustering groups data points (e.g. customers plotted by purchasing power vs. revenue) into clusters without labeled training data.

[Figure: two scatter plots of revenue vs. purchasing power, before and after K-Means clustering]

Choose the right AI approach for your challenge.

Example: Deep Learning (Image Recognition)

A "supervised learning" example. Training: a forward pass produces a prediction (e.g. "strawberry" for a tagged image of a bicycle), the error against the tag is propagated backward, and the model weights are updated; this repeats over lots of tagged data. Inference: the trained model runs only the forward pass on new data to produce a prediction ("bicycle"?).

Choose the right AI approach for your challenge.

Deep Learning Glossary

• Library: hardware-optimized mathematical and other primitive functions that are commonly used in machine & deep learning algorithms, topologies & frameworks. Examples: Intel® MKL-DNN, Intel® DAAL, Intel® Distribution for Python* (NumPy, Pandas).

• Framework: open-source software environments that facilitate deep learning model development & deployment through built-in components and the ability to customize code. Examples: Spark MLlib, Scikit-Learn, Mahout.

• Topology: a wide variety of algorithms, modeled loosely after the human brain, that use neural networks to recognize complex patterns in data that are otherwise difficult to reverse-engineer.

Translating common deep learning terminology

Deep Learning Usages & Key Topologies

• Image recognition: ResNet-50, Inception V3, MobileNet, SqueezeNet
• Language translation: GNMT
• Object detection: R-FCN, Faster-RCNN, YOLO V2, SSD-VGG16, SSD-MobileNet
• Text to speech: WaveNet
• Image segmentation: Mask R-CNN
• Recommendation system: Wide & Deep, NCF

There are many deep learning usages, and many key topologies for each.

Deep Learning in Practice

Time-to-solution is more significant than time-to-train

Speed Up Development Using Open AI Software (visit: www.intel.ai/technology)

Toolkits (app developers):
• Open-source platform for building end-to-end analytics & AI applications on Apache Spark* with distributed TensorFlow*, Keras*, BigDL
• Deep learning inference deployment on CPU/GPU/FPGA/VPU for TensorFlow*, MXNet*, ONNX*, Kaldi*
• Open-source, scalable, and extensible distributed deep learning platform built on Kubernetes (BETA)

Libraries (data scientists):
• Python: Scikit-learn, Pandas, NumPy
• R: Cart, Random Forest, e1071
• Distributed: MLlib (on Spark), Mahout
• Intel-optimized frameworks: TensorFlow*, MXNet*, with more framework optimizations underway, including PaddlePaddle*, Chainer*, CNTK*, and others

Kernels (library developers):
• Intel® Distribution for Python*: Intel distribution optimized for machine learning
• Intel® Data Analytics Acceleration Library (Intel® DAAL): high-performance machine learning & data analytics library
• Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN): open-source DNN functions for CPU / integrated graphics
• Open-source compiler for deep learning model computations, optimized for multiple devices (CPU, GPU, NNP) from multiple frameworks (TF, MXNet, ONNX)

Navigating the AI Performance Package

Introduction to AI: Overview of Deep Learning Software

Intel® Xeon® Scalable processors: first and second generation (Skylake / Cascade Lake)

• Deploy AI everywhere on Intel® architecture

• Intel® AVX-512 (Advanced Vector Extensions 512)

Deploy AI Anywhere with Unprecedented Hardware Choice (visit: www.intel.ai/technology)

From device to edge to cloud, with dedicated accelerators added if needed:

• Automated driving: dedicated
• Media/vision: dedicated
• Acceleration: flexible
• DL inference: dedicated (Intel® NNP-I)
• DL training: dedicated (Intel® NNP-T)
• Graphics, media & analytics: Intel GPU

All products, computer systems, dates, and figures are preliminary based on current expectations and are subject to change without notice.

SIMD Processing

Single instruction, multiple data (SIMD) allows executing the same operation on multiple data elements at once using wider registers (see the loop sketch below).

• Scalar mode: one instruction produces one result (e.g. vaddss, vaddsd)
• Vector (SIMD) mode: one instruction can produce multiple results (e.g. vaddps, vaddpd)

▪ SSE (128-bit registers): 4 floats
▪ AVX (256-bit registers): 8 floats
▪ AVX-512 (512-bit registers): 16 floats
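A minimal sketch of such a loop (illustrative names, not from the deck); with the Intel compiler and, e.g., -O2 -xCORE-AVX512, the compiler can auto-vectorize it into packed adds (vaddps):

    #include <cstddef>

    // Scalar source: one add per iteration. Vectorized with AVX-512,
    // each vaddps instruction adds 16 floats at once.
    void add(const float* x, const float* y, float* __restrict z, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            z[i] = x[i] + y[i];
    }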

Evolution of SIMD for Intel® Processors

[Chart: each processor generation carries forward the SIMD instruction sets of its predecessors]
• 128-bit SIMD: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 (Willamette 2000, Prescott 2004, Merom 2006, Penryn 2007, Nehalem 2007)
• 256-bit SIMD: AVX (Sandy Bridge, 2011), AVX2 (Haswell, 2013)
• 512-bit SIMD on Intel® Xeon®/Xeon® Scalable processors: AVX-512 F/CD with ER/PR (Knights Landing, 2015), AVX-512 F/CD with VL/BW/DQ (Skylake server, 2015), AVX-512 F/CD with VL/BW/DQ/VNNI (Cascade Lake server, 2019)

AVX-512 Compress and Expand

• VCOMPRESSPD|PS|D|Q: store sparse packed floating-point values into dense memory
• VEXPANDPD|PS|D|Q: load sparse packed floating-point values from dense memory
(PD / PS / D / Q: double-precision / single-precision / doubleword / quadword)

vcompresspd YMMWORD PTR [rsi+rax*8]{k1}, ymm1
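The same idiom via intrinsics, as a small sketch (my illustration; the intrinsics themselves are standard AVX-512F):

    #include <immintrin.h>

    // Store only the lanes of v whose bit in `keep` is set, packed
    // contiguously at dst (compiles to vcompresspd), and return the
    // new end of the dense output.
    double* compress_store(double* dst, __m512d v, __mmask8 keep) {
        _mm512_mask_compressstoreu_pd(dst, keep, v);
        return dst + _mm_popcnt_u32(keep);
    }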

Compress Loop Pattern Auto-Vectorization (https://godbolt.org/z/x7gNfb)

    int compress(double *a, double * __restrict b, int na) {
        int nb = 0;
        for (int ia = 0; ia < na; ia++) {
            if (a[ia] > 0.)
                b[nb++] = a[ia];
        }
        return nb;
    }
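A quick usage sketch (my example, not from the deck): the function packs the positive elements of a into b and returns how many it kept.

    #include <cstdio>

    int compress(double *a, double * __restrict b, int na);  // defined above

    int main() {
        double a[] = {1.0, -2.0, 3.0, -4.0, 5.0};
        double b[5];
        int nb = compress(a, b, 5);      // nb == 3
        for (int i = 0; i < nb; ++i)
            printf("%g ", b[i]);         // prints: 1 3 5
        return 0;
    }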

Targeting Intel® AVX2 (-xcore-avx2 -qopt-report-file=stderr -qopt-report-phase=vec):

    LOOP BEGIN
        remark #15344: loop was not vectorized: vector dependence prevents vectorization.
        remark #15346: vector dependence: assumed FLOW dependence between b[nb] (7:4) and a[ia] (7:4)
    LOOP END

Targeting Intel® AVX-512 (-xcore-avx512 -qopt-report-file=stderr -qopt-report-phase=vec):

    LOOP BEGIN
        remark #15300: LOOP WAS VECTORIZED
    LOOP END

With AVX2 the compiler must assume a flow dependence through nb and cannot vectorize; with AVX-512 the compress pattern maps directly onto vcompresspd, so the loop vectorizes.

Compress Loop Performance

Compiler options                    Speedup (C)    Speedup (Fortran)
Simple loops: -O2 -xCORE-AVX2       1.0x           1.0x
Simple loops: -O2 -xCORE-AVX512     12.8x          12.2x

Performance tests are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. The results above were obtained on an Intel® Xeon® Platinum 8168 system, frequency 2.7 GHz, running Red Hat* Enterprise Linux* Server 7.2 and using the Intel® Fortran Compiler version 18.0 update 1.


Navigating the AI Performance Package

Introduction to AI: Overview of Deep Learning Software

Intel® Xeon® Scalable processors: first and second generation (Skylake / Cascade Lake)

Intel® Deep Learning Boost: Intel® AVX-512 Vector Neural Network Instructions (VNNI)

• Boost deep learning inference with VNNI

Fast Evolution of AI Capability on the Intel® Xeon® Platform

Grantley platform:
• Haswell (HSX), Xeon® E5 v3 (2015): Intel® AVX2 (256-bit), FP32
• Broadwell (BDX), Xeon® E5 v4 (2016): Intel® AVX2 (256-bit), FP32

Purley platform:
• Skylake (SKX), Xeon® Scalable Processor, e.g. Gold 8180, Gold 5117, Silver 4110 (2017): Intel® AVX-512 (512-bit), FP32, INT8, …
• Cascade Lake (CLX), Xeon® Scalable Processor with Intel® Deep Learning Boost, e.g. Gold 8280, Gold 5218 (2019): Intel® AVX-512 VNNI (512-bit), FP32, VNNI INT8, …

▪ Intel® Deep Learning Boost is a new set of AVX-512 instructions designed to deliver significantly more efficient deep learning (inference) acceleration on second-generation Intel® Xeon® Scalable processors (codename "Cascade Lake").

Deep Learning Foundations

• Matrix multiplies are the foundation of many DL applications.
• A matrix multiply A × B = C multiplies a row of values by a column of values and accumulates them into a single value; with Intel® DL Boost, A and B hold int8 values and C accumulates int32 values.
• Traditional HPC and many AI training workloads use floating point, whose dynamic range of values is massive (FP32 goes up to ~2^128).
• Why INT8 for inference?
  – More power-efficient per operation, due to smaller multiplies
  – Reduces pressure on the cache and memory subsystem
  – Precision and dynamic range are sufficient for many models
• What's different about INT8?
  – Much smaller dynamic range than FP32: 256 values
  – Requires accumulation into INT32 to avoid overflow (FP handles this "for free" with its large dynamic range)
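To make the accumulation point concrete, here is a minimal scalar sketch (my illustration, not Intel code) of the INT8 matrix-multiply pattern with an INT32 accumulator:

    #include <cstdint>

    // C = A x B with 8-bit inputs. Each row*column dot product is
    // accumulated in 32 bits; an 8- or 16-bit accumulator would
    // overflow after only a few products.
    void matmul_int8(const int8_t* A, const int8_t* B, int32_t* C,
                     int M, int N, int K) {
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n) {
                int32_t acc = 0;
                for (int k = 0; k < K; ++k)
                    acc += int32_t(A[m * K + k]) * int32_t(B[k * N + n]);
                C[m * N + n] = acc;
            }
    }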


Precision for Deep Learning

▪ Precision is a measure of the detail used for numerical values, primarily measured in bits.

• Need: FP32 weights → INT8 weights for inference; FP32 or BF16 for training
• Benefits: reduced model size and energy consumption, better cache usage, avoidance of bandwidth bottlenecks, maximized compute resources, less silicon area & power
• Challenge: possible degradation in predictive performance
• Solution: quantization with minimal loss of information
• Deployment: OpenVINO™ toolkit, or directly in the framework (e.g. TensorFlow)

Instruction Fusion

With VNNI, the three-instruction AVX-512 INT8 multiply-accumulate sequence (vpmaddubsw, vpmaddwd, vpaddd) is fused into a single instruction (vpdpbusd).

Generational Performance Projections on Intel® Xeon® Scalable Processors for Deep Learning Inference on Popular CNNs

Projected INT8 inference speedup over the Skylake FP32 baseline:

Framework     Topology       Skylake (INT8)   Cascade Lake (INT8 + VNNI)
Caffe         ResNet-50      1.5x             1.9x
Caffe         Inception v3   1.6x             1.8x
Caffe         SSD-VGG16      1.3x             2.0x
TensorFlow    ResNet-50      1.3x             1.9x
TensorFlow    Inception v3   1.3x             1.8x

Results have been estimated using internal Intel analysis or architecture simulation or modeling and are provided for informational purposes. Any differences in your system hardware, software, or configuration may affect your actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.

So What?

AVX512_VNNI is a new set of AVX-512 instructions to boost deep learning performance.

▪ VNNI includes FMA instructions for:
– 8-bit multiplies with 32-bit accumulates (u8 × s8 ⇒ s32)
– 16-bit multiplies with 32-bit accumulates (s16 × s16 ⇒ s32)

▪ Theoretical peak compute gains are:
– 4x INT8 OPS over FP32 OPS, with ¼ of the memory requirements
– 2x INT16 OPS over FP32 OPS, with ½ of the memory requirements

▪ Ice Lake and future microarchitectures will also have AVX512_VNNI.
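A small intrinsics sketch of the 8-bit VNNI FMA (my illustration; _mm512_dpbusd_epi32 is the intrinsic for vpdpbusd, available with e.g. -mavx512vnni on gcc/clang):

    #include <immintrin.h>

    // vpdpbusd: multiply unsigned 8-bit elements of a_u8 by signed 8-bit
    // elements of b_s8, then accumulate each group of four adjacent
    // products into the corresponding 32-bit lane of acc (u8 x s8 => s32).
    __m512i dot_accumulate(__m512i acc, __m512i a_u8, __m512i b_s8) {
        return _mm512_dpbusd_epi32(acc, a_u8, b_s8);
    }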

Enabling Intel® DL Boost on Cascade Lake

Theoretical improvements of INT8 with DL Boost over FP32:
• Up to 4x boost in MACs/cycle
• Up to 4x improved performance per watt
• Decreased memory bandwidth
• Improved cache performance

This applies across the stack: workloads, topologies, frameworks, Intel® MKL-DNN libraries, and Intel® processors. Up next: microbenchmarking with Intel® MKL-DNN.
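As a rough check on the 4x figure (back-of-the-envelope arithmetic, assuming a single 512-bit unit): one AVX-512 FP32 FMA performs 16 multiply-accumulates per instruction, while one VNNI vpdpbusd performs 64 INT8 multiply-accumulates (16 lanes × 4 byte pairs), i.e. 4x the MACs per cycle, and INT8 data needs only ¼ of the memory bandwidth of FP32.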


Intel Data-Centric Hardware: High-Performance, Flexible Options

• General-purpose CPU
• Programmable data-parallel accelerator: Intel Processor Graphics & future products
• Domain-optimized accelerator: Intel® Neural Network Processor
• FPGA

General-purpose hardware provides optimal performance over the widest variety of applications, while workload-optimized hardware delivers the highest performance per $/Watt/U/rack for critical workloads.

The Programming Challenge

• Diverse set of data-centric hardware: scalar (CPU), vector (GPU), matrix (AI), and spatial (FPGA) architectures (SVMS)
• No common programming language or APIs
• Inconsistent tool support across platforms
• Each platform requires a unique software investment

Intel's oneAPI Core Concept

oneAPI is a project to deliver a unified programming model that simplifies development across diverse architectures:

• Common developer experience across scalar, vector, matrix, and spatial (SVMS) architectures
• Unified and simplified language and libraries for expressing parallelism
• Uncompromised native high-level-language performance
• Support for CPU, GPU, AI, and FPGA
• Based on industry standards and open specifications

[Diagram: oneAPI-optimized apps on top of oneAPI tools, oneAPI-optimized middleware/frameworks, and the oneAPI language & libraries, targeting CPU, GPU, AI, and FPGA]

oneAPI for Cross-Architecture Performance

The oneAPI product stack, from top to bottom:

• Optimized applications
• Optimized middleware & frameworks
• oneAPI product: direct programming with Data Parallel C++; API-based programming with libraries; porting, analysis & debug tools
• Target hardware: CPU (scalar), GPU (vector), AI (matrix), FPGA (spatial)

Some capabilities may differ per architecture.

Legal Disclaimer & Optimization Notice

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Copyright © 2019, Intel Corporation. All rights reserved. Intel, the Intel logo, Pentium, Xeon, Core, VTune, OpenVINO, and Cilk are trademarks of Intel Corporation or its subsidiaries in the U.S. and other countries.

Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804


Quantization

Reduce precision while keeping model accuracy: FP32 weights during training, 8-bit weights for inference.

• Post-training quantization: train normally and capture FP32 weights, then convert them to low precision before running inference. Collect statistics for the activations to find the quantization factor, calibrate using a subset of the data to improve accuracy, and convert the FP32 weights to INT8.

• Quantization-aware training: the captured FP32 weights are quantized to INT8 at each iteration after the weight updates; "fake" quantization simulates the effect of quantization in the forward and backward passes.
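A minimal sketch (my illustration, not Intel code) of the symmetric post-training scheme: derive a scale factor from the observed maximum magnitude, then map FP32 values onto INT8:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Scale chosen so the largest observed magnitude maps to 127.
    float quant_scale(const std::vector<float>& calibration_values) {
        float max_abs = 0.f;
        for (float v : calibration_values)
            max_abs = std::max(max_abs, std::fabs(v));
        return max_abs > 0.f ? 127.f / max_abs : 1.f;
    }

    int8_t quantize(float x, float scale) {
        float q = std::round(x * scale);                // nearest integer
        return (int8_t)std::clamp(q, -128.f, 127.f);    // saturate to INT8
    }

    float dequantize(int8_t q, float scale) { return q / scale; }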

Data Parallel C++: A Standards-Based, Cross-Architecture Language

A language to deliver uncompromised parallel programming productivity and performance across CPUs and accelerators, allowing code reuse across hardware targets while permitting custom tuning for a specific accelerator.

Based on C++:
• Delivers C++ productivity benefits, using common and familiar C and C++ constructs
• Incorporates SYCL* from the Khronos* Group to support data parallelism and heterogeneous programming
• Implemented as a DPC++ front end on an LLVM runtime

Language enhancements are being driven through a community project:
• Extensions to simplify data-parallel programming
• Open and cooperative development for continued evolution

Builds upon Intel’s years of experience in architecture and compilers
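For flavor, a minimal DPC++/SYCL vector-add sketch (my illustration using the SYCL 2020 API, not from the deck):

    #include <sycl/sycl.hpp>
    #include <vector>

    int main() {
        const size_t n = 1024;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
        sycl::queue q;                                   // default device
        {
            sycl::buffer bufA(a), bufB(b), bufC(c);      // wrap host data
            q.submit([&](sycl::handler& h) {
                sycl::accessor A(bufA, h, sycl::read_only);
                sycl::accessor B(bufB, h, sycl::read_only);
                sycl::accessor C(bufC, h, sycl::write_only);
                h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                    C[i] = A[i] + B[i];                  // one work-item per element
                });
            });
        }   // buffers go out of scope: results copied back to the vectors
        return 0;
    }

The same source can target a CPU, GPU, or FPGA by selecting a different device for the queue.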

Powerful Libraries for API-Based Programming

Data-centric functions: math, threading, the DPC++ library, analytics/ML, DNN, ML communication, video processing, and rendering.

• Key domain-specific functions to accelerate compute-intensive workloads
• Custom-coded for uncompromised performance on SVMS (scalar, vector, matrix, spatial) architectures

Advanced Analysis & Debug Tools

Productive performance analysis across SVMS architectures:

• Intel® VTune™ Profiler: profiler to analyze CPU and accelerator performance of compute, threading, memory, storage, and more
• Intel® Advisor: design assistant providing advice on offload, threading, and vectorization
• Debugger: application debugger for fast code debugging on CPUs and accelerators

Compatibility Tool

Facilitates addressing multiple hardware choices through a modern language like DPC++.
