Intel® AI Workshop 2021

Accelerate Your AI Journey

Laurent Duhem – HPC/AI Solutions Architect ([email protected])
Shailen Sobhee – AI Software Technical Consultant ([email protected])

Notices and Disclaimers

▪ Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration.

▪ No product or component can be absolutely secure.

▪ Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks.

▪ Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks.

▪ Intel® Advanced Vector Extensions (Intel® AVX) provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.

▪ Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

▪ Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

▪ Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

▪ © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

About This Presentation

▪ This presentation is tailored for developers and curious data scientists seeking optimal performance in their daily production work.
▪ No AI theory will be covered during this talk.
▪ Preliminary knowledge of the ML frameworks/packages shown below is certainly a must.

▪ Intel's software offering is vast and addresses multiple hardware flavors (XPUs) – so buckle up for a dense session. ;-)

AI is Interdisciplinary

Data pipeline: Create → Transmit → Ingest → Integrate → Stage → Clean → Normalize
Machine learning: reinforcement learning, regression, classification, clustering, ensemble methods, Bayes methods
More AI: symbolic reasoning, analogical reasoning, evolutionary computing, and more
More analytics: search/query, statistics, etc.
Deep learning: image recognition, object detection, image segmentation, natural language processing (NLP), speech ⇄ text, recommender systems, image generation
Together these turn data into business, operational, and security insights.

Agenda: AI Solutions | Flexible Hardware Acceleration for AI | Software Ecosystem Portfolio


INTELLIGENT SOLUTIONS

Intel AI Builders – Industry-leading ISVs and system integrators; solutions to accelerate adoption of artificial intelligence (AI) across verticals and workloads. builders.intel.com/ai

OEM Systems – Ready-to-deploy, end-to-end solutions jointly designed and developed by Intel & partners to simplify customer experiences. intel.com/ai/deploy-on-intel-architecture.html

Public Cloud Solutions – AI-optimized solutions developed by Intel with cloud service providers (CSPs), including for Amazon Web Services (AWS), Baidu Cloud, Google Cloud Platform, Microsoft Azure & more.

Intel® Select Solutions – AI-optimized solutions for real-world demands that are pre-configured and rigorously benchmark tested to accelerate infrastructure deployment. intel.com/selectsolutions

INTELLIGENT SOLUTIONS: CSP PaaS Offerings – Overview

| | AWS | Azure | GCP |
|---|---|---|---|
| Name | SageMaker | Azure Machine Learning with Brainwave | Google App Engine |
| Type | PaaS | PaaS | PaaS |
| Instance | C5 | Fv2 or HC series | Flexible Environment |
| Description | A fully managed platform to easily build, train and deploy machine learning models at any scale | A fully managed cloud service to easily build, deploy, and share predictive analytics solutions | A fully managed serverless platform to build highly scalable applications |
| OS | N/A | N/A | N/A |
| HW SKUs | C5 instance (Skylake) | Intel® Arria® 10 FPGA | – |
| FW | Pre-configured DAAL4Py | Marketplace approach for optimized frameworks | WIP |
| Use case | Ad targeting, prediction & forecasting, industrial IoT & machine learning | – | Modern web applications and scalable mobile backends |
| CSP value prop | Ease of use; pre-configured environment | – | – |

INTELLIGENT SOLUTIONS: CSP IaaS Offerings – Overview

| | AWS DL AMI | Azure Data Science VMs | Azure CycleCloud | Google Compute Engine |
|---|---|---|---|---|
| Instance | C5 (pre-installed pip packages) or C5 customer-built DL engine (clean slate) | Fv2 or HC series | HC series | Platform based on Skylake |
| Description | Pre-installed or customer-built deep learning AMIs | Azure VM images, pre-installed, configured and tested with several popular AI/DL tools | Easy-to-set-up clusters with Singularity containers | Scalable, high-performance virtual machines |
| HW SKUs | Intel® Xeon® Platinum 8000 series (Skylake) | Various HW platforms | Any HW platform (validated on Skylake) | Intel® Xeon® Platinum family (code-named Skylake) |
| Optimized FW | TensorFlow, MXNet, and PyTorch | TensorFlow and VM templates on Marketplace | TensorFlow | TensorFlow |
| Instance size | 2 to 72 vCPU | Fsv2 series, 2 to 72 vCPU | Any instance size | Up to 160 vCPU |
| Memory | 144 GiB | Up to 144 GiB | – | Up to 3.75 TB |
| Use case | Advanced compute-intensive workloads: high-performance web servers, HPC, batch processing, ad serving, gaming, distributed analytics and ML/DL inference | Batch processing, web servers, analytics and gaming | HPC workloads, but can also run deep learning | Improve and manage patient data, create intuitive customer experiences |
| CSP value prop | Best price performance | Lower per-hour list price; best value in price-performance in Azure | Dynamically provision HPC Azure clusters and orchestrate data and jobs for hybrid and cloud workflows; easily transition from on-prem to cloud | Industry-leading price and performance portfolio; compliance and global reach |

INTELLIGENT SOLUTIONS: CSP IaaS Offerings – Overview

• Amazon Web Services: intel.ai/aws

• Baidu Cloud: intel.ai/baidu

• Google Cloud Platform: intel.ai/gcp

• Microsoft Azure: intel.ai/microsoft


Agenda: AI Solutions | Flexible Hardware Acceleration for AI | Software Ecosystem Portfolio

FLEXIBLE ACCELERATION: Delivering AI from Cloud to Device

Cloud/DC Edge Device

CPU only – for mainstream AI use cases

CPU + GPU – when compute is dominated by AI, HPC, graphics, and/or real-time media

CPU + custom – when compute is dominated by deep learning (DL), e.g. Intel® FPGAs

Targets span DL training/inference, custom DL, and DL inference across cloud/DC, edge, and device.

DC = Data Center DL = Deep Learning

FLEXIBLE ACCELERATION: Intel® Xeon® Scalable Processors

THE ONLY DATA CENTER CPU OPTIMIZED FOR AI
▪ Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
▪ Intel® Deep Learning Boost (Intel® DL Boost)
▪ Intel® Advanced Matrix Extensions (Intel® AMX)

2019 – 2nd Gen Cascade Lake (14nm): new AI acceleration (VNNI), new memory/storage hierarchy
2021 – 3rd Gen Cooper Lake (14nm): Intel DL Boost (bfloat16); 3rd Gen Ice Lake (10nm): shipping 1H'21
2022 – Next (4th) Gen Sapphire Rapids: next-generation technologies, Intel AMX

LEADERSHIP PERFORMANCE

The Evolution of Microprocessor Parallelism: more cores → more threads → wider vectors

| | 64-bit Intel® Xeon® | 5100 series | 5500 series | 5600 series | E5-2600 v2 series | E5-2600 v3/v4 series | Intel® Xeon® Scalable Processor¹ |
|---|---|---|---|---|---|---|---|
| Core(s) | 1 | 2 | 4 | 6 | 12 | 18–22 | Up to 28 |
| Threads | 2 | 2 | 8 | 12 | 24 | 36–44 | Up to 56 |
| SIMD width | 128 | 128 | 128 | 128 | 256 | 256 | 512 |
| Vector ISA | Intel® SSE3 | Intel® SSE3 | Intel® SSE4.1 | Intel® SSE4.2 | Intel® AVX | Intel® AVX2 | Intel® AVX-512 |

Scalar: one A + B = C per instruction. SIMD: many A + B = C element pairs per instruction.

1. Product specification for launched and shipped products available on ark.intel.com.
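The scalar-vs-SIMD contrast above can be felt from Python: a hand-written element loop issues one addition at a time, while NumPy dispatches the same work to vectorized kernels (and, in the Intel® Distribution for Python, to oneMKL-backed routines). A minimal sketch, not a benchmark:

```python
import time
import numpy as np

def scalar_add(a, b):
    # One A + B = C element at a time, as in the "scalar" column above.
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c

n = 100_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter()
c_scalar = scalar_add(a, b)
t1 = time.perf_counter()
c_simd = a + b  # NumPy dispatches to vectorized (SIMD) kernels
t2 = time.perf_counter()

assert np.allclose(c_scalar, c_simd)
print(f"scalar loop: {t1 - t0:.4f}s, vectorized: {t2 - t1:.4f}s")
```

The absolute timings vary by machine; the point is only that the vectorized path does the same arithmetic with far fewer instructions.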

FLEXIBLE ACCELERATION: Intel® Deep Learning Boost Overview

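Intel DL Boost's VNNI instructions fuse int8 multiplies with accumulation into int32 registers, which is why int8-quantized inference runs fast on Cascade Lake and later. The NumPy sketch below mirrors only the arithmetic (quantize, multiply-accumulate in int32, rescale), not the instruction itself; the scales and the `quantize` helper are illustrative assumptions:

```python
import numpy as np

def quantize(x, scale):
    # Symmetric int8 quantization: round to the nearest step, clamp to int8 range.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
act = rng.standard_normal(64).astype(np.float32)  # activations
wgt = rng.standard_normal(64).astype(np.float32)  # weights

s_a, s_w = 0.05, 0.05
qa, qw = quantize(act, s_a), quantize(wgt, s_w)

# VNNI-style dot product: int8 x int8 products accumulated in int32,
# then rescaled back to float. Real VNNI does this across 512-bit
# vectors in a single instruction; this just mirrors the math.
acc = np.dot(qa.astype(np.int32), qw.astype(np.int32))
approx = float(acc) * (s_a * s_w)
exact = float(np.dot(act, wgt))
print(f"fp32 dot: {exact:.3f}, int8 + int32-accumulate dot: {approx:.3f}")
```

The int8 result tracks the fp32 result to within the quantization error, which is the trade DL Boost makes for roughly 4x more multiply-accumulates per vector than fp32.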

FLEXIBLE ACCELERATION: Intel FPGA for AI

Falcon Mesa – first to market to accelerate AI workloads

Flexible: precision, sparsity, and more
Real-time: latency, speech workloads
Evolving AI workloads: recurrent neural networks (RNN), long short-term memory (LSTM), adversarial networks, reinforcement learning, neuromorphic computing, and more
Deploying AI+ for system-level functionality: AI + I/O ingest, AI + networking, AI + security, AI + pre/post processing, and more

Enabling real-time AI in a wide range of embedded, edge, and cloud apps

All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.

FLEXIBLE ACCELERATION: Movidius VPU – built for edge AI, flexible form factors, edge experiences

Deep learning inference + computer vision + media

Faster memory bandwidth

Groundbreaking high-efficiency architecture


19 FLEXIBLE Habana – an Intel Company ACCELERATION

Visit www.habana.ai

AI processor company

All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.

FLEXIBLE ACCELERATION: Accelerating AI – leadership performance for data-level parallel AI workloads

7nm process technology

EMIB (2D) and Foveros (3D) packaging technologies


Agenda: AI Solutions | Flexible Hardware Acceleration for AI | Software Ecosystem Portfolio

OPTIMIZED SOFTWARE: Software as a Differentiator

"For every order of magnitude performance from new hardware, there are >2 orders of magnitude unlocked by software." – Chief Architect, SVP Intel Architecture, Graphics and Software

Intel has¹:
▪ #1 contributor to the Linux kernel, with >1/2 million lines of code modified each year
▪ 1,500 software engineers
▪ >100 operating systems optimized
▪ Top 3 contributor to Chromium OS
▪ 10,000 high-touch customer deployments
▪ Top 10 contributor to OpenStack
▪ >12M developers using Intel® software

Uses of Intel® software: developers using C++, Fortran, and Python; optimizing code for vectorization, parallelization, multithreading, and memory use to fully utilize the CPU.

Source 1: Intel internal numbers

OPTIMIZED SOFTWARE: Programming Challenges for Multiple Architectures

Application workloads need diverse hardware: scalar (CPU), vector (GPU), spatial (FPGA), and matrix (other accelerator) architectures.

Growth in specialized workloads means a variety of data-centric hardware is required. Today, a separate programming model and toolchain is required for each architecture – a CPU programming model, a GPU programming model, an FPGA programming model, and other accelerator programming models – all sitting beneath middleware & frameworks.

Software development complexity limits freedom of architectural choice across XPUs.

OPTIMIZED SOFTWARE: oneAPI – One Programming Model for Multiple Architectures and Vendors

Freedom to make your best choice: choose the best accelerated technology – the software doesn't decide for you.

Realize all the hardware value: performance across CPUs, GPUs, FPGAs, and other accelerators, via both an industry initiative and an Intel product.

Develop & deploy software with peace of mind: open industry standards provide a safe, clear path to the future, compatible with existing languages and programming models including C++, Python, SYCL, OpenMP, Fortran, and MPI – spanning XPUs (CPU, GPU, FPGA, other accelerators).

OPTIMIZED SOFTWARE: Intel's oneAPI Ecosystem

Built on Intel's rich heritage of CPU tools, expanded to XPUs. Middleware & frameworks are powered by oneAPI.

oneAPI: a cross-architecture language based on C++ and SYCL standards, powerful libraries designed for acceleration of domain-specific functions, and a low-level hardware interface across XPUs (CPU, GPU, FPGA, other accelerators).

Intel® oneAPI product: languages, libraries (oneMKL, oneTBB, oneVPL, oneDPL, oneDAL, oneDNN, oneCCL), analysis & debug tools, a compatibility tool, and a complete set of advanced compilers and porting, analysis and debugger tools.

Powered by oneAPI: frameworks and middleware built using one or more of the oneAPI industry specification elements, the DPC++ language, and the libraries listed on oneapi.com.

Available Now

Visit software.intel.com/oneapi for more details. Some capabilities may differ per architecture and custom tuning will still be required. Other accelerators are to be supported in the future.

OPTIMIZED SOFTWARE: Powerful oneAPI Libraries

| Library name | Short name | Description |
|---|---|---|
| oneAPI DPC++ Library | oneDPL | Key algorithms and functions to speed up DPC++ kernel programming |
| oneAPI Math Kernel Library | oneMKL | Math routines including matrix algebra, fast Fourier transforms (FFT), and vector math |
| oneAPI Data Analytics Library | oneDAL | Machine learning and data analytics functions |
| oneAPI Deep Neural Network Library | oneDNN | Neural network functions for deep learning training and inference |
| oneAPI Collective Communications Library | oneCCL | Communication patterns for distributed deep learning |
| oneAPI Threading Building Blocks | oneTBB | Threading and memory management template library |
| oneAPI Video Processing Library | oneVPL | Real-time video decoding, encoding, transcoding, and processing functions |

Designed for acceleration of key domain-specific functions; pre-optimized for each target platform for maximum performance.

OPTIMIZED SOFTWARE: Intel® oneAPI Toolkits – a complete set of proven developer tools expanded from CPU to XPU

Intel® oneAPI Base Toolkit – a core set of high-performance tools for building C++ and Data Parallel C++ applications and oneAPI library-based applications (for native code developers).

Add-on toolkits:
▪ Intel® oneAPI Tools for HPC – deliver fast Fortran, OpenMP & MPI applications that scale
▪ Intel® oneAPI Tools for IoT – build efficient, reliable solutions that run at the network's edge
▪ Intel® oneAPI Rendering Toolkit – create performant, high-fidelity visualization applications

Specialized workloads:
▪ Intel® AI Analytics Toolkit – accelerate machine learning & data science pipelines with optimized DL frameworks & high-performing Python libraries (for data scientists & AI developers)
▪ Intel® Distribution of OpenVINO™ Toolkit – deploy high-performance DL inference applications from edge to cloud (for AI application, media, & vision developers)

Latest version is 2021.1

OPTIMIZED SOFTWARE: Intel® oneAPI Base Toolkit

Accelerate data-centric workloads: a core set of tools and libraries for developing high-performance applications on Intel® CPUs, GPUs, and FPGAs.

Direct programming: Intel® oneAPI DPC++/C++ Compiler, Intel® DPC++ Compatibility Tool, Intel® Distribution for Python, Intel® FPGA Add-on for oneAPI Base Toolkit.

API-based programming: Intel® oneAPI DPC++ Library (oneDPL), Intel® oneAPI Math Kernel Library (oneMKL), Intel® oneAPI Data Analytics Library (oneDAL), Intel® oneAPI Threading Building Blocks (oneTBB), Intel® oneAPI Video Processing Library (oneVPL), Intel® oneAPI Collective Communications Library (oneCCL), Intel® oneAPI Deep Neural Network Library (oneDNN), Intel® Integrated Performance Primitives (Intel® IPP).

Analysis & debug tools: Intel® VTune™ Profiler, Intel® Advisor, Intel® Distribution for GDB.

Who uses it?
▪ A broad range of developers across industries
▪ Add-on toolkit users, since this is the base for all toolkits

Top features/benefits:
▪ Data Parallel C++ compiler, library and analysis tools
▪ DPC++ Compatibility Tool helps migrate existing code written in CUDA
▪ Python distribution includes accelerated scikit-learn, NumPy, and SciPy libraries
▪ Optimized performance libraries for threading, math, data analytics, deep learning, and video/image/signal processing

Learn more: intel.com/oneAPI-BaseKit

OPTIMIZED SOFTWARE: Intel® AI Analytics Toolkit – powered by oneAPI

Accelerate end-to-end AI and data analytics pipelines with libraries optimized for Intel® architectures.

Deep learning: Intel® Optimization for TensorFlow, Intel® Optimization for PyTorch, Model Zoo for Intel® Architecture, Intel® Low Precision Optimization Tool.

Data analytics & machine learning: accelerated data frames (Intel® Distribution of Modin with the OmniSci backend); Intel® Distribution for Python with XGBoost, scikit-learn, daal4py, NumPy, SciPy, pandas, and numba.

Samples and end-to-end workloads.

Who uses it? Data scientists, AI researchers, ML and DL developers, AI application developers.
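The Intel® Distribution of Modin accelerates pandas-style data frames by partitioning work across cores, and the advertised migration is a one-line import swap. A minimal sketch of that idea, run here with stock pandas so it stays self-contained (the swap is shown in the comment):

```python
import pandas as pd
# To accelerate with the Intel® Distribution of Modin, only the import changes:
#   import modin.pandas as pd
# The DataFrame code below stays identical; Modin partitions it across cores.

df = pd.DataFrame({"passengers": [1, 2, 1, 4],
                   "fare": [7.5, 12.0, 7.5, 30.0]})
mean_fare = df.groupby("passengers")["fare"].mean()
print(mean_fare.to_dict())  # → {1: 7.5, 2: 12.0, 4: 30.0}
```

This "same API, different engine" pattern is what makes the accelerated data-frame layer a drop-in for existing pandas pipelines.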

Top features/benefits:
▪ Deep learning performance for training and inference with Intel-optimized DL frameworks and tools
▪ Drop-in acceleration for data analytics and machine learning workflows with compute-intensive Python packages

Supported hardware architectures¹: CPU, GPU. (1. Hardware support varies by individual tool; architecture support will be expanded over time. Other names and brands may be claimed as the property of others.)

Get the toolkit via the Intel installer, Docker, Apt/Yum, Conda, or the Intel® DevCloud.
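The "drop-in acceleration" claim above works by patching scikit-learn so supported estimators run on oneDAL kernels. A hedged sketch: the `patch_sklearn` entry point below is daal4py's documented mechanism, guarded here so the stock path still runs when daal4py is not installed.

```python
# Hedged sketch: daal4py (shipped in the Intel® Distribution for Python)
# can patch scikit-learn so supported estimators use oneDAL kernels.
try:
    from daal4py.sklearn import patch_sklearn
    patch_sklearn()  # subsequent sklearn imports dispatch to oneDAL where supported
except ImportError:
    pass  # stock scikit-learn still works, just without the acceleration

import numpy as np
from sklearn.cluster import KMeans

# Two tight point pairs: KMeans with k=2 should separate them.
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
assert labels[0] == labels[1] and labels[2] == labels[3] and labels[0] != labels[2]
print("clusters:", labels.tolist())
```

Because the patch changes only the backend, the result is numerically equivalent scikit-learn output; the speedup shows on large datasets, not on a toy example like this one.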

Learn more: software.intel.com/oneapi/ai-kit

OPTIMIZED SOFTWARE: Intel® oneAPI Toolkits – Free Availability

Run the tools locally (downloads, repositories, containers) or run the tools in the cloud (Intel® DevCloud).

Get started quickly: code samples, quick-start guides, webinars, training. software.intel.com/oneapi

OPTIMIZED SOFTWARE: AI Software Stack for Intel® XPUs

Intel offers a robust software stack to maximize performance of diverse workloads.

DL/ML tools: Open Model Zoo, Model Zoo for Intel® Architecture, Intel® Low Precision Optimization Tool, and end-to-end workloads (Census, NYTaxi, Mortgage, …), delivered through the Intel® AI Analytics Toolkit and the Intel® Distribution of OpenVINO™ Toolkit.

DL/ML middleware & frameworks: develop DL models in frameworks (TensorFlow, PyTorch); ML & analytics in Python (scikit-learn, pandas, NumPy, numba, XGBoost, Modin, SciPy, daal4py); deploy DL models with the Model Optimizer and Inference Engine.

Libraries & compiler: Intel® oneAPI Base Toolkit – DPC++/DPPY compiler, oneMKL, oneDAL, oneTBB, oneCCL, oneDNN, oneVPL; kernel selection, and writing/customizing kernels.

A full set of AI, ML and DL software solutions delivered with Intel's oneAPI ecosystem.

Intel® oneAPI Available Now on Intel® DevCloud

A development sandbox to develop, test and run your workloads across a range of Intel CPUs, GPUs, and FPGAs:
▪ Use Intel oneAPI toolkits
▪ Learn Data Parallel C++
▪ Evaluate workloads using Intel's oneAPI beta software
▪ Build heterogeneous applications
▪ Prototype your project

software.intel.com/devcloud/oneapi

No downloads | No hardware acquisition | No installation | No set-up & configuration. Get up & running in seconds!

33 Explore Intel oneAPI Toolkits in the DevCloud

OPTIMIZED SOFTWARE: Getting Started with Intel® AI Analytics Toolkit

▪ Overview: visit the Intel® AI Analytics Toolkit (AI Kit) page for more details and up-to-date product information; see the release notes.
▪ Installation: download the AI Kit from Intel, Anaconda, or any of your favorite package managers; get started quickly with the AI Kit Docker container; follow the installation guide and the getting started guide.
▪ Hands-on: code samples; build, test and remotely run workloads on the Intel® DevCloud for free – no software downloads, no configuration steps, no installations.
▪ Learning: machine learning & analytics blogs at Intel Medium; the Intel AI blog site; webinars and articles at Intel® Tech Decoded.
▪ Support: ask questions and share information with others through the Community Forum; discuss with experts at the AI Frameworks Forum.

Download Now

Thank you