Innovate Solutions with the Intel® Distribution of OpenVINO™ Toolkit (Open Visual Inference & Neural Network Optimization)

Intel AI Workshop – CERN – May 8th, 2019 – [email protected]

What it is: A toolkit that accelerates the development of high-performance computer vision and deep learning inference in vision/AI applications, from device to cloud. It enables deep learning on hardware accelerators and easy deployment across multiple types of Intel® platforms.

Key benefits: High performance — perform AI at the edge, from device to cloud. Streamlined & optimized deep learning inference. Heterogeneous, cross-platform flexibility.

Who needs this product?
▪ Computer vision and deep learning developers
▪ Data scientists
▪ OEMs, ISVs, system integrators

Usages: Security surveillance, robotics, retail, healthcare, AI, office automation, transportation, non-vision use cases (speech, text) & more.

Free Download  software.intel.com/openvino-toolkit
Open Source version  01.org/openvinotoolkit
Latest version is 2019 R1.

Benefits of Intel® Distribution of OpenVINO™ toolkit

Maximize the Power of Intel® Processors: CPU, GPU/Intel® Processor Graphics, FPGA, VPU

▪ Accelerate Performance: Unleash CNN-based deep learning inference using a common API, 30+ pre-trained models, & computer vision algorithms. Validated on more than 100 public/custom models.
▪ Integrate Deep Learning: Access Intel computer vision accelerators. Speed code performance. Supports heterogeneous execution.
▪ Speed Development: Reduce time using a library of optimized OpenCV* & OpenVX* functions, & 15+ samples.
▪ Innovate & Customize: Use OpenCL™ kernels/tools to add your own unique code. Customize layers without the overhead of frameworks. Develop once, deploy for current & future Intel-based devices.

What’s Inside Intel® Distribution of OpenVINO™ toolkit

Intel® Deep Learning Deployment Toolkit: Model Optimizer (Convert & Optimize) → IR → Inference Engine (Optimized Inference)
Traditional Computer Vision — optimized libraries & code samples: OpenCV*, OpenVX*, samples, for Intel® CPU & GPU/Intel® Processor Graphics
Open Model Zoo: 30+ pre-trained models, samples

IR = Intermediate Representation file

Tools & Libraries to Increase Media/Video/Graphics Performance: Intel® Media SDK (open source version available), OpenCL™ drivers & runtimes — for GPU/Intel® Processor Graphics

Optimize Intel® FPGA (Linux* only): FPGA RunTime Environment, bitstreams (from Intel® FPGA SDK for OpenCL™)

OS Support: CentOS* 7.4 (64 bit), Ubuntu* 16.04.3 LTS (64 bit), Windows* 10 (64 bit), Yocto Project* version Poky Jethro v2.0.3 (64 bit), macOS* 10.13 & 10.14 (64 bit)

Intel® Vision Accelerator Design Products & Platforms Support | Intel® Architecture-Based AI in Production/Developer Kits

An open source version is available at 01.org/openvinotoolkit (some deep learning functions support Intel CPU/GPU only).

Intel® Deep Learning Deployment Toolkit — For Deep Learning Inference

Model Optimizer
▪ What it is: A Python*-based tool to import trained models and convert them to Intermediate Representation.
▪ Why important: Optimizes for performance/space with conservative topology transformations; the biggest boost comes from converting to data types that match the hardware.

Inference Engine
▪ What it is: High-level inference API.
▪ Why important: The interface is implemented as dynamically loaded plugins for each hardware type. Delivers the best performance for each type without requiring users to implement and maintain multiple code pathways.

Flow: Trained models (TensorFlow*, MxNet*, ONNX* (PyTorch, Caffe2 & more), Kaldi*) → Model Optimizer (Convert & Optimize) → IR → Inference Engine (Load, infer — Common API, C++/Python, optimized cross-platform inference) → device plugins: CPU plugin (C++), GPU plugin (OpenCL™), FPGA plugin (OpenCL™), MYD (Myriad) plugin — each with extensibility hooks.

IR = Intermediate Representation format

GPU = Intel CPU with integrated graphics/Intel® Processor Graphics

Improve Performance with Model Optimizer

Trained Model → Model Optimizer (Analyze | Quantize | Optimize topology | Convert) → Intermediate Representation (IR) file

▪ Easy-to-use, Python*-based workflow.
▪ Import models from many supported frameworks: Caffe*, TensorFlow*, MXNet*, Kaldi*, and exchange formats like ONNX* (PyTorch*, Caffe2*, and others through ONNX).
▪ 100+ models validated for Caffe, MXNet, and TensorFlow. Supports all public models in the ONNX* model zoo.
▪ Extends inference to non-vision networks with support for LSTM- and 3D-convolution-based networks and the Kaldi framework/Kaldi Nnet2*.

A typical conversion command is shown below.
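For example, converting a frozen TensorFlow* graph to an FP16 IR might look like the following; the model name, input shape, and output directory are hypothetical placeholders, while the flags are standard Model Optimizer options:

```
python3 mo.py --input_model frozen_model.pb \
              --input_shape [1,224,224,3] \
              --data_type FP16 \
              --output_dir ./ir_models
```

This produces the .xml/.bin IR pair that the Inference Engine consumes.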

Optimal Model Performance Using the Inference Engine

▪ Simple & unified API for inference across all Intel® architecture
▪ Optimized inference on large IA hardware targets (CPU/GEN/FPGA/MYD)
▪ Heterogeneity support allows execution of layers across hardware types
▪ Asynchronous execution improves performance (see the sketch below)
▪ Futureproof/scale your development for future Intel® processors

(Architecture diagram: Applications/Services call the Inference Engine Common API, which dispatches through the Inference Engine runtime to plugins — the Intel® Math Kernel Library (MKL-DNN) plugin using CPU intrinsics for Intel® Xeon®/Core™/Atom®, the clDNN plugin using OpenCL™ for Intel® Processor Graphics (GPU), the FPGA plugin using DLA for Intel® FPGA, and the Movidius plugin using the Movidius API for Intel® Movidius™ Myriad™.)

Transform Models & Data into Results & Intelligence
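To make the heterogeneity and asynchronous-execution points concrete, here is a minimal sketch using the 2019-era Inference Engine C++ API; the IR file names are hypothetical placeholders, and the device string can equally be "CPU", "GPU", "MYRIAD", "FPGA", or a fallback list such as "HETERO:FPGA,CPU":

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core ie;               // one Core serves all devices
    InferenceEngine::CNNNetReader reader;
    reader.ReadNetwork("model.xml");        // hypothetical IR topology
    reader.ReadWeights("model.bin");        // hypothetical IR weights

    // Same code path for every target; the plugin is loaded behind the scenes.
    auto exec = ie.LoadNetwork(reader.getNetwork(), "HETERO:FPGA,CPU");
    auto request = exec.CreateInferRequest();

    request.StartAsync();                   // returns immediately
    // ... overlap other work here, e.g. decoding the next frame ...
    request.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
    return 0;
}
```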

GPU = Intel CPU with integrated graphics/Intel® Processor Graphics/GEN

Computer Vision Application Pipeline: Model Training → Prepare/Optimize → Inference → Plugins → Extend

▪ Model Training: Train a DL model, or use the Model Downloader.
▪ Prepare/Optimize: Model Optimizer — convert & optimize the trained model, preparing it for inference.
▪ Inference: Inference Engine — a lightweight, device-agnostic API (C++/Python) to use in your application, with generic optimizations.
▪ Plugins: Inference Engine supports multiple devices for heterogeneous flows, with device-level optimization.
▪ Extend: Inference Engine supports extensibility, allowing custom kernels for various devices.

Flow: Trained Model → Model Optimizer → IR → Inference Engine → User Application, dispatching to device plugins:
▪ MKL-DNN → CPU (Xeon/Core/Atom) — C++ extensibility
▪ cl-DNN → GPU — OpenCL extensibility
▪ DLA → FPGA — OpenCL/TBD extensibility
▪ Movidius → Myriad 2/X — TBD extensibility

Inference Engine – Workflow

Offline, one-time: run the Model Optimizer on the trained model to produce the IR.

Integrate the IE (Inference Engine) API into your application:
1. Create an IE instance.
2. Load the IE plug-in: MKL-DNN plug-in for CPU, clDNN for GPU, DLA for FPGA, Myriad for Movidius.
3. Load the IR model into the plug-in.
4. Set the target device (CPU/GPU/FPGA/MYRIAD™).
5. Allocate input and output blobs.
6. At runtime, for each frame: read data into the input blob (decode, resize, color conversion, etc.), run inference, then process the output blobs. Repeat for each frame.

A minimal C++ sketch of this workflow is shown below.
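The sketch below uses the 2019-era Inference Engine C++ API; the model file names and the frame source are hypothetical placeholders, and error handling is omitted for brevity:

```cpp
#include <inference_engine.hpp>
#include <string>

int main() {
    using namespace InferenceEngine;

    // 1-2. Create the IE instance; Core loads the right plug-in per device name.
    Core ie;

    // 3. Load the IR produced offline by the Model Optimizer (paths hypothetical).
    CNNNetReader reader;
    reader.ReadNetwork("model.xml");
    reader.ReadWeights("model.bin");
    CNNNetwork network = reader.getNetwork();

    // 4. Set the target device: "CPU", "GPU", "FPGA", or "MYRIAD".
    ExecutableNetwork exec = ie.LoadNetwork(network, "CPU");
    InferRequest request = exec.CreateInferRequest();

    // 5. Input/output blobs are allocated with the request; look them up by name.
    std::string inputName  = network.getInputsInfo().begin()->first;
    std::string outputName = network.getOutputsInfo().begin()->first;

    // 6. Per frame: fill the input blob after decode/resize/color conversion...
    Blob::Ptr input = request.GetBlob(inputName);
    float* in = input->buffer().as<float*>();
    // ... copy the preprocessed frame into `in` here ...
    (void)in;

    request.Infer();                       // synchronous inference

    // ...then process the output blob according to the network's output layout.
    Blob::Ptr output = request.GetBlob(outputName);
    float* out = output->buffer().as<float*>();
    (void)out;
    return 0;
}
```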

Enabling Multiple Accelerators with OpenVINO

(Slide diagram: a benchmark application (C++/Python) calls the Inference Engine Common API (C++/Python), which dispatches to the CPU, GPU, Myriad, and FPGA plugins.)

The plug-in selection code, cleaned up and made self-contained (the opening #ifdef split and the CpuExtensions type follow the stock 2018-era samples):

```cpp
#include <inference_engine.hpp>
#include <ext_list.hpp>   // CPU extension layers shipped with the samples
#include <iostream>
#include <memory>
#include <string>

#ifdef _WIN32             // Windows vs. Linux plug-in library names
#define MKLDNN  "MKLDNNPlugin.dll"
#define CLDNN   "clDNNPlugin.dll"
#define HDDLDNN "HDDLPlugin.dll"
#define MYXDNN  "myriadPlugin.dll"
#define FPGADNN "dliaPlugin.dll"
#else
#define MKLDNN  "libMKLDNNPlugin.so"
#define CLDNN   "libclDNNPlugin.so"
#define HDDLDNN "libHDDLPlugin.so"
#define MYXDNN  "libmyriadPlugin.so"
#define FPGADNN "libdliaPlugin.so"
#endif

int main(int argc, char* argv[]) {
    std::string dev = argc > 1 ? argv[1] : "cpu";   // target device name
    InferenceEngine::InferenceEnginePluginPtr plugin;

    if (dev == "cpu") {
        plugin = InferenceEngine::InferenceEnginePluginPtr(MKLDNN);
        InferenceEngine::InferencePlugin CPUplugin(plugin);
        // Register extra CPU layer implementations from the samples library.
        CPUplugin.AddExtension(
            std::make_shared<InferenceEngine::Extensions::Cpu::CpuExtensions>());
    } else if (dev == "gpu") {
        plugin = InferenceEngine::InferenceEnginePluginPtr(CLDNN);
    } else if (dev == "myx") {
        plugin = InferenceEngine::InferenceEnginePluginPtr(MYXDNN);
    } else if (dev == "fpga") {
        plugin = InferenceEngine::InferenceEnginePluginPtr(FPGADNN);
    } else {
        std::cout << "Unrecognized device : " << dev << std::endl;
        std::cout << "This is very unlikely to end well." << std::endl;
    }
    return 0;
}
```

Speed Deployment with Pre-trained Models & Samples

Pretrained Models in Intel® Distribution of OpenVINO™ toolkit
▪ Age & Gender
▪ Face Detection – standard & enhanced
▪ Head Position
▪ Human Detection – eye-level & high-angle detection
▪ Detect People, Vehicles & Bikes
▪ License Plate Detection: small & front facing
▪ Vehicle Metadata
▪ Human Pose Estimation
▪ Action Recognition – encoder & decoder (3 models)
▪ Text Detection & Recognition
▪ Vehicle Detection
▪ Retail Environment
▪ Pedestrian Detection
▪ Pedestrian & Vehicle Detection
▪ Person Attributes Recognition Crossroad
▪ Emotion Recognition
▪ Identify Someone from Different Videos – standard & enhanced
▪ Facial Landmarks
▪ Gaze Estimation
▪ Identify Roadside Objects
▪ Advanced Roadside Identification
▪ Person Detection & Action Recognition
▪ Person Re-identification – ultra small/ultra fast
▪ Face Re-identification
▪ Landmarks Regression
▪ Smart Classroom Use Cases
▪ Single Image Super Resolution
▪ Instance Segmentation
▪ and more…

Binary Models

▪ Face Detection Binary ▪ Vehicle Detection Binary ▪ ResNet50 Binary ▪ Pedestrian Detection Binary
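All of these ship as IR files with the toolkit and can also be fetched with the Open Model Zoo model downloader; a typical invocation (the output directory is a hypothetical placeholder):

```
python3 downloader.py --name face-detection-adas-0001 -o ./models
```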

Save Time with Deep Learning Samples

Use Model Optimizer & Inference Engine for public models & Intel pretrained models

▪ Object Detection
▪ Standard & Pipelined Image Classification
▪ Security Barrier
▪ Object Detection SSD
▪ Neural Style Transfer
▪ Object Detection for Single Shot Multibox Detector using Async API
▪ Hello Infer Classification
▪ Interactive Face Detection
▪ Image Segmentation
▪ Validation Application
▪ Multi-channel Face Detection

OpenVINO™ Toolkit Open Source Version

▪ Provides flexibility and availability to the developer community to extend OpenVINO™ toolkit for custom needs

▪ Components that are open sourced

‒ Deep Learning Deployment Toolkit with CPU, GPU & Heterogeneous plugins github.com/opencv/dldt

‒ Open Model Zoo – includes pre-trained models, model downloader, demos and samples: github.com/opencv/open_model_zoo

▪ See the FAQ and the next slide for key differences between the open source and Intel distributions

Learn More  01.org/openvinotoolkit

Quick Guide: What’s Inside the Intel Distribution vs the Open Source Version of OpenVINO™ Toolkit

Tool/Component | Intel® Distribution | Open Source | Open Source Directory (https://github.com)
Installer (including necessary drivers) | ✓ | – | –
Intel® Deep Learning Deployment Toolkit: | | |
— Model Optimizer | ✓ | ✓ | /opencv/dldt/tree/2018/model-optimizer
— Inference Engine | ✓ | ✓ | /opencv/dldt/tree/2018/inference-engine
— Intel CPU plug-in | ✓ (Intel® MKL only¹) | ✓ (Intel® BLAS, Intel® MKL¹, jit) | /opencv/dldt/tree/2018/inference-engine
— Intel GPU (Intel® Processor Graphics) plug-in | ✓ | ✓ | /opencv/dldt/tree/2018/inference-engine
— Heterogeneous plug-in | ✓ | ✓ | /opencv/dldt/tree/2018/inference-engine
— Intel GNA plug-in | ✓ | – | –
— Intel® FPGA plug-in | ✓ | – | –
— Intel® Neural Compute Stick (1 & 2) VPU plug-in | ✓ | – | –
— Intel® Vision Accelerator based on Movidius plug-in | ✓ | – | –
30+ Pretrained Models – incl. Model Zoo (IR models that run in IE + open-sourced models) | ✓ | ✓ | /opencv/open_model_zoo
Samples (APIs) | ✓ | ✓ | /opencv/dldt/tree/2018/inference-engine
Demos | ✓ | ✓ | /opencv/open_model_zoo
Traditional Computer Vision – OpenCV* | ✓ | ✓ | /opencv/opencv
OpenVX* (with samples) | ✓ | – | –
Intel® Media SDK | ✓ | ✓² | /Intel-Media-SDK/MediaSDK
OpenCL™ Drivers & Runtimes | ✓ | ✓² | /intel/compute-runtime
FPGA RunTime Environment, Deep Learning Acceleration & Bitstreams (Linux* only) | ✓ | – | –

¹ Intel MKL is not open source but does provide the best performance.
² Refer to the readme file for validated versions.

Media SDK & HW Accelerators

End-to-End Vision Workflow

Pipeline: Video input → Decode (Intel Media SDK, optimized codecs) → Pre-Processing (OpenCV) → Inference (Intel® Deep Learning Deployment Toolkit, OpenVINO™ toolkit) → Post-Processing (OpenCV) → Encode (Intel Media SDK, optimized codecs) → Video output with results annotated.

Each stage runs on CPU or GPU; the inference stage can additionally target FPGA or VPU.

Intel® Media SDK for Linux Overview
Included in the OpenVINO™ installation; also available as a standalone tool (free download).

What it is: An API to access Intel® Quick Sync Video — hardware-accelerated encode/decode & processing.

Hardware support: Select Intel® Xeon®, Celeron®, Pentium®, and Intel Atom® processors that support Intel® Quick Sync Video.

Optimized industry-standard video codecs:
▪ H.265 (HEVC), H.264 (AVC), MJPEG
▪ MPEG-2, VP9, VP8, VC1 & more

Video pre- & post-processing:
▪ Resize, Scale, Deinterlace
▪ Color Conversion, Composition, Alpha Blending
▪ Denoise, Sharpen & more

Benefits: Boost media and video application performance with hardware-accelerated codecs & programmable graphics on Intel® processors. Improve video quality, innovate cloud graphics & media analytics. Reduce infrastructure & development costs.

Use cases: Media creation & delivery for embedded applications. Deliver fast, high-quality video decoding/encoding/transcoding from camera to cloud.
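For orientation, here is a minimal sketch of opening a Media SDK session with its core C API; MFX_IMPL_HARDWARE_ANY requests any available hardware implementation (illustrative only, not a full decode pipeline):

```cpp
#include <mfxvideo.h>   // Intel Media SDK core API
#include <cstdio>

int main() {
    mfxVersion ver = {{0, 1}};          // request API 1.0 or newer
    mfxSession session;

    // Open a session on any available hardware implementation; the
    // decode/encode/VPP components are then initialized on this session.
    mfxStatus sts = MFXInit(MFX_IMPL_HARDWARE_ANY, &ver, &session);
    if (sts != MFX_ERR_NONE) {
        std::printf("MFXInit failed: %d\n", (int)sts);
        return 1;
    }

    MFXQueryVersion(session, &ver);     // report the actual API version
    std::printf("Media SDK API %d.%d\n", ver.Major, ver.Minor);

    MFXClose(session);
    return 0;
}
```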

Intel Vision supports AI across endpoint, edge & cloud: typical devices by application

ENDPOINT: IoT Sensors | EDGE: Gateways & NVRs | DATA CENTER: Servers & Appliances

(Device-spectrum diagram.) Endpoint devices handle basic inference with best efficiency, lowest power, mid/small memory footprints, and low-latency, privacy-sensitive use cases. Edge gateways and NVRs add Intel® Vision Accelerators for video storage, media & vision analytics, DL inference servers and high-end NVRs, with large/mid memory and flexible or custom/new hardware. Data-center servers host NNP-L/NNP-I for the most intensive, memory-architecture- and bandwidth-bound use cases.

Choosing the “right” hardware

• Consider for each device:
  • Compute efficiency
  • Compute parallelism (# of EUs/cores)
  • Power consumption
  • Memory hierarchy, size, communication
  • Programming model, APIs
• Trade-offs:
  • Power/performance
  • Price
  • Software flexibility, portability

(Chart: power efficiency vs. computation flexibility — vision processing efficiency rises roughly from CPU (~1×) through GPU and FPGA to vision DSPs and dedicated hardware (~100×), while flexibility falls in the same order.)
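When weighing these trade-offs on a concrete machine, it can help to list the devices the OpenVINO runtime can actually reach; a sketch using the Inference Engine Core API (assuming a release that provides GetAvailableDevices):

```cpp
#include <inference_engine.hpp>
#include <iostream>
#include <string>

int main() {
    InferenceEngine::Core ie;
    // Prints the device names the runtime can dispatch to, e.g. CPU, GPU, MYRIAD.
    for (const std::string& device : ie.GetAvailableDevices())
        std::cout << device << std::endl;
    return 0;
}
```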

Intel® Vision Products Comparison

HOST IA PLATFORMS: APPLICATION PROCESSING, MEDIA, “FREE” CV/DL
Use the Intel® Media SDK to achieve en/de/trans-code performance. Maximize CV/DL performance on the host platform with the Open Visual Inference & Neural Network Optimization (OpenVINO™) toolkit.

INTEL® MOVIDIUS™ VPUS
Overview: Intel Movidius VPUs offer high performance per watt per dollar. Easily add AI-based visual intelligence by plugging in one or more cards.
Value prop: Intel Movidius VPUs enable deep neural network inferencing workloads with high compute efficiency, under low power and form-factor constraints (e.g., cameras), and excellent performance/W/$, for well-defined workloads.
Key use cases — Intel Movidius VPUs work well with networks that have:
▪ A small memory footprint (less than 250 MParameters)
▪ Lower performance requirements (<3 GMACs)
▪ Accelerator power budget: 2-25 W
▪ # of streams: 3-15

INTEL® FPGAS
Overview: Intel FPGAs offer exceptional performance, flexibility, and scalability for NVRs, edge deep learning inference appliances, and on-premise servers or cloud.
Value prop: Intel FPGAs achieve the TOPS performance required on a single chip and support compute-intensive networks (VGG*, ResNet* 101).
Key use cases — the Intel Arria 10 FPGAs work well with networks that have:
▪ Larger memory footprint (more than 250 MParameters)
▪ Larger performance requirements (>3 GMACs)
▪ Accelerator power budget: <50 W

Examples of Intel® Vision Accelerator Products

Intel® Vision Accelerator Designs with Intel® Movidius™ VPU and with Intel® Arria® 10 FPGA — example cards:

Vision accelerator design | Interface
1× Movidius MA2485 VPU | M.2, Key E
2× Movidius MA2485 VPUs | miniPCIe*
8× Movidius MA2485 VPUs | PCIe x4
Intel® Arria® 10 FPGA 1150GX | PCIe x8
(plus future designs)

Currently manufactured by*: (partner logos in the original slide)

Software tools: Intel® Distribution of OpenVINO™ toolkit — develop the NN model; deploy across Intel® CPU, GPU, VPU, FPGA; leverage common algorithms.

Deep Learning Inference Engine Decision Tree

1. Start with the workload on a standard CPU. If a faster CPU would provide the required performance, accuracy, and allotted CPU utilization, develop on a higher-performance CPU alone.
2. If the pre-deep-learning workload leaves headroom (<50% CPU with the OpenVINO™ toolkit) and an integrated GPU would meet performance, develop on the integrated GPU.
3. Otherwise, add an accelerator: decide between Intel® Vision Accelerator Designs with Intel® Movidius™ VPU (R/L series) and Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA based on network characteristics — streams, TOPS required per stream/workload, power budget, memory size/footprint, image/input size, batch size, and network precision (“my network is most like…”).

Intel Movidius VPU (R/L) is best for:
▪ Batch size = 1
▪ Network memory size < 250 MParams
▪ Very small networks with few weights; operations with lots of sorting & matching
▪ Smaller image size (<720p)
▪ Precision: FP16
▪ Accelerator power budget: 2-25 W
▪ Longevity of supply: 7 yr
▪ Networks like GoogLeNet* v1/v2, TinyYolo* v1/v2, ResNet*-18, etc. (develop on designs with <2 VPUs, or with 8 VPUs for higher throughput)

Intel Arria 10 FPGA is best for:
▪ Batch size > 1
▪ Network memory size < 2 BParams
▪ Image size up to FHD/4K
▪ Precision: FP16/11/9
▪ Accelerator power budget: <50 W
▪ Longevity of supply: 15 yr
▪ Short system latency

Intel® Neural Compute Stick 2: Accelerate Deep Learning Development for Edge Devices

Computer vision and artificial intelligence are transforming IoT devices at the network edge.

Typical edge devices and workloads:
▪ Drones – navigation, sense & avoid, 3D volumetric mapping, GPS-denied hovering, multi-modal sensing, pixel labeling, video/image capture
▪ Surveillance systems – detection, tracking, identification, classification, recognition, multi-nodal systems, multi-modal sensing, video/image/session capture
▪ Service robots – detection, tracking, video/image capture
▪ Wearables / AR-VR HMDs – position & mapping, gaze/eye tracking, gesture tracking & recognition, see-through camera, multi-modal sensing, video/image capture
▪ Home – detection, tracking, perimeter & presence monitoring, recognition & classification, multi-nodal systems, video/image capture

Introducing Intel® Neural Compute Stick 2: A Plug-and-Play Deep Learning Development Kit

The Intel® Movidius™ Myriad™ X VPU delivers industry-leading performance: up to 8ק higher performance on deep neural networks compared to the Intel® Movidius™ Neural Compute Stick.

Powered by the Intel® Distribution of OpenVINO™ toolkit, which accelerates solution development and streamlines deployment.

Intel® Distribution of OpenVINO™ Toolkit Supported Networks — View Documentation: https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_MYRIAD.html

Supported networks include (per framework; see the documentation for the authoritative lists):
▪ AlexNet and CaffeNet
▪ GoogleNet (Inception) v1, v2, v3, v4
▪ Inception ResNet v2
▪ VGG family (VGG16, VGG19)
▪ SqueezeNet v1.0, v1.1
▪ MobileNet v1, v2 (mobilenet-v1-1.0-224, mobilenet-v2)
▪ ResNet v1 family (18** ***, 50, 101, 152)
▪ ResNet v2 family (50, 101, 152)
▪ NiN
▪ DenseNet family** (121, 161, 169, 201)
▪ Yolo family (yolo-v2, yolo-v3, tiny-yolo-v1, tiny-yolo-v2, tiny-yolo-v3)
▪ SSD-300, SSD-512, SSD-MobileNet, SSD-GoogleNet, SSD-SqueezeNet, SSD-Inception-v3, SSD-ResNet-50, ssd_mobilenet_v1
▪ faster_rcnn_inception_v2, faster_rcnn_resnet101
▪ DeepLab-v3+

NOTE: Not an exhaustive list – only includes popular networks.
** Network is tested on Intel® Movidius™ Neural Compute Stick with BatchNormalization fusion optimization disabled during Model Optimizer import.
*** Network is tested on Intel® Neural Compute Stick 2 with BatchNormalization fusion optimization disabled during Model Optimizer import.
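Targeting the Neural Compute Stick from code is only a device-name change in the common API; a minimal sketch (IR paths are hypothetical):

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core ie;
    InferenceEngine::CNNNetReader reader;
    reader.ReadNetwork("mobilenet-v2.xml");   // hypothetical FP16 IR
    reader.ReadWeights("mobilenet-v2.bin");

    // "MYRIAD" selects the Movidius plug-in used by both Neural Compute Sticks.
    auto exec = ie.LoadNetwork(reader.getNetwork(), "MYRIAD");
    exec.CreateInferRequest().Infer();
    return 0;
}
```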

Intel® Distribution of OpenVINO™ Toolkit Supported Layers — View Documentation: https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_Supported_Devices.html

Activation-Clamp, Activation-ELU, Activation-Leaky ReLU, Activation-PReLU, Activation-ReLU, Activation-ReLU6, Activation-Sigmoid/Logistic, Activation-TanH, ArgMax, BatchNormalization, Concat, Const, Convolution-Dilated, Convolution-Grouped, Convolution-Ordinary, Crop, CTCGreedyDecoder*, Deconvolution, DetectionOutput*, Eltwise-Max, Eltwise-Mul, Eltwise-Sum, Flatten, FullyConnected (Inner Product), GRN, Interp, LRN (Norm), MVN*, Normalize*, Pad*, Permute, Pooling(AVG,MAX)*, Power, PriorBox, PriorBoxClustered, Proposal, PSROIPooling, RegionYolo, ReorgYolo, Resample, Reshape, ROIPooling, ScaleShift*, Slice, SoftMax, Split, Tile

* Support is limited to specific parameters; refer to the “Known Layers Limitations” section for the device in question.

Expedite Development and Deployment: Pre-trained Models

Description | Pre-trained Model | Supported Samples
Face detection for driver monitoring | face-detection-adas-0001 | Interactive face detection
Age and gender recognition | age-gender-recognition-retail-0013 | Interactive face detection
Emotion recognition for retail | emotions-recognition-retail-0003 | Interactive face detection
License plate detector | vehicle-license-plate-detection-barrier-0106 | Security barrier camera
Vehicle attributes recognition | vehicle-attributes-recognition-barrier-0039 | Security barrier camera
License plate recognition | license-plate-recognition-barrier-0001 | Security barrier camera
Person, vehicle and bike detection | person-vehicle-bike-detection-crossroad-0078 | Crossroad camera
Person re-identification | person-reidentification-retail-0076 | Crossroad camera
Person re-identification | person-reidentification-retail-0031 | Crossroad camera, Pedestrian tracker
Person re-identification | person-reidentification-retail-0079 | Crossroad camera
Person detection | person-detection-retail-0013 | Any SSD-based sample
Face detection for retail | face-detection-retail-0004 | Any SSD-based sample
Face and person detection for retail | face-person-detection-retail-0002 | Any SSD-based sample
Vehicle detection | vehicle-detection-adas-0002 | Any SSD-based sample
Landmarks regression for retail | landmarks-regression-retail-0009 | Smart classroom
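As a quick usage sketch, each IR pairs with its matching demo; for example, running the interactive face detection demo against a webcam (paths are hypothetical; -i/-m/-d are the demo’s standard flags):

```
./interactive_face_detection_demo -i cam -m face-detection-adas-0001.xml -d CPU
```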

View Documentation  http://docs.openvinotoolkit.org/latest/_docs_Pre_Trained_Models.html

Intel FPGAs for AI

How Intel® FPGAs Enable Deep Learning

▪ Millions of reconfigurable logic elements & routing → programmable datapath fabric
▪ Thousands of 20 Kb memory blocks & MLABs → customized memory structure
▪ Thousands of variable-precision digital signal processing (DSP) blocks → configurable compute
▪ Hundreds of configurable I/O & high-speed transceivers


Optimization Notice Copyright © 2019, Intel Corporation. All rights reserved. 35 *Other names and brands may be claimed as the property of others. Adapting to innovation Many efforts to improve efficiency ▪ Batching ▪ Reduce bit width ▪ Sparse weights ▪ Sparse activations ▪ Weight sharing ▪ Compact network

(Figure: weight sharing — an input matrix I multiplied by weights W yields output O, with W drawing its entries from a small set of shared weight values.)


Intel® FPGA Deep Learning Acceleration Suite Features

▪ CNN acceleration engine for common topologies, executed in a graph-loop architecture — AlexNet, GoogleNet, SqueezeNet, VGG16, ResNet, Yolo, SSD…
▪ Software deployment: no FPGA compile required; run-time reconfigurable
▪ Customized hardware development: custom architecture creation with parameters; custom primitives using the OpenCL™ flow

(Block diagram: DDR ↔ memory reader/writer ↔ feature map cache ↔ convolution PE array ↔ crossbar with primitives (prim/custom) ↔ config engine.)

FPGA Usage with OpenVINO™ toolkit

▪ Supports common software frameworks (Caffe, TensorFlow, etc.)
▪ Model Optimizer enhances the model for improved execution, storage, and transmission (FP quantize/model compress, model analysis)
▪ Inference Engine optimizes inference execution across Intel® hardware solutions using a unified deployment API
▪ Intel FPGA DLA Suite provides turn-key or customized CNN acceleration for common topologies

Flow: Trained model (Caffe, TensorFlow, etc.) → Model Optimizer (OpenVINO™ toolkit / Intel® Deep Learning Deployment Toolkit) → Intermediate Representation → Inference Engine.

The Inference Engine dispatches to the MKL-DNN engine and the DLA runtime engine for heterogeneous CPU/FPGA deployment. Pre-compiled graph architectures (FPGA bitstreams) ship with the suite — GoogLeNet-optimized, ResNet-optimized, SqueezeNet*-optimized, and VGG*-optimized templates, plus additional generic CNN templates — and hardware customization is supported.
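Operationally, using the FPGA from the toolkit is a two-step affair; a hedged sketch based on the 2019-era FPGA plugin flow (the bitstream name and sample model are illustrative placeholders):

```
# Program a pre-compiled DLA bitstream onto the board (name varies by release/board)
aocl program acl0 <bitstream>.aocx

# Tell the FPGA plugin which bitstream is loaded, then run with CPU fallback
export DLA_AOCX=<bitstream>.aocx
./classification_sample -m resnet-50.xml -i image.jpg -d HETERO:FPGA,CPU
```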

Support for Different Topologies

Tradeoff between features and performance

(Diagram: two DLA configurations compared. A performance-oriented build pairs the feature map cache, memory reader/writer, convolution PE array, and config engine with a crossbar carrying only ReLU, Norm, and MaxPool primitives. A feature-rich build adds SoftMax, LRN, Concat, Flatten, Reshape, and Permute to the crossbar, trading some performance for broader layer support.)

DLA Architecture: Built for Performance

▪ Maximize parallelism on the FPGA
  – Filter parallelism (processing elements)
  – Input-depth parallelism
  – Batching
  – Feature stream buffer
  – Filter cache
▪ Choosing an FPGA bitstream
  – Data type / design exploration
  – Primitive support

(Block diagram: DDR ↔ memory reader/writer ↔ feature map cache/stream buffer → convolution/fully-connected PE array → ReLU → Norm → MaxPool via crossbar, with config engine.)

Mapping Graphs in DLA

AlexNet graph example (shown as an animation across several slides in the original deck): each layer of the network — Conv, ReLU, Norm, MaxPool, Fully Connected — is mapped in turn onto the DLA pipeline, Stream Buffer → Convolution/Fully Connected → ReLU → Norm → MaxPool, with activations looping between the stream buffer and the processing blocks as input becomes output for the next layer. Because the blocks are run-time reconfigurable and bypassable, stages a given layer does not need (e.g., Norm or MaxPool) are skipped as execution proceeds through the graph.

Demos with OpenVINO

Application | Supported samples
Face detection ADAS | Interactive face detection
Age/gender recognition Retail | Interactive face detection
Head pose estimation ADAS | Interactive face detection
Emotion recognition Retail | Interactive face detection
Vehicle & license plate detection | Security barrier camera
Vehicle attribute recognition | Security barrier camera
License plate recognition | Security barrier camera
Person, vehicle, bike detection | Object detection
Landmarks regression | Smart classroom
Person re-identification | Crossroad camera
Person re-identification | Crossroad camera, Pedestrian tracker
Person re-identification Retail | Crossroad camera
Person detection Retail | SSD based
Face detection Retail | SSD based
Face person detection Retail | SSD based
Pedestrian detection ADAS | SSD based
Vehicle detection ADAS | SSD based
Person and vehicle detector ADAS | SSD based

https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models

FPGA Performance Evolves Over Time

Intel® Arria® 10 FPGA — same hardware, better bitstreams:

Topology (inferences/sec) | CVSDK Mar.’18¹ | R2² | R5³ | 2019 R1 | YoY | 2019 R2*
ResNet-50 | 31.7 | 76.2 | 144 | 314 | 9.9x | 410
ResNet-101 | 27 | 49.4 | 100 | 175 | 6.5x | 200

Intel® Vision Accelerator Design With Intel® Arria® 10 FPGA 2019 R1

OpenVINO demo – Multiple Channel Face Detection

CPU only mode - 4 channels - 19fps @channel - 90% CPU used

System configuration — CPU: i7-6820EQ @ 2.80 GHz, 4 physical cores; iGPU: HD 530 (Gen 9), 24 execution units @ 350 MHz; FPGA card: Mustang-F100, Arria® 10 GX1150 FPGA, PCIe Gen3 x8, 8 GB on-board DDR4.

OpenVINO demo – Multiple Channel Face Detection

HETERO: GPU, CPU - 4 channels - 26fps @channel - 75% CPU used

(Same system configuration as above.)

OpenVINO demo – Multiple Channel Face Detection

HETERO: FPGA, CPU - 9 channels - 26fps @channel - 55% CPU used

(Same system configuration as above.)
