Intel® Distribution of OpenVINO™ Toolkit
Dr. Séverine Habert, Deep Learning Software Engineer
April 9th, 2021

AI Compute Considerations
How do you determine the right compute for your AI needs? Workload requirements demand different hardware: ASIC / FPGA / GPU / CPU.
1. Workloads – What is my workload profile?
2. Utilization – What are my use case requirements?
3. Scale – How prevalent is AI in my environment?

Intel® Distribution of OpenVINO™ Toolkit
▪ Tool suite for high-performance, deep learning inference
▪ Fast, accurate real-world results using high-performance AI and computer vision inference deployed into production across Intel® architecture from edge to cloud

High-Performance, Deep Learning Inference
• Enables deep learning inference from the edge to the cloud.

Streamlined Development, Ease of Use
• Speeds time-to-market through an easy-to-use library of CV functions and pre-optimized kernels.
• Includes optimized calls for CV standards, including OpenCV* and OpenCL™.

Write Once, Deploy Anywhere
• Supports heterogeneous execution across Intel accelerators, using a common API for the Intel® CPU, Intel® Integrated Graphics, Intel® Gaussian & Neural Accelerator, Intel® Neural Compute Stick 2, and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.

Three steps for the Intel® Distribution of OpenVINO™ toolkit
1. Build – start from a model trained with a supported framework.
2. Optimize
• Model Optimizer – converts and optimizes the trained model into the Intermediate Representation (IR: .xml and .bin files)
• Post-Training Optimization Tool – reduces model size into low precision without re-training
3. Deploy
• Inference Engine – common API (Read, Load, Infer) that abstracts the low-level programming for each hardware
• Model Server – gRPC/REST server with a C++ backend
• Deployment Manager
Also in the toolkit: Deep Learning Streamer, OpenCV, OpenCL™, [NEW] Deep Learning Workbench (visually analyze and fine-tune), Code Samples & Demos (e.g. Benchmark App, Accuracy Checker, Model Downloader), Open Model Zoo (100+ open-sourced and optimized pre-trained models), Intel® GNA (IP), and [NEW] availability on Intel® DevCloud for the Edge as a Beta release.

Supported Frameworks
Breadth of supported frameworks to enable developers with flexibility: trained models from the supported frameworks (and from other tools via ONNX* conversion) are fed to the Model Optimizer.
Supported Frameworks and Formats: https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Introduction.html#SupportedFW
Configure the Model Optimizer for your Framework: https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Config_Model_Optimizer.html

Model Optimization
The Model Optimizer loads a model into memory, reads it, builds the internal representation of the model, optimizes it, and produces the Intermediate Representation:
Trained Model → Model Optimizer → Intermediate Representation (IR: .xml, .bin) → Inference Engine (Read, Load, Infer)
— .xml describes the network topology
— .bin contains the weights and biases binary data
Optimization techniques available are:
— Linear operation fusing
— Stride optimizations
— Group convolutions fusing
Note: Except for ONNX models (.onnx format), all models have to be converted to the IR format to be used as input to the Inference Engine.
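To make the conversion step concrete, here is a minimal sketch that shells out to the Model Optimizer script. It is illustrative only: the mo.py location, the frozen_graph.pb input, and the shape/precision values are assumptions to replace with your own install path and model.

    import subprocess

    # Assumed default install location of the 2021 toolkit; adjust to your system.
    MO_SCRIPT = "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo.py"

    subprocess.run(
        [
            "python3", MO_SCRIPT,
            "--input_model", "frozen_graph.pb",  # trained model from a supported framework
            "--input_shape", "[1,3,224,224]",    # assumed NCHW input shape
            "--data_type", "FP16",               # store weights in half precision
            "--output_dir", "ir",                # the .xml/.bin IR pair is written here
        ],
        check=True,
    )

The resulting .xml/.bin pair is what the Inference Engine reads in the deploy step; as noted above, ONNX models can also be read directly without this conversion.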
Optimal Model Performance Using the Inference Engine
Core Inference Engine Libraries (see the Python sketch after the throughput slide below):
▪ Create an Inference Engine Core object to work with devices
▪ Read the network
▪ Manipulate network information
▪ Execute and pass inputs and outputs
Applications call the common Inference Engine runtime API; underneath sit an optional (but recommended, for full system utilization) multi-device plugin and a plugin architecture.
Device-specific Plugin Libraries:
▪ For each supported target device, the Inference Engine provides a plugin — a DLL/shared library that contains a complete implementation for inference on this device: the mkl-dnn plugin (CPU intrinsics), the clDNN plugin (OpenCL™), the GNA plugin (GNA API, Intel® GNA IP), the Myriad & HDDL plugins (Movidius API), and the FPGA plugin (DLA; available on LTS releases).
GPU = Intel CPU with integrated graphics / Intel® Processor Graphics / GEN
GNA = Gaussian mixture model and Neural Network Accelerator

Inference Engine Synchronous vs Asynchronous Execution
▪ In the IE API, a model is executed through an Infer Request, which can be:
▪ Synchronous – blocks until inference is completed:
    exec_net.infer(inputs={input_blob: in_frame})
▪ Asynchronous – check the execution status with wait, or specify a completion callback (the recommended way):
    exec_net.start_async(request_id=id, inputs={input_blob: in_frame})
    if exec_net.requests[id].wait() != 0: do something
[Timeline: with the sync API, Transfer 1, Inference 1, Result 1, Transfer 2, Inference 2, Result 2 run back to back; with the async API, Transfer 2 overlaps Inference 1, so results arrive earlier.]

Inference Engine Throughput Mode for CPU, iGPU and VPU
▪ Latency – inference time of 1 frame (ms)
▪ Throughput – overall number of frames inferred per second (FPS)
▪ "Throughput" mode allows the Inference Engine to efficiently run multiple infer requests simultaneously, greatly improving the overall throughput.
▪ Device resources are divided into execution "streams" – parts which run infer requests in parallel (Infer Request 1 → execution stream 1, …, Infer Request N → execution stream N).
CPU example:
    ie = IECore()
    ie.get_config("CPU", "CPU_THROUGHPUT_STREAMS")   # query the current number of CPU streams
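To make the Read/Load/Infer flow of the Inference Engine slide concrete, here is a minimal synchronous sketch against the 2021-era Python API (openvino.inference_engine). The file names are placeholders and the random frame stands in for a preprocessed NCHW input.

    import numpy as np
    from openvino.inference_engine import IECore

    ie = IECore()                                                   # 1. create the Core object
    net = ie.read_network(model="model.xml", weights="model.bin")   # 2. read the IR (an .onnx file also works here)
    input_blob = next(iter(net.input_info))                         # 3. inspect network inputs/outputs
    out_blob = next(iter(net.outputs))
    exec_net = ie.load_network(network=net, device_name="CPU")      # 4. load onto a device

    # 5. synchronous inference: blocks until the result is ready
    n, c, h, w = net.input_info[input_blob].input_data.shape
    in_frame = np.random.rand(n, c, h, w).astype(np.float32)        # placeholder frame
    result = exec_net.infer(inputs={input_blob: in_frame})
    print(result[out_blob].shape)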
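Continuing from the synchronous sketch above (same ie, net and blob names), an asynchronous variant with two in-flight requests, so the next frame can be transferred while the previous one is still being inferred:

    # Reload with room for two parallel infer requests
    exec_net = ie.load_network(network=net, device_name="CPU", num_requests=2)

    frames = [np.random.rand(n, c, h, w).astype(np.float32) for _ in range(2)]
    for req_id, frame in enumerate(frames):
        exec_net.start_async(request_id=req_id, inputs={input_blob: frame})  # returns immediately

    for req_id in range(len(frames)):
        status = exec_net.requests[req_id].wait(-1)   # block until this request completes
        if status == 0:                               # 0 means OK
            output = exec_net.requests[req_id].output_blobs[out_blob].buffer
            print(req_id, output.shape)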
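For the throughput mode just described, the number of CPU execution streams is controlled through the device configuration. A minimal sketch, assuming the CPU_THROUGHPUT_STREAMS key and CPU_THROUGHPUT_AUTO value used by the CPU plugin in this release:

    from openvino.inference_engine import IECore

    ie = IECore()
    # Let the CPU plugin pick a suitable number of execution streams for throughput mode
    ie.set_config({"CPU_THROUGHPUT_STREAMS": "CPU_THROUGHPUT_AUTO"}, "CPU")
    print(ie.get_config("CPU", "CPU_THROUGHPUT_STREAMS"))

    # Load with several infer requests so the streams can actually be kept busy in parallel
    net = ie.read_network(model="model.xml", weights="model.bin")
    exec_net = ie.load_network(network=net, device_name="CPU", num_requests=4)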
Inference Engine Heterogeneous Support
▪ You can execute different layers on different HW units
▪ Offload unsupported layers to fallback devices:
  ▪ Default affinity policy
  ▪ Setting affinity manually (CNNLayer::affinity)
▪ All device combinations are supported (CPU, GPU, FPGA, MYRIAD, HDDL)
▪ Samples/demos usage: "-d HETERO:FPGA,CPU"
    InferenceEngine::Core core;
    auto executable_network = core.LoadNetwork(reader.getNetwork(), "HETERO:FPGA,CPU");
[Diagram: most layers of the network run on the FPGA, while unsupported layers fall back to the CPU.]
(A Python sketch covering both the HETERO and MULTI targets follows the multi-device slide below.)

Inference Engine Multi-device Support
Automatic load-balancing between devices (at the inference-request level) for full system utilization.
▪ Any combination of the following devices is supported: CPU, iGPU, VPU, HDDL
▪ As easy as "-d MULTI:CPU,GPU" for the command-line option of your favorite sample/demo
▪ C++ example (Python is similar):
    Core ie;
    ExecutableNetwork exec = ie.LoadNetwork(network, "MULTI", {{"MULTI_DEVICE_PRIORITIES", "CPU,GPU"}});
[Diagram: the application submits infer requests with device priorities; the Inference Engine queue manager dispatches them to per-device queues (CPU, GPU, VPU, etc.) and the corresponding plugins.]
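Both the HETERO and MULTI targets from the two slides above are selected through the device string passed at load time, which is the easiest way to use them from Python. A minimal sketch; the GPU device name assumes a machine with Intel integrated graphics, so check ie.available_devices first:

    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")
    print(ie.available_devices)   # e.g. ['CPU', 'GPU'] on a machine with integrated graphics

    # Heterogeneous execution: layers unsupported on the GPU fall back to the CPU
    hetero_exec = ie.load_network(network=net, device_name="HETERO:GPU,CPU")

    # Multi-device execution: infer requests are load-balanced across CPU and GPU
    multi_exec = ie.load_network(network=net, device_name="MULTI:CPU,GPU")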
Post-Training Optimization Tool
Conversion technique that reduces model size into low precision without re-training.
▪ Reduces model size while also improving latency, with little degradation in model accuracy and without model re-training.
▪ Different optimization approaches are supported: quantization algorithms, etc.
Workflow: a model trained with one of the supported frameworks is converted by the Model Optimizer into a full-precision IR; the Post-training Optimization Tool then takes this IR together with a dataset and annotations, a JSON configuration, and environment (hardware) specifications, runs statistics plus accuracy and performance checks through the Accuracy Checker, and produces an optimized IR for the Inference Engine.

Deep Learning Workbench
Web-based UI extension tool for model analysis and graphical measurements
▪ Visualizes performance data for topologies and layers to aid in model analysis
▪ Automates analysis for optimal performance configuration (streams, batches, latency)
▪ Experiment with INT8 or Winograd calibration for optimal tuning using the Post Training Optimization Tool
▪ Provides accuracy information through the Accuracy Checker
▪ Direct access to models from the public set of the Open Model Zoo
▪ Enables remote profiling, allowing the collection of performance data from multiple different machines without any additional setup.

Pre-Trained Models and Public Models
Open-sourced repository of pre-trained models and support for public models
100+ Pre-trained Models (common AI tasks): Object Detection, Object Recognition, Reidentification, Semantic Segmentation, Instance Segmentation, Human Pose Estimation, Image Processing, Text Detection, Text …
100+ Public Models (pre-optimized external models): Classification, Segmentation, Object Detection, Human Pose Estimation, Monocular Depth Estimation, Image Inpainting, Style Transfer, Action Recognition
▪ Use free pre-trained models to speed up development and deployment
▪ Take advantage of the Model Downloader and other automation tools to quickly get started
▪ Iterate with the Accuracy Checker to validate the accuracy of your models
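As a quick-start illustration for the Model Downloader mentioned above, this sketch shells out to the downloader script shipped with the Open Model Zoo tools. The install path and the chosen model name (face-detection-adas-0001) are assumptions; substitute any model listed by the downloader.

    import subprocess

    # Assumed location of the Open Model Zoo downloader in a 2021 install; adjust as needed.
    DOWNLOADER = "/opt/intel/openvino_2021/deployment_tools/open_model_zoo/tools/downloader/downloader.py"

    subprocess.run(
        [
            "python3", DOWNLOADER,
            "--name", "face-detection-adas-0001",  # example pre-trained model from Open Model Zoo
            "--precisions", "FP16",
            "--output_dir", "models",
        ],
        check=True,
    )

The downloaded IR can then be loaded with the Inference Engine calls shown earlier and validated with the Accuracy Checker.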