Deep-Learning on Embedded and Desktop Systems with Intel® OpenVINO™


Marco Valdes, Università di Bologna, Corso di Sistemi Digitali M
Stefano Mattoccia, Università di Bologna

Intel® OpenVINO™ Toolkit
Develop applications and solutions that emulate human vision. Based on convolutional neural networks (CNN), the toolkit extends workloads across Intel® hardware (including accelerators) and maximizes performance [1].
OpenVINO stands for Open Visual Inferencing and Neural Network Optimization.
Goal: optimize CNNs for Intel architectures:
● CPUs
● Processor graphics
● FPGAs
● Vision Processing Units
● Image Processing Units
● Vision Accelerator Design
https://software.intel.com/en-us/openvino-toolkit/hardware

Training and inference
The deployment of a typical learning-based system requires two phases:
1. Training: the network is fed with training data to minimize the error on the validation set
2. Inference: once trained, the network is deployed on new data
Nonetheless, there are some notable exceptions, such as self-adaptive networks (i.e., training is carried out at inference time directly from the input data). With OpenVINO we always start from a pre-trained model.

OpenVINO: How does it work?

OpenVINO: Model Optimizer
● Converts models from various frameworks (e.g., TensorFlow, Caffe, MXNet)
● Converts them to a unified model, the Intermediate Representation (IR)
● Optimizes topologies, but not for a specific target device
● Fixes constant paths in the graph
● Python application

OpenVINO: Inference Engine
● API for inference across all Intel® architectures
● Allows optimized inference on most Intel hardware targets
● Heterogeneity support allows execution of layers across hardware types
● Asynchronous execution improves performance

OpenVINO: Inference Engine workflow
● Load the model and weights
● Load the inference plugin (CPU, graphics accelerator, FPGA, Myriad 2/X)
● Load the network to the plugin
● Allocate input and output buffers
● Fill the input buffer with data
● Run inference
● Interpret the output results

Movidius Neural Compute Stick
● Neural network accelerator in a USB stick form factor
● TensorFlow and Caffe frameworks supported
● Extended to work with the OpenVINO Toolkit with the FP16 data type
● Features the same Intel Movidius Vision Processing Unit (VPU) used in drones, VR headsets, and other low-power intelligent and autonomous products

Optimizations with Model Optimizer
The Model Optimizer provides methods to accelerate inference with convolutional neural networks that do not require model retraining:
● Linear operations fusing
● Grouped convolution fusing (specific to TensorFlow models)
● Stride optimization (specific to Caffe models)
● Cutting off parts of the model

Fusing CNN layers [3]
Conventional approach:
● iteratively compute each layer and save temporary results to memory
● stream the results saved to memory back in for the next layer
Optimized approach:
● fuse 2+ layers
● only the first input feature map is transferred to RAM
● compute all intermediate values of all fused layers
● the output feature map is written to RAM

Optimization strategies: Model Optimizer
● Linear operations fusing (the arithmetic behind this step is sketched below)
● Grouped convolution fusing
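To make the linear-operations-fusing step concrete, the sketch below shows the underlying arithmetic with NumPy. It is an illustration of the idea, not the Model Optimizer's actual code, and all names and shapes are made up for the example: a per-channel BatchNorm/Scale that follows a convolution is folded into the convolution's weights and bias, so the deployed graph executes one layer instead of two.

import numpy as np

# Illustrative convolution: 8 output channels, 3 input channels, 3x3 kernel
out_c, in_c, k = 8, 3, 3
W = np.random.randn(out_c, in_c, k, k).astype(np.float32)  # conv weights
b = np.random.randn(out_c).astype(np.float32)              # conv bias

# BatchNorm that follows the convolution; per output channel it computes
# y = gamma * (x - mean) / sqrt(var + eps) + beta, i.e. a linear scale + shift
gamma = np.random.randn(out_c).astype(np.float32)
beta  = np.random.randn(out_c).astype(np.float32)
mean  = np.random.randn(out_c).astype(np.float32)
var   = np.abs(np.random.randn(out_c)).astype(np.float32)
eps   = 1e-5

scale = gamma / np.sqrt(var + eps)
shift = beta - scale * mean

# Fusing: scale the weights of each output channel and fold the shift into the bias
W_fused = W * scale[:, None, None, None]
b_fused = b * scale + shift

# A convolution using (W_fused, b_fused) produces the same output as
# convolution followed by BatchNorm, so inference runs a single fused layer.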
Optimization strategies: Inference Engine 1/3
For each supported device, the Inference Engine API applies different optimizations. Internal CPU plugin optimizations:
● Merging of grouped convolutions
● Fusing of Convolution with ReLU or ELU: the CPU plugin fuses a Convolution with a ReLU or ELU layer whenever that layer directly follows the Convolution layer
● Removing the Power layer: the CPU plugin removes a Power layer from the topology if it has the following parameters: power = 1, scale = 1, offset = 0
● Fusing Convolution + Sum or Convolution + Sum + ReLU

Optimization strategies: Inference Engine 2/3
Fusing Convolution + Sum or Convolution + Sum + ReLU

Optimization strategies: Inference Engine 3/3
The GPU plugin uses the Intel® Compute Library for Deep Neural Networks (clDNN), an open-source performance library for deep learning (DL) applications intended to accelerate deep-learning inference on Intel Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics. The GPU plugin enables these specific optimizations:
● Fused layers:
  ■ Convolution - Activation
  ■ Deconvolution - Activation
  ■ Eltwise - Activation
  ■ Fully Connected - Activation
● Layers optimized out when conditions allow:
  ■ Crop
  ■ Concatenate
  ■ Reshape
  ■ Flatten
  ■ Split
  ■ Copy

Data type representation
Floating point can represent a wide range of numbers. In the case of FP32 we use 32 bits: 1 bit for the sign, 8 bits for the exponent and 23 bits for the fractional part.

Data type representation 2
Most of the data lies within the INT8 range, so one might expect to improve performance by using only 8 bits; however, this choice introduces an accuracy loss, so a trade-off must be found. A toy quantization example that illustrates the trade-off follows below.
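The FP32 vs. INT8 trade-off described above can be illustrated with a toy symmetric quantization in NumPy (a didactic sketch, not the calibration procedure OpenVINO actually uses): values are mapped to 8-bit integers with a single per-tensor scale, and the round-trip error is the accuracy that gets traded for the smaller, faster data type.

import numpy as np

# Toy FP32 tensor standing in for weights or activations
x = np.random.randn(1000).astype(np.float32)

# Symmetric per-tensor quantization: map max |x| onto the INT8 range [-127, 127]
scale = np.abs(x).max() / 127.0
x_int8 = np.clip(np.round(x / scale), -128, 127).astype(np.int8)

# Dequantize and measure the error introduced by the 8-bit representation
x_deq = x_int8.astype(np.float32) * scale
print("mean absolute quantization error:", float(np.abs(x - x_deq).mean()))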
Moving to the INT8 data type
Requirements:
● Intel platforms that support one of the following x86 instruction sets: Intel AVX-512, Intel AVX2, Intel SSE4.2
● The model must contain at least one activation layer of ReLU type
● Object-detection and classification models
Supported layers:
● Convolution
● FullyConnected (AVX-512 only)
● ReLU
● Pooling
● Eltwise
● Concat
● Resample

Case study: monocular depth estimation with PyDnet
● PyDnet [4] is a lightweight deep network for monocular depth perception
● Developed with TensorFlow
● Compared to most other networks, it delivers real-time performance on standard CPUs
● Suited for embedded systems

Case study: freezing the model 1/2
Goal: create an inference graph file.
Steps:
● Identify the output nodes of the graph
● Load and set the TensorFlow graph (from a checkpoint)
● All trainable parameters are represented as variables in the graph
● Use the TensorFlow utility to convert variables into constants

Case study: freezing the model 2/2

import os
import tensorflow as tf
from tensorflow.python.tools import freeze_graph
…
output_nodes = ['model/resize_images/ResizeBilinear',
                'model/resize_images_1/ResizeBilinear',
                'model/resize_images_1/ResizeBilinear']
…
# Dump the graph definition, then freeze the checkpoint variables into constants
tf.train.write_graph(sess.graph_def, "frozen_models", 'pydnet.pbtxt')
graph_pbtxt = os.path.join("frozen_models", 'pydnet.pbtxt')
graph_path = os.path.join("frozen_models", 'pydnet.ckpt')
outputs = output_nodes[0]
for name in output_nodes[1:]:
    outputs += ',' + name
frozen_graph_path = os.path.join("frozen_models", 'frozen_pydnet.pb')
freeze_graph.freeze_graph(graph_pbtxt, '', False, graph_path, outputs,
                          'save/restore_all', 'save/Const:0',
                          frozen_graph_path, True, '')

Case study: run the model optimizer 1/2
Once the model is frozen, run the Model Optimizer Python app:

python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py
    --input_model /path/to/model/frozen_pydnet.pb
    --model_name "IRPydnet"
    --data_type FP32 (or FP16)
    --output 'model/resize_images/ResizeBilinear','model/resize_images/ResizeBilinear','model/resize_images/ResizeBilinear'
    --log_level=DEBUG

Case study: run the model optimizer 2/2
The Model Optimizer makes it possible to perform general-purpose (i.e., agnostic to the target architecture) optimizations: set parameters, cut the model, modify the data type, or insert custom layer definitions for the input model. In a few seconds it produces the Intermediate Representation of the model and generates three files used by the Inference Engine:
1. IRPydnet.xml
2. IRPydnet.bin
3. IRPydnet.mapping

Case study: input and output blobs 1/2
● In the OpenVINO terminology, a blob is the binary input or output of a network
● It simply consists of a NumPy tensor
For PyDnet:
● The input blob of the network is a single (batch size N=1) RGB (C=3 channels) image of size 256x512 (HxW)
● The output blob is a single image of size 256x512 (HxW) with C=2 channels (depth is encoded with 16 bits)

Case study: input and output blobs 2/2
[Figure: PyDnet takes an input blob with C=3, HxW=256x512 and produces an output blob with HxW=256x512, C=2.]

Case study: inference engine 1/2
● Consists in the creation of a Python wrapper that uses the Inference Engine API to perform inference for the specific model deployed
● The input and output blobs must be processed according to a specific order
● For PyDnet, the input blob's shape must be [1,3,256,512] (NCHW), and the output blob, which has shape [1,2,256,512] (NCHW), has to be transposed into [256,512,2] (HWC) form

Case study: inference engine 2/2

from openvino.inference_engine import IECore, IENetwork

model_xml = "/home/marco/Scaricati/pydnet-master/pydnet-master/IRPydnet.xml"
model_bin = "/home/marco/Scaricati/pydnet-master/pydnet-master/IRPydnet.bin"

ie = IECore()
net = IENetwork(model=model_xml, weights=model_bin)
# Needed only for the Interp layer on CPU
ie.add_extension("/opt/intel/openvino/inference_engine/lib/intel64/libcpu_extension_sse4.so", "CPU")

input_blob = next(iter(net.inputs))
n, c, h, w = net.inputs[input_blob].shape

# Load the specific plugin for the device that you want to test
# exec_net = ie.load_network(network=net, device_name="CPU")
# exec_net = ie.load_network(network=net, device_name="GPU")
exec_net = ie.load_network(network=net, device_name="MYRIAD")
…
# load the image with cv2, resize it and preprocess the data
img = img.transpose((2, 0, 1))  # HWC -> CHW, to match the [1,3,256,512] input blob
…
res = exec_net.infer(inputs={input_blob: img})
out = res['model/resize_images/ResizeBilinear']  # select the output
…
# postprocess and visualize the data

Case study: experimental results
The network provides results at three resolutions: Half (H), Quarter (Q) and Eighth (E).
[Figure: example PyDnet depth maps at H, Q and E resolution.]

Case study: performance evaluation (H resolution)
● CPU: Intel Core i7-7500U @ 2.7 GHz, power consumption: ~15 W
● Graphics accelerator: Intel HD Graphics 620, power consumption: ~15 W
● Myriad 2: Intel Movidius Neural Compute Stick 1, power consumption: 0.5 W
(A timing sketch that reproduces this comparison with the same Inference Engine API is given at the end of the section.)

OpenVINO on embedded systems using Intel's Movidius Neural Compute Stick
Low power consumption is indispensable for autonomous/unmanned vehicles and IoT (Internet of Things).
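As a complement to the performance evaluation above, the following sketch shows how the same per-device comparison could be reproduced with a simple synchronous timing loop around the Inference Engine API used in the case study. It is a hedged example: the IR file names are assumed to be in the working directory, the dummy input replaces real preprocessing, and the GPU and MYRIAD plugins are only usable if the corresponding hardware and drivers are installed.

import time
import numpy as np
from openvino.inference_engine import IECore, IENetwork

model_xml = "IRPydnet.xml"   # IR produced by the Model Optimizer (assumed local path)
model_bin = "IRPydnet.bin"

ie = IECore()
net = IENetwork(model=model_xml, weights=model_bin)
input_blob = next(iter(net.inputs))
n, c, h, w = net.inputs[input_blob].shape
img = np.random.rand(n, c, h, w).astype(np.float32)  # dummy NCHW input

for device in ("CPU", "GPU", "MYRIAD"):
    try:
        exec_net = ie.load_network(network=net, device_name=device)
    except Exception as err:            # plugin or hardware not available
        print(device, "skipped:", err)
        continue
    runs = 50
    start = time.perf_counter()
    for _ in range(runs):               # synchronous inference requests
        exec_net.infer(inputs={input_blob: img})
    fps = runs / (time.perf_counter() - start)
    print("%s: %.1f FPS" % (device, fps))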