Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices

Total Page:16

File Type:pdf, Size:1020Kb

Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices 1st Chunjie Luo 2nd Xiwen He 3nd Jianfeng Zhan Institute of Computing Technology Institute of Computing Technology Institute of Computing Technology Chinese Academy of Sciences Chinese Academy of Sciences Chinese Academy of Sciences BenchCouncil BenchCouncil BenchCouncil Beijing, China Beijing, China Beijing, China [email protected] [email protected] [email protected] 4nd Lei Wang 5nd Wanling Gao 6nd Jiahui Dai Institute of Computing Technology Institute of Computing Technology Beijing Academy of Frontier Chinese Academy of Sciences Chinese Academy of Sciences Sciences and Technology BenchCouncil BenchCouncil Beijing, China Beijing, China Beijing, China [email protected] wanglei [email protected] [email protected] Abstract—Due to increasing amounts of data and compute inference on the edge devices can 1) reduce the latency, 2) resources, deep learning achieves many successes in various protect the privacy, 3) consume the power [2]. domains. The application of deep learning on the mobile and em- To make the inference on the edge devices more efficient, bedded devices is taken more and more attentions, benchmarking and ranking the AI abilities of mobile and embedded devices the neural networks are designed more light-weight by using becomes an urgent problem to be solved. Considering the model simpler architecture, or by quantizing, pruning and compress- diversity and framework diversity, we propose a benchmark suite, ing the networks. Different networks present different trade- AIoTBench, which focuses on the evaluation of the inference offs between accuracy and computational complexity. These abilities of mobile and embedded devices. AIoTBench covers networks are designed with very different philosophies and three typical heavy-weight networks: ResNet50, InceptionV3, DenseNet121, as well as three light-weight networks: SqueezeNet, have very different architectures. There is no single network MobileNetV2, MnasNet. Each network is implemented by three architecture that unifies the network design and application. frameworks which are designed for mobile and embedded de- Beside of the diversity of models, the AI frameworks on vices: Tensorflow Lite, Caffe2, Pytorch Mobile. To compare and mobile and embedded devices are also diverse, e.g. Tensorflow rank the AI capabilities of the devices, we propose two unified Lite [2], Caffe2 [3], Pytorch Mobile [4]. On the other hand, metrics as the AI scores: Valid Images Per Second (VIPS) and Valid FLOPs Per Second (VOPS). Currently, we have compared the mobile and embedded devices provide additional hardware and ranked 5 mobile devices using our benchmark. This list will acceleration using GPUs or NPUs to support the AI applica- be extended and updated soon after. tions. Since AI applications on mobile and embedded devices get more and more attentions, benchmarking and ranking the I. INTRODUCTION AI abilities of those devices becomes an urgent problem to be Due to increasing amounts of data and compute resources, solved. arXiv:2005.05085v1 [cs.LG] 7 May 2020 deep learning achieves many successes in various domains. To MLPerf Inference [5] proposes a set of rules and practices prove the validity of the AI, researchers take more attention to ensure comparability across systems with wildly differing to train more accurate models in the early days. Since many architectures. It defines several high-level abstractions of AI models have achieved pretty performance in various domains, problems, and specifies the task to be accomplished and the the applications of the AI models to the end-users are put on general rules of the road, but leaves implementation details the agenda. After entering the application period, the inference to the submitters. MLPerf Inference aims to more general becomes more and more important. According to a IDC report and flexible benchmarking. However, this flexibility brings [1], the demand for AI inference will far exceed the demand unfeasibility and unaffordable for a specific domain. The for training in the future. users of the benchmark need make their efforts to optimize With the popularity of smart phones and internet of things, the whole stack of the AI solutions, from the languages, researchers and engineers have made effort to apply the AI libraries, frameworks, to hardwares. Moreover, the excessive inference on the mobile and embedded devices. Running the optimization of a particular solution may affect the generality of the AI system. ETH Zurich AI Benchmark [6], [7] aims Jianfeng Zhan is the corresponding author. to evaluate the AI ability of Android smartphones. Although AI Benchmark covers diverse tasks, we believe that the model the AI solutions, from the languages, libraries, frameworks, to diversity should be taken more attentions, since the models hardwares. For the hardware manufacturer, the implementation are often shared between different tasks by transfer learning. and optimization of the AI task may take more proportion to For example, the pre-trained networks for image recognition achieve a higher score of the benchmarking. Moreover, the are often used as the backbone to extract the feature maps excessive optimization of a particular solution may affect the for other vision tasks, e.g. the object detection and segmen- generality of the system. tation. Moreover, AI Benchmark is implemented only using ETH Zurich AI Benchmark [6], [7] aims to evaluate the AI TensorFlow Lite. ability of Android smartphones. It contains several workloads In this paper, we propose a benchmark suite, AIoTBench, covering the tasks of object recognition, face recognition, which focuses on the evaluation of the inference ability of playing atari games, image deblurring, image super-resolution, mobile and embedded devices. The workloads in our bench- bokeh simulation, semantic segmentation, photo enhancement. mark cover more model architectures and more frameworks. Although AI Benchmark covers diverse tasks, we believe that Specifically, AIoTBench covers three typical heavy-weight the model diversity should be taken more attentions, since the networks: ResNet50 [8], InceptionV3 [9], DenseNet121 [10], models are often shared between different tasks by transfer as well as three light-weight networks: SqueezeNet [11], learning. For example, the pre-trained networks for image MobileNetV2 [12], MnasNet [13]. Each model is implemented recognition are often used as the backbone to extract the by three frameworks: Tensorflow Lite, Caffe2, Pytorch Mobile. feature maps for other vision tasks, e.g. object detection and Comparing to MLPerf Inference, our benchmark is more segmentation. Moreover, AI Benchmark is implemented only available and affordable for the users, since it is off the shelf using TensorFlow Lite. and needs no re-implementation. Simone Bianco et al. [14] analyzes more than 40 state-of- Our benchmark can be used for: the-art deep architectures in terms of accuracy rate, model • Comparison of different AI models. Users can make a complexity, memory usage, computational complexity, and tradeoff between the accuracy, model complexity, model inference time. They experiment the selected architectures on size and speed depending on the application requirement. a workstation equipped with a NVIDIA Titan X Pascal and • Comparison of different AI frameworks on mobile en- an embedded system based on a NVIDIA Jetson TX1 board. vironment. Depending on the selected model, users can Their workloads are implemented using PyTorch. Shaohuai make comparisons between the AI frameworks on mobile Shi et al. [15] compare the state-of-the-art deep learning devices. software frameworks, including Caffe, CNTK, MXNet, Ten- • Benchmarking and ranking the AI abilities of different sorFlow, and Torch. They compare the running performance of mobile devices. With diverse and representative models these frameworks with three popular types of neural networks and frameworks, the mobile devices can get a compre- on two CPU platforms and three GPU platforms. The paper hensive benchmarking and evaluation. [16] discusses a comprehensive design of benchmark on To compare and rank the AI capabilities of the devices, we mobile and embeded devices, while we focus on the vision propose two unified metrics as the AI scores: Valid Images Per domain in this paper. Other AI benchmarks include Fathom Second (VIPS) and Valid FLOPs Per Second (VOPS). They [17], DAWNBench [18]. These benchmarks are not designed reflect the trade-off between quality and performance of the for mobile devices. AI system. Currently, we have compared and ranked 5 mobile Our benchmark focuses on the evaluation of the inference devices using our benchmark: Galaxy s10e, Honor v20, Vivo ablity of mobile and embedded devices. The workloads in x27, Vivo nex,and Oppo R17. This list will be extended and our benchmark cover more model architectures and more updated soon after. frameworks. Additionally, comparing to MLPerf Inference, our benchmark is more available and affordable for the users, since II. RELATED WORK it is off the shelf and needs no re-implementation. The details of comparisons of different benchmarks are shown in Table I. MLPerf Inference [5] proposes a set of rules and practices to ensure comparability across systems with wildly differing III. CONSIDERATIONS architectures. Unlike traditional benchmarks, e.g. SPEC CPU, In this section, we highlight the main considerations of which runs out of the box, MLPerf Inference is a semantic- AIoTBench. The details
Recommended publications
  • Open Source Used in Influx1.8 Influx 1.9
    Open Source Used In Influx1.8 Influx 1.9 Cisco Systems, Inc. www.cisco.com Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco website at www.cisco.com/go/offices. Text Part Number: 78EE117C99-1178791953 Open Source Used In Influx1.8 Influx 1.9 1 This document contains licenses and notices for open source software used in this product. With respect to the free/open source software listed in this document, if you have any questions or wish to receive a copy of any source code to which you may be entitled under the applicable free/open source license(s) (such as the GNU Lesser/General Public License), please contact us at [email protected]. In your requests please include the following reference number 78EE117C99-1178791953 Contents 1.1 golang-protobuf-extensions v1.0.1 1.1.1 Available under license 1.2 prometheus-client v0.2.0 1.2.1 Available under license 1.3 gopkg.in-asn1-ber v1.0.0-20170511165959-379148ca0225 1.3.1 Available under license 1.4 influxdata-raft-boltdb v0.0.0-20210323121340-465fcd3eb4d8 1.4.1 Available under license 1.5 fwd v1.1.1 1.5.1 Available under license 1.6 jaeger-client-go v2.23.0+incompatible 1.6.1 Available under license 1.7 golang-genproto v0.0.0-20210122163508-8081c04a3579 1.7.1 Available under license 1.8 influxdata-roaring v0.4.13-0.20180809181101-fc520f41fab6 1.8.1 Available under license 1.9 influxdata-flux v0.113.0 1.9.1 Available under license 1.10 apache-arrow-go-arrow v0.0.0-20200923215132-ac86123a3f01 1.10.1 Available under
    [Show full text]
  • Intelligent Recognition Method of Low-Altitude Squint Optical Ship Target Fused with Simulation Samples
    remote sensing Article Intelligent Recognition Method of Low-Altitude Squint Optical Ship Target Fused with Simulation Samples Bo Liu 1,2 , Qi Xiao 1,2, Yuhao Zhang 1,2, Wei Ni 1, Zhen Yang 1 and Ligang Li 1,* 1 National Space Science Center, Key Laboratory of Electronics and Information Technology for Space System, Chinese Academy of Sciences, Beijing 100190, China; [email protected] (B.L.); [email protected] (Q.X.); [email protected] (Y.Z.); [email protected] (W.N.); [email protected] (Z.Y.) 2 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China * Correspondence: [email protected]; Tel.: +86-1312-152-1820 Abstract: To address the problem of intelligent recognition of optical ship targets under low-altitude squint detection, we propose an intelligent recognition method based on simulation samples. This method comprehensively considers geometric and spectral characteristics of ship targets and ocean background and performs full link modeling combined with the squint detection atmospheric transmission model. It also generates and expands squint multi-angle imaging simulation samples of ship targets in the visible light band using the expanded sample type to perform feature analysis and modification on SqueezeNet. Shallow and deeper features are combined to improve the accuracy of feature recognition. The experimental results demonstrate that using simulation samples to expand the training set can improve the performance of the traditional k-nearest neighbors algorithm and modified SqueezeNet. For the classification of specific ship target types, a mixed-scene dataset Citation: Liu, B.; Xiao, Q.; Zhang, Y.; expanded with simulation samples was used for training.
    [Show full text]
  • Easybuild Documentation Release 20210907.0
    EasyBuild Documentation Release 20210907.0 Ghent University Tue, 07 Sep 2021 08:55:41 Contents 1 What is EasyBuild? 3 2 Concepts and terminology 5 2.1 EasyBuild framework..........................................5 2.2 Easyblocks................................................6 2.3 Toolchains................................................7 2.3.1 system toolchain.......................................7 2.3.2 dummy toolchain (DEPRECATED) ..............................7 2.3.3 Common toolchains.......................................7 2.4 Easyconfig files..............................................7 2.5 Extensions................................................8 3 Typical workflow example: building and installing WRF9 3.1 Searching for available easyconfigs files.................................9 3.2 Getting an overview of planned installations.............................. 10 3.3 Installing a software stack........................................ 11 4 Getting started 13 4.1 Installing EasyBuild........................................... 13 4.1.1 Requirements.......................................... 14 4.1.2 Using pip to Install EasyBuild................................. 14 4.1.3 Installing EasyBuild with EasyBuild.............................. 17 4.1.4 Dependencies.......................................... 19 4.1.5 Sources............................................. 21 4.1.6 In case of installation issues. .................................. 22 4.2 Configuring EasyBuild.......................................... 22 4.2.1 Supported configuration
    [Show full text]
  • NXP Eiq Machine Learning Software Development Environment For
    AN12580 NXP eIQ™ Machine Learning Software Development Environment for QorIQ Layerscape Applications Processors Rev. 3 — 12/2020 Application Note Contents 1 Introduction 1 Introduction......................................1 Machine Learning (ML) is a computer science domain that has its roots in 2 NXP eIQ software introduction........1 3 Building eIQ Components............... 2 the 1960s. ML provides algorithms capable of finding patterns and rules in 4 OpenCV getting started guide.........3 data. ML is a category of algorithm that allows software applications to become 5 Arm Compute Library getting started more accurate in predicting outcomes without being explicitly programmed. guide................................................7 The basic premise of ML is to build algorithms that can receive input data and 6 TensorFlow Lite getting started guide use statistical analysis to predict an output while updating output as new data ........................................................ 8 becomes available. 7 Arm NN getting started guide........10 8 ONNX Runtime getting started guide In 2010, the so-called deep learning started. It is a fast-growing subdomain ...................................................... 16 of ML, based on Neural Networks (NN). Inspired by the human brain, 9 PyTorch getting started guide....... 17 deep learning achieved state-of-the-art results in various tasks; for example, A Revision history.............................17 Computer Vision (CV) and Natural Language Processing (NLP). Neural networks are capable of learning complex patterns from millions of examples. A huge adaptation is expected in the embedded world, where NXP is the leader. NXP created eIQ machine learning software for QorIQ Layerscape applications processors, a set of ML tools which allows developing and deploying ML applications on the QorIQ Layerscape family of devices.
    [Show full text]
  • Bringing Probabilistic Programming to Scientific Simulators at Scale
    Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale Atılım Güneş Baydin, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, Saeid Naderiparizi, Bradley Gram-Hansen, Gilles Louppe, Mingfei Ma, Xiaohui Zhao, Philip Torr, Victor Lee, Kyle Cranmer, Prabhat, and Frank Wood SC19, Denver, CO, United States 19 November 2019 Simulation and HPC Computational models and simulation are key to scientific advance at all scales Particle physics Nuclear physics Material design Drug discovery Weather Climate science Cosmology 2 Introducing a new way to use existing simulators Probabilistic programming Simulation Supercomputing (machine learning) 3 Simulators Parameters Outputs (data) Simulator 4 Simulators Parameters Outputs (data) Simulator Prediction: ● Simulate forward evolution of the system ● Generate samples of output 5 Simulators Parameters Outputs (data) Simulator Prediction: ● Simulate forward evolution of the system ● Generate samples of output 6 Simulators Parameters Outputs (data) Simulator Prediction: ● Simulate forward evolution of the system ● Generate samples of output WE NEED THE INVERSE! 7 Simulators Parameters Outputs (data) Simulator Prediction: ● Simulate forward evolution of the system ● Generate samples of output Inference: ● Find parameters that can produce (explain) observed data ● Inverse problem ● Often a manual process 8 Simulators Parameters Outputs (data) Inferred Simulator Observed data parameters Gene network Gene expression 9 Simulators Parameters Outputs (data)
    [Show full text]
  • Advances in Electron Microscopy with Deep Learning
    Advances in Electron Microscopy with Deep Learning by Jeffrey Mark Ede Thesis To be submitted to the University of Warwick for the degree of Doctor of Philosophy in Physics Department of Physics January 2021 Contents Contents i List of Abbreviations iii List of Figures viii List of Tables xvii Acknowledgments xix Declarations xx Research Training xxv Abstract xxvi Preface xxvii I Initial Motivation......................................... xxvii II Thesis Structure.......................................... xxvii III Connections............................................ xxix Chapter 1 Review: Deep Learning in Electron Microscopy1 1.1 Scientific Paper..........................................1 1.2 Reflection............................................. 100 Chapter 2 Warwick Electron Microscopy Datasets 101 2.1 Scientific Paper.......................................... 101 2.2 Amendments and Corrections................................... 133 2.3 Reflection............................................. 133 Chapter 3 Adaptive Learning Rate Clipping Stabilizes Learning 136 3.1 Scientific Paper.......................................... 136 3.2 Amendments and Corrections................................... 147 3.3 Reflection............................................. 147 Chapter 4 Partial Scanning Transmission Electron Microscopy with Deep Learning 149 4.1 Scientific Paper.......................................... 149 4.2 Amendments and Corrections................................... 176 4.3 Reflection............................................. 176
    [Show full text]
  • Master Thesis
    Master thesis To obtain a Master of Science Degree in Informatics and Communication Systems from the Merseburg University of Applied Sciences Subject: Tunisian truck license plate recognition using an Android Application based on Machine Learning as a detection tool Author: Supervisor: Achraf Boussaada Prof.Dr.-Ing. Rüdiger Klein Matr.-Nr.: 23542 Prof.Dr. Uwe Schröter Table of contents Chapter 1: Introduction ................................................................................................................................. 1 1.1 General Introduction: ................................................................................................................................... 1 1.2 Problem formulation: ................................................................................................................................... 1 1.3 Objective of Study: ........................................................................................................................................ 4 Chapter 2: Analysis ........................................................................................................................................ 4 2.1 Methodological approaches: ........................................................................................................................ 4 2.1.1 Actual approach: ................................................................................................................................... 4 2.1.2 Image Processing with OCR: ................................................................................................................
    [Show full text]
  • Curriculum Vitae
    Diogo Castro Curriculum Vitae Summary I’m a Software Engineer based in Belfast, United Kingdom. My professional journey began as a C# developer, but Functional Programming (FP) soon piqued my interest and led me to learn Scala, Haskell, PureScript and even a bit of Idris. I love building robust, reliable and maintainable applications. I also like teaching and I’m a big believer in "paying it forward". I’ve learned so much from many inspiring people, so I make it a point to share what I’ve learned with others. To that end, I’m currently in charge of training new team members in Scala and FP, do occasional presentations at work and aim to do more public speaking. I blog about FP and Haskell at https://diogocastro.com/blog. Experience Nov 2017-Present Principal Software Engineer, SpotX, Belfast, UK. Senior Software Engineer, SpotX, Belfast, UK. Developed RESTful web services in Scala, using the cats/cats-effect framework and Akka HTTP. Used Apache Kafka for publishing of events, and Prometheus/Grafana for monitor- ing. Worked on a service that aimed to augment Apache Druid, a timeseries database, with features such as access control, a safer and simpler query DSL, and automatic conversion of monetary metrics to multiple currencies. Authored a Scala library for calculating the delta of any two values of a given type using Shapeless, a library for generic programming. Taught a weekly internal Scala/FP course, with the goal of preparing our engineers to be productive in Scala whilst building an intuition of how to program with functions and equational reasoning.
    [Show full text]
  • KNIME Deep Learning Integration Installation Guide
    KNIME Deep Learning Integration Installation Guide KNIME AG, Zurich, Switzerland Version 3.7 (last updated on 2019-02-06) Table of Contents Introduction. 1 KNIME Deep Learning Integrations . 1 KNIME Keras Integration Installation. 2 Python Installation. 2 Installing the KNIME Keras Integration. 3 Extensions . 3 GPU Support. 4 KNIME TensorFlow Integration Installation . 4 Installation . 4 Advanced . 4 GPU Support. 4 KNIME Deeplearning4j Installation. 5 Installation . 5 GPU Support. 5 Known Issues . 5 KNIME Deep Learning Integration Installation Guide Introduction This document describes how to install the KNIME Deep Learning Integrations. These integrations bring deep learning capabilities to KNIME Analytics Platform, which allow you to read, create, edit, train, and execute deep neural networks within KNIME Analytics Platform. KNIME Deep Learning Integrations Three different deep learning libraries have been integrated: KNIME Keras Integration The KNIME Keras Integration utilizes the Keras deep learning framework to enable users to read, write, train, and execute Keras deep learning networks within KNIME. Furthermore, you can also build custom deep learning networks directly in KNIME via the Keras layer nodes. KNIME Tensor Flow Integration The KNIME TensorFlow Integration provides access to the powerful machine learning library TensorFlow* within KNIME. It enables you to read, write, train, and execute TensorFlow networks directly in KNIME. You can also convert your Keras networks to TensorFlow networks with this extension for even greater flexibility. * TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc. KNIME Deeplearning4j Integration The KNIME Deeplearning4j Integration integrates the Deeplearning4j library into KNIME, which provides deep learning capabilities in Java. Within KNIME this means you can read, write, train, execute, and build Deeplearning4j networks.
    [Show full text]
  • Fast Inference of Deep Neural Networks in Fpgas for Particle Physics (Arxiv:1804.06913)
    Fast Inference of Deep Neural Networks in FPGAs for Particle Physics (arXiv:1804.06913) Jennifer Ngadiuba, Maurizio Pierini [CERN] Javier Duarte, Sergo Jindariani, Ben Kreis, Ryan Rivera, Nhan Tran [Fermilab] Edward Kreinar [Hawkeye 360] Song Han, Phil Harris [MIT] Zhenbin Wu [University of Illinois at Chicago] Research Techniques Seminar Fermilab 4/24/2018 1 Outline • Introduction and Motivation • Machine Learning in HEP • FPGAs and High-Level Synthesis (HLS) • Industry Trends • HEP Latency Landscape • hls4ml: HLS for Machine Learning • Case Study and Design Exploration • Summary and Outlook • Cloud-scale Acceleration Javier Duarte I hls4ml 2 Introduction Javier Duarte I hls4ml 3 ReconstructionMachine chain:Learning Jet tagging in HEP Task to find the particle ID of a jet, e.g. b-quark • Learning optimized nonlinear functions of many inputs for … performingvertices difficult tasks from (real or simulated) data … Many successes in HEP: identificationData/MC of b-quarkuser jets, Higgs • Tag Info tagger corrections … candidates,Jet, particle energy regression, analysis selection, … particles CMS-PAS-BTV-15-001 s=13 TeV, 2016 1 CMS Simulation Preliminary Key features:tt events AK4jets (p > 30 GeV) T • Long lifetimeCSVv2 of heavy 10−1 flavour quarksDeepCSV cMVAv2 • Displacedmisid. probability tracks, … • Usage10−2 of ML standard for this problem udsg Neural network based on 10−3 • 11 c high-level features 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 b-jet efficiency Javier Duarte I hls4ml 4 Machine Learning in HEP • Learning optimized nonlinear functions
    [Show full text]
  • See License Notices
    Open Source Components License Notices Last updated December 22, 2020 @nebular/eva-icons @ngx-translate/core @ngx-translate/http-loader libunwind dropbear 2017.75 popt MIT License Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. --------------------------------------------------------------------------------------------------------------------- @angular/animations @angular/common @angular/core @angular/forms @angular/platform-browser @angular/router (v.1_11-19-20) The MIT License Copyright (c) 2010-2020 Google LLC. http://angular.io/license Permission
    [Show full text]
  • An FPGA-Accelerated Embedded Convolutional Neural Network
    Master Thesis Report ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network (edit) (edit) 1000ch 1000ch FPGA 1000ch Network Analysis Network Analysis 2x512 > 1024 2x512 > 1024 David Gschwend [email protected] SqueezeNet v1.1 b2a ext7 conv10 2x416 > SqueezeNet SqueezeNet v1.1 b2a ext7 conv10 2x416 > SqueezeNet arXiv:2005.06892v1 [cs.CV] 14 May 2020 Supervisors: Emanuel Schmid Felix Eberli Professor: Prof. Dr. Anton Gunzinger August 2016, ETH Zürich, Department of Information Technology and Electrical Engineering Abstract Image Understanding is becoming a vital feature in ever more applications ranging from medical diagnostics to autonomous vehicles. Many applications demand for embedded solutions that integrate into existing systems with tight real-time and power constraints. Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracies in all image understanding benchmarks, but have a very high computational complexity. Embedded CNNs thus call for small and efficient, yet very powerful computing platforms. This master thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip. The ZynqNet Embedded CNN is designed for image classification on ImageNet and consists of ZynqNet CNN, an optimized and customized CNN topology, and the ZynqNet FPGA Accelerator, an FPGA-based architecture for its evaluation. ZynqNet CNN is a highly efficient CNN topology. Detailed analysis and optimization of prior topologies using the custom-designed Netscope CNN Analyzer have enabled a CNN with 84.5 % top-5 accuracy at a computational complexity of only 530 million multiply- accumulate operations. The topology is highly regular and consists exclusively of convolu- tional layers, ReLU nonlinearities and one global pooling layer.
    [Show full text]