Pruning Convolution Neural Network (Squeezenet) For

Total Page:16

File Type:pdf, Size:1020Kb

Pruning Convolution Neural Network (Squeezenet) For PRUNING CONVOLUTION NEURAL NETWORK (SQUEEZENET) FOR EFFICIENT HARDWARE DEPLOYMENT A Thesis Submitted to the Faculty of Purdue University by Akash S. Gaikwad In Partial Fulfillment of the Requirements for the Degree of Master of Science in Electrical and Computer Engineering December 2018 Purdue University Indianapolis, Indiana ii THE PURDUE UNIVERSITY GRADUATE SCHOOL STATEMENT OF COMMITTEE APPROVAL Dr. Mohamed El-Sharkawy Department of Electrical and Computer Engineering Dr. Maher Rizkalla Department of Electrical and Computer Engineering Dr. Brian King Department of Electrical and Computer Engineering Approved by: Dr. Brian King Head of the Graduate Program iii This Thesis is dedicated to my Parents, Jayashree and Sunil Gaikwad, and my family, Alisha and Ragini. iv ACKNOWLEDGMENTS I would like to express my sincere gratitude to my adviser Dr. Mohamed El- Sharkawy for his patience, motivation and constant guidance throughout my masters education and research. Besides my advisor, I would like to thank my fellow lab- mates, Durvesh, Dewant, Surya, Shreeram and Raghavan for valuable advice and stimulating sleepless discussion and for all the fun we had in the last two years. My sincere thanks to IoT Collaboratory and department of Electrical and Computer Engineering, who provided the access to the laboratory and research facilities. Special thanks to Sherrie and Dr. Brian King for their constant support and motivation throughout my time at IUPUI. Last but not the least, I would like to thank and acknowledge, Government of Maharashtra and India, for sponsoring my education. v TABLE OF CONTENTS Page LIST OF TABLES :::::::::::::::::::::::::::::::::: viii LIST OF FIGURES ::::::::::::::::::::::::::::::::: ix SYMBOLS :::::::::::::::::::::::::::::::::::::: xii ABBREVIATIONS :::::::::::::::::::::::::::::::::: xiii ABSTRACT ::::::::::::::::::::::::::::::::::::: xiv 1 INTRODUCTION :::::::::::::::::::::::::::::::: 1 1.1 Motivation :::::::::::::::::::::::::::::::::: 1 1.2 Contribution ::::::::::::::::::::::::::::::::: 4 2 BACKGROUND ::::::::::::::::::::::::::::::::: 5 3 CONCEPTS :::::::::::::::::::::::::::::::::::: 7 3.1 Introduction to Neural Network :::::::::::::::::::::: 7 3.2 Network training :::::::::::::::::::::::::::::: 8 3.3 Introduction to CNN :::::::::::::::::::::::::::: 9 3.3.1 Convolution layer :::::::::::::::::::::::::: 13 3.3.2 Nonlinearity layers ::::::::::::::::::::::::: 16 3.3.3 Normalization layer ::::::::::::::::::::::::: 16 3.3.4 Dropout layers ::::::::::::::::::::::::::: 16 3.3.5 Pooling layers :::::::::::::::::::::::::::: 17 3.3.6 Fully connected layers (FC) :::::::::::::::::::: 19 3.4 Benchmarked CNN architectures ::::::::::::::::::::: 20 3.4.1 LeNet :::::::::::::::::::::::::::::::: 20 3.4.2 AlexNet ::::::::::::::::::::::::::::::: 21 3.4.3 GoogLeNet ::::::::::::::::::::::::::::: 21 3.4.4 VGGNet ::::::::::::::::::::::::::::::: 21 vi Page 3.4.5 SqueezeNet ::::::::::::::::::::::::::::: 22 3.5 Datasets ::::::::::::::::::::::::::::::::::: 23 3.5.1 MNIST ::::::::::::::::::::::::::::::: 24 3.5.2 CIFAR-10 :::::::::::::::::::::::::::::: 24 3.5.3 ImageNet :::::::::::::::::::::::::::::: 24 4 COMPARISON BETWEEN DIFFERENT DEEP LEARNING FRAME- WORK ::::::::::::::::::::::::::::::::::::::: 26 4.1 TensorFlow ::::::::::::::::::::::::::::::::: 27 4.2 Keras ::::::::::::::::::::::::::::::::::::: 27 4.3 Caffe ::::::::::::::::::::::::::::::::::::: 27 4.4 Theano :::::::::::::::::::::::::::::::::::: 28 4.5 PyTorch ::::::::::::::::::::::::::::::::::: 28 5 TRAINING THE SQUEEZENET WITH THE CIFAR-10 DATASET ON PYTORCH :::::::::::::::::::::::::::::::::::: 29 5.1 Transfer learning :::::::::::::::::::::::::::::: 29 5.2 Training the SqueezeNet :::::::::::::::::::::::::: 30 6 PRUNING METHODS :::::::::::::::::::::::::::::: 33 6.1 Types of Pruning :::::::::::::::::::::::::::::: 34 7 PRUNING BASED ON L2 NORMALIZATION OF ACTIVATION MAPS : 41 7.1 Implementation algorithm ::::::::::::::::::::::::: 41 7.2 Results :::::::::::::::::::::::::::::::::::: 48 8 PRUNING BASED ON TAYLOR EXPANSION OF COST FUNCTION :: 49 8.1 Implementation algorithm ::::::::::::::::::::::::: 49 8.2 Results :::::::::::::::::::::::::::::::::::: 52 9 PRUNING BASED ON A COMBINATION OF TAYLOR EXPANSION OF COST FUNCTION & L2 NORMALIZATION OF ACTIVATION MAPS :: 59 9.1 Implementation algorithm ::::::::::::::::::::::::: 59 9.2 Results :::::::::::::::::::::::::::::::::::: 62 vii Page 10 HARDWARE DEPLOYMENT OF PRUNED SQUEEZENET MODEL ON BLUEBOX USING RTMAPS :::::::::::::::::::::::::: 68 10.1 NXP BlueBox :::::::::::::::::::::::::::::::: 69 10.1.1 S32V234 ::::::::::::::::::::::::::::::: 70 10.1.2 LS2084A ::::::::::::::::::::::::::::::: 71 10.2 Real-Time Multisensor applications (RTMaps) :::::::::::::: 72 10.2.1 RTMaps Runtime Engine ::::::::::::::::::::: 73 10.2.2 RTMaps Component Library ::::::::::::::::::: 74 10.2.3 RTMaps Studio ::::::::::::::::::::::::::: 74 10.2.4 RTMaps Embedded ::::::::::::::::::::::::: 74 10.3 Hardware Implementation ::::::::::::::::::::::::: 75 11 SUMMARY :::::::::::::::::::::::::::::::::::: 77 REFERENCES :::::::::::::::::::::::::::::::::::: 78 viii LIST OF TABLES Table Page 1.1 Types of layers in modern Convolution Neural Network.(Percentage wise) : 2 1.2 Number of parameters in the modern Convolution Neural Network archi- tecture. ::::::::::::::::::::::::::::::::::::: 3 7.1 Sensitivity to pruning the SqueezeNet model on CIFAR-10 dataset. :::: 42 8.1 Pruning Pattern per iteration of SqueezeNet Model using Taylor expansion- based criterion with pruning ratio = 67%. :::::::::::::::::: 57 8.2 Sensitivity to pruning the SqueezeNet model on CIFAR-10 dataset. :::: 58 ix LIST OF FIGURES Figure Page 1.1 Limited resources for embedded devices. ::::::::::::::::::: 2 3.1 Biological neuron (left) and its Mathematical model (right) [20]. :::::: 7 3.2 Artificial Neural Network which has three layers: Input, Hidden and Out- put layers. :::::::::::::::::::::::::::::::::::: 8 3.3 Cost function of CNN ::::::::::::::::::::::::::::: 10 3.4 Comparison of NN and CNN. ::::::::::::::::::::::::: 10 3.5 Fully Connected NN for Images. ::::::::::::::::::::::: 11 3.6 Locally Connected NN for Image ::::::::::::::::::::::: 11 3.7 Convolution layer for Image :::::::::::::::::::::::::: 12 3.8 CNN classifier example [20]. :::::::::::::::::::::::::: 12 3.9 Application of a single convolutional layer with N filters of size k × k × 3 with stride S = 1 to input data of size width height with three channels. : 14 3.10 Kernels in convolution layer. :::::::::::::::::::::::::: 14 3.11 Zero Padding :::::::::::::::::::::::::::::::::: 15 3.12 Convolution example with following parameters W = 7;K = 2;F = 3;S = 2;P = 1 [21] ::::::::::::::::::::::::::::::: 17 3.13 Rectified linear Unit function (ReLU) ::::::::::::::::::::: 18 3.14 Dropout layer :::::::::::::::::::::::::::::::::: 18 3.15 Pooling layer :::::::::::::::::::::::::::::::::: 19 3.16 Maxpool with kernel size = 2 and stride = 2 ::::::::::::::::: 19 3.17 Fully connected layer in CNN architecture. :::::::::::::::::: 20 3.18 LeNet-5 architecture [22]. ::::::::::::::::::::::::::: 20 3.19 AlexNet architecture. :::::::::::::::::::::::::::::: 21 3.20 GoogLeNet Network [2] (From Left to Right) :::::::::::::::: 22 3.21 VGG16 Architecture (From Left to Right) :::::::::::::::::: 22 x Figure Page 3.22 Fire layer of SqueezeNet Architecture [1]. ::::::::::::::::::: 24 4.1 Deep learning framework logos [23] [24] [25] [26] [27]. :::::::::::: 26 5.1 SqueezeNet model accuracy - trained from scratch on CIFAR-10 dataset. : 30 5.2 SqueezeNet model accuracy - (Pretrained model- ImageNet) on CIFAR-10 dataset. ::::::::::::::::::::::::::::::::::::: 31 5.3 Training loss for SqueezeNet model - (Trained from scratch) :::::::: 31 5.4 Training loss for SqueezeNet model - (Pretrained model) :::::::::: 31 5.5 Modified SqueezeNet architecture for CIFAR-10 dataset ::::::::::: 32 6.1 Vehicle classification with small CNN. :::::::::::::::::::: 36 6.2 Vehicle classification with large CNN. ::::::::::::::::::::: 37 6.3 Vehicle classification with large CNN and pruning. ::::::::::::: 38 6.4 Fine Pruning. :::::::::::::::::::::::::::::::::: 39 6.5 Coarse Pruning. ::::::::::::::::::::::::::::::::: 39 6.6 Coarse Pruning. ::::::::::::::::::::::::::::::::: 39 6.7 Pruning steps to reduce the model based on pruning criteria. :::::::: 40 7.1 Activation maps generated by convolution :::::::::::::::::: 41 7.2 Pruning steps to reduce the model based on L2 normalization of activation map. ::::::::::::::::::::::::::::::::::::::: 44 7.3 Pruning with L2 Normalization of activation with pruning ratio - 67% ::: 45 7.4 Pruning with L2 Normalization of activation with pruning ratio - 75% ::: 45 7.5 Pruning with L2 Normalization of activation with pruning ratio - 80% ::: 45 7.6 Pruning with L2 Normalization of activation with pruning ratio - 85% ::: 46 7.7 Pruning with L2 Normalization of activation with pruning ratio - 90% ::: 46 7.8 Pruning with L2 Normalization of activation with pruning ratio - 95% ::: 46 7.9 Accuracy vs Pruning ratio for L2 Normalization of activation map based pruning. ::::::::::::::::::::::::::::::::::::: 47 8.1 Cost function value before pruning and after pruning. :::::::::::: 50 8.2 Pruning with Taylor expansion based criteria with pruning ratio - 67% :: 52 8.3 Pruning with Taylor expansion based criteria with pruning ratio - 70% :: 52 xi Figure Page 8.4 Pruning with Taylor expansion based criteria with pruning ratio - 80% :: 53 8.5 Pruning with Taylor expansion based criteria with pruning ratio - 85% :: 53 8.6 Pruning with Taylor expansion based criteria with pruning ratio - 90% :: 54 8.7 Pruning
Recommended publications
  • Intelligent Recognition Method of Low-Altitude Squint Optical Ship Target Fused with Simulation Samples
    remote sensing Article Intelligent Recognition Method of Low-Altitude Squint Optical Ship Target Fused with Simulation Samples Bo Liu 1,2 , Qi Xiao 1,2, Yuhao Zhang 1,2, Wei Ni 1, Zhen Yang 1 and Ligang Li 1,* 1 National Space Science Center, Key Laboratory of Electronics and Information Technology for Space System, Chinese Academy of Sciences, Beijing 100190, China; [email protected] (B.L.); [email protected] (Q.X.); [email protected] (Y.Z.); [email protected] (W.N.); [email protected] (Z.Y.) 2 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China * Correspondence: [email protected]; Tel.: +86-1312-152-1820 Abstract: To address the problem of intelligent recognition of optical ship targets under low-altitude squint detection, we propose an intelligent recognition method based on simulation samples. This method comprehensively considers geometric and spectral characteristics of ship targets and ocean background and performs full link modeling combined with the squint detection atmospheric transmission model. It also generates and expands squint multi-angle imaging simulation samples of ship targets in the visible light band using the expanded sample type to perform feature analysis and modification on SqueezeNet. Shallow and deeper features are combined to improve the accuracy of feature recognition. The experimental results demonstrate that using simulation samples to expand the training set can improve the performance of the traditional k-nearest neighbors algorithm and modified SqueezeNet. For the classification of specific ship target types, a mixed-scene dataset Citation: Liu, B.; Xiao, Q.; Zhang, Y.; expanded with simulation samples was used for training.
    [Show full text]
  • NXP Eiq Machine Learning Software Development Environment For
    AN12580 NXP eIQ™ Machine Learning Software Development Environment for QorIQ Layerscape Applications Processors Rev. 3 — 12/2020 Application Note Contents 1 Introduction 1 Introduction......................................1 Machine Learning (ML) is a computer science domain that has its roots in 2 NXP eIQ software introduction........1 3 Building eIQ Components............... 2 the 1960s. ML provides algorithms capable of finding patterns and rules in 4 OpenCV getting started guide.........3 data. ML is a category of algorithm that allows software applications to become 5 Arm Compute Library getting started more accurate in predicting outcomes without being explicitly programmed. guide................................................7 The basic premise of ML is to build algorithms that can receive input data and 6 TensorFlow Lite getting started guide use statistical analysis to predict an output while updating output as new data ........................................................ 8 becomes available. 7 Arm NN getting started guide........10 8 ONNX Runtime getting started guide In 2010, the so-called deep learning started. It is a fast-growing subdomain ...................................................... 16 of ML, based on Neural Networks (NN). Inspired by the human brain, 9 PyTorch getting started guide....... 17 deep learning achieved state-of-the-art results in various tasks; for example, A Revision history.............................17 Computer Vision (CV) and Natural Language Processing (NLP). Neural networks are capable of learning complex patterns from millions of examples. A huge adaptation is expected in the embedded world, where NXP is the leader. NXP created eIQ machine learning software for QorIQ Layerscape applications processors, a set of ML tools which allows developing and deploying ML applications on the QorIQ Layerscape family of devices.
    [Show full text]
  • Lecture 10: Recurrent Neural Networks
    Lecture 10: Recurrent Neural Networks Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 1 May 4, 2017 Administrative A1 grades will go out soon A2 is due today (11:59pm) Midterm is in-class on Tuesday! We will send out details on where to go soon Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 2 May 4, 2017 Extra Credit: Train Game More details on Piazza by early next week Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 3 May 4, 2017 Last Time: CNN Architectures AlexNet Figure copyright Kaiming He, 2016. Reproduced with permission. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 4 May 4, 2017 Last Time: CNN Architectures Softmax FC 1000 Softmax FC 4096 FC 1000 FC 4096 FC 4096 Pool FC 4096 3x3 conv, 512 Pool 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 Pool Pool 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 Pool Pool 3x3 conv, 256 3x3 conv, 256 3x3 conv, 256 3x3 conv, 256 Pool Pool 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 Pool Pool 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 Input Input VGG16 VGG19 GoogLeNet Figure copyright Kaiming He, 2016. Reproduced with permission. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 5 May 4, 2017 Last Time: CNN Architectures Softmax FC 1000 Pool 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 relu 3x3 conv, 64 3x3 conv, 64 F(x) + x 3x3 conv, 64 ..
    [Show full text]
  • Image Retrieval Algorithm Based on Convolutional Neural Network
    Advances in Intelligent Systems Research, volume 133 2nd International Conference on Artificial Intelligence and Industrial Engineering (AIIE2016) Image Retrieval Algorithm Based on Convolutional Neural Network Hailong Liu 1, 2, Baoan Li 1, 2, *, Xueqiang Lv 1 and Yue Huang 3 1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China 2Computer School, Beijing Information Science and Technology University, Beijing 100101, China 3Xuanwu Hospital Capital Medical University, 100053, China *Corresponding author Abstract—With the rapid development of computer technology low, can’t meet the needs of people. In order to overcome this and the increasing of multimedia data on the Internet, how to difficulty, the researchers from the image itself, and put quickly find the desired information in the massive data becomes forward the method of image retrieval based on content. The a hot issue. Image retrieval can be used to retrieve similar images, method is to extract the visual features of the image content: and the effect of image retrieval depends on the selection of image color, texture, shape etc., the image database to be detected features to a certain extent. Based on deep learning, through self- samples for similarity matching, retrieval and sample images learning ability of a convolutional neural network to extract more are similar to the image. The main process of the method is the conducive to the high-level semantic feature of image retrieval selection and extraction of features, but there are "semantic using convolutional neural network, and then use the distance gap" [2] between low-level features and high-level semantic metric function similar image.
    [Show full text]
  • Fast Inference of Deep Neural Networks in Fpgas for Particle Physics (Arxiv:1804.06913)
    Fast Inference of Deep Neural Networks in FPGAs for Particle Physics (arXiv:1804.06913) Jennifer Ngadiuba, Maurizio Pierini [CERN] Javier Duarte, Sergo Jindariani, Ben Kreis, Ryan Rivera, Nhan Tran [Fermilab] Edward Kreinar [Hawkeye 360] Song Han, Phil Harris [MIT] Zhenbin Wu [University of Illinois at Chicago] Research Techniques Seminar Fermilab 4/24/2018 1 Outline • Introduction and Motivation • Machine Learning in HEP • FPGAs and High-Level Synthesis (HLS) • Industry Trends • HEP Latency Landscape • hls4ml: HLS for Machine Learning • Case Study and Design Exploration • Summary and Outlook • Cloud-scale Acceleration Javier Duarte I hls4ml 2 Introduction Javier Duarte I hls4ml 3 ReconstructionMachine chain:Learning Jet tagging in HEP Task to find the particle ID of a jet, e.g. b-quark • Learning optimized nonlinear functions of many inputs for … performingvertices difficult tasks from (real or simulated) data … Many successes in HEP: identificationData/MC of b-quarkuser jets, Higgs • Tag Info tagger corrections … candidates,Jet, particle energy regression, analysis selection, … particles CMS-PAS-BTV-15-001 s=13 TeV, 2016 1 CMS Simulation Preliminary Key features:tt events AK4jets (p > 30 GeV) T • Long lifetimeCSVv2 of heavy 10−1 flavour quarksDeepCSV cMVAv2 • Displacedmisid. probability tracks, … • Usage10−2 of ML standard for this problem udsg Neural network based on 10−3 • 11 c high-level features 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 b-jet efficiency Javier Duarte I hls4ml 4 Machine Learning in HEP • Learning optimized nonlinear functions
    [Show full text]
  • Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices
    Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices 1st Chunjie Luo 2nd Xiwen He 3nd Jianfeng Zhan Institute of Computing Technology Institute of Computing Technology Institute of Computing Technology Chinese Academy of Sciences Chinese Academy of Sciences Chinese Academy of Sciences BenchCouncil BenchCouncil BenchCouncil Beijing, China Beijing, China Beijing, China [email protected] [email protected] [email protected] 4nd Lei Wang 5nd Wanling Gao 6nd Jiahui Dai Institute of Computing Technology Institute of Computing Technology Beijing Academy of Frontier Chinese Academy of Sciences Chinese Academy of Sciences Sciences and Technology BenchCouncil BenchCouncil Beijing, China Beijing, China Beijing, China [email protected] wanglei [email protected] [email protected] Abstract—Due to increasing amounts of data and compute inference on the edge devices can 1) reduce the latency, 2) resources, deep learning achieves many successes in various protect the privacy, 3) consume the power [2]. domains. The application of deep learning on the mobile and em- To make the inference on the edge devices more efficient, bedded devices is taken more and more attentions, benchmarking and ranking the AI abilities of mobile and embedded devices the neural networks are designed more light-weight by using becomes an urgent problem to be solved. Considering the model simpler architecture, or by quantizing, pruning and compress- diversity and framework diversity, we propose a benchmark suite, ing the networks. Different networks present different trade- AIoTBench, which focuses on the evaluation of the inference offs between accuracy and computational complexity. These abilities of mobile and embedded devices.
    [Show full text]
  • LSTM-In-LSTM for Generating Long Descriptions of Images
    Computational Visual Media DOI 10.1007/s41095-016-0059-z Vol. 2, No. 4, December 2016, 379–388 Research Article LSTM-in-LSTM for generating long descriptions of images Jun Song1, Siliang Tang1, Jun Xiao1, Fei Wu1( ), and Zhongfei (Mark) Zhang2 c The Author(s) 2016. This article is published with open access at Springerlink.com Abstract In this paper, we propose an approach by means of text (description generation) is a for generating rich fine-grained textual descriptions of fundamental task in artificial intelligence, with many images. In particular, we use an LSTM-in-LSTM (long applications. For example, generating descriptions short-term memory) architecture, which consists of an of images may help visually impaired people better inner LSTM and an outer LSTM. The inner LSTM understand the content of images and retrieve images effectively encodes the long-range implicit contextual using descriptive texts. The challenge of description interaction between visual cues (i.e., the spatially- generation lies in appropriately developing a model concurrent visual objects), while the outer LSTM that can effectively represent the visual cues in generally captures the explicit multi-modal relationship images and describe them in the domain of natural between sentences and images (i.e., the correspondence language at the same time. of sentences and images). This architecture is capable There have been significant advances in of producing a long description by predicting one description generation recently. Some efforts word at every time step conditioned on the previously rely on manually-predefined visual concepts and generated word, a hidden vector (via the outer LSTM), and a context vector of fine-grained visual cues (via sentence templates [1–3].
    [Show full text]
  • The History Began from Alexnet: a Comprehensive Survey on Deep Learning Approaches
    > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 1 The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches Md Zahangir Alom1, Tarek M. Taha1, Chris Yakopcic1, Stefan Westberg1, Paheding Sidike2, Mst Shamima Nasrin1, Brian C Van Essen3, Abdul A S. Awwal3, and Vijayan K. Asari1 Abstract—In recent years, deep learning has garnered I. INTRODUCTION tremendous success in a variety of application domains. This new ince the 1950s, a small subset of Artificial Intelligence (AI), field of machine learning has been growing rapidly, and has been applied to most traditional application domains, as well as some S often called Machine Learning (ML), has revolutionized new areas that present more opportunities. Different methods several fields in the last few decades. Neural Networks have been proposed based on different categories of learning, (NN) are a subfield of ML, and it was this subfield that spawned including supervised, semi-supervised, and un-supervised Deep Learning (DL). Since its inception DL has been creating learning. Experimental results show state-of-the-art performance ever larger disruptions, showing outstanding success in almost using deep learning when compared to traditional machine every application domain. Fig. 1 shows, the taxonomy of AI. learning approaches in the fields of image processing, computer DL (using either deep architecture of learning or hierarchical vision, speech recognition, machine translation, art, medical learning approaches) is a class of ML developed largely from imaging, medical information processing, robotics and control, 2006 onward. Learning is a procedure consisting of estimating bio-informatics, natural language processing (NLP), cybersecurity, and many others.
    [Show full text]
  • An FPGA-Accelerated Embedded Convolutional Neural Network
    Master Thesis Report ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network (edit) (edit) 1000ch 1000ch FPGA 1000ch Network Analysis Network Analysis 2x512 > 1024 2x512 > 1024 David Gschwend [email protected] SqueezeNet v1.1 b2a ext7 conv10 2x416 > SqueezeNet SqueezeNet v1.1 b2a ext7 conv10 2x416 > SqueezeNet arXiv:2005.06892v1 [cs.CV] 14 May 2020 Supervisors: Emanuel Schmid Felix Eberli Professor: Prof. Dr. Anton Gunzinger August 2016, ETH Zürich, Department of Information Technology and Electrical Engineering Abstract Image Understanding is becoming a vital feature in ever more applications ranging from medical diagnostics to autonomous vehicles. Many applications demand for embedded solutions that integrate into existing systems with tight real-time and power constraints. Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracies in all image understanding benchmarks, but have a very high computational complexity. Embedded CNNs thus call for small and efficient, yet very powerful computing platforms. This master thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip. The ZynqNet Embedded CNN is designed for image classification on ImageNet and consists of ZynqNet CNN, an optimized and customized CNN topology, and the ZynqNet FPGA Accelerator, an FPGA-based architecture for its evaluation. ZynqNet CNN is a highly efficient CNN topology. Detailed analysis and optimization of prior topologies using the custom-designed Netscope CNN Analyzer have enabled a CNN with 84.5 % top-5 accuracy at a computational complexity of only 530 million multiply- accumulate operations. The topology is highly regular and consists exclusively of convolu- tional layers, ReLU nonlinearities and one global pooling layer.
    [Show full text]
  • Active Authentication Using an Autoencoder Regularized CNN-Based One-Class Classifier
    Active Authentication Using an Autoencoder Regularized CNN-based One-Class Classifier Poojan Oza and Vishal M. Patel Department of Electrical and Computer Engineering, Johns Hopkins University, 3400 N. Charles St, Baltimore, MD 21218, USA email: fpoza2, [email protected] Access Abstract— Active authentication refers to the process in Enrolled User Denied which users are unobtrusively monitored and authenticated Training Images continuously throughout their interactions with mobile devices. AA Access Generally, an active authentication problem is modelled as a Module Granted one class classification problem due to the unavailability of Trained Access data from the impostor users. Normally, the enrolled user is Denied considered as the target class (genuine) and the unauthorized users are considered as unknown classes (impostor). We propose Training a convolutional neural network (CNN) based approach for one class classification in which a zero centered Gaussian noise AA and an autoencoder are used to model the pseudo-negative Module class and to regularize the network to learn meaningful feature Test Images representations for one class data, respectively. The overall network is trained using a combination of the cross-entropy and (a) (b) the reconstruction error losses. A key feature of the proposed approach is that any pre-trained CNN can be used as the Fig. 1: An overview of a typical AA system. (a) Data base network for one class classification. Effectiveness of the corresponding to the enrolled user are used to train an AA proposed framework is demonstrated using three publically system. (b) During testing, data corresponding to the enrolled available face-based active authentication datasets and it is user as well as unknown user may be presented to the system.
    [Show full text]
  • Key Ideas and Architectures in Deep Learning Applications That (Probably) Use DL
    Key Ideas and Architectures in Deep Learning Applications that (probably) use DL Autonomous Driving Scene understanding /Segmentation Applications that (probably) use DL WordLens Prisma Outline of today’s talk Image Recognition Fun application using CNNs ● LeNet - 1998 ● Image Style Transfer ● AlexNet - 2012 ● VGGNet - 2014 ● GoogLeNet - 2014 ● ResNet - 2015 Questions to ask about each architecture/ paper Special Layers Non-Linearity Loss function Weight-update rule Train faster? Reduce parameters Reduce Overfitting Help you visualize? LeNet5 - 1998 LeNet5 - Specs MNIST - 60,000 training, 10,000 testing Input is 32x32 image 8 layers 60,000 parameters Few hours to train on a laptop Modified LeNet Architecture - Assignment 3 Training Input Forward pass Maxp Max Conv ReLU Conv ReLU FC ReLU Softmax ool pool Backpropagation - Labels Loss update weights Modified LeNet Architecture - Assignment 3 Testing Input Forward pass Maxp Max Conv ReLU Conv ReLU FC ReLU Softmax ool pool Output Compare output with labels Modified LeNet - CONV Layer 1 Input - 28 x 28 Output - 6 feature maps - each 24 x 24 Convolution filter - 5 x 5 x 1 (convolution) + 1 (bias) How many parameters in this layer? Modified LeNet - CONV Layer 1 Input - 32 x 32 Output - 6 feature maps - each 28 x 28 Convolution filter - 5 x 5 x 1 (convolution) + 1 (bias) How many parameters in this layer? (5x5x1+1)*6 = 156 Modified LeNet - Max-pooling layer Decreases the spatial extent of the feature maps, makes it translation-invariant Input - 28 x 28 x 6 volume Maxpooling with filter size 2
    [Show full text]
  • Compositional Falsification of Cyber-Physical Systems with Machine Learning Components
    Compositional Falsification of Cyber-Physical Systems with Machine Learning Components Tommaso Dreossi Alexandre Donze Sanjit A. Seshia Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2017-165 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-165.html November 26, 2017 Copyright © 2017, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Acknowledgement This technical report is an extended version of an article that appeared at NFM 2017 and is under submission to the Journal of Automated Reasoning (JAR). Noname manuscript No. (will be inserted by the editor) Compositional Falsification of Cyber-Physical Systems with Machine Learning Components Tommaso Dreossi · Alexandre Donz´e · Sanjit A. Seshia the date of receipt and acceptance should be inserted later Abstract Cyber-physical systems (CPS), such as automotive systems, are starting to include sophisticated machine learning (ML) components. Their correctness, therefore, depends on properties of the inner ML modules. While learning algorithms aim to generalize from examples, they are only as good as the examples provided, and recent efforts have shown that they can pro- duce inconsistent output under small adversarial perturbations. This raises the question: can the output from learning components lead to a failure of the entire CPS? In this work, we address this question by formulating it as a problem of falsifying signal temporal logic (STL) specifications for CPS with ML components.
    [Show full text]