A Hardware Accelerator for the Inference of a Convolutional Neural Network

Total Page:16

File Type:pdf, Size:1020Kb

A Hardware Accelerator for the Inference of a Convolutional Neural Network Editorial Ciencia e 2019 Neogranadina Ingeniería Vol. 30(1) Neogranadina enero - julio 2020 ■ ISSN: 0124-8170 ▪ ISSN-e: 1909-7735 ■ pp. 107-116 DOI: https://doi.org/10.18359/rcin.4194 A Hardware Accelerator for The Inference of a Convolutional Neural Network Edwin Gonzáleza ■ Walter D. Villamizar Lunab ■ Carlos Augusto Fajardo Arizac Abstract: Convolutional Neural Networks (CNNs) are becoming increasingly popular in deep learning applications, e.g. image classification, speech recognition, medicine, to name a few. However, CNN inference is computationally intensive and demands a large number of memory resources. This work proposes a CNN inference hardware accelerator, which was implemented in a co-processing scheme. The aim is to reduce hardware resources and achieve the best possible throughput. The design is implemented in the Digilent Arty Z7-20 development board, which is based on the Xilinx Zynq-7000 System on Chip (SoC). Our implementation achieved a of accuracy for the MNIST database using only a 12-bits fixed-point format. Results show that the co-processing scheme operating at a conserva- tive speed of 100 MHz can identify around 441 images per second, which is about 17 % times faster than a 650 MHz - software implementation. It is difficult to compare our results against other Field- Programmable Gate Array (FPGA)-based implementations because they are not exactly like ours. However, some comparisons, regarding logical resources used and accuracy, suggest that our work could be better than previous ones. Besides, the proposed scheme is compared with a hardware implementation in terms of power consumption and throughput. Keywords: CNN; FPGA; hardware accelerator; MNIST; Zynq Received: July 12th, 2019 Accepted: October 31st, 2019 Available online: July 15th, 2020 How to cite: E. González, W. D. Villamizar Luna, and C. A. Fajardo Ariza, “A Hardware Accelerator for the Inference of a Convolutional Neural network”, Cien.Ing.Neogranadina, vol. 30, no. 1, pp. 107-116, Nov. 2019. a Universidad Industrial de Santander. E-mail: [email protected] ORCID: https://orcid.org/0000-0003-2217-9817 b Universidad Industrial de Santander. E-mail: [email protected] ORCID: https://orcid.org/0000-0003-4341-8020 c Universidad Industrial de Santander. E-mail: [email protected] ORCID: https://orcid.org/0000-0002-8995-4585 107 Acelerador en hardware para la inferencia de una red neuronal convolucional Resumen: las redes neuronales convolucionales cada vez son más populares en aplicaciones de aprendizaje profundo, como por ejemplo en clasificación de imágenes, reconocimiento de voz, me- dicina, entre otras. Sin embargo, estas redes son computacionalmente costosas y requieren altos recursos de memoria. En este trabajo se propone un acelerador en hardware para el proceso de inferencia de la red Lenet-5, un esquema de coprocesamiento hardware/software. El objetivo de la implementación es reducir el uso de recursos de hardware y obtener el mejor rendimiento computa- cional posible durante el proceso de inferencia. El diseño fue implementado en la tarjeta de desarro- llo Digilent Arty Z7-20, la cual está basada en el System on Chip (SoC) Zynq-7000 de Xilinx. Nuestra implementación logró una precisión del 97,59 % para la base de datos MNIST utilizando tan solo 12 bits en el formato de punto fijo. Los resultados muestran que el esquema de co-procesamiento, el cual opera a una velocidad de 100 MHz, puede identificar aproximadamente 441 imágenes por se- gundo, que equivale aproximadamente a un 17 % más rápido que una implementación de software a 650 MHz. Es difícil comparar nuestra implementación con otras implementaciones similares, porque las implementaciones encontradas en la literatura no son exactamente como la que realizó en este trabajo. Sin embargo, algunas comparaciones, en relación con el uso de recursos lógicos y la preci- sión, sugieren que nuestro trabajo supera a trabajos previos. Palabras clave: CNN; FPGA; acelerador en hardware; MNIST, Zynq 108 Revista Ciencia e Ingeniería Neogranadina ■ Vol. 30(1) It is important to note that most of the works 1 Introduction mentioned above have used High-Level Syn- Image classification has been widely used in many thesis (HLS) software. This tool allows creating a areas of the industry. Convolutional neural net- software accelerator directly, without the need to works (CNNs) have achieved high accuracy and manually create a Register Transfer Level (RTL). robustness for image classification (e.g. Lenet-5 [1], However, HLS software generally causes higher GoogLenet [2]). CNNs have numerous potential hardware resource utilization, which explains why applications in object recognition [3, 4], face detec- our implementation required less logical resources tion [5], and medical applications [6], among oth- than previous works. ers. In contrast, modern models of CNNs demand In this work, we implemented an RTL architec- a high computational cost and large memory ture for Lenet-5 inference, which was described by resources. This high demand is due to the multiple using directly Hardware Description Language arithmetic operations to be solved and the huge (HDL) (eg. Verilog). It aims to achieve low hard- amount of parameters to be saved. Thus, large ware resource utilization and high throughput. computing systems and high energy dissipation We have designed a Software/Hardware SW( /HW) are required. co-processing to reduce hardware resources. The Therefore, several hardware accelerators have work established the number of bits in a fixed- been proposed in recent years to achieve low point representation that achieves the best ratio power usage and high performance [7-10]. In [11] between accuracy/number of bits. The implemen- an FPGA-based accelerator is proposed, which tation was done using 12-bits fixed-point on the is implemented in Xilinx Zynq-7020 FPGA. They Zynq platform. Our results show that there is not a implement three different architectures: 32-bit significant decrease in accuracy besides low hard- floating, 20-bits fixed point format and a binarized ware resource usage. (one bit) version, which achieved an accuracy of This paper is organized as follows: Section 2 96.33 %, 94.67 %, and 88 %, respectively. In [12] an describes CNNs; Section 3 explains the proposed automated design methodology was proposed to scheme; Section 4 presents details of the implemen- perform a different subset ofCNN convolutional tation and the results of our work; Section 5 dis- layers into multiple processors by partitioning cusses results and some ideas for further research; available FPGA resources. Results achieved a 3.8x, And finally, Section 6 concludes this paper. 2.2x and 2.0x higher throughput than previous works in AlexNet, SqueezeNet, GoogLenet, respec- tively. In [13] a CNN for a low-power embedded 2 Convolutional neural networks system was proposed. Results achieved 2x energy CNNs allow the extraction of features from input efficiency compared with GPU implementation. In data to classify them into a set of pre-established [14] some methods are proposed to optimize CNNs categories. To classify data CNNs should be trained. regarding energy efficiency and high through- The training process aims at fitting the parameters put. In this work, an FPGA-based CNN for Lenet-5 to classify data with the desired accuracy. During was implemented on the Zynq-7000 platform. It the training process, many input data are pre- achieved a 37 % higher throughput and 93.7 % less sented to the CNN with the respective labels. Then energy dissipation than GPU implementation, and a gradient-based learning algorithm is executed to the same error rate of 0.99 % in software imple- minimize a loss function by updating CNN param- mentation. In [15] a six-layer accelerator is imple- eters (weights and biases) [1]. The loss function mented for MNIST digit recognition, which uses 25 evaluates the inconsistency between the predicted bits and achieves an accuracy of 98.62 %. In [16] a label and the current label. 5-layer accelerator is implemented for MNIST digit A CNN consists of a series of layers that run recognition, which uses 11 bits and achieves an sequentially. The output of a specific layer is the accuracy of 96.8 %. input of the subsequent layer. TheCNN typically A Hardware Accelerator for The Inference of a Convolutional Neural Network 109 Revista Ciencia e Ingeniería Neogranadina ■ Vol. 30(1) uses three types of layers: convolutional, sub-sam- Rectifier Linear Unit (ReLu), Sigmoid). The con- pling and fully connected layers. Convolutional volutional layer takes the feature maps in the pre- layers extract features from the input image. vious layer with depth and convolves them with Sub-sampling layers reduce both spatial size and kernels of the same depth. Then, the bias is added computational complexity. Finally, fully connected to the convolution output and this result passes layers classify data. through an activation function, in this case, ReLu [17]. Kernels shift into the input with a stride of one 2.1 Lenet-5 architecture to obtain an output feature map. Convolutional The Lenet-5 architecture [2] includes seven layers layers have feature maps of n-k+1×n-k+1 pixels. (Figure 1): three convolutional layers (C1, C3, and Note that weights and bias parameters are C5), two sub-sampling layers, (S2 and S4) and two not found in the S2 and S4 layers. The output of a fully connected layers (F6 and OUTPUT). sub-sampling layer is obtained by taking the max- In our implementation, the size of the input imum value from a batch of the feature map in the image is 28×28 pixels. Table 1 shows the dimen- preceding layer. A fully connected layer has fea- sions of inputs, feature maps, weights, and biases ture maps. Each feature map results from the dot for each layer. product between the input vector of the layer and Convolutional layers consist of kernels (matrix the weight vector. The input and weight vectors of weights), biases and an activation function (eg.
Recommended publications
  • Persian Handwritten Digit Recognition Using Combination of Convolutional Neural Network and Support Vector Machine Methods
    572 The International Arab Journal of Information Technology, Vol. 17, No. 4, July 2020 Persian Handwritten Digit Recognition Using Combination of Convolutional Neural Network and Support Vector Machine Methods Mohammad Parseh, Mohammad Rahmanimanesh, and Parviz Keshavarzi Faculty of Electrical and Computer Engineering, Semnan University, Iran Abstract: Persian handwritten digit recognition is one of the important topics of image processing which significantly considered by researchers due to its many applications. The most important challenges in Persian handwritten digit recognition is the existence of various patterns in Persian digit writing that makes the feature extraction step to be more complicated.Since the handcraft feature extraction methods are complicated processes and their performance level are not stable, most of the recent studies have concentrated on proposing a suitable method for automatic feature extraction. In this paper, an automatic method based on machine learning is proposed for high-level feature extraction from Persian digit images by using Convolutional Neural Network (CNN). After that, a non-linear multi-class Support Vector Machine (SVM) classifier is used for data classification instead of fully connected layer in final layer of CNN. The proposed method has been applied to HODA dataset and obtained 99.56% of recognition rate. Experimental results are comparable with previous state-of-the-art methods. Keywords: Handwritten Digit Recognition, Convolutional Neural Network, Support Vector Machine. Received January 1, 2019; accepted November 11, 2019 https://doi.org/10.34028/iajit/17/4/16 1. Introduction years, they are good alternatives to handcraft feature extraction method. Optical Character Recognition (OCR) is one of the attractive topics of Artificial Intelligence [3, 6, 15, 23, 24].
    [Show full text]
  • Advancements in Image Classification Using Convolutional Neural Network
    This paper has been accepted and presented on 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks. This is preprint version and original proceeding will be published in IEEE Xplore. Advancements in Image Classification using Convolutional Neural Network Farhana Sultana Abu Sufian Paramartha Dutta Department of Computer Science Department of Computer Science Department of CSS University of Gour Banga University of Gour Banga Visva-Bharati University West Bengal, India West Bengal, India West Bengal, India Email: [email protected] Email: sufi[email protected] Email: [email protected] Abstract—Convolutional Neural Network (CNN) is the state- LeCun et al. introduced the practical model of CNN [6] [7] of-the-art for image classification task. Here we have briefly and developed LeNet-5 [8]. Training by backpropagation [9] discussed different components of CNN. In this paper, We have algorithm helped LeNet-5 recognizing visual patterns from explained different CNN architectures for image classification. Through this paper, we have shown advancements in CNN from raw pixels directly without using any separate feature engi- LeNet-5 to latest SENet model. We have discussed the model neering mechanism. Also fewer connections and parameters description and training details of each model. We have also of CNN than conventional feedforward neural networks with drawn a comparison among those models. similar network size, made model training easier. But at that Keywords—AlexNet, Capsnet, Convolutional Neural Network, time in spite of several advantages, the performance of CNN Deep learning, DenseNet, Image classification, ResNet, SENet. in intricate problems such as classification of high-resolution image, was limited by the lack of large training data, lack of I.
    [Show full text]
  • Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
    Global Sparse Momentum SGD for Pruning Very Deep Neural Networks Xiaohan Ding 1 Guiguang Ding 1 Xiangxin Zhou 2 Yuchen Guo 1, 3 Jungong Han 4 Ji Liu 5 1 Beijing National Research Center for Information Science and Technology (BNRist); School of Software, Tsinghua University, Beijing, China 2 Department of Electronic Engineering, Tsinghua University, Beijing, China 3 Department of Automation, Tsinghua University; Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing, China 4 WMG Data Science, University of Warwick, Coventry, United Kingdom 5 Kwai Seattle AI Lab, Kwai FeDA Lab, Kwai AI platform [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] Abstract Deep Neural Network (DNN) is powerful but computationally expensive and memory intensive, thus impeding its practical usage on resource-constrained front- end devices. DNN pruning is an approach for deep model compression, which aims at eliminating some parameters with tolerable performance degradation. In this paper, we propose a novel momentum-SGD-based optimization method to reduce the network complexity by on-the-fly pruning. Concretely, given a global compression ratio, we categorize all the parameters into two parts at each training iteration which are updated using different rules. In this way, we gradually zero out the redundant parameters, as we update them using only the ordinary weight decay but no gradients derived from the objective function. As a departure from prior methods that require heavy human works to tune the layer-wise sparsity ratios, prune by solving complicated non-differentiable problems or finetune the model after pruning, our method is characterized by 1) global compression that automatically finds the appropriate per-layer sparsity ratios; 2) end-to-end training; 3) no need for a time-consuming re-training process after pruning; and 4) superior capability to find better winning tickets which have won the initialization lottery.
    [Show full text]
  • Arxiv:1907.08798V1 [Astro-Ph.SR] 20 Jul 2019 Keywords: Sun: Coronal Mass Ejections (Cmes) - Techniques: Image Processing
    Draft version July 23, 2019 Typeset using LATEX preprint style in AASTeX62 A New Automatic Tool for CME Detection and Tracking with Machine Learning Techniques Pengyu Wang,1 Yan Zhang,1 Li Feng,2 Hanqing Yuan,1 Yuan Gan,1 Shuting Li,2, 3 Lei Lu,2 Beili Ying,2, 3 Weiqun Gan,2 and Hui Li2 1Department of Computer Science and Technology, Nanjing University, 210023 Nanjing, China 2Key Laboratory of Dark Matter and Space Astronomy, Purple Mountain Observatory, Chinese Academy of Sciences, 210034 Nanjing, China 3School of Astronomy and Space Science, University of Science and Technology of China, Hefei, Anhui 230026, China Submitted to ApJS ABSTRACT With the accumulation of big data of CME observations by coronagraphs, automatic detection and tracking of CMEs has proven to be crucial. The excellent performance of convolutional neural network in image classification, object detection and other com- puter vision tasks motivates us to apply it to CME detection and tracking as well. We have developed a new tool for CME Automatic detection and tracking with MachinE Learning (CAMEL) techniques. The system is a three-module pipeline. It is first a su- pervised image classification problem. We solve it by training a neural network LeNet with training labels obtained from an existing CME catalog. Those images containing CME structures are flagged as CME images. Next, to identify the CME region in each CME-flagged image, we use deep descriptor transforming to localize the common object in an image set. A following step is to apply the graph cut technique to finely tune the detected CME region.
    [Show full text]
  • Evaluation of CNN Models with Fashion MNIST Data
    Iowa State University Capstones, Theses and Creative Components Dissertations Summer 2019 Evaluation of CNN Models with Fashion MNIST Data Yue Zhang [email protected] Follow this and additional works at: https://lib.dr.iastate.edu/creativecomponents Part of the Data Storage Systems Commons Recommended Citation Zhang, Yue, "Evaluation of CNN Models with Fashion MNIST Data" (2019). Creative Components. 364. https://lib.dr.iastate.edu/creativecomponents/364 This Creative Component is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Creative Components by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Evaluation of CNN Models with Fashion MNIST Data by Yue Zhang A report submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Major: Electrical Engineering Minor: Statistics Program of Study Committee: Joseph Zambreno, Major Professor Alicia L Carriquiry Iowa State University Ames, Iowa 2019 Copyright c Yue Zhang, 2019. All rights reserved. ii TABLE OF CONTENTS Page LIST OF FIGURES......................................... iii ABSTRACT............................................. iv CHAPTER 1. INTRODUCTION.................................1 CHAPTER 2. BACKGROUND..................................2 CHAPTER 3. MODEL EXPLORATION.............................6 3.1 Kaggle Data.........................................6 3.2 Data Preparation......................................7 3.3 Training Data........................................8 3.4 Four Models for Training and Testing (Results and Analysis).............9 3.4.1 LeNet-5.......................................9 3.4.2 AlexNet-11.....................................9 3.4.3 VGG-16....................................... 10 3.4.4 ResNet-20...................................... 11 3.5 Comparing Results of Four Models...........................
    [Show full text]
  • Introduction to Machine Learning CMU-10701 Deep Learning
    Introduction to Machine Learning CMU-10701 Deep Learning Barnabás Póczos & Aarti Singh Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey Hinton Yann LeCun 2 Contents Definition and Motivation History of Deep architectures Deep architectures Convolutional networks Deep Belief networks Applications 3 Deep architectures Defintion: Deep architectures are composed of multiple levels of non-linear operations, such as neural nets with many hidden layers. Output layer Hidden layers Input layer 4 Goal of Deep architectures Goal: Deep learning methods aim at . learning feature hierarchies . where features from higher levels of the hierarchy are formed by lower level features. edges, local shapes, object parts Low level representation 5 Figure is from Yoshua Bengio Neurobiological Motivation Most current learning algorithms are shallow architectures (1-3 levels) (SVM, kNN, MoG, KDE, Parzen Kernel regression, PCA, Perceptron,…) The mammal brain is organized in a deep architecture (Serre, Kreiman, Kouh, Cadieu, Knoblich, & Poggio, 2007) (E.g. visual system has 5 to 10 levels) 6 Deep Learning History Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks. No successful attempts were reported before 2006 … Researchers reported positive experimental results with typically two or three levels (i.e. one or two hidden layers), but training deeper networks consistently yielded poorer results. Exception: convolutional neural networks, LeCun 1998 SVM: Vapnik and his co-workers developed the Support Vector Machine (1993). It is a shallow architecture. Digression: In the 1990’s, many researchers abandoned neural networks with multiple adaptive hidden layers because SVMs worked better, and there was no successful attempts to train deep networks.
    [Show full text]
  • Introduction to Machine Learning Deep Learning
    Introduction to Machine Learning Deep Learning Barnabás Póczos Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey Hinton Yann LeCun 2 Contents Definition and Motivation Deep architectures Convolutional networks Applications 3 Deep architectures Defintion: Deep architectures are composed of multiple levels of non-linear operations, such as neural nets with many hidden layers. Output layer Hidden layers Input layer 4 Goal of Deep architectures Goal: Deep learning methods aim at ▪ learning feature hierarchies ▪ where features from higher levels of the hierarchy are formed by lower level features. edges, local shapes, object parts Low level representation 5 Figure is from Yoshua Bengio Theoretical Advantages of Deep Architectures Some complicated functions cannot be efficiently represented (in terms of number of tunable elements) by architectures that are too shallow. Deep architectures might be able to represent some functions otherwise not efficiently representable. More formally: Functions that can be compactly represented by a depth k architecture might require an exponential number of computational elements to be represented by a depth k − 1 architecture The consequences are ▪ Computational: We don’t need exponentially many elements in the layers ▪ Statistical: poor generalization may be expected when using an insufficiently deep architecture for representing some functions. 9 Theoretical Advantages of Deep Architectures The Polynomial circuit: 10 Deep Convolutional Networks 11 Deep Convolutional Networks Compared to standard feedforward neural networks with similarly-sized layers, ▪ CNNs have much fewer connections and parameters ▪ and so they are easier to train, ▪ while their theoretically-best performance is likely to be only slightly worse.
    [Show full text]
  • Architecture for Real-Time, Low-Swap Embedded Vision Using Fpgas
    University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Masters Theses Graduate School 12-2016 Architecture for Real-Time, Low-SWaP Embedded Vision Using FPGAs Steven Andrew Clukey University of Tennessee, Knoxville, [email protected] Follow this and additional works at: https://trace.tennessee.edu/utk_gradthes Part of the Other Computer Engineering Commons Recommended Citation Clukey, Steven Andrew, "Architecture for Real-Time, Low-SWaP Embedded Vision Using FPGAs. " Master's Thesis, University of Tennessee, 2016. https://trace.tennessee.edu/utk_gradthes/4281 This Thesis is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Masters Theses by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a thesis written by Steven Andrew Clukey entitled "Architecture for Real-Time, Low-SWaP Embedded Vision Using FPGAs." I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Science, with a major in Computer Engineering. Mongi A. Abidi, Major Professor We have read this thesis and recommend its acceptance: Seddik M. Djouadi, Qing Cao, Ohannes Karakashian Accepted for the Council: Carolyn R. Hodges Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) Architecture for Real-Time, Low-SWaP Embedded Vision Using FPGAs A Thesis Presented for the Master of Science Degree The University of Tennessee, Knoxville Steven Andrew Clukey December 2016 Copyright © 2016 by Steven Clukey All rights reserved.
    [Show full text]
  • Lecture 7 Convolutional Neural Networks CMSC 35246: Deep Learning
    Lecture 7 Convolutional Neural Networks CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 17, 2017 Lecture 7 Convolutional Neural Networks CMSC 35246 We saw before: y^ x1 x2 x3 x4 A series of matrix multiplications: T T T T x 7! W1 x 7! h1 = f(W1 x) 7! W2 h1 7! h2 = f(W2 h1) 7! T T T W3 h2 7! h3 = f(W3 h3) 7! W4 h3 =y ^ Lecture 7 Convolutional Neural Networks CMSC 35246 Convolutional Networks Neural Networks that use convolution in place of general matrix multiplication in atleast one layer Next: • What is convolution? • What is pooling? • What is the motivation for such architectures (remember LeNet?) Lecture 7 Convolutional Neural Networks CMSC 35246 LeNet-5 (LeCun, 1998) The original Convolutional Neural Network model goes back to 1989 (LeCun) Lecture 7 Convolutional Neural Networks CMSC 35246 AlexNet (Krizhevsky, Sutskever, Hinton 2012) ImageNet 2012 15:4% error rate Lecture 7 Convolutional Neural Networks CMSC 35246 Convolutional Neural Networks Figure: Andrej Karpathy Lecture 7 Convolutional Neural Networks CMSC 35246 Now let's deconstruct them... Lecture 7 Convolutional Neural Networks CMSC 35246 Convolution Kernel w7 w8 w9 w4 w5 w6 w1 w2 w3 Feature Map Grayscale Image Convolve image with kernel having weights w (learned by backpropagation) Lecture 7 Convolutional Neural Networks CMSC 35246 Convolution wT x Lecture 7 Convolutional Neural Networks CMSC 35246 Convolution Lecture 7 Convolutional Neural Networks CMSC 35246 Convolution wT x Lecture 7 Convolutional Neural Networks CMSC 35246 Convolution
    [Show full text]
  • Deep Learning
    Deep Learning Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 521, 436–444 (28 May 2015) doi:10.1038/nature14539 Authors’ Relationships Michael I. Jordan 1956, UC Berkeley Geoffrey Hinton 1947, Google & U of T, BP PhD 92.9-93.10 >200 papers Andrew NG(吴恩达) PhD postdoc 1976, Stanford, Coursera Google Brain à Baidu Brain AT&T colleague Yann LeCun Yoshua Bengio 1960, Facebook & NYU, 1964, UdeM, RNN & NLP CNN & LeNet postdoc postdoc PhD PhD Abstraction Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech. Deep Learning • Definition • Applied Domains • Speech recognition, … • Mechanism • Networks • Deep Convolutional Nets (CNN) • Deep Recurrent Nets (RNN) Applied Domains • Speech Recognition • Speech à Words • Visual Object Recognition • ImageNet (car, dog) • Object Detection • Face detection • pedestrian detection • Drug Discovery • Predict drug activity • Genomics • Deep Genomics company Conventional Machine Learning • Limited in their ability to process natural data in their raw form. • Feature!!! • Coming up with features is difficult, time-consuming, requires expert knowledge.
    [Show full text]
  • DEEP LEARNING – a PERSONAL PERSPECTIVE Urs Muller MIT 6.S191 Introduction to Deep Learning NEURAL NETWORK Training Data MAGIC
    DEEP LEARNING – A PERSONAL PERSPECTIVE Urs Muller MIT 6.S191 Introduction To Deep Learning NEURAL NETWORK Training data MAGIC Random Learning network algorithm architecture Excellent results 2 THE REAL MAGIC OF DEEP LEARNING Let’s us solve problems we don’t know how to program 3 NEURAL NETWORKS RESEARCH AT BELL LABS, HOLMDEL 1985 – 1995 4 BELL LABS BUILDING IN HOLMDEL Now called Bell Works Around 1990: 6,000 employees in Holmdel ~300 in Research ~30 in machine learning, including: Larry Jackel, Yann LeCun, Leon Bottou, John Denker, Vladimir Vapnik, Yoshua Bengio, Hans Peter Graf, Patrice Simard, Corinna Cortes, and many others 5 ORIGINAL DATABASE ~300 DIGITS Demonstrated need for large database for benchmarking → USPS database → MNIST database (60,000 digits) 6 USING PRIOR KNOWLEDGE What class is point “X”? If representation is north, south, east, west, then choose green If better representation is elevation above sea level choose red x Using prior knowledge beats requiring tons of training data Example: use convolutions for OCR 7 CONVOLUTIONAL NEURAL NETWORK (CNN) 8 LENET 1993 9 HOW WE CAME TO VIEW LEARNING Largely Vladimir Vapnik’s infuence Choose the right structure – “Structural Risk Minimization” → Bring prior knowledge to the learning machine Capacity control – matching learning machine complexity to available data → Examine learning curves 10 LEARNING CURVES Error If test error >> training error: get more training examples or decrease capacity If test error ~ training error: Test set error increase capacity Smart structure allows
    [Show full text]
  • 6. Convolutional Neural Networks CS 519 Deep Learning, Winter 2017 Fuxin Li
    6. Convolutional Neural Networks CS 519 Deep Learning, Winter 2017 Fuxin Li With materials from Zsolt Kira Quiz coming up… • Next Thursday (2/2) • 20 minutes • Topics: • Optimization • Basic neural networks • No Convolutional nets in this quiz The Image Classification Problem (Multi-label in principle) ML “grass” ML “motorcycle” ML “person” “panda” “dog” 3 Neural Networks • Extremely high dimensionality! • 256x256 image has already 65,536 * 3 dimensions • One hidden layer with 500 hidden units require 65,536 * 3 * 500 connections (98 Million parameters) Challenges in Image Classification Structure between neighboring pixels in natural images The correlation prior for horizontal and vertical shifts (averaged over 1000 images) looks like this: Takeaways: 1) Long-range correlation 2) Local correlation stronger than non-local The convolution operator Sobel filter Convolution * Convolution 7 ∗ 2D Convolution with Padding 0 0 0 0 0 0 1 3 1 0 -2 -2 1 0 0 -1 1 0 -2 0 1 = 0 2 2 -1 0 1 1 1 0 0 0 0 0 ∗ 2D Convolution with Padding 0 0 0 0 0 0 1 3 1 0 -2 -2 1 2 0 0 -1 1 0 -2 0 1 = 0 2 2 -1 0 1 1 1 0 0 0 0 0 3 × 1 +∗ 1 × 1 = 2 − 2D Convolution with Padding 0 0 0 0 0 0 1 3 1 0 -2 -2 1 2 -1 0 0 -1 1 0 -2 0 1 = 0 2 2 -1 0 1 1 1 0 0 0 0 0 1 × 2∗+ 1 × 1 + 1 × 1 + 1 × 1 = 1 − − − 2D Convolution with Padding 0 0 0 0 0 0 1 3 1 0 -2 -2 1 2 -1 -6 0 0 -1 1 0 -2 0 1 = 0 2 2 -1 0 1 1 1 0 0 0 0 0 ∗ What if: 0 0 3 3 0 0 1 3 1 0 -2 -2 1 2 -1 -18 0 0 -1 1 0 -2 0 1 = 0 2 2 -1 0 1 1 1 0 0 0 0 0 ∗ 2D Convolution with Padding 0 0 0 0 0 0 1 3 1 0 -2 -2 1 2 -1 -6 0 0 -1
    [Show full text]