Introduction to Machine Learning

Amo G. Tong

Lecture *
Deep Learning
• An Introduction
• Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville
• Some materials are courtesy of .
• All pictures belong to their creators.

Deep Learning
• What are deep learning methods?
  • Using a complex neural network to approximate the function we want to learn.
• Story: ImageNet object recognition contest.
  • LeNet-5 (1998): 7-level convolutional network.
  • AlexNet (2012): more layers and filters. Trained for 6 days on two GPUs. Error rate: 15.3% (reduced from 26.2%).
  • ResNet (2015): 152 layers with residual connections. Error rate: 3.57%.
  • Source: https://medium.com/analytics-vidhya/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5
• Why do we need deep learning?
  • We need to solve complex real-world problems.
  • It gives a standard way to parameterize functions: layer by layer.
  • It is flexible enough to be customized for different applications:
    • Image processing
    • Sequence data
  • It comes with standard training methods: backpropagation.
• Why now?
  • Availability of data
  • Hardware: GPUs, …
  • Software: TensorFlow, PyTorch, …

Outline
• General Deep Feedforward Network.
• Convolutional Neural Network (CNN): image processing.
• Recurrent Neural Network (RNN): sequence data processing.

Deep Feedforward Network
• Training, Design, and Regularization
[Figure: a feedforward network with an input layer, hidden layers $h$, and an output layer.]

Deep Feedforward Network
• Feedforward Network (Multi-Layer Perceptron (MLP)) with more layers.
• Training
• Design
• Regularization

Deep Feedforward Network: Training
• Cost function: $J(\theta)$
• Common choice: cross-entropy
  • It measures the difference between two distributions $P$ and $Q$.
  • $H(P, Q) = -\mathbb{E}_{x \sim P} \log Q(x)$
  • KL divergence: $D(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$
• $J(\theta) = -\mathbb{E}_{(x, y) \sim D_{\text{train}}} \log P_{\text{model}}(y \mid x)$ (negative log-likelihood)
  • $P_{\text{model}}(y \mid x)$ changes from model to model.
  • Least-squares error: assume a Gaussian $P_{\text{model}}$ and use MLE.
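The cross-entropy and KL-divergence definitions above can be checked numerically on small discrete distributions. The following is a minimal sketch in plain Python, not part of the original slides; the distributions `P` and `Q` and all function names are made up for illustration. It also shows the standard identity $H(P, Q) = H(P) + D(P \| Q)$, which is why minimizing cross-entropy in $Q$ is the same as minimizing the KL divergence from $P$.

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log Q(x) for discrete distributions given as lists.

    Terms with P(x) = 0 contribute nothing and are skipped.
    Assumes Q(x) > 0 wherever P(x) > 0 (otherwise math.log raises ValueError).
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D(P || Q) = sum_x P(x) log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """H(P) = H(P, P)."""
    return cross_entropy(p, p)

def negative_log_likelihood(probs_of_true_label):
    """J = average of -log P_model(y | x), given the probability the model
    assigned to the observed label of each training example."""
    return -sum(math.log(p) for p in probs_of_true_label) / len(probs_of_true_label)

# Illustrative (hypothetical) distributions over three outcomes.
P = [0.5, 0.25, 0.25]
Q = [0.4, 0.4, 0.2]

gap = cross_entropy(P, Q) - (entropy(P) + kl_divergence(P, Q))
print("H(P,Q) - (H(P) + D(P||Q)) =", gap)
```

Because $H(P)$ does not depend on the model, the two objectives differ only by a constant.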
Deep Feedforward Network: Training
• Gradient-based training
  • The cost function is non-convex because we have nonlinear units.
  • We iteratively decrease the cost function; the result is not guaranteed to be globally optimal, or even locally optimal.
  • Initialize the parameters with small random numbers.
  • Compute the gradients with backpropagation.
  • One issue: the gradient must be large and predictable.
• Stochastic gradient descent (SGD) algorithm:
    While the stopping criterion is not met:
      Sample a minibatch $D_m$ of $m$ examples.
      Compute the gradient $g \leftarrow \sum_{x \in D_m} \nabla_\theta J(\theta, x)$.
      Update $\theta \leftarrow \theta - \epsilon g$.

Deep Feedforward Network: Design
• Output units: they map the hidden representation $h$, a real vector, to the output.
• Regression problem: $\Pr[y \mid x]$.
  • Affine transformation: $y_{\text{out}} = W^T h + b$
  • $J(\theta) = \mathbb{E}_{(x, y) \sim D_{\text{train}}} \| y - y_{\text{out}} \|^2$
• Binary classification
  • Sigmoid $\sigma$: $y_{\text{out}} = \sigma(W^T h + b)$
  • $J(\theta) = -\mathbb{E}_{(x, y) \sim D_{\text{train}}} \log y_{\text{out}}(x)$, where $y_{\text{out}}(x)$ is the probability of the observed label when $y = 1$; when $y = 0$ the term is $\log(1 - y_{\text{out}}(x))$.
• Multinoulli ($n$-class classification)
  • softmax
  • $z = W^T h + b = (z_1, \ldots, z_n)$
  • $\text{softmax}(z)_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}$
• Note: the gradient must be large and predictable. The log can help: it undoes the exp in the sigmoid or softmax, so the gradient does not vanish when the unit saturates on a wrong answer.
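The SGD loop and the softmax output unit can be put together in a few lines. This is a minimal sketch under stated assumptions, not the lecture's own code: it trains a single softmax output layer directly on 2-D inputs (no hidden layers), uses a fixed step budget as the stopping criterion, averages the minibatch gradient (which only rescales $\epsilon$ by $1/m$ relative to the sum), and runs on synthetic toy data. All names (`sgd`, `make_toy_data`, and so on) are hypothetical.

```python
import math
import random

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, b, h):
    """Affine map z = W^T h + b; W[i][k] is the weight from input i to class k."""
    return [sum(W[i][k] * h[i] for i in range(len(h))) + b[k] for k in range(len(b))]

def mean_nll(W, b, data):
    """J(theta): average of -log P_model(y | x) over the data."""
    return -sum(math.log(softmax(logits(W, b, x))[y]) for x, y in data) / len(data)

def predict(W, b, x):
    """Index of the class with the largest logit."""
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])

def sgd(data, n_in, n_classes, epsilon=0.1, m=8, steps=500, seed=0):
    rng = random.Random(seed)
    # Initialization with small random numbers.
    W = [[rng.uniform(-0.01, 0.01) for _ in range(n_classes)] for _ in range(n_in)]
    b = [0.0] * n_classes
    for _ in range(steps):           # stopping criterion: fixed step budget (assumption)
        batch = rng.sample(data, m)  # sample a minibatch D_m of m examples
        gW = [[0.0] * n_classes for _ in range(n_in)]
        gb = [0.0] * n_classes
        for x, y in batch:
            p = softmax(logits(W, b, x))
            for k in range(n_classes):
                # d(-log softmax(z)_y) / dz_k = p_k - 1[k == y]
                d = (p[k] - (1.0 if k == y else 0.0)) / m
                gb[k] += d
                for i in range(n_in):
                    gW[i][k] += d * x[i]
        # Update: theta <- theta - epsilon * g
        for k in range(n_classes):
            b[k] -= epsilon * gb[k]
            for i in range(n_in):
                W[i][k] -= epsilon * gW[i][k]
    return W, b

def make_toy_data(n_per_class=30, seed=1):
    """Three well-separated Gaussian clusters in the plane (synthetic data)."""
    rng = random.Random(seed)
    centers = [(-2.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
    return [([cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)], y)
            for y, (cx, cy) in enumerate(centers) for _ in range(n_per_class)]

data = make_toy_data()
W0, b0 = sgd(data, 2, 3, steps=0)   # parameters at initialization
W, b = sgd(data, 2, 3)              # parameters after training
print("NLL before:", mean_nll(W0, b0, data), "after:", mean_nll(W, b, data))
```

The gradient with respect to each logit is simply $p_k - 1[k = y]$, which stays large whenever the model is confidently wrong; this is the sense in which pairing the log-likelihood cost with a softmax output keeps the gradient "large and predictable".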
Deep Feedforward Network: Design
• Hidden units $h$
• Rectified linear units (ReLU).
  • $g(z) = \max\{0, z\}$
  • $h = g(W^T x + b)$
  • Good point: large gradient when active, so it is easy to train.
  • Bad point: a unit cannot learn while it is inactive, because its gradient is zero there.
• Generalizations.
  • $g(z) = \max(0, z) + \alpha \min(0, z)$
  • Absolute value rectification: $\alpha = -1$.
  • Leaky ReLU: fixed small $\alpha$ (0.01).
  • Parametric ReLU (PReLU): learnable $\alpha$.
  • Maxout units: group the pre-activations and take the max of each group.
• Traditional units:
  • Sigmoid (also consider $\tanh(z)$)
  • Perceptron
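The ReLU family above is one formula with different choices of $\alpha$. The sketch below, in plain Python with hypothetical function names, writes that out for scalar inputs; the grouping of consecutive values in `maxout` is an assumption made for illustration.

```python
def generalized_relu(z, alpha=0.0):
    """g(z) = max(0, z) + alpha * min(0, z)."""
    return max(0.0, z) + alpha * min(0.0, z)

def relu(z):
    """Plain ReLU: alpha = 0."""
    return generalized_relu(z, 0.0)

def absolute_value_rectification(z):
    """alpha = -1 gives g(z) = |z|."""
    return generalized_relu(z, -1.0)

def leaky_relu(z, alpha=0.01):
    """Fixed small alpha; PReLU uses the same formula with a learned alpha."""
    return generalized_relu(z, alpha)

def relu_gradient(z):
    """Derivative of ReLU: 1 when active, 0 when inactive (so no learning signal)."""
    return 1.0 if z > 0 else 0.0

def maxout(zs, group_size):
    """Split the pre-activations into consecutive groups and keep each group's max."""
    return [max(zs[i:i + group_size]) for i in range(0, len(zs), group_size)]

for z in (-3.0, 2.0):
    print(z, relu(z), absolute_value_rectification(z), leaky_relu(z), relu_gradient(z))
```

The zero gradient on the inactive side is the "not learnable when inactive" drawback; the leaky and parametric variants keep a small nonzero slope there.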
Deep Feedforward Network: Design
• Architecture design: the layer structure.
  • $h^{(1)} = g^{(1)}(W^{(1)T} x + b^{(1)})$
  • $h^{(2)} = g^{(2)}(W^{(2)T} h^{(1)} + b^{(2)})$
  • …
• How many layers? How many units in each layer?
  • Theoretically, by the universal approximation theorem, one hidden layer of sigmoid units can approximate any Borel measurable function, given enough hidden units.
  • But it is not guaranteed that such a network can be learned.
  • Practically, (a) follow classic models: CNN, RNN, …; (b) start from a few (2 or 3) layers with a few (16, 32 or 64) hidden units and watch the validation error.
  • Greater depth may not be better.
• Other considerations: locally connected layers, skip layers, recurrent layers, …
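The layer-by-layer structure $h^{(k)} = g^{(k)}(W^{(k)T} h^{(k-1)} + b^{(k)})$ can be sketched as a short forward pass. This is an illustrative plain-Python sketch, not the lecture's code: the layer sizes (4 inputs, two hidden layers of 16 ReLU units, one linear output for regression), the initialization range, and all names are assumptions.

```python
import random

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(W, b, x, g):
    """One layer: h = g(W^T x + b), where W[i][j] connects input i to unit j."""
    return [g(sum(W[i][j] * x[i] for i in range(len(x))) + b[j])
            for j in range(len(b))]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return W, [0.0] * n_out

def build_mlp(sizes, rng):
    """sizes = [input, hidden..., output]; ReLU hidden units, linear output unit."""
    layers = []
    for k in range(len(sizes) - 1):
        W, b = init_layer(sizes[k], sizes[k + 1], rng)
        g = relu if k < len(sizes) - 2 else identity
        layers.append((W, b, g))
    return layers

def forward(layers, x):
    """Apply the layers in order: h(1) from x, h(2) from h(1), and so on."""
    h = x
    for W, b, g in layers:
        h = dense(W, b, h, g)
    return h

rng = random.Random(0)
layers = build_mlp([4, 16, 16, 1], rng)   # hypothetical starting architecture
x = [1.0, -0.5, 0.25, 2.0]
y_out = forward(layers, x)
print(y_out)
```

Changing the `sizes` list is all it takes to try a deeper or wider network, which fits the advice to start small and let the validation error guide the architecture.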