Deep Learning: Opening the Black Box (Part 2) Overview


Advanced Analytics in Business [D0S07a]
Big Data Platforms & Technologies [D0S06a]

Overview: artificial neural networks; deep learning; opening the black box (part 2)

Artificial Neural Networks

The basic idea
Artificial neural networks are very loosely based on how the human brain works. A collection of mathematical "neurons" is created and connected together, allowing them to send signals to each other. Next, the network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure.

A brief history
https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html

An electronic brain (1940s): since the dawn of computing, researchers have been thinking about the idea of an "intelligent", perhaps even "conscious" machine. Alan Turing laid out several criteria to assess whether a machine could be said to be intelligent in Computing Machinery and Intelligence (the Turing test).

The perceptron (1950s): early work in machine learning was inspired by the (then) working theories on the human brain. Frank Rosenblatt kickstarts the field by introducing the "perceptron", a simplified mathematical representation of a neuron, convinced that this would quickly lead to true AI.

"The embryo of an electronic computer that the Navy expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." – The New York Times, based on Rosenblatt's statements on the perceptron

Into the AI winter (1960s-80s): Marvin Minsky, considered one of the fathers of AI, is less convinced. Along with Seymour Papert, Minsky wrote a book entitled Perceptrons that in fact ended the optimism around the perceptron: they showed that the perceptron was incapable of learning the simple exclusive-or (XOR) function.

Backpropagation to the rescue (1980s-90s): interest returns to neural networks.
Geoff Hinton shows that neural networks with many hidden layers (i.e. consisting of more than one perceptron) could be effectively trained by a relatively simple procedure called "backpropagation". Such networks have the ability to learn any function, a result known as the "universal approximation theorem", and with that, neural networks were hot again. The idea of using multiple perceptrons in a layered fashion was not new, though it was unclear how exactly such networks could be "trained". The backpropagation algorithm works by starting from a network's "error" and "back-propagating" it throughout all its layers to adjust their parameters. This leads to some early successes: multi-layer perceptrons (MLPs), and the first convolutional neural networks (CNNs) to recognize handwritten digits, built by Yann LeCun at AT&T Bell Labs ("LeNet").

A second AI winter (1990s-early 2000s): the MLP approach didn't scale well to larger problems, and computing power was lacking for larger networks. Meanwhile, by the 90s, the support vector machine (SVM) was rapidly taking center stage as the method of choice, and neural networks were left behind once again. The field shifts its angle to become a lot more theoretical in nature.

Deep learning (early 2000s): around 2006, Hinton introduces "unsupervised pretraining" and "deep belief nets": train a simple 2-layer unsupervised model, freeze all its parameters, add a new layer on top and train just the parameters for the new layer; keep adding and training layers until you have a "deep network".

Deep learning unleashed (2010s): based on Hinton's work, more and more research papers began to take form. In 2010, a large database known as "ImageNet", containing millions of labeled images, was created and published by a research group at Stanford. This database was coupled with the annual Large Scale Visual Recognition Challenge (LSVRC), where contestants would build computer vision models. In the first two years of the contest, the top models had error rates of about 25%. In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton entered a submission that would halve the error rate!
Their submission combined several critical components that would go on to become mainstays in deep learning models: the use of graphics processing units (GPUs) to train the model, a method to reduce overfitting known as dropout, and the rectified linear activation unit (ReLU). The network went on to become known as "AlexNet", and the paper describing it has been cited nearly 60000 times since it was published.

Deep learning all around (2020s): many innovations would follow after this result: the appearance of large, high-quality labeled datasets; massively parallel computing with GPUs and TPUs; new activation functions and improved architectures; software support; new regularization techniques such as dropout, batch normalization and data augmentation to protect against overfitting; new optimizers, from stochastic gradient descent (SGD) to RMSprop, ADAM and others. The focus returns to practice, experiments and empirical approaches.

Foundations: the perceptron
A perceptron works over a number of numerical inputs: the input neurons or input units. Each neuron has an output: its "activation". For the input neurons, the activation is simply given by the input data instance. Every input unit's output is connected to another neuron (the perceptron unit), where the inputs are multiplied with a weight and summed. A bias can also be added through an input unit that always outputs 1. The final output of the perceptron unit is equal to the result of an "activation function" applied to the weighted sum of the inputs. (This can also be expressed as two operations: a combination/transfer function followed by an activation function.)

So, how to train it?
Feedforward is easy: just plug in the inputs and feed them through the network, obtaining an output. For simple statistical models (for example, linear regression), closed-form analytical formulas exist to determine the optimum parameter estimates. For nonlinear models, such as neural networks, the parameter estimates have to be determined numerically using an iterative algorithm.

Say we use L = (y − ŷ)² to see how good our prediction is for an instance: this is our error or loss function. Weights for this simple perceptron can be updated using:

w_{i,t+1} = w_{i,t} + η(y − ŷ)x_i

This follows from the gradient of the loss (chain rule):

∂L/∂w_i = ∂(y − ŷ)²/∂w_i = 2(y − ŷ) · ∂(y − ŷ)/∂w_i = 2(y − ŷ)(−1) · ∂ŷ/∂w_i

Let o = w_1 x_1 + w_2 x_2 + b and ŷ = σ(o), then:

∂ŷ/∂w_i = (∂σ(o)/∂o)(∂o/∂w_i) = σ(o)(1 − σ(o)) x_i

so that

∂L/∂w_i = 2(y − ŷ)(−1) σ(o)(1 − σ(o)) x_i = −2(y − ŷ) σ(o)(1 − σ(o)) x_i

Gradient descent:

w_{i,t+1} = w_{i,t} − η · ∂L/∂w_i

where η is the learning rate. The weights for this simple perceptron can thus be updated using:

w_{i,t+1} = w_{i,t} + η(y − ŷ)x_i

(Compared to the full gradient, this simplified rule drops the factor 2 and the σ(o)(1 − σ(o)) term; both are positive factors that only scale the size of each step, not its direction.)

We're nudging the weights in order to minimize the error or "loss". The learning rate η determines the "speed" of the convergence. Higher: quicker towards the minimum, but risk of overshooting. Lower: slower towards the minimum, with the risk of getting trapped in local minima.
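To make the derivation concrete, here is a minimal sketch of one gradient-descent update for a single instance. It is not part of the original slides, and the instance values, starting weights and learning rate are made-up illustrative choices.

import math

def sigmoid(o):
    return 1 / (1 + math.exp(-o))

# One training instance with two inputs and its true label (illustrative values)
x1, x2, y = 4.0, 2.0, 1.0

# Current parameters and learning rate (illustrative starting values)
w1, w2, b = 0.1, -0.2, 0.0
eta = 0.5

# Feedforward: o = w1*x1 + w2*x2 + b, prediction y_hat = sigmoid(o)
o = w1 * x1 + w2 * x2 + b
y_hat = sigmoid(o)

# Gradient of the squared loss, following the chain rule derived above:
# dL/dw_i = -2 * (y - y_hat) * sigmoid(o) * (1 - sigmoid(o)) * x_i
common = -2 * (y - y_hat) * y_hat * (1 - y_hat)
grad_w1, grad_w2, grad_b = common * x1, common * x2, common * 1.0

# Gradient-descent step: w_i <- w_i - eta * dL/dw_i
w1, w2, b = w1 - eta * grad_w1, w2 - eta * grad_w2, b - eta * grad_b

print(y_hat, (y - y_hat) ** 2)  # prediction and loss before the update
print(w1, w2, b)                # parameters after one gradient-descent step

Running the same step repeatedly over many instances is exactly what the training code further below does, except that it uses the simplified update rule.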
The loss is a function of the weights, given a piece of training data. We minimize the error using the gradient of the loss function. Gradient descent is the process of minimizing a function by following the gradients of the cost function. This involves knowing the form of the cost as well as its derivative, so that from a given point you know the gradient and can move downhill towards the minimum value.

(Stochastic) gradient descent: one iteration is one instance fed forward, after which the weights are updated; one epoch is one full pass over all instances in the training set. To properly train our perceptron, we need to perform multiple passes over the training set: multiple epochs. Typically, the training set is reshuffled after every epoch (a minimal epoch loop using the functions below is sketched at the end of this section).

A simple implementation of this perceptron in Python:

import math

X = [[0, 1], [1, 0], [2, 2], [3, 4], [4, 2], [5, 2], [4, 1], [5, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights = [0, 0, 0]  # Initialize the weights (first weight is the bias)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def predict(instance, weights):
    output = weights[0]
    for i in range(len(weights) - 1):
        output += weights[i + 1] * instance[i]
    return sigmoid(output)

def train(instance, weights, y_true, l_rate=0.01):
    prediction = predict(instance, weights)
    error = y_true - prediction
    weights[0] = weights[0] + l_rate * error
    for i in range(len(weights) - 1):
        weights[i + 1] = weights[i + 1] + l_rate * error * instance[i]
    return weights

Predictions without training, after 20 epochs, and after 2000 epochs:

Instance   y   ŷ (no training)   ŷ (20 epochs)   ŷ (2000 epochs)
[0, 1]     0   0.5               0.36            0.00
[1, 0]     0   0.5               0.57            0.10
[2, 2]     0   0.5               0.49            0.04
[3, 4]     0   0.5               0.42            0.01
[4, 2]     1   0.5               0.71            0.94
[5, 2]     1   0.5               0.79            0.99
[4, 1]     1   0.5               0.78            0.99
[5, 0]     1   0.5               0.89            0.99

However… consider the following data set:

x1   x2   y
0    0    0
0    1    1
1    0    1
1    1    0

After training, the perceptron gets nowhere:

# Weights: [0.004697241052453581, -0.009743527387551375, -0.00476408160440969]
[0, 0]  0 -> 0.5011743081039458
[0, 1]  1 -> 0.4999832898620173
[1, 0]  1 -> 0.49873843109337934
[1, 1]  0 -> 0.4975474276853999

This is the XOR problem.

So far, not very spectacular… One neuron on its own is hardly a brain. The Multilayer Perceptron (MLP) stacks different neurons in layers: an input layer, hidden layer(s), and an output layer, with all outputs connected to all inputs of the next layer (a "fully connected" or "dense" architecture). The question now is: how to train it?

Backpropagation
We can't use the same approach as we did for a single perceptron, as we don't know what the "true outcome" should be for the lower layers. This issue took quite some time to solve. Eventually, a method called "backpropagation" was devised to overcome this. There is some controversy regarding who should be credited with the discovery of backpropagation (e.g.
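To obtain the "after 20 epochs" and "after 2000 epochs" results shown above, the train function has to be called repeatedly over multiple epochs, reshuffling the training set each time. The following is a minimal sketch of such a loop; it is not part of the original slides and simply reuses the X, y, weights, predict and train definitions from the listing above.

import random

# Minimal epoch loop (illustrative sketch); assumes X, y, weights, predict and
# train are defined as in the perceptron listing above.
n_epochs = 2000
indices = list(range(len(X)))

for epoch in range(n_epochs):
    random.shuffle(indices)  # reshuffle the training set after every epoch
    for i in indices:
        weights = train(X[i], weights, y[i], l_rate=0.01)

# Inspect the trained perceptron's predictions
for instance, y_true in zip(X, y):
    print(instance, y_true, round(predict(instance, weights), 2))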
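To illustrate how a hidden layer overcomes the XOR problem, here is a minimal sketch of a multilayer perceptron with two sigmoid hidden units and one sigmoid output unit, trained with backpropagation on the XOR data. This example is not from the original slides: the architecture, initialization, learning rate and number of epochs are illustrative choices, and with such a tiny network training can occasionally get stuck in a local minimum, in which case a different random seed or an extra hidden unit helps.

import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# XOR data set
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

random.seed(0)
# Hidden layer: 2 units, each with [bias, w1, w2]; output unit: [bias, v1, v2]
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]
l_rate = 0.5

for epoch in range(10000):
    for x, y_true in zip(X, y):
        # Feedforward through the hidden layer and the output unit
        h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in w_h]
        y_hat = sigmoid(w_o[0] + w_o[1] * h[0] + w_o[2] * h[1])

        # Backpropagate the error: output error signal, then hidden error signals
        delta_o = (y_true - y_hat) * y_hat * (1 - y_hat)
        delta_h = [delta_o * w_o[j + 1] * h[j] * (1 - h[j]) for j in range(2)]

        # Weight updates (simplified delta rule, as for the single perceptron)
        w_o[0] += l_rate * delta_o
        for j in range(2):
            w_o[j + 1] += l_rate * delta_o * h[j]
            w_h[j][0] += l_rate * delta_h[j]
            w_h[j][1] += l_rate * delta_h[j] * x[0]
            w_h[j][2] += l_rate * delta_h[j] * x[1]

# Predictions after training
for x, y_true in zip(X, y):
    h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in w_h]
    y_hat = sigmoid(w_o[0] + w_o[1] * h[0] + w_o[2] * h[1])
    print(x, y_true, round(y_hat, 2))

Note how the output unit's error signal is propagated back to the hidden units through the output weights: this is exactly the "back-propagating of the error" idea introduced above, here written out by hand for a two-layer network.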